Case Study: SocialFlow

One of the best features of Cassandra is its simple schema. Cassandra does only a few things, but it does them extremely well.


“We knew our write load was going to be huge. So when evaluating solutions, I looked hard at the write path. If a solution couldn’t support an extremely high volume of writes, it wouldn’t work. But it seemed Apache Cassandra would do what we needed, and do it right.” - Drew Robb, SocialFlow

Company: SocialFlow

Overview: SocialFlow is one of the few companies to enjoy full, unlimited access to Twitter’s “firehose,” a real-time stream of every tweet. The fast-growing startup is the developer of the first and only social media optimization technology that uses real-time streaming data from the Twitter firehose, Bitly, Facebook, and other sources to help publishers, retailers, and brands maximize engagement with their Twitter followers. Its SocialFlow AttentionScore™ algorithm identifies the precise time to publish a particular message so that it is likely to be seen by the greatest number of interested followers.

Given that the Twitter firehose channels about 250 million tweets a day, and that most of SocialFlow’s clients have millions of followers and receive thousands of clicks on just about everything they post, it’s not surprising that SocialFlow faces some big data challenges. Drew Robb, director of data systems at SocialFlow, says there is simply no way for all the data SocialFlow takes in from the Twitter firehose to fit on one machine. “The stream equates to about 500 GB per day, uncompressed, and about 3,000 static updates per second,” he says. “We need a big distributed system just to do anything with that data.”
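As a rough sanity check on those figures, the daily totals can be converted to per-second rates. The sketch below is illustrative only: the function names are hypothetical, and it assumes a decimal gigabyte (10^9 bytes) and traffic spread evenly across the day, neither of which is stated in the case study.

```python
# Back-of-the-envelope rates from the daily figures quoted above.
# Assumptions (not from the case study): 1 GB = 10**9 bytes, and
# traffic is spread evenly over the day (real traffic is bursty).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY


def firehose_summary(tweets_per_day: int, bytes_per_day: int) -> dict:
    """Derive average rates and payload size from daily totals."""
    return {
        "tweets_per_second": per_second(tweets_per_day),
        "bytes_per_second": per_second(bytes_per_day),
        "bytes_per_tweet": bytes_per_day / tweets_per_day,
    }


# Figures quoted in the case study: ~250 million tweets and ~500 GB per day.
summary = firehose_summary(250_000_000, 500 * 10**9)

for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the average comes to roughly 2,900 tweets per second, in line with the quoted figure of about 3,000 updates per second, at about 2 KB per tweet and just under 6 MB of uncompressed data per second. Peak rates would be higher than these averages.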

Not long after joining SocialFlow in early 2011, Robb began the search for a NoSQL database with “incredible write ability.” He says, “We knew our write load was going to be huge. So when evaluating solutions, I looked hard at the write path. If a solution couldn’t support an extremely high volume of writes, it wouldn’t work. But it seemed Apache Cassandra would do what we needed, and do it right.”

Data Size: 12-node cluster, 4-node cluster in development, and some single-cluster development boxes

Challenge: The need for a fault-tolerant, distributed database with incredible write ability to handle the real-time data stream from Twitter’s “firehose,” which channels about 250 million tweets per day.

Solution: The elastically scalable and reliable Apache Cassandra platform, which allows SocialFlow to support an extremely high volume of writes, publish the right tweet at the right time for its clients, and give its IT team more time to work on new services.

Apache, Apache Cassandra, Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation.