Realtime analytics on Cassandra, videos from the 2013 Summit
I recently posted my favorite slide decks from the 2013 Cassandra Summit. Now that the videos are up, I’d like to share my favorite talks among those that really need the video for good comprehension. Interestingly, the first four are all variations on a realtime analytics theme.
Realtime analytics with Cassandra
In Real World, Real Time Data Modeling, Tim Moreton explains how to build realtime analytics on top of Cassandra, i.e., what’s going on under the hood of his product, Acunu Analytics. Acunu’s approach is akin to complex event processing (CEP): you define the queries you’re interested in, and Acunu keeps the results up to date for you in realtime. If you want the short version, skip to the live demo about 3/4 of the way in.
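To make the "define the query first, keep its result current" idea concrete, here is a minimal sketch in Python. It is not Acunu's API or implementation: the `ContinuousQuery` class, its method names, and the sample events are all hypothetical, and plain dictionaries stand in for whatever store would hold the results.

```python
from collections import defaultdict


class ContinuousQuery:
    """Hypothetical pre-declared query: sum and count of one measure per group."""

    def __init__(self, group_by, measure):
        self.group_by = tuple(group_by)   # dimensions to group on
        self.measure = measure            # numeric field to aggregate
        self.totals = defaultdict(float)  # running sum per group key
        self.counts = defaultdict(int)    # running event count per group key

    def on_event(self, event):
        # Update the stored result as each event arrives, so reading
        # the answer later never requires scanning raw events.
        key = tuple(event[dim] for dim in self.group_by)
        self.totals[key] += event[self.measure]
        self.counts[key] += 1

    def result(self, *key):
        # Reading is a lookup of the precomputed aggregate.
        return self.totals.get(key, 0.0), self.counts.get(key, 0)


# Illustrative events only; the field names are assumptions.
latency_by_page = ContinuousQuery(group_by=["page", "country"], measure="ms")
for event in [
    {"page": "home", "country": "US", "ms": 120},
    {"page": "home", "country": "US", "ms": 230},
    {"page": "cart", "country": "DE", "ms": 90},
]:
    latency_by_page.on_event(event)

total_ms, hits = latency_by_page.result("home", "US")
```

The trade-off in this style is that only queries declared ahead of time are cheap to answer; anything else needs the raw data.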
Real-time Analytics using Cassandra, Spark and Shark is a completely different take on realtime analysis. Shark gives you more of a traditional data warehouse approach, queryable via the Hive SQL dialect — a lot like Cloudera’s Impala or Apache Drill, if you follow the Hadoop world. Again, a deep link to the demo at the end.
Matt Stump’s talk on Large Queries in Real-Time for Enterprise deals with the realtime-analytics-as-search problem: combining multiple, arbitrary predicates at runtime to let users slice and dice their metrics. Matt digs into the nuts and bolts of building a custom bitmap index engine in C++ on top of Cassandra, currently capable of delivering results for an 8-clause query across 4 billion rows in under 2s.
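For readers who haven't met bitmap indexes, a toy sketch in Python shows why they suit ad hoc predicate combinations. This is not the engine from the talk (which is C++ on Cassandra); the `BitmapIndex` class and the sample rows are hypothetical, and Python integers stand in for the compressed bitmaps a real engine would use.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one bitmap per (column, value), bit i set for row i."""

    def __init__(self):
        self._bitmaps = defaultdict(int)  # (column, value) -> int used as bitset
        self._row_count = 0

    def add_row(self, row):
        row_id = self._row_count
        for column, value in row.items():
            self._bitmaps[(column, value)] |= 1 << row_id
        self._row_count += 1
        return row_id

    def bitmap(self, column, value):
        # .get avoids creating empty entries for values never seen.
        return self._bitmaps.get((column, value), 0)

    def query_all(self, predicates):
        # AND the bitmaps of every clause; each clause costs one bitwise op
        # no matter which columns the caller picked at runtime.
        bits = (1 << self._row_count) - 1  # start with every row matching
        for column, value in predicates:
            bits &= self.bitmap(column, value)
        return self._row_ids(bits)

    @staticmethod
    def _row_ids(bits):
        # Turn the set bits back into a list of row ids.
        ids, position = [], 0
        while bits:
            if bits & 1:
                ids.append(position)
            bits >>= 1
            position += 1
        return ids


# Illustrative rows only; the column names are assumptions.
index = BitmapIndex()
for row in [
    {"country": "US", "device": "mobile"},
    {"country": "DE", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "US", "device": "mobile"},
]:
    index.add_row(row)

matches = index.query_all([("country", "US"), ("device", "mobile")])
```

Because every clause reduces to a bitwise AND, adding another predicate adds one more pass over a bitmap instead of another scan over the rows.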
Titan: Distributed Graph Computing. Graph databases can be thought of as a specialization of relationship analytics, and Aurelius, the company behind Titan, is seeing a lot of traction this year. Matthias Broecheler gives a good overview of Titan’s progress; the Q&A includes graph partitioning strategies and a comparison with Giraph.
One more for the road
If you run Cassandra in production, you’ll want to watch Aaron Morton’s talk, In Case Of Emergency, Break Glass. It’s a good overview of real-world Cassandra concerns, drawn from Aaron’s experience as a successful Cassandra consultant and as probably the most prolific contributor to the Cassandra users list.