The Five Minute Interview – AppScale
This article is one in a series of quick-hit interviews with companies using Cassandra for key parts of their business. For this interview, we caught up with Raj Chohan, Ph.D. student at UCSB and one of the founders of AppScale.
DataStax: Raj, tell us a little about AppScale’s history and what you guys do.
Raj: We released the first version of AppScale in 2009. It lets you take your Google App Engine application and run it on your own hardware. We started with Eucalyptus for private clouds, which makes it easy to also support OpenStack for both public and private clouds since they’re a fork of Eucalyptus. We also have an AMI for EC2.
DataStax: And you support different back-end databases?
Raj: Yes. We support 12 datastores so the user can choose what datastore they want. We support Cassandra, MySQL, MongoDB, HBase, and others.
DataStax: Tell us about your Cassandra usage and support.
Raj: We’ve made Cassandra our default datastore because our benchmarks have shown it to be the highest-performing of the databases we support. We’ve done quite a bit of research in this area and have written a number of research papers on the subject.
DataStax: Does data volume play into that at all?
Raj: Yes. For example, we saw that HBase did well up until a certain size, but then slowed down. With Cassandra, this wasn’t the case. And we haven’t moved to Cassandra 1.0 yet, which I understand is even faster.
DataStax: What about ease-of-use and setup of Cassandra over the others?
Raj: For us, Cassandra has been very simple to work with. My only knock on Cassandra is that when we do see errors, the messages sometimes aren’t as meaningful as I would like. An error message that links to the online docs or a troubleshooting guide would be nice.
DataStax: What other things do you like about Cassandra over the other databases?
Raj: Configuration is very simple compared to the others. For example, with HBase, you have to configure HDFS, ZooKeeper, and then HBase, whereas Cassandra is just one package by itself.
When we started out with both MongoDB and Cassandra, we found it much easier to get started with Cassandra for the things we needed. Also, at the time, MongoDB wasn’t doing all the distributed sharding and whatnot. They kind of came late to the party with all those features. So that’s one reason we like and use Cassandra so much: it has all the core features that we want and has had them for some time.
MongoDB can be fast in some cases, but its data persistence can be an issue, as can the global write lock. Overall, we’ve found MongoDB to be slower than Cassandra.
DataStax: What other functionality – database-wise – is important to you?
Raj: Google App Engine doesn’t have great support for OLAP-type operations, so things like Hive and Pig support are meaningful to us. I know these are in your Enterprise offering.
DataStax: Raj, thanks for the time.
Raj: Sure thing.
For more information on AppScale, visit: http://appscale.cs.ucsb.edu/