The Five Minute Interview – Easou
This article is one in a series of quick-hit interviews with companies using Apache Cassandra and/or DataStax Enterprise for key parts of their business. For this interview, we caught up with David Cao, a senior developer at Easou.
DataStax: David, thanks for making time for us today. What can you tell us about Easou?
David: Easou is a Google-like company, but we focus on mobile device search. We provide a search service that covers things like webpages, images, news, Internet novels, commodities, etc.
In general, we help people reach the information they need in the easiest possible way on their mobile devices, such as phones and tablets. Currently, we have more than 10 million daily active users, and our total number of users is more than 200 million.
DataStax: What does your tech environment look like?
David: We use Linux for our servers, and Java and C++ as our primary development languages. In addition to the Hector client for Cassandra, we developed a C++ connector.
DataStax: I get the impression you use Cassandra in a big way?
David: Correct. Cassandra is our primary NoSQL solution, and we have it deployed in about a dozen applications.
DataStax: Can you give us examples of a few use cases?
David: Sure. We have one Cassandra cluster of about 72 nodes. Its read load is very high: it averages about 7 million users and about 400 million read requests a day. The write/update load on this application is also very high, and we easily see over 400 million writes per day.
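To put those daily totals in perspective, here is a rough back-of-the-envelope sketch in Python that converts them into average per-second rates. The input figures are the ones quoted above; the even spread of load across nodes and across the day is an assumption made only for illustration, and the function names are hypothetical. Real peak rates would be higher than these averages, and replication would multiply the write work each node actually performs.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Figures quoted in the interview (all approximate).
READS_PER_DAY = 400_000_000
WRITES_PER_DAY = 400_000_000
USERS_PER_DAY = 7_000_000
NODES = 72


def per_second(requests_per_day: float) -> float:
    """Average requests per second for a given daily total."""
    return requests_per_day / SECONDS_PER_DAY


def per_node_per_second(requests_per_day: float, nodes: int) -> float:
    """Average per-node rate, ASSUMING load is spread evenly across nodes."""
    return per_second(requests_per_day) / nodes


if __name__ == "__main__":
    print(f"cluster reads/s:  {per_second(READS_PER_DAY):,.0f}")
    print(f"cluster writes/s: {per_second(WRITES_PER_DAY):,.0f}")
    print(f"reads/s per node: {per_node_per_second(READS_PER_DAY, NODES):.1f}")
    print(f"reads per user per day: {READS_PER_DAY / USERS_PER_DAY:.0f}")
```

Under these assumptions the cluster averages roughly 4,600 reads and 4,600 writes per second, or about 64 reads per second per node before replication is taken into account.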
DataStax: What else?
David: Our image application runs on more than 200 Cassandra nodes and supports both offline computing and online viewing of images. Right now we maintain hundreds of millions of images, each of which has several copies in different resolutions. This is our biggest application from a data volume perspective; right now it’s a little over 300TB.
DataStax: Now, you run Cassandra a little differently than others do – you actually created a technical environment where you run multiple Cassandra nodes on a single machine, correct?
David: That’s right. We make use of multi-disk servers and run one Cassandra node per disk. We modified Cassandra so that a node is identified by an IP address and port rather than by IP address alone. This allows us to have physical Cassandra machines that hold up to 20TB of data and yet deliver very fast performance, along with faster recovery times when a node goes down.
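The idea of identifying a node by address and port can be illustrated with a small Python sketch. This is not Easou's code or their Cassandra patch, which the interview does not show; it only models the layout David describes. The `NodeId` type, the `nodes_for_machine` helper, the mount paths, and the scheme of assigning sequential ports starting at 7000 (the default storage port in stock Cassandra) are all assumptions for illustration.

```python
from typing import Dict, List, NamedTuple


class NodeId(NamedTuple):
    """Hypothetical node identity: address AND port, not address alone."""
    host: str
    port: int


def nodes_for_machine(host: str, data_dirs: List[str],
                      base_port: int = 7000) -> Dict[NodeId, str]:
    """Map one node to each disk on a machine, each node on its own port.

    Sequential port numbering is an assumption made for this sketch.
    """
    return {NodeId(host, base_port + i): data_dir
            for i, data_dir in enumerate(data_dirs)}


if __name__ == "__main__":
    # Hypothetical machine: 10 disks mounted at /data/disk0 .. /data/disk9.
    disks = [f"/data/disk{i}" for i in range(10)]
    layout = nodes_for_machine("10.0.0.5", disks)
    for node, data_dir in layout.items():
        print(f"{node.host}:{node.port} -> {data_dir}")
```

With a layout like this, the failure of a single disk takes down only the one node bound to it, so only that node's share of the machine's data has to be rebuilt. That is consistent with the faster recovery times mentioned above.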
DataStax: What caused you to choose Cassandra in the first place?
David: We looked at MongoDB, but didn’t think it was stable enough at the time. HBase was ruled out because it didn’t offer the high availability we need. We chose Cassandra primarily because of its continuous availability and lack of a single point of failure, along with its scalability.
We also benchmarked Cassandra against Voldemort, and in our test cases, Cassandra delivered 50% better read performance.
DataStax: What about ease of use?
David: We found Cassandra to be much easier to use than Mongo (and I have lots of experience with Mongo), HBase, Voldemort, and even MySQL. I can also say we saved at least 30% on staffing and servers compared with a comparable Mongo or MySQL implementation.
DataStax: David, lots of good information. Thanks for the time.
David: You bet.