Cassandra vs. MongoDB vs. Couchbase vs. HBase

Apache Cassandra™ delivers higher performance under heavy load and bests its top NoSQL database rivals in many use cases.

Benchmarking NoSQL Databases: Cassandra vs. MongoDB vs. HBase vs. Couchbase

Understanding the performance behavior of a NoSQL database like Apache Cassandra under various conditions is critical. Conducting a formal proof of concept (POC) in the environment in which the database will run is the best way to evaluate platforms. A POC that benchmarks with production configurations and parameters, anticipated data volumes, and concurrent user workloads gives both IT and business stakeholders insight into the platforms under consideration and a view of how business applications will perform in production.

Independent benchmark analyses and testing of various NoSQL platforms under big data, production-level workloads have been performed over the years and have consistently identified Apache Cassandra as the platform of choice for businesses interested in adopting NoSQL as the database for modern Web, mobile and IoT applications.

One benchmark analysis (Solving Big Data Challenges for Enterprise Application Performance Management) by engineers at the University of Toronto evaluated six different data stores and found Apache Cassandra the “clear winner throughout our experiments”. In addition, End Point Corporation, a database and open source consulting company, benchmarked the top NoSQL databases, including Apache Cassandra, Apache HBase, Couchbase, and MongoDB, using a variety of workloads on AWS EC2.

The databases involved were:

  • Apache Cassandra: Highly scalable, high performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
  • Apache HBase: Open source, non-relational, distributed database modeled after Google’s BigTable and written in Java. It is developed as part of the Apache Software Foundation’s Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop.
  • MongoDB: Cross-platform, document-oriented database system that eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas, making the integration of data in certain types of applications easier and faster.
  • Couchbase: Distributed NoSQL document-oriented database that is optimized for interactive applications.

End Point conducted the benchmark of these NoSQL database options on Amazon Web Services EC2 instances, an industry-standard platform for hosting horizontally scalable services. To minimize the effect of AWS CPU and I/O variability, End Point performed each test 3 times on 3 different days. New EC2 instances were used for each test run to further reduce the impact on any one test of the “lame instance” or “noisy neighbor” effects sometimes experienced in cloud environments.
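The source does not say how the three runs of each test were combined into a single figure. As a minimal sketch, assuming a simple average, the snippet below summarizes repeated runs by their mean and relative spread. The function name `summarize_runs` and the sample values are hypothetical and are not taken from the report.

```python
from statistics import mean, stdev

def summarize_runs(ops_per_sec):
    """Return (mean throughput, relative spread) for repeated runs of one test.

    Relative spread is the sample standard deviation divided by the mean;
    a single run has no spread, so 0.0 is returned in that case.
    """
    avg = mean(ops_per_sec)
    spread = stdev(ops_per_sec) / avg if len(ops_per_sec) > 1 else 0.0
    return avg, spread

# Placeholder values for three runs on three different days.
# These are illustrative only and are NOT results from the benchmark.
example_runs = [100.0, 110.0, 90.0]
avg, spread = summarize_runs(example_runs)
print(f"mean={avg:.2f} ops/sec, relative spread={spread:.1%}")
```

A large relative spread across days would suggest that instance variability, rather than the database, is driving the differences between runs.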

NoSQL Database Performance Testing Results

When it comes to performance, there is (to date) no single “winner takes all” among the top NoSQL databases, or any other NoSQL engine for that matter. Depending on the use case and deployment conditions, it is almost always possible for one NoSQL database to outperform another and yet lag its competitor when the rules of engagement change. Here are a couple of snapshots of the performance benchmark to give you a sense of how each NoSQL database stacks up.

Throughput by Workload

Each workload appears below with the throughput/operations-per-second (more is better) graphed vertically, the number of nodes used for the workload displayed horizontally, and a table with the result numbers following each graph.

Load process

For load, Couchbase, HBase, and MongoDB all had to be configured for non-durable writes to complete in a reasonable amount of time, with Cassandra being the only database performing durable write operations. Therefore, the numbers below for Couchbase, HBase, and MongoDB represent non-durable write metrics.

NoSQL Benchmark Chart 1

Nodes   Cassandra     HBase         MongoDB       Couchbase
1       18,683.43     15,617.98     8,368.44      13,761.12
2       31,144.24     23,373.93     13,462.51     26,140.82
4       53,067.62     38,991.82     18,038.49     40,063.34
8       86,924.94     74,405.64     34,305.30     76,504.40
16      173,001.20    143,553.41    73,335.62     131,887.99
32      326,427.07    296,857.36    134,968.87    192,204.94
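One way to read the load numbers is to compare each database's throughput at a given node count with what perfectly linear scaling from its own single-node result would give. The sketch below does that arithmetic on the figures from the load table. The metric and the names `LOAD_OPS` and `scaling_efficiency` are illustrative choices, not something defined in the End Point report.

```python
# Load-phase throughput in operations per second, copied from the load table,
# keyed by database and then by node count.
LOAD_OPS = {
    "Cassandra": {1: 18683.43, 2: 31144.24, 4: 53067.62,
                  8: 86924.94, 16: 173001.20, 32: 326427.07},
    "HBase":     {1: 15617.98, 2: 23373.93, 4: 38991.82,
                  8: 74405.64, 16: 143553.41, 32: 296857.36},
    "MongoDB":   {1: 8368.44, 2: 13462.51, 4: 18038.49,
                  8: 34305.30, 16: 73335.62, 32: 134968.87},
    "Couchbase": {1: 13761.12, 2: 26140.82, 4: 40063.34,
                  8: 76504.40, 16: 131887.99, 32: 192204.94},
}

def scaling_efficiency(ops_by_nodes, nodes):
    """Measured throughput at `nodes` divided by `nodes` times the 1-node throughput.

    A value of 1.0 would mean perfectly linear scaling from the 1-node run.
    """
    return ops_by_nodes[nodes] / (nodes * ops_by_nodes[1])

for name, series in LOAD_OPS.items():
    print(f"{name}: efficiency at 32 nodes = {scaling_efficiency(series, 32):.2f}")
```

Because the single-node run is the baseline, this ratio says nothing about absolute speed; it only shows how well each system's own throughput held up as nodes were added.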

Mixed Operational and Analytical Workload

Note that Couchbase was eliminated from this test because it does not support scan operations (producing the error: “Range scan is not supported”).

NoSQL Benchmark Chart 2

Nodes   Cassandra     HBase       MongoDB
1       4,690.41      269.30      939.01
2       10,386.08     333.12      30.96
4       18,720.50     1,228.61    10.55
8       36,773.58     2,151.74    39.28
16      78,894.24     5,986.65    377.04
32      128,994.91    8,936.18    227.80
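To put the mixed-workload gaps in proportion, the following sketch divides Cassandra's throughput by each other database's throughput at the same node count, using the figures from the table above. The names `MIXED_OPS` and `ratio_to_others` are hypothetical helpers introduced for illustration only.

```python
# Mixed operational/analytical throughput in operations per second,
# copied from the table above, keyed by node count.
MIXED_OPS = {
    1:  {"Cassandra": 4690.41,   "HBase": 269.30,  "MongoDB": 939.01},
    2:  {"Cassandra": 10386.08,  "HBase": 333.12,  "MongoDB": 30.96},
    4:  {"Cassandra": 18720.50,  "HBase": 1228.61, "MongoDB": 10.55},
    8:  {"Cassandra": 36773.58,  "HBase": 2151.74, "MongoDB": 39.28},
    16: {"Cassandra": 78894.24,  "HBase": 5986.65, "MongoDB": 377.04},
    32: {"Cassandra": 128994.91, "HBase": 8936.18, "MongoDB": 227.80},
}

def ratio_to_others(row, baseline="Cassandra"):
    """Return baseline throughput divided by each other database's throughput."""
    return {name: row[baseline] / ops
            for name, ops in row.items() if name != baseline}

for nodes, row in MIXED_OPS.items():
    ratios = ratio_to_others(row)
    summary = ", ".join(f"{name} x{value:.1f}" for name, value in ratios.items())
    print(f"{nodes:>2} nodes: Cassandra vs. {summary}")
```

These ratios apply only to this workload and configuration; as the article notes, a different workload can reorder the results.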

For a comprehensive analysis, please download the complete report: Benchmarking Top NoSQL Databases.

NoSQL Database Performance Conclusion

These performance metrics are just a few of the many that have solidified Apache Cassandra as the NoSQL database of choice for businesses needing a modern, distributed database for their Web, mobile and IoT applications. Each database option (Cassandra, HBase, Couchbase and MongoDB) will certainly shine in particular use cases, so it’s important to test your specific use cases to ensure your selected database meets your performance SLA. Whether you are primarily concerned with throughput or latency, or more interested in the architectural benefits such as having no single point of failure or being able to have elastic scalability across multiple data centers and the cloud, much of an application’s success comes down to its ability to deliver the response times Web, mobile and IoT customers expect.

As the benchmarks referenced here show, Cassandra delivers fast write and read performance and linear scalability in a masterless, scale-out design, besting its top NoSQL database rivals in many use cases.

DataStax Enterprise 6 vs. Apache Cassandra Benchmark Report

With DataStax Enterprise 6 (DSE 6), we raised the bar substantially for us, our partners, our customers, and our competitors. We also said that DSE 6 was twice as fast as open source Apache Cassandra™, and we now have third-party validation of this claim. Read this benchmark report from zData to get the results of their test of DSE 6 against Cassandra, for which they ran a different series of workloads on an AWS-built cluster.

How to Tap Into the Power of a NoSQL Database

For years, organizations have relied on relational database management systems (RDBMSs) to store, process, and analyze critical business information. The idea originated in a paper written in 1970 by a computer scientist named Edgar Codd, who thought to archive information in tables containing rows and columns. The concept was a major leap forward from the slow and inefficient flat file systems that businesses were using at the time, although these systems did work in conjunction with pre-relational model databases.

The Rise of SQL

Shortly after, IBM developed the SQL language to scan and manipulate sets of transactional data stored within RDBMSs. With SQL, it became possible to quickly access and modify large pools of records without having to create complex commands. SQL essentially enabled one-click access to sets of data. The idea took off, and the RDBMS eventually emerged as the most widely used data management system. Today, most organizations are still using RDBMSs one way or another. RDBMSs, however, have one major limitation: they are only capable of efficiently processing relatively small amounts of structured data, like names and ZIP codes.

The NoSQL Imperative

When the era of big data hit, a new kind of database was required. The real driver for NoSQL was the sheer shift in data volumes that the Internet brought. Prior to the Internet, and in its early days, relational databases only had to deal with the data of a single company or organization. But when faced with the millions of Internet users that could discover a company's service in waves, the RDBMS model either broke or became very challenging to shard correctly. Relational databases also required a tremendous amount of maintenance. A database of a few thousand objects may handle things decently, but as you scale up, performance declines. This is a big problem, especially considering the massive volume of unstructured data that is being generated on a daily basis.

According to 451 Research, 63% of enterprises and service providers today are managing storage capacities of at least 50 petabytes, and more than half of that data is unstructured.

The concept of NoSQL has been around for decades. Believe it or not, businesses have been using non-relational databases to store and retrieve unstructured data since the 1960s. The technology, however, wasn’t referred to as NoSQL until developer Carlo Strozzi created the Strozzi NoSQL Open Source Relational Database in 1998. Strozzi’s database, though, was really just a relational database that didn’t have an SQL interface. It wasn’t until 2009 that we saw a true departure from the relational database model and the first working NoSQL application.

NoSQL databases offer several advantages over relational databases. Most importantly, they can handle large volumes of big data. Other advantages include:

  • Elastic scalability. Unlike relational databases, NoSQL databases can scale outward into new nodes instead of upward. This strategy is much more flexible, efficient and affordable than scaling with traditional legacy storage systems.
  • Lower operating costs. One of the biggest downsides to using an RDBMS is the fact that you will have to deal with expensive servers. Since NoSQL databases leverage commodity server clusters, you can process and store larger data volumes at a lower cost.
  • Reduced management. NoSQL databases are much easier to install and maintain as they are simpler and come with advanced auto-repair capabilities. While it’s not completely hands-off, NoSQL is much easier for network teams to manage on a daily basis.

Bridging RDBMS With NoSQL

Right now, NoSQL databases only account for about 3% of the $46 billion database market, but they are quickly gaining traction and on pace to become a legitimate long-term market disruptor.

But while NoSQL is heating up and the RDBMS market is experiencing a significant slowdown, this doesn’t mean that businesses are running out and abandoning their RDBMS systems altogether. RDBMSs, after all, are still great at managing transactional workloads, which are heavily used today. The best solution often involves finding a way to use your legacy technology to support your new applications, and this means getting an enterprise data layer.

What’s an enterprise data layer? It’s a way to connect your systems of record with your systems of engagement. Essentially, it’s a data management layer that spares you from having to go through a painfully expensive and time-consuming “rip and replace” process, and it allows you to salvage your legacy tech and put it to good use. You may still be stuck in the relational age, but that doesn’t mean you can’t take full advantage of the NoSQL revolution.
