What is Apache Cassandra™?

Apache Cassandra is a distributed NoSQL database that began internally at Facebook and was released as an open source project in July 2008. The platform delivers the continuous availability (zero downtime), high performance, and linear scalability that successful applications require, while also offering operational simplicity and effortless replication across data centers and geographies. Cassandra, which can handle petabytes of information and thousands of concurrent operations per second, enables organizations to manage large amounts of data across hybrid cloud environments.

Relational Databases and the Need for NoSQL

For decades, traditional relational database management systems (RDBMS) were the primary systems used to process, store, and analyze critical business information. While an RDBMS is perfectly capable of handling data sets for many use cases, relational databases often fall short in an era where companies are increasingly dealing with big data. A new kind of database was required to accommodate these kinds of data sets (e.g., social media content), and NoSQL (i.e., “not only SQL”) databases emerged as a result. These databases were designed to deliver:

  • Operational simplicity. Today’s leading NoSQL databases come with advanced auto-repairing features, which makes them easier to manage.
  • Reduced operating expenses. By leveraging commodity hardware, NoSQL databases enable organizations to reduce their expenses significantly.
  • Elastic scalability. NoSQL databases can effortlessly scale outward into new nodes without forcing you to change anything about your applications—which is much more efficient than scaling upward with traditional RDBMS.

While NoSQL adoption continues to increase, these powerful databases still only account for a minority of the total database market as many organizations continue to cling to their legacy RDBMS deployments.

[Figure: market share comparison graph of NoSQL and Hadoop vs. other types of databases]

Still, it’s expected that by 2020 the NoSQL database market will reach $4.2 billion as more and more enterprises deploy them to support modern applications, ensure consistent user experiences, and unlock the true power of their data.

The History of Apache Cassandra

[Figure: explanation of Cassandra myth with half profile of statue face]

Apache Cassandra was developed by Avinash Lakshman and Prashant Malik when both were working as engineers at Facebook. The database was designed to power Facebook’s inbox search feature, making it easy for users to quickly find the conversations and other content they were looking for.

Cassandra uses Cassandra Query Language (CQL), which is similar to SQL, so most developers should have a fairly easy time becoming familiar with it.
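To give a sense of how close CQL is to SQL, here is a minimal sketch in Python. The keyspace `demo`, the table `messages`, and the helper `run_demo` are hypothetical names chosen for illustration. The `session` argument is assumed to be any object with an `execute(statement, parameters)` method, such as a session from the DataStax Python driver.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical keyspace and table, for illustration only.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)

CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS demo.messages ("
    "user_id uuid, sent_at timestamp, body text, "
    "PRIMARY KEY (user_id, sent_at))"
)

# %s placeholders are filled in by the driver when parameters are passed.
INSERT_MESSAGE = (
    "INSERT INTO demo.messages (user_id, sent_at, body) VALUES (%s, %s, %s)"
)

SELECT_MESSAGES = "SELECT sent_at, body FROM demo.messages WHERE user_id = %s"


def run_demo(session):
    """Create the schema, write one row, and read it back.

    `session` is assumed to expose execute(statement, parameters).
    """
    user_id = uuid.uuid4()
    session.execute(CREATE_KEYSPACE)
    session.execute(CREATE_TABLE)
    session.execute(
        INSERT_MESSAGE, (user_id, datetime.now(timezone.utc), "hello")
    )
    return session.execute(SELECT_MESSAGES, (user_id,))
```

With the `cassandra-driver` package installed and a node reachable on localhost, calling `run_demo(Cluster(["127.0.0.1"]).connect())`, with `Cluster` imported from `cassandra.cluster`, would be one way to try the statements against a real cluster.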

In July 2008, Facebook open sourced Cassandra. In March 2009, Cassandra became an Apache Incubator project. In April 2010, it graduated from the incubator, becoming a top-level project of the Apache Software Foundation. Today, Cassandra is freely available under the Apache License 2.0; the team at DataStax is accelerating the evolution of the open source database and is responsible for most of the project’s code commits. Organizations like CERN, Comcast, eBay, GitHub, Hulu, Instagram, and Netflix use Cassandra to support modern applications and meet user expectations.

Apache Cassandra vs. Traditional Relational Databases

How does Cassandra differ from a relational database? Although non-relational databases vary in the features and benefits they provide, a database like Cassandra differs from a typical relational database in the following ways:

Table 1. A quick comparison of RDBMS and a NoSQL database like Cassandra

Relational Database | NoSQL Database (e.g., Cassandra)
Handles moderate incoming data velocity | Handles high incoming data velocity
Supports complex/nested transactions | Supports simple transactions
Single points of failure with failover | No single points of failure; constant uptime
Supports moderate data volumes | Supports very high data volumes
Centralized deployments | Decentralized deployments
Data written in mostly one location | Data written in many locations
Supports read scalability (with consistency sacrifices) | Supports read and write scalability
Deployed in vertical scale-up fashion | Deployed in horizontal scale-out fashion

The Key Features and Advantages of Apache Cassandra

Making the smartest decisions starts with being able to analyze and understand all of the data your organization has under its control. To this end, Apache Cassandra’s flexible design liberates organizations from the rigid schema legacy databases are known for. Whether you need to process server logs, emails, social media posts, or PDFs, Cassandra’s got you covered. As a result, you’ll be able to make better-informed decisions without leaving any of your data on the table. Beyond that, Cassandra delivers a slew of other benefits, including:

1. Open source

For years, organizations were hesitant to use open source software because they believed that the technology had serious security issues and other shortcomings. But today, as organizations become better educated on the promise of open source, those misconceptions are becoming less common. In fact, today’s leading enterprises are increasingly leveraging open source solutions, and for good reason: open source software provides a number of benefits, including:  

  • Affordability. Most open source solutions are free to use, and Apache Cassandra is no different. However, you also have the option to upgrade to DataStax Distribution of Apache Cassandra, which includes expert support and is 100% open source compatible.
  • Flexibility. Open source frees you from vendor lock-in. In the event you want to migrate to a new infrastructure, you don’t have to worry about paying to take your data with you.
  • Extensibility. Since you have access to source code, you can extend open source software to integrate with existing systems and tools. This increases organizational efficiency and simplifies operational management.
  • Security. With a community of dedicated enthusiasts contributing to open source projects and reviewing code regularly, some software analysts argue that open source solutions are even more secure than their proprietary counterparts. Once a bug is noticed by the community, developers convene to patch it as quickly as they can.

2. Masterless

The majority of traditional databases feature what’s referred to as master-slave—or primary/secondary—architecture. In these configurations, a single node is designated the master, which can then perform read and write operations. The rest of the nodes serve as the slaves, which are only able to perform read operations. There are many downsides to this kind of architecture:

  • Latency can become a major problem, particularly for distributed teams.
  • Costs can shoot up considerably when applications need to scale.
  • Availability can suffer, too. In the event a master node fails, database operations can grind to a halt until an administrator designates a new master.

Built with masterless architecture, Apache Cassandra doesn’t have these limitations. No nodes are masters, which means that no single node is in charge of replicating data across a cluster. Instead, every node is capable of performing read and write operations. This improves performance and adds resiliency to the database.

[Figure: small table with main features of a masterless architecture]
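The masterless idea can be illustrated with a toy consistent-hashing ring. This is a simplified sketch, not Cassandra's implementation: `ToyRing`, `token`, and the node names are hypothetical, SHA-256 stands in for Cassandra's default Murmur3 partitioner, and replicas are simply the next nodes clockwise on the ring. The point is that any node can compute where a key lives without asking a master.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a ring position (SHA-256 is a stand-in hash here)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyRing:
    """Toy masterless ring: every node can compute placement on its own."""

    def __init__(self, node_names, replication_factor=3):
        if not 1 <= replication_factor <= len(node_names):
            raise ValueError(
                "replication_factor must be between 1 and the node count"
            )
        self.replication_factor = replication_factor
        # Sort nodes by token so the list can be walked clockwise.
        self._ring = sorted((token(name), name) for name in node_names)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, key: str):
        """Return the nodes storing `key`: the owner plus the next nodes clockwise."""
        start = bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = ToyRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
    # Any node could run this lookup and reach the same answer.
    print(ring.replicas_for("user:42"))
```

Because placement depends only on the hash of the key and the sorted list of node tokens, the result is the same no matter which node performs the lookup.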

3. High availability and fault tolerance

Since every Cassandra node is capable of performing read and write operations, data is quickly replicated across hybrid cloud environments and geographies. In the event a node fails, users are automatically routed to the nearest healthy node. They won’t even notice that a node has been knocked offline because applications will behave as designed even in the event of failure. As a result, applications are always available and data is always accessible and never lost. What’s more, Cassandra features built-in repair services that can actually fix problems immediately after they occur—without any manual intervention. Productivity doesn’t even need to take a hit should nodes fail.

[Figure: diagram of globally distributed Cassandra nodes to provide high availability]
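The routing behavior described above can be sketched in a few lines. This is an illustrative model only: `route_request` and the node names are hypothetical, and the replica list is assumed to be ordered nearest-first, which a real driver or coordinator would determine for itself.

```python
def route_request(replicas_by_proximity, down_nodes):
    """Return the closest replica that is still up.

    `replicas_by_proximity` is assumed to be ordered nearest-first.
    """
    for node in replicas_by_proximity:
        if node not in down_nodes:
            return node
    raise RuntimeError("no healthy replica available")


if __name__ == "__main__":
    replicas = ["us-east-1", "us-east-2", "eu-west-1"]  # hypothetical node names
    print(route_request(replicas, down_nodes=set()))          # us-east-1
    print(route_request(replicas, down_nodes={"us-east-1"}))  # us-east-2
```

As long as at least one replica of the data is reachable, the request is served, which is why a single node failure need not be visible to the application.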

4. Scalability

In traditional environments, scaling applications is largely a time-consuming and costly process that is usually accomplished by scaling upward. Cassandra, on the other hand, enables you to increase capacity in a linear fashion by simply adding more nodes to the cluster. If, for example, four nodes can handle 200,000 transactions/second, eight nodes will be able to handle 400,000 transactions/second.

[Figure: diagram illustrating Apache Cassandra’s linear scalability via nodes and transactions per second]
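The arithmetic behind the example can be written out as a small sketch. It assumes idealized, perfectly linear scaling and uses the four-node, 200,000 transactions/second figure from the text as its baseline; the function names are hypothetical, and real clusters would need to be benchmarked rather than projected.

```python
import math


def projected_throughput(nodes, baseline_nodes=4, baseline_tps=200_000):
    """Project transactions/second assuming perfectly linear scaling."""
    if nodes < 0 or baseline_nodes <= 0:
        raise ValueError("node counts must be positive")
    return round(baseline_tps / baseline_nodes * nodes)


def nodes_needed(target_tps, baseline_nodes=4, baseline_tps=200_000):
    """Smallest node count that meets target_tps under the same assumption."""
    return math.ceil(target_tps * baseline_nodes / baseline_tps)


if __name__ == "__main__":
    print(projected_throughput(8))   # 400000
    print(nodes_needed(1_000_000))   # 20
```

Under this assumption each node contributes 50,000 transactions/second, so capacity planning reduces to a single division.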

5. High performance

Taken together, Cassandra’s masterless architecture and natively distributed data replication deliver high performance at scale, regardless of how much data is involved in the transaction. Not only will your employees be able to stay productive no matter where they happen to be, your customers will enjoy positive experiences interacting with your apps—no matter how many folks are using them concurrently.

An Architecture Optimized for Multi-Data Center and Multi-Cloud

Today’s leading enterprises are increasingly moving to multi-cloud deployments to take advantage of the strengths of several cloud vendors without getting locked into any single provider’s ecosystem. Getting the most out of multi-cloud environments, however, starts with having an underlying cloud database that offers scalability, security, performance, and availability. For these reasons, it should come as no surprise that the cloud database market is expected to grow nearly 65% each year and reach $68.9 billion by 2022. Not every cloud database is the same, though. Before we explain why Cassandra is the best database for multi-cloud environments, let’s first explore why more and more enterprises are moving to cloud databases to begin with.

Why cloud databases?

Cloud databases move faster. This is a big deal, since quicker loading times translate into more revenue. Beyond that, cloud databases deliver several benefits, including:  

  • Scalability. Leading cloud databases provide linear scalability. If four nodes can handle 400,000 transactions per second, eight nodes can handle 800,000. Simply put, organizations need to build scalable solutions. Otherwise, they risk alienating their users during high-traffic periods.
  • Performance. This scalability translates into increased performance. When tons of concurrent users are online at the same time, performance remains high as nodes are automatically added or subtracted as bandwidth needs change.
  • Security. Most cloud databases include data encryption features, ensuring data remains protected at rest and in transit. They also offer authentication controls so companies can ensure only authorized users can access data.
  • Availability. Whether your goal is ensuring positive user experiences or increasing employee productivity, your applications need to be highly available. Cloud databases deliver on this, ensuring users can access apps at any time and from any connected device.
  • Redundancy. In the age of global teams and global applications, it is critical that data stays protected in any event. Leading cloud databases ensure that data remains secure and available—even when a data center gets knocked offline during a natural disaster.

Unfortunately, you can’t just move to any cloud database and expect to get the results you’re hoping for. Let’s take a look at why, specifically, Cassandra is the best database for the cloud and especially for building and running applications in hybrid and multi-cloud computing environments.

Why Cassandra for the cloud?

Cassandra offers certain key advantages for deploying modern applications in hybrid and multi-cloud environments:  

  • Multi-cloud ready. Deploy Cassandra on-premises or in hybrid cloud and multi-cloud environments. Build your infrastructure exactly how you want to, with full data autonomy.
  • Tunable consistency. Cassandra protects data like a traditional RDBMS. But it also allows for tunable data consistency, enabling developers to relax data consistency when application use cases allow.
  • Open source. Tap into a robust community of open source developers who are innovating with Cassandra to take advantage of cutting-edge features your internal team doesn’t need to build. Enjoy a level of freedom and flexibility that simply isn’t possible with proprietary solutions.
  • Masterless architecture. Experience limited latency with a database that performs much faster than traditional master-slave architecture. With Cassandra, every node is capable of performing read and write operations. In the event a node gets knocked offline, the database automatically reroutes traffic to the nearest available node.
  • Regional awareness. Cassandra treats data centers as local or remote, meaning more latency or less bandwidth can automatically be supported when the use case warrants it.
  • Predictable scalability, performance, and cost. Cassandra gives you the peace of mind that comes with knowing exactly how the database will scale and perform during high-traffic periods, as well as how much it will cost. There are no surprises here.

The factors above make Cassandra an obvious choice for a database that is cloud-ready.
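The tunable consistency point in the list above comes down to simple arithmetic. The sketch below uses hypothetical function names and relies on two commonly cited rules: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest write when the number of replicas acknowledging the write plus the number consulted on the read exceeds the replication factor.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def read_sees_latest_write(write_replicas, read_replicas, replication_factor):
    """True when the read and write replica sets must overlap (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(q)                                  # 2
    print(read_sees_latest_write(q, q, rf))   # True
    print(read_sees_latest_write(1, 1, rf))   # False
```

Relaxing consistency, for example by reading and writing at a single replica, trades that overlap guarantee for lower latency and higher availability, which is the choice left to the developer.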

DataStax: The Best Distribution of Apache Cassandra

Since it first appeared on the scene, the DataStax team has been the driving force behind Apache Cassandra, contributing the majority of the commits to the open source project.

[Figure: graph of monthly Apache Cassandra commits by company for Uber, Apple, DataStax, and others]

While Cassandra may be enough to serve your enterprise’s needs by itself, the DataStax Distribution of Apache Cassandra™ delivers a host of additional benefits, including expert support. DataStax Enterprise, meanwhile, is the industry’s highest performing active everywhere database platform.

Offerings compared:

  • Apache Cassandra™
  • DataStax Distribution of Apache Cassandra™
  • DataStax Basic
  • DataStax Enterprise

Features and services compared:

  • Search and Analytics
  • Management tools: OpsCenter and NodeSync
  • Developer tools: DataStax Studio
  • Advanced Security
  • Advanced database features: Replication, In-memory, Tiered Storage
  • DataStax Kafka Connector
  • Production Docker Images (coming soon)
  • DataStax Drivers
  • DataStax Bulk Loader
  • Open Source Cassandra Compatible
  • Platform Certification
  • Technical Support
  • Professional Services
  • Hot Fixes
  • Bug Escalation