What is Cassandra?

Apache Cassandra™ is a distributed NoSQL database that delivers the continuous availability, high performance, and linear scalability that successful applications require.

Cassandra Explained

Apache Cassandra is an open-source, distributed NoSQL database that began internally at Facebook and was released as an open-source project in July 2008. Cassandra delivers the continuous availability (zero downtime), high performance, and linear scalability that modern applications require, while also offering operational simplicity and effortless replication across data centers and geographies. Cassandra can handle petabytes of information and thousands of concurrent operations per second, enabling organizations to manage large amounts of data across hybrid-cloud and multi-cloud environments.

At DataStax, we’re working hard with the open-source community to build on Cassandra’s decade-plus of maturity to solidify its position as the leading database for cloud-native applications.

Where did Cassandra come from?

Apache Cassandra was developed by Avinash Lakshman and Prashant Malik when both were working as engineers at Facebook. The database was designed to power Facebook’s inbox search feature, making it easy for users to quickly find the conversations and other content they were looking for. The architecture combined two ideas: the distribution model proposed in Amazon’s Dynamo paper, which allows horizontal scaling across multiple nodes, and the log-structured storage engine described in Google’s Bigtable paper. The result was a highly scalable database that could address the most data-rich and performance-intensive use cases.

In July 2008, Facebook open-sourced Cassandra. In March 2009, Cassandra became an Apache Incubator project. In April 2010, it graduated from the incubator, becoming a top-level project of the Apache Software Foundation. Today, Cassandra is freely available under the Apache License 2.0. The team at DataStax were leaders in the evolution of the open-source database, responsible for the majority of the project’s code commits through the 3.0 release, and we have rededicated ourselves as active collaborators for the project’s future, assisting with the 4.0 release and beyond.

What Are the Key Features and Advantages of Cassandra?

Whether you need to process server logs, emails, social media posts, or PDFs, Cassandra’s got you covered. As a result, you’ll be able to make better-informed decisions without leaving any of your data on the table. Beyond that, Cassandra delivers a slew of other benefits.

Open Source: Modern software development organizations have overwhelmingly moved to adopt open source technologies, starting with the Linux operating system and progressing into infrastructure for managing data. Open source technologies are attractive because of their affordability and extensibility, as well as the flexibility to avoid vendor lock-in. Organizations adopting open source report higher speed of innovation and faster adoption.

Flexible, Familiar Interface: The Cassandra Query Language (CQL) is similar to SQL, so most developers should have a fairly easy time becoming familiar with it. (Here’s an introduction to CQL if you need some help.)
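To give a feel for how close CQL is to SQL, here is a minimal sketch that keeps a few CQL statements as Python strings and runs them through a session object. The keyspace, table, and column names (`demo`, `users`, `user_id`, `name`, `email`) and the `run_demo` helper are hypothetical and chosen only for illustration. The `session` argument is assumed to expose an `execute(statement, parameters)` method, as the session in the DataStax Python driver does.

```python
# CQL statements held as plain strings. The keyspace, table, and column
# names are hypothetical and exist only for this illustration.
CREATE_KEYSPACE = """
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
"""

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS demo.users (
    user_id uuid PRIMARY KEY,
    name text,
    email text
)
"""

INSERT_USER = "INSERT INTO demo.users (user_id, name, email) VALUES (%s, %s, %s)"
SELECT_USER = "SELECT name, email FROM demo.users WHERE user_id = %s"


def run_demo(session, user_id, name, email):
    """Create the schema, write one row, and read it back.

    `session` is assumed to behave like a driver session that exposes
    execute(statement, parameters).
    """
    session.execute(CREATE_KEYSPACE)
    session.execute(CREATE_TABLE)
    session.execute(INSERT_USER, (user_id, name, email))
    return session.execute(SELECT_USER, (user_id,))
```

With the DataStax Python driver installed and a node listening locally, a session could be obtained with something like `Cluster(["127.0.0.1"]).connect()` and passed to `run_demo` together with a `uuid.uuid4()` value. Note that, unlike typical SQL, the `SELECT` here filters on the primary key, which is the access pattern Cassandra tables are usually designed around.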

High Performance: The majority of traditional databases feature a primary / secondary architecture. In these configurations, a single primary replica performs read and write operations, while secondary replicas are only able to perform read operations. Downsides to this architecture include increased latency, as well as higher costs and lower availability at scale. In Cassandra, no single node is in charge of replicating data across a cluster. Instead, every node is capable of performing all read and write operations. This improves performance and adds resiliency to the database.
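The idea that every node can serve any read or write can be sketched with a toy ring. This is a simplified model for illustration only, not Cassandra's implementation: the `ToyRing` class and node names are hypothetical, MD5 stands in for the real partitioner, and replicas are simply the next nodes clockwise on the ring.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """A toy peer-to-peer ring in which any node can coordinate any request."""

    def __init__(self, node_names, replication_factor=3):
        self.replication_factor = min(replication_factor, len(node_names))
        # Sort nodes by token so replicas can be found by walking clockwise.
        self.ring = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.ring]
        self.storage = {name: {} for name in node_names}

    def replicas_for(self, key):
        """Return the first node at or after the key's token, plus its successors."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]

    def write(self, coordinator, key, value):
        """Any node may coordinate; it forwards the write to every replica."""
        if coordinator not in self.storage:
            raise KeyError(f"unknown coordinator: {coordinator}")
        for node in self.replicas_for(key):
            self.storage[node][key] = value

    def read(self, coordinator, key):
        """Any node may coordinate a read by asking a replica that owns the key."""
        if coordinator not in self.storage:
            raise KeyError(f"unknown coordinator: {coordinator}")
        first_replica = self.replicas_for(key)[0]
        return self.storage[first_replica].get(key)


ring = ToyRing(["node-a", "node-b", "node-c", "node-d"])
ring.write("node-a", "user:42", "Ada")   # write coordinated by node-a
print(ring.read("node-d", "user:42"))    # read coordinated by node-d
```

The point of the sketch is that the coordinator is just whichever node received the request. Ownership of the data is decided by the key's position on the ring, so no node has a special primary role.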

Active Everywhere / Zero-Downtime: Since every Cassandra node is capable of performing read and write operations, data is quickly replicated across hybrid cloud environments and geographies. In the event a node fails, requests are automatically routed to a healthy replica. Users won’t even notice that a node has been knocked offline, because applications behave as designed even in the event of failure. As a result, applications stay available and data stays accessible. What’s more, built-in mechanisms such as hinted handoff and read repair bring replicas back in sync after a failure without manual intervention. Productivity doesn’t need to take a hit should nodes fail.
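Here is a rough sketch of why a single node failure need not interrupt an application. It is a toy model, not Cassandra's code: the `ToyReplicaSet` class is hypothetical, the rule that a majority of replicas must be reachable is an assumption made for this example (in Cassandra the required number of replicas is configurable), and a simple counter stands in for write timestamps.

```python
import itertools


class ToyReplicaSet:
    """Several copies of the same data; requests need a majority of live copies."""

    def __init__(self, node_names):
        self.data = {name: {} for name in node_names}   # node -> {key: (version, value)}
        self.up = {name: True for name in node_names}   # node -> is it reachable?
        self._clock = itertools.count(1)                # logical write timestamps

    def _live_nodes(self):
        live = [n for n in self.data if self.up[n]]
        majority = len(self.data) // 2 + 1
        if len(live) < majority:
            raise RuntimeError("fewer than a majority of replicas are reachable")
        return live

    def write(self, key, value):
        """Store the value on every live replica and return how many accepted it."""
        live = self._live_nodes()
        version = next(self._clock)
        for node in live:
            self.data[node][key] = (version, value)
        return len(live)

    def read(self, key):
        """Ask the live replicas and return the value with the newest version."""
        answers = [self.data[n][key] for n in self._live_nodes() if key in self.data[n]]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]


replicas = ToyReplicaSet(["node-a", "node-b", "node-c"])
replicas.write("user:42", "Ada")
replicas.up["node-b"] = False            # simulate a node failure
replicas.write("user:42", "Ada L.")      # still succeeds: 2 of 3 replicas are live
replicas.up["node-b"] = True             # node-b returns holding a stale copy
print(replicas.read("user:42"))          # the newest version wins
```

The sketch leaves node-b stale after it returns. Bringing such a replica back in sync is the job of the repair mechanisms described above, which this toy model does not attempt to reproduce.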

Scalability: In traditional environments, scaling applications is a time-consuming and costly process, typically accomplished by scaling vertically with more expensive machines. Cassandra enables you to scale horizontally by simply adding more nodes to the cluster. If, for example, four nodes can handle 200,000 transactions/second, eight nodes should be able to handle roughly 400,000 transactions/second.
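The arithmetic behind that example is a straight proportion. The short sketch below assumes perfectly linear scaling, which is an idealization, and the `projected_throughput` helper is a hypothetical name used only here.

```python
def projected_throughput(baseline_nodes, baseline_tps, target_nodes):
    """Project throughput, assuming it grows in direct proportion to node count."""
    if baseline_nodes <= 0 or target_nodes < 0:
        raise ValueError("node counts must be positive")
    per_node_tps = baseline_tps / baseline_nodes   # 200,000 / 4 = 50,000 per node
    return per_node_tps * target_nodes


print(projected_throughput(4, 200_000, 8))  # 400000.0
```

Real clusters will deviate from this line depending on workload, data model, and hardware, so treat the result as a planning estimate rather than a guarantee.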

Seamless Replication: Today’s leading enterprises are increasingly moving to multi-data center, hybrid cloud, and even multi-cloud deployments to take advantage of the strengths of each without getting locked into any single provider’s ecosystem. Getting the most out of multi-cloud environments, however, starts with having an underlying cloud database that offers scalability, security, performance, and availability. For these reasons, it should come as no surprise that the cloud database market is expected to grow nearly 65% each year and reach $68.9 billion by 2022.


Where is Cassandra headed next?

Cassandra has traditionally been known as an extremely powerful database that stands up to the most demanding use cases, but also one that is difficult to learn and operate. DataStax is committed to working with the Cassandra community to make Cassandra easier to use, adopt, and extend for your needs.

Here are a few of the ideas we’re exploring:

  • Providing simplified developer APIs including REST and GraphQL
  • Adding more SQL-like capabilities to CQL, including indexing, joins, and ACID transactions, as well as full JSON support
  • Offering standard management APIs and an official, project-backed Kubernetes operator
  • Making the storage engine pluggable, along with other APIs to allow customization of the database for different deployments and usage profiles

How can I get started?

If you are looking to learn more about Apache Cassandra, we’ve got plenty of resources here to help you get started.

Try it out

DataStax for Developers

Learn how to succeed with Apache Cassandra™.

Try DataStax Astra

Rapidly build cloud-native applications with DataStax Astra, a database-as-a-service powered by Apache Cassandra.

O’Reilly’s Cassandra: The Definitive Guide, 3rd Edition

Get your free digital copy to harness Cassandra’s speed and flexibility.
