For decades, traditional relational database management systems (RDBMS) were the primary systems used to process, store, and analyze critical business information. While an RDBMS is perfectly capable of handling data sets for many use cases, relational databases often fall short in an era where companies are increasingly dealing with big data. A new kind of database was required to accommodate these kinds of data sets (e.g., social media content), and NoSQL (i.e., “not only SQL”) databases emerged as a result. These databases were designed to deliver:
Apache Cassandra was developed by Avinash Lakshman and Prashant Malik when both were working as engineers at Facebook. The database was designed to power Facebook’s inbox search feature, making it easy for users to quickly find the conversations and other content they were looking for.
Cassandra uses Cassandra Query Language (CQL), whose syntax resembles SQL, so most developers should find it fairly easy to pick up.
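To give a feel for how close CQL reads to SQL, here is a small sketch that holds a few CQL statements as Python strings and prints them. The keyspace `inbox`, the table `messages`, and the column names are hypothetical and chosen only for illustration. Running the statements would require a Cassandra cluster and a client driver, neither of which is shown here.

```python
# A few CQL statements held as plain strings.
# The keyspace, table, and column names are hypothetical examples.

CREATE_KEYSPACE = """
CREATE KEYSPACE IF NOT EXISTS inbox
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
"""

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS inbox.messages (
    user_id uuid,
    sent_at timestamp,
    body    text,
    PRIMARY KEY (user_id, sent_at)
) WITH CLUSTERING ORDER BY (sent_at DESC);
"""

# "?" marks a bind parameter, as used with prepared statements.
INSERT_MESSAGE = """
INSERT INTO inbox.messages (user_id, sent_at, body)
VALUES (?, ?, ?);
"""

SELECT_RECENT = """
SELECT sent_at, body FROM inbox.messages
WHERE user_id = ?
LIMIT 10;
"""

STATEMENTS = [CREATE_KEYSPACE, CREATE_TABLE, INSERT_MESSAGE, SELECT_RECENT]

for statement in STATEMENTS:
    print(statement.strip())
    print()
```

The keywords will look familiar to anyone who knows SQL. One difference worth noting is that CQL has no joins, so tables are usually designed around the queries they need to serve, as with the `user_id` lookup above.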
In July 2008, Facebook open sourced Cassandra. In March 2009, Cassandra became an Apache Incubator project. In April 2010, it graduated from the incubator, becoming a top-level project of the Apache Software Foundation. Today, Cassandra is freely available under the Apache License 2.0. The team at DataStax is accelerating the evolution of the open source database and is responsible for most of the project’s code commits. Organizations like CERN, Comcast, eBay, GitHub, Hulu, Instagram, and Netflix use Cassandra to support modern applications and meet user expectations.
How does Cassandra differ from a relational database?
Although non-relational databases provide different features and benefits, a database like Cassandra differs from a typical relational database in the following ways:
Table 1. A quick comparison of RDBMS and a NoSQL database like Cassandra
|RDBMS|NoSQL database (Cassandra)|
|---|---|
|Handles moderate incoming data velocity|Handles high incoming data velocity|
|Supports complex/nested transactions|Supports simple transactions|
|Single points of failure with failover|No single points of failure; constant uptime|
|Supports moderate data volumes|Supports very high data volumes|
|Centralized deployments|Decentralized deployments|
|Data written in mostly one location|Data written in many locations|
|Supports read scalability (with consistency sacrifices)|Supports read and write scalability|
|Deployed in vertical scale-up fashion|Deployed in horizontal scale-out fashion|
Making the smartest decisions starts with being able to analyze and understand all of the data your organization has under its control. To this end, Apache Cassandra’s flexible design liberates organizations from the rigid schema legacy databases are known for. Whether you need to process server logs, emails, social media posts, or PDFs, Cassandra’s got you covered. As a result, you’ll be able to make better-informed decisions without leaving any of your data on the table. Beyond that, Cassandra delivers a slew of other benefits, including:
For years, organizations were hesitant to use open source software because they believed that the technology had serious security issues and other shortcomings. But today, as organizations become better educated on the promise of open source, those misconceptions are becoming less common. In fact, today’s leading enterprises are increasingly leveraging open source solutions, and for good reason: open source software provides a number of benefits, including:
The majority of traditional databases feature what’s referred to as a master-slave (or primary/secondary) architecture. In these configurations, a single node is designated the master and handles both read and write operations. The rest of the nodes serve as the slaves, which can only handle read operations. There are many downsides to this kind of architecture:
Since every Cassandra node is capable of performing read and write operations, data is quickly replicated across hybrid cloud environments and geographies.
In the event a node fails, users are automatically routed to the nearest healthy node. They won’t even notice that a node has been knocked offline because applications will behave as designed even in the event of failure.
As a result, applications stay available and data stays accessible as long as enough replicas of that data remain reachable.
What’s more, Cassandra features built-in repair services that can actually fix problems immediately after they occur—without any manual intervention. Productivity doesn’t even need to take a hit should nodes fail.
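The masterless behavior described above can be illustrated with a toy model. This is a simplified sketch, not Cassandra's actual implementation: the `Node` and `ToyCluster` classes are hypothetical, and replica placement uses a naive byte sum, whereas Cassandra itself uses a partitioner and a token ring. It shows only the core idea that every node is a peer, each key is stored on several nodes, and reads keep working when one replica is down.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One peer in the toy cluster; every node stores data and can serve requests."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Masterless toy cluster: no node is special, and each key lives on several nodes."""

    def __init__(self, names, replication_factor=3):
        self.nodes = [Node(n) for n in names]
        self.rf = replication_factor

    def replicas_for(self, key):
        # Naive placement (assumption): derive a start position from the key's bytes,
        # then take the next `rf` nodes around the ring.
        start = sum(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def write(self, key, value):
        # Store the value on every replica that is currently up.
        live = [n for n in self.replicas_for(key) if n.up]
        if not live:
            raise RuntimeError("no live replica for " + key)
        for node in live:
            node.data[key] = value
        return [n.name for n in live]

    def read(self, key):
        # Serve the read from the first live replica that holds the key.
        for node in self.replicas_for(key):
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


cluster = ToyCluster(["a", "b", "c", "d"], replication_factor=3)
cluster.write("user:42", "hello")
cluster.replicas_for("user:42")[0].up = False  # simulate a node failure
print(cluster.read("user:42"))  # prints "hello", served by another replica
```

The model leaves out consistency levels, repair, and multi-datacenter replication. Its only purpose is to show why losing a single node does not have to interrupt reads or writes when no node plays a special role.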
In traditional environments, scaling applications is usually a time-consuming and costly process, typically accomplished by scaling up to larger, more powerful servers.
Cassandra, on the other hand, enables you to increase capacity in a linear fashion by simply adding more nodes to the cluster.
If, for example, four nodes can handle 200,000 transactions/second, eight nodes will be able to handle 400,000 transactions/second.
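The arithmetic behind that example can be written out as a short sketch. It assumes perfectly linear scaling, which is an idealization, and the function names `estimated_throughput` and `nodes_needed` are hypothetical. The baseline of four nodes at 200,000 transactions/second comes from the example above.

```python
import math


def estimated_throughput(nodes, baseline_nodes=4, baseline_tps=200_000):
    """Estimate cluster throughput, assuming throughput grows linearly with node count."""
    if nodes <= 0 or baseline_nodes <= 0:
        raise ValueError("node counts must be positive")
    per_node = baseline_tps / baseline_nodes  # 50,000 tps per node with the defaults
    return int(per_node * nodes)


def nodes_needed(target_tps, baseline_nodes=4, baseline_tps=200_000):
    """Smallest node count whose estimated throughput meets the target."""
    per_node = baseline_tps / baseline_nodes
    return math.ceil(target_tps / per_node)


print(estimated_throughput(8))   # 400000
print(nodes_needed(1_000_000))   # 20
```

Real clusters rarely scale perfectly, so figures like these are best treated as a first estimate for capacity planning and then checked against load tests.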
Taken together, Cassandra’s masterless architecture and natively distributed data replication deliver high performance at scale, regardless of how much data is involved in the transaction. Not only will your employees be able to stay productive no matter where they happen to be, your customers will enjoy positive experiences interacting with your apps—no matter how many folks are using them concurrently.
Cloud databases move faster. This is a big deal, since quicker loading times translate into more revenue. Beyond that, cloud databases deliver several benefits, including:
Cassandra offers certain key advantages for deploying modern applications in hybrid and multi-cloud environments:
Since Cassandra first appeared on the scene, the DataStax team has been the driving force behind it, contributing the majority of the commits to the open source project.
While Cassandra may be enough to serve your enterprise’s needs by itself, the DataStax Distribution of Apache Cassandra™ delivers a host of additional benefits, including expert support. DataStax Enterprise, meanwhile, is the industry’s highest performing active everywhere database platform.
DataStax Distribution of Apache Cassandra™
|Search and Analytics||✓|
|Management tools - OpsCenter and NodeSync||✓|
|Developer Tools - DataStax Studio||✓|
|Advanced Database Features - Replication, In-memory, Tiered Storage||✓|
|DataStax Kafka Connector||✓||✓||✓|
|Production Docker Images||Coming soon||✓||✓|
|DataStax Bulk Loader||✓||✓||✓|
|Open Source Cassandra Compatible||✓||✓||✓||✓|
But don’t just take our word for it—see why InfoWorld says DataStax makes Cassandra even easier and faster.
A Brief Introduction to Apache Cassandra (Course)
The 5 Main Benefits of Apache Cassandra (eBook)
The Untold Story of Apache Cassandra (eBook)
Why Enterprises Need the Best Distribution of Apache Cassandra (White Paper)
DataStax and the Cassandra Community (Blog Post)
DataStax Distribution of Apache Cassandra (Web Page)
DS201: DataStax Enterprise Foundations of Apache Cassandra (Course)