Apache Cassandra is a distributed NoSQL database that began internally at Facebook and was released as an open-source project in July 2008. The platform delivers continuous availability (zero downtime), high performance, and linear scalability that successful applications require, while also offering operational simplicity and effortless replication across data centers and geographies. Cassandra, which can handle petabytes of information and thousands of concurrent operations per second, enables organizations to manage large amounts of data across hybrid cloud environments.

Relational Databases and the Need for NoSQL

For decades, traditional relational database management systems (RDBMS) were the primary systems used to process, store, and analyze critical business information. While an RDBMS is perfectly capable of handling data sets for many use cases, relational databases often fall short in an era where companies are increasingly dealing with big data. A new kind of database was required to accommodate these kinds of data sets (e.g., social media content), and NoSQL (i.e., “not only SQL”) databases emerged as a result. These databases were designed to deliver:

  • Operational simplicity. Today’s leading NoSQL databases come with advanced auto-repairing features, which make them easier to manage.
  • Reduced operating expenses. By leveraging commodity hardware, NoSQL databases enable organizations to reduce their expenses significantly.
  • Elastic scalability. NoSQL databases can effortlessly scale outward into new nodes without forcing you to change anything about your applications—which is much more efficient than scaling upward with traditional RDBMS.

While NoSQL adoption continues to increase, these powerful databases still only account for a minority of the total database market as many organizations continue to cling to their legacy RDBMS deployments. Still, it’s expected that by 2020 the NoSQL database market will reach $4.2 billion as more and more enterprises deploy them to support modern applications, ensure consistent user experiences, and unlock the true power of their data.

The History of Apache Cassandra

Apache Cassandra was developed by Avinash Lakshman and Prashant Malik when both were working as engineers at Facebook. The database was designed to power Facebook’s inbox search feature, making it easy for users to quickly find the conversations and other content they were looking for.

Cassandra uses the Cassandra Query Language (CQL), whose syntax resembles SQL, so most developers should find it fairly easy to learn.

In July 2008, Facebook open-sourced Cassandra. In March 2009, Cassandra became an Apache Incubator project. In February 2010, it graduated from the incubator, becoming a top-level project of the Apache Software Foundation. Today, Cassandra is freely available under the Apache License 2.0; the team at DataStax is accelerating the evolution of the open-source database and is responsible for most of the project’s code commits. Organizations like CERN, Comcast, eBay, GitHub, Hulu, Instagram, and Netflix use Cassandra to support modern applications and meet user expectations.

Apache Cassandra vs. Traditional Relational Databases

How does Cassandra differ from a relational database? Although non-relational databases provide different features and benefits, a database like Cassandra differs from a typical relational database in the following ways:

Relational Database | Cassandra
Handles moderate incoming data velocity | Handles high incoming data velocity
Supports complex/nested transactions | Supports simple transactions
Single points of failure with failover | No single points of failure; constant uptime
Supports moderate data volumes | Supports very high data volumes
Centralized deployments | Decentralized deployments
Data written in mostly one location | Data written in many locations
Supports read scalability (with consistency sacrifices) | Supports read and write scalability
Deployed in vertical scale-up fashion | Deployed in horizontal scale-out fashion

The Key Features and Advantages of Apache Cassandra

Making the smartest decisions starts with being able to analyze and understand all of the data your organization has under its control. To this end, Apache Cassandra’s flexible design liberates organizations from the rigid schema legacy databases are known for. Whether you need to process server logs, emails, social media posts, or PDFs, Cassandra’s got you covered. As a result, you’ll be able to make better-informed decisions without leaving any of your data on the table. Beyond that, Cassandra delivers a slew of other benefits.

1. Open Source

For years, organizations were hesitant to use open source software because they believed that the technology had serious security issues and other shortcomings. But today, as organizations become better educated on the promise of open source, those misconceptions are becoming less common. In fact, today’s leading enterprises are increasingly leveraging open source solutions, and for good reason: open source software provides a number of benefits, including:  

  • Affordability. Most open source solutions are free to use, and Apache Cassandra is no different. However, you also have the option to upgrade to DataStax Distribution of Apache Cassandra, which includes expert support and is 100% open-source compatible.
  • Flexibility. Open source frees you from vendor lock-in. In the event you want to migrate to a new infrastructure, you don’t have to worry about paying to take your data with you.
  • Extensibility. Since you have access to source code, you can extend open-source software to integrate with existing systems and tools. This increases organizational efficiency and simplifies operational management.
  • Security. With a community of dedicated enthusiasts contributing to open source projects and reviewing code regularly, some software analysts argue that open source solutions are even more secure than their proprietary counterparts. Once a bug is noticed by the community, developers convene to patch it as quickly as they can.

2. Masterless

The majority of traditional databases feature what’s referred to as master-slave—or primary/secondary—architecture. In these configurations, a single node is designated the master, which can then perform read and write operations. The rest of the nodes serve as the slaves, which are only able to perform read operations. There are many downsides to this kind of architecture:

  • Latency can become a major problem, particularly for distributed teams.
  • Costs can shoot up considerably when applications need to scale.
  • Availability can suffer, too. In the event a master node fails, database operations can grind to a halt until an administrator designates a new master.

Built with a masterless architecture, Apache Cassandra doesn’t have these limitations. No node is a master, which means that no single node is in charge of replicating data across a cluster. Instead, every node is capable of performing read and write operations. This improves performance and adds resiliency to the database.
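The idea that any node can accept a request can be illustrated with a toy token ring. This is a minimal sketch, not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 to place keys are all assumptions made for illustration.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring (hypothetical): any node can coordinate a request."""

    def __init__(self, nodes, replication_factor=3):
        self.nodes = set(nodes)
        self.replication_factor = replication_factor
        # Place each node on the ring at a token derived from its name.
        self.tokens = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(value):
        # MD5 is chosen for simplicity; this says nothing about Cassandra's
        # default partitioner.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, key):
        """Return the nodes that store `key`, walking clockwise from its token."""
        start = bisect.bisect_right(self.tokens, (self._token(key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

    def write(self, coordinator, key):
        """Any node may coordinate; it forwards the write to the key's replicas."""
        if coordinator not in self.nodes:
            raise ValueError(f"unknown node: {coordinator}")
        return {"coordinator": coordinator, "replicas": self.replicas(key)}


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
first = ring.write("node-a", "user:42")
second = ring.write("node-d", "user:42")
```

Whichever node coordinates, the key maps to the same replica set, so no single node has to act as the master for writes.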

3. High Availability and Fault Tolerance

Since every Cassandra node is capable of performing read and write operations, data is quickly replicated across hybrid cloud environments and geographies. In the event a node fails, users are automatically routed to the nearest healthy node. They won’t even notice that a node has been knocked offline because applications will behave as designed even in the event of failure. 

As a result, applications are always available and data is always accessible and never lost. What’s more, Cassandra features built-in repair services that can actually fix problems immediately after they occur—without any manual intervention. Productivity doesn’t even need to take a hit should nodes fail.
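The routing behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `route_request` is a hypothetical helper, the replica list is assumed to be ordered from nearest to farthest, and the set of down nodes is assumed to be known in advance.

```python
def route_request(replicas_by_distance, down_nodes, required=1):
    """Pick the nearest live replicas for a request (illustrative only).

    replicas_by_distance: replica names, nearest first (assumed ordering).
    down_nodes: set of replica names currently unreachable.
    required: how many replicas must answer.
    """
    live = [node for node in replicas_by_distance if node not in down_nodes]
    if len(live) < required:
        raise RuntimeError("not enough live replicas to serve the request")
    return live[:required]


# The nearest replica is down, so the request goes to the next one.
chosen = route_request(["dc1-n1", "dc1-n2", "dc2-n1"], {"dc1-n1"})
```

In this sketch the request only fails when fewer replicas are alive than the request requires, which is why a single node failure goes unnoticed by the application.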

4. Scalability

In traditional environments, scaling applications is largely a time-consuming and costly process, usually accomplished by scaling upward. Cassandra, on the other hand, enables you to increase capacity in a linear fashion by simply adding more nodes to the cluster. If, for example, four nodes can handle 200,000 transactions/second, eight nodes will be able to handle 400,000 transactions/second.
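The arithmetic behind the four-node and eight-node figures can be written out as a small helper. This is a sketch that assumes perfectly linear scaling, as the text does; `projected_throughput` is a hypothetical name, and real clusters should be benchmarked rather than extrapolated.

```python
def projected_throughput(nodes, baseline_nodes=4, baseline_ops=200_000):
    """Project cluster throughput assuming each node adds equal capacity."""
    per_node = baseline_ops / baseline_nodes  # 50,000 ops/s per node here
    return round(per_node * nodes)


eight_node_ops = projected_throughput(8)  # 400000
```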

5. High Performance

Taken together, Cassandra’s masterless architecture and natively distributed data replication deliver high performance at scale, regardless of how much data is involved in the transaction. Not only will your employees be able to stay productive no matter where they happen to be, but your customers will also enjoy positive experiences interacting with your apps—no matter how many folks are using them concurrently.

An Architecture Optimized for Multi-Data Center and Multi-Cloud

Today’s leading enterprises are increasingly moving to multi-cloud deployments to take advantage of the strengths of several cloud vendors without getting locked into any single provider’s ecosystem. Getting the most out of multi-cloud environments, however, starts with having an underlying cloud database that offers scalability, security, performance, and availability. For these reasons, it should come as no surprise that the cloud database market is expected to grow nearly 65% each year and reach $68.9 billion by 2022. Not every cloud database is the same, though. Before we explain why Cassandra is the best database for multi-cloud environments, let’s first explore why more and more enterprises are moving to cloud databases to begin with.

Why Cloud Databases?

Cloud databases move faster. This is a big deal, since quicker loading times translate into more revenue. Beyond that, cloud databases deliver several benefits.

Unfortunately, you can’t just move to any cloud database and expect to get the results you’re hoping for. Let’s take a look at why, specifically, Cassandra is the best database for the cloud and especially for building and running applications in hybrid and multi-cloud computing environments.

Why Cassandra for the Cloud?

Cassandra offers certain key advantages for deploying modern applications in hybrid and multi-cloud environments:  

  • Multi-cloud ready. Deploy Cassandra on-premises or in hybrid cloud and multi-cloud environments. Build your infrastructure exactly how you want to, with full data autonomy.
  • Tunable consistency. Cassandra protects data like a traditional RDBMS. But it also allows for tunable data consistency, enabling developers to relax data consistency when application use cases allow.
  • Open-source. Tap into a robust community of open source developers who are innovating with Cassandra to take advantage of cutting-edge features your internal team doesn’t need to build. Enjoy a level of freedom and flexibility that simply isn’t possible with proprietary solutions.
  • Masterless architecture. Experience limited latency with a database that performs much faster than traditional master-slave architecture. With Cassandra, every node is capable of performing read and write operations. In the event a node gets knocked offline, the database automatically reroutes traffic to the nearest available node.
  • Regional awareness. Cassandra distinguishes local data centers from remote ones, so it can account for the higher latency and lower bandwidth of links between data centers when the use case warrants it.
  • Predictable scalability, performance, and cost. Cassandra gives you the peace of mind that comes with knowing exactly how the database will scale and perform during high-traffic periods, as well as how much it will cost. There are no surprises here.

The factors above make Cassandra an obvious choice for a database that is cloud-ready.
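The tunable consistency point in the list above comes down to simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas that acknowledge the write plus the number consulted on the read exceeds the replication factor. The sketch below illustrates that rule for three commonly used consistency levels (ONE, QUORUM, ALL); the helper names are hypothetical and this is not driver code.

```python
def acks_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = acks_required(write_level, replication_factor)
    r = acks_required(read_level, replication_factor)
    return r + w > replication_factor


strict = reads_see_latest_write("QUORUM", "QUORUM", 3)   # 2 + 2 > 3
relaxed = reads_see_latest_write("ONE", "ONE", 3)        # 1 + 1 > 3 is false
```

Relaxing both levels to ONE trades that overlap guarantee for lower latency, which is the kind of choice the text leaves to the application.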

DataStax: The Best Distribution of Apache Cassandra

Since Cassandra first appeared on the scene, the DataStax team has been the driving force behind it, contributing the majority of the commits to the open-source project. While Cassandra may be enough to serve your enterprise’s needs by itself, the DataStax Distribution of Apache Cassandra™ delivers a host of additional benefits, including expert support. DataStax Enterprise, in turn, is the industry’s highest-performing active everywhere database platform.

Learn More

Whitepaper
How to Save Millions on Legacy Mainframe Operations

Modern, agile, scalable data management systems are quickly becoming the new norm due to the impossible demands placed on mainframe technology. Without a proper data modernization strategy that incorporates their mainframes, companies will continue to struggle with scaling their business and building modern applications. This white paper explains the challenges of relying heavily on mainframes to meet modern data needs and offers an approach to implement continuously available and highly scalable data and analytics solutions.

Blog
The Four Main Challenges with Apache Cassandra™

Enterprises are increasingly flocking to open source technology because of its accessibility, theoretical cost-effectiveness, and ability to attract top talent. According to the 2018 Open Source Program Management Survey, 53% of companies say their organization has an open source software program or plan to establish one within the next year, and according to the 2016 Global Developer Report, 98% of developers use open source tools, even when they’re not supposed to.

Here at DataStax we’re HUGE Apache Cassandra fans! We based our technology on Cassandra for good reason: it’s fast, flexible, and foundational. Enterprises can form their data management strategies on it and be confident they’ll be able to scale with their growth. That said, as with other open source tools, Cassandra does present certain challenges at the enterprise level. While these challenges are easily overcome with the right strategy and resources, we think it’s worth exploring exactly what these challenges are, the hidden costs associated with them, and why most enterprises end up needing a little extra help to tap into the full potential of Cassandra.

1. Rising maintenance costs

Open source solutions are becoming more and more popular in the enterprise because they’re easier to adopt and they eliminate licensing fees. They eliminate the need for extensive contract negotiations, which can be stressful and time-consuming. However, while open source tools may be free to deploy, they do come with hidden ongoing maintenance costs that can have a significant impact on total cost of ownership (TCO) beyond the cost of acquiring the software. When companies move to open source they end up either investing in internal talent to develop and maintain the technology or depending on a network of third-party developers, especially the open source community. Contributions are voluntary and are made when a contributor has the time and not necessarily when an organization has a need.

Still, companies that use open source depend on these contributions for things like maintenance, bug fixes, and new features. These dependencies introduce a lot of risk into the equation, making it more difficult for enterprises to meet service-level agreements as well as bringing the potential of downtime and the costs associated with lost business.

2. Security, compliance, and governance risk

HIPAA, Sarbanes-Oxley, GDPR: oh my. Different industries in different countries are forced to comply with different regulations. One of the main reasons open source projects fail or run into issues is security compliance. It’s often difficult for organizations to implement global security standards to ensure compliance, particularly in hybrid cloud environments. This makes the complete adoption and use of open source software that much more challenging. Failure to comply with these regulations exposes organizations in regulated industries to significant financial and reputational risk. While Cassandra does offer some built-in security features out of the box, like role-based authentication and authorization, these features, by themselves, can’t guarantee security for organizations that operate in heavily regulated industries.

3. Ad hoc support from multiple sources

Because Cassandra’s free, it’s easy to adopt. This ease of implementation, however, comes with its own challenges. Individual teams usually end up implementing the database on an ad hoc basis. As the deployment scales and multiplies across the organization, the need for support services increases. In many cases, organizations end up with a patchwork quilt of support and services from a variety of different sources: some in-house resources, the open source community, and third-party agencies. All of these come with varying levels of Cassandra expertise and response time. It’s not the most efficient, cost-effective, or reliable approach, to say the least.

4. Limited Apache Cassandra expertise

Cassandra boasts a robust community that offers a rich set of collective knowledge. But much of that knowledge isn’t organized in an intuitive way. Implementing and configuring Cassandra requires a significant learning curve. Most companies find out that it’s very difficult and costly to hire in-house expertise because there’s a limited supply of talent. Employees usually end up educating themselves on Cassandra, using a combination of open source documentation, help from the community, and trial and error. This slows down adoption and puts an enormous administrative burden on IT.

While open source software can help organizations achieve their goals, it is not without its drawbacks. Hidden costs, security risks, a patchwork network of support services, and a lack of expertise are all reasons why organizations struggle with open source adoption. The good news is that, with the right partner, you can unlock the full power of Cassandra without any of the downsides. That’s the ticket to helping your organization realize its full potential.

Report
DataStax Enterprise 6 vs. Apache Cassandra Benchmark Report

With DataStax Enterprise 6 (DSE 6), we upped the bar substantially for us, our partners, our customers, and our competitors. We also came out and said that DSE 6 was twice as fast as open source Apache Cassandra™, and now, we have a third-party validation of this claim. Read this benchmark report from zData to get the results of their test of DSE 6 against Cassandra, for which they ran a different series of workloads on an AWS-built cluster.

Whitepaper
Apache Cassandra™ Architecture

The data management needs of the average large organization have changed dramatically over the last ten years, requiring data architects, operators, designers, and developers to rethink the databases they use as their foundation. The proliferation of large-scale, globally distributed data led to the birth of Apache Cassandra™, one of the world’s most powerful and now most popular NoSQL databases. Read this white paper to learn how Cassandra was born, how it’s evolved, how it operates, and what DataStax Distribution of Apache Cassandra™ adds to the equation.
