Apache Cassandra® is an open-source, distributed NoSQL database with many advantages over rival systems. That’s why top companies around the world use it for a broad range of use cases. Before taking a look at its most common applications, let’s have a brief history lesson.
Developed at Facebook, Cassandra was open-sourced in 2008. It went on to join the Apache Incubator in 2009, and became a top-level Apache Software Foundation project in 2010. Since then, thousands of companies have adopted it, including Apple, Instagram, Uber, Spotify, Twitter, Cisco, Rackspace, eBay, and Netflix. Later, we’ll take a closer look at how some of those companies are using Cassandra.
Common Apache Cassandra use cases
Let’s examine the advantages that make Cassandra one of the most widely used NoSQL databases.
Cassandra’s benefits include:
- Open source: Increases innovation, speed of implementation, flexibility, and extensibility. More cost effective, while avoiding vendor lock-in.
- Handles a high volume of data with ease: Built to handle a massive amount of data across many servers. Some large organizations are using it to manage petabytes of information.
- Continuous availability: With no single point of failure, Cassandra is designed to keep running through node outages. If a particular node fails, requests are automatically routed to other nodes that hold replicas of the data. The system continues to work as designed, with applications available and data accessible, so users typically never know there was an outage. This is key for companies that can’t afford to have their database go offline or to lose any data.
- High performance: Cassandra has a peer-to-peer, distributed architecture in which every node can perform all read and write operations. This adds resiliency while improving performance. Writes are especially fast, and Cassandra can ingest large volumes of data without speed or accuracy being affected.
- Straightforward scalability: Cassandra’s horizontal scaling is straightforward and cost-effective. Instead of scaling vertically with expensive hardware, Cassandra enables companies to expand to any size simply by adding low-cost commodity servers or virtual machines, with no shutdowns required. And its linear scalability means high performance is maintained as nodes are added. These scalability benefits make Cassandra popular with companies that work with large datasets, have many concurrent users, and expect continued growth.
- Seamless replication: Like other NoSQL databases, Cassandra doesn’t require a fixed schema, making replication simple. And, since it’s a peer-to-peer system, data can be quickly replicated across the entire system, regardless of geographic location. Wide, even global, distribution is possible. Replication across data centers creates high fault tolerance and zero data loss—an outage in any particular region won’t matter. And placing data closer to end users also leads to low latency.
- Familiar interface: Most developers will be able to pick up Cassandra’s query language quickly. That’s because Cassandra Query Language (CQL) has a strong resemblance to SQL.
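To make the scalability and replication points above more concrete, here is a minimal Python sketch of consistent hashing, the general technique Cassandra uses to spread partitions across nodes. It is a toy model under stated assumptions, not Cassandra’s implementation: the `Ring` class, the `token` function, the node names, and the use of MD5 (Cassandra’s default partitioner is Murmur3) are all illustrative.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns several virtual tokens."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []  # sorted token positions
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node's virtual tokens on the ring."""
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def replicas(self, key: str, rf: int = 3):
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        start = bisect.bisect_left(self._tokens, token(key))
        found = []
        for step in range(len(self._tokens)):
            t = self._tokens[(start + step) % len(self._tokens)]
            node = self._owner[t]
            if node not in found:
                found.append(node)
            if len(found) == rf:
                break
        return found


# Hypothetical workload: 10,000 keys on four nodes, then a fifth node joins.
keys = [f"user-{i}" for i in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.replicas(k, rf=1)[0] for k in keys}
ring.add_node("node-e")
after = {k: ring.replicas(k, rf=1)[0] for k in keys}

moved_keys = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved_keys) / len(keys)
print(f"keys that changed primary owner: {moved_fraction:.1%}")
```

Because the new node only takes over slices of the ring, roughly one fifth of the keys should move in this model, and every key that moves lands on the new node. That is the intuition behind adding capacity without a shutdown.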
Those are just some of the reasons organizations are embracing Cassandra. It has plenty of other benefits, including the flexibility to handle structured, semi-structured, and unstructured data, along with automatic workload and data balancing. To top it off, it offers operational simplicity, low overhead, and the ability to support hybrid and multi-cloud environments.
Sure, Cassandra has a laundry list of benefits, but how is it actually being used to help companies? On the community team at DataStax we spend a lot of time talking to, and hearing from, companies that are using Apache Cassandra in production.
Here are some use cases we see often:
- E-commerce and inventory management
- Personalization, recommendations, and customer experience
- Internet of things and edge computing
- Fraud detection and authentication
E-commerce and inventory management
E-commerce companies can’t afford to have their site go down, and that’s especially true during a peak period. Every minute they’re offline quickly eats away at their bottom line. Since rapid growth is always the goal, they also need the ability to cost-effectively scale their online inventory on the fly. For the same reason, these organizations need a database that can handle an enormous amount of data with ease. And to meet or exceed customer expectations, they need the flexibility to continuously modify their product mix.
Here’s why Cassandra is a good fit for e-commerce and inventory management:
- Resilient with zero downtime: Distributed with multi-region replication, Cassandra ensures zero downtime. Even the loss of an entire region won’t bring it down.
- Highly responsive: Cassandra’s peer-to-peer architecture also allows data to reside in regions around the world and closer to any particular customer—allowing the system to be highly responsive and fast.
- Predictable scalability: Cassandra’s horizontal scalability is straightforward, predictable, and cost-effective.
- Faster catalog refreshes.
- Real-time analysis of catalog and inventory data.
Personalization, recommendations, and customer experience
Personalization and recommendation engines are everywhere now. Almost like personal assistants built into apps and websites, they help us decide what events to buy tickets to, surface articles we might find interesting, and much more. Eventbrite now uses Cassandra instead of MySQL to power their mobile experience, letting users know what events are happening around them that they will be interested in attending. Eventbrite chose Cassandra for its read/write capacity and ease of deployment. Outbrain, a company whose service you probably encounter frequently but may not know by name, uses Cassandra to power their content discovery platform, helping companies add revenue streams by serving up applicable third-party articles you may find interesting.
Near real-time, relevant, personalized experiences are now expected. Here’s why Cassandra is the right choice to power tailored experiences:
- Fast response times.
- Extremely low latency, even as your customer base expands.
- Handles all types of data, structured and unstructured, from a variety of sources.
- Built to scale while staying cost-effective.
- Ability to store, manage, query, and modify extremely large datasets, while delivering personalized experiences to millions of customers.
- High read/write capacity.
- Ease of deployment.
- Flexible, enabling continuous customer experience innovation.
Consider the success story of Macquarie Bank. With an architectural foundation built on Cassandra, the company moved from no retail banking presence to a top contender in the digital banking space in less than two years, by truly understanding customer behavior and prioritizing personalization. Learn how Macquarie Bank uses Cassandra to provide personalization for their customers.
Internet of things and edge computing
Whether tracking weather, traffic, energy consumption, inventory levels, health indicators, video game stats, farming conditions or countless other metrics, internet of things (IoT) sensors, wearables, vehicles, mobile devices, appliances, drones, and other devices at the edge produce an avalanche of never-ending data. This data needs to be securely collected—sometimes from millions of devices—aggregated, processed, and analyzed on an ongoing basis.
Consider how the National Renewable Energy Laboratory uses Cassandra to store and analyze sensor data at the world's most environmentally friendly building. They find ways to save water and energy by running the world's smartest thermostat on top of Cassandra. The system continuously learns about energy usage patterns, and automatically adjusts settings, even when no one is there to program it.
Here are some of the reasons Cassandra is a good fit for IoT and edge computing needs:
- Cassandra can ingest concurrent data from any node in the cluster, since all have read/write capacity.
- Ability to handle a large volume of high-velocity, time-series data.
- High availability.
- Supports continuous, real-time analysis.
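One common way to model the high-velocity, time-series data mentioned above is to bucket each device’s readings by day, so that no single partition grows without bound, and to keep the rows within a bucket sorted by timestamp. The Python sketch below imitates that layout in memory. It is an illustration only: `SensorStore`, `partition_key`, and the sensor name are hypothetical, and a real deployment would express the same idea as a CQL table with a composite partition key and a timestamp clustering column.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Bucket readings per sensor per UTC day (ts must be timezone-aware)."""
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


class SensorStore:
    """In-memory stand-in for a table partitioned by (sensor_id, day)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def write(self, sensor_id: str, ts: datetime, value: float) -> None:
        rows = self._partitions[partition_key(sensor_id, ts)]
        # Keep rows ordered by timestamp, like a clustering column would.
        bisect.insort(rows, (ts, value))

    def read_day(self, sensor_id: str, day: str) -> list:
        """Return one day's readings for one sensor, oldest first."""
        return list(self._partitions.get((sensor_id, day), []))


utc = timezone.utc
store = SensorStore()
# Readings may arrive out of order; the store keeps each bucket sorted.
store.write("thermostat-1", datetime(2024, 1, 1, 23, 59, tzinfo=utc), 20.5)
store.write("thermostat-1", datetime(2024, 1, 1, 8, 0, tzinfo=utc), 19.0)
store.write("thermostat-1", datetime(2024, 1, 2, 0, 1, tzinfo=utc), 20.1)

day_one = store.read_day("thermostat-1", "2024-01-01")
day_two = store.read_day("thermostat-1", "2024-01-02")
```

Reading a day of data for one sensor then touches a single bucket, which is what keeps this kind of query fast as the number of devices grows.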
Fraud detection and authentication
Security threats continue to rise, and many companies are always on the defensive, playing catch-up with their smart fraud detection capabilities. That’s because fraudsters are constantly on the attack, looking for new and creative ways to steal customer data and compromise other sensitive information.
To have any chance of preventing illegitimate users from gaining access, companies need data and a lot of it. Continuous, real-time analysis of large and diverse datasets is required to find patterns and anomalies that can be indicators of fraud. A high priority for all businesses, fraud detection’s importance is elevated in areas like financial services, banking, payments, and insurance. As an example, take a look at how ACI Worldwide has used Cassandra to drastically improve its fraud detection rate and false positive rate.
Identity authentication is the other side of the fraud detection coin. Instead of focusing on keeping fraudsters out, the goal of authentication is to confirm that only legitimate customers gain access. The trick is, you want to make the log-in process as painless and fast as possible, while still making absolutely sure users are who they say they are. As with fraud detection, to pull this off, you need to conduct real-time analysis of a wide variety and high volume of data. And since authentication is likely a central part of all your systems, outages must be avoided at all costs. If a customer experiences friction trying to access your site, whether due to a false positive or because the auth system is down, it likely won’t take too long for them to leave in frustration.
Here are some reasons Cassandra is a great database choice for fighting fraud and ensuring identity authentication:
- Flexible schema: Handles numerous data types, and they can be quickly added to the mix.
- Enables complex, real-time analysis, including the ability to incorporate and support machine learning and AI.
- Handles large-scale, growing datasets.
Other common Cassandra use cases
There are countless other applications that can benefit from Cassandra. Here are a few more:
- Financial services and payments
- Logistics and asset management
- Content management systems
- Transaction logging
- Tracking of all kinds, including packages and orders
- Digital and media management
Now that we’ve reviewed some of Cassandra’s most common use cases, let’s explore how it has helped some brands you might have heard of.
Notable Cassandra use cases in action
With so many prominent companies using Apache Cassandra, it’s highly likely we all interact with it in some way multiple times a day. For example, the next time you go for a jog and queue up your Spotify playlist, you’re using an application built on top of Cassandra. In fact, Cassandra has been instrumental in the expansion of many iconic brands, including Netflix, Uber, Instagram, Reddit, Soundcloud, and more.
Let’s dig deeper to see how Netflix, Soundcloud, and Instagram are leveraging some of Cassandra’s most powerful features.
Compliance Audit Logging
It’s important for companies to have a bullet-proof audit trail for their database. Using audit logging, organizations can track and record significant changes to the database, along with noting the time they occur and who triggered them. Reviewing these records is necessary to ensure regulatory compliance and security standards are being met. Audit logging can also be very helpful to uncover the root cause of bugs. Apache Cassandra has audit logging built in, allowing users to easily create a persistent record of important changes.
Cassandra in action: Netflix
It’s no surprise Netflix deploys Cassandra’s audit logging capability at scale. After all, their own cloud database architects and engineers contributed heavily to its development. When implementing audit logging, Netflix wanted to make sure it was performant, accurate, usable, and extensible. Their setup audits everything and logs user, host, source IP address, source port, timestamp, type, category, keyspace, scope and operation.
Dashboards provide a handy and visual way to quickly get a read on a situation. They are often used to access the latest information on a particular topic, check status, or to monitor a process or project. Companies use dashboards for many purposes both internally for employees, and externally for customers. Either way, users are often given the ability to personalize dashboards to best fit their needs.
Cassandra provides a solid foundation for dashboards for many reasons, including:
- Easily handles frequent updates—there are typically many, ongoing updates for each user.
- Built to take on extremely large datasets—hundreds of millions of events can reside in one table.
- Efficient way to store time-series data.
Cassandra in action: Soundcloud
The dashboard Soundcloud provides its customers is one of its most popular features. In fact, the company credits much of its rapid data growth to it. Soundcloud customers can personalize their dashboard to see where in the world their audio uploads are being listened to, and by which users. The dashboard is also home to incoming sound clips from people they follow, and much more.
Soundcloud turned to Cassandra because of its ability to store and access vast amounts of data, its built-in persistence of that data, and its scalability. Cassandra’s read/write capabilities were also a strong selling point for Soundcloud’s adoption. With Cassandra, they can provide each customer with a sequential read path, so posts can be browsed in the correct time order. Cassandra also allows Soundcloud users to randomly access write events, and have any particular one put into sequential order. One write event could end up in millions of users’ dashboards, and Soundcloud uses Cassandra to ensure it's always displayed in the right place. The company also leans on Cassandra to explore relationships between customers, and personalize their experiences.
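The pattern described above, where one write is copied into many dashboards and each dashboard stays in time order, is often called fan-out on write. Here is a small Python sketch of the idea. It is not Soundcloud’s implementation: the `Dashboards` class, its methods, and the user and track names are hypothetical, and the sorted in-memory list simply stands in for rows kept in descending timestamp order within a partition.

```python
import bisect
from collections import defaultdict


class Dashboards:
    """Toy fan-out on write: one event is copied into each follower's feed."""

    def __init__(self):
        self._followers = defaultdict(set)  # author -> set of followers
        self._timeline = defaultdict(list)  # user -> [(-timestamp, event_id)]

    def follow(self, follower: str, author: str) -> None:
        self._followers[author].add(follower)

    def publish(self, author: str, event_id: str, timestamp: int) -> None:
        for user in self._followers[author]:
            # The negated timestamp keeps each feed sorted newest first,
            # even when events arrive out of order.
            bisect.insort(self._timeline[user], (-timestamp, event_id))

    def latest(self, user: str, limit: int = 10) -> list:
        """Sequential read of the newest events in a user's feed."""
        return [event_id for _, event_id in self._timeline[user][:limit]]


dashboards = Dashboards()
dashboards.follow("alice", "band")
dashboards.follow("bob", "band")
dashboards.publish("band", "track-2", timestamp=200)
dashboards.publish("band", "track-1", timestamp=100)  # arrives late
dashboards.publish("band", "track-3", timestamp=300)

alice_feed = dashboards.latest("alice")
bob_feed = dashboards.latest("bob", limit=2)
```

The write path does more work per event, but each read becomes a short sequential scan of one user’s feed, which suits a dashboard that is read far more often than it is written.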
As discussed earlier, creating and storing replicas of datasets at geographically dispersed data centers makes a lot of sense. It increases fault tolerance, reliability, and availability. If a data center in the cluster goes down, operations will continue without a blip. Having data closer to customers, no matter where they’re using your app around the world, also decreases read and write latency.
Cassandra has a peer-to-peer, distributed architecture with no need for a primary node. Instead, every node can perform read and write operations, and all replicas across the cluster are equally important. As a result, data can quickly be replicated across nodes and Cassandra doesn’t have a single point of failure, which translates into always-on availability. That’s why so many companies turn to Cassandra when data storage is mission critical, and they need a database that can comfortably handle petabyte-sized datasets and full global replication.
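How many node failures a cluster can ride out depends on the replication factor and on how many replicas must acknowledge each request. The short Python sketch below works through that arithmetic for quorum-style reads and writes, where a quorum is a majority of replicas (replication factor divided by two, rounded down, plus one). The function names are illustrative and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """Replicas that can be down while requests still succeed."""
    return replication_factor - required_acks


def reads_see_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """Read and write replica sets overlap when W + R > RF."""
    return write_acks + read_acks > replication_factor


rf = 3
q = quorum(rf)
print(f"RF={rf}: quorum={q}, tolerated failures={tolerated_failures(rf, q)}")
print("quorum writes + quorum reads overlap:",
      reads_see_latest_write(rf, q, q))
print("single-replica writes + reads overlap:",
      reads_see_latest_write(rf, 1, 1))
```

With three replicas and quorum reads and writes, one replica can be unavailable without failing requests. Requiring fewer acknowledgments tolerates more failures, at the cost of reads that may not reflect the most recent write.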
Cassandra in action: Instagram
Instagram has used Cassandra from the beginning, way back in 2010. When they started expanding, they created replicas in each new data center. As Instagram kept replicating in data centers around the world, they found their performance dropped. To combat the dip, they started storing data only in the region closest to where it was generated. Local data access has helped them provide a faster, more efficient service to the more than one billion daily active users they have today. Learn more about how Instagram uses Cassandra to replicate on a global scale.
Explore how Cassandra can help your company
Leading companies around the world, ranging from social media to international banking, are using Cassandra for all kinds of use cases. That’s because it can help any company that requires the ability to manage a large volume of data, always-on availability, high fault tolerance, easy and cost-effective scalability, and seamless replication—all without compromising performance. It’s also a perfect fit for cloud-native applications, or hybrid cloud and multi-cloud environments.
Could Cassandra be a good fit for your company? Many of the largest internet apps and Fortune 100 companies use DataStax Enterprise (DSE) as their implementation of Cassandra.
Apache Cassandra Resources
- Get started with our Learning Series on Cassandra Fundamentals
- Download the whitepaper: Apache Cassandra™ Architecture