Data Sheet: Apache Cassandra

Apache Cassandra™ Backgrounder

What is Apache Cassandra™

Apache Cassandra™, an Apache Software Foundation project, is an open-source NoSQL distributed database management system. Cassandra is designed to handle Big Data workloads across multiple data centers with no single point of failure, providing enterprises with extremely high database performance and availability. Cassandra was initially developed at Facebook, and is used by companies such as Twitter, Netflix, and Cisco.

Cassandra’s “cluster ring”, peer-to-peer architecture, which can include hundreds to thousands of identical nodes, protects from data loss and business disruption because there is no single point of failure. Cassandra is architected to deliver extreme levels of performance for both read and write workloads, and is designed to offer linear performance increases via the addition of new nodes to an existing cluster.

What are the benefits of Apache Cassandra?

Massively scalable ring architecture: Based on the best of Amazon Dynamo and Google BigTable, Cassandra’s peer-to-peer architecture overcomes the limitations of master-slave designs and allows for both high availability and massive scalability.

Linear scale performance: Nodes added to a Cassandra cluster (all done online) increase the throughput of your database in a predictable, linear fashion for both read and write operations.

No single point of failure: Data is replicated to multiple nodes to protect from loss during node failure, and new machines can be added incrementally while online to increase the capacity and data protection of your Cassandra cluster.

Transparent fault detection and recovery: Cassandra was designed with the understanding the hardware failures can and do occur, and therefore, failover is handled transparently.

Flexible dynamic schema data modeling: Cassandra’s dynamic schema can easily store structured, semi-structured, and unstructured data.

Guaranteed data safety: Cassandra far exceeds other systems on write performance, while ensuring durability, due to its innovative append-only commit log.

Location transparency: Cassandra allows for reading and writing to any node in a cluster.

Tunable consistency: Cassandra gives users the flexibility on a per operation basis to choose between very strong consistency or varying degrees of eventual consistency.

Multi-data center replication: Cassandra offers support for easily spanning multiple data centers and the cloud.

Cloud enabled: Cassandra’s architecture maximizes the benefits of running in the Cloud.

Data compression: Cassandra supplies built-in data compression, with some use cases showing up to an 80% reduction in raw data footprint.

CQL (Cassandra Query Language): Cassandra provides a SQL-like language called CQL that mirrors SQL’s DDL, DML, and SELECT syntax.

No caching layer required: Cassandra offers caching on each of its nodes, so there’s no need for a separate caching layer.

Simple install and setup: Cassandra can be downloaded and installed in minutes, even for multi-cluster installs.

AT-A-GLANCE

Built for Big Data

  • Peer to peer architecture ensures fast performance at extreme scale.
  • Linear performance gains possible via online node additions.

Continuous Avalability

  • Transparent fault-tolerance built into the design.
  • Replication across nodes, multiple data centers, and the cloud ensures no downtime.

Dynamic Data Model

  • Flexible schema is built for managing modern types of data.
  • Supports structured, semi-structured and unstructured data.