Apache Cassandra 0.7 Documentation

Introduction

Cassandra was created for solving the problem of inbox search at Facebook. It combined ideas from Amazon’s Dynamo with the data model of Google’s Bigtable. It was open-sourced by Facebook in 2008 and became an Apache Incubator project. In early 2010, Cassandra became a top level Apache project.

Strengths of Cassandra

Scalable

Cassandra is incrementally and linearly scalable; capacity can be added with no downtime. The schema-less data model improves agility in development, alleviating the need for updates.

Reliable

The failure of multiple nodes can be tolerated. Failed nodes can be replaced with no downtime. Cross-data center replication is well supported. Failures of nodes within the cluster are monitored with an Accrual Style Failure Detector. Because all nodes are symmetric and there are no “master” nodes, there is no single point of failure.

Durable

Durability is the property that writes, once completed, will survive permanently even in the face of hardware failure. Cassandra provides configurable durability by appending writes to a commitlog first (which obviates the need for disk seeks since this is a sequential operation), then using the fsync system call to flush the data to disk.

Analytics Without ETL

Hadoop jobs can be executed directly against your cluster.

Performant

Consistency is tunable per operation, allowing consistency levels to be traded for faster response times when needed. There are no reads or seeks in the write path. Multiple cache tuning options allow for optimizing towards specific workloads and data models.