Why Apache Cassandra?

Apache Cassandra Real Time NoSQL Database

Apache Cassandra is a standout among the NoSQL/post-relational database solutions on the market for many reasons. Today, major companies, educational institutions, and government agencies are using Cassandra to power key aspects of their business because of the benefits they derive from the following core features:

Massively scalable peer-to-peer architecture – Based on the best of Amazon Dynamo and Google Bigtable, Cassandra’s peer-to-peer architecture overcomes the limitations of master-slave designs and allows for both high availability and massive scalability. Cassandra is widely regarded as a NoSQL leader when it comes to comfortably scaling to terabytes or petabytes of data.

Linear scale performance – Nodes added to a Cassandra cluster (all done online) increase the throughput of your database in a predictable, linear fashion for both read and write operations.

No single point of failure – Data is replicated to multiple nodes to protect from loss during node failure, and new machines can be added incrementally while online to increase the capacity and data protection of your Cassandra cluster.
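The replication idea above can be sketched as a toy ring: each key hashes to a position, and copies go to the next few distinct nodes clockwise. This is a simplified illustration, not Cassandra's actual placement code. The node names, the MD5-based `token` helper, and `replicas_for` are assumptions made for the example.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a string to a ring position (toy hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes holding copies of `key`: the owner plus the next nodes clockwise."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is past the key's token, wrapping around the ring.
    start = bisect.bisect_right(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
owners = replicas_for("user:42", nodes, replication_factor=3)

# Simulate losing the first replica: the key is still held by the others.
survivors = [n for n in owners if n != owners[0]]
print(owners, "-> still readable from", survivors)
```

With three copies on distinct nodes, the loss of any single node leaves two replicas able to serve the key.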

Transparent fault detection and recovery – Cassandra clusters can grow into the hundreds or thousands of nodes. Because Cassandra was designed for commodity servers, machine failure is expected. Cassandra utilizes gossip protocols to detect machine failure and recover when a machine is brought back into the cluster – all without your application noticing.
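As a rough illustration of heartbeat-based failure detection, the sketch below marks a peer as down when nothing has been heard from it for a fixed period, and as up again once it reports back in. It is deliberately simplified: Cassandra's real detector is adaptive rather than a fixed timeout, and the `FailureDetector` class, its method names, and the 5-second timeout are all assumptions made for this example.

```python
class FailureDetector:
    """Marks a peer as down when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, peer: str, now: float) -> None:
        # Called whenever gossip brings fresh news of `peer`.
        self.last_seen[peer] = now

    def is_alive(self, peer: str, now: float) -> bool:
        seen = self.last_seen.get(peer)
        return seen is not None and (now - seen) <= self.timeout

detector = FailureDetector(timeout=5.0)  # hypothetical timeout
detector.heartbeat("node-a", now=0.0)
detector.heartbeat("node-b", now=0.0)
detector.heartbeat("node-a", now=4.0)    # node-a keeps reporting in

a_alive = detector.is_alive("node-a", now=8.0)      # last heard 4s ago
b_alive = detector.is_alive("node-b", now=8.0)      # last heard 8s ago
detector.heartbeat("node-b", now=9.0)               # node-b rejoins
b_recovered = detector.is_alive("node-b", now=9.5)
print(a_alive, b_alive, b_recovered)
```

Timestamps are passed in explicitly so the example is deterministic. A real system would read a clock.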

Flexible, dynamic schema data modeling – Cassandra offers the organization of a traditional RDBMS table layout combined with the flexibility and power of no stringent structure requirements. This allows you to store your data as you need to without performance penalty for changes as your needs evolve. Plus, Cassandra can store structured, semi-structured, and unstructured data.

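Guaranteed data safety – Cassandra delivers high write performance while ensuring durability through its append-only commit log: each write is appended to the log sequentially, so it can be replayed after a crash. Users no longer have to trade off durability to keep up with immense write streams. Combined with replication across nodes, this makes data loss very unlikely, although the exact guarantee depends on how commit log syncing and the replication factor are configured.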
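The commit-log idea can be sketched in a few lines: append every write to a log file before updating the in-memory table, then rebuild the table from the log after a restart. This is a minimal sketch under simplifying assumptions, not Cassandra's implementation. `TinyStore`, the JSON record format, and syncing on every write are choices made for the example (Cassandra's sync behavior is configurable).

```python
import json
import os
import tempfile

class TinyStore:
    """Toy key-value store: append to a log first, then update memory."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.memtable: dict[str, float] = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        # Rebuild in-memory state from the log, oldest record first.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def put(self, key: str, value: float) -> None:
        # Sequential append only; existing log content is never rewritten.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # force to disk before acknowledging
        self.memtable[key] = value

    def close(self) -> None:
        self.log.close()

log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = TinyStore(log_path)
store.put("sensor-1", 21.5)
store.put("sensor-1", 22.0)
store.close()  # stand-in for a crash: the in-memory table is discarded

recovered = TinyStore(log_path)  # a fresh instance replays the log
print(recovered.memtable)
recovered.close()
```

Appends are fast because they are sequential, which is why a log of this kind can keep up with heavy write streams without giving up durability.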

Distributed, read/write anywhere design – Cassandra’s peer-to-peer architecture avoids the hotspots and read/write issues found in master-slave designs. This means you can have a highly distributed database (multi-geography, data center, etc.) and read or write to any node in a cluster without concern over what node is being accessed.

Tunable data consistency – Cassandra is a distributed system that can span multiple machines, multiple racks, and multiple data centers. Because you know your latency requirements across those boundaries better than anyone, Cassandra lets you choose strong consistency or varying degrees of more relaxed consistency (backed by advanced anti-entropy protocols). The full range of the CAP trade-off between consistency and availability is yours. Data consistency can be controlled on a per-operation basis (for example, per INSERT or per UPDATE).
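The trade-off can be made concrete with a little arithmetic. A read is guaranteed to see the latest acknowledged write when the replicas contacted for the write and for the read must overlap, that is, when W + R > RF for a replication factor RF. The sketch below tabulates this for the consistency levels ONE, QUORUM, and ALL; the helper names and the choice of RF = 3 are assumptions for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1

def is_strongly_consistent(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return write_replicas + read_replicas > replication_factor

rf = 3  # hypothetical replication factor
levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}

for write_name, w in levels.items():
    for read_name, r in levels.items():
        verdict = "strong" if is_strongly_consistent(w, r, rf) else "eventual"
        print(f"write={write_name:<6} read={read_name:<6} -> {verdict}")
```

Writing and reading at QUORUM gives overlap while still tolerating one unavailable replica out of three. Writing and reading at ONE favors latency and availability instead.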

Multi-datacenter replication – Whether it’s keeping your data in multiple locations for disaster recovery scenarios or for blazing performance to keep it near your end user, Cassandra offers support for multiple data centers. Simply configure how many copies of your data you want in each data center, and Cassandra handles the rest – replicating your data for you. Cassandra is also rack-aware and can keep replicas of data stored on different physical racks, which helps ensure uptime in the case of single rack failures.
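Configuring copies per data center is done when a keyspace is created, using the `NetworkTopologyStrategy` replication class. The helper below only builds the CQL statement as a string and does not connect to a cluster. The function name, the keyspace name `shop`, and the data center names `dc_east` and `dc_west` are hypothetical; real data center names must match those your cluster reports.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with a replica count per data center."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in replicas_per_dc.items():
        if count < 1:
            raise ValueError(f"replica count for {dc} must be positive")
        parts.append(f"'{dc}': {count}")
    options = ", ".join(parts)
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{options}}};"

# Three copies in one data center, two in another (hypothetical names).
statement = create_keyspace_cql("shop", {"dc_east": 3, "dc_west": 2})
print(statement)
```

The resulting statement could then be run through `cqlsh` or a client driver.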

Cloud enabled – Cassandra’s architecture maximizes the benefits of running in the Cloud. Plus, Cassandra allows for hybrid data distribution where some data can be kept on premises and some in the Cloud.

Data compression – Cassandra supplies built-in data compression, with some use cases showing up to an 80% reduction in raw data footprint. Plus, Cassandra’s compression typically carries little or no performance penalty, with some use cases showing actual read/write speedups due to less physical I/O being managed.

CQL (Cassandra Query Language) – Cassandra provides a SQL-like language called CQL that closely resembles SQL’s DDL, DML, and SELECT syntax. CQL greatly lessens the learning curve for those coming from relational databases because they can use familiar syntax for all object creation and data access operations.

No caching layer required – Cassandra offers caching on each of its nodes. Coupled with Cassandra’s scalability characteristics, this means you can incrementally add nodes to the cluster to keep as much of your data in memory as you need. The result? There’s no need for a separate caching layer. Caching + disk persistence in one layer – ease of development, ease of operations.

No special hardware needed – Cassandra runs on commodity machines and requires no expensive or special hardware.

Incremental and elastic expansion – The Cassandra ring allows you to add nodes easily, with no need to manually migrate data from one node to another. The result is that your Cassandra cluster can grow as you need it to, and your costs increase incrementally as your data needs demand.
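Why adding a node does not force a full reshuffle can be seen in a small consistent-hashing sketch: keys and nodes share one hash ring, so a new node takes over only the slices of the ring next to its own positions. This is an illustration under stated assumptions rather than Cassandra's code. MD5 stands in for the real partitioner, and the node names, the 64 virtual positions per node, and the `owners` helper are invented for the example.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a string to a ring position (toy hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def owners(nodes: list[str], keys: list[str], vnodes: int = 64) -> dict[str, str]:
    """Assign each key to the node owning the next ring position clockwise."""
    ring = sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    tokens = [t for t, _ in ring]
    result = {}
    for key in keys:
        index = bisect.bisect_right(tokens, token(key)) % len(ring)
        result[key] = ring[index][1]
    return result

keys = [f"key-{i}" for i in range(10_000)]
cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names

before = owners(cluster, keys)
after = owners(cluster + ["node-e"], keys)  # grow the cluster by one node

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved, all of them to node-e:",
      all(after[k] == "node-e" for k in moved))
```

Only the keys that now belong to the new node change owner, roughly one fifth when going from four nodes to five. The rest stay where they were, which is what lets a cluster grow while online.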

Simple install and setup – Cassandra can be downloaded and installed in minutes, even for multi-cluster installs.

Apache, Apache Cassandra, Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation.