What is Cassandra?
date: October 12, 2010
Apache Cassandra is a high-performance, scalable, open-source database designed for real-time transactions and analytics. Cassandra offers sub-millisecond read and write times; it's fast. It's designed for linear, incremental scalability on commodity hardware, which means you can run it in the cloud or on your own hardware, whichever is easiest for you.
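Cassandra excels at read and write performance. Writes are extremely fast, even on spinning hard disks, because of Cassandra's commit log: an append-only log that records every write. Since writes are only ever added to the end of the log, a commit log on its own dedicated disk never has to seek, and the result is very high write throughput. Reads are fast thanks to Cassandra's two caching layers: the row cache, which keeps entire rows in memory, and the key cache, which remembers where recently read keys are located on disk so a read can skip the index lookup.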
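To make the write and read paths above more concrete, here is a minimal Python sketch of a toy, single-process store. The class `ToyNode`, its methods, and the JSON-lines log format are hypothetical illustrations, not Cassandra's code or API. Each write is appended to the end of a log file and then applied to an in-memory dict (Cassandra's in-memory table is called a memtable); each read checks a small least-recently-used row cache first. The key cache and on-disk data files are not modeled.

```python
import json
import os
import tempfile
from collections import OrderedDict


class ToyNode:
    """Hypothetical single-node store: append-only commit log plus an LRU row cache.
    An illustration only, not Cassandra's implementation."""

    def __init__(self, log_path, cache_size=2):
        # Append mode: every write lands at the end of the file, so a disk that
        # holds only this log never has to seek between writes.
        self.log = open(log_path, "a", encoding="utf-8")
        self.memtable = {}              # in-memory table of the latest rows
        self.row_cache = OrderedDict()  # LRU cache of whole rows
        self.cache_size = cache_size

    def write(self, key, row):
        # 1. Record the mutation in the commit log first. flush() hands the bytes
        #    to the OS; a real system would also fsync periodically or per batch.
        self.log.write(json.dumps({"key": key, "row": row}) + "\n")
        self.log.flush()
        # 2. Apply it in memory: no random disk I/O on the write path.
        self.memtable[key] = row
        # Drop any stale cached copy so later reads see the new value.
        self.row_cache.pop(key, None)

    def read(self, key):
        # Cache hit: return the row and mark it as most recently used.
        if key in self.row_cache:
            self.row_cache.move_to_end(key)
            return self.row_cache[key]
        # Cache miss: a real node would merge the memtable with on-disk data
        # here; this sketch only has the memtable.
        row = self.memtable.get(key)
        if row is not None:
            self.row_cache[key] = row
            if len(self.row_cache) > self.cache_size:
                self.row_cache.popitem(last=False)  # evict least recently used
        return row

    def close(self):
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        node = ToyNode(os.path.join(d, "commitlog.log"))
        node.write("user:1", {"name": "alice"})
        print(node.read("user:1"))  # miss: filled from the memtable, then cached
        print(node.read("user:1"))  # hit: served from the row cache
        node.close()
```

The design choice being illustrated is that the only disk work on the write path is a sequential append; everything else happens in memory.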
As you need more storage or more read and write capacity, you can add more nodes. Each node delivers the performance described above, plus some overhead from network latency.
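Adding nodes works because Cassandra spreads rows across a token ring: each key is hashed to a token, and each node owns a range of tokens. The Python sketch below is a simplified, hypothetical illustration of that idea. It assumes MD5 hashing (similar in spirit to Cassandra's RandomPartitioner) and derives each node's token by hashing its name, which real clusters do not necessarily do; `Ring`, `token`, and the node names are made up for the example.

```python
import bisect
import hashlib


def token(key):
    """Map a key (or a node name) to a position on the ring using MD5."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: a node owns every token after its predecessor's
    token, up to and including its own."""

    def __init__(self):
        self.tokens = []  # node tokens, kept sorted
        self.nodes = {}   # token -> node name

    def add_node(self, name):
        # Assumption for this sketch: a node's token is the hash of its name.
        t = token(name)
        bisect.insort(self.tokens, t)
        self.nodes[t] = name

    def owner(self, key):
        if not self.tokens:
            raise ValueError("ring has no nodes")
        # First node token >= the key's token; past the end, wrap to the start.
        i = bisect.bisect_left(self.tokens, token(key))
        return self.nodes[self.tokens[i % len(self.tokens)]]


if __name__ == "__main__":
    ring = Ring()
    for name in ["node-a", "node-b", "node-c"]:
        ring.add_node(name)

    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.owner(k) for k in keys}

    ring.add_node("node-d")  # scale out by one node
    moved = [k for k in keys if ring.owner(k) != before[k]]

    # Only keys in the range node-d took over change owner, and all go to node-d.
    print(f"{len(moved) / len(keys):.1%} of keys moved")
    print(all(ring.owner(k) == "node-d" for k in moved))
```

A new node takes over part of one existing node's range, so adding capacity moves only a fraction of the data rather than reshuffling everything. A real cluster also replicates each range to several nodes, which this sketch ignores.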
That's called linear scalability: you add capacity as you need it. This matters for two reasons. First, you can grow your database without redesigning your application. Growth becomes an operational task rather than a development one; you simply add hardware as demand grows. That beats sharding your database and redesigning your application, which eats precious time you shouldn't have to spend. Second, economics matter. Scaling vertically means buying bigger, faster, more expensive hardware to increase performance. That's costly, and moving from one machine to another is painful. Cassandra lets you add commodity hardware incrementally as demand grows, which is easy on the bank account.
Cassandra is for use cases that need high read and write performance, or whose storage requirements will keep growing over time. It lets you grow as needed and add resources on your own schedule to meet those demands.