Solid state disks now available on Amazon EC2
Amazon’s EC2 is a popular choice for Cassandra deployments. It’s a good match in both directions: EC2 makes it easy to add capacity to take advantage of Cassandra’s transparent scaling, and Cassandra’s category-leading support for multiple datacenters allows your cluster to tolerate even failures of an entire region.
Today Amazon announced general availability of High I/O instance types backed by solid state disks (SSDs). This is great news for Cassandra deployments and many other EC2 users.
A few years ago I called SSDs a silver bullet for database scalability. This is still true today: even with Cassandra’s advanced log-structured storage engine and integrated caching, most Cassandra clusters are I/O-bound on reads.
SSDs are clearly a great fit in this respect, since random reads on an SSD carry no seek penalty. Less obviously, this also makes it easier to tune Cassandra’s background compaction operations, which clear out obsolete data. Overall, Netflix found that SSDs gave them about a 50% cost savings in a real-world workload, even though the individual instance cost is higher.
Cassandra and SSDs work well together for writes as well as reads. Traditional storage engines, like the ubiquitous B-tree, incur a high write amplification cost on updates. This can affect SSD lifetime as well as performance. Since Cassandra’s storage engine is designed to do only sequential writes, it avoids this weakness entirely.
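To make the write amplification point concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not a measurement: the 4 KB page size, the 100-byte update size, and the function names are all assumptions chosen for illustration. It also ignores compaction, which rewrites data later in large sequential batches.

```python
PAGE_SIZE = 4096  # bytes per page; an assumed, typical value


def in_place_bytes_written(update_sizes, page_size=PAGE_SIZE):
    """Toy update-in-place model: every update rewrites each whole page it touches."""
    # -(-a // b) is ceiling division: the number of pages an update spans.
    return sum(-(-size // page_size) * page_size for size in update_sizes)


def append_only_bytes_written(update_sizes):
    """Toy append-only model: only the new bytes are written, sequentially."""
    return sum(update_sizes)


def write_amplification(physical_bytes, logical_bytes):
    """Ratio of bytes physically written to bytes the application asked to write."""
    return physical_bytes / logical_bytes


updates = [100] * 1000  # hypothetical workload: 1000 updates of 100 bytes each
logical = sum(updates)
in_place = in_place_bytes_written(updates)
appended = append_only_bytes_written(updates)

print("in-place write amplification:   ", write_amplification(in_place, logical))
print("append-only write amplification:", write_amplification(appended, logical))
```

Under these assumptions, small updates are where the gap is widest, because each one pays for a full page in the update-in-place model.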
Finally, I should point out that taking advantage of SSDs in a Cassandra cluster doesn’t have to be all or nothing. You can mix SSDs and spinning disks either at the individual node level or at the cluster level. For the former, Cassandra allows putting “hot” tables on SSD while leaving “cold” ones on spinning disks. But if you want to use a group of nodes for analytical workloads the way DataStax Enterprise does, Cassandra is equally comfortable with running just those nodes entirely on cheaper spinning disks, with the remaining “realtime” nodes on SSDs. This latter configuration is a good fit for EC2 deployments.
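One way to picture the cluster-level split is as two logical datacenters, with replication configured per datacenter. The Python sketch below just builds the corresponding CQL statement as a string. It assumes the SSD nodes and the spinning-disk nodes have already been assigned to separate logical datacenters through the snitch configuration. The keyspace name "app", the datacenter names "realtime" and "analytics", and the replica counts are hypothetical, and the helper function is mine rather than part of any Cassandra driver.

```python
def keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a logical datacenter name to its replica count.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += ["'%s': %d" % (dc, count) for dc, count in replicas_per_dc.items()]
    options = ", ".join(parts)
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (keyspace, options)


# Hypothetical layout: three replicas on the SSD-backed "realtime" nodes,
# one replica on the spinning-disk "analytics" nodes.
statement = keyspace_cql("app", {"realtime": 3, "analytics": 1})
print(statement)
```

With a layout like this, realtime clients would talk only to the SSD-backed group, while analytical jobs read the copy held on the cheaper nodes.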
In short: today’s announcement is a big win for databases in the cloud generally, and for Cassandra in particular!