Cassandra vs DynamoDB TCO
I’ve written on our technical blog about Cassandra vs DynamoDB features; TCO is also an important factor in infrastructure decisions. Amazon recently published their take on TCO for DynamoDB, contrasted with an unnamed “NoSQL” database. While it is a useful starting point, most of the assumptions on display there apply poorly to Cassandra, as I will explain.
First, at a high level, I do agree with Amazon that when you have a workload that peaks at a fraction of a single machine’s throughput, SaaS pricing can make a lot of sense.
However, Cassandra and DataStax Enterprise are targeted firmly at workloads requiring scale-out, either for volume of data or velocity (i.e. req/s). If we redo Amazon’s comparison with their server pricing but using Cassandra performance instead of an unnamed generic NoSQL competitor, the 5,000 req/s from the “high” usage scenario fits easily on a single machine. (One high-CPU XL Cassandra node can handle about 25,000 1KB inserts or reads per second.) Multiplying by 3 for replication and leaving the rest of the numbers unchanged gets us to $2432 for Cassandra, vs $2560 for DynamoDB.
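To make the capacity argument concrete, here is a rough back-of-the-envelope sketch in Python. The 25,000 ops/s per-node figure and the 3x replication come from the paragraph above; the function names and the load model (every write lands on all replicas, each read is served by one replica, load spreads evenly) are simplifying assumptions for illustration, not a benchmark.

```python
import math

# Per-node throughput quoted above: ~25,000 1KB inserts or reads per second.
NODE_CAPACITY_OPS = 25_000


def per_node_load(writes_per_s, reads_per_s, nodes, replication_factor):
    """Estimate the ops/s each node has to absorb.

    Assumptions (illustrative only): every write is applied on
    `replication_factor` replicas, each read is served by a single
    replica, and load is spread evenly across the cluster.
    """
    replica_ops = writes_per_s * replication_factor + reads_per_s
    return replica_ops / nodes


def nodes_needed(writes_per_s, reads_per_s, replication_factor,
                 capacity=NODE_CAPACITY_OPS):
    """Smallest cluster that fits the load, never fewer nodes than replicas."""
    replica_ops = writes_per_s * replication_factor + reads_per_s
    return max(replication_factor, math.ceil(replica_ops / capacity))


# Worst case for the "high" scenario: treat all 5,000 req/s as writes.
load = per_node_load(5_000, 0, nodes=3, replication_factor=3)
print(f"per-node load: {load:.0f} ops/s of {NODE_CAPACITY_OPS} available")
print(f"nodes needed: {nodes_needed(5_000, 0, replication_factor=3)}")
```

Even with every request counted as a replicated write, each of the three nodes sees about 5,000 ops/s under this model, so the cluster size is set by the replication factor rather than by throughput.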
But there’s one glaring error in Amazon’s numbers, which is requiring $1/GB for “redundant storage” (i.e. SAN or similar). Cassandra and all other scale-out NoSQL solutions *strongly* recommend using direct-attached storage for better reliability, performance, and cost effectiveness. (This applies even more so to avoiding EBS.) Fixing that takes us down to $1082 for Cassandra.
Now, if we go to 2x the load at 14k req/s peak (10,000 peak writes, 4,000 reads), that’s still not maxing out our tiny 3-node Cassandra cluster. The Cassandra cost remains $1082/m, but DynamoDB is up to $5120/m.
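The gap is easier to see as a ratio. The short Python sketch below only restates the monthly figures quoted in this post; the variable and function names are made up for illustration, and nothing here models either vendor's actual pricing.

```python
# Monthly costs quoted in this post (USD).
CASSANDRA_MONTHLY = 1082  # 3-node cluster on direct-attached storage
DYNAMODB_MONTHLY = {
    "Amazon's 'high' scenario": 2560,
    "2x the load": 5120,
}


def cost_ratio(dynamodb_cost, cassandra_cost=CASSANDRA_MONTHLY):
    """How many times more DynamoDB costs than the Cassandra cluster."""
    return dynamodb_cost / cassandra_cost


for scenario, cost in DYNAMODB_MONTHLY.items():
    print(f"{scenario}: DynamoDB ${cost}/m vs Cassandra ${CASSANDRA_MONTHLY}/m "
          f"({cost_ratio(cost):.1f}x)")
```

Because the Cassandra cluster has not yet run out of headroom, its cost stays flat while the DynamoDB bill doubles, which takes the ratio from roughly 2.4x to roughly 4.7x.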
Multiply this out by a factor of 3x or 10x and you see why DynamoDB isn’t showing up on our customers’ evaluation radar a whole lot.
One last note: a reasonable person might ask, “But what if that 1.2TB of data is accessed in a highly random pattern? Clever storage engines can only go so far to reduce the IOPS required for random reads. How will you keep up with I/O demand?”
We actually have several customers deploying Cassandra on SSDs for exactly this reason, and it works quite well. And if the rumor mill is to be believed, this will soon be an option for those deploying Cassandra on EC2 as well. Switching our Cassandra nodes to SSDs would surely add some cost, but nowhere near the roughly 5x required to bring it up to DynamoDB’s range.