DataStax Blog

Cassandra vs DynamoDB TCO

By Jonathan Ellis -  May 10, 2012 | 7 Comments

I’ve written on our technical blog about Cassandra vs DynamoDB features; TCO is also an important factor in infrastructure decisions. Amazon recently published their take on TCO for DynamoDB, contrasted with an unnamed “NoSQL” database. While it is a useful starting point, most of the assumptions on display there apply poorly to Cassandra, as I will explain.

First, at a high level, I do agree with Amazon that when you have a workload that peaks at a fraction of a single machine’s throughput, SaaS pricing can make a lot of sense.

However, Cassandra and DataStax Enterprise are targeted firmly at workloads requiring scale-out, either for volume of data or velocity (i.e., req/s). If we redo Amazon’s comparison with their server pricing but using Cassandra performance instead of an unnamed generic NoSQL competitor, the 5,000 req/s from the “high” usage scenario fits easily on a single machine. (One high-CPU XL Cassandra node can handle about 25,000 1KB inserts or reads per second.) Adding in x3 for replication and leaving the rest of the numbers unchanged gets us to $2432 for Cassandra, vs $2560 for DynamoDB.
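The sizing logic in the paragraph above can be sketched as a small calculation. This is only a rough illustration, not a capacity-planning tool: the helper name `nodes_needed` is hypothetical, and it assumes the simple model used here (size one replica for peak throughput, then multiply by the replication factor).

```python
import math

# Figures taken from the post; treat them as rough estimates.
PER_NODE_REQ_PER_SEC = 25_000   # ~1KB inserts or reads per high-CPU XL node
REPLICATION_FACTOR = 3          # the "x3 for replication" above


def nodes_needed(peak_req_per_sec,
                 per_node=PER_NODE_REQ_PER_SEC,
                 replication=REPLICATION_FACTOR):
    """Hypothetical helper: nodes for one replica at peak, times replication."""
    nodes_per_replica = max(1, math.ceil(peak_req_per_sec / per_node))
    return nodes_per_replica * replication


if __name__ == "__main__":
    # The "high" usage scenario from Amazon's comparison.
    print("5,000 req/s ->", nodes_needed(5_000), "nodes")
```

Under this model, node count only increases once peak load exceeds what a single node can serve, which is why small changes in throughput do not change the size of the cluster.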

But there’s one glaring error in Amazon’s numbers: requiring $1/GB for “redundant storage” (i.e., SAN or similar). Cassandra and all other scale-out NoSQL solutions *strongly* recommend using direct-attached storage for better reliability, performance, and cost effectiveness. (This applies even more so to avoiding EBS.) Fixing that takes us down to $1082 for Cassandra.

Now, if we go to 2x the load at 14k req/s peak (10,000 peak writes, 4,000 reads), that’s still not maxing out our tiny 3-node Cassandra cluster. The Cassandra cost remains $1082/month, but DynamoDB is up to $5120/month.

Multiply this out by a factor of 3x or 10x and you see why DynamoDB isn’t showing up on our customers’ evaluation radar a whole lot.
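To make the gap concrete, here is a minimal sketch that computes the cost ratio from the monthly figures quoted above. The variable and function names are my own, and the numbers are only the ones stated in this post; nothing here is derived from Amazon’s or DataStax’s actual pricing formulas.

```python
# Monthly cost figures quoted in this post (USD per month).
CASSANDRA_MONTHLY = 1082   # 3-node cluster on direct-attached storage
DYNAMODB_MONTHLY = {       # keyed by load multiplier relative to the "high" scenario
    1: 2560,
    2: 5120,
}


def cost_ratio(dynamodb, cassandra):
    """How many times more DynamoDB costs than Cassandra for the same load."""
    return dynamodb / cassandra


for multiplier, dynamo in DYNAMODB_MONTHLY.items():
    ratio = cost_ratio(dynamo, CASSANDRA_MONTHLY)
    print(f"{multiplier}x load: DynamoDB ${dynamo} vs "
          f"Cassandra ${CASSANDRA_MONTHLY} -> {ratio:.2f}x")
```

The Cassandra figure stays flat across both rows because the same three nodes absorb the extra load, while the DynamoDB figure doubles with throughput.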

One last note: a reasonable person might ask, “But what if that 1.2TB of data is accessed in a highly random pattern? Clever storage engines can only go so far to reduce the IOPS required for random reads. How will you keep up with I/O demand?”

We actually have several customers deploying Cassandra on SSDs for exactly this reason, where it works quite well. And if the rumor mill is to be believed, this will soon be an option for those deploying Cassandra on EC2 as well. Switching our Cassandra nodes to SSDs would surely add some cost, but nowhere near the 5x required to bring it up to DynamoDB’s range.



Comments

  1. Mark says:

    What about the cost of sys admins?

  2. Jonathan Ellis says:

    Amazon estimates $400/cluster/month for NoSQL on EC2, which seems reasonable to me. (They estimate much more for run-your-own-hardware, but I don’t want to get into that here.)

  3. Evan says:

    I also was very disappointed in Amazon’s TCO report and agree with your assessment of the inaccuracies.

  4. gregory says:

    Why use SSD instead of RAM?

  5. Edward Capriolo says:

    Cost of sys admins is the biggest Amazon joke ever. As if with Amazon’s network you do not need a sys admin. Your “cost of sysadmin” is higher because you’re paying programmers to act like/do sys admin things, and most programmers are not trained sys admins. Companies with weak sysadmin skills usually end up in a bad state.

  6. Marc says:

    Well Edward (you might remember me from a past life), unfortunately most of the ‘sys admins’ out there are pretty weak in general. Usually the problem stems from an admin not having enough interaction with different areas of IT….

    Leaving that aside, it is still no excuse to dump the sys admin model as you say. Looking into cassandra vs dynamodb and others…found this article interesting.

  7. Dale says:

    Everyone seems to talk about TCO, and personnel cost is a big issue: sys admins usually don’t have the CS or hardware knowledge to do the job right, so they lean on computer engineers with both sets of skills. I wonder why no one has published system settings and a test bench that clearly show how to set up a cluster (on BYOS) that performs that well.
    Even 14k ops/s with a 1k payload per node in a cluster is hard to reach unless you have the right OS settings and FS in place. Publish it or it’s all just noise to anyone listening.
