Today Amazon announced DynamoDB, a hosted database for Amazon Web Services.
With DynamoDB, Amazon and Cassandra come full circle: just as Cassandra was inspired by Amazon's 2007 Dynamo paper, DynamoDB has adopted many of the improvements Cassandra has made since then.
Here's a summary of how DynamoDB compares to Cassandra:
| Feature | Cassandra | DynamoDB |
|---|---|---|
| Key + columns data model | Yes | Yes |
| Composite key support | Yes | Yes |
| Tunable consistency | Yes | Most operations |
| Largest value supported | 2GB | 64KB |
| Idempotent write batches | Yes | No |
| Indexes on column values | Yes | No |
| Hadoop integration | M/R, Hive, Pig | M/R, Hive |
| Multi-datacenter support | Full cross-region | Multiple availability zones only |
| High performance on commodity disks | Yes | N/A |
| Backups | Low-impact snapshot + incremental | Manually with EMR |
| Deployable | Anywhere | Only on AWS |
Both Cassandra and DynamoDB achieve high scalability using many of the same techniques. Cassandra has been shown to scale to millions of ops/s, and Amazon announced on this morning's webcast that they have a customer doing over 250,000 ops/s on DynamoDB. This kind of scale needs to be baked in from the start, not tacked on as an afterthought.
The multi-datacenter availability story is a bit more complex. DynamoDB replicates data "across multiple Availability Zones in a Region," but cross region is not supported. Since Availability Zones are not geographically dispersed, you need cross-region replication if you're worried about outages affecting entire regions or if you want to provide local data latencies to all your clients in any region. This requires the kind of fine-grained control over data consistency seen in Cassandra.
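To make this concrete, cross-region replication in Cassandra is a matter of keyspace configuration. A sketch in CQL (exact syntax varies by Cassandra version, and the datacenter names here are placeholders):

```sql
-- Replicate each row to 3 nodes in each of two geographically
-- separate datacenters (names are illustrative):
CREATE KEYSPACE app WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'us_east': 3,
  'eu_west': 3
};
```

Clients can then tune consistency per operation, e.g. reading at LOCAL_QUORUM to get quorum guarantees within the nearest datacenter without paying cross-region latency.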
The data model is one area where DynamoDB is much closer to Cassandra than to the original Dynamo. The original Dynamo design had a primitive key/value data model, which has serious implications: to update a single field, the entire value must be read, updated, and re-written. (Other NoSQL databases have grafted a document API onto a key/value engine; this is one reason their performance suffers compared to Cassandra.) Another direct consequence is requiring complex vector clocks to handle conflicts from concurrent updates to separate fields. Cassandra was one of the first NoSQL databases to offer a more powerful ColumnFamily data model, which DynamoDB's strongly resembles.
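The difference in update cost is easy to see in a toy sketch. The classes below are hypothetical illustrations, not real Dynamo or Cassandra APIs; they only model how data is addressed:

```python
import json


class KeyValueStore:
    """Pure key/value model: values are opaque blobs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value_bytes):
        self._data[key] = value_bytes

    def get(self, key):
        return self._data[key]

    def update_field(self, key, field, value):
        # Read-modify-write of the ENTIRE value: the store cannot
        # address individual fields inside the opaque blob, so two
        # concurrent updates to *different* fields still conflict.
        record = json.loads(self.get(key))
        record[field] = value
        self.put(key, json.dumps(record).encode())


class ColumnFamilyStore:
    """Key + columns model: each column is individually addressable."""

    def __init__(self):
        self._data = {}

    def update_field(self, key, column, value):
        # Writes just the one column: no prior read is needed, and
        # concurrent writes to other columns of the same key don't
        # conflict, so no vector clocks are required to resolve them.
        self._data.setdefault(key, {})[column] = value

    def get(self, key):
        return self._data[key]
```

In the key/value store, `update_field` must deserialize and rewrite the whole record; in the column-family store it touches only the one column, which is why the richer model avoids both the read-before-write and the conflict-resolution machinery.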
However, secondary indexes were one of the best-received features when Cassandra introduced them a year ago. Going back to only being able to query by primary key would feel like a big step backwards to me now.
Like Cassandra, DynamoDB offers Hadoop integration for analytical queries. It's unclear if DynamoDB is also able to partition analytical workloads to a separate group of machines the way you can with Cassandra and DataStax Enterprise. Realtime and analytical workloads have very different performance characteristics, so partitioning them to avoid resource conflicts is necessary at high request volumes.
Hadoop is rapidly becoming the industry standard for big data analytics, so it makes sense to offer Hadoop support instead of an incompatible custom API. It's worth noting that Cassandra supports Pig as well as Hive on top of Hadoop. Many people think Pig is a better fit for the underlying map/reduce engine.
Cassandra has been in production in many demanding environments for years now, so it's no surprise that it has a substantial lead in delivering real-world operational features like backup. Cassandra's log-structured storage engine, which is also the reason Cassandra performs so well on commodity disks without requiring SSDs, allows both full and incremental backups with no impact on performance. (You can then upload your backup files with a tool like tablesnap.)
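Operationally, a sketch of what this looks like (the keyspace name is a placeholder; these assume a stock Cassandra install):

```shell
# Take a full snapshot. Because SSTables are immutable, this just
# creates hard links and has near-zero impact on the running node:
nodetool snapshot my_keyspace

# In cassandra.yaml, enable incremental backups so each newly
# flushed SSTable is also hard-linked into a backups/ directory:
#   incremental_backups: true
```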
As an engineer, it's nice to see so many of Cassandra's design decisions imitated by Amazon's next-gen NoSQL product. I feel like a proud uncle! But in many important ways, Cassandra retains a firm lead in power and flexibility.