DataStax Developer Blog

Oracle’s MySQL misses the NoSQL mark

By Jonathan Ellis -  February 12, 2013 | 0 Comments

The big news for MySQL 5.6 was the inclusion of “NoSQL” features in the form of a memcached api for get and put operations.

In cases like this, it’s tough to tell whether Oracle got this so wrong deliberately to sow confusion in the market, or because they really think that’s what NoSQL is about.

Scaling

Perceptive observers have known for years that the main force driving NoSQL is the need to scale out. This is why companies like Healthcare Anytime, Ooyala, ShopSavvy, Mahalo, Formspring, Next Big Sound, Pantheon, WSO2, SmallRivers, MetaBroadcast, VNG, Boxever, Easou, Hallo, and many others have migrated to Cassandra from MySQL — and more recently, from other NoSQL systems like MongoDB.

At the high end, Cassandra’s ability to easily and transparently scale across many machines allows using clusters of commodity hardware instead of expensive servers and SANs. But just as importantly, the ability to transparently grow and shrink a cluster in response to demand allows companies to take advantage of flexibility in the cloud, adding capacity in relatively small increments.

Continuous availability

A related reason for Cassandra’s increasing adoption is the ability to span multiple datacenters in a single cluster. Tuneable consistency allows Cassandra to replicate synchronously within the datacenter and asynchronously to others.

Master-based replication, on the other hand, are limited by the speed of light to accepting writes within a single datacenter, or else suffering large latency penalties. Even multi-master replication does not solve this in a relational world, since two-phase commit also requires multiple round-trips between participants.

With masterless multi-datacenter replication, Cassandra can tolerate even the loss of entire datacenters, failing over seamlessly to an unaffected region and repairing the desynchronized machines when power or connectivity is restored.

Flexibility

At the low end, NoSQL has also been popular for making prototyping more flexibile, in particular with document-oriented databases like MongoDB. Adding JSON extensions can be a useful response, as PostgreSQL recently did.

But MySQL’s memcached layer only gives you a key/value API, which is a huge step backwards in expressiveness and will be of limited use.

Performance

B-tree-based storage engines like InnoDB and MyISAM are a poor fit for both rotational disks and SSDs because they perform many small, random writes, but relational databases are effectively locked in because of SQL semantics. For instance, INSERT or UPDATE operations must first perform a read of the row in question to make sure it does or does not exist yet.

This is why MySQL write performance degrades as the data set size increases. Even for relatively modest datasets, the effect can be pronounced in mixed workloads.

See my recent blog post on the other architectural decisions behind Cassandra’s performance advantage. These factors are far more important for real-world workloads than query planner overhead, especially given the availability of prepared statements.

Query Language

The query language is thus the least important aspect of NoSQL, but that’s what Oracle focuses on because it’s the only option open to them.

Ironically, Cassandra is itself moving away from baroque APIs like batch_mutate and get_slice_range and towards a SQL-inspired language, with suitable improvements for a distributed world.



Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>