What’s new in Cassandra 0.7: Live schema updates

August 9, 2010

(This is a guest post by Gary Dusbabek, who works on Cassandra full-time for Rackspace. You can contact him at gdusbabek@gmail.com or @gdusbabek on Twitter.)

Cassandra has always been schemaless within a ColumnFamily, in the sense that columns may be created at will simply by using them in a row. ColumnFamilies and Keyspaces themselves, however, have to be explicitly defined before use (so Cassandra knows how to index the columns within their rows). “Live schema updates” refers to the ability to create, rename, and remove both Keyspaces and ColumnFamilies in a live cluster; the feature was added early in the 0.7 development cycle in CASSANDRA-44. Prior to 0.7, changing the schema meant editing the configuration file on every node and then manually performing a rolling restart of the cluster, which gave humans plenty of opportunity to make mistakes.

This post covers some changes this feature required under the hood and explains what you can do to be ready if you are upgrading from 0.6. It is an expansion of the wiki article I wrote when live schema updates first appeared in the trunk.

Starting up, or “Dude, where’s my schema?”

We now store schema in the system keyspace using two column families. The first (Schema) stores the keyspace and column family definitions, while the second (Migrations) stores individual keyspace changes over time. All migrations and schema definitions are keyed by a time-based UUID. Your configuration file (storage-conf.xml if you are upgrading, or cassandra.yaml* on a new 0.7 node) may still contain keyspace definitions, but Cassandra ignores them during startup. Instead, Cassandra looks up the latest schema version UUID it has stored. If it finds nothing, it loads nothing and logs a warning:


Couldn't detect any schema definitions in local storage.

If a schema does exist, Cassandra loads the corresponding keyspace definitions from local storage and applies them using the same approach used in previous versions, in which keyspaces were loaded from storage-conf.xml.
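The startup lookup can be pictured with a small Python sketch. This is only an illustration, not Cassandra's actual code: the function names and the dictionary layout of "local storage" are assumptions made for the example. It relies on the fact that a time-based (version 1) UUID carries a timestamp, so the latest schema version is the one with the largest timestamp.

```python
import logging
import uuid

log = logging.getLogger("schema_startup")


def latest_version(stored_versions):
    """Return the newest time-based (version 1) UUID, or None if there are none."""
    if not stored_versions:
        return None
    # uuid.UUID.time is the 60-bit timestamp embedded in a version 1 UUID.
    return max(stored_versions, key=lambda u: u.time)


def load_schema(local_schema):
    """Pick the latest schema version from local storage.

    local_schema is a hypothetical stand-in for the Schema column family:
    a dict mapping version UUID -> list of keyspace names.
    """
    version = latest_version(list(local_schema))
    if version is None:
        # Nothing stored: load nothing and warn, as described above.
        log.warning("Couldn't detect any schema definitions in local storage.")
        return None, []
    return version, local_schema[version]
```

Note that keyspace definitions in the configuration file play no part here; only what is already in local storage is consulted.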

At the same time, the node incorporates the version UUID from its schema into the gossip digests it sends to other nodes. If this node does not have the latest schema definitions when it starts up (as a result of a network partition, restart or bootstrapping a new node), a version mismatch is detected by the gossiper and the definition promulgation mechanism described next is invoked.

Promulgation

Definition promulgation consists of two asynchronous phases: announce and push. Announce is a way for node A to declare to node B “this is the schema version I have.” If the versions are equal, B ignores the message. If A’s version is older than B’s (Case 1), B responds with a push containing all the migrations that A doesn’t have. If A’s version is newer than B’s (Case 2), B responds with an announce to A (this functions as a request for updates), after which A responds with a push to B.

(Figure: Cassandra Migrations)

Schema updates can also be pushed from a client (via Thrift). When this happens, gossip promulgation is invoked using the announce-announce-push sequence (Case 2 above).
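To make the announce and push exchange concrete, here is a toy simulation in Python. It is a sketch under simplifying assumptions, not the real implementation: the Node class is hypothetical, plain integers stand in for time-based UUIDs, and the calls are synchronous whereas the real phases are asynchronous messages.

```python
class Node:
    """Toy model of a node taking part in definition promulgation."""

    def __init__(self, name, migrations=()):
        self.name = name
        # Ordered migration versions; integers stand in for time-based UUIDs.
        self.migrations = list(migrations)

    @property
    def version(self):
        # The schema version is the newest migration applied (0 if none).
        return self.migrations[-1] if self.migrations else 0

    def announce(self, other):
        """Declare to `other`: this is the schema version I have."""
        other.on_announce(self)

    def on_announce(self, sender):
        if sender.version == self.version:
            return  # versions are equal: ignore the message
        if sender.version < self.version:
            # Case 1: the sender is older, so push what it is missing.
            self.push(sender)
        else:
            # Case 2: the sender is newer; announcing back requests updates.
            self.announce(sender)

    def push(self, target):
        """Send `target` every migration newer than its current version."""
        missing = [m for m in self.migrations if m > target.version]
        target.migrations.extend(missing)
```

For example, with `a = Node("A", [1, 2])` and `b = Node("B", [1, 2, 3, 4])`, calling `a.announce(b)` follows Case 1 and leaves A with all four migrations. Swapping the roles follows Case 2: the second announce comes back to the newer node, which then pushes.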

These schema changes typically take seconds to finish. Time to complete will scale linearly with the size of your cluster.

IMPORTANT: since schema changes need to be applied and promulgated serially, operators shouldn’t issue schema changes from multiple nodes simultaneously. If two changes make their way across the cluster at the same time they will collide and leave the cluster in an inconsistent state. Cassandra does a few things to guard against this, but an ounce of prevention goes a long way. Cluster operators should adopt the practice of issuing schema changes from a single node and always use the same node, preferably a seed.

Initial schema loading

We have made it convenient for you to import the schema formerly defined in storage-conf.xml (0.6) or cassandra.yaml (0.7). You should use JMX to call StorageService.loadSchemaFromYaml() or perform the same operation from the command line using bin/schematool. This manual operation can be performed only once; it will fail if you try to load the schema again. If you are upgrading from 0.6, make sure you have already run the storage-conf.xml to cassandra.yaml converter. One caveat of this process is that the number of live nodes in your cluster must be greater than or equal to the maximum replication factor among all your keyspaces.
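The two preconditions above (load only once, and enough live nodes for the largest replication factor) amount to a simple check. The Python sketch below is purely illustrative; the function name and its parameters are hypothetical and do not correspond to anything in Cassandra.

```python
def can_load_schema(live_nodes, replication_factors, already_loaded):
    """Check the preconditions for the one-time schema import.

    live_nodes:          number of nodes currently up in the cluster
    replication_factors: dict of keyspace name -> replication factor
    already_loaded:      True if a schema was imported before
    """
    if already_loaded:
        # The import is a one-time operation; a second attempt fails.
        return False
    # Live nodes must cover the largest replication factor of any keyspace.
    required = max(replication_factors.values(), default=0)
    return live_nodes >= required
```

With keyspaces at replication factors 1 and 3, a cluster of three live nodes passes the check, while a cluster of two does not.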

Loading schema via JMX must be done on exactly one node in your cluster (preferably a seed node). Changes will be promulgated from that node to the rest of the cluster. This capability will be deprecated in the next version of Cassandra (0.7+1) and will be completely removed in the version after that (0.7+2).

Further schema modifications

Once your schema is saved in the system keyspace, any schema modifications have to be made via the Thrift API. Six methods accomplish this:


system_add_column_family()
system_drop_column_family()
system_rename_column_family()
system_add_keyspace()
system_drop_keyspace()
system_rename_keyspace()

These methods do exactly what their names imply. Some things to note:

  • The drop and rename methods create a snapshot of your existing data before doing their work.
  • The rename methods block while filenames are changed.
  • All methods go through a bit of validation to check for sanity.

Conclusion

Live schema changes give you the ability to make low-level changes to your cluster without any kind of restart. We plan on taking this feature further to allow you to make more fine-grained schema changes in the future.

Are you interested in learning more about Cassandra? We invite users and developers to participate in the Cassandra Summit in San Francisco on August 10th co-sponsored by Rackspace and Riptano.

* What is this YAML of which you speak? The Cassandra configuration file was changed from XML to YAML between 0.6 and 0.7. Don’t worry—we provide a converter that will export your old storage-conf.xml to a newer cassandra.yaml. You will need to do that before attempting to import your old schema.


