Coming up in Cassandra 1.1: Row Level Isolation
While Apache Cassandra does not provide full ACID properties (it has no support for complex transactions), it still provides some useful atomicity guarantees.
More precisely, Cassandra has always provided row-level atomicity of batch mutations. This means that multiple batched writes to the same row are persisted by nodes atomically. When doing
SET login='eric22' AND password='f3g$dq!'
Cassandra guarantees that the new login and password are either both persisted or neither is.
However, up to Cassandra 1.0, the isolation of such an update was not guaranteed. In other words, it was possible (for a very brief moment during the update) that a read like
SELECT login, password
returns the new login ('eric22') but not the new password ('f3g$dq!'). This changes in Cassandra 1.1 as row-level updates are now made in isolation. Cassandra 1.1 guarantees that if you update both the login and password in the same update (for the same row key) then no concurrent read may see only a partial update.
These atomicity and isolation guarantees apply to columns written under the same physical row, i.e. columns that are within the same column family and share the same partition key. For atomicity, the guarantee actually extends across column families (within the same keyspace): updates for the same partition key are persisted atomically even when they target different column families. This is not the case for isolation, however: updates to different column families are not isolated from one another.
Note that when we say that Cassandra persists row-level writes atomically, this applies to each node of the cluster individually; Cassandra does not provide any cluster-wide rollback mechanism. In the preceding example, the guarantee is that the new login cannot be persisted without the new password being persisted too (and vice versa). It is however possible for both to be persisted on some nodes even if the client operation ends with a timeout (because not enough nodes have acknowledged the write to satisfy the requested consistency level). It is up to the client to retry a failed write in such cases.
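As an illustration of the client-side retry mentioned above, here is a minimal sketch. The WriteTimeout exception and the do_write callable are hypothetical stand-ins, not part of any real Cassandra driver; the sketch also assumes the write is idempotent (which holds for plain column updates, but not for counter increments), so that replaying it is harmless.

```python
import time


class WriteTimeout(Exception):
    """Hypothetical error: too few replicas acknowledged the write in time."""


def write_with_retry(do_write, attempts=3, backoff_seconds=0.1, sleep=time.sleep):
    """Call do_write() until it succeeds or the attempts are exhausted.

    do_write is a hypothetical zero-argument callable that performs the
    (idempotent) write and raises WriteTimeout when it times out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return do_write()
        except WriteTimeout:
            if attempt == attempts:
                raise  # give up and let the caller decide what to do
            sleep(backoff_seconds * attempt)  # simple linear backoff
```

Because the timed-out write may already have been persisted on some replicas, the retry simply rewrites the same columns; the sketch does not try to detect or undo a partial cluster-wide write.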
Internally, row-level atomicity is guaranteed mainly by the commit log. Upon reception by the coordinator, each write query is transformed into a set of RowMutation objects. Each RowMutation groups all the updates for a given row key (even across different column families). On every replica, each RowMutation is first serialized and written to the commit log as one mutation (individually checksummed so that its integrity can be assessed after a failure). This ensures that on failure, that RowMutation is either replayed entirely (if it had been completely written to the commit log and isn’t corrupted) or not at all. The other part of guaranteeing the atomicity of persistence comes from the fact that, for a given column family, a RowMutation is applied to one and only one memtable. It follows that the RowMutation (all the updates from a client query for a given row key) can only be persisted together or not at all.
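The all-or-nothing replay described above can be sketched as follows. This is a toy model, not Cassandra's actual commit log format: the record layout (length, CRC32, JSON payload), the function names, and the choice to stop replaying at the first bad record are all assumptions made for illustration.

```python
import json
import struct
import zlib

# Assumed record header: payload length and CRC32 of the payload.
HEADER = struct.Struct(">II")


def append_mutation(log: bytearray, row_key: str, updates: dict) -> None:
    """Serialize one whole mutation and append it as a single checksummed record."""
    payload = json.dumps({"key": row_key, "updates": updates}, sort_keys=True).encode("utf-8")
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def replay(log: bytes) -> list:
    """Return every mutation that was fully written and passes its checksum."""
    mutations = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, checksum = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # torn or corrupted record: drop it entirely and stop replaying
        mutations.append(json.loads(payload.decode("utf-8")))
        offset = start + length
    return mutations
```

Since the login and password travel in the same record, a crash in the middle of the write leaves a record that fails the length or checksum test, and neither column is replayed.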
To a large extent, the log-structured nature of Cassandra's storage engine makes row-level isolation easier. Writes are applied to memtables, which are then persisted as sstables, which are immutable. Thus, ensuring that a RowMutation is applied to the current memtable in isolation (from other writes and reads) is enough to ensure complete isolation. That is what was added in Cassandra 1.1: the application of a RowMutation to memtables in isolation. Technically, we use SnapTree's copy-on-write clone facilities: all the columns of a new mutation are applied to a non-visible (and thus isolated) copy of the in-memtable row they target, and then we atomically replace the original row with the new copy through a compare-and-set.
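The copy-then-swap idea can be sketched in a few lines. This is a simplified model, not Cassandra's code: the Memtable class and its method names are hypothetical, a plain dictionary copy stands in for SnapTree's cheap copy-on-write clone, and a small lock emulates the atomic compare-and-set.

```python
import threading
from types import MappingProxyType


class Memtable:
    """Toy memtable: row key -> read-only snapshot of that row's columns."""

    def __init__(self):
        self._rows = {}
        self._cas_lock = threading.Lock()  # emulates an atomic compare-and-set

    def read(self, row_key):
        # Readers only ever get a fully built, read-only snapshot.
        return self._rows.get(row_key, MappingProxyType({}))

    def _compare_and_set(self, row_key, expected, new):
        with self._cas_lock:
            if self._rows.get(row_key) is not expected:
                return False  # another writer replaced the row first
            self._rows[row_key] = new
            return True

    def apply(self, row_key, updates):
        """Apply all columns of one mutation to a private copy, then swap it in."""
        while True:
            current = self._rows.get(row_key)
            copy = dict(current) if current is not None else {}
            copy.update(updates)  # not visible to readers yet
            if self._compare_and_set(row_key, current, MappingProxyType(copy)):
                return  # the whole mutation became visible at once
            # Lost the race: retry against the row the other writer installed.
```

A reader that fetched the row before the swap keeps seeing the old login and old password together; a reader that fetches it after the swap sees both new values. No reader can observe a mix of the two.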