DataStax Developer Blog

Coming up in Cassandra 1.1: Row Level Isolation

By Sylvain Lebresne - February 21, 2012

While Apache Cassandra does not provide ACID properties (no complex transactions support), it still provides some useful atomicity guarantees.

More precisely, Cassandra has always provided row-level atomicity of batch mutations. This means that multiple batched writes to the same row are applied atomically by each node. For example, when executing

    UPDATE Users
    SET login='eric22', password='f3g$dq!'
    WHERE key='550e8400-e29b-41d4-a716-446655440000'

Cassandra guarantees that the changes to login and password are either both applied or neither is.

However, up to Cassandra 1.0, the isolation of such an update was not guaranteed. In other words, in those versions it is possible (for a very brief moment during the update) that a read like

    SELECT login, password
    FROM Users
    WHERE key='550e8400-e29b-41d4-a716-446655440000'

returns the new login ('eric22') but not the new password ('f3g$dq!'). This changes in Cassandra 1.1 as row-level updates are now made in isolation. Cassandra 1.1 guarantees that if you update both the login and password in the same update (for the same row key) then no concurrent read may see only a partial update.

These atomicity and isolation guarantees apply to columns written under the same physical row, i.e. that are within the same column family and share the same partition key. For atomicity, the guarantee actually extends across column families (within the same keyspace): updates for the same partition key are persisted atomically even for different column families. This is not the case however for isolation (updates to different column families are not isolated).

Note that when we say that Cassandra persists row-level writes atomically, this applies to each node of the cluster individually; Cassandra does not provide any cluster-wide rollback mechanism. In the preceding example, the guarantee is that the new login cannot be persisted without the new password being persisted too (and vice versa). It is however possible for both to be persisted even if the client operation ends up with a timeout (because not enough nodes have acknowledged the write to satisfy the requested consistency level). It is up to the client to retry a failed write in such cases.
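
As a rough illustration of that client-side responsibility, here is a minimal retry sketch in Python. It does not use any real Cassandra driver: `WriteTimeout`, `write_with_retry` and `make_flaky_write` are hypothetical names invented for this example, and the "write" is a fake that times out a fixed number of times before succeeding.

```python
import time


class WriteTimeout(Exception):
    """Hypothetical stand-in for a driver's write-timeout error."""


def write_with_retry(execute_write, max_attempts=3, backoff_seconds=0.0):
    """Call execute_write() until it succeeds or the attempts run out.

    A timeout does not mean the write was lost: some replicas may already
    have persisted it, so the retry simply re-sends the same mutation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return execute_write()
        except WriteTimeout:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            time.sleep(backoff_seconds * attempt)


def make_flaky_write(failures):
    """Build a fake write that times out `failures` times, then succeeds."""
    state = {"calls": 0}

    def execute_write():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise WriteTimeout("not enough replicas acknowledged in time")
        return "applied"

    return execute_write, state
```

Re-sending is reasonable here because the update only sets column values, so applying it a second time leaves a replica in the same state. That reasoning would not carry over to non-idempotent operations.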

Implementation details

Atomicity

Internally, row-level atomicity is guaranteed mainly by the commit log. When the coordinator receives a write query, it transforms the query into one or more RowMutation objects. Each RowMutation groups all the updates for a given row key (even across different column families). On every replica, each RowMutation is first serialized and written to the commit log as a single entry, individually checksummed so that its integrity can be verified after a failure. This ensures that on failure, a RowMutation is either replayed entirely (if it was completely written to the commit log and is not corrupted) or not at all. The other part of guaranteeing atomic persistence comes from the fact that a given RowMutation is applied to one and only one memtable. It follows that a RowMutation (all the updates from a client query for a given row key) can only be persisted together or not at all.
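
The all-or-nothing replay can be pictured with a small Python sketch. This is not Cassandra's actual commit log format: the length-plus-CRC32 framing, the JSON payload, and the function names `encode_entry` and `replay` are assumptions made purely for illustration.

```python
import json
import struct
import zlib


def encode_entry(mutation):
    """Serialize one mutation as: length (4 bytes) + CRC32 (4 bytes) + payload."""
    payload = json.dumps(mutation, sort_keys=True).encode("utf-8")
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload


def replay(log_bytes):
    """Return every fully written, uncorrupted mutation, in log order.

    Replay stops at the first truncated or corrupted entry, so a mutation
    is either recovered with all of its columns or not recovered at all.
    """
    mutations = []
    offset = 0
    while offset + 8 <= len(log_bytes):
        length, checksum = struct.unpack_from(">II", log_bytes, offset)
        payload = log_bytes[offset + 8 : offset + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # incomplete or damaged entry: apply none of it
        mutations.append(json.loads(payload.decode("utf-8")))
        offset += 8 + length
    return mutations
```

If a log holding two entries is cut off partway through the second one (simulating a crash mid-write), `replay` returns only the first mutation. The login and password of the second are dropped together rather than one without the other.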

Isolation

To a large extent, the log-structured nature of Cassandra's storage engine makes row-level isolation easier. Writes are applied to memtables, which are then persisted as immutable sstables. Thus, ensuring that a RowMutation is applied to the current memtable in isolation (from other writes and reads) is enough to ensure complete isolation. That is what was added in Cassandra 1.1: the application of a RowMutation to memtables in isolation. Technically, we use SnapTree's copy-on-write clone facilities: all the columns of a new mutation are applied to a non-visible (and thus isolated) copy of the in-memtable row they target, and then we atomically replace the original row with the new copy through a compare-and-set.
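
The copy-then-swap idea can be sketched in a few lines of Python. This is a simplified model, not Cassandra's code: the `AtomicReference` class below is a lock-based stand-in written for this example, a plain `dict` copy replaces SnapTree's cheap clone, and `apply_mutation` is a hypothetical name.

```python
import threading


class AtomicReference:
    """Minimal stand-in for an atomic reference, built on a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        """Install `new` only if the current value is still `expected`."""
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


def apply_mutation(row_ref, updates):
    """Apply all column updates to a private copy, then publish it in one step."""
    while True:
        current = row_ref.get()
        candidate = dict(current)   # private copy, invisible to readers
        candidate.update(updates)   # apply every column of the mutation
        if row_ref.compare_and_set(current, candidate):
            return candidate        # readers now see all the updates at once
        # Another writer published first: retry against the newer row.
```

A reader that calls `row_ref.get()` always receives either the old row or the fully updated one, because the published dictionaries are never modified in place. The retry loop is what keeps concurrent writers from overwriting each other's columns.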

A caution

Cassandra guarantees that updates to the same row will be applied together, but not that they will be resolved the same way. Suppose that the original user row was inserted at time 100. We can easily construct an update that leaves us with a new login but not a new password:

    BEGIN BATCH

    UPDATE Users USING TIMESTAMP 99
    SET password='f3g$dq!'
    WHERE key='550e8400-e29b-41d4-a716-446655440000';

    UPDATE Users USING TIMESTAMP 101
    SET login='eric22'
    WHERE key='550e8400-e29b-41d4-a716-446655440000';

    APPLY BATCH;

Here, the login column will be updated since the new timestamp is higher than the old; the password column will not. (For equal timestamps, it depends.) See this post for more details on Cassandra’s philosophy on conflict resolution.
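
To make the outcome concrete, here is a small Python model of per-column, timestamp-based resolution. It is only a sketch of the idea: `reconcile` is a hypothetical function, the original column values ('eric' and 'old-pass') are invented, and the tie-breaking for equal timestamps is intentionally not modeled.

```python
def reconcile(existing, incoming):
    """Merge two versions of a row, column by column, keeping the newest cell.

    Each row maps a column name to a (value, timestamp) pair.
    """
    merged = dict(existing)
    for name, (value, timestamp) in incoming.items():
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
        # On equal timestamps this sketch keeps the existing cell;
        # Cassandra's real tie-breaking is not modeled here.
    return merged


# Row inserted at time 100 (the values are made up for this example).
original = {"login": ("eric", 100), "password": ("old-pass", 100)}

# The two updates from the batch, with their explicit timestamps.
batch = {"password": ("f3g$dq!", 99), "login": ("eric22", 101)}

result = reconcile(original, batch)
# result["login"] is ("eric22", 101); result["password"] stays ("old-pass", 100)
```

Both updates reach the row together, yet only the login changes, because each column is resolved on its own timestamp.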



Comments

  1. vk says:

    Are the updates also atomic during read repair? i.e. the update was made with ConsistencyLevel.ANY.

  2. That’s a good feature. Thanks for sharing.

    The audience asked the isolation level question when I was doing a Cassandra data model presentation last Saturday.
    I told them Cassandra should do row level atomicity. Now it comes true.

    Have an awesome day!

    Charlie | Data Solution Architect Developer

  3. Sylvain Lebresne says:

    @vk yes, atomicity/isolation works with read repair and when using CL.ANY. Though for the record those two are not linked as your comment seems to imply.

  4. monk says:

    Atomicity and isolation only guarantee these two properties within a single replica. For example, the key “http://abc.com” is stored on replicas A, B and C. When an update comes in, A updates its data while B and C do not. How is consistency maintained among the different replicas?

  5. Spud says:

    When can developers use row-level ‘compare-and-set’ themselves? It’s a shame you use it internally but don’t expose it like DynamoDB does…

  6. Sylvain Lebresne says:

    @monk Consistency is a different problem than atomicity and isolation. And on that front nothing changes in 1.1, consistency is still handled through per-request consistency levels (http://www.datastax.com/docs/1.0/dml/data_consistency).

    @Spud When I talked of compare-and-set above, I was referring to Java’s AtomicReference compareAndSet (to explain implementation details). This is not about doing a compare-and-set on some actual column value. In other words, we do not have a row-level ‘compare-and-set’ internally.

  7. Kiran MK says:

    Hi,

    How are isolation levels maintained between concurrent connections? Say one connection is trying to update a row, another is trying to read (select) the same row, and a third is trying to delete the same record.

    Can you please brief on this.

    Best Regards,
    Kiran.M.K.
