DataStax Developer Blog

What’s New in Cassandra 0.8, Part 2: Counters

By Sylvain Lebresne -  June 17, 2011 | 11 Comments

Deprecation warning

This post covers the obsolete Cassandra 0.8.  Modern Cassandra uses counters via CQL.

Original post

One of the features making its debut in Cassandra 0.8.0 is distributed counters. They allow you to … count things. (Or sum things; the counter increment need not be 1, or even positive). But a lot of stuff, very quickly, which makes them invaluable for real-time analytical tasks.

Why Counters?

Prior to 0.8, Cassandra had no simple and efficient way to count. By
“counting,” we mean here to provide an atomic increment operation in a single column value, as opposed to counting the number of columns in a row, or rows in a column family, both of which were already supported.

If you had to count or sum things, available solutions previously included:

  • inserting a different column for each increment with a batch process to merge those
  • use an external synchronization like Zookeeper (preferably through the
    use of the Cages library for simplicity)
  • use another database such as redis to handle those counts

Those solutions all had one or more of the following problems:

  • unfriendly to develop against
  • poor performance
  • not scalable (in particular, none scales to multiple datacenter usage)
  • requires additional software

The new counters feature solves this lack of simple and efficient counting
facility without any of the above problems.

Using Counters

A counter is a specific kind of column whose user-visible value is a 64-bit signed
integer, though this is more complex internally. When a new value is written
to a given counter column, this new value is added to whatever was the
previous value of the counter.

To create a column family holding counters, you simply indicate to Cassandra
that the default_validation_class on that column family is
CounterColumnType. For instance, using the CLI, you can create such
a column family using:


[default@unknown] create keyspace test;
54900c80-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use test;
Authenticated to keyspace: test

[default@test] create column family counters with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type;
6c7db090-9378-11e0-0000-242d50cf1f9d
Waiting for schema agreement…
… schemas agree across the cluster

Super column families holding counters are also supported the usual way,
by specifying column_type=Super.

Using counters is then straightforward:


[default@test] incr counters[row][c1];
Value incremented.
[default@test] incr counters[row][c2] by 3;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=1)
=> (counter=c2, value=3)

Returned 2 results.
[default@test] decr counters[row][c2] by 4;
Value decremented.
[default@test] incr counters[row][c1] by -2;
Value incremented.
[default@test] get counters[row];
=> (counter=c1, value=-1)
=> (counter=c2, value=-1)
Returned 2 results.

[default@test] del counters[row][c1];
column removed.
[default@test] get counters[row];
=> (counter=c2, value=-1)
Returned 1 results.

Note that the CLI provides a decr (decrement) operation, but this
is simply syntactic sugar for incrementing by a negative number. The
usual consistency level trade-offs apply to counter operations.

Using CQL

Let us start by noting that the support for counters in CQL is not part of
0.8.0 (the official release at the time of this writing) but has been added
for the 0.8.1 release.
Considering the counters column family created above:


cqlsh> UPDATE counters SET c1 = c1 + 3, c2 = c2 - 4 WHERE key = row2;
cqlsh> select * from counters where key=row2;
     KEY | c1 | c2 |
    row2 |  3 | -4 |

Operational Considerations

Performance

Counters have been designed to allow for very fast writes. However, increment
does involve a read on one of the replica as part of replication. As a consequence,
counter increments are expected to be slightly slower than regular writes. Note
however that:

  • For each write, only one of the replica has to perform a read, even with many replicas.
  • At ConsistencyLevel.ONE, this read is not part of the latency the client will
    observe, but is still part of the write itself. It follows that the
    latency of increments at CL.ONE is very good, but care should be taken to
    not overload the cluster by writing faster than it can handle.
    (In JMX, you can monitor the pending tasks on the REPLICATE_ON_WRITE stage.)

Counter reads use the same code path than regular reads and thus offer comparable performance.

Dealing with data loss

With regular column families, if an SSTable on disk is lost or corrupted (because
of disk failure, for instance), a standard way to deal with it is to remove
the problematic file and run repair to have the missing informations pulled from
the other replicas.

This is unfortunately not as simple with counters. Currently, the only
safe way to handle the loss of an sstable for a counter column family
is to remove all data for that column family, restart the node with
-Dcassandra.renew_counter_id=true (or remove the NodeIdInfo
system sstables on versions earlier than 0.8.2) and run repair once
the node is up.

(The reason you must remove all the counter sstables, even undamaged
ones, is that each node maintains a sub-count of the counter to which
it adds new increments and for which other nodes trust it to have the
most up-to-date value. Wiping the data on A ensures the replicas have
recognized that A is missing its sub-count and will re-replicate to it
on repair.)

Other considerations

Internally, counters use server side timestamps order to deal with
deletions. This does mean that you will need to keep the Cassandra servers in
sync. Of course, using ntpd on an server deployment is good practice anyway, so this should not be an
important constraint.

Current limitations, known problems and the future

Besides the operational considerations above, Counters have a number of
limitations in their current form that you should be aware of:

  • If a write times out in Cassandra,
    the client cannot know if the write was persisted or not. This is not a
    problem for regular columns, where the recommended way to cope with such
    exception is to replay the write, since writes are idempotent. For counters however, replaying the write
    in those situations may result in an over-count. On the other hand, not
    replaying it may mean the write never gets recorded.
    CASSANDRA-2783 is open to add an optional replay ID to counter writes.
  • Support for counter removal is exposed by the API, but is limited. If
    you perform in a short sequence a counter increment, followed by a delete and then by
    another increment, there is no guarantee that the end value will only be
    the value of the second increment (the deletion could be fully ignored). The only safe use of deletion is for permanent removal,
    where no new increment follows the deletion.
  • There is no support for time to live (TTL) on counter columns as there is
    for regular columns (see CASSANDRA-1952
    for more information on why).
  • There is no support for secondary indexes on counter columns.
  • At the time of this writing, you cannot have a counter column inside a column
    family of regular columns (and vice versa). The only way to use
    counters is to create a column family with
    default_validation_class=CounterColumnType, in which case all
    columns are counters
    (CASSANDRA-2614
    is open to lift this limitation).

Previously



Comments

  1. Tsiki says:

    I’ve seen several questions about sequences in Cassandra. Most were answered by “use UUIDs instead”. But for many applications, a shorter, sequential value is preferred and sometimes even mandatory. So, my question is, would it be possible to add incrementAndGet() or updateAndGet() to the API? Is there any design or implementation difficulty in this in the cluster? (Another alternative could be compareAndAdd() function for a counter column).

    Thanks !

  2. Sylvain Lebresne says:

    @Tsiki Unfortunately it would be hard to implement such API methods, because they would require knowing the up-to-date version of the counter cluster-wide, and this in a concurrent-safe fashion. Which, as far as I can tell, cannot be achieve without some form of cluster synchronization. Our current design avoids this, which allows much better performance and availability.

    If you do need to generate unique sequential ids, an option is to use a service like Apache ZooKeeper (which is built for that very use case).

  3. Tsiki says:

    Sylvain thanks for the explanation. It is clear to me that I can get a sequence with ZooKeeper and other solutions as well. I wondering if one could avoid adding another component to the system just for this function. I understand that there is no way around it for now.

    (My line of thinking was if one uses concurrency mode all, then all nodes that hold the counter are accessed anyway – so it shouldn’t be much overhead to enable the additional synchronization between the nodes in order to be able to provide such API)

    I am also thinking of implementing an optimistic locking approach to getting this done using counters without resorting to adding another component. This I can do at the client side (read the counter value, increment, read again …) to allow for concurrency with less collisions I can use several counters.

  4. Tsiki says:

    (Small correction: I meant to write to “consistency level all” not concurrency mode …)

  5. Tom says:

    Are these counters now also used to keep track of the number of rows? E.g. a real insert does a +1 and a delete (or ttl expire) does a -1?

  6. Sylvain Lebresne says:

    @Tom No, they are not use to keep track of the number or rows. An insertion in Cassandra does not involve a read. In other words, an insert is always an ‘insert or update’ operation. This is true at the very core of the storage engine and as a consequence we cannot +1 on insert, since we don’t know if it is an insert.

  7. Tom says:

    Clear. Doing a +1 on insert is something I can do myself, since I know when it is an insert. But it unfortunately also means that I cannot use TTL to delete rows (since each node will delete them for itself). So a batch clean (which does the -1).

    Thanks.

  8. Tom says:

    Still pondering the counter problem. Would it be a terrible idea to optionally be able to configure Cassandra to maintain an in memory counter to solve get_count calls with consistency level ONE?

    I assume Cassandra’s storage engine knows:
    - when it adds a column to a table (+1)
    - when it removes a column, either explicitly or via TTL (-1)
    - when it is merging two columns because they are identical (-1, not sure if this actually can occur)

    Naturally when Cassandra starts, it first does a old fashion count (this will be a startup penalty) to initialize the in memory counters. I think this may not be a big problem to implement and could greatly improve certain scenario’s.

  9. Adrian says:

    Is there any plans to allow for a secondary index on counter columns?

    So then you could retrieve all the keys that have a count of greater > 10 for example.

  10. Sylvain Lebresne says:

    @Tom: No, the storage engine doesn’t really know when it removes a column (i.e. it knows you issued a delete but it doesn’t know if the column was existing already or not), which is why (or at lest is one of the main reason) we don’t expose a column count. In any case, counting the number of columns in a row is a fairly orthogonal problem to the the counters described by this post.

    @Adrian: No, there is no concrete plan to allow for secondary indexes on counter columns, mainly because it’s unclear how to do that efficiently and correctly.

  11. Tejinder Singh says:

    how to create auto_increment index as row key in the cassandra database model?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>