Sylvain Lebresne

<h1>Deprecation warning</h1>

This post covers the obsolete Cassandra 0.8. Modern Cassandra uses counters via CQL.

<h1>Original post</h1>

One of the features making its debut in Cassandra 0.8.0 is distributed counters. They allow you to ... count things. (Or sum things; the counter increment need not be 1, or even positive.) And they let you count a lot of stuff, very quickly, which makes them invaluable for real-time analytical tasks.

<h3>Why Counters?</h3>

Prior to 0.8, Cassandra had no simple and efficient way to count. By "counting," we mean providing an atomic increment operation on a single column value, as opposed to counting the number of columns in a row, or rows in a column family, both of which were already supported.

If you had to count or sum things, available solutions previously included:

<ul>
	<li>inserting a different column for each increment, with a batch process to merge them</li>
	<li>using an external synchronization service like Zookeeper (preferably through the 
	use of the&nbsp;<a href="http://code.google.com/p/cages">Cages library</a>&nbsp;for simplicity)</li>
	<li>using another database such as Redis to handle those counts</li>
</ul>

Those solutions all had one or more of the following problems:

<ul>
	<li>unfriendly to develop against</li>
	<li>poor performance</li>
	<li>not scalable (in particular, none scales to multiple datacenter usage)</li>
	<li>requires additional software</li>
</ul>

The new counters feature fills this gap: it provides a simple and efficient 
counting facility without any of the above problems.

<h3>Using Counters</h3>

A counter is a specific kind of column whose user-visible value is a 64-bit signed 
integer, though this is more complex internally. When a new value is written 
to a given counter column, this new value is added to whatever was the 
previous value of the counter.

To create a column family holding counters, you simply indicate to Cassandra 
that the&nbsp;<tt>default_validation_class</tt>&nbsp;on that column family is 
<tt>CounterColumnType</tt>. For instance, using the CLI, you can create such 
a column family using:

 
<code>[default@unknown] create keyspace test; 
54900c80-9378-11e0-0000-242d50cf1f9d 
Waiting for schema agreement... 
... schemas agree across the cluster 
[default@unknown] use test; 
Authenticated to keyspace: test 
[default@test] create column family counters with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type; 
6c7db090-9378-11e0-0000-242d50cf1f9d 
Waiting for schema agreement... 
... schemas agree across the cluster</code>

Super column families holding counters are also supported the usual way, 
by specifying&nbsp;<tt>column_type=Super</tt>.

Using counters is then straightforward:

 
<code>[default@test] incr counters[row][c1]; 
Value incremented. 
[default@test] incr counters[row][c2] by 3; 
Value incremented. 
[default@test] get counters[row]; 
=&gt; (counter=c1, value=1) 
=&gt; (counter=c2, value=3) 
Returned 2 results. 
[default@test] decr counters[row][c2] by 4; 
Value decremented. 
[default@test] incr counters[row][c1] by -2; 
Value incremented. 
[default@test] get counters[row]; 
=&gt; (counter=c1, value=-1) 
=&gt; (counter=c2, value=-1) 
Returned 2 results. 
[default@test] del counters[row][c1]; 
column removed. 
[default@test] get counters[row]; 
=&gt; (counter=c2, value=-1) 
Returned 1 results.</code>

Note that the CLI provides a&nbsp;<tt>decr</tt>&nbsp;(decrement) operation, but this 
is simply syntactic sugar for incrementing by a negative number. The 
usual consistency level trade-offs apply to counter operations.
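
To make the additive semantics concrete, here is a minimal Python sketch that mirrors the CLI session above. It is a toy in-memory model, not Cassandra code: the <tt>CounterRow</tt> class and its method names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class CounterRow:
    """Toy in-memory model of one row of counter columns (not Cassandra code)."""

    def __init__(self):
        # Every counter column starts from an implicit value of 0.
        self.columns = defaultdict(int)

    def incr(self, column, by=1):
        # A counter write is a delta that is added to the previous value.
        self.columns[column] += by

    def decr(self, column, by=1):
        # Syntactic sugar: decrementing is incrementing by a negative number.
        self.incr(column, -by)

    def get(self):
        return dict(self.columns)


row = CounterRow()
row.incr("c1")          # incr counters[row][c1];
row.incr("c2", 3)       # incr counters[row][c2] by 3;
row.decr("c2", 4)       # decr counters[row][c2] by 4;
row.incr("c1", -2)      # incr counters[row][c1] by -2;
print(row.get())        # {'c1': -1, 'c2': -1}
```

The final values match the second <tt>get</tt> in the session above: both counters end at -1.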

<h3>Using CQL</h3>

Note that support for counters in CQL is not part of 
0.8.0 (the official release at the time of this writing) but has been added 
for the 0.8.1 release. 
Considering the&nbsp;<tt>counters</tt>&nbsp;column family created above:

 
<code>cqlsh&gt; UPDATE counters SET c1 = c1 + 3, c2 = c2 - 4 WHERE key = row2; 
cqlsh&gt; select * from counters where key=row2; 
&nbsp;&nbsp;&nbsp;&nbsp; KEY | c1 | c2 | 
&nbsp;&nbsp;&nbsp;&nbsp;row2 | &nbsp;3 | -4 |</code>

<h3>Operational Considerations</h3>

<h4>Performance</h4>

Counters have been designed to allow for very fast writes. However, an increment 
does involve a read on one of the replicas as part of replication. As a consequence, 
counter increments are expected to be slightly slower than regular writes. Note, 
however, that:

<ul>
	<li>For each write, only one of the replicas has to perform a read, even with many replicas.</li>
	<li>At ConsistencyLevel.ONE, this read is not part of the latency the client will 
	observe, but is still part of the write itself. It follows that the 
	latency of increments at CL.ONE is very good, but care should be taken 
	not to overload the cluster by writing faster than it can handle. 
	(In JMX, you can monitor the pending tasks on the&nbsp;<tt>REPLICATE_ON_WRITE</tt>&nbsp;stage.)</li>
</ul>

Counter reads use the same code path as regular reads and thus offer comparable performance.

<h4>Dealing with data loss</h4>

With regular column families, if an&nbsp;<a href="http://wiki.apache.org/cassandra/MemtableSSTable">SSTable</a>&nbsp;on disk is lost or corrupted (because 
of disk failure, for instance), a standard way to deal with it is to remove 
the problematic file and run repair to have the missing information pulled from 
the other replicas.

This is unfortunately not as simple with counters. Currently, the only 
safe way to handle the loss of an SSTable for a counter column family 
is to remove all data for that column family, restart the node with 
<tt>-Dcassandra.renew_counter_id=true</tt>&nbsp;(or remove the NodeIdInfo 
system SSTables on versions earlier than 0.8.2) and run repair once 
the node is up.

(The reason you must remove all the counter SSTables, even undamaged 
ones, is that each node maintains a sub-count of the counter to which 
it adds new increments and for which other nodes trust it to have the 
most up-to-date value. Wiping the data on the affected node, call it A, 
ensures that the other replicas recognize that A is missing its 
sub-count and re-replicate it to A on repair.)
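
The sub-count reasoning can be illustrated with a deliberately simplified Python model. Everything here is an assumption made for illustration: the per-node <tt>(clock, count)</tt> pair, the "highest clock wins" merge rule, the node ids and the function names are hypothetical and are not taken from Cassandra's actual implementation. The sketch only shows why a node that keeps its id but has lost part of its own sub-count cannot be trusted, while a node that starts over under a fresh id can.

```python
def merge(local, remote):
    """Merge two views of a counter.

    Assumed rule: for each node id, keep the sub-count with the higher clock,
    since the owning node is trusted to have the most up-to-date value.
    """
    merged = dict(local)
    for node_id, (clock, count) in remote.items():
        if node_id not in merged or clock > merged[node_id][0]:
            merged[node_id] = (clock, count)
    return merged


def increment(view, node_id, delta):
    """Return a new view where node_id has added delta to its own sub-count."""
    clock, count = view.get(node_id, (0, 0))
    updated = dict(view)
    updated[node_id] = (clock + 1, count + delta)
    return updated


def total(view):
    """The user-visible counter value is the sum of all sub-counts."""
    return sum(count for _clock, count in view.values())


# Replica B's view: its copy of A's sub-count (50) plus its own (7).
view_b = {"A": (5, 50), "B": (2, 7)}

# Case 1: A lost an SSTable, kept the rest and kept its id.
# It now remembers only part of its own sub-count, then takes one increment.
partial_a = increment({"A": (2, 20)}, "A", 1)   # A: (3, 21)
lossy = merge(view_b, partial_a)
print(total(lossy))     # 57: B's copy of A wins, so the new increment is dropped

# Case 2: A was wiped and restarted under a renewed id ("A2").
fresh_a = increment({}, "A2", 1)                # A2: (1, 1)
repaired = merge(view_b, fresh_a)
print(total(repaired))  # 58: the old sub-count and the new increment both count
```

In this toy model, the first case silently loses a write, whereas the second keeps every increment because the old and new sub-counts never compete under the same id.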

<h4>Other considerations</h4>

Internally, counters use server-side timestamps in order to deal with 
deletions. This means that you will need to keep the clocks of the Cassandra servers in 
sync. Of course, using&nbsp;<a href="http://www.ntp.org/"><tt>ntpd</tt></a>&nbsp;on a server deployment is good practice anyway, so this should not be an 
important constraint.

<h4>Current limitations, known problems and the future</h4>

Besides the operational considerations above, counters have a number of 
limitations in their current form that you should be aware of:

<ul>
	<li>If a write times out in Cassandra, 
	the client cannot know if the write was persisted or not. This is not a 
	problem for regular columns, where the recommended way to cope with such 
	an exception is to replay the write, since writes are idempotent. For counters, however, replaying the write 
	in those situations may result in an over-count. On the other hand, not 
	replaying it may mean the write never gets recorded. 
	<a href="https://issues.apache.org/jira/browse/CASSANDRA-2783">CASSANDRA-2783</a>&nbsp;is open to add an optional replay ID to counter writes.</li>
	<li>Support for counter removal is exposed by the API, but is limited. If 
	you perform, in quick succession, a counter increment followed by a delete and then 
	another increment, there is no guarantee that the end value will only be 
	the value of the second increment (the deletion could be fully ignored). The only safe use of deletion is for permanent removal, 
	where no new increment follows the deletion.</li>
	<li>There is no support for time to live (TTL) on counter columns as there is 
	for regular columns (see&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-1952">CASSANDRA-1952</a>&nbsp;for more information on why).</li>
	<li>There is no support for secondary indexes on counter columns.</li>
	<li>At the time of this writing, you cannot have a counter column inside a column 
	family of regular columns (and vice versa). The only way to use 
	counters is to create a column family with 
	<tt>default_validation_class=CounterColumnType</tt>, in which case all 
	columns are counters (<a href="https://issues.apache.org/jira/browse/CASSANDRA-2614">CASSANDRA-2614</a>&nbsp;is open to lift this limitation).</li>
</ul>
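
The first limitation above, replaying a timed-out write, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the dictionaries stand in for a column family, and <tt>write_column</tt> and <tt>increment_counter</tt> are hypothetical helpers, not client API calls.

```python
def write_column(store, key, value, timestamp):
    # Regular column write: the value with the highest timestamp wins,
    # so applying the same write twice leaves the store unchanged.
    current = store.get(key)
    if current is None or timestamp >= current[1]:
        store[key] = (value, timestamp)


def increment_counter(store, key, delta):
    # Counter write: the delta is added to whatever value is already there,
    # so applying the same write twice counts it twice.
    store[key] = store.get(key, 0) + delta


# Regular column: the first write was persisted, but the client saw a timeout
# and replays it. The result is the same as if it had been written once.
regular = {}
write_column(regular, "name", "alice", timestamp=100)
write_column(regular, "name", "alice", timestamp=100)  # replay
print(regular["name"])   # ('alice', 100)

# Counter: the same situation leads to an over-count.
counters = {}
increment_counter(counters, "hits", 1)
increment_counter(counters, "hits", 1)                 # replay
print(counters["hits"])  # 2, although the client meant to add 1 once
```

Not replaying is no safer: had the first increment not been persisted, skipping the replay would leave the counter at 0.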

<h3>Previously</h3>

<ul>
	<li>What's new in Cassandra 0.8, part 1: CQL</li>
</ul>

What’s New in Cassandra 0.8, Part 2: Counters
