DataStax Developer Blog

Metric Collection and Storage with Cassandra

By Nick Bailey -  May 25, 2012 | 15 Comments

One of the often cited use cases for Cassandra is storing time series data. Time series data comes in many forms but one of the more common use cases is storing metrics. More specifically, you might be storing application/server data used for monitoring or capacity planning purposes. In fact, a perfect example of this use case in action is OpsCenter, our tool for managing and monitoring your Cassandra cluster. OpsCenter tracks metrics about your Cassandra nodes and stores this data inside Cassandra before presenting it as user friendly graphs in the OpsCenter UI.

Since metric tracking/collection is such a common use case in Cassandra, we’ll dive into some of the specifics about the data modeling and general practices that OpsCenter uses when collecting metrics about your cluster.

Note: I’ll be using CQL to illustrate some of the ideas in this post. Specifically I’ll be using Cassandra 1.1.0 and CQL 3.0. You can find more information about CQL 3.0 here.

The Basics

At the most basic level, your data model for metric storage in Cassandra will consist of two items: a metric id (the row key), and a collection of timestamp/value pairs (the columns in a row).

Metric Id

When storing metrics in cassandra you want to have a way to uniquely identify each metric you want to track. In the basic case this consists of two parts: a metric name and a server id (ip address). By simply joining these two parts together you generate the row key you will use to store collected data points in Cassandra. For example, a row key for storing cassandra read request metrics in OpsCenter might be ’127.0.0.1-getSPReadOperations’.

In addition to the basic case, you may often append additional components to your metric id for more dynamic metrics. For example, OpsCenter tracks metrics about the specific column families in the cluster. The keys for these metrics are generated dynamically based on a the configuration of the Cassandra cluster being monitored. An example row key for this types of metric would be ’127.0.0.1-Keyspace1-Counter1-getLiveSSTableCount’.

An important note about this data model is that it assumes you know all components of a metric id when retrieving it. It isn’t possible to retrieve the read request metric for all nodes in a cluster without actually knowing the ip addresses of the nodes. Keep this in mind when designing the part of your application that is going to be reading metric data.

Data Points

The data points being collected for a given metric are going to be the columns that you store in your cassandra row. The biggest thing to note here is that we are going to take advantage of Cassandra’s ordering within rows. By using a timestamp as the column name and the data point as the column value, data points are automatically sorted by time. This makes pulling out data points in order to graph them extremely efficient.

A Brief Example

cqlsh> CREATE KEYSPACE stats
                  WITH strategy_class = 'SimpleStrategy'
         ...      AND  strategy_options:replication_factor = 1;
cqlsh> USE stats;
cqlsh:stats> CREATE TABLE metrics (
         ...    metric_id varchar,
         ...    ts timestamp,
         ...    value float,
         ...    PRIMARY KEY (metric_id, ts)
         ... );
cqlsh:stats> INSERT INTO metrics (metric_id, ts, value)
         ...              VALUES ('127.0.0.1-readlatency', '2012-05-25 08:30:29', 13.1);
cqlsh:stats> INSERT INTO metrics (metric_id, ts, value)
         ...              VALUES ('127.0.0.1-readlatency', '2012-05-25 08:31:31', 10.8);
cqlsh:stats> SELECT * FROM metrics;
 metric_id             | ts                       | value
-----------------------+--------------------------+-------
 127.0.0.1-readlatency | 2012-05-25 08:30:29-0500 |  13.1
 127.0.0.1-readlatency | 2012-05-25 08:31:31-0500 |  10.8

Aggregation

The basic model we’ve described above allows for easy storing and collecting of raw collected data points with Cassandra. While the raw data points contain the most information, they often can be hard for an application to work with without any processing. For example, graphing a months worth of read latency data that’s been sampled once  per minute will require about 44,000 data points. Usually, we want to sacrifice some precision in our graph and aggregate those data points into a more manageable set.

The aggregation techniques OpsCenter uses are actually modeled after the open source project rrdtool. OpsCenter predefines multiple ‘buckets’ or ‘rollups’ that data points are aggregated into. These are the different column families created in the OpsCenter keyspace: rollups60, rollups300, rollups7200, and rollups86400. In the rollups86400 column family, each data point in a row represents a value for a 24 hour span (86400 seconds). Now if we want to graph read latency for the past month we only need to pull 30 data points.

OpsCenter actually computes these aggregate values on the fly as it is collecting data. For each rollup size, OpsCenter will calculate the average, min, and max of a metric during that window. One of the benefits of the modeling after rrdtool is that these aggregates can be computed on the fly as data collection is happening and without knowing every raw value (see cumulative moving average). OpsCenter will sample a metric every minute and update an in memory view of each rollup size. Since the rollups60 column family only aggregates data for a single minute, these data points can be written almost immediately. After 5 minutes/samples we will have already computed the min, max, and average value of the metric during that 5 minute period and can write that aggregate and reset those values for the next 5 minute period.

You might have noticed that as part of the aggregation we are calculating 3 data points for a given time: min, max, and average. This doesn’t fit into the basic data model we described above where a column is single timestamp/data point pair. In order to save disk space OpsCenter will actually store all three of these values in single byte array which is stored as a single column value in Cassandra. The more flexible approach however is to just add these as additional columns to your schema.

cqlsh:stats> CREATE TABLE rollups300 (
         ...    metric_id varchar,
         ...    ts timestamp,
         ...    min float,
         ...    max float,
         ...    avg float,
         ...    PRIMARY KEY (metric_id, ts)
         ... );
cqlsh:stats> INSERT INTO rollups300 (metric_id, ts, min, max, avg)
         ...                 VALUES ('127.0.0.1-readlatency', '2012-05-25 08:35:00', 2.4, 24.8, 14.1);
cqlsh:stats> SELECT * FROM rollups300;
 metric_id             | ts                       | avg  | max  | min
-----------------------+--------------------------+------+------+-----
 127.0.0.1-readlatency | 2012-05-25 08:35:00-0500 | 14.1 | 24.8 | 2.4

Considerations for Wide Rows

One of the reasons Cassandra is such a good fit for this use case is that its data model supports rows with quite a large number of columns. Even with this support though, there is a limit on the number of columns you can practically store in a single row. Depending on your sample rate, you can potentially hit this limit fairly quickly. There are two general approaches to preventing rows from becoming too large: expiring data, and row sharding.

The approach OpsCenter takes is to expire older metric data. This approach generally works well if your collection rate isn’t too high and you are also aggregating your data points. Using Cassandra’s built in ttl support, OpsCenter expires the columns in the rollups60 column family after 7 days, the rollups300 column family after 4 weeks, the rollups 7200 column family after 1 year, and the data in the rollups86400 column family never expires. This prevents any rows from growing too large but also allows you to view aggregated metric data for arbitrary times in the past. OpsCenter also takes advantage of the fact that metric data is only ever written once and never updated. Because of this property, metric column families can set gc_grace_seconds to 0, and allow expired columns to be removed immediately. For more info on why this is useful see this wiki page.

The other approach to preventing rows from becoming too large is to shard your metric rows into multiple rows, often based on time. We have another blog post completely dedicated to solving this problem in Cassandra, so the details of this approach won’t be described here.

Conclusion

Hopefully this post has helped to give some insight into how OpsCenter is handling the metric storage for you cluster as well as general approaches to storing metrics in Cassandra. Feel free to ask any questions you have about OpsCenter or metric collection in Cassandra in the comments section below.



Comments

  1. A Cassandra backend for http://code.google.com/p/rrd4j/ would be wonderful. That would mean full have full fledged RRD feature support and not just min/max/avg.

  2. @Nick – thanks for sharing. It sounds like your rows are super wide. What happens if you want to collect metrics every 10 seconds or even every 1 second. How much historical data could be kept in a single row before hitting the row width limit?

  3. Nick Bailey says:

    @Ashwin, I agree a Cassandra backend for rrd4j would be interesting. Our implementation originally was written in python but is currently implemented in clojure, so it could potentially take advantage of something like that. I’m not sure we will have the time to investigate an rrd4j backend but we could certainly provide advice if someone is interested in implementing that.

  4. Nick Bailey says:

    @Otis yes the rows can become quite wide, however that isn’t extremely unusual in Cassandra. The technical limit for the number of columns in a row is 2 billion however actually approaching that limit is generally not a good idea.

    Generally how wide your rows should be is very dependent on your specific application, but rows with millions of columns is a common use case.

  5. John Sanda says:

    @Nick thanks for the timely post. Two questions. First, how did you decide on the bucket sizes that you are using? Secondly, do you guys use or recommend any Clojure libraries for Cassandra?

  6. Nick Bailey says:

    @John The bucket sizes were picked because we felt those sizes allowed you to see more recent metric data at a fine enough granularity while still keeping older data around at a reduced granularity for the life of the cluster. In our case the number of buckets and expiration times need to be picked while also attempting to keep the total size of metric data and any OpsCenter impact on the cluster very low.

    Regarding clojure, the best library I know of at the moment is clj-hector (https://github.com/pingles/clj-hector). It is just a thin wrapper around the java hector client. OpsCenter’s client needs are actually extremely basic so it works well for us. If you decide to try it out and have any issues though, they generally aren’t hard to fix since hector is a fully featured client. I try to fix bugs and accept pull requests in a pretty timely manner.

  7. Manoj says:

    i wish to collect metric data every 1 second and need those stats generated to a file….how can i configure this in opscenter ??

  8. Nick Bailey says:

    Manoj,

    Unfortunately OpsCenter does not have that ability. You can use our support forums for any additional questions though.

    http://datastax.com/support-forums/

    -Nick

  9. Manoj says:

    Hi Nick,

    What is the sampling interval for metric data collection ??
    how is the stats computed for 1 minute intervals ??

  10. Nick Bailey says:

    Manoj,

    The current sampling interval is 60 seconds. The 1 minute data points are calculated using a weighted average approach similar to http://oss.oetiker.ch/rrdtool/

    -Nick

  11. Walter Santos says:

    Hi, Nick,

    Very nice article! Thanks for sharing.

    You said “OpsCenter will sample a metric every minute and update an in memory view of each rollup size”. How do you handle these rollups in case of shutdown? The data in memory is lost. Do you rebuild the rollups using previously stored metrics?

    Thanks

  12. Nick Bailey says:

    The in memory view of the rollups are saved every five minutes to a different column family in cassandra. On startup we load the saved version of the rollups and continue from there. This way we will lose at most, five minutes of data.

    1. Andrew says:

      Hey, this is a really useful article. One question regarding roll ups…if you are storing roll ups in memory, this works OK for a single instance of an application writing to Cassandra. However, imagine you had multiple instances collecting data, for the same metric – how could roll ups be reliably calculated? My two concerns are:

      1) Storing the cumulative moving average in a distributed way.

      2) If data points were coming in thick and fast, there could be concurrency issues where multiple processes are trying to aggregate the same data point.

      Andrew

      1. Nick Bailey says:

        Currently, we avoid this problem by doing any aggregation like that on the client side. For example if we want to display the average read latency for the entire cluster, we fetch the read latencies for each individual node and calculate the average then.

        In order to do that kind of aggregation before storing data you would need to have some sort of central process for collecting and processing the different data points. You could use some sort of CEP system like storm to do that in a distributed way

      2. Rajeev Jha says:

        Andrew
        How about using some in-memory store like Redis? individual application instances can write to one redis instance. From the post, a job is triggered every x minute to process the rollup so latency should not be an issue.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>