Apache Cassandra 0.6 Documentation

Monitoring

This document describes an earlier product version. Make sure you are reading the documentation that matches the version you are running.

Understanding the performance characteristics of your cluster is critical to planning capacity requirements correctly and to diagnosing issues.

Recognizing this need, Cassandra exposes a number of attributes and management operations via JMX. Knowing what these attributes mean and spending some time exploring them as you develop will make baselining, monitoring, and tuning your Cassandra cluster significantly easier.

To get the most out of the available output, you should use a remote monitoring tool that supports JMX queries and has the ability to capture and store statistics over time.

JMX pendingtasks

Monitoring compaction performance is an important aspect of knowing when to add capacity to your cluster. The attributes exposed through CompactionManagerMBean are listed below:

Attribute Description
CompletedTasks Number of completed compactions since the last start of this Cassandra instance
PendingTasks Number of estimated tasks remaining to perform
ColumnFamilyInProgress ColumnFamily currently being compacted. null if no compactions are in progress.
BytesTotalInProgress Total number of data bytes (index and filter are not included) being compacted. null if no compactions are in progress.
BytesCompacted The progress of the current compaction. null if no compactions are in progress.
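
As a rough illustration of reading one of these attributes over JMX, the sketch below registers a stand-in bean in the local JVM and reads PendingTasks and CompletedTasks from it. The stand-in bean, the class names, and the object name demo:type=CompactionManager are hypothetical and exist only so the example runs without a Cassandra node. On a real node the object name is likely org.apache.cassandra.db:type=CompactionManager, but treat that as an assumption and confirm it in a JMX browser.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class CompactionProbe {

    /** Stand-in management interface for this demo; not Cassandra's real MBean. */
    public interface FakeCompactionMBean {
        int getPendingTasks();
        long getCompletedTasks();
    }

    /** Stand-in implementation that returns fixed values. */
    public static class FakeCompaction implements FakeCompactionMBean {
        @Override public int getPendingTasks() { return 3; }
        @Override public long getCompletedTasks() { return 42L; }
    }

    /** Reads one numeric attribute; works the same for a local or a remote connection. */
    static long readNumber(MBeanServerConnection conn, String objectName, String attribute) {
        try {
            Object value = conn.getAttribute(new ObjectName(objectName), attribute);
            return ((Number) value).longValue();
        } catch (Exception e) {
            throw new IllegalStateException(
                "Could not read " + attribute + " from " + objectName, e);
        }
    }

    /** Registers the stand-in bean in this JVM (once) and returns its object name. */
    static String registerDemoBean() {
        String name = "demo:type=CompactionManager"; // hypothetical name for the demo
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            if (!server.isRegistered(objectName)) {
                server.registerMBean(new FakeCompaction(), objectName);
            }
            return name;
        } catch (Exception e) {
            throw new IllegalStateException("Could not register demo bean", e);
        }
    }

    public static void main(String[] args) {
        String name = registerDemoBean();
        // MBeanServer is itself an MBeanServerConnection, so the local server
        // can stand in for a remote connection here.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        System.out.println("PendingTasks=" + readNumber(conn, name, "PendingTasks"));
        System.out.println("CompletedTasks=" + readNumber(conn, name, "CompletedTasks"));
    }
}
```

Against a remote node, the same readNumber call would take the MBeanServerConnection returned by JMXConnectorFactory.connect(...).getMBeanServerConnection() instead of the local platform server.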

Thread Pool Statistics

Cassandra maintains distinct thread pools for different stages of execution. Each of these thread pools provides statistics on the number of tasks that are active, pending, and completed. A rising trend in the pending tasks column for these pools is an excellent indicator that the cluster needs more capacity. Once a baseline is established, configure alarms for any increase past normal in the pending tasks column. See below for details on each thread pool (this list can also be obtained from the command line using nodetool tpstats).

Thread Pool Description
AE_SERVICE_STAGE Shows anti-entropy tasks
CONSISTENCY-MANAGER Handles the background consistency checks if they were triggered by the client’s consistency level
FLUSH-SORTER-POOL Sorts flushes that have been submitted
FLUSH-WRITER-POOL Writes the sorted flushes
GOSSIP_STAGE Activity of the Gossip protocol on the ring
LB-OPERATIONS The number of load balancing operations
LB-TARGET Used by nodes leaving the ring
MEMTABLE-POST-FLUSHER Memtable flushes that are waiting to be written to the commit log.
MESSAGE-STREAMING-POOL Streaming operations. Usually triggered by bootstrapping or decommissioning nodes.
MIGRATION_STAGE Tasks resulting from the call of system_* methods in the API that have modified the schema
MISC_STAGE  
MUTATION_STAGE API calls that are modifying data
READ_STAGE API calls that have read data
RESPONSE_STAGE Response tasks from other nodes to message streaming from this node
STREAM_STAGE Stream tasks from this node
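
One way to turn a baseline of pending task counts into an alarm is sketched below. The rule used here (alarm when the current value exceeds the baseline mean plus k standard deviations) is an assumption chosen for illustration, not something Cassandra prescribes, and the class name and sample values are hypothetical.

```java
class PendingTaskAlarm {

    /** Arithmetic mean of the baseline samples. */
    static double mean(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("baseline must not be empty");
        }
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Population standard deviation of the baseline samples. */
    static double stdDev(long[] samples) {
        double m = mean(samples);
        double squares = 0;
        for (long s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / samples.length);
    }

    /** True when the current pending count is above mean + k * stdDev of the baseline. */
    static boolean shouldAlarm(long[] baseline, long current, double k) {
        return current > mean(baseline) + k * stdDev(baseline);
    }

    public static void main(String[] args) {
        // Hypothetical pending-task samples for one pool, collected during normal load.
        long[] baseline = {0, 1, 0, 2, 1, 0, 1, 1};
        System.out.println("alarm at 2 pending:  " + shouldAlarm(baseline, 2, 3.0));
        System.out.println("alarm at 12 pending: " + shouldAlarm(baseline, 12, 3.0));
    }
}
```

A larger k makes the alarm less sensitive. Pools that are normally idle have a baseline deviation near zero, so a fixed minimum threshold may be a better fit for them.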

StorageProxy Latency

Cassandra tracks the latency (averages and totals) of read, write, and slicing operations at the server level through StorageProxyMBean.
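
Because totals only ever grow, a monitoring tool usually derives the average latency for a polling interval from two successive readings. The sketch below shows that arithmetic. It assumes the bean exposes a cumulative operation count and a cumulative latency total in microseconds; the Snapshot class and the sample numbers are hypothetical.

```java
class IntervalLatency {

    /** One poll of a cumulative operation counter and its cumulative latency total. */
    static final class Snapshot {
        final long operations;
        final long totalLatencyMicros;

        Snapshot(long operations, long totalLatencyMicros) {
            this.operations = operations;
            this.totalLatencyMicros = totalLatencyMicros;
        }
    }

    /**
     * Average latency in microseconds for operations completed between two polls.
     * Returns NaN when no operations completed or when the counters went
     * backwards (for example after a node restart).
     */
    static double averageMicros(Snapshot earlier, Snapshot later) {
        long ops = later.operations - earlier.operations;
        long micros = later.totalLatencyMicros - earlier.totalLatencyMicros;
        if (ops <= 0 || micros < 0) {
            return Double.NaN;
        }
        return (double) micros / ops;
    }

    public static void main(String[] args) {
        Snapshot first = new Snapshot(1000, 250_000);  // hypothetical first poll
        Snapshot second = new Snapshot(1500, 450_000); // hypothetical second poll
        // 200,000 microseconds spread over 500 operations in the interval.
        System.out.println("interval average (us): " + averageMicros(first, second));
    }
}
```

The interval average reacts to a slowdown much faster than the lifetime average, which is dominated by everything that happened since the node started.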

ColumnFamily Statistics

For individual column families, ColumnFamilyStoreMBean provides the same general latency attributes as StorageProxyMBean. Unlike StorageProxyMBean, ColumnFamilyStoreMBean has a number of other statistics that are important to monitor for performance trends. The most important of these are listed below:

Attribute Description
MemtableDataSize The total size consumed by this column family’s data (not including metadata)
MemtableColumnsCount Returns the total number of columns present in the memtable (across all keys)
MemtableSwitchCount How many times the memtable has been flushed out
RecentReadLatencyMicros The average read latency since the last call to this bean
RecentWriteLatencyMicros The average write latency since the last call to this bean
LiveSSTableCount The number of live SSTables for this ColumnFamily

The first three memtable attributes are discussed in detail on the Tuning page.

The recent read latency and write latency counters show whether operations are completing in a consistent amount of time. If these counters start to increase after a period of staying flat, the cluster probably needs more capacity.
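
A simple check for "flat, then rising" is to compare the mean of the most recent samples with the mean of the samples before them. The sketch below assumes one latency sample is collected per poll; the window size, the 1.25 factor, and the sample values are hypothetical choices, not recommended settings.

```java
class LatencyTrend {

    /** Mean of samples[from] up to, but not including, samples[to]. */
    static double mean(double[] samples, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += samples[i];
        }
        return sum / (to - from);
    }

    /**
     * True when the mean of the last `recent` samples exceeds the mean of all
     * earlier samples by more than the given factor.
     */
    static boolean risingAfterFlat(double[] samples, int recent, double factor) {
        if (recent <= 0 || samples.length <= recent) {
            throw new IllegalArgumentException("need at least one sample before the recent window");
        }
        int split = samples.length - recent;
        double baseline = mean(samples, 0, split);
        double latest = mean(samples, split, samples.length);
        return latest > baseline * factor;
    }

    public static void main(String[] args) {
        // Hypothetical read latencies in microseconds, one per poll.
        double[] climbing = {400, 410, 395, 405, 400, 402, 520, 560, 610};
        double[] flat = {400, 410, 395, 405, 400, 402, 398, 404, 401};
        System.out.println("climbing series flagged: " + risingAfterFlat(climbing, 3, 1.25));
        System.out.println("flat series flagged:     " + risingAfterFlat(flat, 3, 1.25));
    }
}
```

Since the recent latency attributes report the average since the last call to the bean, a single poller per attribute keeps the samples evenly spaced and comparable.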

LiveSSTableCount can be monitored against a threshold to ensure that the number of SSTables for a given ColumnFamily does not grow too large.