Understanding the performance characteristics of your Cassandra cluster is critical to diagnosing issues and planning capacity.
Cassandra exposes a number of statistics and management operations via Java Management Extensions (JMX), a Java technology that supplies tools for managing and monitoring Java applications and services. Any statistic or operation that a Java application has exposed as an MBean can then be monitored or manipulated using JMX.
During normal operation, Cassandra outputs information and statistics that you can monitor using JMX-compliant tools such as DataStax OpsCenter, the nodetool utility, and JConsole, each described below.
Using the same tools, you can perform certain administrative commands and operations such as flushing caches or doing a node repair.
DataStax OpsCenter is a graphical user interface for monitoring and administering all nodes in a Cassandra cluster from one centralized console. DataStax OpsCenter is bundled with DataStax support offerings. You can register for a free version for development or non-production use.
OpsCenter provides a graphical representation of performance trends in a summary view that is hard to obtain with other monitoring tools. The GUI provides views for different time periods as well as the capability to drill down on single data points. Both real-time and historical performance data for a Cassandra or DataStax Enterprise cluster are available in OpsCenter. OpsCenter metrics are captured and stored within Cassandra.
Within OpsCenter you can customize the performance metrics viewed to meet your monitoring needs. Administrators can also perform routine node administration tasks from OpsCenter. Metrics within OpsCenter are divided into three general categories: table metrics, cluster metrics, and OS metrics. For many of the available metrics, you can view aggregated cluster-wide information or view information on a per-node basis.
The nodetool utility is a command-line interface for monitoring Cassandra and performing routine database operations. It is included in the Cassandra distribution and is typically run directly from an operational Cassandra node.
The nodetool utility supports the most important JMX metrics and operations, and includes other useful commands for Cassandra administration. This utility is commonly used with the status command to output a quick summary of the ring and its current state of general health.
The nodetool utility provides commands for viewing detailed metrics for tables, server metrics, and compaction statistics. Commands include decommissioning a node, running repair, and moving partitioning tokens.
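The ring summary printed by the status command can also be checked from a script. The following is a minimal sketch, assuming the two-letter status/state codes (such as UN for Up/Normal) that `nodetool status` prints at the start of each node row. The sample text, addresses, and host IDs are made up for illustration, and the helper names are hypothetical.

```python
import subprocess

# Letters used in the first column of `nodetool status` node rows.
STATUS = {"U": "up", "D": "down"}
STATE = {"N": "normal", "L": "leaving", "J": "joining", "M": "moving"}

# Illustrative sample only; real output depends on the Cassandra version.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID                               Rack
UN  10.0.0.1  47.66 KB  256     33.3%  00000000-0000-0000-0000-000000000001  rack1
UL  10.0.0.2  52.10 KB  256     33.3%  00000000-0000-0000-0000-000000000002  rack1
DN  10.0.0.3  49.87 KB  256     33.4%  00000000-0000-0000-0000-000000000003  rack1
"""


def parse_status(text):
    """Return (address, status, state) tuples for each node row in the text."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows start with a two-letter code; header lines do not match.
        if (len(fields) >= 2 and len(fields[0]) == 2
                and fields[0][0] in STATUS and fields[0][1] in STATE):
            code = fields[0]
            nodes.append((fields[1], STATUS[code[0]], STATE[code[1]]))
    return nodes


def unhealthy_nodes(text):
    """Addresses of nodes that are not both up and in the normal state."""
    return [addr for addr, status, state in parse_status(text)
            if (status, state) != ("up", "normal")]


def run_nodetool_status(nodetool="nodetool"):
    """Run `nodetool status` locally; assumes nodetool is on the PATH."""
    result = subprocess.run([nodetool, "status"], capture_output=True,
                            text=True, check=True)
    return result.stdout


print(unhealthy_nodes(SAMPLE_STATUS))
```

On a live node, `unhealthy_nodes(run_nodetool_status())` would apply the same check to real output.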
JConsole is a JMX-compliant tool for monitoring Java applications such as Cassandra. It is included with Sun JDK 5.0 and higher. JConsole consumes the JMX metrics and operations exposed by Cassandra and displays them in a well-organized GUI. For each node monitored, JConsole provides six separate tab views: Overview, Memory, Threads, Classes, VM Summary, and MBeans.
The Overview and Memory tabs contain information that is very useful for Cassandra developers. The Memory tab allows you to compare heap and non-heap memory usage, and provides a control to immediately perform Java garbage collection.
For specific Cassandra metrics and operations, the most important area of JConsole is the MBeans tab, which lists the Cassandra MBeans, including org.apache.cassandra.db.
When you select an MBean in the tree, its MBeanInfo and MBean Descriptor are displayed on the right, and any attributes, operations or notifications appear in the tree below it. For example, selecting and expanding the org.apache.cassandra.db MBean to view available actions for a table results in a display like the following:
If you choose to monitor Cassandra using JConsole, keep in mind that JConsole consumes a significant amount of system resources. For this reason, DataStax recommends running JConsole on a remote machine rather than on the same host as a Cassandra node.
Monitoring compaction performance is an important part of knowing when to add capacity to your cluster. The CompactionManagerMBean, which you can view in JConsole, exposes the following compaction attributes:
| Attribute | Description |
|---|---|
| CompletedTasks | Number of completed compactions since the last start of this Cassandra instance |
| PendingTasks | Number of estimated tasks remaining to perform |
| ColumnFamilyInProgress | The table currently being compacted. This attribute is null if no compactions are in progress. |
| BytesTotalInProgress | Total number of data bytes (index and filter are not included) being compacted. This attribute is null if no compactions are in progress. |
| BytesCompacted | The progress of the current compaction. This attribute is null if no compactions are in progress. |
Cassandra maintains distinct thread pools for different stages of execution. Each thread pool provides statistics on the number of tasks that are active, pending, and completed. A sustained upward trend in a pool's pending tasks column indicates that it is time to add capacity. After establishing a baseline, configure alarms for any increase above normal in the pending tasks column. Use nodetool tpstats on the command line to view the thread pool details shown in the following table.
| Thread Pool | Description |
|---|---|
| AE_SERVICE_STAGE | Shows anti-entropy tasks |
| CONSISTENCY-MANAGER | Handles the background consistency checks if they were triggered from the client's consistency level. |
| FLUSH-SORTER-POOL | Sorts flushes that have been submitted. |
| FLUSH-WRITER-POOL | Writes the sorted flushes. |
| GOSSIP_STAGE | Activity of the Gossip protocol on the ring. |
| LB-OPERATIONS | The number of load balancing operations. |
| LB-TARGET | Used by nodes leaving the ring. |
| MEMTABLE-POST-FLUSHER | Memtable flushes that are waiting to be written to the commit log. |
| MESSAGE-STREAMING-POOL | Streaming operations. Usually triggered by bootstrapping or decommissioning nodes. |
| MIGRATION_STAGE | Tasks resulting from the call of system_* methods in the API that have modified the schema. |
| MISC_STAGE | |
| MUTATION_STAGE | API calls that are modifying data. |
| READ_STAGE | API calls that have read data. |
| RESPONSE_STAGE | Response tasks from other nodes to message streaming from this node. |
| STREAM_STAGE | Stream tasks from this node. |
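The baseline-and-alarm approach for pending tasks can be sketched as a small script. This is a rough illustration, assuming tpstats-style text in which each pool row is a name followed by active, pending, and completed counts. The sample rows, the baseline values, and the function names are hypothetical, and the exact column layout varies between Cassandra versions.

```python
# Illustrative sample only; pool names are taken from the table above.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed
READ_STAGE                        2        15          48211
MUTATION_STAGE                    1         0         902113
GOSSIP_STAGE                      0         0          12044
"""


def parse_tpstats(text):
    """Map pool name -> (active, pending, completed) for each pool row."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        # Pool rows have a name followed by at least three integer columns;
        # the header row and any other text are skipped.
        if len(fields) >= 4 and all(f.isdigit() for f in fields[1:4]):
            active, pending, completed = (int(f) for f in fields[1:4])
            pools[fields[0]] = (active, pending, completed)
    return pools


def pending_alarms(pools, baseline, tolerance=0):
    """Return pools whose pending count exceeds baseline + tolerance.

    `baseline` maps pool name -> normal pending count; pools that are
    missing from it are treated as having a baseline of zero.
    """
    return {name: pending
            for name, (_active, pending, _completed) in pools.items()
            if pending > baseline.get(name, 0) + tolerance}


pools = parse_tpstats(SAMPLE_TPSTATS)
print(pending_alarms(pools, baseline={"READ_STAGE": 4}))
```

The `tolerance` parameter is there so that small fluctuations around the baseline do not raise an alarm; a suitable value depends on the workload.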
Cassandra tracks latency (averages and totals) of read, write, and slicing operations at the server level through StorageProxyMBean.
For individual tables, ColumnFamilyStoreMBean provides the same general latency attributes as StorageProxyMBean. Unlike StorageProxyMBean, ColumnFamilyStoreMBean has a number of other statistics that are important to monitor for performance trends. The most important of these are:
| Attribute | Description |
|---|---|
| MemtableDataSize | The total size consumed by this table's data (not including metadata). |
| MemtableColumnsCount | Returns the total number of columns present in the memtable (across all keys). |
| MemtableSwitchCount | How many times the memtable has been flushed out. |
| RecentReadLatencyMicros | The average read latency since the last call to this bean. |
| RecentWriteLatencyMicros | The average write latency since the last call to this bean. |
| LiveSSTableCount | The number of live SSTables for this table. |
The recent read latency and write latency counters are important in making sure operations are happening in a consistent manner. If these counters start to increase after a period of staying flat, you probably need to add capacity to the cluster.
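One simple way to detect a counter that starts climbing after a flat period is to compare the newest samples against an earlier baseline window. The sketch below assumes you already collect the recent latency values (in microseconds) at regular intervals. The window sizes, the 1.5x factor, and the function name are illustrative choices, not values recommended by Cassandra.

```python
from statistics import mean


def latency_rising(samples, baseline_size=5, recent_size=3, factor=1.5):
    """Return True if recent latency is well above the earlier baseline.

    The baseline is the mean of the first `baseline_size` samples and the
    recent value is the mean of the last `recent_size` samples.
    """
    if len(samples) < baseline_size + recent_size:
        return False  # not enough history to compare the two windows
    baseline = mean(samples[:baseline_size])
    recent = mean(samples[-recent_size:])
    return recent > baseline * factor


# Hypothetical read latency samples in microseconds.
flat = [200, 210, 190, 205, 195, 200, 198, 202]
climbing = [200, 210, 190, 205, 195, 320, 340, 360]
print(latency_rising(flat), latency_rising(climbing))
```

Averaging over a few recent samples rather than using only the latest one makes the check less sensitive to a single slow interval.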
You can set a threshold and monitor LiveSSTableCount to ensure that the number of SSTables for a given table does not become too great.
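A threshold check of this kind might look like the following sketch. It assumes the LiveSSTableCount value for each table has already been collected into a dictionary. The table names, counts, and threshold are made up for illustration, and an appropriate threshold depends on your compaction strategy and workload.

```python
def tables_over_threshold(live_sstables, threshold):
    """Return names of tables whose live SSTable count exceeds the threshold.

    Results are ordered from the highest count to the lowest.
    """
    over = [(count, name) for name, count in live_sstables.items()
            if count > threshold]
    return [name for count, name in sorted(over, reverse=True)]


# Hypothetical per-table LiveSSTableCount readings.
counts = {"users": 12, "events": 48, "logs": 31}
print(tables_over_threshold(counts, threshold=30))
```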