Jonathan Ellis

<h3>Guest post by&nbsp;<a href="https://github.com/cburroughs/">Chris Burroughs</a></h3>

<p>Starting in 1.1, Apache Cassandra® began exposing its already bountiful internal metrics using the popular&nbsp;<a href="http://metrics.dropwizard.io/">Metrics</a>&nbsp;library. The number of metrics has since been&nbsp;<a href="https://www.datastax.com/dev/blog/metrics-in-cassandra12">greatly expanded</a>&nbsp;in 1.2 and beyond. There are now a&nbsp;<a href="http://wiki.apache.org/cassandra/Metrics">variety of metrics</a>&nbsp;for cache size, hit rate, client request latency, thread pool status, per-column-family statistics, and other operational measurements.</p>

<p>You could always write custom Java code to send these metrics on to a system like&nbsp;<a href="http://graphite.readthedocs.org/">Graphite</a>&nbsp;or&nbsp;<a href="http://ganglia.info/">Ganglia</a>&nbsp;for storage and graphing. Starting in Cassandra 2.0.2, however, pluggable Metrics reporter support is&nbsp;<a href="https://issues.apache.org/jira/browse/CASSANDRA-4430">built in</a>.</p>

<h2 id="setup">Setup</h2>

<ol>
	<li>Grab your favorite reporter jar (such as&nbsp;<a href="https://mvnrepository.com/artifact/com.yammer.metrics/metrics-graphite/2.2.0">metrics-graphite</a>) and add it to the server's&nbsp;<code>lib</code>&nbsp;directory.</li>
	<li>Create a configuration file for the reporters following the&nbsp;<a href="https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=conf/metrics-reporter-config-sample.yaml;hb=refs/tags/cassandra-2.0.2">sample</a>.</li>
	<li>Start the server with&nbsp;<code>-Dcassandra.metricsReporterConfigFile=yourCoolFile.yaml</code></li>
	<li>Happy Graphing!</li>
</ol>

<p>A config file that sends some basic metrics to a single local Graphite server once a minute might look like:</p>

<pre>
<code>graphite:
  -
    period: 60
    timeunit: 'SECONDS'
    hosts:
     - host: 'graphite-server.domain.local'
       port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.Cache.+"
        - "^org.apache.cassandra.metrics.ClientRequest.+"
        - "^org.apache.cassandra.metrics.Storage.+"
        - "^org.apache.cassandra.metrics.ThreadPools.+"
</code></pre>

<p>You can explicitly include or exclude groups of metrics. For example, detailed per-column-family metrics might be useful on a cluster with a single column family, whereas on a cluster with hundreds of column families it might be preferable to exclude them to avoid overwhelming the graphing system. See the&nbsp;<a href="https://github.com/addthis/metrics-reporter-config">metrics-reporter-config</a>&nbsp;library for all of the configuration details.</p>
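
<p>To make the include/exclude behavior concrete, here is a minimal Python sketch of how such a predicate might filter metric names. It is not the library's implementation: the <code>allowed</code> function, the full-match semantics, the reading of <code>"black"</code> as the excluding counterpart of <code>"white"</code>, and the example metric names are all assumptions made for illustration. Check the metrics-reporter-config documentation for the exact rules.</p>

```python
import re

# Patterns copied from the sample config above.
WHITELIST = [
    r"^org.apache.cassandra.metrics.Cache.+",
    r"^org.apache.cassandra.metrics.ClientRequest.+",
    r"^org.apache.cassandra.metrics.Storage.+",
    r"^org.apache.cassandra.metrics.ThreadPools.+",
]

# Hypothetical pattern for excluding per-column-family metrics.
CF_PATTERN = [r"^org.apache.cassandra.metrics.ColumnFamily.+"]


def allowed(name, patterns, color="white"):
    """Return True if a metric with this qualified name would be reported.

    Assumption: a "white" predicate reports only names that match some
    pattern, and a "black" predicate reports everything except those.
    """
    matched = any(re.fullmatch(p, name) for p in patterns)
    return matched if color == "white" else not matched


# Hypothetical qualified metric names, for illustration only.
names = [
    "org.apache.cassandra.metrics.Cache.KeyCache.Hits",
    "org.apache.cassandra.metrics.ClientRequest.Read.Latency",
    "org.apache.cassandra.metrics.ColumnFamily.ks1.users.ReadLatency",
]

for name in names:
    print(name, allowed(name, WHITELIST), allowed(name, CF_PATTERN, "black"))
```

<p>Under these assumptions both predicates pass the first two names and drop the column family metric. They differ on names that neither pattern list mentions: the whitelist drops them, while the blacklist reports them.</p>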

<h2 id="examplegraphs">Example Graphs</h2>

<h3 id="dataload">Data Load</h3>

<p><img alt="simple load" data-entity-type="file" data-entity-uuid="ccfc2e8a-b0a1-4c0d-ab6b-905e798af46d" src="https://www.datastax.com/sites/default/files/inline-images/simple-load-700x310.png" /></p>

<p>A simple stacked graph showing the amount of data stored in a cluster growing over the past month. Presumably new nodes will eventually be required if the trend continues.</p>

<h3 id="readlatency">Read Latency</h3>

<p><img alt="client vs cf" data-entity-type="file" data-entity-uuid="295d703a-6d8e-46cf-8cc9-470815900bf6" src="https://www.datastax.com/sites/default/files/inline-images/client-vs-cf-700x328.png" /></p>

<p>Troubleshooting 95th percentile latency on a node where clients had detected erratic behavior. The top blue line is coordinator latency, while the bottom line is the latency of satisfying read requests within this node's range. The lack of correlation between the two implies that the problem causing the large blue spikes lies elsewhere in the cluster and not with the coordinator that happens to be receiving client requests (or at least that the problem does not involve local I/O).</p>
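
<p>The graph above is read by eye, but the same reasoning can be roughly quantified. The following Python sketch computes a Pearson correlation coefficient between two series. The <code>pearson</code> helper and the latency values are invented for illustration and are not taken from the graph.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented 95th percentile latencies in milliseconds, one value per sample.
coordinator = [12, 95, 14, 13, 110, 15, 12, 88]  # spiky, like the blue line
local_read = [4, 5, 4, 5, 4, 5, 5, 4]            # flat, like the bottom line

print(round(pearson(coordinator, local_read), 2))
```

<p>A coefficient near zero, as with these made-up numbers, is consistent with spikes that do not originate in local reads. A value near 1 would instead point at the node itself.</p>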

<h3 id="cachesize">Cache Size</h3>

<p><img alt="cache size" data-entity-type="file" data-entity-uuid="35ae5a79-6bde-41f5-8640-b7c48629c459" src="https://www.datastax.com/sites/default/files/inline-images/cache-size-700x350.png" /></p>

<p>For a newly bootstrapped node, this graph shows both the number of entries in the RowCache and the Graphite-calculated derivative, which gives the cache's growth rate.</p>
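
<p>The derivative here is simply the change between consecutive samples. A minimal Python sketch of that arithmetic follows. The function is an illustration rather than Graphite's own code, and the entry counts are invented.</p>

```python
def derivative(values):
    """Change between consecutive samples.

    The first point has no predecessor, so its delta is None.
    """
    if not values:
        return []
    result = [None]
    for previous, current in zip(values, values[1:]):
        if previous is None or current is None:
            result.append(None)  # a gap in the data yields a gap in the delta
        else:
            result.append(current - previous)
    return result


# Invented row cache entry counts, one sample per reporting period.
entries = [0, 1200, 3100, 4500, 5200, 5400]
print(derivative(entries))  # prints [None, 1200, 1900, 1400, 700, 200]
```

<p>The shrinking deltas at the end of the list correspond to the cache filling up and its growth leveling off.</p>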


Pluggable metrics reporting in Cassandra 2.0.2
