Monitoring Your Solr Filter Caches with the DataStax Enterprise Performance Service
In DSE 4.5, we released the first iteration of the DataStax Enterprise Performance Service, a CQL-based interface that helps users monitor the state of their Cassandra cluster. With the release of DSE 4.6, we've added a new set of diagnostic tables that make it easier to monitor the state of Solr, the core component of DSE Search. These tables record information about aberrant events, like slow queries on particular nodes in a distributed query, and the performance of various components, like Solr's caches, over time.
In this post, we'll take a look at how you can use these new diagnostic tables to monitor the filter cache for a given core. (If filter cache monitoring isn't your cup of tea, take a look at our official documentation to explore the other Solr monitoring tools now at your disposal in DataStax Enterprise.)
If you're a Solr veteran, you're probably familiar with some of the traditional methods used to monitor and tune the filter caches for your cores. Particularly useful is the Plugins & Stats Screen in the Solr Admin web UI. Here, you can inspect statistics like the number of insertions/lookups/hits as well as the cache's hit ratio and size (in terms of the number of documents). While this is a great tool to get some basic visibility into what's going on with your filter caches, it has a few drawbacks. First, the statistics presented here are only snapshots, and they give you very little information about the status of the cache over time. (The reason I say "little" and not "none" is that there are some cumulative metrics that preserve statistics through index searcher replacement, but even those are still snapshots.) Second, the web UI (and to a lesser extent the REST-like interface behind it) starts to become unwieldy as the size of your DSE Search cluster increases and you'd like to inspect the statistics over groups of nodes.
The New Hotness
Enabling the collection of Solr cache statistics through the DSE Performance Service is as simple as editing your dse.yaml and restarting a node. (It is also possible to start and configure this without a restart, but we'll keep it simple.) Once you've done that, your node will automatically start taking regular snapshots of your filter cache stats and record them in a table called solr_filter_cache_stats in the dse_perf keyspace. To get a core up, populate it with data, and exercise the cache, I ran the Solr Stress demo that ships with DSE. (If you're having trouble finding that, take a look at the documentation for our Wikipedia search example.)
With a core created, data in place, and some query history behind us, it's time to open up cqlsh and see what's been going on with our filter cache. For now, let's pull up the 5 most recent snapshots.
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 188.8.131.52 | DSE 4.7.0 | CQL spec 3.2.0 | Native protocol v3]
cqlsh> use dse_perf;
cqlsh:dse_perf> SELECT * FROM solr_filter_cache_stats WHERE node_ip = '127.0.0.1' AND core = 'demo.solr' AND date = '2015-1-15' ORDER BY time DESC LIMIT 5;
 node_ip   | core      | date                     | time                     | cumulative_hitratio | cumulative_hits | ...
-----------+-----------+--------------------------+--------------------------+---------------------+-----------------+-----
 127.0.0.1 | demo.solr | 2015-01-15 00:00:00-0800 | 2015-01-15 15:47:18-0800 |                0.99 |          205630 | ...
 127.0.0.1 | demo.solr | 2015-01-15 00:00:00-0800 | 2015-01-15 15:46:18-0800 |                0.99 |          182253 | ...
 127.0.0.1 | demo.solr | 2015-01-15 00:00:00-0800 | 2015-01-15 15:45:18-0800 |                0.99 |          118580 | ...
 127.0.0.1 | demo.solr | 2015-01-15 00:00:00-0800 | 2015-01-15 15:44:18-0800 |                0.99 |           54200 | ...
 127.0.0.1 | demo.solr | 2015-01-15 00:00:00-0800 | 2015-01-15 15:43:18-0800 |                   1 |             126 | ...
(Note: I've truncated the full results for readability.)
It's always better to have historical data that can be analyzed at any time than to be forced to watch the system at the moment a problem occurs. Even in this contrived example, we can see how the CQL-based interface allows us to look at things like the cache hit ratio over time, and not just at a single point in time.
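Because cumulative_hits is a running total, one simple way to read it "over time" is to subtract each snapshot from the next. The following is only a minimal sketch in Python: the snapshot values are copied by hand from the five rows shown earlier, and the helper name hits_per_interval is hypothetical, not part of DSE or any driver.

```python
# (snapshot time, cumulative_hits) pairs copied by hand from the five rows
# returned by the query, reordered oldest first.
snapshots = [
    ("15:43:18", 126),
    ("15:44:18", 54200),
    ("15:45:18", 118580),
    ("15:46:18", 182253),
    ("15:47:18", 205630),
]


def hits_per_interval(rows):
    """Return (end_time, hits gained since the previous snapshot) per consecutive pair.

    hits_per_interval is a hypothetical helper written for this illustration.
    """
    deltas = []
    for (_, prev_hits), (end_time, hits) in zip(rows, rows[1:]):
        deltas.append((end_time, hits - prev_hits))
    return deltas


for end_time, gained in hits_per_interval(snapshots):
    print(end_time, gained)
```

For these rows, the per-minute gains are 54074, 64380, 63673, and 23377 hits, which makes the drop in query traffic during the last minute much easier to spot than it is in the raw cumulative column.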
In some ways, the new monitoring tools we've built for DSE 4.6 are only a starting point. While cqlsh is a capable tool, the historical data recorded in the new diagnostic tables would be much easier to consume in graphical form. Fortunately, as usage increases, integration with OpsCenter becomes more and more likely.