DataStax Developer Blog

Inside OpsCenter 3.0 – Storing OpsCenter Data in a Separate Cluster

By Tyler Hobbs - March 25, 2013

With OpsCenter Enterprise Edition 3.0, we’ve made it easier to store all of OpsCenter’s data in a separate cluster instead of the cluster being monitored. The data that OpsCenter stores includes performance metrics, a log of notable events, and settings such as alert configurations and snapshot schedules. Normally, OpsCenter stores this data in a dedicated keyspace on the cluster being monitored. This has the advantage of being simple to set up and able to handle a wide range of cluster sizes. However, there are several reasons why you might want to store this data in a separate cluster:

  • If node failures result in portions of the OpsCenter data being unavailable, some performance metrics may not be viewable, making it harder to diagnose problems with the cluster.
  • If the monitored cluster has a large number of column families, capturing full metrics for all of those column families may generate more data than is acceptable for your production cluster. While it is possible to limit the amount of metric data collected, you may instead prefer to simply capture the full set of metrics and store them in a separate, dedicated cluster.
  • You may want to avoid any extra reads or writes on your production cluster.

Configuration

In the past, we have suggested using a separate datacenter with NetworkTopologyStrategy to store OpsCenter data on non-production nodes. In OpsCenter 3.0, we’ve made this easier by adding a configuration option. In the cluster-specific configuration file, you can now add a [storage_cassandra] section with the following options and defaults:

[storage_cassandra]
seed_hosts =
api_port = 9160
connect_timeout = 6.0
username =
password =
keyspace = OpsCenter

The seed_hosts option takes a comma-separated list of node IP addresses in the data storage cluster, and api_port should match the Thrift RPC port for the data storage cluster. The username and password options correspond to normal Thrift authentication credentials. The keyspace option simply specifies the name of the keyspace that OpsCenter will store its data in.
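As a rough illustration, the section above could be generated with Python's standard configparser module. This is only a sketch: the helper name build_storage_section and the example IP addresses are made up for illustration, and the seed list is joined without spaces on the assumption that a plain comma-separated value is the safest form.

```python
import configparser
import io


def build_storage_section(seed_hosts, api_port=9160, connect_timeout=6.0,
                          username="", password="", keyspace="OpsCenter"):
    """Return the text of a [storage_cassandra] section (hypothetical helper)."""
    if not seed_hosts:
        raise ValueError("seed_hosts must list at least one storage cluster node")

    # interpolation=None so that a '%' in a password is written out literally.
    config = configparser.ConfigParser(interpolation=None)
    config["storage_cassandra"] = {
        "seed_hosts": ",".join(seed_hosts),
        "api_port": str(api_port),
        "connect_timeout": str(connect_timeout),
        "username": username,
        "password": password,
        "keyspace": keyspace,
    }

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Example addresses are placeholders, not real storage nodes.
section_text = build_storage_section(["10.0.0.11", "10.0.0.12"])
print(section_text)
```

The resulting text would then be pasted or appended into the cluster-specific configuration file before restarting opscenterd.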

Once this section is defined in your config file, restart opscenterd for the changes to take effect.

How it Works

OpsCenter will store all performance metrics, settings, and event logs in the data storage cluster and will no longer read from or write to the monitored cluster, although it will still establish Thrift and JMX connections to the monitored cluster in order to watch the cluster state.

Normally, each OpsCenter agent writes its performance metric data to the local node. When a separate data storage cluster is configured, each agent instead opens a small connection pool against the data storage cluster and stores the data there. This requires that each agent be able to reach nodes in the data storage cluster on the Thrift port.
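Before rolling this out, it can be worth confirming from each agent machine that the storage cluster's Thrift port is reachable. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and a successful TCP connection only shows that the port is open, not that Thrift authentication will succeed.

```python
import socket


def can_reach(host, port=9160, timeout=6.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def unreachable_nodes(hosts, port=9160, timeout=6.0):
    """Return the subset of hosts that could not be reached on the given port."""
    return [host for host in hosts if not can_reach(host, port, timeout)]
```

Running unreachable_nodes with the same addresses used for seed_hosts on each agent machine should return an empty list if the network path is open.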

Multi-Data Center Storage Clusters

Each OpsCenter agent will attempt to automatically discover and use nodes in the data storage cluster, but it avoids connecting to nodes in a remote data center where possible. For this reason, an agent will only use data storage cluster nodes in a data center with the same name as the data center of the local node that the agent is monitoring. If this default behavior doesn’t work for you, you will need to explicitly set an option like the following in the address.yaml file (located in /var/lib/opscenter-agent/conf/ for package installations of the agent) for each agent:

storage_dc: datacenter1

In this case, the agent would only connect to and use storage cluster nodes in the “datacenter1” data center.
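The selection rule described above can be sketched as a small function. This is an illustration of the behavior as described in this post, not the agent's actual implementation; the function name, the node addresses, and the data center names are all made up.

```python
def select_storage_nodes(storage_nodes, local_dc, storage_dc=None):
    """Pick the storage cluster nodes an agent would be allowed to use.

    storage_nodes maps a node address to its data center name.
    local_dc is the data center of the monitored node the agent runs on.
    storage_dc mirrors the optional storage_dc setting in address.yaml.
    """
    # An explicit storage_dc overrides the default "same name" matching.
    target_dc = storage_dc if storage_dc is not None else local_dc
    return sorted(addr for addr, dc in storage_nodes.items() if dc == target_dc)


# Hypothetical storage cluster spanning two data centers.
storage_nodes = {
    "10.1.0.1": "datacenter1",
    "10.1.0.2": "datacenter1",
    "10.2.0.1": "datacenter2",
}

same_name = select_storage_nodes(storage_nodes, local_dc="datacenter2")
no_match = select_storage_nodes(storage_nodes, local_dc="production")
overridden = select_storage_nodes(storage_nodes, local_dc="production",
                                  storage_dc="datacenter1")
```

The no_match case is the situation where the default behavior falls short: the monitored node's data center name does not exist in the storage cluster, so nothing is selected until storage_dc is set.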

Deployment Recommendations

Here are a few tips for deploying your data storage cluster:

Monitoring the Data Storage Cluster

By default, OpsCenter will not monitor the data storage cluster. To enable this, simply add the data storage cluster as another cluster to be monitored through the OpsCenter interface. If you plan to do this, note that OpsCenter will attempt to use the “OpsCenter” keyspace by default, so you will need to change the keyspace option’s value away from the default in the [storage_cassandra] section mentioned above.
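A quick sanity check for this situation might look like the sketch below, which parses the configuration text and reports whether the storage keyspace is still the default name. The function name and the sample keyspace name are hypothetical.

```python
import configparser

DEFAULT_KEYSPACE = "OpsCenter"


def storage_keyspace_conflicts(config_text):
    """Return True if [storage_cassandra] still uses the default keyspace name."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("storage_cassandra"):
        # No separate storage cluster is configured, so there is nothing to clash.
        return False
    keyspace = parser.get("storage_cassandra", "keyspace",
                          fallback=DEFAULT_KEYSPACE)
    return keyspace.strip() == DEFAULT_KEYSPACE


default_config = "[storage_cassandra]\nseed_hosts = 10.0.0.11\n"
renamed_config = ("[storage_cassandra]\nseed_hosts = 10.0.0.11\n"
                  "keyspace = OpsCenterProd\n")

print(storage_keyspace_conflicts(default_config))
print(storage_keyspace_conflicts(renamed_config))
```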

Using the opscenterd Machine for the Data Storage Cluster

We typically recommend that opscenterd be run on a dedicated machine to avoid unwanted interference with production cluster nodes. However, opscenterd typically does not consume much CPU, RAM, or disk, so also using that machine as a node in the data storage cluster should not cause any problems.

Sizing the Data Storage Cluster

A single-node data storage cluster should be sufficient to power OpsCenter when monitoring fairly large clusters (perhaps 50 to 100 nodes), but you may need to add a second or third node to handle larger clusters. Be sure to keep an eye on the data storage cluster’s performance metrics when the monitored cluster is large.

It’s also worth noting that you may want to have more than one node in the data storage cluster for availability and data durability purposes. If you increase the number of nodes in the data storage cluster, make sure to also update the OpsCenter keyspace’s replication strategy options to have the desired amount of replication.
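As one possible way to make that replication change, the sketch below builds a CQL 3 ALTER KEYSPACE statement that could be run in cqlsh against the data storage cluster. It assumes the storage cluster supports CQL 3 and that the keyspace uses the default “OpsCenter” name; the helper name and the replication factor of 2 are only examples.

```python
def alter_replication_cql(keyspace, strategy, options):
    """Build an ALTER KEYSPACE statement for the given replication settings."""
    settings = {"class": strategy}
    settings.update(options)

    def render(value):
        # Integers are written bare; everything else as a quoted CQL string.
        return str(value) if isinstance(value, int) else "'%s'" % value

    body = ", ".join("'%s': %s" % (key, render(value))
                     for key, value in settings.items())
    # Double quotes preserve the mixed-case keyspace name.
    quoted_keyspace = '"%s"' % keyspace.replace('"', '""')
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (quoted_keyspace, body)


statement = alter_replication_cql("OpsCenter", "SimpleStrategy",
                                  {"replication_factor": 2})
print(statement)
```

For a multi-data center storage cluster, the same helper could be called with "NetworkTopologyStrategy" and a per-data center option such as {"datacenter1": 2}.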

Try it Out

Separate data storage clusters are a feature of the Enterprise Edition of OpsCenter. You can download it today and try it for free on your development clusters.