DataStax OpsCenter Documentation

Retrieving Metric Data

Using the metric retrieval methods, you can retrieve performance metrics at the cluster, node, and column family levels.

Metric Retrieval Methods
  • Retrieve cluster-wide metrics -- GET /{cluster_id}/cluster-metrics/{dc}/{metric}
  • Retrieve cluster-wide metrics about a device -- GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device}
  • Retrieve cluster-wide metrics about a column family -- GET /{cluster_id}/cluster-metrics/{dc}/{ks_name}/{cf_name}/{metric}
  • Retrieve metrics about a node -- GET /{cluster_id}/metrics/{node_ip}/{metric}
  • Retrieve node-specific metrics about a device -- GET /{cluster_id}/metrics/{node_ip}/{metric}/{device}
  • Retrieve node-specific metrics about a column family -- GET /{cluster_id}/metrics/{node_ip}/{ks_name}/{cf_name}/{metric}

You can choose from a large number of metric keys to pass with these methods, which makes it possible to retrieve a wide spectrum of performance information.
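
The six URL patterns above differ only in scope (cluster-wide or single node) and in whether a device or a column family is addressed. As a rough illustration, the following Python sketch assembles the path for each pattern. The function name metric_path and its keyword arguments are hypothetical helpers, not part of OpsCenter, and percent-encoding each path segment is an assumption rather than documented behavior.

```python
from urllib.parse import quote


def metric_path(cluster_id, metric, *, dc=None, node_ip=None,
                ks_name=None, cf_name=None, device=None):
    """Build the path of a metric retrieval URL (hypothetical helper)."""
    if (dc is None) == (node_ip is None):
        raise ValueError("pass exactly one of dc or node_ip")
    if (ks_name is None) != (cf_name is None):
        raise ValueError("ks_name and cf_name must be passed together")
    if ks_name is not None and device is not None:
        raise ValueError("address a device or a column family, not both")

    # Cluster-wide methods use cluster-metrics/{dc};
    # single-node methods use metrics/{node_ip}.
    scope = ["cluster-metrics", dc] if dc is not None else ["metrics", node_ip]

    if ks_name is not None:
        tail = [ks_name, cf_name, metric]   # column family pattern
    elif device is not None:
        tail = [metric, device]             # device pattern
    else:
        tail = [metric]                     # plain metric pattern

    # Assumption: each segment is percent-encoded on its own.
    segments = [cluster_id, *scope, *tail]
    return "/" + "/".join(quote(part, safe="") for part in segments)


print(metric_path("Test_Cluster", "write-ops", dc="all"))
# prints /Test_Cluster/cluster-metrics/all/write-ops
```

Keyword-only arguments keep the two scopes apart, so a call cannot mix a data center name with a node IP address by accident.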

Filtering the Metric Data Output

You can also use the following query parameters with these methods to filter the output:

Query parameters:
  • start (optional) -- A Unix timestamp in seconds indicating the beginning of a range for aggregating the metric.
  • end (optional) -- A Unix timestamp in seconds indicating the end of a range for aggregating the metric.
  • step (optional) -- The resolution of the data points. Valid input options are 1, 5, 120, or 1440 minutes; the corresponding output intervals are 60, 300, 7200, or 86400 seconds.
  • function (optional) -- The type of aggregation to perform on the metric: min, max, or average. By default, results are returned for all three types of aggregation.
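
The start and end values are easy to get wrong when they are computed by hand. The following Python sketch shows one way to build the query string from timezone-aware datetimes while checking step and function against the values listed above. The names to_epoch and metric_query are hypothetical, and the client-side validation is an illustration only; it does not describe how OpsCenter itself reacts to invalid input.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

STEP_MINUTES = (1, 5, 120, 1440)          # valid step inputs, in minutes
FUNCTIONS = ("min", "max", "average")     # valid aggregation functions


def to_epoch(moment):
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int(moment.timestamp())


def metric_query(start=None, end=None, step=None, function=None):
    """Build the query string; every parameter is optional."""
    params = {}
    if start is not None:
        params["start"] = to_epoch(start)
    if end is not None:
        params["end"] = to_epoch(end)
    if step is not None:
        if step not in STEP_MINUTES:
            raise ValueError(f"step must be one of {STEP_MINUTES}")
        params["step"] = step
    if function is not None:
        if function not in FUNCTIONS:
            raise ValueError(f"function must be one of {FUNCTIONS}")
        params["function"] = function
    return urlencode(params)


query = metric_query(
    start=datetime(2012, 5, 1, 8, tzinfo=timezone.utc),
    end=datetime(2012, 5, 1, 17, tzinfo=timezone.utc),
    step=120,
    function="average",
)
print(query)
# prints start=1335859200&end=1335891600&step=120&function=average
```

Rejecting naive datetimes avoids silently interpreting a time in the local zone of the machine that runs the script.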

Results of calls to retrieve metrics are returned in the following format:

{
  [<node_ip>: | <device>: | <keyspace.columnfamily>:]
    {
      <function>:
        [
          [<timestamp>, <value>],
               ...
        ]
    }
}

By default, the output is metric data points at 60-second intervals over a 24-hour period.
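
A client usually needs to walk this nested structure to reach the individual data points. The following Python sketch flattens a result into rows of series name, function, time, and value. The generator name iter_points is hypothetical, and the sample payload is a shortened copy of output shown elsewhere in this document. Values can be null in the JSON, which Python reads as None.

```python
import json
from datetime import datetime, timezone


def iter_points(payload):
    """Yield (series, function, when, value) for every data point."""
    for series, by_function in payload.items():
        for function, points in by_function.items():
            for timestamp, value in points:
                # Timestamps are seconds since the epoch; convert to UTC.
                when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
                yield series, function, when, value


sample = json.loads("""
{"Total": {"AVERAGE": [[1335859200, null],
                       [1335866400, 13.376885890960693]]}}
""")

rows = list(iter_points(sample))
for series, function, when, value in rows:
    print(series, function, when.isoformat(), value)
```

The outer key is a node IP address, a device name, a keyspace.columnfamily name, or Total, depending on the method that was called, so the sketch treats it as an opaque series name.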

GET /{cluster_id}/cluster-metrics/{dc}/{metric}

Aggregate a metric across multiple nodes in the cluster rather than retrieving data about a single node.

Path arguments:
  • cluster_id -- A key, which identifies the node's cluster, in the dictionary returned by GET /cluster-configs.
  • dc -- The name of the data center for the nodes. Use the name all to aggregate a metric across all data centers.
  • metric -- One of the Cluster Metrics Keys.
Query params:
  • parameters -- The parameters listed in Filtering the Metric Data Output.

Returns metric data across multiple nodes in a cluster.

Example

Get the average write requests per second to the cluster over all data centers on May 1, 2012 from 8 AM to 5 PM GMT. Show data points at 2-hour (120-minute) intervals.

curl \
  http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/write-ops \
    -d 'step=120' \
    -d 'start=1335859200' \
    -d 'end=1335891600' \
    -d 'function=average' \
    -G -X GET

Output:

Data points at 2-hour (7200-second) intervals show the average number of write requests per second during business hours on May 1.

{
  "Total": {
    "AVERAGE": [
      [
        1335859200,
        null
      ],
      [
        1335866400,
        13.376885890960693
      ],
      [
        1335873600,
        13.372154712677002
      ],
      [
        1335880800,
        13.365732669830322
      ],
      [
        1335888000,
        13.392115592956543
      ]
    ]
  }
}
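
The same request can be issued from a script. The following Python sketch uses only the standard library to build and send it. The helper names build_request and fetch_metric are hypothetical, the Accept header is an assumption, and authentication is not covered. The host, port, and cluster name are taken from the curl example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url, path, params):
    """Create a GET request with the filters encoded in the query string."""
    url = f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
    return Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_metric(request, timeout=10):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request(
    "http://127.0.0.1:8888",
    "/Test_Cluster/cluster-metrics/all/write-ops",
    {"step": 120, "start": 1335859200, "end": 1335891600, "function": "average"},
)
print(request.full_url)
# To run the call against a live OpsCenter instance:
# data = fetch_metric(request)
```

Keeping request construction separate from the network call makes it possible to inspect or log the URL before anything is sent.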

GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device}

Aggregate a disk or network metric, which pertains to a specific device, across multiple nodes in the cluster rather than retrieving data about a single node.

Path arguments:
  • cluster_id -- A key, which identifies the node's cluster, in the dictionary returned by GET /cluster-configs.
  • dc -- The name of the data center for the nodes. Use the name all to aggregate a metric across all data centers.
  • metric -- One of the Cluster Metrics Keys or Operating System Metrics Keys.
  • device -- The device to be measured, which the Node object lists. Use the name all to measure all devices, for example all disk devices associated with a disk metric.
Query params:
  • parameters -- The parameters listed in Filtering the Metric Data Output.

Examples of Device Arguments

Example devices:
  • "network_interfaces": ["lo0", "en1"] -- Devices measured by network metrics
  • "devices": {"saved_caches": "disk1", "commitlog": "disk1", "other": ["disk0"], "data": ["disk1"]} -- Devices measured by disk metrics
  • "partitions": {"saved_caches": "/dev/disk1s2", "commitlog": "/dev/disk1s2", "other": ["/dev/disk0s2"], "data": ["/dev/disk1s2"]} -- Devices measured by partition metrics

Using a partition, network interface, or other device name for the device argument returns disk or network metric data about that specific device across multiple nodes. Using all for the device name returns a dictionary whose keys are device names and whose values are the results for each device.
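
The device and partition examples above mix single names with lists of names, so a client has to normalize them before using them as device arguments. The following Python sketch collects the distinct names from such a mapping. The function name device_names is hypothetical, and the input shape is assumed from the examples above only, not from a full description of the Node object.

```python
def device_names(mapping):
    """Return the distinct device names found in a devices or partitions mapping."""
    names = set()
    for value in mapping.values():
        if isinstance(value, str):
            names.add(value)        # a single device, such as "disk1"
        else:
            names.update(value)     # a list of devices, such as ["disk0"]
    return sorted(names)


devices = {"saved_caches": "disk1", "commitlog": "disk1",
           "other": ["disk0"], "data": ["disk1"]}
print(device_names(devices))
# prints ['disk0', 'disk1']
```

Each returned name is a candidate value for the device path argument; the name all remains available when every device should be measured.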

Example

Get the daily average disk space used (in GB) by the cluster on all disks in all data centers from April 11, 2012 00:00:00 to April 26, 2012 00:00:00 GMT.

curl \
  http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/os-disk-used/all \
    -d 'step=1440' \
    -d 'start=1334102400' \
    -d 'end=1335398400' \
    -d 'function=average' \
    -G -X GET

Output:

{
  "Total": {
    "AVERAGE": [
      [
        1334102400,
        null
      ],
      [
        1334188800,
        21.000694274902344
      ],
      [
        1334275200,
        8.736943244934082
      ],
      [
        1334361600,
        9.0
      ],
      [
        1334448000,
        19.0
      ],
      [
        1334534400,
        19.0
      ],
      [
        1334620800,
        19.0
      ],
      [
        1334707200,
        19.0
      ],
      [
        1334793600,
        18.629029273986816
      ],
      [
        1334880000,
        19.923184394836426
      ],
      [
        1334966400,
        25.0
      ],
      [
        1335052800,
        25.0
      ],
      [
        1335139200,
        25.923053741455078
      ],
      [
        1335225600,
        26.0
      ],
      [
        1335312000,
        26.549484252929688
      ]
    ]
  }
}

GET /{cluster_id}/cluster-metrics/{dc}/{ks_name}/{cf_name}/{metric}

Aggregate a column family metric across multiple nodes in the cluster rather than retrieving data about a single node.

Path arguments:
  • cluster_id -- A key, which identifies the node's cluster, in the dictionary returned by GET /cluster-configs.
  • dc -- The name of the data center for the nodes. Use the name all to aggregate a metric across all data centers.
  • ks_name -- The keyspace that contains the column family to be measured.
  • cf_name -- The column family to be measured.
  • metric -- One of the Column Family Metrics Keys.
Query params:
  • parameters -- The parameters listed in Filtering the Metric Data Output.

Returns metric data for multiple nodes.

Example

Get the maximum disk space (in bytes) used for live data by the rollups60 column family in the OpsCenter keyspace of the cluster over all data centers from May 1, 2012 00:00:00 to May 5, 2012 00:00:00 GMT.

curl http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/OpsCenter/rollups60/cf-live-disk-used \
  -d 'function=max' \
  -d 'start=1335830400' \
  -d 'end=1336176000' \
  -d 'step=1440' \
  -G -X GET

Output:

Data points at 24-hour intervals show the metrics for the period.

{
  "Total": {
    "MAX": [
      [
        1335830400,
        9740462592.0
      ],
      [
        1335916800,
        9932527616.0
      ],
      [
        1336003200,
        null
      ],
      [
        1336089600,
        10644448512.0
      ]
    ]
  }
}
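
A series can contain null values, as the third data point above shows. The following Python sketch summarizes one series while keeping track of the intervals that have no value. The function name summarize is hypothetical, and the points are copied from the output above.

```python
def summarize(points):
    """Summarize [timestamp, value] pairs, tolerating null (None) values."""
    values = [value for _, value in points if value is not None]
    missing = [timestamp for timestamp, value in points if value is None]
    return {
        "count": len(values),                     # data points with a value
        "missing": missing,                       # timestamps without a value
        "peak": max(values) if values else None,  # largest reported value
    }


points = [
    [1335830400, 9740462592.0],
    [1335916800, 9932527616.0],
    [1336003200, None],
    [1336089600, 10644448512.0],
]
summary = summarize(points)
print(summary)
```

Reporting the missing timestamps separately, instead of treating them as zero, keeps gaps in collection from distorting the result.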

GET /{cluster_id}/metrics/{node_ip}/{metric}

Retrieve metric data for a single node.

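Path arguments:
  • cluster_id -- A key, which identifies the node's cluster, in the dictionary returned by GET /cluster-configs.
  • node_ip -- IP address of the target Node.
  • metric -- One of the keys listed in Metrics Attribute Key Lists.
Query params:
  • parameters -- The parameters listed in Filtering the Metric Data Output.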

Returns metric data for a single node.

Example

Get the daily average data load on cluster node 10.11.12.150 from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT.

curl \
  http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/data-load \
    -d 'step=1440' \
    -d 'start=1334880000' \
    -d 'end=1335398400' \
    -d 'function=average' \
    -G -X GET

Output:

{
  "10.11.12.150": {
    "AVERAGE": [
      [
        1334880000,
        null
      ],
      [
        1334966400,
        6353770496.0
      ],
      [
        1335052800,
        6560092672.0
      ],
      [
        1335139200,
        6019291136.0
      ],
      [
        1335225600,
        6149050880.0
      ],
      [
        1335312000,
        6271239680.0
      ]
    ]
  }
}

GET /{cluster_id}/metrics/{node_ip}/{metric}/{device}

Aggregate a disk or network metric for a single node.

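Path arguments:
  • cluster_id -- A key, which identifies the node's cluster, in the dictionary returned by GET /cluster-configs.
  • node_ip -- IP address of the target Node.
  • metric -- A disk or network metric key from the Metrics Attribute Key Lists.
  • device -- The device to be measured, which the Node object lists. Use the name all to measure all devices.
Query params:
  • parameters -- The parameters listed in Filtering the Metric Data Output.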

Returns disk or network metrics data for a single node.

Example

Get the maximum disk space used (in GB) on all disks of cluster node 10.11.12.150 from April 30, 2012 22:05:00 to May 1, 2012 08:00:00 GMT.

curl \
  http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/os-disk-used/all \
    -d 'start=1335823500' \
    -d 'end=1335859200' \
    -d 'step=120' \
    -d 'function=max' \
    -G -X GET

Output:

Data points at 2-hour intervals show the maximum disk space used on device /dev/sda1.

{
  "/dev/sda1": {
    "MAX": [
      [
        1335823200,
        null
      ],
      [
        1335830400,
        17.0
      ],
      [
        1335837600,
        16.0
      ],
      [
        1335844800,
        17.0
      ],
      [
        1335852000,
        16.0
      ]
    ]
  }
}
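
In this output the first timestamp (1335823200) is earlier than the requested start (1335823500), and no data point is returned at the end value. The example outputs in this document are consistent with timestamps that fall on multiples of the step interval, beginning at or just before start and stopping before end. The following Python sketch reproduces that pattern. It is inferred from the examples only and is not a documented guarantee; the function name expected_timestamps is hypothetical.

```python
# Output interval in seconds for each valid step input in minutes.
STEP_SECONDS = {1: 60, 5: 300, 120: 7200, 1440: 86400}


def expected_timestamps(start, end, step_minutes):
    """List the timestamps the examples suggest for a given range and step."""
    step = STEP_SECONDS[step_minutes]
    first = start - start % step           # round start down to a step boundary
    return list(range(first, end, step))   # the end value itself is excluded


stamps = expected_timestamps(1335823500, 1335859200, 120)
print(stamps)
# prints [1335823200, 1335830400, 1335837600, 1335844800, 1335852000]
```

A helper like this can be used to check that a response contains the number of data points a client expects before further processing.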

GET /{cluster_id}/metrics/{node_ip}/{ks_name}/{cf_name}/{metric}

Retrieve metric data about a column family on a single node.

Path arguments:
  • cluster_id -- A key, which identifies the node's cluster, in the dictionary returned by GET /cluster-configs.
  • node_ip -- IP address of the target Node.
  • ks_name -- The keyspace that contains the column family to be measured.
  • cf_name -- The column family to be measured.
  • metric -- One of the Column Family Metrics Keys.
Query params:
  • parameters -- The parameters listed in Filtering the Metric Data Output.

Example

Get the daily maximum response time (in microseconds) to write requests on the rollups60 column family in the OpsCenter keyspace by cluster node 10.11.12.150 from May 1, 2012 00:00:00 to May 5, 2012 00:00:00 GMT.

curl \
  http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/OpsCenter/rollups60/cf-write-latency-op \
    -d 'function=max' \
    -d 'start=1335830400' \
    -d 'end=1336176000' \
    -d 'step=1440' \
    -G -X GET

Output:

{
  "OpsCenter.rollups60": {
    "MAX": [
      [
        1335830400,
        102.28681945800781
      ],
      [
        1335916800,
        124.86614227294922
      ],
      [
        1336003200,
        null
      ],
      [
        1336089600,
        127.14733123779297
      ]
    ]
  }
}

Metrics Attribute Key Lists

This section contains the following tables of metric keys to use with resources that retrieve OpsCenter performance data:
  • Cluster Metrics Keys
  • Column Family Metrics Keys
  • Operating System Metrics Keys

Cluster Metrics Keys

This list of keys corresponds to Cassandra metrics collected by OpsCenter:

Key Units Description
data-load Bytes Size of the data on the node.
pending-compaction-tasks -- Number of compaction operations queued and waiting to run.
pending-flush-sorter-tasks -- Number of pending tasks related to the first step in flushing memtables to disk as SSTables.
pending-flushes -- Number of memtables queued for the flush process.
pending-gossip-tasks -- Number of gossip messages and acknowledgments queued and waiting to be sent or received.
pending-hinted-handoff -- Number of hints in the queue waiting to be delivered after a failed node comes up.
pending-internal-responses -- Number of pending tasks from internal tasks, such as nodes joining and leaving the cluster.
pending-memtable-post-flushers -- Number of pending tasks related to the last step in flushing memtables to disk as SSTables.
pending-migrations -- Number of pending tasks from system methods that modified the schema.
pending-misc-tasks -- Number of pending tasks from infrequently run operations, not measured by another metric.
pending-read-ops -- Number of read requests received by the cluster and waiting to be handled.
pending-read-repair-tasks -- Number of read repair operations in the queue waiting to run.
pending-repair-tasks -- Manual repair tasks pending, operations to be completed during anti-entropy repair of a node.
pending-repl-on-write-tasks -- Pending tasks related to replication of data after an insert or update to a row.
pending-request-responses -- Progress of streamed rows from the receiving node.
pending-streams -- Progress of streamed rows from the sending node.
pending-write-ops -- Number of write requests received by the cluster and waiting to be handled.
read-latency-op microseconds Average response time to a client read request.
read-ops -- The number of read requests per second.
write-latency-op microseconds The average response time to a client write request.
write-ops -- The write requests per second.

Column Family Metrics Keys

This list of keys corresponds to column family-specific metrics collected by OpsCenter:

Key Units Description
cf-keycache-hit-rate % Cache requests that resulted in a key cache hit.
cf-keycache-hits -- Number of read requests that resulted in the requested row key being found in the key cache.
cf-keycache-requests -- Total number of read requests on the key cache.
cf-live-disk-used bytes Disk space used by a column family for readable data.
cf-live-sstables -- Current number of SSTables for a column family.
cf-pending-tasks -- Number of pending reads and writes on a column family.
cf-read-latency-op microseconds Internal response time to a successful request to read data from a column family.
cf-read-ops -- Read requests per second on a column family.
cf-rowcache-hit-rate % Cache requests that resulted in a row cache hit.
cf-rowcache-hits -- Number of read requests that resulted in a row cache hit.
cf-rowcache-requests -- Total number of read requests on the row cache.
cf-total-disk-used -- Disk space used by a column family for live or old data (not live).
cf-write-latency-op microseconds Internal response time to a successful request to write data to a column family.
cf-write-ops -- Write requests per second on a column family.

Operating System Metrics Keys

This list of keys corresponds to operating system (OS) metrics collected by OpsCenter:

Key OS Units Description
heap-committed all* bytes Allocated memory guaranteed for the Java heap.
heap-max all* bytes Maximum amount that the Java heap can grow.
heap-used all* bytes Average amount of Java heap memory used by Cassandra processes.
nonheap-committed all* bytes Allocated memory, guaranteed for Java nonheap.
nonheap-max all* bytes Maximum amount that the Java nonheap can grow.
nonheap-used all* bytes Average amount of Java nonheap memory used by Cassandra processes.
os-cpu-idle all* % Time the CPU is idle.
os-cpu-iowait Linux % Time the CPU devotes to waiting for I/O to complete.
os-cpu-nice Linux % Time the CPU devotes to processing nice tasks.
os-cpu-privileged Windows % Time the CPU devotes to processing privileged instructions.
os-cpu-steal Linux % Time the CPU devotes to tasks stolen by virtual operating systems.
os-cpu-system Linux, OSX % Time the CPU devotes to system processes.
os-cpu-user all* % Time the CPU devotes to user processes.
os-disk-await Linux, Windows ms Average completion time of each request to the disk.
os-disk-free all* GB Free space on a specific disk partition.
os-disk-queue-size Linux, Windows -- Average number of requests queued due to disk latency issues.
os-disk-read-rate Linux, Windows -- Rate of reads per second to the disk.
os-disk-read-throughput Linux, Windows mb/sec Average disk throughput for read operations.
os-disk-request-size Linux sectors Average size of read requests issued to the disk.
os-disk-request-size-kb Windows KB Average size of read requests issued to the disk.
os-disk-throughput OSX mb/sec Average disk throughput for read and write operations.
os-disk-usage all* % Disk space used by Cassandra at a given time.
os-disk-used all* GB Disk space used by Cassandra at a given time.
os-disk-utilization Linux, Windows % CPU time consumed by disk I/O.
os-disk-write-rate Linux, Windows -- Rate of writes per second to the disk.
os-disk-write-throughput Linux, Windows mb/sec Average disk throughput for write operations.
os-load all* -- Minimum, average, and maximum OS load expressed as an integer.
os-memory-avail Windows MB Available physical memory.
os-memory-buffers Linux MB Total system memory currently buffered.
os-memory-cached Linux MB Total system memory currently cached.
os-memory-committed Windows MB Memory in use by the operating system.
os-memory-free Linux, OSX MB Total system memory currently free.
os-memory-pool-nonpaged Windows MB Allocated pool-nonpaged memory.
os-memory-pool-paged Windows MB Allocated pool-paged-resident memory.
os-memory-sys-cache-resident Windows MB Memory used by the file cache.
os-memory-used Linux, OSX MB Total system memory currently used.
os-net-received all* kb/sec Speed of data received from the network.
os-net-sent all* kb/sec Speed of data sent across the network.
solr-avg-time-per-req all* -- Average time a search query takes in a DSE cluster using DSE search.
solr-errors all* -- Errors per second that occur for a specific Solr core/index.
solr-requests all* -- Requests per second made to a specific Solr core/index.
solr-timeouts all* -- Timeouts per second on a specific Solr core/index.
* all means Linux, OSX, and Windows operating systems.
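
Because several operating system keys are available only on some platforms, a client may want to check a key before requesting it. The following Python sketch encodes a small subset of the OS column from the table above and lists the keys that apply to a given operating system. The names METRIC_OS and metrics_for are hypothetical, and the mapping is deliberately incomplete.

```python
ALL_OS = frozenset({"Linux", "OSX", "Windows"})   # the meaning of all* in the table

# A subset of the Operating System Metrics Keys and where they are collected.
METRIC_OS = {
    "os-cpu-idle": ALL_OS,
    "os-cpu-iowait": frozenset({"Linux"}),
    "os-cpu-privileged": frozenset({"Windows"}),
    "os-cpu-system": frozenset({"Linux", "OSX"}),
    "os-disk-await": frozenset({"Linux", "Windows"}),
    "os-disk-throughput": frozenset({"OSX"}),
    "os-disk-used": ALL_OS,
}


def metrics_for(os_name):
    """Return the keys from METRIC_OS that are collected on os_name."""
    return sorted(key for key, systems in METRIC_OS.items() if os_name in systems)


print(metrics_for("OSX"))
# prints ['os-cpu-idle', 'os-cpu-system', 'os-disk-throughput', 'os-disk-used']
```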