DataStax OpsCenter Documentation

Managing Events and Alerts

Use these methods to retrieve events logged by OpsCenter, such as node compactions and repairs triggered through OpsCenter, and to configure alert thresholds for Cassandra metrics.

Event and Alert Methods            URL
Retrieve OpsCenter events.         GET /{cluster_id}/events
Alert Methods
Retrieve configured alert rules.   GET /{cluster_id}/alert-rules
Retrieve a specific alert rule.    GET /{cluster_id}/alert-rules/{alert_id}
Create a new alert rule.           POST /{cluster_id}/alert-rules
Update an alert rule.              PUT /{cluster_id}/alert-rules/{alert_id}
Delete an alert rule.              DELETE /{cluster_id}/alert-rules/{alert_id}
Retrieve active alerts.            GET /{cluster_id}/alerts/fired

Event Methods

GET /{cluster_id}/events

Retrieve historical events logged by OpsCenter.

Path arguments:

cluster_id -- A Cluster Config ID.

Query params:
  • count -- The number of events to return. Defaults to 10.
  • timestamp -- The point in time from which to start retrieving events. Specified as a Unix timestamp in microseconds. Defaults to the current time.
  • reverse -- A boolean (0 or 1) indicating whether to retrieve events in reverse order. Defaults to 1 (true). Events are retrieved starting from the time specified by the timestamp and going backward in time until 'count' events are found or there are no more events to retrieve.

Returns a list of dictionaries where each dictionary represents an event. An event dictionary contains properties describing that event.

Example

curl "http://127.0.0.1:8888/Test_Cluster/events?count=1"

Output:

{
  "action": 28,
  "api_source_ip": "192.168.1.12",
  "event_source": "OpsCenter",
  "level": 1,
  "level_str": "INFO",
  "message": "Restarting node 192.168.100.3",
  "source_node": "192.168.100.3",
  "success": null,
  "target_node": null,
  "time": "1334768517145625",
  "user": "joe"
}
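
The query parameters described above can also be assembled programmatically. The following is a minimal Python sketch, not part of OpsCenter itself. The helper names (events_url, to_microseconds, event_time, fetch_events) are hypothetical, the base URL is taken from the curl example, and reading the event's "time" field as microseconds since the Unix epoch is an assumption based on the timestamp parameter's unit.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8888"  # assumption: host and port from the curl example
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def events_url(cluster_id, count=10, timestamp_us=None, reverse=True):
    """Build the URL for GET /{cluster_id}/events with its query params."""
    params = {"count": count, "reverse": 1 if reverse else 0}
    if timestamp_us is not None:
        # Leaving timestamp out lets the server default to the current time.
        params["timestamp"] = timestamp_us
    return f"{BASE_URL}/{cluster_id}/events?{urlencode(params)}"


def to_microseconds(moment):
    """Convert a timezone-aware datetime to a Unix timestamp in microseconds."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(microseconds=1)


def event_time(event):
    """Interpret an event's "time" field (assumed to be microseconds) as UTC."""
    return EPOCH + timedelta(microseconds=int(event["time"]))


def fetch_events(cluster_id, **params):
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(events_url(cluster_id, **params)) as response:
        return json.load(response)
```

For example, events_url("Test_Cluster", count=1) produces the same request as the curl command above, with the default reverse=1 spelled out explicitly.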

Alert Methods

GET /{cluster_id}/alert-rules

Retrieve a list of configured alert rules in OpsCenter.

Path arguments:

cluster_id -- A Cluster Config ID.

Returns a list of AlertRule objects.

AlertRule
{
  "id": <value>,
  "type": <value>,
  "threshold": <value>,
  "comparator": <value>,
  "duration": <value>,
  "notify_interval": <value>,
  "enabled": <value>,
  "metric": <value>,
  "cf": <value>,
  "item": <value>,
  "dc": <value>
}

This table describes the property values of an AlertRule object:

Property         Type    Description of Values
id               String  A unique ID that references an alert rule. Use only for retrieving alert rules.
type             String  The event or metric aggregation that triggers an alert. Accepted values include rolling-avg, cluster-balance, and node-down. This field is not editable.
threshold        Float   The metric boundary; an alert triggers when the metric crosses it. Applicable only when the type is rolling-avg.
comparator       String  Optional. Values are < or >.
duration         Int     How long (in minutes) the problem must continue before the alert fires.
notify_interval  Int     How often (in minutes) to repeat the alert. Use 0 for a single notification.
enabled          Int     The state of the alert. Values are 0 (disabled) or 1 (enabled).
metric           String  A key from the list of metrics. Valid only if the type is rolling-avg.
cf               String  Optional. The column family to monitor if the metric property is one of the Column Family Metrics Keys.
item             String  Optional. The device to monitor if the metric is one of the Operating System Metrics Keys.
dc               String  Optional. The name of the data center that contains the nodes to monitor. If omitted, all nodes are monitored.
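
The constraints in this table can be checked on the client before a rule is sent. The sketch below is illustrative only: validate_alert_rule is a hypothetical helper, it covers just the constraints the table states explicitly, and it makes no claim about which checks OpsCenter itself performs or how it reports errors.

```python
def validate_alert_rule(rule):
    """Return a list of problems found in an AlertRule-style dictionary.

    Only constraints stated in the property table are checked. This is a
    client-side convenience, not a description of server-side validation.
    """
    problems = []

    # comparator is optional, but when present it must be < or >.
    if "comparator" in rule and rule["comparator"] not in ("<", ">"):
        problems.append("comparator must be < or >")

    # enabled is 0 (disabled) or 1 (enabled).
    if "enabled" in rule and rule["enabled"] not in (0, 1):
        problems.append("enabled must be 0 or 1")

    # metric and threshold apply only to rolling-avg rules.
    if rule.get("type") != "rolling-avg":
        for field in ("metric", "threshold"):
            if field in rule:
                problems.append(f"{field} applies only to rolling-avg rules")

    return problems
```

An empty list means the dictionary passed these checks; it does not guarantee that the server accepts the rule.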

Example

curl http://127.0.0.1:8888/Test_Cluster/alert-rules

Output:

[
  {
    "comparator": ">",
    "dc": "us-east",
    "duration": 1.0,
    "enabled": 1,
    "id": "e0c356c7-62ff-4aa8-9b17-e305f101b69a",
    "metric": "write-latency",
    "notify_interval": 1.0,
    "threshold": 10000.0,
    "type": "rolling-avg"
  },
  ...
]

GET /{cluster_id}/alert-rules/{alert_id}

Retrieve a specific alert rule.

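Path arguments:

cluster_id -- A Cluster Config ID.
alert_id -- A unique ID that references the alert rule to retrieve.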

Returns an AlertRule.

Example

curl http://127.0.0.1:8888/Test_Cluster/alert-rules/e0c356c7-62ff-4aa8-9b17-e305f101b69a

Output:

{
  "comparator": ">",
  "dc": "us-east",
  "duration": 1.0,
  "enabled": 1,
  "id": "e0c356c7-62ff-4aa8-9b17-e305f101b69a",
  "metric": "write-latency",
  "notify_interval": 1.0,
  "threshold": 10000.0,
  "type": "rolling-avg"
}

POST /{cluster_id}/alert-rules

Create a new alert rule.

Path arguments:

cluster_id -- A Cluster Config ID.

Body:

A dictionary in the format of AlertRule describing the alert to create.

Responses:

201 -- Alert rule was created successfully

Returns the ID of the newly created alert.
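
As a rough illustration of the request described above, the following Python sketch builds the POST with the standard library. The function names are hypothetical, the base URL is taken from the curl examples in this document, and the Content-Type header is an assumption, since the document does not say which content type the server expects.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8888"  # assumption: host and port from the curl examples


def build_create_request(cluster_id, rule):
    """Build, without sending, a POST request that creates an alert rule."""
    return urllib.request.Request(
        f"{BASE_URL}/{cluster_id}/alert-rules",
        data=json.dumps(rule).encode("utf-8"),
        method="POST",
        # Assumption: the body is sent as JSON; this header is not documented here.
        headers={"Content-Type": "application/json"},
    )


def create_alert_rule(cluster_id, rule):
    """Send the request and return the new rule ID (requires a running server)."""
    with urllib.request.urlopen(build_create_request(cluster_id, rule)) as response:
        if response.status != 201:
            raise RuntimeError(f"unexpected status {response.status}")
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the method, URL, and body without a live OpsCenter instance.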

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/alert-rules \
  -d '{
    "comparator": ">",
    "dc": "",
    "duration": 60.0,
    "enabled": 1,
    "metric": "heap-used",
    "notify_interval": 5.0,
    "threshold": 6291456000.0,
    "type": "rolling-avg"
  }'

Output:

"b375fd3e-3908-4be5-ae37-d8f3b8699a9f"

PUT /{cluster_id}/alert-rules/{alert_id}

Update an existing alert rule.

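Path arguments:

cluster_id -- A Cluster Config ID.
alert_id -- A unique ID that references the alert rule to update.

Body:

A dictionary of fields from AlertRule to update.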

Responses:

200 -- Alert rule updated successfully

Example:

curl -X PUT \
  http://127.0.0.1:8888/Test_Cluster/alert-rules/b375fd3e-3908-4be5-ae37-d8f3b8699a9f \
  -d '{"duration": 120.0}'
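
Because the PUT body holds only the fields to update, a client can compute it by comparing the current rule with the desired one. The sketch below is one possible approach, not an OpsCenter feature: changed_fields is a hypothetical helper, and treating "id" as read-only alongside "type" is an assumption drawn from the AlertRule property descriptions.

```python
import json

# "type" is documented as not editable. Treating "id" the same way is an
# assumption based on its description ("Use only for retrieving alert rules").
READ_ONLY_FIELDS = {"id", "type"}


def changed_fields(current, desired):
    """Return only the editable fields whose values differ from the current rule."""
    changes = {}
    for key, value in desired.items():
        if key in READ_ONLY_FIELDS:
            continue  # never include read-only fields in an update body
        if current.get(key) != value:
            changes[key] = value
    return changes


current_rule = {"id": "b375fd3e-3908-4be5-ae37-d8f3b8699a9f",
                "type": "rolling-avg", "duration": 60.0, "enabled": 1}
desired_rule = dict(current_rule, duration=120.0)

# The resulting JSON string can be passed as the PUT request body.
put_body = json.dumps(changed_fields(current_rule, desired_rule))
```

With the sample rules above, put_body matches the body used in the curl example.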

DELETE /{cluster_id}/alert-rules/{alert_id}

Delete an existing alert rule.

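Path arguments:

cluster_id -- A Cluster Config ID.
alert_id -- A unique ID that references the alert rule to delete.

Responses: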

200 -- Alert rule removed successfully

Example:

curl -X DELETE \
  http://127.0.0.1:8888/Test_Cluster/alert-rules/b375fd3e-3908-4be5-ae37-d8f3b8699a9f

GET /{cluster_id}/alerts/fired

Retrieve all alerts that are currently firing.

Path arguments:

cluster_id -- A Cluster Config ID.

Returns a list of alerts that have been triggered. Each item in the list is a dictionary describing the triggered alert.

Example:

curl http://127.0.0.1:8888/Test_Cluster/alerts/fired

Output:

[
  {
    "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
    "current_value": 31676303.333333332,
    "first_fired": 1336669233,
    "node": "10.11.12.150"
  },
  {
    "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
    "current_value": 28380117.5,
    "first_fired": 1336669233,
    "node": "10.11.12.152"
  }
]
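
A response like the one above often needs to be grouped by rule before it is shown to an operator. The following Python sketch shows one way to do that. The helper names are hypothetical, and reading first_fired as a Unix timestamp in seconds is an assumption based on the magnitude of the sample values, not something this document states.

```python
from collections import defaultdict
from datetime import datetime, timezone


def nodes_by_rule(fired_alerts):
    """Map each alert_rule_id to the list of nodes on which it has fired."""
    grouped = defaultdict(list)
    for alert in fired_alerts:
        grouped[alert["alert_rule_id"]].append(alert["node"])
    return dict(grouped)


def first_fired_utc(alert):
    """Interpret first_fired as Unix seconds (an assumption) and return UTC time."""
    return datetime.fromtimestamp(alert["first_fired"], tz=timezone.utc)


# Sample data copied from the output above.
fired = [
    {"alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
     "current_value": 31676303.333333332,
     "first_fired": 1336669233, "node": "10.11.12.150"},
    {"alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
     "current_value": 28380117.5,
     "first_fired": 1336669233, "node": "10.11.12.152"},
]

for rule_id, nodes in nodes_by_rule(fired).items():
    print(rule_id, "fired on", ", ".join(nodes))
```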