Technology
August 22, 2012

Integrating Alerts via the OpsCenter REST API

Mike Bulman

The Alerts feature in DataStax OpsCenter notifies users when specific metrics exceed configurable thresholds. While OpsCenter ships with two push notification plugins (email and custom URL), we've heard from some users that they would rather fetch active alerts in a custom script so they can integrate with systems they already have in place. In this post, I'm going to walk you through retrieving this information with Python via the OpsCenter REST API.

Note: These steps assume you have already set up alert rules in OpsCenter and are working with a cluster called MyCluster.

Getting Started

The first thing we need is a way to talk to the API.

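import json
import urllib

def getUrl(path):
    # Build the full URL for an API path on the local OpsCenter instance.
    url = 'http://localhost:8888/%s' % path
    # Fetch the response body (urllib.urlopen is the Python 2 API).
    contents = urllib.urlopen(url).read()
    # Decode the JSON body into native Python types.
    return json.loads(contents)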

This fetches any URL from the API and parses the returned JSON into a native Python type. Here's a simple example that uses it to get the load for a single node:

load = getUrl('MyCluster/nodes/127.0.0.1/load')
print load
# Prints: 0.32
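
The helper above uses urllib.urlopen, which exists only in Python 2. For readers on Python 3, a roughly equivalent sketch is shown below. The get_url name, the BASE_URL constant, and the opener parameter (which lets the HTTP call be swapped out in tests) are additions for this example and are not part of the original script.

```python
import json
from urllib.request import urlopen  # Python 3 home of urlopen

# Same host and port as the helper above; adjust for your installation.
BASE_URL = 'http://localhost:8888'


def get_url(path, opener=urlopen):
    """Fetch `path` from the OpsCenter API and decode the JSON response.

    `opener` is any callable that takes a URL and returns a file-like
    object; it defaults to urllib.request.urlopen.
    """
    url = '%s/%s' % (BASE_URL, path)
    with opener(url) as response:
        # The body arrives as bytes in Python 3, so decode before parsing.
        return json.loads(response.read().decode('utf-8'))
```

Calling get_url('MyCluster/nodes/127.0.0.1/load') against a running OpsCenter instance should then behave like the getUrl call shown earlier.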

Getting Alerts Information

Now we're ready to find out which alerts are currently fired (i.e., alerts that have passed their thresholds). See the API Reference.

fired_alerts = getUrl('MyCluster/alerts/fired')

This will give us a list of dictionaries that look like this:

{
    "alert_rule_id": "80fc5e0a-355a-4f8f-86ef-5de8fd8fb5bb",
    "node": "127.0.0.1",
    "current_value": 1234,
    "first_fired": 1345575248
}
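
To make that shape concrete, here is a small sketch that takes one fired-alert dictionary like the one above and renders first_fired as a readable UTC time. It assumes first_fired is a Unix timestamp in seconds, which matches how the value is passed to time.localtime in the doSomething function in this post; describe_fired_alert is a hypothetical helper name.

```python
import time

# Sample fired alert, copied from the response shown above.
sample_alert = {
    "alert_rule_id": "80fc5e0a-355a-4f8f-86ef-5de8fd8fb5bb",
    "node": "127.0.0.1",
    "current_value": 1234,
    "first_fired": 1345575248,
}


def describe_fired_alert(alert):
    """Return a one-line summary of a fired alert (hypothetical helper)."""
    # Assumption: first_fired is seconds since the Unix epoch.
    fired_at = time.strftime("%Y-%m-%d %H:%M:%S",
                             time.gmtime(alert["first_fired"]))
    return "rule %s fired on %s at %s UTC (value: %s)" % (
        alert["alert_rule_id"], alert["node"], fired_at,
        alert["current_value"])


print(describe_fired_alert(sample_alert))
```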

That gives us some information, but we still don't have the full picture of what's going on, or of what the "current_value" property really means. The next step is to get our list of configured alert rules, so we can look up more information via the "alert_rule_id" property. See the API Reference.

alert_rules = getUrl('MyCluster/alert-rules')
rules_map = dict((rule['id'], rule) for rule in alert_rules)

This retrieves a list of all configured alert rules and converts it to a dictionary keyed by alert rule ID for easy lookup. See the AlertRule definition for a full reference of the properties in a single rule.
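
As a quick illustration with made-up data, the sketch below builds the same kind of map from a small hand-written list and looks up a rule by ID. The rule IDs and the metric name are invented for the example; only the id, type, and metric keys come from the code in this post.

```python
# Made-up alert rules, reduced to the keys used in this post.
sample_rules = [
    {"id": "rule-a", "type": "rolling-avg", "metric": "example-metric"},
    {"id": "rule-b", "type": "node-down"},
]

# Same construction as above: index the rules by their 'id'.
sample_rules_map = dict((rule["id"], rule) for rule in sample_rules)

# A direct key lookup replaces a scan through the list.
print(sample_rules_map["rule-a"]["metric"])

# .get() returns None instead of raising KeyError for an unknown ID,
# for example when a fired alert refers to a rule that is not in the list.
print(sample_rules_map.get("no-such-rule"))
```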

Putting It All Together

Now that we have a list of fired alerts and the rules that they reference, it's time to connect the dots.

for alert in fired_alerts:
    rule = rules_map.get(alert['alert_rule_id'])
    doSomething(alert, rule)

For this example, I've defined doSomething as a function that simply builds a human-readable message and prints it, but this is where you could use the same data to integrate OpsCenter alerts into your existing systems via their APIs.

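import time

def doSomething(alert, rule):
    # Build a message based on the type of rule that fired.
    if rule['type'] == 'rolling-avg':
        msg = "%s on node %s is at %.2f" % (
            rule['metric'], alert['node'], alert['current_value'])
    elif rule['type'] == 'node-down':
        msg = "Node %s is down" % alert['node']
    else:
        # Defensive fallback so msg is always defined for an unexpected type.
        msg = "Alert fired on node %s" % alert['node']

    # Format the time the alert first fired in the local time zone.
    datetime_str = time.strftime("%m/%d/%Y %H:%M:%S %Z",
                                 time.localtime(alert['first_fired']))
    msg += " (since %s)" % datetime_str
    print msg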

One thing to notice here is that there are two types of alert rules: rolling-avg and node-down. All rules except for the one that checks whether or not a node is down are of type rolling-avg, which just means it's comparing the average value of a given metric to the configured threshold.
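
One way to keep this branching manageable is a dictionary that maps each rule type to a formatter function. This is a sketch rather than anything OpsCenter provides; the FORMATTERS table and the format_alert helper are hypothetical names, and the fallback message covers a rule type the table does not know about or a rule that could not be found.

```python
def format_rolling_avg(alert, rule):
    """Message for rules that compare a metric's average to a threshold."""
    return "%s on node %s is at %.2f" % (
        rule["metric"], alert["node"], alert["current_value"])


def format_node_down(alert, rule):
    """Message for the rule that checks whether a node is down."""
    return "Node %s is down" % alert["node"]


# Hypothetical dispatch table: rule type -> formatter function.
FORMATTERS = {
    "rolling-avg": format_rolling_avg,
    "node-down": format_node_down,
}


def format_alert(alert, rule):
    """Return a message for a fired alert, with a generic fallback."""
    formatter = FORMATTERS.get(rule["type"]) if rule else None
    if formatter is None:
        # Unknown rule type, or no matching rule was found for the alert.
        return "Alert on node %s (rule %s)" % (
            alert["node"], alert["alert_rule_id"])
    return formatter(alert, rule)
```

Because format_alert returns the message instead of printing it, the same string can be printed, logged, or handed to another system.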

Wrap Up

And that's all there is to it. If you'd like to see all of this code put together, as well as some added functionality for looping through all of your clusters, check out fetch-opscenter-alerts.py.
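
The linked script is not reproduced here, but a rough sketch of looping over several clusters might look like the following. It takes the cluster names and a fetch function as arguments, so it makes no assumption about how the list of clusters is obtained; collect_alerts is a hypothetical name, and fetch is expected to behave like the getUrl helper from earlier in this post.

```python
def collect_alerts(cluster_names, fetch):
    """Return (cluster, alert, rule) tuples for every fired alert.

    `fetch` takes an API path and returns the decoded JSON response.
    `rule` is None when an alert refers to a rule that was not returned.
    """
    results = []
    for cluster in cluster_names:
        # Same two endpoints used earlier, with the cluster name swapped in.
        fired_alerts = fetch('%s/alerts/fired' % cluster)
        alert_rules = fetch('%s/alert-rules' % cluster)
        rules_map = dict((rule['id'], rule) for rule in alert_rules)
        for alert in fired_alerts:
            rule = rules_map.get(alert['alert_rule_id'])
            results.append((cluster, alert, rule))
    return results
```

With the helper from earlier, this could be called as collect_alerts(['MyCluster'], getUrl), and each tuple could then be passed to a function like doSomething.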

We would love to hear how you're using the OpsCenter API, or would like to use it, so don't hesitate to drop us a message in the comments.
