Technology | May 27, 2014

Quick and Easy Cluster Metadata Server using Python and Pyro4

Michael Allen

Introduction

In the Test Engineering department here at DataStax, we use Python quite a bit in our test infrastructure. So it's natural, I suppose, that when working in a clustered environment, as we often do here, we run into the desire to get shared cluster metadata directly from Python. There are quite a few ways this could be accomplished, but wouldn't it be nice if you could create a single Python metadata object at a central location and access it easily from any node of the cluster (in native Python code)? Well, that's exactly what we are going to do in this short post, using Python and Pyro4.

Pyro stands for Python Remote Objects and not surprisingly it allows you to use a remote Python object as if it were a local object: some Python code running on machine A is natively callable by some other Python code running on machine B. It turns out that using Pyro4 is a great way to get a quick and easy cluster metadata server up and running that all the nodes of the cluster can use.

In this post, we'll use a simple 4-node Ubuntu EC2 cluster that was created beforehand.

Installing Pyro4

To get Pyro4 installed and ready to use, run the following on each node of your cluster.

sudo apt-get update
sudo apt-get install python-virtualenv
virtualenv metadata-env
. metadata-env/bin/activate
pip install pyro4

That is all you need to do in terms of OS level setup.

Cluster Configuration

------------------------------------
node 0: nameserver & metadata server (10.196.16.34)
------------------------------------
   /        \          \
------    ------    ------
node 1    node 2    node 3
------    ------    ------

One node of the cluster will be used to run the Pyro4 nameserver and the metadata server. The nameserver is a built-in Pyro4 feature that allows us to register and look up arbitrary Python objects from any machine in the cluster. The metadata server is a Python module that runs daemonized; it contains the code that registers the metadata object with the nameserver as well as the code for the metadata object itself.

Starting the nameserver

The first thing we need to do is start the nameserver so that other code can register and look up objects with it:

python -m Pyro4.naming -n 10.196.16.34 &

With the nameserver up and running, we can now use it to register our metadata object. So next up let's take a look at that.

Metadata Server Implementation

# metadata.py

import random
import Pyro4

@Pyro4.expose   # newer Pyro4 releases only allow access to explicitly exposed classes
class Metadata(object):

    def __init__(self, name, site_id, role='enterprise'):
        self._cluster_name = name
        self._site_id = site_id
        self._cluster_role = role

    def _node_is_disabled(self, node_id):
        # lookup node disabled status.
        return random.choice([True, False])

    def cluster_name(self):
        return self._cluster_name

    def site_id(self):
        return self._site_id

    def cluster_role(self):
        return self._cluster_role

    def am_i_disabled(self, node_id):
        return self._node_is_disabled(node_id)

    #... Additional operations as needed by your specific situation.

def main():
    host = '10.196.16.34'
    ns = Pyro4.locateNS(host=host)
    metadata = Metadata('my_cluster', 'site_1')  # example constructor values
    # Bind the daemon to the node's address so other nodes can reach it;
    # the default binds to localhost only.
    daemon = Pyro4.Daemon(host=host)
    metadata_uri = daemon.register(metadata)
    # Register under the name the clients will look up ('PYRONAME:metadata').
    ns.register('metadata', metadata_uri)
    daemon.requestLoop()

if __name__ == '__main__':
    main()

The Metadata class is used to create a metadata object that we will make available to other nodes of the cluster. To accomplish this, we first create an instance of it and then register it with the nameserver. After it has been registered, other nodes of the cluster can get a handle to it and invoke operations on it. They will do this by looking the object up directly in the nameserver. Pyro4 will handle the necessary serialization and network operations for us behind the scenes, essentially providing a proxy object (locally) that uses the remote object for any of its available operations.
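Before wiring the object into Pyro4, the Metadata class can be sanity-checked as plain local Python. This is a minimal sketch; the cluster name and site id below are just example values, and the class is a trimmed copy of the one in metadata.py above.

```python
# Trimmed copy of the Metadata class, exercised locally
# without any Pyro4 machinery.
class Metadata(object):

    def __init__(self, name, site_id, role='enterprise'):
        self._cluster_name = name
        self._site_id = site_id
        self._cluster_role = role

    def cluster_name(self):
        return self._cluster_name

    def site_id(self):
        return self._site_id

    def cluster_role(self):
        return self._cluster_role

md = Metadata('test_cluster', 'site_1')   # example values
print(md.cluster_name())    # test_cluster
print(md.cluster_role())    # enterprise (the default role)
```

Once this behaves as expected, the same object can be handed to the Pyro4 daemon unchanged.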

Starting and using the metadata server

To start the metadata server, we run the following on node 0:

python metadata.py >metadata.log 2>&1 &

With the server up and running we can now easily access our metadata object on other nodes of the cluster:

On node 1:

# foo_script.py

import Pyro4

def business_as_usual():
    print 'doing business'

def stop_doing_stuff():
    print "I have been disabled so can't continue working"

metadata = Pyro4.Proxy('PYRONAME:metadata')
cluster_name = metadata.cluster_name()
if not metadata.am_i_disabled('node1'):
    business_as_usual()
else:
    stop_doing_stuff()

This is of course a contrived example, but it does illustrate how easy it is to use a remote Python object, which is the main point here.

Accessing the metadata object works the same way on the other nodes: every node of the cluster can reach the remote metadata object as long as it is on the same network. Since we are using EC2 in a single region here, every node in this cluster can access the metadata object this way with no additional steps required.
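A bare PYRONAME URI relies on Pyro4's broadcast discovery to find the nameserver, which is not always available across cloud subnets. The nameserver host can instead be named explicitly in the URI; here is a sketch using the node 0 address from this post:

```python
# Build a PYRONAME URI that names the nameserver host explicitly,
# so the client does not depend on broadcast discovery.
ns_host = '10.196.16.34'   # node 0, where the nameserver runs
uri = 'PYRONAME:metadata@{host}'.format(host=ns_host)
print(uri)   # PYRONAME:metadata@10.196.16.34

# With Pyro4 installed, the proxy is then created as before:
# metadata = Pyro4.Proxy(uri)
```

Everything after the proxy creation is identical to the foo_script.py example above.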

Final thoughts

I've found this particular mechanism of cluster metadata serving to be quite useful in my day-to-day work here at DataStax. It's incredibly fast to stand up and even easier to use. It's simple to maintain since the logic lives in a central location, and it's trivial to add new operations: update the implementation of your metadata class, restart the metadata server, and the other nodes of the cluster can access the new operations right away. As is normally the case, there are many ways to accomplish any given task, and so it is with this too. This is just one more tool for your cluster automation toolbox. Take it out and use it when and where it makes sense.

Happy coding!
