How to Set Up and Monitor a Multi-Node Cassandra Cluster on Windows
Microsoft Windows is one of the most popular platforms available not only for development work, but also for production applications. While some believe Linux is the only safe option for database servers, plenty of IT shops run production databases on Windows, and not just Microsoft SQL Server.
If you’re using Windows and want to install and set up an Apache Cassandra cluster on your Microsoft platform, this article will guide you through the process. For my example below, I’ll be using two Windows Server 2008 R2 boxes that I’ve provisioned on Rackspace. One other note: at this time, the Windows installer provided by DataStax does not officially support installing and monitoring multiple Cassandra nodes on Windows, so until further testing is done, keep in mind that various Windows use cases may cause hiccups in the procedures that follow.
Security and Browser Prerequisites
The only prerequisite you’ll have to deal with prior to installation is ensuring that your firewall is set up and has ports open for Cassandra and the monitoring solution we’ll use, DataStax OpsCenter. You can pull up the Windows firewall interface by going to Start->Control Panel->Security->Windows Firewall:
The ports you need to open for Cassandra are 7000 and 9160. For OpsCenter you need to open 7199, 8888, 61620, and 61621. You may already have these and other ports open on your boxes, but if not, you’ll need to ensure these are open and available.
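If you’d like to confirm the ports from another machine, a quick check along these lines can help. This is only a sketch: the function name check_ports and the two list names are made up for illustration, and a port will only report as reachable when the firewall allows it and a service is actually listening on it, so the check is most useful once the services are running.

```python
import socket

# Port numbers taken from the article; the list names are illustrative.
CASSANDRA_PORTS = [7000, 9160]
OPSCENTER_PORTS = [7199, 8888, 61620, 61621]


def check_ports(host, ports, timeout=2.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


if __name__ == "__main__":
    # Replace 127.0.0.1 with the IP address or name of one of your nodes.
    status = check_ports("127.0.0.1", CASSANDRA_PORTS + OPSCENTER_PORTS)
    for port in sorted(status):
        print("port %d: %s" % (port, "reachable" if status[port] else "not reachable"))
```

A "not reachable" result before the services are started is expected and does not by itself mean the firewall rule is wrong.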
The only other thing you’ll need is either a Google Chrome or Firefox browser on any box that you want to use for managing and monitoring your Cassandra cluster with the DataStax OpsCenter browser-based user interface. Note that this doesn’t have to be on your server machines that have Cassandra installed, but can be on any client workstation or laptop.
Downloading and Installing the Software
DataStax makes a free bundled MSI installer available for Windows that you can download and use. The bundle includes the latest version of Apache Cassandra, all server utilities including the CQL (Cassandra Query Language) interface, and a free version of DataStax OpsCenter.
The installer has been primarily designed for a developer working on a single laptop or workstation, but it can be used to do a multi-node install as well; we’ll just need to make a few manual changes after installation.
Once you have the Windows download file, install the software on all the Windows machines you want to use for your new Cassandra cluster, accepting the defaults. The installation should take only about a minute to complete.
Post Installation Tasks for the Database Cluster
Once you’ve installed the software, go to the Windows services panel and stop the services for the server, DataStax OpsCenter, and the DataStax OpsCenter agent:
One of the great features of Cassandra is that it automatically distributes data in a cluster evenly between all the machines via a hashing algorithm that’s applied to incoming data. All you have to do is assign what’s called a token to each machine in the cluster, which is a numerical identifier that determines the machine’s position in the cluster and the range of data that each machine is responsible for.
To generate the tokens for your new cluster, you can use the Python code that follows. Simply substitute the number of machines you intend to use for your new Windows cluster in the "num=2" part of the code, open a command prompt, paste the code into the prompt, and hit enter. Note that the code below must be run all on one line:
python -c "num=2; print('\n'.join('token %d: %d' % (i, i * (2**127) // num) for i in range(num)))"
Copy your token numerical identifiers from the output because we’ll need them in the next step.
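For larger clusters, or if pasting a one-liner into the prompt is awkward, the same calculation can be kept in a small script. The sketch below is only an illustration: the function name generate_tokens is made up here, and it assumes the same 0 to 2**127 token range that the one-liner above uses.

```python
def generate_tokens(num_nodes):
    """Return evenly spaced tokens over the 0..2**127 range, one per node."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Integer division keeps every token an exact whole number.
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]


if __name__ == "__main__":
    # Change 2 to the number of machines in your cluster.
    for index, token in enumerate(generate_tokens(2)):
        print("token %d: %d" % (index, token))
```

For a two-node cluster, the first token is 0 and the second is 2**126, which is exactly half of the range, so each machine ends up responsible for half of the data.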
Now we’ll need to edit the Cassandra configuration file, cassandra.yaml, on each Windows box and make some changes so that we’ll have a clustered database configuration rather than a standalone database on each machine. You can find the cassandra.yaml file in the [installation directory]\apache-cassandra\conf directory.
Open the file in Notepad, and make the following changes:
initial_token: for each machine in your cluster, copy one of the tokens you generated and paste it in. Start with token 0 and assign a unique token to each box.
listen_address: make sure this parameter is set to the IP address (or server name if you’re using Windows name resolution) of the box.
seeds: input the IP address or server name of the first node you want to use in your cluster. When a node first starts, it contacts a seed node to bootstrap Cassandra’s gossip communication process. The seed node designation has no purpose other than bootstrapping new nodes joining the cluster. Seed nodes are not a single point of failure and you can input multiple seed nodes if you’d like.
That’s it. Save the file.
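If you have more than a couple of machines, editing each file by hand gets tedious, and the three changes above can be scripted. The following is a rough sketch, not a tested tool: the function name apply_cluster_settings is hypothetical, and it assumes each of the three settings sits on a single line of cassandra.yaml (with seeds written either as "seeds:" or "- seeds:", which may differ between Cassandra versions). Check the layout of your own file, and keep a backup, before relying on anything like this.

```python
import re


def apply_cluster_settings(yaml_text, token, listen_address, seeds):
    """Return a copy of yaml_text with the three cluster settings filled in."""
    simple_values = {
        "initial_token": str(token),
        "listen_address": listen_address,
    }
    seed_list = ",".join(seeds)
    updated = []
    for line in yaml_text.splitlines():
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        # Replace the value of any uncommented initial_token / listen_address line.
        for key, value in simple_values.items():
            if stripped.startswith(key + ":"):
                line = "%s%s: %s" % (indent, key, value)
        # Assumption: seeds is on one line, as "seeds: ..." or "- seeds: ...".
        match = re.match(r"^(\s*(?:-\s*)?seeds:)", line)
        if match:
            line = '%s "%s"' % (match.group(1), seed_list)
        updated.append(line)
    return "\n".join(updated) + "\n"
```

You would read the existing cassandra.yaml into a string, pass it through this function with that machine’s token and address, and write the result back out.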
Now go back to the Windows services panel on the node you’re using as your seed node and start the DataStax_Cassandra_Community_Server service. Once that starts, do the same on all your other Windows machines that you’re using for your new cluster.
After a few seconds, you can check that your new Windows cluster is working properly by using the nodetool utility, which you can find in the [installation directory]\apache-cassandra\bin directory. Just open a command prompt in that directory and enter:
nodetool -h localhost ring
The output should list all the machines that make up your cluster:
Great – you’ve got your new Cassandra Windows cluster running. Now, let’s make a few changes to DataStax OpsCenter so you can monitor it.
Post Installation Tasks for DataStax OpsCenter
DataStax OpsCenter is a distributed management and monitoring solution for Cassandra. We’ve already installed the OpsCenter software on all your Windows boxes, but there are a few tweaks we’ll need to make to some configuration files to monitor your new cluster.
First, decide on which Windows machine you want to run the primary OpsCenter service. On that machine, using Notepad (or whatever your favorite Windows text editor is), edit the opscenter.conf file, which is in the [installation directory]\opscenter\conf directory.
In the opscenter.conf file, you’ll find a section labeled [webserver] and a parameter called interface. Set the interface parameter to the address on which the OpsCenter web server should listen for browser connections.
Then, locate the section labeled [agents] and add the following line at the bottom of that section:
incoming_interface=[IP address or name of the server]
Save the opscenter.conf file.
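The same two changes can be made with Python’s standard configparser module if you prefer not to edit the file by hand. Treat this as a sketch under a few assumptions: the function name update_opscenter_conf is made up, it assumes opscenter.conf is a plain INI-style file, and the value you pass for the [webserver] interface is whatever address you have chosen for that setting. Note that configparser discards comments when it rewrites a file, so keep a copy of the original.

```python
import configparser
import io


def update_opscenter_conf(conf_text, web_interface, incoming_interface):
    """Return conf_text with [webserver] interface and [agents] incoming_interface set."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Create the sections if the file does not already have them.
    for section in ("webserver", "agents"):
        if not parser.has_section(section):
            parser.add_section(section)
    parser.set("webserver", "interface", web_interface)
    parser.set("agents", "incoming_interface", incoming_interface)
    # Serialize back to text; comments from the original file are not kept.
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Any other settings already present in the file, such as the web server port, are carried through unchanged.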
Next, edit the address.yaml file, which is the OpsCenter agent file and is in the [installation directory]\opscenter\agent\conf directory. Set the stomp_interface and the local_interface parameters to the IP address or machine name of the Windows machine. Save the file.
So you’ve got your primary OpsCenter machine set to go. Next, on each machine that’s part of your cluster, you’ll need to edit the OpsCenter agent address.yaml file. Set the stomp_interface parameter to the IP address or machine name of the primary OpsCenter Windows box and the local_interface parameter to the IP address or machine name of the Windows machine you’re on. Save the file.
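To make the difference between the two parameters concrete, here is a small sketch that prints the two lines each machine’s address.yaml would carry. The function name render_address_yaml and the 10.0.0.x addresses are purely illustrative, and a real address.yaml may contain other settings that should be left in place.

```python
def render_address_yaml(opscenter_host, node_host):
    """Return the two agent settings described above as YAML lines."""
    return "stomp_interface: %s\nlocal_interface: %s\n" % (opscenter_host, node_host)


if __name__ == "__main__":
    opscenter_host = "10.0.0.1"        # hypothetical primary OpsCenter machine
    nodes = ["10.0.0.1", "10.0.0.2"]   # hypothetical cluster machines
    for node in nodes:
        print("# address.yaml settings for %s" % node)
        print(render_address_yaml(opscenter_host, node))
```

Every machine gets the same stomp_interface value, pointing at the primary OpsCenter box, while local_interface changes from machine to machine.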
Now, go to the Windows services panel on the primary OpsCenter Windows machine and start the DataStax OpsCenter service and the DataStax_OpsCenter_Agent service. Once those are started, start the DataStax_OpsCenter_Agent service on the other machines in your cluster.
You can now invoke the DataStax OpsCenter browser user interface by either using the Windows group item for OpsCenter on one of your just-installed machines (remember, you’ll need either Chrome or Firefox set as your default browser):
Or you can pull up OpsCenter manually in any Google Chrome or Firefox browser by typing http://[IP address or machine name of primary OpsCenter machine]:8888/opscenter/index.html into the browser’s address bar:
You’re done! You can now use other client utilities such as the CQL interface or CLI to create Cassandra keyspaces, column families, and insert/update/delete/query data.
To learn more about interfacing with and creating applications for Cassandra, or to download other software such as client drivers and libraries, visit the DataStax resources page or the downloads page of the DataStax website.