Apache Cassandra 0.6 Documentation

Getting Started

A distributed system like Cassandra is designed from the ground up to run on a cluster (or clusters) of nodes. For testing and evaluation purposes, however, it is easiest to run on a single node.

Installation

The most recent stable version can be found in the Downloads section of the Cassandra website. Because Cassandra is written in Java, a recent JVM is required to run it. (Java 1.6.0_22 works well, although any version after 1.6.0_19 should be fine.)
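
The version requirement above can be checked from a script. The following is a minimal sketch, assuming a version string in the usual major.minor.patch_update form; the function name java_version_ok is hypothetical and not part of Cassandra.

```shell
# Hypothetical helper (not shipped with Cassandra): succeeds if the given
# Java version string, e.g. "1.6.0_22", is newer than 1.6.0_19.
java_version_ok() {
    version=$1
    base=${version%%_*}                 # part before the underscore, e.g. "1.6.0"
    case $version in
        *_*) update=${version#*_} ;;    # part after the underscore, e.g. "22"
        *)   update=0 ;;
    esac
    update=${update%%[!0-9]*}           # drop any non-numeric suffix such as "-b04"
    [ -n "$update" ] || update=0

    # Split "1.6.0" into its three numeric components.
    old_ifs=$IFS; IFS=.
    set -- $base
    IFS=$old_ifs
    major=${1:-0}; minor=${2:-0}; patch=${3:-0}

    # Compare (major, minor, patch, update) against (1, 6, 0, 19).
    if [ "$major" -ne 1 ]; then [ "$major" -gt 1 ]; return; fi
    if [ "$minor" -ne 6 ]; then [ "$minor" -gt 6 ]; return; fi
    if [ "$patch" -ne 0 ]; then [ "$patch" -gt 0 ]; return; fi
    [ "$update" -gt 19 ]
}

# Example use against the installed JVM (java -version writes to stderr):
#   v=$(java -version 2>&1 | sed -n 's/.*version "\(.*\)".*/\1/p' | head -n 1)
#   java_version_ok "$v" && echo "Java $v is recent enough"
```

The sketch does not validate its input, so a version string with non-numeric components will produce an error from the comparison rather than a clean failure.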

By default, Cassandra uses the following directories for data and commit log storage and for its log files, respectively:

/var/lib/cassandra
/var/log/cassandra

Make sure that both of these directories exist and are writable by the user that runs Cassandra, by changing either their ownership or their permissions. On Linux, this can be done as follows:

sudo mkdir -p /var/lib/cassandra
sudo mkdir -p /var/log/cassandra
sudo chown -R $USER:$USER /var/lib/cassandra
sudo chown -R $USER:$USER /var/log/cassandra

This assumes the current user is the same one that will run Cassandra.
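
To confirm the result before starting Cassandra, a small check along the following lines may help. This is only a sketch; the function name check_writable_dir is hypothetical, and it tests writability for the user running the script, which is assumed to be the user that will run Cassandra.

```shell
# Hypothetical helper: reports whether a directory exists and is writable
# by the current user. Returns non-zero if either check fails.
check_writable_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Example use with the default directories named above:
#   check_writable_dir /var/lib/cassandra
#   check_writable_dir /var/log/cassandra
```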

Starting a Single Node Cluster

Once you have extracted the downloaded archive, starting Cassandra for the first time is simple (here $CASSANDRA_HOME stands for the directory you extracted it into):

cd $CASSANDRA_HOME
sh bin/cassandra -f

An instance of Cassandra is now running in the foreground and, by default, is listening on the ports described below:

Port   Description                              Defined In
9160   Client traffic via the Thrift protocol   storage-conf.xml
7000   Cluster traffic via gossip               storage-conf.xml
8080   Port for monitoring attributes via JMX   cassandra.in.sh

You can verify connectivity to your Cassandra instance with the nodetool command line utility:

~$ nodetool -h localhost -p 8080 ring
Address       Status     Load          Range                                      Ring
127.0.0.1     Up         495 bytes     95315431979199388464207182617231204396     |<--|

Starting a Multi Node Cluster

Working with a single Cassandra node is a good way to get a feel for the API, but the best way to truly understand its functionality, operations, and performance characteristics is to install and run your own cluster.

For those with a background in administering large RDBMS installations, the term cluster carries a lot of baggage when considering installation and operation. In truth, the same features that provide Cassandra’s inherent scalability and fault tolerance also make cluster configuration significantly easier. The install process for a multi node cluster is almost as direct as for the single node example above, but requires some minor edits to storage-conf.xml on each node, as described below.

Seed List

At least one node must be designated as a seed for the other hosts that will join the ring. There is no hard and fast rule about which hosts should be listed as seeds, but every node needs the same list of seeds. The gossip protocol simply uses this list to disseminate ring topology. Edit storage-conf.xml on each node and add the first node (10.203.55.186 in this example) as the seed.

<Seeds>
    <Seed>10.203.55.186</Seed>
</Seeds>
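
Because every node needs the same seed list, it can be worth comparing the configuration files before starting the cluster. The following sketch assumes each Seed element sits on its own line, as in the example above; the function names extract_seeds and same_seeds are hypothetical.

```shell
# Hypothetical helper: prints the seed addresses found in a configuration
# file, one per line, sorted. Assumes one <Seed>...</Seed> element per line.
extract_seeds() {
    sed -n 's:.*<Seed>\(.*\)</Seed>.*:\1:p' "$1" | sort
}

# Hypothetical helper: succeeds if two configuration files list the same seeds.
same_seeds() {
    [ "$(extract_seeds "$1")" = "$(extract_seeds "$2")" ]
}

# Example use, with copies of each node's file gathered locally
# (the file names here are placeholders):
#   same_seeds node1-storage-conf.xml node2-storage-conf.xml \
#       || echo "seed lists differ"
```

This is a line-oriented text match rather than a real XML parse, so it will miss seeds in a file that is formatted differently from the example.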

ListenAddress and ThriftAddress

The next change involves setting the interfaces on which your nodes will listen for client traffic via Thrift and for intra-cluster traffic via gossip. This is accomplished by changing the ThriftAddress and ListenAddress elements to interfaces that are routable from clients and from the other servers in the cluster, respectively.

Again, edit storage-conf.xml on each node and replace the default localhost entries with the address of the interface that should listen for traffic. For the first node:

<ListenAddress>10.203.55.186</ListenAddress>
...
<ThriftAddress>10.203.55.186</ThriftAddress>

and for the second node (10.205.2.67 in this example):

<ListenAddress>10.205.2.67</ListenAddress>
...
<ThriftAddress>10.205.2.67</ThriftAddress>
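
If you prefer to script this edit, something like the following could be used. It is a sketch only: the function name set_addresses is hypothetical, it sets both elements to the same address as in the examples above, and it assumes each element sits on a single line and that the address is a plain IPv4 address or host name.

```shell
# Hypothetical helper: rewrites the ListenAddress and ThriftAddress elements
# of a configuration file so that both contain the given address.
set_addresses() {
    conf=$1
    addr=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temporary file, then copy it back so the
    # original file keeps its ownership and permissions.
    sed -e "s:<ListenAddress>.*</ListenAddress>:<ListenAddress>$addr</ListenAddress>:" \
        -e "s:<ThriftAddress>.*</ThriftAddress>:<ThriftAddress>$addr</ThriftAddress>:" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example use on the second node (the path is a placeholder):
#   set_addresses conf/storage-conf.xml 10.205.2.67
```

Keeping a backup copy of the file before running such a rewrite is a sensible precaution.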

What Happens Next

Start a seed node and verify connectivity with nodetool ring, as in the single node example above. Then start the remaining nodes. They need a few minutes to exchange data; you can follow the progress on the second node in the system log, located by default at /var/log/cassandra/system.log. Running nodetool ring again should then give you something like the following (the example here shows two nodes):

~$ nodetool -h localhost -p 8080 ring
Address       Status     Load          Range                                      Ring
                                       95315431979199388464207182617231204396
10.205.2.67   Up         495 bytes     61078635599166706937511052402724559481     |<--|
10.203.55.186 Up         1.24 KB       95315431979199388464207182617231204396     |-->|
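
When waiting for nodes to join, the ring output can also be checked from a script. The sketch below counts the lines whose second column is Up. It assumes the column layout shown above, and the function name count_up_nodes is hypothetical.

```shell
# Hypothetical helper: reads "nodetool ring" output on standard input and
# prints the number of nodes whose Status column is "Up".
# Assumes the layout shown above: address in column 1, status in column 2.
count_up_nodes() {
    awk '$2 == "Up" { n++ } END { print n + 0 }'
}

# Example use:
#   nodetool -h localhost -p 8080 ring | count_up_nodes
```

For the two node output shown above, this prints 2.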

Congratulations, you now have a multi node Cassandra cluster.