Apache Cassandra 0.8 Documentation

Configuring and Initializing a Brisk Cluster

Before you can start Brisk, whether on a single-node or multi-node cluster, you must set a few Cassandra configuration properties on each node in the cluster. These are set in the cassandra.yaml file (located in /etc/brisk/cassandra in packaged installations or $BRISK_HOME/resources/cassandra/conf in binary distributions).

Initializing a Single-Node Brisk Cluster (for evaluation purposes)

Brisk is intended to be run on multiple nodes; however, you may want to start with a single-node Brisk cluster for evaluation purposes. To start Brisk on a single node:

  1. Set the following properties in the cassandra.yaml file:

    cluster_name: 'BriskTest'
    initial_token: 0
    
  2. Start Brisk.

    brisk cassandra -t
    

The -t option starts Cassandra (with CassandraFS) and the Hadoop Job Tracker and Task Tracker services. Because there is no Hadoop NameNode with CassandraFS, there is no additional configuration to run MapReduce jobs in single mode versus distributed mode.

When running on a single node, there are no additional steps to configure the Cassandra seed node and Brisk job tracker node, as they are automatically set to localhost.
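
If you prefer to script step 1 rather than edit cassandra.yaml by hand, the following is a minimal sketch. The helper name set_yaml_property is hypothetical and not part of Brisk, and the sketch assumes each property appears as an uncommented top-level line in the file. It works on the file's text line by line, so existing comments are preserved.

```python
import re


def set_yaml_property(text, key, value):
    """Return text with the top-level `key:` line set to value.

    Hypothetical helper for illustration only. If no uncommented
    top-level line for the key exists, the setting is appended.
    """
    pattern = re.compile(r"^%s:.*$" % re.escape(key), re.MULTILINE)
    line = "%s: %s" % (key, value)
    if pattern.search(text):
        # Use a function so the replacement is inserted literally.
        return pattern.sub(lambda match: line, text, count=1)
    return text.rstrip("\n") + "\n" + line + "\n"


if __name__ == "__main__":
    # Stand-in for the contents of cassandra.yaml.
    sample = "cluster_name: 'Test Cluster'\ninitial_token:\n"
    sample = set_yaml_property(sample, "cluster_name", "'BriskTest'")
    sample = set_yaml_property(sample, "initial_token", "0")
    print(sample)
```

To apply this to a real node, read the cassandra.yaml file into a string, pass it through the helper once per property, and write the result back.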

Initializing a Multi-Node Brisk Cluster

Before you start a multi-node Brisk cluster you must determine the following:

  • A name for your cluster
  • How many total nodes your Brisk cluster will have
  • The IP addresses of each node
  • The token for each node (see Generating Tokens). If you are deploying a mixed-workload Brisk Cluster, make sure to alternate token assignments between Cassandra nodes and Brisk nodes so that replicas are evenly distributed around the Cassandra ring.
  • Which nodes will serve as the seed nodes. If you are configuring a mixed-workload cluster, you should have at least one seed node for each side (the Cassandra real-time side and the Brisk analytics side).
  • If you intend to run a mixed-workload cluster, which nodes will serve which purpose.

For example, suppose you are starting a 6 node mixed-workload cluster with 3 Brisk nodes and 3 Cassandra nodes. The nodes have the following IPs:

  • node0 (Cassandra seed) 110.82.155.0
  • node1 (Cassandra) 110.82.155.1
  • node2 (Cassandra) 110.82.155.2
  • node3 (Brisk seed) 110.82.155.3
  • node4 (Brisk) 110.82.155.4
  • node5 (Brisk) 110.82.155.5

In a mixed-workload cluster, token placement alternates between Cassandra and Brisk nodes. This ensures even distribution of replicas on both sides of the cluster. In this example, the tokens are assigned in ring order as follows:

  • node0 (Cassandra): 0
  • node3 (Brisk): 28356863910078205288614550619314017621
  • node1 (Cassandra): 56713727820156410577229101238628035242
  • node4 (Brisk): 85070591730234615865843651857942052864
  • node2 (Cassandra): 113427455640312821154458202477256070485
  • node5 (Brisk): 141784319550391026443072753096570088106

The cassandra.yaml file for each node would then have the following modified property settings:

Node0

cluster_name: 'BriskTest'
initial_token: 0
seed_provider:
       - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0

Node1

cluster_name: 'BriskTest'
initial_token: 56713727820156410577229101238628035242
seed_provider:
       - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.1
rpc_address: 0.0.0.0

Node2

cluster_name: 'BriskTest'
initial_token: 113427455640312821154458202477256070485
seed_provider:
       - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.2
rpc_address: 0.0.0.0

Node3

cluster_name: 'BriskTest'
initial_token: 28356863910078205288614550619314017621
seed_provider:
       - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.3
rpc_address: 0.0.0.0

Node4

cluster_name: 'BriskTest'
initial_token: 85070591730234615865843651857942052864
seed_provider:
       - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.4
rpc_address: 0.0.0.0

Node5

cluster_name: 'BriskTest'
initial_token: 141784319550391026443072753096570088106
seed_provider:
       - seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.5
rpc_address: 0.0.0.0
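
The alternating token assignment used in this example can be sketched in a few lines of Python. This is an illustration rather than a Brisk tool: the function names are hypothetical, the token formula is the one used by the generator script in Generating Tokens, and the sketch assumes equal numbers of Cassandra and Brisk nodes.

```python
def generate_tokens(num_nodes):
    """Evenly spaced RandomPartitioner tokens for num_nodes nodes."""
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]


def alternate_assignments(cassandra_nodes, brisk_nodes):
    """Pair tokens with nodes so the two node types alternate on the ring.

    Hypothetical helper; assumes both lists have the same length.
    """
    if len(cassandra_nodes) != len(brisk_nodes):
        raise ValueError("this sketch assumes equal numbers of each node type")
    ring_order = []
    for cassandra_node, brisk_node in zip(cassandra_nodes, brisk_nodes):
        # Each Cassandra node is followed by a Brisk node in ring order.
        ring_order.extend([cassandra_node, brisk_node])
    tokens = generate_tokens(len(ring_order))
    return list(zip(ring_order, tokens))


if __name__ == "__main__":
    assignments = alternate_assignments(
        ["node0", "node1", "node2"],  # Cassandra nodes
        ["node3", "node4", "node5"],  # Brisk nodes
    )
    for name, token in assignments:
        print("%s: initial_token: %d" % (name, token))
```

For the six nodes above, this yields the same node-to-token pairing as the example configuration.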

Generating Tokens

Tokens are used to assign a range of data to a particular node. Assuming you are using the RandomPartitioner, generating evenly spaced tokens with the following procedure will ensure even data distribution.

  1. Create a new file for your token generator program:

    vi tokengentool
    
  2. Paste the following Python program into this file:

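    #!/usr/bin/env python3
    import sys

    # Take the node count from the first argument, or prompt for it.
    if len(sys.argv) > 1:
        num = int(sys.argv[1])
    else:
        num = int(input("How many nodes are in your cluster? "))
    # Space the tokens evenly across the RandomPartitioner range (0 to 2**127).
    for i in range(num):
        print('node %d: %d' % (i, i * (2**127) // num))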
    
  3. Save and close the file and make it executable:

    chmod +x tokengentool
    
  4. Run the script:

    ./tokengentool
    
  5. When prompted, enter the total number of nodes in your cluster:

    How many nodes are in your cluster? 6
    node 0: 0
    node 1: 28356863910078205288614550619314017621
    node 2: 56713727820156410577229101238628035242
    node 3: 85070591730234615865843651857942052864
    node 4: 113427455640312821154458202477256070485
    node 5: 141784319550391026443072753096570088106
    
  6. On each node, edit the cassandra.yaml file and enter its corresponding token value in the initial_token property.
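
To check that a set of tokens divides the ring evenly, you can compute the share of the token range that each node owns. The sketch below is illustrative only: the function name ownership is hypothetical, and it assumes the RandomPartitioner range of 2**127 used by the generator script, with each node owning the range from the previous token (exclusive) up to its own token (inclusive).

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token range


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    shares = {}
    for index, token in enumerate(ordered):
        # Index 0 wraps around to the last token on the ring.
        previous = ordered[index - 1]
        # A single-node ring has width 0 here, so it owns the whole ring.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = width / RING_SIZE
    return shares


if __name__ == "__main__":
    tokens = [i * RING_SIZE // 6 for i in range(6)]
    for token, share in sorted(ownership(tokens).items()):
        print("%d owns %.2f%% of the ring" % (token, share * 100))
```

With the six tokens generated above, each node owns roughly one sixth of the ring. A noticeably larger share for one token indicates uneven spacing.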

Starting a Brisk Cluster

After you have installed and configured Brisk on one or more nodes, you are ready to start your Brisk cluster. If you want to run a multi-node Brisk cluster, you must first install the Brisk packages on each node, and then configure each node according to the instructions in Configuring and Initializing a Brisk Cluster.

Packaged installations include startup scripts for running Brisk as a service. Binary packages do not.

Starting Brisk as a Stand-Alone Process

If you are running a mixed-workload cluster, determine which nodes to start as Cassandra nodes and which nodes to start as Brisk nodes. To start Brisk as a service, see Starting Brisk as a Service. Otherwise, you can start the Brisk server process as follows:

On a Brisk node:

brisk cassandra -t

On a Cassandra node:

brisk cassandra

Starting Brisk as a Service

Packaged installations provide startup scripts in /etc/init.d for starting Brisk as a service. Before starting Brisk as a service on a node, you must first configure the Cassandra service to start the Hadoop Job Tracker and Task Tracker services as well.

Note

For mixed-workload clusters, Cassandra-only nodes can skip step 1 and simply start the service.

  1. Create the file /etc/default/brisk, and add the following line as the contents of this file:

    HADOOP_ENABLED=1
    
  2. Start the Brisk service:

    sudo service brisk start
    

Note

On Enterprise Linux systems, the Brisk service runs as a java process. On Debian systems, the Brisk service runs as a jsvc process.

Apache, Apache Cassandra, Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation.