Before you can start Brisk, whether on a single-node or multi-node cluster, there are a few Cassandra configuration properties you must set on each node in the cluster. These are set in the cassandra.yaml file (located in /etc/brisk/cassandra in packaged installations or $BRISK_HOME/resources/cassandra/conf in binary distributions).
Brisk is intended to run on multiple nodes; however, you may want to start with a single-node Brisk cluster for evaluation purposes. To start Brisk on a single node:
Set the following properties in the cassandra.yaml file:
cluster_name: 'BriskTest'
initial_token: 0
Start Brisk.
brisk cassandra -t
The -t option starts Cassandra (with CassandraFS) and the Hadoop Job Tracker and Task Tracker services. Because there is no Hadoop NameNode with CassandraFS, there is no additional configuration to run MapReduce jobs in single mode versus distributed mode.
When running on a single node, there are no additional steps to configure the Cassandra seed node and Brisk job tracker node, as they are automatically set to localhost.
Before you start a multi-node Brisk cluster you must determine the following:
For example, suppose you are starting a 6 node mixed-workload cluster with 3 Brisk nodes and 3 Cassandra nodes. The nodes have the following IPs:
The cassandra.yaml file for each node would have the following modified property settings. Note that in a mixed-workload cluster, the token placement alternates between Cassandra and Brisk nodes. This ensures even distribution of replicas on both sides of the cluster. For example:
Node0
cluster_name: 'BriskTest'
initial_token: 0
seed_provider:
- seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0
Node1
cluster_name: 'BriskTest'
initial_token: 56713727820156410577229101238628035242
seed_provider:
- seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.1
rpc_address: 0.0.0.0
Node2
cluster_name: 'BriskTest'
initial_token: 113427455640312821154458202477256070485
seed_provider:
- seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.2
rpc_address: 0.0.0.0
Node3
cluster_name: 'BriskTest'
initial_token: 28356863910078205288614550619314017621
seed_provider:
- seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.3
rpc_address: 0.0.0.0
Node4
cluster_name: 'BriskTest'
initial_token: 85070591730234615865843651857942052864
seed_provider:
- seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.4
rpc_address: 0.0.0.0
Node5
cluster_name: 'BriskTest'
initial_token: 141784319550391026443072753096570088106
seed_provider:
- seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.5
rpc_address: 0.0.0.0
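The alternating token placement used in the settings above can be sketched in Python as follows. This is a minimal illustration, not part of Brisk: the function names and the two group lists are hypothetical, and which group holds the Brisk nodes and which the Cassandra nodes is an assumption you would fill in for your own cluster.

```python
RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def evenly_spaced_tokens(num_nodes):
    """Return num_nodes tokens dividing the ring into equal ranges."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def alternate_tokens(group_a, group_b):
    """Map node names to tokens so the two groups alternate around the ring.

    group_a and group_b are hypothetical lists of node names, one list per
    workload type; both must be the same length for a strict alternation.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups must contain the same number of nodes")
    tokens = evenly_spaced_tokens(len(group_a) + len(group_b))
    assignment = {}
    for i, (node_a, node_b) in enumerate(zip(group_a, group_b)):
        assignment[node_a] = tokens[2 * i]      # even ring positions
        assignment[node_b] = tokens[2 * i + 1]  # odd ring positions
    return assignment


assignment = alternate_tokens(
    ["Node0", "Node1", "Node2"],
    ["Node3", "Node4", "Node5"],
)
for node in sorted(assignment):
    print("%s initial_token: %d" % (node, assignment[node]))
```

With these two groups, the printed values correspond to the initial_token settings shown for Node0 through Node5 above.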
Tokens are used to assign a range of data to a particular node. Assuming you are using the RandomPartitioner, dividing the token range (0 to 2**127) evenly among the nodes, as the token generator program below does, ensures even data distribution.
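To see why evenly spaced tokens balance the data, the following sketch computes the share of the token space each node would own. It assumes the RandomPartitioner token space of 0 to 2**127 and that each node is responsible for the range from the previous token (exclusive) up to its own token (inclusive). The function name is hypothetical and the code is illustrative only.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def ownership(tokens):
    """Return {token: fraction of the ring owned by the node with that token}.

    Assumes each node owns the range (previous token, own token], with the
    lowest token wrapping around to the highest one.
    """
    ordered = sorted(tokens)
    shares = {}
    for idx, token in enumerate(ordered):
        previous = ordered[idx - 1]  # idx 0 wraps around to the last token
        # A single-node ring has width 0 after the modulo; it owns everything.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares


# Six evenly spaced tokens: every node owns roughly one sixth of the ring.
even_tokens = [i * RING_SIZE // 6 for i in range(6)]
even_shares = ownership(even_tokens)
for token in sorted(even_shares):
    print("%d owns %.4f of the ring" % (token, float(even_shares[token])))
```

Passing unevenly spaced tokens to the same function shows how some nodes would end up responsible for a larger share of the data than others.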
Create a new file for your token generator program:
vi tokengentool
Paste the following Python program into this file:
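#! /usr/bin/python
# Python 2 script: prints evenly spaced RandomPartitioner tokens.
import sys
if (len(sys.argv) > 1):
    # Node count given as the first command-line argument
    num=int(sys.argv[1])
else:
    num=int(raw_input("How many nodes are in your cluster? "))
for i in range(0, num):
    # Divide the token range (0 to 2**127) evenly among the nodes
    print 'node %d: %d' % (i, (i*(2**127)/num))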
Save and close the file and make it executable:
chmod +x tokengentool
Run the script:
./tokengentool
When prompted, enter the total number of nodes in your cluster:
How many nodes are in your cluster? 6
node 0: 0
node 1: 28356863910078205288614550619314017621
node 2: 56713727820156410577229101238628035242
node 3: 85070591730234615865843651857942052864
node 4: 113427455640312821154458202477256070485
node 5: 141784319550391026443072753096570088106
On each node, edit the cassandra.yaml file and enter its corresponding token value in the initial_token property.
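If you prefer to script this edit rather than make it by hand, the idea can be sketched as below. This is an illustration under assumptions, not a Brisk tool: the function name is hypothetical, and it assumes the file already contains an initial_token line starting at the beginning of a line.

```python
import re


def set_initial_token(yaml_text, token):
    """Return yaml_text with its initial_token property set to token.

    Assumes the text contains a line beginning with 'initial_token:'.
    """
    pattern = re.compile(r"^initial_token:.*$", re.MULTILINE)
    if not pattern.search(yaml_text):
        raise ValueError("no initial_token property found")
    return pattern.sub("initial_token: %d" % token, yaml_text, count=1)


# Example with a small excerpt standing in for a node's cassandra.yaml.
sample = (
    "cluster_name: 'BriskTest'\n"
    "initial_token:\n"
    "listen_address: 110.82.155.1\n"
)
updated = set_initial_token(sample, 56713727820156410577229101238628035242)
print(updated)
```

In practice you would read the node's cassandra.yaml, apply the function, and write the result back, keeping a backup of the original file.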
After you have installed and configured Brisk on one or more nodes, you are ready to start your Brisk cluster. If you want to run a multi-node Brisk cluster, you must first install the Brisk packages on each node, and then configure each node according to the instructions in Initializing a Brisk Cluster.
Packaged installations include startup scripts for running Brisk as a service. Binary packages do not.
If you are running a mixed-workload cluster, determine which nodes to start as Cassandra nodes and which nodes to start as Brisk nodes. To start Brisk as a service, see Starting Brisk as a Service. Otherwise, you can start the Brisk server process as follows:
On a Brisk node:
brisk cassandra -t
On a Cassandra node:
brisk cassandra
Packaged installations provide startup scripts in /etc/init.d for starting Brisk as a service. Before starting Brisk as a service on a node, you must first configure the Cassandra service to start the Hadoop Job Tracker and Task Tracker services as well.
Note
For mixed-workload clusters, nodes that are Cassandra-only can simply start the service without the Hadoop services (skip the step that creates the /etc/default/brisk file).
Create the file /etc/default/brisk, and add the following line as the contents of this file:
HADOOP_ENABLED=1
Start the Brisk service:
sudo service brisk start
Note
On Enterprise Linux systems, the Brisk service runs as a java process. On Debian systems, the Brisk service runs as a jsvc process.