Before you can start DataStax Enterprise (DSE) on either a single-node or multi-node cluster, there are a few Cassandra configuration properties you must set on each node in the cluster. You set these properties in the cassandra.yaml file (located in /etc/dse/cassandra in packaged installations or <install_location>/resources/cassandra/conf in binary distributions).
Note
These instructions apply only to single data center clusters. For information about configuring clusters with multiple data centers, see Configuring Multiple Data Centers Quick Start.
In DataStax Enterprise, the term data center refers to a grouping of nodes. You should configure these data centers by type of node: Cassandra, Analytics, and Search.
Before starting a multi-node DSE cluster, you must determine the following:
To determine token assignments:
For example, suppose you are starting an 8-node mixed-workload cluster with 3 Analytics nodes, 3 Cassandra nodes, and 2 Search nodes. The nodes have the following IPs:
To assign tokens in a multi data-center cluster, you generate tokens for the nodes in one data center, and then offset those token numbers by 1 for all nodes in the next data center, by 2 for the nodes in the next data center, and so on (larger increments are allowed, such as 10 or 50).
Because the number of nodes is not the same in each data center, you need to run the Token Generating Tool twice. The first run generates the tokens for the Cassandra data center. The second run generates the tokens for the Search (Solr) data center. For the Analytics data center, you offset the tokens generated by the first run; in this example, the tokens are incremented by 10. For the Search data center, you use the tokens generated by the second run, incrementing the first Search node by 20 and the second by 10.
| Node | Token | Offset | Type |
|---|---|---|---|
| Token Generation - First Run | | | |
| node 0 | 0 | NA | Cassandra seed |
| node 1 | 56713727820156410577229101238628035242 | NA | Cassandra |
| node 2 | 113427455640312821154458202477256070485 | NA | Cassandra |
| node 3 | 10 | 10 | Analytics seed |
| node 4 | 56713727820156410577229101238628035252 | 10 | Analytics |
| node 5 | 113427455640312821154458202477256070495 | 10 | Analytics |
| Token Generation - Second Run | | | |
| node 6 | 20 (offset twice) | 20 | Search |
| node 7 | 85070591730234615865843651857942052874 | 10 | Search |
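The arithmetic behind these assignments can be sketched in Python. This is an illustrative sketch rather than part of the DataStax tooling: the names `generate_tokens`, `offset_tokens`, and `assignments` are assumptions made for this example. It applies the same formula as the Token Generating Tool, adds the offsets of 10 and 20 described above, and prints the nodes in ring order.

```python
RING_SIZE = 2 ** 127  # token range of the RandomPartitioner


def generate_tokens(num_nodes):
    """Return evenly spaced tokens for one data center."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def offset_tokens(tokens, offset):
    """Shift every token by the same offset so data centers do not collide."""
    return [token + offset for token in tokens]


first_run = generate_tokens(3)   # Cassandra data center (3 nodes)
second_run = generate_tokens(2)  # Search data center (2 nodes)

assignments = {
    "Cassandra": first_run,
    "Analytics": offset_tokens(first_run, 10),
    # Per the example: the first Search node is offset by 20, the second by 10.
    "Search": [second_run[0] + 20, second_run[1] + 10],
}

# Sort all nodes by token to see the order in which they sit on the ring.
ring = sorted((token, dc) for dc, tokens in assignments.items() for token in tokens)
for token, dc in ring:
    print("%-9s %d" % (dc, token))
```

Sorting by token shows the node types alternating around the ring, and no two nodes share a token.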
Since this is a mixed-workload cluster, the token placement alternates between Cassandra, Analytics, and Search nodes. This ensures even distribution of replicas on both sides of the cluster. The cassandra.yaml file for each node has the following modified property settings.
Node0
cluster_name: 'DSECluster'
initial_token: 0
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0
Node1
cluster_name: 'DSECluster'
initial_token: 56713727820156410577229101238628035242
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.1
rpc_address: 0.0.0.0
Node2
cluster_name: 'DSECluster'
initial_token: 113427455640312821154458202477256070485
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.2
rpc_address: 0.0.0.0
Node3
cluster_name: 'DSECluster'
initial_token: 10
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.3
rpc_address: 0.0.0.0
Node4
cluster_name: 'DSECluster'
initial_token: 56713727820156410577229101238628035252
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.4
rpc_address: 0.0.0.0
Node5
cluster_name: 'DSECluster'
initial_token: 113427455640312821154458202477256070495
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.5
rpc_address: 0.0.0.0
Node6
cluster_name: 'DSECluster'
initial_token: 20
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.6
rpc_address: 0.0.0.0
Node7
cluster_name: 'DSECluster'
initial_token: 85070591730234615865843651857942052874
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "110.82.155.3,110.82.155.0"
listen_address: 110.82.155.7
rpc_address: 0.0.0.0
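Because the per-node settings above differ only in initial_token and listen_address, they can be produced from a single template. The following is a minimal sketch under stated assumptions: `TEMPLATE`, `render_settings`, and the `cassandra_nodes` dictionary are hypothetical names for this example and are not part of DSE. It shows only the three Cassandra nodes.

```python
# Seed list shared by every node in the example cluster.
SEEDS = "110.82.155.3,110.82.155.0"

# The modified cassandra.yaml properties, with YAML nesting for seed_provider.
TEMPLATE = """\
cluster_name: '{cluster}'
initial_token: {token}
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{seeds}"
listen_address: {address}
rpc_address: 0.0.0.0
"""


def render_settings(address, token, cluster="DSECluster", seeds=SEEDS):
    """Return the modified property settings for one node as YAML text."""
    return TEMPLATE.format(cluster=cluster, token=token, seeds=seeds, address=address)


# Address -> token for the three Cassandra nodes (first token generation run).
cassandra_nodes = {
    "110.82.155.0": 0,
    "110.82.155.1": 56713727820156410577229101238628035242,
    "110.82.155.2": 113427455640312821154458202477256070485,
}

for address, token in cassandra_nodes.items():
    print(render_settings(address, token))
```

The output is only the set of modified properties; it would still need to be merged into each node's full cassandra.yaml file.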
Tokens are used to assign a range of data to a particular node within a data center. Assuming you are using the RandomPartitioner, this approach ensures even data distribution. For a multi data-center cluster, generate the tokens for the nodes in one data center, and then offset those token numbers by 1 for all nodes in the next data center, by 2 for the nodes in the next data center, and so on. (Instead of using single digits, you might want to offset the token number by a larger value, such as 10 or 50.)
Note
The following steps illustrate token generation for the above example.
To create tokens:
Create a new file for your token generator program:
vi tokengentool
Paste the following Python program into this file:
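#!/usr/bin/env python3
import sys

# Take the node count from the command line, or prompt for it.
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes are in your cluster? "))

# Space the tokens evenly across the RandomPartitioner range (0 to 2**127).
for i in range(num):
    print('node %d: %d' % (i, i * (2**127) // num))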
Save and close the file and make it executable:
chmod +x tokengentool
Run the script:
./tokengentool
When prompted, enter the total number of nodes in your Cassandra data center:
How many nodes are in your cluster? 3
node 0: 0
node 1: 56713727820156410577229101238628035242
node 2: 113427455640312821154458202477256070485
Run the tool again for two nodes (Solr data center):
How many nodes are in your cluster? 2
node 0: 0
node 1: 85070591730234615865843651857942052864
If you have a firewall running on the nodes in your Cassandra or DataStax Enterprise cluster, you must open up the following ports to allow communication between the nodes, including certain Cassandra ports. If this isn't done, when you start Cassandra (or Hadoop in DataStax Enterprise) on a node, the node will act as a standalone database server rather than joining the database cluster.
| Port | Description |
|---|---|
| Public Facing Ports | |
| 22 | SSH (default) |
| DataStax Enterprise Specific | |
| 8012 | Hadoop Job Tracker client port |
| 8983 | Solr port and Demo applications website port (Portfolio, Search, Search log) |
| 50030 | Hadoop Job Tracker website port |
| 50060 | Hadoop Task Tracker website port |
| OpsCenter Specific | |
| 8888 | OpsCenter website port |
| Intranode Ports | |
| Cassandra Specific | |
| 1024+ | JMX reconnection/loopback ports |
| 7000 | Cassandra intra-node port |
| 7199 | Cassandra JMX monitoring port |
| 9160 | Cassandra client port |
| DataStax Enterprise Specific | |
| 9290 | Hadoop Job Tracker Thrift port |
| OpsCenter Specific | |
| 50031 | OpsCenter HTTP proxy for Job Tracker |
| 61620 | OpsCenter intra-node monitoring port |
| 61621 | OpsCenter agent ports |
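One way to confirm that the firewall allows traffic between nodes is to attempt a TCP connection to each port from another node. The sketch below is an assumption-laden illustration, not a DataStax utility: `port_is_open`, `check_node`, and the `PORTS` selection are hypothetical names, and only three of the Cassandra ports from the table are checked. A failed connection can also mean that the service is not running on the target node, so the result is a hint rather than proof of a firewall problem.

```python
import socket
import sys

# A subset of the inter-node ports from the table above.
PORTS = {
    7000: "Cassandra intra-node port",
    7199: "Cassandra JMX monitoring port",
    9160: "Cassandra client port",
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, timed out, or unresolvable host.
        return False


def check_node(host, ports=PORTS):
    """Map each port to whether it is reachable on the given host."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]  # address of another node in the cluster
    for port, is_open in check_node(host).items():
        status = "open" if is_open else "unreachable"
        print("%5d  %-30s %s" % (port, PORTS[port], status))
```

Run it from one node against another, passing the target address as the only argument, for example 110.82.155.1 from the example cluster.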
After you have installed and configured DSE on one or more nodes, you are ready to start your cluster, beginning with the seed nodes. In a mixed-workload DSE cluster, you must start the Analytics seed node first.
Packaged installations include startup scripts for running DSE as a service. Binary packages do not.
Note
When Cassandra loads, you may notice a message that MX4J will not load and that mx4j-tools.jar is not in the classpath. You can ignore this message. MX4J provides an HTML and HTTP interface to JMX and is not necessary to run Cassandra. DataStax recommends using OpsCenter, which has more monitoring capabilities than MX4J.
If running a mixed-workload cluster, determine which nodes to start as Analytics, Cassandra, and Search nodes. Begin with the seed nodes (the Analytics seed node, followed by the Cassandra seed node), then start the remaining nodes in the cluster one at a time. For additional information, see Configuring Multiple Data Centers Quick Start.
To start DataStax Enterprise as a stand-alone process:
Analytics node: dse cassandra -t
Cassandra node: dse cassandra
Solr node: dse cassandra -s
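The start order and the per-type commands can be combined into a simple plan. The following sketch is illustrative only: the `nodes` list, its field names, and the functions `start_priority` and `start_plan` are hypothetical, while the command flags are the ones listed above. It prints the commands in the order they should be run and does not execute anything.

```python
# Command for each node type, as listed above.
START_COMMANDS = {
    "Analytics": ["dse", "cassandra", "-t"],
    "Cassandra": ["dse", "cassandra"],
    "Search": ["dse", "cassandra", "-s"],
}


def start_priority(node):
    """Sort key: Analytics seed first, then other seeds, then all other nodes."""
    if node["seed"] and node["type"] == "Analytics":
        return 0
    if node["seed"]:
        return 1
    return 2


def start_plan(nodes):
    """Return (address, command) pairs in the order the nodes should be started."""
    ordered = sorted(nodes, key=start_priority)  # sorted() keeps ties in input order
    return [(node["address"], START_COMMANDS[node["type"]]) for node in ordered]


# A few nodes from the example cluster (hypothetical inventory format).
nodes = [
    {"address": "110.82.155.0", "type": "Cassandra", "seed": True},
    {"address": "110.82.155.1", "type": "Cassandra", "seed": False},
    {"address": "110.82.155.3", "type": "Analytics", "seed": True},
    {"address": "110.82.155.6", "type": "Search", "seed": False},
]

for address, command in start_plan(nodes):
    print(address, " ".join(command))
```

Each command would still be run on its own node, one node at a time.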
To check that your ring is up and running (from the install directory):
$ bin/nodetool ring -h localhost
Packaged installations provide startup scripts in /etc/init.d for starting DSE as a service.
For mixed-workload clusters, nodes that are Cassandra-only can simply start the DSE service (skip step 1).
To start DataStax Enterprise as a service:
Create the /etc/default/dse file, and then add the appropriate line to this file, depending on the type of node you want:
Note
Using the SOLR_ENABLED and HADOOP_ENABLED options together to enable both search and Hadoop analytics on the same node is recommended only for development. In production environments, each node should be used for only one or the other.
Start the DSE service:
sudo service dse start
To check if your cluster is up and running:
nodetool ring -h localhost
On RHEL and CentOS, the DSE service runs as a java process. On Debian systems, the DSE service runs as a jsvc process.