DataStax Enterprise 3.0 Documentation

Single data center deployment

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

Latest DSE documentation | Earlier DSE documentation

In this scenario, a mixed-workload cluster has only one data center for each type of node. For example, if the cluster has 3 Hadoop nodes, 3 Cassandra nodes, and 2 Solr nodes, the cluster has 3 data centers, one for each type of node. In contrast, a multiple data center cluster has more than one data center for each type of node.

Data replicates across the data centers automatically and transparently - no ETL work is necessary to move data between different systems or servers. You can configure the number of copies of the data in each data center and Cassandra handles the rest, replicating the data for you. To configure a multiple data center cluster, see Multiple data center deployment.
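As a hypothetical illustration of configuring the number of copies per data center, replica counts are set on the keyspace. The keyspace name, data center names, and counts below are examples, not taken from this guide, and the CQL 2-style syntax differs in later CQL versions:

```shell
# Hypothetical illustration: write a CQL 2-style script that sets
# per-data-center replica counts. Keyspace name, data center names,
# and counts are examples only.
cat <<'EOF' > create_keyspace.cql
CREATE KEYSPACE demo
  WITH strategy_class = 'NetworkTopologyStrategy'
  AND strategy_options:Cassandra = 3
  AND strategy_options:Analytics = 2;
EOF
# To apply it, run cqlsh against any node, for example:
# cqlsh 110.82.155.0 -f create_keyspace.cql
```

With DSE, the Analytics and Search workloads appear to Cassandra as their own data centers, which is why replica counts can be set per workload type.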

Prerequisites

Correctly configuring a multi-node cluster requires the following:

  • DataStax Enterprise is installed on each node.
  • The total number of nodes in the cluster.
  • A name for the cluster.
  • The IP addresses of each node in the cluster.
  • For a mixed-workload cluster, the purpose of each node.
  • Which nodes will serve as the seed nodes. (Cassandra nodes use this host list to find each other and learn the topology of the ring.)
  • If the nodes are behind a firewall, make sure you know what ports you need to open. See Configuring firewall port access.
  • Other configuration settings you may need are described in Choosing Node Configuration Options and Node and Cluster Configuration.

This information is used to configure Node and Cluster Initialization Properties in the cassandra.yaml configuration file on each node in the cluster. Each node should be correctly configured before starting up the cluster.

Configuration example

This example describes installing an eight-node, mixed-workload cluster in a single data center.

Location of the property file:

You set properties for each node in the cassandra.yaml file. This file is located in different places depending on the type of installation:

  • Packaged installations: /etc/dse/cassandra/cassandra.yaml
  • Binary installations: <install_location>/resources/cassandra/conf/cassandra.yaml

Note

After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.

To configure a mixed-workload cluster:

  1. The nodes have the following IP addresses, and one node per workload type will serve as a seed:

    • node0 110.82.155.0 (Cassandra seed)
    • node1 110.82.155.1 (Cassandra)
    • node2 110.82.155.2 (Cassandra)
    • node3 110.82.155.3 (Analytics seed)
    • node4 110.82.155.4 (Analytics)
    • node5 110.82.155.5 (Analytics)
    • node6 110.82.155.6 (Search - seed nodes are not required for Solr.)
    • node7 110.82.155.7 (Search)
  2. Calculate the token assignments using the Token Generating Tool for a single data center.

    Node  Token
    node0 0
    node1 21267647932558653966460912964485513216
    node2 42535295865117307932921825928971026432
    node3 63802943797675961899382738893456539648
    node4 85070591730234615865843651857942052864
    node5 106338239662793269832304564822427566080
    node6 127605887595351923798765477786913079296
    node7 148873535527910577765226390751398592512
  3. If you have a firewall running on the nodes in your Cassandra or DataStax Enterprise cluster, you must open certain ports to allow communication between the nodes. See Configuring firewall port access.

  4. Stop the nodes and clear the data.

    • For packaged installs, run the following commands:

      $ sudo service dse stop (stops the service)

      $ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)

    • For binary installs, run the following commands from the install directory:

      $ ps auwx | grep cassandra (finds the Cassandra and DataStax Enterprise Java process ID [PID])

      $ sudo kill <pid> (stops the process)

      $ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)

  5. Modify the following property settings in the cassandra.yaml file for each node:

    Note

    In the - seeds list property, include the internal IP addresses of each seed node.

    node0

    cluster_name: 'MyDemoCluster'
    initial_token: 0
    seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
          - seeds: "110.82.155.0,110.82.155.3"
    listen_address: 110.82.155.0
    rpc_address: 0.0.0.0
    

    node1 to node7

    The properties for the rest of the nodes are the same as node0 except for the initial_token and listen_address:

    Node initial_token listen_address
    node1 21267647932558653966460912964485513216 110.82.155.1
    node2 42535295865117307932921825928971026432 110.82.155.2
    node3 63802943797675961899382738893456539648 110.82.155.3
    node4 85070591730234615865843651857942052864 110.82.155.4
    node5 106338239662793269832304564822427566080 110.82.155.5
    node6 127605887595351923798765477786913079296 110.82.155.6
    node7 148873535527910577765226390751398592512 110.82.155.7
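    Editing each node's file by hand is error-prone; as a hypothetical sketch, the three per-node values can be patched with sed. The snippet below works on a local stand-in file so it is safe to try anywhere; on a real node, point YAML at /etc/dse/cassandra/cassandra.yaml (packaged install) and use that node's token and IP (node1's values are shown):

```shell
# Hypothetical sketch: patch per-node settings into a cassandra.yaml.
# Uses a local stand-in file; on a real node, set YAML to the actual
# config path and skip the printf line.
YAML=cassandra.yaml
printf 'cluster_name: Test\ninitial_token:\nlisten_address: localhost\n' > "$YAML"
TOKEN=21267647932558653966460912964485513216   # node1's token
IP=110.82.155.1                                # node1's address
sed -i \
  -e "s/^cluster_name:.*/cluster_name: 'MyDemoCluster'/" \
  -e "s/^initial_token:.*/initial_token: $TOKEN/" \
  -e "s/^listen_address:.*/listen_address: $IP/" \
  "$YAML"
```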
  6. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.

    Note

    If the node has restarted because of automatic restart, you must stop the node and clear the data directories, as described in step 4 above.
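    The start order can be scripted; a hypothetical sketch that only prints the commands it would run, using the seed and non-seed addresses from this example (replace echo with ssh, and uncomment the sleep, to execute them for real on a packaged install):

```shell
# Hypothetical sketch of the start order: seeds one at a time, then the rest.
# Prints the commands rather than executing them.
SEEDS="110.82.155.0 110.82.155.3"
OTHERS="110.82.155.1 110.82.155.2 110.82.155.4 110.82.155.5 110.82.155.6 110.82.155.7"
for ip in $SEEDS; do
  echo "ssh $ip sudo service dse start"
  # sleep 60   # give each seed time to join before starting the next
done
for ip in $OTHERS; do
  echo "ssh $ip sudo service dse start"
done
```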

  7. Check that your ring is up and running:

    • Packaged installs: nodetool ring -h localhost

    • Binary installs:

      $ cd /<install_directory>
      $ bin/nodetool ring -h localhost
      

    [Image: sample nodetool ring results]
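    For scripted verification, one hedged approach is to count the nodes reporting Up in saved ring output; the " Up " grep pattern is an assumption, since nodetool's output format varies between versions:

```shell
# Hedged sketch: count nodes reporting "Up" in saved nodetool ring output.
# The " Up " pattern is an assumption about the output format.
check_ring() {
  up=$(grep -c " Up " "$1")
  if [ "$up" -eq 8 ]; then
    echo "ring OK: $up nodes up"
  else
    echo "only $up of 8 nodes up"
  fi
}
# Usage on a live node:
# nodetool ring -h localhost > ring.txt && check_ring ring.txt
```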