DataStax Enterprise 3.0 Documentation

Elastic workload re-provisioning

When you install a node within a cluster, you designate it as a real-time (Cassandra), analytics (Hadoop), or search (Solr) node. To meet the requirements of changing workloads, DataStax Enterprise allows you to re-provision your existing Cassandra and Hadoop nodes at will, changing the workload mix and capacity of your clusters.

For example, suppose an online Web application's daily operations dictate that a cluster's nodes are allocated as follows:

  • 4 Cassandra nodes (for real-time/transactional processing)
  • 2 Hadoop nodes (for analytics)
  • 2 Solr nodes (for search)

Now suppose that various marketing programs require more analytics capacity than two nodes can provide. To meet this need, you can re-provision two of the Cassandra nodes as Hadoop nodes during low-traffic hours, so that the cluster looks like this:

  • 2 Cassandra nodes
  • 4 Hadoop nodes
  • 2 Solr nodes

Then after the Hadoop tasks are completed, you can return the cluster to its daily configuration.


[Figure: cluster re-provisioning]

Setting up a node for re-provisioning

The first step for enabling workload re-provisioning is to change the delegated snitch, which is designated by the DseDelegateSnitch (located in the dse.yaml file). A snitch is a configurable component of a Cassandra cluster that defines how the nodes are grouped together within the overall network topology.

By default, the delegated snitch is the DseSimpleSnitch (org.apache.cassandra.locator.DseSimpleSnitch). Besides the DseSimpleSnitch, the DseDelegateSnitch can delegate to any of the standard Cassandra snitches. For more information, see About Snitches in the DataStax Cassandra documentation.

The second step for packaged installations, such as RHEL and Debian, is to set the node's role in a configuration file and then restart the node.

The second step for tarball installations is to stop the node and restart it in the desired role by setting an option.

Note

Solr nodes cannot be re-provisioned.

Delegating the snitch

The DseDelegateSnitch sets which snitch is used for re-provisioning. You need to set the snitch only once. All nodes in a cluster must use the same snitch.

This section provides an example of delegating the RackInferringSnitch to enable workload re-provisioning. The RackInferringSnitch infers the topology of the network by analyzing the node IP addresses.

To delegate a snitch:

  1. Open the dse.yaml file.

    • Packaged installations - /etc/dse/dse.yaml
    • Tarball installations - <install_location>/resources/dse/conf/dse.yaml
  2. Set the delegated snitch and save the file:

    delegated_snitch: org.apache.cassandra.locator.RackInferringSnitch
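Because all nodes in a cluster must use the same snitch, it can help to confirm the configured value on each node after editing. The following is a minimal sketch, assuming the delegated_snitch key appears uncommented at the start of a line in dse.yaml. The function name show_delegated_snitch is hypothetical and is not part of DataStax Enterprise.

```shell
# Print the delegated snitch configured in a dse.yaml file.
# Usage: show_delegated_snitch [path-to-dse.yaml]
# (hypothetical helper; not shipped with DataStax Enterprise)
show_delegated_snitch() {
    # Default to the packaged-installation path; pass the tarball path explicitly.
    conf="${1:-/etc/dse/dse.yaml}"
    # Print only the value of an uncommented delegated_snitch key.
    sed -n 's/^delegated_snitch:[[:space:]]*//p' "$conf"
}
```

Running the function on every node and comparing the output is one way to spot a node that still uses a different snitch.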
    

Re-provisioning packaged installations

Packaged installations provide startup scripts in /etc/init.d.

  1. Edit the /etc/default/dse file to set the node's role:

    • To make the node an analytics node, set HADOOP_ENABLED=1
    • To make the node a real-time/transactional node, comment out HADOOP_ENABLED=1
  2. Restart the node:

    $ sudo service dse restart
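The edit in step 1 can also be scripted. The following is a minimal sketch, assuming the defaults file holds at most one HADOOP_ENABLED line, possibly commented out. The function name set_dse_role is hypothetical, and the function leaves a backup copy of the file with a .bak suffix.

```shell
# Set the role of a packaged-installation node by editing its defaults file.
# Usage: set_dse_role <analytics|cassandra> [defaults-file]
# (hypothetical helper; not shipped with DataStax Enterprise)
set_dse_role() {
    role="$1"
    file="${2:-/etc/default/dse}"
    case "$role" in
        analytics)
            if grep -q '^[#[:space:]]*HADOOP_ENABLED=' "$file"; then
                # Uncomment or overwrite the existing setting.
                sed -i.bak 's/^[#[:space:]]*HADOOP_ENABLED=.*/HADOOP_ENABLED=1/' "$file"
            else
                # No setting present yet: append one.
                echo 'HADOOP_ENABLED=1' >> "$file"
            fi
            ;;
        cassandra)
            # Comment out any active HADOOP_ENABLED line.
            sed -i.bak 's/^[[:space:]]*\(HADOOP_ENABLED=.*\)/#\1/' "$file"
            ;;
        *)
            echo "usage: set_dse_role <analytics|cassandra> [defaults-file]" >&2
            return 1
            ;;
    esac
}
```

The function needs write access to the defaults file, so on most systems it must run in a root shell. The change takes effect only after the node is restarted as shown in step 2.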
    

Re-provisioning tarball installations

Use these instructions for Mac and other tarball installations:

  1. To stop a node, find the Cassandra or DSE Java process ID (PID) and kill the process using the PID. For example:

    $ ps auwx | grep cassandra
    $ kill <pid>
    
  2. Start the node:

    • Analytics node: dse cassandra -t
    • Cassandra node: dse cassandra
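When re-provisioning is scripted, the two start commands above can be selected by role. The following is a minimal sketch; the function name dse_start_command is hypothetical, and it only prints the command so that the caller decides when to run it.

```shell
# Print the start command for a tarball-installation node in the given role.
# Usage: dse_start_command <analytics|cassandra>
# (hypothetical helper; not shipped with DataStax Enterprise)
dse_start_command() {
    case "$1" in
        analytics) echo "dse cassandra -t" ;;   # -t starts the node as an analytics node
        cassandra) echo "dse cassandra" ;;      # no option: real-time/transactional node
        *)
            echo "unknown role: $1" >&2
            return 1
            ;;
    esac
}
```

For example, after stopping the node as described in step 1, running the output of dse_start_command analytics starts it as an analytics node, assuming the dse script is on the PATH.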

Note

DataStax does not recommend running Hadoop and Solr on the same node in production environments.