DataStax Enterprise 3.0 Documentation

Using a rolling restart

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

In a rolling restart, you upgrade and start one node at a time instead of bringing down the entire cluster and starting all nodes at once. The procedure for a rolling restart of DSE Search/Solr nodes differs from the procedure for Cassandra real-time nodes.

Performing a rolling restart of Cassandra real-time nodes

To perform a rolling restart of Cassandra real-time nodes:

  1. Restart the first node.
  2. Check for and resolve schema disagreements for the node.
  3. Repeat steps 1 and 2 for each remaining node, one node at a time.
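
The loop described by these steps can be sketched in shell. This is a minimal dry-run sketch, not a supported tool: `restart_node` and `check_schema` are hypothetical placeholder functions that only print what they would do, and you would replace their bodies with whatever restart and schema-check commands apply to your installation.

```shell
# Hypothetical placeholders: replace the bodies with the real restart and
# schema-check commands for your installation. As written, they only print.
restart_node() { echo "would restart $1"; }
check_schema() { echo "would check schema via $1"; }

# Restart the given nodes one at a time, in the order listed.
# Stops at the first node whose restart or schema check fails.
rolling_restart() {
  for node in "$@"; do
    restart_node "$node" || { echo "restart failed on $node" >&2; return 1; }
    check_schema "$node" || { echo "unresolved schema disagreement after restarting $node" >&2; return 1; }
  done
}
```

Calling `rolling_restart node1 node2 node3` (hypothetical host names) walks the nodes in order and stops as soon as a step reports a failure, so a problem on one node is not carried over to the next.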

Performing a rolling restart of Analytics/Hadoop Nodes

You can restart an Analytics/Hadoop node using the same rolling restart procedure as for Cassandra real-time nodes; however, a rolling restart is not fully supported. Exceptions, which you can ignore, flood the log file.

The Hadoop job tracker logs these exceptions repeatedly until all analytics nodes are upgraded. If you can tolerate these exceptions being added to the log file, use the rolling restart. The runtime exceptions you might see when starting analytics nodes look something like these snippets:

ERROR [GossipStage:1] 2012-09-21 01:09:21,510 AbstractCassandraDaemon.java
 (line 139) Fatal exception in thread . . .

INFO [JOB-TRACKER-INIT] 2012-09-20 07:06:38,064 JobTracker.java (line 2427) problem
 cleaning system directory: cfs:/tmp/hadoop-automaton/mapred/system
 java.io.IOException: java.lang.RuntimeException: TimedOutException() . . .

Ignore these exceptions. When the last node upgrades, restarts, and joins the cluster, the exceptions cease. As previously mentioned, upgrade and start the new job tracker node first.

Post-upgrade problems?

In the event of a post-upgrade problem, such as a schema disagreement, contact Support before attempting further DDL operations.

Checking for and resolving schema disagreements

When the output of DESCRIBE CLUSTER indicates a schema disagreement, or if a node is UNREACHABLE, perform these steps:

  1. Using the Command Line Interface (CLI), run the DESCRIBE CLUSTER command. For example:

    $ cassandra-cli -host localhost -port 9160
    
    [default@unknown] DESCRIBE cluster;
    

    If any node is UNREACHABLE, you see output something like this:

    [default@unknown] describe cluster;
    Cluster Information:
       Snitch: com.datastax.bdp.snitch.DseDelegateSnitch
       Partitioner: org.apache.cassandra.dht.RandomPartitioner
       Schema versions:
          UNREACHABLE: [10.202.205.203, 10.80.207.102, 10.116.138.23]
    
  2. Restart unreachable nodes.

  3. Repeat steps 1 and 2 until the output of DESCRIBE cluster lists only one schema version, which indicates that all nodes have the same schema version number.
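
The check in step 3 can be automated against saved DESCRIBE cluster output. The following is a minimal sketch that assumes the output has the shape shown above: a `Schema versions:` line followed by one line per version, each ending in a bracketed list of node addresses. The function name `schema_status` and its result strings are illustrative, not part of any DataStax tool.

```shell
# Classify DESCRIBE cluster output read from standard input.
# Prints one of: unreachable, disagreement, ok, unknown.
schema_status() {
  awk '
    /Schema versions:/            { in_versions = 1; next }
    in_versions && /UNREACHABLE:/ { unreachable = 1; next }
    in_versions && /: *\[/        { versions++ }
    END {
      if (unreachable)        print "unreachable"
      else if (versions > 1)  print "disagreement"
      else if (versions == 1) print "ok"
      else                    print "unknown"
    }
  '
}
```

For example, if you saved the CLI output to a file (here the hypothetical `describe_cluster.txt`), `schema_status < describe_cluster.txt` prints `ok` only when exactly one schema version is listed and no node is reported as UNREACHABLE.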

Scrubbing SSTables

If you created column families having LeveledCompactionStrategy, you need to scrub the SSTables that store those column families.

First, upgrade all nodes to the latest version of DataStax Enterprise, according to the platform-specific instructions presented earlier in this document. Next, complete steps in Configuring and starting a node. At this point, all nodes are upgraded and started.

Finally, follow these steps to install the sstablescrub utility and scrub the affected SSTables:

Tarball Installations

Download the sstablescrub and dse.in.sh utilities.

  1. Place the downloaded sstablescrub script into the $DSE_HOME/bin directory.
  2. Copy the downloaded dse.in.sh script to the $DSE_HOME/bin directory.

Packaged Installations (deb/rpm)

Download the sstablescrub and dse.in.sh utilities.

  1. Place the downloaded sstablescrub script in the /usr/bin directory.

  2. Replace dse.in.sh in the /usr/share/dse directory with the version you downloaded.

    Note

    Do not replace dse-env.sh in the /etc/dse directory.

To scrub SSTables:

  1. Shut down the nodes, one at a time.

  2. On each offline node, run the sstablescrub utility.

    For example, on a tarball installation:

    cd <install directory>/bin
    ./sstablescrub mykeyspace mycolumnfamily
    

    To get help about sstablescrub:

    ./sstablescrub -h
    
  3. Restart each node and client applications, one node at a time.

  4. Check for schema disagreements.
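
If several column families in a keyspace use LeveledCompactionStrategy, step 2 can be wrapped in a small loop. This is a sketch under stated assumptions: the `SSTABLESCRUB` variable and the `scrub_column_families` function are illustrative names, the default path assumes you run from the directory that contains the sstablescrub script, and the node must already be offline.

```shell
# Path to the scrub utility; override to match your installation.
SSTABLESCRUB="${SSTABLESCRUB:-./sstablescrub}"

# Usage: scrub_column_families <keyspace> <column_family>...
# Runs the scrub utility once per column family and stops at the first failure.
scrub_column_families() {
  keyspace="$1"
  shift
  for cf in "$@"; do
    "$SSTABLESCRUB" "$keyspace" "$cf" || { echo "scrub failed for $keyspace.$cf" >&2; return 1; }
  done
}
```

For example, `scrub_column_families mykeyspace mycolumnfamily anothercolumnfamily` (hypothetical names) runs the utility once for each column family in turn.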

If you do not scrub the affected SSTables, you might encounter the following error during compactions on column families using LeveledCompactionStrategy:

ERROR [CompactionExecutor:150] 2012-07-05 04:26:15,570 AbstractCassandraDaemon.java (line 134)
Exception in thread Thread[CompactionExecutor:150,1,main]
java.lang.AssertionError
at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
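
To see whether a node has already hit this error, you can search its log for the stack frame shown above. This is a minimal sketch: the function name is illustrative, and you must supply the path to the log file, which depends on your installation.

```shell
# Usage: count_leveled_promote_errors <log file>
# Prints the number of log lines that mention the LeveledManifest.promote frame.
count_leveled_promote_errors() {
  grep -c -F 'LeveledManifest.promote' "$1"
}
```

A result of 0 means the frame does not appear in that log file; any other value suggests that the affected SSTables on that node still need scrubbing.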