Using a rolling restart, you upgrade and start one node at a time, instead of bringing down the entire cluster and starting all nodes at once. The procedures for a rolling restart of DSE Search/Solr and a rolling restart of Cassandra real-time nodes are different.
To perform a rolling restart of Cassandra real-time nodes:
You can restart an Analytics/Hadoop node using the same rolling restart procedure as for Cassandra real-time nodes; however, a rolling restart is not fully supported. Exceptions, which you can safely ignore, flood the log file.
The exceptions are caused by the Hadoop job tracker, which repeatedly logs them until all analytics nodes are upgraded. If you can tolerate these exceptions being added to the log file, use the rolling restart. The runtime exceptions you might see when starting analytics nodes look something like this snippet:
ERROR [GossipStage:1] 2012-09-21 01:09:21,510 AbstractCassandraDaemon.java (line 139) Fatal exception in thread
. . .
INFO [JOB-TRACKER-INIT] 2012-09-20 07:06:38,064 JobTracker.java (line 2427) problem cleaning system directory: cfs:/tmp/hadoop-automaton/mapred/system
java.io.IOException: java.lang.RuntimeException: TimedOutException()
. . .
Ignore these exceptions. When the last node upgrades, restarts, and joins the cluster, the exceptions cease. As previously mentioned, upgrade and start the new job tracker node first.
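The rolling restart described above could be scripted roughly as follows. This is a minimal sketch, not part of the product: the node addresses, the "dse" service name, and the use of ssh are all assumptions to adapt to your cluster.

```shell
#!/bin/sh
# Hypothetical rolling-restart sketch. The node list, the "dse" service
# name, and the use of ssh are assumptions -- adapt them to your cluster.
# Per the note above, list the job tracker node first.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"

rolling_restart() {
    run="$1"    # command runner: pass "echo" to preview, "ssh" to execute
    for node in $NODES; do
        # Flush memtables so the node shuts down cleanly.
        $run "$node" nodetool drain
        # Restart the DSE service on that node.
        $run "$node" sudo service dse restart
        # Give the node time to rejoin before touching the next one;
        # a fixed pause stands in for polling 'nodetool status'.
        sleep "${PAUSE:-60}"
    done
}

# Preview the commands without running them:
# rolling_restart echo
```

Passing `echo` as the runner prints each command instead of executing it, which is a convenient dry run before touching a live cluster.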
When the output of DESCRIBE CLUSTER indicates a schema disagreement, or if a node is UNREACHABLE, perform these steps:
Using the Command Line Interface (CLI), run the DESCRIBE CLUSTER command. For example,
$ cassandra-cli -host localhost -port 9160
[default@unknown] DESCRIBE cluster;
If any node is UNREACHABLE, you see output something like this:
[default@unknown] describe cluster;
Cluster Information:
   Snitch: com.datastax.bdp.snitch.DseDelegateSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        UNREACHABLE: [10.202.205.203, 10.80.207.102, 10.116.138.23]
Restart unreachable nodes.
Repeat steps 1 and 2 until the DESCRIBE cluster command shows that all nodes have the same schema version: that is, only one schema version appears in the output.
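The check in steps 1 and 2 can be partially automated. This is a sketch that assumes DESCRIBE cluster output in the format shown above; it counts the schema-version lines (UNREACHABLE appears as its own line, so any unreachable node keeps the count above 1), meaning the cluster is in agreement only when it prints 1:

```shell
#!/bin/sh
# Count distinct schema versions in 'DESCRIBE cluster' output read from
# stdin. UNREACHABLE is listed as its own version line, so unreachable
# nodes also keep the count above 1.
count_schema_versions() {
    awk '/Schema versions:/ { in_versions = 1; next }
         in_versions && /: \[/ { n++ }
         END { print n + 0 }'
}

# Example (piping cassandra-cli output is an assumption about your setup):
# echo "DESCRIBE cluster;" | cassandra-cli -host localhost -port 9160 | count_schema_versions
```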
If you created column families that use LeveledCompactionStrategy, you need to scrub the SSTables that store those column families.
First, upgrade all nodes to the latest version of DataStax Enterprise, according to the platform-specific instructions presented earlier in this document. Next, complete steps in Configuring and starting a node. At this point, all nodes are upgraded and started.
Finally, follow these steps to install the sstablescrub utility and scrub the affected SSTables:
Packaged Installations (deb/rpm)
Place the attached sstablescrub in the /usr/bin directory.
Replace dse.in.sh in the /usr/share/dse directory with the version you downloaded.
Do not replace dse-env.sh in the /etc/dse directory.
To scrub SSTables:
Shut down the nodes, one at a time.
On each offline node, run the sstablescrub utility.
For example, on a tarball installation:
cd <install directory>/bin
./sstablescrub mykeyspace mycolumnfamily
To get help about sstablescrub:
sstablescrub -h
Restart each node and its client applications, one node at a time.
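On a node with several affected column families, the scrub step can be looped. This is a sketch under stated assumptions: the keyspace:columnfamily pairs are placeholders for your own schema, and DSE_BIN points at /usr/bin for packaged installations (for a tarball installation, point it at the bin directory under your install directory):

```shell
#!/bin/sh
# Sketch: scrub every column family that uses LeveledCompactionStrategy
# on an offline node. The keyspace:columnfamily pairs and DSE_BIN are
# placeholders for your own layout.
DSE_BIN="/usr/bin"    # packaged installs; tarball installs use their own bin directory

scrub_all() {
    run="$1"    # pass "echo" to preview the commands; omit to execute
    # One keyspace:columnfamily pair per entry.
    for pair in mykeyspace:mycolumnfamily mykeyspace:another_cf; do
        ks=${pair%%:*}
        cf=${pair#*:}
        $run "$DSE_BIN/sstablescrub" "$ks" "$cf"
    done
}

# Preview:
# scrub_all echo
```

As with any offline maintenance, run this only while the node is shut down, per the steps above.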
If you do not scrub the affected SSTables, you might encounter the following error during compactions on column families using LeveledCompactionStrategy:
ERROR [CompactionExecutor:150] 2012-07-05 04:26:15,570 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:150,1,main]
java.lang.AssertionError
        at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)