DataStax Enterprise 3.0 Documentation

Common operations

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

Common DSE Search/Solr operations are:

Adding a new Solr node

To increase the number of nodes in a Solr cluster, you can add or bootstrap a DSE node to the cluster. To increase search capacity, bootstrap the node and then, optionally, rebalance the cluster. To bootstrap a Solr node, use the same method that you use to bootstrap a Cassandra node. Using the default DSESimpleSnitch automatically puts all the Solr nodes in the same data center. Use OpsCenter Enterprise to rebalance the cluster.

Deleting Solr data

To delete a column family and its data, including the indexed data, from a Solr node, drop the column family using the Cassandra Query Language (CQL) or the Command Line Interface (CLI). The following example assumes that you ran the Wikipedia demo. It lists the Solr files on the file system, drops the solr column family that the demo created, and then verifies that the files have been deleted from the file system:

  1. List the Solr data files on the file system.

    • Packaged install:

      ls /usr/local/var/lib/dse5/data/
    • Tarball install:

      ls /var/lib/cassandra/data/

    The output looks something like this:

    _33.fdt                        _35_nrm.cfe             _38_Lucene40_0.tim
    _33.fdx                        _35_nrm.cfs             _38_Lucene40_0.tip
    _33.fnm                        _36.fdt                 _38_nrm.cfe
    . . .
  2. Launch cqlsh and execute the CQL command to drop the solr column family.

    use wiki;
    drop columnfamily solr;
  3. Exit cqlsh and check that the files have been deleted on the file system. For example:

    ls /var/lib/cassandra/data/

    The output is:

    ls: /var/lib/cassandra/data/ No such file or directory
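The check in step 3 can also be scripted. The following is a minimal sketch, not part of DSE: the function names index_dir_removed and report_index_dir are hypothetical, and the directory you pass in is assumed to be the one you listed in step 1.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with DSE) for step 3 above.

# Succeed (exit status 0) only when the given path no longer exists.
# $1 = directory that held the Solr index files before the drop.
index_dir_removed() {
  [ ! -e "$1" ]
}

# Print a one-line, human-readable result for the given directory.
report_index_dir() {
  if index_dir_removed "$1"; then
    echo "removed: $1"
  else
    echo "still present: $1"
  fi
}

# Example call (pass the directory you listed in step 1):
#   report_index_dir /var/lib/cassandra/data/
```

Because index_dir_removed sets only an exit status, it can also be used directly in a condition in a larger cleanup script.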

Updating Solr data

Using CQL, the CLI, or the Solr API, you can modify Solr and column family data. When you update a column family using CQL or the CLI, the Solr document is updated. When you update a Solr document using the Solr API, the column family is updated. Re-indexing occurs automatically after an update.

Writes are durable. A Solr API client writes data to Cassandra first, and then Cassandra updates secondary indexes. All writes to a replica node are recorded both in memory and in a commit log before they are acknowledged as a success. If a crash or server failure occurs before the memory tables are flushed to disk, the commit log is replayed on restart to recover any lost writes.

The Solr index update operation is similar to a Cassandra secondary index update. If the old column value was still in the Cassandra memtable, Cassandra removes the index entry; otherwise, the old entry remains to be purged by compaction. If a read sees a stale index entry before compaction purges it, the reader thread invalidates it. You can also trigger the expiration of search data.

Updating individual fields using the Solr API

You can use the Solr HTTP REST API to insert, modify, or delete data on a Solr node. Even when you update only a single field, the document is re-indexed in full. After writing the field modifications to the Solr document, use a URL in the following format to update the document:

curl http://<host>:<port>/solr/<keyspace>.<column family>/update?

The Solr convention is to use curl for issuing update commands instead of using a browser.
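As an illustration of the URL format above, the following sketch builds the update URL and posts a file to it with curl. It assumes that the node accepts standard Solr XML update messages. The function names solr_update_url and solr_post_update and the file doc.xml are hypothetical. The wiki keyspace and solr column family come from the Wikipedia demo, and 8983 is the default port.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with DSE).

# Print the update URL for a keyspace and column family.
# $1 = host, $2 = port, $3 = keyspace, $4 = column family
solr_update_url() {
  printf 'http://%s:%s/solr/%s.%s/update\n' "$1" "$2" "$3" "$4"
}

# Post an XML update message to the given update URL.
# $1 = update URL, $2 = path to a file holding a Solr XML update message
# Assumption: the node accepts standard Solr XML update messages.
solr_post_update() {
  curl "$1" -H 'Content-Type: text/xml' --data-binary "@$2"
}

# Example call (doc.xml is a hypothetical file name):
#   solr_post_update "$(solr_update_url localhost 8983 wiki solr)" doc.xml
```

Keeping the URL construction in its own function makes it easy to confirm the keyspace.column-family form before any request is sent.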

When you use CQL or CLI to update a field, DSE Search implicitly sets replacefields to false and updates individual fields in the Solr document. The re-indexing of data occurs automatically.

Re-indexing using the Core Admin UI

You can re-index manually using the UI or command-line tools. In the Core Admin screen of the Solr Admin UI, the Reload, Reindex and Full Reindex buttons perform functions that correspond to RELOAD command options.

Warning about using the optimize command

Do not use the optimize command. This warning appears in the system log when you use the optimize command:

WARN [http-8983-2] 2013-03-26 14:33:04,450 (line 697)
Calling commit with optimize is not recommended.

The Lucene merge policy is very efficient, so using the optimize command is no longer necessary. Using the optimize command in a URL can cause nodes to fail.
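To find out whether a client has been issuing optimize requests, you can search the system log for the warning text. The following is a minimal sketch. The function name count_optimize_warnings is hypothetical, and the location of the system log depends on your installation, so it is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DSE).

# Print the number of log lines that contain the optimize warning.
# $1 = path to the system log (location depends on your installation)
count_optimize_warnings() {
  # grep -c prints the count of matching lines; "|| true" keeps the exit
  # status at 0 when there are no matches, since grep returns 1 in that case.
  grep -c 'Calling commit with optimize is not recommended' "$1" || true
}

# Example call (the log path is an assumption):
#   count_optimize_warnings /var/log/cassandra/system.log
```

A nonzero count suggests that some client still sends optimize with its commits.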

Decommissioning and repairing a node

You can decommission and repair a Solr node in the same manner as you would a Cassandra node.

Rebuilding an index

To rebuild the index, reload the Solr core.

Managing the location of Solr data

Solr has its own set of data files. As with Cassandra data files, you can control where the Solr data files are saved on the server. By default, the data is saved in <Cassandra data directory>/. You can change the location from the <Cassandra data directory> to another directory from the command line. For example:

cassandra -s

In this example, the data is saved in the /opt directory.

Accessing the validation log

DSE Search stores validation errors that arise from non-indexable data sent from non-Solr nodes in this log:


For example, if a Cassandra node that is not running Solr puts a string in a date field, an exception is logged for that column when the data is replicated to the Solr node.

Changing the Solr connector port

To change the Solr port from the default, 8983, change the http.port setting in the file installed with DSE in <dse-version>/resources/tomcat/conf.