Common DSE Search/Solr operations are:
To increase the number of nodes in a Solr cluster, add or bootstrap a DSE node to the cluster. To increase search capacity, bootstrap the node, and then optionally rebalance the cluster. To bootstrap a Solr node, use the same method you use to bootstrap a Cassandra node. The default DSESimpleSnitch automatically puts all the Solr nodes in the same data center. Use OpsCenter Enterprise to rebalance the cluster.
To delete a column family and its data, including the indexed data, from a Solr node, drop the column family using the Cassandra Query Language (CQL) or the Command Line Interface (CLI). The following example, which assumes you ran the Wikipedia demo, lists the Solr files on the file system, drops the solr column family that the demo created, and then verifies that the files have been deleted from the file system:
List the Solr data files on the file system.
Packaged install:
ls /usr/local/var/lib/dse5/data/solr.data/wiki.solr/index/
Tarball install:
ls /var/lib/cassandra/data/solr.data/wiki.solr/index
The output looks something like this:
_33.fdt _35_nrm.cfe _38_Lucene40_0.tim _33.fdx _35_nrm.cfs _38_Lucene40_0.tip _33.fnm _36.fdt _38_nrm.cfe . . .
Launch cqlsh and execute the CQL command to drop the solr column family.
use wiki; drop columnfamily solr;
Exit cqlsh and check that the files have been deleted on the file system. For example:
ls /var/lib/cassandra/data/solr.data/wiki.solr/index
The output is:
ls: /var/lib/cassandra/data/solr.data/wiki.solr/index: No such file or directory
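The final check can also be scripted rather than read off the ls output. The following is a minimal sketch, not part of the product: index_removed is a hypothetical helper name, and the path is passed as an argument because it differs between install types.

```shell
# Report whether a Solr index directory is still on disk.
# Returns 0 when the directory is gone, 1 when it still exists.
# index_removed is a hypothetical helper, not a DSE command.
index_removed() {
    dir=$1
    if [ -d "$dir" ]; then
        echo "still present: $dir"
        return 1
    fi
    echo "removed: $dir"
}

# Example call, using the path from the steps above:
# index_removed /var/lib/cassandra/data/solr.data/wiki.solr/index
```

Because the function returns a non-zero status while the directory exists, it can be used directly in an if statement after the drop.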
Using CQL, the CLI, or the Solr APIs, you can modify Solr and column family data. When you update a column family using CQL or the CLI, the Solr document is updated. When you update a Solr document using the Solr API, the column family is updated. Re-indexing occurs automatically after an update.
Writes are durable. A Solr API client writes data to Cassandra first, and then Cassandra updates secondary indexes. All writes to a replica node are recorded both in memory and in a commit log before they are acknowledged as a success. If a crash or server failure occurs before the memory tables are flushed to disk, the commit log is replayed on restart to recover any lost writes.
The Solr index update operation is similar to a Cassandra secondary index update. If the old column value was still in the Cassandra memtable, Cassandra removes the index entry; otherwise, the old entry remains to be purged by compaction. If a read sees a stale index entry before compaction purges it, the reader thread invalidates it. You can also trigger the expiration of search data.
You can use the Solr HTTP REST API to insert into, modify, or delete data from a Solr node. When you update only a single field, the document is re-indexed in full. After writing the field modifications to the Solr document, use a URL in the following format to update the document:
curl http://<host>:<port>/solr/<keyspace>.<column family>/update?replacefields=false
The Solr convention is to use curl for issuing update commands instead of using a browser.
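As an illustration of the URL format above, the following sketch assembles the update URL from its parts. The function name solr_update_url and the file name doc.xml are hypothetical, and the commented curl call assumes the modifications are posted as a Solr XML update document, which is one common way to send updates to Solr.

```shell
# Build the update URL for the Solr core named <keyspace>.<column family>.
# solr_update_url is a hypothetical helper, not a DSE command.
solr_update_url() {
    host=$1
    port=$2
    keyspace=$3
    cf=$4
    printf 'http://%s:%s/solr/%s.%s/update?replacefields=false\n' \
        "$host" "$port" "$keyspace" "$cf"
}

# Example call (assumption: doc.xml holds the field modifications
# as a Solr XML update document):
# curl "$(solr_update_url localhost 8983 wiki solr)" \
#     -H 'Content-Type: text/xml' --data-binary @doc.xml
```

Quoting the URL in the curl call keeps the shell from interpreting the ? character in the query string.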
When you use CQL or CLI to update a field, DSE Search implicitly sets replacefields to false and updates individual fields in the Solr document. The re-indexing of data occurs automatically.
You can re-index manually using the UI or command-line tools. In the Core Admin screen of the Solr Admin UI, the Reload, Reindex and Full Reindex buttons perform functions that correspond to RELOAD command options.
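From the command line, a reload is usually issued through the Solr core admin URL. The sketch below builds such a URL. The reindex and deleteAll parameter names are an assumption based on the RELOAD options in DSE Search documentation, so verify them against the documentation for your version. The function name solr_reload_url is hypothetical.

```shell
# Build a core admin RELOAD URL for a Solr core.
# Assumption: reindex=true with deleteAll=false re-indexes in place,
# and reindex=true with deleteAll=true performs a full re-index.
# solr_reload_url is a hypothetical helper, not a DSE command.
solr_reload_url() {
    host=$1
    port=$2
    core=$3
    reindex=${4:-false}
    delete_all=${5:-false}
    printf 'http://%s:%s/solr/admin/cores?action=RELOAD&name=%s&reindex=%s&deleteAll=%s\n' \
        "$host" "$port" "$core" "$reindex" "$delete_all"
}

# Example call for an in-place re-index of the wiki.solr core:
# curl "$(solr_reload_url localhost 8983 wiki.solr true false)"
```

With the last two arguments omitted, the URL requests a plain reload without re-indexing.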
Do not use the optimize command. When you use optimize, this warning appears in the system log:
WARN [http-8983-2] 2013-03-26 14:33:04,450 CassandraDirectUpdateHandler2.java (line 697) Calling commit with optimize is not recommended.
The Lucene merge policy is very efficient, so the optimize command is no longer necessary. Using the optimize command in a URL can cause nodes to fail.
You can decommission and repair a Solr node in the same manner as you would a Cassandra node.
To rebuild the index, reload the Solr core.
Solr has its own set of data files. As with Cassandra data files, you can control where the Solr data files are saved on the server. By default, the data is saved in <Cassandra data directory>/solr.data. To save the data somewhere other than the <Cassandra data directory>, set the location on the command line. For example:
cassandra -s -Ddse.solr.data.dir=/opt
In this example, the data in solr.data is saved in the /opt directory.
DSE Search stores validation errors that arise from non-indexable data sent from non-Solr nodes in this log:
/var/log/cassandra/solrvalidation.log
For example, if a Cassandra node that is not running Solr puts a string in a date field, an exception is logged for that column when the data is replicated to the Solr node.
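To check for recent validation errors, you can look at the end of the log. This is a minimal sketch that makes no assumption about the log line format. The function name show_validation_errors is hypothetical, and the default path is the one given above.

```shell
# Print the last N lines of the Solr validation log, or a short
# message when the log is missing or empty.
# show_validation_errors is a hypothetical helper, not a DSE command.
show_validation_errors() {
    log=${1:-/var/log/cassandra/solrvalidation.log}
    count=${2:-20}
    if [ ! -s "$log" ]; then
        echo "no validation errors logged in $log"
        return 0
    fi
    tail -n "$count" "$log"
}

# Example call, showing the last 50 lines of the default log:
# show_validation_errors /var/log/cassandra/solrvalidation.log 50
```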
To change the Solr port from the default, 8983, change the http.port setting in the catalina.properties file installed with DSE in <dse-version>/resources/tomcat/conf.
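The edit can be scripted. The following sketch assumes the setting appears in catalina.properties as a single line of the form http.port=<number>. Check the file before relying on that. The function name set_solr_port is hypothetical.

```shell
# Rewrite the http.port line in a properties file.
# Assumption: the setting is stored as "http.port=<number>" on its own line.
# set_solr_port is a hypothetical helper, not a DSE command.
set_solr_port() {
    file=$1
    port=$2
    # Reject anything that is not a plain number.
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    tmp="$file.tmp"
    # Write to a temporary file first, then copy back so the original
    # file keeps its ownership and permissions.
    sed "s/^http\.port=.*/http.port=$port/" "$file" > "$tmp" &&
        cat "$tmp" > "$file" &&
        rm -f "$tmp"
}

# Example call (the path prefix depends on your installation):
# set_solr_port <dse-version>/resources/tomcat/conf/catalina.properties 8984
```

Lines other than http.port are left unchanged, and a non-numeric port is rejected before the file is touched.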