You can run Solr on one or more nodes. DataStax does not support running Solr and Hadoop on the same node, although it is possible to do so in a development environment. In production environments, run real-time (Cassandra), analytics (Hadoop), and DSE Search (Solr) workloads on separate nodes and in separate data centers.
Common DSE Search/Solr operations are:
Repairing subranges of data in a cluster is faster than running nodetool repair on entire ranges, because all data replicated during a nodetool repair operation must be re-indexed. When you repair a subrange, less data has to be re-indexed.
To repair a subrange
Perform these steps as a rolling repair of the cluster, one node at a time.
Run the dsetool list_subranges command, specifying the keyspace, the table, the approximate number of rows per subrange, the start token of the node's partition range, and the end token of that range.
dsetool list_subranges my_keyspace my_table 10000 113427455640312821154458202477256070485 0
The output lists the subranges.
Start Token                              End Token                                Estimated Size
------------------------------------------------------------------------------------------------
113427455640312821154458202477256070485  132425442795624521227151664615147681247  11264
132425442795624521227151664615147681247  151409576048389227347257997936583470460  11136
151409576048389227347257997936583470460  0                                        11264
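The start and end tokens in the list_subranges example are consistent with the node that holds token 0 in a three-node ring whose tokens are evenly spaced over the RandomPartitioner token space (0 to 2**127). The following Python sketch shows how such a node's range could be derived. The even spacing, the partitioner, and the helper names balanced_tokens and node_range are assumptions for illustration; on a real cluster, read the actual tokens from nodetool ring instead.

```python
# Sketch: derive a node's partition range from evenly spaced tokens.
# Assumptions: RandomPartitioner (tokens 0..2**127) and a perfectly
# balanced ring; on a real cluster, check tokens with `nodetool ring`.

RING_SIZE = 2 ** 127


def balanced_tokens(num_nodes):
    """Token i is i * 2**127 // num_nodes (integer division)."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def node_range(tokens, index):
    """Return (start, end) of the range owned by the node at `index`.

    A node owns (previous token, its own token]; for the first node the
    previous token is the last one in the ring, so the range wraps to 0.
    """
    ordered = sorted(tokens)
    start = ordered[index - 1]  # index 0 -> ordered[-1], the last token
    end = ordered[index]
    return start, end


tokens = balanced_tokens(3)
start, end = node_range(tokens, 0)
print(f"dsetool list_subranges my_keyspace my_table 10000 {start} {end}")
```

For the node holding token 0, this prints the same dsetool command used in the example.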
Use each subrange in the output as input to a nodetool repair command, passing its start token with -st and its end token with -et.
nodetool repair my_keyspace my_table -st 113427455640312821154458202477256070485 -et 132425442795624521227151664615147681247
nodetool repair my_keyspace my_table -st 132425442795624521227151664615147681247 -et 151409576048389227347257997936583470460
nodetool repair my_keyspace my_table -st 151409576048389227347257997936583470460 -et 0
Each command runs an anti-entropy repair from the start token to the end token of one subrange; together, the commands cover the node's entire partition range.
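Typing one repair command per subrange by hand is error-prone when a node has many subranges. The following Python sketch parses list_subranges output and builds one nodetool repair argument list per subrange. It assumes the output is tabular, with a header row, a dashed separator, and one subrange per row; the helper names parse_subranges and repair_commands are hypothetical. It only prints the commands, leaving you to run them one at a time as part of the rolling repair.

```python
import shlex

# Sample output, assuming dsetool prints one subrange per row.
SAMPLE = """\
Start Token                              End Token                                Estimated Size
------------------------------------------------------------------------------------------------
113427455640312821154458202477256070485  132425442795624521227151664615147681247  11264
132425442795624521227151664615147681247  151409576048389227347257997936583470460  11136
151409576048389227347257997936583470460  0                                        11264
"""


def _is_int(field):
    # Accept a leading minus sign so negative tokens also parse.
    return field.lstrip("-").isdigit()


def parse_subranges(output):
    """Return (start, end, estimated_size) tuples, skipping header lines."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3 and all(_is_int(f) for f in fields):
            start, end, size = (int(f) for f in fields)
            rows.append((start, end, size))
    return rows


def repair_commands(keyspace, table, subranges):
    """Build one `nodetool repair -st ... -et ...` argument list per subrange."""
    return [
        ["nodetool", "repair", keyspace, table, "-st", str(start), "-et", str(end)]
        for start, end, _size in subranges
    ]


for cmd in repair_commands("my_keyspace", "my_table", parse_subranges(SAMPLE)):
    # Print rather than run; execute these one at a time, for example
    # with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Returning argument lists rather than strings lets you pass each command to subprocess without shell quoting issues.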
Because of the distributed nature of the system, the consistency level of ONE that DSE Search/Solr uses, and other factors, Solr queries can return inconsistent results. For example, consecutive queries might return different numFound counts.
An efficient way of achieving consistent results is to repair nodes using the subrange repair method.
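To check whether consecutive queries disagree, you can compare the numFound value that Solr reports in its JSON response (response.numFound) across several identical queries. This Python sketch assumes that the core is addressed as <keyspace>.<table> under http://<host>:8983/solr/ and that no writes occur between the queries; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def num_found(response_body):
    """Extract numFound from a Solr JSON (wt=json) response body."""
    return json.loads(response_body)["response"]["numFound"]


def fetch_num_found(base_url, core, query="*:*"):
    """Run one query and return its numFound.

    Assumes the core is named <keyspace>.<table> and served under
    <base_url>/solr/<core>/select; rows=0 skips returning documents.
    """
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    url = f"{base_url}/solr/{urllib.parse.quote(core)}/select?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return num_found(resp.read().decode("utf-8"))


def counts_consistent(counts):
    """True when every query reported the same numFound."""
    return len(set(counts)) <= 1


# Example (needs a running DSE Search node; names are placeholders):
# counts = [fetch_num_found("http://localhost:8983", "my_keyspace.my_table")
#           for _ in range(5)]
# print(counts, counts_consistent(counts))
```

If the counts differ while no data is being written, that suggests running the subrange repair on the affected nodes.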
To increase the number of nodes in a Solr cluster, add a DSE node to the cluster. To increase search capacity, add the node and then, optionally, rebalance the cluster. Add a Solr node using the same method you use to add a Cassandra node. With the default DSESimpleSnitch, all Solr nodes are automatically placed in the same data center. Use OpsCenter Enterprise to rebalance the cluster.
You can decommission and repair a Solr node in the same manner as you would a Cassandra node. The efficient and recommended way to repair a node, or cluster, is to use the subrange repair method.
Solr has its own set of data files. As with Cassandra data files, you can control where the Solr data files are saved on the server. By default, the data is saved in <Cassandra data directory>/solr.data. To use another directory, set the dse.solr.data.dir property when you start DSE from the command line. For example, on Linux:
cd <install_directory>
bin/dse cassandra -s -Ddse.solr.data.dir=/opt
In this example, the Solr data is saved in the /opt directory.
DSE Search logs Solr log messages in the Cassandra system log:
Assuming you configured and are using the Apache log4j utility, you can control the granularity of Solr log messages, and other log messages, in the Cassandra system.log file by configuring the log4j-server.properties file. The log4j-server.properties file is located in:
Packaged installations: /etc/dse/cassandra
Binary installations: /resources/cassandra/conf/
To set log levels, configure the log4j.rootLogger value, specifying one of these values:
For example, open the log4j-server.properties file and change the log level by configuring the log4j.rootLogger value:
# output messages into a rolling log file as well as stdout
log4j.rootLogger=INFO,stdout
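If you change log levels from a script rather than an editor, a small helper can rewrite only the level in the log4j.rootLogger value and keep the appender list (stdout in the example). This Python sketch works on the file contents as a string and assumes key=value syntax; the helper name is hypothetical, and the accepted level names are the standard log4j 1.x levels.

```python
# Standard log4j 1.x level names accepted by log4j.rootLogger.
LOG4J_LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}


def set_root_logger_level(properties_text, level):
    """Return properties text with the log4j.rootLogger level replaced.

    Appender names after the level (for example, stdout) are kept.
    Comment lines and other keys are left untouched. Only key=value
    lines are recognized, not the key: value form.
    """
    level = level.upper()
    if level not in LOG4J_LEVELS:
        raise ValueError(f"unknown log4j level: {level}")
    lines = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "log4j.rootLogger":
            appenders = [a.strip() for a in value.split(",")[1:]]
            line = "log4j.rootLogger=" + ",".join([level] + appenders)
        lines.append(line)
    return "\n".join(lines) + "\n"


original = (
    "# output messages into a rolling log file as well as stdout\n"
    "log4j.rootLogger=INFO,stdout\n"
)
print(set_root_logger_level(original, "debug"), end="")
```

To apply it, read log4j-server.properties from the location listed above for your installation type, pass the text through the helper, and write the result back.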
DSE Search stores validation errors that arise from non-indexable data sent from non-Solr nodes in this log:
For example, if a Cassandra node that is not running Solr puts a string in a date field, an exception is logged for that column when the data is replicated to the Solr node.
To change the Solr port from the default, 8983, change the http.port setting in the catalina.properties file installed with DSE in <dse-version>/resources/tomcat/conf.
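After you change http.port and restart the node, one way to confirm that something is accepting connections on the new port is a plain TCP connection attempt. This Python sketch only checks that a process is listening on the port, not that the process is Solr; the function name and the host and port in the example are placeholders.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that some process accepts connections on the port;
    it does not confirm that the process is Solr.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example with placeholder values: the node's address and the new http.port.
# print(port_is_listening("127.0.0.1", 8984))
```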
DataStax Enterprise supports secure enterprise search using Apache Solr 4.3 and Lucene. The security table summarizes the security features of DSE Search/Solr and other integrated components. DSE Search data is completely or partially secured by using DataStax Enterprise security features:
Access to Solr documents, excluding cached data, can be limited to users who have been granted access permissions. Permission management also secures tables used to store Solr data.
Data at rest in Cassandra tables, excluding cached and Solr-indexed data, can be encrypted. Encryption occurs on the Cassandra side and impacts performance slightly.
You can encrypt HTTP access to Solr data and internal, node-to-node Solr communication using SSL. Enable SSL node-to-node encryption on the Solr node by setting encryption options in the dse.yaml file as described in Client-to-node encryption.
You can authenticate DSE Search users through Kerberos authentication using Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO). To use the SolrJ API against DSE Search clusters with Kerberos authentication, client applications should use the SolrJ-Auth library and the DataStax Enterprise SolrJ component as described in the solrj-auth-README.md file.
You can also use HTTP Basic Authentication, but this is not recommended.
When you enable Cassandra's internal authentication by specifying authenticator: org.apache.cassandra.auth.PasswordAuthenticator in cassandra.yaml, clients must use HTTP Basic Authentication to provide credentials to Solr services. Because HTTP Basic Authentication is stateless, the authentication process must run on every HTTP request, which can significantly affect performance. For this reason, DataStax does not recommend using internal authentication on DSE Search clusters in production. To secure DSE Search in production, enable DataStax Enterprise Kerberos authentication.
To configure DSE Search to use Cassandra's internal authentication, follow this configuration procedure:
Comment out the AllowAllAuthenticator line and uncomment the org.apache.cassandra.auth.PasswordAuthenticator line in cassandra.yaml to enable HTTP Basic Authentication for Solr:
#authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authenticator: org.apache.cassandra.auth.PasswordAuthenticator
#authenticator: com.datastax.bdp.cassandra.auth.PasswordAuthenticator
#authenticator: com.datastax.bdp.cassandra.auth.KerberosAuthenticator
Start the server.
When you access Solr, the browser prompts you for a Cassandra username and password.
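Clients other than a browser must send the credentials themselves. HTTP Basic Authentication carries them in an Authorization header on every request, which is why the per-request cost described above applies. This Python sketch builds that header with the standard library; the helper names, the Solr URL, the core name, and the credentials in the example are hypothetical placeholders.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def solr_request(url, username, password):
    """Create a request that carries the credentials.

    Basic auth is stateless, so every request must include this header.
    """
    return urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )


# Example with placeholder values (needs a running node to actually send):
# req = solr_request("http://localhost:8983/solr/my_keyspace.my_table/select?q=*:*",
#                    "solr_user", "secret")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Because the credentials are only base64-encoded, not encrypted, combine Basic authentication with SSL if you use it at all.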
You can exclude hosts from Solr-distributed queries in DataStax Enterprise 3.1.2 and later. To exclude hosts from queries, perform these steps on each node that you want to send queries to.
DataStax Enterprise 3.1.2 exposes the com.datastax.bdp:type=ShardRouter MBean, which provides the following operations: