DataStax Enterprise 2.2 Documentation

Tuning DSE Search Performance

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


DataStax Enterprise supports real-time, analytic, and search workloads in the same cluster of machines with smart workload isolation. This isolation ensures that the workloads do not compete with each other for data or computing resources and helps deliver consistently high performance. If DataStax Enterprise Search nodes show degraded performance, high memory consumption, or other problems, try the techniques described in the following sections.

Using Column Family Compression

Search nodes typically engage in read-dominated tasks, so maximizing storage capacity of nodes, reducing the volume of data on disk, and limiting disk I/O can improve performance. In Cassandra 1.0 and later, you can configure data compression on a per-column family basis to optimize performance of read-dominated tasks.

The compression configuration determines the algorithm used to compress SSTable files. For read-heavy workloads, such as those carried by Enterprise Search, Snappy compression is recommended. Compression using the Snappy compressor is enabled by default when you create a column family in Cassandra 1.1 and later. You can change compression options using CQL. Developers can also implement custom compression classes using the org.apache.cassandra.io.compress.ICompressor interface. You can also tune the compression chunk size to suit your read/write access patterns and the average size of rows in the column family.
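To see why the chunk size matters for read-dominated workloads, the following minimal Python sketch estimates how much data has to be decompressed to read a single row. It assumes that every chunk a row overlaps is decompressed in full. The function name, the row sizes, and the chunk sizes are illustrative and do not come from Cassandra itself.

```python
def decompressed_bytes_per_read(row_offset, row_size, chunk_length_kb):
    """Estimate the bytes decompressed to read one row.

    row_offset and row_size are in bytes of uncompressed data.
    Assumption (for illustration): each chunk the row overlaps
    is decompressed as a whole.
    """
    if row_offset < 0 or row_size <= 0 or chunk_length_kb <= 0:
        raise ValueError("offset must be >= 0; sizes must be positive")
    chunk = chunk_length_kb * 1024
    first_chunk = row_offset // chunk
    last_chunk = (row_offset + row_size - 1) // chunk
    return (last_chunk - first_chunk + 1) * chunk


# A 1 KB row inside a single 64 KB chunk: the whole chunk is read.
print(decompressed_bytes_per_read(0, 1024, 64))               # 65536
# The same row with a hypothetical 4 KB chunk size.
print(decompressed_bytes_per_read(0, 1024, 4))                # 4096
# A 10 KB row that straddles two 64 KB chunks.
print(decompressed_bytes_per_read(60 * 1024, 10 * 1024, 64))  # 131072
```

Under these assumptions, small rows read at random favor smaller chunks, while larger chunks generally compress better and suit larger rows or sequential reads.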

Setting the High-Performance Update Handler

To use the near real-time capabilities of Solr, set the default high-performance update handler flag in solrconfig.xml. For example, the Solr configuration file for the Wikipedia demo sets this flag as follows:

<!-- The default high-performance update handler -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>

This example uses the maxTime update handler option. The update handler options enable near real-time performance and trigger a soft commit of data automatically, so checking synchronization of data to disk is not necessary. Data durability is maintained by letting Cassandra perform hard commits along with Cassandra memtable flushes. This table describes both update handler options.

Option Name   Default      Description
maxDocs       No default   Maximum number of documents added since the last soft commit before a new soft commit is triggered automatically.
maxTime       1000         Maximum time in milliseconds that can elapse between the addition of a document and a new, automatically triggered soft commit.

For more information about the update handler and modifying solrconfig.xml, see the Solr documentation.
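As a quick way to confirm which soft commit options a configuration file actually sets, the following Python sketch parses a solrconfig.xml document and reports the autoSoftCommit values. It is only an illustration: the helper name soft_commit_settings is hypothetical, and the embedded fragment wraps the update handler from the example above in a config root element so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the Wikipedia demo example, wrapped in a
# <config> root element so it can be parsed standalone.
SOLRCONFIG_FRAGMENT = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
"""


def soft_commit_settings(xml_text):
    """Return the autoSoftCommit options that are set, as integers."""
    root = ET.fromstring(xml_text.strip())
    node = root.find("./updateHandler/autoSoftCommit")
    settings = {}
    if node is None:
        return settings  # no soft commit configured
    for option in ("maxDocs", "maxTime"):
        value = node.findtext(option)
        if value is not None:
            settings[option] = int(value.strip())
    return settings


print(soft_commit_settings(SOLRCONFIG_FRAGMENT))  # {'maxTime': 1000}
```

To check a real file, read its contents and pass the text to the same function.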

Changing the Stack Size and Memtable Space

Some Solr users have reported that increasing the stack size improves performance under Tomcat. To increase the stack size, uncomment and modify the default -Xss128k setting in the cassandra-env.sh file. Also, decreasing the memtable space to make room for Solr caches might improve performance. Modify the memtable space using the memtable_total_space_in_mb property in the cassandra.yaml file.

Managing Caching

You can configure solrconfig.xml to specify whether files are cached in RAM or on the file system by setting the DSE near real-time caching directory factory flag and adjusting its attributes.

To manage caching operations:

  1. Open solrconfig.xml for editing.
  2. Add a directoryFactory element to solrconfig.xml of type DSENRTCachingDirectoryFactory. For example:
<directoryFactory name="DirectoryFactory"
  class="com.datastax.bdp.cassandra.index.solr.DSENRTCachingDirectoryFactory">
  <double name="maxmergesizemb">5.0</double>
  <double name="maxcachedmb">32.0</double>
</directoryFactory>
  3. Set the DirectoryFactory attributes:

    • maxmergesizemb

      The threshold (MB) for writing a merge segment to a RAMDirectory or to the file system. If the estimated size of merging a segment is less than maxmergesizemb, the merge segment is written to the RAMDirectory; otherwise, it is written to the file system.

    • maxcachedmb

      The maximum value (MB) of the RAMDirectory.
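The threshold rule described for these attributes can be sketched in a few lines of Python. This is a simplified model, not the DSE implementation: the function name is hypothetical, the defaults mirror the example values above, and the check against the remaining RAMDirectory capacity is an assumption about how maxcachedmb interacts with the merge threshold.

```python
def merge_segment_target(estimated_merge_mb, cached_mb,
                         max_merge_size_mb=5.0, max_cached_mb=32.0):
    """Decide where a merge segment is written in this simplified model.

    estimated_merge_mb: estimated size of the merge segment (MB)
    cached_mb: amount already held in the RAMDirectory (MB)
    """
    # Rule from the attribute description: small merges go to RAM.
    below_threshold = estimated_merge_mb < max_merge_size_mb
    # Assumption: the segment must also fit within the RAMDirectory cap.
    fits_in_cache = cached_mb + estimated_merge_mb <= max_cached_mb
    if below_threshold and fits_in_cache:
        return "RAMDirectory"
    return "file system"


print(merge_segment_target(2.0, 0.0))   # RAMDirectory
print(merge_segment_target(8.0, 0.0))   # file system
print(merge_segment_target(4.0, 30.0))  # file system
```

In this model, raising maxmergesizemb sends larger merges to RAM, and raising maxcachedmb lets more segments stay cached, both at the cost of heap space.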

Increasing Read Performance by Adding Replicas

You can increase DSE Search read performance by configuring replicas just as you do in Cassandra. You define a replica placement strategy and the number of replicas you want. For example, you can add replicas using the NetworkTopologyStrategy replica placement strategy, which you can configure using CQL. If you are using a PropertyFileSnitch, perform these steps:

  1. Check the data center names of your nodes using the nodetool command.

    ./nodetool -h localhost ring
    

    Note

    The data center names, DC1 and DC2 in this example, must match the data center name configured for your snitch.

  2. Start CQL on the DSE command line and create a keyspace that specifies the number of replicas you want.

    CREATE KEYSPACE test
    WITH strategy_class = 'NetworkTopologyStrategy'
    AND strategy_options:DC1 = 1
    AND strategy_options:DC2 = 3;
    
The strategy options set the number of replicas in data centers, one replica in data center 1 and three in data center 2. For more information about adding replicas, see Choosing Keyspace Replication Options.
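If you manage several data centers, you could generate the statement from a mapping of data center names to replica counts. The following Python sketch builds the same CQL text as the example above. The function name create_keyspace_cql is hypothetical, and the sketch only formats a string; it does not connect to a cluster or check that the data center names match your snitch.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement in the syntax shown above.

    replicas_per_dc maps a data center name to its replica count.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    lines = [
        "CREATE KEYSPACE %s" % keyspace,
        "WITH strategy_class = 'NetworkTopologyStrategy'",
    ]
    # Sort by name so the output is deterministic.
    for dc, count in sorted(replicas_per_dc.items()):
        if not isinstance(count, int) or count < 1:
            raise ValueError("replica count for %s must be a positive integer" % dc)
        lines.append("AND strategy_options:%s = %d" % (dc, count))
    return "\n".join(lines) + ";"


print(create_keyspace_cql("test", {"DC1": 1, "DC2": 3}))
```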

Changing the Replication Factor for a Solr Keyspace

When you post the solrconfig.xml and schema.xml, DSE Search creates a keyspace and column family in Cassandra. The default replication factor for this keyspace is 1. If you need more than one replica of the keyspace in your cluster, you need to update the replication factor of the keyspace.

The following procedure builds on the Wikipedia demo example. Assume the solrconfig.xml and schema.xml files have already been posted using wiki.solr in the URL, which creates a keyspace named wiki with a default replication factor of 1. You want three replicas of the keyspace in the cluster, so you need to update the replication factor of the Solr keyspace.

To change the Solr keyspace replication factor

  1. Check the name of the data center of the Solr/Search nodes.

    ./nodetool -h localhost ring
    

    The output tells you that the name of the data center for your node is, for example, datacenter1.

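  2. Use the Cassandra CLI to change the replication factor of the keyspace to 3. The key in strategy_options must be the data center name reported in step 1; the following example uses a data center named Solr.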

    cassandra-cli -host localhost -port 9160
    
    [default@unknown] UPDATE KEYSPACE wiki
       WITH strategy_options = {Solr:3};
    

If the keyspace already contains data when you change the replication factor, run nodetool repair afterward to avoid missing data or data unavailable exceptions.

Managing the Consistency Level

Consistency refers to how up-to-date and synchronized a row of data is on all of its replicas. DSE Search extends Solr with an HTTP parameter, cl, that you can send with Solr data to tune consistency, much as you tune consistency in Cassandra. The format of the URL is:

http://<host>:<port>/solr/<keyspace>.<column family>/update?cl=ONE

The cl parameter specifies the consistency level of the write in Cassandra. The default consistency level is QUORUM, but you can change the default using the “search.consistencylevel.write” system property.
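The following Python sketch shows one way to assemble an update URL in this format, along with the usual Cassandra quorum calculation (half the replication factor, rounded down, plus one) for judging how many replicas a QUORUM write needs. The helper names are hypothetical, and the host and port 8983 in the usage lines are assumptions; substitute the values for your own node.

```python
from urllib.parse import quote, urlencode


def update_url(host, port, keyspace, column_family, cl=None):
    """Build a Solr update URL, optionally with the cl parameter."""
    core = "%s.%s" % (keyspace, column_family)
    url = "http://%s:%d/solr/%s/update" % (host, port, quote(core))
    if cl is not None:
        # Omitting cl leaves the default write consistency level in effect.
        url += "?" + urlencode({"cl": cl})
    return url


def quorum(replication_factor):
    """Number of replicas that must acknowledge a QUORUM write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Assumed host and port; wiki.solr matches the Wikipedia demo naming.
print(update_url("localhost", 8983, "wiki", "solr", cl="ONE"))
# http://localhost:8983/solr/wiki.solr/update?cl=ONE
print(quorum(3))  # 2
```

With a replication factor of 3, a QUORUM write waits for two replicas, whereas cl=ONE waits for one, trading consistency for lower write latency.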