DataStax Enterprise 3.1 Documentation

Capacity planning

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

Using DSE Search/Solr is memory-intensive. This discovery process is intended to help you, the DSE Search/Solr administrator, develop a plan for having sufficient memory resources to meet the needs of your users.

Overview

First, estimate how large your Solr index will grow: index a number of documents on a single node, execute typical user queries, and then examine the field cache memory usage to gauge heap allocation. Repeat this process with progressively more documents until you can estimate the index size for the maximum number of documents that a single node can handle. You can then determine how many servers to deploy for a cluster and the optimal heap size. The index should be stored on SSDs or should fit in the system IO cache.

Setup

You need to have the following hardware and data:

A node with:

  • GB of RAM
  • SSD or spinning disk

Input data:

  • N documents indexed on a single test node
  • A complete set of sample queries to be executed
  • The total number of documents the system should support

Step-by-step process

  1. Create a schema.xml and solrconfig.xml.

  2. Start a node.

  3. Add N documents.

  4. Run a range of queries that simulate those of users in a production environment.

  5. View the status of field cache memory to discover the memory usage.

  6. View the size of the index (on disk) included in the status information about the Solr core.

  7. Based on the system IO cache available on the server, set a maximum index size per server.

  8. Based on the memory usage, set the maximum heap size required per server.

  9. Calculate the maximum number of documents per node based on steps 6 and 7.

    When the system approaches the maximum number of documents per node, add more nodes.
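
The arithmetic behind steps 6, 7, and 9 can be sketched as follows. This is only an illustration, assuming that index size grows roughly linearly with the number of documents, which the repeated measurements described in the overview are meant to confirm. The function name and the sample figures are hypothetical, not measured values.

```python
def max_docs_per_node(test_docs, test_index_bytes, max_index_bytes):
    """Extrapolate how many documents one node can hold.

    test_docs        -- N, the number of documents indexed on the test node
    test_index_bytes -- on-disk index size observed for those N documents (step 6)
    max_index_bytes  -- maximum index size chosen for one server (step 7)

    Assumes index size scales linearly with document count (an assumption
    to verify by repeating the test with larger N).
    """
    if test_docs <= 0 or test_index_bytes <= 0:
        raise ValueError("test measurements must be positive")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return (max_index_bytes * test_docs) // test_index_bytes


GIB = 1024 ** 3

# Hypothetical figures: 1,000,000 test documents produced a 2 GiB index,
# and the node is allowed a maximum index size of 16 GiB.
limit = max_docs_per_node(1_000_000, 2 * GIB, 16 * GIB)
print(limit)  # 8000000
```

If measurements at several values of N give noticeably different bytes-per-document ratios, the linear assumption does not hold and the largest test run is the safer basis for the estimate.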

Results

  • Optimal heap size per node
  • Number of nodes needed for your application. To serve more queries per second, you can increase the replication factor.
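
As a rough sketch of how the node count might be derived from these results: the calculation below assumes that documents are spread evenly across nodes and that every replica holds a full indexed copy of its share of the data, so the cluster stores the total number of documents multiplied by the replication factor. The function name and figures are hypothetical.

```python
def nodes_needed(total_docs, docs_per_node, replication_factor=1):
    """Estimate the number of nodes for a cluster.

    total_docs         -- total number of documents the system should support
    docs_per_node      -- maximum number of documents per node (from step 9)
    replication_factor -- number of copies of each document in the cluster

    Assumes even data distribution and that each replica is fully indexed
    (assumptions made for this sketch).
    """
    if total_docs < 0 or docs_per_node <= 0 or replication_factor < 1:
        raise ValueError("invalid capacity planning inputs")
    total_copies = total_docs * replication_factor
    # Ceiling division: a partially filled node still counts as a node.
    by_capacity = -(-total_copies // docs_per_node)
    # A cluster cannot hold more replicas of a row than it has nodes.
    return max(by_capacity, replication_factor)


# Hypothetical figures: 50 million documents, 8 million documents per node.
print(nodes_needed(50_000_000, 8_000_000))                        # 7
print(nodes_needed(50_000_000, 8_000_000, replication_factor=2))  # 13
```

Raising the replication factor in this model increases the node count because each additional replica must also store and index its copy of the data.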