DataStax Enterprise 3.0 Documentation

Creating a search index

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


A minimal Solr installation requires these files:

  • schema.xml

    Describes the fields to index in Solr and types associated with them. These fields map to Cassandra columns. To route search requests to the appropriate nodes, the schema needs a unique key.

  • solrconfig.xml

    Holds configuration information for query handlers and Solr-specific caches.

After writing schema.xml, HTTP-post solrconfig.xml and schema.xml to a Solr node in your DataStax Enterprise cluster. Next, create a new Solr core (or reload an existing core) to create (or recreate) an index on a column family for searching Cassandra data.

If several users post schema or configuration files at the same time, schema disagreements can occur and cause Solr errors.

Note

Do not make schema changes on hot production systems.

To create a Solr index for searching Cassandra data:

  1. Post the configuration file using the cURL utility:

    curl "http://localhost:8983/solr/resource/<keyspace.columnfamily>/solrconfig.xml" \
      --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
    
  2. Post the schema file:

    curl "http://localhost:8983/solr/resource/<keyspace.columnfamily>/schema.xml" \
      --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
    
  3. Create or reload a Solr core. Do not perform this step before performing steps 1 and 2.

    Create a Solr core

    curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=<keyspace.columnfamily>"
    

    Creating a Solr core on one node automatically creates the core on the other Solr nodes, and DSE Search stores the files on all the Cassandra nodes.

    Reload an existing Solr core

    curl "http://localhost:8983/solr/admin/cores?action=RELOAD&name=<keyspace.columnfamily>"
    

    Reload a Solr core instead of creating a new one when you need to modify the schema.xml or solrconfig.xml. You can use options with the RELOAD command to re-index and keep or delete the Lucene index.
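
The three steps above can be chained in a small shell script so that the core is never created before both files have been posted. This is a minimal sketch, not a DataStax-provided tool: the core name `mykeyspace.mycolumnfamily`, the `SOLR_URL`, `CORE`, and `RUN` variables, and the helper function names are all hypothetical, and it assumes the server reports failures with an HTTP error status so that curl's `--fail` option can stop the sequence.

```shell
#!/bin/sh
# Hypothetical defaults; substitute your own node, keyspace, and column family.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-mykeyspace.mycolumnfamily}"

# Build the resource URL for a file name such as schema.xml.
resource_url() {
  printf '%s/resource/%s/%s\n' "$SOLR_URL" "$CORE" "$1"
}

# Build the core admin URL for an action (CREATE or RELOAD).
core_url() {
  printf '%s/admin/cores?action=%s&name=%s\n' "$SOLR_URL" "$1" "$CORE"
}

# Post a local XML file to its resource URL; --fail makes curl return
# a non-zero exit status on an HTTP error response.
post_resource() {
  curl --fail "$(resource_url "$1")" \
    --data-binary "@$1" -H 'Content-type:text/xml; charset=utf-8'
}

# Steps 1 to 3 in order; && stops at the first failing step.
create_index() {
  post_resource solrconfig.xml &&
    post_resource schema.xml &&
    curl --fail "$(core_url CREATE)"
}

# Only contact the server when explicitly asked to (RUN=1).
if [ "${RUN:-0}" = "1" ]; then
  create_index
fi
```

Saved as, for example, create_index.sh, the script could be run with `RUN=1 sh create_index.sh` from the directory that holds solrconfig.xml and schema.xml. To reload an existing core instead, the last step would call `core_url RELOAD`.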

Checking indexing status

If you HTTP-post the files to an existing column family, DSE Search starts indexing the data. If you HTTP-post the files to a non-existent keyspace or column family, DSE Search creates the keyspace and column family, and then starts indexing the data. For example, you can change the stopwords.txt file, repost the schema, and the index updates.

To check the indexing status, open the Solr Admin and click Core Admin.


[Screenshot: Core Admin page of the Solr Admin UI]

Using RELOAD command options

When you change the schema, the existing index might no longer be compatible with it. If the change alters a field's type, the index and schema are certainly incompatible. A field's type can change in subtle ways, occasionally without any change to the schema.xml file itself. For example, a change to another configuration file, such as synonyms, can change the schema. If such an incompatibility exists, a full re-index of the Solr data, which includes deleting all the old data, is required. In these cases, anything less than a full re-index renders the schema changes ineffective. Typically, a change to the Solr schema requires a full re-index.

Use these RELOAD command options to specify the level of re-indexing that occurs:

  • distributed

    True, the default, distributes an index to nodes in the cluster. False re-indexes the Solr data on one node.

    curl -v "http://localhost:8983/solr/admin/cores?action=RELOAD&name=<keyspace.columnfamily>&distributed=true"
    
  • reindex and deleteAll

    Re-indexes data in place or re-indexes in full. The default for both options is false.

Re-indexing in place

Setting reindex=true and deleteAll=false re-indexes the data and keeps the existing Lucene index. While the data is being re-indexed, user searches can return inaccurate results. To perform an in-place re-index, use this syntax:

curl "http://localhost:8983/solr/admin/cores?action=RELOAD&name=<keyspace.columnfamily>&reindex=true&deleteAll=false"

Re-indexing in full

Setting reindex=true and deleteAll=true deletes the Lucene index and re-indexes the dataset. User searches initially return no documents as the Solr cores reload and data is re-indexed.

Setting reindex=false and deleteAll=true does nothing.
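
Following the same URL pattern as the in-place request, a full re-index request might look like the sketch below. The core name `mykeyspace.mycolumnfamily` and the `CORE` and `FULL_REINDEX_URL` variable names are hypothetical. The script only prints the URL; the curl call is left commented out because sending it deletes the existing Lucene index.

```shell
# Hypothetical core name; substitute your own keyspace and column family.
CORE="mykeyspace.mycolumnfamily"

# reindex=true together with deleteAll=true requests a full re-index.
FULL_REINDEX_URL="http://localhost:8983/solr/admin/cores?action=RELOAD&name=${CORE}&reindex=true&deleteAll=true"
echo "$FULL_REINDEX_URL"

# Uncomment to send the request (this deletes the existing Lucene index):
# curl "$FULL_REINDEX_URL"
```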

Checking a schema

After creating a schema and indexing documents, you can check that the Solr index is working by using the Solr Admin UI in this location:

http://localhost:8983/solr/

If the UI appears, the index is working. The UI looks something like this:


[Screenshot: Solr Admin UI]
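
From a terminal, a rough equivalent of opening the Admin UI in a browser is to request the same URL and print the HTTP status code. This is only a sketch: the `ADMIN_URL` variable and the `admin_ui_status` function name are hypothetical, and the check shows only that the page is served, not what the index contains.

```shell
# URL of the Solr Admin UI on the local node.
ADMIN_URL="http://localhost:8983/solr/"

# Print the HTTP status code returned for the Admin UI page.
# --silent hides progress output, --location follows redirects,
# and --output /dev/null discards the page body.
admin_ui_status() {
  curl --silent --location --output /dev/null \
    --write-out '%{http_code}\n' "$ADMIN_URL"
}

# Example (requires a running node):
# admin_ui_status
```

A status of 200 suggests that the Admin UI is being served.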

Adding and viewing index resources

DSE Search includes a REST API for viewing and adding resources associated with an index. You can look at the contents of an existing Solr resource by loading its URL in a web browser or by using HTTP GET. Retrieving a resource returns the last uploaded version, even if that version is not the one currently in use. For example, if you upload a new schema and request the schema resource before reloading the core, Solr returns the new schema even though the core continues to use the old one.

Use this format:

http://<host>:<port>/solr/resource/<keyspace>.<columnfamily>/<filename>.<ext>

Generally, you can post any resource required by Solr to this URL. For example, stopwords.txt and elevate.xml are optional, frequently-used Solr configuration files that you post using this URL.
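
As a sketch of the URL format above, the following shell functions fetch the last uploaded copy of a resource with an HTTP GET and compare it with a local file, which is one way to see whether a local edit has been posted yet. The function names, the `SOLR_URL` variable, and the `mykeyspace` and `mycolumnfamily` names are hypothetical.

```shell
# Hypothetical default; substitute your own host and port.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build http://<host>:<port>/solr/resource/<keyspace>.<columnfamily>/<filename>.<ext>
# $1 = keyspace, $2 = column family, $3 = file name
resource_url() {
  printf '%s/resource/%s.%s/%s\n' "$SOLR_URL" "$1" "$2" "$3"
}

# Fetch the last uploaded copy of a resource and diff it against the
# local file of the same name; diff exits with 0 when it finds no differences.
compare_resource() {
  curl --silent --fail "$(resource_url "$1" "$2" "$3")" | diff - "$3"
}

# Example (requires a running node and a local schema.xml):
# compare_resource mykeyspace mycolumnfamily schema.xml
```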