DataStax Enterprise 3.0 Documentation

Configuring Solr

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

A Solr schema defines the relationship between data in a column family and a Solr core. The schema identifies the columns to index in Solr and maps column names to Solr types. This document describes the Solr schema at a high level. For details about all the options and Solr schema settings, see the Solr wiki.

Wikipedia Sample Schema Elements

The sample schema.xml for the Wikipedia demo represents a typical schema. It specifies a tokenizer that determines how the wiki text is parsed. The set of fields specifies what Solr indexes and stores. In this example, the name, body, title, and date fields are indexed, along with the id field that serves as the unique key.

<schema name="wikipedia" version="1.1">
 <types>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text" class="solr.TextField">
    <analyzer><tokenizer class="solr.WikipediaTokenizerFactory"/></analyzer>
  </fieldType>
 </types>
 <fields>
    <field name="id"  type="string" indexed="true"  stored="true"/>
    <field name="name"  type="text" indexed="true"  stored="true"/>
    <field name="body"  type="text" indexed="true"  stored="true"/>
    <field name="title"  type="text" indexed="true"  stored="true"/>
    <field name="date"  type="string" indexed="true"  stored="true"/>
 </fields>
 <defaultSearchField>body</defaultSearchField>
 <uniqueKey>id</uniqueKey>
</schema>

The example schema.xml meets the requirement to have a unique key and no duplicate rows. The unique key maps to the Cassandra row key, and DSE needs it to route documents to cluster nodes. It plays the same role as a primary key in SQL. The uniqueKey element at the end of the example designates id as the unique key. In a DSE Search/Solr schema, the stored attribute of non-unique fields needs to be true; a value of true causes the field to be stored in Cassandra. The field does not show up in search results.

Changing a schema

Changing the Solr schema requires reloading the Solr core. Re-indexing can be disruptive, and users can see degraded performance while it runs. Change the schema only when absolutely necessary, and preferably during scheduled downtime.

About column family metadata

After indexing the Wikipedia articles, Cassandra columns in the column family contain metadata corresponding to the fields listed in the demo schema. The output of the CLI command, DESCRIBE wiki, shows this metadata:

 Column Name: body
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_body_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: date
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_date_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: name
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_name_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: solr_query
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_solr_query_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: title
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_title_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

Column metadata matches each field in the schema except the id field because id is the unique key. The column metadata example shows some of the Cassandra Validator types in the Validation Class attribute.

Solr type mapping

DataStax Enterprise 3.0 and earlier releases use legacy mapping of Solr types to Cassandra validator types. In DataStax Enterprise 3.0.1 and later, this mapping is used:

Latest Mapping of Solr Types to Cassandra Validators

| Solr Type | Cassandra Validator | Description |
|---|---|---|
| BCDIntField | Int32Type | Binary-coded decimal (BCD) integer. BCD is a relatively inefficient encoding that offers the benefits of quick decimal calculations and quick conversion to a string. |
| BCDLongField | LongType | BCD long integer |
| BCDStrField | UTF8Type | BCD string |
| BinaryField | BytesType | Binary data |
| BoolField | BooleanType | Contains either true or false. Values of "1", "t", or "T" in the first character are interpreted as true. Any other values in the first character are interpreted as false. |
| ByteField | Int32Type | Contains an 8-bit number value. |
| DateField | DateType | Represents a point in time with millisecond precision. |
| DoubleField | DoubleType | Double (64-bit IEEE floating point) |
| ExternalFileField | UTF8Type | Pulls values from a file on disk. See the section below on working with external files. |
| FloatField | FloatType | Floating point (32-bit IEEE floating point) |
| IntField | Int32Type | Integer (32-bit signed integer) |
| LongField | LongType | Long integer (64-bit signed integer) |
| RandomSortField | UTF8Type | Does not contain a value. Queries that sort on this field type return results in random order. Use a dynamic field to use this feature. |
| ShortField | Int32Type | Short integer |
| SortableDoubleField | DoubleType | The Sortable* fields provide correct numeric sorting. If you use the plain types (DoubleField, IntField, and so on), sorting is lexicographical instead of numeric. |
| SortableFloatField | FloatType | Numerically sorted floating point |
| SortableIntField | Int32Type | Numerically sorted integer |
| SortableLongField | LongType | Numerically sorted long integer |
| StrField | UTF8Type | String (UTF-8 encoded string or Unicode) |
| TextField | UTF8Type | Text, usually multiple words or tokens |
| TrieDateField | DateType | Date field accessible for Lucene TrieRange processing |
| TrieDoubleField | DoubleType | Double field accessible for Lucene TrieRange processing |
| TrieField | see description | Used with a type attribute and value: integer, long, float, double, date. Same as using any of the Trie field types, such as TrieIntField. |
| TrieFloatField | FloatType | Floating point field accessible for Lucene TrieRange processing |
| TrieIntField | Int32Type | Int field accessible for Lucene TrieRange processing |
| TrieLongField | LongType | Long field accessible for Lucene TrieRange processing |
| UUIDField | UUIDType | Universally Unique Identifier (UUID). Use a value of NEW and Solr creates a new UUID. |
| LatLonType | UTF8Type | Latitude/Longitude as a 2-dimensional point. Latitude is always specified first. |
| PointType | UTF8Type | For spatial search: an arbitrary n-dimensional point, useful for searching sources such as blueprints or CAD drawings. |
| GeoHashField | UTF8Type | Represents a Geohash. The field is provided as a lat/lon pair and is internally represented as a string. |

Legacy Mapping of Solr Types to Cassandra Validators

In DataStax Enterprise 3.0 and earlier, Solr types map to these Cassandra validator types:

| Solr Type | Cassandra Validator |
|---|---|
| TextField | UTF8Type |
| StrField | UTF8Type |
| LongField | LongType |
| IntField | Int32Type |
| FloatField | FloatType |
| DoubleField | DoubleType |
| DateField | UTF8Type |
| ByteField | BytesType |
| BinaryField | BytesType |
| BoolField | UTF8Type |
| UUIDField | UUIDType |
| TrieDateField | UTF8Type |
| TrieDoubleField | UTF8Type |
| TrieField | UTF8Type |
| TrieFloatField | UTF8Type |
| TrieIntField | UTF8Type |
| TrieLongField | UTF8Type |
| All Others | UTF8Type |

For efficiency in operations such as range queries, using Trie types is recommended.
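
As a rough illustration of that recommendation, numeric and date fields could be declared with Trie types along the following lines. The type names tint and tdate, the field names year and created, and the precisionStep values are assumptions chosen for this sketch (they resemble stock Solr example schemas); they are not settings from the Wikipedia demo.

```xml
<!-- Sketch only: the type and field names below are hypothetical. -->
<types>
  <!-- Trie types index each value at several precisions, which helps range queries. -->
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" omitNorms="true" positionIncrementGap="0"/>
</types>
<fields>
  <field name="year" type="tint" indexed="true" stored="true"/>
  <field name="created" type="tdate" indexed="true" stored="true"/>
</fields>
```

With fields declared this way, a range query such as year:[2000 TO 2010] is the kind of operation the recommendation has in mind.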

Configuring Solr type mapping

By default, DataStax Enterprise 3.0.x enables legacy type mapping (dseTypeMappingVersion is set to 0).

To enable the new Solr type mapping, add the following line to the Solr configuration file (solrconfig.xml):

<dseTypeMappingVersion>1</dseTypeMappingVersion>

Switching between the two versions is not supported. Loading a solrconfig.xml that has a different dseTypeMappingVersion setting and then reloading the core causes an error.
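
For orientation, the setting might sit near the top of solrconfig.xml as sketched here. The placement directly under the config root element and the luceneMatchVersion value are assumptions for illustration; check both against the solrconfig.xml shipped with your installation.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <!-- Assumed value; keep whatever your existing solrconfig.xml specifies. -->
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>

  <!-- 0 = legacy type mapping (the 3.0.x default), 1 = new type mapping. -->
  <dseTypeMappingVersion>1</dseTypeMappingVersion>

  <!-- The rest of the configuration (request handlers and so on) is unchanged. -->
</config>
```

Because switching versions on an existing core is not supported, a choice like this is best made before the core is first created.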

Configuring Solr library paths

The examples in solrconfig.xml suggest that relative paths are supported, but DataStax Enterprise does not support relative path values for the <lib> property. DSE Search/Solr fails to find files placed in directories defined by the <lib> property. As a workaround, place custom code or Solr contrib modules in these directories:

  • Packaged installs: /usr/share/dse
  • Binary installs: <install_location>/resources/dse/lib
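
To make the limitation concrete, the entry below sketches the kind of relative-path <lib> directive found in stock Solr example configurations. The directory and regex values are illustrative assumptions, not values taken from a DSE installation.

```xml
<!-- Not resolved by DSE Search/Solr: relative dir values are unsupported. -->
<lib dir="../../contrib/extraction/lib" regex=".*\.jar"/>
```

JAR files referenced by an entry like this would instead need to be copied into the directory listed above for your install type.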