DataStax Enterprise 3.1 Documentation

Configuring Solr

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

DataStax Enterprise 3.1 includes improvements in the mapping of Solr types to Cassandra validators. If you run applications that were built on DataStax Enterprise 3.0.x or earlier, configure the legacy mapping of Solr types.

Latest mapping of Solr types

This table shows the DataStax Enterprise 3.1 mapping of Solr types to Cassandra validators.

Solr Type            Cassandra Validator  Description
BCDIntField          Int32Type            Binary-coded decimal (BCD) integer
BCDLongField         LongType             BCD long integer
BCDStrField          UTF8Type             BCD string
BinaryField          BytesType            Binary data
BoolField            BooleanType          True (1, t, or T) or False (not 1, t, or T)
ByteField            Int32Type            Contains an 8-bit number value.
DateField            DateType             Point in time with millisecond precision
DoubleField          DoubleType           Double (64-bit IEEE floating point)
ExternalFileField    UTF8Type             Values from disk file
FloatField           FloatType            32-bit IEEE floating point
IntField             Int32Type            32-bit signed integer
LongField            LongType             Long integer (64-bit signed integer)
RandomSortField      UTF8Type             Dynamic field in random order
ShortField           Int32Type            Short integer
SortableDoubleField  DoubleType           Numerically sorted doubles
SortableFloatField   FloatType            Numerically sorted floating point
SortableIntField     Int32Type            Numerically sorted integer
SortableLongField    LongType             Numerically sorted long integer
StrField             UTF8Type             String (UTF-8 encoded string or Unicode)
TextField            UTF8Type             Text, usually multiple words or tokens
TrieDateField        DateType             Date field for Lucene TrieRange processing
TrieDoubleField      DoubleType           Double field for Lucene TrieRange processing
TrieField            see description      Same as any Trie field type
TrieFloatField       FloatType            Floating point field for Lucene TrieRange processing
TrieIntField         Int32Type            Int field for Lucene TrieRange processing
TrieLongField        LongType             Long field for Lucene TrieRange processing
UUIDField            UUIDType             Universally Unique Identifier (UUID)
LatLonType           UTF8Type             Latitude/Longitude 2-D point, latitude first
PointType            UTF8Type             Arbitrary n-dimensional point for spatial search
GeoHashField         UTF8Type             Geohash lat/lon pair represented as a string

Trie types are recommended for efficiency in operations such as range queries. Notes about some of the types:

  • BCD

    A relatively inefficient encoding that offers the benefits of quick decimal calculations and quick conversion to a string.

  • SortableDoubleField/DoubleType

    If you use the plain types (DoubleField, IntField, and so on), sorting is lexicographical instead of numeric.

  • TrieField

    Used with a type attribute whose value is one of: integer, long, float, double, or date.
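
As a rough illustration of the TrieField note, field type declarations in schema.xml might look like the following sketch. The fieldType names (tint, tdate, tlong) are hypothetical and chosen only for this example; they do not come from this documentation.

<types>
  <!-- Hypothetical names: the type attribute selects which kind of Trie field is used -->
  <fieldType name="tint" class="solr.TrieField" type="integer"/>
  <fieldType name="tdate" class="solr.TrieField" type="date"/>
  <!-- A dedicated Trie class needs no type attribute -->
  <fieldType name="tlong" class="solr.TrieLongField"/>
</types>

Under the latest mapping shown in the table, tlong would correspond to the LongType validator, and tdate would be expected to behave like TrieDateField (DateType).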

Legacy mapping of Solr types

DataStax Enterprise 3.0.x and earlier use the legacy type mapping by default.

Solr Type        Cassandra Validator
TextField        UTF8Type
StrField         UTF8Type
LongField        LongType
IntField         Int32Type
FloatField       FloatType
DoubleField      DoubleType
DateField        UTF8Type
ByteField        BytesType
BinaryField      BytesType
BoolField        UTF8Type
UUIDField        UUIDType
TrieDateField    UTF8Type
TrieDoubleField  UTF8Type
TrieField        UTF8Type
TrieFloatField   UTF8Type
TrieIntField     UTF8Type
TrieLongField    UTF8Type
All Others       UTF8Type

Configuring Solr type mapping

By default, DataStax Enterprise 3.1 enables the latest type mapping, whereas DataStax Enterprise 3.0.x enables the legacy type mapping. In DataStax Enterprise 3.1, to use the legacy type mapping or to revert to the latest type mapping, configure dseTypeMappingVersion in solrconfig.xml:

Set the value to 1 to enable the latest type mapping:

<dseTypeMappingVersion>1</dseTypeMappingVersion>

Set the value to 0 to enable the legacy type mapping:

<dseTypeMappingVersion>0</dseTypeMappingVersion>
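
For orientation, the following sketch shows where the element might sit in solrconfig.xml. It assumes that dseTypeMappingVersion is a direct child of the root config element; that placement is an assumption, not something this document states, so verify it against the solrconfig.xml shipped with your installation.

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <!-- Assumed placement: direct child of the root element. 1 = latest mapping, 0 = legacy mapping -->
  <dseTypeMappingVersion>1</dseTypeMappingVersion>

  <!-- All other solrconfig.xml settings stay unchanged -->
</config>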

Switching between the two versions after adding data is not supported. Loading a solrconfig.xml that has a different dseTypeMappingVersion value and then reloading the core causes an error.

Changing a Solr type mapping

Changing a Solr type mapping is rarely necessary and is not recommended. For particular circumstances, however, DataStax Enterprise 3.1 can convert a Solr type under the new mapping, such as LongField, to another type, such as TrieLongField. To do so, set the force option on dseTypeMappingVersion.

The Cassandra internal validation classes of the types you are converting to and from must be compatible, and the conversion must be between valid types. For example, converting a legacy Trie type to a new Trie type is invalid. The output of the CLI command DESCRIBE keyspace_name shows the validation classes assigned to columns.

For example, a column with the org.apache.cassandra.db.marshal.LongType validation class is mapped to solr.LongField. You can force this column to be of the TrieLongField type by setting force="true" in solrconfig.xml and then reloading the core with re-indexing.

<dseTypeMappingVersion force="true">1</dseTypeMappingVersion>

Use this option only if you are an expert and have confirmed that the Cassandra internal validation classes of the types involved in the conversion are compatible.
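
To make the LongField to TrieLongField example concrete, the accompanying schema.xml change might look like the following sketch. The fieldType name long and the field name quantity are hypothetical, used only for illustration.

<types>
  <!-- Before the conversion this read: <fieldType name="long" class="solr.LongField"/> -->
  <fieldType name="long" class="solr.TrieLongField"/>
</types>
<fields>
  <!-- Hypothetical column whose Cassandra validation class is LongType -->
  <field name="quantity" type="long" indexed="true" stored="true"/>
</fields>

Both LongField and TrieLongField map to LongType in the latest mapping table, which is the kind of validator compatibility the conversion requires.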

Configuring the schema

A Solr schema defines the relationship between data in a table and a Solr core. The schema identifies the columns to index in Solr and maps column names to Solr types. This document describes the Solr schema at a high level. For details about all the options and Solr schema settings, see the Solr wiki.

Sample schema

The schema.xml used in the DSE Search/Solr example represents a typical schema. It specifies a tokenizer that determines how the example text is parsed. The set of fields specifies the data that Solr indexes and stores. The id, body, name, and title fields are indexed.

<schema name="my_search_demo" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
  </fields>
  <defaultSearchField>body</defaultSearchField>
  <uniqueKey>id</uniqueKey>
</schema>

This schema.xml meets the requirement to have a unique key and no duplicate rows. The unique key maps to the Cassandra partition key and is necessary for DataStax Enterprise to route documents to cluster nodes. This unique key is like a primary key in SQL. The last element in the schema.xml example, uniqueKey, designates id as the unique key. In a DSE Search/Solr schema, the stored attribute of non-unique fields must be set to true, which causes the field to be stored in Cassandra. Setting indexed="true" makes the field searchable, so it can show up in search results.

Changing a schema

Changing the Solr schema requires reloading the Solr core. Re-indexing can be disruptive, and users can be affected by the performance hit it causes. Change the schema only when absolutely necessary, preferably during scheduled downtime.

Configuring the Solr library path

The examples in solrconfig.xml suggest that relative paths are supported, but DataStax Enterprise does not support relative path values for the <lib> property. DSE Search/Solr fails to find files placed in directories defined by the <lib> property. As a workaround, place custom code or Solr contrib modules in these directories:

  • Packaged installs: /usr/share/dse
  • Binary installs: <install_location>/resources/dse/lib