DataStax Enterprise 2.0 Documentation

Creating a Schema

A Solr schema defines the relationship between data in a column family and a Solr core. The schema identifies the columns to index in Solr and maps column names to Solr types. This document describes the Solr schema at a high level. For details about all the options and Solr schema settings, see the Solr wiki.

Wikipedia Sample Schema Elements

The sample schema.xml for the Wikipedia demo represents a typical schema. It specifies a tokenizer that determines how the wiki text is parsed, and its set of fields specifies what Solr indexes and stores. In this example, the name, body, title, and date fields are indexed and stored, along with the id field that serves as the unique key.

<schema name="wikipedia" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer><tokenizer class="solr.WikipediaTokenizerFactory"/></analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="true"/>
    <field name="date" type="string" indexed="true" stored="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
  </fields>
  <defaultSearchField>body</defaultSearchField>
  <uniqueKey>id</uniqueKey>
</schema>

The example schema.xml meets the requirement that a schema have a unique key and no duplicate rows. The unique key maps to the Cassandra row key, which DSE needs in order to route documents to cluster nodes. This unique key is like a primary key in SQL. The last element in the schema.xml example, uniqueKey, designates id as the unique key.
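As a quick sanity check before loading a schema, the unique key and the indexed fields can be read back with a short script. The following is a minimal sketch that uses only the Python standard library. The function name summarize_schema is hypothetical, and the script is not part of DSE or Solr; it only illustrates how the elements of the sample schema relate to each other.

```python
import xml.etree.ElementTree as ET

# A complete copy of the sample Wikipedia schema, closing </schema> tag included,
# so that the string parses as a well-formed XML document.
SCHEMA_XML = """
<schema name="wikipedia" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer><tokenizer class="solr.WikipediaTokenizerFactory"/></analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="true"/>
    <field name="date" type="string" indexed="true" stored="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
  </fields>
  <defaultSearchField>body</defaultSearchField>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def summarize_schema(schema_xml):
    """Return (field name -> type, indexed field names, unique key).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(schema_xml.strip())
    # iter("field") matches only <field> elements, not <fieldType>.
    field_elements = list(root.iter("field"))
    fields = {f.get("name"): f.get("type") for f in field_elements}
    indexed = [f.get("name") for f in field_elements if f.get("indexed") == "true"]
    unique_key = root.findtext("uniqueKey")
    return fields, indexed, unique_key


fields, indexed, unique_key = summarize_schema(SCHEMA_XML)

# The unique key must name one of the declared fields.
if unique_key not in fields:
    raise ValueError("uniqueKey does not refer to a declared field")

print("unique key:", unique_key)
print("indexed fields:", indexed)
```

The check that the unique key names a declared field mirrors the requirement described above; it does not verify the absence of duplicate rows, which depends on the data rather than the schema.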

Checking a Schema

After creating a schema and indexing documents, you can check that the Solr index is working by using the Solr Admin tool in this location:

http://hostname/solr/{keyspace}.{columnfamily}/admin/

If the tool appears, the index is working.

[Figure: screenshot of the Solr Admin tool]
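The Solr core name in the URL is the keyspace and column family joined by a period. The sketch below builds the admin URL from those parts. The function name solr_admin_url is hypothetical, and the host localhost:8983, the keyspace wiki, and the column family solr are assumptions chosen for illustration; substitute the values for your own cluster.

```python
from urllib.parse import quote


def solr_admin_url(host, keyspace, column_family):
    """Build the Solr Admin URL for a keyspace.columnfamily core.

    Hypothetical helper: `host` may include a port, for example "localhost:8983".
    """
    core = f"{keyspace}.{column_family}"
    # quote() escapes any characters that are not safe in a URL path segment.
    return f"http://{host}/solr/{quote(core)}/admin/"


# Assumed values for illustration only.
print(solr_admin_url("localhost:8983", "wiki", "solr"))
```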

Wikipedia Sample Column Family Metadata

After indexing the Wikipedia articles, Cassandra columns in the column family contain metadata corresponding to the fields listed in the demo schema. The output of the CLI command, DESCRIBE wiki, shows this metadata:

 Column Name: body
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_body_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: date
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_date_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: name
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_name_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: solr_query
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_solr_query_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
 Column Name: title
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: wiki_solr_title_index
   Index Type: CUSTOM
   Index Options: {class_name=com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex}
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

Each field in the schema except id has matching column metadata. The id field has none because it is the unique key, which maps to the row key.

The column metadata example shows some of the Cassandra validator types in the Validation Class attribute. Solr types map to Cassandra validator types as shown in this table:

Solr Type      Cassandra Validator
-----------    -------------------
TextField      UTF8Type
StrField       UTF8Type
LongField      LongType
IntField       Int32Type
FloatField     FloatType
DoubleField    DoubleType
DateField      UTF8Type
ByteField      BytesType
BinaryField    BytesType
BoolField      UTF8Type
UUIDField      UUIDType
All Others     UTF8Type
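The table above can be expressed as a simple lookup with UTF8Type as the fallback for all other types. This is a minimal sketch for reasoning about a schema; the names VALIDATOR_BY_SOLR_TYPE and cassandra_validator are hypothetical and do not correspond to any DSE API.

```python
# Mapping copied from the table above: Solr type -> Cassandra validator.
VALIDATOR_BY_SOLR_TYPE = {
    "TextField": "UTF8Type",
    "StrField": "UTF8Type",
    "LongField": "LongType",
    "IntField": "Int32Type",
    "FloatField": "FloatType",
    "DoubleField": "DoubleType",
    "DateField": "UTF8Type",
    "ByteField": "BytesType",
    "BinaryField": "BytesType",
    "BoolField": "UTF8Type",
    "UUIDField": "UUIDType",
}

# The "All Others" row of the table.
DEFAULT_VALIDATOR = "UTF8Type"


def cassandra_validator(solr_class):
    """Return the validator for a Solr type such as "solr.TextField" or "TextField".

    Hypothetical helper for illustration only.
    """
    # Drop an optional package prefix such as "solr.".
    short_name = solr_class.rsplit(".", 1)[-1]
    return VALIDATOR_BY_SOLR_TYPE.get(short_name, DEFAULT_VALIDATOR)


# Both field types in the Wikipedia schema resolve to UTF8Type, which agrees
# with the Validation Class values in the column metadata.
for solr_class in ("solr.StrField", "solr.TextField"):
    print(solr_class, "->", cassandra_validator(solr_class))
```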

Using Dynamic Fields instead of Composite Columns

You can use Solr dynamic fields, which match column names against a wildcard pattern, instead of composite columns, which are not supported. A row can have at most 1024 dynamic fields. Adding the following element to the schema indexes any column whose name ends with -tag.

<dynamicField name="*-tag" type="string" indexed="true"/>

When you use the dynamicField element, DSE Search adds a special Solr field, _dynFld, to the index, so you can search for rows that contain particular columns, such as X, Y, and Z.
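To see which columns a pattern such as *-tag would select, the wildcard match can be approximated with glob matching. The sketch below uses fnmatchcase from the Python standard library as a stand-in; it is not Solr's own matching code, and fnmatch also gives special meaning to ? and [ ], which a dynamicField name does not. The function name dynamic_columns and the column names in the example are hypothetical.

```python
from fnmatch import fnmatchcase

DYNAMIC_FIELD_PATTERN = "*-tag"
# Limit stated above: at most 1024 dynamic fields per row.
MAX_DYNAMIC_FIELDS_PER_ROW = 1024


def dynamic_columns(column_names, pattern=DYNAMIC_FIELD_PATTERN):
    """Return the column names in one row that match the dynamic field pattern.

    Hypothetical helper; approximates the wildcard with case-sensitive glob matching.
    """
    matched = [name for name in column_names if fnmatchcase(name, pattern)]
    if len(matched) > MAX_DYNAMIC_FIELDS_PER_ROW:
        raise ValueError(
            f"{len(matched)} dynamic fields exceed the per-row limit "
            f"of {MAX_DYNAMIC_FIELDS_PER_ROW}"
        )
    return matched


# Hypothetical column names for a single row.
row_columns = ["title", "author-tag", "topic-tag", "tagline"]
print(dynamic_columns(row_columns))
```

Only the names that end with -tag are selected; tagline contains the text tag but does not end with -tag, so it is not matched.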

To learn more about the Solr schema, see the well-documented sample Solr schema file.