
An Introduction to DSE Field Transformers

By Edward Ribeiro -  April 7, 2015 | 7 Comments

The seamless integration between Cassandra and Solr provided by DataStax Enterprise (DSE) allows Cassandra columns to be automatically indexed by Solr through Cassandra's secondary index API, as shown in the picture below from the DataStax Enterprise documentation:

DSE Search Architecture

Each insert or update of a Cassandra row triggers indexing on the Solr side, inserting or updating the document that corresponds to that row. This happens automatically for inserts, updates, and deletions (deleting a row removes the corresponding Solr document). There is usually a direct mapping between Cassandra columns and Solr document fields.

There are, however, cases where we may want to map a single Cassandra column to multiple Solr document fields. This requires pre-processing hooks that modify the Solr document before the actual indexing takes place on the Solr side. To address this requirement, DSE provides two classes, FieldInputTransformer and FieldOutputTransformer, that can be extended to let the developer plug in code that customizes the Solr document just before indexing.

To illustrate its use, let's assume that we need to store a JSON object as a text field in the CQL3 table below:

id | json
 1 | '{"city":"Austin","state":"TX", "country":"USA"}'
 2 | '{"city":"San Francisco","state":"CA", "country":"USA"}'
 3 | '{"city":"Seattle","state":"WA", "country":"USA"}'

Even though we store the JSON object as a text field in Cassandra, we would like to be able to index and query individual JSON fields in Solr. So, let's begin our walkthrough by creating a keyspace, called solr_fit, and our example table, called cities:

cqlsh> CREATE KEYSPACE solr_fit WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'Solr': 1};

cqlsh> use solr_fit;

cqlsh> CREATE TABLE cities (id int, json text, primary key (id));

As we can see above, the whole JSON object is represented by a single Cassandra column.

1. Project Setup

To develop the FieldInputTransformer (FIT) extension, dse.jar must be on the classpath of our FIT plugin project during compilation. We can either point to it from within an IDE (Eclipse, NetBeans, etc.) or create a Maven-based project. If the FIT project is based on Maven, as in the example here, the developer needs to install dse.jar as a local Maven dependency, because it is not available in the DataStax (or any other) Maven repository. This can be accomplished with the following command in a terminal:

mvn install:install-file -Dfile=/path/to/dse.jar -DgroupId=com.datastax \
 -DartifactId=dse -Dversion=4.6.1 -Dpackaging=jar

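This will install the jar as a Maven dependency under the developer's $HOME/.m2/repository. Then we can add the following dependency, using the same coordinates as the command above, to our pom.xml:

<dependency>
    <groupId>com.datastax</groupId>
    <artifactId>dse</artifactId>
    <version>4.6.1</version>
</dependency>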


The same needs to be done with DSE's lucene-solr jar located at $DSE_HOME/resources/solr/lib/solr-, so that we can add the dependency as below:

mvn install:install-file -Dfile=/path/to/solr- \
-DgroupId=org.apache.solr -DartifactId=solr-core -Dversion= -Dpackaging=jar

Further details about Maven-based projects can be found at the official Maven site.

Once the project is compiled, we should copy the resulting jar to $DSE_HOME/resources/solr/lib/ on each DSE node that is part of our Solr cluster. For development purposes, a single DSE node is enough.

2. Project files

On the Solr side, we need to create a schema.xml and a solrconfig.xml file. Let's start with the former. In this schema we specify the individual fields that compose the JSON object in addition to the fields defined in the Cassandra table. As we can see below, the individual JSON fields (city, state, country) are declared as fields to be indexed by Solr even though they don't exist in the corresponding Cassandra table and are not defined as Solr copyFields.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="blog" version="1.1">
    <fieldType name="int" class="solr.TrieIntField" multiValued="false"/>
    <fieldType name="string" class="solr.StrField"/>
    <field name="id" type="int" indexed="true" stored="true"/>
    <field name="json" type="string" indexed="true" stored="true"/>
    <field name="city" type="string" indexed="true" stored="false"/>
    <field name="state" type="string" indexed="true" stored="false"/>
    <field name="country" type="string" indexed="true" stored="false"/>
    <uniqueKey>id</uniqueKey>
</schema>

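The corresponding solrconfig.xml should include the following top-level tags. The class attribute of each tag must hold the fully qualified name of the custom class in charge of parsing the Cassandra column (JsonFieldInputTransformer and JsonFieldOutputTransformer in this example).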

<fieldInputTransformer name="dse" class="">
</fieldInputTransformer>

<fieldOutputTransformer name="dse" class="">
</fieldOutputTransformer>

In this example, our JSON object is serialized to and deserialized from a Java object called City, a simple POJO. Below is its class definition:


// Plain POJO that mirrors the three attributes of the stored JSON object.
public class City
{
    private String city;
    private String state;
    private String country;

    public String getCity()
    {
        return city;
    }

    public void setCity(String city)
    {
        this.city = city;
    }

    public String getState()
    {
        return state;
    }

    public void setState(String state)
    {
        this.state = state;
    }

    public String getCountry()
    {
        return country;
    }

    public void setCountry(String country)
    {
        this.country = country;
    }

    @Override
    public String toString()
    {
        return "City{" +
                "city='" + city + "'" +
                ", state='" + state + "'" +
                ", country='" + country + "'" +
                '}';
    }
}

The FieldInputTransformer and FieldOutputTransformer classes must be extended to define a custom column-to-document field mapping like the one in our JSON example. FieldInputTransformer takes an inserted Cassandra column and modifies it prior to Solr indexing, while FieldOutputTransformer parses a Cassandra row just before returning the result of a Solr query.

3. FieldInputTransformer

The code below shows our custom FieldInputTransformer. We should override two methods: evaluate() and addFieldToDocument(). The first one checks whether the column should be parsed by looking at the column name. If it returns true, addFieldToDocument() is called to transform the Cassandra column into Solr document fields to be indexed. We use Jackson's ObjectMapper to parse the Cassandra text column into a City object, then retrieve from the schema the Solr fields (city, state, country) that we need to specify during indexing. Finally, we insert each field into the Solr document through helper.addFieldToDocument().


import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.lucene.document.Document;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// FieldInputTransformer comes from dse.jar and must be imported as well.

public class JsonFieldInputTransformer extends FieldInputTransformer
{
    private static final Logger LOGGER = LoggerFactory.getLogger(JsonFieldInputTransformer.class);

    // Only the "json" column is handed to addFieldToDocument().
    @Override
    public boolean evaluate(String field)
    {
        return field.equals("json");
    }

    @Override
    public void addFieldToDocument(SolrCore core,
                                   IndexSchema schema,
                                   String key,
                                   Document doc,
                                   SchemaField fieldInfo,
                                   String fieldValue,
                                   float boost,
                                   DocumentHelper helper)
            throws IOException
    {
        try
        {
            ObjectMapper mapper = new ObjectMapper();

            LOGGER.info("JsonFieldInputTransformer called");
            LOGGER.info("fieldValue: " + fieldValue);

            // Parse the raw JSON text of the column into a City object.
            City city = mapper.readValue(fieldValue, City.class);

            // Look up the Solr schema fields that will receive the JSON values.
            SchemaField jsonCity = core.getLatestSchema().getFieldOrNull("city");
            SchemaField jsonState = core.getLatestSchema().getFieldOrNull("state");
            SchemaField jsonCountry = core.getLatestSchema().getFieldOrNull("country");

            // Add one Solr field per JSON attribute.
            helper.addFieldToDocument(core, core.getLatestSchema(), key, doc, jsonCity, city.getCity(), boost);
            helper.addFieldToDocument(core, core.getLatestSchema(), key, doc, jsonState, city.getState(), boost);
            helper.addFieldToDocument(core, core.getLatestSchema(), key, doc, jsonCountry, city.getCountry(), boost);
        }
        catch (Exception ex)
        {
            throw new RuntimeException(ex);
        }
    }
}
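
The control flow described in this section (evaluate() is consulted with the column name first, and only when it returns true is the column value expanded into several fields) can be sketched without any DSE, Solr, or Jackson dependency. Everything in this sketch is hypothetical and for illustration only: ColumnTransformer, JsonColumnTransformer, and IndexingDriver are made-up names, a Map stands in for the Solr document, and a regular expression that handles only flat, string-valued objects stands in for Jackson.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the two FieldInputTransformer hooks; not the DSE API.
interface ColumnTransformer
{
    boolean evaluate(String column);

    void addFields(String value, Map<String, String> doc);
}

// Splits a flat JSON object stored in the "json" column into one field per key.
final class JsonColumnTransformer implements ColumnTransformer
{
    // Matches "key":"value" pairs; a simplification that ignores escapes and nesting.
    private static final Pattern PAIR = Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    @Override
    public boolean evaluate(String column)
    {
        return column.equals("json");
    }

    @Override
    public void addFields(String value, Map<String, String> doc)
    {
        Matcher m = PAIR.matcher(value);
        while (m.find())
        {
            doc.put(m.group(1), m.group(2));
        }
    }
}

// In this sketch, transformed columns are expanded and all other columns are copied as-is.
final class IndexingDriver
{
    static Map<String, String> index(Map<String, String> row, ColumnTransformer transformer)
    {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : row.entrySet())
        {
            if (transformer.evaluate(column.getKey()))
            {
                transformer.addFields(column.getValue(), doc);
            }
            else
            {
                doc.put(column.getKey(), column.getValue());
            }
        }
        return doc;
    }
}
```

Calling IndexingDriver.index() with a row holding id and json columns yields a document with id, city, state, and country entries, which mirrors the one-column-to-many-fields mapping that the real transformer performs.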

4. FieldOutputTransformer

The corresponding FieldOutputTransformer code should override one of the typeField methods (where type is string, int, float, etc.). In our example, we override only the stringField method, as text is the original type of the "json" column in the CQL3 table. In this method we receive the original value of the column as raw text and parse it as we did in JsonFieldInputTransformer.


import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.StoredFieldVisitor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// FieldOutputTransformer comes from dse.jar and must be imported as well.

public class JsonFieldOutputTransformer extends FieldOutputTransformer
{
    private static final Logger LOGGER = LoggerFactory.getLogger(JsonFieldOutputTransformer.class);

    @Override
    public void stringField(FieldInfo fieldInfo,
                            String value,
                            StoredFieldVisitor visitor,
                            DocumentHelper helper) throws IOException
    {
        ObjectMapper mapper = new ObjectMapper();

        try
        {
            LOGGER.info("name: " + fieldInfo.name + ", value: " + value);

            // Parse the stored raw JSON text back into a City object.
            City city = mapper.readValue(value.getBytes(), City.class);

            FieldInfo json_city_fi = helper.getFieldInfo("city");
            FieldInfo json_state_fi = helper.getFieldInfo("state");
            FieldInfo json_country_fi = helper.getFieldInfo("country");

            // Emit only the attributes that are present in the JSON value.
            if (city.getCity() != null)
                visitor.stringField(json_city_fi, city.getCity());
            if (city.getState() != null)
                visitor.stringField(json_state_fi, city.getState());
            if (city.getCountry() != null)
                visitor.stringField(json_country_fi, city.getCountry());
        }
        catch (IOException e)
        {
            LOGGER.error(fieldInfo.name + " " + e.getMessage());
            throw e;
        }
    }
}
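
The null checks in stringField are there because a stored JSON value may omit one of the attributes, in which case nothing should be handed to the visitor for that field. A minimal, dependency-free sketch of that guard follows. The names are hypothetical: CityFieldEmitter is made up for illustration, a Map stands in for the parsed City object, and a BiConsumer stands in for Lucene's StoredFieldVisitor.

```java
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of the output side; BiConsumer stands in for StoredFieldVisitor.
final class CityFieldEmitter
{
    private static final String[] FIELDS = {"city", "state", "country"};

    // Forwards each parsed value that is present to the visitor and skips absent ones.
    static void emit(Map<String, String> parsed, BiConsumer<String, String> visitor)
    {
        for (String field : FIELDS)
        {
            String value = parsed.get(field);
            if (value != null)
            {
                visitor.accept(field, value);
            }
        }
    }
}
```

With a parsed value that has no state entry, the visitor is invoked for city and country only, in that order.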

Both classes are compiled, packaged, and copied into $DSE_HOME/resources/solr/lib/.

5. Schema Creation

To create a Solr core, we need to upload schema.xml and solrconfig.xml resources, as shown below:

$ curl http://localhost:8983/solr/resource/solr_fit.cities/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl http://localhost:8983/solr/resource/solr_fit.cities/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl -X POST "http://localhost:8983/solr/admin/cores?action=CREATE&name=solr_fit.cities"
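
For setups where these steps are scripted from Java rather than curl, the same three requests can be built with the JDK's own HTTP client. This is only a sketch under stated assumptions: it requires Java 11 or later on the client machine, SolrCoreRequests is a hypothetical helper name, and the resource body is passed as a string instead of being read from a file.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds (but does not send) the requests issued by the curl commands.
final class SolrCoreRequests
{
    private static final String BASE = "http://localhost:8983/solr";

    // Equivalent of: curl BASE/resource/<core>/<name> --data-binary @<file> -H 'Content-type:...'
    static HttpRequest uploadResource(String core, String name, String xml)
    {
        return HttpRequest.newBuilder(URI.create(BASE + "/resource/" + core + "/" + name))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
                .build();
    }

    // Equivalent of: curl -X POST "BASE/admin/cores?action=CREATE&name=<core>"
    static HttpRequest createCore(String core)
    {
        return HttpRequest.newBuilder(URI.create(BASE + "/admin/cores?action=CREATE&name=" + core))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Each request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), uploading solrconfig.xml and schema.xml before creating the core, in the same order as the curl commands.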

After uploading the Solr resources and creating the core we can check the schema via Solr admin UI by accessing the following URL: http://localhost:8983/solr/#/solr_fit.cities/schema

6. Execution

Now, let's insert some example data into cities table:

cqlsh> insert into cities (id, json) values (1, '{"city":"Austin","state":"TX", "country":"USA"}');

cqlsh> insert into cities (id, json) values (2, '{"city":"San Francisco","state":"CA", "country":"USA"}');

cqlsh> insert into cities (id, json) values (3, '{"city":"Seattle","state":"WA", "country":"USA"}');

This setup allows searching on the individual fields (city, state, country) from both the Solr HTTP interface and CQL.

As we can see below, a simple CQL3 query shows that the JSON document is stored in a single Cassandra column:

cqlsh:solr_fit> select * from cities;

id | json | solr_query
 1 | {"city":"Austin","state":"TX", "country":"USA"} | null
 2 | {"city":"San Francisco","state":"CA", "country":"USA"} | null
 3 | {"city":"Seattle","state":"WA", "country":"USA"} | null

(3 rows)

Things start to get interesting when we use the solr_query facility to retrieve the rows matching a single field of the JSON document. The first query retrieves all the rows from the cities table whose country is USA:

cqlsh:solr_fit> select * from cities where solr_query = '{"q":"country: USA"}';

id | json | solr_query
 1 | {"city":"Austin","state":"TX", "country":"USA"} | null
 2 | {"city":"San Francisco","state":"CA", "country":"USA"} | null
 3 | {"city":"Seattle","state":"WA", "country":"USA"} | null

(3 rows)

The second query retrieves cities whose name starts with Seat:

cqlsh:solr_fit> select * from cities where solr_query = '{"q":"city: Seat*"}';

id | json | solr_query
 3 | {"city":"Seattle","state":"WA", "country":"USA"} | null

(1 rows)


Finally, let's query the cities in the state of California:

cqlsh:solr_fit> select * from cities where solr_query = '{"q":"state: CA"}';

id | json | solr_query
 2 | {"city":"San Francisco","state":"CA", "country":"USA"} | null

(1 rows)
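
The three solr_query expressions above share one shape: a JSON object whose "q" entry holds a field:term Solr query, wrapped in a CQL string literal. When such statements are assembled in application code, a small helper keeps the quoting consistent. The sketch below is an illustration only: SolrQueryJson is a hypothetical name, the table and field names are assumed to be trusted identifiers, and Lucene special characters inside the term are not escaped.

```java
// Hypothetical helper that assembles solr_query statements like the ones shown above.
final class SolrQueryJson
{
    // Builds the JSON passed to solr_query, for example {"q":"state:CA"}.
    static String of(String field, String term)
    {
        return "{\"q\":\"" + escapeJson(field + ":" + term) + "\"}";
    }

    // Wraps the JSON in a CQL string literal; single quotes are doubled, as CQL requires.
    static String cql(String table, String field, String term)
    {
        String literal = of(field, term).replace("'", "''");
        return "select * from " + table + " where solr_query = '" + literal + "';";
    }

    // Escapes backslashes and double quotes so the term cannot break out of the JSON string.
    private static String escapeJson(String s)
    {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For instance, SolrQueryJson.cql("cities", "city", "Seat*") produces the same statement as the prefix query above, apart from the optional space after the colon.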

The same queries can be executed from the Solr admin UI. Below we see the retrieval of all the sample rows:

[Screenshot: Solr admin UI query returning all the sample rows]

This example shows the retrieval of rows by a single field, as in the CQL queries above:

[Screenshot: Solr admin UI query on a single field]

7. Debugging

For debugging and development purposes, it's recommended to rely on the logging facilities of a single DSE node. Log messages should appear in /var/log/cassandra/system.log.

8. Conclusion

This post introduced a flexible way of pre-processing Cassandra columns before they are indexed as Solr documents by DSE. The same approach can be used to parse binary files too, as in the example bundled with DataStax Enterprise.



  1. wentao says:

    excellent blog, very clear and right to the point, exactly what i was looking for in order to index nested fields in json object, under DSE cassandra with solr.

    couple of questions/comments pls.

    1. in addition to the jar file containing the custom transformer classes, i also had to copy jackson-databind, jackson-annotations and jackson-core jar files. otherwise, i get runtime exception in DSE complaining class-definition-not-found.

    2. after copying the jars to $DSE_HOME/resources/solr/lib/, i had to restart the node (single-node dev environment) in order for the classes to be picked up by DSE. is there a hot-deploy option that doesn’t require server restart when new jars are added?


    1. Caleb Rackliffe says:

      @wentao There is currently no hot-deploy option if you need to add a new JAR to …/resources/solr/lib/.

  2. gomes says:

    nice one, it works, i yet to work with nested object,like to get some sample if any one has, also i need to add content type curl -X POST “http://localhost:8983/solr/admin/cores?action=CREATE&name=solr_fit.cities” while creating core.good work.

  3. Neil Buesing says:

    It would be nice if the Cql Validation that runs on startup woudn’t provide warnings for these fields… – No Cassandra column found for field: XXXX
    Unless it’s a non stored copy field, Cassandra columns must have the same case or be quoted in order to be correctly mapped.

  4. Yue says:

    Where to download jar files with custom transformer classes? Mine is DSE 5.0.2.

  5. Krishna Govindan says:

    Followed the instructions , the JSON elements does appear in Solr.But unable to query in Solr as well as Cassandra ..

    is there additional configuration needs to be done ? I’m using DSE 5.0

  6. Joatam Guevara says:

    Where can I find dse.jar? I am using DSE 5.1

