CQL3 Support for DataStax Enterprise Search

By Hamilton Tran -  March 19, 2014 | 0 Comments

Introduction:
Back in DSE 3.2 we quietly introduced support for CQL3 and DataStax Enterprise Search, but did little to showcase the new way of using CQL3 sets, lists, maps, and compound primary keys with Solr. This post will demonstrate and explain how to create a CQL3 backed solr core, some of the differences from the old cql2 backed cores, and a couple nifty things you can do with Solr. We'll be taking a page from the CQL3 Data Modeling Example and adding a new column family where a user can rate songs, add tags, and then search their rated songs.

A Brief History of CQL2 and Solr

In versions prior to DSE 3.2 these things were the norm:

  • 2 ways of creating a solr core,
    • Over a preexisting keyspace / table
    • Uploading the Solr resources and we would automatically create the keyspace/table based on the schema.xml upon core creation
  • Solr multiValue fields mapped to a text column that used a json representation to store multiple values
    • example: 'solrjson:["Human","Mouse","Rat","Cow"]'
  • All Solr dynamicFields mapped to a __dynFld column

The Glory of CQL3 and Solr in DSE 3.2+

  • 1 way of creating a solr core
    • Define keyspace / table, upload Solr resources, create core
  • Solr multiValue fields map to CQL3 Lists and Sets
  • Solr dynamicFields map to CQL3 maps
  • compound primary key support

The Setup

  • DataStax Enterprise 4.0.1
  • 1 node on Ubuntu 11.10 with Java 1.7.0_51

Creating the Core
In versions of DSE 3.2+ you have to define your keyspace and table first in order to get a CQL3 backed Solr core, otherwise you'll end up with a CQL2 backed Solr core.

In cqlsh:

CREATE KEYSPACE music_service WITH replication = {
 'class': 'NetworkTopologyStrategy',
 'Solr': '1'
};
use music_service;
create table user_song_rating ( 
 user_id uuid, 
 song_id uuid, 
 song_name text, 
 tags set<text>, 
 artist_name text, 
 album_name text, 
 rating int, 
 "meta_" map<text,text>, 
 searchall list<text>, 
PRIMARY KEY(user_id, song_id) );

Here we use a few CQL3 features, collections (set and list) on columns "tags" and "searchall", the map type for the column "meta_" , and the usage of a compound primary key. The data type mapping between Solr and Cassandra for our purposes boils down to this:

  • Cassandra Int datatype -> Solr TrieIntField datatype
  • Cassandra UUID datatype -> Solr UuidField datatype

For more check out our documentation on Solr Type Mappings

Now we need to prepare some additional documents, namely our solrconfig.xml and schema.xml. The solrconfig.xml is easy, we will use the one provided to us from <dse install>/demos/wikipedia/solrconfig.xml. The schema.xml however we have to make ourselves and make sure that it matches our table definition.

<?xml version="1.0" ?>
<schema name="music_service" version="1.1">
<types>
 <fieldType class="solr.TextField" name="text">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
   </analyzer>
 </fieldType>
 <fieldType class="solr.TextField" name="textcollection" multiValued="true">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
   </analyzer>
 </fieldType>
 <fieldType class="solr.TrieIntField" name="int" />
 <fieldType class="solr.UUIDField" name="uuid" />
 </types>
<fields>
<field indexed="True" name="user_id" stored="True" type="uuid"/>
 <field indexed="True" name="song_id" stored="True" type="uuid"/>
 <field indexed="True" name="song_name" stored="True" type="text"/>
 <!-- user defined tags they add to the song -->
 <field indexed="True" name="tags" stored="True" type="textcollection"/>
 <field indexed="True" name="artist_name" stored="True" type="text"/>
 <field indexed="True" name="album_name" stored="True" type="text"/>
 <field indexed="True" name="rating" stored="True" type="int"/>
 <field indexed="True" name="searchall" stored="True" type="textcollection"/>
 <dynamicField indexed="True" name="meta_*" stored="True" type="text"/>
</fields>
 <copyField source="song_name" dest="searchall"/>
 <copyField source="tags" dest="searchall"/>
 <copyField source="artist_name" dest="searchall"/>
 <copyField source="album_name" dest="searchall"/>
<defaultSearchField>searchall</defaultSearchField>
<uniqueKey>(user_id, song_id)</uniqueKey>
</schema>

You will probably notice that in our schema.xml we mapped the "tags" and "searchall" fields to use the defined type "textcollection" which has the attribute "multiValued=True" and that the uniqueKey matches the order of our compound key we defined in cqlsh. But what is this dynamicField tag? A dynamicField in Solr is a way to allow users to add data through Solr that isn't defined in the schema.xml, in our usage we'll be able to store and index data that matches fields that begin with "meta_".  There are few other things here I'd like to point out, namely the copyField entries and defaultSearchField. Here what the schema is telling Solr to do is when we add data, also copy that data into another field, in this case "searchall", which is also our default search field. By doing this, a user can do a quick search and be able to get back results that may match data contained in "song_name", "tags", "artist_name" or "album_name".

Now we upload the resources and create the core:

curl http://localhost:8983/solr/resource/music_service.user_song_rating/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
curl http://localhost:8983/solr/resource/music_service.user_song_rating/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=music_service.user_song_rating"

Adding and Querying the Data

The core is up, now to add data

curl "http://localhost:8983/solr/music_service.user_song_rating/update/?commit=true" --data-binary @data.xml -H 'Content-type:text/xml; charset=utf-8'

The data.xml has entries that looks like this :

<add>
  <doc>
     <field name="user_id">7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a</field>
     <field name="song_id">aea148c9-9b7e-47fd-aac6-37887b895792</field>
     <field name="song_name">Never Gonna Give You Up</field>
     <field name="tags">Pop</field>
     <field name="tags">Dance Pop</field>
     <field name="tags">80's</field>
     <field name="artist_name">Rick Astley</field>
     <field name="album_name">Whenever You Need Somebody</field>
     <field name="rating">2</field>
     <field name="meta_always_repeat">1</field>
     <field name="meta_special_msg">You've been RickRolled</field>
  </doc>
  <doc>
  ...
  </doc>
</add>

Now that the data has been added, we can query cqlsh and see what the data looks like.
Here is the result of those copyField directives in our schema.xml in cqlsh:

cqlsh:music_service> select searchall from user_song_rating where user_id=7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a and song_id=aea148c9-9b7e-47fd-aac6-37887b895792;
searchall
-------------------------------------------------------------------------------------------------------
 ['Never Gonna Give You Up', 'Pop', 'Dance Pop', '80''s', 'Rick Astley', 'Whenever You Need Somebody']
(1 rows)

When I do a query I get back results that match values in the "searchall" field that may have been missed if we were do searches against specific Solr fields.

curl "http://localhost:8983/solr/music_service.user_song_rating/select/?q=*body&indent=true&wt=json"
{
 "_uniqueKey":"[\"7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a\",\"aea148c9-9b7e-47fd-aac6-37887b895792\"]",
 "tags":["80's",
 "Dance Pop",
 "Pop"],
 "song_id":"aea148c9-9b7e-47fd-aac6-37887b895792",
 "artist_name":"Rick Astley",
 "song_name":"Never Gonna Give You Up",
 "user_id":"7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a",
 "rating":2,
 "album_name":"Whenever You Need Somebody",
 "searchall":["Never Gonna Give You Up",
 "Pop",
 "Dance Pop",
 "80's",
 "Rick Astley",
 "Whenever You Need Somebody"],
 "meta_always_repeat":"1",
 "meta_special_msg":"You've been RickRolled"},
 {
 "_uniqueKey":"[\"7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a\",\"f039cac9-7cbd-417e-82af-a00307a46d7c\"]",
 "tags":["80's",
 "Dance Pop",
 "Pop"],
 "song_id":"f039cac9-7cbd-417e-82af-a00307a46d7c",
 "artist_name":"Rick Astley",
 "song_name":"Together Forever",
 "user_id":"7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a",
 "rating":2,
 "album_name":"Whenever You Need Somebody",
 "searchall":["Together Forever",
 "Pop",
 "Dance Pop",
 "80's",
 "Rick Astley",
 "Whenever You Need Somebody"]},
 {
 "_uniqueKey":"[\"7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a\",\"d18aad7c-24c1-45b1-b799-d4fe27100048\"]",
 "tags":["New Wave",
 "Pop Rock"],
 "song_id":"d18aad7c-24c1-45b1-b799-d4fe27100048",
 "artist_name":"Tears for Fears",
 "song_name":"Everybody Wants to Rule the World",
 "user_id":"7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a",
 "rating":4,
 "album_name":"Songs from the Big Chair",
 "searchall":["Everybody Wants to Rule the World",
 "New Wave",
 "Pop Rock",
 "Tears for Fears",
 "Songs from the Big Chair"]}
cqlsh:music_service> select "meta_" from user_song_rating where user_id=7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a 
and song_id=aea148c9-9b7e-47fd-aac6-37887b895792;

Let's take a look back at our "meta_" column which was mapped to a solr dynamicField:

cqlsh:music_service> select "meta_" from user_song_rating where user_id=7612f8ca-06a7-4c0a-9c8c-6e51142b0b4a 
and song_id=aea148c9-9b7e-47fd-aac6-37887b895792;
meta_
----------------------------------------------------------------------------
 {'meta_always_repeat': '1', 'meta_special_msg': 'You''ve been RickRolled'}
(1 rows)

Hmm, looks like my program has some sort of easter egg associated with this song, but the important thing to note is that the "meta_" column handled the extra "meta_always_repeat" and "meta_special_msg" fields in the data we uploaded.

In conclusion:

I hope with this little example, it's clear how much more intuitive the integration is between Solr and Cassandra in DataStax Enterprise than in prior versions as well as a little on taking advantage of the power that DataStax Search can provide with your data.

For more information:

DataStax Search Documentation

Download DSE here









DataStax has many ways for you to advance in your career and knowledge.

You can take free classes, get certified, or read one of our many white papers.



register for classes

get certified

DBA's Guide to NoSQL







Comments

Your email address will not be published. Required fields are marked *




Subscribe for newsletter:

Tel. +1 (650) 389-6000 sales@datastax.com Offices France GermanyJapan

DataStax Enterprise is powered by the best distribution of Apache Cassandra™.

© 2017 DataStax, All Rights Reserved. DataStax, Titan, and TitanDB are registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache Cassandra, Apache, Tomcat, Lucene, Solr, Hadoop, Spark, TinkerPop, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.