Technology•January 23, 2018

Moving From Cassandra Tables to Search With DataStax: Part 1

David Jones-GilardiDeveloper Relations Engineer

Hi there and welcome to part 1 of a three part series on upgrading our KillrVideo java reference application from Cassandra based tabular searches to using DSE Search.

Here in part 1, we’ll take a look at the “before” picture and how we were previously performing searches. I’ll give some examples of the types of searches and how those were implemented with Cassandra tables. We’ll also talk a little about the “why” of moving to DSE Search.

In part 2, I’ll explain the transition to DSE Search and what considerations I had to take into account along with a before and after code comparison.

Finally, in part 3, we’ll take a look at our results along with some of the more advanced types of searches we can now perform.

Ok, let’s do this

First things first…assumptions!

If it isn’t obvious, we are using the Java based KillrVideo application for reference. If you aren’t familiar with KillrVideo go take a look here to get up to speed. In short, this is a real, open source, micro-service style application that we build and maintain to present examples and help folks understand the DataStax tech stack. It’s also a nice way that I personally get some code time against the stack in a real application as compared to punching out demo apps.

We are using DataStax Enterprise from drivers to cluster. All of the capabilities we’re talking about here are assumed to be within that ecosystem.

Do we really need to use DSE Search?

No. Maybe. Yes? ¯\_(ツ)_/¯

Ok, it depends, but for the most basic searches it isn’t a requirement. As a matter of fact search was already implemented in the Java version without using DSE Search. So, why the change? Mostly it comes down to requirements and the right tool for the job. So, before I get into all of this why don’t we take a look at the various types of searches that exist in KillrVideo.

In KillrVideo, you can get details and play any video from the video detail page. A quick look at the whole page and you can see all of the available searches. At the top left is the “typeahead” search bar, over to the right are the tag search buttons, and at the very bottom is the “more videos like this” search.

KillrVideo

“Typeahead” search

So, nothing new here really. These types of searches have been around for quite some time. Start typing letters in the search bar and you are provided with a list of potential matches

Typeahead search

Tag search

This next one is pretty straightforward as well. If you click on any of the tag buttons on the video detail page it will perform a search for other videos with the same tag.

Tag Search

“More videos like this” search

At the bottom of the video detail page is a section labeled “More Videos Like This”. This search will happen automatically when you navigate to any video and present a set of videos that are similar to the video you are currently viewing.

Let’s take a look at the implementation

Remember I mentioned before moving to using DSE Search these were all powered with Cassandra tables. Let’s break some of this down and take a look at some details. Also, if you are interested check out this pull request up on github. You can use this as a reference of the before and after changes if you so choose.

Overview

So overall the setup is pretty simple. We have 3 searches that are supported by a combination of 3 Cassandra tables (the number of searches and tables just happen to match, there is no correlation between them), 3 table entities that map to our Cassandra tables, and 3 mapping objects derived from our table entities. Here is a simple visual representation.

overview

Now, I’m really pointing out these particular items because they will come into play later once we move to using DSE Search, namely, we will need to remove most of them. We’ll leave that there for now and come back to it later.

Of the 3 tables I just mentioned 2 of them exist only to support Cassandra based searches in KillrVideo, tags_by_letter and videos_by_tag. They follow the denormalized data model pattern we’ve come to love in Cassandra and were created solely to support this purpose. The videos table stores all videos inserted into KillrVideo. It is not specialized to Cassandra based searches and will come into play a little later.

I just mentioned that both tags_by_letter and videos_by_tag were specially created to support searches within KillrVideo. Let’s take a deeper look at both tables and see what’s going on. If you aren’t familiar with primary keys in Apache Cassandra® I highly suggest you take 5 minutes to read Patrick McFadin’s post on their importance. This will better explain how they are applied below.

Here is the CQL schema for the tags_by_letter table:

CREATE TABLE IF NOT EXISTS tags_by_letter ( first_letter text, tag text, PRIMARY KEY (first_letter, tag) );

The first_letter column is the partition key with tag as a clustering column. The partition key determines where data is located in your cluster while clustering columns handle how data is ordered within the partition. This is especially useful in cases like a “typeahead” search where searches typically start with the first letter of a given search term and usually provide an alphabetical list of results. This, in a sense, pre-optimizes query results and prevents us from having to sort our data, whether in query execution or code.

Just to absolutely belabor this point (because who doesn’t like belaboring something) here is an example of this in action. Notice how the results are sorted automatically per the tag column with no sorting or extra commands needed in the query.

Tags By Letter Example

Moving on, here is the CQL schema for the videos_by_tag table:

CREATE TABLE IF NOT EXISTS videos_by_tag ( tag text, videoid uuid, added_date timestamp, userid uuid, name text, preview_image_location text, tagged_date timestamp, PRIMARY KEY (tag, videoid) );

Again, this table was specially created to answer the question of what videos have a specified tag. It uses tag as the partition key which allows for fast retrieval of videos that match a tag when querying. The other fields you see listed are there to provide information required by our web tier UI.

VideoAddedHandlers

Another portion we need to keep an eye on is the VideoAddedHandlers class. As the name implies this class is responsible for performing some action(s) every time a video is added to KillrVideo. If you take a look at the two prepared statements within the init() method you should notice they are inserting data into the 2 search tables we mentioned above tags_by_letter and videos_by_tag.

videosByTagPrepared = dseSession.prepare( "INSERT INTO " + Schema.KEYSPACE + "." + videosByTagTableName + " " + "(tag, videoid, added_date, userid, name, preview_image_location, tagged_date) " + "VALUES (?, ?, ?, ?, ?, ?, ?)"

);

tagsByLetterPrepared = dseSession.prepare( "INSERT INTO " + Schema.KEYSPACE + "." + tagsByLetterTableName + " " + " (first_letter, tag) VALUES (?, ?)" );

Every time a video is added these queries are fired off to power our searches. Notice some of the column names, things like “tag” and “first_letter”. Again, we will dig into the detailed logic here soon.

Detailed logic here soon

Alrighty, so, we’ve gone over a high level overview of the various items and objects we are using to support our Cassandra based searches. Now, let’s get into the searches themselves and see what they are doing.