DataStax Enterprise 3.0 Documentation

Querying search results

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


DSE Search hooks into the Cassandra Command Line Interface (CLI), Cassandra Query Language (CQL) library, the cqlsh tool, existing Solr APIs, and Thrift APIs.

Using SolrJ and other Solr clients

Solr clients work with DSE 2.0 and later. Moving an existing Solr application to DSE is straightforward: create a schema, import your data, and query it using your existing Solr tools. The Wikipedia demo is built and queried using SolrJ. Its queries are issued with pure Ajax, and the demo uses no Cassandra API.

You can also use any Thrift-based client, such as Pycassa or Hector, to access DSE Search. Pycassa supports secondary indexes, and you can use them in Pycassa the same way you use the solr_query expression in DSE Search.

DataStax has extended SolrJ to protect internal Solr communication and HTTP access using SSL. You can also use SolrJ to change the consistency level of a DSE Search node.

Using the Solr HTTP API

Using the Solr HTTP API in DSE Search is no different from using it in open source (OSS) Solr.

Solr HTTP API example

To find the titles in the solr column family that contain a word beginning with the letters natio, and to return them in a format that the Solr HTTP API supports, such as JSON, use this URL:

http://localhost:8983/solr/wiki.solr/select?q=title%3Anatio*&fl=title&wt=json&indent=on&omitHeader=on

The response, sorted by relevance, is returned in JSON format:

{
  "response":{"numFound":7,"start":0,"docs":[
    {
  "title":"Bolivia national football team 1999"},
    {
  "title":"Bolivia national football team 2000"},
    {
  "title":"Kenya national under-20 football team"},
    {
  "title":"Israel men's national inline hockey team"},
    {
  "title":"Bolivia national football team 2001"},
    {
  "title":"Bolivia national football team 2002"},
    {
  "title":"List of French born footballers who have played for other national teams"}]
}}
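The same request can be issued from a program. The following Python sketch, which uses only the standard library, builds the select URL shown above and extracts the titles from a JSON response. The helper names build_select_url and titles_from_response are hypothetical, and the base URL assumes the Wikipedia demo core running on localhost, as in the example.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wikipedia demo core "wiki.solr" on a local DSE Search node.
SOLR_BASE = "http://localhost:8983/solr/wiki.solr"


def build_select_url(base, query, fields="title"):
    """Build a Solr HTTP API select URL that requests indented JSON output."""
    params = {
        "q": query,          # Solr query string, e.g. "title:natio*"
        "fl": fields,        # comma-separated list of fields to return
        "wt": "json",        # response writer
        "indent": "on",
        "omitHeader": "on",  # leave out the responseHeader block
    }
    return f"{base}/select?{urlencode(params)}"


def titles_from_response(body):
    """Return (numFound, titles) from a JSON select response body."""
    response = json.loads(body)["response"]
    return response["numFound"], [doc["title"] for doc in response["docs"]]
```

Against a running node, you could pass the URL to urllib.request.urlopen and decode the body before calling titles_from_response. urlencode percent-encodes characters such as : and *, which Solr decodes as usual.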

Querying multiple column families

To query multiple Cassandra column families in a single request, use the Solr API. List the column families, each identified as <keyspace>.<column family>, in the shards parameter. For example:

http://<host>:<port>/solr/<keyspace1>.<cf1>/select?q=*:*&shards=<host>:<port>/solr/<keyspace1>.<cf1>,<host>:<port>/solr/<keyspace2>.<cf2>

Using the Solr API, you can query multiple column families simultaneously if they have the same schema.
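As a sketch of how the shards parameter is assembled, the following Python function builds a multi-column-family query URL. The helper name build_multi_cf_url and the core names ks1.cf1 and ks2.cf2 are hypothetical placeholders for <keyspace1>.<cf1> and <keyspace2>.<cf2>. Following the template URL above, the request is addressed to the first core.

```python
from urllib.parse import urlencode


def build_multi_cf_url(host, port, cores, query="*:*"):
    """Build one select request that searches several column families.

    cores holds "<keyspace>.<column family>" names. The request goes to the
    first core, and the shards parameter lists every core to search.
    """
    if not cores:
        raise ValueError("at least one core is required")
    # Each shard entry is host:port/solr/<keyspace>.<cf>, comma-separated.
    shards = ",".join(f"{host}:{port}/solr/{core}" for core in cores)
    params = urlencode({"q": query, "shards": shards})
    return f"http://{host}:{port}/solr/{cores[0]}/select?{params}"
```

The shard list is percent-encoded by urlencode; Solr decodes it before splitting on commas.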

Delete by query

In DSE Search, delete by query works differently than in OSS Solr: the process it triggers includes a commit. After you issue a delete by query, documents begin to be deleted immediately, and deletion continues until all matching documents are removed. For example:

  1. Delete the Wikipedia data from the Cassandra database and the Solr index. On the command line:

    curl http://localhost:8983/solr/wiki.solr/update \
      --data '<delete><query>*:*</query></delete>' \
      -H 'Content-type:text/xml; charset=utf-8'
    

    Unlike in OSS, you do not need to post a commit command after the delete command; posting one has no effect.

  2. Verify that the data has been deleted by searching for the titles in the solr column family that contain a word beginning with the letters natio. Before the deletion, this query returned results from the Wikipedia demo data. Enter this URL in a browser:

    http://localhost:8983/solr/wiki.solr/select?q=title%3Anatio*&fl=title&wt=json&indent=on&omitHeader=on
    

    The output shows that no documents were found:

    {
     "response":{"numFound":0,"start":0,"docs":[]
    }}
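The two steps above can be sketched in Python with the standard library. build_delete_request constructs the same POST that the curl command sends, and no_documents_found checks a JSON select response like the one shown in step 2. Both function names are hypothetical. The sketch builds the request without sending it, and it escapes the query so that characters such as & or < cannot break the XML body.

```python
import json
import urllib.request
from xml.sax.saxutils import escape

# Assumption: the Wikipedia demo core on a local node, as in the curl command.
UPDATE_URL = "http://localhost:8983/solr/wiki.solr/update"


def build_delete_request(url, query):
    """Build, but do not send, the delete-by-query POST shown above.

    No commit request follows: in DSE Search the delete by query
    process includes the commit.
    """
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-type": "text/xml; charset=utf-8"},
        method="POST",
    )


def no_documents_found(select_body):
    """Return True when a JSON select response reports zero matches."""
    return json.loads(select_body)["response"]["numFound"] == 0
```

Against a running node, urllib.request.urlopen(build_delete_request(UPDATE_URL, "*:*")) would send the delete.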
    

Using CQL

You can use a solr_query expression in a SELECT statement to retrieve Solr data from Cassandra. CQL Solr queries are suitable for simple, brief, occasional testing and for simple administrative tasks, but they are not recommended for production queries, which are better suited to the Solr HTTP API. The Solr HTTP API is also faster than CQL. With the Solr HTTP API, the read request goes directly to Cassandra. With CQL, the read request first goes to Solr, which returns the matching document IDs as an unordered bit set; only then does the request go to Cassandra.

Synopsis

SELECT <select expression>
 FROM <column family>
 [USING CONSISTENCY <level>]
 [WHERE solr_query = '<search expression>'] [LIMIT <n>]

The <search expression> is a Solr query string that conforms to the Lucene and Solr query syntax. Enclose the Solr query string in single quotation marks. For example:

Type of query | Example | Description
Field search | 'title:natio* AND Kenya' | You can use multiple fields defined in the schema: 'title:natio* AND body:"Carlos Aragonés"'
Wildcard search | 'Ken?a' | Use ? for a single-character wildcard or * for a multi-character wildcard.
Fuzzy search | 'Kenya~' | Use with caution; many hits can occur.
Phrase search | '"American football player"' | Searches for the phrase enclosed in double quotation marks.
Proximity search | '"football Bolivia"~10' | Searches for football and Bolivia within 10 words of each other.
Range search | 'title:[football TO soccer}' | Supports both inclusive and exclusive bounds, using square brackets and curly braces, respectively.
Term boosting | '"football"^4 "soccer"' | The default boost factor is 1. The boost factor must be a positive number.
Boolean operators | '+Macedonian football' | You can use AND, +, OR, NOT, and -.
Grouping | '(football OR soccer) AND Carlos Aragonés' | Use parentheses to group clauses.
Field grouping | 'title:(+football +"Bolivia")' | Use parentheses to group multiple clauses into one field.
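Because the search expression is embedded in a CQL string literal, a single quotation mark inside it must be escaped by doubling it. The following Python sketch assembles a statement in the shape of the synopsis above. build_solr_cql is a hypothetical helper, and whether the USING CONSISTENCY clause is accepted depends on the CQL version of your cqlsh session.

```python
def build_solr_cql(columns, column_family, search_expression,
                   consistency=None, limit=None):
    """Assemble a SELECT with a solr_query clause, following the synopsis.

    The search expression becomes a CQL string literal, so any embedded
    single quotation marks are escaped by doubling them.
    """
    literal = "'" + search_expression.replace("'", "''") + "'"
    parts = [f"SELECT {columns}", f"FROM {column_family}"]
    if consistency is not None:
        # Optional clause from the synopsis; version-dependent (assumption).
        parts.append(f"USING CONSISTENCY {consistency}")
    parts.append(f"WHERE solr_query = {literal}")
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts) + ";"
```

For example, build_solr_cql("title", "solr", "title:natio*") produces the statement used in the CQL example below, with spaces around the equals sign.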

A SELECT expression reads one or more records from a Cassandra column family and returns a result set of rows. Each row consists of a row key and a collection of columns corresponding to the query. Unlike the projection in a SQL SELECT, there is no guarantee that the results will contain all of the columns specified, because Cassandra is schema-optional. Requesting non-existent columns does not cause an error.

CQL Example

To query the Wikipedia demo search results:

  1. Start cqlsh in CQL 2 or CQL 3 mode. On a Mac, for example:

    cd <install_location>/bin
    
    ./cqlsh -3
    
  2. Use the wiki keyspace and include the solr_query expression in a CQL SELECT statement to find the titles in the solr column family that contain a word beginning with the letters natio:

    use wiki;
    
    SELECT title FROM solr
      WHERE solr_query='title:natio*';
    

    The output appears. Note that the rows are not sorted by relevance or in lexical order:

     title
    --------------------------------------------------------------------------
                                           Bolivia national football team 2002
      List of French born footballers who have played for other national teams
                                           Bolivia national football team 1999
                                           Bolivia national football team 2001
                                           Bolivia national football team 2000
                                      Israel men's national inline hockey team
                                         Kenya national under-20 football team
    

About Solr shard selection

Before DataStax Enterprise 2.2, for each queried token range, Cassandra selected the node within that range that was closest to the node issuing the query. Because equally distant nodes were always tried in the same order, one or more nodes became hotspots, and queries often selected more shards than needed. In DataStax Enterprise 2.2 and later, an improved algorithm uses a shuffling technique to balance the load, and also attempts to minimize both the number of shards queried and the amount of data transferred from non-local nodes.
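The following Python sketch illustrates two of the ideas in the paragraph above: shuffling equally good candidates so that no node is always chosen first, and greedily reusing nodes so that fewer shards cover all token ranges. It is a hypothetical greedy set-cover illustration with made-up names (select_shards, range_replicas, distance), not the algorithm DSE actually uses, and it ignores the amount of data transferred.

```python
import random


def select_shards(range_replicas, distance, rand=random):
    """Assign each token range to one node, reusing nodes where possible.

    range_replicas maps a token range to the nodes that hold it; distance maps
    each node to its distance from the querying node. Hypothetical greedy
    illustration only, not the DSE implementation.
    """
    uncovered = set(range_replicas)
    assignment = {}
    while uncovered:
        # For each candidate node, collect the uncovered ranges it could serve.
        coverage = {}
        for token_range in uncovered:
            for node in range_replicas[token_range]:
                coverage.setdefault(node, set()).add(token_range)
        nodes = list(coverage)
        if not nodes:
            raise ValueError("some token ranges have no replicas")
        # Shuffle first so that ties are not always broken the same way.
        rand.shuffle(nodes)
        # Stable sort: nodes covering more ranges first, then nearer nodes;
        # tied nodes keep their shuffled order, spreading load among them.
        nodes.sort(key=lambda n: (-len(coverage[n]), distance[n]))
        best = nodes[0]
        for token_range in coverage[best]:
            assignment[token_range] = best
        uncovered -= coverage[best]
    return assignment
```

With three ranges replicated on overlapping pairs of nodes, the nearest node that covers two ranges is picked first, and the remaining range goes to a randomly chosen node among the equally distant candidates, so two shards are queried instead of three.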