DataStax Enterprise 3.1 Documentation

Querying search results

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


DSE Search hooks into the Cassandra Command Line Interface (CLI), Cassandra Query Language (CQL) library, the cqlsh tool, existing Solr APIs, and Thrift APIs.

Using SolrJ and other Solr clients

Solr clients work with DSE 2.0 and later. Moving an existing Solr application to DSE is straightforward: create a schema, then import your data and query it using your existing Solr tools. The Wikipedia demo is built and queried using SolrJ. The demo's queries are issued using pure Ajax, and no Cassandra API is used in the demo.

You can also use any Thrift client API, such as Pycassa or Hector, to access DSE Search. Pycassa supports Cassandra indexes, and you can use indexes in Pycassa just as you use the solr_query expression in DSE Search.

DataStax has extended SolrJ to protect internal Solr communication and HTTP access using SSL. You can also use SolrJ to change the consistency level of a DSE Search node.

Using the Solr HTTP API

You can use the Solr HTTP API to query data indexed in DSE Search/Solr just as you would search for data indexed in open source Solr (OSS). For example, after creating a keyspace in Cassandra using CQL, you can post the files over HTTP to a pre-existing or non-existent table. If the table does not exist, DSE Search creates it and then starts indexing the data. You can then use the HTTP API to query the database.

Solr HTTP API example

Assuming you performed the previous example, use this URL to find the titles in the mykeyspace.mysolr table that begin with the letters Succ and return them in XML:

http://localhost:8983/solr/mykeyspace.mysolr/select?q=title%3ASucc*&fl=title

The response is:

<response>
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">2</int>
   <lst name="params">
     <str name="fl">title</str>
     <str name="q">title:Succ*</str>
   </lst>
 </lst>
 <result name="response" numFound="2" start="0">
   <doc>
     <str name="title">Success</str>
   </doc>
   <doc>
     <str name="title">Success</str>
   </doc>
 </result>
</response>
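If you call the Solr HTTP API from a program rather than a browser, the same request can be built and its response read with a few lines of code. The following Python sketch is one possible approach, assuming only the standard library; the helper names solr_select_url, num_found, and field_values are hypothetical, and the code does not contact a server.

```python
# Minimal sketch: build a DSE Search select URL and read one field from the
# XML response. Function names are hypothetical; only the standard library
# is used, and nothing is sent over the network here.
import urllib.parse
import xml.etree.ElementTree as ET

# Response copied from the example above.
SAMPLE_RESPONSE = """<response>
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">2</int>
   <lst name="params">
     <str name="fl">title</str>
     <str name="q">title:Succ*</str>
   </lst>
 </lst>
 <result name="response" numFound="2" start="0">
   <doc><str name="title">Success</str></doc>
   <doc><str name="title">Success</str></doc>
 </result>
</response>"""


def solr_select_url(host, port, keyspace, table, query, fields):
    """Return a select URL for the Solr core named <keyspace>.<table>."""
    # urlencode percent-encodes ':' and '*'; Solr decodes them on receipt.
    params = urllib.parse.urlencode({"q": query, "fl": ",".join(fields)})
    return f"http://{host}:{port}/solr/{keyspace}.{table}/select?{params}"


def num_found(xml_text):
    """Total number of matching documents reported by Solr."""
    result = ET.fromstring(xml_text).find("result")
    return int(result.get("numFound"))


def field_values(xml_text, field):
    """Values of one <str> field, in document order, across all <doc> elements."""
    result = ET.fromstring(xml_text).find("result")
    return [
        node.text
        for doc in result.findall("doc")
        for node in doc.findall("str")
        if node.get("name") == field
    ]


if __name__ == "__main__":
    url = solr_select_url("localhost", 8983, "mykeyspace", "mysolr", "title:Succ*", ["title"])
    print(url)
    print(num_found(SAMPLE_RESPONSE), field_values(SAMPLE_RESPONSE, "title"))
```

Against a live node, you could fetch the URL with urllib.request.urlopen and pass the decoded response body to num_found and field_values.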

Using CQL

You can use a solr_query expression in a SELECT statement to retrieve Solr data from Cassandra. In this release, CQL Solr queries are suitable for single-node clusters but are not recommended for production-level queries, which are better suited to the Solr HTTP API. The Solr HTTP API is faster than CQL: with the Solr HTTP API, the read request goes directly to Cassandra, whereas with CQL the read request first goes to Solr, which returns the matching document IDs as an unordered bit set, and only then goes to Cassandra.

Synopsis

SELECT <select expression>
 FROM <table>
 [WHERE solr_query = '<search expression>'] [LIMIT <n>]

The <search expression> is a Solr query string that conforms to the Lucene and Solr query syntax. Enclose the Solr query string in single quotation marks. For example, after running the Wikipedia demo, you can use these Solr query strings:

| Type of query | Example | Description |
| --- | --- | --- |
| Field search | 'title:natio* AND Kenya' | You can use multiple fields defined in the schema: 'title:natio* AND body:Carlos Aragonés' |
| Wildcard search | 'Ken?a' | Use ? for a single-character wildcard and * for a multi-character wildcard. |
| Fuzzy search | 'Kenya~' | Use with caution: fuzzy searches can return many hits. |
| Phrase search | '"American football player"' | Searches for the phrase enclosed in double quotation marks. |
| Proximity search | '"football Bolivia"~10' | Searches for football and Bolivia within 10 words of each other. |
| Range search | 'title:[football TO soccer}' | Supports inclusive and exclusive bounds, using square brackets and curly braces, respectively. |
| Term boosting | '"football"^4 "soccer"' | The boost factor defaults to 1 and must be a positive number. |
| Boolean operators | '+Macedonian football' | You can use AND, +, OR, NOT, and -. |
| Grouping | '(football OR soccer) AND Carlos Aragonés' | Use parentheses to group clauses. |
| Field grouping | 'title:(+football +"Bolivia")' | Use parentheses to group multiple clauses into one field. |
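When an application generates the CQL statement, the Solr query string has to be embedded as a CQL string literal. The following Python sketch assembles a statement following the synopsis above; it is illustrative rather than part of any DataStax API, and the function name cql_solr_select is hypothetical. It relies on the CQL rule that a single quote inside a string literal is written as two single quotes.

```python
# Minimal sketch: assemble a CQL SELECT that uses a solr_query expression,
# following the synopsis above. The helper name is hypothetical.


def cql_solr_select(columns, table, search_expression, limit=None):
    """Return a CQL statement of the form
    SELECT <columns> FROM <table> WHERE solr_query = '<search expression>' [LIMIT <n>];
    """
    # The search expression is a CQL string literal, so it is wrapped in
    # single quotes and any single quote inside it is escaped by doubling.
    literal = "'" + search_expression.replace("'", "''") + "'"
    statement = f"SELECT {', '.join(columns)} FROM {table} WHERE solr_query = {literal}"
    if limit is not None:
        if int(limit) <= 0:
            raise ValueError("LIMIT must be a positive integer")
        statement += f" LIMIT {int(limit)}"
    return statement + ";"


if __name__ == "__main__":
    # Query strings taken from the table above.
    print(cql_solr_select(["title"], "solr", "title:natio*"))
    print(cql_solr_select(["title"], "solr", '"football Bolivia"~10', limit=10))
    # An apostrophe in the Solr query must be doubled inside the CQL literal.
    print(cql_solr_select(["title"], "solr", "title:men's"))
```

The double quotation marks used by phrase and proximity searches need no escaping, because only single quotes delimit CQL string literals.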

A SELECT expression reads one or more records from a Cassandra table and returns a result set of rows. Each row consists of a partition key and a collection of columns corresponding to the query. Unlike the projection in a SQL SELECT, the results are not guaranteed to contain all of the columns specified, and requesting non-existent columns does not cause an error.

CQL Example

To query the Wikipedia demo search results:

  1. Start cqlsh. On a Mac, for example:

    cd <install_location>/bin
    
    ./cqlsh
    
  2. Use the wiki keyspace and include the solr_query expression in a CQL select statement to find the titles in the table named solr that begin with the letters natio:

    use wiki;
    
    SELECT title FROM solr
      WHERE solr_query='title:natio*';
    

    The output appears:

     title
    --------------------------------------------------------------------------
                                           Bolivia national football team 2002
      List of French born footballers who have played for other national teams
                                           Bolivia national football team 1999
                                           Bolivia national football team 2001
                                           Bolivia national football team 2000
                                      Israel men's national inline hockey team
                                         Kenya national under-20 football team
    

Delete by query

After you issue a delete by query, documents start getting deleted immediately, and deletions continue until all matching documents are removed.

For example, to delete the mykeyspace.mysolr data that you inserted in the Using DSE Search/Solr example, run this command:

curl http://localhost:8983/solr/mykeyspace.mysolr/update \
  --data '<delete><query>*:*</query></delete>' \
  -H 'Content-type:text/xml; charset=utf-8'

Unlike in open source Solr (OSS), you do not need to post a commit command with the update; posting one has no effect.
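The curl command above can also be reproduced from application code. This Python sketch builds the equivalent POST request with the standard library only; the helper name delete_by_query_request is hypothetical, and, consistent with the note above, it does not add a commit command.

```python
# Minimal sketch: build the same delete-by-query POST as the curl command
# above. The helper name is hypothetical, and the request is not sent until
# it is passed to urllib.request.urlopen.
import urllib.request
from xml.sax.saxutils import escape


def delete_by_query_request(host, port, keyspace, table, query):
    """Return an unsent POST request that deletes documents matching `query`."""
    url = f"http://{host}:{port}/solr/{keyspace}.{table}/update"
    # Escape &, <, and > so the Solr query cannot break the XML body.
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    # No <commit/> is appended: DSE Search does not need one.
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = delete_by_query_request("localhost", 8983, "mykeyspace", "mysolr", "*:*")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually issue the delete against a running node, you would pass the request to urllib.request.urlopen.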

Limiting columns indexed and returned by a query

When you use dynamic fields, the default column limit controls the maximum number of indexed columns overall, not just dynamic field columns. The column limit also controls the maximum number of columns returned by queries. This limit prevents out-of-memory errors caused by using too many dynamic fields. If dynamic fields are not used, the column limit has no effect.

The default column limit is 1024. To change it, configure the dseColumnLimit element in the solrconfig.xml file. To override the configured value for an individual query, specify a different value, for example 2048, in the column.limit query parameter:

http://localhost:8983/solr/<keyspace>.<table>/select?q=title%3Amytitle*&fl=title&column.limit=2048

Querying multiple tables

To query multiple Cassandra tables in a single request, use the Solr HTTP API and specify the tables in the shards parameter. For example:

http://<host>:<port>/solr/<keyspace1>.<cf1>/select?q=*:*&shards=<host>:<port>/solr/<keyspace1>.<cf1>,<host>:<port>/solr/<keyspace2>.<cf2>

Using the Solr API, you can query multiple tables simultaneously if they have the same schema.
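Because the shards value is a comma-separated list of host:port/solr/<keyspace>.<table> entries without an http:// prefix, it is easy to get wrong by hand. The following Python sketch assembles the URL from a list of shards; the helper names and the (host, port, core) tuple layout are assumptions for illustration, not part of the Solr API.

```python
# Minimal sketch: build a multi-table query URL like the one above.
# Helper names are hypothetical; each shard is given as (host, port, core),
# where core is "<keyspace>.<table>".
import urllib.parse


def shards_param(shards):
    """Comma-separated host:port/solr/<keyspace>.<table> entries (no scheme)."""
    return ",".join(f"{host}:{port}/solr/{core}" for host, port, core in shards)


def multi_table_url(shards, query="*:*"):
    """Send the request to the first core and list every core in `shards`."""
    host, port, core = shards[0]
    params = urllib.parse.urlencode({"q": query, "shards": shards_param(shards)})
    return f"http://{host}:{port}/solr/{core}/select?{params}"


if __name__ == "__main__":
    shards = [("localhost", 8983, "keyspace1.cf1"), ("localhost", 8983, "keyspace2.cf2")]
    print(multi_table_url(shards))
```

As in the URL above, the core that receives the request is also listed in shards, so its own documents are included in the results.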

About Solr shard selection

Previously, for each queried partition range, Cassandra selected the node within that range that was closest to the node issuing the query. Because equally distant nodes were always tried in the same order, one or more nodes could become hotspots, and queries often selected more shards than needed. In DataStax Enterprise 2.2 and later, an improved algorithm uses a shuffling technique to balance the load, and also attempts to minimize both the number of shards queried and the amount of data transferred from non-local nodes.
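To illustrate why shuffling helps, the following Python sketch finds the closest replicas for one partition range and breaks ties by shuffling them instead of always using the same order. It is a hypothetical simplification, not the DataStax Enterprise algorithm: the function name and distance scores are assumptions, and it does not attempt to minimize the number of shards or the data transferred from non-local nodes.

```python
# Illustration only: a tie-breaking shuffle among equally close replicas.
# This is not DSE's shard-selection code; it omits the shard-count and
# data-transfer minimization described above. Names are hypothetical.
import random


def pick_replica(replicas, distance, rng=random):
    """Pick one of the closest replicas for a partition range.

    replicas: candidate node names that own the range.
    distance: callable returning a proximity score (lower means closer).
    """
    best = min(distance(node) for node in replicas)
    closest = [node for node in replicas if distance(node) == best]
    # A fixed order would always return closest[0], concentrating load on
    # one node; shuffling spreads requests across equally close nodes.
    rng.shuffle(closest)
    return closest[0]


if __name__ == "__main__":
    rng = random.Random(42)
    dist = {"node1": 1, "node2": 1, "node3": 2}.get
    counts = {}
    for _ in range(1000):
        node = pick_replica(["node1", "node2", "node3"], dist, rng)
        counts[node] = counts.get(node, 0) + 1
    print(counts)  # node3 is never chosen; node1 and node2 share the load
```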

Querying using spellcheck

By default, solrconfig.xml does not include a configuration for the Solr suggester. After creating a request handler for /suggest in solrconfig.xml, you can issue a query that specifies the autocomplete/spellcheck behavior using the shards.qt parameter. For example, to test the suggester:

http://localhost:8983/solr/mykeyspace.mytable/suggest?shards.qt=/suggest&q=testin