Data consistency in DSE Search

By Piotr Kołaczkowski - September 23, 2012 | 6 Comments

DSE Search provides a full-text search feature for Cassandra. Columns stored in the database can be indexed by Solr, which is plugged into Cassandra through the secondary index API. Each Solr index covers only the data stored on a single node. Let’s take a closer look at what happens when you insert new data or query your database.

Write Path

Solr-indexed data, just like any other kind of data, can be saved to Cassandra using any of the supported Cassandra clients, e.g. CQL. You open a connection to any node in the cluster and issue an INSERT or UPDATE statement. We refer to this node as the coordinator. The coordinator selects the replicas to contact based on the row key, the partitioner, and the replication strategy. It then sends your new data to those replicas and waits for some or all of them to confirm saving the data in their commit log, memtable, and Solr index. If you configured your replication factor to be 3, three nodes would get the data. Every node that receives the new data confirms it back to the coordinator. The coordinator waits until enough nodes confirm, depending on the configured write consistency level, and then returns control to the user.

What happens if one of those nodes is down? That depends on the consistency level you specified in the CQL statement. If the consistency level was ONE, only one replica has to confirm receiving the data, so your INSERT would run just fine. You don’t need to worry about the failed node – it will get the data once it is back up, and if you enabled the hinted handoff feature, the data will additionally be stored on the coordinator node in the meantime. If you set a higher consistency level, the coordinator has to wait for more nodes to respond. With consistency level ALL and one or more nodes down, your insert would fail.
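The acknowledgement rule can be sketched in a few lines (a simplified model; the function name and the set of levels shown are mine, not DSE's API):

```python
# Sketch: how many replica acknowledgements the coordinator waits for,
# per write consistency level. Simplified model for illustration only.
def required_acks(consistency_level: str, replication_factor: int) -> int:
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {consistency_level}")

# With RF=3: ONE waits for 1 ack, QUORUM for 2, ALL for all 3 replicas.
```

With ALL, a single downed replica is enough to fail the write, which is why higher consistency trades availability for stronger guarantees.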

DSE Search also allows storing and updating data through the Solr HTTP interface. The write path is the same as when you write through CQL. You can set the desired write consistency level by appending the “cl” parameter to the Solr POST request:

http://<host>:<port>/solr/<keyspace>.<column family>/update?cl=ONE
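As a sketch, constructing such an update URL might look like this (host, port, keyspace, and column family values are placeholders; a real client would then POST the document body to the resulting URL):

```python
# Sketch: building the Solr update URL with an explicit write consistency
# level. All concrete names below are illustrative placeholders.
from urllib.parse import urlencode

def solr_update_url(host: str, port: int, keyspace: str,
                    column_family: str, cl: str = "ONE") -> str:
    base = f"http://{host}:{port}/solr/{keyspace}.{column_family}/update"
    return f"{base}?{urlencode({'cl': cl})}"

url = solr_update_url("localhost", 8983, "wiki", "solr", cl="QUORUM")
# http://localhost:8983/solr/wiki.solr/update?cl=QUORUM
```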

Read path

You can query Solr-indexed columns in two ways: through CQL or through the Solr HTTP interface. Unlike the write path, the read path depends on the interface used.


To issue a Solr query using CQL, you have to append a solr_query predicate to the WHERE clause:

SELECT * FROM <column family> WHERE solr_query = '<solr query>';

The query is processed by the Cassandra storage engine in the following way:

  1. The coordinator selects a set of Solr shards that covers the entire token ring.
  2. The query is executed on each of the selected nodes using the Solr index to return the row keys matching the ‘solr_query’ predicate.
  3. The resulting row keys are sent back to the coordinator and merged.
  4. The coordinator fetches the documents corresponding to the merged row keys at consistency level LOCAL_ONE.
  5. The coordinator returns the resulting rows to the client.

Cassandra versions up to 1.1 use a simple greedy algorithm to select the nodes that answer queries involving secondary indexes. It prefers nodes that are close to the coordinator according to the configured snitch, but it doesn’t try very hard to minimize the number of queried nodes, nor to balance the load.
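The coverage problem the coordinator solves in step 1 can be illustrated with a toy model (node names, token ranges, and the helper function are invented for illustration; this is not DSE's actual code):

```python
# Toy model: pick a set of nodes whose token ranges together cover the
# whole ring. Each range is replicated on RF nodes; one replica per
# range is enough to answer the query for that range.
def cover_ring(replicas_per_range: dict) -> set:
    selected = set()
    for token_range, nodes in replicas_per_range.items():
        # Greedy: reuse an already-selected replica when possible,
        # so the set of queried nodes stays small.
        if not selected.intersection(nodes):
            selected.add(nodes[0])
    return selected

ring = {
    "(0, 100]":   ["A", "B", "C"],   # RF=3: each range lives on 3 nodes
    "(100, 200]": ["B", "C", "D"],
    "(200, 300]": ["C", "D", "A"],
}
print(cover_ring(ring))  # a small node set covering all three ranges
```

With RF=3, two nodes suffice to cover the three ranges here, which mirrors why a higher replication factor lets fewer shards answer a search.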


To query using Solr API, send a request in the following format:

http://<host>:<port>/solr/<keyspace>.<column family>/select?q=<Solr query>

You can specify a consistency level here, just as in the update request, but… it will be ignored. Searches through the Solr API rely on Solr sharding, with a little help from a thin DSE integration layer that selects the shards. The only supported consistency level is ONE. The query is processed as follows:

  1. A proper list of Solr shards is selected by a dedicated DSE component. This component uses heuristics to query the closest shards and also tries to minimize the total number of shards queried. The algorithms used here are constantly being improved. Currently the selection depends only on the node initially queried, so if you query the same node over and over again, the set of shards is always the same and you should get repeatable results. However, if you query a different node, a different set of shards can be selected and the results for the same query might be different.
  2. Because your Solr data-center can hold several replicas of the same document (RF > 1), a token range predicate is added to each query before it is executed. Each query returns documents with row keys within the appended token range(s). Token ranges are selected in such a way that each token range is handled by exactly one node. Therefore, there is no need for additional duplicate removal, and merging the results is a simple, fast union.
  3. Queries are executed by Solr and results are sent back to the coordinator node.
  4. Results are merged by Solr and returned to the user in the form of an XML document.
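Step 2 above — assigning each token range to exactly one of the selected shards so the final merge is a plain union — can be sketched with a toy model (all names are invented for illustration):

```python
# Toy model: give every token range exactly one owner among the selected
# shards, so no row key is returned twice and merging needs no dedup.
def assign_disjoint_ranges(selected_nodes: set, replicas_per_range: dict) -> dict:
    assignment = {}
    for token_range, nodes in replicas_per_range.items():
        # Pick the first selected replica for this range.
        owner = next(n for n in nodes if n in selected_nodes)
        assignment.setdefault(owner, []).append(token_range)
    return assignment

ring = {
    "(0, 100]":   ["A", "B", "C"],
    "(100, 200]": ["B", "C", "D"],
    "(200, 300]": ["C", "D", "A"],
}
plan = assign_disjoint_ranges({"A", "B"}, ring)
# Each shard is queried with a predicate restricting it to its own ranges.
```

Because the ranges are disjoint, the union of the per-shard results is already duplicate-free.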


To maximize DSE Search query performance, we recommend querying through the Solr API with as high a replication factor as you can afford. The higher the replication factor, the fewer shards are selected and the lower the overall cluster load. Because shards are statically assigned based on the coordinator node, you should distribute your queries among all the nodes to avoid hotspots. If you need consistency with the Solr API, the only option is to use consistency level ALL for writes. The only supported consistency level for CQL Solr queries is LOCAL_ONE. CQL is also the only way to obtain columns not stored in the Solr index.

                                                       CQL         Solr
  Supported CL for writes                              all levels  all levels
  Supported CL for reads in a Solr-only cluster        LOCAL_ONE   ONE
  Supported CL for reads in a mixed workload cluster   LOCAL_ONE   ONE
  Minimum number of queries executed (per search)      N           N / RF
  Maximum number of queries executed (per search)      N * RF      N
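The query-count rows of the table follow from simple arithmetic; here is a sketch, assuming an evenly distributed ring of N nodes with replication factor RF (function names are mine):

```python
# Sketch of the table's query-count arithmetic. Idealized: assumes data
# is evenly distributed over N nodes with replication factor RF.
import math

def query_counts(n_nodes: int, rf: int) -> dict:
    return {
        # CQL reads fan out to a covering set of nodes; in the worst
        # case up to N * RF shard queries may be issued.
        "cql":  {"min": n_nodes, "max": n_nodes * rf},
        # Solr API: each queried shard can answer for RF nodes' worth
        # of data, so at best ceil(N / RF) shards suffice.
        "solr": {"min": math.ceil(n_nodes / rf), "max": n_nodes},
    }

# Example: a 6-node cluster with RF=3 needs as few as 2 Solr shard
# queries per search, versus at least 6 for the CQL path.
```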

Updated: May 21st, 2015


  1. Jeff Schmidt says:

    Hi Piotr:

    My application makes use of Solr/DSE 2.1 via the Solr port using SolrJ. I’ve been having consistency issues, where one or two of three nodes (RF=3) don’t appear to get an update, or perhaps just not return results based on that update (index issue). According to your blog post, it sounds like when a write succeeds, not only does the data get saved to the commit log and memtable, but also the Solr index (in memory via the Cassandra index API I’d guess)?

    I know the Lucene indexes are present in the local file system of each node and are not themselves partaking in Cassandra-level eventual consistency. My Solr updates do not specify the cl parameter, and according to the DSE 2.2 docs, the default CL is QUORUM, or 2 nodes in my case (just the one DC). When the third node is eventually made consistent (read repair?), it will at that time update its Solr index as well? So not only will it be consistent within the index column family, but the on-disk Lucene index also?

    One thing that was suggested to me, and you point out in your summary, is to use cl=ALL for all Solr updates. I can see in this scenario that all nodes will acknowledge the write to the coordinator only after getting all the way to the Solr index. Again, I assume that’s in memory, not a file. Then, if that cl=ALL fails because a node is down, back off on the CL and try again, etc. Is this what you advocate?

    I’m able to get my three nodes back in sync by following the recommended procedure of rebuilding a corrupted index (repair, shutdown, remove the CF index files, restart, dsetool rebuild_indexes), but that gets old real fast…

    I am working with a support ticket, and I’m upgrading to DSE 2.2 and I am also going to get things in better clock sync with NTP etc. But, the update CL is still a question for me.

    Thanks for any thoughts on this.


  2. Piotr Kołaczkowski says:

    Hi Jeff,

    One important thing to keep in mind is, if you have RF=3 and write with CL.QUORUM, your update is immediately sent to *three* replicas, not just two. However, the coordinator waits only for the two of them to succeed, so the client wouldn’t notice anything even if one of the replicas was down. The third replica gets that update asynchronously and if you query that node too early, the update might not be there yet.

    When a node receives an update from Cassandra through the secondary index API (specifically, Cassandra calls applyIndexUpdates on the Solr index and passes information about the added rows/columns), Solr’s DirectUpdateHandler2.addDoc is used to update the Lucene index on that node, but commit is not called yet. Commit will be called when the memtable is flushed to disk.

    Hope this helps a little

  3. Jeff Schmidt says:

    Thanks Piotr. That is helpful. I wonder if commit is part of my issue. My customer wants to apply a series of updates, and only commit on the last one. In the mean time, users continue to see the old results.

    So I have auto commit disabled in solrconfig.xml. When instructed that the time is right, my app explicitly issues a commit to Solr. So, even if content was updated with CL=3, the commit is issued to only one node in the cluster (from the Solr API perspective anyway).

    I assume the other two nodes will also see the commit and update their local index appropriately? I don’t get the linking of memtable flush with performing the commit on any given node, and perhaps this is where not using NRT is an issue for me. That is, if the last update applied (with commit) does not cause a memtable flush, then the node will continue to use the old index. That could go on for quite a while since there may not be another update for quite a while. It’s all Solr queries until the customer chooses to apply more updates.

    Does my app have to issue a nodetool flush to ensure the memtable is flushed, and the index committed?

    Thanks again!


  4. Piotr Kołaczkowski says:

    Are you using a soft commit?

    A hard commit posted to any Solr node should distribute the commit to *all* the nodes. If it doesn’t, it is a bug. So there is no need to nodetool flush the memtables to get solr updates committed.

  5. Piotr Kołaczkowski says:

    “A hard commit posted to any Solr node should distribute the commit to *all* the nodes. If it doesn’t, it is a bug.”

    Ok, so I confirm, there is a bug 😉
    We are working on fixing it.

  6. Jeff Schmidt says:

    Thanks for checking into this. It is a hard commit. I have the auto-commit stuff disabled in solrconfig.xml, and then when instructed by the tenant app, I invoke commit() on the SolrServer instance.

    So, it sounds like this bug must still exist in DSE 2.2 then. With cl=ALL, all nodes will have the current data in the Cassandra CF, but the lucene indexes (and thus search results) may well be inconsistent.

    Glad to hear there is an official bug in the system. I look forward to seeing this fixed in DSE 2.3. 🙂



