The main focus of these releases was to add support for speculative query executions. Additionally, we improved the performance of Murmur3 hashing and changed the query preparation logic along with other enhancements.
Speculative query executions
Speculative execution is a way to limit latency at high percentiles by preemptively starting one or more additional executions of the query against different nodes. The driver yields the first response received and discards the ones that follow.
Speculative executions are disabled by default. They are controlled by an instance of SpeculativeExecutionPolicy provided when initializing the Client. This policy defines the threshold after which a new speculative execution is triggered.
The driver provides a ConstantSpeculativeExecutionPolicy that schedules a given number of speculative executions, separated by a fixed delay. The policy is exported under the
Given the configuration above, an idempotent query would be handled this way:
- Start the initial execution at t0
- If no response has been received at t0 + 200 milliseconds, start a speculative execution on another node
- If no response has been received at t0 + 400 milliseconds, start another speculative execution on a third node
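The timeline above can be sketched as a small, driver-independent simulation. This is only an illustration of the scheduling idea, not the driver's implementation: `executeSpeculatively` and `runOnNode` are hypothetical names, and `runOnNode` stands in for whatever sends the query to a given node and returns a promise.

```javascript
// Hypothetical helper (not part of the driver): runs a query on nodes[0] and,
// for every `delayMs` that passes without a response, starts one more
// execution on the next node, up to `maxSpeculative` extra executions.
function executeSpeculatively(runOnNode, nodes, delayMs, maxSpeculative) {
  return new Promise((resolve, reject) => {
    const extra = Math.min(maxSpeculative, nodes.length - 1);
    const total = 1 + extra;
    const timers = [];
    let settled = false;
    let failures = 0;

    function start(index) {
      runOnNode(nodes[index]).then(
        (result) => {
          if (settled) return; // responses after the first one are discarded
          settled = true;
          timers.forEach(clearTimeout); // do not start further executions
          resolve({ node: nodes[index], result });
        },
        (err) => {
          failures++;
          // Give up only once every planned execution has failed
          if (!settled && failures === total) {
            settled = true;
            reject(err);
          }
        }
      );
    }

    start(0); // initial execution at t0
    for (let i = 1; i <= extra; i++) {
      // speculative execution i fires at t0 + i * delayMs
      timers.push(setTimeout(() => start(i), i * delayMs));
    }
  });
}
```

Calling `executeSpeculatively(runOnNode, ['node1', 'node2', 'node3'], 200, 2)` would reproduce the schedule described above: a second execution at t0 + 200 ms and a third at t0 + 400 ms, unless a response arrives first.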
As with the rest of the policies in the driver, you can provide your own implementation by extending the
One important aspect to consider is whether queries are idempotent (that is, whether they can be applied multiple times without changing the result beyond the initial application). If a query is not idempotent, the driver never schedules speculative executions for it, because there is no way to guarantee that only one node will apply the mutation. Examples of operations that are not idempotent are: counter increments/decrements; adding items to a list column; using non-idempotent CQL functions, like
In the driver, query idempotence is determined by the isIdempotent flag in the QueryOptions, which defaults to false. You can set the default when initializing the Client or you can set it manually for each query, for example:
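A minimal sketch of both approaches follows. The contact point, the table names and the helper functions are hypothetical. The `client` argument is assumed to be a connected driver Client, so the snippet itself does not load the driver.

```javascript
// Client-wide default: options that would be passed to `new cassandra.Client(...)`.
// The contact point is a placeholder.
const idempotentByDefaultOptions = {
  contactPoints: ['127.0.0.1'],
  queryOptions: { isIdempotent: true }
};

// Per-query setting: writing a fixed value can safely be applied more than once.
// `users` is a hypothetical table.
function setUserName(client, id, name) {
  const query = 'UPDATE users SET name = ? WHERE id = ?';
  return client.execute(query, [name, id], { prepare: true, isIdempotent: true });
}

// A counter increment is not idempotent, so it is explicitly marked as such
// and will never be executed speculatively.
function incrementVisits(client, id) {
  const query = 'UPDATE user_counters SET visits = visits + 1 WHERE id = ?';
  return client.execute(query, [id], { prepare: true, isIdempotent: false });
}
```

Marking counter updates explicitly is a defensive choice for clients where the default has been switched to true.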
Note that enabling speculative executions causes the driver to send more individual requests, so throughput does not necessarily improve. You can read how speculative executions affect retries and other practical details in the documentation.
Improved Murmur3 hashing performance
Apache Cassandra uses Murmur3Partitioner to determine the distribution of the data across cluster partitions. The adapted version of the Murmur3 hashing algorithm used by Cassandra performs several 64-bit integer operations. As there isn't a native int64 representation in ECMAScript, we previously used Google Closure's Long to support those operations.
Performing int64 add and multiply operations with int32 types requires splitting the operands into smaller int16 chunks to handle overflows. Google Closure's Long handles this by creating 4 uint16 chunks of each operand, performing the operation and creating a new int64 value (composed of 2 int32 values), as Long is immutable.
To improve the performance of the partitioner on Node.js, we created a custom type, MutableLong, that maintains 4 uint16 fields. Operations are applied directly to those fields, modifying the internal state and avoiding additional allocations per operation.
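To make the chunking idea concrete, here is a rough sketch of in-place 64-bit add and multiply over four 16-bit chunks. It is not the driver's MutableLong. The class name `ChunkedLong` and its methods are hypothetical, and BigInt is used only to convert values in and out so the results are easy to check.

```javascript
// Illustrative only: a mutable 64-bit integer stored as four 16-bit chunks,
// least significant chunk first.
class ChunkedLong {
  constructor() {
    this.c = [0, 0, 0, 0];
    this.scratch = [0, 0, 0, 0]; // reused by multiply() to avoid allocations
  }

  static fromBigInt(value) {
    const unsigned = BigInt.asUintN(64, value);
    const result = new ChunkedLong();
    for (let i = 0; i < 4; i++) {
      result.c[i] = Number((unsigned >> BigInt(16 * i)) & 0xffffn);
    }
    return result;
  }

  toBigInt() {
    let unsigned = 0n;
    for (let i = 3; i >= 0; i--) {
      unsigned = (unsigned << 16n) | BigInt(this.c[i]);
    }
    return BigInt.asIntN(64, unsigned); // reinterpret the bits as signed
  }

  // this = (this + other) mod 2^64, computed in place
  add(other) {
    let carry = 0;
    for (let i = 0; i < 4; i++) {
      const sum = this.c[i] + other.c[i] + carry; // at most 17 bits
      this.c[i] = sum & 0xffff;
      carry = sum >>> 16;
    }
    return this;
  }

  // this = (this * other) mod 2^64, computed in place (schoolbook multiplication)
  multiply(other) {
    const a = this.c;
    const b = other.c;
    const r = this.scratch;
    r[0] = r[1] = r[2] = r[3] = 0;
    for (let i = 0; i < 4; i++) {
      let carry = 0;
      for (let j = 0; i + j < 4; j++) {
        // Each partial result stays below 2^32, so it is exact as a double
        const cur = r[i + j] + a[i] * b[j] + carry;
        r[i + j] = cur & 0xffff;
        carry = cur >>> 16; // carries past the top chunk are dropped (mod 2^64)
      }
    }
    for (let i = 0; i < 4; i++) a[i] = r[i];
    return this;
  }
}
```

Because `add` and `multiply` write back into the same chunks and return `this`, a hash loop can chain many operations on one instance without creating intermediate objects.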
Query preparation enhancements
Previously, the driver prepared the query only on the first node selected by the load-balancing policy, taking a lazy approach.
In this revision, we added 2 new options to fine-tune how the driver handles query preparation:
- prepareOnAllHosts: Determines whether the driver should prepare the query on all hosts.
- rePrepareOnUp: Determines whether the driver should re-prepare all queries that have been prepared on other nodes when a node that has been down (unreachable) is considered back up.
Both properties are set to true by default. You can change them when creating the Client instance:
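As a sketch, the options object below turns both settings off. The contact point is a placeholder, and the object is assumed to be passed to the Client constructor; the driver is not loaded here so the fragment stays self-contained.

```javascript
// Options that would be passed to `new cassandra.Client(...)`.
const lazyPreparationOptions = {
  contactPoints: ['127.0.0.1'], // placeholder address
  prepareOnAllHosts: false,     // do not prepare the query on every host up front
  rePrepareOnUp: false          // do not re-prepare queries when a node comes back up
};
```

Disabling both options is only an example; whether it pays off depends on the number of distinct prepared queries and the size of the cluster.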
Expose connection pool state
The driver now provides a method to obtain a snapshot of the state of the pool per host. The snapshot includes all hosts of the cluster, the open connections per host and the number of queries that are currently being executed (in-flight) through a given host.
You can also use the string representation, which condenses the information into a readable format that is useful for debugging or periodic logging in production.
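The helpers below sketch how the snapshot might be consumed. They assume the snapshot is obtained through `client.getState()` and exposes `getConnectedHosts()`, `getOpenConnections(host)` and `getInFlightQueries(host)`, which is our reading of the driver's API and should be checked against the API reference. `summarizePool` and `logPoolState` are hypothetical names.

```javascript
// Builds a plain summary of the pool from a state snapshot.
// `client` is assumed to be a connected driver Client.
function summarizePool(client) {
  const state = client.getState();
  return state.getConnectedHosts().map((host) => ({
    address: host.address,
    openConnections: state.getOpenConnections(host),
    inFlightQueries: state.getInFlightQueries(host)
  }));
}

// Logs the condensed string representation, for example from a periodic timer.
function logPoolState(client, log = console.log) {
  log(String(client.getState()));
}
```

For periodic logging, `logPoolState` could be scheduled with `setInterval`, keeping in mind that each call takes a fresh snapshot.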
New versions of the drivers are available on npm:
Your feedback is important to us and it influences what features we prioritize. To provide feedback, use the following:
- Mailing List: https://groups.google.com/a/lists.datastax.com/forum/#!forum/nodejs-driver-user
- Report issues on JIRA: https://datastax-oss.atlassian.net/browse/NODEJS/issues
- DataStax Academy Slack: https://academy.datastax.com/slack
- Review and contribute source code: