DataStax Developer Blog

DataStax Java Driver: 2.0.10 Released!

By Alexandre Dutra - May 4, 2015

The Java Driver team is pleased to announce that version 2.0.10 has been released! This new release comes with a lot of new features, improvements and bugfixes.

Below are some highlights that we think will be of interest for most of our users. Please refer to the README file for a global overview of the current set of features, or to the complete changelog for the full list of changes.

  1. Speculative Executions
  2. Query Logger
  3. Per-Host Latency Histograms
  4. Netty 4
    1. TCP_NODELAY
    2. Shading
    3. Advanced customizations
  5. Manual Query Paging
  6. Improvements to BoundStatements
  7. Exposing Token Ranges
  8. Improvements to Connection Handling
    1. New pool resizing algorithm
    2. Asynchronous initialization
    3. Connection heartbeats
    4. Revert of JAVA-425
  9. Schema Agreement API
  10. Better Naming of Threads
  11. New Metrics Gauges

Speculative Executions

Since version 2.0.2, Cassandra has offered a mechanism to protect against bad read latencies: rapid read protection.

JAVA-561 now introduces a similar protection mechanism that we named Speculative Executions (not to be confused with retries): the driver can now pre-emptively start a second execution of the same query against another node, before the first node has replied or errored out. The driver then passes whichever response comes back first on to the client and cancels the other executions.

The driver currently ships with two speculative execution policies:

  • NoSpeculativeExecutionPolicy, the default, which disables speculative executions entirely;
  • ConstantSpeculativeExecutionPolicy, which spawns speculative executions at a constant rate.

As usual, you can also provide your own policy by simply implementing SpeculativeExecutionPolicy.

Since speculative executions are disabled by default, to switch them on and use e.g. ConstantSpeculativeExecutionPolicy, all you need to do is register your policy with your Cluster instance:

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withSpeculativeExecutionPolicy(
        new ConstantSpeculativeExecutionPolicy(
            500, // delay in milliseconds before each new execution is launched
            2    // maximum number of speculative executions
        ))
    .build();

Given the above configuration, speculative executions would be spawned at a constant rate according to the following scenario:

  1. start the initial execution at t0;
  2. if no response has been received at t0 + 500 milliseconds, start a speculative execution on another node;
  3. if no response has been received at t0 + 1000 milliseconds, start another speculative execution on a third node.
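To make the timeline concrete, here is a small self-contained sketch (plain Java, no driver dependency; the class and method names are ours, for illustration only) that computes the launch times implied by the configuration above:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: computes the launch times (in ms after t0) implied by
// ConstantSpeculativeExecutionPolicy(delayMs, maxSpeculativeExecutions),
// assuming no response arrives in the meantime.
public class SpeculativeSchedule {

    static List<Long> launchTimes(long delayMs, int maxSpeculativeExecutions) {
        List<Long> times = new ArrayList<Long>();
        times.add(0L); // the initial execution starts at t0
        for (int i = 1; i <= maxSpeculativeExecutions; i++) {
            times.add(i * delayMs); // one more speculative execution per elapsed delay
        }
        return times;
    }

    public static void main(String[] args) {
        // With the configuration above (500 ms delay, 2 speculative executions):
        System.out.println(launchTimes(500, 2)); // prints [0, 500, 1000]
    }
}
```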

One important aspect to consider when using speculative executions is whether queries are idempotent or not, i.e. whether they can be applied multiple times on a given initial state while always producing the same resulting state. If a query is not idempotent, then speculative executions should not be attempted for it, because there is no way to guarantee that the mutation will be applied only once.

There is a lot more to know about speculative executions; check the online documentation, or consult the API docs for Statement and SpeculativeExecutionPolicy.

Query Logger

Java Driver users have long asked for a convenient way to log queries executed by the driver, as well as for a tool to track slow queries yielding bad response times. This is now possible thanks to JAVA-646, which introduces a new API class named QueryLogger.

Let’s suppose that we want to track queries that take more than 300 milliseconds to complete; this can be achieved in two steps:

1) Create one (singleton) QueryLogger instance at application startup and register it with the Cluster instance:

Cluster cluster = ...
QueryLogger queryLogger = QueryLogger.builder(cluster).withConstantThreshold(300).build();
cluster.register(queryLogger);

2) Set the com.datastax.driver.core.QueryLogger.SLOW logger level to DEBUG, e.g. with Logback:

<logger name="com.datastax.driver.core.QueryLogger.SLOW" level="DEBUG" />

The driver will then print a log message for every query that takes more than 300 milliseconds to complete, including useful information such as the queried host and the query string.

QueryLogger’s behavior can be fully customized to your needs. For more information, read the online documentation, or the API docs for QueryLogger.

Per-Host Latency Histograms

We are including in this version a beta preview of a set of new components that focus on recording latency histograms.

The core component is PerHostPercentileTracker. It is a LatencyTracker that records latencies for each host over a sliding time interval, and exposes an API to retrieve the current latency at a given percentile. This class uses HdrHistogram to record histograms behind the scenes. See JAVA-723 for more details, or the API docs for PerHostPercentileTracker.
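As a rough sketch of how the tracker might be wired up (the builder and getter names below are based on our reading of the API docs and should be double-checked there):

```java
// Build a tracker able to record latencies up to 15 seconds (an arbitrary
// ceiling chosen for this example), and register it with the cluster:
PerHostPercentileTracker tracker =
    PerHostPercentileTracker.builderWithHighestTrackableLatencyMillis(15000)
        .build();
cluster.register(tracker);

// Later, read the current 99th-percentile latency for a given Host host:
long p99 = tracker.getLatencyAtPercentile(host, 99.0);
```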

We are also including another more elaborate, percentile-based speculative execution policy called PercentileSpeculativeExecutionPolicy. We’re very excited about this policy, and we are expecting very good results for speculative executions triggered at higher latency percentiles (95th and above), so we decided to let users experiment with it. See the online documentation or the API docs for PercentileSpeculativeExecutionPolicy for more details and usage examples. A separate blog post will be published soon and will focus on performance benchmarks for different kinds of speculative executions, including this one.

The QueryLogger described above can also be configured to use dynamic, percentile-based thresholds instead of a constant threshold, although used this way, it should be considered as in beta state too. Find out more about dynamic thresholds for the QueryLogger in the online documentation or in the API docs.

Again, we should stress that the above features are currently marked “beta” and are included in this version for evaluation purposes only; they haven’t been thoroughly tested yet, and their API is still subject to change.

Netty 4

Although most users won’t notice it, another significant improvement under the hood in version 2.0.10 is the upgrade from Netty 3 to Netty 4 (JAVA-622).

TCP_NODELAY

One important thing to note is that Netty 4.0 sets the TCP_NODELAY flag to true by default. We are now also defaulting SocketOptions.getTcpNoDelay() to true. Set this option explicitly to false if you want to enable Nagle’s algorithm.
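For example, to re-enable Nagle’s algorithm, you can pass a customized SocketOptions when building the Cluster (a minimal sketch):

```java
// Re-enable Nagle's algorithm by turning TCP_NODELAY off:
SocketOptions socketOptions = new SocketOptions().setTcpNoDelay(false);

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withSocketOptions(socketOptions)
    .build();
```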

Shading

Another important change concerns Netty shading. Since versions 2.0.9 and 2.1.4, the Netty library has been shaded by default. Based on feedback received since then, we are now providing the driver artifacts in two flavors: with and without shaded Netty classes. Please refer to the online documentation to find out how to use the shaded driver jar.

Advanced Customizations

But there’s even more: thanks to JAVA-640 and JAVA-676, it is now possible for client applications to customize the driver’s underlying Netty layer. Clients that need such flexibility can subclass the newly-created NettyOptions class and provide the necessary customization by overriding its methods. Unlike other driver options, those available in this class should be considered advanced features and should only be modified by expert users. Moreover, given that the NettyOptions API exposes Netty classes, it should only be extended and used by clients running the non-shaded version of the driver. Check the API docs for NettyOptions for more information about this feature and how to use it.

Manual Query Paging

Automatic query paging has been around since Cassandra 2.0 and version 2 of the native protocol.

However, the paging state was kept internally by the driver, and clients did not have direct access to it. This was a serious limitation for applications trying to achieve “manual” paging, e.g. when displaying query results in a stateless web application.

With JAVA-550, this has finally been made possible. Check the online documentation to find out how.
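A minimal sketch of the resulting workflow, assuming a keyspace and table of our own invention (see the online documentation for the authoritative example):

```java
// Serve one page of results per HTTP request, without server-side state.
Statement statement = new SimpleStatement("SELECT * FROM my_keyspace.users");
statement.setFetchSize(20); // one page = 20 rows

ResultSet rs = session.execute(statement);
// render the first 20 rows, then capture where we stopped:
PagingState pagingState = rs.getExecutionInfo().getPagingState();
String pageToken = pagingState.toString(); // safe to embed in a URL or cookie

// ... on a subsequent request, resume from where we left off:
statement.setPagingState(PagingState.fromString(pageToken));
ResultSet nextPage = session.execute(statement);
```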

Improvements to BoundStatements

The BoundStatement class has been enriched with two new long-awaited improvements: a set of get*() methods to retrieve typed CQL values either by index (starting at 0) or by name, as well as a more generic getObject() method (see JAVA-547 and JAVA-584). These have been grouped in a new interface: GettableData, that is implemented by both BoundStatement and Row. This means that it is now possible to retrieve bound values from a BoundStatement instance.

We are also introducing a new method DataType.format(Object) that formats a Java object as a String, again for pretty-printing CQL values.

Let’s combine all of this into a simple example: suppose that we want to log our bound values and pretty-print them to the console. This can now be achieved with the following code:

BoundStatement bs = ...
ColumnDefinitions variables = bs.preparedStatement().getVariables();
int index = 0;
for (ColumnDefinitions.Definition variable : variables) {
    DataType type = variable.getType();
    String name = variable.getName();
    Object value = bs.getObject(index++);
    logger.debug("Parameter {}={}", name, type.format(value));
}

Note however that, because bound values are stored internally in serialized form, retrieving them as in the example above may have a non-negligible performance impact, since they need to be deserialized. These methods are thus provided mainly for debugging purposes and should not be used in normal application code.

Exposing Token Ranges

Making the driver able to report information about token distribution across the ring is also a long-awaited feature for people building Hadoop and Spark applications that interact with Cassandra tables. Until now, such applications relied on the Thrift protocol, because the Java driver could not provide enough information for these clients to correctly compute InputSplits for Cassandra tables and evenly dispatch jobs across the Hadoop/Spark cluster. One example of such an application is the DataStax Spark Cassandra Connector.

Thanks to JAVA-312, which has been backported from version 2.1.5, the Java driver is now able to report enough information to such clients, contributing to the progressive abandonment of the deprecated Thrift protocol in the Hadoop/Spark world.

JAVA-312 introduces a new class: TokenRange. Its most important method is splitEvenly(int numberOfSplits), which splits the current token range into a number of smaller ranges of equal size. “Size” here refers to the number of tokens in each range; if you want to split according to the actual amount of data, sizing information is now exposed in a system table (see CASSANDRA-7688, fixed in Cassandra 2.1.5).
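The arithmetic behind such a split can be illustrated with a small self-contained sketch (plain Java, no driver dependency; unlike the real TokenRange.splitEvenly(), it ignores ring wrap-around):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits a token range [start, end) into n contiguous
// sub-ranges whose token counts differ by at most one, which is conceptually
// what TokenRange.splitEvenly(n) does.
public class TokenSplit {

    static List<long[]> splitEvenly(long start, long end, int numberOfSplits) {
        long total = end - start;
        List<long[]> ranges = new ArrayList<long[]>();
        long rangeStart = start;
        for (int i = 1; i <= numberOfSplits; i++) {
            // integer division distributes the remainder across the splits
            long rangeEnd = start + total * i / numberOfSplits;
            ranges.add(new long[] { rangeStart, rangeEnd });
            rangeStart = rangeEnd;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : splitEvenly(0, 100, 3)) {
            System.out.println(r[0] + ".." + r[1]); // 0..33, 33..66, 66..100
        }
    }
}
```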

Check the online documentation on Metadata for further information and guidelines about how to use the new TokenRange class to compute splits for Cassandra tables.

Improvements to Connection Handling

New pool resizing algorithm

Let’s start with a nice improvement: JAVA-419 brings a brand new algorithm for connection pool resizing that finally fixes a well-known bug affecting variable-sized pools (core connections != max connections).

Asynchronous initialization

Connection pools also benefit from asynchronous initialization: previously, when the driver created a connection, it would block until the connection was established and initialization queries had completed. Moreover, a connection pool created its connections sequentially; as a result, especially with large clusters and/or a large number of core connections per host, the overall process of creating all connection pools at session startup could be very long. JAVA-692 mitigates this by introducing asynchronous, parallel connection pool initialization. This improvement should be noticeable for most users: our tests showed that the new asynchronous initialization outperforms previous versions of the driver in every case, but especially with large clusters and clusters requiring authentication. For example, a 40-node cluster with authentication enabled initialized 8 times faster than before.

Connection heartbeats

Also, the SUSPECT state has disappeared. Handling of misbehaving hosts is now done via connection heartbeats, introduced by JAVA-533 and backported from version 2.1.5. You will find out more about connection heartbeats in the online documentation on connection pooling.
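If heartbeats need tuning, our understanding is that this is done through PoolingOptions; treat the setter name and the default interval below as assumptions and verify them in the API docs:

```java
// Send a heartbeat on idle connections every 60 seconds (the default is
// assumed to be 30); setting the interval to 0 disables heartbeats.
Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withPoolingOptions(new PoolingOptions().setHeartbeatIntervalSeconds(60))
    .build();
```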

Revert of JAVA-425

And last, but not least: the Java driver community spoke, and we heard you! JAVA-425 has just been completely reverted! For those of you who remember, JAVA-425 introduced a major behavior shift regarding driver read timeouts: in the event of such timeouts (as determined by SocketOptions.getReadTimeoutMillis()), the driver would defunct the connection and mark the node DOWN.

As good as our intentions were, it appears that this was too aggressive a behavior for most of our users. Among the reasons our users gave for not defuncting the connection:

  1. The driver cannot reason about the state of the server based on a single request timeout;
  2. Gossip protocol and connection heartbeats are better indicators of a host’s health than a single read timeout;
  3. It can create sudden hotspots by reducing the number of live nodes the driver can talk to, risking a domino effect and a complete cluster outage;
  4. It could conceal server-side issues, notably insufficient cluster capacity.

Schema Agreement API

JAVA-669 introduces a new API to check schema agreement between peers.

After a DDL query, one can call resultSet.getExecutionInfo().isSchemaInAgreement() to check whether peers have reached agreement. Also, at any time, users can now perform a one-time check with cluster.getMetadata().checkSchemaAgreement().
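Put together, a minimal sketch (the keyspace and table are invented for the example):

```java
// After executing a DDL statement, check whether the schema change
// has propagated to all reachable peers:
ResultSet rs = session.execute(
    "CREATE TABLE my_keyspace.users (id int PRIMARY KEY)");
if (!rs.getExecutionInfo().isSchemaInAgreement()) {
    // agreement was not reached within the configured timeout;
    // reads of the new table on other nodes may fail transiently
}

// Or, at any time, perform a one-time check:
boolean agreed = cluster.getMetadata().checkSchemaAgreement();
```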

Check the online documentation for further information.

Better Naming of Threads

JAVA-583 has introduced changes in the way the driver names its threads to clearly mark them as belonging to the Java driver; all names are now prefixed with the cluster name.

New Metrics Gauges

And finally, JAVA-626 adds 4 new Gauges to the Metrics class:

  1. getExecutorQueueDepth(): The number of queued up tasks in the non-blocking executor (threads named <cluster>-worker).
  2. getBlockingExecutorQueueDepth(): The number of queued up tasks in the blocking executor (threads named <cluster>-blocking-task-worker).
  3. getReconnectionSchedulerQueueSize(): The size of the work queue for the reconnection scheduler (threads named <cluster>-reconnection). A queue size > 0 does not necessarily indicate a backlog as some tasks may not have been scheduled to execute yet.
  4. getTaskSchedulerQueueSize(): The size of the work queue for the task scheduler (threads named <cluster>-scheduled-task-workers). A queue size > 0 does not necessarily indicate a backlog as some tasks may not have been scheduled to execute yet.

These can be used for monitoring whether or not a Cluster’s executors are becoming backlogged, which could help understand abnormal behavior of the driver.
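For instance, a monitoring task could poll the gauges like this (a sketch; getValue() comes from the Gauge interface of the underlying Metrics library, and logger is the same kind of logger used in the earlier examples):

```java
// Periodically poll the new gauges to detect executor backlog:
Metrics metrics = cluster.getMetrics();
int workerBacklog   = metrics.getExecutorQueueDepth().getValue();
int blockingBacklog = metrics.getBlockingExecutorQueueDepth().getValue();
if (workerBacklog > 0 || blockingBacklog > 0) {
    logger.warn("Driver executors are backlogged: worker={}, blocking={}",
                workerBacklog, blockingBacklog);
}
```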

How To Contribute

Have comments, feedback, questions? We would be glad to hear from you! Please use any of the following:


