After dealing with stability challenges in the materialized views feature added in the 3.0 release, the Cassandra community created a process for introducing features that represent significant changes to Cassandra’s codebase or dependencies and need time to prove their stability.
These so-called “experimental features” are disabled by default in the cassandra.yaml file, allowing users to try them out selectively. Experimental features are explicitly not recommended for production use.
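As a minimal sketch of what "disabled by default" looks like in practice, the Python snippet below scans the text of a cassandra.yaml file and reports which experimental flags are switched on. The flag names are the ones assumed to be used in a Cassandra 4.0 cassandra.yaml (later releases may rename them), and the helper `read_experimental_flags` is hypothetical, not part of any Cassandra tooling.

```python
# Flag names assumed from a Cassandra 4.0 cassandra.yaml; verify them for your version.
EXPERIMENTAL_FLAGS = (
    "enable_materialized_views",
    "enable_sasi_indexes",
    "enable_transient_replication",
)


def read_experimental_flags(yaml_text):
    """Return {flag: bool} for each experimental flag; missing flags count as disabled."""
    flags = {name: False for name in EXPERIMENTAL_FLAGS}
    for raw_line in yaml_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in flags:
            flags[key] = value.lower() == "true"
    return flags


# Hypothetical excerpt of a cassandra.yaml file, used only for illustration.
sample = """
# Enables materialized view creation on this node.
enable_materialized_views: false
enable_transient_replication: true
"""

print(read_experimental_flags(sample))
```

A simple line scan is enough here because the flags are top-level scalar settings; a real tool would more likely load the file with a YAML parser.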
Cassandra 4.0 adds two experimental features: Transient Replication and Support for JDK 11.
Introducing Transient Replication
The objective of transient replication (CASSANDRA-14404) is to reduce the amount of storage needed to achieve the desired consistency level. It was designed for use in very large clusters in which storage costs are a significant factor. The idea is that you can add transient replicas to increase consistency without adding hardware.
In transient replication, some nodes act as full replicas, storing all the data for assigned token ranges, while other nodes act as transient replicas, storing only unrepaired data for the same ranges. This approach is also known as “cheap quorums”.
Using Transient Replication
After enabling transient replication on all nodes in the cluster via the cassandra.yaml file, you set the number of transient replicas desired per keyspace using the replication strategy. For example, a replication factor of 5/2 means you are requesting five total replicas of the data, with two of them being transient. This notation can be used when specifying replication_factor for both the SimpleStrategy and the NetworkTopologyStrategy; with the latter, you can specify a desired number of transient replicas per datacenter.
That’s it! Cassandra handles the rest, so that everything is transparent to your application. For more detail, refer to the Cassandra documentation.
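To make the total/transient notation concrete, here is a small Python sketch that splits a value such as 5/2 into total, transient, and full replica counts, and then builds a CREATE KEYSPACE statement that uses it per datacenter. The helper names, the keyspace name `reservations`, and the datacenter names `DC1` and `DC2` are hypothetical; the validation rule (at least one full replica) is an assumption of this sketch.

```python
def parse_replication_factor(notation):
    """Split 'total/transient' (or plain 'total') into (total, transient, full)."""
    total_text, _, transient_text = str(notation).partition("/")
    total = int(total_text)
    transient = int(transient_text) if transient_text else 0
    # Assumption in this sketch: at least one replica must be a full replica.
    if total < 1 or transient < 0 or transient >= total:
        raise ValueError(f"invalid replication factor: {notation!r}")
    return total, transient, total - transient


def create_keyspace_cql(keyspace, replication_by_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = {"class": "NetworkTopologyStrategy"}
    for dc, notation in replication_by_dc.items():
        parse_replication_factor(notation)  # validate before emitting CQL
        options[dc] = str(notation)
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{body}}};"


print(parse_replication_factor("5/2"))
print(create_keyspace_cql("reservations", {"DC1": "5/2", "DC2": "3/1"}))
```

With 5/2, three of the five replicas are full replicas, so once data has been repaired it is stored three times rather than five.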
How Transient Replication Works
Transient replication affects Cassandra’s write path, read path, and incremental repair.
When data is written, if a sufficient number of full replicas are not available to receive a write, transient replicas are used instead.
Similarly, reads can use transient replicas to achieve the desired consistency level if not enough full replicas are available.
After incremental repair runs, data stored on transient replicas is discarded, resulting in reduced usage of disk, CPU, and I/O because the extra copies of data are only stored for a short amount of time.
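The behavior described above can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not Cassandra's actual replica selection or repair logic: nodes are plain strings, each node's storage is a set of keys, and the function names are hypothetical.

```python
def pick_write_targets(full, transient, alive, needed):
    """Prefer live full replicas; add live transient replicas only to cover a shortfall."""
    targets = [node for node in full if node in alive]
    shortfall = needed - len(targets)
    if shortfall > 0:
        targets += [node for node in transient if node in alive][:shortfall]
    if len(targets) < needed:
        raise RuntimeError("not enough live replicas for the requested consistency level")
    return targets


def incremental_repair(storage, full, transient):
    """Copy all data to every full replica, then drop it from transient replicas."""
    repaired = set().union(*storage.values())
    for node in full:
        storage[node] |= repaired
    for node in transient:
        storage[node].clear()


# Replication factor 3/1: two full replicas (A, B) and one transient replica (C).
full, transient = ["A", "B"], ["C"]
storage = {node: set() for node in full + transient}

# Node B is down, so a write that needs two acknowledgements falls back to C.
targets = pick_write_targets(full, transient, alive={"A", "C"}, needed=2)
for node in targets:
    storage[node].add("k1")
print(targets)

# Once B is back and repair completes, C no longer holds the data.
incremental_repair(storage, full, transient)
print(storage)
```

In this model the transient replica only holds a key for the window between the write and the next repair, which is the source of the storage savings.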
Java 11 Support
Cassandra 4.0 is the first release that claims any level of support for JDK 11, even if it is only considered experimental at this point. Since the Java community has committed to a 6-month release cycle, this is an encouraging move toward upgrading the supported Java versions more frequently in Cassandra.
You can compile Cassandra against either JDK 8 or JDK 11, but remember that binaries compiled against JDK 11 can only run on Java 11. For more information, consult the Cassandra documentation.
Experimental Garbage Collectors
From a Cassandra perspective, the most significant new feature in Java 11 is the introduction of new garbage collection algorithms such as ZGC and Shenandoah.
Preliminary performance testing by Cassandra project engineers indicates promising results for both of these garbage collection options, which support high throughput for Cassandra benchmarks with minimal tuning. The worst-case or “tail latencies” are also greatly improved due to the avoidance of the occasional long GC pauses typical of other garbage collectors.
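As a rough sketch of what "minimal tuning" can mean, the Python helper below assembles the JVM flags typically used to select one of these collectors on JDK 11. The `-XX` flags are standard HotSpot options; the helper `jvm_options_for`, the heap size, and the idea of pasting the result into Cassandra's JDK 11 options file (conf/jvm11-server.options in 4.0) are assumptions for illustration. Shenandoah is only present in some JDK 11 builds, so check your distribution before relying on it.

```python
# On JDK 11 these collectors are experimental, so they are unlocked explicitly.
# The unlock flag is harmless if a collector is already a product option in your build.
GC_FLAGS = {
    "zgc": ["-XX:+UnlockExperimentalVMOptions", "-XX:+UseZGC"],
    "shenandoah": ["-XX:+UnlockExperimentalVMOptions", "-XX:+UseShenandoahGC"],
}


def jvm_options_for(collector, heap_gb):
    """Return JVM flags for the chosen collector with a fixed-size heap."""
    flags = list(GC_FLAGS[collector.lower()])
    # Setting min and max heap to the same value avoids heap resizing at runtime.
    flags += [f"-Xms{heap_gb}G", f"-Xmx{heap_gb}G"]
    return flags


for option in jvm_options_for("zgc", heap_gb=16):
    print(option)
```

Only one collector can be active at a time, so any flags selecting a different collector in the same options file would need to be commented out first.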