Experimental Features in Apache Cassandra® 4.0

Experimental Features

After stability problems surfaced in the materialized views feature added in the 3.0 release, the Cassandra community created a process for introducing features that make significant changes to Cassandra’s codebase or dependencies and need time to prove their stability.

These so-called “experimental features” are disabled by default in the cassandra.yaml file, allowing users to try them out selectively. Experimental features are explicitly not recommended for production use.
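
As a rough illustration of this opt-in model, the Python sketch below reports which experimental-feature flags are switched on in a cassandra.yaml file. It assumes the 4.0 flag names enable_transient_replication and enable_materialized_views (check them against the file shipped with your version), treats a missing flag as disabled, and uses a deliberately naive line parser instead of a YAML library, so it only understands simple top-level "key: value" lines.

```python
# Minimal sketch: report which experimental-feature flags are enabled in a
# cassandra.yaml file. This is NOT a general YAML parser; it only reads
# simple, unindented "key: value" lines, which suffices for boolean flags.

# Assumed flag names from the Cassandra 4.0 cassandra.yaml; verify them
# against the file for your version.
EXPERIMENTAL_FLAGS = ("enable_materialized_views", "enable_transient_replication")


def experimental_flags(yaml_text: str) -> dict:
    """Return {flag: bool}; a flag that is absent counts as disabled."""
    settings = {}
    for raw_line in yaml_text.splitlines():
        line = raw_line.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line or line[0].isspace() or ":" not in line:
            continue  # skip blank lines and nested (indented) keys
        key, value = (part.strip() for part in line.split(":", 1))
        settings[key] = value
    return {
        flag: settings.get(flag, "false").lower() == "true"
        for flag in EXPERIMENTAL_FLAGS
    }


if __name__ == "__main__":
    sample = """
# Transient replication is experimental and is not recommended for production use.
enable_transient_replication: true
enable_materialized_views: false
"""
    print(experimental_flags(sample))
    # {'enable_materialized_views': False, 'enable_transient_replication': True}
```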

There are two new experimental features added in Cassandra 4.0: Transient Replication and Support for JDK 11.

Introducing Transient Replication

The objective of transient replication (CASSANDRA-14404) is to reduce the amount of storage needed to achieve the desired consistency level. It was designed for very large clusters in which storage costs are a significant factor. The idea is that transient replicas let you reach a given consistency level without paying the storage and hardware cost of an additional full copy of the data.

In transient replication, some nodes act as full replicas, storing all the data for assigned token ranges, while other nodes act as transient replicas, storing only unrepaired data for the same ranges. This approach is also known as “cheap quorums”.

Using Transient Replication

After enabling transient replication on all nodes in the cluster via the enable_transient_replication setting in cassandra.yaml, you set the number of transient replicas per keyspace in the keyspace’s replication settings. For example, a replication_factor of 5/2 requests five replicas in total, two of which are transient. This notation works with both SimpleStrategy and NetworkTopologyStrategy; with NetworkTopologyStrategy, you can specify the number of transient replicas for each datacenter separately.
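
To make the notation concrete, here is a minimal Python sketch that splits a replication factor such as 5/2 into full and transient counts and builds a CREATE KEYSPACE statement for NetworkTopologyStrategy. The validation rules (non-negative counts, at least one full replica), the helper names, and the keyspace and datacenter names are assumptions for illustration, not Cassandra’s own code. With SimpleStrategy, the same string would go in the replication_factor option instead of per-datacenter entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationFactor:
    total: int      # all replicas, full + transient
    transient: int  # replicas that keep only unrepaired data

    @property
    def full(self) -> int:
        return self.total - self.transient


def parse_rf(spec: str) -> ReplicationFactor:
    """Parse '5/2' (5 total, 2 transient) or a plain '3' (no transient replicas)."""
    total_str, _, transient_str = spec.partition("/")
    total = int(total_str)
    transient = int(transient_str) if transient_str else 0
    # Assumed validation rules for this sketch only, not Cassandra's checks.
    if total < 0 or transient < 0:
        raise ValueError(f"replica counts must be non-negative: {spec!r}")
    if transient > 0 and total - transient < 1:
        raise ValueError(f"at least one full replica is required: {spec!r}")
    return ReplicationFactor(total, transient)


def create_keyspace_cql(keyspace: str, rf_per_dc: dict) -> str:
    """Build a NetworkTopologyStrategy CREATE KEYSPACE statement (hypothetical helper)."""
    for spec in rf_per_dc.values():
        parse_rf(spec)  # fail early on malformed specs
    dc_options = ", ".join(f"'{dc}': '{spec}'" for dc, spec in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_options}}};"
    )


if __name__ == "__main__":
    rf = parse_rf("5/2")
    print(rf.total, rf.full, rf.transient)  # 5 3 2
    # Keyspace and datacenter names are hypothetical.
    print(create_keyspace_cql("sensor_data", {"dc1": "5/2", "dc2": "3/1"}))
```

The generated statement is only accepted once transient replication has been enabled in cassandra.yaml on every node.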

That’s it! Cassandra handles the rest, so that everything is transparent to your application. For more detail, reference the Cassandra documentation.

How Transient Replication Works

Transient replication affects Cassandra’s write path, read path, and incremental repair.

When data is written, if not enough full replicas are available to receive the write, transient replicas are used instead.

Similarly, reads can use transient replicas to achieve the desired consistency level if not enough full replicas are available.
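
The fallback behavior on both paths can be pictured with a toy model. The Python sketch below is not Cassandra’s coordinator logic; it only chooses which live replicas to contact for a request that needs a given number of responses, preferring full replicas and using transient ones to make up any shortfall. The Replica type and select_replicas function are hypothetical, and requiring at least one live full replica is an assumption based on transient replicas holding only unrepaired data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    transient: bool
    alive: bool = True


def quorum(total_replicas: int) -> int:
    """QUORUM is a majority of all replicas, transient ones included."""
    return total_replicas // 2 + 1


def select_replicas(replicas, required):
    """Pick `required` live replicas, preferring full replicas over transient ones."""
    live_full = [r for r in replicas if r.alive and not r.transient]
    live_transient = [r for r in replicas if r.alive and r.transient]
    chosen = (live_full + live_transient)[:required]
    if len(chosen) < required:
        raise RuntimeError(f"only {len(chosen)} live replicas, {required} required")
    if not live_full:
        # Assumption: transient replicas hold only unrepaired data,
        # so at least one full replica must take part.
        raise RuntimeError("no live full replica available")
    return chosen


if __name__ == "__main__":
    # A 5/2 layout: three full replicas (n3 is down) and two transient ones.
    layout = [
        Replica("n1", transient=False),
        Replica("n2", transient=False),
        Replica("n3", transient=False, alive=False),
        Replica("n4", transient=True),
        Replica("n5", transient=True),
    ]
    print([r.name for r in select_replicas(layout, quorum(5))])  # ['n1', 'n2', 'n4']
```

With a 5/2 layout, QUORUM needs 3 of the 5 replicas; with one full replica down, the sketch substitutes the first transient replica.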

After incremental repair runs, transient replicas discard the data that has been repaired. Because these extra copies are kept only for a short time, the cluster uses less disk, CPU, and I/O.
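
Here is a toy model of that lifecycle, again a sketch rather than Cassandra’s implementation. Each store maps a key to a (timestamp, value) pair; the hypothetical incremental_repair function reconciles writes with last-write-wins, gives every full replica the complete result, and then empties the transient replicas.

```python
# Each replica store maps key -> (write_timestamp, value).


def merge_cell(store, key, cell):
    """Last-write-wins merge keyed on the write timestamp."""
    current = store.get(key)
    if current is None or cell[0] > current[0]:
        store[key] = cell


def incremental_repair(full_stores, transient_stores):
    """Hypothetical helper: sync data to full replicas, then clear transient ones."""
    # Gather and reconcile every cell held by any replica.
    merged = {}
    for store in full_stores + transient_stores:
        for key, cell in store.items():
            merge_cell(merged, key, cell)
    # Full replicas end up with the complete, reconciled data set...
    for store in full_stores:
        for key, cell in merged.items():
            merge_cell(store, key, cell)
    # ...while transient replicas drop what is now repaired.
    for store in transient_stores:
        store.clear()


if __name__ == "__main__":
    full_a = {"k1": (1, "old")}
    full_b = {}                                   # missed both writes
    trans = {"k1": (2, "new"), "k2": (1, "x")}    # covered for the full replicas
    incremental_repair([full_a, full_b], [trans])
    print(full_b)  # {'k1': (2, 'new'), 'k2': (1, 'x')}
    print(trans)   # {}
```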

Limitations Of Transient Replication

As described on the Cassandra blog, certain CQL features are not supported on keyspaces that use transient replication: lightweight transactions, counters, logged batches, secondary indexes, and materialized views.

Java 11 Support

Cassandra 4.0 is the first release to offer any level of support for JDK 11, although that support is considered experimental for now. Because Java has moved to a six-month release cycle, this is an encouraging step toward Cassandra supporting newer Java versions more frequently.

You can compile Cassandra against either JDK 8 or JDK 11, but remember that binaries compiled against JDK 11 can only run on Java 11. For more information, consult the Cassandra documentation.
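
One way to see this restriction is through class file versions: a JVM refuses to load class files whose major version is newer than the release it implements, throwing UnsupportedClassVersionError. The Python sketch below reads that version from the first eight bytes of a .class file, following the header layout in the JVM specification (the magic number 0xCAFEBABE, then the minor and major version). The helper names are hypothetical, and the mapping assumes Java 5 or later, where the release number is the major version minus 44 (52 for Java 8, 55 for Java 11).

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def class_file_java_version(header: bytes) -> int:
    """Return the Java release targeted by a class file (Java 5 and later only)."""
    if len(header) < 8:
        raise ValueError("need at least the first 8 bytes of the class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    if major < 49:
        raise ValueError(f"pre-Java 5 class file (major version {major})")
    return major - 44  # 52 -> Java 8, 55 -> Java 11


def runs_on(header: bytes, runtime_java: int) -> bool:
    """A JVM rejects class files newer than the release it implements."""
    return class_file_java_version(header) <= runtime_java


if __name__ == "__main__":
    java11_header = struct.pack(">IHH", CLASS_MAGIC, 0, 55)
    print(class_file_java_version(java11_header))  # 11
    print(runs_on(java11_header, 8))                # False
    # For a real file (hypothetical path):
    # with open("SomeClass.class", "rb") as f:
    #     print(class_file_java_version(f.read(8)))
```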

Experimental Garbage Collectors

From Cassandra’s perspective, the most significant addition in Java 11 is a new generation of garbage collectors, such as ZGC and Shenandoah.

Preliminary performance testing by Cassandra project engineers shows promising results for both collectors, which deliver high throughput on Cassandra benchmarks with minimal tuning. Worst-case (“tail”) latencies also improve greatly, because these collectors avoid the occasional long GC pauses typical of other garbage collectors.
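
Trying these collectors comes down to JVM options. The Python sketch below is a hypothetical helper that takes a list of JVM options, strips the current collector selection (flags of the form -XX:+Use…GC) and any existing unlock flag, and appends the flags for ZGC or Shenandoah. On JDK 11 both collectors are experimental and require -XX:+UnlockExperimentalVMOptions; ZGC runs only on Linux/x64 in that release, and Shenandoah is available only in JDK 11 builds that include it. Collector-specific tuning options (for example, CMS or G1 settings) are left in place and may need to be removed by hand.

```python
import re

# Collector-selection flags such as -XX:+UseG1GC or -XX:+UseConcMarkSweepGC.
_GC_SELECTOR = re.compile(r"^-XX:\+Use\w+GC$")
_UNLOCK = "-XX:+UnlockExperimentalVMOptions"

# On JDK 11 both collectors are experimental, hence the unlock flag.
GC_FLAGS = {
    "zgc": [_UNLOCK, "-XX:+UseZGC"],
    "shenandoah": [_UNLOCK, "-XX:+UseShenandoahGC"],
}


def switch_collector(jvm_options, collector):
    """Hypothetical helper: return a new option list selecting `collector`."""
    kept = [
        opt for opt in jvm_options
        if not _GC_SELECTOR.match(opt) and opt != _UNLOCK
    ]
    return kept + GC_FLAGS[collector]


if __name__ == "__main__":
    current = ["-Xms16G", "-Xmx16G", "-XX:+UseG1GC"]
    print(switch_collector(current, "zgc"))
    # ['-Xms16G', '-Xmx16G', '-XX:+UnlockExperimentalVMOptions', '-XX:+UseZGC']
```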

More Resources

Introducing Transient Replication | Blog

Java 11 Support in Apache Cassandra 4.0

Apache Cassandra Performance Benchmarking: 4.0 Brings the Heat with New Garbage Collectors ZGC and Shenandoah | Blog