Thread Per Core with Jake Luciani
Jake Luciani takes us behind the scenes to explain how the principle of mechanical sympathy was applied to DataStax Enterprise 6 in the new Thread Per Core feature. DSE 6 demonstrates roughly 2x improvements in read/write throughput compared to DSE 5.1 / open source Apache Cassandra.
0:15 - Jeff welcomes Apache Cassandra committer and DataStax Engineer Jake Luciani to the show
1:35 - Defining mechanical sympathy
3:06 - Why the lost art of mechanical sympathy is coming back - the recent trend toward multi-core servers.
4:34 - Thread per core is the DataStax Enterprise manifestation of mechanical sympathy concepts. We’ve rebuilt the internal engine with an asynchronous, non-blocking approach that improves over the “Staged Event Driven Architecture” (SEDA) approach used by Cassandra.
8:31 - TPC leverages reactive streams to connect one event loop to the processing chain for each core
10:07 - The solution includes libraries such as Netty and RxJava, plus custom code
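The core idea from the two segments above can be sketched with a small model: one single-threaded event loop per core, with every stage of a request's processing chain staying on the loop that owns it. This is an illustrative sketch only, built on the JDK's standard executors; the class and thread names (`EventLoopPerCore`, `core-loop-N`) are hypothetical and not DSE internals, which build on Netty event loops and RxJava.

```java
import java.util.concurrent.*;
import java.util.function.Function;

// Minimal model of thread-per-core: one single-threaded event loop per
// core, so a request's whole processing chain runs on one thread with
// no locks or cross-core handoffs. Illustrative names, not DSE classes.
public class EventLoopPerCore {
    final ExecutorService[] loops;

    EventLoopPerCore(int cores) {
        loops = new ExecutorService[cores];
        for (int i = 0; i < cores; i++) {
            final int core = i;
            loops[i] = Executors.newSingleThreadExecutor(
                r -> new Thread(r, "core-loop-" + core));
        }
    }

    // Submit a processing stage to the loop that owns the given core.
    <T, R> CompletableFuture<R> process(int core, T input, Function<T, R> stage) {
        return CompletableFuture.supplyAsync(() -> stage.apply(input), loops[core]);
    }

    void shutdown() {
        for (ExecutorService l : loops) l.shutdown();
    }

    public static void main(String[] args) throws Exception {
        EventLoopPerCore tpc = new EventLoopPerCore(4);
        String result = tpc.process(2, "key",
            k -> k + " handled on " + Thread.currentThread().getName()).get();
        System.out.println(result); // prints "key handled on core-loop-2"
        tpc.shutdown();
    }
}
```

In the real system, reactive-streams operators (RxJava) are what chain the stages together on the loop; the `CompletableFuture` here just stands in for that chaining.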
10:51 - The benefit of TPC is improved performance on machines with more cores. With SEDA, the more cores you added, the more contention you got.
13:31 - From a data partitioning perspective, each node is treated as a mini cluster, with a token range assigned to each core.
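The "mini cluster" idea above can be sketched as a mapping from a partition's token to a core index: just as nodes in a cluster each own a token range, each core owns a contiguous slice of the node's range. This is a hedged illustration; the function name `coreForToken` and the even-slice scheme are assumptions, not DSE's actual routing code.

```java
// Illustrative sketch: routing a partition to a core by token range.
// Cassandra's Murmur3 tokens span Long.MIN_VALUE..Long.MAX_VALUE; here
// that range is divided evenly so each core owns a contiguous slice.
public class TokenRouting {
    static final int NUM_CORES = 8; // assumed core count for the demo

    static int coreForToken(long token) {
        // Normalize the token into [0, 1), then scale to a core index.
        double normalized = (token - (double) Long.MIN_VALUE) / Math.pow(2, 64);
        // Clamp guards the rounding edge at Long.MAX_VALUE.
        return Math.min(NUM_CORES - 1, (int) (normalized * NUM_CORES));
    }

    public static void main(String[] args) {
        System.out.println(coreForToken(Long.MIN_VALUE)); // prints 0
        System.out.println(coreForToken(0L));             // prints 4
        System.out.println(coreForToken(Long.MAX_VALUE)); // prints 7
    }
}
```

Because the owning core is a pure function of the token, a request can be handed straight to the right event loop with no coordination between cores.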
15:24 - The same techniques we’ve used historically to design around the hot partition problem still apply
17:46 - Don’t forget the client when scaling distributed systems - scaling the application layer is just as important as adding nodes to a cluster.
19:38 - Using libaio for asynchronous disk access helps toward the goal of having all interactions (memory, disk, network) in the same event loop. It’s all about reworking any place in the code that can block.
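The non-blocking disk access described above can be illustrated with the JDK's `AsynchronousFileChannel`, the closest standard-library analog of the same idea: the read is issued without blocking, and the event loop is free to do other work until the result arrives. DSE itself uses libaio via native code, so this is a sketch of the pattern, not the implementation; `readAsync` and the file names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

// Sketch of non-blocking disk access: issue the read, keep going,
// collect the result later instead of parking the thread.
public class AsyncRead {
    static String readAsync(Path file) throws Exception {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(64);
            Future<Integer> pending = ch.read(buf, 0); // returns immediately

            // ... an event loop would be free to do other work here ...

            int bytes = pending.get(); // real code would use a completion callback
            return new String(buf.array(), 0, bytes);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sstable-demo", ".bin");
        Files.write(file, "partition-data".getBytes());
        System.out.println(readAsync(file)); // prints "partition-data"
        Files.delete(file);
    }
}
```

The `Future.get()` call above blocks for simplicity; in an event-loop design the completion would instead be delivered as a callback on the owning loop, which is exactly the "rework any place that can block" goal Jake describes.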
22:25 - The solution was tested on many different hardware configurations and topologies. More than half the TPC effort was on testing.
23:53 - We’re seeing ~2x throughput improvements on reads and writes, and big improvements in tail latency. This kind of improvement allows you to run the same workload on a smaller cluster.
26:14 - Improving the observability of what is happening on each core was a major part of the effort. You can see the results in what is reported via JMX.
28:01 - Wrapping up