Distributed Data Show Episode 52
Are all benchmarks lies? Nitsan Wakart joins the show to explain the discipline of performance engineering, the ingredients of an effective benchmark, why you should always create custom benchmarks based on your expected workload, and the benchmarking effort we undertook for DSE 6.
0:15 - Nitsan introduces his background in performance engineering at companies including Push Technology and Azul Systems and how he ended up at DataStax working on DSE 6.
4:18 - Defining the role of a “performance engineer”: applying scientific approaches to measuring and improving system performance.
5:37 - On the bad reputation of benchmarks: abuse in marketing means that many benchmarks don’t actually mean what the “headline” says they mean. Brendan Gregg of Netflix has written a great reference book, Systems Performance.
8:50 - An example of bad math in benchmarks: dividing throughput by number of messages to get latency. This can result in benchmark metrics that are actually meaningless for your domain. The traditional advice on Java benchmarking is “don’t do it,” but benchmarking isn’t evil, and it should be part of the curriculum.
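As an illustration of why that arithmetic misleads (sketched in Python for brevity; the function names and numbers are hypothetical, not from the episode): simply inverting throughput treats the system as serial, while Little’s law says mean latency equals in-flight concurrency divided by throughput.

```python
def naive_latency_ms(throughput_per_s: float) -> float:
    """The flawed 'headline' math: pretend the system handles one request at a time."""
    return 1000.0 / throughput_per_s

def littles_law_latency_ms(throughput_per_s: float, concurrency: int) -> float:
    """Mean latency implied by Little's law: latency = concurrency / throughput."""
    return 1000.0 * concurrency / throughput_per_s

# A system sustaining 1000 req/s with 10 requests in flight:
print(naive_latency_ms(1000))            # 1.0 ms -- misleading
print(littles_law_latency_ms(1000, 10))  # 10.0 ms -- actual mean latency
```

The derived “1 ms latency” headline is off by the concurrency factor, and neither number says anything about the tail.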
11:24 - The ingredients of good benchmarking: a use case, hardware, JVM version, and measurements of the system at rest. “Performance is easy, all you have to understand is everything.”
13:05 - Taking lots of measurements and measuring variance is important. It’s rare to get two measurements that are exactly the same. Experiments should run long enough to be relevant for the domain, and for JVMs you need a warm-up period so that JIT compilation can settle.
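The measurement discipline described above can be sketched as follows (in Python for brevity; on a JVM the warm-up phase is what lets JIT compilation settle, and a tool like JMH handles this for you — the function names here are illustrative, not from the episode):

```python
import statistics
import time

def measure(workload, warmup_runs=5, measured_runs=20):
    """Time a workload, discarding warm-up iterations before recording samples."""
    for _ in range(warmup_runs):
        workload()  # warm-up: results deliberately thrown away
    samples_ms = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        workload()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms

def report(samples_ms):
    """Report spread as well as the mean -- a single average hides variance."""
    return {
        "runs": len(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms),
    }

# Hypothetical stand-in workload:
print(report(measure(lambda: sum(range(100_000)))))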
17:01 - Benchmarking distributed databases requires a variety of benchmarks with different workloads on different hardware. To keep the combination of options from exploding, we give particular weight to scenarios observed in the field.
19:52 - On how DataStax rationalizes the marketing message of DSE 6 being 2x faster than DSE 5 and Apache Cassandra. In particular, scaling to larger numbers of cores yields more than 2x improvements. Tail latencies are vastly improved even at higher load.
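To see why tail latency is reported separately from the mean (a generic sketch, not DataStax’s methodology; the sample values are invented):

```python
import statistics

def p99_ms(samples_ms):
    """99th percentile of a latency sample, in ms."""
    return statistics.quantiles(samples_ms, n=100)[98]

# Hypothetical samples: 99 fast requests and one slow outlier.
samples = [1.0] * 99 + [100.0]
print(statistics.mean(samples))  # 1.99 ms -- the mean hides the outlier
print(p99_ms(samples))           # ~99 ms -- the tail exposes it
```

A system can look fast on average while one request in a hundred is two orders of magnitude slower, which is exactly what tail-latency comparisons are meant to surface.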
23:26 - Nitsan’s advice on how to do benchmarking for your own applications. Instead of running “standard” benchmarks, focus on custom benchmarks that approximate the workload of your applications.
26:26 - Nitsan suggests resources for getting educated on performance engineering: a few engineers worth following, JMH as a great benchmarking tool, and the mechanical sympathy mailing list. Finally, ask for help! The performance engineering community is helpful.
30:59 - Wrapping up
Developer Relations at DataStax