[Webcast] Apache Cassandra 4.0: The Best One Yet
Apache Cassandra™ 4.0 delivers huge improvements in stability and performance, and reduces the total cost of ownership (TCO) for existing and new installations. Cassandra 4.0 represents the efforts of unprecedented cross-industry collaboration, from the planet’s largest Cassandra users and engineers.
In this recorded webcast, Aaron Morton (formerly of The Last Pickle), Josh McKenzie, and I dive into the upcoming 4.0 release. Cassandra’s feature list is long but here’s what makes it the biggest, fastest, and most stable database ever:
- Enterprise-grade security & observability with real-time audit logging and traffic replay
- Faster bootstrap and recovery from node outage with zero-copy streaming
- Up to 25% faster throughput on denser nodes
- Over 1000 bug fixes!
Patrick McFadin: 00:01
All right. Good day. I always say good day because I don't know where you are in the world, and we have people here for whom it isn't morning. It's morning for me, so thank you for joining us today for our webinar. I'm sure this massive list of people that I'm looking at are here for one reason and one reason only: Cassandra 4.0, and we are going to talk a bit about it. This is one of our ongoing presentations about what to expect and getting you ready for the upgrade. This is all about why this is the most reliable, elastic and fastest Cassandra ever. And if you don't believe this at the end, ask questions, we will be happy to take them. So my name is Patrick McFadin, I'm with developer relations here at DataStax, and today I'm joined by a superstar team, starting with Aaron Morton, who you probably know. Hello, Aaron.
Aaron Morton: 00:56
Hi, Patrick. Thanks. So my name's Aaron Morton, I live all the way down here in New Zealand where it is morning. It's 8:00 in the morning on a bright sunny day. I've been using Cassandra since version 0.3, ran The Last Pickle for a long time, started that. I'm a committer and on the project management committee. And now I work at DataStax, running our global services team.
Patrick McFadin: 01:19
And Josh McKenzie, we don't see you in webinars a whole lot, but maybe we can fix that.
Josh McKenzie: 01:25
Yeah, I'm in a less exotic location, the suburbs of Atlanta, Georgia. I definitely don't have the 0.2, 0.3 street cred Aaron has; I've been around for about seven years on Cassandra. PMC member, committer, dubious honor of being the ex-Windows lead on the code base [inaudible 00:01:44]. And right now I head up community engineering, I'm VP of server engineering at DataStax. I head up open source strategy, work as a product manager on Stargate, wear a bunch of different hats and do a bunch of crazy things like webinars. So here we are.
Patrick McFadin: 02:00
Well, we're glad you're here. Yeah, and just to wrap it all up into a tight little package: the three of us have been doing Cassandra stuff for a long, long time. Two of us are on the PMC, one of us is just not. Anyway, before we get into the topic — I love this term, housekeeping. I'm going to sweep the floors or something, but I've got to tell you some stuff. So everyone is muted; if you say anything we will not hear you, trust me on that. During the presentation, if you have anything you want to ask, we have a Q&A box. We're using the ON24 platform, which is all web-based, so if you have problems, refresh your browser and see what happens.
Patrick McFadin: 02:47
But this is pretty reliable, we use it quite a bit. So questions and answers: there's a Q&A box you can ask your questions in. If we see them mid-presentation, maybe we can answer them, but really my thing at the end is going to be going through those questions and trying to answer as many as possible, as much as we have time for. We love questions. We're not going to answer everything during the presentation, so as you have them, go ahead and put them in. So we're going to go through this presentation in pieces and parts, and hopefully everyone can see my screen. Let's see, do I have it here? Yes, the screen is live. Thank you very much. Pushing to the audience. Here we go.
Patrick McFadin: 03:40
So Cassandra 4.0 — this is a preview, because 4.0 is not out yet. It will be soon. And how soon? Hey, welcome to open source. We're not driven by marketing. It'll be done when it's done. That's kind of a flippant answer, but that's the way it works. And I think one of the things you're going to learn here is that because Apache Cassandra is an open source project, it goes by different rules of the road. Product companies are trying to push products — and re:Invent is going to be happening soon; I bet you're going to hear a lot of product announcements around re:Invent, right? Well, Cassandra is going to be done when it's done, because it has to be right, and that drives it more than anything. But let's take a look at Apache Cassandra at a glance real quick.
Patrick McFadin: 04:27
First developed at Facebook by a couple of crazy guys, Avinash Lakshman and Prashant Malik, in this 2009, 2010 era, when we were trying to figure out new kinds of databases for new kinds of problems. Relational databases weren't scaling the way we needed, and the "hyperscalers" were trying to figure things out as quickly as possible, Facebook being one of them. So it is a fully distributed, shared-nothing database, which is a really important part of how it works. We won't go into how Cassandra works, and if you want to dig into it, there are plenty of resources. If you go to datastax.com/dev, we have a lot of learning material there.
Patrick McFadin: 05:10
But the key point is that it's a shared-nothing architecture. That gives you a lot of really cool stuff like elastic scalability: when you need more, you add more, online, no problem. Availability is probably its most notable trait — it's really hard to kill a Cassandra cluster, although we all have stories about how people have. It's meant to take abuse, even across network partitions with data centers not seeing each other. It has this whole tuneable consistency thing, which also bears some explanation sometimes, but really gives developers control over how data gets written into the database. And it's really hard to explain what kind of database it is: it's not a columnar database, it's a partitioned row store. That's the best way to explain it. And that's about all we're going to say from an architecture standpoint. For the things we're going to talk about today, I assume you know about Apache Cassandra.
Patrick McFadin: 06:11
So if you get lost in why something is important, hit pause — this will be recorded — come back, and it will be ready for you once you've learned more about Apache Cassandra. The release history of Apache Cassandra is varied; I love how this is a meandering path. We're right now on a 4.0 path, and it's been there for a long time. And why is that? Well, we've gone through some iterations of how Cassandra has been developed, and being an open source project, there are a lot of ideas. Some of it was really driven by a paced cadence: "All right, it's January, let's get a release out. It's June, let's get a release out." That works if you have tight features, but then we went to this tick-tock model, which a lot of people did not like, and we stopped doing it.
Patrick McFadin: 07:08
And that's all I'm going to say about that; if you want to read more about it, go to the mailing list. But right now 4.0 has been in the pipe for a long while. And why? It's because the project has made the determination that 4.0 will be the most stable database in the world, period. The engineers working on it, and the companies they work at, value stability and correctness higher than anything else — higher than a ship date. So that's what it's going to be about. Aaron and Josh have some really cool stuff to say about that, but right now we're in beta 3. What's going to happen soon, hopefully, is we're going to move into a release candidate. And then finally we're going to ship.
Patrick McFadin: 07:55
That, I hope, is going to happen soon, but there's no fixed schedule. What is the difference between 3.0 and 4.0? Well, if one picture tells the tale, here it is. 3.0 had a lot of stuff going on — I thought it was a big release at the time — but 4.0 completely capsizes it. And just look at the realm of bug fixes. These aren't bugs like "oh, the color is wrong on this page." These are correctness bugs and scalability bugs, bugs getting in the way of people's good experience with Cassandra.
Patrick McFadin: 08:31
And most important, we know Cassandra is a database of record for a lot of companies, including banks. You can't rely on a database and hope for the best; you have to rely on a database and know that it's right. Josh has a great slide on some of the correctness testing, but look at how many bug fixes are in there. And there will be more — this is as of right now, in beta 3 — as well as many improvements. The new features at the very top there look very small as a percentage; it's still a lot of really good features, but the bug fixes are the thing I'm really excited about, because having a dot-zero release this stable is pretty unprecedented in the database world. All right. So we're going to start out, and then I'll be done talking. Aaron and Josh are going to talk, don't worry. Yeah, we should have changed this up, guys, huh?
Josh McKenzie: 09:35
Keep going, man. It's beautiful.
Patrick McFadin: 09:37
It's beautiful. You're like, "Oh, you can do this [inaudible 00:09:39]." Well, I'm going to pause there for a second. Do you have anything to add before I jump into this?
Aaron Morton: 09:49
We've been talking about 4.0 for a long time. People have heard it talked about at conferences. It's going to be real in a couple of months, and as we're going to see in a little bit, just the amount of change it brings — it's pretty much worth the wait, I would say. From a performance perspective, from an operations perspective, and for the new functionality that's there, it's been worth the wait.
Josh McKenzie: 10:17
Yeah. And the one thing I will say, having worked on the code base for the better part of a decade at this point — which is interesting and terrifying — is that distributed systems are complex in ways people don't expect, especially ones that are eventually consistent and that work the way Cassandra does, with that kind of bulletproof availability. The number of times I've seen engineers think through, "Okay, we have these three different data points at these four different times with these three different nodes," and by the end of two paragraphs your brain has melted. This is what keeps us all in this space: how fascinating and complex and interesting the things are that we're working on. But Cassandra is really best of breed when it comes to — like you said, Patrick — it doesn't go down. It is bulletproof, it can take a beating, and it's built to. So…
Patrick McFadin: 11:02
Yeah. And for a database of record, those are all really great features. And I will tell you, I know just through my own wanderings that there are a few pretty large companies currently running beta 3 in production. Wow, I know, crazy times. Let's move on. So let's talk about some of the improvements, and the first one — if you're familiar with running a Cassandra cluster, I'm going to use the R word, trigger warning: repair. And if you're not familiar with Cassandra, when I talk about a repair process, I'm not talking about fixing the database, I'm talking about anti-entropy repair. This goes all the way back to the Dynamo paper, and it's really about managing the consistency of data in a large-scale distributed system.
Patrick McFadin: 11:53
Running this is absolutely mandatory, no choice. (And this slide got screwed up during the import into ON24 — I blame everyone.) It's mandatory, but you rage over it because it is complex. If you're an operator running a cluster, repair is something you really have to spend the time to get to know, figure out, and not fail at. And it's been a constant source of problems for the community — "Oh, my repair didn't work." As a result, not a lot of people run it like they should. So there's been a lot of energy put into this, a lot of great engineering time, a lot of contributions.
Patrick McFadin: 12:38
So essentially, the repair process is now wrapped in something like a transaction, so we know the repair process is consistent instead of just not happening or silently failing. It also gives you better control over what has been repaired. That's really critical, because when we do incremental repairs, we're doing little bits at a time — you want to know they got done. And finally, this whole thing is really set up so that running repair works out of the box in almost every situation, with sane defaults. I know "sane default" is kind of a loaded term, because whose sanity? But repairs will be a lot easier to work with.
Patrick McFadin: 13:20
Now, we have a community project out there doing some great things around repair called Cassandra Reaper. Reaper is a tool to help you run repairs. If you are running a Cassandra cluster right now, I strongly recommend checking out Reaper; it's a great tool and will make your life a lot easier. It started all the way back at Spotify, The Last Pickle picked it up and ran with it, and it lives on alongside the Apache Cassandra project. So worth looking at.
Patrick McFadin: 13:54
And my shameless plug — Josh is laughing now, like, "Do you have to keep talking about it?" Yes, I do. If you were underwater or under a rock in the past couple of weeks, we just released this project called k8ssandra, which is essentially Cassandra running on Kubernetes plus all the tools you need to run with it, including Reaper for repairs. It's all automated and very easy to use. If you're using Kubernetes and Cassandra, go to k8ssandra.io and join the project, we would love to include you. It will be pretty important as we upgrade to 4.0, because this will be a part of it. And it does this really cool thing where it creates Grafana dashboards that I can't do without. That's right, this is actually a screenshot from a dashboard. So k8ssandra.io makes your life a lot easier for repair.
Josh McKenzie: 14:55
Patrick, the tech lead working on k8ssandra did a demo earlier today, and it's amazing to see that many different moving parts installed via Helm charts with one or two commands. And then the number of different dashboards to just see: here's what our ingress looks like, here's what the repair status is, it gets Reaper set up for you, et cetera. Moving toward the more SRE-focused view of being an operator and running stuff — it's great to see Cassandra come into that space. And yeah, it's pretty cool stuff, but it's a community project, right? So the more of us that get together working on it, the better.
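For reference, getting that stack up looked roughly like this — a sketch only; the Helm repo URL, chart name, and release name follow the early k8ssandra docs and may have changed since:

```shell
# Add the k8ssandra Helm repository, then install the full stack
# (Cassandra, Reaper, Prometheus/Grafana dashboards) with one release.
helm repo add k8ssandra https://helm.k8ssandra.io/stable
helm repo update
helm install k8ssandra k8ssandra/k8ssandra
```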
Patrick McFadin: 15:29
Yeah. That's what it is: our communities — the Cassandra community and the Kubernetes community — are joining forces to make both work better. And if you're a DBA, this is a great way to become an SRE. Okay, moving on to the next thing, and this is one I have a lot of personal interest in: virtual tables. If you're familiar with the V$ views in Oracle, it's pretty much the same thing. Before, to get at any system variables in Cassandra, you'd have to use JMX, the Java Management Extensions. If you don't know what JMX is, you are so lucky. JMX was a thing that was cool in the 1990s and just never went away. It is the way to instrument things inside the JVM. However, getting to it as an operator is just impossible sometimes.
Patrick McFadin: 16:25
So now all those variables — all that system stuff, like what's running in my Cassandra cluster — are exposed via CQL. You can look at the schema for the system virtual schema keyspace and see that there are system views, and you can actually run a SELECT against those and see what's going on inside your cluster from an operations standpoint. There will be a lot of opportunities for new instrumentation, ways to manage your cluster that don't involve connecting to a JMX port and exposing yet another port to the security world.
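As a quick sketch of what that looks like in practice — the keyspace and view names below come from the 4.0 virtual tables work, but exact columns vary by version, so treat this as illustrative:

```shell
# List the virtual tables Cassandra exposes
cqlsh -e "SELECT keyspace_name, table_name FROM system_virtual_schema.tables;"

# Read node configuration over CQL instead of JMX
cqlsh -e "SELECT * FROM system_views.settings LIMIT 10;"

# See currently connected clients
cqlsh -e "SELECT * FROM system_views.clients;"
```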
Aaron Morton: 17:07
Patrick, when you say impossible to get at, it's not an exaggeration — in the highest-security environments it wasn't just "oh, this is difficult." In a compliance environment, in a bank environment, it was outlawed. You could not have JMX open, because it exposed a lot of things to the outside world. We've added role-based access controls to that JMX endpoint over the years, but still, just having it enabled was outlawed. That meant a lot of tooling really struggled to work, which led to sidecar innovations and things like that. So this is a really helpful tool for the production operator.
Patrick McFadin: 17:51
Yeah, I think in 2016 at NGCC I did a presentation called "JMX Needs to Die in a Fire." I'm very happy this is happening, because I still feel the same way. So there are a couple of different views out there right now; system views is probably the one with the most information. For instance, I want to be able to see what's going on with my data table and... oops, went too far. This is one of the things operators are just going to have to explore. There will be a lot of content on this when it's out, but yay, no JMX on port 7199 anymore, go team.
Patrick McFadin: 18:37
The next one, really big for operators, is this whole idea of full query logging and audit logging, which again is part of the natural evolution of an enterprise-style database, right? Calling Cassandra an enterprise database kind of cracks me up, but it is — it's in just about every enterprise in the world. There are certain things you've got to have for compliance, and full query logging and audit logging are among them. It's the same mechanism for both, and it's very lightweight, so it doesn't get in the way of things. Sorry for the eye chart on the diagram, but essentially what it means is that when you set this up, you get really good insight into what's happening.
Patrick McFadin: 19:16
So you can have it dump out to an FQL log, enable it through nodetool, and disable it, so you can turn it off or on. It gives you all the information about the CQL queries that are happening. So, for instance, operators who want to keep tabs on what their developers are doing will know what's going on. If you've been a DBA for any length of time, you know this is pretty invaluable. And when you look inside the logs, you get a lot of information: timestamps, when it was generated, the actual query — all the good stuff you'd need for troubleshooting later on, especially in a dev environment, or whenever you're running production load testing and things like that.
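A minimal sketch of that workflow — the log path here is a placeholder; check `nodetool help enablefullquerylog` for the full option list on your version:

```shell
# Turn on full query logging, writing binary logs to the given directory
nodetool enablefullquerylog --path /var/log/cassandra/fql

# ...run your workload...

# Turn it back off, then inspect what was captured
nodetool disablefullquerylog
fqltool dump /var/log/cassandra/fql
```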
Patrick McFadin: 20:04
And the audit logging, of course, is similar and uses the same mechanism, but really what it's about is making sure bad people aren't doing things on your computer — or catching good people doing bad things, there you go. Again, really important for any kind of regulated environment. If you're in financial services, for instance, you have to produce audit logs, and this gives you a lot of information about what was accessed and when. It's really critical: if you're maintaining important, sensitive information, you need to be able to show this.
Patrick McFadin: 20:42
There are different categories; you don't have to log them all. You can log just queries, or just the updates, deletes and batch operations, that sort of thing. Here's one that might be handy: say in a dev environment you want to maintain some control over your schema — did it change at any point? I want to make sure I audit those changes. And any time a request errors, I want to catch it, so you can have an error log going for your CQL; if someone's throwing bad queries at your database, you can catch those as well. It shouldn't be a huge surprise if you work with databases and you know what audit logging is — this should be exactly what you need.
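As a sketch, enabling it at runtime and narrowing to the categories you care about might look like this — category names per the 4.0 audit logging feature; the log directory and logger class are configured in cassandra.yaml:

```shell
# Audit only queries, data changes, errors, and auth events
nodetool enableauditlog --included-categories QUERY,DML,ERROR,AUTH

# Later, switch it off again
nodetool disableauditlog
```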
Aaron Morton: 21:28
There's the auth category there, Patrick, for unauthorized access attempts. That's a really simple thing that's great to have: you can now say, "Oh, somebody tried to log into the database and they didn't have the right password." Great — that's going to fire off a process to go find out why that happened. It could be someone got something wrong, or it could be someone inside the network trying to get access.
Patrick McFadin: 21:57
Yeah, we've seen this time and time again: NoSQL used to mean no security. Those days are quickly coming to an end. By the way, Cassandra does make you set a password, so that's a good thing. And you can control it, so go ahead.
Josh McKenzie: 22:28
So, new features. There is this thing in Cassandra called change data capture, and it turns out I was the author who wrote it, and it is pretty bare bones. The original implementation prioritized two things and two things only: one was performance and two was correctness. The intent was to evolve and build on it over time — and you can read the tea leaves, look at my career trajectory, and note that I started managing people right around the same timeframe; development slows down when these things happen. That being said, there is a pretty significant change in the way CDC works in 4.0. In prior versions, CDC essentially served as a commit log archiver. To give you folks an understanding of the feature: when you enable CDC on a table, you're saying, "I care what happens when this data changes, and I want those changes reflected somewhere I can consume."
Josh McKenzie: 23:18
And what Cassandra does is essentially archive that commit log file for you and say, "The rest of it's your problem." You need to parse it, read it, deduplicate, and actually consume what's coming from that file. Part of the way Cassandra works is that these commit log files hang around for quite some time, until there's pressure in the system and things have to flush. So there's a change in 4.0: instead of having to wait for the commit log files to flush, you can read them while they're live — they're hard-linked into a separate directory. That way you can say, "Look, I know how much of this has actually been stored, I can incrementally process this, I can read it on the fly as it's going."
Josh McKenzie: 23:56
And it just provides a significantly more user-friendly experience; it builds up the scaffolding for the tooling that sits on top of CDC consumption. As for building on top of that, there are a couple of things. One is something we're looking into open sourcing from the DataStax side: a Kafka connector, plus some code currently in the DSE product that we're looking at bringing into the open-source space, called advanced replication, to work with change data capture. The other one I'd recommend taking a look at: Yuki, another one of the committers on the project, has a CDC example that's basically the starting scaffolding for building a CDC consumer. And there are quite a few of these floating around from the open-source side as well; different people have built CDC consumption on top of the basic primitives that are there.
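For context, turning CDC on is a two-step affair — a node-level switch plus a per-table flag. A sketch, with a hypothetical keyspace and table name:

```shell
# 1. In cassandra.yaml on each node (restart required):
#      cdc_enabled: true
#      cdc_raw_directory: /var/lib/cassandra/cdc_raw   # where segments land
# 2. Flag the tables whose changes you want to consume:
cqlsh -e "ALTER TABLE shop.orders WITH cdc = true;"
```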
Josh McKenzie: 24:49
So those are the changes with CDC. Next, and this is a big one — the inside baseball here is what we call zero-copy streaming, which is a software engineering term for bypassing the userspace copies and all that stuff with the [inaudible 00:25:06], going straight to a buffer for the program to parse. The TL;DR, though, is that we stream full SSTable files significantly faster than before — up to five times faster in benchmarks. What that means is your mean time to recovery is a lot better when a node dies, because you need to bootstrap a new node, and the data gets copied across the wire basically at CPU/network bottleneck speed.
Josh McKenzie: 25:27
That also means your elasticity is going to be significantly better on 4.0. If you need to add a new node, it's going to bootstrap up to five times faster. There's a blog post on the Cassandra blog about this with more details, links to the Jira, and all the good gory details. This is a pretty huge change for the project itself. One of Cassandra's superpowers has always been availability — the ability both to recover from commodity hardware failure and to scale up elastically — and this is just doubling down on those unique strengths of the project.
Josh McKenzie: 26:00
The third thing I was going to talk about here is client backpressure. Cassandra historically had kind of a fail-fast ethos — only it wasn't fast. It was fail eventually, and accept a bunch of pressure in the system in the meantime. The storage engine inside Cassandra is based on Bigtable's: it's an LSM tree, a log-structured merge tree. The way that thing works, it basically allows massive, massive ingest and defers the pain. There's a lot of post-processing: there's flushing, there's compaction. You essentially take out debt, in terms of processing that data later, for the purpose of dealing with immense bursts of ingest. So Cassandra gets this reputation of being phenomenal at handling writes, at handling bursts, at elasticity. And then when things fall over, they fall over really hard, because by that point you've accumulated this massive backlog of data processing you need to do.
Josh McKenzie: 26:54
So one of the things changing in 4.0 is this paradigm of "keep accepting until we're flooded and then detonate from on high" — that's not really going to be the way nodes work in 4.0 anymore. Part of what we do is lean on standard TCP backpressure from an application perspective, but there's also a new option to send overloaded exceptions to clients when there are too many inbound messages, to say, "Hey, back off, chill." There's the expectation that not all the clients in the Cassandra ecosystem are yet built with these kinds of affordances — to actually expect the server to tell them, "Hey, please stop trying to murder me" — but now you can start updating your clients with options on connection to get those exceptions, so they know when to back off or change their load balancing.
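On the server side, the inbound limits that drive those overloaded exceptions are cassandra.yaml knobs — option names per the 4.0 configuration, with purely illustrative byte values that you would tune for your own hardware:

```shell
# cassandra.yaml excerpt — cap in-flight native-protocol request bytes,
# globally and per client IP; clients that opt in on connection receive
# an overloaded exception instead of silently queueing.
#
#   native_transport_max_concurrent_requests_in_bytes: 104857600
#   native_transport_max_concurrent_requests_in_bytes_per_ip: 10485760
```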
Josh McKenzie: 27:41
From an implementation perspective, part of what makes this particularly interesting — and this is where these delightful diagrams come in — is that we used to handle pulling stuff off the wire and putting stuff back on the wire in the same breath. Which means that if too much stuff is coming in, you end up not putting stuff back out, because you're flooded by ingress, and then all of a sudden the whole system locks up and everything backs up. We lean on Netty, this library that Norman Maurer and a big team work on — a great high-performance networking I/O library. So we've changed some of the way the code works inside Cassandra, and now ingress and egress are handled separately. Why that matters shows up in these delightful graphs, which is where we get to the performance in 4.0.
Josh McKenzie: 28:25
There's significantly higher throughput from having a system that gracefully backpressures, and the latency is better as well. Aaron's going to talk more about some of the details, but this represented an incredible amount of work. One of the engineers over at Netflix has just pages and pages of graphs of the performance from their large-scale profile testing of this stuff. This is a huge, huge improvement for the project — for the performance, but more importantly for its stability, and for its ability to gracefully handle massive bursts and micro-bursts of ingress traffic, which is now part of its claim to fame.
Josh McKenzie: 28:59
And then last but not least, Patrick alluded to the full query logging and the audit logging. The question everybody always has — it was the first question I had — was: how on earth are you going to audit log every single thing in the system without a massive negative impact on performance? The answer comes from the high-frequency trading sector: OpenHFT's Chronicle Queue, a super, super high-performance persisted queue that they've written, and it's really cool to dig into. There's an open source Java version of it, so you can look into what its performance characteristics are. There's kind of a brain dump here of the different claims to fame for how this sucker works.
Josh McKenzie: 29:37
But for the software engineers who like to geek out on this stuff: it's basically reading on a random access file, pretending Java is C and accessing a giant block of memory directly. That way two JVMs can talk to each other in under a microsecond, and you can go machine to machine in under ten. This allows for massive storage of all these incoming queries, real-time logging, and all the filtering and everything else, without a significant impact on the performance of your system.
Josh McKenzie: 30:08
So there's an adage that premature optimization is the root of all evil, and that's true — except when you're infrastructure software and it's not premature, because you are the hot path, the backbone of the internet. So it's pretty cool to see the tech that's in here, and anybody can dig it up and start taking a look, but that's what we leaned on there. I will hand it over now to Aaron to talk in a little more detail about those performance improvements.
Aaron Morton: 30:35
Awesome, thanks Josh. When we talk about performance, if you're an engineer, you think about making things go faster. If you're a manager or someone who pays for things, you think that performance reduces the total cost of ownership. It does both. I want to go through some of the performance numbers today, and hopefully they'll make you think about how Cassandra 4.0 is going to make things go faster but also make things cheaper. So let's start at the core. You should go look up this ticket, CASSANDRA-14654, from Chris Lohfink at Apple. It's pretty easy to look through. This is the type of work that's been going on in the Cassandra 4.0 release: the compaction process. Patrick started with repair — there were two things people complained about in Cassandra, compaction and repair.
Aaron Morton: 31:28
On the compaction process — this is squishing together all the data, the bookkeeping work Josh just said we leave until later on. When it was running with really small partitions, lots of small partitions across multiple SSTables, we would allocate a lot of memory, and that was putting a lot of pressure on our garbage collection processes. You would see that in Cassandra: you'd see garbage collection get worse when compaction was the problem, you'd throttle compaction back, you'd do a dance around it. Well, this ticket just goes in there and says, "Hang on a second, there's a bunch of allocations we didn't need to do. Let's stop doing them." And we're looking at cutting roughly in half the amount of memory allocation we needed to do. That's the type of work that's gone on under the covers.
Aaron Morton: 32:19
Take that type of work and put it together with moving into the modern Java world with support for Java 11. It's been around for a while, and Java 8 we know is no longer supported. One of the things you get with Java 11 is some new garbage collectors. We've got the G1 garbage collector, ZGC can also be used there, and there's a new one from Red Hat called Shenandoah. Alex Dejanovski, a The Last Pickle and now DataStax employee and, if we talk about Cassandra Reaper, one of the driving forces behind Reaper, knows more about repair than anyone else. He did some work a couple of months ago looking at the beta version of 4.0, testing it with these different combinations of Java 11 and the new garbage collectors, and really this headline number here: 25 to 85% performance improvements. Whenever you see that, if you're someone who has to pay for things, that equals less money.
Aaron Morton: 33:24
Well, let's dive into some of these examples, and I'll advise you to go and have a look at the blog because the whole methodology is laid out there. It was testing on a three-node cluster with eight cores on each, using the tlp-stress tool to generate from 25,000 up to 45,000 operations per second against it. We tested on the same hardware, the same machines, with the different combinations you can see in the left-hand column here of garbage collector, Java version and Cassandra version. When we look at just straight throughput, we can see on the first row there, CMS on version 3.11.6 with JDK 8: when we tried to reach 45,000 ops per second from a client perspective, we hit about 41,000. The zeros there mean that we started failing, and so we said, "That's not good enough."
Aaron Morton: 34:19
If you scan your eye down to the third row from the bottom, the CMS collector again on JDK 11: 51,000 ops per second is what we got to in that configuration. All we did was upgrade Cassandra and Java and we got a 25% throughput improvement. That's amazing. Look at the next thing, latency. So again, look at the first row, CMS on 3.11: we're looking here at our p99 latency, so 17 milliseconds at 25,000 ops per second. Third row from the bottom there, CMS on JDK 11 dropped it down to 11 milliseconds, so roughly a 30% reduction in our latency. But the interesting clue here is the green line that goes across for Shenandoah.
Aaron Morton: 35:16
So when we were running Shenandoah in those earlier charts we showed, it was keeping up pretty much on the throughput, but the latency story for Shenandoah is that on reads and on writes it was up to an 85% reduction in latency. In this situation, we were running with a 31 gig heap, again running these standard workloads; have a look at what's in that blog post. The thing you should take away from this is that just upgrading to 4.0 and Java 11, and doing nothing else, is going to get you a 25 to 30% improvement. If you put in some tuning work and test with Shenandoah [crosstalk 00:35:54], you can get up to 85% in some workloads. That is money for nothing and that's an amazing thing. Josh.
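As a quick sanity check on the figures quoted above, the arithmetic works out (a throwaway Python sketch, not part of any Cassandra or TLP tooling; the numbers are the ones Aaron cites from the benchmark):

```python
# Back-of-the-envelope check: throughput went from ~41,000 to ~51,000
# ops/s, and p99 latency from 17 ms to 11 ms, per the quoted benchmark.

def relative_change(before, after):
    """Fractional change from `before` to `after` (positive = increase)."""
    return (after - before) / before

throughput_gain = relative_change(41_000, 51_000)   # ~ +24%, quoted as "25%"
latency_drop = -relative_change(17.0, 11.0)          # ~ 35% lower, quoted as "30%"

print(f"throughput: {throughput_gain:+.0%}")
print(f"p99 latency: -{latency_drop:.0%}")
```

The quoted "25%" and "roughly 30%" are rounded versions of these ratios; the 85% Shenandoah figure comes from separate read/write latency numbers in the blog post, not from the two values above.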
Josh McKenzie: 36:10
So what's next?
Patrick McFadin: 36:12
Oh, I'm going to say p99 again, can I say that again? One more time? Because that...
Aaron Morton: 36:21
You can say p99.
Patrick McFadin: 36:22
p99, because that graph, it's like the first time I saw it I'm like, "Oh, that's interesting." But then I really grokked it. That's p99, which is always the terrible number that everyone tries to hide. And here it is, pretty awesome. Yeah, so I'm just really excited.
Josh McKenzie: 36:39
I will say when you start reading into some of the implementation details on Shenandoah and ZGC, there's some really clever stuff happening there. That's some cool stuff. So, yeah.
Patrick McFadin: 36:48
Yeah, we had some folks from the OpenJDK project, I think, on the mailing list as well.
Josh McKenzie: 36:55
Yeah, they periodically show up and then we periodically do terrible things to the JDK and they're like, "Stop it. Don't do that." And we're like, [inaudible 00:37:00]. Anyway, that's how it works. All right, so what's next? There's a bunch of stuff. But one of the housekeeping things that has taken place over the last couple of years with Cassandra is we now have what we call CEPs, Cassandra Enhancement Proposals, where individuals can bring forward things where they say, "Here's a major change I want to make to the system." And people can talk about it before they've written the code and put their 40,000-line patch out into the world. So this is a big cultural shift. You see this kind of thing happen in Spark and in Kafka and others, they have [inaudible 00:37:36], and this is a pattern that's pretty common to Apache projects.
Josh McKenzie: 37:42
And the whole idea is that in infrastructure software you've got to at least measure twice before you cut once. And we should all measure together, we shouldn't do things behind closed doors, and we should talk about that design upfront. These are things that we have not really been pushing heavily as a community yet, because we're really grinding towards 4.0 and trying to hammer out all those dents and get everything perfect and beautiful and buffed and shined. But you can see there's a backlog of a bunch of stuff. The point in Cassandra's lifecycle where it is right now is basically: there's massive adoption, then there's a proliferation of opinions, which in open source manifests as forks. And then there's the unforking, the great unforking. There's actually an emoji for unforker, that is me on Slack, right? This is my true north, my Holy Grail, which is...
Josh McKenzie: 38:28
There are a lot of companies that have really, really cool stuff they've done with Cassandra that they have not brought back into the open-source space yet and donated to the Apache Software Foundation. So really part of my true north, and part of the goal for 2021 for the project, is to start getting towards this unforking and adding new features into the system. A lot of the stuff that's listed here is stuff that contributors employed by DataStax are bringing to the project. And there's a variety of other companies with large forks that are bringing other stuff in as well.
Josh McKenzie: 38:57
So it's going to be cool to see how this stuff all comes together, but everybody's super excited about being past this massive stability push for 4.0 and talking about what the next steps are. There are some interesting things on this list, like the word join, which is a thing where people say, "Hey, wait a minute. This is Cassandra." And my answer to that is, "Yeah, there's a thing called Calcite. There's precedent here; Apache Calcite is a project that does this stuff."
Josh McKenzie: 39:24
One of the things we've got to get hammered out here is a release cadence that we all agree on as a project. As Patrick alluded to, we've done a lot of different things before. We've done a yearly cadence, we've done every two years, we did tick-tock where one month it was new features and the next month it was stabilization. At the end of the day, there's a happy medium that we want to target, which is moving towards having something that is consumable by enterprises, but also moving enough that new use cases get access to new features rather than having to run something that's super tip of the spear.
Josh McKenzie: 40:02
So some of us on the PMC have talked a little bit about a twice-a-year cadence. We have to talk about it as a project, vote on it, figure it out; there's the Apache way for us to go through. But the TLDR is that the release cadence is not going to be whatever the straddle was between 3.0 and 4.0. There's the Kubernetes operator [inaudible 00:40:21] that we've worked on, getting a single Kubernetes operator that's actually blessed, and perhaps donated to the Apache Foundation itself to be in-tree with the project. There are some better search indexes coming down the pipe that are kind of based on some of the data structures inside Lucene, but attaching themselves to SSTables. There's exploration on cross-partition transactions, on joins, on consensus algorithms that are optimizations on Lamport's work with Paxos, or Raft, or other as-yet-undefined things where people are trawling through arXiv to try to find interesting new consensus algos. There's all kinds of stuff happening that people are mulling over and thinking about that you're going to start seeing coming down the pipe for the Cassandra project.
Josh McKenzie: 40:59
There's a new project that's hit open source as well called Stargate, which is essentially taking the concept of a coordinator-only node, which showed up at the Cassandra Summit in 2016 and which people have built periodically, and saying basically, "What if the query coordination was separate from the storage engine at all times? What if you separated your compute from your storage, what would that look like? What would it look like if you added a REST-based document API to Cassandra, or GraphQL support?" And so these things are actually happening right now, and that project is open source and available at stargate.io. So there are new APIs coming into the ecosystem there as well.
Josh McKenzie: 41:36
And with the focus on operator usability and developer usability, there's this question of: what makes Cassandra easier to consume? What brings the power of Cassandra to people, instead of trying to drag people kicking and screaming to the power of Cassandra? And that's the big inflection point that you're probably going to see with the project this coming year: moving into the modern, both in terms of the APIs available and in terms of the structure of the architecture and what's going on with Cassandra. And whatever your wishlist is, whatever you want to work on: open a CEP, join the project, get working. I'm on the mailing list and on Jira with a bunch of other committers and PMC members. And the more the merrier, we want the tribe to grow, right? That is how Cassandra stays relevant and stays thriving as a project into the future. And that I believe is what we [crosstalk 00:42:29].
Patrick McFadin: 42:29
And that, as they say, is that. But it isn't, because we have a lot of great questions. And boy, did they come up pretty fast. I always love it. Like, "Oh, okay. Everyone paid attention to that." So what I would like to do is go through some of this stuff in here. I just marked something to answer that should not be answered. So I'm looking at all the questions that people asked, and thank you very much. There seems to be one that's overriding all of them, which is: how do I upgrade from 3.0 to 4.0? And Josh, you answered one in the Q&A, and I think this is probably very generally interesting for everyone. So maybe you could expand on that.
Josh McKenzie: 43:19
Yeah, for sure. I type fast, so, anyway. With the upgrade process, there are two different paths. One is, if you're on 3.0, we recommend you go directly to 4.0, and there's a bunch of contributors running a whole lot of 3.0 that are vetting that pathway, going from 3.0 latest to 4.0. The other path is if you're running on 3.11.latest, or if you're running on 2.1. If you're on 2.1 or 2.2, and 2.2 is a real version that exists, it was mostly about Windows support before 3.0, but whatever, we're not going to go there. If you're on 2.1, you go to 3.11.latest, then you go to 4.0. 3.11 has got a lot of stuff in there that's pretty vastly improved.
Josh McKenzie: 43:58
So for that interim timeframe where you're at a mixed-version cluster, you want to go from 2.1 to 3.11 to 4.0. And so those are the different tracks that we're taking in terms of vetting the upgrade process. The big lift is between 2.1 and 3.11. That's where the big storage engine changes took place, that's CASSANDRA-8099. That's moving from a world of schema-less Thrift into a world of a key-value, stronger-typed, stronger-schema, CQL-ish storage engine. But going from 3.0 or 3.11 to 4.0 should be a significantly smaller lift for operators and for clusters.
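The two upgrade paths Josh describes can be summarized in a small sketch (a hypothetical helper written for illustration, not an official tool; the version handling is simplified):

```python
# Sketch of the recommended upgrade paths: 3.0/3.11 -> 4.0 directly,
# but 2.1/2.2 must step through 3.11.latest first (the CASSANDRA-8099
# storage engine change is the big lift).

def upgrade_path(current: str) -> list[str]:
    """Return the sequence of versions to step through, per the advice above."""
    major_minor = tuple(int(x) for x in current.split(".")[:2])
    if major_minor >= (4, 0):
        return [current]                      # already on 4.0
    if major_minor >= (3, 0):
        return [current, "4.0"]               # 3.0 / 3.11 go direct
    return [current, "3.11.latest", "4.0"]    # 2.1 / 2.2 take the interim step

print(upgrade_path("2.1"))    # ['2.1', '3.11.latest', '4.0']
print(upgrade_path("3.11"))   # ['3.11', '4.0']
```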
Patrick McFadin: 44:36
Okay Aaron, you get thrown in YouTube hits.
Aaron Morton: 44:42
Okay, where did you go?
Patrick McFadin: 44:47
On the upgrade? Because I'm giving you a chance to answer. No, no, no. I think this is really important.
Aaron Morton: 44:55
Do we have time for one more question? Just because we're running low here. So if there are any others we can get to?
Patrick McFadin: 45:00
Well, yeah. I mean, we have 15 minutes. So all right, I will add we also have our Luna support, so if you really don't want to do it yourself, we can help. All right, let's move on. There are some other questions in here; some of these are probably easier, for sure. So there's a couple of questions that are somewhat related, and they have to do with security. First is: any support for encryption at rest, or any encryption at all, and encrypted passwords in the YAML file? Anybody want to take that? Aaron, come on now. You were all ready for another question and answer.
Aaron Morton: 45:46
I'm not aware of any updates in 4.0 on the encryption [inaudible 00:45:53]. Yep. The big issue is with the key management, and that normally comes from an infrastructure solution.
Josh McKenzie: 46:03
Yeah, what I will say as the answer is hit the mailing list, open a Jira, get engaged, let's have a discussion about it. I don't think anything's happened with that yet, but if that's the thing that people want, let's go.
Patrick McFadin: 46:14
And I think that's good general guidance also: if you're looking to upgrade to 4.0, you could try it now with the beta. And we would really appreciate, as a project, any feedback or input you have before we go to RC or full GA. It will make a lot of people's lives easier if we have extra eyes. So that's a great contribution.
Josh McKenzie: 46:35
One thing that I'll throw out there is the point of the beta, and part of what we committed to, is API stability. So anything that you roll out for the beta, for a dev or QA environment, you can expect anything you build to it will remain the same for when it GAs. And in the past, I would equate the stability of this beta to what would have previously been a dot-two or dot-three release. We have definitely shifted from the, "We're going to lean on the community and all of us testing things out, do it live," into a, "Test it and validate it programmatically before it's released." So testing the beta is a different ask for 4.0 than it's ever been before in terms of the stability you can expect from the code itself right now.
Patrick McFadin: 47:16
All right, next question. On this idea of performance, are there any optimizations around scanning tombstones? And I'm assuming someone's going from 3.0 to 4.0.
Aaron Morton: 47:34
Yeah, I'm not aware of any specific improvements for that, but compaction becomes less of a concern for you if compaction is faster, and compaction is the thing that gets those tombstones out of the read path. So you will get some indirect improvements there, because your compaction is going to run faster and has a lower impact on the Cassandra process as a whole, so you can get those tombstones out.
Patrick McFadin: 48:07
All right. Anything to add Josh?
Josh McKenzie: 48:10
No, he's totally spot on. The thing is I say no and then I have words to say and... Anyway.
Patrick McFadin: 48:17
Like a politician.
Josh McKenzie: 48:21
2020 is not a year to compare me with politicians, thank you very much. There are optimizations to the way memtables are storing data. There are optimizations to how compaction's working, optimizations to repair, optimizations across the board. There's nothing specifically targeting tombstones. And part of the problem with tombstones is they're an artifact of the LSM model for storage engines, where you basically have duplications of data and you have to have tombstones to cover up the fact that there was data; we're not deleting stuff in place. So there's going to be a limit to how well we can optimize around tombstones, a fact of life for the storage engine.
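The tombstone mechanics Josh describes can be sketched in a toy model (illustrative only; the class and methods are invented here and bear no resemblance to Cassandra's actual storage engine): a delete writes a marker rather than removing data in place, reads must honor the newest entry, and only compaction finally drops the covered data.

```python
# Toy LSM model: each "sstable" is an immutable dict, newest last.
# Deletes append a tombstone; compaction merges and drops shadowed data.

TOMBSTONE = object()

class TinyLSM:
    def __init__(self):
        self.sstables = []            # oldest -> newest

    def flush(self, writes):
        """Append a new immutable sstable (we never modify old ones)."""
        self.sstables.append(dict(writes))

    def get(self, key):
        for sstable in reversed(self.sstables):   # newest value wins
            if key in sstable:
                value = sstable[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge everything; tombstones finally remove the covered data."""
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [{k: v for k, v in merged.items()
                          if v is not TOMBSTONE}]

db = TinyLSM()
db.flush({"a": 1, "b": 2})
db.flush({"a": TOMBSTONE})    # delete "a" by writing a tombstone
print(db.get("a"))            # None -- the tombstone masks the old value
db.compact()
print(db.sstables[0])         # {'b': 2} -- only undeleted data survives
```

Until `compact` runs, every read of a deleted key still has to scan past the tombstone, which is exactly why faster compaction indirectly helps the tombstone problem.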
Josh McKenzie: 48:58
That being said, there's been some good refactoring happening in the code base. There's Rocksandra, which is a version of Cassandra that's backed by RocksDB for the storage engine. There has been discussion about modularizing the code base and making it so you have a pluggable storage engine at the backend. So, I mean, there could be a future in which, a couple of years from now, we've got a version of Cassandra that doesn't even do tombstones at all. That's obviously not a 4.0 thing, that's not a this-year thing, but that is kind of where the thinking of some of the members of the project is.
Patrick McFadin: 49:26
Excellent. So yeah, I think this goes to some of where we're going with the project, with pluggable storage, and I think the path to 5.0 is going to be this variety of changes. So getting involved in the project now is going to be a lot of fun, especially if we have things like this. Next question, and this is probably for you, Josh. This is around client back-off. It says, "In 4.0, why is the inbound messaging volume management thrown onto the clients? Shouldn't the engine handle that more gracefully?"
Josh McKenzie: 50:03
This is the religious debate, right? This is always the question: does the [inaudible 00:50:08] client do it, or do you do it on the server side? And the answer in 4.0 is [inaudible 00:50:14], right? By default, it is handled server side; it leans on TCP backpressure to handle it. There's a limit to what you can do, though. If you've got a pipe that can handle X gallons of water and X times 10 gallons keep getting shoved at it... Physics, right? At some point, short of auto-expanding your cluster and partitioning data and sharding things and getting into crazy town, you have to have an expectation of a certain degree of client behavior. But by default, it doesn't actually punt to the client, it doesn't push the overloaded exception. That's a configurable client option on connection.
Aaron Morton: 50:57
Yeah. There are use cases this is perfect for, right? If you're making the overnight bulk loading tool, then make that go as fast as you possibly can until you get this exception come back, and then have your application handle it, right? Maybe it doesn't work for a [inaudible 00:51:15] setting, IoT, API calls; we can't go and tell them to slow down, we don't want to do that. But in the use case where you're loading data as fast as you possibly can, or reading as fast as you possibly can through a Spark job, it's perfect, right? You don't have to guess anymore, you can tune it down at the point that it's happening rather than pick a broad number and go, "Hey, don't go any more than 1,000 ops a second because last week that broke it."
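Aaron's bulk-loading pattern, push as fast as possible and back off only when the server signals overload, might look roughly like this (the exception and function names are hypothetical; real Cassandra drivers expose their own overloaded-error types and retry policies):

```python
# Sketch: retry with exponential backoff when the server pushes back,
# instead of pre-guessing a safe fixed rate.
import time

class Overloaded(Exception):
    """Stand-in for a driver's 'server overloaded' error."""

def send_with_backoff(request, max_retries=5, base_delay=0.01):
    """Call `request`, sleeping longer after each overload signal."""
    for attempt in range(max_retries):
        try:
            return request()
        except Overloaded:
            time.sleep(base_delay * (2 ** attempt))   # 10ms, 20ms, 40ms, ...
    raise Overloaded("still overloaded after retries")

# Simulated server that rejects the first two attempts, then succeeds.
attempts = {"n": 0}
def flaky_request():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise Overloaded()
    return "ok"

print(send_with_backoff(flaky_request))   # ok
```

The point of the 4.0 option is that the client gets this explicit signal to react to, rather than everyone hard-coding "no more than 1,000 ops a second" as a superstition.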
Patrick McFadin: 51:45
Yeah. Well, I mean, this is just the classic... "Oh yeah, I love it." It's kind of a religious war, but one where I'm happy we're falling on the right side of history. All right, another good question.
Josh McKenzie: 51:57
Right? The right answer is allow the individuals to choose, allow the ecosystem to choose. Now, you don't want to end up with what I call a bowl of razors, right? Like if you've got a YAML file that has 400 parameters and all of them are super, super tweaky, that's not where you want to be. But you do want to have sane defaults. This is the philosophy of the Cassandra distribution: "Here are the sane defaults for all these different domains." And absolutely you can have stuff that falls inside those boundaries, and you can tune and you can change and you can customize things to be the bespoke ecosystem that you need. But the whole point is, here's the large curve of use cases: if 80% of them are solved by thing X, we should probably just default to doing thing X. In this case we're not changing the default on how the server behaves, because obviously that would break all client applications ever. But having the option to opt into new things accommodates those new domains. And it's kind of good hygiene for a project moving forward.
Patrick McFadin: 52:55
Yeah. All right, a follow-up question, which I think is actually really good that someone asked, because this is where the three of us kind of get stuck in our own world. The question is around upgrades still, but it's a really good one. It's like: can you just upgrade in place, or is there some sort of complicated procedure? I think it's a lot simpler, it can be a lot simpler. Aaron, you can take this.
Aaron Morton: 53:23
Do you want the one-word answer on that, Patrick? Do we just go "yes" and move on, or should I be more forthcoming? Yeah.
Patrick McFadin: 53:31
Yes, online upgrades. In 20 sentences.
Aaron Morton: 53:34
Yeah, as Josh was saying, there's often a minimum version that you need to start from. If you're on a 2.x that's five years old, you're probably going to have an intermediate step to get onto 3.x first. But if you're on 3.11 something, it is an in-place upgrade, it is a rolling upgrade. I saw a question there as well around the Kubernetes operator. Having been in Cassandra for 10 years, the Kubernetes operator should have been there 10 years ago, because that is the operation that Kubernetes is designed to take out of the picture: to say, "Go and do my upgrade and make sure I'm available the whole time."
Aaron Morton: 54:16
Now, if you're self-managing, it is as simple as shutting down the process, changing the binaries and starting the process up again. Read the NEWS.txt file that ships with it if you're self-managed, because it will give you any detailed changes and instructions that you need to follow for that version. But Cassandra takes care of it, and it can run in a mixed mode: you can have 10 nodes and just go round and do the upgrade, because it has to be able to work. I wouldn't run mixed versions for a month, though. Upgrades to Cassandra are just a series of preplanned small failures; the whole thing's designed to deal with failures. Upgrades are just the failures we push onto the cluster.
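The rolling-upgrade invariant Aaron describes, only one node down at a time while the rest keep serving, can be sketched as a toy simulation (the node model and function are invented for illustration; a real upgrade drains, stops, and restarts the actual Cassandra process on each host):

```python
# Toy rolling upgrade: stop a node, swap its "binaries", restart it,
# and assert the cluster never has more than one node down.

def rolling_upgrade(cluster, new_version):
    """Upgrade one node at a time so the rest keep serving traffic."""
    for node in cluster:
        node["up"] = False                         # drain + stop the process
        up = sum(1 for n in cluster if n["up"])
        assert up == len(cluster) - 1, "only one node down at a time"
        node["version"] = new_version              # swap the binaries
        node["up"] = True                          # restart; mixed mode is fine
    return cluster

cluster = [{"version": "3.11", "up": True} for _ in range(3)]
rolling_upgrade(cluster, "4.0")
print(all(n["version"] == "4.0" and n["up"] for n in cluster))   # True
```

Mid-loop, the cluster is exactly the "preplanned small failure" Aaron mentions: one node briefly gone, mixed versions for the duration of the pass, and available throughout.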
Patrick McFadin: 55:01
Right? Yeah. I think my short answer is upgrading in place online without downtime is a really important thing about Cassandra. So why would we say no to that?
Aaron Morton: 55:13
Do it on Monday, don't do it Friday night or on the weekend. Do your upgrades Monday morning at 10:00 AM. That's when you should do it.
Patrick McFadin: 55:23
That's the golden rule of ops: thou shalt not do an upgrade on Friday, unless thou wants to work on Saturday. That's just good old-fashioned operations knowledge. All right, one more question, I'm looking for... Let's see, where was it? Ah, man, there are so many good questions in there and I missed it. You guys can see these questions, you can help me? All right. Oh: does 4.0 have any more system parameters that can be modified dynamically using nodetool? And I think this is actually pretty insightful, because virtual tables are read-only. So this is about modifying dynamically. So Josh, maybe you want to take that?
Josh McKenzie: 56:14
It's making my brain hurt. So there's fqltool for the full query logging, which is like a whole new ecosystem of stuff, but in terms of nodetool itself and tuning system parameters, I don't know off the top of my head. I think I'd have to check NEWS.txt and see if there's anything added in there.
Josh McKenzie: 56:33
Most of the parameters are set at initialization in the YAML file, because usually they're buffer sizes and other things that are initialized at startup time for the node and not something that dynamically changes while it's in flight. What I will say is that my white whale is the fact that SEDA, which is the core scheduling architecture in Cassandra, is actually built to dynamically tune itself, which Cassandra does not do. So for me, there is a future in which Cassandra is a lot more self-tuning in terms of buffer sizing and resource allocation. But I don't think anything major has happened there in the 4.0 window.
Patrick McFadin: 57:08
Yeah. And I think that would be one of those features that will continuously evolve as we just... No one wants to use JMX anymore, so if you find yourself using JMX, maybe help fix that. But shameless plug, I'm going to do more shameless plugs for K8ssandra and operators: you should pretty much get into the mode of not having to manage your Cassandra cluster as much anymore. I just said that, mic drop.
Josh McKenzie: 57:40
Beautiful webpages to use, to tell you how it's all going, with shiny graphs and data.
Patrick McFadin: 57:47
I don't want to use the [inaudible 00:57:48] anymore. Well, kubectl one time, helm. Helm and kubectl. That's it, done. Aaron, did you have anything to add to this really insightful conversation?
Aaron Morton: 58:01
No, don't make the same mistakes that Patrick and I made. Don't install Cassandra by yourself, use Kubernetes.
Patrick McFadin: 58:17
Yep, we are. And I really appreciate the great... There are a lot of questions in there. We will continue to answer these questions, don't worry. We'll try to post the good questions for everyone, for the whole community. I think at this point we're going to wrap things up. This will be available, so if you want to share it with your friends, that's great. I want to thank Aaron for joining me across the Pacific, and I want to thank Josh for joining me across the United States. Go technology, we are a fully distributed system. I think we're 120 degrees off of each other, which is really cool.
Patrick McFadin: 59:02
But we have a lot of content about Cassandra on datastax.com/dev. If you want to sign up for more webinars, datastax.com/webinars. And of course Astra, if you want to use 4.0 and not have to run it: use it, don't run it. It's easier to rent, trust me. That's all I use anymore is Astra. So you can go to astra.datastax.com, sign up, and get a free tier, five gigabytes for life. And there's a lot of really cool stuff happening in the community. So get involved, join us over at cassandra.apache.org. And until we ship this thing, we will talk to you then. Thanks everyone.