
Episode 8

From DBA to SRE: 2021 Predictions for Data on Kubernetes with Patrick McFadin

With data comes DBAs and with Kubernetes comes SREs. Listen in as Patrick McFadin and Sam discuss what’s in store in 2021 for Data on Kubernetes, how experienced DBA roles can evolve into very effective SREs, and why today is THE day to learn Kubernetes.

Published December 23rd, 2020  |  31:00 Runtime


Episode Transcript

Patrick McFadin:
We don't have to make a lot of ceremony around the database. It's integrated into a total package, and when people are thinking of what they're going to deploy next, and quickly, they're thinking, here are all the parts, I can do this today, or I can finish this project in a couple of weeks and move quickly. And I have all the assurances, I feel very assured, that all these things are coming together and they will work for me in the future. And that's really cool to see. And when I see the bigger picture like that from somebody else, it's just eye-opening.

Sam Ramji:
Hi, I'm Sam Ramji and you're listening to Open Source Data. This week we welcome Patrick McFadin. Patrick started his professional career in the Navy doing digital communication, including moving physical data tapes from sonar arrays to minicomputers. While on a destroyer in the North Atlantic, he joined the "internet all the things" wave of the 1990s, and he was an Oracle DBA for over 15 years. He ended up as the chief evangelist for Apache Cassandra, and now he's head of developer relations at DataStax. Patrick, welcome to the show.

Patrick McFadin:
Thank you, Sam. That was a nice introduction.

Sam Ramji:
It's going to be a nice conversation because we get to talk about our favorite things, data in 2021, which can't help but be a better year than the last one. But data is getting better all the time. I'm looking forward to asking you a few good questions. You ready?

Patrick McFadin:
I'm already here, yeah. I mean, what could be bad about data, right? That's probably the best part of 2020.

Sam Ramji:
For sure. So my favorite question is what does open source data mean to you?

Patrick McFadin:
Open source data, and this is my interpretation, I know there are many. But I feel it goes to the core of "information needs to be free," and free isn't "free as in beer" but free as in freedom: you're free to do with it what you need. One of the reasons I got into open source databases is that I feel data is a commodity, an important part of the infrastructure of how the world works. Having it closed source just seemed like the antithesis of my philosophical beliefs. Knowing that there's a way open source can make data a part of our lives, that you can trust it, and that it's available to anyone is pretty critical.

Sam Ramji:
I love the freedom part of open source. It's easy to make it a little bit bloodless and clinical and treat it as just a way of sharing intellectual property, but that focus on the freedom of the user to not be harmed by decisions a provider might make in the future seems incredibly relevant as data becomes the way we run our companies and, kind of, run the world.

Patrick McFadin:
I started out in my database career using Oracle and it was clearly not an open source database but that had its moments, just like you said. And you get scarred over time with experience and you're like, "Wait a minute, someone made a change that I have no control over and now I'm bearing the consequences of it."

Sam Ramji:
Which is an interesting reflection on the 2000s, right? Because then you look at what Monty Widenius and Mårten Mickos and the team were doing at MySQL. They built such an extraordinary following around open source data and then reached a billion-dollar valuation. They were acquired by Sun, which was in turn acquired by Oracle. So the value of the data, and the need for a flexible and free flow of that data, got realized in open source relatively early on.

Patrick McFadin:
The first real open source database I used, too, was MySQL. I didn't need to see the source code all the time, but when there was something really critical, I knew I could go see it. There was just this psychological safety around that.

Sam Ramji:
Kind of looking at where we're going, you mentioned your focus on freedom as a component of open source. It turns out that freedom is very valuable, right? It creates a lot more adoption. When I was at Microsoft in the 2000s, we were also looking at acquiring MySQL, and I participated in an evaluation exercise with the SQL Server team. We ended up valuing MySQL at about $550 million. I'm not sure anybody has told that story before, but I was there on the team doing it, working with Mårten.

Patrick McFadin:
This is news to me.

Sam Ramji:
Yeah, so we put a value of about $550 million on it, which was how we understood the value of the adoption. But Jonathan at Sun ended up being able to put a billion-dollar valuation on it, and the value is defined by the market. Oracle then bought Sun, potentially largely for the value of MySQL. So all of those things ended up catalyzing the 2010s, which was an explosion of open source data. But I'm really interested in next year, which is kind of the beginning of the 2020s. One of the things that you've been working on a lot is how you do data on Kubernetes, and in particular, obviously, you know Cassandra really well. What do you think is going to show up in data on Kubernetes in 2021?

Patrick McFadin:
It's this conjuncture where things are coming together really nicely, and it's this whole concept of stateful workloads on Kubernetes. I think this has been a fairly new focus for the Kubernetes project over the past couple of years, and as infrastructure goes, things happen in that timeframe. But what's really fascinating to me about data on Kubernetes is how closely it aligns with what we want Kubernetes to be in the future, which is your complete application plane, not just the control plane, there from beginning to end. And minimizing the amount of toil that it takes to deploy applications: I should not spend as much effort in 2021 as I did in 2001 to deploy an application. When I look at data on Kubernetes, that's the big thing that's exciting, because I feel like that's what I've been hoping for most of my career. It's not a database; it's how do I work with data.

Sam Ramji:
Data is a thing that Kubernetes wasn't really designed for, and I've told this story a few times, so I won't belabor the point here, but Google is built on an incredibly elastic compute environment and an incredibly elastic data environment, all of which works really well together. When Kubernetes was brought to the world by Eric Brewer and Craig McLuckie and Joe Beda and others, it focused on taking the compute part, built on Borg, and turning it into something the world could use as open source, as Kubernetes, but the symmetric component of data never happened. So when you look at it, it's really a stateless environment. So what does it mean for state, for data on Kubernetes, to kind of come into its own next year?

Patrick McFadin:
This is a very parallel-universe kind of argument. I think this is also free as in freedom. You look at what Kubernetes has already done: it's breaking down barriers on clouds. You don't have to use walled gardens; you can rent your database, you can rent services. There's that freedom. If I want to run data services in my own environment, I'm free to do that. If I want to run it in my environment that's utilizing some sort of cloud infrastructure such as EC2, I can do that. If I just want to rent it, I can do that. The portability is there, and free as in freedom is really critical. We're back to that again.

Sam Ramji:
The state is still quite difficult, right? Because when we think about StatefulSets, there's a set of affordances that statelessness doesn't require. You can spontaneously restart clusters, you can just reboot the containers, and you can kind of ripple that across a cluster, or a cluster of clusters, without really worrying about much as long as you've got sticky sessions and reasonable load balancing. Yet for the true state that these things depend on, the affordances have to be a little bit different, right? You don't want to just spontaneously restart the database while it's doing a backup process, or while it's maybe taking care of tombstoning, or whatnot. So what are you seeing in the community about the challenges of managing StatefulSets, or where are the specific details of dealing with databases, data stores, and datasets in a Kubernetes-managed environment showing up?

Patrick McFadin:
Yeah, let's put on the scuba gear. This is where it gets into the inner workings of Kubernetes. You mentioned StatefulSets and DaemonSets and all of the things that go around them. Stateful applications require storage; that is where state is managed. Up to this point, it's been pretty loose in Kubernetes, and there have been some changes in Kubernetes releases in recent years along the lines of, all right, we want to pin storage to this node. So if you fire up a pod, let's say a Cassandra node, it has data attached to it, and that's pretty critical. That's not a random thing; you can't just mix and match. There are some cool companies doing stuff here. MayaData has OpenEBS.

Patrick McFadin:
I see what some of the cloud companies are now doing around local persistent volumes. It's beginning to acknowledge that this problem is first-class, because data needs to have that; you can't play fast and loose. And it's opening up a whole other class of storage: instead of just a general storage parameter for Kubernetes, we really are defining a data class of storage. There are vendors and there are open source projects out there directly addressing that problem, which I think is a really cool problem to solve.
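For readers who want to see what "pinning storage to a node" looks like in practice, here is a minimal Go sketch using client-go. The storage class name, node name, and disk path are made up for illustration; in a real cluster a provisioner such as OpenEBS would usually create these objects for you.

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Connect using the local kubeconfig (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// A storage class that waits until a pod is scheduled before binding,
	// so the volume ends up on the same node as the pod that consumes it.
	mode := storagev1.VolumeBindingWaitForFirstConsumer
	sc := &storagev1.StorageClass{
		ObjectMeta:        metav1.ObjectMeta{Name: "local-data"}, // hypothetical name
		Provisioner:       "kubernetes.io/no-provisioner",
		VolumeBindingMode: &mode,
	}
	if _, err := client.StorageV1().StorageClasses().Create(ctx, sc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// A local PersistentVolume pinned to one node via node affinity: this is
	// the "storage attached to this node" idea from the conversation.
	pv := &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "cassandra-data-node-1"},
		Spec: corev1.PersistentVolumeSpec{
			Capacity:    corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("100Gi")},
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				Local: &corev1.LocalVolumeSource{Path: "/mnt/disks/ssd1"}, // hypothetical disk path
			},
			StorageClassName: "local-data",
			NodeAffinity: &corev1.VolumeNodeAffinity{
				Required: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{{
						MatchExpressions: []corev1.NodeSelectorRequirement{{
							Key:      "kubernetes.io/hostname",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{"worker-1"}, // hypothetical node name
						}},
					}},
				},
			},
		},
	}
	if _, err := client.CoreV1().PersistentVolumes().Create(ctx, pv, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```

A Cassandra StatefulSet can then request the local-data class through its volumeClaimTemplates, so each pod stays with the node that holds its data.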

Sam Ramji:
And there's not a lot actually yet, right? So it seems like there's a coming explosion. You've got Vitess, their way of scaling out MySQL, which I think came from the YouTube team originally; you've got things like TiKV from PingCAP. Congratulations to PingCAP, they just raised about $270 million to expand their TiDB database. There's etcd, of course, at the core of every Kube cluster. But when you look at the explosion of different ways to manage data, all the NoSQL, all the SQL, all the graph databases, versus the ones that are really Kubernetes native, you see a pretty low ratio. Any thoughts about how that's going to change next year? Do you think that ratio is going to expand massively? Or do you think there are some things holding it back?

Patrick McFadin:
I think it will, but right now the ratio is heavily biased towards Ingress. You throw a rock and you can get 10 great Ingress implementations. But I think you're going to see it swing back towards stateful workloads and storage, the data storage itself, like physical storage. I predict that we're going to start seeing some really clever ways of doing storage in Kubernetes that maybe weren't available on a physical box. There are many ways you can hook up storage: you can use SAN, you can use NAS; we've got an established set of ways of connecting storage to a physical server. But Kubernetes, and this is where I see a lot of the interesting stuff, I mentioned OpenEBS as one good example of this mix and match, asks: okay, how does Kubernetes want to work? It's a very composable set of tools, and then we start layering on things. I think this is where 2021 is going to start showing some real progress, and where we'll all start using it: composing the right things. Like I said, a class of storage, a class of understanding in Kubernetes, that is for data itself.

Sam Ramji:
And there have been some interesting acquisitions and accelerations of companies at the end of 2020 in data on Kubernetes. Kasten.io was acquired by Veeam. You've got the Portworx acquisition by Pure Storage. You've got a Portworx alum, Eric Han, who's driving a Kubernetes approach to filers and NAS at NetApp. Some stuff that I've seen at Dell around NVMe over fiber. So the different ways you're finding to attach storage to compute are getting really interesting, and with the standardization of Kubernetes as your compute control plane, the actual technical ability to get this all targeted at one environment is getting a lot more tractable. The interesting thing about Kubernetes, though, that you've talked about a lot is: where you see Kubernetes, you'll find an SRE.

Patrick McFadin:
Right.

Sam Ramji:
And it's kind of a new term, right? The old TLA for people who did operations was DBA: the three-letter acronym for operations was the database administrator. And you're seeing DBAs become SREs, site reliability engineers, which is a role created by Benjamin Treynor at Google. I got a chance to talk with him when I was working there; I actually met him in my interview process in 2016. And he said SRE is what you would get if you asked a software engineer to take accountability for operations. In the intervening four years, that's become a huge trend. Everybody's talking about SRE; you talk about SRE. So what do you think is going to happen in 2021 in terms of the movement between DBAs and SREs?

Patrick McFadin:
Well, with data come DBAs, right, and with Kubernetes come SREs. And if we're going to put data on Kubernetes, then I think it's pretty logical to see where the DBA role will evolve. We'll see this migration, this real upskilling, of the folks that are focused on just making sure the database works, the what, into this SRE role, where they're thinking about the how. How do we deploy a data service? How does it interact with the rest of the stack? It's really a change in mentality, but it's bringing a lot of the same skill sets with some new definitions.

Sam Ramji:
And so what skills are missing to bridge someone who is a super capable DBA with decades of experience into being a very effective SRE?

Patrick McFadin:
I think what's missing, well, there are probably plenty of skills a DBA can level up, but it's thinking about it in terms of: all right, what does a Kubernetes deployment need? Well, observability is a key component of that. I know how to monitor my database, right, but do you know how to trace the calls from your database to your application? Do you see that interaction happening in an observable fashion? Can you deploy the entire stack as a unit, instead of just thinking, well, here's your port, developers, come and get it? That's really the difference: the how. Because a DBA isn't usually thinking in terms of how this fits into the larger scheme. I lived as a DBA for a long time; your job as a DBA is to block developers from doing bad things to your database. Well, that mentality is going to change.

Sam Ramji:
Yeah. And there's more around traceability; I love what you're saying about observability. I've heard some incredibly thoughtful and insightful comments from people like Charity Majors at Honeycomb, and Liz Fong-Jones is at Honeycomb now as well, as it turns out. Traceability, things like Jaeger and other ways to find out what's going on between every element of the distributed system. With databases being distributed systems themselves, as well as elements of the larger distributed system, observability with insight across all of these different nodes becomes super important. Do you see any technologies that you're excited about, or any changes in practice coming in the next year, that will become more popular and make this stuff work better?

Patrick McFadin:
I think Jaeger is one implementation, but the OpenTracing spec is, I think, where we're all getting to: tracing is important, so let's turn it into something a little more common. OpenTracing, I'm a big fan of it; I hope we can support it more, for sure. This is where I see databases becoming a part of Kubernetes: thinking about the observability stack as more than just an application. You mentioned Charity, and I've had these conversations about how databases have their own world of monitoring and things going on; that has to stop.

Patrick McFadin:
We can't think of this island of data and then everything else. With OpenTracing, some of the things I've seen, like in the Apache Cassandra project, it has hooks built into the database and, through the driver, they have worked with Zipkin. Well, Zipkin is moving to OpenTracing as well. I see this migration of everything going in that direction pretty solidly. And it's going to make the DBA very welcome in the SRE community as well, because they're going to have a lot of that big knowledge they can bring with them.
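As a rough illustration of the tracing pattern being described, here is a small Go sketch against the OpenTracing API. The span names, tags, and the executeQuery placeholder are assumptions for the example; a real deployment would register a Jaeger- or Zipkin-compatible tracer instead of relying on the default no-op tracer.

```go
package main

import (
	"github.com/opentracing/opentracing-go"
	"github.com/opentracing/opentracing-go/ext"
)

// traceQuery starts a child span for a database call under the application's
// request span, so one trace covers both the service call and the query.
func traceQuery(parent opentracing.Span, cql string) {
	span := opentracing.GlobalTracer().StartSpan(
		"cassandra.query", // hypothetical operation name
		opentracing.ChildOf(parent.Context()),
	)
	defer span.Finish()

	// Standard OpenTracing database tags, so the backend can group DB spans.
	ext.DBType.Set(span, "cassandra")
	ext.DBStatement.Set(span, cql)

	executeQuery(cql) // placeholder for the real driver call
}

func executeQuery(cql string) { /* driver work would happen here */ }

func main() {
	// Any OpenTracing-compatible tracer could be registered here; the default
	// no-op tracer keeps this sketch runnable without extra infrastructure.
	root := opentracing.GlobalTracer().StartSpan("handle-request")
	defer root.Finish()
	traceQuery(root, "SELECT * FROM users WHERE id = ?")
}
```

With a real tracer registered, the database span shows up in the same trace as the application request, which is the "no island of data" point above.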

Sam Ramji:
You were telling me about a couple of other day-to-day SRE tools that also have a Cassandra backend integration. One was Prometheus and the other one was Loki, which is part of the Grafana stack. Do you want to talk a little bit more about them?

Patrick McFadin:
Yeah, and Cortex. This is a funny thing where it's like, "Oh, this is turtles all the way down," but Prometheus, of course, is very popular as a monitoring tool; it uses Cortex, which is middleware into database engines. And that can use a lot of different databases, including Cassandra. So in some cases you may find that you're deploying a database to monitor your database. That's what I meant by turtles all the way down.

Sam Ramji:
In AI, and I'm a Lisp person... that's just called recursion. I do like good old concepts, but yes.

Patrick McFadin:
A recursion!

Patrick McFadin:
Yeah, eventually you're going to run out of databases, but it's fit for purpose. You may not need a Cassandra database to run your application, but you may need a Cassandra database to monitor it, and making that easy is really important. Loki, which is another one of those abstraction engines for Grafana, can use Cassandra as well as other databases. Using Cassandra as a backing store for your monitoring all of a sudden turns a lot of people into Cassandra users when they don't even know it.

Sam Ramji:
And those people are likely to be SREs. I've had an opportunity to talk with a really thoughtful engineering leader and practitioner, Tom Offermann, who you know well, at New Relic. And I was surprised to find out that despite how strong New Relic's general set of data technologies is, including NRDB, they also use Cassandra within the set of operations and SRE tools they've got.

Patrick McFadin:
Well, it's fit for purpose. Especially if you're going to deploy into a distributed environment, you want to have a database that matches up with it. It's a parity check: is it a distributed database? This is a distributed environment. One of the reasons I've really been focusing on running Cassandra on Kubernetes, and why data on Kubernetes is a big deal, is because these are starting to match now, even in the observability tier, which I think is really cool.

Sam Ramji:
Yeah. It was nice to see that kind of crossover, right? It suggests that you're doing the right thing. So speaking of that, as a champion of DBAs and SREs and Cassandra, you have been in the engine room for something called K8ssandra.

Patrick McFadin:
Yeah.

Sam Ramji:
Would you talk a little bit about K8ssandra and then would you talk a little bit about what you're trying to do for the SRE community with that particular distribution and how it's managed?

Patrick McFadin:
Yeah. So I'm thinking of the engine room like Scotty's, or the engine room where you're shoveling coal, probably both. "I need a little more power, I need more Cassandra."

Sam Ramji:
You're out of dilithium crystals, but I'm sure you can make more.

Patrick McFadin:
Yeah. So K8ssandra, I think, is one of those projects that just came to be as an evolution of a lot of other things that were happening. It was released a few weeks ago at KubeCon, but it was really built from an effort that was going on around the Cassandra Operator in the project. That was a critical first step for running Cassandra on Kubernetes.

Sam Ramji:
Well, let me pause you there. Talk a little bit about what an Operator is.

Patrick McFadin:
Okay. An Operator is a go-between for some sort of running server, like a database, and Kubernetes. It's based on the fact that this thing may not have been built for Kubernetes in the first place. What an Operator does, my favorite analogy, is it acts like a robot in your data center. When Kubernetes says, "I need more power," like an order going down to the engine room, it goes to the Operator, and the Operator knows how to translate it for the thing behind it. So in the case of Cassandra, when Kubernetes is like, "Oh, we don't have enough pods, we're not matching up with the desired state, we need to add more pods," instead of just randomly doing that, it tells the Operator, and the Operator is like, "I know what you're saying; let me do it the correct way so Cassandra will take it." So it's managing things like stateful workloads on top of the stateful services, like underlying storage. It's making a translation from one place to the other.
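To make the "robot in the data center" idea concrete, here is a minimal, hypothetical reconcile loop written in Go in the style most Kubernetes Operators use (controller-runtime). The CassandraClusterReconciler name, the fixed desired replica count, and the one-node-at-a-time policy are illustrative assumptions, not the actual cass-operator code.

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// CassandraClusterReconciler compares the state Kubernetes asked for with the
// StatefulSet that actually runs the pods, and changes only one replica per
// pass so the database can bootstrap or decommission nodes safely.
type CassandraClusterReconciler struct {
	client.Client
}

func (r *CassandraClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Look up the StatefulSet the operator manages for this cluster.
	var sts appsv1.StatefulSet
	if err := r.Get(ctx, req.NamespacedName, &sts); err != nil {
		if apierrors.IsNotFound(err) {
			return ctrl.Result{}, nil // nothing to manage yet
		}
		return ctrl.Result{}, err
	}

	desired := int32(5) // in a real operator this comes from the custom resource's spec
	current := int32(1)
	if sts.Spec.Replicas != nil {
		current = *sts.Spec.Replicas
	}

	switch {
	case current < desired:
		// Scale up one node at a time; each new Cassandra node has to bootstrap.
		next := current + 1
		sts.Spec.Replicas = &next
		return ctrl.Result{Requeue: true}, r.Update(ctx, &sts)
	case current > desired:
		// Scale down one node at a time; a real operator would first
		// decommission the departing node so its data is streamed away.
		next := current - 1
		sts.Spec.Replicas = &next
		return ctrl.Result{Requeue: true}, r.Update(ctx, &sts)
	}
	return ctrl.Result{}, nil
}
```

The important part is the shape: observe the actual state, compare it with the desired state, and make one safe, database-aware change per pass.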

Sam Ramji:
Right. So that's the Operator pattern, which was established by Alex Polvi and the team at CoreOS, and is obviously now part of Red Hat, right? How do you effectively manage foreign infrastructure, things that weren't built Kubernetes native, and make them work in a Kube environment?

Patrick McFadin:
There are a lot of Operators. There's OperatorHub, which has an Operator for just about any piece of software you'd want to run in your infrastructure now.

Sam Ramji:
What's specific about what you've done with the Cassandra Operator to make it more useful for SREs? This idea of automation: you talked about robotics in distributed systems, and we think a lot about automation, again, another word you hear a lot from Charity and the SRE and observability community.

Patrick McFadin:
First and foremost, every SRE must be infinitely lazy and yet work hard. Anybody who's involved in infrastructure knows that you try to automate yourself out of a job, to no avail; you will always have a job, because that's how you get scale. What we've put into the Cassandra Operator, what the Cassandra Operator has been trying to accomplish, is taking knowledge about how to scale up and scale down, how to do things like replacing bad and broken nodes, running maintenance operations, that sort of thing. But it's not just running the system; it's also monitoring the system. Being able to pull out metrics and present them to something like Prometheus as you're deploying Cassandra into an environment like this. With cass-operator, these are the things we're working on: how do you make it really useful in a Kubernetes cluster, not just running it but also monitoring it and maintaining it?

Sam Ramji:
So one of the things you've talked a bunch about is how K8ssandra is a distribution of a whole bunch of these technologies, but also of some practices, right? Practices that a skilled Cassandra SRE would put into production and automate themselves. So how do you take knowledge, which usually is institutional experience, conversations, things you talk about over Slack or IRC, maybe things in the docs, and automate it so that you can sleep while your infrastructure is trying to scale?

Patrick McFadin:
First thing: it's an infinite task, right? We will never get to a place where we're code complete, which is great. The thing about a project like K8ssandra, the reason I say it's a project to gather SRE knowledge, is because we're not changing the individual components. We are creating a project that helps you deploy everything as a package. You mentioned, how do you sleep at night? Well, knowing that whenever I deploy Cassandra in my Kubernetes environment with all my special stuff, maybe I don't use Prometheus, maybe I use something else like New Relic or Datadog, knowing that it deploys and it'll be fine is pretty critical. But how do we gather that knowledge? Right now what we're doing is using Helm charts to deploy things. And Helm charts are like the old package management: APT and deb packages, YUM and RPMs.

Patrick McFadin:
This is not a new concept. But if you think of Kubernetes like an operating system, an infrastructure operating system, Helm is like the installer for that. It combines things like how I back up my data and how I maintain it. And thinking about expanding that, there are other concepts in there. A really important one is how do I do networking: if I'm going to connect two clusters together, how do I know I get reliable networking that's secure? The worst thing we could have in our community of users is people building these one-off implementations and never sharing how they do it. That's what K8ssandra is there for: "I did this, I want to share."

Sam Ramji:
And how does the sharing work? Is there an artifact?

Patrick McFadin:
Well, there are a couple of ways that it happens. First of all, like I mentioned, there is an artifact with the Helm chart, and you can make modifications to the Helm chart, but you can also add components into the K8ssandra project. There are different operators that work together, but the Helm chart is the main way we're doing it right now. So when you say helm install for a K8ssandra cluster, you get all of the things you need, and you can define it. There are some declarative parts of it, like, "Oh, I don't need backups." I hope you're only doing that in dev. Or there's a certain process I want to run in a different way. That's fine. But right now it's just helm install, and you should be able to get what you need.
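As a sketch of what that "helm install" step can look like when driven from code, here is a hypothetical Go snippet using the Helm v3 SDK. The chart path, namespace, release name, and the value keys used to turn features off are placeholders, not the real K8ssandra chart values.

```go
package main

import (
	"log"
	"os"

	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/chart/loader"
	"helm.sh/helm/v3/pkg/cli"
)

func main() {
	settings := cli.New()

	// Wire the Helm action machinery to the current kubeconfig and a namespace.
	cfg := new(action.Configuration)
	if err := cfg.Init(settings.RESTClientGetter(), "data", os.Getenv("HELM_DRIVER"), log.Printf); err != nil {
		log.Fatal(err)
	}

	// Load a chart from a local directory (a chart repository lookup also works).
	chart, err := loader.Load("./charts/k8ssandra") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}

	install := action.NewInstall(cfg)
	install.ReleaseName = "k8ssandra"
	install.Namespace = "data"

	// Declarative overrides, equivalent to --set on the command line.
	// The keys below are placeholders, not the chart's actual value names.
	values := map[string]interface{}{
		"backups": map[string]interface{}{
			"enabled": false, // "I don't need backups" (hopefully only in dev)
		},
	}

	release, err := install.Run(chart, values)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("installed release %q, revision %d", release.Name, release.Version)
}
```

The command-line equivalent is simply helm install with a values file, which is the experience described above.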

Sam Ramji:
So the institutional knowledge then is more like infrastructure as code, where you can put in your opinion, contribute your point of view, share it, and it'll be part of what you're doing under your Helm 3 installation. Is that right?

Patrick McFadin:
Yeah, I think that's a good way to characterize it. And when we start moving away from Helm into other ways of deploying, it'll be the same concept: here's a selection of things that are the best practice, right? This is the best practice in code. And if it doesn't work for you, that's the feedback mechanism: if there is some concept that didn't work, like, "Oh, I don't use this storage paradigm, I use something very different," well, it gives you an opportunity to feed that back.

Sam Ramji:
Let's pop out a couple of layers of abstraction, right? That sounds like a pattern that pretty much any stateful technology on Kubernetes could use: deploy through a Helm chart, represent institutional knowledge as infrastructure as code, allow a few different points of view. How do you think this pattern is going to shift in 2021? Is it going to make things a lot easier for open source data infrastructure? Where do you think it's going to connect cloud and OSS?

Patrick McFadin:
One of the best interactions I had on this, where it just kind of clicked for me: there's the L8ist Sh9y podcast, and I was on that. Rob, the guy I was talking with, all of a sudden just stopped everything and said, "This is my Nirvana." I'm like, "What are you talking about?" He's like, "This is like installing things in Windows; we're finally getting to this point with all of my infrastructure, if we're here." He was just super excited about that, and I'm just thinking about the bits, like, "Oh, here we can run repairs, or we can do things that are inside of Cassandra." But he's really thinking about the big picture. I think that's 2021: the light bulbs start to click and people start seeing it, like, "Oh, deploying these things I need to run my application is like double-clicking an installer file, like installing Chrome on my desktop or something." And I'm just like, "Wow, what is that going to create in 2021, when people just feel like everything is that way?"

Sam Ramji:
You may have been digging a new groove with the particular use case of Cassandra that represents what people are going to be able to do with distributed infrastructure: using Helm charts almost as a KPM, a Kubernetes package manager, so you can have a consistent installation process across a distributed infrastructure for things that want to be distributed, like distributed databases.

Patrick McFadin:
Yeah. And it takes that point of view where you're just like, "Wow, that was pretty much a quantum shift in my brain." Like, whoa, here I was hyper-focused, but now, like I said, this is kind of a dream realized: we don't have to put a lot of ceremony around the database, around data being hard. It's integrated into a total package, and when people are thinking of what they're going to deploy next, and quickly, they're thinking, here are all the parts, I can do this today, or I can finish this project in a couple of weeks and move quickly. And I have all the assurances, I feel very assured, that all these things are coming together and they will work for me in the future, not trading things off. I'm not going to get ruined in three weeks when it scales out and I can't keep up with it. And that's really cool to see. And when I see the bigger picture like that from somebody else, it's just eye-opening.

Sam Ramji:
We are coming to the end of our time, so I'd like to ask you, as I ask everybody on this show, to offer one resource or piece of advice that you'd give to anybody who's interested in what you've been talking about.

Patrick McFadin:
One piece of advice: if you're someone in infrastructure, or someone like me who maybe has a deeper background as a DBA or even in database development, today's the day to go learn how to run Kubernetes from the command line. Learn how to use kubectl: install it, run it, deploy something simple. You may not do that in the future, you'd probably use a service to deploy your Kubernetes, but today is the day to understand how it works, because that train is coming and it's going to be great, but don't get left behind.

Sam Ramji:
Yeah. So go grab Kelsey Hightower's "Kubernetes The Hard Way".

Patrick McFadin:
If you're feeling like that, sure. Yeah, that's a great one.

Sam Ramji:
What folks might not know is that Patrick runs his house on Kubernetes. No joke, definitely ask him questions. Maybe ping him on Twitter. Patrick, it's been a pleasure. Really appreciate you spending the time on the podcast from everybody at Open Source Data to everybody who's listening. We wish you peace, happiness and good health through the holiday season and we wish you a fantastic 2021. Thank you so much.

Patrick McFadin:
Thanks Sam.

Narrator:
Thank you so much for tuning in to today's episode of the Open Source Data podcast, hosted by DataStax Chief Strategy Officer Sam Ramji. We're privileged and excited to feature many more guests who will share their perspectives on the future of software, so please stay tuned. If you haven't already done so, subscribe to this series to be notified when a new conversation is released, and feel free to drop us any questions or feedback at opensourcedata@datastax.com.