Episode 4

Data, Kubernetes, and Our Best Selves with Google’s Kelsey Hightower

Inspire, collaborate, and solve together. Principal Engineer Kelsey Hightower joins Sam Ramji to discuss the future of data and Kubernetes, and what it means to participate in a welcoming developer community while igniting positive growth.

Published October 29th, 2020

Episode Guest

Kelsey Hightower

Principal Engineer at Google Cloud Platform

Episode Transcript

Kelsey Hightower: The weird thing is most people find themselves in a situation where they've collected all of this wonderful data and it gets trapped. Maybe it's a database that's not flexible, maybe it's some other area where it just goes and it disappears from the rest of the organization. And then you have those companies guessing, guessing what kind of shoes that people like to buy, even though they work at one of the largest maybe shoe manufacturers in the world because that data's hidden. Then people have to resort to these kind of weird measures to answer questions that they can probably answer if they were able to tap into the data that they already have.

Sam Ramji: I am thrilled today to introduce someone who needs no introduction, Kelsey Hightower, and invite him to a conversation about open source data. I've spent a little bit of time thinking about this lately: the more open the open source, the more open the data can be and the more it can flow, which gives us an opportunity to create the future together. So Kelsey, welcome.

Kelsey Hightower: Awesome to be here.

Sam Ramji: It's been a few years since I got to work with you regularly, when we were hanging out together at Google, I got to see you in a lot of roles teaching a lot of people about Kubernetes and about how to use these things effectively at scale. You blew a lot of minds with your demos. I think the worst gig I ever had was going after you at Redis Conference because it just reminds you you're on stage and it was the first time that you had taught-

Kelsey Hightower: The Google Assistant.

Sam Ramji: It was the Google Assistant. You taught the Google Assistant to listen to your commands and deploy Redis clusters at scale on GKE, is that right?

Kelsey Hightower: Yep. And we actually gave Redis a voice and we had it doing things and backing up data and moving things around. I brought the A game there and I looked at you like, "You're next."

Sam Ramji: It was so brutal because the end of your demo is Redis saying, "That's extra dope." And I thought, "Oh my gosh, I'm going to go onstage and just sit down with a VC and we're going to talk at developers, they're going to be so bored." And I think I was right. Word to the wise, don't go after Kelsey Hightower.

Sam Ramji: You've worn so many hats. One of the things that most people don't know is that you had the courage to step away from all of the touring and teaching that you were doing, kind of opening up the Kubernetes ecosystem, and you came out and hung out with me and with Melody Meckfessel and a few other amazing folks like Michael Winser at Google in New York for a summit we pulled together for Product and Engineering for DevOps.

Sam Ramji: And one of the things that was born out of that was something that you drove, which was Tekton, among other things. That's just kind of an example of you taking your knowledge, trying to make the world better. And it's really deploying the use of the thing rather than the thing itself. I think a lot of people know you for Kubernetes, but it's really kind of these flows of, like, how do you create processes that enable people to work well together? I'm super curious, like, what hats are you wearing right now, and which are you having the most fun with in your polymathic life?

Kelsey Hightower: That's interesting because there's a lot of talk in tech, especially around career development. And just for a little context for people, Sam Ramji was the VP of our organization, in addition to product, when he was at Google. And I remember this kind of field trip was very important for me because I got to see firsthand how the product org was run at the leadership level. How head count is allocated, how projects are green-lighted and justified, and just kind of this stuff that you don't typically see behind the scenes.

Kelsey Hightower: And it taught me a bit about how to approach things at that level and it forged a couple of partnerships. So when you asked the question about how do you kind of motivate and inspire people? Well, at Google firsthand, you don't really get to manage everyone that you have to influence. You're going to have to persuade them and say, "Hey, here's a vision of what the world could be. You definitely have the talent and skillset, but here's what we got to do to get there."

Kelsey Hightower: And I think that's one of the things that I had to learn and grow into. Google was not about how fast or how good the code is that you can write by yourself. It's not about managing just one team. Some things actually take an organization to get done and I think part of that was just learning how to be human first. I can remember a couple of meetings we were in where you asked people to be present, put your phone down, kind of sit right here in this moment.

Kelsey Hightower: And I remember distinctly you used to give people an out. You used to say, "Hey, if you've got stuff going on, you got emails, maybe there's something you got to go take care of, go ahead and step away and take care of that." But for the people that choose to remain, let's try to be present and focus on this conversation to make sure we get the best out of it.

Kelsey Hightower: And even before that, I think I've always taken a people first approach. So whatever problem I'm trying to solve, you try to get at the very core of what people actually care about and what motivates them. And it just changes the energy in the room and I've seen the best results come from people who feel that way.

Sam Ramji: One thing that we can all be in love with is humans and their potential to solve awesome problems when they're really truly connected. When we're not turning each other into things, if we keep connected to each other as people and realize that there's magic in it, then the magic can happen.

Sam Ramji: I always think about when you're othering people, it's a little bit like Pac-Man when the ghosts would all bounce off of each other. Don't turn people into ghosts who bounce off of each other. Figure out how to make people want to blend their best selves together so that they can all make this really cool thing that we just imagined together actually real.

Kelsey Hightower: Yeah. You got to create a situation that requires the very best of that person and they seem to step into it.

Sam Ramji: People live up into the frame of the better expectations. Is there anything that you want people to live up into right now? Is there anything that you can talk about, structures or architectures or projects or problems, that you inspire groups to solve together?

Kelsey Hightower: Yeah. I guess the topic is around data.

Sam Ramji: And we tend to think about data.

Kelsey Hightower: Yeah. The important thing about data is it's one of the areas where most people collaborate without understanding that that's what they're doing. If you're a front end engineer, you're collecting and presenting forms to collect data. If you're a backend engineer, typically you're waiting on some data to arrive so you can process it, manipulate it, transform it, and then store it somewhere else.

Kelsey Hightower: If you're in finance, maybe marketing, working on maybe the back office, then you're taking that data and either making business decisions or making some decision based on that particular data. This data is kind of that lifeblood of the business that flows through all of these systems that we're building. And the weird thing is most people find themselves in a situation where they've collected all of this wonderful data and it gets trapped.

Kelsey Hightower: Maybe it's a database that's not flexible. Maybe it's some other area where it just goes and it disappears from the rest of the organization. We start off in a very collaborative system and then that data just piles up behind this hidden door that no one ever can access again. And then you have those companies guessing, guessing what kind of shoes that people like to buy even though they work at one of the largest maybe shoe manufacturers in the world, because that data is hidden. Then people have to resort to these kind of weird measures to answer questions that they can probably answer if they were able to tap into the data that they already have.

Sam Ramji: It's interesting microservices kind of showed up as such a great way to get teams to be at high velocity. And to some degree sort of Node backed microservices were one of the things that drove Docker containers to be popular because Node was easy to program against, but so unreliable that you needed to restart it a lot. And if it was in a Docker container, then that was all pretty trivial.

Sam Ramji: And then you get Kubernetes saying, "Here's basically your pod architecture, here are your nodes." And then we also coupled the technical architecture with the belief that the team who's building the microservice, the two-pizza team, should have no constraints applied to them; they can use anything that they want. Well, it's great, now we have in a given enterprise thousands of microservices, but it also means we've got thousands of data stores that don't talk to each other.

Sam Ramji: Just like you said, the data is there, but it's hidden. I think if we kind of take a different view of what we've done in the last 10 years with microservices, we would maybe say that they were the beginning of network data in the enterprise. And that services at the core with data microservices everywhere is the big architectural change. Going forward, microservices is really about the future of data.

Kelsey Hightower: I think a lot of people may forget the fundamentals. The fundamentals were: programs are algorithms and data structures. That's pretty much it. The architecture you adopt to collect that data and to perform or apply those algorithms, that's kind of secondary. I think a lot of times what we will say in tech is, "Focus on the business problem."

Kelsey Hightower: And I think what people are really getting to there is the data typically represents the business problem at its very core. And when we move away from that and we start talking about microservices and Prometheus and Kubernetes, even in those systems they have a lot of data. Whether that's the configuration, whether that's the specification of the security policy, all of this stuff now is also represented as data, but people don't think of it that way.

Kelsey Hightower: They think of it as do step one, do step two. But if you convert your thinking to this data model, we're saying these days, infrastructure as data. Because when we said infrastructure as code, people started having battles over the best programming language: should I write this in Ruby? Should it be Python? Should it be in Pulumi? Should it be in Terraform, HCL?

Kelsey Hightower: That front end is cool because that's the algorithm piece of how we manipulate the data, but the data is still important, the data structure. And if you think about the data structure that's available in the systems now, they're at a point where we can actually articulate how we want these systems to behave and then fire it through a pipeline. The same data pipelines we do for business data, you can do at this layer as well. And if you treat it as data, then it can be manipulated by other tools. It can be queried. It can be analyzed. It's hard to analyze and query code, but with data you can do those things.
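To make the "infrastructure as data" point concrete, here is a minimal sketch, not something shown in the episode. It assumes a made-up list of desired-state records (the field names `kind`, `name`, `replicas`, and `public` are purely illustrative, not any real tool's schema) and shows that once the description is plain data, ordinary code can query and analyze it.

```python
# Hypothetical desired-state records. The schema is invented for illustration;
# it is not the format of Kubernetes, Terraform, or any other real tool.
desired_state = [
    {"kind": "Deployment", "name": "web", "replicas": 3, "public": True},
    {"kind": "Deployment", "name": "billing", "replicas": 2, "public": False},
    {"kind": "Database", "name": "orders-db", "replicas": 3, "public": True},
]


def query(resources, **criteria):
    """Return the resources whose fields equal every given criterion."""
    return [
        r for r in resources
        if all(r.get(field) == value for field, value in criteria.items())
    ]


def publicly_exposed_databases(resources):
    """A simple policy check: names of databases marked as public."""
    return [r["name"] for r in query(resources, kind="Database", public=True)]


# Analysis is just another pass over the same data.
total_replicas = sum(r["replicas"] for r in desired_state)
```

The same records could be fed to a linter, a cost report, or a deployment pipeline without any of them needing to parse a program.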

Sam Ramji: Such a profound perspective because eventually code will be self-generating. We already have some self generating code now and when you think about what's the structure of a query? What's happening when you are typing a search into the Google search bar and you're getting these outstanding auto-complete results. There's a lot of agents that are building code on the fly, but it's against a structure of data.

Sam Ramji: Maybe it's graph-shaped data. Maybe there's some other shape for it, but the data is something that we can always generate other things from, but it's hard to generate or manage the data better than the format and the process that you've chosen.

Kelsey Hightower: That's exactly right. The schema is important because sometimes you can translate and write a new program, and as long as it manages that data structure the same way, we tend to say that's a compatible program.

Sam Ramji: How do you think about the future of data against Kubernetes? One of the things that we've learned is Kubernetes is great for stateless applications, really kind of the top half of the Borg API. And a bunch of us are trying to take data workloads and make those play well. They're not yet first-class citizens. Hopefully we can get them there using what the CoreOS folks came up with, the operator pattern.

Sam Ramji: You're starting to see a lot of Kube operators for data. You and I have talked about this in the past. Are you building a day one or day two operator? Like hopefully at a certain point everybody is building these day two operators for data. But is that as far as we're going to be able to take data in Kubernetes?

Kelsey Hightower: It's funny you use those words, that databases like Kafka and Postgres are not first-class citizens. And when you think about citizenship, if Kubernetes is I guess the country in this analogy, in order to be a first-class citizen there's going to be a few rules to citizenship, a few things you have to do to kind of get that approval. Learn the language, understand the society, its laws, its rules.

Kelsey Hightower: And with that, once you learn those things then you'll be able to live in that particular society. And then you can say, "Hey, I'm a citizen." Where the databases and the current offerings, not all of them, but where many of them come from is they come from a different world. And when they come over, maybe they don't want to learn the language. Maybe they're not familiar with the rules, so then we tried to build things around the particular databases.

Kelsey Hightower: We try to augment them with, like, control loops, AKA operators. Trying to fill in the gaps, trying to translate where things aren't clear. And what I think needs to happen with those database products, these stateful workloads, there's two pieces here. One, mainly we're talking about the operation of these things.

Kelsey Hightower: Because if you think about what Kubernetes does, and most people get overly confused with this: if I took a database product and ran it on three virtual machines, typically what most people will do is they'll have some form of local storage. If they need to go super fast, maybe SSD, because network-attached storage may be too slow. They install it, set the right kernel flags, make sure that it's up and running, and you have those IP addresses, off you go. Configure your clients, there you go.

Kelsey Hightower: In the Kubernetes world, you can actually get very close to doing exactly the same thing. You can turn off a lot of the dynamic parts of Kubernetes if you just want to run a database workload. You can say, "Hey, Kubernetes, I want you to run this workload only on this virtual machine. I want you to use the IP address of that virtual machine just like I did before." Still runs on Linux and you can get almost 80% of what you were doing before.
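As a rough illustration of what Kelsey describes here (not code from the episode), a Pod can be pinned to one machine and told to share that machine's network. The sketch below builds such a manifest as a Python dictionary and serializes it to JSON, which `kubectl apply -f` accepts alongside YAML. The `nodeSelector` and `hostNetwork` fields and the `kubernetes.io/hostname` node label are standard Kubernetes; the helper name, pod name, image tag, and node name are assumptions chosen for the example.

```python
import json


def pinned_database_pod(name, image, node_name):
    """Build a Pod manifest that runs only on one node and uses that node's IP.

    Hypothetical helper for illustration; the values passed in are examples.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Schedule only onto the node carrying this well-known hostname label.
            "nodeSelector": {"kubernetes.io/hostname": node_name},
            # Share the node's network namespace, so the Pod is reachable
            # at the virtual machine's own IP address.
            "hostNetwork": True,
            "containers": [{"name": name, "image": image}],
        },
    }


# Example values only: "db-vm-1" stands in for whatever your node is called.
manifest = pinned_database_pod("postgres-0", "postgres:16", "db-vm-1")
manifest_json = json.dumps(manifest, indent=2)
```

This deliberately gives up rescheduling: if that machine goes away, the Pod does not move, which matches the "just like I did before" trade-off described above.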

Kelsey Hightower: You don't have to have Kubernetes manage your data volume. You can still attach that data volume out of band like you used to do on your old virtual machines or bare metal machines. You don't have to change everything just because you entered the Kubernetes world. And now there's another piece where people say, "Well, what if I do want to take advantage of moving my application in case the app dies? Remounting the piece of storage in case the machine dies?" That's on the other side of that. That's where I think the database products have to meet Kubernetes halfway.

Kelsey Hightower: And I'll list two things there. The first is service discovery. If you're going to have multiple nodes in your database product or cluster, that thing should be able to say, "Hey Kubernetes, where are my members? Where are my peers?" If they ever get restarted or they ever go away and come back up, Kubernetes can tell me what the new IP address is of that particular machine. And the last thing is you have to just build a database product that assumes that it can go away at any time.

Kelsey Hightower: You have to think about native replication. You have to think about what happens if I get put on a new machine, can I recover? Whether that's load the write-ahead log, or read your old node ID from a disk and then transfer over data from another machine that is caught up. You have to think about those things and meet Kubernetes halfway.

Sam Ramji: And those are the things that make it really tricky to try to do a hybrid old world, new world approach. If you want to write to a storage volume and you're not going to use the Kubernetes system for doing that, then it's kind of caveat emptor for your ability to do replication, backup and restore, which almost becomes an atomically necessary operation if you're trying to do distributed databases in a distributed environment like Kubernetes.

Sam Ramji: I really like what some companies and projects are doing, like OpenEBS, and MayaData, which is the company behind that, and Kasten. They're looking at a very application-aware approach: how do I understand the exploded application architecture that a Kube app really has? And then reify that as appropriate for how you write to something that feels like whatever your block storage is going to be, and that remembers all that when you replicate, when you back up, when you restore, when you come back up for air.
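One common way a database can ask "where are my peers?" is through DNS. When a StatefulSet is paired with a headless Service, each Pod gets a stable name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>`, and the name keeps resolving to the Pod's current IP after a restart. The sketch below is an illustration under that assumption, not anything presented in the episode; the function names are hypothetical, and `cluster.local` is only the default cluster domain.

```python
import socket


def peer_dns_names(statefulset, service, namespace, replicas,
                   cluster_domain="cluster.local"):
    """List the stable DNS names Kubernetes gives each StatefulSet Pod."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def discover_peers(names, resolve=socket.gethostbyname):
    """Resolve each peer name to its current IP, skipping peers that are down.

    `resolve` is injectable so the lookup can be replaced outside a cluster.
    """
    peers = {}
    for name in names:
        try:
            peers[name] = resolve(name)
        except OSError:
            # socket.gaierror is a subclass of OSError: the peer has no
            # DNS record right now, for example while it is restarting.
            continue
    return peers
```

A database node would call `discover_peers` again whenever a connection to a peer fails, so it picks up the new address instead of caching an old one.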

Sam Ramji: Arrikto is also a company that's doing some interesting stuff around dealing with different classes of storage. Knowing if you have NVMe on the cloud environment that you've got or SSD or there's a lot of emerging stuff I think that can give us some hope for a more coherent, lower level data plane for Kubernetes, but still a long way to go for them. And I think for all of us who are doing databases and trying to make it work in the new world.

Kelsey Hightower: Yeah. And if you're listening to this and you're like, "Wow, that sounds like a lot." Here's a tip for you. You can create three machines inside of your database or Kubernetes cluster and you can just mount the storage the way you managed it before. And when the database comes up, you can tell it to just use that storage.

Kelsey Hightower: And yeah it's not fancy. It's not super automated, but my guess is most databases would have a challenge if you move them around half the day. They need to be fairly static once they land. I think there's going to be a future where we can automate a lot of this provisioning of the underlying resources. But until then, I think you can get real far by automating 90% of it. And then relying on your traditional tools for that other 10%.

Sam Ramji: This is where I'm kind of excited about Cassandra and Kubernetes because of the sort of masterless architecture and sort of peer-to-peer replication and coordination in there. There's a lot more work to do, but just kind of figuring out how can you complete the promise of elastic scalability for an application? How would you provide elastic scalability of data, knowing that you do have this physics issue of moving the data around and getting the application done?

Kelsey Hightower: Yeah. I think we're going to learn a lot of lessons from the past. I remember when I first got into system administration, we used to put the database, the web app and something like Apache on the same server. You just bought this big server and you put all three on the server, and then we started to have performance issues. And everyone can answer this question quickly without thought: which piece do you move first? The database.

Kelsey Hightower: And we moved the database and we gave it dedicated networking, over-provisioned the storage, because you know what? It was more important that the database had the resources it needs no matter what happens. And when I look at the Kubernetes community, it seems like we may be forgetting some of these lessons, where I'm watching people try to mix and co-mingle applications and data services on the same machines under the umbrella of, "Oh, Kubernetes makes it really easy to do this."

Kelsey Hightower: I was like, "Well, look, the laws of physics still exist. The best practices still exist." You can still tell Kubernetes to keep those things separate within the same cluster just like we used to do. And I want to remind people that the fundamentals haven't really changed. When you're looking at this, don't necessarily believe that just because you rubbed a little bit of Kubernetes on top of your database, it is now free of those laws of physics that sometimes require dedicated resources and things like latency and all of those things to work properly.

Sam Ramji: That's awesome. Yeah. There's always kind of that new technology euphoria. Right, rub a little Kubernetes on and everything will be transformed. And then you remember, oh right, there is actually physics and gravity again. One conversation we had a few months ago that I've been thinking about ever since, and quoted you to a few people on, is kind of a transformation of our expectations of open source.

Sam Ramji: And I think Kubernetes puts a fine point on it. You had said that there's been kind of an evolution from open source as just code to where we are today, where you really want it as a service, and you had some steps in between. You were like, "Hey, it's got to be code. Then you need doc and you need a logo, then you need a foundation. But really I just want to be able to use it." I don't know if that's still a conversation that's alive for you, but it was just super insightful and it's made me think about things differently.

Kelsey Hightower: Yeah. I remember that conversation because I remember the first time in my lifetime, I interacted with an open source project through a free service before I downloaded it and installed it myself and that software or that project is Let's Encrypt, which I think now is producing hundreds of millions of SSL certificates on the web today.

Kelsey Hightower: And there's an open source project that backs it. You can download it and run it yourself and configure your own CA for your own needs. But it was also set up in parallel with this foundation with healthy funding, nonprofit, and their goal was to secure the internet. They got great backers behind them and they focused on experience and UI. How do we make this absolutely painless?

Kelsey Hightower: And it does such a good job where it's just there. If you go to Cloudflare, Let's Encrypt is just there; you go to your favorite cloud provider, it's just there. And even the open source projects like Apache tend to have either these great instructions or there's a native way to integrate with something like Let's Encrypt. And I think that's going to be something that... imagine a world where you go to the Postgres website and you can definitely download the latest and greatest release of Postgres.

Kelsey Hightower: And just to be clear, I always want the ability to grab the raw source code and compile that binary myself. I always want the ability to download that pure binary that's already been compiled and just run it myself. But I would also like a new option, which would be visit postgres.org and just create an account and start interacting with this open source database as a service. And I think that's something we just have to evolve to at some point.

Sam Ramji: Yeah, it's going to be a basic expectation for everybody. They're going to use code that's open-source licensed. They can get support from anywhere and they can buy what they need, adding the basic expectation that it will be hosted by default, and then be able to get some utility without having to become an expert operator. That'll be the new normal, to be able to do that for data. That would set a lot of people free.

Kelsey Hightower: Yeah. Universal data planes, where you can just write your code, that code could run anywhere. And you just know that your data is just going to be available via some end point. I think that'd just be a major game changer for the way we think about building things.

Sam Ramji: Well, we are coming to the end of the time. I have a couple of questions that I want to close with. One, I'll embarrass you with one of my favorite tweets that I've seen recently. Someone said, "Hey, it's Kelsey's birthday, let's celebrate. Tweet back something that you've learned from him."

Sam Ramji: It was kind of a shockingly giant Twitter thread. You've taught a lot of folks a lot of things. What would you go back and teach yourself if you could talk to, like, the 24-year-old Kelsey? What advice would you give that person now if you could?

Kelsey Hightower: I think I would have given myself the advice of learn to see the world for what it truly is. And that would have helped me temper my expectations. Software isn't going to save me, the next hot technology isn't going to save me. Changing my identity to align with a particular employer, project, brand, a logo isn't as important as me understanding who I am independent of those things.

Kelsey Hightower: And I would advise myself to just be super patient, to understand, like, why do I think the way I think? Is it the TV that I'm watching? Is it the books that I've read? Is it the ambitions that I have? And learn how to separate those. And to make sure that I can actually be who I want to be versus who I am shaped to be by the things around me. That would have been the number one advice, because once I learned that, I felt like I could actually do anything. I felt like I had the confidence to navigate and flow through situations in life, no matter how difficult, in a consistent and sane way. That would be the thing that I would go back and tell 24-year-old Kelsey.

Sam Ramji: That's awesome. It's such a good answer. I kind of want to stop the interview there really. Is there one resource or one thing that you'd point developers to? "Go check out this URL, go take a look at this particular page, this will help you today or it'll help you in the near future."

Kelsey Hightower: I don't know if I have a particular URL, but there's one thing I think will help anyone that's coming into our industry for the first time. Because if you look at the entire space, you will just become overwhelmed, or you'll have this urge to learn it all in order to be adequate at what you do. And I think the best thing you can possibly do, and in some cases what worked for me, was starting to really break down the fundamentals. Like programming: there's algorithms and data structures. Networking, and how that works at the low levels.

Kelsey Hightower: Because if you understand how networks work fundamentally, then you can actually start to break down these buzzwords like service mesh. If you understand what it's like to be a collaborative person, figure out what your role is and how to help other people be successful. Again, you'll be able to decompose things like agile and scrum and DevOps and all these names we try to assign to the basic fundamentals.

Kelsey Hightower: I think the thing you can do is any technology you have, don't spend too much time trying to figure out if you're working with the best programming language, don't try to figure out if you're working with the best load balancer. The thing you can do though is figure out can I learn the fundamentals with the thing that I have in front of me? And once I learned those fundamentals, what you'll find is that those things transfer to the next language that maybe you switched jobs or the next load balancer that the team decides to adopt.

Kelsey Hightower: You will always have those fundamentals that will help you make great technology decisions, and know when to walk back those decisions because the fundamentals no longer align with what you're trying to do.

Sam Ramji: That is awesome. That kind of call for intellectual courage to get down to the bottom of it without using the buzzwords is going to set a lot of people free. Well, Kelsey, it is always a privilege to get to talk with you. This is no exception. I'm super grateful that you spent the time today with all the things that you could be doing hanging out here and talking with us.

Kelsey Hightower: Awesome. Thanks for having me.

Patrick McFadin: Hi everyone. This is Patrick McFadin. I work in Developer Relations here at DataStax, also a long time Cassandra advocate. I've been working with the project for a long time. It was really neat listening to Sam and Kelsey have this conversation like a couple of old friends, but people who have seen things and I just really appreciate that point of view. There was a couple of things that I wanted to point out in there that struck me as I was listening to it and I just wanted to highlight these.

Patrick McFadin: First of all, it was this concept, and I'll probably work backwards here, but it was that concept of going to an open source project and using it right off of the website. I think for things like Let's Encrypt, that's a lot easier, as there's less of an impact on actual infrastructure. And this is probably one of the things that we worry about most in open source: there's free as in beer and free as in freedom, and that free as in beer thing can get in your way pretty quickly.

Patrick McFadin: It's hard to finance certain activities and finding sponsors to do those can be very helpful. And I think this is kind of the key. If you're Let's Encrypt, and I've used it and it's wonderful, it does have an infrastructure cost, but it's being sponsored by various parties and they keep that infrastructure cost covered.

Patrick McFadin: I feel like if you're talking about something like Cassandra or even Postgres or MySQL, what have you. Just being able to go there and try it out is going to be difficult unless there's going to be major sponsors. And it's been a really difficult part of open-source is finding where an open source project is okay with the major sponsor. Knowing that that's available.

Patrick McFadin: I think this is just where foundations are going to have to work together with major sponsors for the bigger the infrastructure we're going to need. It's just my quick thought on that and it's fascinating as open source does become more a delivering thing on cloud. You're starting to see those things happen already.

Patrick McFadin: For instance, DataStax already has a free forever tier of Cassandra and maybe that's what it is in the world, that's an open question really. I think there was some really good discussion about, like, how do you get started? And being in developer relations, I get this question a lot too, but I would challenge anyone that's listening to really think about your getting-started experience and then just use that to pay it forward. Knowing that, and part of this is going to be my pitch for community, you had to get through something in your life and you learned a technology.

Patrick McFadin: It would be really great if you could take that experience. And I'm just saying this generally, and find a way to help somebody today, maybe go answer their question. LinkedIn has this mentor thing, which is pretty cool. Going onto a discussion board and finding someone who's just getting started and maybe getting them into a discussion about like, "Well, here's how things are, here are some ways that I did it."

Patrick McFadin: I think this is what it's about in our technology communities: helping each other. I didn't get here by myself, I will not go out by myself. And I think that's really what everyone should think about whenever they think about our communities and just making the world generally a better place. But I really love that discussion. So thanks to everyone for listening and I will see you soon.

Narrator: Thank you so much for tuning in to today's episode of the Open Source Data podcast hosted by DataStax Chief Strategy Officer Sam Ramji. We're privileged and excited to feature many more guests who will share their perspectives on the future of software, so please stay tuned. If you haven't already done so subscribe to the series to be notified when a new conversation is released and feel free to drop us any questions or feedback at opensourcedata@datastax.com.