Data on Kubernetes: Platform, Resource, and Ecosystem Tooling with Microsoft Azure’s Lachlan Evenson
How do we create free and open data sets that are trustworthy? Microsoft Azure’s Principal Program Manager Lachlan Evenson and Sam Ramji discuss standards for accessing data, and the magic that can happen with data on Kubernetes.
Principal Program Manager at Microsoft Azure
Lachlan Evenson: The things we've been talking about around building out APIs to interact with data in the cloud native ecosystem. I think that is largely unexplored territory.
Sam Ramji: Hi, I'm Sam Ramji, and this is Open Source Data. I am here with Lachlan Evenson, also known as Lachie, who is a principal program manager on the open source team at Microsoft Azure. As a Cloud Native Ambassador, emeritus Kubernetes steering committee member and release lead, and Helm charts maintainer, Lachlan has deep operational knowledge of many cloud native projects. He spends his days building and contributing to software that addresses key challenges in the cloud native ecosystem. Welcome, Lachie.
Lachlan Evenson: Thanks Sam. It's great to be here. Thanks for having me.
Sam Ramji: Hey, thanks for taking the time. I'm delighted to see how effectively your team has grown within Microsoft and really brought both open source and cloud native culture to a more mature level. So, in that vein, I'm really curious to hear, what does open source data mean to you?
Lachlan Evenson: When I think about open source data, I think about it on several different levels. I think about the platform level, and this is open source platforms that support interacting with data and storing data. You might think of databases like MySQL or MariaDB. Cassandra is another one that stores data and allows the access and retrieval of that data. That's the platform level, and I feel it's fairly well established at this point what that means in these open source communities. But then I think about it from a couple of other dimensions. The second is access: do we have standards to access this data and retrieve it? Obviously there are tools like SQL and NoSQL and different ways to interact with that data and retrieve it, but do we have standards around how we interact with those different platforms, and are they built on standards that are interoperable or portable?
Lachlan Evenson: I think this is less talked about at this point in these open source communities, but I think it's an area of interest that we should focus on more. And then finally, open source data: I think, in the current climate and conditions in the world, people are interested in getting access to datasets. These are datasets that different tools and projects can interact with, for example machine learning models, or that you can build analytics and tools on top of. And I think when we talk about those kinds of datasets being open, people want to have access and assurance: who is the data from, has it been interfered with or messed with, and does it have controls over it so that we can checkpoint it and make sure that it's still intact? I think open data means a plethora of different things to me personally, but you could cut it up into those levels.
Sam Ramji: I love those levels. I think we'll be unpacking those for the next decade. All the levels you talked about are key. Do you have free and open access to the data formats? How is it actually landing on the disk? Can you survive the death of a particular database or storage vendor as a company with your data intact? And then your point on, basically, data policy and protocol. There's the stuff that you're doing, which I can't wait to talk about, on service meshes and sort of aspect-oriented control of distributed infrastructure on Kubernetes. For the service layer, that's becoming a little bit more obvious, a little more standard. At the data layer, how do you do security, access, or audit policy? Then finally, you remind me of Freebase, or what Anthony Goldbloom was trying to do with Kaggle.
Sam Ramji: How do you create free and open data sets that are trustworthy, and that create some democratization of data about society so that we can become a little bit more literate with numbers? I recently saw a person refer to the term numeracy as the numerical version of literacy. It seems like we need to improve our numeracy as a society, and maybe open source datasets will help us do that.
Lachlan Evenson: Absolutely.
Sam Ramji: One of the things you've been doing a ton of work on is creating the services layer, and I'd love for you to talk about that a bit more. There's the fundamental challenge that once you move to a world of microservices, the good news is, you've got all these services. The bad news is, you've got all these services. Oh no. I'll hand it over to you to describe the problem that you see and have been working on for some time, and then maybe walk through some of the layers of standardization, implementation, and tooling that you're guiding right now.
Lachlan Evenson: Absolutely. If I hop back to my journey in being responsible for a platform whose workloads were built as distributed microservice applications, I'd like to say, you don't need a service mesh until you've had your 50th service on the platform. You need to understand exactly the relationships between these services and how one service being upgraded or degraded impacts the rest of the system as a whole. Trying to rationalize this is extremely complex, and a service mesh sets out to provide a sane platform where you can interact with those relationships and understand them. I think from that open data perspective, some really interesting tooling has emerged in the ecosystem. You've seen observability platforms where you can ask, how is this service related to that service, and what calls is it making to that service?
Lachlan Evenson: Because we've all heard stories about cyclic dependencies, where one service depends on another, which just depends back on itself. And with this data that you're gathering from the platform, you can actually bubble up these relationships and start making decisions about them. So service meshes are about not only access control, obviously, and controlling where your traffic is routing across this broad myriad of microservices. I think the most understated aspect is really having tools and a platform that you can query and introspect. What are the relationships between these communications, and what does it mean if I push on this API, or this API has degraded? What does it mean to the sum total of the system? So service meshes are trying to set out and solve that.
Lachlan Evenson: And my particular work in the service mesh ecosystem has been around building a common abstraction on top of all the service meshes in the open source ecosystem, to start to provide an ecosystem of tooling that can live on top of that and only implement against one specific abstraction, like Service Mesh Interface, which is what I've been working on, and then creating an ecosystem of providers under there that can interoperate and provide different features and functionality.
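To make the cyclic-dependency point concrete, here is a minimal sketch of how call data gathered from a platform could be checked for a loop. It assumes the observed calls have already been exported as a plain mapping of caller to callees; the service names and the `find_cycle` helper are hypothetical and are not part of any service mesh or SMI API.

```python
from typing import Dict, List, Optional


def find_cycle(calls: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cyclic call path (first service repeated at the end), or None."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {svc: UNSEEN for svc in calls}
    path: List[str] = []  # services on the current depth-first walk

    def visit(svc: str) -> Optional[List[str]]:
        state[svc] = IN_PROGRESS
        path.append(svc)
        for callee in calls.get(svc, []):
            seen = state.get(callee, UNSEEN)
            if seen == IN_PROGRESS:
                # The callee is already on the current walk: that is a cycle.
                return path[path.index(callee):] + [callee]
            if seen == UNSEEN:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        state[svc] = DONE
        return None

    for svc in list(calls):
        if state[svc] == UNSEEN:
            found = visit(svc)
            if found:
                return found
    return None


# Hypothetical call graph, as might be derived from mesh telemetry.
observed_calls = {
    "checkout": ["payments"],
    "payments": ["ledger"],
    "ledger": ["checkout"],
    "search": [],
}
print(find_cycle(observed_calls))
```

A depth-first walk is used here because it reports the actual path, which is usually what an operator wants to see when deciding which dependency to break.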
Sam Ramji: What's the journey to abstract SMI? This is a super important area. I think one of the reasons that I ended up working at Google on Kubernetes was that when I was running the Cloud Foundry Foundation, Craig McLuckie and I ended up working on transferring the service broker that was native to Cloud Foundry out into an open IP project that could take all the services that had been built to attach to Cloud Foundry and bring those into a Kubernetes environment. And then we also looked at, what are the other standards you need? OCI, CNI, CSI. And once you have those, then you end up having a lot of flexibility in the architecture. As you said, you don't need a service mesh until you've built your 50th service. You probably don't need a standard like SMI until there are, what, half a dozen service meshes out there, and service brokers. What was the journey in abstracting what was out there? What did you find you needed to have? What surprised you as you developed SMI?
Lachlan Evenson: One of the things we were trying to address was that we were talking to customers at Azure who were saying, it's really complicated for me to know which service mesh to choose, and being front-loaded with that decision immediately is very challenging. The other thing is they were asking for simplified interfaces. A lot of these service meshes out there can do a lot of really complex things, but most users don't need that level of complexity. So the APIs that are being presented to them as users can be cumbersome to understand and take a lot of onboarding. So with SMI, we were trying to distill a set of common use cases, the most common across the service mesh ecosystem. At the time, the first was access: can service A access service B, with a standardized way to define policies at the service level? The second was telemetry and observability.
Lachlan Evenson: So how can I get those golden metrics that Google talks about out of my services: what's the p95, what are the error rates? And the other thing is traffic splitting, as we refer to it, but it's more intelligent routing. So how can I facilitate a canary deploy, where I roll out a new version of an application and direct only a subset of traffic, based on specific criteria, to that new version of the software, and confirm using the observability that it's indeed running and meeting the SLAs that we have before we roll it out more broadly? So we went and focused on those three APIs first. And what we came to find was that a lot of folks in the industry were not looking to create new APIs in Kubernetes or in their service mesh products.
Lachlan Evenson: So they rallied around SMI as a way to not have to worry about designing bespoke APIs for their specific platform. They could focus on getting the functionality right and not worry about the API design. So it's been incredibly rewarding to work with the community and to have other tools and platforms integrate with SMI without us even knowing. At this point, people can just pick it up, implement it, and build their own service mesh.
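The canary flow described above (send a small share of traffic to the new version, check p95 latency and error rate, then widen the rollout) can be sketched in a few lines. This is an illustration of the idea only, not the SMI traffic split API: the function names, the thresholds, and the 10-point step are all assumptions made for the example.

```python
import random
from typing import Dict, List


def pick_backend(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose a backend version in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def p95(latencies_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of observed latencies."""
    ordered = sorted(latencies_ms)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # ceil(0.95 * n)
    return ordered[rank - 1]


def next_weights(weights: Dict[str, int], stable: str, canary: str,
                 error_rate: float, p95_ms: float,
                 max_error_rate: float = 0.01,   # assumed SLA threshold
                 max_p95_ms: float = 250.0,      # assumed SLA threshold
                 step: int = 10) -> Dict[str, int]:
    """Shift more traffic to the canary if it meets the SLA, else roll back."""
    healthy = error_rate <= max_error_rate and p95_ms <= max_p95_ms
    if not healthy:
        return {stable: 100, canary: 0}
    new_canary = min(100, weights[canary] + step)
    return {stable: 100 - new_canary, canary: new_canary}


weights = {"v1": 90, "v2": 10}
canary_latencies = [float(ms) for ms in range(1, 101)]  # made-up samples
weights = next_weights(weights, stable="v1", canary="v2",
                       error_rate=0.001, p95_ms=p95(canary_latencies))
print(weights, pick_backend(weights, random.Random(0)))
```

Keeping the promotion decision separate from the routing decision mirrors the split between the observability and traffic-splitting concerns described in the conversation.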
Sam Ramji: That's when you really know you've succeeded. And we've seen so many new service meshes come up recently. The ones that are sticking in my head are Kuma from Kong and then, of course, Solo.io, which is Idit Levine's company. It seems there are probably going to be several more coming. So it's good that we have an orchestration or an orientation framework around it.
Lachlan Evenson: Absolutely. I think, you know, there are a few critics saying it's too early to do standardization, but I see it as a good way to encourage innovation in the space. And we've had other examples of this. As you mentioned, CSI for storage, CNI for networking, and CRI for container runtimes have all been ways that have allowed an ecosystem of tools to open up and offer bespoke functionality using that standard interface.
Sam Ramji: It's interesting to see Kubernetes kind of getting us a little bit faster towards the wisdom of knowing that whatever we're building is sort of instant legacy. And if we can at least have a standard orientation or a standard of practice, even if it's not a standard of law in the APIs, we'll avoid some pain later. We've learned a lot from rebuilding Cassandra onto Kubernetes in the last 18 months or so, because these technologies that came out from the hyperscalers in the late 2000s and early 2010s couldn't depend on Kubernetes. So they had to take a lot of their own fate into their own hands. If you had to do workload management, if you had to bind to hardware, you had to manage networking and do all those things yourself. So it seems there's an opportunity for data on Kubernetes to start to refactor itself.
Sam Ramji: There's an opportunity for some of these monoliths to start to fall apart and be more Kube-native, a little bit more open with how they pass control, how they pass faults, how they report status, and how they participate in an application-aware kind of fabric. So you've seen a bunch of this. I'm curious about your thoughts on best practices for using Kubernetes in the cloud.
Lachlan Evenson: In my time working with Kubernetes in the ecosystem, as you know, I've collected a wealth of knowledge of best practices. Last year I pinged Brendan Burns, and we actually wrote a book over at O'Reilly called Kubernetes Best Practices. So I'll give you a little bit of the CliffsNotes for that, and you hit on it just now. You want to look at the abstractions. And when I talk about abstractions, look at the APIs that you're making a commitment against and what they provide you. Because I hear a lot of customers out there in the ecosystem say, Kubernetes moves too fast. It moves too fast. And my rebuttal to that is, if you're making a contract with an API version, then it doesn't matter what version of Kubernetes is supporting that API, as long as the commitment is against the API.
Lachlan Evenson: For best practices specifically, I think, in the world of Kubernetes, you really want to treat your platform, your Kubernetes version, as fungible. You should be able to move Kubernetes versions but take a dependency on the API. A lot of providers make it really easy to get a Kubernetes cluster, and that gives you really good hooks into the platform benefits of the cloud providers. Kubernetes is extremely configurable, and a lot of best practices are baked into these different cloud providers, which makes it really easy to consume. But from my perspective, just know the dependencies you're taking on APIs, and that allows you to move between Kubernetes versions a lot more seamlessly. And you mentioned Cassandra specifically before. I think data platforms building on Kubernetes is happening a lot faster right now because Kubernetes is very stable in its core API. So platforms like Cassandra can take dependencies on APIs and know that, hey, I can get this out of the platform and rely on that.
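One way to picture "know the dependencies you're taking on APIs" is to list the API versions your manifests use and compare them with what a target cluster serves. The sketch below works on plain Python data rather than a live cluster; the `unserved_apis` helper and the contents of `served_by_target` are illustrative assumptions, so check the real cluster's served APIs before relying on anything like this.

```python
from typing import Dict, List, Set, Tuple

ApiKey = Tuple[str, str]  # (apiVersion, kind)


def unserved_apis(manifests: List[Dict[str, str]],
                  served: Set[ApiKey]) -> List[ApiKey]:
    """Return the (apiVersion, kind) pairs the manifests need but the cluster lacks."""
    wanted = {(m["apiVersion"], m["kind"]) for m in manifests}
    return sorted(wanted - served)


# Hypothetical application manifests, reduced to the fields that matter here.
manifests = [
    {"apiVersion": "apps/v1", "kind": "StatefulSet"},
    {"apiVersion": "apps/v1", "kind": "Deployment"},
    {"apiVersion": "extensions/v1beta1", "kind": "Deployment"},
]

# Illustrative set of APIs served by the cluster version we want to move to.
served_by_target = {
    ("apps/v1", "Deployment"),
    ("apps/v1", "StatefulSet"),
    ("v1", "Service"),
}

print(unserved_apis(manifests, served_by_target))
```

If the list comes back empty, the application's contract is with APIs the target version still serves, which is the situation described above where the Kubernetes version itself becomes fungible.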
Sam Ramji: That's kind of core to modern ecosystem theory: if you are the platform, you want to be the bottom of the ecosystem. And so every time you shake the platform, you shake the whole ecosystem. Things will fall off and never come back, and trust in the platform decreases. I'm really impressed with how the CNCF has performed and how everyone involved in the Kubernetes technical committee has worked together to basically be forward compatible. We often talk about backward compatibility for interfaces, but I think Kube has done a really nice job of being forward compatible.
Lachlan Evenson: There's definitely been a lot of attention to detail going into that and making sure that it happens. A lot of the conversations happening in the Kubernetes community at the moment are around whether they should support an LTS version to have longer support cycles. The other thing they're thinking about is dropping a release every year. At the current cadence, the Kubernetes community does four releases a year, and they're talking about bringing it back to three. But the knock-on effect, and the rebuttal to that, is that people will then jam a lot more changes into the three releases. So the community's not going to slow down regardless of the number of releases you publish. You're just going to have more changes per release. It'll be interesting to see how that shakes out, but this is a coming-of-age moment for Kubernetes, because a lot of the APIs are now stable. So for a customer using Kubernetes, or somebody in the community rolling their own, there's very little incentive to move between Kubernetes versions, because they're already using GA APIs that aren't changing.
Sam Ramji: It's an interesting challenge of cloud native and open source as it meets enterprise. I suspect that this is the underlying tension that's causing the conversations about dropping a release. Enterprises will say, I can only consume N releases per year, where N is some number that's between two and a half, right? I can do two releases, one release, or like one release every two years, because that's how I operate. But being able to decouple that from the rate of innovation in the project is pretty important, to be able to say, well, you don't have to take every release. You can snap to every other, or every third. That's one way to manage it. In Cloud Foundry, we did a release every two weeks that you could just install, that passed all the tests, and that worked pretty well.
Sam Ramji: And that's part of Tanzu, so hopefully that'll continue. One of the things that you touched on is that effectively the Kubernetes API is a breakthrough in APIs because it is declarative. It doesn't tell you how to do what you want. It just tells the infrastructure to take the following form. And that seems to be missing in the data tier. When you think about StatefulSets, there are some cool things in there, like rolling restarts, but mostly it's about very, very small amounts of data, persistent volumes, and things that can live in one pod. There's not a great language yet for declaring what kind of shape you want your storage to be in, or how you want your data to be managed or stored, in a way that communicates across the Kubernetes fabric. So we've been putting some thought into those. I've talked with a number of folks in data and in Kubernetes. I'm curious, as we try to figure out how to make data cloud native, what are some open source cloud native projects that you see that are focused on helping solve for data?
Lachlan Evenson: This is a great question. And again, I think I could talk about it at those three different layers I described earlier. So obviously I see open source platform tooling out there, like Vitess. That's an interesting story, and I'm sure you're aware of it, but it's about the YouTube folks. When Google bought them, their plight was to move from a monolithic MySQL database over to running it on Borg, and they built Vitess. It takes all the operational knowledge they gained from going through that migration onto Borg and open sources it back out, so that other people with hyperscale or large-scale MySQL deployments can have a sharded MySQL offering. So I think Vitess is a good one at the platform layer. At the standards layer, the interesting place I'm seeing developments at this point is more on the ML side of the house, through standards like ONNX, which allows your models to be portable across the different ML tooling and implementations out there in the ecosystem.
Lachlan Evenson: But I don't see too much about schema portability or data interactions. That's really opaque to things like Kubernetes at the moment, and that might be okay. The interesting opportunity I see, especially in the Kubernetes ecosystem, is that with the advent of custom resources in Kubernetes, you can build bespoke APIs for your application really easily. And we're in this world of operator explosion. So obviously there are Cassandra operators, and there are operators for different tools out there, which lets Kubernetes at least support the APIs and gives you a way to have a set of controllers that interact with them. In essence, it makes Kubernetes application-aware. Now, I think this might be the perfect breeding ground for these kinds of abstractions, for how we model data as APIs, to be born as Kubernetes CRDs out of need, but then level up, outgrow Kubernetes, and become standards that are larger and more accepted across the ecosystem.
Lachlan Evenson: So one of the great benefits of having custom resources in Kubernetes is you can model things and see the benefit immediately in Kubernetes and meet that need. But I think that then gives you an API where you can have a discussion within the community, get broad consensus about acceptance of this API, have other implementations, and again, bubble up these standards. So my challenge to those who are interested in building data-aware APIs in open source would be that Kubernetes could be a place where you start to model those APIs. And because Kubernetes is so broadly adopted out there in the ecosystem, other companies can pick them up, run with them, test them, and give feedback.
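As a rough illustration of modelling data as an API through a custom resource, the sketch below builds a CustomResourceDefinition as a plain Python dictionary and checks a sample resource against its required fields. The `DataSet` kind, the `data.example.com` group, and the `engine`, `replicas`, and `regions` fields are entirely hypothetical; no such standard exists, and the check covers only required fields rather than full OpenAPI validation.

```python
from typing import Any, Dict, List


def dataset_crd() -> Dict[str, Any]:
    """Build a hypothetical CRD that describes a data store declaratively."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": "datasets.data.example.com"},  # <plural>.<group>
        "spec": {
            "group": "data.example.com",
            "scope": "Namespaced",
            "names": {"plural": "datasets", "singular": "dataset", "kind": "DataSet"},
            "versions": [{
                "name": "v1alpha1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        "required": ["engine", "replicas"],
                        "properties": {
                            "engine": {"type": "string"},
                            "replicas": {"type": "integer", "minimum": 1},
                            "regions": {"type": "array", "items": {"type": "string"}},
                        },
                    }},
                }},
            }],
        },
    }


def missing_required(crd: Dict[str, Any], resource: Dict[str, Any]) -> List[str]:
    """List required spec fields that the resource does not set."""
    schema = crd["spec"]["versions"][0]["schema"]["openAPIV3Schema"]
    required = schema["properties"]["spec"]["required"]
    spec = resource.get("spec", {})
    return [field for field in required if field not in spec]


# A sample resource that forgets to say how many replicas it wants.
orders = {
    "apiVersion": "data.example.com/v1alpha1",
    "kind": "DataSet",
    "metadata": {"name": "orders"},
    "spec": {"engine": "cassandra", "regions": ["europe", "us-west"]},
}
print(missing_required(dataset_crd(), orders))
```

The point of the exercise is the one made in the conversation: once the shape is written down as an API, other people can read it, argue about it, and implement it.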
Sam Ramji: That is really interesting, because at the application-aware level, that's kind of nirvana: if you could have an application-aware infrastructure that can direct data in a way that the data can also be elastic. One of the problems that we see is that databases do the heavy lifting for distributed systems. It's the difficult jobs that they do that make distributed applications kind of work, but usually that's divorced from the application fabric itself. So it's not really particularly application-aware. If demand on the application is increasing, the database just sees more load, but it doesn't have a communication facility with the load balancers or anything else that could say, hey, the second derivative is positive on the workload. We predict it's going to go even higher. You should probably pre-deploy a bunch of new instances and start aggressively replicating data to them so that you can handle the load that's coming, and then do the inverse.
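The "second derivative is positive" signal can be approximated from evenly spaced load samples with finite differences. The following is a toy sketch under that assumption; the `scaling_hint` function, its return labels, and the sample numbers are made up for illustration and do not describe how any real operator or database behaves.

```python
from typing import List


def second_difference(samples: List[float]) -> float:
    """Discrete second derivative of the last three evenly spaced samples."""
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    a, b, c = samples[-3:]
    return c - 2 * b + a


def scaling_hint(samples: List[float]) -> str:
    """Suggest an action from the trend (first difference) and its acceleration."""
    first = samples[-1] - samples[-2]
    second = second_difference(samples)
    if first > 0 and second > 0:
        return "pre-deploy"   # load is rising and the rise is accelerating
    if first < 0 and second < 0:
        return "scale-in"     # the inverse: load is falling faster and faster
    return "hold"


requests_per_second = [100.0, 110.0, 130.0]  # hypothetical samples
print(scaling_hint(requests_per_second))
```

In practice the samples would need smoothing before differencing, since a second difference amplifies noise; that step is left out here to keep the idea visible.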
Sam Ramji: And you mentioned Vitess, right? That's part of the Borg infrastructure. Borg was really focused on being an ecosystem for how Google's resource economy works. There's the Borg compute scheduling, but then there's also Megastore, right? And Spanner, which ended up handling elastic behavior for data, which is pretty key. So the stuff that you're getting into, application-aware data, would be a big breakthrough. I was talking with Jared Rosoff at VMware recently, and there was some pretty interesting new stuff in vSphere where you have a supervisor cluster. So there's a Kubernetes-managed environment where you can also kind of punch through into the supervisor cluster and talk directly to the VM. That would also probably be helpful for data-specific workloads under the custom resource scheme that you just described.
Lachlan Evenson: Yeah, and this is already happening. It's happened at the platform level with things like Kubernetes and Cloud Foundry that you've mentioned, and it's happened at the application level. So there are obviously application-aware Kubernetes operators that'll make intelligent decisions based on a more complex set of metrics that the application developers are aware of, to pre-seed and shift load and get more load balancing capacity. But the people in charge of the data aren't privy to this world of possibility yet. And I think that's ripe for the picking in the ecosystem. The way we're thinking about it today, Sam, is in raw volumes. Can I have a data volume placed on this volume provider? Can I attach this? Do I use NFS, or what do I use to interact with this? But let's start thinking about it as: this data is serving an application.
Lachlan Evenson: How can I put it in the best possible position to serve that application and the needs of that application, and be smarter about it? You've got all these platform APIs in Kubernetes. Have the data engineers start to say, when you see this happening, change the operational environment that I'm serving up this data in, so that I can serve it in a better way.
Sam Ramji: Exactly. Your Kube operators could be aware that, hey, you're under a federated Kube cluster, you've got a Kube infrastructure in Europe and a Kube infrastructure on the West Coast, and now you're starting to get application traffic in Europe. So can you intelligently replicate through the Kube operator for your particular data store to Europe, so that you can reduce latencies? Some of those larger-scale, more complex problems, which have a lot of business value, could be done through better automation in the near future.
Sam Ramji: One thing you know really well that most people, I think, aren't aware of - and I would love you to talk about this for a bit - Kubernetes moves fast, but it's also very measured in what it changes. And it's done a great job of assimilating a coherent point of view. What is Kubernetes? What is the ecosystem? And you've been in two very different parts of one organization, the Kubernetes steering committee and the Cloud Native Computing Foundation governing board. You're still on the governing board. You recently stepped off the Kube steering committee. Could you take a minute and distinguish the two different teams and what they are committed to?
Lachlan Evenson: Absolutely. So the Kubernetes steering committee is a group of seven people, chosen and voted in by the community, who are tasked with and have a charter of making all the non-technical decisions in support of the community. Let me give you a couple of examples of things that I did there in my tenure on the steering committee. We recently instated a rule that all leads in the Kubernetes community must have taken the unconscious bias training issued by the Linux Foundation. So we make different decisions about creating an inclusive community and building out structures to make sure that the community is supported. Developers are supported. Non-code committers are supported. New contributors are supported. That is the function of the steering committee. On the CNCF governing board, I actually sit in the Microsoft seat. As a member company of the CNCF, we have a seat there as Microsoft, and I sit in that. That's around making strategic decisions to support the plethora of projects that come under the umbrella of the CNCF.
Lachlan Evenson: We make decisions about the events, how best to help the community, and the different ways we can fund things, for example annual security scanning of different projects in the community. It's about making sure that all the projects have all the care and feeding that they need from the CNCF.
Sam Ramji: I think those two things have been absolutely essential in making sure that it continues to grow in a healthy way. I had the privilege of sitting on the CNCF board for Google for a while when I was there, and I just couldn't be more pleased with how mature and thoughtful that group has become. I realize we're just about out of time. So I want to ask you, what's one thing, one resource, one concept, one book that you'd point our listeners to, and then maybe finally a piece of advice that you'd share with the audience related to Kubernetes, service meshes, or even data?
Lachlan Evenson: If you're interested in taking a look at open source data in a community like Kubernetes, I think it's worth having a look at different operators. So whatever system you're using to store your data, take a look at how it operates on Kubernetes. And obviously there are communities to get attached to there. So if you're interested in something like Cassandra or Vitess, there are communities built around those. Take a look at them and at how the data platform meets something like Kubernetes. I think there are plenty of resources out there, and Vitess has a lot of use cases covering different problems they've solved at the platform level. More tactically, the things we've been talking about around building out APIs to interact with data in the cloud native ecosystem. I think that is largely unexplored territory. Now, the call to action I'd put out there is that there's a concept under the CNCF called a CNCF special interest group.
Lachlan Evenson: Now, there are special interest groups for networking, for storage, and for other pieces like security, for example, but there's not one that's looking at data as a whole. And I think if you're interested, you can start putting up your hand to actually build that ecosystem. If there is interest out there among the listeners in talking about how we would model data in the cloud native world and build APIs and standards, that conversation could happen in the CNCF, and we could build a special interest group to actually have a shared space to talk about it. So my call to action is, if you're interested, you could ping me and I can get you connected, or you can take a look at the resources out there online about forming your own special interest group and gathering interest in the community. I think that's it, Sam.
Sam Ramji: That is awesome. Lachie, thank you so much for your time. It's been a privilege talking with you, and no surprise, I learned a lot. I'm hoping that everybody listening enjoyed it as much as I did and learned a ton themselves. So wishing you all the very best, and thank you for your leadership of the community around this absolutely strategic technology, and even more so for continuing the march of open source and open cloud at Microsoft.
Lachlan Evenson: Thank you very much for having me, Sam.
Narrator: Thank you so much for tuning in to today's episode of the Open Source Data podcast, hosted by DataStax's chief strategy officer, Sam Ramji. We're privileged and excited to feature many more guests who will share their perspectives on the future of software, so please stay tuned. If you haven't already done so, subscribe to this series to be notified when a new conversation is released, and feel free to drop us any questions or feedback at email@example.com.