Company · February 11, 2021

The Future of Application Development: Cassandra, Kubernetes, Streaming Data & Open Source

DataStax

DataStax Chief Product Officer Ed Anuff recently sat down with Stephen O’Grady of RedMonk to talk about the future of application development and the role that four key elements — Kubernetes, Cassandra, streaming data, and open source — will play in it.

1. Kubernetes

While there’s been no shortage of innovation in the world of app development in recent years, these innovations didn’t come out of thin air. 

Rather, they’re “descendants running the lineage of ideas,” according to Ed, with developers using a trial-and-error approach to build off previous ideas — like Heroku and containers. 

“Kubernetes really brought containers into a standard control plane that could be leveraged in a lot of different scenarios and was a framework in and of itself that could be extended,” Ed says. “Kubernetes is essentially the platform of platforms.”

2. Cassandra

In our increasingly connected, data-driven world, more and more developers are using NoSQL databases. This is largely because NoSQL databases let developers think about and interact with data in a way that is more aligned with how they build applications, particularly when compared to the previous generation of relational databases.

“If you zoom out, you have this end-to-end new stack for building applications,” Ed continues. “Each one of these has been evolving independently, and now we need to bring it back into a common platform.”

This is exactly why DataStax released Stargate, an open source data gateway that lives between the app and the databases it uses. Read more about Stargate here.

3. Streaming 

As developers continue to build NoSQL data-driven applications that are highly performant and highly scalable, the need to support streaming data becomes even more important.

When it comes to streaming for modern data apps, the two biggest names in the game are Apache Kafka and Apache Pulsar. While a lot of people use Kafka and there’s a lot of hype around it, DataStax opted to use Pulsar to achieve cloud-native streaming at scale.

“We looked really closely at the problem,” Ed explains. “We saw that Pulsar was designed for use within Kubernetes, it’s got the best story around geographical distribution, and it’s able to handle the throughput and performance and scale scenarios we’re seeing.”

4. Open source

DataStax is focused on empowering developers to build powerful cloud-native applications using the tools and cloud resources they want to use. Doing that effectively starts with asking developers to share their perspective and building solutions that meet their needs.

“Looking across most of the folks in the open source and open core industry, most of the users who are coming to us — developers, architects, and technologists — are saying I don’t want to be locked in,” Ed says. 

For this reason, open source tools and technologies — which are already pervasive today — will take on increasing prominence as we move further into the future.

To learn more about what the applications of tomorrow look like and how DataStax is helping usher in a new era of cloud-native development, listen to the full conversation between Ed and Stephen here.

Transcript

Stephen O'Grady: Hi, I'm Stephen O'Grady. I'm here for another RedMonk Conversation. Today we're going to be looking at the intersection of application development and databases in 2021. Where are we today? Where are the trends, and what are the trends going to be to shape the year ahead? And with me is Ed from DataStax. Ed, can you introduce yourself?

Ed Anuff: Hi, I'm Ed Anuff. I'm the chief product officer of DataStax. I've been here for a little over a year now, but I've been working with Cassandra for over a decade now. So, very familiar with all of the issues around how people run and use data at scale, and really excited to be solving a lot of those things at DataStax.

Stephen O'Grady: Excellent. Now, so I mentioned the intersection of application development and databases, so let's start with the app dev part. So one of the things we're seeing a lot of interest in at RedMonk is, essentially, consolidation of the app dev and this common app dev platform that we see as Kubernetes, in part, because it's seen as offering platform independence, the ability to move between different clouds, and on and off prem platforms. There are a lot of other app dev trends that we're seeing, sort of in your head, what's top of mind for you folks, what's top of mind for DataStax?

Ed Anuff: So, it's really interesting, a lot of what we've seen over the last say five years or so, has been that we've been taking a lot of the ideas that people have discovered through trial and error, what are the best ways to build applications?

Ed Anuff: If we look at things like 12-factor apps, if we go and look at elastic scalability, we can see that things like Kubernetes are really the descendants, part of the lineage of the ideas that you saw in the platforms as a service, whether it was the dynos of Heroku or what they and other people were doing in the platform-as-a-service space, which led to things being formalized around containerization as this concept of how we package and deploy.

Ed Anuff: And then Kubernetes really brought that into being a standard control plane that could be leveraged in a lot of different scenarios and was a framework in and of itself that could be extended so that Kubernetes could be essentially the platform of platforms.

Ed Anuff: So, it very much has been app development, the ways that developers have been building applications, that drove everything that we're talking about right now with this whole Kubernetes transformation. The same thing's been happening with data. So what we've seen has been that NoSQL databases have provided a way to think about and interact with, store, query, and leverage my data in a way that was better aligned with how I build my application than perhaps the previous generation of relational databases.

Ed Anuff: So these were two different aspects of how that application stack was being transformed. Along with a few other things like the front end here was being reinvented with single page apps and mobile. But you ended up with, if you now zoom out, you have this end-to-end new stack for building applications where what's super important for us today is like, "Okay, each one of these has been evolving independently. We now need to bring it back into a common platform."

Ed Anuff: And that's been the challenge that Kubernetes has had to step up to, really over the last two years, and on the part of, for example, us at DataStax, it has been trying to shepherd Cassandra to live within this platform stack concept. If you look at how most of these architectures are built, they are done through modern APIs and microservices. And so you're seeing a lot of work around that. At DataStax, we've done that with an open source project that we call Stargate, which is about exposing data as APIs and microservices for Cassandra users. You're seeing a lot of activity around this for other types of data infrastructure as well.

Stephen O'Grady: So going back to the app dev side for a moment, one of the fastest [inaudible 00:04:53] that are called the Dataspace, from RedMonk's perspective, is streaming. This is not new, streaming, it's been with us for a while, but it continues to just take off like a rocket. And heading into 2021, it seems pretty clear that this is going to be a fundamental component of most data strategies. So, what's DataStax's take here? What's the outlook look like?

Ed Anuff: Well, we think that streaming is super important. We see that all the types of things that drive people to go and say that Cassandra is the right database for them, which is that they need to be able to deal with users and data around the world. They need to be able to go and make it highly available, highly scalable, and build a multitude of applications on top of it.

Ed Anuff: All of these things are common to the streaming space. And again, when you go and look at the architectures that Cassandra users are building, you see streaming, you'll see it next to NoSQL pretty much all the time. And so our goal there is to go and figure out [inaudible 00:06:12] and solve streaming problems in the same way for the same types of users that we solved database problems for.

Ed Anuff: And so we've looked at what's been going on within streaming. We found that most folks are using one of two major open source projects either Kafka or Pulsar. A lot of folks using Kafka, you hear about that probably the most these days.

Ed Anuff: But we looked really closely at the problem. We saw that Pulsar was designed for use within Kubernetes, it's been where most of the work has happened. It's got the best story around geographical distribution, it's able to go and handle the throughput, and performance, and scale scenarios that we're seeing. And it has a better extension expansion model for being able to go and actually use it as a platform so you can build things like analytics and stream processing directly on top of it.

Ed Anuff: So those were a lot of things we liked about it. We're going to be doing a lot more with it. We're introducing a couple of new products built on top of it so that enterprises that want to get support can do that, and developers who want to use it in the cloud will be able to do that as well.

Ed Anuff: But I think the important piece is that if you're a developer or you're an architect, and you're looking at these situations where you're dealing with large amounts of data in motion, as I said, whether it's an application that you're building, maybe a mobile app that's being used by people around the world, whether you're building a high throughput website, or really almost any of these situations, it's important to go and figure out how does streaming tie into it, but also be aware that you want to make sure that you're choosing a streaming architecture that doesn't have an impedance mismatch either from a development standpoint or an operational standpoint with the other pieces of your data infrastructure. And that's why we're looking and we're saying, "Okay, we think that Cassandra and Pulsar are the two things that should be used in conjunction."

Stephen O'Grady: Okay. So lastly here, we'd be remiss if we were talking about database and app dev trends and any intersection there in 2021, and didn't bring up open source. Specifically on the database side, there have been a number of providers that have moved away from open source towards more proprietary protection, is sort of my word, license choices there. We have strong opinions on this at RedMonk, namely that anybody who writes the code does get to pick whatever terms they want, but that the frequent practice of these licenses blurring lines between proprietary and open source, and applying to things that are not open source, is potentially unhelpful for open source more broadly.

Stephen O'Grady: But I'm curious, from the DataStax perspective, what is the attitude about open source and where do you see it fitting into the strategy in 2021 and beyond?

Ed Anuff: Well, look, DataStax, the company, was built upon providing enterprises with a supported, long-term, reliable, certified, hardened distribution of Apache Cassandra. That's where we came from.

Ed Anuff: I think, more broadly speaking, when we think about open source, the thing that I would say first and foremost is, if you're building open source, like we are, for the purposes of helping people out, helping developers and users to be successful, you really need to go and look at what their perspective is. And in almost any aspect of infrastructure, there's a lot of choice out there. And so, almost by definition, anybody who's talking to us, anybody who's talking to me about, "Why should I be using DataStax?"

Ed Anuff: But I think this is pretty true, looking across most of the folks in the open source, open core industry. Most of the users who are coming to us, developers, architects, technologists, they're looking and saying, "I don't want to be locked in, and that's why I'm looking at this option. If I didn't care about being locked in, I would be [inaudible 00:11:09] some proprietary that I could only get from a cloud vendor."

Ed Anuff: So I personally take really seriously why our users are here, why are developers building on top of us, and foundationally, again, it's because they were looking for choice, they were looking for zero lock-in, it was... And as a developer, that's something I respect, that's [inaudible 00:11:33] the technologies I would build on, that I would look at.

Ed Anuff: And so, that's the question I always ask when I see the different things that come up. I think we all understand why, from a business standpoint, these licenses are necessary, I'm not going to fault any of these other companies, they all have to make their own choices, and we are in a space where open source companies can end up in a situation where they do a whole bunch of work and somebody else swoops in and gets the business value for it. But ultimately, we're delivering our technology because people are looking for an open solution, that's why they started here. And so we try to be respectful and focus on that.

Stephen O'Grady: Makes perfect sense to me. And with that, let's wrap up. Ed, thank you so much for taking the time to talk to me.

Ed Anuff: Thank you.


 
