I have a confession to make, and I’m sure I’ll have a sizable silent cohort when I tell you what it is. I’m really spoiled by a world that quickly reacts when I need something. It may have all started in 2008 when I started using this new thing called Amazon EC2, and after typing some commands, I had my own server available in a few minutes. This was a near real-time reaction to a current need, and wow, no more waiting weeks for something similar.
Now I live in a city where I can have dinner delivered to my door in minutes. I use an app on my phone with the current menu; I click a few choices, pay for it instantly, and get a burrito on my doorstep within 30 minutes. It's like the whole world is conspiring to make things faster. This is great for me (especially when I’m hungry). But if you’re an organization trying to take part in this conspiracy, you have to move to near real-time for everything you do.
Near real-time development
This is the current landscape of application development. We need our applications to react in near real-time because end users expect that experience. Developers need to work with tools that react in near real-time as well. These are cloud-native developers I’m talking about — people who are used to getting what they need when they need it, all to move faster.
The old and slow way to build applications is to define the application infrastructure and then convince your operations teams to make what you need. The new and accelerated way is to define and continuously refine your application infrastructure as the application is being built.
Kubernetes gives us a declarative way to describe the state of the infrastructure we need, consuming compute and providing network and storage from our preferred commodity hardware provider. This infrastructure is also known as "the cloud," and it has been satisfying our need for "short attention span" infrastructure for years.
Thinking in services
Once you understand how Kubernetes builds and connects infrastructure, you may find yourself aligning how you operate to match it. Don't code a connection; name it. This is precisely what services in Kubernetes were designed to do. One service connects to another by name, and all of the enabling technology is geared to make that easy for developers: a microservice simply connects to the named data service. This is the exact pattern the early internet builders arrived at years ago.
When humans are involved, memorizing IP addresses is a recipe for disaster, which is what led to the invention of the Domain Name System. Hard-coding an IP address into your configuration is an anti-pattern we have repeated for years. Applications should be able to simply request, "I would like to connect to the data service, please," and let Kubernetes figure it out. That naming abstraction goes all the way up to userland, where developers ask for data via HTTP using service names. It's a huge step toward eliminating the ceremony of connecting to a database by IP, establishing the connection, and setting up and managing connection pools.
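As a minimal sketch of that naming idea, a Kubernetes Service can give a data store a stable name that applications connect to instead of an IP. The specifics here (the `data-service` name, the `app: cassandra` selector label, port 9042) are illustrative assumptions, not a prescribed setup:

```yaml
# Hypothetical Service that names a Cassandra deployment "data-service".
# Pods carrying the label app=cassandra back this name; all values here
# are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: data-service
spec:
  selector:
    app: cassandra
  ports:
    - name: cql
      port: 9042        # CQL native protocol port
      targetPort: 9042
```

An application in the same namespace can then reach the database at `data-service:9042` (or `data-service.<namespace>.svc.cluster.local` from elsewhere in the cluster). Cluster DNS resolves the name, so no IP address ever appears in application configuration.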
To make matters even more complicated, you then have to build queries with prepared statements, use the database connection, and send your query in the format required by the backend database. Are you using SQL? Which dialect? MySQL is different from PostgreSQL, which is different from Oracle. We have been developing applications and working with data as if we had all the time in the world. We don't, and that's why speedy developers are thinking in terms of data services.
A cloud-native data architecture deployed in Kubernetes isn't any less complex than one running outside of it. Your application needs a data architecture that supports its requirements, and no single product will do it all.
Modern cloud applications are assembled from best-in-breed components with years of production experience. What you don't need is the headache of running them, and that's the part Kubernetes takes care of for you.
Data infrastructure can be broken into three groupings: persistence, streaming, and batch analytics. I can hear some of you objecting, perhaps defining five or even ten different groups. For simplicity, let's bucket everything into these three logical groupings, even if some products overlap or do a bit of each.
In the world of data services, I'd like to explain how we can rethink complexity for developers. Applications need to put data in and get it out. The complexity we need to address is everything that happens to that data between the moment it goes in and the moment it comes out. Consider a simple IoT application: a single piece of data going in can come out in many different ways. For example, with a simple HTTP API, the data could pass through your Kubernetes ingress and then travel through a connected Apache Pulsar topic. Pulsar could immediately persist the data to Apache Cassandra and fork it into an Apache Flink stream-processing job for further enrichment. Another fork could persist the data to a file in Ceph, which Apache Spark could later roll up and store in Apache Cassandra®. A lot happens from that one HTTP PUT command, and the result is several different GETs for the further-processed data.
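To make the entry point of that IoT example concrete, here is a sketch of the ingress piece: HTTP traffic arriving at the cluster is routed by name to a service fronting the API that produces to the Pulsar topic. The hostname, path, service name, and port are all illustrative assumptions:

```yaml
# Hypothetical Ingress for the IoT example. PUTs arriving at /ingest are
# routed to a service fronting the Pulsar-producing API. The host, path,
# and backend names here are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: iot-ingest
spec:
  rules:
    - host: iot.example.com
      http:
        paths:
          - path: /ingest
            pathType: Prefix
            backend:
              service:
                name: ingest-api   # forwards incoming events to a Pulsar topic
                port:
                  number: 8080
```

Notice that the Ingress refers to the backend only by service name and port; everything downstream of `ingest-api` (Pulsar, Cassandra, Flink, Ceph, Spark) is wired the same way.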
Kubernetes is my copilot
All of the infrastructure in the interaction I just walked through consists of production-proven systems. If we were doing a whiteboard session, I would draw a data architecture describing this one PUT command and the multiple GETs, with boxes and arrows all over. That drawing would then turn into a wish list for your operations team and kick off a long negotiation and project plan; that's not exactly reacting to an immediate need. Contrast that with everything that makes cloud-native a perfect accelerator for projects like this. First, you aren't setting up hardware; you're renting it by the hour. And because every cloud provider supports Kubernetes, we can define how our virtual data center will look and apply it wherever we want.
Best of all, we can move faster because the cost of not getting it right is lower. We can define, and then refine, as we go along. When creating a data center, each service is defined in a YAML file that holds all of the application-specific configuration and, most importantly, a name. When the service configurations are combined, they connect via those service names, and Kubernetes wires them together. We can even ensure every network connection is encrypted with TLS certificates using cert-manager. Kubernetes projects such as Argo CD give us a way to be agile with configuration: changes to YAML files, once committed to Git, are applied to a running cluster. That is the accelerator for our continuously refined, cloud-native data infrastructure.
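As a sketch of that GitOps loop, an Argo CD Application resource points at a Git repository of YAML manifests and keeps the cluster in sync with it. The repository URL, path, and namespaces below are illustrative assumptions, not a specific setup from this article:

```yaml
# Hypothetical Argo CD Application: once applied, commits to the Git
# repository below are synced to the running cluster automatically.
# The repoURL, path, and namespaces are illustrative assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: data-infrastructure
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/data-infra.git
    path: manifests
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: data
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the Git state
```

With `automated` sync enabled, refining the data infrastructure becomes a matter of editing YAML and merging a commit; Argo CD reconciles the cluster to match.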
Don’t look back
There is a reason organizations migrate as much as possible to Kubernetes, and it's not just that it's the next cool thing: they are finding a distinct competitive advantage. They're embracing Kubernetes because it allows their developers to build near real-time products that provide better customer experiences.
We're now reimagining infrastructure built before Kubernetes, like Cassandra, with projects like K8ssandra and Stargate. There's nothing to gain by holding on to the old ways of using a database and perpetuating slow development. Building virtual data centers that provide data services is the best way to accelerate your application development. Imagine now: what could you do with custom infrastructure you can get faster than a burrito from DoorDash?
Want to hear more about how open source projects like Kubernetes, Apache Cassandra, and K8ssandra along with new data gateways are helping developers build and scale their applications faster? Tune in to Patrick McFadin’s presentation at Open Source 101 – a one-day virtual streaming event – on Tuesday, March 29!