Technology
December 10, 2019

How to Optimize Data Management in Containers with Kubernetes and DataStax

Christopher Bradford

In recent years, containers have become an increasingly popular way to accelerate modern application development. Thanks to prevalent container platforms like Docker, developers can package applications much more efficiently than they can using virtual machines. An application and all of its dependencies are packaged together into a minimal deployable image. This greatly reduces the "works on my machine" problem, where environments on developer workstations differ from those that are deployed.

The benefits of containers extend beyond simply speeding up development. Developers can use containers to move applications between environments and guarantee they behave as expected. DevOps teams may improve the speed of delivery and ship better software, at scale, faster. Instead of provisioning new virtual machines for each instance of an application, multiple containers may be executed on the same hardware to reduce costs.

This all seems ideal, but how are we going to assign containers to servers? Who will configure load balancers and network rules? These concerns led to the creation of container orchestration platforms. The leader in this space is Kubernetes. Born out of Google's Borg platform, it accepts definitions for services, assigns containers to servers, and connects them together. It also monitors the health of the running containers. If a container goes down, Kubernetes restarts it, going so far as to schedule the replacement on other hardware.
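
The restart-and-reschedule behavior described above can be pictured as a reconciliation loop: compare the desired set of containers with what is running on healthy servers, and place anything that is missing. The following is a minimal sketch of that idea in plain Python. It does not use the Kubernetes API, and the names `Node` and `reconcile`, as well as the least-loaded placement rule, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical server that can run containers."""
    name: str
    healthy: bool = True
    containers: list[str] = field(default_factory=list)


def reconcile(nodes: list[Node], desired: list[str]) -> dict[str, str]:
    """Ensure every desired container runs on a healthy node.

    Returns a mapping of container name -> node name.
    """
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy nodes available")

    # Containers that were on failed nodes are treated as lost.
    for n in nodes:
        if not n.healthy:
            n.containers.clear()

    running = {c for n in healthy for c in n.containers}
    for c in desired:
        if c not in running:
            # Assumed placement rule: least-loaded healthy node,
            # ties broken by list order.
            target = min(healthy, key=lambda n: len(n.containers))
            target.containers.append(c)

    return {c: n.name for n in healthy for c in n.containers}


if __name__ == "__main__":
    nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
    desired = ["web-0", "web-1", "web-2"]
    print(reconcile(nodes, desired))

    # Simulate a hardware failure, then reconcile again.
    nodes[0].healthy = False
    print(reconcile(nodes, desired))
```

Calling `reconcile` a second time after marking a node unhealthy moves the lost container onto one of the surviving nodes. That works without consequence here only because the containers in this sketch carry no state.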

By using Kubernetes to orchestrate containers, developers can rapidly build microservices-powered applications and ensure they run as designed across any Kubernetes platform, whether that's on premises or in the cloud. This all sounds pretty amazing, but there are nuances at play. When a container is stateless, it is trivial to spin it up and down without risking data loss. With stateful containers, the story gets a little tricky. What happens when the hardware behind the container fails? If we move the container elsewhere, but the state is persisted on the hardware that is down, we're in a bit of a pickle.

As Kubernetes has grown and matured, we've seen the creation of components that keep stateful workloads in mind. Whether this takes the form of a software-defined storage backend, where our data can follow the container wherever it is scheduled, or a scheduler that knows it cannot simply spin things back up, the resources supporting stateful workloads have matured into production-viable solutions. With the building blocks for stateful workloads in place, it's time to look at one of the most prevalent stateful components: the database.

Traditional databases are not suited for the cloud-native world. In order to scale, read replicas are created to spread the workload around. When the master fails there is a complicated process to promote a replica and then validate it to confirm no data is missing. Given we want to succeed in a high-scale, cloud-native world, the traditional database architecture doesn't fit. We need to go masterless. With DataStax Enterprise (DSE) we can forgo the master-slave architecture and make the database cloud native.

Built on the best distribution of Apache Cassandra™, DSE simplifies development substantially. 

With DSE, all nodes are equal; each node is capable of handling read and write requests, and no single point of failure exists. Data is automatically replicated between failure zones so that the loss of a single container does not take down your application. With DSE, you have a supported, production-certified distribution of Cassandra.
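
To make the idea of replicating across failure zones concrete, here is a simplified sketch of replica placement on a hash ring. It is a toy model, not the placement logic that Cassandra or DSE actually uses: the `Node` type, the `token` and `replicas_for` functions, the SHA-256-based token, and the zone names are all assumptions made for illustration.

```python
import hashlib
from typing import NamedTuple


class Node(NamedTuple):
    """A hypothetical database node tagged with its failure zone."""
    name: str
    zone: str


def token(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def replicas_for(key: str, nodes: list[Node], replication_factor: int) -> list[Node]:
    """Walk the ring clockwise from the key's token, preferring unseen zones."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds node count")

    ring = sorted(nodes, key=lambda n: token(n.name))
    key_token = token(key)
    # First node whose token is >= the key's token; wrap to index 0 if none.
    start = next((i for i, n in enumerate(ring) if token(n.name) >= key_token), 0)
    ordered = ring[start:] + ring[:start]

    chosen: list[Node] = []
    seen_zones: set[str] = set()
    # First pass: at most one replica per failure zone.
    for n in ordered:
        if len(chosen) == replication_factor:
            break
        if n.zone not in seen_zones:
            chosen.append(n)
            seen_zones.add(n.zone)
    # Second pass: fill remaining slots when there are fewer zones than replicas.
    for n in ordered:
        if len(chosen) == replication_factor:
            break
        if n not in chosen:
            chosen.append(n)
    return chosen


if __name__ == "__main__":
    cluster = [
        Node("n1", "zone-a"), Node("n2", "zone-a"),
        Node("n3", "zone-b"), Node("n4", "zone-b"),
        Node("n5", "zone-c"), Node("n6", "zone-c"),
    ]
    for replica in replicas_for("user:1", cluster, replication_factor=3):
        print(replica.name, replica.zone)
```

With six nodes spread over three zones and a replication factor of 3, every key in this model lands in three different zones, so losing any one container (or one whole zone) still leaves two copies. No node plays a special role in the walk, which mirrors the "all nodes are equal" point above.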

DataStax Enterprise: The database for a container-first approach in the enterprise

At DataStax, we’ve been offering production support for DSE customers since September 2015. More recently, we’ve released Docker images for DSE, DSE OpsCenter, and DataStax Studio for production use. With the use of containers we have significantly reduced testing time from hours to minutes. This helps our team get feedback much faster, saving a lot of time and money along the way.

To help developers and operators alike, we have developed the DataStax Enterprise Operator for Kubernetes. The operator manages the lifecycle of the individual Kubernetes resources and asks the user only for the number of nodes and a cluster name. Managing the distributed data platform, DSE, becomes turnkey and simple, leaving your team free to focus on the application layer.
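
As a rough illustration of what "only the number of nodes and a cluster name" means in practice, the sketch below expands those two inputs into a list of per-node resources. It is a hypothetical model, not the operator's actual resource definitions or API: `ClusterRequest`, `expand`, and the pod and volume naming scheme are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRequest:
    """The two inputs the user supplies: a cluster name and a node count."""
    cluster_name: str
    node_count: int


def expand(request: ClusterRequest) -> list[dict[str, str]]:
    """Derive one hypothetical per-node resource description from the request."""
    if request.node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        {
            # Naming scheme is an assumption, chosen only for readability.
            "pod": f"{request.cluster_name}-{i}",
            "volume": f"data-{request.cluster_name}-{i}",
        }
        for i in range(request.node_count)
    ]


if __name__ == "__main__":
    for resource in expand(ClusterRequest(cluster_name="demo", node_count=3)):
        print(resource)
```

The point of the sketch is the shape of the interaction: the user states intent in two fields, and everything else is derived and managed on their behalf.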

To learn more about how Kubernetes and DataStax work together—and read some recommendations and best practices about how you can get the best results—download our whitepaper, Optimizing Data Management in Containers with Kubernetes and DataStax.
