Managing Cassandra Clusters in Kubernetes Using Cass-Operator

If you’re building a world-class cloud-native app, you will no doubt be using Kubernetes, the most popular open-source container orchestrator. Similarly, you need the reliability and scalability of Apache Cassandra®. While Kubernetes manages stateless systems easily, things get more complicated in the stateful realm. Because Kubernetes doesn’t understand Cassandra database constructs, working with Cassandra directly in Kubernetes requires you to manage a myriad of minutiae. We don’t want that! The DataStax Kubernetes Cassandra Operator raises the level of abstraction so we can deal with Cassandra using higher-level constructs that are simple and make sense.

Quick Review: What Makes Up A Kubernetes Cluster?

At the machine level (real or virtual), a Kubernetes cluster consists of at least one control plane node and some worker nodes. But it might be best to think of machines as merely resources available to the more interesting constructs of Kubernetes.

What are these constructs? The basic building block in Kubernetes is a pod. A pod is like a server, and its containers are like the processes running on that machine. Containers share common resources such as storage and networking with the other containers in the pod. Normally, one scales a Kubernetes system by scaling the number of pods, whereas the number of containers per pod remains constant.
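To make the pod idea concrete, here is a minimal sketch in Python that builds a Pod manifest as a plain dictionary and prints it as JSON (which `kubectl apply -f` accepts alongside YAML). The helper `make_pod`, the pod name, the volume name `shared-data` and the container images are illustrative assumptions, not anything prescribed by this article.

```python
import json


def make_pod(name, containers):
    """Build a minimal Pod manifest.

    Every container mounts the same 'shared-data' volume, and all containers
    in a pod share the pod's network namespace.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # An emptyDir volume lives as long as the pod does.
            "volumes": [{"name": "shared-data", "emptyDir": {}}],
            "containers": [
                {
                    "name": container_name,
                    "image": image,
                    "volumeMounts": [
                        {"name": "shared-data", "mountPath": "/data"}
                    ],
                }
                for container_name, image in containers
            ],
        },
    }


# Hypothetical two-container pod: a web server plus a helper process.
pod = make_pod("web", [("app", "nginx:1.25"), ("log-shipper", "busybox:1.36")])
print(json.dumps(pod, indent=2))
```

Scaling would normally mean creating more pods from the same template, not adding containers to this one.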

Often, Kubernetes architectures associate a service with a set of similar pods. The service acts as a DNS entry and discovery tool within Kubernetes. In some instances it may also function as a load balancer across a number of pods. Services are not generally accessible from outside a Kubernetes cluster (with some exceptions), so we can configure an ingress to map network access from outside the cluster to services inside it.
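The relationship between a service and its pods can be sketched as follows. A service picks its pods with a label selector and is reachable inside the cluster under a DNS name. The helper names below are hypothetical, and the DNS form assumes the default cluster domain `cluster.local`.

```python
def make_service(name, selector, port, target_port):
    """Build a minimal Service manifest that forwards port -> targetPort."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": selector,
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def selects(service, pod_labels):
    """Return True if every selector label matches the pod's labels."""
    selector = service["spec"]["selector"]
    return all(pod_labels.get(key) == value for key, value in selector.items())


def cluster_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """In-cluster DNS name of a service (assumes the default cluster domain)."""
    return f"{service['metadata']['name']}.{namespace}.svc.{cluster_domain}"


svc = make_service("web", {"app": "web"}, port=8080, target_port=80)
print(cluster_dns_name(svc))
print(selects(svc, {"app": "web", "tier": "frontend"}))
```

Pods carrying extra labels still match, because the selector only requires that its own labels be present.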

The final Kubernetes construct to review is a deployment. A deployment declares the desired state for a set of pods, and its controller rolls out updates to those pods.

You can learn about more Kubernetes components here.

How To Use A Kubernetes Ingress?

While there are several ways to provide external access to a Kubernetes cluster, using an ingress has many advantages. It is possible to create Kubernetes services that are externally accessible. For example, one can use a LoadBalancer service type or a NodePort service type to expose the service’s port externally. LoadBalancer services are specific to the cloud service provider, so they may not be ideal for multi-cloud environments or where one is trying to avoid vendor lock-in. Using NodePort services can scatter cluster port access and make it more complex to keep track of which ports the Kubernetes cluster exposes.

An ingress provides a one-stop point for exposing access to a Kubernetes cluster. You can use an ingress to map external cluster ports to internal services. You can even map external URL endpoints to services.

When mapping ports between machines, ingresses, pods and containers, it can be confusing knowing which port is which. Here is a general Kubernetes naming convention:

Host port - an externally facing port on the machine hosting the Kubernetes cluster node
Service port - the port a service exposes to clients within the cluster
Container port - the port for accessing a container

An ingress maps a host port (and possibly a URL endpoint) to a service. The service, in turn, maps to a container.
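The chain from URL endpoint to service port to container port can be traced with a small sketch. The Ingress manifest below uses the `networking.k8s.io/v1` shape; the names `web` and `web-ingress`, the path `/shop` and the port numbers are assumptions made for illustration. The `resolve` helper is hypothetical and uses a plain string prefix check, which is simpler than the matching rules a real ingress controller applies.

```python
# A service that listens on service port 8080 and forwards to container port 80.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web"},
    "spec": {
        "selector": {"app": "web"},
        "ports": [{"port": 8080, "targetPort": 80}],
    },
}

# An ingress rule that sends requests under /shop to the service above.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "web-ingress"},
    "spec": {
        "rules": [
            {
                "http": {
                    "paths": [
                        {
                            "path": "/shop",
                            "pathType": "Prefix",
                            "backend": {
                                "service": {"name": "web", "port": {"number": 8080}}
                            },
                        }
                    ]
                }
            }
        ]
    },
}


def resolve(ingress, services, url_path):
    """Walk ingress -> service -> container and return
    (service name, service port, container port), or None if nothing matches.

    Simplification: uses str.startswith instead of segment-wise prefix matching.
    """
    for rule in ingress["spec"]["rules"]:
        for entry in rule["http"]["paths"]:
            if not url_path.startswith(entry["path"]):
                continue
            backend = entry["backend"]["service"]
            target = services[backend["name"]]
            for port in target["spec"]["ports"]:
                if port["port"] == backend["port"]["number"]:
                    return backend["name"], port["port"], port["targetPort"]
    return None


route = resolve(ingress, {"web": service}, "/shop/cart")
print(route)
```

The host port itself (commonly 80 or 443) belongs to the ingress controller's own configuration, which is why it does not appear in the Ingress manifest.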

What Is A Cassandra Operator And How Do I Use It?

As we have discussed, Kubernetes deals with services and pods, while Cassandra deals with datacenters. When deploying Cassandra on Kubernetes, it’s burdensome to have to worry about services, pods and containers when what you really care about is datacenters and databases. This is where the Kubernetes Cassandra Operator comes in.

When you deploy a Cassandra cluster using the Cassandra Operator, you can specify higher-level Cassandra constructs, such as a datacenter and its size, in the manifest.
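As an illustration of such a manifest, the sketch below builds a `CassandraDatacenter` resource as a Python dictionary and prints it as JSON. The field names follow the cass-operator custom resource as best understood here, so verify them against the operator version you install. The datacenter name, cluster name, server version, storage class and storage size are all assumed values, and `make_cassandra_datacenter` is a hypothetical helper.

```python
import json


def make_cassandra_datacenter(name, cluster_name, size,
                              server_version="4.0.1",
                              storage_class="standard",
                              storage="5Gi"):
    """Build a CassandraDatacenter manifest.

    All default values are illustrative assumptions; adjust them to match
    your cluster and the operator version in use.
    """
    return {
        "apiVersion": "cassandra.datastax.com/v1beta1",
        "kind": "CassandraDatacenter",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster_name,
            "serverType": "cassandra",
            "serverVersion": server_version,
            # Number of Cassandra nodes in this datacenter.
            "size": size,
            "storageConfig": {
                "cassandraDataVolumeClaimSpec": {
                    "storageClassName": storage_class,
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                }
            },
        },
    }


datacenter = make_cassandra_datacenter("dc1", "cluster1", size=3)
print(json.dumps(datacenter, indent=2))
```

Notice that the manifest talks about a datacenter, a cluster name and a node count, and says nothing about pods, services or containers.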

Besides providing higher-level abstractions, the Cassandra Operator also contains a Kubernetes controller, which monitors the health of the Cassandra cluster and restarts nodes as necessary. This controller also makes it possible to scale the Cassandra cluster with finesse (e.g., with graceful rolling restarts).
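The general pattern behind a Kubernetes controller is a reconcile loop: compare the desired state with the observed state and take one step toward closing the gap. The toy sketch below illustrates only that pattern. It is not the Cassandra Operator's actual code, and the node names and one-node-at-a-time policy are assumptions made for the example.

```python
def reconcile_step(desired_size, running_nodes):
    """One pass of a toy control loop: move the observed node list
    one node closer to the desired size."""
    if len(running_nodes) < desired_size:
        # Too few nodes: start one more.
        return running_nodes + [f"node-{len(running_nodes)}"]
    if len(running_nodes) > desired_size:
        # Too many nodes: retire the most recently added one.
        return running_nodes[:-1]
    return running_nodes


def reconcile(desired_size, running_nodes):
    """Repeat reconcile_step until observed matches desired.

    Returns every intermediate state so the gradual change is visible.
    """
    states = [list(running_nodes)]
    while len(states[-1]) != desired_size:
        states.append(reconcile_step(desired_size, states[-1]))
    return states


# Scale a hypothetical one-node datacenter up to three nodes.
for state in reconcile(3, ["node-0"]):
    print(state)
```

Changing the cluster one node at a time, instead of all at once, is what allows a controller to keep the database available while it scales.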

Once you have installed the Cassandra Operator, you can easily deploy Cassandra clusters within a Kubernetes cluster. (Notice that "cluster" is an overloaded term here: sometimes it refers to a Kubernetes cluster, and sometimes it refers to a Cassandra cluster.)