Technology | November 18, 2019

Simplifying DataStax Enterprise Deployments with Kubernetes for Containerized Workflows

Christopher Bradford

Available today in DataStax Labs, the DataStax Enterprise Kubernetes Operator simplifies the deployment and management of DataStax Enterprise (DSE) clusters on the Kubernetes platform. Many enterprise IT organizations are implementing hybrid and multi-cloud solutions to leverage features and capabilities from disparate providers. At the same time, tooling has grown increasingly homogeneous across providers, both on-premises and in the cloud. Kubernetes (also known as k8s) has emerged as the dominant player in this space, with distributions available on most cloud platforms.

As Kubernetes has matured, so has the tooling surrounding it. Operators now handle the deployment of containerized workloads, both for user-facing applications and for the databases that back them. The DataStax Kubernetes Operator fills this role for DSE by simplifying the process of deploying and managing clusters within k8s namespaces. Instead of spinning up individual resources and configurations and gluing them all together, you submit a simple YAML file to k8s defining the number of nodes and the storage requirements. From there, the operator provisions pods, services, storage claims, and more. As the resources come online, the cluster is bootstrapped and becomes available at a predictable k8s service DNS address. For example, if your cluster is named "demo-cluster" with a data center named "dc-1", the service will be named "demo-cluster-dc-1-service".
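That naming convention can be sketched in shell (the variable names here are illustrative, not part of the operator's API):

```shell
# Derive the predictable service name from the cluster and data center
# names, assuming the <clusterName>-<dcName>-service pattern described above.
CLUSTER_NAME="demo-cluster"
DC_NAME="dc-1"
SERVICE_NAME="${CLUSTER_NAME}-${DC_NAME}-service"
echo "${SERVICE_NAME}"   # demo-cluster-dc-1-service
```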

If you're interested in diving into the details, check out the documentation on GitHub. The rest of this blog post will cover setting up and running the operator on a local Red Hat OpenShift 4 installation.

Getting Started Locally

In our environment, we are running OpenShift 4 via CodeReady Containers. The cluster may be instantiated with the following:

$ crc setup
$ crc start

INFO Starting OpenShift cluster ... [waiting 3m]
INFO To access the cluster, first set up your environment by following 'crc oc-env' instructions
INFO Then you can access it by running 'oc login -u developer -p developer https://api.crc.testing:6443'
INFO To login as an admin, username is 'kubeadmin' and password is REDACTED
INFO You can now run 'crc console' and use these credentials to access the OpenShift web console
$ crc oc-env

First, make sure you have the oc command available in your terminal and that it can communicate with the cluster.

$ oc login -u kubeadmin -p REDACTED https://api.crc.testing:6443
$ oc version

Client Version: v4.3.0
Server Version: 4.2.0-0.nightly-2019-09-26-192831
Kubernetes Version: v1.14.6+73b5d76

$ oc cluster-info
Kubernetes master is running at https://api.crc.testing:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Optionally, you may open the web interface with crc console and log in with the credentials returned by crc start.

Next, we download and submit a YAML file to the cluster containing the resources required to run our operator. This includes a service account, a role binding, a custom resource definition (CRD) representing DseDatacenter objects, and a deployment for the operator.

$ curl -O
$ oc apply -f datastax-operator-manifests.yaml

serviceaccount/dse-operator created
role.rbac.authorization.k8s.io/dse-operator created
rolebinding.rbac.authorization.k8s.io/dse-operator created
customresourcedefinition.apiextensions.k8s.io/dsedatacenters.datastax.com created
deployment.apps/dse-operator created

Verify that the operator is up and running:

$ oc get deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
dse-operator   1/1     1            1           6m48s

At this point we can describe the DSE cluster we wish to deploy. With the DSE Kubernetes Operator, clusters are described as logical DSE data centers. Each data center is defined by a number of nodes, a DSE version number, storage requirements, rack identifiers, and any tweaks to the configuration files. In our local environment, we want to run a single node with persistent storage. Note that the version number below cannot be edited at this time; the operator relies on APIs that are only available in the pre-release version of DSE 6.8.0.

$ cat dse-datacenter.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dse-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: datastax.com/v1alpha1
kind: DseDatacenter
metadata:
  name: dc-1
spec:
  dseClusterName: demo-cluster
  size: 1
  repository: datastaxlabs/dse-k8s-server
  version: 6.8.0-20190822
  storageClaim:
    storageclassname: dse-storage
    resources:
      requests:
        storage: 20Gi
  racks:
    - name: rack-1
  config:
    cassandra-yaml:
      client_encryption_options:
        enabled: False
      num_tokens: 16
    # jvm-options:
    #  initial_heap_size: "4g"
    #  max_heap_size: "4g"
    dse-yaml:
      authentication_options:
        enabled: True

With the data center defined within our YAML file we can now submit it to the Kubernetes cluster.

$ oc apply -f dse-datacenter.yaml
storageclass.storage.k8s.io/dse-storage created
dsedatacenter.datastax.com/dc-1 created

From here the operator is alerted to the new DseDatacenter resource. It inspects the current resources and starts to provision those that are missing (all of them at this point). For our cluster the following resources are created and managed for us:

  • Services
  • StatefulSets
  • PersistentVolumeClaims
  • Pods

You can monitor the state of pods (and whether they are ready) with the following command:

$ oc get pods --selector com.datastax.dse.cluster=demo-cluster -w
NAME                             READY   STATUS            RESTARTS   AGE
demo-cluster-dc-1-rack-1-sts-0   0/1     PodInitializing   0          2m29s
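The pod name above is itself predictable: the operator appears to create one StatefulSet per rack, named with a <cluster>-<dc>-<rack>-sts pattern, and StatefulSets number their pods from ordinal 0. A quick sketch of that assumption:

```shell
# Reconstruct the pod name seen above, assuming one StatefulSet per rack
# named <cluster>-<dc>-<rack>-sts, with the first pod at ordinal 0.
CLUSTER_NAME="demo-cluster"
DC_NAME="dc-1"
RACK_NAME="rack-1"
STS_NAME="${CLUSTER_NAME}-${DC_NAME}-${RACK_NAME}-sts"
POD_NAME="${STS_NAME}-0"
echo "${POD_NAME}"   # demo-cluster-dc-1-rack-1-sts-0
```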

When the pod comes online, you have a fully functioning DSE cluster. DSE-enabled applications may now be deployed within the Kubernetes cluster and connected via the exposed service demo-cluster-dc-1-service.
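For example, an application built on the DataStax Java Driver 4.x could point at that service in its application.conf. This is a sketch under the assumption that the app runs in the same namespace; the option keys are standard driver configuration, and the local datacenter must match the DseDatacenter name (dc-1):

```
datastax-java-driver {
  # The operator-managed service is the contact point inside the namespace.
  basic.contact-points = [ "demo-cluster-dc-1-service:9042" ]
  # Must match the data center defined in the DseDatacenter resource.
  basic.load-balancing-policy.local-datacenter = "dc-1"
}
```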

For more information, check out DataStax Labs, the Kubernetes whitepaper, and the DataStax Enterprise Kubernetes Operator documentation.

You can also register to attend the Simplifying DataStax Deployments and Management with Kubernetes webinar on December 18th, a must-attend session for anyone looking to elevate IT operations without getting bogged down by complexity.
