In this article, Volkan Civelek examines the pros and cons of using public cloud database services, the case for using DataStax Enterprise as a data layer across multiple clouds, and advice on running DSE in Kubernetes.
Data as a Soul (DaaS)
In the entire history of IT, there has never been a single vendor within a data center. A typical data center has served its traffic with network appliances and servers from Alcatel, Juniper, Cisco, HP, Dell, and many others. The Cloud Era is no different. Smart large enterprises will not lock themselves into a particular cloud vendor, primarily for the reasons below:
- Retain command of their infrastructure and sensitive data.
- Control their costs.
- Provide resiliency and availability across multiple regions and multiple clouds.
- Ensure compliance with regulations such as GDPR.
- Support third-party integrations and any associated locality requirements.
In IT land, data is king, yet it is also the most complex asset to manage. Data needs to flow freely to its geolocated home and be fetched quickly whenever needed. Data needs to be free as a soul. It is what makes an enterprise unique, or at least it should be.
PaaS or not to PaaS the Data?
All enterprises need to choose whether to PaaS their data. I believe PaaS contradicts the principles above: it locks the technology to a particular vendor or service and takes control away from the enterprise.
The best applications leverage the best infrastructure and services available across cloud vendors and private clouds while maintaining highly available, autonomous, and geographically distributed data.
Ironically, Google appears to feel the same way. They have open sourced Kubernetes and announced that GKE is available on-prem (pre-alpha version). The open source software layer is what enables their multicloud and hybrid cloud strategy.
Federation Between Clouds
Kubernetes is great for running containerized applications, but running applications across different clouds was never one of its intended use cases. The emergence of multicloud and hybrid cloud strategies gave birth to the Kubernetes Federation APIs.
Those Federation APIs, however, are becoming obsolete.
There are only a handful of engineers working on the Kubernetes Federation APIs, while there are thousands of engineers working on the Kubernetes APIs. It's not realistic to expect the Federation APIs to stay up to date with the daily advancements and changes to the Kubernetes APIs.
Now, Google is replacing the federation idea by treating clusters as independent, isolated units. It is making the Federation APIs obsolete with Istio (service mesh software that gives a view of entire systems, including distributed health metrics, tracing, intelligent routing, security, and telemetry) and Spinnaker (application lifecycle management and CI/CD).
For orchestrating the individual Kubernetes clusters, Google announced GKE on-prem. On the same day, Cisco announced its Cisco Hybrid Cloud Platform, built in partnership with Google Cloud (#GoogleNext18, July 2018).
How Does DataStax Work in a Multicloud or Hybrid Cloud Environment?
DataStax is the only active everywhere database for enterprises with a native real-time replication feature. A single DSE cluster can natively span multiple clouds. Because data flows between clouds this way, enterprises have the freedom to design their architecture with whatever cloud infrastructure they choose. They are not limited to just one cloud provider’s infrastructure and services. For example, customers can use Amazon Kinesis to process their event streams along with the Google Cloud Machine Learning Engine. Multicloud and hybrid clouds are home territory for DataStax.
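As a rough illustration of how one cluster spans clouds: in Cassandra-based databases such as DSE, replication is declared per keyspace, and `NetworkTopologyStrategy` takes a replica count for each datacenter. The Python sketch below only builds the CQL statement and does not connect to anything. The keyspace name `orders` and the datacenter names `aws_us_east` and `gcp_europe_west` are hypothetical; real names must match the datacenter names your cluster reports.

```python
def build_create_keyspace(keyspace, replicas_per_dc):
    """Build a CQL CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to the number of replicas
    that datacenter should hold. All names here are illustrative.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the datacenters so the generated statement is deterministic.
    for dc, rf in sorted(replicas_per_dc.items()):
        if not isinstance(rf, int) or rf < 1:
            raise ValueError(f"replication factor for {dc} must be a positive integer")
        options.append(f"'{dc}': {rf}")

    replication = "{" + ", ".join(options) + "}"
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {replication};"


# Hypothetical layout: one datacenter in each of two clouds, three replicas each.
statement = build_create_keyspace(
    "orders", {"aws_us_east": 3, "gcp_europe_west": 3}
)
print(statement)
```

With a layout like this, each cloud holds a full set of replicas, so an application in either cloud can read and write against its local datacenter.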
DataStax on Kubernetes
So you might be saying, “It’s cool that DataStax meets my multicloud and hybrid cloud requirements, but I want to run it on Kubernetes anyway. Is it possible to run DataStax on Kubernetes?”
In short, yes, it's possible! To run DataStax on Kubernetes, StatefulSets and Local Persistent Volumes should be leveraged. For the multicloud and hybrid cloud networking part, there are multiple evolving options:
- MultiCluster Ingress
- Federated Ingress
- Traffic Director (alpha), which Google Cloud has announced as a managed control plane for Envoy proxy deployments run by the customer
- Regular Kubernetes Service options such as ExternalIP, NodePort, and LoadBalancer
(Announced by @Prajaktaplus at #GoogleNext18)
The best option at the moment is to establish a secure VPN connection between the clouds and choose the networking option that interferes least with the DataStax gossip protocol.
Kubernetes is open source automation software that helps IT and developers run their containerized applications. Running them over multicloud and hybrid clouds in a unified way is still under heavy development. This is even more true for the stateful data layer.
In his keynote at Google Next 2018, Urs Hölzle (Google SVP) said that the admin costs for Kubernetes have risen 83% while server costs dropped only 15%.
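The StatefulSet and Local Persistent Volume combination mentioned above can be sketched as two Kubernetes manifests. This is a minimal sketch under stated assumptions, not a supported DataStax deployment: the image reference, the object names, the storage size, and the `/var/lib/cassandra` mount path are placeholders, and a headless Service with the same name as the StatefulSet is assumed to exist already. The Python below only builds the manifests as dictionaries and prints them as JSON.

```python
import json


def local_storage_class(name):
    """StorageClass for statically provisioned local volumes.

    Local volumes have no dynamic provisioner, and binding is delayed
    until a pod is scheduled so the claim lands on that pod's node.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/no-provisioner",
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def dse_statefulset(name, image, replicas, storage_class, storage_size):
    """StatefulSet with one persistent volume claim per pod (illustrative)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # assumed headless Service, created separately
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,  # placeholder image reference
                        "ports": [
                            {"containerPort": 7000, "name": "intra-node"},
                            {"containerPort": 9042, "name": "cql"},
                        ],
                        "volumeMounts": [
                            # Assumed data directory; adjust to your image.
                            {"name": "data", "mountPath": "/var/lib/cassandra"},
                        ],
                    }],
                },
            },
            # Each pod gets its own claim, named after this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": storage_size}},
                },
            }],
        },
    }


storage = local_storage_class("local-storage")
manifest = dse_statefulset(
    "dse", "example.com/dse-server:TAG", 3, "local-storage", "100Gi"
)
print(json.dumps(storage, indent=2))
print(json.dumps(manifest, indent=2))
```

The local PersistentVolume objects themselves still have to be created for each node's disks, and none of this addresses cross-cloud networking, which is where the VPN recommendation comes in.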
(@uhoelzle at #GoogleNext18)
DataStax is already solving the data autonomy problems with multicloud and hybrid cloud strategies in a unique and easy way.
So, weigh the costs and benefits of running DataStax on Kubernetes against your needs and requirements before you end up using a technology just for the sake of its popularity.