Company | March 31, 2021

Four Steps to Migrate Live Data from Apache Cassandra to Astra with Zero Downtime

Aaron Ploetz, Developer Relations, DataStax

We are excited to announce today the general availability of the new DataStax Zero-Downtime Cloud Migration tool that enables organizations to seamlessly migrate live data from self-managed Apache Cassandra™ instances to DataStax Astra with no downtime.

Astra is the first and only open, multi-cloud serverless DBaaS. With Astra, enterprises only pay for the resources they consume (e.g., reads, writes, and storage), instead of having to size and pay for a database by predicting peak usage. Astra can speed up application development, streamline operations, and deliver TCO savings of up to 75% over non-serverless database workloads, according to the recent study by GigaOm.

What is Zero Downtime Cassandra to Astra Migration?

Many enterprises that run open-source (OSS) Cassandra are moving to Astra to take advantage of the robust cloud platform and features that Astra's Database-as-a-Service (DBaaS) offers.

Typically, these existing Cassandra applications run mission-critical operational business use cases, so having a maintenance window to perform an offline migration is generally not an option. Achieving a zero-downtime migration between clusters normally requires extensive and intrusive modifications to each client application being migrated. As a result, it is a convoluted, costly, time-consuming, and error-prone process.

The Zero Downtime Live Migration changes this process radically. It helps enterprises seamlessly migrate their Cassandra applications to Astra with zero application downtime and only minimal, non-invasive configuration changes to client applications.

Zero Downtime Migration Tooling

The heart of this zero-downtime migration is a cloud Proxy, which eliminates the need for application code changes and lets applications continue working without interruption while historical data is migrated separately.

The Proxy is a standalone component that sits between the client application and the two clusters: the existing cluster (Origin) and the target (Astra). It intercepts all real-time application requests, routing every write to both clusters and every read to Origin. Reads are therefore fulfilled by the Origin cluster while writes are applied to both, which keeps the two clusters continuously up to date with new data.
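
The routing rule described above can be illustrated with a small sketch. This is not the Proxy's actual implementation: `FakeCluster` and `DualWriteRouter` are hypothetical names, and the clusters are plain in-memory dictionaries rather than real Cassandra or Astra connections.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster (not a real driver session)."""
    name: str
    rows: dict = field(default_factory=dict)

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class DualWriteRouter:
    """Hypothetical router: writes go to both clusters, reads go to Origin only."""

    def __init__(self, origin, target):
        self.origin = origin
        self.target = target

    def write(self, key, value):
        # Every write is applied to both clusters so the target keeps up
        # with live traffic.
        self.origin.write(key, value)
        self.target.write(key, value)

    def read(self, key):
        # Reads are served by Origin, which still holds the complete data set.
        return self.origin.read(key)


origin = FakeCluster("origin", {"user:1": "historical row"})
astra = FakeCluster("astra")
router = DualWriteRouter(origin, astra)

router.write("user:2", "new row")
print(router.read("user:1"))  # served by Origin
print(astra.read("user:2"))   # the new write also landed on the target
print(astra.read("user:1"))   # None: historical rows are migrated separately
```

The last line shows why a separate historical migration is still needed: dual writes only cover data written after the Proxy is in place.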

While the Proxy is in action, the existing data in the Origin cluster is migrated to Astra by using either a DSBulk-based utility or a Spark-based migrator. The DSBulk-based utility orchestrates DSBulk to migrate all desired tables efficiently, with a configurable level of concurrency. The Spark-based migrator benefits from Spark's excellent parallelization capabilities and is ideal when the data being migrated needs to be filtered through custom logic based on the user's requirements.
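
To make "a configurable level of concurrency" concrete, here is a minimal sketch of copying several tables in parallel with a bounded worker pool. It does not reproduce the DSBulk-based utility: `migrate_tables`, `fake_copy`, and the table names are hypothetical, and `copy_table` is a placeholder for whatever actually moves one table (for example, a call out to a bulk loader).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def migrate_tables(tables, copy_table, concurrency=4):
    """Run copy_table(table) for each table, at most `concurrency` at a time.

    copy_table is a caller-supplied (hypothetical) function that migrates one
    table. Returns a dict mapping each table to its result or to the
    exception it raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(copy_table, table): table for table in tables}
        for future in as_completed(futures):
            table = futures[future]
            try:
                results[table] = future.result()
            except Exception as exc:
                # Record the failure and let the other tables finish.
                results[table] = exc
    return results


def fake_copy(table):
    # Placeholder for a real copy; returns a made-up row count.
    if table == "ks.broken":
        raise RuntimeError("copy failed")
    return len(table)


outcome = migrate_tables(
    ["ks.users", "ks.orders", "ks.broken"], fake_copy, concurrency=2
)
failed = sorted(t for t, r in outcome.items() if isinstance(r, Exception))
print(failed)
```

Collecting failures per table, rather than aborting on the first error, makes it straightforward to retry only the tables that did not complete.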

Four Steps to Migrate Your Cassandra Database to Astra

1) Proxy Deployment

  • Establish connectivity to Astra
  • Deploy and configure the Proxy 
  • Create Cassandra schema in Astra
  • Point application to Proxy 
  • Enable Dual Writes to Origin DB and Astra; enable Reads from Origin

2) Historical Data Migration

  • Migrate existing data from the origin cluster to Astra using either
    • DSBulk-based utility 
    • Spark-based migrator 

3) Validation of Migrated Data

  • Row Count validations for each table with DSBulk 
  • Full data comparison using cassandra-diff 
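
The two validation levels above catch different problems, which the following sketch illustrates. It is not how DSBulk or cassandra-diff work internally: tables are modeled as in-memory dictionaries keyed by primary key, and `row_count_matches` and `diff_tables` are hypothetical helper names.

```python
def row_count_matches(origin_rows, target_rows):
    """Cheap first check: do both tables hold the same number of rows?"""
    return len(origin_rows) == len(target_rows)


def diff_tables(origin_rows, target_rows):
    """Full comparison of two tables given as {primary_key: row} dicts."""
    missing = sorted(k for k in origin_rows if k not in target_rows)
    extra = sorted(k for k in target_rows if k not in origin_rows)
    mismatched = sorted(
        k for k in origin_rows
        if k in target_rows and origin_rows[k] != target_rows[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


# Made-up sample data for illustration only.
origin_rows = {1: ("alice", 30), 2: ("bob", 41), 3: ("carol", 25)}
target_rows = {1: ("alice", 30), 2: ("bob", 40), 4: ("dave", 52)}

print(row_count_matches(origin_rows, target_rows))
print(diff_tables(origin_rows, target_rows))
```

In this sample the row counts agree even though one row is missing, one is unexpected, and one differs in value, which is why a full data comparison is worth running in addition to row counts.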

4) Disable Proxy / Migration Complete

  • Disable the Proxy: the application now writes to and reads from Astra directly


The DataStax Zero-Downtime Migration tool is available for free, as it’s included in an Astra subscription. For more information on the fastest way to get up and running on Astra without any downtime, contact us at migrations@datastax.com.
