Modernize your Cassandra App with a 3-Step Migration to the Astra DbaaS Using Apache Spark

When we launched Astra, the cloud-native service for Apache Cassandra™ applications, we also open-sourced portions of the architecture and tooling that back it. Today we continue that promise to lead with code: your first Cassandra application migration to Astra is free, and the knowledge of how to migrate Cassandra applications is not a secret.

A one-time migration of data from your own Cassandra cluster to Astra is the simplest way to migrate an application, and that's what we'll show today. A Cassandra migration tool for live migrations will be available soon, and you can sign up for the beta today.

  • Create your keyspace and table in the destination cluster and download the connection bundle
  • Locally, install Apache Spark and the new version of DataStax's open-source Spark Cassandra Connector
  • Use the DataFrame API to push your data to the remote keyspace and table
  • Nobody would believe us if we said it was only three steps, so step 4 shows you how to verify your data on the destination cluster

1. In this example, we’ll use a table from a popular NoSQLBench workload in the source cluster. The table has the following structure:


CREATE TABLE baselines.iot (
    machine_id uuid,
    sensor_name text,
    time timestamp,
    data text,
    sensor_value double,
    station_id uuid,
    PRIMARY KEY ((machine_id, sensor_name), time)
) WITH CLUSTERING ORDER BY (time DESC);

To migrate data, create a corresponding table in the destination cluster in Astra. This can be done directly in Astra's UI, via the "CQL Console" tab (I've created it with the name test.iot).
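For instance, assuming the keyspace test already exists in your Astra database (Astra creates the keyspace when you create the database), the destination table can simply mirror the source schema:

```sql
-- Destination table in Astra, mirroring the source table's schema
CREATE TABLE test.iot (
    machine_id uuid,
    sensor_name text,
    time timestamp,
    data text,
    sensor_value double,
    station_id uuid,
    PRIMARY KEY ((machine_id, sensor_name), time)
) WITH CLUSTERING ORDER BY (time DESC);
```

Keeping the primary key identical matters here: it preserves the same partitioning and clustering semantics, so reads against the migrated table behave exactly as they did against the source.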

To connect to the destination database, download your access credentials as a secure connect bundle for the Cassandra cluster created on DataStax Astra (in my case, it was stored as ~/secure-connect-test.zip).

2. After the secure bundle is downloaded, start Spark with the Spark Cassandra Connector. Your Spark configuration can be as simple as a single Spark node in local master mode. I'm using Spark 2.4.5, compiled with Scala 2.11, so my command line looks like the following:


bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 \
  --files ~/secure-connect-test.zip

3. After the Spark shell has started, the standard DataFrame API can be used to read the data from the source cluster (in my case, the contact point is 10.101.34.176). To write the data to the destination cluster in Astra, provide the file name of the secure connect bundle, as well as the username and password:


// Read the source table via the DataFrame API
val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "baselines",
    "table" -> "iot",
    "spark.cassandra.connection.host" -> "10.101.34.176"
  ))
  .load

// Write to the destination table in Astra, authenticating
// with the secure connect bundle
df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "test",
    "table" -> "iot",
    "spark.cassandra.connection.config.cloud.path" -> "secure-connect-test.zip",
    "spark.cassandra.auth.password" -> "123456",
    "spark.cassandra.auth.username" -> "test"
  ))
  .save

4. That's all! We can check that the data was successfully copied by comparing the number of rows in both tables:


val srcCount = df.count
val destCount = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "test",
    "table" -> "iot",
    "spark.cassandra.connection.config.cloud.path" -> "secure-connect-test.zip",
    "spark.cassandra.auth.password" -> "123456",
    "spark.cassandra.auth.username" -> "test"
  )).load.count
println("source count: " + srcCount)
println("destination count: " + destCount)
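Matching row counts only catch gross failures. For a stronger (though more expensive) check, the set difference between the source and destination DataFrames should be empty; here is a minimal sketch reusing the df we read earlier, where destDf is the destination table loaded as a DataFrame rather than counted directly:

```scala
// Load the destination table as a DataFrame (same options as before)
val destDf = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "test",
    "table" -> "iot",
    "spark.cassandra.connection.config.cloud.path" -> "secure-connect-test.zip",
    "spark.cassandra.auth.password" -> "123456",
    "spark.cassandra.auth.username" -> "test"
  )).load

// Rows present in the source but absent from the destination;
// a count of 0 means every source row made it across
println("missing rows: " + df.except(destDf).count)
```

Note that except compares DataFrames positionally by column, so this works here because both tables were created with the same schema.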

Try this yourself by using Spark Cassandra Connector 2.5.1 with Apache Spark and easily migrate your existing database to DataStax Astra!

P.S. Of course, we can also use the RDD API to perform a migration, but it's more verbose:


import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
val sourceConnection = CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "10.101.34.176"))
val destConnection = CassandraConnector(sc.getConf
  .set("spark.cassandra.connection.config.cloud.path", "secure-connect-test.zip")
  .set("spark.cassandra.auth.username", "test")
  .set("spark.cassandra.auth.password", "123456"))

val rdd = {
  implicit val c = sourceConnection
  sc.cassandraTable("baselines","iot")
}

{
  implicit val c = destConnection
  rdd.saveToCassandra("test","iot")
}

Not all application migrations are this easy, but that’s what we are here for.

Migrating your first Cassandra application is completely free, no catches.

If you have ideas for more technical content from our OSS team, let us know at osc-team@datastax.com.
