Technology · November 2, 2020

Sizing Matters: Sizing Astra for Apache Cassandra Apps

Matthew Overstreet

This article is for anyone who knows Apache Cassandra™ and wants to understand how to size DataStax Astra for new or existing Cassandra applications. I’ll explain:

  • How the service is built
  • How to choose compute and storage
  • How to replicate between geographies

TL;DR: Always start with the free tier… because it’s free. For testing production apps, start with C10s, unless you have a time-series or immutable workload, in which case start with D10s.

How the Service is Built

Apache Cassandra Replication == Scalability and Availability

Nobody wants to think about nodes and clusters, but as a Cassandra pro, you need to know that the scalability and availability you expect are there in Astra. Good news! They are. When you deploy a database in Astra, we’re delivering an RF=3 (3 replica) cluster with reads and writes at local quorum.

The RF=3 guarantee is delivered in increments of 3 nodes across 3 availability zones. 

Every Astra cluster starts with 3 Cassandra nodes distributed across 3 availability zones. Availability zones are equivalent to “racks” in the Cassandra world. 
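
To make the RF=3 and local quorum combination concrete, here is a minimal sketch of the standard Cassandra quorum arithmetic. The function names are made up for illustration and are not part of any Astra or driver API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a quorum read or write."""
    # Standard Cassandra rule: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)


rf = 3  # replication factor Astra delivers, per the text above
print(quorum(rf))              # 2
print(tolerated_failures(rf))  # 1
```

With one replica per availability zone, this is why losing a single zone should still leave enough replicas to satisfy quorum reads and writes.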

You don't want data distributed unevenly across availability zones, so it makes sense that you expand in groups of 3 nodes. We call each expansion adding a “Capacity Unit”, but what you are actually adding is a bucket of compute and storage resources!
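
As a rough illustration of why expanding in groups of 3 keeps the zones balanced, the sketch below counts nodes per availability zone for a given number of capacity units. The zone names and the function are hypothetical; only the 3-nodes-across-3-zones rule comes from the text.

```python
from typing import Dict

NODES_PER_CAPACITY_UNIT = 3  # per the article: each unit adds 3 nodes
AVAILABILITY_ZONES = ["az-1", "az-2", "az-3"]  # hypothetical zone names


def nodes_per_zone(capacity_units: int) -> Dict[str, int]:
    """Return how many nodes land in each zone for a given unit count."""
    if capacity_units < 1:
        raise ValueError("a database has at least one capacity unit")
    total_nodes = capacity_units * NODES_PER_CAPACITY_UNIT
    # Each capacity unit contributes exactly one node to every zone,
    # so the total always divides evenly across the three zones.
    per_zone = total_nodes // len(AVAILABILITY_ZONES)
    return {zone: per_zone for zone in AVAILABILITY_ZONES}


print(nodes_per_zone(1))  # {'az-1': 1, 'az-2': 1, 'az-3': 1}
print(nodes_per_zone(4))  # {'az-1': 4, 'az-2': 4, 'az-3': 4}
```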

How to Choose Compute and Storage

There are currently 10 types of database instances in Astra, labeled A5 to A40, C10 to C40, and D10 to D40. When we show specs for the tiers, we are always talking about the total resources allocated per capacity unit.

The A tier is designed for development/test and smaller-scale apps, and most users will deploy an A tier first. We’re here to talk about sizing production apps, so let's focus on the C and D tiers.

The only difference between the C and D tiers is allocated storage. Think of the C tier as better for CPU-optimized workloads and the D tier as best suited for storage-optimized workloads (think time series and immutable data).

Knowing which to choose is a three-step process. 

1. Decide if you should be considering the A tier. In addition to being multi-tenant, the A tier is very compact in the resources it delivers. It was designed for development and test, but for some workloads it might be perfect. If the A tier looks right for you, keep in mind that it lacks multi-region support and VPC peering.

2. For the rest of us, the compute size is chosen based on your throughput and latency requirements. Most users start with the C10. If you know your workload well, you can use the chart below to be more specific. We erred on the side of wildly conservative with this chart: there are workloads that achieve over 20k tps on C10s and workloads that achieve sub-10ms latencies on C10s, so your mileage may vary.

[Chart: compute sizing by throughput and latency requirements]

3. Storage scales in 500GB increments on the C tier and 1.5TB increments on the D tier. We’re taking care of Cassandra replication behind the scenes, so you only need to think about unreplicated data. Use the chart below to choose between the C and D tiers.

[Chart: choosing between the C and D tiers by storage]

The next step is to decide whether you should start with one database instance or expand quickly after deployment. Astra takes care of operational considerations that you would otherwise have to size for. You no longer need to think about:

  • Replication - It’s happening behind the scenes. You only need to think about how much unreplicated storage you need.
  • Overhead for compaction and other administrative processes - Again, we’ve got this covered.

If you’re migrating an existing app, divide the total unreplicated storage by either 500GB (C tier) or 1.5TB (D tier) and round up. That’s the number of instances you’ll need.

  • Example: Most users start with C10s. If you are migrating an app that currently houses 600GB of unreplicated data, you’ll want to start with two C10s (600GB / 500GB = 1.2, rounded up to 2).
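
The division above can be sketched as a small helper. This is an illustration only: the function name is made up, 1.5TB is assumed to mean 1500GB, and it sizes for storage alone, so a throughput-heavy workload may still need more than it returns.

```python
import math

# Unreplicated storage added per instance, per the article.
# 1.5TB is taken as 1500GB here (an assumption about the unit convention).
STORAGE_PER_INSTANCE_GB = {"C": 500, "D": 1500}


def instances_needed(unreplicated_gb: float, tier: str) -> int:
    """Instances needed to hold the given unreplicated data on a tier."""
    if unreplicated_gb < 0:
        raise ValueError("storage cannot be negative")
    per_instance = STORAGE_PER_INSTANCE_GB[tier]
    # Round up: a partially filled instance is still a whole instance,
    # and every database has at least one.
    return max(1, math.ceil(unreplicated_gb / per_instance))


print(instances_needed(600, "C"))  # 2, matching the example above
print(instances_needed(600, "D"))  # 1
```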

How to Replicate Between Geographies

When deploying your database you’ll be given an option to replicate it to other regions within the same cloud (AWS, GCP, or Azure). As with any Cassandra cluster, you can read and write to every data center (region). This is a point-and-click option for users and is handled, like everything in Astra, by Kubernetes on the backend.
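
Because reads and writes run at local quorum, each region only needs a quorum of its own replicas to serve requests. The sketch below models that rule under the assumption that every region holds its own 3 replicas. The region names and the function are hypothetical.

```python
from typing import Dict


def local_quorum_ok(
    live_replicas_by_region: Dict[str, int],
    local_region: str,
    rf_per_region: int = 3,  # assumption: RF=3 in every region
) -> bool:
    """True if a LOCAL_QUORUM request coordinated in local_region can succeed."""
    needed = rf_per_region // 2 + 1
    # LOCAL_QUORUM only counts replicas in the coordinator's own region;
    # replicas in other regions do not help or hurt.
    return live_replicas_by_region.get(local_region, 0) >= needed


# Hypothetical two-region deployment where one region is fully down.
live = {"us-east-1": 3, "eu-west-1": 0}
print(local_quorum_ok(live, "us-east-1"))  # True
print(local_quorum_ok(live, "eu-west-1"))  # False
```

In this model an outage in one region does not block clients connected to the other, which is the practical benefit of reading and writing at local quorum in a multi-region setup.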

What Next?

Astra has some other features and tools that we hope will make your life easier. 

  • 5GB free tier. Free forever. No catches. Actually, one catch: it’s RF=1.
  • APIs: REST and GraphQL with Cassandra??? Yes!

Sign up now or try the pricing calculator

If you have questions, the other engineers and I are behind the chat button in Astra. We’re here to make you successful, so ask us questions or give us feedback.
