Yes, you can use Apache Cassandra on AWS. Cassandra is available on AWS as a fully managed service through Astra DB, or you can self-manage it, for example with the AWS Quick Start.
Apache Cassandra™ is a leading NoSQL database that enables developers to build massively scalable, geo-distributed data applications with zero downtime. Cassandra is the database of choice for some of the most demanding applications on the internet, including those at Netflix, Uber, Pinterest, and thousands of the world’s leading engineering teams.
This guide explains the best managed and self-managed ways to run Cassandra on Amazon Web Services (AWS).
Managed Service: Using Astra DB on AWS
The fastest way to use Cassandra on AWS is Astra DB, a database-as-a-service built on Cassandra, Kubernetes, Prometheus, Envoy, and other cutting-edge open source software. Astra DB simplifies cloud-native application development and requires no operations or self-management. It reduces deployment time from weeks to minutes, delivering an unprecedented combination of serverless autoscaling, pay-as-you-go pricing, and an open source skill set you can take with you to any cloud provider. How does Astra DB make running on AWS easy?
Why Astra DB?
- Scale up to petabytes of data without impacting performance
- Colocate data and applications anywhere in the world without compromising performance, availability, or accessibility
- Replicate databases across multiple data centers, availability zones, and even regions, with no leader/follower troubleshooting headaches
- Compute and storage are separated, enabling apps to scale cost-effectively or scale down to zero automatically
- Tunable consistency lets you adjust the tradeoff between availability and consistency of data on Cassandra nodes
- True serverless autoscaling eliminates manual configuration changes and guesswork on database sizing
- Deploy in 5 minutes or less: no provisioning, install, or configuration
- Fully managed database and OS updates and upgrades
- Operate in any of Astra’s globally available AWS regions and availability zones
- IaaS (Infrastructure-as-a-Service) failures handled gracefully by a Kubernetes operator to keep databases healthy
- High availability from automatic self-healing at the database level
- Fault tolerance: data is automatically replicated to multiple nodes and across multiple data centers for high fault tolerance and zero data loss
- A 99.9% SLA for single-region deployments and a 99.99% SLA for multi-region deployments minimize both downtime and the need for site-reliability engineering
- Automated anti-entropy repair procedures
- Automated hourly backup, with snapshot storage for 20 days
- Integrated Grafana monitoring provides accurate, up-to-date information about database health and performance
- Skip defining a schema up front and use Astra DB as a JSON document store via the Document API
- Go schema-first with familiar REST, GraphQL, and gRPC APIs and ramp up quickly
- Drive adoption of cloud-native architectures using a microservices and API-first approach
- No low-level AWS infrastructure knowledge required to deploy: name your database and keyspace, select a region, and you are done
- Robust, cloud-enabled language drivers in all major programming languages
- JDBC/ODBC drivers for BI and other tool integration
- Popular framework integrations (Spring Boot, Spring Data, Quarkus, and more)
- Spark Cassandra Connector
- Built-in CQLSH console
- Postman Collection for Astra DB APIs
- DevOps API, Terraform Provider, Ansible Playbook for CI/CD pipeline automation
- JetBrains IDE Plugin: Astra DB Data Explorer
- Achieve data sovereignty without replication headaches with multi-region deployments
- SOC 2 compliance
- Sophisticated authentication and authorization with role-based access
- Client connections use two-way certificate validation (mTLS) for VPN-level security from client to database
- All data is encrypted at rest and in motion
- AWS PrivateLink connectivity connects apps in your VPC to Astra DB
- JSON Web Token (JWT)-based authentication to ensure secure connectivity to your Astra DB database
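Tunable consistency and multi-replica fault tolerance, listed above, both come down to how many replicas must acknowledge a request. The Python sketch below shows the standard Cassandra arithmetic under simplifying assumptions: a single data center, QUORUM computed as floor(RF / 2) + 1, and the usual rule that read and write replica sets overlap when R + W > RF. The function names are hypothetical helpers for illustration, not part of any driver API, and per-data-center levels such as LOCAL_QUORUM are not modeled.

```python
# Sketch of the replica arithmetic behind Cassandra's tunable consistency.
# ASSUMPTION: single data center; helper names are hypothetical, not a driver API.


def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF),
    so a read reaches at least one replica holding the latest acknowledged write."""
    return read_replicas + write_replicas > replication_factor


def tolerated_failures(required_acks: int, replication_factor: int) -> int:
    """Replicas that can be down while a request needing `required_acks` still succeeds."""
    return replication_factor - required_acks


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print("QUORUM acks:", q)
    print("QUORUM/QUORUM overlap:", reads_see_latest_write(q, q, rf))
    print("ONE/ONE overlap:", reads_see_latest_write(1, 1, rf))
    print("Failures tolerated at QUORUM:", tolerated_failures(q, rf))
```

With RF = 3, reading and writing at QUORUM (2 + 2 > 3) guarantees overlap while still tolerating one unavailable replica; dropping both to ONE favors availability and latency at the cost of possibly stale reads.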
Self-Managed Service: Cassandra on AWS EC2
Some IT organizations require complete control over their systems or are already set up for self-managed software. Self-managed virtual machines give you that control. That control comes with all the associated effort and expense, a tradeoff that should be considered carefully.
Self-Managed Service: K8ssandra on AWS EKS
K8ssandra is a cloud-native distribution of Apache Cassandra® that runs on Kubernetes, including Amazon EKS. K8ssandra provides an ecosystem of tools that add richer data APIs and automated operations alongside Cassandra. These include metrics monitoring to promote observability, anti-entropy repair services to support reliability, and backup/restore tools to support high availability and disaster recovery. As part of K8ssandra’s installation process, all of these components are installed and wired together, freeing you from the tedious plumbing of components:
- Apache Cassandra
- Stargate, the open-source data gateway
- Cass-operator, the Kubernetes Operator for Apache Cassandra
- Reaper for Apache Cassandra, for scheduling and running anti-entropy repairs (plus reaper-operator)
- Medusa for Apache Cassandra for backup and restore (plus medusa-operator)
- Metrics Collector for Apache Cassandra, with Prometheus integration, and visualization via pre-configured Grafana dashboards
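Reaper schedules and coordinates Cassandra's anti-entropy repair, in which replicas build Merkle trees (hash trees) over ranges of their data and compare them so that only ranges that differ need to be streamed. The Python sketch below illustrates that comparison in a deliberately simplified form. It assumes two in-memory replicas represented as dicts, uses SHA-256 instead of Cassandra's Murmur3 partitioner to assign keys to token ranges, and requires a power-of-two number of ranges. All names are hypothetical; this is not Cassandra's or Reaper's actual implementation.

```python
# Simplified illustration of Merkle-tree comparison as used in anti-entropy repair.
# ASSUMPTIONS: in-memory replicas as dicts, SHA-256 in place of Cassandra's
# Murmur3 partitioner, and a power-of-two number of token ranges.
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def range_of(key: str, num_ranges: int) -> int:
    """Map a key to one of num_ranges contiguous ranges of a 64-bit token space."""
    token = int.from_bytes(_h(key.encode())[:8], "big")
    return (token * num_ranges) >> 64


def leaf_hashes(rows: dict[str, str], num_ranges: int) -> list[bytes]:
    """Hash the sorted contents of each token range; these are the tree's leaves."""
    buckets: list[list[str]] = [[] for _ in range(num_ranges)]
    for key in sorted(rows):
        buckets[range_of(key, num_ranges)].append(f"{key}={rows[key]}")
    return [_h("\n".join(bucket).encode()) for bucket in buckets]


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return tree levels from the leaves (index 0) up to the root (last level)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_ranges(tree_a: list[list[bytes]], tree_b: list[list[bytes]]) -> list[int]:
    """Walk both trees from the root, descending only where hashes differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for level in range(top - 1, -1, -1):
        suspects = [
            child
            for parent in suspects
            for child in (2 * parent, 2 * parent + 1)
            if tree_a[level][child] != tree_b[level][child]
        ]
    return suspects  # indices of token ranges whose data must be streamed


if __name__ == "__main__":
    replica_a = {"user:1": "alice", "user:2": "bob", "user:3": "carol"}
    replica_b = {**replica_a, "user:2": "bobby"}  # one divergent row
    tree_a = build_tree(leaf_hashes(replica_a, 8))
    tree_b = build_tree(leaf_hashes(replica_b, 8))
    print("Ranges to repair:", diff_ranges(tree_a, tree_b))
    print("Range holding user:2:", range_of("user:2", 8))
```

Descending only into subtrees whose hashes differ is what keeps repair cheap when replicas are mostly in sync: identical subtrees are skipped after a single hash comparison.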
Which is the most efficient way to run Cassandra on AWS?
The answer depends on your requirements, your existing investments, your staff and their skills, and a host of other factors.
In general, we recommend Astra DB for the vast majority of Cassandra use cases. You can be ready to go in minutes, freed from operational, security, and scalability concerns.
All but the most demanding, security-conscious applications will be well served by environments like Astra DB that already comply with common security standards, saving months or even years of effort, to say nothing of expense.
Startups and enterprises alike that do not want to, or cannot, dive deep into database administration and configuration should opt for Astra DB.
Self-managing databases on Kubernetes is less efficient than using a DBaaS, but may be driven by preexisting organizational proficiency with Kubernetes. Managed Kubernetes services like Amazon EKS, combined with K8ssandra, not only make running system-of-engagement databases on Kubernetes possible but can also significantly ease the burden on SRE/Ops teams.
Self-managing on IaaS is the least efficient option relative to DBaaS, but may be driven by a need to self-manage for regulatory reasons or to interoperate with proprietary or custom systems. Alternatively, the choice may stem from the nature of an existing application being migrated to the cloud: your application may simply not require, or not yet be ready for, a cloud-native architecture.
To deploy Cassandra on AWS, you can either:
- Set up a new cluster on Astra DB, or migrate an existing self-managed Cassandra deployment to Astra DB on AWS.
- Use the AWS Quick Start to build a new self-managed Cassandra cluster yourself.
Astra DB has a free tier of $25 in monthly credits, giving developers up to 80 GB of storage or up to 20 million reads/writes each month. Because Astra DB is serverless, you are billed only for what you use. If you’re managing your own cluster, standard AWS pricing applies to the resources it uses.
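As a rough way to reason about the free tier figures above, the Python sketch below treats the $25 monthly credit as covering either 80 GB of storage or 20 million reads/writes, and assumes that cost scales linearly and additively across the two. That linear model is an assumption for illustration, not Astra DB's actual billing formula, which may price reads, writes, and storage at different rates.

```python
# Rough free-tier estimator.
# ASSUMPTION: cost is linear and additive in storage and operations;
# this is NOT Astra DB's actual billing formula.

FREE_CREDIT_USD = 25.0
STORAGE_CAP_GB = 80.0      # storage the full credit covers, per the figures above
OPS_CAP = 20_000_000       # reads/writes the full credit covers, per the figures above


def credit_fraction_used(storage_gb: float, operations: int) -> float:
    """Fraction of the monthly credit consumed under the linear model."""
    return storage_gb / STORAGE_CAP_GB + operations / OPS_CAP


def estimated_cost_usd(storage_gb: float, operations: int) -> float:
    """Estimated monthly spend before the credit is applied, under the same model."""
    return credit_fraction_used(storage_gb, operations) * FREE_CREDIT_USD


def within_free_tier(storage_gb: float, operations: int) -> bool:
    """True if the estimated usage fits inside the monthly credit."""
    return credit_fraction_used(storage_gb, operations) <= 1.0


if __name__ == "__main__":
    # Half the storage cap plus half the operations cap uses the whole credit.
    print(within_free_tier(40, 10_000_000))
    print(within_free_tier(40, 12_000_000))
    print(estimated_cost_usd(40, 12_000_000))
```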
Astra DB is a fully managed, serverless, multi-cloud database as a service powered by Apache Cassandra™.
Features of Astra DB managed Cassandra on AWS
Serverless Database Built on Apache Cassandra™
Scale database resources in and out on demand to match application requirements and traffic so that you pay only for what you use. Put the power of Cassandra in the hands of every developer without ever worrying about managing the infrastructure.
Data is replicated across multiple data centers, availability zones, and regions. Scale up to petabytes of data without impacting performance. The Astra DB service is resilient and highly available, minimizing both downtime and the need for site-reliability engineering.
All data is encrypted at rest and in motion. Sophisticated authentication and authorization use role-based access control. Client connections use two-way certificate validation (mTLS) for VPN-level security from client to database. Private connectivity options such as VPC peering are available on request. JSON Web Token (JWT)-based authentication ensures secure connectivity to your Astra DB database.
Database and OS updates and upgrades are fully managed. IaaS (Infrastructure-as-a-Service) failures are handled gracefully by a Kubernetes operator to keep databases healthy. Anti-entropy repair is automated, so there are no repair procedures to run yourself. Autoscaling eliminates manual configuration changes and guesswork on database sizing.
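To put the 99.9% single-region and 99.99% multi-region SLA figures listed earlier in concrete terms, the Python sketch below converts an availability percentage into an allowed-downtime budget. It assumes a 30-day month (43,200 minutes) and a 365-day year; actual SLA terms define their own measurement windows and exclusions, which are not modeled here. Under these assumptions, 99.9% allows about 43.2 minutes of downtime per month, while 99.99% allows about 4.32 minutes.

```python
# Convert an availability SLA percentage into an allowed-downtime budget.
# ASSUMPTION: fixed 30-day months and 365-day years; real SLA terms
# define their own measurement windows and exclusions.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60   # 43,200
MINUTES_PER_365_DAY_YEAR = 365 * 24 * 60  # 525,600


def downtime_budget_minutes(sla_percent: float,
                            period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of unavailability permitted by an SLA over the given period."""
    return round(period_minutes * (100.0 - sla_percent) / 100.0, 2)


if __name__ == "__main__":
    for label, sla in [("single-region", 99.9), ("multi-region", 99.99)]:
        month = downtime_budget_minutes(sla)
        year = downtime_budget_minutes(sla, MINUTES_PER_365_DAY_YEAR)
        print(f"{label} {sla}%: {month} min/month, {year} min/year")
```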