Developer Newsletter: Simplify your Cassandra Data Model with Better Indexing
This issue is guest-edited by Rebecca Mills (@rebccamills) of DataStax Developer Learning.
If you have been working with databases for a while, indexing is probably a familiar concept.
Database indexes enhance your data model and make your queries more efficient. Cassandra has had secondary indexes for a long time, but indexing in Cassandra has generally come with significant tradeoffs. Because of them, many Cassandra experts have recommended avoiding indexes altogether, and as a community we have emphasized denormalization to maximize query performance.
Cassandra has had two earlier secondary indexing implementations: the original Secondary Indexes (2i for short) and SSTable Attached Secondary Indexes (SASI). Their two main challenges have been (1) write amplification and (2) index size on disk.
As Jonathan Lacefield described in his recent blog post, the new Storage-Attached Index (SAI) is a major improvement on both of these pain points, and it also enables more flexible queries in Cassandra. SAI uses an on-disk format designed to work alongside Cassandra's SSTables, so its indexes take significantly less disk space. After extensive testing and optimization, SAI also delivers faster writes than the earlier Cassandra secondary indexes or DSE Search indexes.
Give SAI a try in your free Astra cluster. SAI is also available in DataStax Enterprise 6.8.3. For a hands-on learning experience, check out the new Cassandra Indexing Skills Page on our Developer site, and read up on more details in the Astra SAI Documentation.
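SAI indexes are created with CQL's `CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex'` statement. The Python sketch below generates that statement for a single column so the output can be pasted into cqlsh. The `sai_index_cql` helper, its index naming scheme, and the `demo.users` table are hypothetical, and the `case_sensitive` option is shown as an assumption; check the Astra SAI Documentation for the options your version supports.

```python
import re

# Hypothetical helper for illustration only. It accepts plain, unquoted CQL
# identifiers so the generated statement never needs identifier quoting.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def _check_ident(name):
    """Reject anything that is not a simple unquoted identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def sai_index_cql(keyspace, table, column, options=None):
    """Return a CREATE CUSTOM INDEX statement for an SAI index on one column."""
    ks, tbl, col = (_check_ident(n) for n in (keyspace, table, column))
    index_name = f"{tbl}_{col}_sai_idx"  # naming convention is our own choice
    stmt = (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {index_name} "
        f"ON {ks}.{tbl} ({col}) USING 'StorageAttachedIndex'"
    )
    if options:
        # Options are rendered as a CQL map of string keys to string values;
        # Python booleans become 'true'/'false'. Values must not contain quotes.
        rendered = ", ".join(
            f"'{key}': '{str(val).lower() if isinstance(val, bool) else val}'"
            for key, val in sorted(options.items())
        )
        stmt += f" WITH OPTIONS = {{{rendered}}}"
    return stmt + ";"


if __name__ == "__main__":
    # Hypothetical keyspace, table, and columns; paste the output into cqlsh.
    print(sai_index_cql("demo", "users", "country"))
    print(sai_index_cql("demo", "users", "email", {"case_sensitive": False}))
```

Generating one index per column keeps each statement small; SAI is designed so that several single-column indexes on the same table can be combined at query time.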
What’s next for SAI? DataStax has submitted a Cassandra Enhancement Proposal (CEP) to bring this functionality to Apache Cassandra. We’d love your feedback to help refine this feature for the benefit of the worldwide Cassandra community.
Example of the Week
Our featured example this week is a quick Storage-Attached Indexing demo. Download the schema and data set, open cqlsh, and follow along with Patricia Gorla (pgorla), Solutions Architect at DataStax, as she walks you through the basics of SAI:
- Watch the Expert Advice video Storage-Attached Indexing: A Brief Overview
- See the code and walkthrough here: Storage-Attached Index Demo
- See the video walkthrough here: YouTube video
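The demo covers SAI basics in cqlsh. As a rough companion, the Python sketch below builds the kind of flexible query SAI enables: a SELECT whose WHERE clause combines predicates on several indexed non-key columns. The `sai_select` helper and the `demo.users` table with `country` and `age` columns are hypothetical and not taken from the demo; it assumes each filtered column has its own SAI index and that range operators are applied only to numeric columns.

```python
import re

# Plain unquoted CQL identifiers only (a restriction chosen for this sketch).
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")
# Comparison operators this sketch allows in predicates.
_OPS = {"=", "<", "<=", ">", ">="}


def _check_ident(name):
    """Reject anything that is not a simple unquoted identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def sai_select(keyspace, table, predicates, limit=None):
    """Return (cql, params) for a SELECT that ANDs several column predicates.

    predicates is a list of (column, operator, value) tuples. Each column is
    assumed to carry its own SAI index; values are bound as ? markers so the
    statement can be prepared once and reused.
    """
    if not predicates:
        raise ValueError("at least one predicate is required")
    clauses, params = [], []
    for column, op, value in predicates:
        if op not in _OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{_check_ident(column)} {op} ?")
        params.append(value)
    cql = (
        f"SELECT * FROM {_check_ident(keyspace)}.{_check_ident(table)} "
        f"WHERE {' AND '.join(clauses)}"
    )
    if limit is not None:
        cql += f" LIMIT {int(limit)}"
    return cql, params


if __name__ == "__main__":
    # Hypothetical schema: demo.users with SAI indexes on country and age.
    query, values = sai_select(
        "demo", "users", [("country", "=", "NZ"), ("age", ">=", 30)], limit=10
    )
    print(query)   # SELECT * FROM demo.users WHERE country = ? AND age >= ? LIMIT 10
    print(values)  # ['NZ', 30]
```

With the DataStax Python driver (`cassandra-driver`), the pair could be run as `session.execute(session.prepare(query), values)`. Binding values through `?` markers rather than string formatting keeps user input out of the CQL text.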
Upcoming Workshops
- September 23: Cloud-native Cassandra Developer Workshop: Building Cassandra Microservices with Spring
- September 24: Cloud-native Cassandra Developer Workshop: Building Cassandra Microservices with Spring
- September 30: Cloud-native Cassandra Developer Workshop: Introduction to Cassandra
- October 1: Cloud-native Cassandra Developer Workshop: Introduction to Cassandra
Community Highlights from Cassandra.Link
- JHipster - Automatically generate a data access layer, an API, and an Angular or React front end for data in Cassandra! (Blueprints are available in Kotlin and .NET as well as Java.)
- Cassandra Lunch Recordings on YouTube - Weekly recordings of an informal Cassandra meetup held on Zoom by the Cassandra & DataStax DC and Cassandra Chicago groups. Join any Wednesday at 11 AM CST / 12 PM EST.
- Diagnostic Collection Tool - Issues on a Cassandra or DataStax cluster can't always be analyzed on the live cluster. This handy script gathers logs and configuration files from a cluster for offline analysis.
DataStax’s Chief Strategy Officer, Sam Ramji (@sramji), is hosting a new podcast series called Open||Source||Data that launched this week. He’ll explore open-source data, open-source software, data on Kubernetes, data in DevOps, and data in AI with old and new friends. Don’t miss the first episode, featuring Patricia Boswell, and upcoming episodes with Matt Asay, Rachel Chalmers, and Kelsey Hightower: subscribe on Spotify, Apple Podcasts, or Google Podcasts.