February 20, 2020

Accelerate Rewind: How to Understand Apache Cassandra Performance Through Metrics

Wei Deng

Apache Cassandra™ is a distributed database built on a peer-to-peer architecture. To monitor the database as a whole, you need to understand the performance of every node.

If you’re new to Cassandra, all of this can be tricky.

If you need a little help, you may want to check out this session from last year’s DataStax Accelerate. In it, Wei Deng, a Vanguard Solutions Architect at DataStax, covered the fundamentals needed to understand performance metrics and gave newcomers to the database an overview of the tools available for measuring Cassandra performance.

Among other things, his talk covered how to start thinking about performance in a real-time database like Cassandra, the tools available to help you measure it, and the most important metrics to track.

Here’s a brief synopsis of Wei’s talk.

Cassandra: the nuts and bolts

First things first: a brief overview of Cassandra’s architecture.

Cassandra’s masterless architecture means that every node plays the same role. Because there is no master, every node’s performance metrics are important to collect and monitor.
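Because no master sees the whole workload, per-node metrics have to be collected and compared side by side. As a minimal sketch, assuming you have already gathered one p99 read latency per node (the node names, values, and 2× threshold below are hypothetical), the following Python flags any node that sits well above the cluster median:

```python
from statistics import median


def find_outlier_nodes(p99_ms_by_node: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return nodes whose p99 latency exceeds `factor` times the cluster median.

    `p99_ms_by_node` maps a (hypothetical) node name to its p99 read latency in ms.
    """
    if not p99_ms_by_node:
        return []
    cluster_median = median(p99_ms_by_node.values())
    return sorted(
        node for node, value in p99_ms_by_node.items()
        if value > factor * cluster_median
    )


# Hypothetical per-node samples: one slow node among otherwise healthy peers.
samples = {"node1": 4.1, "node2": 3.8, "node3": 4.4, "node4": 19.7}
print(find_outlier_nodes(samples))  # ['node4']
```

For these sample values the cluster-wide mean is 8.0 ms, a number that on its own does not reveal which node is struggling. Comparing against the median rather than the mean keeps a single slow node from dragging the baseline up with it.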

At the same time, any client can connect to any node to read and write the data it needs. Any node can act as the coordinator for a request, and the same node can also serve as a storage (replica) node. This means you need visibility into performance metrics at the client, coordinator, and storage-node levels to get the full picture.
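To see why a single vantage point is not enough, consider one write timed at each level. The sketch below is a simplified model with made-up timings. It assumes, for illustration, that the coordinator sends the write to 3 replicas and can reply once 2 of them acknowledge (as with a QUORUM write at replication factor 3), and it lumps the coordinator's own work into a single hypothetical overhead figure:

```python
from dataclasses import dataclass


@dataclass
class ReplicaTiming:
    name: str          # hypothetical replica node name
    latency_ms: float  # time for this replica to apply the write and acknowledge it


def coordinator_latency_ms(replicas: list[ReplicaTiming], acks_needed: int,
                           coordinator_overhead_ms: float) -> float:
    """The coordinator can reply once the `acks_needed` fastest replicas have acknowledged."""
    if not 0 < acks_needed <= len(replicas):
        raise ValueError("acks_needed must be between 1 and the number of replicas")
    fastest = sorted(r.latency_ms for r in replicas)[:acks_needed]
    return coordinator_overhead_ms + fastest[-1]


def client_latency_ms(coord_ms: float, network_round_trip_ms: float) -> float:
    """What the client sees: coordinator latency plus the client-coordinator round trip."""
    return coord_ms + network_round_trip_ms


# Made-up timings: node3 is struggling.
replicas = [ReplicaTiming("node1", 1.2), ReplicaTiming("node2", 1.5),
            ReplicaTiming("node3", 48.0)]
coord = coordinator_latency_ms(replicas, acks_needed=2, coordinator_overhead_ms=0.3)
client = client_latency_ms(coord, network_round_trip_ms=0.8)
print(f"slowest replica (storage level): {max(r.latency_ms for r in replicas)} ms")  # 48.0 ms
print(f"coordinator: {coord:.1f} ms, client: {client:.1f} ms")  # 1.8 ms, 2.6 ms
```

In this model the struggling replica never shows up in client or coordinator latency; only storage-level metrics on node3 would reveal it.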

Performance = throughput + latency

When we talk about performance, we mean two things: throughput, the rate at which operations complete, and latency, the time it takes for a single operation to complete.
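As a rough illustration of the two definitions, the hypothetical helper below takes (start, end) timestamps for a batch of completed operations and derives throughput over the observed window and the mean latency per operation. The function name and the sample timings are assumptions made for the example:

```python
from statistics import mean


def summarize(ops: list[tuple[float, float]]) -> tuple[float, float]:
    """Given (start_s, end_s) pairs for completed operations, return
    (throughput in ops/s over the observed window, mean latency in ms)."""
    if not ops:
        raise ValueError("need at least one operation")
    window_s = max(end for _, end in ops) - min(start for start, _ in ops)
    throughput = len(ops) / window_s if window_s > 0 else float("inf")
    mean_latency_ms = mean((end - start) * 1000 for start, end in ops)
    return throughput, mean_latency_ms


# Four hypothetical operations observed over one second.
ops = [(0.0, 0.004), (0.25, 0.256), (0.5, 0.502), (0.75, 1.0)]
throughput, mean_ms = summarize(ops)
print(f"{throughput:.1f} ops/s, mean latency {mean_ms:.1f} ms")  # 4.0 ops/s, mean latency 65.5 ms
```

Three of the four operations finish in 6 ms or less, yet the mean is 65.5 ms because of one 250 ms outlier. A single average says little about what most requests experienced, which is one reason latency is usually tracked as a distribution rather than as one number.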

What happens, though, when you have millions of operations per hour, or even millions per second? How can you measure and record performance in such high-velocity environments?

It’s not as hard as it might sound. 

In his Accelerate session, Wei explains how you can use data structures such as histograms to track latency across large volumes of operations, as well as some of the pitfalls to avoid. Check it out.
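A histogram keeps a fixed set of counters no matter how many operations it records, which is what makes it practical at millions of operations per second. The sketch below is a minimal, hypothetical Python illustration, not Cassandra's own implementation: its bucket bounds grow by an assumed factor of 1.2, so each reported percentile is an upper-bound estimate less than 20% above the true value. It also demonstrates one well-known pitfall when combining metrics from several nodes: averaging per-node percentiles is not the same as taking the percentile of the merged data.

```python
import bisect
import math


class LatencyHistogram:
    """Constant-memory latency histogram with geometrically growing buckets.

    Hypothetical sketch: bucket upper bounds start at 1 microsecond and each is
    `growth` times the previous one, up to at least `max_us`.
    """

    def __init__(self, max_us: float = 60_000_000.0, growth: float = 1.2):
        self.max_us, self.growth = max_us, growth
        self.bounds = [1.0]
        while self.bounds[-1] < max_us:
            self.bounds.append(self.bounds[-1] * growth)
        # One extra slot counts values above the largest bound (overflow).
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, latency_us: float, n: int = 1) -> None:
        # bisect_left finds the first bucket whose upper bound is >= latency_us.
        self.counts[bisect.bisect_left(self.bounds, latency_us)] += n

    def percentile(self, p: float) -> float:
        """Upper-bound estimate of the p-th percentile, in microseconds."""
        if not 0 < p <= 100:
            raise ValueError("p must be in (0, 100]")
        total = sum(self.counts)
        if total == 0:
            raise ValueError("histogram is empty")
        target = max(1, math.ceil(total * p / 100))
        cumulative = 0
        for i, count in enumerate(self.counts):
            cumulative += count
            if cumulative >= target:
                return self.bounds[i] if i < len(self.bounds) else math.inf
        return math.inf  # unreachable: cumulative ends equal to total >= target

    def merge(self, other: "LatencyHistogram") -> "LatencyHistogram":
        """Add bucket counts from a histogram with the same bucket layout."""
        if self.bounds != other.bounds:
            raise ValueError("bucket layouts differ")
        merged = LatencyHistogram(self.max_us, self.growth)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return merged


# Hypothetical per-node data: node A has a few slow requests, node B has none.
node_a, node_b = LatencyHistogram(), LatencyHistogram()
node_a.record(1_000, n=980)    # 980 requests at about 1 ms
node_a.record(100_000, n=20)   # 20 requests at about 100 ms
node_b.record(1_000, n=1_000)  # 1,000 requests at about 1 ms

averaged_p99 = (node_a.percentile(99) + node_b.percentile(99)) / 2
merged_p99 = node_a.merge(node_b).percentile(99)
print(f"average of per-node p99s: {averaged_p99 / 1000:.1f} ms")
print(f"p99 of merged histogram:  {merged_p99 / 1000:.1f} ms")
```

Averaging the two per-node p99 values lands somewhere between 50 and 61 ms, while the p99 across all 2,000 requests stays at about 1 ms, because only 20 of them (1%) are slow. Merging bucket counts first and computing the percentile afterwards gives the correct cluster-wide figure.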

Interested in learning more about Cassandra?

If you’re interested in learning more about Cassandra, whether you’re new to it or already an expert, we encourage you to attend DataStax Accelerate 2020, the world’s premier conference on Apache Cassandra.

This year, we’re hosting two events:

  • San Diego, Loews Coronado Bay, May 11–13, 2020
  • London, 133 Houndsditch, Liverpool Street, June 2–3, 2020 

Accelerate is jam-packed with all sorts of sessions designed for developers, admins, architects, managers, CTOs, and more. It’s the place where Cassandra enthusiasts from around the world come together to share ideas and best practices and talk about the future.

We’d love to see you there! For more information on Accelerate, go here.

And if you’d like to hear more from Wei’s talk, check out the full session.
