


Darren Bathgate

Benchmarking DataStax Enterprise (DSE) on AWS: A Guide to Raising Performance and Reducing Costs

By Darren Bathgate, May 31, 2017

About the Author: Darren Bathgate is a technical architect at Kenzan based out of Providence, Rhode Island. During his six years at Kenzan, Darren has designed data models for relational SQL databases, including MySQL and Oracle, and has optimized query performance on legacy databases. He has also built reactive pipelines using Hadoop, Spark, and Cassandra. Darren is a graduate of the New England Institute of Technology, where he studied software engineering and received his Master’s degree in information technology.

You chose DataStax Enterprise (DSE)—the always-on data platform, powered by the best distribution of Apache Cassandra™—to power your data-driven application. And you’re running your DSE cluster on AWS to gain the flexibility and scalability offered by Amazon’s cloud computing services. You’re probably feeling pretty good about your choices. But what if the finance department emails to let you know you’ve just exceeded your AWS budget for the quarter—and the quarter’s not over yet? Chances are you’re not feeling so good anymore.

At Kenzan, we’ve heard stories like these from our customers more than a few times. We’re a professional services software company, and we leverage cloud infrastructure to build scalable, data-driven solutions that help organizations achieve digital transformation by aligning their technical strategy with their business goals. DSE and AWS are often key components of those solutions.

One of DSE's advantages is that it scales well with the hardware it runs on. If performance starts to lag, adding more AWS EC2 instances to increase compute power is an easy fix. That approach is still far more affordable than alternatives such as relational databases, and the cluster stays continuously available, but it can cost more than necessary if best practices aren't followed. Adding instances can quickly drive up monthly operating costs, and the instance type you choose at the outset has long-term cost implications as well.

Amazon offers a plethora of EC2 instance types, but it’s often unclear which ones yield the best price-to-performance ratio for your business. You might find yourself looking at two instance types that seem similar in terms of specifications and hardware. But what if it turns out that one of those instance types costs 300% more to operate but delivers 66% lower read throughput for DSE?
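Taken together, those two percentages compound. As a worked sketch, assume "costs 300% more" means instance B's hourly cost $C_B$ is four times instance A's cost $C_A$, and "66% lower read throughput" means B's throughput $T_B$ is 0.34 times A's throughput $T_A$. Comparing read throughput per dollar then gives:

```latex
\begin{aligned}
\frac{C_B}{C_A} &= 1 + 3.00 = 4, \qquad \frac{T_B}{T_A} = 1 - 0.66 = 0.34 \\[4pt]
\frac{T_B / C_B}{T_A / C_A} &= \frac{T_B / T_A}{C_B / C_A} = \frac{0.34}{4} = 0.085 \approx \frac{1}{11.8}
\end{aligned}
```

Under those assumptions, instance B buys roughly 8.5% as much read throughput per dollar as instance A, or about 11.8 times less.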

This isn't a hypothetical situation; we uncovered this exact scenario while working with our customers. To give them guidance, we teamed up with DataStax on our white paper, Benchmarking DataStax Enterprise (DSE) on AWS: A Guide to Raising Performance and Reducing Costs. At Kenzan, our customers often worry about under-provisioning their applications and DSE clusters, but as the analysis in our paper shows, over-provisioning and wrong-provisioning are problems, too. If you don't choose the right instance types to back your DSE cluster, you can end up spending far more than you should for less performance than you need.

All of this means that it’s crucial to make the right choices when setting up your DSE cloud infrastructure. But how can you know which instance types will give you the best cost-to-benefit ratio, both now and as your business grows? That’s where benchmarking helps. For our paper, we conducted DSE stress tests on an array of hardware configurations, and we simulated various real-world workflows, all to see which EC2 instance types deliver the best performance at the lowest costs.
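Once stress tests produce numbers, comparing configurations comes down to how much throughput each dollar buys. The Python sketch below ranks benchmark runs that way. It assumes you record cluster-wide read throughput and total hourly cost for each configuration; `BenchmarkResult`, `rank_by_price_performance`, and the sample figures are hypothetical placeholders for illustration, not measurements from the paper or part of any DSE or AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical record of one stress-test run (not a DSE or AWS API type)."""

    instance_type: str      # label for the EC2 configuration under test
    reads_per_sec: float    # measured read throughput for the whole cluster
    hourly_cost_usd: float  # total hourly price of the cluster's instances

    def __post_init__(self) -> None:
        # A zero or negative cost would make the ratio meaningless.
        if self.hourly_cost_usd <= 0:
            raise ValueError("hourly_cost_usd must be positive")

    @property
    def reads_per_dollar_hour(self) -> float:
        # Throughput bought per dollar of hourly spend: higher is better.
        return self.reads_per_sec / self.hourly_cost_usd


def rank_by_price_performance(results: list[BenchmarkResult]) -> list[BenchmarkResult]:
    """Sort runs from best to worst read throughput per dollar."""
    return sorted(results, key=lambda r: r.reads_per_dollar_hour, reverse=True)


# Placeholder inputs for illustration only; substitute your own measurements.
runs = [
    BenchmarkResult("config-b", reads_per_sec=3_400.0, hourly_cost_usd=4.0),
    BenchmarkResult("config-a", reads_per_sec=10_000.0, hourly_cost_usd=1.0),
]
for r in rank_by_price_performance(runs):
    print(f"{r.instance_type}: {r.reads_per_dollar_hour:,.0f} reads/s per $/h")
```

The placeholder values mirror the relative gap described earlier (four times the cost, 34% of the throughput), so the more expensive configuration lands last despite its higher price.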

While the results are illuminating—and as noted above, sometimes surprising—the goal of the paper is not simply to recommend the “best” instance type that everyone should choose. In our experience working with customers, we’ve learned that each organization or business has its own unique needs. Instead, our hope is to provide a blueprint that you can follow to benchmark DSE yourself.

In our paper, you’ll find a test bench setup as well as practical examples of designing and running stress tests. We also offer tips to help you avoid benchmarking pitfalls. As we show, tests can sometimes give misleading results, and you need to construct your tests carefully. But if you do, you’ll have the information you need to choose the most performant and cost-effective AWS instances to back your DSE installation, your workflows, and your business.

So check out Benchmarking DataStax Enterprise (DSE) on AWS, and then let us know about your experiences stress testing DSE and what instance types you’ve decided to go with. In the meantime, happy benchmarking!


