Why We Added In-Memory to Cassandra
date: February 27, 2014
With DataStax Enterprise (DSE) 4.0, we’ve included a new in-memory option for Cassandra that brings the Cassandra benefits you already appreciate (multi-data center and cloud support, a flexible data model, linear-scale performance, no single point of failure, and more) to an in-memory database. With the new in-memory option, it’s now easy to assign data based on each application’s performance needs: spinning disk for workloads with less demanding performance SLAs, SSDs for hotter data, and now in-memory for the fastest possible response times, all within the same database cluster. We’re not the first NoSQL database to offer an in-memory option, but we are one of the few, if not the only one, that can handle both the big data scale and the in-memory use cases that are becoming more common in online applications. For example, eBay, a customer of ours that stores hundreds of TBs of data in its DSE clusters (with one application having a single 40TB Cassandra table!), can handle those massive big data workloads alongside other use cases that have very low-latency, in-memory requirements.
A Balanced View of In-Memory
To some people, in-memory is the elusive FAST=TRUE parameter that every database needs, that cures cancer, and that gets you more dates on match.com. Not exactly. I’ve personally watched a massive system I managed (a serious trading application handling millions of dollars per hour) get brought to its knees even though literally no disk I/O was occurring at the time. Being disk-bound is only one of the performance hurdles database administrators have to avoid. A more balanced view sees in-memory computing as yet another tool in our Batman utility belt, one that suits certain use cases and data traffic patterns within an application. Used properly, yes, in-memory can make a big difference in a customer’s experience with an online app. But when in-memory is abused or misunderstood, it can compound an already bad situation. This is especially true when today’s online apps are served by technology that isn’t suited for the job, even if that technology is in-memory enabled.
Why In-Memory and NoSQL Matter
When it comes to online apps, I’m impatient and so are you. And boy does it show. Statistics such as a one-second slowdown in page load time costing Amazon $1.6 billion in sales are not the stuff of urban legends. Just last year, a study showed that in the UK alone, online retailers lost 8.5 billion euros due to slow online applications, which amounts to $1 million lost for every $10 million in online sales. In other words, if your online app is slow, you’re losing money fast. Given that, it’s no surprise that companies are looking for ways to speed up how they deliver data to, and consume data from, their customers. In-memory computing, which has been around for a long time in the relational world (e.g., TimesTen), has become an object of great affection for IT staffs who are on the hook for building lightning-fast online apps.
There’s just one problem: an in-memory RDBMS is still an RDBMS. In-memory or not, it still suffers from the drawbacks that make relational technology ineffective for today’s online apps. It still has a rigid rather than flexible data model; it is architected as master/slave rather than masterless, so it has multiple points of failure; it can’t truly scale out linearly; and it can’t support multiple data centers and cloud availability zones the way today’s online apps need.
This is where combining NoSQL technology like Cassandra with in-memory computing makes a big difference. Cassandra already solves the issues of continuous availability, linear-scale performance, managing modern data types, and making your data available wherever you need it. Add the ability to tackle in-memory use cases, and you have something that can handle both the scale and speed needs of modern online apps, which legacy RDBMS technology just can’t do. That’s why we added an in-memory option to Cassandra in DataStax Enterprise.
C* For Yourself
We’ve got a new technical white paper that describes Cassandra’s in-memory option in detail, with benchmarks and other pertinent technical information. The paper also includes a section on which types of workloads lend themselves to in-memory and which do not, so you’ll have a better idea of where the new option fits and when to expect real benefits from it. You can download DSE 4.0 now, along with OpsCenter 4.1, and try out everything that’s new in our latest releases, including the in-memory option, for yourself.