Why We Added In-Memory to Cassandra

By Robin Schumacher, February 27, 2014

With DataStax Enterprise (DSE) 4.0, we’ve included a new in-memory option for Cassandra that brings all the benefits of Cassandra that you appreciate (e.g. multi-data center/cloud support, a flexible data model, linear scale performance, no single point of failure, etc.) to an in-memory database. With the new in-memory option, it’s now very easy for you to assign data based on the performance an application needs: spinning disk for workloads with less demanding performance SLAs; SSDs for hotter data; and now in-memory for the fastest possible response times, all within the same database cluster.

We’re not the first NoSQL database to supply an in-memory option, but we are one of the few, if not the only one, able to handle both big data scale and the in-memory use cases that are becoming more common in online applications. For example, a customer of ours like eBay, which runs hundreds of TBs of data in its DSE clusters (with one application having a single Cassandra table that is 40TB in size!), can handle those massive big data workloads alongside other use cases that have very low latency in-memory requirements.

A Balanced View of In-Memory

To some people, in-memory is the elusive FAST=TRUE parameter that all databases need, cures cancer, and gets you more dates on

Not exactly.

I’ve personally watched a massive system I managed (a serious trading app handling millions of $/hour) be brought to its knees even though literally no disk I/O was occurring at the time. Being disk-bound is only one performance hurdle that database admins have to avoid.

A more balanced view of in-memory computing sees it as yet another tool in our Batman utility belt that is able to satisfy certain use cases and data traffic patterns that an application contains.  Used properly, yes, in-memory can make a big difference in a customer’s experience with an online app. But when in-memory is abused or not understood, it can compound an already bad situation and make things worse.

This is especially true when it comes to satisfying the needs of today’s online apps with technology not suited for the job – even if that technology is in-memory enabled.

Why In-Memory and NoSQL Matter

When it comes to online apps, I’m impatient and so are you. And boy does it show.

Statistics like a one-second slowdown in page load time costing Amazon $1.6 billion in sales are not the stuff of urban legend. Just last year, a study showed that in the UK alone, online retailers lost 8.5 billion euros due to slow online applications, which is literally $1 million for every $10 million in online sales.

In other words, if your online app is slow, you’re losing money at a rapid rate.

This being the case, it’s no surprise that companies are looking for ways to increase the speed at which they deliver data to, and consume data from, their customers. In-memory computing, which has been around for quite a long time in the relational world (e.g. TimesTen, etc.), has become an object of great affection for IT staffs who are on the hook for making lightning-fast online apps.

There’s just one problem. An in-memory RDBMS is still an RDBMS.

In-memory or not, it’s still going to suffer from all the drawbacks that make relational technology ineffective for today’s online apps. It will still have a rigid rather than flexible data model, be architected as master/slave rather than masterless and so have multiple points of failure, be unable to truly scale out in linear fashion, and be impotent at supporting multiple data centers and cloud availability zones in the way that today’s online apps need.

This is where NoSQL technology like Cassandra, combined with in-memory computing, makes a big difference.

Cassandra already solves the issues of continuous availability, linear scale performance, managing modern data types, and making your data available wherever you need it. Couple those facts with the additional capability of tackling in-memory use cases, and now you have something that can handle both the scale and speed needs of modern online apps, which legacy RDBMS technology just can’t do.

That’s why we added an in-memory option to Cassandra in DataStax Enterprise.

C* For Yourself

We’ve got a new technical white paper that describes Cassandra’s in-memory option in detail with benchmarks and other pertinent tech info. The paper also contains a section that helps you understand what types of workloads lend themselves to in-memory and which do not, so you’ll have a better idea of exactly where to step and when to expect real benefits with the new option.

You can download DSE 4.0 now along with OpsCenter 4.1 and try out all that’s new in our latest releases, including the in-memory option, for yourself.



  1. Deepak Nulu says:

    It is great to have this new in-memory capability.

    My understanding is that DataStax is a big (and probably main) contributor to the Apache version of Cassandra. So I am wondering how DataStax determines if a feature will be available only in DSE or also in Apache Cassandra.

    It is obvious for features that are external to Cassandra or involve integrating external products with Cassandra: it belongs in DSE because such features are external to Cassandra.

    But it is not so obvious for a feature like in-memory support, especially given that it requires an update to the CQL language.

    So how does DataStax determine this? Will in-memory support make its way into Apache Cassandra?



  2. Robin Schumacher says:

    Hi Deepak –

    When you’re both an open source and commercial software company, deciding what goes into OSS and what stays commercial is part art and part science. I’ve helped spearhead these decisions at MySQL, Postgres, and now here at DataStax, and it’s a process you constantly work at refining.

    First, remember we don’t own Cassandra. Any contributor can develop a feature and have it included in Cassandra with the community’s blessing.

    Where DataStax is concerned, we start by asking the fundamental questions of (1) Is the proposed feature basic and foundational to the software? (2) Will the proposed feature be deployed by the typical OSS user or is it something that appeals only to enterprises that must meet certain standards and solve more involved use cases than what’s found in general OSS deployments?

    For example, in DSE 3.0 we included a very comprehensive security feature set. Four security features (internal auth, permission mgmt, client-to-node encryption, and node-to-node encryption) were given back to the community because we believed these to be basic and foundational security requirements for a database. However, we included external auth, TDE, and data auditing only in DSE. Why? Most OSS users we talked to don’t care about things like integrating with Kerberos or auditing DB activity. But do major government and financial institutions? You betcha.

    Lastly, it’s always possible that “enterprise extensions” like those above or our in-memory option will be given back to the community when and if they become more necessary to typical OSS deployments.

    Thanks for taking the time to write.

  3. Deepak Nulu says:

    Hi Robin,

    Thank you very much for your thoughtful and detailed response.

    If you are keeping track of requests for features in the community edition, please count me in for the in-memory feature.



  4. Hanson says:

    The DataStax Enterprise 4.0 documentation says there is a 1GB size limit for an in-memory table.
    Is there any plan to extend it to a larger size, such as 64GB?


  5. Dave says:

    By “adding in-memory to Cassandra”, does that mean you have added a caching layer (like memcache or redis) on top of Cassandra?

  6. Robin Schumacher says:

    No, there is no additional layer; you can create C* tables that reside in memory at all times vs. the traditional mem/disk design. All data for in-memory tables is written to the commit log and flushed to disk to ensure data protection and persistence.

  7. Yuriy Gavrilov says:

    How about DSE 5.0 and the graph option: is it in-memory?
    Is it managed by a schema option, or are there other features / limits? Are there any performance tests of it?

