What is persistence and why does it matter?

By Matt Pfeil -  October 22, 2010 | 7 Comments

Understanding the meaning of persistence is important for evaluating different data store systems. Given the importance of the data store in most modern applications, making a poorly informed choice could mean substantial downtime or loss of data. In this post, we'll discuss persistence and data store design approaches and provide some background on these in the context of Cassandra.

Persistence is "the continuance of an effect after its cause is removed". In the context of storing data in a computer system, this means that the data survives after the process with which it was created has ended. In other words, for a data store to be considered persistent, it must write to non-volatile storage.

If you need persistence in your data store, then you need to also understand the four main design approaches that a data store can take and how (or if) these designs provide persistence:

  • Pure in-memory, no persistence at all, such as memcached or Scalaris
  • In-memory with periodic snapshots, such as Oracle Coherence or Redis
  • Disk-based with update-in-place writes, such as MySQL ISAM or MongoDB
  • Commitlog-based, such as all traditional OLTP databases (Oracle,
    SQL Server, etc.)

In-memory approaches can achieve blazing speed, but at the cost of being limited to a relatively small data set. Most workloads have relatively small "hot" (active) subset of their total data; systems that require the whole dataset to fit in memory rather than just the active part are fine for caches but a bad fit for most other applications. Because the data is in memory only, it will not survive process termination. Therefore these types of data stores are not considered persistent.

The easiest way to add persistence to an in-memory system is with periodic snapshots to disk at a configurable interval. Thus, you can lose up to that interval's worth of updates.

Update-in-place and commitlog-based systems store to non-volatile memory immediately, but only commitlog-based persistence provides Durability -- the D in ACID -- with every write persisted before success is returned to the client.

Cassandra implements a commit-log based persistence design, but at the same time provides for tunable levels of durability. This allows you do decide what the right trade off is between safety and performance. You can choose, for each write operation, to wait for that update to be buffered to memory, written to disk on a single machine, written to disk on multiple machines, or even written to disk on multiple machines in different data centers. Or, you can choose to accept writes a quickly as possible, acknowledging their receipt immediately before they have even been fully deserialized from the network.

At the end of the day, you're the only one who knows what the right performance/durability trade off is for your data. Making an informed decision on data store technologies is critical to addressing this tradeoff on your terms. Because Cassandra provides such tuneability, it is a logical choice for systems with a need for a durable, performant data store.









DataStax has many ways for you to advance in your career and knowledge.

You can take free classes, get certified, or read one of our many white papers.



register for classes

get certified

DBA's Guide to NoSQL







Comments

Your email address will not be published. Required fields are marked *




Subscribe for newsletter:

Tel. +1 (408) 933-3120 sales@datastax.com Offices France Germany

DataStax Enterprise is powered by the best distribution of Apache Cassandra™.

© 2017 DataStax, All Rights Reserved. DataStax, Titan, and TitanDB are registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache Cassandra, Apache, Tomcat, Lucene, Solr, Hadoop, Spark, TinkerPop, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.