Improve Data Center Cost Efficiency with DSE Tiered Storage
July 19, 2016
Two factors are critical to the success of any Cassandra / DataStax Enterprise implementation: (1) a well-architected data model and (2) fast storage. While the first challenge is typically solved on a per-application basis, the second is more general and more straightforward to address. Using locally attached solid-state drive (SSD) storage on dense conventional Linux servers is a good first step.
SSDs have quickly become the de facto standard in database applications because their cost per input/output operation is so favorable compared to legacy mechanical hard disk drives (HDDs). A recommended DataStax Enterprise (DSE) best practice is to use SSDs for all data storage, because the NAND flash chips that power SSDs provide extremely low-latency response times for random reads while supplying ample sequential write performance for Cassandra maintenance operations (e.g., compaction and anti-entropy repair).
However, there are use cases where we can judiciously reduce the amount of input/output (I/O) that the system needs to do based on access patterns, and therefore reduce the dependency on extremely fast storage, which in turn enables cost savings.
DSE Tiered Storage, new in DSE 5.0 and available to Standard and Max subscribers, was designed to take advantage of these optimizations in a way that allows operators to translate reduced I/O demands into storage cost savings.
When the most frequently read or updated data is also the most recently written, and older data is kept primarily for archival reasons, DSE can transparently age data off a fast SSD tier onto a slower storage tier where performance matters less. Two common use cases where DSE Tiered Storage can be leveraged are time series data and social interaction data:
Time Series / Internet of Things (IoT)
- A sensor writes measurements into DSE
- Those measurements will tend not to be updated after they’re written
- Older measurements are not read as frequently as newer measurements
Social Media Feed
- Content is highly time-sensitive
- The most recent activity is inherently the most important and therefore most frequently accessed
To leverage DSE Tiered Storage, an administrator first defines storage strategies in a DSE configuration file that identify the various SSDs and HDDs available. The administrator then applies those strategies to selected tables that hold data in DSE, along with a timing metric that tells DSE when to move data in a table from one storage location to another. After that, DSE moves the data between tiers automatically.
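The age-based routing described above can be illustrated with a small conceptual model. This is only a sketch of the idea, not DSE's implementation or configuration syntax: the `Tier` class, the `pick_tier` function, the mount paths, and the seven-day threshold are all hypothetical, and the choice to keep data in the faster tier when its age exactly equals the threshold is an assumption.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Tier:
    """Hypothetical description of one storage tier (not a DSE type)."""
    name: str                 # illustrative label, e.g. "ssd"
    paths: Tuple[str, ...]    # illustrative mount points


def pick_tier(age_seconds: float,
              tiers: Sequence[Tier],
              max_tier_ages: Sequence[float]) -> Tier:
    """Return the tier that should hold data of the given age.

    max_tier_ages[i] is the maximum age (in seconds) of data kept in
    tiers[i]; the last tier has no limit, so there is one fewer age
    than there are tiers. Assumption: data whose age equals the
    threshold stays in the faster tier.
    """
    if len(tiers) != len(max_tier_ages) + 1:
        raise ValueError("need exactly one age threshold per tier except the last")
    if list(max_tier_ages) != sorted(max_tier_ages):
        raise ValueError("age thresholds must be in ascending order")
    # bisect_left counts the thresholds strictly below the age,
    # which is the index of the first tier allowed to hold the data.
    return tiers[bisect_left(max_tier_ages, age_seconds)]


# Illustrative setup: a fast SSD tier and a slower HDD tier,
# with data moving to the HDD tier once it is older than seven days.
TIERS = (
    Tier("ssd", ("/mnt/ssd1",)),
    Tier("hdd", ("/mnt/hdd1",)),
)
MAX_TIER_AGES = (7 * 24 * 3600,)

if __name__ == "__main__":
    for age in (60, 7 * 24 * 3600, 30 * 24 * 3600):
        print(age, "->", pick_tier(age, TIERS, MAX_TIER_AGES).name)
```

In this model the tier list plays the role of a storage strategy and the age thresholds play the role of the timing metric applied to a table; the actual option names and file layout should be taken from the DSE documentation.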
In addition, DSE Tiered Storage supplies intelligent performance metrics so it is possible to see exactly how each storage tier is performing and how often the data is being accessed in each tier. This helps operations staff make any necessary tweaks to their tiered storage strategies.
DSE Tiered Storage is just one of a number of new features in DSE 5.0. Check out our recent blogs on the other features released with DSE 5.0:
- Introducing DataStax Enterprise 5.0 – This release is big. Really, really big.
- Clusters on Cruise Control with OpsCenter 6.0