Robin Schumacher

Enabling Internet Enterprise Applications with DataStax Enterprise 4.5

By Robin Schumacher, June 30, 2014

We’re excited to announce DataStax Enterprise (DSE) 4.5, the latest release of our always-on, production-certified distributed database platform powered by Apache Cassandra.

At DataStax, our product focus this year has been on supplying the performance needed by Internet Enterprise applications. Version 4.0, delivered earlier this year, began the theme with our transactional in-memory option. DataStax Enterprise 4.5 builds on this functionality with additional capabilities and performance boosts for running analytics on Cassandra data, automated help with performance tuning, and deeper integration with existing Hadoop data warehouses and data lakes.

Faster Performance for Today’s Transactional-Analytical Applications

DSE has long delivered the power to run analytics on Cassandra operational (OLTP) databases, and several of our customers use our built-in MapReduce, Hive, Pig, Mahout, and Sqoop capabilities to handle analytics tasks for their online applications. We’ve seen consistent demand for faster analytics from customers who use DSE as the database for fraud detection, recommendation engines, online advertising, retail, and similar applications, where the turnaround time for consuming and analyzing data must be much shorter than what we’ve been able to provide thus far.

DSE 4.5 sports much faster analytics on Cassandra data by integrating Apache Spark into our scalable distributed database platform. How much faster is Spark than our prior analytics? Naturally, it depends on the use case. The Spark community has published special-case benchmarks showing 100x improvements over the same queries issued with Hive. Our own preliminary test results vary: some scenarios see about a 50% speedup, while others see anywhere from a 2x to 30x increase.

Not bad.

Moreover, the in-memory analytical capabilities of Spark can be combined with the in-memory OLTP option of DSE, delivering a full in-memory solution for transactional-analytical applications.

DSE 4.5 contains a fully production-certified version of Spark on Cassandra with built-in high availability that protects against downtime. But we haven’t kept all our Spark work to ourselves: our connectivity layer, datatype mapping, and performance optimization work are all being contributed back to the open source Cassandra and Spark communities so that everyone can benefit.

Lastly, our new Spark-powered analytics sports the same workload management and isolation capabilities that make DSE a standout among distributed databases. Spark operations can either run on OLTP nodes or be separated from all OLTP and search workloads so that no competition occurs for data or compute resources, which makes running a mixed-workload database easy with DSE.
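To make the two placement options concrete, here is a toy Python model of workload-aware node selection. It is a sketch under stated assumptions, not DSE’s actual scheduler or configuration: the `Node` class, the workload tags, and the `isolate` flag are all hypothetical, chosen only to show how separating analytics nodes keeps Spark jobs off OLTP and search nodes.

```python
"""Toy model of workload-isolated node selection (illustrative only).

Assumption: each node is tagged with the single workload it serves.
`Node`, the tag names, and the `isolate` flag are hypothetical and
are not a DSE API.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    workload: str  # hypothetical tags: "oltp", "analytics", or "search"


def eligible_nodes(nodes, job_type, isolate):
    """Return the names of nodes allowed to run a job of `job_type`.

    isolate=True: analytics jobs run only on analytics nodes, so they
    never compete with OLTP or search nodes for CPU or I/O.
    isolate=False: analytics jobs may also run on OLTP nodes.
    Non-analytics jobs always stay on nodes of their own workload.
    """
    if job_type == "analytics" and not isolate:
        allowed = {"analytics", "oltp"}
    else:
        allowed = {job_type}
    return [n.name for n in nodes if n.workload in allowed]


cluster = [
    Node("n1", "oltp"),
    Node("n2", "oltp"),
    Node("n3", "analytics"),
    Node("n4", "search"),
]

print(eligible_nodes(cluster, "analytics", isolate=True))   # ['n3']
print(eligible_nodes(cluster, "analytics", isolate=False))  # ['n1', 'n2', 'n3']
print(eligible_nodes(cluster, "oltp", isolate=True))        # ['n1', 'n2']
```

The isolated mode trades some total compute for predictability: OLTP latency is unaffected by analytics load because the two never share a node.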

Enhanced Hadoop Analytics

DSE 4.5 includes another new analytics option that we’re calling BYOH (Bring Your Own Hadoop). While various other DBMS vendors offer a Hadoop connector, our BYOH option provides the same simple back-and-forth data flow they offer, and a whole lot more.

A number of our customers work with a preferred Hadoop vendor such as Cloudera or Hortonworks, and they want to integrate the historical data they keep in Hadoop with operational data of the kind Cassandra holds. Our BYOH option supplies exactly that capability, allowing data transfers and processing to occur between the two platforms.

However, we go a step further and provide the option of running your preferred Hadoop vendor’s components directly on the DSE platform. This means you can install a Hadoop distribution on nodes in a DSE cluster and run your Hive, Pig, MapReduce, and other routines directly on Cassandra data, while linking that data with an outside Hadoop cluster.

For example, you can run a query that joins a Cassandra table in DSE with an external Hadoop Hive table, and either keep the results on DSE or send them to the outside Hadoop cluster. Our formal partnerships with Cloudera and Hortonworks have resulted in DSE being certified for both of those Hadoop platforms.
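The following Python sketch illustrates the shape of such a join, pairing rows from an operational table with rows from a historical table by a shared key. It is a toy under assumptions: in a real BYOH deployment the join would be written as a Hive query against the two tables, and the table names, column names, and the `inner_join` helper here are hypothetical.

```python
"""Illustrative hash join of hot operational data with historical data.

Assumption: both sides are plain lists of dicts. In practice the join
would be a Hive query, roughly (hypothetical names):
  SELECT o.customer_id, o.order_total, h.lifetime_spend
  FROM recent_orders o JOIN historical_spend h
  ON o.customer_id = h.customer_id;
"""

# Rows as they might come from a Cassandra table of recent orders.
recent_orders = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 2, "order_total": 35.5},
    {"customer_id": 3, "order_total": 80.0},
]

# Rows as they might come from an external Hive table of historical spend.
historical_spend = [
    {"customer_id": 1, "lifetime_spend": 5400.0},
    {"customer_id": 3, "lifetime_spend": 210.0},
]


def inner_join(left, right, key):
    """Index `right` by `key`, then probe it with each row of `left`.

    Assumes `key` is unique on the right side; rows without a match
    are dropped, as in a SQL inner join.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


for row in inner_join(recent_orders, historical_spend, "customer_id"):
    print(row)
```

Customer 2 has no historical row, so it is dropped; an outer join would keep it with a missing `lifetime_spend`.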

Why provide the BYOH option? Three reasons. First, many modern applications require different analytic “tempos”: some analytic tasks require very fast performance, while other analysis is more complicated and longer-running. DSE now supports both “tempos” very easily with our integrated Spark component and support for both built-in and external Hadoop.

Second, while DSE’s built-in MapReduce, Hive, Pig, etc., work fine against Cassandra data, many customers want to work with their preferred flavor and version of Hadoop on Cassandra. With our new BYOH option, now they can.

And third, certain use cases require linking together hot/operational data with historical information, which can now easily be accomplished with DSE 4.5.

New Performance Service

We began the rollout of our DataStax Management Services in DSE 3.2, where we delivered automatic services that ensure data is consistent across a cluster (automated repair) and that make capacity planning activities (historical trend analysis and forecasting of future resource needs) easy to carry out. With DSE 4.5, we’re providing another new service that helps our customers find and troubleshoot performance issues very quickly.

While Cassandra provides numerous raw performance metrics, they aren’t organized in a way that easily facilitates performance tuning unless you use a visual monitoring tool like DataStax OpsCenter. In DSE 4.5, our new Performance Service creates and maintains a specialized data dictionary of diagnostic tables that can be accessed via CQL from any CQL-enabled tool (e.g., cqlsh or DataStax DevCenter).

The Performance Service supplies a set of tables that help you start at a high level, answering the question “do I have a performance issue?”, and then drill down to find out what’s causing the issue, who’s causing it, which objects are affected, and so on. If you’re used to performance tools like Oracle’s V$ dynamic performance views or SQL Server’s dynamic management views, you’ll be right at home with our new diagnostic data dictionary.
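As a rough illustration of that top-down workflow, the Python sketch below first checks an overall latency figure and then groups by user and table to find the main contributor. The row fields, the 50 ms target, and the in-memory list standing in for diagnostic tables are all assumptions; they do not reproduce the actual Performance Service schema, which you would query with CQL.

```python
"""Top-down drill-down over diagnostic rows (illustrative only).

Assumption: each row loosely resembles a per-request diagnostic record.
Field names and the latency target are hypothetical, not the real
Performance Service tables.
"""
from collections import defaultdict
from statistics import mean

rows = [
    {"user": "app", "table": "orders", "latency_ms": 4.0},
    {"user": "app", "table": "orders", "latency_ms": 6.0},
    {"user": "report", "table": "events", "latency_ms": 220.0},
    {"user": "report", "table": "events", "latency_ms": 180.0},
    {"user": "app", "table": "users", "latency_ms": 3.0},
]

# Step 1: "Do I have a performance issue?" Compare mean latency to a target.
TARGET_MS = 50.0  # hypothetical service-level target
overall = mean(r["latency_ms"] for r in rows)
has_issue = overall > TARGET_MS

# Step 2: "Who is causing it, and which objects are affected?"
# Sum latency per (user, table) pair and pick the largest contributor.
by_user_table = defaultdict(float)
for r in rows:
    by_user_table[(r["user"], r["table"])] += r["latency_ms"]

worst = max(by_user_table, key=by_user_table.get)
print(f"mean latency {overall:.1f} ms, issue={has_issue}, worst={worst}")
```

Summing rather than averaging in step 2 ranks pairs by total time consumed, so a frequent moderately slow query can outrank a single very slow one.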

In addition, the Performance Service can automatically pluck out “bad queries” (the definition of which you can customize) from all statements issued against a DSE cluster and catalog them, so you can quickly find the needles in the haystack that are consuming the most resources on your systems.
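The idea of a customizable “bad query” definition can be sketched as a predicate applied to a statement log. This Python example is hypothetical: the `Statement` fields, the thresholds in `default_is_bad`, and the ranking by elapsed time are assumptions chosen for illustration, not how the Performance Service is actually configured.

```python
"""Catalog "bad queries" under a user-defined predicate (illustrative only).

Assumption: each logged statement carries its CQL text, elapsed time,
and rows scanned. The default thresholds are hypothetical.
"""
from typing import Callable, List, NamedTuple


class Statement(NamedTuple):
    cql: str
    elapsed_ms: float
    rows_scanned: int


def default_is_bad(s: Statement) -> bool:
    # Hypothetical definition: slow OR scans a large number of rows.
    return s.elapsed_ms > 100 or s.rows_scanned > 10_000


def catalog_bad_queries(
    statements: List[Statement],
    is_bad: Callable[[Statement], bool] = default_is_bad,
) -> List[Statement]:
    """Keep statements matching `is_bad`, slowest first."""
    return sorted(
        (s for s in statements if is_bad(s)),
        key=lambda s: s.elapsed_ms,
        reverse=True,
    )


log = [
    Statement("SELECT * FROM ks.users WHERE id = 1", 2.0, 1),
    Statement("SELECT * FROM ks.events", 450.0, 250_000),
    Statement("SELECT * FROM ks.orders WHERE day = '2014-06-30'", 120.0, 900),
    Statement("SELECT * FROM ks.logs ALLOW FILTERING", 60.0, 50_000),
]

for s in catalog_bad_queries(log):
    print(f"{s.elapsed_ms:>7.1f} ms  {s.rows_scanned:>8} rows  {s.cql}")

# A custom, latency-only definition of "bad" drops the ALLOW FILTERING scan.
latency_only = catalog_bad_queries(log, lambda s: s.elapsed_ms > 100)
```

Passing the definition in as a function keeps the cataloging logic fixed while letting each team decide what counts as “bad.”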

Resources and Downloads

There are plenty of resources available on our website to get you up to speed with DSE 4.5, including a new white paper describing DSE’s enhanced analytics, our updated documentation, downloads of DSE (free to use in non-production environments with no strings attached) and much more.

We believe DSE 4.5 continues to set the bar for distributed databases that power Internet Enterprise-style applications. If your company is focused on using modern Web and mobile applications to engage your customers and power your business, then contact us today and see what DataStax Enterprise can do for you.



