Robin Schumacher, SVP and Chief Product Officer

Introducing DataStax Enterprise 5.0 – This release is big. Really, really big.

By Robin Schumacher, SVP and Chief Product Officer | June 21, 2016

I’ve been privileged to work at DataStax for nearly five years, and over that time, we’ve delivered numerous software releases, which contained exciting innovations that were “firsts” in the NoSQL database market. Today, we announced the latest release of our distributed database solution – DataStax Enterprise (DSE) 5.0 and OpsCenter 6.0.

How can I sum up this release? It’s big. Really big. Really, really big. In fact, it’s the largest release in our company’s history, both in terms of major new functionality and improvements to existing capabilities.

From a high-level perspective, our two goals for this release were to:

  1. Deliver a multi-model database platform that supports the multi-faceted data management needs of today’s cloud applications.
  2. Simplify the development, protection, management, and monitoring of DSE so that cloud applications can more quickly and easily be brought to market, secured, and maintained.

I’ll do my best to take you on a short and sweet “What’s New” tour of our latest release so you can get a feel for what we’ve done to accomplish these two objectives. You can also watch our short video that covers many of our new features. 

Multi-Model with Graph

Today’s cloud applications [1] are intensely multi-dimensional. For example, a modern retail cloud application includes various software components such as product catalogs, user profile management, fraud detection, recommendation and personalization engines, shopping cart, clickstream/log analysis, and more.

The new normal is for each of these components to have distinct data model support requirements. That being true, a database that provides adaptive data management (or multi-model) functionality will deliver a simpler and more agile solution for quickly bringing cloud applications to market, and give the application owner one vendor they can easily work with, instead of many.

With DSE 5.0, we now support four different data models that all persist their data to Cassandra: (1) Key-value; (2) Tabular; (3) Document / JSON; (4) Graph. It’s important to understand two things about this new multi-model support.

First, each model inherits all the power and benefits that you enjoy today with Cassandra – continuous availability, linear scale performance, the best multi-data center and cloud zone support available, operational simplicity, and more.

Second, each model also benefits from all commercial DSE extensions including integrated analytics and enterprise search, advanced security, visual management and monitoring, etc.

Where graph support is concerned, DSE Graph delivers what cloud applications need to manage complex and highly connected data. DSE 5.0 delivers a complete solution for graph applications that consists of:

  • DataStax Enterprise Server 5.0 with native graph data model support.
  • OpsCenter 6.0 with support for creating, managing, and monitoring graph databases.
  • DataStax Enterprise Drivers, which fully support the Gremlin language as well as all other APIs used in DataStax Enterprise (e.g. CQL).
  • DataStax Studio 1.0, which is a visual developer tool for visualizing and querying graph databases.

Speaking of DataStax Studio, we’re excited about our new web-based visual tool that’s built to help you visually interact with DSE Graph. DataStax Studio lets you easily write Gremlin queries and visualize your graph data in a variety of formats.

DataStax Studio

This first version of DataStax Studio only supports DSE Graph, but upcoming versions will also support CQL as well as APIs for DSE Search and DSE Analytics.

Automation and Simplification

Beyond multi-model and graph, DSE 5.0 and OpsCenter 6.0 add server-based automation and simplification aimed squarely at data-centric business problems that previously required more manual work than we wanted.

For example, certain applications – especially those in the retail and energy markets – need specialized forms of data distribution that rely on a hub-and-spoke topology (also referred to as an “edge of the internet” model). These systems have to constantly update central data collection sites with information originally collected and stored at numerous locations around the world. While Cassandra sets the standard for modern data replication and distribution, it does not make this type of design easy to support.

Enter DSE Advanced Replication. It builds on Cassandra’s gold replication standard by providing multi-cluster replication from numerous endpoints to a centralized location that is used for data aggregation and analysis.

Another new area of automation in DSE revolves around the need to use the right storage for the right data “temperature”. For data that doesn’t require high-speed access (e.g. data that ages and is no longer ‘hot’), regularly moving that data to less expensive HDDs can help reduce overall hardware spend. The challenge is how to relocate that data continuously and intelligently without constant interaction and supervision from IT staff.

To help, we’re introducing DSE Tiered Storage. DSE Tiered Storage transparently shifts older, infrequently accessed data from high-performance SSDs to slower, more economical HDDs based on your criteria, and does so in a performant and efficient manner.
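To make the idea concrete, here is a minimal Python sketch of age-based tier selection. It is not DSE's implementation or its configuration syntax: the tier names, the seven-day threshold, and the `pick_tier` function are all hypothetical and exist only to illustrate how "your criteria" could map data age to a storage tier.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, ordered fastest to slowest. Each entry is
# (tier name, maximum data age allowed on that tier); None means "no limit".
TIERS = [
    ("ssd", timedelta(days=7)),
    ("hdd", None),
]


def pick_tier(written_at, now, tiers=TIERS):
    """Return the name of the first tier whose age limit covers the data."""
    age = now - written_at
    for name, max_age in tiers:
        if max_age is None or age <= max_age:
            return name
    # Data older than every limit stays on the slowest tier.
    return tiers[-1][0]


now = datetime(2016, 6, 21)
print(pick_tier(datetime(2016, 6, 20), now))  # written one day ago
print(pick_tier(datetime(2016, 1, 1), now))   # written months ago
```

In this sketch the one-day-old write maps to the fast tier and the months-old write maps to the slow one. A real system also has to move the data, which the sketch leaves out.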

A third operational simplification in DSE addresses environments that run on ‘large’ hardware rather than smaller commodity machines. Scaling a database like Cassandra out across bigger boxes can be challenging because you need to effectively utilize the system resources on each machine while ensuring your design still meets any disaster avoidance requirements.

This is where DSE Multi-Instance comes into play. DSE Multi-Instance lets you run multiple DSE instances (database processes) on individual hosts without the need for a virtualization or container layer. This allows each instance to consume a share of the host’s physical resources, thereby increasing system utilization and, by extension, data center efficiency. Furthermore, DSE Multi-Instance maintains replica placement safety so that a catastrophic failure on one physical host won’t impact more than one replica of any given partition.
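The replica placement guarantee can be illustrated with a short Python sketch. This is an assumption about the general technique (walk the ring and skip instances whose physical host already holds a replica), not DSE's actual placement code. The instance names, host names, and the `place_replicas` function are hypothetical.

```python
def place_replicas(ring, start, replication_factor):
    """Walk the ring from `start`, picking instances on distinct hosts.

    `ring` is a list of (instance_name, physical_host) pairs in token order.
    """
    chosen, used_hosts = [], set()
    for offset in range(len(ring)):
        instance, host = ring[(start + offset) % len(ring)]
        if host in used_hosts:
            continue  # a replica already lives on this physical machine
        chosen.append(instance)
        used_hosts.add(host)
        if len(chosen) == replication_factor:
            return chosen
    raise ValueError("not enough distinct hosts for the replication factor")


# Three physical hosts, each running two database instances.
ring = [
    ("dse-a1", "host-a"), ("dse-a2", "host-a"),
    ("dse-b1", "host-b"), ("dse-b2", "host-b"),
    ("dse-c1", "host-c"), ("dse-c2", "host-c"),
]
print(place_replicas(ring, start=0, replication_factor=3))
```

With three replicas spread over three hosts, losing any single machine removes at most one replica of the partition. Asking for a fourth replica fails in this sketch because only three distinct hosts exist.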

Any discussion of increased automation and simplification in DSE isn’t complete without a quick mention of all the work that’s gone into OpsCenter 6.0. Of course, it supports new DSE features that we’ve talked about like DSE Graph and Tiered Storage. But at the top of my list of impressive new OpsCenter features is our Lifecycle Manager addition that takes the creation, provisioning, and administration of database clusters to a new level.

For a more complete description of OpsCenter’s Lifecycle Manager functionality and other additions like integration with Graphite, SNMP-enabled monitoring systems, improved backup support, etc., look for an upcoming blog post.  

Improved Data Protection

Because security is so important for cloud applications, we strive to improve what’s in DSE Advanced Security with each release. In DSE 5.0, we’ve added a number of new data security features that both increase the protection of data and make security simpler to implement and maintain.       

First, we’ve added the ability to encrypt database support files that can contain sensitive data – files like the Cassandra and Solr commitlogs and DSE Search indexes.

Next, we’ve introduced Role-Based Access Control, more commonly known as RBAC. RBAC was initially released in Cassandra 2.2 and is enhanced in DSE 5.0 to support role assignment in LDAP and Microsoft Active Directory servers.

Lastly, we’ve delivered new unified authentication mechanisms that allow security administrators to configure DSE clusters with multiple authentication schemes to address the diverse demands of different enterprise database consumers. For example, Active Directory and Cassandra Internal Authentication can both be actively used in the same database cluster to serve different user bases.
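As a rough illustration of serving different user bases from one cluster, the Python sketch below tries several authentication schemes in order and reports which one accepted the login. It is only a sketch under stated assumptions: the scheme names, the in-memory credential maps, and both functions are hypothetical, and the order in which DSE actually consults its configured schemes is not described here.

```python
def in_memory_scheme(stored_passwords):
    """Build a checker backed by an in-memory username -> password map.

    A stand-in for a real credential store; production systems keep
    salted hashes, never plain-text passwords.
    """
    def check(username, password):
        return (username in stored_passwords
                and stored_passwords[username] == password)
    return check


def first_accepting_scheme(username, password, schemes):
    """Try each (name, checker) pair in order; return the accepting name or None."""
    for name, check in schemes:
        if check(username, password):
            return name
    return None


# Two hypothetical user bases served by the same cluster.
schemes = [
    ("internal", in_memory_scheme({"app_service": "s3cret"})),
    # Stand-in for a directory lookup such as LDAP or Active Directory.
    ("directory", in_memory_scheme({"alice": "correct-horse"})),
]

print(first_accepting_scheme("alice", "correct-horse", schemes))
print(first_accepting_scheme("mallory", "guess", schemes))
```

Here the application account is accepted by the first scheme, the directory user by the second, and an unknown user by neither.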

Still More…

Yes, there’s actually more in DSE 5.0 to talk about, but I’ll do it quickly:

  • The Live Indexing feature of DSE Search has increased capabilities due to enhancements that allow the system to utilize memory outside of the native memory space, which improves efficiency and reduces latency between the inserting of new data and querying of that same data.
  • We’ve upgraded DSE Analytics to use version 1.6 of Spark, which we believe is now ready for production deployments. In addition, we now use Spark to integrate with external Hadoop clusters from Hortonworks and Cloudera, instead of the legacy Hive functionality that was used in earlier DSE versions.
  • For Spark Streaming applications, DSE 5.0 and DSE Analytics include a new distributed file system called DSEFS. In its initial release, DSEFS is focused on the needs of streaming operational database applications, integrating seamlessly into Spark as an HDFS-compatible file system and providing a fault-tolerant way to checkpoint streaming applications.
  • Last but not least, DSE 5.0 includes a production-certified version of Cassandra 3.0, which brings with it all of 3.0’s new features like materialized views, improved storage management, and more.

Wrap Up

I hope you’ll agree with me that DSE 5.0 and OpsCenter 6.0 do a nice job of fulfilling the high-level objectives we set for this release, which were to deliver a robust multi-model platform and simplify your life on DSE. To learn more about DSE 5.0/OpsCenter 6.0/Studio 1.0 and try out our latest release, check out the resources below and be sure to contact us if you have any questions.

[1] We define a cloud application as one that delivers real-time value at epic scale and has these characteristics:

  • Distributed: The app consists of numerous endpoints, is typically multi-datacenter and may use hybrid cloud, and requires linear scale out functionality.
  • Responsive: The app must be instantly responsive, has very low latency response times, and is always-on.
  • Intelligent: The app delivers immediate decisiveness, and supports both multiple data models and mixed workloads.





Comments

  1. Ian says:

    I am really interested in using this release, but outside of this blog post there’s no other information on the site.

    When is the official release date of 5.0?
    and when will an upgrade guide or compatibility document be available?

  2. Robin Schumacher says:

    The downloads and updated documentation will be available on June 28th.

  3. Natik Ameen says:

    Any tips/doc on what steps are required to upgrade to latest version?

  4. Robin Schumacher says:

    Yes, the downloads and updated docs for DSE 5.0 (including detailed instructions for upgrades) will be posted around noon EST on June 28th.

  5. Scott Preddy says:

    I no longer see Cassandra context in DSE 5.0 Spark? How do I issue sql in spark now over Cassandra?

  6. Brian Hess says:

    Hi Scott.
    First, you will probably get more timely responses via DataStax Support or Stack Overflow.
    In DSE 5.0, the CassandraSqlContext is no longer created by default. This is because even in DSE 4.8 the preferred method to issue SQL against data in Cassandra is via the HiveContext (in 4.8 this is created as “hc”, in 5.0 this is created as “sqlContext”). So, where you previously used “csc.sql()” you would use “sqlContext.sql()” in 5.0.
