DataStax Enterprise 3.0 Documentation

About DataStax Enterprise

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


DataStax Enterprise is a big data platform built on Apache Cassandra that manages real-time, analytics, and enterprise search data. DataStax Enterprise leverages Cassandra, Apache Hadoop, and Apache Solr to shift your focus from the data infrastructure to using your data strategically.

New Features in DataStax Enterprise 3.0

DataStax Enterprise (DSE) 3.0 features internal and external authentication, object permission management, transparent data encryption, data auditing, client-to-node encryption, and an enterprise-class administration interface for secure database management.

The new version provides core security capabilities to the entire Cassandra community, as well as the advanced data protection that businesses expect in an enterprise-grade database. DSE 3.0 supplies the type of security framework that allows modern enterprises to confidently adopt NoSQL databases as they safely scale their big data infrastructure.

Security for the Community

With the release of DSE 3.0, DataStax is providing security features both to the open source community and to the enterprise. For the Cassandra NoSQL community, DataStax is making the following security enhancements freely available to everyone who uses Apache Cassandra:

  • Internal authentication
  • Object permission management
  • Client-to-node encryption

Security for the Enterprise

In addition to the general security features being made available to the entire Cassandra NoSQL community, DSE 3.0 supplies the following features to enterprises that require advanced security when handling mission-critical data:

  • External authentication via Kerberos, which is one of the most trusted external security packages in use today, especially by governments and financial institutions. External authentication allows DSE to provide single sign-on capability to all Cassandra, Hadoop, and Solr nodes in a DSE cluster.
  • Transparent data encryption protects data at rest from theft and from unauthorized read access at the file system level.
  • Data auditing enables administrators to create granular audit trails of some or all activity in a database cluster.

Key Features of DataStax Enterprise

The key features of DataStax Enterprise include:

  • Production-Certified Cassandra - DataStax Enterprise contains a fully tested, benchmarked, and certified version of Apache Cassandra that is suitable for mission-critical production deployments.
  • No Single Point of Failure - In the Hadoop Distributed File System (HDFS) master/slave architecture, the NameNode is the entry point into the cluster and stores the file system metadata for the cluster. If the NameNode fails, the Hadoop system goes down. DataStax Enterprise improves upon this architecture by making all nodes peers: any node in the cluster can load data files, and any analytics node can assume the responsibilities of the job tracker for MapReduce jobs.
  • Reserve Job Tracker - DataStax Enterprise keeps a job tracker in reserve to take over in the event of a problem that would affect availability.
  • Multiple Job Trackers - In the Cassandra File System (CassandraFS), you can run one or more job tracker services across multiple data centers and create multiple CassandraFS keyspaces per data center. Using this capability has performance, data replication, and other benefits.
  • Hadoop MapReduce using Multiple Cassandra File Systems - CassandraFS is an HDFS-compatible storage layer. DataStax Enterprise replaces HDFS with CassandraFS to run MapReduce jobs on Cassandra's peer-to-peer, fault-tolerant, and scalable architecture. In DataStax Enterprise 2.1 and later, you can create additional CassandraFS instances to organize and optimize Hadoop data.
  • Analytics Without ETL - Using DataStax Enterprise, you run MapReduce jobs directly against your data in Cassandra. You can even run real-time and analytics workloads at the same time without one workload affecting the performance of the other. When you start some cluster nodes as Hadoop analytics nodes and others as pure Cassandra real-time nodes, data is automatically replicated between them.
  • Streamlined Setup and Operations - In Hadoop, you have to set up different mode configurations: stand-alone mode or pseudo-distributed mode for a single node setup, or cluster mode for a multi-node configuration. In DataStax Enterprise, you configure only one mode (cluster mode) regardless of the number of nodes.
  • Hive Support - Hive, a data warehouse system, facilitates data summarization, ad-hoc queries, and the analysis of large data sets stored in Hadoop-compatible file systems. Any JDBC-compliant user interface can connect to Hive from the server. Using the Cassandra-enabled Hive MapReduce client in DataStax Enterprise, you project a relational structure onto Hadoop data in the Cassandra file systems and query the data using a SQL-like language. Cassandra nodes share the Hive metastore automatically, eliminating repetitive Hive configuration steps.
  • Pig Support - The Cassandra-enabled Pig MapReduce client included with DataStax Enterprise is a high-level platform for creating MapReduce programs used with Hadoop. You can analyze large data sets by running Pig programs as MapReduce jobs directly on data stored in Cassandra.
  • Enterprise Search Capabilities - DataStax Enterprise Search fully integrates Apache Solr for ad-hoc querying of data, full-text search, hit highlighting, multiple search attributes, geo-spatial search, searching rich documents such as PDF and Microsoft Word files, and more.
  • Migration of RDBMS data - Apache Sqoop in DataStax Enterprise provides easy migration into the DataStax Enterprise server of data from relational databases, such as Oracle, Microsoft SQL Server, MySQL, Sybase, and DB2, and from non-relational (NoSQL) data sources.
  • Runtime Logging - DataStax Enterprise transfers log-based data directly into the server using log4j. Apache log4j is a Java-based logging framework that provides runtime application feedback and control over the size of log statements. The Cassandra Appender can store log4j messages in a Cassandra table-like structure for in-depth analysis using the Hadoop and Solr capabilities.
  • Support for Mahout - Apache Mahout, a Hadoop component incorporated into DataStax Enterprise 2.1 and later, offers machine learning libraries. Machine learning lets a system improve based on past experience or examples; a familiar example is a system that recreates the Google priority inbox.
  • Full Integration with DataStax OpsCenter - Using DataStax OpsCenter, you can monitor, administer, and configure one or more DataStax Enterprise clusters in an easy-to-use graphical interface. Schedule automatic backups, explore Cassandra data, and see detailed health and status information about clusters, such as the up or down status of nodes, graphs of performance metrics, storage limitations, and progress of Hadoop MapReduce jobs.
