DataStax Enterprise 3.1 Documentation

About DataStax Enterprise

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.

DataStax Enterprise is a big data platform built on Apache Cassandra that manages real-time, analytics, and enterprise search data. DataStax Enterprise leverages Cassandra, Apache Hadoop, and Apache Solr to shift your focus from the data infrastructure to using your data strategically.

New features in DataStax Enterprise 3.1

Key new features are:

  • Support for virtual nodes (vnodes) in Cassandra real-time clusters. Vnodes are not supported in clusters that contain Solr or Hadoop nodes.
  • Support for mixed architecture. You can run clusters with virtual node-enabled and non-virtual node data centers.
  • Support for the Murmur3 partitioner.
  • Includes Cassandra 1.2, which supports the released version of CQL 3.
  • Hadoop and Hive support for CQL 3.
  • Many DSE Search/Solr enhancements, including:
    • Solr 4.3 support
    • Per-segment filters
    • Improved performance on facets
    • Multivalue field support
    • Support for docValues in the schema field definition
    • Per-segment caching for facets improves performance
    • Configurable TTL for a field or document using the Solr HTTP API
    • Configurable number of columns loaded from Cassandra for dynamic field queries.
  • Support for audit logging of queries and prepared statements submitted to the DataStax Java Driver, which uses the CQL binary protocol.
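
To make the vnode and partitioner items above more concrete, the following is a minimal sketch of how a token ring maps keys to nodes, and how giving each node many tokens splits its ownership into many small ranges. It is a toy illustration, not Cassandra's implementation: the names `token_for`, `build_ring`, and `owner_of` are hypothetical, MD5 stands in for Murmur3 (which is not in the Python standard library), and 256 tokens per node is an illustrative value.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Map a string to a 64-bit token. MD5 is a stand-in here, not Murmur3."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes, tokens_per_node):
    """Return a sorted list of (token, node) pairs for a toy ring."""
    ring = []
    for node in nodes:
        for i in range(tokens_per_node):
            # Derive each token from the node name and an index (an assumption
            # made for this sketch; real clusters assign tokens differently).
            ring.append((token_for(f"{node}#{i}"), node))
    ring.sort()
    return ring


def owner_of(ring, key):
    """Return the node holding the first token at or after the key's token.

    The lookup wraps around to the start of the ring when the key's token
    is larger than every token in the ring.
    """
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(key))
    return ring[index % len(ring)][1]


nodes = ["node1", "node2", "node3"]
single_token_ring = build_ring(nodes, tokens_per_node=1)   # one range per node
vnode_ring = build_ring(nodes, tokens_per_node=256)        # many small ranges per node
```

Calling `owner_of(vnode_ring, "user:42")` returns one of the three node names. With one token per node, each node owns a single large range; with many tokens per node, ownership is spread across the ring in small slices, which is the general idea behind vnodes.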

Key features of DataStax Enterprise

The key features of DataStax Enterprise include:

  • Security - Internal and external authentication, object permission management, transparent data encryption, data auditing, client-to-node encryption, and an enterprise-class administration interface for secure database management.
  • Production Certified Cassandra - DataStax Enterprise contains a fully tested, benchmarked, and certified version of Apache Cassandra that is suitable for mission-critical production deployments.
  • No Single Point of Failure - In the Hadoop Distributed File System (HDFS) master/slave architecture, the NameNode entry point into the cluster stores configuration metadata about the cluster. If the NameNode fails, the Hadoop system goes down. DataStax Enterprise modifies this architecture, making nodes peers. Because the nodes are peers, any node in the cluster can load data files, and any analytics node can assume the responsibilities of job tracker for MapReduce jobs.
  • Reserve Job Tracker - DataStax Enterprise keeps a job tracker in reserve to take over in the event of a problem that would affect availability.
  • Multiple Job Trackers - In the Cassandra File System (CassandraFS), you can run one or more job tracker services across multiple data centers and create multiple CassandraFS keyspaces per data center. Using this capability has performance, data replication, and other benefits.
  • Hadoop MapReduce using Multiple Cassandra File Systems - CassandraFS is an HDFS-compatible storage layer. DataStax replaces HDFS with CassandraFS to run MapReduce jobs on Cassandra's peer-to-peer, fault-tolerant, and scalable architecture. In DataStax Enterprise 2.1 and later, you can create additional Cassandra file systems to organize and optimize Hadoop data.
  • Analytics Without ETL - Using DataStax Enterprise, you run MapReduce jobs directly against your data in Cassandra. You can even perform real-time and analytics workloads at the same time without one workload affecting the performance of the other. When you start some cluster nodes as Hadoop analytics nodes and others as pure Cassandra real-time nodes, data is replicated between the nodes automatically.
  • Streamlined Setup and Operations - In Hadoop, you have to set up different mode configurations: stand-alone mode or pseudo-distributed mode for a single node setup, or cluster mode for a multi-node configuration. In DataStax Enterprise, you configure only one mode (cluster mode) regardless of the number of nodes.
  • Hive Support - Hive, a data warehouse system, facilitates data summarization, ad-hoc queries, and the analysis of large data sets stored in Hadoop-compatible file systems. Any JDBC-compliant user interface connects to Hive from the server. Using the Cassandra-enabled Hive MapReduce client in DataStax Enterprise, you project a relational structure onto Hadoop data in the Cassandra file systems, and query the data using a SQL-like language. Cassandra nodes share the Hive metastore automatically, eliminating repetitive Hive configuration steps.
  • Pig Support - The Cassandra-enabled Pig MapReduce client included with DataStax Enterprise is a high-level platform for creating MapReduce programs used with Hadoop. You can analyze large data sets, running jobs in MapReduce mode and Pig programs directly on data stored in Cassandra.
  • Enterprise Search Capabilities - DataStax Enterprise Search fully integrates Apache Solr for ad-hoc querying of data, full-text search, hit highlighting, multiple search attributes, geo-spatial search, searching rich documents such as PDF and Microsoft Word files, and more.
  • Migration of RDBMS data - Apache Sqoop in DataStax Enterprise provides easy migration of data from RDBMS sources, such as Oracle, Microsoft SQL Server, MySQL, Sybase, and DB2, and from non-relational data sources, such as NoSQL databases, into the DataStax Enterprise server.
  • Runtime Logging - DataStax Enterprise transfers log-based data directly into the server using log4j. Apache log4j is a Java-based logging framework that provides runtime application feedback and control over the size of log statements. The Cassandra Appender can store the log4j messages in a Cassandra table-like structure for in-depth analysis using the Hadoop and Solr capabilities.
  • Support for Mahout - Apache Mahout, a Hadoop component incorporated into DataStax Enterprise 2.1 and later, offers machine learning libraries. Machine learning improves a system based on past experience or examples; one such system is an application that recreates the Google priority inbox.
  • Full Integration with DataStax OpsCenter - Using DataStax OpsCenter, you can monitor, administer, and configure one or more DataStax Enterprise clusters in an easy-to-use graphical interface. Schedule automatic backups, explore Cassandra data, and see detailed health and status information about clusters, such as the up or down status of nodes, graphs of performance metrics, storage limitations, and progress of Hadoop MapReduce jobs.
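
The reserve job tracker idea in the list above can be sketched as a simple selection rule: use the primary job tracker while it is up, otherwise fall back to the reserve. This is only an illustration of the failover concept under stated assumptions; the function name `active_job_tracker` and the node names are hypothetical, and the sketch does not reflect how DataStax Enterprise detects failures or promotes the reserve internally.

```python
def active_job_tracker(primary, reserve, live_nodes):
    """Pick the node that should act as job tracker.

    primary and reserve are node names; live_nodes is the set of nodes
    currently considered up. Returns None if neither candidate is up.
    """
    if primary in live_nodes:
        return primary
    if reserve in live_nodes:
        # The primary is unavailable, so the reserve takes over.
        return reserve
    return None


# Hypothetical node names, used for illustration only.
all_up = {"analytics1", "analytics2", "analytics3"}
primary_down = {"analytics2", "analytics3"}

tracker_when_healthy = active_job_tracker("analytics1", "analytics2", all_up)
tracker_after_failure = active_job_tracker("analytics1", "analytics2", primary_down)
```

Because analytics nodes are peers, the fallback candidate is just another node in the same cluster rather than a specially provisioned master.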