DataStax Enterprise FAQ

What is DataStax Enterprise?

DataStax Enterprise is the commercial product from DataStax, designed for enterprise-class production use. It is a complete big data platform built on Cassandra and architected to manage real-time, analytic, and enterprise search data in the same database cluster.

What benefits do I get by running Hadoop within DataStax Enterprise?

First, you automatically get a continuously available (i.e., no single point of failure) Hadoop system. Unlike traditional Hadoop, which depends on a NameNode and other centralized master services, DataStax Enterprise is a peer-to-peer system and provides automatic and transparent redundancy for all Hadoop operations.

You also get a much easier deployment experience with Hadoop in DataStax Enterprise than with community Hadoop.

Another great benefit of DataStax Enterprise is that it completely eliminates the need for complex extract-transform-load (ETL) operations that are normally needed to move data from real-time systems to analytic databases or data warehouses. Instead, data is transparently and automatically replicated among real-time and analytic nodes; no work on the part of a developer or administrator is necessary.

Lastly, having one integrated database for real-time transactional work, analytics, and enterprise search makes for a much more productive environment for operations personnel and an easier development experience for developers.

Isn’t it a bad idea to have both real-time and search tasks running in the same database?

Not with DataStax Enterprise. DataStax Enterprise uses smart workload isolation so that real-time and search nodes do not compete for either the underlying data or compute resources. All search tasks execute on nodes designated for enterprise search, and all real-time, online operations take place on nodes designated for real-time data tasks.

How does DataStax Enterprise provide support for enterprise search operations?

DataStax Enterprise uses Apache Solr, the most popular open source search software, to support enterprise search tasks.

What benefits do I get by running Solr within DataStax Enterprise?

First, you automatically get a continuously available (i.e., no single point of failure) Solr/enterprise search system. Unlike community Solr, which requires manual work to create a true high-availability environment, DataStax Enterprise uses its peer-to-peer architecture to provide automatic and transparent redundancy for all Solr components and operations.

Next, you get full data durability for incoming search data. Unlike community Solr, which can lose data if a node goes down before new data is flushed to disk, DataStax Enterprise guarantees that no data is lost by using Cassandra’s write-ahead log.

DataStax Enterprise also provides a scalable design for write operations. Unlike community Solr’s master-slave architecture, which experiences write bottlenecks at its single master, DataStax Enterprise allows writing to all Solr nodes – even across multiple data centers – and ensures everything stays in sync.

Other benefits of using Solr in DataStax Enterprise include automatic sharding (vs. manual with community), search indexes being able to span multiple data centers, on-demand search index rebuilds, and more.

One last benefit worth noting is that DataStax Enterprise completely eliminates the need for complex ETL operations that are normally needed to move data from real-time systems to search databases. Instead, data is transparently and automatically replicated among real-time and search nodes; no work on the part of a developer or administrator is necessary.

How does DataStax Enterprise handle both real-time and search data in the same database?

DataStax Enterprise uses Cassandra’s replication to replicate data between nodes designated for real-time data and nodes specified for search operations. Any node may be written to, with changes being propagated across all nodes. All nodes may also be read. Such a configuration eliminates write bottlenecks and read/write hotspots.
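
As a rough illustration of how such replication might be declared, the sketch below builds a CREATE KEYSPACE statement that places replicas in both a real-time and a search data center. The helper name, the keyspace name, and the data center names "Cassandra" and "Solr" are assumptions made for this example; use the names your cluster's snitch actually reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that replicates to each named data center.

    replicas_per_dc maps a data center name to its replica count. The names
    used in the example call below are assumptions, not fixed values.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    # Sort by name so the generated statement is deterministic.
    for dc, count in sorted(replicas_per_dc.items()):
        parts.append("'{}': {}".format(dc, int(count)))
    return "CREATE KEYSPACE {} WITH replication = {{{}}};".format(
        keyspace, ", ".join(parts)
    )


# Hypothetical keyspace with two real-time replicas and one search replica.
print(create_keyspace_cql("demo", {"Cassandra": 2, "Solr": 1}))
```

With a keyspace defined this way, a write accepted by any node is replicated to both groups of nodes, which is the behavior the answer above describes.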

Can I access data in Solr/search nodes with CQL?

Yes. DataStax Enterprise extends Cassandra’s CQL to include Solr queries. See the online documentation for more on how to construct Solr CQL queries.
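
For a sense of what a Solr CQL query can look like, the sketch below assembles a SELECT statement that passes a Solr expression through a solr_query condition, which is how the DataStax documentation expresses search in CQL as far as we are aware. The helper name, keyspace, table, and field are hypothetical, and the exact syntax should be checked against the documentation for your version.

```python
def solr_select(keyspace, table, solr_expr, limit=10):
    """Build a CQL SELECT that filters rows with a Solr query expression.

    The solr_query condition is assumed from the DataStax documentation;
    keyspace, table, and field names are placeholders for illustration.
    """
    # Double any single quotes so the expression stays a valid CQL string literal.
    escaped = solr_expr.replace("'", "''")
    return "SELECT * FROM {}.{} WHERE solr_query = '{}' LIMIT {};".format(
        keyspace, table, escaped, int(limit)
    )


# Hypothetical table "articles" with an indexed "title" field.
print(solr_select("demo", "articles", "title:cassandra"))
```

The resulting string can be run from cqlsh or passed to a CQL driver connected to a search node.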

Does DataStax Enterprise offer any type of workload management reprovisioning?

Yes. Real-time (Cassandra) and analytic (Hadoop) nodes can be reprovisioned by stopping a node and restarting it in a different mode, which lets you adjust performance and capacity for different workloads. For example, you may need more real-time processing power during the day and more batch analytic capability at night. You can schedule a cluster to stop some or all real-time nodes and restart them as Hadoop nodes to increase analytic capacity in the evening, then switch them back to real-time for daytime processing.
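
The day/night schedule described above could be driven by a small script along these lines. This is a sketch only: the function names and the 08:00 to 20:00 daytime window are assumptions, and the start commands ("dse cassandra" for a real-time node, "dse cassandra -t" for a Hadoop node) reflect our understanding of tarball installations and should be verified for your version and install type.

```python
# Assumed start commands for each workload; verify against your installation.
START_COMMANDS = {
    "real-time": ["dse", "cassandra"],
    "analytics": ["dse", "cassandra", "-t"],
}


def workload_for_hour(hour, day_start=8, day_end=20):
    """Return the workload a flexible node should serve at the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # Daytime hours favor real-time traffic; the rest go to batch analytics.
    return "real-time" if day_start <= hour < day_end else "analytics"


def start_command(hour):
    """Return the command used to restart a node in the mode chosen for this hour."""
    return START_COMMANDS[workload_for_hour(hour)]


print(workload_for_hour(10), start_command(10))
print(workload_for_hour(23), start_command(23))
```

A scheduler such as cron could call a script like this at the start of each window, stop the node cleanly, and then run the returned command.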

What type of security does DataStax Enterprise offer?

DataStax Enterprise 3.0 and higher provides the following built-in security features: (1) internally managed authentication (login IDs and passwords are managed within Cassandra); (2) an external authentication option that supports Kerberos and LDAP; (3) internal authorization / object permission management via GRANT/REVOKE; (4) client-to-node encryption via SSL; (5) transparent data encryption at the table / column family level; (6) data auditing. None of these security features is enabled by default.

How can I move data from RDBMSs to DataStax Enterprise?

DataStax Enterprise uses Sqoop to move data from any RDBMS with a JDBC driver (e.g., Oracle, MySQL) into the DataStax Enterprise server. One or more tables are mapped to new Cassandra column families, and the Sqoop interface takes care of the rest.
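
To make the table-to-column-family mapping more concrete, the sketch below assembles the argument list for a Sqoop import. The --connect, --table, and --username options are standard Sqoop options; the --cassandra-* options are recalled from the DataStax Enterprise Sqoop demo and should be treated as assumptions to verify against your version's documentation. The function name, JDBC URL, and all table and keyspace names are hypothetical.

```python
def sqoop_import_args(jdbc_url, table, keyspace, column_family, row_key,
                      cassandra_host, username=None):
    """Build the argument list for importing one RDBMS table into Cassandra.

    The --cassandra-* flags are assumptions based on the DataStax Enterprise
    Sqoop demo; confirm them before relying on this.
    """
    args = ["dse", "sqoop", "import", "--connect", jdbc_url, "--table", table]
    if username:
        args += ["--username", username]
    args += [
        "--cassandra-keyspace", keyspace,            # target keyspace
        "--cassandra-column-family", column_family,  # target column family
        "--cassandra-row-key", row_key,              # source column used as row key
        "--cassandra-thrift-host", cassandra_host,   # node that receives the writes
        "--cassandra-create-schema",                 # create the target if missing
    ]
    return args


# Hypothetical MySQL source table "customers" keyed by "id".
print(" ".join(sqoop_import_args(
    "jdbc:mysql://127.0.0.1/shop", "customers",
    "shop", "customers_cf", "id", "127.0.0.1", username="root",
)))
```

Keeping the arguments as a list makes it straightforward to hand them to subprocess.run without shell quoting issues.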

You can also use third-party tools such as Pentaho’s Kettle, which has full ETL capabilities and is free to download and use. See this blog post that describes the most commonly used methods.

Can I move application log data to DataStax Enterprise?

Yes. Using log4j, application log data can be moved easily into the DataStax Enterprise server and then indexed and searched with the server’s built-in Solr support.