In DataStax Enterprise, Hadoop is continuously available for analytics workloads. DataStax Enterprise is 100% compatible with Apache Hadoop. Instead of the Hadoop Distributed File System (HDFS), DataStax Enterprise uses Cassandra File System (CassandraFS) keyspaces as the underlying storage layer. This provides the benefits of HDFS, such as replication and data location awareness, together with the benefits of the Cassandra peer-to-peer architecture.
DataStax Enterprise fully supports:
Assuming an analytics node is running, use the following command to start Hadoop:
dse hadoop fs <args>
where the available <args> are described in the HDFS File System Shell Guide on the Apache Hadoop web site.
For example:
dse hadoop fs -help
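As a rough illustration of the kind of <args> the command accepts, the sketch below runs a few standard Hadoop file system shell operations against CassandraFS. The /demo directory and the sample.txt file are hypothetical names chosen for this example, and the dry-run fallback is only a convenience so that the script can be read and tried on a machine without DataStax Enterprise installed.

```shell
#!/bin/sh
# Sketch: basic file operations through the Hadoop fs shell.
# Assumes an analytics node is running.
# /demo and sample.txt are hypothetical names used only for illustration.

# If dse is not on the PATH, print the commands instead of running them.
if command -v dse >/dev/null 2>&1; then
    DSE="dse"
else
    DSE="echo dse"
fi

$DSE hadoop fs -mkdir /demo             # create a directory
$DSE hadoop fs -put sample.txt /demo/   # copy a local file into the file system
$DSE hadoop fs -ls /demo                # list the directory contents
$DSE hadoop fs -cat /demo/sample.txt    # print the file contents
```

Each subcommand (-mkdir, -put, -ls, -cat) is part of the standard Hadoop file system shell, so the same syntax described in the HDFS File System Shell Guide applies.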
For information on starting an analytics node, see Starting and stopping DataStax Enterprise.
For information on starting Hive, Pig, or using Hadoop, see:
After starting Hadoop, run these demos for a good introduction to Hadoop solutions:
The default replication factor for system keyspaces is 1. This setting is suitable for development and testing on a single node, but not for a production environment. In production, increase the replication factor to at least 2 so that the data remains available if a single node fails. For example, the following cassandra-cli statement sets the replication factor of the cfs keyspace to 3 in the Analytics data center:
[default@unknown] UPDATE KEYSPACE cfs
WITH placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
AND strategy_options={Analytics:3};
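Raising a replication factor does not by itself copy existing data to the new replicas; in Cassandra that is normally done by running a repair on each node afterwards. A minimal sketch of that follow-up step, assuming nodetool is on the PATH (otherwise the command is only printed) and that cfs is the keyspace that was changed:

```shell
#!/bin/sh
# Sketch: repair a keyspace after increasing its replication factor so that
# existing data is streamed to the new replicas. Run on each node in turn.
# Assumption: nodetool is on the PATH; if not, the command is only printed.
if command -v nodetool >/dev/null 2>&1; then
    NODETOOL="nodetool"
else
    NODETOOL="echo nodetool"
fi

KEYSPACE="cfs"   # keyspace whose replication settings were changed
$NODETOOL repair "$KEYSPACE"
```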
For more information, see Changing replication settings.