The Cassandra File System (CassandraFS) replaces the Hadoop Distributed File System (HDFS). It is designed to reduce the operational overhead of Hadoop by removing the single point of failure that the Hadoop NameNode represents, and to offer easy Hadoop integration for Cassandra users. When an analytics node starts up, DSE creates a default CassandraFS rooted at cfs:/ and an archive file system named cfs-archive.
A CFS superuser is the DSE daemon user, that is, the user who starts DataStax Enterprise. A Cassandra superuser, created with the CQL CREATE USER ... SUPERUSER command, is also a CFS superuser.
A CFS superuser can modify files in the CassandraFS without any restrictions. Files that a superuser adds to the CassandraFS are password-protected.
DataStax Enterprise 2.1 and later support multiple CassandraFS instances. Some typical reasons for using an additional CassandraFS are:
To create an additional CassandraFS:
Open the core-site.xml file for editing. This file is located in:
Add one or more property elements to core-site.xml using this format:
<property>
<name>fs.cfs-<filesystem name>.impl</name>
<value>com.datastax.bdp.hadoop.cfs.CassandraFileSystem</value>
</property>
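The only part of the property that changes from one file system to the next is the name embedded in fs.cfs-<filesystem name>.impl. As a minimal sketch, assuming a POSIX shell, the hypothetical helper `cfs_property` below (not part of DSE or Hadoop) prints the element for a given file system name; the output belongs inside the `<configuration>` root element of core-site.xml.

```shell
#!/bin/sh
# Hypothetical helper (not a DSE command): print the core-site.xml property
# that registers an additional CassandraFS named by the first argument.
cfs_property() {
  name="$1"
  printf '<property>\n'
  # The file system name is embedded in the property name: fs.cfs-<name>.impl
  printf '  <name>fs.cfs-%s.impl</name>\n' "$name"
  # Every additional CassandraFS uses the same implementation class
  printf '  <value>com.datastax.bdp.hadoop.cfs.CassandraFileSystem</value>\n'
  printf '</property>\n'
}

# Print the element for a file system named NewCassandraFS
cfs_property NewCassandraFS
```

For NewCassandraFS, the printed property name is fs.cfs-NewCassandraFS.impl.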
Save the file and restart Cassandra.
DSE creates the new CassandraFS.
To access the new CassandraFS, construct a URL using the following format:
cfs-<filesystemname>:<path>
For example, assuming the new file system name is NewCassandraFS:
hadoop fs -copyFromLocal /tmp/giant_log.gz cfs-NewCassandraFS://cassandrahost/tmp
hadoop distcp hdfs:/// cfs-NewCassandraFS:///
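Both CassandraFS URLs in this example have the shape cfs-<filesystem name>://<host><path>. The second command leaves the host empty, which is what produces the triple slash. As a minimal sketch, assuming a POSIX shell, the hypothetical helper `cfs_url` below (not a DSE or Hadoop command) assembles such a URL from its three parts.

```shell
#!/bin/sh
# Hypothetical helper (not a DSE or Hadoop command): build a CassandraFS URL.
#   $1 = file system name, $2 = host (may be empty), $3 = absolute path
cfs_url() {
  printf 'cfs-%s://%s%s\n' "$1" "$2" "$3"
}

cfs_url NewCassandraFS cassandrahost /tmp   # prints cfs-NewCassandraFS://cassandrahost/tmp
cfs_url NewCassandraFS "" /                 # prints cfs-NewCassandraFS:///
```

The result can be passed directly as an argument to commands such as hadoop fs -ls.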