Transparent data encryption (TDE) protects data at rest. Data at rest is data that has been flushed from the memtable in system memory to the SSTables on disk.
As shown in the diagram, data stored in the commit log is not encrypted. If you need commit log encryption, store the commit log on an OS-level encrypted file system, for example by using Gazzang. Data can be encrypted using different algorithms, or you can choose not to encrypt data at all. SSTable data files are immutable (they are not written to again after they have been flushed to disk), so SSTables are encrypted only once, when they are written to disk.
The high-level procedure for encrypting data is:
TDE requires a secure local file system to be effective. The encryption certificates are stored locally; therefore, if the local file system is compromised, the encryption is ineffective.
To get the full capabilities of TDE, download and install the Java Cryptography Extension (JCE): unzip the jar files and place them under $JAVA_HOME/jre/lib/security. JCE-based products are restricted for export to certain countries by the U.S. Export Administration Regulations.
Data is not directly protected by TDE when accessed using the following utilities.
| Utility | Reason Utility Is Not Encrypted |
|---|---|
| json2sstable | Operates directly on the sstables. |
| nodetool | Uses only JMX, so data is not accessed. |
| sstable2json | Operates directly on the sstables. |
| sstablekeys | Operates directly on the sstables. |
| sstableloader | Operates directly on the sstables. |
| sstablescrub | Operates directly on the sstables. |
The local file system can be protected through a third-party whole-disk encryption solution. You can choose SSL, Kerberos authentication, an encrypted file system, or other ways to secure nodes.
DataStax recommends that you do not export local file systems if possible. If you must export a local file system, ensure that mounting the file system on the node is a server-side capability.
Compression and encryption introduce performance overhead.
You designate encryption on a per-table (column family) basis. When using encryption, each node generates a separate key that is used only for that node's SSTables.
For example, log in as the default superuser:
./cqlsh -3 -u cassandra -p cassandra
The ALTER TABLE syntax for setting encryption options is the same as the syntax for setting data compression options.
For example, to set compression options in the chores table:
ALTER TABLE chores
WITH compression_parameters:sstable_compression = 'DeflateCompressor'
AND compression_parameters:chunk_length_kb = 64;
To set encryption options in the chores table using CQL 3, for example:
ALTER TABLE chores
WITH compression_parameters:sstable_compression = 'Encryptor'
AND compression_parameters:cipher_algorithm = 'AES/ECB/PKCS5Padding'
AND compression_parameters:secret_key_strength = 128
AND compression_parameters:chunk_length_kb = 1;
Designating data for encryption using ALTER TABLE doesn't encrypt existing SSTables, only new SSTables that are generated afterward. When setting up data to be encrypted, but not compressed, set the chunk_length_kb option to the lowest possible value, 1, as shown in the previous example. Setting this option to 1 improves read performance by limiting the data that needs to be decrypted for each read operation to 1 KB.
Encryption and compression occur locally, which is more performant than trying to accomplish these tasks on the Cassandra side. Encryption can be set together with compression using a single statement. The single statement in CQL 3 is:
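To make the read-cost argument concrete, here is a minimal Python sketch. It assumes a simplified model in which a read must decrypt every whole chunk it overlaps; the function name `bytes_decrypted` is hypothetical and is not part of DataStax Enterprise or Cassandra.

```python
def bytes_decrypted(offset: int, length: int, chunk_length_kb: int) -> int:
    """Estimate how many bytes must be decrypted to serve one read.

    Simplified model (an assumption): data is encrypted in fixed-size
    chunks, and every chunk the read overlaps is decrypted in full.
    """
    if length <= 0:
        return 0
    chunk = chunk_length_kb * 1024
    first = offset // chunk                  # index of first chunk touched
    last = (offset + length - 1) // chunk    # index of last chunk touched
    return (last - first + 1) * chunk


# A 100-byte read that falls inside a single chunk:
small_chunks = bytes_decrypted(offset=5000, length=100, chunk_length_kb=1)
large_chunks = bytes_decrypted(offset=5000, length=100, chunk_length_kb=64)
print(small_chunks, large_chunks)
```

Under this model the same 100-byte read costs one 1 KB chunk when chunk_length_kb is 1 and one 64 KB chunk when it is 64. A read that straddles a chunk boundary touches two chunks.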
ALTER TABLE chores
WITH compression_parameters:sstable_compression = 'EncryptingSnappyCompressor'
AND compression_parameters:cipher_algorithm = 'AES/ECB/PKCS5Padding'
AND compression_parameters:secret_key_strength = 128
AND compression_parameters:chunk_length_kb = 128;
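If you generate these statements from a script, a small helper can keep the clause layout consistent. The following Python sketch is illustrative only: `build_alter_table` is a hypothetical helper, it simply renders text in the syntax shown above, and it does no escaping, so it should not be fed untrusted input.

```python
def build_alter_table(table: str, options: dict) -> str:
    """Render an ALTER TABLE statement in the compression_parameters syntax.

    Hypothetical helper: string values are single-quoted, numeric values
    are written bare, and only the final clause ends with a semicolon.
    """
    if not options:
        raise ValueError("at least one option is required")
    clauses = []
    for name, value in options.items():
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        clauses.append(f"compression_parameters:{name} = {rendered}")
    return f"ALTER TABLE {table}\nWITH " + "\nAND ".join(clauses) + ";"


statement = build_alter_table(
    "chores",
    {
        "sstable_compression": "EncryptingSnappyCompressor",
        "cipher_algorithm": "AES/ECB/PKCS5Padding",
        "secret_key_strength": 128,
        "chunk_length_kb": 128,
    },
)
print(statement)
```

Joining the clauses and appending one terminator at the end avoids a stray semicolon in the middle of the statement, which would end the statement early.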
Using encryption, your application can read and write to SSTables that use different encryption algorithms or no encryption at all. Using different encryption algorithms to encrypt SSTable data is similar to using different compression algorithms to compress data. This section lists the options and sub-options.
The high-level container options for encryption and/or compression used in the ALTER TABLE statement are:
The cipher_algorithm options and acceptable secret_key_strength for the algorithms are:
| cipher_algorithm | secret_key_strength |
|---|---|
| AES/CBC/PKCS5Padding | 128, 192, or 256 |
| AES/ECB/PKCS5Padding | 128, 192, or 256 |
| DES/CBC/PKCS5Padding | 56 |
| DESede/CBC/PKCS5Padding | 112 or 168 |
| Blowfish/CBC/PKCS5Padding | 32-448 |
| RC2/CBC/PKCS5Padding | 40-128 |
You can install custom providers for your JVM. AES-512 is not supported out of the box.
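The pairs in the table above can be checked before you issue an ALTER TABLE statement. The following Python sketch encodes the table as a lookup; the names `VALID_STRENGTHS` and `is_supported` are hypothetical, and treating the Blowfish and RC2 ranges as inclusive integer ranges is an assumption, since the table does not say whether every value in the range is accepted.

```python
# Key strengths per cipher_algorithm, transcribed from the table above.
# Assumption: the "32-448" and "40-128" ranges are inclusive integer ranges.
VALID_STRENGTHS = {
    "AES/CBC/PKCS5Padding": {128, 192, 256},
    "AES/ECB/PKCS5Padding": {128, 192, 256},
    "DES/CBC/PKCS5Padding": {56},
    "DESede/CBC/PKCS5Padding": {112, 168},
    "Blowfish/CBC/PKCS5Padding": range(32, 449),
    "RC2/CBC/PKCS5Padding": range(40, 129),
}


def is_supported(cipher_algorithm: str, secret_key_strength: int) -> bool:
    """Return True if the pair appears in the table of default options."""
    allowed = VALID_STRENGTHS.get(cipher_algorithm)
    return allowed is not None and secret_key_strength in allowed


print(is_supported("AES/ECB/PKCS5Padding", 128))
print(is_supported("AES/CBC/PKCS5Padding", 512))
```

The check reflects only the out-of-the-box options listed here. A custom JVM provider might accept other combinations.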
The secret_key_provider_factory_class is:
com.datastax.bdp.cassandra.crypto.LocalFileSystemKeyProviderFactory
The secret_key_file option is the location of the keyfile. The default location is /etc/dse/conf, but it can reside in any directory.
On disk, SSTables are encrypted and compressed by block (to allow random reads). The chunk_length_kb subproperty of compression defines the size (in KB) of the block and must be a power of 2. Values larger than the default might improve the compression rate, but increase the minimum amount of data that must be read from disk when a read occurs. The default value (64) is a good middle ground for compressing tables.
When using encryption without compression, the sizes of SSTables are dramatically different. For example, during an internal test, starting with a 3.2M .db file and using these options resulted in a 236K encrypted .db file:
Altering the table to use the EncryptingDeflateCompressor and the same options as before resulted in a file size of 236K, so combining encryption and compression is probably a good idea.
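Because the block size is a power of 2, a candidate chunk_length_kb value can be checked with a single bit operation. This is a minimal Python sketch; `is_valid_chunk_length_kb` is a hypothetical name, not an existing API.

```python
def is_valid_chunk_length_kb(value: int) -> bool:
    """Return True if value is a positive power of 2 (1, 2, 4, ..., 64, ...)."""
    # A power of 2 has exactly one bit set, so clearing the lowest set bit
    # with value & (value - 1) leaves zero.
    return value > 0 and (value & (value - 1)) == 0


for candidate in (1, 64, 100, 128):
    print(candidate, is_valid_chunk_length_kb(candidate))
```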
Not all algorithms allow you to set the iv_length sub-option, and most complain if it is not set to 16 bytes. Either use 16 or accept the default.
The syntax for setting this sub-option is similar to setting a compression algorithm to compress data.
ALTER TABLE chores
WITH compression_parameters:sstable_compression = 'EncryptingSnappyCompressor'
AND compression_parameters:cipher_algorithm = 'AES/ECB/PKCS5Padding'
AND compression_parameters:secret_key_strength = 128
AND compression_parameters:iv_length = 16;
Use the nodetool upgradesstables utility to rewrite all the SSTables. Use nodetool flush to flush to disk all new data using the current settings for encryption.
After you designate the data to be encrypted, a keytab file is created in the directory set by the secret_key_file option. If the directory doesn't exist, it is created. A failure to create the directory probably indicates a permissions problem.
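One way to investigate a suspected permissions problem is to check whether the directory, or its nearest existing ancestor, is writable. The following Python sketch is a diagnostic aid under stated assumptions: the function names are hypothetical, and the result is only meaningful when the script runs as the same OS user as the DataStax Enterprise process.

```python
import os


def nearest_existing_ancestor(path: str) -> str:
    """Walk up from path until a component that exists on disk is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the file system root
            break
        current = parent
    return current


def can_hold_keytab(directory: str) -> bool:
    """Return True if the directory exists and is writable, or could be
    created because its nearest existing ancestor is a writable directory."""
    anchor = nearest_existing_ancestor(directory)
    return os.path.isdir(anchor) and os.access(anchor, os.W_OK | os.X_OK)


# /etc/dse/conf is the default keyfile location named in this section.
print(can_hold_keytab("/etc/dse/conf"))
```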
Example values in the keytab file are:
AES/ECB/PKCS5Padding:256:bxegm8vh4wE3S2hO9J36RL2gIdBLx0O46J/QmoC3W3U=
AES/CBC/PKCS5Padding:256:FUhaiy7NGB8oeSfe7cOo3hhvojVl2ijI/wbBCFH6hsE=
RC2/CBC/PKCS5Padding:128:5Iw8EW3GqE6y/6BgIc3tLw==
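The example values suggest that each entry has the form algorithm:strength:key, with the key base64-encoded. The following Python sketch parses entries on that assumption; the layout is inferred from the sample above rather than from a published specification, and `KeyEntry` and `parse_keytab` are hypothetical names.

```python
import base64
from typing import List, NamedTuple


class KeyEntry(NamedTuple):
    cipher_algorithm: str
    secret_key_strength: int
    key: bytes


def parse_keytab(text: str) -> List[KeyEntry]:
    """Parse whitespace-separated 'algorithm:strength:base64key' entries."""
    entries = []
    for token in text.split():
        # Split from the right: the cipher name contains '/' but no ':',
        # and base64 text never contains ':'.
        cipher, strength, encoded = token.rsplit(":", 2)
        entries.append(KeyEntry(cipher, int(strength), base64.b64decode(encoded)))
    return entries


SAMPLE = """
AES/ECB/PKCS5Padding:256:bxegm8vh4wE3S2hO9J36RL2gIdBLx0O46J/QmoC3W3U=
AES/CBC/PKCS5Padding:256:FUhaiy7NGB8oeSfe7cOo3hhvojVl2ijI/wbBCFH6hsE=
RC2/CBC/PKCS5Padding:128:5Iw8EW3GqE6y/6BgIc3tLw==
"""

for entry in parse_keytab(SAMPLE):
    # Print the declared strength next to the decoded key size in bits.
    print(entry.cipher_algorithm, entry.secret_key_strength, len(entry.key) * 8)
```

For the sample values, the decoded key length in bits matches the declared strength (256, 256, and 128), which is consistent with the assumed layout. Treat real keytab contents as secrets and avoid printing the key bytes.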
Deleting, moving, or changing the data in the keytab file causes errors when the node restarts, and you lose all your data. Consider storing the file on a network server or encrypting the entire file system of the nodes using a third-party tool.
The CassandraFS (Cassandra file system) is accessed as part of the Hadoop File System (HDFS) using the configured authentication. If you encrypt the CassandraFS keyspace's sblocks and inode tables, all CassandraFS data gets encrypted.
Follow the instructions in the solrj-auth-README.md file to use the SolrJ-Auth libraries to implement encryption. The SolrJ-auth-README.md file is located in the following directories:
Debian installations: /usr/share/doc/dse-libsolr*
RHEL-based installations: /usr/share/doc/dse-libsolr
Binary installations: resources/solr
These SolrJ-Auth libraries are included in the DataStax Enterprise distribution:
Debian installations: /usr/share/dse/clients
Binary installations: <install_location>/clients