Apache Cassandra 1.1 Documentation

Node and Cluster Configuration (cassandra.yaml)

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your version of Cassandra.

The cassandra.yaml file is the main configuration file for Cassandra. Its location depends on the type of installation:

  • Cassandra packaged installs: /etc/cassandra/conf
  • Cassandra binary installs: <install_location>/conf
  • DataStax Enterprise packaged installs: /etc/dse/cassandra
  • DataStax Enterprise binary installs: <install_location>/resources/cassandra/conf

After changing properties in this file, you must restart the node for the changes to take effect.

Note

** Some default values are set at the class level and may be missing or commented out in the cassandra.yaml file. Additionally, the values in commented-out options may not match the defaults: they are the values recommended when changing from the default.

Option                                Option
authenticator                         max_hint_window_in_ms
authority                             memtable_flush_queue_size
authorizer                            memtable_flush_writers
auth_replication_options              memtable_total_space_in_mb
auth_replication_strategy             partitioner
auto_bootstrap                        permissions_validity_in_ms
auto_snapshot                         phi_convict_threshold
broadcast_address                     populate_io_cache_on_flush
cluster_name                          reduce_cache_capacity_to
column_index_size_in_kb               reduce_cache_sizes_at
commitlog_directory                   request_scheduler
commitlog_segment_size_in_mb          request_scheduler_id
commitlog_sync                        request_scheduler_options
commitlog_total_space_in_mb           row_cache_keys_to_save
compaction_preheat_key_cache          row_cache_provider
compaction_throughput_mb_per_sec      row_cache_size_in_mb
concurrent_reads                      rpc_address
concurrent_writes                     rpc_keepalive
data_file_directories                 rpc_max_threads
dynamic_snitch_badness_threshold      rpc_min_threads
dynamic_snitch_reset_interval_in_ms   rpc_port
dynamic_snitch_update_interval_in_ms  rpc_recv_buff_size_in_bytes
encryption_options                    rpc_send_buff_size_in_bytes
endpoint_snitch                       rpc_server_type
flush_largest_memtables_at            rpc_timeout_in_ms
hinted_handoff_enabled                saved_caches_directory
hinted_handoff_throttle_delay_in_ms   seed_provider
in_memory_compaction_limit_in_mb      snapshot_before_compaction
incremental_backups                   ssl_storage_port
index_interval                        storage_port
initial_token                         stream_throughput_outbound_megabits_per_sec
key_cache_keys_to_save                streaming_socket_timeout_in_ms
key_cache_save_period                 thrift_framed_transport_size_in_mb
key_cache_size_in_mb                  thrift_max_message_length_in_mb
listen_address                        trickle_fsync

Node and Cluster Initialization Properties

The following properties are used to initialize a new cluster or when introducing a new node to an established cluster. They control how a node is configured within a cluster, including inter-node communication, data partitioning, and replica placement. DataStax recommends that you carefully evaluate your requirements and make any changes before starting a node for the first time.

auto_bootstrap

(Default: true) This setting has been removed from the default configuration. It makes new (non-seed) nodes automatically migrate the right data to themselves. It is referenced here because auto_bootstrap: true is explicitly added to the cassandra.yaml file in an AMI installation. Setting this property to false is not recommended and is necessary only in rare instances.

broadcast_address

(Default: listen_address**) If your Cassandra cluster is deployed across multiple Amazon EC2 regions and you use the EC2MultiRegionSnitch, set the broadcast_address to the public IP address of the node and the listen_address to the private IP address.
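
As a minimal sketch of the multi-region case described above, the fragment below pairs the two addresses on one node. Both IP addresses are placeholders (assumptions, not values from this page), and the snitch is written with the class name spelling used in the Cassandra source, Ec2MultiRegionSnitch.

```yaml
# Hypothetical node in a cluster that spans multiple EC2 regions.
endpoint_snitch: org.apache.cassandra.locator.Ec2MultiRegionSnitch
listen_address: 10.0.1.15         # private IP of this node (placeholder)
broadcast_address: 203.0.113.10   # public IP of this node (placeholder)
```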

cluster_name

(Default: Test Cluster) The name of the cluster; used to prevent machines in one logical cluster from joining another. All nodes participating in a cluster must have the same value.

commitlog_directory

(Default: /var/lib/cassandra/commitlog) The directory where the commit log is stored. For optimal write performance, DataStax recommends the commit log be on a separate disk partition (ideally, a separate physical device) from the data file directories.

data_file_directories

(Default: /var/lib/cassandra/data) The directory location where column family data (SSTables) are stored.

initial_token

(Default: N/A) The initial token sets the node's token position in the ring and assigns a range of data to the node when it first starts up. If the initial token is left unset when introducing a new node to an established cluster, Cassandra requests a token that bisects the range of the heaviest-loaded existing node. If no load information is available (for example, in a new cluster), Cassandra picks a random token, which may lead to hot spots. For information about calculating tokens that position nodes in the ring, see Generating Tokens.
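
For illustration only: with the default RandomPartitioner, whose token range runs from 0 to 2^127, evenly spaced tokens for a cluster of N nodes are commonly computed as i * (2^127 / N) for node i = 0 to N-1. The four-node, single data center cluster below is an assumption made for the example, not something this page prescribes. Each node's cassandra.yaml carries exactly one of these values.

```yaml
# Evenly spaced tokens for a hypothetical 4-node ring (RandomPartitioner):
#   node 0: 0
#   node 1: 42535295865117307932921825928971026432    (1 * 2^127 / 4)
#   node 2: 85070591730234615865843651857942052864    (2 * 2^127 / 4)
#   node 3: 127605887595351923798765477786913079296   (3 * 2^127 / 4)
# This file is assumed to belong to node 1.
initial_token: 42535295865117307932921825928971026432
```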

listen_address

(Default: localhost) The IP address or hostname that other Cassandra nodes use to connect to this node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS. Do not specify 0.0.0.0.

partitioner

(Default: org.apache.cassandra.dht.RandomPartitioner) Distributes rows (by key) across nodes in the cluster. Any IPartitioner may be used, including your own as long as it is on the classpath. Cassandra provides the following partitioners:

rpc_address

(Default: localhost) The listen address for client connections (Thrift remote procedure calls). Valid values are:

  • 0.0.0.0: Listens on all configured interfaces.
  • IP address
  • hostname
  • unset: Resolves the address using the hostname configuration of the node.

If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.

Note

In DataStax Enterprise 3.0.x, the default is 0.0.0.0.

rpc_port

(Default: 9160) The port for the Thrift RPC service, which is used for client connections.

saved_caches_directory

(Default: /var/lib/cassandra/saved_caches) The directory location where column family key and row caches are stored.

seed_provider

(Default: org.apache.cassandra.locator.SimpleSeedProvider) A list of comma-delimited hosts (IP addresses) to use as contact points when a node joins a cluster. Cassandra also uses this list to learn the topology of the ring. When running multiple nodes, you must change the - seeds list from the default value (127.0.0.1). In multiple data-center clusters, the - seeds list should include at least one node from each data center (replication group).
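
A sketch of a seed list for a cluster with two data centers, following the layout of the stock cassandra.yaml file. The IP addresses are placeholders chosen for the example, one seed per data center.

```yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Comma-delimited seed addresses (placeholders): one node
          # from each data center instead of the default 127.0.0.1.
          - seeds: "10.0.1.15,10.1.1.15"
```

Using the same seed list on every node keeps the contact points consistent across the cluster.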

storage_port

(Default: 7000) The port for inter-node communication.

endpoint_snitch

(Default: org.apache.cassandra.locator.SimpleSnitch) Sets which snitch Cassandra uses for locating nodes and routing requests. For descriptions of the snitches, see About Snitches. In DataStax Enterprise, the default snitch is com.datastax.bdp.snitch.DseDelegateSnitch.

Global Row and Key Caches Properties

When creating or modifying column families, you enable or disable the key or row caches at the column family level by setting the caching parameter. Other row and key cache tuning and configuration options are set at the global (node) level. Cassandra uses these settings to automatically distribute memory for each column family based on the overall workload and specific column family usage. You can also configure the save periods for these caches globally. For more information, see Tuning Data Caches.

key_cache_keys_to_save

(Default: disabled - all keys are saved**) Number of keys from the key cache to save.

key_cache_save_period

(Default: 14400 - 4 hours) Interval in seconds at which the key cache is saved to disk. Caches are saved to saved_caches_directory. Saved caches greatly improve cold-start speeds and have relatively little effect on I/O.

key_cache_size_in_mb

(Default: empty, which automatically sets it to the smaller of 5% of the available heap or 100MB) A global cache setting for column families. It is the maximum size of the key cache in memory. To disable, set to 0.

row_cache_keys_to_save

(Default: disabled - all keys are saved**) Number of keys from the row cache to save.

row_cache_size_in_mb

(Default: 0 - disabled) A global cache setting for column families. It is the maximum size of the row cache in memory. The row cache holds the entire row in memory so reads can be satisfied without reading from disk.

row_cache_save_period

(Default: 0 - disabled) Interval in seconds at which the row cache is saved to disk. Caches are saved to saved_caches_directory.

row_cache_provider

(Default: SerializingCacheProvider) Specifies what kind of implementation to use for the row cache. You can also specify the fully-qualified name of a class that implements org.apache.cassandra.cache.IRowCacheProvider.

  • SerializingCacheProvider: Serializes the contents of the row and stores it in native memory, that is, off the JVM heap. Serialized rows take significantly less memory than live rows in the JVM, so you can cache more rows in a given memory footprint. Storing the cache off-heap means you can use smaller heap sizes, which reduces the impact of garbage collection pauses.
  • ConcurrentLinkedHashCacheProvider: Rows are cached using the JVM heap, providing the same row cache behavior as Cassandra versions prior to 0.8.

The SerializingCacheProvider is 5 to 10 times more memory-efficient than ConcurrentLinkedHashCacheProvider for applications that are not blob-intensive. However, SerializingCacheProvider may perform worse in update-heavy workloads because it invalidates cached rows on update instead of updating them in place as ConcurrentLinkedHashCacheProvider does.
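
The global cache properties in this section might be combined as in the sketch below. The 256 MB row cache size is an illustrative value, not a recommendation from this page; the other values are the defaults listed above.

```yaml
key_cache_size_in_mb:            # left empty: smaller of 5% of heap or 100 MB
key_cache_save_period: 14400     # save the key cache every 4 hours
row_cache_size_in_mb: 256        # illustrative; the default 0 disables the row cache
row_cache_save_period: 0         # default: the row cache is not saved
row_cache_provider: SerializingCacheProvider   # off-heap row cache
```

These settings only size the caches. Whether a given column family uses the key or row cache is still controlled by its caching parameter.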

Performance Tuning Properties

The following properties are used to tune performance and system resource utilization, such as memory, disk I/O, and CPU, for reads and writes.

column_index_size_in_kb

(Default: 64) Add column indexes to a row when the data reaches this size. This value defines how much row data must be deserialized to read the column. Increase this setting if your column values are large or if you have a very large number of columns. If consistently reading only a few columns from each row or doing many partial-row reads, keep it small. All index data is read for each access, so take that into consideration when setting the index size.

commitlog_segment_size_in_mb

(Default: 32) Sets the size of the individual commitlog file segments. A commitlog segment may be archived, deleted, or recycled after all its data has been flushed to SSTables. This amount of data can potentially include commitlog segments from every column family in the system. The default size is usually suitable for most commitlog archiving, but if you want a finer granularity, 8 or 16 MB is reasonable. See Commit Log Archive Configuration.

commitlog_sync

(Default: periodic) The method that Cassandra uses to acknowledge writes. The related timing properties are specified in milliseconds:

  • periodic: Used with commitlog_sync_period_in_ms (default: 10000 - 10 seconds) to control how often the commit log is synchronized to disk. Writes are acknowledged immediately, without waiting for the sync.
  • batch: Used with commitlog_sync_batch_window_in_ms (default: disabled**) to control how long Cassandra waits for other writes before performing a sync. When using this method, writes are not acknowledged until fsynced to disk.
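
The two modes above can be sketched as alternative fragments. Only one commitlog_sync setting may be active at a time, so the batch variant is shown commented out. The 50 ms batch window is an illustrative value, not a default stated on this page.

```yaml
# Option 1 (default): acknowledge writes immediately and sync the
# commit log to disk every 10 seconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Option 2: do not acknowledge writes until the commit log is fsynced,
# waiting up to the batch window for other writes to group with.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```

Periodic mode favors write latency at the cost of a window of unsynced writes; batch mode trades latency for durability.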

commitlog_total_space_in_mb

(Default: 32 for 32-bit JVMs, 1024 for 64-bit JVMs**) Total space used for commitlogs. If the used space goes above this value, Cassandra rounds up to the next nearest segment multiple and flushes memtables to disk for the oldest commitlog segments, removing those log segments. This reduces the amount of data to replay on startup, and prevents infrequently-updated tables from indefinitely keeping commitlog segments. A small total commitlog space tends to cause more flush activity on less-active column families.

compaction_preheat_key_cache

(Default: true) When set to true, cached row keys are tracked during compaction, and re-cached to their new positions in the compacted SSTable. If you have extremely large key caches for your column families, set to false; see Global Row and Key Caches Properties.

compaction_throughput_mb_per_sec

(Default: 16) Throttles compaction to the given total throughput across the entire system. The faster you insert data, the faster you need to compact in order to keep the SSTable count down. The recommended value is 16 to 32 times the rate of write throughput (in MB/second). Setting to 0 disables compaction throttling.

concurrent_compactors

(Default: 1 per CPU core**) Sets the number of concurrent compaction processes allowed to run simultaneously on a node, not including validation compactions for anti-entropy repair. Simultaneous compactions help preserve read performance in a mixed read-write workload by mitigating the tendency of small SSTables to accumulate during a single long-running compaction. If compactions run too slowly or too fast, change compaction_throughput_mb_per_sec first.

concurrent_reads

(Default: 32) For workloads with more data than can fit in memory, the bottleneck is reads fetching data from disk. Setting to (16 * number_of_drives) allows operations to queue low enough in the stack so that the OS and drives can reorder them.

concurrent_writes

(Default: 32) Writes in Cassandra are rarely I/O bound, so the ideal number of concurrent writes depends on the number of CPU cores in your system. The recommended value is (8 * number_of_cpu_cores).
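
As a worked example of the two sizing rules above, assume a hypothetical node with 4 data drives and 8 CPU cores. The hardware figures are assumptions made for the arithmetic only.

```yaml
# Hypothetical node: 4 data drives, 8 CPU cores.
concurrent_reads: 64    # 16 * 4 drives
concurrent_writes: 64   # 8 * 8 cores
```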

flush_largest_memtables_at

(Default: 0.75) When Java heap usage (after a full concurrent mark sweep (CMS) garbage collection) exceeds the set value, Cassandra flushes the largest memtables to disk to free memory. This parameter is an emergency measure to prevent sudden out-of-memory (OOM) errors. Do not use it as a tuning mechanism. It is most effective under light to moderate loads or read-heavy workloads; it will fail under massive write loads. A value of 0.75 flushes memtables when Java heap usage is above 75% of the total heap size. Set to 1.0 to disable. Other emergency measures are reduce_cache_capacity_to and reduce_cache_sizes_at.

in_memory_compaction_limit_in_mb

(Default: 64) Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.

index_interval

(Default: 128) Controls the sampling of entries from the primary row index. The interval corresponds to the number of index entries that are skipped between taking each sample. By default Cassandra samples one row key out of every 128. The larger the interval, the smaller and less effective the sample. The larger the sample, the more effective the index, but with increased memory usage. Generally, the best trade-off between memory usage and performance is a value between 128 and 512 in combination with a large table key cache. However, if you have small rows (many to an OS page), you may want to increase the interval, which often lowers memory usage without an impact on performance. For large rows, decreasing the interval may improve read performance.

memtable_flush_queue_size

(Default: 4) The number of full memtables to allow pending flush, that is, waiting for a writer thread. At a minimum, set to the maximum number of secondary indexes created on a single column family.

memtable_flush_writers

(Default: 1 per data directory**) Sets the number of memtable flush writer threads. These threads are blocked by disk I/O, and each one holds a memtable in memory while blocked. If you have a large Java heap size and many data directories, you can increase the value for better flush performance.

memtable_total_space_in_mb

(Default: 1/3 of the heap**) Specifies the total memory used for all column family memtables on a node. This replaces the per-column family storage settings memtable_operations_in_millions and memtable_throughput_in_mb.

populate_io_cache_on_flush

(Default: false**) Populates the page cache on memtable flush and compaction. Enable this setting only when the whole node's data fits in memory.

reduce_cache_capacity_to

(Default: 0.6) Sets the size percentage to which maximum cache capacity is reduced when Java heap usage reaches the threshold defined by reduce_cache_sizes_at. Together with flush_largest_memtables_at, these properties are an emergency measure for preventing sudden out-of-memory (OOM) errors.

reduce_cache_sizes_at

(Default: 0.85) When Java heap usage (after a full concurrent mark sweep (CMS) garbage collection) exceeds this percentage, Cassandra reduces the cache capacity to the fraction of the current size specified by reduce_cache_capacity_to. To disable, set to 1.0.
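
The three emergency thresholds work together. The sketch below restates their defaults and, in the comments, works through the arithmetic for a hypothetical 8 GB heap (the heap size is an assumption for the example).

```yaml
# Hypothetical 8 GB Java heap, usage measured after a full CMS collection.
flush_largest_memtables_at: 0.75   # above 6 GB (0.75 * 8): flush the largest memtables
reduce_cache_sizes_at: 0.85        # above 6.8 GB (0.85 * 8): shrink the caches ...
reduce_cache_capacity_to: 0.6      # ... to 60% of their capacity
```

Setting flush_largest_memtables_at or reduce_cache_sizes_at to 1.0 disables the corresponding measure.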

stream_throughput_outbound_megabits_per_sec

(Default: 400**) Throttles all outbound streaming file transfers on a node to the specified throughput. Cassandra does mostly sequential I/O when streaming data during bootstrap or repair, which can lead to saturating the network connection and degrading client (RPC) performance.

trickle_fsync

(Default: false) When doing sequential writing, enabling this option tells fsync to force the operating system to flush the dirty buffers at a set interval (trickle_fsync_interval_in_kb [default: 10240]). Enable this parameter to avoid sudden dirty buffer flushing from impacting read latencies. Recommended to use on SSDs, but not on HDDs.

Remote Procedure Call (RPC) Tuning Properties

The following properties are used to configure and tune remote procedure calls (client connections).

request_scheduler

(Default: org.apache.cassandra.scheduler.NoScheduler) Defines a scheduler to handle incoming client requests according to a defined policy. This scheduler is useful for throttling client requests in single clusters containing multiple keyspaces. Valid values are:

  • org.apache.cassandra.scheduler.NoScheduler: No scheduling takes place. This scheduler has no options.
  • org.apache.cassandra.scheduler.RoundRobinScheduler: See request_scheduler_options properties.
  • A Java class that implements the RequestScheduler interface.

request_scheduler_id

(Default: keyspace) An identifier on which to perform request scheduling. Currently the only valid value is keyspace.

request_scheduler_options

(Default: disabled**) Contains a list of properties that define configuration options for request_scheduler:

  • throttle_limit: (Default: 80) The number of active requests per client. Requests beyond this limit are queued up until running requests complete. Recommended value is ((concurrent_reads + concurrent_writes) * 2).
  • default_weight: (Default: 1**) How many requests are handled during each turn of the RoundRobin.
  • weights: (Default: 1 or default_weight) How many requests are handled during each turn of the RoundRobin, based on the request_scheduler_id. Takes a list of keyspaces: weights.
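
A sketch of a round-robin scheduler configuration using the options above. The keyspace names and weights are hypothetical. The throttle_limit of 128 applies the recommended formula to the default concurrency settings: (32 concurrent_reads + 32 concurrent_writes) * 2.

```yaml
request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
request_scheduler_id: keyspace
request_scheduler_options:
    throttle_limit: 128      # (32 + 32) * 2
    default_weight: 1        # keyspaces not listed under weights
    weights:
        Keyspace1: 1         # hypothetical keyspace
        Keyspace2: 5         # hypothetical keyspace, 5 requests per turn
```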

rpc_keepalive

(Default: true) Enable or disable keepalive on client connections.

rpc_max_threads

(Default: unlimited**) Cassandra uses one thread per client for remote procedure calls. For a large number of client connections, this can cause excessive memory usage for the thread stacks. Connection pooling on the client side is highly recommended. Setting a maximum thread pool size acts as a safeguard against misbehaving clients. If the maximum is reached, Cassandra blocks additional connections until a client disconnects.

rpc_min_threads

(Default: 16**) Sets the minimum thread pool size for remote procedure calls.

rpc_recv_buff_size_in_bytes

(Default: N/A**) Sets the receiving socket buffer size for remote procedure calls.

rpc_send_buff_size_in_bytes

(Default: N/A**) Sets the sending socket buffer size for remote procedure calls.

rpc_timeout_in_ms

(Default: 10000) The time in milliseconds that a node waits for a reply from other nodes before the command fails.

streaming_socket_timeout_in_ms

(Default: 0 - never timeout streams**) Enable or disable socket timeout for streaming operations. When a timeout occurs during streaming, streaming is retried from the start of the current file. Avoid setting this value too low, as it can result in a significant amount of data re-streaming.

rpc_server_type

(Default: sync) Cassandra provides three options for the RPC server. On Windows, sync is about 30% slower than hsha. On Linux, sync and hsha performance is about the same, but hsha uses less memory.

  • sync: (Default) One connection per thread in the RPC pool. For a very large number of clients, memory is the limiting factor. On a 64-bit JVM, 128KB is the minimum stack size per thread. Connection pooling is strongly recommended.
  • hsha: Half synchronous, half asynchronous. The RPC thread pool is used to manage requests, but the threads are multiplexed across the different clients.
  • async: (Deprecated) Nonblocking server implementation with one thread to serve RPC connections. It is not recommended for high throughput use cases. It is about 50% slower than sync or hsha. It will be removed in the 1.2 release.
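
A sketch of an RPC server configuration for a node that serves many client connections. The choice of hsha and the thread cap of 2048 are illustrative assumptions, not defaults from this page.

```yaml
rpc_server_type: hsha    # threads multiplexed across clients
rpc_min_threads: 16      # default minimum pool size
rpc_max_threads: 2048    # illustrative cap to guard against misbehaving clients
```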

thrift_framed_transport_size_in_mb

(Default: 15) Frame size (maximum field length) for Thrift.

thrift_max_message_length_in_mb

(Default: 16) The maximum length of a Thrift message in megabytes, including all fields and internal Thrift overhead.

Inter-node Communication and Fault Detection Properties

dynamic_snitch_badness_threshold

(Default: 0.0) Sets the performance threshold for dynamically routing requests away from a poorly performing node. A value of 0.2 means Cassandra continues to prefer the static snitch values until the node response time is 20% worse than the best performing node. Until the threshold is reached, incoming client requests are statically routed to the closest replica (as determined by the snitch). Having requests consistently routed to a given replica can help keep a working set of data hot when the read repair chance is less than 1.

dynamic_snitch_reset_interval_in_ms

(Default: 600000) Time interval in milliseconds to reset all node scores, which allows a bad node to recover.

dynamic_snitch_update_interval_in_ms

(Default: 100) The time interval in milliseconds for calculating read latency.

hinted_handoff_enabled

(Default: true) Enables or disables hinted handoff. A hint indicates that the write needs to be replayed to an unavailable node. Where Cassandra writes the hint depends on the version:

  • Prior to 1.0: Writes to a live replica node.
  • 1.0 and later: Writes to the coordinator node.

max_hint_window_in_ms

(Default: 3600000 - 1 hour) Defines how long in milliseconds to generate and save hints for an unresponsive node. After this interval, new hints are no longer generated until the node is back up and responsive. If the node goes down again, a new interval begins. This setting can prevent a sudden demand for resources when a node is brought back online and the rest of the cluster attempts to replay a large volume of hinted writes.

hinted_handoff_throttle_delay_in_ms

(Default: 1) When a node detects that a node for which it is holding hints has recovered, it begins sending the hints to that node. This setting specifies the sleep interval in milliseconds after delivering each hint.
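
The hinted handoff properties above might be set together as in this sketch. The three-hour hint window is an illustrative value (3 * 3600000 ms), chosen only to show the unit conversion; the other values are the defaults.

```yaml
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000          # 3 hours (illustrative; default is 3600000)
hinted_handoff_throttle_delay_in_ms: 1   # sleep 1 ms after delivering each hint
```

A longer window means more hints to store and replay when the node returns.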

phi_convict_threshold

(Default: 8**) Adjusts the sensitivity of the failure detector on an exponential scale. Lower values increase the likelihood that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures will cause a node to be marked as down. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps prevent false failure detections. Values higher than 12 or lower than 5 are not recommended.

Automatic Backup Properties

auto_snapshot

(Default: true) Enables or disables taking a snapshot of the data before keyspace truncation or dropping of column families. To prevent data loss, using the default setting is strongly advised. If you set this property to false, you will lose data on truncation or drop.

incremental_backups

(Default: false) Backs up data updated since the last snapshot was taken. When enabled, Cassandra creates a hard link to each SSTable flushed or streamed locally in a backups/ subdirectory of the keyspace data. Removing these links is the operator's responsibility.

snapshot_before_compaction

(Default: false) Enable or disable taking a snapshot before each compaction. This option is useful to back up data when there is a data format change. Be careful using this option because Cassandra does not clean up older snapshots automatically.

Security Properties

authenticator

(Default: org.apache.cassandra.auth.AllowAllAuthenticator) The authentication backend. It implements IAuthenticator, which is used to identify users.

The following authenticator options are only available in DataStax Enterprise 3.0:

  • com.datastax.bdp.cassandra.auth.PasswordAuthenticator
  • com.datastax.bdp.cassandra.auth.KerberosAuthenticator

authority

For backwards compatibility only.

authorizer

(Default: org.apache.cassandra.auth.AllowAllAuthorizer) The authorization backend. It implements IAuthorizer, which limits access and provides permissions. (Available only in DataStax Enterprise 3.0.x.)

permissions_validity_in_ms

(Default: 2000) How long, in milliseconds, permissions in the cache remain valid. Depending on the authorizer, fetching permissions can be resource intensive. This setting is automatically disabled when AllowAllAuthorizer is set. (Available only in DataStax Enterprise 3.0.x.)

auth_replication_strategy

(Default: org.apache.cassandra.locator.SimpleStrategy) The replication strategy for the auth keyspace. (Available only in DataStax Enterprise 3.0.x; see Configuring dse_auth keyspace replication.)

auth_replication_options

Replication options for the authorization and authentication (dse_auth) keyspace. (Available only in DataStax Enterprise 3.0.x; see Configuring dse_auth keyspace replication.)

For SimpleStrategy, set replication_factor (Default: 1):

auth_replication_options:
   replication_factor: 3

For NetworkTopologyStrategy, use the same options for creating a keyspace. For example:

auth_replication_options:
   DC1: 3
   DC2: 3

encryption_options

Enable or disable inter-node encryption. The available options are:

  • internode_encryption: (Default: none) Enable or disable encryption of inter-node communication using the TLS_RSA_WITH_AES_128_CBC_SHA cipher suite for authentication, key exchange, and encryption of data transfers.
  • keystore: (Default: conf/.keystore) The location of a Java keystore (JKS) suitable for use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.
  • keystore_password: (Default: cassandra) Password for the keystore.
  • truststore: (Default: conf/.truststore) The location of the truststore containing the trusted certificate for authenticating remote servers.
  • truststore_password: (Default: cassandra) Password for the truststore.

The passwords used in these options must match the passwords used when generating the keystore and truststore. For instructions on generating these files, see: Creating a Keystore to Use with JSSE. The advanced settings are:

  • protocol: (Default: TLS)
  • algorithm: (Default: SunX509)
  • store_type: (Default: JKS)
  • cipher_suites: (Default: TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
  • require_client_auth: (Default: false) Enables peer certificate authentication.
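
A sketch of an encryption_options block that turns on inter-node encryption. The value all for internode_encryption is an assumption taken from the comments in the stock cassandra.yaml file, since this page lists only the default (none). The paths and passwords shown are the defaults listed above and should be replaced with your own.

```yaml
encryption_options:
    internode_encryption: all            # assumed value; the default is none
    keystore: conf/.keystore
    keystore_password: cassandra         # must match the password used to create the keystore
    truststore: conf/.truststore
    truststore_password: cassandra       # must match the password used to create the truststore
    # Advanced settings, shown with their defaults:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]
    # require_client_auth: false
```

With encryption enabled, nodes communicate over ssl_storage_port instead of storage_port.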

ssl_storage_port

(Default: 7001) The SSL port for encrypted communication. Unused if encryption is disabled.