Apache Cassandra 0.8 Documentation

Node and Cluster Configuration (cassandra.yaml)

The cassandra.yaml file is the main configuration file for Cassandra. This file is located in /etc/cassandra/conf/cassandra.yaml in packaged installations or $CASSANDRA_HOME/conf/cassandra.yaml in binary installations. After changing properties in this file, you must restart the node for the changes to take effect.

Option                                Default Value
------                                -------------
authenticator                         org.apache.cassandra.auth.AllowAllAuthenticator
authority                             org.apache.cassandra.auth.AllowAllAuthority
auto_bootstrap                        false
cluster_name                          Test Cluster
column_index_size_in_kb               64
commitlog_directory                   /var/lib/cassandra/commitlog
commitlog_rotation_threshold_in_mb    128
commitlog_sync                        periodic
commitlog_sync_period_in_ms           10000 (ten seconds)
compaction_preheat_key_cache          true
compaction_thread_priority            1 (lowest)
compaction_throughput_mb_per_sec      16
concurrent_compactors                 One per CPU core
concurrent_reads                      32
concurrent_writes                     32
data_file_directories                 /var/lib/cassandra/data
dynamic_snitch                        true
dynamic_snitch_badness_threshold      0.0
dynamic_snitch_reset_interval_in_ms   600000
dynamic_snitch_update_interval_in_ms  100
endpoint_snitch                       org.apache.cassandra.locator.SimpleSnitch
flush_largest_memtables_at            0.75
hinted_handoff_enabled                true
hinted_handoff_throttle_delay_in_ms   50
in_memory_compaction_limit_in_mb      64
incremental_backups                   false
index_interval                        128
initial_token                         n/a
internode_encryption                  none
keystore                              conf/.keystore
keystore_password                     cassandra
listen_address                        localhost
max_hint_window_in_ms                 3600000 (one hour)
memtable_flush_queue_size             4
memtable_flush_writers                One per data directory
memtable_total_space_in_mb            1/3 of the heap
partitioner                           org.apache.cassandra.dht.RandomPartitioner
phi_convict_threshold                 8
reduce_cache_capacity_to              0.6
reduce_cache_sizes_at                 0.85
request_scheduler                     org.apache.cassandra.scheduler.NoScheduler
request_scheduler_id                  keyspace
rpc_address                           localhost
rpc_keepalive                         true
rpc_max_threads                       Unlimited
rpc_min_threads                       16
rpc_port                              9160
rpc_recv_buff_size_in_bytes           n/a
rpc_send_buff_size_in_bytes           n/a
rpc_timeout_in_ms                     10000
saved_caches_directory                /var/lib/cassandra/saved_caches
seeds                                 127.0.0.1
seed_provider                         org.apache.cassandra.locator.SimpleSeedProvider
sliced_buffer_size_in_kb              64
snapshot_before_compaction            false
storage_port                          7000
thrift_framed_transport_size_in_mb    15
thrift_max_message_length_in_mb       16
truststore                            conf/.truststore
truststore_password                   cassandra

Node and Cluster Initialization Properties

The following properties are used to initialize a new cluster or to introduce a new node to an established cluster. Evaluate and change them as needed before starting a node for the first time. These properties control how a node is configured within a cluster with regard to inter-node communication, data partitioning, and replica placement.

auto_bootstrap

When set to true, a new node that joins an established cluster is automatically populated with the range of data it owns, based on the setting of initial_token. If initial_token is not set, the newly added node inserts itself into the ring by splitting the token range of the most heavily loaded node. Leave set to false when initializing a brand new cluster.

cluster_name

The name of the cluster. All nodes participating in a cluster must have the same value.

commitlog_directory

The directory where the commit log will be stored. For optimal write performance, DataStax recommends the commit log be on a separate disk partition (ideally a separate physical device) from the data file directories.

data_file_directories

The directory location where column family data (SSTables) will be stored.

initial_token

The initial token assigns the node its position in the ring, and therefore the range of data it owns, when it first starts up. The initial token can be left unset when introducing a new node to an established cluster using auto_bootstrap. Otherwise, the token value depends on the partitioner you are using. With the random partitioner, this value is an integer between 0 and 2**127. With the byte ordered partitioner, this value is a hex-encoded byte array based on your actual row key values. With the order preserving and collating order preserving partitioners, this value is a UTF-8 string based on your actual row key values. See Calculating Tokens for more information.
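With RandomPartitioner, a common way to balance a new cluster is to space tokens evenly across the 0 to 2**127 range, so that node i of N gets the token i * 2**127 / N. The Python sketch below computes such values. The function name is illustrative only and is not part of Cassandra.

```python
from typing import List


def random_partitioner_tokens(node_count: int) -> List[int]:
    """Return evenly spaced initial_token values for RandomPartitioner.

    Node i of node_count gets i * 2**127 // node_count, so each node owns
    an equal share of the 0..2**127 token range.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * (2 ** 127) // node_count for i in range(node_count)]


# One initial_token value per node for a hypothetical 4-node cluster.
for i, token in enumerate(random_partitioner_tokens(4)):
    print(f"node {i}: initial_token: {token}")
```

Set each printed value as initial_token in the corresponding node's cassandra.yaml before that node starts for the first time.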

listen_address

The IP address or hostname that other Cassandra nodes will use to connect to this node. If left blank, you must have hostname resolution correctly configured on all nodes in your cluster so that the hostname resolves to the correct IP address for this node (using /etc/hostname, /etc/hosts or DNS).

partitioner

Sets the partitioning method used when assigning a row key to a particular node (also see initial_token). Allowed values are:
  • org.apache.cassandra.dht.RandomPartitioner (default)
  • org.apache.cassandra.dht.ByteOrderedPartitioner
  • org.apache.cassandra.dht.OrderPreservingPartitioner (deprecated)
  • org.apache.cassandra.dht.CollatingOrderPreservingPartitioner (deprecated)

rpc_address

The listen address for remote procedure calls (client connections). To listen on all configured interfaces, set to 0.0.0.0. If left blank, you must have hostname resolution correctly configured on all nodes in your cluster so that the hostname resolves to the correct IP address for this node (using /etc/hostname, /etc/hosts or DNS).

Default Value: localhost

Allowed Values: An IP address, a hostname, or leave unset to resolve the address using the hostname configuration of the node.

rpc_port

The port for remote procedure calls (client connections) and the Thrift service. Default is 9160.

saved_caches_directory

The directory location where column family key and row caches will be stored.

seed_provider

The seed provider is a pluggable interface for providing a list of seed nodes. The default seed provider requires a comma-delimited list of seeds.

seeds

When a node joins a cluster, it contacts the seed node(s) to determine the ring topology and obtain gossip information about the other nodes in the cluster. Every node in the cluster should have the same list of seeds, specified as a comma-delimited list of IP addresses. In multi data center clusters, the seed list should include at least one node from each data center (replication group).
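In a multi data center cluster, it is easy to leave one data center out of the seed list by accident. The Python sketch below parses a comma-delimited seed list and reports any data center that has no seed. The node-to-data-center mapping is a hypothetical input that you would build from your own inventory. It is not something Cassandra provides in this form.

```python
from typing import Dict, Set


def data_centers_without_seed(seeds: str, node_dc: Dict[str, str]) -> Set[str]:
    """Return the data centers that have no node in a comma-delimited seed list.

    node_dc is a hypothetical map of node IP address -> data center name.
    """
    seed_ips = {s.strip() for s in seeds.split(",") if s.strip()}
    seeded_dcs = {node_dc[ip] for ip in seed_ips if ip in node_dc}
    return set(node_dc.values()) - seeded_dcs


nodes = {"10.0.0.1": "DC1", "10.0.0.2": "DC1", "10.1.0.1": "DC2"}
print(data_centers_without_seed("10.0.0.1, 10.0.0.2", nodes))  # {'DC2'}
print(data_centers_without_seed("10.0.0.1,10.1.0.1", nodes))   # set()
```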

storage_port

The port for inter-node communication. Default port is 7000.

endpoint_snitch

Sets the snitch to use for locating nodes and routing requests. In deployments with rack-aware replica placement strategies, use RackInferringSnitch, PropertyFileSnitch, or Ec2Snitch (if on Amazon EC2). The choice of snitch depends on the replica placement strategy (placement_strategy), which is defined per keyspace. The PropertyFileSnitch also requires a cassandra-topology.properties configuration file. Snitches included with Cassandra are:
  • org.apache.cassandra.locator.SimpleSnitch
  • org.apache.cassandra.locator.RackInferringSnitch
  • org.apache.cassandra.locator.PropertyFileSnitch
  • org.apache.cassandra.locator.Ec2Snitch
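RackInferringSnitch derives a node's location from its IPv4 address. The second octet is treated as the data center and the third octet as the rack. The Python sketch below mirrors that rule. It illustrates the addressing convention only and is not Cassandra's Java implementation.

```python
import ipaddress
from typing import Tuple


def infer_dc_and_rack(ip: str) -> Tuple[str, str]:
    """Mimic RackInferringSnitch for an IPv4 address: (data center, rack)."""
    octets = ipaddress.IPv4Address(ip).packed  # the address as 4 raw bytes
    return str(octets[1]), str(octets[2])


print(infer_dc_and_rack("10.20.30.40"))  # ('20', '30')
print(infer_dc_and_rack("10.20.31.5"))   # ('20', '31')
```

Under this convention, the two example nodes share data center 20 but sit in different racks. With RackInferringSnitch, your IP addressing plan therefore effectively defines your topology.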

Performance Tuning Properties

The following properties are used to tune performance and system resource utilization (memory, disk I/O, CPU, etc.) for reads and writes.

column_index_size_in_kb

Column indexes are added to a row after the data reaches this size. This usually happens if there are a large number of columns in a row or the column values themselves are large. If you consistently read only a few columns from each row, this should be kept small as it denotes how much of the row data must be deserialized to read the column.
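To see the trade-off, assume as a simplification that a row gets roughly one column index entry per column_index_size_in_kb of serialized data. Under that assumption, reading one column deserializes about one such block. The Python sketch below is a back-of-the-envelope model under that assumption, not a description of the on-disk format.

```python
import math
from typing import Dict


def estimate_column_index(row_size_bytes: int, column_index_size_in_kb: int = 64) -> Dict[str, int]:
    """Rough model: about one column index entry per column_index_size_in_kb of row data."""
    block_bytes = column_index_size_in_kb * 1024
    if row_size_bytes < block_bytes:
        # Below the threshold there is no column index, so a read scans the whole row.
        return {"index_entries": 0, "bytes_read_per_lookup": row_size_bytes}
    return {
        "index_entries": math.ceil(row_size_bytes / block_bytes),
        "bytes_read_per_lookup": block_bytes,
    }


print(estimate_column_index(10 * 1024 * 1024))      # 10 MB row with the default 64 KB
print(estimate_column_index(10 * 1024 * 1024, 16))  # smaller blocks: more entries, less data per read
```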

commitlog_rotation_threshold_in_mb

The size in MB to which the commit log will grow before creating a new commit log segment.

commitlog_sync

The method that Cassandra uses to acknowledge writes. In the default periodic mode, writes are acknowledged immediately, and the commit log is synced to disk every commitlog_sync_period_in_ms milliseconds. In batch mode, writes are not acknowledged until they have been fsynced to disk. Cassandra waits up to the configured batch window (commitlog_sync_batch_window_in_ms) for other writes so that they can share a single sync. Allowed values are periodic (default) or batch.

commitlog_sync_period_in_ms

Determines how often (in milliseconds) the commit log is synced to disk when commitlog_sync is set to periodic.
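Because periodic mode acknowledges writes before they are fsynced, a node that loses power can lose up to roughly one sync period of acknowledged writes that other replicas have not also stored. The Python sketch below estimates that single-node exposure. The write rate is a hypothetical input.

```python
def periodic_exposure(writes_per_sec: float, commitlog_sync_period_in_ms: int = 10_000) -> float:
    """Approximate number of acknowledged writes that may not yet be fsynced
    on one node in periodic mode (the worst case just before the next sync)."""
    return writes_per_sec * commitlog_sync_period_in_ms / 1000.0


print(periodic_exposure(5000))        # 50000.0 with the default 10-second period
print(periodic_exposure(5000, 1000))  # 5000.0 with a 1-second period
```

Shortening the period reduces this exposure at the cost of more frequent syncs. Batch mode removes it entirely but adds sync latency to every write.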

compaction_preheat_key_cache

When set to true, cached row keys are tracked during compaction, and re-cached to their new positions in the compacted SSTable. If you have extremely large key caches for your column families, set to false (see the keys_cached attribute set on a column family).

compaction_thread_priority

Sets the priority for compaction threads. The thread priority determines how the JVM favors compaction threads relative to other threads. The default of 1 is the lowest priority.

compaction_throughput_mb_per_sec

Throttles compaction to the given total throughput (in MB/s) across the entire node. The faster you insert data, the faster you need to compact in order to keep the SSTable count down. The recommended value is 16 to 32 times the rate of write throughput (in MB/s). Setting to 0 disables compaction throttling.
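The recommendation above is simple arithmetic. The Python sketch below turns a measured write rate into the suggested range. The write rate is whatever you observe on your own nodes.

```python
from typing import Tuple


def compaction_throughput_range(write_mb_per_sec: float) -> Tuple[float, float]:
    """Recommended compaction_throughput_mb_per_sec range: 16x to 32x write throughput."""
    return 16 * write_mb_per_sec, 32 * write_mb_per_sec


print(compaction_throughput_range(1))    # (16, 32)
print(compaction_throughput_range(0.5))  # (8.0, 16.0)
```

The default of 16 corresponds to the low end of the range for about 1 MB/s of writes.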

concurrent_compactors

Sets the number of concurrent compaction processes allowed to run simultaneously on a node. Defaults to one compaction process per CPU core.

concurrent_reads

For workloads with more data than can fit in memory, the bottleneck will be reads that need to fetch data from disk. Setting to (16 * number_of_drives) allows operations to queue low enough in the stack so that the OS and drives can reorder them.

concurrent_writes

Writes in Cassandra are almost never I/O bound, so the ideal number of concurrent writes depends on the number of CPU cores in your system. The recommended value is (8 * number_of_cpu_cores).
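The two rules of thumb for concurrent_reads and concurrent_writes can be computed together. In the Python sketch below, the drive count is a hypothetical input. The core count falls back to os.cpu_count() when not given.

```python
import os
from typing import Dict, Optional


def recommended_concurrency(number_of_drives: int,
                            number_of_cpu_cores: Optional[int] = None) -> Dict[str, int]:
    """Apply the rules of thumb: 16 reads per data drive, 8 writes per CPU core."""
    cores = number_of_cpu_cores or os.cpu_count() or 1
    return {
        "concurrent_reads": 16 * number_of_drives,
        "concurrent_writes": 8 * cores,
    }


print(recommended_concurrency(number_of_drives=2, number_of_cpu_cores=4))
# {'concurrent_reads': 32, 'concurrent_writes': 32}
```

For a node with two data drives and four cores, both rules give the default value of 32.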

flush_largest_memtables_at

When Java heap usage after a full concurrent mark sweep (CMS) garbage collection is higher than this fraction, the largest memtables are flushed to disk to free memory. This parameter is an emergency measure for preventing sudden out-of-memory (OOM) errors rather than a strategic tuning mechanism. It is most effective under light to moderate load or read-heavy workloads. The default value of 0.75 means memtables are flushed when Java heap usage exceeds 75 percent of the total heap size. Setting it to 1.0 disables this feature.

in_memory_compaction_limit_in_mb

Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.

index_interval

Each SSTable has an index file containing row keys and the position at which each row starts in the data file. At startup, Cassandra reads a sample of that index into memory. By default, 1 row key out of every 128 is sampled. To find a row, Cassandra performs a binary search on the sample, then does a single disk read of the index block corresponding to the closest sampled entry. The more keys that are sampled, the more effective the index is, at the cost of memory. A smaller value for this property samples more keys and results in a larger, more effective index. Generally, a value between 128 and 512 in combination with a large column family key cache offers the best trade-off between memory usage and performance. If you have small rows, you may want to increase this value, which reduces the in-memory index size and memory usage. For large rows, decreasing this value may improve read performance.
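The lookup described above is a sampled index followed by a binary search. The Python sketch below models it with in-memory lists. Sorted string keys stand in for partitioner order, list offsets stand in for data file positions, and scanning the slice between two sampled entries stands in for the single index-block read. All names are illustrative.

```python
import bisect
from typing import List, Optional, Tuple


def build_index_sample(index_keys: List[str], index_interval: int = 128) -> List[Tuple[str, int]]:
    """Keep every index_interval-th (key, position) entry of a sorted index."""
    return [(index_keys[i], i) for i in range(0, len(index_keys), index_interval)]


def find_row(key: str, index_keys: List[str], sample: List[Tuple[str, int]]) -> Optional[int]:
    """Binary-search the in-memory sample, then scan one block of the full index."""
    sample_keys = [k for k, _ in sample]
    i = bisect.bisect_right(sample_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first sampled key, so it is not present
    start = sample[i][1]
    end = sample[i + 1][1] if i + 1 < len(sample) else len(index_keys)
    # This slice stands in for the single disk read of one index block.
    for pos in range(start, end):
        if index_keys[pos] == key:
            return pos
    return None


keys = [f"key{n:05d}" for n in range(1000)]
sample = build_index_sample(keys, index_interval=128)
print(len(sample))                         # 8 sampled entries held in memory
print(find_row("key00500", keys, sample))  # 500
```

Halving index_interval doubles the number of sampled entries held in memory and halves the length of the block scanned per lookup.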

memtable_flush_queue_size

The number of full memtables to allow pending flush, that is, waiting for a writer thread. At a minimum, this should be set to the maximum number of secondary indexes created on a single column family.

memtable_flush_writers

Sets the number of memtable flush writer threads. These will be blocked by disk I/O, and each one will hold a memtable in memory while blocked. If you have a large Java heap size and many data directories (see data_file_directories), you can increase this value for better flush performance. By default this is set to the number of data directories defined (which is 1).

memtable_total_space_in_mb

Specifies the total memory used by all memtables on the node. During normal operation this complements the related column family limits on operations, throughput and SSTables. If this value is set to 0, only the column family specific limits are enforced. See also memtable_flush_after_mins, memtable_throughput_in_mb, memtable_operations_in_millions (which are set per column family).

reduce_cache_capacity_to

Sets the size percentage to which maximum cache capacity is reduced when Java heap usage reaches the threshold defined by reduce_cache_sizes_at. Together with flush_largest_memtables_at, these properties are an emergency measure for preventing sudden out-of-memory (OOM) errors.

reduce_cache_sizes_at

When Java heap usage after a full concurrent mark sweep (CMS) garbage collection is higher than this percentage, Cassandra will reduce the cache capacity to the fraction of the current size as specified by reduce_cache_capacity_to. The default is 85 percent (0.85). 1.0 disables this feature.
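The two emergency valves (flush_largest_memtables_at and reduce_cache_sizes_at) act at different heap thresholds. The Python sketch below is a hypothetical model of the documented behavior, not Cassandra's code. It shows which actions fire for a given post-CMS heap usage fraction and what the cache capacity becomes.

```python
from typing import List, Tuple


def emergency_actions(heap_used_after_cms: float,
                      cache_size: int,
                      flush_largest_memtables_at: float = 0.75,
                      reduce_cache_sizes_at: float = 0.85,
                      reduce_cache_capacity_to: float = 0.6) -> Tuple[List[str], int]:
    """Model of the documented emergency valves; returns (actions, new cache capacity)."""
    actions: List[str] = []
    new_capacity = cache_size
    # heap_used_after_cms is a fraction (0.0-1.0), so a threshold of 1.0 never triggers.
    if heap_used_after_cms > flush_largest_memtables_at:
        actions.append("flush largest memtables")
    if heap_used_after_cms > reduce_cache_sizes_at:
        new_capacity = round(cache_size * reduce_cache_capacity_to)
        actions.append("reduce cache capacity")
    return actions, new_capacity


print(emergency_actions(0.80, 200_000))  # (['flush largest memtables'], 200000)
print(emergency_actions(0.90, 200_000))  # (['flush largest memtables', 'reduce cache capacity'], 120000)
```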

sliced_buffer_size_in_kb

The buffer size (in kilobytes) to use for reading contiguous columns. This should match the size of the columns typically retrieved using query operations involving a slice predicate.

Remote Procedure Call Tuning Properties

The following properties are used to configure and tune remote procedure calls (client connections).

request_scheduler

Defines a scheduler to handle incoming client requests according to a defined policy. This scheduler only applies to client requests, not inter-node communication. Useful for throttling client requests in implementations that have multiple keyspaces. Allowed Values are:
  • org.apache.cassandra.scheduler.NoScheduler (default)
  • org.apache.cassandra.scheduler.RoundRobinScheduler
  • A Java class that implements the IRequestScheduler interface

If using the RoundRobinScheduler, there are additional request_scheduler_options properties.

request_scheduler_id

An identifier on which to perform request scheduling. Currently the only valid option is keyspace.

request_scheduler_options

Contains a list of additional properties that define configuration options for request_scheduler. NoScheduler does not have any options. RoundRobinScheduler has the following additional configuration properties: throttle_limit, default_weight, weights.

throttle_limit

The number of active requests per client. Requests beyond this limit are queued up until running requests complete. The default is 80. Recommended value is ((concurrent_reads + concurrent_writes) * 2).

default_weight

The default weight controls how many requests are handled during each turn of the RoundRobin. The default is 1.

weights

Allows control of weight per keyspace during each turn of the RoundRobin. If not set, each keyspace uses the default_weight. Takes a list of keyspace: weight pairs.
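The interaction of default_weight and weights can be modeled as a weighted round robin over per-keyspace request queues. The Python sketch below is a single-threaded model under that assumption. It ignores throttle_limit blocking and is not the RoundRobinScheduler source. The keyspace names and requests are hypothetical. It also computes the throttle_limit rule of thumb.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Optional, Tuple


def recommended_throttle_limit(concurrent_reads: int = 32, concurrent_writes: int = 32) -> int:
    """Rule of thumb for throttle_limit: (concurrent_reads + concurrent_writes) * 2."""
    return (concurrent_reads + concurrent_writes) * 2


def round_robin(queues: Dict[str, Deque[str]],
                weights: Optional[Dict[str, int]] = None,
                default_weight: int = 1) -> Iterator[Tuple[str, str]]:
    """Weighted round robin: on each turn a keyspace releases up to its weight in requests."""
    weights = weights or {}
    while any(queues.values()):
        for keyspace, queue in queues.items():
            for _ in range(weights.get(keyspace, default_weight)):
                if not queue:
                    break
                yield keyspace, queue.popleft()


pending = {"Keyspace1": deque(["a1", "a2", "a3"]), "Keyspace2": deque(["b1", "b2"])}
print([req for _, req in round_robin(pending, weights={"Keyspace1": 2})])
# ['a1', 'a2', 'b1', 'a3', 'b2']
print(recommended_throttle_limit())  # 128
```

Giving Keyspace1 a weight of 2 lets it release two requests per turn, while Keyspace2 falls back to the default weight of 1.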

rpc_keepalive

Enables or disables keepalive on client connections.

rpc_max_threads

Cassandra uses one thread per client for remote procedure calls. For a large number of client connections, this can cause excessive memory usage for the thread stacks. Connection pooling on the client side is highly recommended. Setting a maximum thread pool size acts as a safeguard against misbehaving clients. If the maximum is reached, Cassandra will block additional connections until a client disconnects.

rpc_min_threads

Sets the minimum thread pool size for remote procedure calls.

rpc_recv_buff_size_in_bytes

Sets the receiving socket buffer size in bytes for remote procedure calls.

rpc_send_buff_size_in_bytes

Sets the sending socket buffer size in bytes for remote procedure calls.

rpc_timeout_in_ms

The time in milliseconds that a node waits for a reply from other nodes before failing the command.

thrift_framed_transport_size_in_mb

Specifies the frame size in megabytes (maximum field length) for Thrift. Setting it to 0 disables framed transport in favor of an unframed socket. That unframed mode is deprecated, and framed mode is strongly recommended.

thrift_max_message_length_in_mb

The maximum length of a Thrift message in megabytes, including all fields and internal Thrift overhead.

Internode Communication and Fault Detection Properties

dynamic_snitch

When set to true (default), enables the dynamic snitch layer that monitors read latency and, when possible, routes requests away from poorly-performing nodes.

dynamic_snitch_badness_threshold

Sets a performance threshold for dynamically routing requests away from a poorly performing node. A value of 0.2 means Cassandra would continue to prefer the static snitch values until the node response time was 20 percent worse than the best performing node.

Until the threshold is reached, incoming client requests are statically routed to the closest replica (as determined by the configured snitch). Having requests consistently routed to a given replica can help keep a working set of data hot when read repair is less than 100% or disabled.
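The threshold rule above can be sketched in a few lines. This simplified Python model compares only the statically preferred replica against the best-scoring one. The scores are hypothetical latency values where lower is better. It is not the DynamicEndpointSnitch source.

```python
from typing import Dict, List


def order_replicas(static_order: List[str],
                   scores: Dict[str, float],
                   badness_threshold: float = 0.0) -> List[str]:
    """Keep the static snitch order unless the preferred replica is too slow.

    If the statically preferred replica scores more than badness_threshold
    worse than the best replica, order the replicas purely by score.
    """
    preferred = static_order[0]
    best = min(scores[r] for r in static_order)
    if scores[preferred] > best * (1.0 + badness_threshold):
        return sorted(static_order, key=lambda r: scores[r])
    return list(static_order)


scores = {"node_a": 1.1, "node_b": 1.0}
print(order_replicas(["node_a", "node_b"], scores, 0.2))  # ['node_a', 'node_b']: 10% worse, within 20%
print(order_replicas(["node_a", "node_b"], scores, 0.0))  # ['node_b', 'node_a']
```

With the default threshold of 0.0, any score difference reorders the replicas. A positive threshold trades some latency for keeping reads on the same replica.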

dynamic_snitch_reset_interval_in_ms

Time interval in milliseconds to reset all node scores (allowing a bad node to recover).

dynamic_snitch_update_interval_in_ms

The time interval in milliseconds at which node scores are recalculated from measured read latency.

hinted_handoff_enabled

Enables or disables hinted handoff.

hinted_handoff_throttle_delay_in_ms

When a node detects that a node for which it is holding hints has recovered, it begins sending the hints to that node. This specifies a sleep interval (in milliseconds) after delivering each row or row fragment in an effort to throttle traffic to the recovered node.

max_hint_window_in_ms

Defines how long in milliseconds to generate and save hints for an unresponsive node. After this interval, hints are dropped. This can prevent a sudden demand for resources when a node is brought back online and the rest of the cluster attempts to replay a large volume of hinted writes. The default is one hour (3600000 ms).
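The two hint settings combine into simple arithmetic. The Python sketch below makes two simplifying assumptions. First, a hint is written only while the target's downtime is below max_hint_window_in_ms (the exact boundary behavior is assumed). Second, replay cannot be faster than one throttle sleep per delivered row. The helper names are hypothetical.

```python
def should_write_hint(downtime_ms: int, max_hint_window_in_ms: int = 3_600_000) -> bool:
    """Simplified: hints are generated only while downtime is below the window."""
    return downtime_ms < max_hint_window_in_ms


def min_replay_seconds(hinted_rows: int, hinted_handoff_throttle_delay_in_ms: int = 50) -> float:
    """Lower bound on replay time contributed by the per-row throttle sleep alone."""
    return hinted_rows * hinted_handoff_throttle_delay_in_ms / 1000.0


print(should_write_hint(30 * 60 * 1000))      # True: down for 30 minutes
print(should_write_hint(2 * 60 * 60 * 1000))  # False: down for 2 hours
print(min_replay_seconds(100_000))            # 5000.0 seconds of throttle sleeps
```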

phi_convict_threshold

The Phi convict threshold adjusts the sensitivity of the failure detector on an exponential scale. Lower values increase the likelihood that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures will cause a node to be marked as down. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps prevent false failures. Values higher than 12 or lower than 5 are not recommended. The default is 8.
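The exponential scale comes from how phi is defined in an accrual failure detector. Phi is -log10 of the probability that a heartbeat is still on its way, so each extra unit of phi means a tenfold less likely late arrival. The Python sketch below assumes, as a simplification, exponentially distributed heartbeat inter-arrival times. Under that assumption phi = t / (mean_interval * ln 10). The heartbeat history is a hypothetical input.

```python
import math
import statistics
from typing import List

PHI_FACTOR = 1.0 / math.log(10.0)


def phi(time_since_last_heartbeat_ms: float, heartbeat_intervals_ms: List[float]) -> float:
    """phi = -log10(P(heartbeat arrives later than now)) under exponential inter-arrivals."""
    mean_interval = statistics.mean(heartbeat_intervals_ms)
    return PHI_FACTOR * time_since_last_heartbeat_ms / mean_interval


def is_convicted(time_since_last_heartbeat_ms: float,
                 heartbeat_intervals_ms: List[float],
                 phi_convict_threshold: float = 8.0) -> bool:
    """A node is marked down once phi exceeds phi_convict_threshold."""
    return phi(time_since_last_heartbeat_ms, heartbeat_intervals_ms) > phi_convict_threshold


intervals = [1000.0] * 10  # hypothetical history: one heartbeat per second
for silence_ms in (5000, 18000, 19000, 28000):
    print(silence_ms, round(phi(silence_ms, intervals), 2), is_convicted(silence_ms, intervals))
```

With a one-second mean interval, phi passes the default threshold of 8 after roughly 18.4 seconds of silence. A threshold of 12 is passed after roughly 27.6 seconds.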

Automatic Backup Properties

incremental_backups

Backs up data updated since the last snapshot was taken. When enabled, each time an SSTable is flushed, a hard link to it is created in a backups subdirectory of the keyspace data directory.
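A hard link copies no data. The backup entry and the original SSTable share the same inode, so the backup stays readable after compaction deletes the original file. The Python sketch below demonstrates this with os.link in a temporary directory. The directory layout and file name are hypothetical stand-ins for a keyspace data directory and an SSTable.

```python
import os
import tempfile


def backup_flushed_sstable(sstable_path: str) -> str:
    """Hard-link a flushed SSTable file into a backups/ subdirectory beside it."""
    backups_dir = os.path.join(os.path.dirname(sstable_path), "backups")
    os.makedirs(backups_dir, exist_ok=True)
    link_path = os.path.join(backups_dir, os.path.basename(sstable_path))
    os.link(sstable_path, link_path)  # no data is copied; both names share one inode
    return link_path


with tempfile.TemporaryDirectory() as keyspace_dir:
    sstable = os.path.join(keyspace_dir, "example-Data.db")  # hypothetical file name
    with open(sstable, "wb") as f:
        f.write(b"row data")
    backup = backup_flushed_sstable(sstable)
    print(os.stat(sstable).st_ino == os.stat(backup).st_ino)  # True: same inode
    os.remove(sstable)  # e.g. compaction deleting the original SSTable
    with open(backup, "rb") as f:
        print(f.read())  # b'row data': the backup link still holds the data
```

Because backups are links rather than copies, they use no extra disk space until compaction removes the originals. They are not cleaned up automatically.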

snapshot_before_compaction

Defines whether or not to take a snapshot before each compaction. Be careful using this option, since Cassandra does not clean up older snapshots automatically. This can be useful to back up data when there is a data format change.

Security Properties

authenticator

The default value disables authentication. Basic authentication is provided by the SimpleAuthenticator, which uses the passwd.properties and access.properties configuration files to configure authentication privileges. Allowed values are:
  • org.apache.cassandra.auth.AllowAllAuthenticator
  • org.apache.cassandra.auth.SimpleAuthenticator
  • A Java class that implements the IAuthenticator interface

authority

The default value disables user access control (all users can access all resources). To control read/write permissions to keyspaces and column families, use the SimpleAuthority, which uses the access.properties configuration file to define per-user access. Allowed values are:
  • org.apache.cassandra.auth.AllowAllAuthority
  • org.apache.cassandra.auth.SimpleAuthority
  • A Java class that implements the IAuthority interface

internode_encryption

Enables or disables encryption of inter-node communication using TLS_RSA_WITH_AES_128_CBC_SHA as the cipher suite for authentication, key exchange and encryption of the actual data transfers. To encrypt all inter-node communications, set to all. You must also generate keys and provide the appropriate key and trust store locations and passwords.

keystore

The location of a Java keystore (JKS) suitable for use with the Java Secure Socket Extension (JSSE), the Java implementation of the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. The keystore contains the private key and certificate that this node uses to identify itself to other nodes.

keystore_password

Password for the keystore.

truststore

The location of a truststore containing the trusted certificate used to authenticate remote servers.

truststore_password

Password for the truststore.