Apache Cassandra 1.0 Documentation

Node and Cluster Configuration (cassandra.yaml)

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your version.

The cassandra.yaml file is the main configuration file for Cassandra. Its location depends on the type of installation:

  • Cassandra packaged installs: /etc/cassandra/conf
  • Cassandra binary installs: <install_location>/conf
  • DataStax Enterprise packaged installs: /etc/dse/cassandra
  • DataStax Enterprise binary installs: <install_location>/resources/cassandra/conf

After changing properties in this file, you must restart the node for the changes to take effect.

Option Default Value
authenticator org.apache.cassandra.auth.AllowAllAuthenticator
authority org.apache.cassandra.auth.AllowAllAuthority
broadcast_address same as listen_address
cluster_name Test Cluster
column_index_size_in_kb 64
commitlog_directory /var/lib/cassandra/commitlog
commitlog_sync periodic
commitlog_sync_period_in_ms 10000 (ten seconds)
commitlog_total_space_in_mb 4096
compaction_preheat_key_cache true
compaction_throughput_mb_per_sec 16
concurrent_compactors One per CPU core
concurrent_reads 32
concurrent_writes 32
data_file_directories /var/lib/cassandra/data
dynamic_snitch true
dynamic_snitch_badness_threshold 0.0
dynamic_snitch_reset_interval_in_ms 600000
dynamic_snitch_update_interval_in_ms 100
endpoint_snitch org.apache.cassandra.locator.SimpleSnitch
flush_largest_memtables_at 0.75
hinted_handoff_enabled true
hinted_handoff_throttle_delay_in_ms 50
in_memory_compaction_limit_in_mb 64
incremental_backups false
index_interval 128
initial_token n/a
internode_encryption none
keystore conf/.keystore
keystore_password cassandra
listen_address localhost
max_hint_window_in_ms 3600000 (one hour)
memtable_flush_queue_size 4
memtable_flush_writers One per data directory
memtable_total_space_in_mb 1/3 of the heap
partitioner org.apache.cassandra.dht.RandomPartitioner
phi_convict_threshold 8
reduce_cache_capacity_to 0.6
reduce_cache_sizes_at 0.85
request_scheduler org.apache.cassandra.scheduler.NoScheduler
request_scheduler_id keyspace
rpc_address localhost
rpc_keepalive true
rpc_max_threads Unlimited
rpc_min_threads 16
rpc_port 9160
rpc_recv_buff_size_in_bytes n/a
rpc_send_buff_size_in_bytes n/a
rpc_server_type sync
rpc_timeout_in_ms 10000
saved_caches_directory /var/lib/cassandra/saved_caches
seeds 127.0.0.1
seed_provider org.apache.cassandra.locator.SimpleSeedProvider
sliced_buffer_size_in_kb 64
snapshot_before_compaction false
storage_port 7000
stream_throughput_outbound_megabits_per_sec 400
thrift_framed_transport_size_in_mb 15
thrift_max_message_length_in_mb 16
truststore conf/.truststore
truststore_password cassandra

Node and Cluster Initialization Properties

The following properties are used when initializing a new cluster or when introducing a new node to an established cluster. Evaluate and change them as needed before starting a node for the first time. These properties control how a node is configured within a cluster with regard to inter-node communication, data partitioning, and replica placement.

broadcast_address

If your Cassandra cluster is deployed across multiple Amazon EC2 regions (and you are using the Ec2MultiRegionSnitch), set broadcast_address to the public IP address of the node (and listen_address to the private IP address). If not declared, it defaults to the same address as specified for listen_address.
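
As an illustration, a multi-region EC2 node might be configured along these lines. This is a minimal sketch: both IP addresses are hypothetical placeholders, and the snitch class name assumes the Ec2MultiRegionSnitch class in the org.apache.cassandra.locator package.

```yaml
# Sketch for one node in a multi-region EC2 cluster (addresses are hypothetical).
endpoint_snitch: org.apache.cassandra.locator.Ec2MultiRegionSnitch
listen_address: 10.0.0.12         # private IP of this node
broadcast_address: 203.0.113.12   # public IP of this node
```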

cluster_name

The name of the cluster. All nodes participating in a cluster must have the same value.

commitlog_directory

The directory where the commit log will be stored. For optimal write performance, DataStax recommends the commit log be on a separate disk partition (ideally a separate physical device) from the data file directories.

data_file_directories

The directory location where column family data (SSTables) will be stored.
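
A minimal sketch of how these two properties might look in cassandra.yaml when the commit log is placed on its own device. The mount points /mnt/disk1 and /mnt/disk2 are hypothetical; the defaults are /var/lib/cassandra/data and /var/lib/cassandra/commitlog.

```yaml
# data_file_directories is written as a list, one entry per data directory.
data_file_directories:
    - /mnt/disk1/cassandra/data          # hypothetical mount point
# Commit log on a different (hypothetical) device from the data files.
commitlog_directory: /mnt/disk2/cassandra/commitlog
```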

initial_token

The initial token assigns the node token position in the ring, and assigns a range of data to the node when it first starts up. The initial token can be left unset when introducing a new node to an established cluster. Otherwise, the token value depends on the partitioner you are using. With the random partitioner, this value will be a number between 0 and 2**127. With the byte order preserving partitioner, this value will be a byte array of hex values based on your actual row key values. With the order preserving and collated order preserving partitioners, this value will be a UTF-8 string based on your actual row key values. See Generating Tokens for more information.
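
As a worked example for the random partitioner, one common approach is to space tokens evenly around the ring, so that node i of N gets i * (2**127 / N). The even spacing and the four-node cluster below are assumptions for illustration; the text above only states the valid range.

```yaml
# Evenly spaced RandomPartitioner tokens for a hypothetical 4-node cluster,
# computed as i * (2**127 / 4) for i = 0..3:
#   node 0: 0
#   node 1: 42535295865117307932921825928971026432
#   node 2: 85070591730234615865843651857942052864
#   node 3: 127605887595351923798765477786913079296
# Each node sets only its own value, for example on node 1:
initial_token: 42535295865117307932921825928971026432
```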

listen_address

The IP address or hostname that other Cassandra nodes will use to connect to this node. If left blank, you must have hostname resolution correctly configured on all nodes in your cluster so that the hostname resolves to the correct IP address for this node (using /etc/hostname, /etc/hosts or DNS).

partitioner

Sets the partitioning method used when assigning a row key to a particular node (also see initial_token). Allowed values are:
  • org.apache.cassandra.dht.RandomPartitioner (default)
  • org.apache.cassandra.dht.ByteOrderedPartitioner
  • org.apache.cassandra.dht.OrderPreservingPartitioner (deprecated)
  • org.apache.cassandra.dht.CollatingOrderPreservingPartitioner (deprecated)

rpc_address

The listen address for remote procedure calls (client connections). To listen on all configured interfaces, set to 0.0.0.0. If left blank, you must have hostname resolution correctly configured on all nodes in your cluster so that the hostname resolves to the correct IP address for this node (using /etc/hostname, /etc/hosts or DNS). The default value is localhost. Allowed values are an IP address or a hostname; leave the property unset to resolve the address using the hostname configuration of the node.

rpc_port

The port for remote procedure calls (client connections) and the Thrift service. Default is 9160.

saved_caches_directory

The directory location where column family key and row caches will be stored.

seed_provider

The seed provider is a pluggable interface for providing a list of seed nodes. The default seed provider requires a comma-delimited list of seeds.

seeds

When a node joins a cluster, it contacts the seed node(s) to determine the ring topology and obtain gossip information about the other nodes in the cluster. Every node in the cluster should have the same list of seeds, specified as a comma-delimited list of IP addresses. In multiple data center clusters, the seed list should include at least one node from each data center (replication group).
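
In the stock cassandra.yaml, seeds is not a top-level property but a parameter of seed_provider. A minimal sketch of that layout, assuming two hypothetical seed addresses (one per data center):

```yaml
seed_provider:
    # Default provider; reads a comma-delimited list of seed addresses.
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Hypothetical addresses: one seed from each of two data centers.
          - seeds: "10.0.0.1,10.0.1.1"
```

The same seed list would be used on every node in the cluster.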

storage_port

The port for inter-node communication. Default port is 7000.

endpoint_snitch

Sets the snitch to use for locating nodes and routing requests. In deployments with rack-aware replication placement strategies, use either RackInferringSnitch, PropertyFileSnitch, or Ec2Snitch (if on Amazon EC2). The appropriate snitch depends on the replica placement strategy (placement_strategy), which is defined per keyspace. The PropertyFileSnitch also requires a cassandra-topology.properties configuration file. Snitches included with Cassandra are:
  • org.apache.cassandra.locator.SimpleSnitch
  • org.apache.cassandra.locator.RackInferringSnitch
  • org.apache.cassandra.locator.PropertyFileSnitch
  • org.apache.cassandra.locator.Ec2Snitch

Performance Tuning Properties

The following properties are used to tune performance and system resource utilization (memory, disk I/O, CPU, etc.) for reads and writes.

column_index_size_in_kb

Column indexes are added to a row after the data reaches this size. This usually happens if there are a large number of columns in a row or the column values themselves are large. If you consistently read only a few columns from each row, this should be kept small as it denotes how much of the row data must be deserialized to read the column.

commitlog_sync

The method that Cassandra uses to acknowledge writes. The default mode, periodic, is used in conjunction with commitlog_sync_period_in_ms to control how often the commit log is synchronized to disk; in this mode, writes are acknowledged immediately. In batch mode, writes are not acknowledged until they have been fsynced to disk, and Cassandra waits a configured number of milliseconds for other writes before performing the sync. Allowed values are periodic (default) or batch.

commitlog_sync_period_in_ms

Determines how often (in milliseconds) to send the commit log to disk when commitlog_sync is set to periodic mode.
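
The two modes can be sketched as follows. The periodic settings are the defaults from the table above. The batch alternative is shown commented out; the commitlog_sync_batch_window_in_ms property and its value of 50 are not described on this page and are taken here as an assumption from the stock cassandra.yaml.

```yaml
# Default: acknowledge writes immediately, sync the commit log every 10 seconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Alternative (commented out): hold acknowledgements until the fsync completes.
# The batch window is the time to wait for other writes before syncing.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```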

commitlog_total_space_in_mb

When the commitlog size on a node exceeds this threshold, Cassandra will flush memtables to disk for the oldest commitlog segments, thus allowing those log segments to be removed. This reduces the amount of data to replay on startup, and prevents infrequently-updated column families from keeping commit log segments around indefinitely. This replaces the per-column family storage setting memtable_flush_after_mins.

compaction_preheat_key_cache

When set to true, cached row keys are tracked during compaction, and re-cached to their new positions in the compacted SSTable. If you have extremely large key caches for your column families, set to false (see the keys_cached attribute set on a column family).

compaction_throughput_mb_per_sec

Throttles compaction to the given total throughput across the entire system. The faster you insert data, the faster you need to compact in order to keep the SSTable count down. The recommended value is 16 to 32 times the rate of write throughput (in MB per second). Setting to 0 disables compaction throttling.

concurrent_compactors

Sets the number of concurrent compaction processes allowed to run simultaneously on a node. Defaults to one compaction process per CPU core.

concurrent_reads

For workloads with more data than can fit in memory, the bottleneck will be reads that need to fetch data from disk. Setting to (16 * number_of_drives) allows operations to queue low enough in the stack so that the OS and drives can reorder them.

concurrent_writes

Writes in Cassandra are almost never I/O bound, so the ideal number of concurrent writes depends on the number of CPU cores in your system. The recommended value is (8 * number_of_cpu_cores).
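
As a worked example of the two sizing rules above, assume a hypothetical node with 4 data drives and 8 CPU cores:

```yaml
# Hypothetical node: 4 data drives, 8 CPU cores.
concurrent_reads: 64    # 16 * number_of_drives    = 16 * 4
concurrent_writes: 64   # 8 * number_of_cpu_cores  = 8 * 8
```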

flush_largest_memtables_at

When Java heap usage after a full concurrent mark sweep (CMS) garbage collection is higher than this percentage, the largest memtables will be flushed to disk in order to free memory. This parameter serves more as an emergency measure for preventing sudden out-of-memory (OOM) errors than as a strategic tuning mechanism. It is most effective under light to moderate load, or read-heavy workloads. The default value of 0.75 means memtables are flushed when Java heap usage is above 75 percent of the total heap size. Setting to 1.0 disables this feature.

in_memory_compaction_limit_in_mb

Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.

index_interval

Each SSTable has an index file containing row keys and the position at which each row starts in the data file. At startup, Cassandra reads a sample of that index into memory. By default, 1 row key out of every 128 is sampled. To find a row, Cassandra performs a binary search on the sample, then does just one disk read of the index block corresponding to the closest sampled entry. The more keys are sampled, the more effective the index is (at the cost of memory usage), so a smaller value for this property results in a larger, more effective sample. Generally, a value between 128 and 512 in combination with a large column family key cache offers the best trade-off between memory usage and performance. You may want to increase index_interval if you have small rows, thus decreasing the size of the sample and its memory usage. For large rows, decreasing index_interval may improve read performance.

memtable_flush_queue_size

The number of full memtables to allow pending flush, that is, waiting for a writer thread. At a minimum, this should be set to the maximum number of secondary indexes created on a single column family.

memtable_flush_writers

Sets the number of memtable flush writer threads. These will be blocked by disk I/O, and each one will hold a memtable in memory while blocked. If you have a large Java heap size and many data directories (see data_file_directories), you can increase this value for better flush performance. By default this is set to the number of data directories defined (which is 1).

memtable_total_space_in_mb

Specifies total memory used for all column family memtables on a node. Defaults to a third of your JVM heap size. This replaces the old per-column family storage settings memtable_operations_in_millions and memtable_throughput_in_mb.

reduce_cache_capacity_to

Sets the size percentage to which maximum cache capacity is reduced when Java heap usage reaches the threshold defined by reduce_cache_sizes_at. Together with flush_largest_memtables_at, these properties are an emergency measure for preventing sudden out-of-memory (OOM) errors.

reduce_cache_sizes_at

When Java heap usage after a full concurrent mark sweep (CMS) garbage collection is higher than this percentage, Cassandra will reduce the cache capacity to the fraction of the current size as specified by reduce_cache_capacity_to. The default is 85 percent (0.85). 1.0 disables this feature.
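
Taken together, the three emergency heap-pressure properties look like this with their default values from the table above. The comments restate the behavior described in this section.

```yaml
# After a full CMS collection, if heap usage is still above 75 percent,
# flush the largest memtables.
flush_largest_memtables_at: 0.75
# If heap usage is still above 85 percent, shrink the caches...
reduce_cache_sizes_at: 0.85
# ...to 60 percent of their current size.
reduce_cache_capacity_to: 0.6
```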

sliced_buffer_size_in_kb

The buffer size (in kilobytes) to use for reading contiguous columns. This should match the size of the columns typically retrieved using query operations involving a slice predicate.

stream_throughput_outbound_megabits_per_sec

Throttles all outbound streaming file transfers on a node to the specified throughput in Mb per second. Cassandra does mostly sequential I/O when streaming data during bootstrap or repair, which can lead to saturating the network connection and degrading client performance. The default is 400 Mb/s or 50 MB/s.

Remote Procedure Call Tuning Properties

The following properties are used to configure and tune remote procedure calls (client connections).

request_scheduler

Defines a scheduler to handle incoming client requests according to a defined policy. This scheduler only applies to client requests, not inter-node communication. Useful for throttling client requests in implementations that have multiple keyspaces. Allowed Values are:
  • org.apache.cassandra.scheduler.NoScheduler (default)
  • org.apache.cassandra.scheduler.RoundRobinScheduler
  • A Java class that implements the RequestScheduler interface

If using the RoundRobinScheduler, there are additional request_scheduler_options properties.

request_scheduler_id

An identifier on which to perform request scheduling. Currently the only valid option is keyspace.

request_scheduler_options

Contains a list of additional properties that define configuration options for request_scheduler. NoScheduler does not have any options. RoundRobinScheduler has the following additional configuration properties: throttle_limit, default_weight, weights.

throttle_limit

The number of active requests per client. Requests beyond this limit are queued up until running requests complete. The default is 80. Recommended value is ((concurrent_reads + concurrent_writes) * 2).

default_weight

The default weight controls how many requests are handled during each turn of the RoundRobin. The default is 1.

weights

Allows control of weight per keyspace during each turn of the RoundRobin. If not set, each keyspace uses the default_weight. Takes a list of keyspace: weight pairs.
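
A minimal sketch of a RoundRobinScheduler configuration using the options described above. The keyspace names and their weights are hypothetical, and throttle_limit applies the recommended formula to the default concurrent_reads and concurrent_writes values of 32.

```yaml
request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
request_scheduler_id: keyspace
request_scheduler_options:
    # (concurrent_reads + concurrent_writes) * 2 = (32 + 32) * 2
    throttle_limit: 128
    default_weight: 1
    weights:
        # Hypothetical keyspace names; unlisted keyspaces use default_weight.
        Keyspace1: 1
        Keyspace2: 5
```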

rpc_keepalive

Enable or disable keepalive on client connections.

rpc_max_threads

Cassandra uses one thread-per-client for remote procedure calls. For a large number of client connections, this can cause excessive memory usage for the thread stack. Connection pooling on the client side is highly recommended. Setting a maximum thread pool size acts as a safeguard against misbehaved clients. If the maximum is reached, Cassandra will block additional connections until a client disconnects.

rpc_min_threads

Sets the minimum thread pool size for remote procedure calls.

rpc_recv_buff_size_in_bytes

Sets the receiving socket buffer size for remote procedure calls.

rpc_send_buff_size_in_bytes

Sets the sending socket buffer size in bytes for remote procedure calls.

rpc_timeout_in_ms

The time in milliseconds that a node will wait on a reply from other nodes before the command is failed.

rpc_server_type

Cassandra provides three options for the rpc server. The default is sync because hsha is about 30% slower on Windows. On Linux, sync and hsha performance is about the same with hsha using less memory.

  • sync - (default) One connection per thread in the rpc pool. For a very large number of clients, memory will be your limiting factor; on a 64 bit JVM, 128KB is the minimum stack size per thread. Connection pooling is very, very strongly recommended.
  • hsha - Half synchronous, half asynchronous. The rpc thread pool is used to manage requests, but the threads are multiplexed across the different clients.
  • async - Deprecated and will be removed in the next major release. Do not use.
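
For illustration, a Linux node serving many client connections might combine the RPC properties as sketched below. The rpc_max_threads value of 2048 is an arbitrary example of a cap, not a recommendation from this page; the other values are the documented defaults or options.

```yaml
rpc_server_type: hsha    # threads multiplexed across clients
rpc_min_threads: 16      # default minimum pool size
rpc_max_threads: 2048    # illustrative cap; the default is unlimited
rpc_keepalive: true
```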

thrift_framed_transport_size_in_mb

Specifies the frame size in megabytes (maximum field length) for Thrift. 0 disables framing. This option is deprecated in favor of thrift_max_message_length_in_mb.

thrift_max_message_length_in_mb

The maximum length of a Thrift message in megabytes, including all fields and internal Thrift overhead.

Internode Communication and Fault Detection Properties

dynamic_snitch

When set to true (default), enables the dynamic snitch layer that monitors read latency and, when possible, routes requests away from poorly-performing nodes.

dynamic_snitch_badness_threshold

Sets a performance threshold for dynamically routing requests away from a poorly performing node. A value of 0.2 means Cassandra would continue to prefer the static snitch values until the node response time was 20 percent worse than the best performing node.

Until the threshold is reached, incoming client requests are statically routed to the closest replica (as determined by the configured snitch). Having requests consistently routed to a given replica can help keep a working set of data hot when read repair is less than 100% or disabled.

dynamic_snitch_reset_interval_in_ms

Time interval in milliseconds to reset all node scores (allowing a bad node to recover).

dynamic_snitch_update_interval_in_ms

The time interval in milliseconds for calculating read latency.
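
The dynamic snitch properties can be sketched together as follows. The badness threshold of 0.2 is the example value discussed above (the default is 0.0); the two intervals are the defaults from the table at the top of this page.

```yaml
dynamic_snitch: true
# Prefer the statically closest replica until it is 20 percent worse
# than the best performing node.
dynamic_snitch_badness_threshold: 0.2
dynamic_snitch_update_interval_in_ms: 100      # recalculate read latency
dynamic_snitch_reset_interval_in_ms: 600000    # reset all node scores
```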

hinted_handoff_enabled

Enables or disables hinted handoff.

hinted_handoff_throttle_delay_in_ms

When a node detects that a node for which it is holding hints has recovered, it begins sending the hints to that node. This specifies a sleep interval (in milliseconds) after delivering each row or row fragment in an effort to throttle traffic to the recovered node.

max_hint_window_in_ms

Defines how long in milliseconds to generate and save hints for an unresponsive node. After this interval, hints are dropped. This can prevent a sudden demand for resources when a node is brought back online and the rest of the cluster attempts to replay a large volume of hinted writes. The default is one hour (3600000 ms).

phi_convict_threshold

The Phi convict threshold adjusts the sensitivity of the failure detector on an exponential scale. Lower values increase the likelihood that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures will cause a node to be marked as down. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps prevent false failure detections. Values higher than 12 or lower than 5 are not recommended. The default is 8.

Automatic Backup Properties

incremental_backups

Backs up data updated since the last snapshot was taken. When enabled, each time an SSTable is flushed, a hard link to it is created in a /backups subdirectory of the keyspace data directory.

snapshot_before_compaction

Defines whether or not to take a snapshot before each compaction. Be careful using this option, since Cassandra does not clean up older snapshots automatically. This can be useful to back up data when there is a data format change.

Security Properties

authenticator

The default value disables authentication. Basic authentication is provided using the SimpleAuthenticator, which uses the access.properties and password.properties configuration files to configure authentication privileges. Allowed values are:
  • org.apache.cassandra.auth.AllowAllAuthenticator
  • org.apache.cassandra.auth.SimpleAuthenticator
  • A Java class that implements the IAuthenticator interface

Note

The SimpleAuthenticator and SimpleAuthority classes have been moved to the example directory of the Apache Cassandra project repository as of release 1.0. They are no longer available in the packaged and binary distributions. They never provided actual security, and in their current state are only meant as examples.

authority

The default value disables user access control (all users can access all resources). To control read/write permissions to keyspaces and column families, use the SimpleAuthority, which uses the access.properties configuration file to define per-user access. Allowed values are:
  • org.apache.cassandra.auth.AllowAllAuthority
  • org.apache.cassandra.auth.SimpleAuthority
  • A Java class that implements the IAuthority interface

internode_encryption

Enables or disables encryption of inter-node communication using TLS_RSA_WITH_AES_128_CBC_SHA as the cipher suite for authentication, key exchange and encryption of the actual data transfers. To encrypt all inter-node communications, set to all. You must also generate keys and provide the appropriate key and trust store locations and passwords.

keystore

The location of a Java keystore (JKS) suitable for use with Java Secure Socket Extension (JSSE), the Java version of the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.

keystore_password

Password for the keystore.

truststore

The location of a truststore containing the trusted certificate used to authenticate remote servers.

truststore_password

Password for the truststore.
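
A minimal sketch of the inter-node encryption settings with encryption turned on. The nesting under encryption_options is an assumption based on the layout of the stock cassandra.yaml and is not stated on this page; the store locations and passwords are the documented defaults and would normally be replaced with your own.

```yaml
encryption_options:
    internode_encryption: all          # default is none
    keystore: conf/.keystore
    keystore_password: cassandra       # documented default
    truststore: conf/.truststore
    truststore_password: cassandra     # documented default
```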