The cassandra.yaml file is the main configuration file for Cassandra. This file is located in /etc/cassandra/conf/cassandra.yaml in packaged installations or $CASSANDRA_HOME/conf/cassandra.yaml in binary installations. After changing properties in this file, you must restart the node for the changes to take effect.
|Option|Default Value|
|---|---|
|commitlog_sync_period_in_ms|10000 (ten seconds)|
|concurrent_compactors|One per CPU core|
|max_hint_window_in_ms|3600000 (one hour)|
|memtable_flush_writers|One per data directory|
|memtable_total_space_in_mb|1/3 of the heap|
The following properties are used to initialize a new cluster or to introduce a new node to an established cluster. Evaluate them and change them as needed before starting a node for the first time. These properties control how a node is configured within a cluster with regard to inter-node communication, data partitioning, and replica placement.
When set to true, populates a new node with a range of data when it joins an established cluster based on the setting of initial_token. If initial_token is not set, the newly added node will insert itself into the ring by splitting the token range of the most heavily loaded node. Leave set to false when initializing a brand new cluster.
The name of the cluster. All nodes participating in a cluster must have the same value.
The directory where the commit log will be stored. For optimal write performance, DataStax recommends the commit log be on a separate disk partition (ideally a separate physical device) from the data file directories.
The directory location where column family data (SSTables) will be stored.
The initial token assigns the node's token position in the ring, and assigns a range of data to the node when it first starts up. The initial token can be left unset when introducing a new node to an established cluster using auto_bootstrap. Otherwise, the token value depends on the partitioner you are using. With the random partitioner, this value is a number between 0 and 2**127. With the byte order preserving partitioner, this value is a byte array of hex values based on your actual row key values. With the order preserving and collated order preserving partitioners, this value is a UTF-8 string based on your actual row key values. See Calculating Tokens for more information.
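As an illustration for the random partitioner, tokens are commonly spaced evenly across the 0 to 2**127 range so that each node owns an equal share of the ring. The following is a minimal sketch under that assumption; the function name `evenly_spaced_tokens` is hypothetical and not part of Cassandra.

```python
def evenly_spaced_tokens(node_count):
    """Return one initial_token value per node for the random partitioner.

    Assumption: tokens are spaced evenly over the 0..2**127 range.
    This helper is illustrative only; it is not a Cassandra tool.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Integer arithmetic keeps the tokens exact (no floating point rounding).
    return [i * ring_size // node_count for i in range(node_count)]


for position, token in enumerate(evenly_spaced_tokens(4)):
    print(f"node {position}: initial_token: {token}")
```

Each printed value could be placed in the initial_token property of the corresponding node before it is started for the first time.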
The IP address or hostname that other Cassandra nodes will use to connect to this node. If left blank, you must have hostname resolution correctly configured on all nodes in your cluster so that the hostname resolves to the correct IP address for this node (using /etc/hostname, /etc/hosts or DNS).
The listen address for remote procedure calls (client connections). To listen on all configured interfaces, set to 0.0.0.0. If left blank, you must have hostname resolution correctly configured on all nodes in your cluster so that the hostname resolves to the correct IP address for this node (using /etc/hostname, /etc/hosts or DNS). Default value: localhost. Allowed values: an IP address, a hostname, or unset (to resolve the address using the hostname configuration of the node).
The port for remote procedure calls (client connections) and the Thrift service. Default is 9160.
The directory location where column family key and row caches will be stored.
The seed provider is a pluggable interface for providing a list of seed nodes. The default seed provider requires a comma-delimited list of seeds.
When a node joins a cluster, it contacts the seed node(s) to determine the ring topology and obtain gossip information about the other nodes in the cluster. Every node in the cluster should have the same list of seeds, specified as a comma-delimited list of IP addresses. In multi data center clusters, the seed list should include at least one node from each data center (replication group).
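The seed list guidance above can be sketched as a small helper that picks at least one node from each data center and joins the addresses with commas. This is only an illustration: the function name, the data center names, and the IP addresses are hypothetical.

```python
def build_seed_list(nodes_by_data_center, seeds_per_data_center=1):
    """Return a comma-delimited seed string with nodes from every data center.

    nodes_by_data_center maps a data center name to a list of node IP
    addresses. All names and addresses here are hypothetical examples.
    """
    seeds = []
    for data_center in sorted(nodes_by_data_center):
        addresses = nodes_by_data_center[data_center]
        if not addresses:
            raise ValueError(f"data center {data_center!r} has no nodes")
        # Take the first few nodes of each data center as its seeds.
        seeds.extend(addresses[:seeds_per_data_center])
    return ",".join(seeds)


cluster = {
    "DC1": ["10.0.0.1", "10.0.0.2"],
    "DC2": ["10.1.0.1", "10.1.0.2"],
}
print(build_seed_list(cluster))
```

The same resulting string would then be used on every node, since all nodes should share one seed list.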
The port for inter-node communication. Default port is 7000.
The following properties are used to tune performance and system resource utilization (memory, disk I/O, CPU, etc.) for reads and writes.
Column indexes are added to a row after the data reaches this size. This usually happens if there are a large number of columns in a row or the column values themselves are large. If you consistently read only a few columns from each row, this should be kept small as it denotes how much of the row data must be deserialized to read the column.
The size in MB to which the commit log will grow before creating a new commit log segment.
The method that Cassandra uses to acknowledge writes. The default mode, periodic, is used in conjunction with commitlog_sync_period_in_ms to control how often the commit log is synchronized to disk; in this mode, writes are acknowledged immediately. In batch mode, writes are not acknowledged until they have been fsynced to disk, and Cassandra waits the configured number of milliseconds for other writes before performing a sync. Allowed values are periodic (default) or batch.
Determines how often (in milliseconds) the commit log is synced to disk when commitlog_sync is set to periodic.
When set to true, cached row keys are tracked during compaction, and re-cached to their new positions in the compacted SSTable. If you have extremely large key caches for your column families, set to false (see the keys_cached attribute set on a column family).
Sets the priority for compaction threads. The thread priority determines execution preference by the JVM in relation to other Java processes. The default of 1 is the lowest priority.
Throttles compaction to the given total throughput across the entire system. The faster you insert data, the faster you need to compact in order to keep the SSTable count down. The recommended value is 16 to 32 times the rate of write throughput (in MB/second). Setting to 0 disables compaction throttling.
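The 16 to 32 times guideline can be turned into a quick calculation. The sketch below assumes you have already measured your sustained write rate in MB/second; the function name is hypothetical.

```python
def recommended_compaction_throughput(write_mb_per_sec):
    """Return the (low, high) compaction throughput range in MB/second.

    Applies the 16x to 32x guideline to a measured write rate.
    The helper name is illustrative, not a Cassandra API.
    """
    if write_mb_per_sec < 0:
        raise ValueError("write_mb_per_sec must not be negative")
    return (16 * write_mb_per_sec, 32 * write_mb_per_sec)


low, high = recommended_compaction_throughput(0.5)
print(f"throttle compaction to between {low} and {high} MB/second")
```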
Sets the number of concurrent compaction processes allowed to run simultaneously on a node. Defaults to one compaction process per CPU core.
For workloads with more data than can fit in memory, the bottleneck will be reads that need to fetch data from disk. Setting to (16 * number_of_drives) allows operations to queue low enough in the stack so that the OS and drives can reorder them.
Writes in Cassandra are almost never I/O bound, so the ideal number of concurrent writes depends on the number of CPU cores in your system. The recommended value is (8 * number_of_cpu_cores).
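The two sizing rules above (16 per drive for reads, 8 per CPU core for writes) can be sketched as follows. The property names concurrent_reads and concurrent_writes are taken from the throttle limit formula later in this reference; the function name is hypothetical, and falling back to the local core count is an assumption made for convenience.

```python
import os


def recommended_concurrency(number_of_drives, number_of_cpu_cores=None):
    """Return suggested concurrent_reads and concurrent_writes values.

    Assumption: when number_of_cpu_cores is omitted, the core count of the
    machine running this script is used, which is only meaningful when the
    script runs on the Cassandra node itself.
    """
    if number_of_cpu_cores is None:
        number_of_cpu_cores = os.cpu_count() or 1
    if number_of_drives < 1 or number_of_cpu_cores < 1:
        raise ValueError("drive and core counts must be at least 1")
    return {
        "concurrent_reads": 16 * number_of_drives,
        "concurrent_writes": 8 * number_of_cpu_cores,
    }


for name, value in recommended_concurrency(number_of_drives=2, number_of_cpu_cores=4).items():
    print(f"{name}: {value}")
```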
When Java heap usage after a full concurrent mark sweep (CMS) garbage collection is higher than this percentage, the largest memtables will be flushed to disk in order to free memory. This parameter serves as more of an emergency measure for preventing sudden out-of-memory (OOM) errors rather than a strategic tuning mechanism. It is most effective under light to moderate load, or read-heavy workloads. The default value of 0.75 means flush memtables when Java heap usage is above 75 percent of the total heap size. 1.0 disables this feature.
Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.
Each SSTable has an index file containing row keys and the position at which that row starts in the data file. At startup, Cassandra reads a sample of that index into memory. By default 1 row key out of every 128 is sampled. To find a row, Cassandra performs a binary search on the sample, then does just one disk read of the index block corresponding to the closest sampled entry. The more entries that are sampled, the more effective the index is (at the cost of memory usage), so a smaller value for this property results in a larger, more effective index. Generally, a value between 128 and 512 in combination with a large column family key cache offers the best trade-off between memory usage and performance. You may want to increase the value if you have small rows, thus decreasing the sampled index size and memory usage. For large rows, decreasing the value may improve read performance.
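The lookup described above (binary search on the in-memory sample, then one read of the matching index block) can be modeled in a few lines. This is a simplified sketch, not Cassandra's implementation: a sorted Python list of strings stands in for the on-disk index file, list positions stand in for file offsets, and the function names are hypothetical.

```python
import bisect


def build_sample(index_keys, index_interval=128):
    """Keep every index_interval-th key of the sorted index with its position."""
    return [(key, pos) for pos, key in enumerate(index_keys) if pos % index_interval == 0]


def find_row(index_keys, sample, index_interval, row_key):
    """Return the index position of row_key, or None if it is absent."""
    sampled_keys = [key for key, _ in sample]
    # Binary search for the closest sampled entry at or before row_key.
    slot = bisect.bisect_right(sampled_keys, row_key) - 1
    if slot < 0:
        return None  # row_key sorts before the first key in the index
    block_start = sample[slot][1]
    # This slice stands in for the single disk read of one index block.
    block = index_keys[block_start:block_start + index_interval]
    for offset, key in enumerate(block):
        if key == row_key:
            return block_start + offset
    return None


keys = [f"key{i:05d}" for i in range(1000)]
sample = build_sample(keys, index_interval=128)
print(len(sample), "sampled entries held in memory")
print("position of key00300:", find_row(keys, sample, 128, "key00300"))
```

In this model, halving the interval doubles the number of sampled entries held in memory and halves the size of the block that must be scanned, which is the trade-off the property controls.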
The number of full memtables to allow pending flush, that is, waiting for a writer thread. At a minimum, this should be set to the maximum number of secondary indexes created on a single column family.
Sets the number of memtable flush writer threads. These will be blocked by disk I/O, and each one will hold a memtable in memory while blocked. If you have a large Java heap size and many data directories (see data_file_directories), you can increase this value for better flush performance. By default this is set to the number of data directories defined (which is 1).
Specifies total memory used for memtables. During normal operation this complements the related column family limits on operations, throughput and SSTables. If this value is set to 0, only the column family specific limits are enforced. See also memtable_flush_after_mins, memtable_throughput_in_mb, memtable_operations_in_millions (which are set per column family).
Sets the size percentage to which maximum cache capacity is reduced when Java heap usage reaches the threshold defined by reduce_cache_sizes_at. Together with flush_largest_memtables_at, these properties are an emergency measure for preventing sudden out-of-memory (OOM) errors.
When Java heap usage after a full concurrent mark sweep (CMS) garbage collection is higher than this percentage, Cassandra will reduce the cache capacity to the fraction of the current size as specified by reduce_cache_capacity_to. The default is 85 percent (0.85). 1.0 disables this feature.
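The two emergency thresholds, flush_largest_memtables_at and reduce_cache_sizes_at, can be pictured as a pair of checks made against heap usage after a full CMS collection. The sketch below is a simplified model of that decision using the defaults quoted in this reference; the function name and the action labels are hypothetical.

```python
def emergency_actions(heap_used_fraction,
                      flush_largest_memtables_at=0.75,
                      reduce_cache_sizes_at=0.85):
    """Return the emergency measures triggered at a given heap usage.

    heap_used_fraction is heap usage after a full CMS collection, expressed
    as a fraction between 0 and 1. A threshold of 1.0 disables that measure.
    The returned labels are illustrative, not Cassandra identifiers.
    """
    actions = []
    if flush_largest_memtables_at < 1.0 and heap_used_fraction > flush_largest_memtables_at:
        actions.append("flush_largest_memtables")
    if reduce_cache_sizes_at < 1.0 and heap_used_fraction > reduce_cache_sizes_at:
        actions.append("reduce_cache_capacity")
    return actions


for usage in (0.50, 0.80, 0.90):
    print(f"heap at {usage:.0%}: {emergency_actions(usage) or 'no action'}")
```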
The buffer size (in kilobytes) to use for reading contiguous columns. This should match the size of the columns typically retrieved using query operations involving a slice predicate.
The following properties are used to configure and tune remote procedure calls (client connections).
An identifier on which to perform request scheduling. Currently the only valid option is keyspace.
Contains a list of additional properties that define configuration options for request_scheduler. NoScheduler does not have any options. RoundRobinScheduler has the following additional configuration properties: throttle_limit, default_weight, weights.
The number of active requests per client. Requests beyond this limit are queued up until running requests complete. The default is 80. Recommended value is ((concurrent_reads + concurrent_writes) * 2).
The default weight controls how many requests are handled during each turn of the RoundRobin. The default is 1.
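To make the throttle_limit formula and the effect of weights concrete, here is a toy model of weighted round-robin scheduling across keyspaces. It is a sketch for intuition only and does not reproduce the actual RoundRobinScheduler; the function names, keyspace names, and request labels are hypothetical.

```python
from collections import deque


def recommended_throttle_limit(concurrent_reads, concurrent_writes):
    """Apply the (concurrent_reads + concurrent_writes) * 2 guideline."""
    return (concurrent_reads + concurrent_writes) * 2


def round_robin(queues, weights=None, default_weight=1):
    """Return the order in which queued requests would be served.

    queues maps a keyspace name to its pending requests. On each turn a
    keyspace may have up to its weight (or default_weight) requests served.
    """
    weights = weights or {}
    pending = {keyspace: deque(requests) for keyspace, requests in queues.items()}
    for keyspace in pending:
        if weights.get(keyspace, default_weight) < 1:
            raise ValueError("weights must be at least 1")
    order = []
    while any(pending.values()):
        for keyspace, queue in pending.items():
            for _ in range(weights.get(keyspace, default_weight)):
                if not queue:
                    break
                order.append(queue.popleft())
    return order


print(recommended_throttle_limit(concurrent_reads=32, concurrent_writes=32))
print(round_robin({"ks1": ["a1", "a2", "a3"], "ks2": ["b1", "b2"]}, weights={"ks1": 2}))
```

With a weight of 2, the hypothetical keyspace ks1 has two requests served for every one served from ks2, which uses the default weight.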
Enable or disable keepalive on client connections.
Cassandra uses one thread per client for remote procedure calls. For a large number of client connections, this can cause excessive memory usage for the thread stacks. Connection pooling on the client side is highly recommended. Setting a maximum thread pool size acts as a safeguard against misbehaving clients. If the maximum is reached, Cassandra will block additional connections until a client disconnects.
Sets the minimum thread pool size for remote procedure calls.
Sets the receiving socket buffer size in bytes for remote procedure calls.
Sets the sending socket buffer size in bytes for remote procedure calls.
The time in milliseconds that a node will wait for a reply from other nodes before the command fails.
Specifies the frame size in megabytes (maximum field length) for Thrift. 0 disables framing. This option is deprecated in favor of thrift_max_message_length_in_mb.
The maximum length of a Thrift message in megabytes, including all fields and internal Thrift overhead.
When set to true (default), enables the dynamic snitch layer that monitors read latency and, when possible, routes requests away from poorly-performing nodes.
Sets a performance threshold for dynamically routing requests away from a poorly performing node. A value of 0.2 means Cassandra would continue to prefer the static snitch values until the node response time was 20 percent worse than the best performing node.
Until the threshold is reached, incoming client requests are statically routed to the closest replica (as determined by the configured snitch). Having requests consistently routed to a given replica can help keep a working set of data hot when read repair is less than 100% or disabled.
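The badness threshold can be illustrated with a small decision function: keep routing to the statically preferred replica until its latency is more than the threshold worse than the best replica. This is a simplification that compares raw latencies directly, which is an assumption; the function name and replica names are hypothetical.

```python
def choose_replica(static_choice, latencies_ms, badness_threshold=0.2):
    """Return the replica to route a read to.

    static_choice is the replica preferred by the configured snitch.
    latencies_ms maps each replica to a recent read latency. Comparing raw
    latencies is an assumption of this sketch, not Cassandra's scoring.
    """
    best = min(latencies_ms, key=latencies_ms.get)
    allowed = latencies_ms[best] * (1 + badness_threshold)
    if latencies_ms[static_choice] <= allowed:
        return static_choice  # within the threshold: keep the data hot here
    return best


print(choose_replica("replica_a", {"replica_a": 11.0, "replica_b": 10.0}))
print(choose_replica("replica_a", {"replica_a": 13.0, "replica_b": 10.0}))
```

In the first call the static choice is 10 percent slower than the best replica, so it is kept; in the second it is 30 percent slower, so the request is routed to the better performing replica.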
Time interval in milliseconds to reset all node scores (allowing a bad node to recover).
The time interval in milliseconds for calculating read latency.
Enables or disables hinted handoff.
When a node detects that a node for which it is holding hints has recovered, it begins sending the hints to that node. This specifies a sleep interval (in milliseconds) after delivering each row or row fragment in an effort to throttle traffic to the recovered node.
Defines how long in milliseconds to generate and save hints for an unresponsive node. After this interval, hints are dropped. This can prevent a sudden demand for resources when a node is brought back online and the rest of the cluster attempts to replay a large volume of hinted writes. The default is one hour (3600000 ms).
The Phi convict threshold adjusts the sensitivity of the failure detector on an exponential scale. Lower values increase the likelihood that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures will cause a node to be marked as down. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps prevent false failures. Values higher than 12 and lower than 5 are not recommended. The default is 8.
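To show why the threshold behaves exponentially, the sketch below computes a phi value under the assumption that heartbeat intervals are exponentially distributed, so that phi is the negative base-10 logarithm of the probability that a heartbeat is merely late. Both that distribution and the function names are assumptions of this example rather than a statement of how Cassandra computes the value.

```python
import math


def phi(ms_since_last_heartbeat, mean_heartbeat_interval_ms):
    """Return the suspicion level for a node that has gone quiet.

    Assumption: heartbeat intervals are exponentially distributed, so
    P(heartbeat still to come) = exp(-t / mean) and phi = -log10(P).
    """
    return ms_since_last_heartbeat / (mean_heartbeat_interval_ms * math.log(10))


def is_convicted(ms_since_last_heartbeat, mean_heartbeat_interval_ms, phi_convict_threshold=8):
    """Return True when phi exceeds the convict threshold."""
    return phi(ms_since_last_heartbeat, mean_heartbeat_interval_ms) > phi_convict_threshold


for silence_ms in (5000, 19000, 28000):
    print(silence_ms, round(phi(silence_ms, 1000), 2),
          is_convicted(silence_ms, 1000), is_convicted(silence_ms, 1000, 12))
```

Under this model each additional point of phi corresponds to a tenfold drop in the probability that the node is still alive, so raising the threshold from 8 to 12 tolerates noticeably longer silences before marking a node down.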
Backs up data updated since the last snapshot was taken. When enabled, each time an SSTable is flushed, a hard link to it is created in a /backups subdirectory of the keyspace data directory.
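The hard link mechanism can be demonstrated outside Cassandra with a few lines of file system code. This is a sketch of the general technique only: the function name and the sample file name are hypothetical, and a temporary directory stands in for a keyspace data directory.

```python
import os
import tempfile


def link_into_backups(sstable_path):
    """Create a hard link to sstable_path in a backups subdirectory beside it.

    Illustrative only; Cassandra performs the equivalent step itself when
    incremental backups are enabled.
    """
    backups_dir = os.path.join(os.path.dirname(sstable_path), "backups")
    os.makedirs(backups_dir, exist_ok=True)
    link_path = os.path.join(backups_dir, os.path.basename(sstable_path))
    os.link(sstable_path, link_path)  # same data on disk, second directory entry
    return link_path


with tempfile.TemporaryDirectory() as keyspace_dir:
    # Hypothetical file name standing in for a freshly flushed SSTable.
    sstable = os.path.join(keyspace_dir, "example-Data.db")
    with open(sstable, "w") as handle:
        handle.write("sample")
    backup = link_into_backups(sstable)
    print(os.path.samefile(sstable, backup))
```

Because a hard link shares the underlying data blocks, the backup copy consumes no extra space until the original file is removed, for example by compaction.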
Defines whether or not to take a snapshot before each compaction. Be careful using this option, since Cassandra does not clean up older snapshots automatically. This can be useful to back up data when there is a data format change.
The default value disables authentication. Basic authentication is provided using the SimpleAuthenticator, which uses the access.properties and password.properties configuration files to configure authentication privileges. Allowed values are:
* org.apache.cassandra.auth.AllowAllAuthenticator
* org.apache.cassandra.auth.SimpleAuthenticator
* A Java class that implements the IAuthenticator interface
Enables or disables encryption of inter-node communication using TLS_RSA_WITH_AES_128_CBC_SHA as the cipher suite for authentication, key exchange and encryption of the actual data transfers. To encrypt all inter-node communications, set to all. You must also generate keys and provide the appropriate key and trust store locations and passwords.
The location of a Java keystore (JKS) suitable for use with Java Secure Socket Extension (JSSE), the Java version of the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.
Password for the keystore.
The location of a truststore containing the trusted certificate used to authenticate remote servers.
Password for the truststore.