Apache Cassandra 0.7 Documentation

Storage Configuration

The main configuration file for 0.7.x versions of Cassandra is cassandra.yaml, located in the conf directory of the distribution. This file itself has enough documentation to get most users started, but additional details are listed below. Keyspace options and column family attributes are broken out into a separate table for readability.

Option Default Value
cluster_name Test Cluster
initial_token n/a
auto_bootstrap false
hinted_handoff_enabled true
max_hint_window_in_ms 3600000
authenticator org.apache.cassandra.auth.AllowAllAuthenticator
authority org.apache.cassandra.auth.AllowAllAuthority
partitioner org.apache.cassandra.dht.RandomPartitioner
commitlog_directory /var/lib/cassandra/commitlog
data_file_directories /var/lib/cassandra/data
saved_caches_directory /var/lib/cassandra/saved_caches
commitlog_rotation_threshold_in_mb 128
commitlog_sync periodic
commitlog_sync_period_in_ms 10000
flush_largest_memtables_at 0.75
reduce_cache_sizes_at 0.85
reduce_cache_capacity_to 0.6
seeds 127.0.0.1
disk_access_mode auto
concurrent_reads 32
concurrent_writes 32
memtable_flush_writers 1
sliced_buffer_size_in_kb 64
storage_port 7000
listen_address localhost
rpc_address localhost
rpc_port 9160
rpc_keepalive true
rpc_send_buff_size_in_bytes n/a
rpc_recv_buff_size_in_bytes n/a
thrift_framed_transport_size_in_mb 15
thrift_max_message_length_in_mb 16
snapshot_before_compaction false
compaction_thread_priority 1
binary_memtable_throughput_in_mb 256
column_index_size_in_kb 64
in_memory_compaction_limit_in_mb 64
rpc_timeout_in_ms 10000
phi_convict_threshold 8
endpoint_snitch org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch true
dynamic_snitch_update_interval_in_ms 100
dynamic_snitch_reset_interval_in_ms 600000
dynamic_snitch_badness_threshold 0.0
request_scheduler org.apache.cassandra.scheduler.NoScheduler
request_scheduler_id keyspace
index_interval 128

Keyspace and Column Family Attributes

Many aspects of storage configuration are set on a per-keyspace or per-column family basis. These attributes can be manipulated programmatically, but in most cases the practical method for defining keyspace and column family attributes is to use the Cassandra CLI.

Prior to release 0.7.3, keyspace and column family attributes could be specified in cassandra.yaml, but that is no longer true in 0.7.4 and later. Use the CLI as described below to update these attributes in 0.7.4 and later.

To update keyspace and column family attributes using the CLI

update keyspace <keyspace> [with <att1>=<value1> [and <att2>=<value2> ...]];

update column family <cf> [with <att1>=<value1> [and <att2>=<value2> ...]];

For example, to change the replication strategy for a keyspace and then update the number of keys cached for a column family:

[default@twissandra] update keyspace twissandra with
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';

[default@twissandra] update column family users with keys_cached=25000;

Note

Two important exceptions to this syntax are strategy_options and column_metadata. See the detailed attribute descriptions below for the correct syntax.

For more information about similar commands, see Using the Cassandra CLI.

Keyspace Required Attributes

The following are the required attributes for configuring Keyspaces. These are only valid within a Keyspace element.

Option Default Value
name n/a (A user-defined value is required)
replica_placement_strategy org.apache.cassandra.locator.SimpleStrategy
replication_factor 1

Keyspace Optional Attributes

The following are the optional parameters for configuring Keyspaces. These are only valid within a Keyspace element.

Option Default Value
column_families n/a
strategy_options n/a

Column Family Required Attributes

The following options are required attributes of the ColumnFamily element, which itself is only valid within a Keyspace element.

Option Default Value
name n/a (A user-defined value is required)
compare_with BytesType
column_type Standard
compare_subcolumns_with BytesType

Column Family Optional Attributes

The following are optional attributes of the ColumnFamily element.

Option Default Value
keys_cached 200000
rows_cached n/a (disabled by default)
comment n/a
read_repair_chance 1.0 (always on)
gc_grace_seconds 864000 (10 days)
default_validation_class n/a
min_compaction_threshold 4
max_compaction_threshold 32
row_cache_save_period_in_seconds n/a
key_cache_save_period_in_seconds n/a
memtable_flush_after_mins 60
memtable_throughput_in_mb 1/8 the heapsize
memtable_operations_in_millions throughput / 64 * 0.3
column_metadata n/a

cluster_name

A human readable name for the cluster. This value is returned from the describe_cluster_name API call.

initial_token

Determines the placement of a node’s token in the ring. This setting is only checked on the first start up of a node. With RandomPartitioner configured, it can be used to force equal token spacing around the ring. With OrderPreservingPartitioner, users can specify the node’s token range if the key distribution is known.
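RandomPartitioner tokens fall between 0 and 2**127, so a common way to space N nodes evenly is to give node i the token i * 2**127 / N. The Python sketch below computes such values; the helper name balanced_tokens is hypothetical and not part of Cassandra.

```python
# Hypothetical helper: evenly spaced initial_token values for RandomPartitioner,
# assuming its token space runs from 0 to 2**127.
RANDOM_PARTITIONER_MAX = 2 ** 127


def balanced_tokens(node_count):
    """Return one initial_token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the 127-bit values exact.
    return [i * RANDOM_PARTITIONER_MAX // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(balanced_tokens(4)):
        print(f"node {node}: initial_token: {token}")
```

Each value would go into the initial_token setting of the corresponding node before that node starts for the first time.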

auto_bootstrap

Enable this option on new nodes that are joining an existing cluster. It can be used in conjunction with initial_token to specify which token range to take over. With no initial_token, auto_bootstrap will acquire half the range of the most loaded node.

Note

There are several caveats with autobootstrap.

hinted_handoff_enabled

Set to false to disable hinted handoff. Default is true.

max_hint_window_in_ms

Defines how long, in milliseconds, to generate hints for a host after it “dies,” or stops responding. After this interval, hints are dropped. This can prevent a sudden and overwhelming demand for resources when a node is brought back online and the rest of the cluster attempts to replay a large volume of hinted writes. Defaults to 3600000 milliseconds, or one hour.

authenticator

The default value of org.apache.cassandra.auth.AllowAllAuthenticator effectively disables authentication. For simple authentication, users may choose to switch to org.apache.cassandra.auth.SimpleAuthenticator and provide access.properties and passwd.properties for configuration. Users can add custom IAuthenticator implementations by using the fully qualified class name (provided the resource is available on the class path).

authority

The default value of org.apache.cassandra.auth.AllowAllAuthority effectively allows access to all resources. To control read/write permissions to keyspaces or column families, change this value to org.apache.cassandra.auth.SimpleAuthority and provide the required per-user configuration in access.properties. Users can add custom IAuthority implementations by using the fully qualified class name (provided the resource is available on the class path).

partitioner

Partitioners control how keys are distributed across the ring. At a high level, the default RandomPartitioner places data on the ring according to an MD5 hash of the key. Other partitioner types available are org.apache.cassandra.dht.OrderPreservingPartitioner, org.apache.cassandra.dht.ByteOrderedPartitioner and org.apache.cassandra.dht.CollatingOrderPreservingPartitioner. Users are free to implement their own IPartitioner for custom functionality.

See Tokens, Partitioners, and the Ring for more details on Tokens and Partitioners.
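To make the MD5-based placement more concrete, the Python sketch below derives a token the way RandomPartitioner is understood to: the key's MD5 digest read as a signed big-endian integer, then made non-negative. Treat it as an illustration of the idea rather than a reference implementation; the function name is hypothetical.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Sketch of a RandomPartitioner-style token for a row key.

    Assumption: the 128-bit MD5 digest is interpreted as a signed
    big-endian integer and its absolute value is used, giving a token
    between 0 and 2**127 inclusive.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


if __name__ == "__main__":
    for key in (b"jsmith", b"jdoe"):
        print(key, random_partitioner_token(key))
```

Because MD5 output is effectively uniform, keys spread evenly around the ring, which is why even token spacing between nodes translates into even data distribution.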

data_file_directories

One or more directories can be listed under data_file_directories. These directories specify the location of SSTable files.

For performance reasons, the data file directories should be on separate partitions (ideally separate physical devices) from the commitlog_directory.

commitlog_directory

Specifies the directory which will hold the commit log data.

Note

This should be on a separate partition from the data directory. See data_file_directories for more details.

saved_caches_directory

Specifies a directory for saved caches.

commitlog_rotation_threshold_in_mb

The size to which the commit log will grow before creating a new commit log segment.

commitlog_sync

The method that Cassandra will use to acknowledge writes. The default, periodic, is used in conjunction with commitlog_sync_period_in_ms to control how often the commit log is synced to disk. In periodic mode, writes are acknowledged immediately, without waiting for the next sync.

commitlog_sync_period_in_ms

How often, in milliseconds, to sync the commit log to disk when commitlog_sync is set to periodic.

flush_largest_memtables_at

Sets a threshold based on Java heap usage for flushing the largest memtables to disk and thereby freeing memory. Whenever the heap usage after a full (CMS) garbage collection is above this fraction of the max, Cassandra will flush the largest memtables. Defaults to 0.75. Set to 1.0 to disable.

This parameter is part of a best-effort, “emergency pressure valve” feature set to help prevent sudden OOM events. Do not rely on this for your primary tuning mechanism. It is most effective under light to moderate load, or read-heavy workloads.

reduce_cache_sizes_at

Sets a threshold based on Java heap usage for reducing cache size to the capacity specified in reduce_cache_capacity_to (see below). Whenever the heap usage after a full (CMS) garbage collection is above this fraction of the max, Cassandra will reduce cache maximum capacity to the specified fraction of the current size. Defaults to 0.85. Set to 1.0 to disable.

reduce_cache_capacity_to

Sets the fraction of the current cache size to which maximum cache capacity is reduced when heap usage reaches the threshold defined by reduce_cache_sizes_at (see above). Together with flush_largest_memtables_at, these parameters are part of a best-effort, “emergency pressure valve” feature set to help prevent sudden OOM events.
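The three pressure-valve settings interact as sketched below in Python. This is a simplified model of the checks described above, not Cassandra's code: the class and method names are hypothetical, and in Cassandra the checks run after a full (CMS) garbage collection.

```python
from dataclasses import dataclass


@dataclass
class PressureValve:
    """Hypothetical model of the emergency pressure-valve thresholds."""
    flush_largest_memtables_at: float = 0.75
    reduce_cache_sizes_at: float = 0.85
    reduce_cache_capacity_to: float = 0.6

    def actions(self, heap_used_after_gc, heap_max, cache_size):
        """Return (flush_largest_memtables, new_cache_capacity or None)."""
        usage = heap_used_after_gc / heap_max
        # A threshold of 1.0 never triggers, because usage cannot exceed the max.
        flush = usage > self.flush_largest_memtables_at
        new_capacity = None
        if usage > self.reduce_cache_sizes_at:
            # Shrink maximum capacity to a fraction of the current size.
            new_capacity = round(cache_size * self.reduce_cache_capacity_to)
        return flush, new_capacity


if __name__ == "__main__":
    valve = PressureValve()
    print(valve.actions(heap_used_after_gc=900, heap_max=1000, cache_size=10000))
```

With the defaults, heap usage of 80% after GC only triggers the memtable flush, while 90% also shrinks a 10000-entry cache to a capacity of 6000.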

seeds

There must be one or more seeds listed for a working cluster. A seed is a node used as a gossip contact point for information regarding ring topology.

disk_access_mode

Controls if and how SSTable and index files are mapped into memory via the mmap system call. The default mode, auto, enables mmap for both on 64-bit JVMs, as does setting the option explicitly to mmap. The next option, mmap_index_only, uses mmap for just the index files (and is also the result of auto on a 32-bit JVM). The remaining option, standard, disables mmap usage.
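The mode resolution described above can be summarized as a small lookup. The Python function below is a hypothetical sketch of that decision table, not Cassandra code.

```python
def resolve_disk_access_mode(mode, jvm_is_64bit):
    """Return (mmap_data_files, mmap_index_files) for a disk_access_mode value."""
    if mode == "auto":
        # auto picks full mmap on 64-bit JVMs and index-only mmap on 32-bit JVMs.
        mode = "mmap" if jvm_is_64bit else "mmap_index_only"
    if mode == "mmap":
        return True, True
    if mode == "mmap_index_only":
        return False, True
    if mode == "standard":
        return False, False
    raise ValueError(f"unknown disk_access_mode: {mode}")


if __name__ == "__main__":
    print(resolve_disk_access_mode("auto", jvm_is_64bit=False))
```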

concurrent_reads

The number of reader threads available in the system. A general rule is to keep this twice the number of processor cores in the system.

concurrent_writes

Defines the number of writer threads available in the system. On systems with many cores (12 or higher), increasing the default of 32 may yield performance improvements.

memtable_flush_writers

Sets the number of memtable flush writer threads. These threads are blocked by disk I/O, and each one holds a memtable in memory while blocked. If you have a large heap and many data directories, you can increase this value for better flush performance. By default, there is one flush writer per data directory, so the default is 1 with the single default entry in data_file_directories.

sliced_buffer_size_in_kb

The buffer size to use for reading contiguous columns. This should match the size of the columns typically retrieved using query operations involving a slice predicate.

storage_port

The port used for internal cluster communications. Default is 7000.

listen_address

The bind address for other nodes to communicate with this node.

This can be left blank if the hostname is set (using /etc/hostname, for example), DNS resolution is configured, and the address associated with the hostname is the correct one to use. In this case, the result of Java’s InetAddress.getLocalHost() is used. If your environment allows for this, it can help to make the configuration for all nodes the same, eliminating one potential source of configuration error. This also helps to ensure the correct interface is used.

Unlike rpc_address, you may not set this to 0.0.0.0. See the FAQ entry on the topic for more details.

rpc_address

The address to which the Thrift API calls will be bound. For users that want all interfaces to listen for Thrift, the value 0.0.0.0 may be used. Leaving this value blank has the same effect as for listen_address.

rpc_port

The port to which the Thrift service will be bound.

rpc_keepalive

Enable or disable keepalive on RPC connections.

rpc_send_buff_size_in_bytes

Uncomment and provide a value in bytes to set the socket send buffer size on RPC connections.

rpc_recv_buff_size_in_bytes

Uncomment and provide a value in bytes to set the socket receive buffer size on RPC connections.

thrift_framed_transport_size_in_mb

The frame size, in megabytes, for Thrift’s framed transport. Setting this to 0 disables framing in favor of an unframed socket transport. Either way, the transport configured here must match the client-side configuration.

thrift_max_message_length_in_mb

Sets the maximum length of a thrift message, including all fields and internal thrift overhead.

snapshot_before_compaction

Defines whether or not to take a snapshot before each compaction. Be careful using this option, since Cassandra won’t clean up the snapshots for you. This can be useful to back up data when there is a data format change.

compaction_thread_priority

Change this to increase the compaction thread’s priority. In Java, 1 is the lowest available priority setting.

binary_memtable_throughput_in_mb

The memory to be consumed for BinaryMemtables (used in bulk-loading).

column_index_size_in_kb

Column indexes are added to a row after the data reaches this size. This usually happens if there are a large number of columns in a row or the column values themselves are large. If you consistently read only a few columns from each row, this should be kept small as it denotes how much of the row data must be deserialized to read the column.

in_memory_compaction_limit_in_mb

Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process; when this happens, a message is logged specifying the row key. Generally, a value of 5 to 10% of the available Java heap size is reasonable.
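The rule of thumb and the spill behavior above can be expressed as two small helpers. Both names are hypothetical, and the exact boundary comparison is an assumption of this sketch.

```python
def suggested_limit_range_mb(heap_mb):
    """Rule of thumb from above: 5 to 10% of the Java heap, in whole MB."""
    return heap_mb // 20, heap_mb // 10


def compaction_path(row_size_mb, in_memory_compaction_limit_in_mb=64):
    """Rows above the limit spill to disk and use two-pass compaction."""
    if row_size_mb > in_memory_compaction_limit_in_mb:
        return "two-pass"
    return "in-memory"


if __name__ == "__main__":
    print(suggested_limit_range_mb(4096))
    print(compaction_path(100))
```

For a 4 GB (4096 MB) heap, the rule of thumb suggests a limit of roughly 204 to 409 MB.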

rpc_timeout_in_ms

The time that a node will wait on a reply from other nodes before the command is failed.

phi_convict_threshold

The phi value that the phi accrual failure detector must reach before a node is marked as down.

Usually, the default value of 8 is fine. In environments with flaky networks (such as Amazon EC2, at times), this may need to be increased to 9 or 10 to help prevent a node being erroneously marked down.
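To see what the threshold means in wall-clock terms, the Python sketch below computes phi under the assumption that heartbeat inter-arrival times are exponentially distributed. This is the model Cassandra’s detector is understood to use; treat it as an assumption. Under that model, phi = -log10(P(no heartbeat yet)) = elapsed / (mean interval * ln 10). Function names are hypothetical.

```python
import math
from statistics import mean


def phi(elapsed_ms, intervals_ms):
    """Suspicion level for a node, given the time since its last heartbeat
    and recently observed heartbeat inter-arrival intervals."""
    if not intervals_ms:
        return 0.0
    m = mean(intervals_ms)
    # P(heartbeat still pending after elapsed_ms) = exp(-elapsed_ms / m),
    # so phi = -log10(exp(-elapsed_ms / m)) = elapsed_ms / (m * ln 10).
    return elapsed_ms / (m * math.log(10))


def is_convicted(elapsed_ms, intervals_ms, phi_convict_threshold=8):
    """Mark the node down once phi exceeds the configured threshold."""
    return phi(elapsed_ms, intervals_ms) > phi_convict_threshold


if __name__ == "__main__":
    heartbeats = [1000] * 10  # assume roughly one heartbeat per second
    for elapsed in (5000, 18000, 19000):
        print(elapsed, round(phi(elapsed, heartbeats), 2), is_convicted(elapsed, heartbeats))
```

Under this model, with heartbeats arriving about once per second, a threshold of 8 convicts a node after roughly 18 seconds of silence, while a threshold of 10 stretches that to about 23 seconds.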

endpoint_snitch

Sets the snitch to use for locating nodes and routing requests. In deployments with rack-aware placement strategies, use either RackInferringSnitch or PropertyFileSnitch. See Snitches for more details.

dynamic_snitch

When set to true (default), enables the dynamic snitch layer that monitors read latency and, when possible, routes requests away from poorly-performing nodes.

dynamic_snitch_update_interval_in_ms

Sets how often, in milliseconds, the dynamic snitch recalculates host scores from observed read latency.

dynamic_snitch_reset_interval_in_ms

Sets the interval, in milliseconds, at which all host scores are reset, allowing a bad node to recover.

dynamic_snitch_badness_threshold

Sets a performance threshold for dynamically routing requests away from a node. A value of 0.2 means Cassandra would continue to prefer the static snitch values until the host was 20% worse than the fastest host.
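One way to read the badness threshold is sketched below in Python: keep the static snitch’s ordering unless the statically preferred host scores more than the threshold worse than the best host, and sort purely by score when the threshold is 0. This is a simplified, hypothetical model (lower score means faster); Cassandra’s actual scoring is more involved.

```python
def order_replicas(static_order, scores, badness_threshold):
    """static_order: hosts as ranked by the static snitch.
    scores: latency-based score per host, lower is better."""
    # sorted() is stable, so ties keep their static order.
    by_score = sorted(static_order, key=lambda host: scores[host])
    if badness_threshold == 0:
        return by_score
    best = scores[by_score[0]]
    preferred = scores[static_order[0]]
    if preferred > best * (1 + badness_threshold):
        return by_score
    return list(static_order)


if __name__ == "__main__":
    scores = {"a": 1.1, "b": 1.0, "c": 2.0}
    print(order_replicas(["a", "b", "c"], scores, 0.2))
```

Here host a is only 10% slower than host b, so a threshold of 0.2 keeps the static order, whereas a threshold of 0.05 would route to b first.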

request_scheduler

Can be set to a class that implements RequestScheduler, which will schedule incoming client requests according to the specific policy. This is useful for multi-tenancy (multiple keyspaces) with a single Cassandra cluster. This scheduler affects only requests from the client, not inter-node communication.

By default this is set to org.apache.cassandra.scheduler.NoScheduler. Also available is org.apache.cassandra.scheduler.RoundRobinScheduler.

request_scheduler_id

Specifies an identifier used in performing request scheduling. Currently, the only valid option is keyspace.

index_interval

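The index interval determines how densely row keys are sampled from each SSTable’s index: one out of every index_interval keys is held in memory. A smaller interval samples more keys, which makes lookups more efficient at a cost in memory; a larger interval saves memory but means more index entries must be scanned per lookup.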
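The trade-off can be illustrated with a greatly simplified Python model, in which the SSTable index is a sorted list of keys and every index_interval-th key is sampled into memory. A lookup binary-searches the sample and then scans at most index_interval entries of the full index. All names here are hypothetical; a real index also stores file offsets.

```python
import bisect


def build_sample(sorted_keys, index_interval=128):
    """Keep every index_interval-th key, with its position, in memory."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), index_interval)]


def find_position(sorted_keys, sample, key):
    """Return (position or None, number of index entries scanned)."""
    sample_keys = [k for k, _ in sample]
    i = bisect.bisect_right(sample_keys, key) - 1
    if i < 0:
        return None, 0  # key sorts before the first sampled key
    start = sample[i][1]
    end = sample[i + 1][1] if i + 1 < len(sample) else len(sorted_keys)
    scanned = 0
    for pos in range(start, end):
        scanned += 1
        if sorted_keys[pos] == key:
            return pos, scanned
        if sorted_keys[pos] > key:
            break
    return None, scanned


if __name__ == "__main__":
    keys = [f"key{i:05d}" for i in range(1000)]
    sample = build_sample(keys, 128)
    print(len(sample), find_position(keys, sample, "key00300"))
```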

Keyspace Required Elements

A keyspace has a user-defined name, a replication factor and a replica placement strategy.

Though it is possible to configure keyspaces in cassandra.yaml, the preferred method is to configure them dynamically through the API and the cassandra-cli utility. The elements in this section are required for a valid keyspace.

replica_placement_strategy

Defines how replicas are placed on physical hardware.

The default, org.apache.cassandra.locator.SimpleStrategy, places the first replica on the first node whose token is equal to or follows the key’s token (as determined by the partitioner), wrapping around the ring if necessary, and places additional replicas on subsequent nodes along the ring in increasing token order.
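The ring walk that SimpleStrategy performs can be sketched in a few lines of Python. This is a hypothetical model: the ring is a plain dictionary from token to node name, and the error raised when the replication factor exceeds the node count is this sketch’s own choice.

```python
import bisect


def simple_strategy_replicas(ring_tokens, key_token, replication_factor):
    """ring_tokens: dict mapping token -> node. Returns the replica nodes in order."""
    tokens = sorted(ring_tokens)
    if replication_factor > len(tokens):
        raise ValueError("replication_factor exceeds the number of nodes")
    # First node whose token is >= the key's token, wrapping to the start of the ring.
    start = bisect.bisect_left(tokens, key_token) % len(tokens)
    # Remaining replicas go to the following nodes in increasing token order.
    return [ring_tokens[tokens[(start + i) % len(tokens)]] for i in range(replication_factor)]


if __name__ == "__main__":
    ring = {0: "A", 100: "B", 200: "C", 300: "D"}
    print(simple_strategy_replicas(ring, key_token=150, replication_factor=3))
```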

With NetworkTopologyStrategy, for each datacenter you can specify how many replicas you want on a per-keyspace basis. Replicas are placed on different racks within each datacenter, if possible. This strategy requires a rack aware snitch, such as RackInferringSnitch or PropertyFileSnitch.

OldNetworkTopologyStrategy places one replica in each of two datacenters, and the third on a different rack in the first. Additional datacenters are not guaranteed to get a replica. Additional replicas after three are placed in ring order after the third without regard to rack or datacenter.

See the section on Replication for more information.

replication_factor

The number of copies of data to keep in the cluster. The default of one does not mean “make one additional copy”; it means there is only one copy of the data. Thus, to have three-way redundancy, the replication factor should be three.

Keyspace Optional Elements

These elements are optional for a keyspace.

column_families

A keyspace does not strictly require column families in order to exist as a valid keyspace. In this sense, column families are optional. In a practical sense, most useful keyspaces will contain column families, whose elements are detailed below.

strategy_options

Values provided for strategy_options are used with the NetworkTopologyStrategy replica placement strategy for defining how many replicas to place in each datacenter.

To define strategy options, use the Cassandra CLI, noting that the command syntax for them is slightly different from other attributes. Note the use of square brackets and curly braces in this example for two data centers:

[default@twissandra] update keyspace twissandra with strategy_options=[{DC1:3,  DC2:2}];
9729e5e9-654c-11e0-bd77-ce01f0565ccc
Waiting for schema agreement...
... schemas agree across the cluster

ColumnFamily Required Elements

Column families contain rows and columns, and are roughly analogous to tables in the relational model.

These elements are required for a valid column family.

name

Every column family must have a name.

compare_with

This attribute defines the sort algorithm which will be used to compare columns. Users may customize this behavior by extending org.apache.cassandra.db.marshal.AbstractType. The different values available for compare_with are detailed below:

Type Description
BytesType Simple non-validating byte comparison (Default)
AsciiType Similar to BytesType, but validates that input is US-ASCII
UTF8Type UTF-8 encoded string comparison
LongType Compares values as 64 bit longs
LexicalUUIDType 128 bit UUID compared by byte value
TimeUUIDType Version 1 (time-based) 128 bit UUID, compared by timestamp
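The choice of comparator matters even when the stored bytes are identical. The Python sketch below encodes three numbers as 8-byte big-endian signed longs and sorts them two ways. Under BytesType-style byte comparison, the encoding of -1 (all 0xff bytes) sorts last. Under LongType-style comparison, the values are ordered numerically. The helper name is hypothetical.

```python
import struct


def long_type_key(value: bytes) -> int:
    """Interpret an 8-byte column name as a signed big-endian 64-bit integer."""
    if len(value) != 8:
        raise ValueError("LongType values must be exactly 8 bytes")
    return struct.unpack(">q", value)[0]


names = [struct.pack(">q", n) for n in (5, -1, 300)]
by_bytes = sorted(names)                    # raw byte order, as with BytesType
by_long = sorted(names, key=long_type_key)  # numeric order, as with LongType

if __name__ == "__main__":
    print([long_type_key(b) for b in by_bytes])
    print([long_type_key(b) for b in by_long])
```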

column_type

Defaults to “Standard” for regular columns. For super columns, use “Super”.

compare_subcolumns_with

Required when column_type is “Super”. Same as compare_with but for sub-columns of a SuperColumn.

For attributes of columns, see column_metadata.

ColumnFamily Optional Elements

The following elements are optional for column families.

keys_cached

Defines how many key locations will be kept in memory per SSTable (see rows_cached for details on caching actual row values). This can be a fixed size number, a percentage, or a fraction. To specify a percentage or fraction, use “50%” or “0.5” respectively.

rows_cached

Specifies how many rows to cache in memory. Using rows_cached means that the whole row is cached in memory. This can actually be detrimental to performance in cases where rows are large or frequently modified or removed. The same syntax rules for defining keys_cached apply here.

comment

A human readable comment for a column family.

read_repair_chance

Specifies the probability with which read repairs should be invoked on non-quorum reads. Must be between 0 and 1. Defaults to 1.0 (always perform read repair). If the system is performing more reads than writes, lowering this value may improve throughput.

gc_grace_seconds

Specifies the time to wait before garbage collecting tombstones (deletion markers). Defaults to 864000, or 10 days, which allows a great deal of time for consistency to be achieved prior to deletion. In many deployments this interval can be reduced, and in a single-node cluster it can be safely set to zero.
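The grace-period rule amounts to a simple age check on each tombstone, sketched below in Python. The function name is hypothetical, and whether the boundary is strict is an assumption of this sketch; in Cassandra, eligible tombstones are actually removed during compaction.

```python
import time


def tombstone_purgeable(deleted_at_seconds, gc_grace_seconds=864000, now_seconds=None):
    """True once a tombstone is older than gc_grace_seconds and may be
    discarded the next time its SSTable is compacted."""
    if now_seconds is None:
        now_seconds = time.time()
    return now_seconds - deleted_at_seconds > gc_grace_seconds


if __name__ == "__main__":
    ten_days = 864000
    print(tombstone_purgeable(0, now_seconds=ten_days - 1))  # still within grace
    print(tombstone_purgeable(0, now_seconds=ten_days + 1))  # eligible for removal
```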

default_validation_class

Specifies a validator class to use for validating all the column values in the column family. Valid values are AsciiType, BytesType, IntegerType, LexicalUUIDType, LongType, TimeUUIDType, and UTF8Type. It is possible to implement additional validators by creating custom validation classes.

min_compaction_threshold

Sets the minimum number of SSTables to trigger a minor compaction. Raising this value causes minor compactions to start less frequently and be more intensive. Setting this to 0 disables minor compactions. Defaults to 4.

max_compaction_threshold

Sets the number of SSTables allowed before a minor compaction is forced. Decreasing this will cause minor compactions to start more frequently and be less intensive. Setting this to 0 disables minor compactions. Defaults to 32.

row_cache_save_period_in_seconds

Sets the number of seconds between saving row caches. The row caches can be saved periodically, and if one exists on startup it will be loaded.

key_cache_save_period_in_seconds

Sets the number of seconds between saving key caches. The key caches can be saved periodically, and if one exists on startup it will be loaded.

memtable_flush_after_mins

Flush a memtable after this many minutes regardless of other memtable settings. This value should not be set too large, as the commit log segments of unflushed column families cannot be deleted. Setting it too low could trigger too many flushes, which would greatly impact I/O performance.

memtable_throughput_in_mb

Flush a memtable after this much data has been inserted or updated. Actual heap usage will be greater than this due to overhead from column indexing. This setting must be tuned carefully, as there is one memtable per column family.

memtable_operations_in_millions

Like memtable_throughput_in_mb, this setting is per memtable, but it defines the total number of columns, in millions, that will be kept in memory regardless of data size. Tune it in conjunction with memtable_throughput_in_mb: whichever threshold is reached first triggers a memtable flush.
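The three flush triggers combine as an “any of” check, which the Python sketch below models. The class is hypothetical, the default operation count follows the formula in the optional attributes table (throughput / 64 * 0.3), and the use of “at or above” for each threshold is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class MemtableThresholds:
    """Hypothetical model of the per-column-family memtable flush triggers."""
    throughput_in_mb: float
    operations_in_millions: float
    flush_after_mins: float = 60

    @classmethod
    def with_default_operations(cls, throughput_in_mb, flush_after_mins=60):
        # Default from the attribute table: throughput / 64 * 0.3
        return cls(throughput_in_mb, throughput_in_mb / 64 * 0.3, flush_after_mins)

    def should_flush(self, data_mb, operations, age_mins):
        """Whichever threshold is reached first triggers the flush."""
        return (data_mb >= self.throughput_in_mb
                or operations >= self.operations_in_millions * 1_000_000
                or age_mins >= self.flush_after_mins)


if __name__ == "__main__":
    t = MemtableThresholds.with_default_operations(64)
    print(t.operations_in_millions)           # default operations threshold
    print(t.should_flush(10, 300_001, 1))     # operation count reached first
```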

column_metadata

Column metadata defines attributes of a column. A name is required for each entry; if no validator is specified, the default validator for the column family is used. Note that the optional index_name and index_type must be set together to successfully create a secondary index for a column.

Name Description
name Binds a validator and (optionally) an indexer to a column.
validator Abstract type (like compare_with) to check the column value.
index_name Name for the secondary index.
index_type Type of index. Currently the only valid value is KEYS.

Setting and updating column metadata with the Cassandra CLI requires a slightly different command syntax than other attributes; note the brackets in this example:

[default@demo] update column family users with comparator=UTF8Type
... and column_metadata=[{column_name: full_name, validation_class: UTF8Type, index_type: KEYS}];