DataStax Developer Blog

Configuration changes in Cassandra 1.2

By Jonathan Ellis - December 5, 2012

Cassandra 1.2 brings a number of new and improved configuration options that are worth knowing about.

Request timeouts

We’ve split the old rpc_timeout_in_ms setting into separate timeouts for single-row reads, range scans, writes, truncation, and miscellaneous other requests. This gives you finer-grained control over timeouts; in particular, range queries tend to take longer than other requests, and truncate requires a flush, so it is slower as well.

We’ve left the defaults alone for all of these but truncate, which was extended to 60s. (Incidentally, in 1.2 truncate only needs to flush the table being emptied, not every table in the cluster.)
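To make the split concrete, here is a rough Python sketch of a per-request-type timeout lookup. Apart from rpc_timeout_in_ms, the setting names are written from memory rather than quoted from this post, and the 10-second value for the unchanged defaults is an assumption; only the 60s truncate value is stated above. The `timeout_for` helper and its `overrides` argument are hypothetical and exist only for illustration, so check your own cassandra.yaml for the real names and values.

```python
# Illustrative only: a lookup from request type to per-type timeout (ms).
# Setting names and the 10000 ms figures are assumptions, not copied from
# a shipped cassandra.yaml; only the 60 s truncate value is stated above.
TIMEOUTS_MS = {
    "read": ("read_request_timeout_in_ms", 10000),
    "range": ("range_request_timeout_in_ms", 10000),
    "write": ("write_request_timeout_in_ms", 10000),
    "truncate": ("truncate_request_timeout_in_ms", 60000),
}
# Catch-all for everything else (the "miscellanea" bucket).
DEFAULT = ("request_timeout_in_ms", 10000)


def timeout_for(request_type, overrides=None):
    """Return (setting_name, timeout_ms) for a request type.

    `overrides` is a hypothetical dict of setting name -> ms, standing in
    for values an operator has changed in cassandra.yaml.
    """
    name, default_ms = TIMEOUTS_MS.get(request_type, DEFAULT)
    value = (overrides or {}).get(name, default_ms)
    return name, value
```

For example, `timeout_for("range", {"range_request_timeout_in_ms": 20000})` models an operator giving range scans more headroom without touching the timeouts for reads or writes.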

Improved recovery from request overload

Cassandra deals with request overload by dropping requests that have fallen so far behind that they time out before being processed. Prior to Cassandra 1.2, each replica tracked request timeouts locally, starting the clock when the request reached the replica; in other words, it assumed that setting up the request on the coordinator was instantaneous. But if the coordinator is also overloaded, which is often the case, that assumption does not hold.

For 1.2 we’ve added the cross_node_timeout option, which lets replicas account for the time a request has already spent on the coordinator when deciding whether it has timed out. It is off by default, since it requires your Cassandra cluster’s clocks to be synchronized. If you run NTP or otherwise synchronize your clocks, go ahead and turn cross-node timeouts on.
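The difference between the two ways of measuring a request's age can be sketched in a few lines of Python. This is a simplified model, not Cassandra's actual code: the function name `should_drop` and all of its parameters are hypothetical, and timestamps are plain millisecond integers.

```python
def should_drop(now_ms, received_ms, created_ms, timeout_ms, cross_node_timeout):
    """Decide whether a replica should drop a queued request.

    received_ms: when the request arrived at this replica (local clock).
    created_ms:  when the coordinator created it (coordinator's clock).
    """
    # With cross-node timeouts on, age is measured from creation on the
    # coordinator, which is only meaningful if clocks are synchronized.
    start_ms = created_ms if cross_node_timeout else received_ms
    return now_ms - start_ms > timeout_ms
```

Suppose a request is created at t=0, sits in an overloaded coordinator until it reaches the replica at t=8000, and is examined at t=11000 with a 10000 ms timeout. Measured locally it looks only 3000 ms old and is still processed, even though the client has already given up; measured from the coordinator's timestamp it is 11000 ms old and is dropped.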

End-to-end encryption

Cassandra has supported SSL between cluster nodes since 0.8. Now we’re extending that to client connections as well. Look for client_encryption_options in cassandra.yaml.

Bloom filters

Cassandra uses bloom filters in its log-structured storage engine to avoid scanning data files that can’t possibly include the partitions being queried.

Bloom filters are configured on a per-table basis, not globally like the above options. Compaction is also configured per-table.

Since leveled compaction does such a good job of minimizing the number of sstables that a given data partition can be spread across, we don’t need to be quite so aggressive with the bloom filters we create. By default, Cassandra 1.2 will use a bloom filter false positive chance of 0.1 for tables using leveled compaction, and 0.01 for tables using size-tiered compaction. This results in memory savings of about 50% for the bloom filters on leveled-compaction tables.
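The 50% figure follows from the standard bloom filter sizing formula, where an ideally sized filter needs -ln(p) / (ln 2)^2 bits per key for a false positive chance p. The Python sketch below works through that arithmetic; it uses the textbook formula as an approximation, and Cassandra's own sizing may round differently. The function name `bits_per_key` is just for illustration.

```python
import math


def bits_per_key(fp_chance):
    """Bits per stored key for an ideally sized bloom filter."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


size_tiered = bits_per_key(0.01)  # roughly 9.6 bits per key
leveled = bits_per_key(0.1)       # roughly 4.8 bits per key
savings = 1 - leveled / size_tiered
print(f"{size_tiered:.2f} vs {leveled:.2f} bits/key, savings {savings:.0%}")
```

Because ln(0.1) is exactly half of ln(0.01), raising the false positive chance from 0.01 to 0.1 halves the bits needed per key, which is where the savings come from.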

Others

We’ve blogged about some other configuration changes in longer articles:



Comments

  1. MK says:

    I need some clarifications regarding the read_request_timeout_in_ms setting.

    According to the client request documentation, the read request can be of two types: External request or Background repair request.

    Q1: Is this timeout imposed on both types of requests, and what happens in each case?

    Now, focusing on just the external reads. Again, in the documentation linked above, it says that during the read, a background process is kicked off to maintain consistency.

    Q2: For an external read request, does the timeout include the time taken for the background process?

    I am asking these questions because I want to impose a timeout on each read request, but I don’t want it to affect any other background process linked to reads.

    I also posted a question about the read request timeout setting on Stack Overflow: http://stackoverflow.com/questions/19014524/read-request-timeout-in-ms-in-cassandra
