Apache Cassandra 1.1 Documentation

cassandra-stress

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your version of Cassandra.

The cassandra-stress tool is a Java-based stress testing utility for benchmarking and load testing a Cassandra cluster. The binary installation of the tool also includes a daemon, which in larger-scale testing can prevent potential skews in the test results by keeping the JVM warm.

There are different modes of operation:

  • Inserting: Loads test data.
  • Reading: Reads test data.
  • Indexed range slicing: Works with RandomPartitioner on indexed column families.

You can use these modes with or without the cassandra-stressd daemon running (binary installs only).

Usage

  • Packaged installs: cassandra-stress [options]
  • Binary installs: <install_location>/tools/bin/cassandra-stress [options]

The available options are:

Long Option

Short Option

Description

--average-size-values

-V

Generate column values of average rather than specific size.

--cardinality <CARDINALITY>

-C <CARDINALITY>

Number of unique values stored in columns. Default is 50.

--columns <COLUMNS>

-c <COLUMNS>

Number of columns per key. Default is 5.

--column-size <COLUMN-SIZE>

-S <COLUMN-SIZE>

Size of column values in bytes. Default is 34.

--compaction-strategy <COMPACTION-STRATEGY>

-Z <COMPACTION-STRATEGY>

Specifies which compaction strategy to use.

--comparator <COMPARATOR>

-U <COMPARATOR>

Specifies which column comparator to use. Supported types are: TimeUUIDType, AsciiType, and UTF8Type.

--compression <COMPRESSION>

-I <COMPRESSION>

Specifies the compression to use for SSTables. Default is no compression.

--consistency-level <CONSISTENCY-LEVEL>

-e <CONSISTENCY-LEVEL>

Consistency level to use (ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL, ANY). Default is ONE.

--create-index <CREATE-INDEX>

-x <CREATE-INDEX>

Type of index to create on column families (KEYS).

--enable-cql

-L

Perform queries using CQL (Cassandra Query Language).

--family-type <TYPE>

-y <TYPE>

Sets the column family type.

--file <FILE>

-f <FILE>

Write output to a given file.

--help

-h

Show help.

--keep-going

-k

Ignore errors when inserting or reading. When set, --keep-trying has no effect. Default is false.

--keep-trying <KEEP-TRYING>

-K <KEEP-TRYING>

Retry a failed operation up to N times. N must be a positive integer. Default is 10.

--keys-per-call <KEYS-PER-CALL>

-g <KEYS-PER-CALL>

Number of keys to retrieve per call (applies to range slice and multiget operations). Default is 1000.

--nodes <NODES>

-d <NODES>

Nodes to perform the test against. Must be comma separated with no spaces. Default is localhost.

--nodesfile <NODESFILE>

-D <NODESFILE>

File containing host nodes (one per line).

--no-replicate-on-write

-W

Set replicate_on_write to false for counters. Only for counters with a consistency level of ONE (CL=ONE).

--num-different-keys <NUM-DIFFERENT-KEYS>

-F <NUM-DIFFERENT-KEYS>

Number of different keys. If less than NUM-KEYS, the same key is re-used multiple times. Default is NUM-KEYS.

--num-keys <NUM-KEYS>

-n <NUM-KEYS>

Number of keys to write or read. Default is 1,000,000.

--operation <OPERATION>

-o <OPERATION>

Operation to perform: INSERT, READ, INDEXED_RANGE_SLICE, MULTI_GET, COUNTER_ADD, COUNTER_GET. Default is INSERT.

--port <PORT>

-p <PORT>

Thrift port. Default is 9160.

--progress-interval <PROGRESS-INTERVAL>

-i <PROGRESS-INTERVAL>

The interval, in seconds, at which progress is output. Default is 10 seconds.

--query-names <QUERY-NAMES>

-Q <QUERY-NAMES>

Comma-separated list of column names to retrieve from each row.

--random

-r

Use the random key generator. When this option is set, --stdev has no effect. Default is false.

--replication-factor <REPLICATION-FACTOR>

-l <REPLICATION-FACTOR>

Replication factor to use when creating column families. Default is 1.

--replication-strategy <REPLICATION-STRATEGY>

-R <REPLICATION-STRATEGY>

Replication strategy to use. Applies only to inserts, and only when the keyspace does not already exist. Default is SimpleStrategy.

--send-to <SEND-TO>

-T <SEND-TO>

Sends the command as a request to the cassandra-stressd daemon at the specified IP address. The daemon must already be running at that address.

--skip-keys <SKIP-KEYS>

-N <SKIP-KEYS>

Fraction of keys to skip initially. Default is 0.

--stdev <STDEV>

-s <STDEV>

Standard deviation used by the default (non-random) key generator. Has no effect when --random is set. Default is 0.1.

--strategy-properties <STRATEGY-PROPERTIES>

-O <STRATEGY-PROPERTIES>

Replication strategy properties in the following format: <dc_name>:<num>,<dc_name>:<num>,... For use with NetworkTopologyStrategy.

--threads <THREADS>

-t <THREADS>

Number of threads to use. Default is 50.

--unframed

-m

Use unframed transport. Default is false.

--use-prepared-statements

-P

(CQL only) Perform queries using prepared statements.
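
Two of the options above take values in a fixed textual format: --strategy-properties expects <dc_name>:<num> pairs separated by commas, and --nodesfile expects a file with one host per line. The following is a minimal Python sketch that produces both formats. The function names and the data center names DC1 and DC2 are hypothetical and are not part of cassandra-stress; the check that rejects ':' and ',' in data center names is an assumption based only on the format shown above.

```python
def format_strategy_properties(replicas_per_dc):
    """Render {dc_name: replica_count} as '<dc_name>:<num>,<dc_name>:<num>,...'."""
    parts = []
    for dc_name, num in replicas_per_dc.items():
        # Assumption: ':' and ',' are the separators, so they cannot appear in a name.
        if ":" in dc_name or "," in dc_name:
            raise ValueError(f"data center name may not contain ':' or ',': {dc_name!r}")
        parts.append(f"{dc_name}:{int(num)}")
    return ",".join(parts)


def nodes_file_text(hosts):
    """Render hosts as the contents of a --nodesfile file: one host per line."""
    return "".join(f"{host.strip()}\n" for host in hosts)


if __name__ == "__main__":
    # Value to pass to --strategy-properties (hypothetical data center names).
    print(format_strategy_properties({"DC1": 3, "DC2": 2}))
    # Text to write to the file named by --nodesfile.
    print(nodes_file_text(["192.168.1.101", "192.168.1.102"]), end="")
```

The first result would be passed as the value of --strategy-properties together with a NetworkTopologyStrategy replication strategy; the second would be written to the file named by --nodesfile.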

Using the Daemon Mode

Usage for the daemon mode in binary installs:

<install_location>/tools/bin/cassandra-stressd start|stop|status [-h <host>]

During stress testing, you can keep the daemon running and send it commands using the --send-to option.
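
As a rough illustration of the usage line above, this Python sketch builds the argument list for each daemon action. The function name stressd_command is hypothetical; only the start, stop, and status actions and the -h <host> argument come from the usage line, and the executable path is left as a parameter because it depends on the install location.

```python
VALID_ACTIONS = ("start", "stop", "status")


def stressd_command(action, host=None, executable="cassandra-stressd"):
    """Build the argument list for: cassandra-stressd start|stop|status [-h <host>]."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    argv = [executable, action]
    if host is not None:
        argv += ["-h", host]
    return argv


if __name__ == "__main__":
    # A typical lifecycle: start the daemon, check it, and stop it after testing.
    for action in VALID_ACTIONS:
        print(" ".join(stressd_command(action)))
```

Each list can be passed to subprocess.run on a machine where the binary install is present.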

Examples

  • Insert 1,000,000 rows into the given host:

    /tools/bin/cassandra-stress -d 192.168.1.101

    When the number of rows is not specified, one million rows are inserted.

  • Read 1,000,000 rows from the given host:

    tools/bin/cassandra-stress -d 192.168.1.101 -o read

  • Insert 10,000,000 rows across two nodes:

    /tools/bin/cassandra-stress -d 192.168.1.101,192.168.1.102 -n 10000000

  • Insert 10,000,000 rows across two nodes using the daemon mode:

    /tools/bin/cassandra-stress -d 192.168.1.101,192.168.1.102 -n 10000000 --send-to 54.0.0.1
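
When these commands are generated from a script, the main pitfall is the --nodes value, which must be comma separated with no spaces. The following Python sketch assembles the command lines shown in the examples. The function name build_stress_argv is hypothetical, and the executable defaults to the packaged-install name; for a binary install, pass the full path to the tool instead.

```python
import shlex


def build_stress_argv(nodes, operation=None, num_keys=None, send_to=None,
                      executable="cassandra-stress"):
    """Build a cassandra-stress argument list using only the -d, -o, -n and --send-to options."""
    if not nodes:
        raise ValueError("at least one node is required")
    for node in nodes:
        # The node list is comma separated with no spaces, so reject both inside a name.
        if not node or "," in node or any(ch.isspace() for ch in node):
            raise ValueError(f"invalid node name: {node!r}")
    argv = [executable, "-d", ",".join(nodes)]
    if operation is not None:
        argv += ["-o", operation]
    if num_keys is not None:
        argv += ["-n", str(int(num_keys))]
    if send_to is not None:
        argv += ["--send-to", send_to]
    return argv


if __name__ == "__main__":
    argv = build_stress_argv(["192.168.1.101", "192.168.1.102"],
                             num_keys=10_000_000, send_to="54.0.0.1")
    # shlex.join (Python 3.8+) renders the list as a single shell command line.
    print(shlex.join(argv))
```

Keeping the arguments as a list and passing that list to subprocess.run avoids shell quoting problems; shlex.join is used here only for display.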

Interpreting the output of cassandra-stress

The cassandra-stress tool periodically outputs information about the running tests. For example:

7251,725,725,56.1,95.1,191.8,10
19523,1227,1227,41.6,86.1,189.1,21
41348,2182,2182,22.5,75.7,176.0,31
...

Each line reports data for the interval since the previous line. The interval length is set by the --progress-interval option (default 10 seconds). The fields are:

  • total: the total number of operations since the start of the test.
  • interval_op_rate: the number of operations performed per second during the interval.
  • interval_key_rate: the number of keys/rows read or written per second during the interval (normally the same as interval_op_rate unless doing range slices).
  • latency: the average latency for each operation during that interval.
  • elapsed: the number of seconds elapsed since the beginning of the test.
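
To post-process this output, each line can be split on commas. The following Python sketch parses the sample lines shown above. Note that the sample lines contain seven fields while the list describes five: this sketch assumes the first three fields and the last field are as described, and collects everything in between as latency figures without interpreting them (the extra values may be percentile latencies, but the text does not say). The names ProgressLine, parse_progress_line, and operations_per_interval are hypothetical.

```python
from typing import List, NamedTuple


class ProgressLine(NamedTuple):
    total: int
    interval_op_rate: int
    interval_key_rate: int
    latencies: List[float]  # every field between interval_key_rate and elapsed
    elapsed: int


def parse_progress_line(line):
    """Parse one comma-separated progress line printed by cassandra-stress."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    return ProgressLine(
        total=int(fields[0]),
        interval_op_rate=int(fields[1]),
        interval_key_rate=int(fields[2]),
        latencies=[float(field) for field in fields[3:-1]],
        elapsed=int(fields[-1]),
    )


def operations_per_interval(progress_lines):
    """Return the number of operations completed in each interval, from consecutive totals."""
    counts = []
    previous_total = 0
    for progress in progress_lines:
        counts.append(progress.total - previous_total)
        previous_total = progress.total
    return counts


if __name__ == "__main__":
    sample = [
        "7251,725,725,56.1,95.1,191.8,10",
        "19523,1227,1227,41.6,86.1,189.1,21",
        "41348,2182,2182,22.5,75.7,176.0,31",
    ]
    parsed = [parse_progress_line(line) for line in sample]
    print(operations_per_interval(parsed))
```

Because total is cumulative, subtracting consecutive totals gives the work done in each interval, which can be compared against interval_op_rate multiplied by the interval length.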