| The Cassandra 1.2 documentation is transitioning to a new format! Please use the new Cassandra 1.2 documentation instead. | Back to Table of Contents All Documents List |
When planning a Cassandra cluster deployment, you should have a good idea of the initial volume of data you plan to store and a good estimate of your typical application workload. After reading this section, it recommended that you read Anti-patterns in Cassandra.
As with any application, choosing appropriate hardware depends on selecting the right balance of the following resources: memory, CPU, disks, number of nodes, and network.
The more memory a Cassandra node has, the better read performance. More RAM allows for larger cache sizes and reduces disk I/O for reads. More RAM also allows memory tables (memtables) to hold more recently written data. Larger memtables lead to a fewer number of SSTables being flushed to disk and fewer files to scan during a read. The ideal amount of RAM depends on the anticipated size of your hot data.
Insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound. Cassandra is highly concurrent and uses as many CPU cores as available:
Disk space depends a lot on usage, so it's important to understand the mechanism. Cassandra writes data to disk when appending data to the commit log for durability and when flushing memtables to SSTable data files for persistent storage. SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and discarding old data. However, depending on the type of compaction_strategy and size of the compactions, compaction can substantially increase disk utilization and data directory volume. For this reason, you should leave an adequate amount of free disk space available on a node: 50% (worst case) for SizeTieredCompactionStrategy and large compactions, and 10% for LeveledCompactionStrategy. The following links provide information about compaction:
For information on calculating disk size, see Calculating usable disk capacity and Choosing node configuration options.
Recommendations:
Capacity and I/O: When choosing disks, consider both capacity (how much data you plan to store) and I/O (the write/read throughput rate). Some workloads are best served by using less expensive SATA disks and scaling disk capacity and I/O by adding more nodes (with more RAM).
Solid-state drives: SSDs are the recommended choice for Cassandra. Cassandra's sequential, streaming write patterns minimize the undesirable effects of write amplification associated with SSDs. This means that Cassandra deployments can take advantage of inexpensive consumer-grade SSDs. Enterprise level SSDs are not necessary because Cassandra's SSD access wears out consumer-grade SSDs in the same time frame as more expensive enterprise SSDs.
Number of disks - SATA: Ideally Cassandra needs at least two disks, one for the commit log and the other for the data directories. At a minimum the commit log should be on its own partition.
Commit log disk - SATA: The disk not need to be large, but it should be fast enough to receive all of your writes as appends (sequential I/O).
Data disks: Use one or more disks and make sure they are large enough for the data volume and fast enough to both satisfy reads that are not cached in memory and to keep up with compaction.
RAID on data disks: It is generally not necessary to use RAID for the following reasons:
RAID on the commit log disk: Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need the extra redundancy, use RAID 1.
Extended file systems: DataStax recommends deploying Cassandra on XFS. On ext2 or ext3, the maximum file size is 2TB even using a 64-bit kernel. On ext4 it is 16TB.
Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.
Prior to version 1.2, the recommended size of disk space per node was 300 to 500GB. Improvement to Cassandra 1.2, such as JBOD support, virtual nodes, off-heap Bloom filters, and parallel leveled compaction (SSD nodes only), allow you to use few machines with multiple terabytes of disk space.
Since Cassandra is a distributed data store, it puts load on the network to handle read/write requests and replication of data across nodes. Be sure to choose reliable, redundant network interfaces and make sure that your network can handle traffic between nodes without bottlenecksT.
Cassandra efficiently routes requests to replicas that are geographically closest to the coordinator node and chooses a replica in the same rack if possible; it always chooses replicas located in the same data center over replicas in a remote data center.
If using a firewall, make sure that nodes within a cluster can reach each other. See Configuring firewall port access.
Generally, when you have firewalls between machines, it is difficult to run JMX across a network and maintain security. This is because JMX connects on port 7199, handshakes, and then uses any port within the 1024+ range. Instead use SSH to execute commands remotely connect to JMX locally or use the DataStax OpsCenter.
DataStax provides an Amazon Machine Image (AMI) to allow you to quickly deploy a multi-node Cassandra cluster on Amazon EC2. The DataStax AMI initializes all nodes in one availability zone using the SimpleSnitch.
If you want an EC2 cluster that spans multiple regions and availability zones, do not use the DataStax AMI. Instead, install Cassandra on your EC2 instances as described in Installing Cassandra Debian packages, and then configure the cluster as a multiple data center cluster.
Use the following guidelines when setting up your cluster:
For production Cassandra clusters on EC2, use Large or Extra Large instances with local storage.
Amazon Web Service recently reduced the number of default ephemeral disks attached to the image from four to two. Performance will be slower for new nodes unless you manually attach the additional two disks; see Amazon EC2 Instance Store.
RAID 0 the ephemeral disks, and put both the data directory and the commit log on that volume. This has proved to be better in practice than putting the commit log on the root volume (which is also a shared resource). For more data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using EBS volumes to store your Cassandra backup files.
Cassandra JBOD support allows you to use standard disks, but you may get better throughput with RAID0. RAID0 splits every block to be on another device so that writes are written in parallel fashion instead of written serially on disk.
EBS volumes are not recommended for Cassandra data volumes for the following reasons:
For more information and graphs related to ephemeral versus EBS performance, see the blog article at http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/.
To calculate how much data your Cassandra nodes can hold, calculate the usable disk capacity per node and then multiply that by the number of nodes in your cluster. Remember that in a production cluster, you will typically have your commit log and data directories on different disks.
Start with the raw capacity of the physical disks:
raw_capacity = disk_size * number_of_data_disks
Account for file system formatting overhead (roughly 10 percent):
(raw_capacity * 0.9) = formatted_disk_space
During normal operations, Cassandra routinely requires disk capacity for compaction and repair operations. For optimal performance and cluster health, DataStax recommends not filling your disks to capacity, but running at 50% to 80% capacity depending on the compaction_strategy and size of the compactions. With this in mind, calculate the usable disk space as follows:
formatted_disk_space * (0.5 to 0.8) = usable_disk_space
As with all data storage systems, the size of your raw data will be larger once it is loaded into Cassandra due to storage overhead. On average, raw data is about two times larger on disk after it is loaded into the database, but could be much smaller or larger depending on the characteristics of your data and tables. The following calculations account for data persisted to disk, not for data stored in memory.
Column Overhead: Every column in Cassandra incurs 15 bytes of overhead. Since each row in a table can have different column names as well as differing numbers of columns, metadata is stored for each column. For counter columns and expiring columns, add an additional 8 bytes (23 bytes total). So the total size of a regular column is:
regular_total_column_size = column_name_size + column_value_size + 15
counter-expiring_total_column_size = column_name_size + column_value_size + 23
Row Overhead: Every row in Cassandra incurs 23 bytes of overhead.
Primary Key Index: Every table also maintains a primary index of its row keys. Sizing of the primary row key index can be estimated as follows (in bytes):
primary_key_index = number_of_rows * (32 + average_key_size)
Replication Overhead: The replication factor plays a role in how much disk capacity is used. For a replication factor of 1, there is no overhead for replicas (as only one copy of data is stored in the cluster). If replication factor is greater than 1, then your total data storage requirement will include replication overhead.
replication_overhead = total_data_size * (replication_factor - 1)
A major part of planning your Cassandra cluster deployment is understanding and setting the various node configuration properties. The properties listed in this section are set in the cassandra.yaml configuration file. Each node must be correctly configured before starting it for the first time: