Selecting hardware for enterprise implementations
Choosing appropriate hardware depends on selecting the right balance of the following resources: memory, CPU, disks, number of nodes, and network.
The more memory a Cassandra node has, the better its read performance. More RAM allows for larger cache sizes and reduces disk I/O for reads. More RAM also allows memory tables (memtables) to hold more recently written data. Larger memtables lead to fewer SSTables being flushed to disk and fewer files to scan during a read. The ideal amount of RAM depends on the anticipated size of your hot data.
- For dedicated hardware, the optimal price-performance sweet spot is 16GB to 64GB; the minimum is 8GB.
- For virtual environments, the optimal range may be 8GB to 16GB; the minimum is 4GB.
- For testing light workloads, Cassandra can run on a virtual machine as small as 256MB.
- For setting Java heap space, see Tuning Java resources.
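The memory guidance above can be expressed as a small lookup. The following Python sketch is illustrative only: the `RAM_GUIDANCE_GB` table and the `assess_ram` function are hypothetical names, and the thresholds are simply the figures quoted in the list above.

```python
# RAM guidance in GB, copied from the list above.
# RAM_GUIDANCE_GB and assess_ram are hypothetical helpers, not Cassandra APIs.
RAM_GUIDANCE_GB = {
    "dedicated": {"minimum": 8, "recommended": (16, 64)},
    "virtual": {"minimum": 4, "recommended": (8, 16)},
}


def assess_ram(environment, ram_gb):
    """Classify a node's RAM against the guidance for its environment."""
    guidance = RAM_GUIDANCE_GB[environment]
    low, high = guidance["recommended"]
    if ram_gb < guidance["minimum"]:
        return "below minimum"
    if ram_gb < low:
        return "meets minimum"
    if ram_gb <= high:
        return "within recommended range"
    return "above recommended range"


if __name__ == "__main__":
    for env, ram in [("dedicated", 32), ("virtual", 4), ("dedicated", 4)]:
        print(env, ram, assess_ram(env, ram))
```

The classification says nothing about hot data size, which remains the deciding factor for the ideal amount of RAM.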
Insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound. (All writes go to the commit log, but Cassandra is so efficient in writing that the CPU is the limiting factor.) Cassandra is highly concurrent and uses as many CPU cores as available:
- For dedicated hardware, 8-core processors are the current price-performance sweet spot.
- For virtual environments, consider using a provider that allows CPU bursting, such as Rackspace Cloud Servers.
Disk space requirements depend heavily on usage, so it's important to understand the mechanism. Cassandra writes data to disk when appending data to the commit log for durability and when flushing memtables to SSTable data files for persistent storage. SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and discarding old data. However, depending on the compaction strategy and the size of the compactions, compaction can substantially increase disk utilization and data directory volume. For this reason, you should leave an adequate amount of free disk space available on a node: 50% (worst case) for SizeTieredCompactionStrategy and large compactions, and 10% for LeveledCompactionStrategy. The following links provide information about compaction:
- Configuring compaction
- The Apache Cassandra storage engine
- Leveled Compaction in Apache Cassandra
- When to Use Leveled Compaction
For information on calculating disk size, see Calculating usable disk capacity.
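The free-space guidance for compaction can be turned into a rough capacity estimate. The following Python sketch is illustrative only: `FREE_SPACE_FRACTION` and `usable_data_capacity_gb` are hypothetical names, and the fractions are the 50% and 10% figures quoted above rather than values read from Cassandra.

```python
# Fraction of each data volume to keep free for compaction, per the text above.
# These names are hypothetical helpers for illustration.
FREE_SPACE_FRACTION = {
    "SizeTieredCompactionStrategy": 0.50,  # worst case, large compactions
    "LeveledCompactionStrategy": 0.10,
}


def usable_data_capacity_gb(disk_size_gb, compaction_strategy):
    """Return the GB of data a volume can hold while leaving compaction headroom."""
    if compaction_strategy not in FREE_SPACE_FRACTION:
        raise ValueError("unknown compaction strategy: {0}".format(compaction_strategy))
    free_fraction = FREE_SPACE_FRACTION[compaction_strategy]
    return disk_size_gb * (1.0 - free_fraction)


if __name__ == "__main__":
    for strategy in sorted(FREE_SPACE_FRACTION):
        print(strategy, usable_data_capacity_gb(1000, strategy))
```

This estimate covers compaction headroom only; it does not account for commit log space, snapshots, or file system overhead.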
- Capacity per node
- Ideal capacity for Cassandra 1.2 and later is 3-5TB per node. For Cassandra 1.1, it is 500-800GB per node.
- Capacity and I/O
- When choosing disks, consider both capacity (how much data you plan to store) and I/O (the write/read throughput rate). Some workloads are best served by using less expensive SATA disks and scaling disk capacity and I/O by adding more nodes (with more RAM).
- Solid-state drives
- SSDs are recommended for Cassandra. The NAND Flash chips that power SSDs provide extremely low-latency response times for random reads while supplying ample sequential write performance for compaction operations. A large variety of SSDs are available on the market from server vendors and third-party drive manufacturers. DataStax customers who need help determining the most cost-effective option for a given deployment and workload should contact their Solutions Engineer or Architect.
Note: For SSDs, it is recommended that both commit logs and SSTables be on the same mount point.
- Number of disks - SATA
- Ideally Cassandra needs at least two disks, one for the commit log and the other for the data directories. At a minimum the commit log should be on its own partition.
- Commit log disk - SATA
- The disk does not need to be large, but it should be fast enough to receive all of your writes as appends (sequential I/O).
- Data disks
- Use one or more disks and make sure they are large enough for the data volume and fast enough both to satisfy reads that are not cached in memory and to keep up with compaction.
- RAID on data disks
- It is generally not necessary to use RAID for the following reasons:
- Data is replicated across the cluster based on the replication factor you've chosen.
- Starting in version 1.2, Cassandra includes a JBOD (Just a bunch of disks) feature to take care of disk management. Because Cassandra properly reacts to a disk failure either by stopping the affected node or by blacklisting the failed drive, you can deploy Cassandra nodes with large disk arrays without the overhead of RAID 10. You can configure Cassandra to stop the affected node or blacklist the drive according to your availability/consistency requirements.
- RAID on the commit log disk
- Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need the extra redundancy, use RAID 1.
- Extended file systems
- DataStax recommends deploying Cassandra on XFS. On ext2 or ext3, the maximum file size is 2TB even using a 64-bit kernel. On ext4 it is 16TB.
Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.
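The file size limits above can be checked against a planned disk size. The following Python sketch is illustrative only: the function names are hypothetical, the limits are the figures quoted above, and the worst-case file is assumed to be exactly half of the disk.

```python
TB = 1024 ** 4  # bytes per terabyte, using binary units


def max_file_size_bytes(filesystem, kernel_bits=64):
    """Return the file size limit quoted above, or None if essentially unlimited."""
    if filesystem in ("ext2", "ext3"):
        return 2 * TB  # applies even on a 64-bit kernel
    if filesystem == "ext4":
        return 16 * TB
    if filesystem == "xfs":
        return 16 * TB if kernel_bits == 32 else None
    raise ValueError("no limit recorded for file system: {0}".format(filesystem))


def largest_file_fits(disk_size_bytes, filesystem, kernel_bits=64):
    """Check whether a file using half of the disk stays under the limit."""
    limit = max_file_size_bytes(filesystem, kernel_bits)
    worst_case_file = disk_size_bytes / 2.0  # assumption: half of the disk
    return limit is None or worst_case_file <= limit


if __name__ == "__main__":
    for fs in ("ext3", "ext4", "xfs"):
        print(fs, largest_file_fits(6 * TB, fs))
```

With a 6TB disk, for example, the assumed worst-case file is 3TB, which exceeds the ext3 limit but fits on ext4 and XFS.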
Number of nodes
Prior to version 1.2, the recommended size of disk space per node was 300 to 500GB. Improvements in Cassandra 1.2, such as JBOD support, virtual nodes (vnodes), off-heap Bloom filters, and parallel leveled compaction (SSD nodes only), allow you to use fewer machines with multiple terabytes of disk space each.
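A first-pass node count follows from the per-node capacity and the replication factor. The following Python sketch is a simplified estimate under stated assumptions: the function name `estimate_node_count` is hypothetical, the data size is the unreplicated data set, and compression, compaction headroom, and growth are ignored.

```python
import math


def estimate_node_count(data_tb, replication_factor, capacity_per_node_tb):
    """Estimate nodes needed to hold every replica of the data set.

    Assumption: total stored data = data_tb * replication_factor, spread
    evenly across nodes. The result is never lower than the replication
    factor, so that each replica can live on a different node.
    """
    total_tb = data_tb * replication_factor
    nodes_for_capacity = int(math.ceil(total_tb / float(capacity_per_node_tb)))
    return max(nodes_for_capacity, replication_factor)


if __name__ == "__main__":
    # 20TB of data, replication factor 3, 4TB stored per node
    print(estimate_node_count(20, 3, 4))
```

Treat the result as a lower bound for capacity only; read/write throughput requirements may call for more nodes.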
Because Cassandra is a distributed data store, it puts load on the network to handle read/write requests and replication of data across nodes. Be sure that your network can handle traffic between nodes without bottlenecks. You should bind your interfaces to separate Network Interface Cards (NICs). You can use public or private interfaces, depending on your requirements.
- Recommended bandwidth is 1000 Mbit/s (gigabit) or greater.
- Thrift/native protocols use the rpc_address.
- Cassandra's internal storage protocol uses the listen_address.
Cassandra efficiently routes requests to replicas that are geographically closest to the coordinator node and chooses a replica in the same rack if possible; it always chooses replicas located in the same data center over replicas in a remote data center.
If using a firewall, make sure that nodes within a cluster can reach each other. See Configuring firewall port access.
Generally, when you have firewalls between machines, it is difficult to run JMX across a network and maintain security. This is because JMX connects on port 7199, handshakes, and then uses any port within the 1024+ range. Instead, use SSH to execute commands remotely and connect to JMX locally, or use DataStax OpsCenter.
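One way to follow the SSH approach is to run nodetool on the remote node so that the JMX connection stays local to that machine. The following Python sketch is illustrative only: the helper names and the host name are hypothetical, and it assumes that `ssh` is available locally, that `nodetool` is on the remote user's PATH, and that key-based SSH access is already set up.

```python
import subprocess


def build_remote_nodetool_command(host, nodetool_args, jmx_port=7199):
    """Build an ssh command that runs nodetool against the node's local JMX port."""
    return (["ssh", host, "nodetool", "-h", "localhost", "-p", str(jmx_port)]
            + list(nodetool_args))


def run_remote_nodetool(host, nodetool_args, jmx_port=7199):
    """Run nodetool over SSH and return its raw output (requires SSH access)."""
    command = build_remote_nodetool_command(host, nodetool_args, jmx_port)
    return subprocess.check_output(command)


if __name__ == "__main__":
    # Hypothetical host name; only the command is printed, nothing is executed.
    print(" ".join(build_remote_nodetool_command("cassandra1.example.com", ["info"])))
```

Only the SSH port needs to be reachable through the firewall in this arrangement, because JMX traffic never leaves the node.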