Planning an Amazon EC2 cluster
Before planning an Amazon EC2 cluster, please see the User guide in the Amazon Elastic Compute Cloud Documentation.
DataStax AMI deployments¶
The DataStax AMI is intended only for a single region and availability zone. For an EC2 cluster that spans multiple regions and availability zones, see EC2 clusters spanning multiple regions and availability zones.
Use AMIs from trusted sources¶
Use only AMIs from a trusted source. AMIs from unknown sources pose a security risk and may perform slower than expected because of the way the EC2 install is configured. The following are examples of trusted AMIs:
EC2 clusters spanning multiple regions and availability zones¶
Production Cassandra clusters on EC2¶
For production Cassandra clusters on EC2, use these guidelines for choosing the instance types:
- Development and light production: m3.large
- Moderate production: m3.xlarge
- SSD production with light data: c3.2xlarge
- Largest heavy production: m3.2xlarge (PV) or i2.2xlarge (HVM)
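If you script your provisioning, the guidelines above can be kept in a small lookup table. The following is a minimal sketch: the instance types come from the list above, but the tier names and the `recommended_instance` helper are hypothetical labels chosen for illustration.

```python
# Instance types are taken from the guidelines above.
# The tier keys are hypothetical names invented for this sketch.
INSTANCE_GUIDELINES = {
    "development": "m3.large",       # development and light production
    "moderate": "m3.xlarge",         # moderate production
    "ssd_light_data": "c3.2xlarge",  # SSD production with light data
    "heavy_pv": "m3.2xlarge",        # largest heavy production (PV)
    "heavy_hvm": "i2.2xlarge",       # largest heavy production (HVM)
}


def recommended_instance(tier):
    """Return the suggested EC2 instance type for a workload tier."""
    try:
        return INSTANCE_GUIDELINES[tier]
    except KeyError:
        known = ", ".join(sorted(INSTANCE_GUIDELINES))
        raise ValueError(f"unknown tier {tier!r}; expected one of: {known}") from None
```

For example, `recommended_instance("moderate")` returns `"m3.xlarge"`, and an unrecognized tier raises `ValueError` rather than silently falling back to a default.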
EBS volumes are not recommended¶
EBS volumes are not recommended for Cassandra data storage volumes for the following reasons:
- EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link.
- EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to backlog reads and writes until the entire cluster becomes unresponsive.
- Adding capacity by increasing the number of EBS volumes per host does not scale. You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing.
For more information and graphs related to ephemeral versus EBS performance, see the blog article Systematic Look at EC2 I/O.
Disk Performance Optimization¶
To ensure high disk performance on mounted drives, pre-warm each drive by writing once to every drive location before production use. Depending on EC2 conditions, this can yield moderate to enormous increases in throughput. See Optimizing Disk Performance in the Amazon Elastic Compute Cloud Documentation.
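As an illustration of "writing once to every drive location", the sketch below writes zeros sequentially across a target. It is not the procedure from the Amazon documentation, only one way to express the idea. The `prewarm` function and its parameters are hypothetical, and the write is destructive: run it only against a drive that holds no data you need.

```python
import os


def prewarm(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write zeros once to every location in the first total_bytes of path.

    WARNING: overwrites whatever is stored at path.
    Returns the number of bytes written.
    """
    zeros = bytes(chunk_bytes)  # reusable buffer of zero bytes
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT)
    try:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            # os.write may write fewer bytes than requested, so advance
            # by the count it actually reports.
            written += os.write(fd, zeros[:n])
        os.fsync(fd)  # flush buffered data to the underlying storage
    finally:
        os.close(fd)
    return written
```

To pre-warm a whole drive you would pass the device path and its size in bytes, for example `prewarm("/dev/xvdb", size_in_bytes)`, where `/dev/xvdb` is only an assumed device name. Check the device names on your own instance before running anything like this.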
Storage recommendations for Cassandra 1.2 and later¶
Cassandra 1.2 and later versions support JBOD (just a bunch of disks). JBOD excels at tolerating partial failures in a disk array. Configure the behavior using the disk_failure_policy setting in the cassandra.yaml file. Additional information is available in the Handling Disk Failures In Cassandra 1.2 blog.
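A JBOD layout in cassandra.yaml might look like the fragment below. This is a sketch rather than a recommended configuration: the mount paths are hypothetical, and the policy value shown is just one of the options (`stop`, `best_effort`, `ignore`) described for Cassandra 1.2, so check the documentation for your version before choosing one.

```yaml
# cassandra.yaml (fragment)

# One data directory per physical disk. These paths are hypothetical.
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data

# How the node reacts when a disk fails.
# best_effort: stop using the failed disk and keep serving from the rest.
disk_failure_policy: best_effort
```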
Storage recommendations for Cassandra 1.1 and earlier¶
Stripe the ephemeral disks as RAID 0, then put both the data directory and the commit log on that volume. This has proved to be better in practice than putting the commit log on the root volume, which is also a shared resource. For more data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using EBS volumes to store your Cassandra backup files.