DataStax provides an Amazon Machine Image (AMI) to allow you to quickly deploy a multi-node Cassandra cluster on Amazon EC2.
The DataStax AMI initializes all nodes in one availability zone using the SimpleSnitch.
If you want an EC2 cluster that spans multiple regions and availability zones, do not use the DataStax AMI. Instead, install Cassandra on your EC2 instances as described in Installing Cassandra Debian packages, and then configure the cluster as a multiple data center cluster.
Use the following guidelines when setting up your cluster:
For production Cassandra clusters on EC2, use Large or Extra Large instances with local storage.
Amazon Web Services has reduced the number of default ephemeral disks attached to the image from four to two. New nodes will perform more slowly unless you manually attach the additional two disks; see Amazon EC2 Instance Store.
Stripe the ephemeral disks with RAID 0, and put both the data directory and the commit log on that volume. In practice, this has proved better than putting the commit log on the root volume (which is also a shared resource). For more data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using EBS volumes to store your Cassandra backup files.
Cassandra JBOD support allows you to use standard disks, but you may get better throughput with RAID 0. RAID 0 stripes data across the devices, so consecutive blocks land on different disks and are written in parallel instead of serially to a single disk.
EBS volumes are not recommended for Cassandra data volumes for the following reasons:
For more information and graphs related to ephemeral versus EBS performance, see the blog article Systematic Look at EC2 I/O.
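To illustrate why RAID 0 can improve write throughput over a single disk, the following minimal sketch models round-robin striping in Python. It is a simplified illustration, not how the Linux md driver or Cassandra is implemented; the function names `stripe_location` and `layout` are hypothetical, and the model assumes equally sized chunks and identical devices.

```python
from typing import List, Tuple


def stripe_location(chunk_index: int, num_devices: int) -> Tuple[int, int]:
    """Map a logical chunk to (device index, chunk offset on that device).

    Simplified RAID 0 model: chunks are assigned round-robin, so
    consecutive chunks always fall on different devices.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if chunk_index < 0:
        raise ValueError("chunk_index must be non-negative")
    return chunk_index % num_devices, chunk_index // num_devices


def layout(num_chunks: int, num_devices: int) -> List[List[int]]:
    """Return, per device, the list of logical chunks stored on it."""
    devices: List[List[int]] = [[] for _ in range(num_devices)]
    for chunk in range(num_chunks):
        device, _offset = stripe_location(chunk, num_devices)
        devices[device].append(chunk)
    return devices


if __name__ == "__main__":
    # Eight logical chunks striped over two ephemeral disks.
    for device, chunks in enumerate(layout(8, 2)):
        print(f"disk {device}: chunks {chunks}")
```

With two disks, even-numbered chunks go to the first disk and odd-numbered chunks to the second, so a sequential write of eight chunks keeps both disks busy at once. The same model shows the trade-off: every disk holds part of the data, so losing any one disk loses the volume, which is why the guidelines above suggest multiple availability zones or backups for redundancy.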