Installing a DataStax Enterprise cluster on Amazon EC2
This is a step-by-step guide to using the Amazon Web Services EC2 Management Console to set up a DataStax Enterprise (DSE) cluster using the DataStax AMI (Amazon Machine Image). Installing via the AMI allows you to quickly deploy a cluster with a pre-configured mixed workload. When you launch the AMI, you can specify the total number of nodes in your cluster and how many nodes should be Real-Time/Transactional (Cassandra), Analytics (Hadoop), or Search (Solr).
You can also launch a single node using the DataStax AMI and then create the cluster from OpsCenter.
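The node counts described above can be sketched as a small helper that checks the workload split and assembles a launch option string. This is a minimal illustration, not the AMI's documented interface: the function names are hypothetical, and the flag names (`--clustername`, `--totalnodes`, `--version`, `--analyticsnodes`, `--searchnodes`) are assumptions modeled on the AMI's user-data options that should be checked against the AMI documentation before use. Credentials and any other options are deliberately left out.

```python
def plan_workload(total_nodes, analytics_nodes=0, search_nodes=0):
    """Split a cluster into Real-Time, Analytics, and Search node counts.

    Nodes not assigned to Analytics or Search are treated as
    Real-Time/Transactional (Cassandra) nodes.
    """
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    if analytics_nodes < 0 or search_nodes < 0:
        raise ValueError("node counts cannot be negative")
    realtime_nodes = total_nodes - analytics_nodes - search_nodes
    if realtime_nodes < 0:
        raise ValueError("analytics + search nodes exceed total_nodes")
    return {
        "realtime": realtime_nodes,
        "analytics": analytics_nodes,
        "search": search_nodes,
    }


def build_user_data(cluster_name, total_nodes, analytics_nodes=0, search_nodes=0):
    """Build an option string for the instance user-data field.

    Flag names are assumptions (see the text above); verify them
    against the AMI documentation.
    """
    # Validate the split before emitting anything.
    plan_workload(total_nodes, analytics_nodes, search_nodes)
    options = [
        ("--clustername", cluster_name),
        ("--totalnodes", total_nodes),
        ("--version", "enterprise"),
        ("--analyticsnodes", analytics_nodes),
        ("--searchnodes", search_nodes),
    ]
    return " ".join(f"{flag} {value}" for flag, value in options)


if __name__ == "__main__":
    print(plan_workload(6, analytics_nodes=2, search_nodes=2))
    print(build_user_data("myDSEcluster", 6, analytics_nodes=2, search_nodes=2))
```

Validating the split before launch catches a mismatch, such as more Analytics and Search nodes than the cluster total, before any instances are started.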
The DataStax AMI does the following:
- Installs the latest version of DataStax Enterprise on an Ubuntu 12.04 LTS (Precise Pangolin) image (Ubuntu Cloud 20140227 release) with kernel 3.8+.
- Installs Oracle Java 7.
- Installs metrics tools such as dstat, ethtool, make, gcc, and s3cmd.
- Uses RAID0 ephemeral disks for data storage and commit logs.
- Offers a choice of PV (paravirtualization) or HVM (hardware-assisted virtual machine) instance types.
- Launches EBS-backed instances for faster start-up; EBS is not used for database storage.
- Uses the private interface for intra-cluster communication.
- Starts the nodes in the specified mode (Real-time, Analytics, or Search).
- Sets the seed nodes cluster-wide.
- Installs the DataStax OpsCenter on the first node in the cluster (by default).
EC2 clusters spanning multiple regions and availability zones
The DataStax AMI is intended for a single region and availability zone. When creating an EC2 cluster that spans multiple regions and availability zones, use OpsCenter to set up your cluster. You can use any of the supported platforms. It is best practice to use the same platform on all nodes. If your cluster was instantiated using the DataStax AMI, use Ubuntu for the additional nodes. The following topics describe OpsCenter provisioning:
For production Cassandra clusters on EC2, use Large or Extra Large instances with local storage. Stripe the ephemeral disks into a RAID0 volume, and put both the data directory and the commit log on that volume. In practice, this has proved better than putting the commit log on the root volume (which is also a shared resource). For more data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using OpsCenter to back up to S3. Also see Production deployment planning.
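The storage layout recommended above can be sketched as a helper that builds the `mdadm` command for striping the ephemeral disks and the matching `cassandra.yaml` directory settings. Treat it as an illustration under stated assumptions: the device names, the `/dev/md0` array name, the `/raid0` mount point, and the directory layout are all hypothetical and vary by instance type. The function only builds strings and runs nothing.

```python
import posixpath


def raid0_plan(devices, array="/dev/md0", mount_point="/raid0"):
    """Describe a RAID0 layout that holds both data and commit log.

    devices: ephemeral block devices to stripe (hypothetical names).
    Returns the mdadm argument list and the cassandra.yaml settings
    that place the data directory and commit log on the same volume.
    """
    if len(devices) < 2:
        # This sketch assumes at least two ephemeral disks to stripe.
        raise ValueError("need at least two devices for this sketch")
    mdadm_cmd = [
        "mdadm", "--create", array,
        "--level=0",
        f"--raid-devices={len(devices)}",
        *devices,
    ]
    base = posixpath.join(mount_point, "cassandra")
    cassandra_yaml = {
        # Both settings point at the RAID0 volume, not the root volume.
        "data_file_directories": [posixpath.join(base, "data")],
        "commitlog_directory": posixpath.join(base, "commitlog"),
    }
    return {"mdadm": mdadm_cmd, "cassandra_yaml": cassandra_yaml}


if __name__ == "__main__":
    plan = raid0_plan(["/dev/xvdb", "/dev/xvdc"])
    print(" ".join(plan["mdadm"]))
    for key, value in plan["cassandra_yaml"].items():
        print(f"{key}: {value}")
```

Keeping the commit log off the root volume is the point of the layout: both directories resolve to paths under the striped mount.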