DataStax Enterprise 4.6

Installing a DataStax Enterprise cluster on Amazon EC2

This is a step-by-step guide to using the Amazon Web Services EC2 Management Console to set up a DataStax Enterprise (DSE) cluster using the DataStax AMI (Amazon Machine Image). Installing via the AMI allows you to quickly deploy a cluster with a pre-configured mixed workload. When you launch the AMI, you can specify the total number of nodes in your cluster and how many nodes should be Real-Time/Transactional (Cassandra), Analytics (Hadoop), or Search (Solr).
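The node split described above can be sketched as a small helper that checks the counts and builds the option string passed to the AMI at launch. This is a minimal illustration, not the AMI's own tooling: the function name is hypothetical, and the option names (`--clustername`, `--totalnodes`, `--version`, `--analyticsnodes`, `--searchnodes`) are assumptions that should be checked against the DataStax AMI documentation. Any credential options the AMI requires for DataStax Enterprise are omitted.

```python
def build_ami_user_data(cluster_name, total_nodes, analytics_nodes=0, search_nodes=0):
    """Build an option string for a mixed-workload cluster (hypothetical helper).

    The option names below are assumptions; verify them against the
    DataStax AMI documentation before use.
    """
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    if analytics_nodes < 0 or search_nodes < 0:
        raise ValueError("node counts cannot be negative")
    if analytics_nodes + search_nodes > total_nodes:
        raise ValueError("analytics + search nodes exceed total nodes")

    # Nodes not assigned to Analytics or Search run as
    # Real-Time/Transactional (Cassandra) nodes.
    realtime_nodes = total_nodes - analytics_nodes - search_nodes

    options = [
        "--clustername", cluster_name,
        "--totalnodes", str(total_nodes),
        "--version", "enterprise",
        "--analyticsnodes", str(analytics_nodes),
        "--searchnodes", str(search_nodes),
    ]
    return " ".join(options), realtime_nodes


if __name__ == "__main__":
    user_data, realtime = build_ami_user_data(
        "myDSEcluster", 6, analytics_nodes=2, search_nodes=2
    )
    print(user_data)
    print("Real-Time nodes:", realtime)
```

With six nodes in total, two Analytics nodes, and two Search nodes, the remaining two nodes run as Real-Time (Cassandra) nodes.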

You can also launch a single node using the DataStax AMI and then create the cluster from OpsCenter.

Note: Because Amazon changes the EC2 console intermittently, there may be some differences in screens. For details on each step, read the User guide in the Amazon Elastic Compute Cloud Documentation.

For information about upgrading or expanding an existing installation, see Upgrading the DataStax AMI or Expanding a DataStax AMI cluster.

The DataStax AMI does the following:

  • Installs the latest version of DataStax Enterprise on an Ubuntu 12.04 LTS (Precise Pangolin) image (Ubuntu Cloud 20140227 release) with kernel 3.8+.
  • Installs Oracle Java 7.
  • Installs metrics tools such as dstat, ethtool, make, gcc, and s3cmd.
  • Uses RAID0 ephemeral disks for data storage and commit logs.
  • Offers a choice of PV (paravirtualization) or HVM (hardware-assisted virtual machine) instance types.
  • Launches EBS-backed instances for faster start-up; EBS is not used for database storage.
  • Uses the private interface for intra-cluster communication.
  • Starts the nodes in the specified mode (Real-time, Analytics, or Search).
  • Sets the seed nodes cluster-wide.
  • Installs the DataStax OpsCenter on the first node in the cluster (by default).
Note: The DataStax AMI does not install DataStax Enterprise nodes with virtual nodes enabled.

EC2 clusters spanning multiple regions and availability zones

The DataStax AMI is intended for a single region and availability zone. To create an EC2 cluster that spans multiple regions and availability zones, use OpsCenter to set up the cluster. You can use any of the supported platforms, but it is best practice to use the same platform on all nodes. If your cluster was instantiated using the DataStax AMI, use Ubuntu for the additional nodes. For details, see the OpsCenter provisioning topics.

Production considerations

For production Cassandra clusters on EC2, use Large or Extra Large instances with local storage. Configure the ephemeral disks as RAID0, and put both the data directory and the commit log on that volume. In practice, this has proved better than putting the commit log on the root volume (which is also a shared resource). For more data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using OpsCenter to back up to S3. Also see Production deployment planning.
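As a rough illustration of keeping the data directory and the commit log on the same RAID0 volume, the sketch below derives both paths from a single mount point and prints them as a cassandra.yaml fragment. The mount point `/raid0`, the directory layout, and the helper names are assumptions made for this example; `data_file_directories` and `commitlog_directory` are the cassandra.yaml settings that control these locations.

```python
import posixpath


def raid0_storage_settings(mount_point="/raid0"):
    """Return cassandra.yaml storage settings rooted on one RAID0 volume.

    The default mount point and directory layout are assumptions
    for illustration only.
    """
    base = posixpath.join(mount_point, "cassandra")
    return {
        # Data files and the commit log share the same RAID0 volume.
        "data_file_directories": [posixpath.join(base, "data")],
        "commitlog_directory": posixpath.join(base, "commitlog"),
    }


def to_yaml_fragment(settings):
    """Render the two settings as a cassandra.yaml fragment."""
    lines = ["data_file_directories:"]
    for path in settings["data_file_directories"]:
        lines.append("    - " + path)
    lines.append("commitlog_directory: " + settings["commitlog_directory"])
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml_fragment(raid0_storage_settings()))
```

Deriving both paths from one mount point makes it harder to leave the commit log on the root volume by accident.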

Note: Hadoop and Solr nodes require their own nodes/disks and have specific hardware requirements. See Capacity Planning in the DataStax Enterprise Reference Architecture and the Hadoop and Solr documentation.