Apache Cassandra 1.2 Documentation

Installing a Cassandra cluster on Amazon EC2

This is a step-by-step guide to using the Amazon Web Services EC2 Management Console to set up a simple Cassandra cluster using the DataStax Community Edition AMI (Amazon Machine Image). Installing via the AMI allows you to quickly deploy a Cassandra cluster within a single availability zone. When you launch the AMI, you can specify the total number of nodes in your cluster.

The DataStax Cassandra AMI does the following:

  • Launches the newest stable release of DataStax Community Edition.
  • Installs Cassandra on an Ubuntu 12.04 LTS (Precise Pangolin) image (Ubuntu Cloud 20121218 release).
  • Uses RAID0 ephemeral disks for data storage and commit logs.
  • Uses the private interface for intra-cluster communication.
  • Configures a Cassandra cluster using the RandomPartitioner.
  • Configures the Cassandra replication strategy using the EC2Snitch.
  • Sets the seed node cluster-wide.
  • Starts Cassandra on all the nodes.
  • Installs DataStax OpsCenter on the first node in the cluster (by default).

If you want an EC2 cluster that spans multiple regions and availability zones, do not use the DataStax AMI. Instead, install Cassandra on your EC2 instances as described in Installing Cassandra Debian packages, and then configure the cluster as a multiple data center cluster.

Production considerations

For production Cassandra clusters on EC2, use Large or Extra Large instances with local storage. RAID0 the ephemeral disks, and put both the data directory and the commit log on that volume. This has proved to be better in practice than putting the commit log on the root volume (which is also a shared resource). For more data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using EBS volumes to store your Cassandra backup files.

Creating an EC2 security group for DataStax Community Edition

An EC2 Security Group acts as a firewall that allows you to choose which protocols and ports are open in your cluster. You can specify the protocols and ports either by a range of IP addresses or by security group. The default EC2 security group opens all ports and protocols only to computers that are members of the default group. This means you must define a security group for your Cassandra cluster. Be aware that specifying a Source IP of 0.0.0.0/0 allows every IP address access by the specified protocol and port range.

  1. In your Amazon EC2 Console Dashboard, select Security Groups in the Network & Security section.

  2. Click Create Security Group. Fill out the name and description and then click Yes, Create.


    ../../_images/1ami_securitygroup.png
  3. Click the Inbound tab and add rules for the ports listed in the table below:

    • Create a new rule: Custom TCP rule.
    • Port range: See table.
    • Source: See table. To create rules that open a port to other nodes in the same security group, use the Group ID listed in the Group Details tab.

Port     Description

Public Facing Ports
22       SSH port.
8888     OpsCenter website port.

Cassandra Inter-node Ports
1024+    JMX reconnection/loopback ports. See description for port 7199.
7000     Cassandra inter-node cluster communication.
7199     Cassandra JMX monitoring port. After the initial handshake, the JMX protocol requires that the client reconnects on a randomly chosen port (1024+).
9160     Cassandra client port (Thrift).

OpsCenter ports
61620    OpsCenter monitoring port. The opscenterd daemon listens on this port for TCP traffic coming from the agent.
61621    OpsCenter agent port. The agents listen on this port for SSL traffic initiated by OpsCenter.

Note

Generally, when you have firewalls between machines, it is difficult to run JMX across a network and maintain security. This is because JMX connects on port 7199, handshakes, and then uses any port within the 1024+ range. Instead, use SSH to execute commands remotely and connect to JMX locally, or use DataStax OpsCenter.

  4. After you are done adding the above port rules, click Apply Rule Changes. Your completed port rules should look similar to this:


    ../../_images/2ami_securityports_dsc.png

Warning

The security configuration shown in the above example opens all externally accessible ports to incoming traffic from any IP address (0.0.0.0/0). The risk of data loss is high. If you want a more secure configuration, see the Amazon EC2 help on Security Groups.
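
The inbound rules from the port table can also be written down as commands. The following is a minimal sketch, not part of the console procedure above: it assumes the AWS command line interface is available, and it only prints the commands (a dry run) rather than running them. GROUP_ID and ADMIN_CIDR are placeholder values, and restricting the public facing ports to a single administrator address is one possible way to avoid the 0.0.0.0/0 source described in the warning.

```shell
#!/bin/sh
# Dry run: print one AWS CLI command per inbound rule instead of executing it.
# GROUP_ID and ADMIN_CIDR are placeholders, not values from this guide.
GROUP_ID="${GROUP_ID:-sg-0123456789abcdef0}"
ADMIN_CIDR="${ADMIN_CIDR:-203.0.113.10/32}"   # documentation-only address

# Print the command that would open a TCP port (or range) to a CIDR block.
open_to_cidr() {
    echo "aws ec2 authorize-security-group-ingress --group-id $GROUP_ID --protocol tcp --port $1 --cidr $2"
}

# Print the command that would open a TCP port (or range) only to members
# of the same security group.
open_to_group() {
    echo "aws ec2 authorize-security-group-ingress --group-id $GROUP_ID --protocol tcp --port $1 --source-group $GROUP_ID"
}

print_rules() {
    # Public facing ports, restricted here to one administrator address.
    for port in 22 8888; do
        open_to_cidr "$port" "$ADMIN_CIDR"
    done
    # Inter-node and OpsCenter ports, following the grouping in the table.
    for port in 1024-65535 7000 7199 9160 61620 61621; do
        open_to_group "$port"
    done
}

print_rules
```

The 1024-65535 range already covers the other inter-node ports; they are listed separately only to mirror the table. Clients outside the security group would need an additional rule for port 9160.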

Launching the DataStax Community AMI

After you have created your security group, you are ready to launch an instance of Cassandra using the DataStax AMI.

  1. Right-click the following link to open the DataStax Amazon Machine Image page in a new window:

    https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2

  2. Click Launch AMI, then select the region where you want to launch the AMI.


    ../../_images/ami_launch.png
  3. On the Request Instances Wizard page, verify the settings and then click Continue.

  4. On the Instance Details page, enter the total number of nodes that you want in your cluster, select the Instance Type, and then click Continue.

    Use the following guidelines when selecting the type of instance:

    • Extra large for production.
    • Large for development and light production.
    • Small and Medium not supported.

../../_images/3ami_num_instances.png

Note

EBS volumes are not recommended for Cassandra data volumes. EBS throughput may fail on a saturated network link, I/O may be exceptionally slow, and adding capacity by increasing the number of EBS volumes per host does not scale. For more information and graphs related to ephemeral versus EBS performance, see the blog article at http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/.

  5. On the next page, under Advanced Instance Options, add the following options to the User Data section according to the type of cluster you want, and then click Continue.

    For new clusters the available options are:

    Option                    Description
    --clustername <name>      Required. The name of the cluster.
    --totalnodes <#_nodes>    Required. The total number of nodes in the cluster.
    --version community       Required. The version of the cluster. Use community to install the latest version of DataStax Community.
    --opscenter [no]          Optional. By default, DataStax OpsCenter is installed on the first instance. Specify no to disable it.
    --reflector <url>         Optional. Allows you to use your own reflector. Default: http://reflector2.datastax.com/reflector2.php

    For example, --clustername myDSCcluster --totalnodes 6 --version community


    ../../_images/4ami_cassandra_options_dsc.png
  6. On the Storage Device Configuration page, you can add ephemeral drives if needed.

    Note

    Amazon Web Services recently reduced the number of default ephemeral disks attached to the image from four to two. Performance will be slower for new nodes unless you manually attach the additional two disks; see Amazon EC2 Instance Store.

  7. On the Tags page, give a name to your DataStax Community instance, such as cassandra-node, and then click Continue.

  8. On the Create Key Pair page, create a new key pair or select an existing key pair, and then click Continue. Save this key (.pem file) to your local machine; you will need it to log in to your DataStax Community instance.

  9. On the Configure Firewall page, select the security group that you created earlier and click Continue.

  10. On the Review page, review your cluster configuration and then click Launch.

  11. Close the Launch Install Wizard and go to the My Instances page to see the status of your Cassandra instance. Once a node has a status of running, you can connect to it.
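
The User Data string from the Advanced Instance Options step is easy to mistype, so it can help to assemble it from its parts. The following is a minimal sketch under stated assumptions: build_user_data is a hypothetical helper (not part of the AMI), it covers only the --clustername, --totalnodes, --version, and --opscenter options listed above, and it does not handle cluster names that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: assemble the User Data string for a new cluster.
# Usage: build_user_data <clustername> <totalnodes> [yes|no]
# The third argument controls OpsCenter and defaults to yes.
build_user_data() {
    name=$1
    nodes=$2
    opscenter=${3:-yes}

    # --clustername is required.
    if [ -z "$name" ]; then
        echo "clustername is required" >&2
        return 1
    fi

    # --totalnodes is required and must be digits only and greater than zero.
    case "$nodes" in
        ''|*[!0-9]*)
            echo "totalnodes must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$nodes" -le 0 ]; then
        echo "totalnodes must be a positive integer" >&2
        return 1
    fi

    data="--clustername $name --totalnodes $nodes --version community"

    # OpsCenter is installed by default; add the option only to disable it.
    if [ "$opscenter" = "no" ]; then
        data="$data --opscenter no"
    fi

    echo "$data"
}

# Prints: --clustername myDSCcluster --totalnodes 6 --version community
build_user_data myDSCcluster 6
```

Paste the printed line into the User Data field. The value passed as totalnodes should match the number of instances entered on the Instance Details page.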

Connecting to your DataStax Community EC2 instance

You can connect to your new DataStax Community EC2 instance using any SSH client, such as PuTTY or a terminal. To connect, you need the private key (the .pem file you created earlier) and the public DNS name of a node.

Connect as user ubuntu rather than as root.

If this is the first time you are connecting, copy your private key file (<keyname>.pem) you downloaded earlier to your home directory, and change the permissions so it is not publicly viewable. For example:

chmod 400 datastax-key.pem
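
To confirm that the permission change took effect, the mode of the key file can be checked before connecting. This is a minimal sketch: key_is_private is a hypothetical helper, and KEY_FILE defaults to the example file name used above. It treats a key as private only when the owner alone can read it (mode 400) or read and write it (mode 600).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is accessible by its owner alone.
key_is_private() {
    # The first ten characters of "ls -ld" output are the mode string,
    # for example "-r--------" for mode 400.
    mode=$(ls -ld -- "$1" | cut -c1-10)
    case "$mode" in
        -r--------|-rw-------) return 0 ;;
        *) return 1 ;;
    esac
}

# KEY_FILE is a placeholder; point it at your own .pem file.
KEY_FILE="${KEY_FILE:-datastax-key.pem}"
if [ ! -f "$KEY_FILE" ]; then
    echo "$KEY_FILE: not found"
elif key_is_private "$KEY_FILE"; then
    echo "$KEY_FILE: permissions look private"
else
    echo "$KEY_FILE: too open, run: chmod 400 $KEY_FILE"
fi
```
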
  1. From the My Instances page in your AWS EC2 Dashboard, select the node that you want to connect to.

    Because all nodes are peers in Cassandra, you can connect using any node in the cluster. However, the first node generally runs OpsCenter and is the Cassandra seed node.


    ../../_images/5ami_connect_dsc.png
  2. To get the public DNS name of a node, select Instance Actions > Connect.

  3. In the Connect Help - Secure Shell (SSH) page, copy the command line and change the connection user from root to ubuntu, then paste it into your SSH client.


    ../../_images/6ami_connect_dsc.png
  4. The AMI configures your cluster and starts the Cassandra services. After you have logged in to a node, run the nodetool status command to make sure your cluster is running. For more information, see the nodetool utility.


../../_images/nodetool_status_ami.png
  5. If you installed OpsCenter with your Cassandra cluster, allow about 60 to 90 seconds after the cluster has finished initializing for OpsCenter to start. You can launch OpsCenter using the URL: http://<public-dns-of-first-instance>:8888.


    ../../_images/ami_cassandra_instance.png
  6. After OpsCenter loads, you must install the OpsCenter agents to see the cluster performance data.

    1. Click the Fix link located near the top of the Dashboard in the left navigation pane to install the agents.


      ../../_images/agent_initial_fix.png
    2. When prompted for credentials for the agent nodes, use the username ubuntu and copy and paste the entire contents from your private key (.pem) file that you downloaded earlier.


    ../../_images/ami_opscenter.png
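
The connection steps above can be summarized as a few commands. The following is a minimal sketch that only prints them (a dry run), so it can be reviewed before anything is executed. KEY_FILE and NODE_DNS are placeholders: substitute the .pem file you saved and the public DNS name shown for your node. Running nodetool status through SSH, as in the second command, is also one way to follow the earlier advice to execute commands remotely rather than expose JMX across the network.

```shell
#!/bin/sh
# Dry run: print the commands rather than opening a connection.
# KEY_FILE and NODE_DNS are placeholders; replace them with your own values.
KEY_FILE="${KEY_FILE:-datastax-key.pem}"
NODE_DNS="${NODE_DNS:-ec2-203-0-113-10.compute-1.amazonaws.com}"

# SSH login as the ubuntu user (not root), using the saved private key.
ssh_command() {
    echo "ssh -i $KEY_FILE ubuntu@$NODE_DNS"
}

# Same login, but run nodetool status on the node and then return.
status_command() {
    echo "$(ssh_command) nodetool status"
}

# OpsCenter is served on port 8888 of the first instance in the cluster.
opscenter_url() {
    echo "http://$NODE_DNS:8888"
}

ssh_command
status_command
opscenter_url
```
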