Apache Cassandra 0.8 Documentation

Initializing a Brisk Cluster on Amazon EC2 Using the DataStax AMI

This is a step-by-step guide to using the Amazon Web Services EC2 Management Console to set up a Brisk cluster with the DataStax Brisk AMI (Amazon Machine Image). Installing via the AMI lets you quickly deploy a Brisk cluster with a split workload preconfigured. When you launch the AMI, you specify the total number of nodes in your cluster and how many of them should be Brisk Hadoop nodes versus pure Cassandra nodes. The AMI sets up multi-datacenter replication using a special BriskSnitch. This allows data to be replicated from the low-latency Cassandra side of the cluster to the Brisk Hadoop analytics side without impacting the performance of your real-time application.

The Brisk AMI does the following:

  • Installs Brisk on an Ubuntu 10.10 image
  • Uses RAID0 ephemeral disks for data storage
  • Uses the root volume for the Cassandra commit log
  • Uses the local interface for intra-cluster communication
  • Starts the nodes in the specified mode (Brisk or Cassandra)
  • Configures the Cassandra replication strategy using a special BriskSnitch (for a mixed workload cluster)
  • Sets the Cassandra seed nodes cluster-wide

Creating an EC2 Security Group for Brisk

  1. In your Amazon EC2 Console Dashboard, select Security Group in the My Resources section.
  2. Click Create Security Group, enter a description, and click Yes, Create.
../../_images/ami1_securitygroup.png
  3. Click Inbound and add a rule for each of the following ports.

    Port    Rule Type         Description
    22      SSH               Default SSH port
    7000    Custom TCP Rule   Cassandra intra-node port (source is the current security group)
    9160    Custom TCP Rule   Cassandra client port
    7199    Custom TCP Rule   Cassandra JMX monitoring port (8080 in prior releases)
    8012    Custom TCP Rule   Hadoop Job Tracker client port
    50030   Custom TCP Rule   Hadoop Job Tracker website port
    50060   Custom TCP Rule   Hadoop Task Tracker website port
    8888    Custom TCP Rule   OpsCenter website port
    1024+   Custom TCP Rule   OpsCenter intra-node monitoring ports (source is the current security group)

  4. After you have added the above port rules, click Apply Rule Changes. Your completed port rules should look something like this:
../../_images/ami2_securityports.png
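
The same port rules can also be scripted instead of entered in the console. The following is a minimal sketch, assuming the AWS command line interface is installed and a security group has already been created; this guide itself only covers the console. The group name brisk-cluster, the CLIENT_CIDR address range, and the function name are hypothetical placeholders, and 1024+ is interpreted here as the range 1024-65535. The script only prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# Dry run: print (do not execute) one AWS CLI command per rule in the port table.
# GROUP and CLIENT_CIDR are hypothetical placeholders, not values from this guide.
GROUP="${GROUP:-brisk-cluster}"
CLIENT_CIDR="${CLIENT_CIDR:-203.0.113.0/24}"

print_rules() {
    # Ports reached from outside the cluster (SSH, clients, JMX, web UIs).
    for port in 22 9160 7199 8012 50030 50060 8888; do
        echo "aws ec2 authorize-security-group-ingress --group-name $GROUP --protocol tcp --port $port --cidr $CLIENT_CIDR"
    done
    # Ports whose source is the security group itself (intra-node traffic).
    # Assumption: "1024+" in the table means ports 1024 through 65535.
    for port in 7000 1024-65535; do
        echo "aws ec2 authorize-security-group-ingress --group-name $GROUP --protocol tcp --port $port --source-group $GROUP"
    done
}

print_rules
```

Restricting the externally reachable ports to a known address range, rather than opening them to every address, is a choice made for this sketch; adjust CLIENT_CIDR to match where your clients connect from.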

Launching the Brisk AMI

After you have created your security group, you are ready to launch an instance of Brisk using the DataStax AMI.

  1. From your Amazon EC2 Console Dashboard, click Launch Instance. This launches the Request Instances Wizard.
  2. On the Choose an AMI page, select the Community AMIs tab. Search for the datastax-clustering-ami image, and click Select to launch it.
  3. On the Instance Details page, enter the total number of nodes you want in your cluster in the Number of Instances field and select the Instance Type (Large is the smallest size recommended for a Brisk cluster).
../../_images/ami3_num_instances.png

Click Continue.

  4. Under Advanced Instance Options, add the following options to the User Data section, depending on the type of Brisk cluster you want.

    For new Brisk clusters, the available options are:

    Option                                          Description
    -n | --clustername <name>                       Required. The name of the cluster.
    -s | --clustersize <num_nodes>                  Required. The total number of nodes in the cluster.
    -d | --deployment <version>                     Required. The version of the cluster: brisk.
    -v | --vanillanodes <num_nodes>                 Optional. For mixed-workload clusters, the number of vanilla Cassandra nodes. Default is 0.
    -c | --cfsreplication <rf_num>                  Optional. Sets the replication factor for the CFS keyspace. Default is 1.
    -e | --email <smtp>:<port>:<email>:<password>   Optional. Sends AMI installation log files to/from this email address.
    -o | --opscenter <user>:<pass>                  Optional. Installs the free version of OpsCenter on the first instance. Requires registration username and password.
    -p | --paidopscenter <user>:<pass>              Optional. Installs the paid version of OpsCenter on the first instance. Requires registration username and password.

    The following additional options are available when adding nodes to an existing Brisk cluster. New nodes can be added to an established Brisk cluster one at a time.

    Option                       Description
    -t | --token <token>         Assigns a token to the node being added.
    -z | --seeds <seed>,<seed>   The seed node(s) to contact in order to join a cluster. Seeds must be in the same region as the node being added.
    -w | --thisisvanilla 1       Using this option with 1 forces the joining node to be a vanilla Cassandra node.

    ../../_images/ami4_brisk_options.png

    Click Continue.
  5. On the Tags page, give a name to your Brisk instance. This can be any name you like (for example, mixed-workload-brisk). Click Continue.
  6. On the Create Key Pair page, create a new key pair or select an existing key pair and click Continue. You will need this key (.pem file) to log in to your Brisk instance, so save it to a location on your local machine.
  7. On the Configure Firewall page, select the Brisk security group you created earlier and click Continue.
  8. On the Review page, review your cluster configuration and then click Launch.
  9. Go to the My Instances page to see the status of your Brisk instance. Once a node has a status of running, you can connect to it.
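
To illustrate how the User Data options for a new cluster fit together, here is a small sketch that assembles an option string from the short option forms. The function name and the example values (cluster name, node counts, replication factor) are illustrative assumptions, and the check that vanilla nodes do not exceed the cluster size is inferred from the option descriptions rather than stated by the AMI documentation.

```shell
#!/bin/sh
# Assemble a User Data string for a new Brisk cluster from the short options.
# The function name and the example values are illustrative only.
brisk_user_data() {
    name="$1"; size="$2"; vanilla="${3:-0}"; cfs_rf="${4:-1}"
    # Assumption: vanilla Cassandra nodes are counted within the total cluster size.
    if [ "$vanilla" -gt "$size" ]; then
        echo "vanilla nodes ($vanilla) exceed cluster size ($size)" >&2
        return 1
    fi
    echo "-n $name -s $size -d brisk -v $vanilla -c $cfs_rf"
}

# A 6-node mixed-workload cluster: 2 vanilla Cassandra nodes, CFS replication factor 2.
brisk_user_data mixed-workload-brisk 6 2 2
```

The last line prints -n mixed-workload-brisk -s 6 -d brisk -v 2 -c 2, which is the text you would paste into the User Data field for that layout.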

Connecting to Your Brisk EC2 Instance

You can connect to your new Brisk EC2 instance using any SSH client (PuTTY, Terminal, etc.). To connect, you need a private key (the .pem file you created earlier) and the public DNS name of a node. Connect as user ubuntu rather than root.

If this is the first time you are connecting, copy the private key file (<keyname>.pem) that you downloaded earlier to your home directory, and change its permissions so it is not publicly viewable. For example:

chmod 400 briskkey.pem
  1. From the My Instances page in your AWS EC2 Dashboard, select the node you want to connect to. Since all nodes are peers in Brisk, you can connect using any node in the cluster. However, the first node is typically the node running your Job Tracker and OpsCenter services (and is also the Cassandra seed node).

    ../../_images/ami_connect1.png
  2. To get the public DNS name of a node, select Instance Actions > Connect.

  3. This will open a Connect Help - Secure Shell (SSH) page for the selected node. This page will have all of the information you need to connect via SSH. If you copy and paste the command line, change the connection user from root to ubuntu.

    ../../_images/ami_connect2.png
  4. The AMI configures your cluster and starts the Brisk Cassandra and Hadoop services. For next steps, see Getting Started with Brisk Hive, The Brisk/Hive Demo, or Getting Started with Brisk Pig.
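
Putting the connection details above together, a login from a terminal might look like the following sketch. It assumes an OpenSSH-style ssh client and reuses the briskkey.pem file name from the chmod example; the host name is a made-up placeholder for your node's public DNS name, and the function name is hypothetical. The function only prints the command so you can check it before running it.

```shell
#!/bin/sh
# Build the SSH command for a node. The host name passed in is a placeholder
# for the public DNS name shown on the node's Connect page.
brisk_ssh_command() {
    key="$1"; host="$2"
    # Log in as ubuntu, not root, as described in this section.
    echo "ssh -i $key ubuntu@$host"
}

brisk_ssh_command briskkey.pem ec2-203-0-113-10.compute-1.amazonaws.com
```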

Note

If you specified a User Data option to install OpsCenter with your Brisk cluster (along with the correct username and password from your OpsCenter registration), allow about 60-90 seconds after the cluster has finished loading for OpsCenter to start. If the AMI did not install OpsCenter, you can register for OpsCenter and then install and start it using the instructions in the OpsCenter documentation.
