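This is a step-by-step guide to setting up a DataStax Enterprise (DSE) cluster in the HP Cloud. DataStax supports installation on Ubuntu 11.04 Natty Narwhal and Ubuntu 11.10 Oneiric Ocelot. Installation consists of creating a key pair, creating a security group, creating and connecting to the server instances, installing the Oracle JRE, and then installing and configuring DataStax Enterprise and OpsCenter.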
Note
Some links to HP documents open only when you are logged in to the HP Cloud Console.
You need a key pair (.pem file) to log in to your DataStax Enterprise nodes.
From the HP Cloud Dashboard, click Manage Servers or Activate in one of the Availability Zones.
Click Key Pairs.
Click Add KeyPair.
Note
For multiple availability zones, use the same key pair in each zone. If you used the HP Cloud console to create the key pair, you can retrieve the public key using the REST API. You must first create an authorization token to execute the API calls, then use the List Key Pairs command to retrieve the public key.
A security group acts as a firewall that allows you to choose which protocols and ports are open in your cluster. A Cassandra cluster requires that certain ports are open for inter-node, OpsCenter, and SSH communication. You can specify the protocols and ports either by a range of IP addresses or by security group. It is much simpler and requires less maintenance to define port access by security group. Currently the HP Cloud console does not provide the capability to specify ports by security group. However, you can install and use the HP Extended Python Novaclient for this purpose.
The HP Security Groups document provides information on defining rules for security groups.
Note
After making any change to a security group, you must restart the nodes. You cannot change which security group is associated with an instance after the instance is created.
To create a security group:
Using the HP Extended Python Novaclient, create a security group:
nova secgroup-create DSESecurityGroup "Security group for DataStax Enterprise"
Create the rules for the security group. For example, to create a rule that opens port 7000 to other nodes in the security group:
nova secgroup-add-group-rule DSESecurityGroup DSESecurityGroup --ip_proto tcp --from_port 7000 --to_port 7000
+-------------+-----------+---------+----------+-------------------+
| IP Protocol | From Port | To Port | IP Range | Source Group |
+-------------+-----------+---------+----------+-------------------+
| tcp | 7000 | 7000 | | DSESecurityGroup |
+-------------+-----------+---------+----------+-------------------+
A Cassandra/DataStax Enterprise cluster requires the following ports:
| Port | IP Protocol | Description |
|---|---|---|
| Internet Control Message Protocol | | |
| -1 | icmp | Use for ping |
| Public Facing Ports | | |
| 22 | tcp | Default SSH port |
| DataStax Enterprise Specific | | |
| 8012 | tcp | Hadoop Job Tracker client port |
| 8983 | tcp | Solr port and Demo applications website port (Portfolio, Search, Search log) |
| 50030 | tcp | Hadoop Job Tracker website port |
| 50060 | tcp | Hadoop Task Tracker website port |
| OpsCenter Specific | | |
| 8888 | tcp | OpsCenter website port |
| Internode Ports | | |
| Cassandra Specific | | |
| 1024+ | tcp (use security group) | JMX reconnection/loopback ports |
| 7000 | tcp (use security group) | Cassandra inter-node port |
| 7199 | tcp (use security group) | Cassandra JMX monitoring port |
| 9160 | tcp (use security group) | Cassandra client port |
| DataStax Enterprise Specific | | |
| 9290 | tcp (use security group) | Hadoop Job Tracker Thrift port |
| OpsCenter Specific | | |
| 50031 | tcp (use security group) | OpsCenter HTTP proxy for Job Tracker |
| 61620 | tcp (use security group) | OpsCenter inter-node monitoring port |
| 61621 | tcp (use security group) | OpsCenter agent ports |
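The inter-node rows of the table can be turned into rules with a small loop. The following is a minimal sketch, not a definitive script: it reuses the command form from the port 7000 example, only prints the commands (a dry run), and assumes that "1024+" means the range 1024 to 65535. The names `add_group_rule` and `print_internode_rules` are made up for this illustration.

```shell
#!/bin/sh
# Dry-run sketch: print one nova command per inter-node port range in the table.
# Remove the "echo" in add_group_rule to run the commands instead of printing them.

SECGROUP="DSESecurityGroup"

# Print the rule command in the same form as the port 7000 example.
# $1 = first port in the range, $2 = last port in the range
add_group_rule() {
    echo nova secgroup-add-group-rule "$SECGROUP" "$SECGROUP" \
        --ip_proto tcp --from_port "$1" --to_port "$2"
}

# Inter-node ports from the table, written as from:to pairs.
# 1024:65535 is an assumed reading of the "1024+" JMX range.
INTERNODE_PORTS="1024:65535 7000:7000 7199:7199 9160:9160 9290:9290 50031:50031 61620:61620 61621:61621"

print_internode_rules() {
    for range in $INTERNODE_PORTS; do
        # Split "from:to" into its two halves with parameter expansion.
        add_group_rule "${range%%:*}" "${range##*:}"
    done
}

print_internode_rules
```

Because the assumed 1024 to 65535 range already covers every other inter-node port, the individual rules are redundant once it is in place; they are listed so that the output matches the table row by row.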
Note
Generally, when you have firewalls between machines, it is difficult to run JMX across a network and maintain security. This is because JMX connects on port 7199, handshakes, and then uses any port within the 1024+ range. Instead, use SSH to run commands on the node itself so that they connect to JMX locally, or use DataStax OpsCenter.
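As an illustration of the SSH approach, the sketch below builds a command that runs `nodetool` on the node itself, so the JMX connection stays on localhost and only port 22 has to be reachable. It assumes the key file is named DataStaxKey.pem, the login user is ubuntu, and `nodetool` is on the remote user's path; `remote_nodetool_cmd` is a made-up helper name, and it prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print the SSH command that runs nodetool locally on a node.
# $1 = IP address of the node, $2 = nodetool subcommand (for example: ring)
remote_nodetool_cmd() {
    echo ssh -i DataStaxKey.pem "ubuntu@$1" nodetool -h localhost "$2"
}

# Example: show the command for checking the ring from one node.
remote_nodetool_cmd "<ip_address>" ring
```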
After you are done adding the port rules, you can also view them on the HP Cloud console:
Warning
The security configuration shown in the above graphic opens ports 22 and 8888 to incoming traffic from any IP address (0.0.0.0/0). If you need a more secure configuration, see the HP Security Groups document.
From the HP Cloud Dashboard, click Manage Servers or Activate in one of the Availability Zones.
Under Create Servers, select the following:
Click Create.
Click Create for each additional instance.
chmod 400 DataStaxKey.pem
After the instance is running, click Connect.
From the Instance dialog box, copy the example and change the connection user from root to ubuntu, then paste it into your SSH client.
Oracle Java SE Runtime Environment (JRE) 6 is required to run DataStax Enterprise. The latest version is recommended.
The easiest way to put the Oracle JRE on an HP Cloud instance is to download it to your local machine from Oracle Java SE Downloads and then use the secure copy command to copy it onto the node:
scp -i DataStaxKey.pem jre-6u43-linux-x64.bin ubuntu@<ip_address>:~/
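Once the file is on the node, one possible way to install it looks like the sketch below. This is an assumption-laden outline rather than the documented procedure: it assumes the installer unpacks into a directory named jre1.6.0_43 in the current directory, that /usr/lib/jvm is an acceptable install location, and that `update-alternatives` is used to make this JRE the default. By default the script only prints the commands; set `RUN` to an empty value to execute them.

```shell
#!/bin/sh
# Sketch: install the Oracle JRE from the self-extracting .bin copied to the node.
# Dry run by default: each command is printed, not executed.
# To execute for real, run with RUN set to empty, for example: RUN= sh install_jre.sh
RUN="${RUN-echo}"

JRE_BIN="jre-6u43-linux-x64.bin"   # file name from the scp example
JRE_DIR="jre1.6.0_43"              # assumed name of the directory the installer unpacks
JVM_HOME="/usr/lib/jvm"            # assumed install location

install_jre() {
    $RUN chmod a+x "$JRE_BIN"                 # make the installer executable
    $RUN "./$JRE_BIN"                         # unpack the JRE
    $RUN sudo mkdir -p "$JVM_HOME"
    $RUN sudo mv "$JRE_DIR" "$JVM_HOME/"
    # Register this JRE and select it as the system default java.
    $RUN sudo update-alternatives --install /usr/bin/java java "$JVM_HOME/$JRE_DIR/bin/java" 1
    $RUN sudo update-alternatives --set java "$JVM_HOME/$JRE_DIR/bin/java"
    $RUN java -version                        # confirm which JRE is active
}

install_jre
```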
Install DataStax Enterprise as described in Installing the DataStax Enterprise Package on Debian and Ubuntu.
Note
You only need to install OpsCenter on one node.
You can configure DataStax Enterprise as described in Single Data Center Deployment or Multiple Data Center Deployment, using the following guidelines.
Single availability zone:
If necessary, change the delegated_snitch to DseSimpleSnitch. This setting is located in the /etc/dse/dse.yaml configuration file.
delegated_snitch: com.datastax.bdp.snitch.DseSimpleSnitch
In the /etc/dse/cassandra/cassandra.yaml configuration file, use the private IP addresses of the nodes, not the public IP addresses:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<private_ip_of_seed1>,<private_ip_of_seed2>"
listen_address: <private_ip_of_the_node>
Multiple availability zones:
In the /etc/dse/dse.yaml configuration file, set the delegated_snitch to PropertyFileSnitch:
delegated_snitch: org.apache.cassandra.locator.PropertyFileSnitch
In the /etc/dse/cassandra/cassandra.yaml configuration file, use the public IP addresses for the seeds and set the broadcast_address:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<public_ip_of_seed1>,<public_ip_of_seed2>"
listen_address: <private_ip_of_the_node>
broadcast_address: <public_ip_of_the_node>
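PropertyFileSnitch reads each node's data center and rack from the cassandra-topology.properties file, one `<ip>=<data_center>:<rack>` line per node. The sketch below prints such lines. It rests on several assumptions: each availability zone is treated as its own data center, the names DC1, DC2, and RAC1 are placeholders, and public IP addresses are used because nodes advertise their broadcast_address to each other. `topology_line` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print entries for cassandra-topology.properties.
# $1 = public IP of the node, $2 = data center name, $3 = rack name
topology_line() {
    echo "$1=$2:$3"
}

# Placeholder nodes: one per availability zone, each zone mapped to a data center.
topology_line "<public_ip_of_node1>" DC1 RAC1
topology_line "<public_ip_of_node2>" DC2 RAC1

# Fallback entry for any node that is not listed explicitly.
echo "default=DC1:RAC1"
```

The printed lines would go into the topology file on every node, and the file should be identical across the cluster.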
DataStax Enterprise OpsCenter is installed when you install DataStax Enterprise using the sudo apt-get install dse-full opscenter command. If you have not already installed OpsCenter, install it as described in Installing OpsCenter from Debian or Ubuntu Packages.
Note
If you are installing OpsCenter on Ubuntu 11.10, be sure to install OpenSSL 0.9.8 on the node where OpsCenter is installed:
$ sudo apt-get install libssl0.9.8
In the /etc/opscenter/opscenterd.conf configuration file, set the [webserver] interface to the private IP address of the OpsCenter node:
[webserver]
port = 8888
interface = <private_ip_of_the_opscenter_node>
Connect to the OpsCenter using the following URL:
http://<private_ip_of_the_opscenter_node>:8888
Install the agents as described in Automatically Deploying Agents - Packaged Installations.