Apache Cassandra™ 2.0

Adding a data center to a cluster

Steps for adding a data center to an existing cluster.

Procedure

  1. Ensure that you are using NetworkTopologyStrategy for all of your keyspaces.
  2. For each node, set the following properties in the cassandra.yaml file:
    1. Add (or edit) auto_bootstrap: false.

      By default, this setting is true and not listed in the cassandra.yaml file. Setting this parameter to false prevents the new nodes from attempting to get all the data from the other nodes in the data center. When you run nodetool rebuild in the last step, each node is properly mapped.

    2. Set other properties, such as seeds and listen_address, to match the cluster settings.

      For more guidance, see Initializing a multiple node cluster (multiple data centers).

    3. If you want to enable vnodes, set num_tokens.

      The recommended value is 256. Do not set the initial_token parameter.

  3. If using the PropertyFileSnitch, update the cassandra-topology.properties file on all servers to include the new nodes. You do not need to restart.
  4. Ensure that your client does not auto-detect the new nodes, so that the client does not contact them until explicitly directed. For example, in Hector, set hostConfig.setAutoDiscoverHosts(false);
  5. If you use the QUORUM consistency level for reads or writes, check whether the LOCAL_QUORUM or EACH_QUORUM consistency level better meets your requirements for multiple data centers.
  6. Start Cassandra on the new nodes.
  7. After all nodes are running in the cluster:
    1. Change the keyspace properties to specify the desired replication factor for the new data center.

      For example, set strategy options to DC1:2, DC2:2.

      For more information, see ALTER KEYSPACE.

    2. On each node in the new data center, run nodetool rebuild, specifying the existing data center:
      nodetool rebuild -- name_of_existing_data_center

      If you skip this step, requests to the new data center with LOCAL_ONE or ONE consistency levels may fail if the existing data centers are not completely in sync.

      You can run rebuild on one or more nodes at the same time. The choice depends on whether your cluster can handle the extra I/O and network pressure of running on multiple nodes. Running on one node at a time has the least impact on the existing cluster.

      Attention: If you don't specify the existing data center in the command line, the new nodes will appear to rebuild successfully, but will not contain any data.
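
To make the consistency-level check in step 5 concrete, the following minimal Python sketch computes how many replicas must respond at each quorum level, assuming the usual majority rule (replicas // 2 + 1). The helper names quorum and required_responses are illustrative only and are not part of Cassandra or any driver. The replication settings reuse the DC1:2, DC2:2 example from step 7.

```python
def quorum(replicas: int) -> int:
    """Return a strict majority of the given number of replicas."""
    return replicas // 2 + 1


def required_responses(replication: dict, local_dc: str) -> dict:
    """Replica responses needed per consistency level (illustrative helper).

    replication maps data center name -> replication factor.
    local_dc is the data center of the coordinator handling the request.
    """
    total = sum(replication.values())
    return {
        # QUORUM counts replicas across all data centers.
        "QUORUM": quorum(total),
        # LOCAL_QUORUM counts replicas in the coordinator's data center only.
        "LOCAL_QUORUM": quorum(replication[local_dc]),
        # EACH_QUORUM requires a majority in every data center.
        "EACH_QUORUM": {dc: quorum(rf) for dc, rf in replication.items()},
    }


replication = {"DC1": 2, "DC2": 2}  # example values from step 7
levels = required_responses(replication, local_dc="DC1")
print(levels)
```

With these example values, QUORUM needs 3 of the 4 replicas, so every such request must cross data centers, while LOCAL_QUORUM needs both replicas in the local data center.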

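As an illustration of the keyspace change in step 7, the sketch below renders a CQL ALTER KEYSPACE statement for NetworkTopologyStrategy from a mapping of data center names to replication factors. The function name alter_keyspace_cql and the keyspace name my_keyspace are hypothetical; the data center names must match the names reported by your snitch. The code only builds the statement text, which you would then run yourself in cqlsh.

```python
def alter_keyspace_cql(keyspace: str, replication: dict) -> str:
    """Build an ALTER KEYSPACE statement (illustrative helper, not a driver API).

    replication maps data center name -> replication factor.
    """
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# my_keyspace is a placeholder; DC1:2, DC2:2 is the example from step 7.
statement = alter_keyspace_cql("my_keyspace", {"DC1": 2, "DC2": 2})
print(statement)
```

Review the printed statement before running it, and repeat it for each keyspace that should be replicated to the new data center.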