DataStax Enterprise 2.2 Documentation

Using Solr and Hadoop in a cluster

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


A common question is how to use real-time (Cassandra), analytics (Hadoop), and search (Solr) nodes in the same cluster. To mix workloads in a cluster, you need to segregate nodes by type (real-time, analytics, or search) into separate data centers.


Running Solr on some nodes and real-time queries or analytics on other nodes within the same data center does not work.

Do not run Solr and Hadoop on the same node in either production or development environments.

A DSE data center (DC) can be physical or virtual. In the example deployment described here, nodes in data centers 1 and 2 (DC 1 and DC 2) run a mix of:

  • Real-time queries (Cassandra and no other services)
  • Analytics (Cassandra and Hadoop)

Data centers 3 and 4 (DC 3 and DC 4) are dedicated to search.
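The layout rule described above can be checked mechanically before you deploy. The following is a minimal sketch, not a DSE tool: the node names, the workload labels ("realtime", "analytics", "search"), and the function name are all hypothetical and chosen only to mirror the four data centers in this example. It flags any data center that mixes search nodes with other node types.

```python
# Hypothetical layout mirroring the four data centers described above.
# Node names and workload labels are illustrative, not DSE identifiers.
LAYOUT = {
    "DC1": {"node1": "realtime", "node2": "analytics"},
    "DC2": {"node3": "realtime", "node4": "analytics"},
    "DC3": {"node5": "search", "node6": "search"},
    "DC4": {"node7": "search"},
}


def find_mixed_search_dcs(layout):
    """Return the names of data centers that mix search nodes with other node types."""
    mixed = []
    for dc, nodes in layout.items():
        workloads = set(nodes.values())
        # A data center that contains search nodes must contain only search nodes.
        if "search" in workloads and len(workloads) > 1:
            mixed.append(dc)
    return sorted(mixed)


if __name__ == "__main__":
    # An empty list means no data center mixes search with other workloads.
    print(find_mixed_search_dcs(LAYOUT))
```

A layout that placed a search node in DC1 alongside the real-time and analytics nodes would be reported as mixed, which corresponds to the configuration that this topic says does not work.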


Using separate data centers for different types of nodes, you can make some of your DSE nodes handle search while others handle MapReduce, or just act as ordinary Cassandra nodes. Cassandra ingests the data, Solr indexes the data, and you run MapReduce against that data, all in one cluster without having to do any manual extract, transform, and load (ETL) operations. Cassandra handles the replication and isolation of resources.

The Solr nodes accept HTTP requests and hold the indexes for the column family data. If a Solr node goes down, the commit log replays the Cassandra inserts, which correspond to Solr inserts, when the node comes back up, and the node is restored automatically.

Deploying multiple data centers

To set up a mixed workload cluster, which is a cluster that has more than one data center to accommodate different types of nodes, see Multiple Data Center Deployment.

Replicating data across data centers

You set up replication for Solr nodes exactly as you do for other nodes in a Cassandra cluster, by creating or altering a keyspace to define the replication strategy.

You can use the pre-release CQL 3 CREATE KEYSPACE and ALTER KEYSPACE statements to set up replication.
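As an illustration of defining replication per data center, the sketch below builds a CREATE KEYSPACE statement that uses NetworkTopologyStrategy. It assumes the strategy_class and strategy_options syntax used by CQL in the Cassandra 1.1 line; later Cassandra releases use a different replication syntax, so check the CQL reference for your version. The keyspace name, the data center names (DC1, DC3), the replica counts, and the helper function are hypothetical.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with one replica count per data center.

    Assumes the Cassandra 1.1-era CQL syntax (strategy_class / strategy_options).
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # One strategy_options entry per data center, in a stable (sorted) order.
    options = " AND ".join(
        "strategy_options:{} = {}".format(dc, int(rf))
        for dc, rf in sorted(replicas_per_dc.items())
    )
    return (
        "CREATE KEYSPACE {} WITH strategy_class = 'NetworkTopologyStrategy' "
        "AND {};".format(name, options)
    )


if __name__ == "__main__":
    # Hypothetical example: two replicas in a real-time/analytics data center
    # and one replica in a search data center.
    print(create_keyspace_cql("demo_ks", {"DC1": 2, "DC3": 1}))
```

The data center names in the statement must match the names that the cluster's snitch reports; the names used here are placeholders only.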