A common question is how to use real-time (Cassandra), analytics (Hadoop), and search (Solr) nodes in the same cluster. To mix workloads in a cluster, segregate the real-time, analytics, and search nodes into separate data centers.
Within the same data center, attempting to run Solr on some nodes and real-time queries or analytics on other nodes does not work.
Do not run Solr and Hadoop on the same node in either production or development environments.
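The two restrictions above can be sketched as a small layout check. This is only an illustration, not a DSE tool: the function name, the service labels, and the (node, data center, services) tuple format are all hypothetical, chosen for the example.

```python
from collections import defaultdict

# Hypothetical service labels for this sketch; they are not DSE settings.
CASSANDRA, HADOOP, SOLR = "cassandra", "hadoop", "solr"


def find_layout_problems(nodes):
    """Return a list of rule violations for a planned cluster layout.

    nodes: iterable of (node_name, data_center, services) tuples, where
    services is a collection of the labels defined above.
    """
    problems = []
    solr_flags_by_dc = defaultdict(list)
    for name, dc, services in nodes:
        services = set(services)
        # Rule: do not run Solr and Hadoop on the same node.
        if {SOLR, HADOOP} <= services:
            problems.append(f"{name}: Solr and Hadoop on the same node")
        solr_flags_by_dc[dc].append(SOLR in services)
    for dc, flags in sorted(solr_flags_by_dc.items()):
        # Rule: a data center that has Solr nodes must contain only Solr nodes.
        if any(flags) and not all(flags):
            problems.append(f"{dc}: Solr nodes mixed with non-Solr nodes")
    return problems


planned = [
    ("node1", "DC1", {CASSANDRA}),
    ("node2", "DC1", {CASSANDRA, HADOOP}),
    ("node3", "DC3", {CASSANDRA, SOLR}),
]
print(find_layout_problems(planned))
```

An empty list means the planned layout satisfies both restrictions. The check says nothing about whether the cluster is otherwise configured correctly.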
A DSE data center (DC) can be physical or virtual. In this diagram, nodes in data centers 1 and 2 (DC 1 and DC 2) run a mix of real-time (Cassandra) and analytics (Hadoop) workloads.
Data centers 3 and 4 (DC 3 and DC 4) are dedicated to search.
Using separate data centers for different types of nodes, you can make some of your DSE nodes handle search while others handle MapReduce, or just act as ordinary Cassandra nodes. Cassandra ingests the data, Solr indexes the data, and you run MapReduce against that data, all in one cluster without having to do any manual extract, transform, and load (ETL) operations. Cassandra handles the replication and isolation of resources.
The Solr nodes serve HTTP requests and hold the indexes for the column family data. If a Solr node goes down, the commit log replays the Cassandra inserts, which correspond to Solr inserts, and the node is restored automatically.
To set up a mixed workload cluster, which is a cluster that has more than one data center to accommodate different types of nodes, see Multiple Data Center Deployment.
You set up replication for Solr nodes exactly as you do for other nodes in a Cassandra cluster, by creating or altering a keyspace to define the replication strategy.
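As a rough sketch of what such a keyspace definition might look like, the helper below builds a CQL statement that uses NetworkTopologyStrategy with one replica count per data center. The helper name, the keyspace name, and the data center names (DC1, Solr) are hypothetical; in a real cluster the data center names must match those reported by the snitch. The statement uses the CQL 3 map syntax, and older CQL versions express the same options differently.

```python
def keyspace_replication_cql(keyspace, replicas_per_dc, alter=False):
    """Build a CREATE or ALTER KEYSPACE statement (CQL 3 map syntax).

    replicas_per_dc maps a data center name to its replica count.
    All names passed in are the caller's own; nothing here is validated
    against a live cluster.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in replicas_per_dc.items():
        if not isinstance(count, int) or count < 1:
            raise ValueError(f"replica count for {dc} must be a positive integer")
        options.append(f"'{dc}': {count}")
    verb = "ALTER" if alter else "CREATE"
    body = ", ".join(options)
    return verb + " KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


# Hypothetical example: three replicas in a real-time data center,
# two replicas in a search data center.
print(keyspace_replication_cql("demo", {"DC1": 3, "Solr": 2}))
```

The resulting string can be run in cqlsh or passed to a driver. Giving each data center its own replica count is what lets the search data center hold copies of the data that the real-time data center ingests.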