A common question is how to use real-time (Cassandra), analytics (Hadoop), and search (Solr) nodes in the same cluster. To mix workloads in a cluster, segregate the real-time, analytics, and search nodes into separate data centers.
Note
Within the same data center, attempting to run Solr on some nodes and real-time queries or analytics on other nodes does not work.
In production environments, do not run Solr and Hadoop on the same node. In development environments, running Solr and Hadoop on the same node is feasible.
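The rule in the note above can be checked mechanically before a deployment. The following is a minimal sketch rather than a DSE tool: the function name, the node dictionaries, the host addresses, and the workload labels ("real-time", "analytics", "search") are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def find_mixed_search_data_centers(nodes):
    """Return the sorted names of data centers that mix search with other workloads.

    Each node is a dict with hypothetical keys "dc" and "workload".
    """
    workloads_by_dc = defaultdict(set)
    for node in nodes:
        workloads_by_dc[node["dc"]].add(node["workload"])
    # A data center violates the rule if it contains search nodes
    # together with at least one other kind of workload.
    return sorted(
        dc
        for dc, workloads in workloads_by_dc.items()
        if "search" in workloads and len(workloads) > 1
    )


# Hypothetical cluster layout: DC4 wrongly mixes search and analytics nodes.
nodes = [
    {"host": "10.0.0.1", "dc": "DC1", "workload": "real-time"},
    {"host": "10.0.0.2", "dc": "DC1", "workload": "analytics"},
    {"host": "10.0.0.3", "dc": "DC3", "workload": "search"},
    {"host": "10.0.0.4", "dc": "DC4", "workload": "search"},
    {"host": "10.0.0.5", "dc": "DC4", "workload": "analytics"},
]

print(find_mixed_search_data_centers(nodes))
```

In this sketch, DC1 passes because it mixes only real-time and analytics nodes, while DC4 is reported because it combines search with another workload.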
A DSE data center (DC) can be physical or virtual. In this diagram, nodes in data centers 1 and 2 (DC 1 and DC 2) run a mix of real-time (Cassandra) and analytics (Hadoop) workloads.
Data centers 3 and 4 (DC 3 and DC 4) are dedicated to search.
Using workload provisioning, you can make some of your DSE nodes handle search while others handle MapReduce, or just act as ordinary Cassandra nodes. Cassandra ingests the data, Solr indexes the data, and you run MapReduce against that data, all in one cluster without having to do any manual extract, transform, and load (ETL) operations. Cassandra handles the replication and isolation of resources.
The Solr nodes run HTTP and hold the indexes for the column family data. If a Solr node goes down, the commit log replays the Cassandra inserts, which correspond to Solr inserts, and the node is restored automatically.
For more information about cluster partitioning by workload, see Elastic Workload Re-provisioning.
To set up a mixed workload cluster, which is a cluster that has more than one data center to accommodate different types of nodes, see Multiple Data Center Deployment.
You set up replication for Solr nodes exactly as you do for other nodes in a Cassandra cluster, by creating or altering a keyspace to define the replication strategy.
You can use the pre-release CQL 3 CREATE KEYSPACE and ALTER KEYSPACE statements to set up replication.
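As an illustration of per-data-center replication, the sketch below builds a CREATE KEYSPACE statement that uses NetworkTopologyStrategy. It makes several assumptions: the helper name create_keyspace_cql is hypothetical, the keyspace name and the data center names "Cassandra" and "Solr" are placeholders for whatever your snitch reports, and the statement uses the map-style replication syntax of released CQL 3, which may differ from the pre-release syntax available in your version.

```python
def create_keyspace_cql(keyspace, replicas_by_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per data center.

    Assumes the released CQL 3 map syntax; the pre-release form may differ.
    """
    if not replicas_by_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the data center names so the generated statement is deterministic.
    for dc, replicas in sorted(replicas_by_dc.items()):
        if not isinstance(replicas, int) or replicas < 1:
            raise ValueError("replication factor for %s must be a positive integer" % dc)
        options.append("'%s': %d" % (dc, replicas))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(options))


# Hypothetical example: two replicas in a real-time data center, one in a search data center.
statement = create_keyspace_cql("mykeyspace", {"Cassandra": 2, "Solr": 1})
print(statement)
```

The helper only formats text. To apply the statement, run it through cqlsh or a driver, and confirm the data center names against your cluster first.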