Getting started with Solr in DataStax Enterprise
DataStax Enterprise supports Open Source Solr (OSS) tools and APIs, simplifying migration from Solr to DataStax Enterprise. DataStax Enterprise is built on top of Solr 4.3.
DataStax Enterprise 4.0 turns off virtual nodes (vnodes) by default. DataStax does not recommend turning on vnodes for Hadoop or Solr nodes, but you can use vnodes for any Cassandra-only cluster, or a Cassandra-only data center in a mixed Hadoop/Solr/Cassandra deployment. If you have enabled virtual nodes in Solr nodes, see Disabling virtual nodes. You can skip this step to run the Solr getting started tutorial.
Introduction to Solr¶
The Apache Lucene project, Solr features robust full-text search, hit highlighting, and rich document (PDF, Microsoft Word, and so on) handling. Solr also provides more advanced features like aggregation, grouping, and geo-spatial search. Today, Solr powers the search and navigation features of many of the world's largest Internet sites. With the latest version of Solr, near real-time indexing can be performed.
Solr integration into DataStax Enterprise¶
The unique combination of Cassandra, DSE Search with Solr, and DSE Analytics with Hadoop bridges the gap between online transaction processing (OLTP) and online analytical processing (OLAP). DSE Search in Cassandra offers a way to aggregate and look at data in many different ways in real-time using the Solr HTTP API or the Cassandra Query Language (CQL). Cassandra speed can compensate for typical MapReduce performance problems. By integrating Solr into the DataStax Enterprise big data platform, DataStax extends Solr’s capabilities.
DSE Search is easily scalable. You add search capacity to your cluster in the same way as you add Hadoop or Cassandra capacity to your cluster. You can have a hybrid cluster of nodes, provided the Solr nodes are in a separate data center, some running Cassandra, some running search, and some running Hadoop. If you have a hybrid cluster of nodes, follow instructions on isolating workloads. If you don't need Cassandra or Hadoop, migrate to DSE strictly for Solr and create an exclusively Solr cluster. You can use one data center and run Solr on all nodes.
Storage of indexes and data¶
The solr implementation in DataStax Enterprise does not support JBOD mode. Indexes are stored on the local disk inside Lucene, actual data is stored in Cassandra.
Sources of information about OSS¶
For more information, see Solr 4.x Deep Dive by Jack Krupansky.
Benefits of using Solr in DataStax Enterprise¶
- A fully fault-tolerant, no-single-point-of-failure search architecture
- Linear performance scalability--add new search nodes online
- Automatic indexing of data ingested into Cassandra
- Automatic and transparent data replication
- Isolation of all real-time, Hadoop, and search/Solr workloads to prevent competition between workloads for either compute resources or data
- The capability to read/write to any Solr node, which overcomes the Solr write bottleneck
- Selective updates of one or more individual fields (a full re-index operation is still required)
- Search indexes that can span multiple data centers (OSS cannot)
- Solr/search support for CQL for querying Solr documents (Solr HTTP API may also be used)
- Creation of Solr indexes from existing tables created with CQL
Data added to Cassandra is locally indexed in Solr and data added to Solr is locally indexed in Cassandra.