DataStax Enterprise 3.0 Documentation

Introduction to DSE Search/Solr

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


Coming from the Apache Lucene project, Solr is the most popular open source enterprise search platform in use today. Solr’s primary features include robust free-text search, hit highlighting, and rich document (PDF, Microsoft Word, and so on) handling. Solr also provides more advanced features like aggregation, grouping, and geo-spatial search. Solr powers the search and navigation features of many of the world's largest Internet sites. Because DataStax Enterprise includes Solr 4.0, near real-time indexing is available.

The unique combination of Cassandra, Solr, and Hadoop in DSE bridges the gap between online transaction processing (OLTP) and online analytical processing (OLAP). DSE Search in Cassandra offers a way to aggregate and look at data in many different ways in real time. The speed of Cassandra compensates for typical MapReduce performance problems. By integrating Solr into the DataStax Enterprise big data platform, DataStax extends Solr’s capabilities and overcomes the shortcomings of Open Source Solr (OSS) discussed below.

[Figure: ds_integration.png]

DSE Search is easily scalable. You add search capacity to your cluster in the same way as you add Hadoop or Cassandra capacity. You can have a hybrid cluster of nodes, some running Cassandra, some running search, and some running Hadoop. If you don't need Cassandra or Hadoop, you can migrate to DSE strictly for Solr and create a Solr-only cluster. The DSE cluster configuration improves upon the master-slave configuration supported by OSS.

OSS tools and APIs are supported, simplifying migration from Solr to DSE Search for Solr users.

Sources of information about OSS

Covering all the features of OSS is beyond the scope of DataStax Enterprise documentation. Because DSE Search/Solr supports all Solr tools and APIs, refer to the Solr documentation for topics such as how to construct Solr query strings to retrieve indexed data.
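
As a rough illustration of what a Solr query string looks like, the following Python sketch builds a URL for Solr's standard select handler without sending any request. The host and port (localhost:8983), the core name wiki.solr, and the title and id fields are assumptions chosen for illustration; substitute the values for your own cluster and schema. The helper name build_solr_query_url is hypothetical and not part of any Solr or DSE API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_solr_query_url(base_url, core, query, rows=10, fields=None):
    """Return a URL for Solr's /select handler (the request is not sent)."""
    # q, rows, wt, and fl are standard Solr query parameters.
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        params["fl"] = ",".join(fields)  # restrict the fields returned
    # urlencode percent-encodes characters such as ':' and '*'.
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Assumed host, core, and field names, for illustration only.
url = build_solr_query_url(
    "http://localhost:8983", "wiki.solr", "title:natio*",
    rows=5, fields=["id", "title"],
)
print(url)

# Decode the query string again to confirm what Solr would receive.
parsed = parse_qs(urlsplit(url).query)
print(parsed["q"][0])
```

Building the parameters as a dictionary and encoding them in one step avoids hand-escaping the wildcard and field separators that Solr query syntax relies on.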

Benefits of using Solr in DataStax Enterprise

DataStax Enterprise Search 3.0 and later is built on top of the released version of Solr 4.0. Solr offers real-time querying of files, and search indexes remain tightly in line with live data. Running your enterprise search functions through DataStax Enterprise instead of OSS has significant benefits, including:

  • A fully fault-tolerant, no-single-point-of-failure search architecture
  • Linear performance scalability that comes from adding new search nodes online
  • Automatic indexing of data ingested into Cassandra
  • Automatic and transparent data replication
  • Isolation of all real-time, Hadoop, and search/Solr workloads to prevent competition between workloads for either compute resources or data
  • The capability to read/write to any Solr node, which overcomes the Solr write bottleneck
  • Selective updates of one or more individual fields (a full re-index operation is still required)
  • Search indexes that can span multiple data centers (OSS cannot)
  • Limited CQL support for Solr/search queries (Solr HTTP API recommended)
  • Creation of Solr indexes from existing column families created with CQL 2

DSE Search takes secondary indexes to a new level: data added to Cassandra is locally indexed in Solr and data added to Solr is locally indexed in Cassandra.
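
The limited CQL support listed above can be sketched as follows. This minimal Python example only assembles a CQL statement string that passes a Solr query through a solr_query condition; it does not connect to a cluster. The column family name solr, the title column, and the helper name solr_query_cql are assumptions for illustration, and the exact CQL syntax accepted should be checked against the documentation for your DSE version.

```python
def solr_query_cql(table, solr_query, columns=("*",)):
    """Build a CQL SELECT string that passes a Solr query via solr_query."""
    # CQL string literals escape a single quote by doubling it.
    escaped = solr_query.replace("'", "''")
    column_list = ", ".join(columns)
    # The table and column names are inserted as-is; validate them in real code.
    return f"SELECT {column_list} FROM {table} WHERE solr_query='{escaped}';"


# Assumed column family and column names, for illustration only.
statement = solr_query_cql("solr", "title:natio*", columns=["title"])
print(statement)
```

Because the Solr HTTP API is the recommended interface, a statement like this is best reserved for simple lookups.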

Unsupported features

DSE Search does not support:

  • Cassandra super columns
  • Cassandra counter columns
  • Cassandra time-series type rows
  • Creation of Solr indexes from existing column families created with CQL 3. Column families created with CQL 2 are supported.
  • Cassandra composite columns. Solr fields must be strings.