<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Tag: lucene - Recent Posts</title>
		<link>http://www.datastax.com/support-forums/tags/lucene</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Wed, 22 May 2013 16:59:42 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/tags/lucene" rel="self" type="application/rss+xml" />

		<item>
			<title>jas on "Solr Index Files"</title>
			<link>http://www.datastax.com/support-forums/topic/solr-index-files#post-1886</link>
			<pubDate>Wed, 09 May 2012 21:26:08 +0000</pubDate>
			<dc:creator>jas</dc:creator>
			<guid isPermaLink="false">1886@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thanks, Rock, that does help.  I like to know where I stand. :)&#60;/p&#62;
&#60;p&#62;BTW, I see you're a forum &#34;Member&#34; like me as opposed to a &#34;Moderator&#34;.  Do you work for DataStax, or do you just happen to know these details? :)&#60;/p&#62;
&#60;p&#62;Thanks,&#60;/p&#62;
&#60;p&#62;Jeff
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Solr Index Files"</title>
			<link>http://www.datastax.com/support-forums/topic/solr-index-files#post-1878</link>
			<pubDate>Wed, 09 May 2012 15:39:58 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1878@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Jeff,&#60;/p&#62;
&#60;p&#62;Your analysis of how the DSE Solr product works is correct.  The index files are stored locally.  This produces the maximum query speed possible, and enables taking advantage of new Lucene posting codecs.&#60;/p&#62;
&#60;p&#62;Optimization of the Lucene / Solr index will help in the read-only case.  If updates are frequent, then optimizing will not be beneficial.&#60;/p&#62;
&#60;p&#62;&#38;gt; Does every node in the ring index the same content and produce its own index files?&#60;/p&#62;
&#60;p&#62;Yes&#60;/p&#62;
&#60;p&#62;Hope that helps!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>jas on "Solr Index Files"</title>
			<link>http://www.datastax.com/support-forums/topic/solr-index-files#post-1865</link>
			<pubDate>Mon, 07 May 2012 21:14:43 +0000</pubDate>
			<dc:creator>jas</dc:creator>
			<guid isPermaLink="false">1865@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi:&#60;/p&#62;
&#60;p&#62;For whatever reason I thought the DSE 2 Solr indexes were maintained in Cassandra.  But in this document:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://www.datastax.com/docs/datastax_enterprise2.0/search/dse_search_cluster#updating-individual-fields-in-a-solr-document&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/datastax_enterprise2.0/search/dse_search_cluster#updating-individual-fields-in-a-solr-document&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;Under &#34;Manage the Location of Solr Data&#34;, it describes the existence of &#60;code&#62;solr.data&#60;/code&#62;.  I went there and I see what look like numerous Lucene index files.  This indicates to me that the indexes reside within the node's local file system, and not somewhere in Cassandra. I know the equivalently named keyspace and CF in Cassandra have the field data and the Solr configuration files. Does &#60;code&#62;com.datastax.bdp.cassandra.index.solr.SolrSecondaryIndex&#60;/code&#62; place the Lucene indexes themselves in the file system rather than in Cassandra itself?&#60;/p&#62;
&#60;p&#62;Does this mean the indexes themselves are not distributed and replicated?  Does every node in the ring index the same content and produce its own index files?  I guess that's a form of replication, but it could well exceed the desired replication factor. If I add a node, does it then have to go and index the Solr-related CFs?&#60;/p&#62;
&#60;p&#62;If the indexes are indeed in the file system, then optimizing the index is still useful, right?  I have a lot of read-only data indexed, and optimizing pays off well in regular Solr. I figured Cassandra's compaction took care of that, but I guess if the indexes don't reside there, that's not the case.&#60;/p&#62;
&#60;p&#62;Thanks,&#60;/p&#62;
&#60;p&#62;Jeff
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "Lucene/Mahout Error: class  overrides final method tokenStream."</title>
			<link>http://www.datastax.com/support-forums/topic/lucenemahout-error-class-overrides-final-method-tokenstream#post-1561</link>
			<pubDate>Tue, 10 Apr 2012 14:06:09 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">1561@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;This looks to be happening because DSE comes with Lucene/Solr 4.0&#60;/p&#62;
&#60;p&#62;We can fix this in the next point release so the Solr jars aren't in the classpath when you run hadoop.  In the meantime you can move those jars out of the way $DSE_HOME/resources/solr/lib&#60;/p&#62;
&#60;p&#62;-Jake
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Lucene/Mahout Error: class  overrides final method tokenStream."</title>
			<link>http://www.datastax.com/support-forums/topic/lucenemahout-error-class-overrides-final-method-tokenstream#post-1544</link>
			<pubDate>Fri, 06 Apr 2012 15:33:24 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1544@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hey, I've been working on getting Mahout to run on top of DataStax Enterprise, but I'm running into issues with Lucene incompatibility.&#60;/p&#62;
&#60;p&#62;For example, using the Mahout Wikipedia Bayes example (&#60;a href=&#34;https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html&#34; rel=&#34;nofollow&#34;&#62;https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html&#60;/a&#62;), when I execute the following command (step #6 from the example), I end up with the errors shown below. (The command has some *magic* in it: Mahout assumes that a &#60;code&#62;hadoop&#60;/code&#62; executable exists, so I created a &#60;code&#62;hadoop&#60;/code&#62; shell script serving as an alias that basically does &#60;code&#62;dse hadoop $@&#60;/code&#62;; that's what &#60;code&#62;HADOOP_HOME&#60;/code&#62; is pointing to.)&#60;/p&#62;
&#60;p&#62;&#60;code&#62;&#60;br /&#62;
sudo env JAVA_HOME=$JAVA_HOME HADOOP_HOME=/home/ubuntu/hadoop HADOOP_CONF_DIR=/etc/dse/hadoop ./mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c /home/ubuntu/mahout/examples/src/test/resources/country.txt&#60;br /&#62;
&#60;/code&#62;&#60;/p&#62;
&#60;p&#62;&#60;code&#62;&#60;br /&#62;
    MAHOUT-JOB: /home/ubuntu/mahout/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar&#60;br /&#62;
    12/04/06 14:38:03 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only&#60;br /&#62;
    12/04/06 14:38:04 INFO bayes.WikipediaDatasetCreatorDriver: Input: wikipedia/chunks Out: wikipediainput Categories: /home/ubuntu/mahout/examples/src/test/resources/country.txt&#60;br /&#62;
    12/04/06 14:38:04 INFO common.HadoopUtil: Deleting wikipediainput&#60;br /&#62;
    12/04/06 14:38:05 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.&#60;br /&#62;
    12/04/06 14:38:06 INFO input.FileInputFormat: Total input paths to process : 555&#60;br /&#62;
    12/04/06 14:38:07 INFO mapred.JobClient: Running job: job_201204051216_0017&#60;br /&#62;
    12/04/06 14:38:08 INFO mapred.JobClient:  map 0% reduce 0%&#60;br /&#62;
    12/04/06 14:38:26 INFO mapred.JobClient: Task Id : attempt_201204051216_0017_m_000005_0, Status : FAILED&#60;br /&#62;
    Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;&#60;br /&#62;
    12/04/06 14:38:28 INFO mapred.JobClient: Task Id : attempt_201204051216_0017_m_000002_0, Status : FAILED&#60;br /&#62;
    Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;&#60;br /&#62;
&#60;/code&#62;&#60;/p&#62;
&#60;p&#62;From reading about this error, it appears that Lucene has been patched to make the &#60;code&#62;tokenStream&#60;/code&#62; method final, resulting in the above issues. Mahout itself comes with Lucene jars that do not have this problem. I am assuming that the solution lies in getting dse hadoop to use Mahout's Lucene jars instead of its own. Any guidance on that?&#60;/p&#62;
&#60;p&#62;Cheers,&#60;/p&#62;
&#60;p&#62;Tristan
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
