I haven't updated my EC2 instances yet, but I've been trying to resolve the FVH issue on my 2010 Core i7 iMac (16 GB of RAM) with the DSE tarball install.
A typical query generated by our application is shown below, run first with the standard highlighter and then with the fast vector highlighter (FVH). I'm using curl to talk directly to Solr/DSE, which eliminates any additional application processing delays. The curl request is issued on the same system that runs DSE (localhost).
curl "http://localhost:8983/solr/IngenuityContent.SearchMain/select?hl.requireFieldMatch=true&facet=true&facet.offset=0&facet.mincount=1&facet.limit=8&hl=true&rows=10&fl=id,n_type,n_typeCategory,n_is_group,n_nameExact,n_synonymExact,n_macromolecule_name,n_macromolecule_id,n_macromolecule_species,n_macromolecule_summary,n_c_pubchem_cid,n_c_formula,n_c_cas_number,n_m_acc,n_m_descr&facet.sort=index&start=0&q=egfr&facet.field=n_diseaseFacet&facet.field=n_typeFacet&facet.field=n_tissue_typeFacet&facet.field=n_pathway_nameExact&facet.field=n_macromolecule_species&facet.field=n_locationFacet&facet.field=n_functionFacet&hl.usePhraseHighlighter=true&qt=partner-tmo&fq=type:node&debug=timing" > dse-search-egfr.timing.xml
The DSE log file shows:
INFO [http-8983-3] 2012-04-27 10:09:39,469 SolrCore.java (line 1470) [IngenuityContent.SearchMain] webapp=/solr path=/select params={hl.requireFieldMatch=true&facet=true&facet.mincount=1&facet.offset=0&facet.limit=8&debug=timing&hl=true&rows=10&fl=id,n_type,n_typeCategory,n_is_group,n_nameExact,n_synonymExact,n_macromolecule_name,n_macromolecule_id,n_macromolecule_species,n_macromolecule_summary,n_c_pubchem_cid,n_c_formula,n_c_cas_number,n_m_acc,n_m_descr&facet.sort=index&start=0&q=egfr&facet.field=n_diseaseFacet&facet.field=n_typeFacet&facet.field=n_tissue_typeFacet&facet.field=n_pathway_nameExact&facet.field=n_macromolecule_species&facet.field=n_locationFacet&facet.field=n_functionFacet&qt=partner-tmo&hl.usePhraseHighlighter=true&fq=type:node} hits=2354 status=0 QTime=285
When I ran this several times, the QTime varied from 279 to 292 ms. Well, the first run took 704 ms, but let's say that filled the caches...
The Solr timing information for the request above attributes 281 ms to HighlightComponent.
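For anyone repeating these runs, here is a rough sketch of how the QTime range could be pulled from the log instead of read by eye. The helper names qtime_from_log and min_max are made up for illustration, and the only assumption about the log format is the QTime=NNN token visible in the line above.

```shell
# Sketch: print the QTime value (ms) from each Solr request log line on stdin.
# qtime_from_log is a hypothetical helper name, not part of Solr or DSE.
qtime_from_log() {
  grep -o 'QTime=[0-9]*' | cut -d= -f2
}

# Sketch: print "min-max" for a list of numbers (one per line) on stdin.
min_max() {
  sort -n | awk 'NR == 1 { min = $1 } { max = $1 } END { if (NR) print min "-" max }'
}
```

Usage would look something like: grep 'path=/select' system.log | qtime_from_log | min_max (the log file name here is an assumption; use whatever path your install writes to).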
Now, I added hl.useFastVectorHighlighter=true to the request:
curl "http://localhost:8983/solr/IngenuityContent.SearchMain/select?hl.requireFieldMatch=true&facet=true&facet.offset=0&facet.mincount=1&facet.limit=8&hl=true&hl.useFastVectorHighlighter=true&rows=10&fl=id,n_type,n_typeCategory,n_is_group,n_nameExact,n_synonymExact,n_macromolecule_name,n_macromolecule_id,n_macromolecule_species,n_macromolecule_summary,n_c_pubchem_cid,n_c_formula,n_c_cas_number,n_m_acc,n_m_descr&facet.sort=index&start=0&q=egfr&facet.field=n_diseaseFacet&facet.field=n_typeFacet&facet.field=n_tissue_typeFacet&facet.field=n_pathway_nameExact&facet.field=n_macromolecule_species&facet.field=n_locationFacet&facet.field=n_functionFacet&hl.usePhraseHighlighter=true&qt=partner-tmo&fq=type:node&debug=timing" > dse-search-egfr.timing.fast.xml
The log shows:
INFO [http-8983-3] 2012-04-27 10:15:48,687 SolrCore.java (line 1470) [IngenuityContent.SearchMain] webapp=/solr path=/select params={hl.requireFieldMatch=true&facet=true&facet.mincount=1&facet.offset=0&facet.limit=8&debug=timing&hl=true&rows=10&fl=id,n_type,n_typeCategory,n_is_group,n_nameExact,n_synonymExact,n_macromolecule_name,n_macromolecule_id,n_macromolecule_species,n_macromolecule_summary,n_c_pubchem_cid,n_c_formula,n_c_cas_number,n_m_acc,n_m_descr&hl.useFastVectorHighlighter=true&facet.sort=index&start=0&q=egfr&facet.field=n_diseaseFacet&facet.field=n_typeFacet&facet.field=n_tissue_typeFacet&facet.field=n_pathway_nameExact&facet.field=n_macromolecule_species&facet.field=n_locationFacet&facet.field=n_functionFacet&qt=partner-tmo&hl.usePhraseHighlighter=true&fq=type:node} hits=2354 status=0 QTime=182
This was pretty consistent, varying from 181 to 192 ms. I will take a gain of roughly 100 ms, thank you very much! :) The timing info reports 177 ms for HighlightComponent.
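To put the gain in relative terms, a quick bit of shell arithmetic on the numbers reported above works. This is only a sketch: pct_reduction is a hypothetical helper name, and it uses integer division, so the result is truncated.

```shell
# Sketch: integer percentage reduction from a baseline time to a new time (both in ms).
# pct_reduction is a hypothetical helper name; integer division truncates the result.
pct_reduction() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

pct_reduction 285 182   # QTime from the two log lines: standard vs. FVH
pct_reduction 281 177   # HighlightComponent time: standard vs. FVH
```

With these inputs, that works out to a reduction of about 36% in QTime and 37% in HighlightComponent time for this one query.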
This is admittedly a pretty narrow test case. But it shows I can get it to work and enjoy some benefits. Now, my problem is that the highlights produced by FVH are not yet ready for prime time. For example, for a given document, the old highlighter provides me:
<lst name="ING:6xwoe">
<arr name="n_name">
<str><span class="ingReasonText">egfr</span></str>
</arr>
<arr name="n_synonym">
<str><span class="ingReasonText">egfr</span></str>
</arr>
</lst>
For that same document, FVH gives:
<lst name="ING:6xwoe">
<arr name="n_name">
<str><span class="ingReasonText">ING:</span>6xwoe egfr </str>
</arr>
<arr name="n_synonym">
<str><span class="ingReasonText">ING:</span>6xwoe egfr </str>
</arr>
</lst>
For whatever reason, the document ID is included in the snippet, and FVH highlights part of the ID rather than egfr. Go figure... Oh well, that's what I have to sort out now. :)
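While I sort this out, a crude check like the one below could flag snippets where the query term itself is not wrapped in the highlight span. It is only a sketch: highlights_term is a hypothetical helper, the span markup is copied from the two responses above, and it assumes a single-term query highlighted as one whole token.

```shell
# Sketch: succeed if the snippet on stdin wraps the given term in the highlight span.
# highlights_term is a hypothetical helper; the markup matches the responses shown above.
highlights_term() {
  grep -qiF "<span class=\"ingReasonText\">$1</span>"
}

# Standard highlighter snippet: the term is wrapped, so the check passes.
echo '<str><span class="ingReasonText">egfr</span></str>' \
  | highlights_term egfr && echo "standard: ok"

# FVH snippet: only part of the document ID is wrapped, so the check fails.
echo '<str><span class="ingReasonText">ING:</span>6xwoe egfr </str>' \
  | highlights_term egfr || echo "FVH: term not highlighted"
```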
Cheers,
Jeff