DataStax Enterprise 3.0.2 includes updated components, enhancements, and changes. These release notes list issues and resolved issues.
Improved Lucene/Solr concurrency
Some users may experience performance gains.
Removal of JNA jars from DataStax Enterprise tarball installation
Warnings appear in the system log about the absence of the jars. For information about installing JNA, see Installing JNA.
Access to the CassandraFS
A Cassandra File System (CFS) superuser can modify files in the CFS without any restrictions. Files that a superuser adds to the Cassandra File System are password-protected.
DSE Search/Solr support for copy fields
If stored=false in the copyField directive:
If stored=true in the copyField directive (backward compatibility mode):
Support for changing the stored attribute value of copyField directives
To change the stored attribute value of a copyField directive from true to false:
Previously stored copies of data are not automatically removed from Cassandra.
Changing the stored attribute value from false to true is not directly supported. The workaround is:
Stored values are not automatically removed from Cassandra.
In earlier releases, when a nodetool drain operation occurred during the DSE Search/Solr shutdown process, a call to stop Tomcat caused a node to hang. The shutdown now occurs without hanging. (DSP-1994)
Classpath problems that affected running Hadoop jobs have been fixed. The way classes and libraries are loaded has changed and dependencies set by the CLASSPATH have been minimized. (DSP-1810)
Cassandra-5098 has been backported to DataStax Enterprise 3.0.2 to fix a problem in Pig that incorrectly decoded row keys in widerow mode. (C*-5098)
The reference to the fair-scheduler.xml file in mapred-site.xml that caused problems with the fair scheduling assignment of resources to Hadoop jobs has been fixed. To enable the fair scheduler, uncomment the section in mapred-site.xml that looks something like this:
<!-- FairScheduler is included. Uncomment to enable. -->
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
. . .
<value>dse-3.0.2/dse/resources/hadoop/conf/fair-scheduler.xml</value>
</property>
You might need to change the value element shown here. Check for the presence of a file named fair-scheduler.xml in the Hadoop conf directory. If the file has a different name, change the name of the file to fair-scheduler.xml. Specify the absolute path to the file. (DSP-1971)
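The check described above can be sketched in Python. This is a minimal illustration, not a DataStax tool: the function names and the sample configuration are hypothetical, and the property name `example.property` is a placeholder because the real name is elided in the snippet above. The sketch looks for any value that points at fair-scheduler.xml and reports whether the path is absolute and whether the file exists.

```python
import os
import xml.etree.ElementTree as ET


def find_fair_scheduler_paths(mapred_site_xml):
    """Return every <value> in the config text that names fair-scheduler.xml."""
    root = ET.fromstring(mapred_site_xml)
    return [
        value.text.strip()
        for value in root.iter("value")
        if value.text and value.text.strip().endswith("fair-scheduler.xml")
    ]


def check_path(path):
    """Return a list of problems with the path (empty if none were found)."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    return problems


# Hypothetical sample; "example.property" is a placeholder name.
SAMPLE = """<configuration>
  <property>
    <name>example.property</name>
    <value>dse-3.0.2/dse/resources/hadoop/conf/fair-scheduler.xml</value>
  </property>
</configuration>"""

for path in find_fair_scheduler_paths(SAMPLE):
    print(path, check_path(path))
```

To check a real installation, read the contents of your own mapred-site.xml and pass the text to `find_fair_scheduler_paths`.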
After upgrading DataStax Enterprise 2.x to 3.0.2, a Solr-indexed field containing an empty date would cause a parse exception when encountered in search results. This problem has been resolved. (DSP-1944)
In DataStax Enterprise 3.0, before compaction and after all columns in a row were expired by the time-to-live (TTL) mechanism, you could still search for and find expired columns. This issue has been resolved: Expired columns are no longer returned in search results after all columns in a row/Solr document are expired. (DSP-1884)
DataStax Enterprise would not stop when issuing the cassandra-stop command. This problem has been resolved. (DSP-1998)
Fixed an issue where Solr field deletes were not being distributed to all Solr nodes. (DSP-1979)
DataStax Enterprise 3.0.1 includes updated components, enhancements, and changes. These release notes list issues and resolved issues.
DataStax Enterprise 3.0.1 has been enhanced or changed in the following ways:
The default consistency level has changed from ONE to QUORUM for reads and writes to resolve a problem finding a CassandraFS block when using consistency level ONE on a Hadoop node. (DSP-1809)
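For context, QUORUM requires a majority of the replicas to respond, whereas ONE requires a single replica. The following sketch shows the standard Cassandra quorum arithmetic; the function name is illustrative only.

```python
def quorum(replication_factor):
    """Number of replicas that must respond at consistency level QUORUM."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # A majority of replicas: integer division by 2, plus 1.
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print("RF=%d -> QUORUM needs %d replica(s)" % (rf, quorum(rf)))
```

With a replication factor of 1, QUORUM and ONE both require one replica, so the change has a practical effect only when data is replicated to more than one node.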
Solr type mapping to Cassandra validator types has been refactored in this release. (DSP-1876)
The configuration files for these DSE Search/Solr demos have been modified to use new type mapping:
Running DSE Search/Solr demos using legacy data describes how to use data from an earlier release.
DSE Search provides a new multi-threaded indexing implementation to improve performance on multi-core machines. All index updates are internally dispatched to a per-core indexing thread pool and executed asynchronously. This allows for greater concurrency and parallelism, but as a consequence, index requests return a response before the indexing operation is actually executed. By default, the number of indexing threads available per Solr core equals the number of available CPU cores times 2. To change this number, edit the max_solr_concurrency_per_core parameter in the dse.yaml configuration file. If the parameter is set to 1, DSE Search reverts to the synchronous indexing behavior of the earlier release. (DSP-1644)
DSE Search also provides advanced JMX-based configurability and visibility through the IndexPool-ks.cf MBean under the com.datastax.bdp namespace, where ks.cf is the name of a DSE Search Solr core.
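The practical effect of asynchronous indexing, a response that arrives before the update is applied, can be illustrated with a small Python model. This is a sketch of the general thread-pool pattern, not the DSE Search implementation: the `IndexDispatcher` class and `default_indexing_threads` function are hypothetical names, and the only details taken from these notes are the default of cores times 2 and the synchronous behavior when the setting is 1.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_indexing_threads(cpu_cores=None):
    """Default pool size described in these notes: available cores times 2."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return cores * 2


class IndexDispatcher:
    """Hypothetical stand-in for a per-Solr-core indexing thread pool."""

    def __init__(self, max_concurrency):
        self.max_concurrency = max_concurrency
        # A size of 1 models the synchronous behavior of the earlier release.
        self._pool = (
            ThreadPoolExecutor(max_workers=max_concurrency)
            if max_concurrency > 1 else None
        )

    def submit(self, index_fn, document):
        """Index a document; return a Future when asynchronous, else None."""
        if self._pool is None:
            index_fn(document)  # the caller waits until indexing finishes
            return None
        # The caller regains control before index_fn has necessarily run.
        return self._pool.submit(index_fn, document)

    def close(self):
        """Wait for queued work to finish and release the threads."""
        if self._pool is not None:
            self._pool.shutdown(wait=True)


indexed = []
dispatcher = IndexDispatcher(default_indexing_threads(cpu_cores=2))
future = dispatcher.submit(indexed.append, {"id": "doc1"})
future.result()  # block only when the update must be visible
dispatcher.close()
print(indexed)
```

In this model, a client that needs to read its own update immediately must wait for the pending work, which is the trade-off that setting the concurrency to 1 avoids.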
On the cqlsh command line, Tab completion now reveals user names when you type a CQL security command that takes a known user name as an option. (DSP-1371)
This release includes a plugin API for Solr updates and a plugin to the CassandraDocumentReader. The plugin API transforms data from the secondary indexing API before it is submitted to Solr. The plugin to the CassandraDocumentReader transforms the results data from Cassandra to Solr. (DSP-1493)
The deprecated Solr document cache is now disabled. (DSP-1794)
Solr/Cassandra range manipulation and token filtering algorithms have been rewritten to improve performance and internal maintenance. This change is backward compatible with previous releases. (DSP-1708)
This release includes two features for performing an anti-entropy node repair on a subrange of data instead of all the data in a keyspace. (DSP-1661)
Using these commands, DSE Search now performs a partial re-index instead of a full re-index of Solr data after an anti-entropy repair.
You can now track memory usage of internal Lucene and Solr data structures using OpsCenter. These metrics are among those you can track: (DSP-1617)
Cassandra-5155 has been backported to the Cassandra component included in this release, Cassandra 1.1.9.3. With this enhancement, you can configure an Ec2Region data center name. In the same EC2 region, you can now run a real-time Cassandra data center and a DSE Search/Solr cluster. (DSE-1685)
The Query Elevation search component now functions correctly if you upload elevate.xml to Cassandra the same way you upload solrconfig.xml. Alternatively, put elevate.xml in a directory on all the nodes. (DSP-1652)
To insert data using CQL or Thrift that will be indexed by Solr, run the inserts on a Solr node. (DSP-2007)
Use a single CQL statement or a single Thrift batch operation to insert data into fields that the Solr schema declares as the source of a copyField with a multi-valued destination. Otherwise, any subsequent writes to those fields overwrite the values in the field being copied to. (DSP-1882)
For example, using pycassa:
import pycassa
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
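# Connect to the 'test' keyspace and open the 'copy' column family.
pool = ConnectionPool('test')
col_fam = pycassa.ColumnFamily(pool, 'copy')
# Queue both inserts for the same row in one batch so they are sent together.
b = col_fam.batch()
b.insert('thrift_test',
         {'text1': 'textval1',
          'text2': 'textval2'})
b.insert('thrift_test', {'multi_text_col1': "solrjson:['foo','bar']"})
# Send the queued mutations in a single batch operation.
b.send()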
b2 = col_fam.batch()
During an upgrade, you might see warnings when initially starting up an Analytics/Hadoop node. To avoid making concurrent changes to the schema, which are not fully supported in this release, nodes coordinate the configuration of the system keyspaces. When the node designated to update the schema is not fully initialized or a user runs dsetool before the schema update occurs, this type of warning occurs: (1804)
INFO [main] 2013-04-15 19:09:22,362 CassandraFSPlugin.java (line 35)
Found CFS filesystem in Hadoop config: cfs-archive
WARN [RMI TCP Connection(2)-10.190.155.233] 2013-04-15 19:09:27,570
TrackerManager.java (line 201) JobTracker location query failed with consistency level QUORUM, retrying with level ONE
WARN [RMI TCP Connection(2)-10.190.155.233] 2013-04-15 19:09:27,571
CassandraJobConf.java (line 358) Unable to retrieve JobTracker primary and reserve locations, will set local address as JT for Analytics-Analytics
WARN [RMI TCP Connection(2)-10.190.155.233] 2013-04-15 19:09:27,574
TrackerManager.java (line 157) Error writing JT location
InvalidRequestException(why:Keyspace dse_system does not exist)
. . .
You can ignore these warnings. (DSP-1916)
Solr can return duplicated results because Solr improperly indexes Cassandra when all of these conditions exist:
The solution is to ensure that the Solr unique key field is of type string (solr.StrField). (DSP-882 and DSP-839)
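One way to verify this is to inspect schema.xml programmatically. The following Python sketch assumes the conventional Solr schema layout (a uniqueKey element, field elements, and fieldType elements); the function name and the sample schema are hypothetical and stand in for your own schema.

```python
import xml.etree.ElementTree as ET


def unique_key_is_string(schema_xml):
    """Return True if the uniqueKey field uses a solr.StrField type."""
    root = ET.fromstring(schema_xml)
    key_name = root.findtext("uniqueKey")
    if key_name is None:
        return False
    key_name = key_name.strip()
    # Map each declared fieldType name to its implementing class.
    type_classes = {
        ft.get("name"): ft.get("class") for ft in root.iter("fieldType")
    }
    for field in root.iter("field"):
        if field.get("name") == key_name:
            return type_classes.get(field.get("type")) == "solr.StrField"
    return False


# Hypothetical sample schema for illustration.
SAMPLE_SCHEMA = """<schema name="example" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>"""

print(unique_key_is_string(SAMPLE_SCHEMA))
```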
When using Async or HsHa, Hadoop users may see an error that a connection failed (Failed to open server transport) and a different transport will be used (Falling back to TFramedTransport). This error is benign. To remove the error, in mapred-site.xml, set the property cassandra.client.transport.factory to org.apache.cassandra.thrift.TFramedTransportFactory. You may also need to fix the property in the dsetool, nodetool, and cassandra-cli scripts. (DSP-1844)
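If you prefer to script the mapred-site.xml change, the following Python sketch sets a property in a Hadoop-style configuration document. The `set_property` helper is a hypothetical name, and the empty configuration is only a stand-in for your own file; the property name and value come from the note above.

```python
import xml.etree.ElementTree as ET


def set_property(root, name, value):
    """Set a Hadoop-style property, adding the element if it is absent."""
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return prop
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# Stand-in for the parsed contents of mapred-site.xml.
root = ET.fromstring("<configuration></configuration>")
set_property(
    root,
    "cassandra.client.transport.factory",
    "org.apache.cassandra.thrift.TFramedTransportFactory",
)
print(ET.tostring(root, encoding="unicode"))
```

Note that ElementTree discards XML comments by default when parsing, so writing a real file back this way would lose any commented-out sections. Editing the file by hand avoids that.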
The RPC Thrift server doesn't support Async or HsHa when using Kerberos. You must either change the settings in cassandra.yaml to rpc_server_type: sync, or disable Kerberos and restart the server. (DSP-1844)
DataStax Enterprise 3.0 includes updated components, enhancements, and changes.
DataStax Enterprise 3.0 has been enhanced in the following ways:
Changes to the Solr demo script
In this release, changes have been made to the Solr demo script. The scripts to run the wikipedia demo have been updated. For example, the 1_add_schema.sh script has been updated to include these lines:
CREATE_URL="http://${host}:8983/solr/admin/cores?action=CREATE&name=${KS}.${CF}"
curl -X POST $CREATE_URL
echo "Created index."
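The same request can be sketched in Python for readers who script against the demo from code rather than the shell. This is an illustration only: the function names are hypothetical, and the host, keyspace, and column family values in the usage line are placeholders for the script's `host`, `KS`, and `CF` variables.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def core_create_url(host, keyspace, column_family, port=8983):
    """Build the core-creation URL used by the demo script."""
    query = urlencode(
        {"action": "CREATE", "name": "%s.%s" % (keyspace, column_family)}
    )
    return "http://%s:%d/solr/admin/cores?%s" % (host, port, query)


def create_core(host, keyspace, column_family):
    """POST to the URL, as curl -X POST does; needs a running Solr node."""
    request = Request(
        core_create_url(host, keyspace, column_family),
        data=b"",
        method="POST",
    )
    with urlopen(request) as response:
        return response.status


# Placeholder values; only the URL is built here, no request is sent.
print(core_create_url("localhost", "wiki", "solr"))
```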
Disk full alert
DataStax assumes that the customer's operations team monitors cluster resources to ensure that enough disk space exists. In the event of an oversight, Cassandra marks the node to be decommissioned when the disk is approaching full. When the disk is almost full, the server should stop serving, the node is removed from the ring, and Cassandra issues an alert.
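A monitoring job of the kind assumed above could be as simple as the following Python sketch. The function names and the 80 percent threshold are assumptions chosen for illustration; these notes do not state the thresholds Cassandra itself uses.

```python
import shutil


def disk_used_fraction(path):
    """Fraction of the filesystem holding `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def needs_attention(used_fraction, threshold=0.8):
    """True when usage reaches the (hypothetical) warning threshold."""
    return used_fraction >= threshold


# Check the filesystem of the current directory; in practice, point this
# at the Cassandra data directory.
fraction = disk_used_fraction(".")
print("%.0f%% used, alert=%s" % (fraction * 100, needs_attention(fraction)))
```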
Java requirements
DataStax recommends using the latest 64-bit version of Java 6.
This release fixes the following issues:
This release has the following issues: