DataStax Enterprise 3.0 Documentation

Release notes

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


DataStax Enterprise 3.0.9

In DataStax Enterprise 3.0.9, the Cassandra component has been upgraded.

Components

  • Apache Cassandra 1.1.12 (updated)
  • Apache Hadoop 1.0.4.8
  • Apache Hive 0.9.0.1
  • Apache Pig 0.9.2
  • Apache Solr 4.0.2.3
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.1
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Issue resolved

This release fixes the issue that caused a Java-level deadlock. (DSP-2579)

Issue

During an upgrade from DataStax Enterprise 2.2.3 to 3.0.9, you might encounter the following error in conjunction with a CQL solr_query (DSP-2699):

ERROR [Finalizer] 2013-12-02 16:43:12,343 CoreContainer.java (line 478) CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!  instance=1618047676
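To check whether a node logged this error during an upgrade, you could scan its log for the message. The following is a minimal sketch, not a DataStax tool. It assumes that log lines follow the layout of the sample above (level, thread, timestamp, source file, line number, message), and the function name is made up for this illustration.

```python
import re

# Assumed layout, taken from the sample line above:
# LEVEL [thread] yyyy-mm-dd hh:mm:ss,SSS Source.java (line N) message
LOG_LINE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+) \(line (?P<line>\d+)\)\s+(?P<message>.*)$"
)


def find_core_container_errors(lines):
    """Return the parsed fields of every CoreContainer finalize() error."""
    hits = []
    for raw in lines:
        match = LOG_LINE.match(raw)
        if match is None:
            continue  # continuation lines and stack traces do not match
        if (match.group("level") == "ERROR"
                and match.group("source") == "CoreContainer.java"
                and "was not shutdown prior to finalize()" in match.group("message")):
            hits.append(match.groupdict())
    return hits


sample = [
    "ERROR [Finalizer] 2013-12-02 16:43:12,343 CoreContainer.java (line 478) "
    "CoreContainer was not shutdown prior to finalize(), indicates a bug -- "
    "POSSIBLE RESOURCE LEAK!!!  instance=1618047676",
    " INFO [main] 2013-04-15 19:09:22,362 CassandraFSPlugin.java (line 35) "
    "Found CFS filesystem in Hadoop config: cfs-archive",
]
hits = find_core_container_errors(sample)
print(len(hits), hits[0]["timestamp"])
```

To scan a real log, pass an open file object instead of the sample list. The location of the log file depends on your installation.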

DataStax Enterprise 3.0.8

In DataStax Enterprise 3.0.8, the following issues have been resolved:

  • Fixed an error that occurred when the max_solr_concurrency_per_core parameter was set to 1. (DSP-2321)
  • Fixed problem of secondary indexes not working reliably when row cache was enabled. (DSP-2551)
  • HadoopTrackerPlugin now waits for ring to stabilize before checking for keyspace. (DSP-2555)
  • Fixed overlapping Solr shard ranges, in which empty subsets were returned, duplicating results. (DSP-2563)
  • Include and set up the old snappy for older distributions. (DSP-2567)
  • Fixed deadlock between SystemClassLoader and ModuleClassloader. (DSP-2579)

DataStax Enterprise 3.0.7

DataStax Enterprise 3.0.7 updates the Cassandra and Hadoop components and includes a new dsetool command for checking the Cassandra File System (CassandraFS).

Components

  • Apache Cassandra 1.1.9.9
  • Apache Hadoop 1.0.4.8
  • Apache Hive 0.9.0.1
  • Apache Pig 0.9.2
  • Apache Solr 4.0.2.3
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.1
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Issues resolved

  • This release solves the issue preventing Hadoop from accessing libraries in Hive. (DSP-1495)
  • The issue that prevented MapReduce Jobs from running longer than 24 hours on kerberized clusters is resolved. (DSP-2402)
  • Fixed race condition between CFS compaction and nodetool scrub. (DSP-2425)
  • Fixed the problem causing repeated Hive queries to impact CassandraFS performance. (DSP-2441)
  • When starting a large cluster, a node could see all the other nodes repeatedly going down and coming back up. This problem has been resolved. (DSP-2534)
  • The issue causing DEBUG messages to appear when running certain jobs, such as the portfolio demo, has been resolved. (DSP-2550)

DataStax Enterprise 3.0.6

In DataStax Enterprise 3.0.6, the following issues have been resolved:

  • The fix for issue Cassandra-5529, "thrift_max_message_length_in_mb makes long-lived connections error out" has been backported. (DSP-2380)
  • Removed redundant versions of Jetty from the distribution files. (DSP-2378)
  • Fixed an issue in which, under certain circumstances, a query that selected multiple partitioned columns in a Hive table returned no results. (DSP-2374)
  • Fixed broken Hive views. (DSP-2369)
  • Fixed the nodetool enablethrift and nodetool disablethrift commands. (DSE_2390)

DataStax Enterprise 3.0.5

DataStax Enterprise 3.0.5 fixes the issue entitled, "Filter query of +_token_lhs:* being created". Resolution of this issue improves performance on certain Solr queries, such as filter queries that use an asterisk (*). (DSP-2298)

DataStax Enterprise 3.0.4

DataStax Enterprise 3.0.4 fixes this issue: Removed excessive AssertionError warnings in SetCoverFinder. (DSP-2062)

DataStax Enterprise 3.0.3

DataStax Enterprise 3.0.3 includes enhancements and a number of resolved issues.

Enhancements

Issues resolved

  • On an Analytics cluster running 3 or more nodes, the list command (dse hadoop fs -ls /) produced "No such file or directory". (DSP-787)
  • The dsetool ring command now properly shows Job Tracker annotation (JT) for all data centers. (DSP-2028)
  • Improve performance of CQL queries. (DSP-2054)
  • Backport CASSANDRA-4049: (DSP-2077)
  • Backport CASSANDRA-5361: Enable ThreadLocal allocation in the JVM. (DSP-2091)
  • On error, JobTracker now properly shuts down and releases its bound port. (DSP-2135)
  • Hadoop clusters now show a warning that exceptions will occur because the ring isn't fully up after about 10 seconds. (DSP-2137)
  • When TaskTracker fails to start, prevents retrying on the next attempt. (DSP-2140)
  • The dseTypeMapping version now includes a force option (for use by experts only). (DSP-2163)
  • Remove invalid file warnings. (DSP-2077)

DataStax Enterprise 3.0.2

DataStax Enterprise 3.0.2 includes updated components, enhancements, and changes. These release notes list issues and resolved issues.

Components

  • Apache Cassandra 1.1.9.7
  • Apache Hadoop 1.0.4.7
  • Apache Hive 0.9.0.1
  • Apache Pig 0.9.2
  • Apache Solr 4.0.2.3
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.1
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Enhancements and changes

  • Improved Lucene/Solr concurrency

    Some users may experience performance gains.

  • Removal of JNA jars from DataStax Enterprise tarball installation

    Warnings appear in the system log about the absence of the jars. For information about installing JNA, see Installing JNA.

  • Access to the CassandraFS

    A Cassandra File System (CFS) superuser can modify files in the CFS without any restrictions. Files that a superuser adds to the Cassandra File System are password-protected.

  • DSE Search/Solr support for copy fields

    If stored=false in the copyField directive:

    • Ingested data is copied by the copyField mechanism to the destination field for search, but data is not stored in Cassandra.
    • When you add a new copyField directive to the schema.xml, pre-existing and newly ingested data is re-indexed when copied as a result of the new directive.

    If stored=true in the copyField directive (backward compatibility mode):

    • Ingested data is copied by the copyField mechanism and data is stored in Cassandra.
    • When you add a new copyField directive to the schema.xml, pre-existing data is re-indexed as the result of an old copyField directive, but not when copied as the result of a new copyField directive. To be re-indexed, data must be re-ingested after you add a new copyField directive to the schema.
  • Support for changing the stored attribute value of copyField directives

Issues

  • Associating a hostname with the IPv6 loopback address in /etc/hosts breaks Hadoop. (DSP-2003)

Issues resolved

  • In earlier releases, when a nodetool drain operation occurred during the DSE Search/Solr shutdown process, a call to stop Tomcat caused a node to hang. The shutdown now occurs without hanging. (DSP-1994)

  • Classpath problems that affected running Hadoop jobs have been fixed. The way classes and libraries are loaded has changed and dependencies set by the CLASSPATH have been minimized. (DSP-1810)

  • Cassandra-5098 has been backported to DataStax Enterprise 3.0.2 to fix a problem in Pig that incorrectly decoded row keys in widerow mode. (C*-5098)

  • The reference to the fair-scheduler.xml file in mapred-site.xml, which caused problems with the fair scheduling assignment of resources to Hadoop jobs, has been fixed. To enable the fair scheduler, uncomment the section in mapred-site.xml that looks something like this:

    <!-- FairScheduler is included. Uncomment to enable. -->
      <property>
        <name>mapred.jobtracker.taskScheduler</name>
        <value>org.apache.hadoop.mapred.FairScheduler</value>
      </property>
      . . .
      <value>dse-3.0.2/dse/resources/hadoop/conf/fair-scheduler.xml</value>
      </property>
    

    You might need to change the value element shown here. Check for the presence of a file named fair-scheduler.xml in the Hadoop conf directory. If the file has a different name, change the name of the file to fair-scheduler.xml. Specify the absolute path to the file. (DSP-1971)

  • After upgrading DataStax Enterprise 2.x to 3.0.2, a Solr-indexed field containing an empty date would cause a parse exception when encountered in search results. This problem has been resolved. (DSP-1944)

  • In DataStax Enterprise 3.0, before compaction and after all columns in a row were expired by the time-to-live (TTL) mechanism, you could still search for and find expired columns. This issue has been resolved: Expired columns are no longer returned in search results after all columns in a row/Solr document are expired. (DSP-1884)

  • DataStax Enterprise would not stop when issuing the cassandra-stop command. This problem has been resolved. (DSP-1998)

  • Fixed an issue where Solr field deletes were not being distributed to all Solr nodes. (DSP-1979)
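For the fair scheduler item above (DSP-1971), the following minimal sketch shows one way to sanity-check a mapred-site.xml before restarting. It is an illustration only, not a DataStax utility. The function name is hypothetical, and because the original snippet elides the name of the property that holds the allocation file path, the sample below uses a placeholder property name and the check looks at any property value ending in .xml.

```python
import posixpath
import xml.etree.ElementTree as ET

SCHEDULER_PROPERTY = "mapred.jobtracker.taskScheduler"
FAIR_SCHEDULER_CLASS = "org.apache.hadoop.mapred.FairScheduler"


def check_fair_scheduler(mapred_site_xml):
    """Return a list of problems found in the text of a mapred-site.xml."""
    root = ET.fromstring(mapred_site_xml)
    scheduler = None
    xml_paths = []
    # Commented-out properties are dropped by the parser, so a section
    # that is still commented out is simply not seen here.
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name == SCHEDULER_PROPERTY:
            scheduler = value
        if value.endswith(".xml"):
            xml_paths.append(value)

    problems = []
    if scheduler != FAIR_SCHEDULER_CLASS:
        problems.append("fair scheduler is not enabled")
    for path in xml_paths:
        if posixpath.basename(path) != "fair-scheduler.xml":
            problems.append("unexpected file name: " + path)
        if not posixpath.isabs(path):
            problems.append("path is not absolute: " + path)
    return problems


# Sample input; "placeholder.allocation.file" is NOT a real property name.
sample = """<configuration>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>placeholder.allocation.file</name>
    <value>dse-3.0.2/dse/resources/hadoop/conf/fair-scheduler.xml</value>
  </property>
</configuration>"""
print(check_fair_scheduler(sample))
```

With the relative path from the snippet above, the check reports that the path is not absolute, which matches the advice to specify the absolute path to the file.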

DataStax Enterprise 3.0.1

DataStax Enterprise 3.0.1 includes updated components, enhancements, and changes. These release notes list issues and resolved issues.

Components

  • Apache Cassandra 1.1.9.3
  • Apache Hadoop 1.0.4.3
  • Apache Hive 0.9.0.1
  • Apache Pig 0.9.2
  • Apache Solr 4.0.2.2
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.1
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Enhancements and changes

DataStax Enterprise 3.0.1 has been enhanced or changed in the following ways:

  • The default consistency level has changed from ONE to QUORUM for reads and writes to resolve a problem finding a CassandraFS block when using consistency level ONE on a Hadoop node. (DSP-1809)

  • Solr type mapping to Cassandra validator types has been refactored in this release. (DSP-1876)

  • The configuration files for these DSE Search/Solr demos have been modified to use new type mapping:

    • Solr wikipedia demo
    • Log search demo
    • Solr stress demo

    Running DSE Search/Solr demos using legacy data describes how to use data from an earlier release.

  • DSE Search provides a new multi-threaded indexing implementation to improve performance on multi-core machines. All index updates are internally dispatched to an indexing thread pool for each Solr core and executed asynchronously. This allows for greater concurrency and parallelism, but as a consequence, index requests return a response before the indexing operation is actually executed. By default, the number of indexing threads available to each Solr core equals the number of available CPU cores times 2. To change it, edit the max_solr_concurrency_per_core parameter in the dse.yaml configuration file. If the parameter is set to 1, DSE Search goes back to the synchronous indexing behavior of the earlier release. (DSP-1644)

    Also, DSE Search provides advanced, JMX-based, configurability and visibility through the IndexPool-ks.cf (where ks.cf is the name of a DSE Search Solr core) MBean under the com.datastax.bdp namespace.

  • On the cqlsh command line, Tab completion now reveals user names when you type a CQL security command that takes a known user name as an option. (DSP-1371)

  • This release includes a plugin API for Solr updates and a plugin to the CassandraDocumentReader. The plugin API transforms data from the secondary indexing API before it is submitted to Solr. The plugin to the CassandraDocumentReader transforms the results data from Cassandra to Solr. (DSP-1493)

  • The deprecated Solr document cache is now disabled. (DSP-1794)

  • Solr/Cassandra range manipulation and token filtering algorithms have been rewritten to improve performance and internal maintenance. This change is backward compatible with previous releases. (DSP-1708)

  • This release includes two features for performing an anti-entropy node repair on a subrange of data instead of all the data in a keyspace. (DSP-1661)

    • A new dsetool command, list_subranges, estimates subranges of data in a keyspace based on a specified number of rows.
    • New nodetool repair options, start token (-st) and end token (-et), designate subranges of data for distribution within those ranges.

    Using these commands, DSE Search now performs a partial re-index instead of a full re-index of Solr data after an anti-entropy repair.

  • You can now track memory usage of internal Lucene and Solr data structures using OpsCenter. These metrics are among those you can track: (DSP-1617)

    • Doc values (including norms)
    • Field caches
    • Doc-set caches
    • Terms index (FST) caches
    • NRTCachingDirectory (internal RAMDirectory)
    • IndexWriter RAM usage
  • Cassandra-5155 has been backported to the Cassandra component included in this release, Cassandra 1.1.9.3. With the enhancement, you can configure an Ec2Region data center name. In the same EC2 region, you can now run a real-time Cassandra data center and a DSE Search/Solr cluster. (DSE-1685)

  • The Query Elevation search component now functions correctly if you upload elevate.xml to Cassandra the same way you upload solrconfig.xml. Alternatively, put elevate.xml in a directory on all the nodes. (DSP-1652)
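As an illustration of the subrange repair options described above (DSP-1661), the following minimal sketch splits a token range into contiguous subranges and formats one nodetool repair command for each. Note the assumptions: it divides the range evenly by token value, which is not what dsetool list_subranges does (that command estimates subranges from a row count), and it assumes RandomPartitioner tokens from 0 to 2**127. Only the -st and -et options come from these notes. Any further arguments, such as a keyspace name, are omitted, so check the nodetool help for your version.

```python
def split_token_range(start, end, parts):
    """Split the token interval from start to end into contiguous subranges."""
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and end > start")
    width = end - start
    # Integer arithmetic keeps the boundaries exact for 127-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(subranges):
    """Format one nodetool repair invocation per (start, end) subrange."""
    return ["nodetool repair -st %d -et %d" % (st, et) for st, et in subranges]


# Assumption: RandomPartitioner, whose tokens range from 0 to 2**127.
for command in repair_commands(split_token_range(0, 2**127, 4)):
    print(command)
```

Each printed command covers one quarter of the ring. In practice you would take the boundaries from dsetool list_subranges rather than computing them by hand.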

Issues

  • To insert data using CQL or Thrift that will be indexed by Solr, run the inserts on a Solr node. (DSP-2007)

  • Use a single CQL statement or a single Thrift batch operation to insert data into fields for which the Solr schema declares a copyField with a multi-valued destination. Otherwise, any subsequent writes to those fields overwrite the values in the field being copied to. (DSP-1882)

    For example, using pycassa:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    # Connect to the 'test' keyspace and open the 'copy' column family.
    pool = ConnectionPool('test')
    col_fam = ColumnFamily(pool, 'copy')

    # Queue both inserts in one batch so that they are sent together.
    b = col_fam.batch()
    b.insert('thrift_test',
      {'text1': 'textval1',
      'text2': 'textval2'})
    b.insert('thrift_test', {'multi_text_col1': "solrjson:['foo','bar']"})
    b.send()

    b2 = col_fam.batch()
    
  • During an upgrade, you might see warnings when initially starting up an Analytics/Hadoop node. To avoid making concurrent changes to the schema, which are not fully supported in this release, nodes coordinate the configuration of the system keyspaces. When the node designated to update the schema is not fully initialized or a user runs dsetool before the schema update occurs, this type of warning occurs: (1804)

    INFO [main] 2013-04-15 19:09:22,362 CassandraFSPlugin.java (line 35)
      Found CFS filesystem in Hadoop config: cfs-archive
    WARN [RMI TCP Connection(2)-10.190.155.233] 2013-04-15 19:09:27,570
      TrackerManager.java (line 201) JobTracker location query failed with consistency level QUORUM, retrying with level ONE
    WARN [RMI TCP Connection(2)-10.190.155.233] 2013-04-15 19:09:27,571
      CassandraJobConf.java (line 358) Unable to retrieve JobTracker primary and reserve locations, will set local address as JT for Analytics-Analytics
    WARN [RMI TCP Connection(2)-10.190.155.233] 2013-04-15 19:09:27,574
      TrackerManager.java (line 157) Error writing JT location
      InvalidRequestException(why:Keyspace dse_system does not exist)
    . . .
    

    You can ignore these warnings. (DSP-1916)

  • Solr can return duplicated results because Solr improperly indexes Cassandra when all of these conditions exist:

    • Primary key values contain special characters, such as #, @ and $.
    • The unique key field in the Solr schema is of field type "text" (solr.TextField).
    • The field has a tokenizer that treats such special characters as white space.

    The solution is to ensure that the Solr unique key field is of type string (solr.StrField). (DSP-882 and DSP-839)

  • When using Async or HsHa, Hadoop users may see an error that a connection failed (Failed to open server transport) and a different transport will be used (Falling back to TFramedTransport). This error is benign. To remove the error, in mapred-site.xml, set the property cassandra.client.transport.factory to org.apache.cassandra.thrift.TFramedTransportFactory. You may also need to fix the property in the dsetool, nodetool, and cassandra-cli scripts. (DSP-1844)

  • The RPC Thrift server doesn't support Async or HsHa when using Kerberos. You must either change the settings in cassandra.yaml to rpc_server_type: sync, or disable Kerberos and restart the server. (DSP-1844)
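The duplicated-results issue above (DSP-882 and DSP-839) can be pictured with a small sketch. This is not Solr's analyzer: the tokenize function below is a simplified stand-in that treats every non-alphanumeric character as white space, and the key values are invented. It shows one plausible reason a tokenized unique key is unsafe: distinct primary keys can reduce to the same tokens, while a string-typed key keeps each value whole.

```python
import re
from collections import defaultdict


def tokenize(key):
    """Stand-in tokenizer: treat special characters as white space."""
    return tuple(t for t in re.split(r"[^0-9A-Za-z]+", key) if t)


def colliding_keys(keys):
    """Group primary keys that produce identical token sequences."""
    groups = defaultdict(list)
    for key in keys:
        groups[tokenize(key)].append(key)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical primary key values containing #, @ and $.
keys = ["user#42", "user@42", "user$42", "user42"]
print(colliding_keys(keys))
```

The first three keys all tokenize to the same pair of terms, so a text-typed unique key cannot tell them apart. Compared as whole strings, as with solr.StrField, all four keys stay distinct.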

Issues resolved

  • Range query performance problems and inconsistency across nodes. (DSP-1577)
  • The Query Elevation search component now works correctly. You can upload the elevate.xml to C* like solrconfig.xml or have it reside in a directory on all the nodes. (DSP-1652)
  • Solr queries return inconsistent values after nodes are rebalanced. (DSP-1672)
  • Solr delete by query, when the key is a UUID, creates a corrupt tombstone. (DSP-1709)
  • Cassandra-5301 has been backported to the Cassandra 1.1.9.3 included in this release. This fix relaxes the consistency level for authentication queries for non-default users. The default super user, "cassandra", requires reading the dse_auth keyspace at QUORUM. Other superusers read and write at a consistency level of ONE.
  • For backward compatibility, the Thrift set_keyspace behavior has been altered to allow a call to set_keyspace followed by a call to login. (DSP-1878)
  • You no longer need to enable Hadoop to connect to external addresses. DSE automatically sets the listen_address:rpc_port. (DSP-1139)
  • An extraneous copy of the Cassandra Command Line Interface (CLI) in <install location>/resources/cassandra/bin in a tarball installation was not correctly configured. This copy has been removed. Use the CLI utility located in <install location>/bin. (DSP-1832)

DataStax Enterprise 3.0

DataStax Enterprise 3.0 includes updated components, enhancements, and changes.

Components

  • Apache Cassandra 1.1.9
  • Apache Hadoop 1.0.4.2
  • Apache Hive 0.9.0.1
  • Apache Pig 0.9.2
  • Apache Solr 4.0 GA
  • Apache Thrift 0.7.0
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.1
  • Apache Mahout 0.6

Enhancements

DataStax Enterprise 3.0 has been enhanced in the following ways:

  • Kerberos security added to all components (Cassandra, Hadoop, Solr)
  • Cassandra-based password authentication backported (Cassandra only)
  • Full Cassandra-level data auditing
  • At rest data encryption
  • Integration of the final Solr 4.0 release
  • Various Solr stability improvements:
    • Concurrent solr core reload
    • Improved handling of bad schemas
    • Improved stability under concurrent access situations
    • Proper de-duplication of search results
  • Other Solr improvements
  • Update to Hadoop 1.0.4.2 and other component updates
  • Various Hadoop stabilization fixes
  • Various improvements to stabilize the upgrade process

Changes and requirements

Changes to the Solr demo script

In this release, changes have been made to the Solr demo script. The scripts to run the wikipedia demo have been updated. For example, the 1_add_schema.sh script has been updated to include these lines:

CREATE_URL="http://${host}:8983/solr/admin/cores?action=CREATE&name=${KS}.${CF}"
curl -X POST $CREATE_URL
echo "Created index."
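For readers who script the demo from Python instead of the shell, the lines above could be sketched as follows. This is an illustration under assumptions, not part of the demo: the function names are made up, the port 8983 is taken from the script lines above, and the keyspace and column family values in the example call are placeholders.

```python
import urllib.request
from urllib.parse import urlencode


def core_create_url(host, keyspace, column_family, port=8983):
    """Build the same core CREATE URL that the shell script assembles."""
    query = urlencode({"action": "CREATE",
                       "name": "%s.%s" % (keyspace, column_family)})
    return "http://%s:%d/solr/admin/cores?%s" % (host, port, query)


def create_core(host, keyspace, column_family):
    """POST to the CREATE URL, like `curl -X POST`; needs a running node."""
    url = core_create_url(host, keyspace, column_family)
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder keyspace and column family names; no request is sent here.
print(core_create_url("localhost", "wiki", "solr"))
```

Only core_create_url runs in this example. Call create_core against a live DSE Search node to actually create the index.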

Disk full alert

DataStax assumes that the customer's operations team monitors cluster resources to ensure that enough disk space exists. In the event of an oversight, Cassandra marks the node to be decommissioned when the disk is approaching full. When the disk is almost full, the server should stop serving requests, the node is removed from the ring, and Cassandra issues an alert.
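A monitoring job for disk space can be very small. The following minimal sketch uses only the Python standard library. The 90 percent threshold and the function names are assumptions made for this example, not values used by Cassandra or DataStax Enterprise.

```python
import shutil


def used_fraction(total_bytes, used_bytes):
    """Fraction of the file system that is in use, from 0.0 to 1.0."""
    return used_bytes / total_bytes


def disk_nearly_full(path, threshold=0.90):
    """Return True if the file system holding `path` is at or above threshold.

    The default threshold is an arbitrary example value.
    """
    usage = shutil.disk_usage(path)
    return used_fraction(usage.total, usage.used) >= threshold


# Check the file system that holds the current directory.
print(disk_nearly_full("."))
```

In practice you would point the check at each data and commit log directory and run it on a schedule, alerting well before the disk fills.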

Java requirements

DataStax recommends using the latest 64-bit version of Java 6.

  • DSE and Cassandra: JRE not supported below 1.6.0_29.
  • Kerberos: The JRE should be later than 1.6.0_26 due to a Kerberos bug.
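The two version rules above can be expressed as a simple comparison. The following minimal sketch assumes version strings in the form 1.6.0_29. The function names are invented for this example, and the check covers only the two stated limits, not the broader recommendation to use Java 6.

```python
import re


def parse_jre_version(version):
    """Parse a string such as '1.6.0_29' into a comparable tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)_(\d+)", version)
    if match is None:
        raise ValueError("unrecognized JRE version: %r" % version)
    return tuple(int(part) for part in match.groups())


MIN_DSE = parse_jre_version("1.6.0_29")         # not supported below this
KERBEROS_FLOOR = parse_jre_version("1.6.0_26")  # must be later than this


def check_jre(version, kerberos=False):
    """Return True if the version satisfies the limits stated above."""
    parsed = parse_jre_version(version)
    ok = parsed >= MIN_DSE
    if kerberos:
        ok = ok and parsed > KERBEROS_FLOOR
    return ok


print(check_jre("1.6.0_29"), check_jre("1.6.0_26", kerberos=True))
```

Because the DSE minimum (1.6.0_29) is already later than the Kerberos limit (1.6.0_26), any JRE that passes the first rule also passes the second.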

Resolved issues

This release fixes the following issues:

  • Fields in FieldInfos that are in the SolrInputDocument when the document is sent to Solr were neither stored in Solr nor indexed. (DSP-1352)
  • On node restart, errant Solr documents in commit log prevented the node from starting. (DSP-1297)
  • After storing data on one Solr node, the data replication was inconsistent. Now, the data shows up in searches from other nodes. (DSP-1121)

Issues

This release has the following issues:

  • An outdated file, SECURITY_NOTES, is included in the installation directory. This file should have been removed. For security information and procedures, use this document.
  • The Cassandra log4j appender doesn't support multiple hosts. (DSP-1601)
  • The sstableloader tool does not work in an environment using Kerberos. The workaround is to run sstableloader in a ring that doesn't use Kerberos. (DSP-1168)
  • When you have a multiple data center cluster with Kerberos enabled, all keyspaces accessed from Hadoop must be configured with NetworkTopologyStrategy. If configured with SimpleStrategy, Hadoop jobs will hang. (DSP-1630)
  • MapReduce jobs hang before completing or finishing cleanup with older versions of Hadoop (MAPREDUCE-4560, MAPREDUCE-4299). The workaround is to remove the mapred.reduce.slowstart.completed.maps parameter and restart. (DSP-1154)
  • The nodetool repair -pr command does not completely repair a keyspace unless it is in every datacenter. (CASSANDRA-5424)
  • In earlier releases, the authenticator allowed various Cassandra clients, such as Hector, to set a keyspace and then log in. In this release, the org.apache.cassandra.auth.PasswordAuthenticator requires that the client log in and then set the keyspace. (Cassandra-5423, DSP-1878)
  • Open Source Solr (OSS) supports relative paths set by the <lib> property in the solrconfig.xml, but DSE Search/Solr does not. Configuring Solr library paths describes a workaround for this issue that DataStax Enterprise will address in a future release. (DSP-1840)
  • After upgrading DataStax Enterprise 2.x to 3.x, a Solr-indexed field containing an empty date causes a parse exception when encountered in search results. (DSP-1944)