DataStax Enterprise 3.1 Documentation

Release notes

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


DataStax Enterprise 3.1.x release notes cover these releases:

DataStax Enterprise 3.1.6 release notes

DataStax Enterprise has been enhanced to include Cassandra 1.2.13. Apache documentation covers release notes for Cassandra 1.2.13 and earlier. Cassandra 1.2.13 supports CQL 3 and CQL 2; CQL 2 is deprecated and its removal is planned for Cassandra 3.0.

Components

  • Apache Cassandra 1.2.13
  • Apache Hadoop 1.0.4.8
  • Apache Hive 0.9.0.3
  • Apache Pig 0.9.2
  • Apache Solr 4.3.0.1.4
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.11.1
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Issue resolved

This release fixes the issue causing snapshot repairs to block operations when another node fails to respond to the repair message. (CASSANDRA-6415)

Issues

  • In this release, the flush_largest_memtables_at setting is 0.75, which is typically too low and causes excessive flushing of memtables to disk. The workaround is to change the setting to 0.80 in the cassandra.yaml file. (DSP-2989)

  • A DSE 3.1 node can fail to start if you comment out or remove one of the following sections but leave the other. (DSP-3078)

    # Replication strategy to use for the auth keyspace.
    # Following an upgrade from DSE 3.0 to 3.1, this should be removed
    auth_replication_strategy: org.apache.cassandra.locator.SimpleStrategy
    
    # Replication options to use for the auth keyspace.
    # Following an upgrade from DSE 3.0 to 3.1, this should be removed
    auth_replication_options:
        replication_factor: 1
    

    This can occur on either a fresh installation of 3.1 or an upgrade from 3.0.

    After the node is restarted the first time, the following error appears in system.log and when running nodetool ring:

    ERROR 16:12:35,327 Exception in thread Thread[OptionalTasks:1,5,main] java.lang.RuntimeException
       at org.apache.cassandra.locator.AbstractReplicationStrategy.
       createReplicationStrategy(AbstractReplicationStrategy.java:274)
       at org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:278)
    

    Warning: To avoid this issue make sure that you comment out or remove both sections at once. If you restart the node a second time, it will fail to start, so do not restart any nodes once this error has occurred until you have performed one of the following workarounds.

    If you have already encountered the issue, use one of the following workarounds.

Workaround 1: If the error is occurring but all nodes are still running

Use this workaround on a fresh installation or upgrade from DSE 3.0 to 3.1.

  1. Comment out both sections in the cassandra.yaml file:

    # Replication strategy to use for the auth keyspace.
    # Following an upgrade from DSE 3.0 to 3.1, this should be removed
    #auth_replication_strategy: org.apache.cassandra.locator.SimpleStrategy
    
    # Replication options to use for the auth keyspace.
    # Following an upgrade from DSE 3.0 to 3.1, this should be removed
    #auth_replication_options:
    #    replication_factor: 1
    
  2. Fix the replication strategy on the dse_auth keyspace using cqlsh:

    ALTER KEYSPACE dse_auth WITH replication = {
     'class': 'SimpleStrategy',
     'replication_factor': 1
    };
    
  3. Follow instructions for enabling internal security in the documentation.

Workaround 2: If some nodes are already down

  1. Perform steps 1-2 of the first workaround.
  2. On any nodes that do not start, move these directories from the system keyspace directory to a backup location:
    • Schema
    • Migrations
    • schema_columnfamilies
    • schema_columns
    • schema_keyspaces
  3. Restart the nodes. They should start and get the schema from the other nodes that are still running.
  4. Follow instructions for enabling internal security in the documentation.
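
Step 2 of this workaround can be scripted. The following is a minimal sketch rather than a supported tool: the function name backup_schema_dirs is hypothetical, and both directory paths are assumptions that you supply. The directory names are the ones listed in step 2.

```shell
# Hypothetical helper: move the schema-related directories out of the
# system keyspace directory into a backup location.
# Both paths are assumptions supplied by the caller.
backup_schema_dirs() {
    system_dir="$1"   # system keyspace directory of the node that does not start
    backup_dir="$2"   # backup location of your choice
    mkdir -p "$backup_dir"
    for name in Schema Migrations schema_columnfamilies schema_columns schema_keyspaces; do
        # A directory may be absent on a given node; skip it in that case.
        if [ -d "$system_dir/$name" ]; then
            mv "$system_dir/$name" "$backup_dir/"
        fi
    done
}

# Example call (both paths are assumptions; check data_file_directories
# in your cassandra.yaml for the real data location):
# backup_schema_dirs /var/lib/cassandra/data/system /var/tmp/system-schema-backup
```

Run the helper only while the node is stopped, then continue with step 3.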

Workaround 3: If all of the nodes in the cluster are already down

  1. Perform steps 1-2 of the second workaround.
  2. Start the nodes. The schema will now be empty.
  3. If you have a backup schema creation script written in cqlsh or cassandra-cli, replay the script to restore the schema after you start the nodes. If you do not have a backup, recreate the schema from memory to avoid losing the data.
  4. Follow instructions for enabling internal security in the documentation.
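
For the flush_largest_memtables_at issue (DSP-2989) listed at the start of this Issues section, the change to cassandra.yaml can also be scripted. This is a minimal sketch under stated assumptions: the function name set_flush_threshold is hypothetical, the file path is supplied by you, and the setting is assumed to appear uncommented at the start of a line.

```shell
# Hypothetical helper: set flush_largest_memtables_at to 0.80 in a
# cassandra.yaml file. The file path is an assumption supplied by the caller.
set_flush_threshold() {
    yaml_file="$1"
    # Keep a backup copy next to the original before editing.
    cp "$yaml_file" "$yaml_file.bak"
    # Rewrite only the flush_largest_memtables_at line; every other
    # line passes through unchanged.
    sed 's/^flush_largest_memtables_at:.*/flush_largest_memtables_at: 0.80/' \
        "$yaml_file.bak" > "$yaml_file"
}

# Example call (the path is an assumption for a packaged installation):
# set_flush_threshold /etc/dse/cassandra/cassandra.yaml
```

The new value takes effect only after the node is restarted.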

DataStax Enterprise 3.1.5 release notes

These release notes cover components, enhancements and changes, and issues resolved and unresolved:

Components

  • Apache Cassandra 1.2.12.2
  • Apache Hadoop 1.0.4.8
  • Apache Hive 0.9.0.3
  • Apache Pig 0.9.2
  • Apache Solr 4.3.0.1.4
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.11.1
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Enhancements and changes

This release includes the following changes and enhancements:

  • Support for using an external file system, such as s3ql, in Hive while using the CassandraFS as the DSE Analytics/Hadoop file system.

    Follow instructions in this document to set up and use this feature.

  • Improved Hadoop performance when running vnodes within the same, or a different, data center. (DSP-2572/CASSANDRA-6268)

  • For DSE Analytics/Hadoop nodes, the default consistency level has changed from ONE to LOCAL_ONE for MapReduce jobs.

Issues resolved

This release fixes the following issues:

  • Fixed the issue preventing Hadoop from accessing libraries in Hive. (DSP-1495)
  • Fixed the DESCRIBE SCHEMA command on an Analytics node that resulted in an error message on the cqlsh command line. (DSP-2268)
  • Fixed issue that caused an error when setting max_solr_concurrency_per_core parameter to 1. (DSP-2321)
  • Fixed the issue preventing access to external file systems by Hive. (DSP-2377)
  • Fixed the problem causing repeated Hive queries to impact CassandraFS performance. (DSP-2441)
  • Two files are now available on packaged as well as tarball installations to copy/paste code and run a pig script instead of stepping through the Explore library data demo manually. (DSP-2461)
  • Fixed issue of Job Tracker moving from one node to another. (DSP-2520)
  • Fixed the problem that returned incorrect values when applications set widerows=true or set the cassandra.range.batch.size=1. (DSP-2537)
  • Fixed the problem causing the Nodetool utility to fail to recognize the --port option. (DSP-2545)
  • Fixed the problem that caused secondary indexes to work unreliably with row caches. (DSP-2551)
  • Fixed an issue that caused duplicate DSE Search/Solr results to be returned. (DSP-2563)
  • Include and set up the old snappy for older distributions. (DSP-2567)
  • Fixed the issue involving long-running repairs on Solr admin keyspace that prevented timely core create/reload operations on large clusters. (DSP-2570)
  • Fixed the issue involving multiple dse cassandra -Dcassandra.replace_address commands. (DSP-2574)
  • Fixed the issue that caused a Java-level deadlock. (DSP-2579)
  • Fixed the issue causing the Solr core recovery to deadlock while waiting for the core to be recreated. (DSP-2585)
  • Fixed the issue where dsetool checkcfs throws a NullPointerException (NPE) when a file listed in the directory does not exist. (DSP-2594)
  • Fixed file handle leaks in the CassandraFS. (DSP-2660)
  • Fixed the issue that affected distribution of data by the bulk loader. The sstableloader utility now works correctly. (DSP-2612)
  • Fixed the error in which upgrading packages on Debian appeared to corrupt the limits configuration. (DSP-2696)
  • Fixed a race condition when loading solr cores during an upgrade. (DSP-2702)
  • Fixed the issue that caused CQL Native connections to be refused when you enable internal authentication. (DSP-4097)

Issues

This release has the following unresolved issues:

  • Running Hive queries on a node in an analytics data center that has no replica causes a TimedoutException. The error might look something like this:

    Exception in thread "Thread-11" java.lang.RuntimeException: Error while
      reading from task log url at org.apache.hadoop.hive.ql.exec.errors.
      TaskLogProcessor.getStackTraces(TaskLogProcessor.java:240)
    . . .
      Caused by: java.io.IOException: Server returned HTTP response code:
      400 for URL: http://ip-10-182-188-95.ec2
    . . .
    

    The job tracker error message contains more detailed information than shown here.

    Perform one of these workarounds to increase the replication factor:

    • If data is not local, in the TBLPROPERTIES clause of the Hive query, configure the cassandra.consistency.level property to change the consistency level from the new default LOCAL_ONE to at least ONE. (DSP-2718)

    • Configure the read and write properties to change the Cassandra consistency level for MapReduce jobs in the mapred-site.xml. Change the consistency level from LOCAL_ONE to at least ONE:

      <property>
        <name>cassandra.consistencylevel.read</name>
        <value>ONE</value>
      </property>
      <property>
        <name>cassandra.consistencylevel.write</name>
        <value>ONE</value>
      </property>
      
    • Alter the keyspace replication factor to guarantee at least one replica in the analytics data center.

DataStax Enterprise 3.1.4 release notes

These release notes cover components, enhancements and changes, issues resolved, and outstanding issues in the release.

Components

  • Apache Cassandra 1.2.10
  • Apache Hadoop 1.0.4.8
  • Apache Hive 0.9.0.1
  • Apache Pig 0.9.2
  • Apache Solr 4.3.0.1.2
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.3
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Enhancements and changes

This release includes the following changes and enhancements:

  • Pig CQL 3 push down filter (DSP-2214)

  • Support for CQL collections in Pig (DSP-2373, DSP-2360)

  • Improved debugging of CassandraFS corruption using the dsetool utility (DSP-2416)

  • Enforced syntax for flags, such as -h, prefaced by a hyphen. These flags must come before other dsetool command arguments. (DSP-2430) For example:

    dsetool -h 127.0.0.1 ring
    
  • Error logging of Solr request errors that includes parameters sent. (DSP-2450)

  • Counting now performed when the commit log stores and replays Solr entries. A new CommitLog-core.name mbean publishes the counters, named entries and replayed. (DSP-2454)

  • Finalized Pig formatting syntax, which differs from that of DSE 3.1.2-3.1.3 and conforms to Cassandra 1.2.10. (DSP-2464)

  • Includes tuning knobs for dealing with large blobs and many CFs (CASSANDRA-5982, DSP-2470)

  • Fixes the issue that deletes snapshots in use during snapshot repair. (CASSANDRA-6011, DSP-2489)

  • Updated version of Cassandra to 1.2.10 (DSP-2504)

Issues resolved

This release fixes the following issues:

  • After deleting Solr data by dropping the table using the CQL DROP TABLE command, or by manually deleting the Solr index directory, you needed to shut down and restart the server before attempting to recreate the Solr core. In this release, after dropping the table, you can upload the Solr schema and configuration and create the Solr core. (DSP-2024)

  • The issue causing DataStax Enterprise to crash when you added client_encryption_options to the dse.yaml file has been resolved. In this release, when you attempt to add these options to the dse.yaml file, an error message results. (DSP-2279)

    Configure the client_encryption_options only in the cassandra.yaml file, as described in the SSL documentation.

  • The incorrect default gc_grace storage parameter for Cassandra tables has been corrected from 60 seconds to 10 days. (DSP-2342)

  • Added missing information in User resource limits. (DSP-2344)

  • The issue that prevented MapReduce Jobs from running longer than 24 hours on kerberized clusters is resolved. (DSP-2402)

  • Under certain circumstances the DataStax Enterprise service reported a failed exit status when the service actually started. This issue has been resolved. (DSP-2422)

  • The issue causing Pig to return an extraneous empty tuple has been resolved. (DSP-2424)

  • Fixed race condition between CFS compaction and nodetool scrub. (DSP-2425)

  • The issue causing complex Hive and Pig queries to cause a deadlock has been resolved. (DSP-2434)

  • The back-pressure implementation has been enhanced to improve indexing performance in the following ways: (DSP-2447)

    • Throttling new index requests rather than completely blocking them
    • Making the default back-pressure threshold based on the total number of queued requests, rather than the average
    • Adding support for configurable back-pressure using a back_pressure_threshold_per_core option in dse.yaml file
  • Scrubbing SSTables in the CassandraFS erroneously removed information about CassandraFS files. This caused SSTables containing deleted CassandraFS files to accumulate despite compaction or other cleanup operations. This issue has been resolved. (DSP-2472)

  • The issue causing Solr multivalued date fields to malfunction has been resolved. Multivalued date fields now work. (DSP-2480)

  • The issue in which pending compactions caused nodes to die as if disk space were low has been resolved. CASSANDRA-5605 has been backported to this release. (DSP-2485)

  • Liveness issues calling CassandraSolrConfig#getColumnLimit under high concurrency have been resolved. (DSP-2494)

  • The issue causing the ShardRouter to write many messages to the system.log has been resolved. (DSP-2495)

  • Solr core resource management has been made more robust. Core loading/creation/reload no longer fails at the first attempt when resource loading/writing fails. (DSP-2505)

  • The issue with inserting a NULL value using Hive, for example when importing data into Cassandra, has been resolved. (DSP-2521)

  • Making a mistake when typing a Pig LOAD command, for example, a mistake in the keyspace name, would cause not only the mistaken command to fail, but also the next, corrected command to fail with the same message, for example:

    Unexpected internal error. InvalidRequestException(why:Keyspace 'ks1' does not exist.)
    

    This issue has been resolved. (DSP-2527)

Issues

The following issues are present in this release:

  • Commit log file handles can be left open during heavy Solr indexing. Do not truncate a table that Solr has indexed during the indexing operation. You can check indexing status using the Solr Admin. (DSP-2540)

  • To prevent Hive from throwing the error, "Unrecognized option: -javaagent:/usr/share/dse/cassandra/lib/jamm-0.2.5.jar" comment out the following line in the cassandra-env.sh: (DSP-2549)

    #echo "xss = $JVM_OPTS"
    
  • The sstableloader data distribution is broken. The workaround is to use 3.1.3 as the bulk loading client; it will work with a 3.1.4 cluster. (CASSANDRA-6272)
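
The cassandra-env.sh change for DSP-2549 listed above can be scripted. This is a minimal sketch, not a supported tool: the function name comment_out_xss_echo is hypothetical, the file path is supplied by you, and the line is assumed to read exactly echo "xss = $JVM_OPTS", optionally indented.

```shell
# Hypothetical helper: comment out the xss echo line in cassandra-env.sh.
# The file path is an assumption supplied by the caller.
comment_out_xss_echo() {
    env_file="$1"
    # Keep a backup copy next to the original before editing.
    cp "$env_file" "$env_file.bak"
    # Prefix the echo line with '#', preserving its indentation.
    # A line that is already commented out does not match and is left alone.
    sed 's/^\([[:space:]]*\)echo "xss = \$JVM_OPTS"/\1#echo "xss = $JVM_OPTS"/' \
        "$env_file.bak" > "$env_file"
}

# Example call (the path is an assumption for a packaged installation):
# comment_out_xss_echo /etc/dse/cassandra/cassandra-env.sh
```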

DataStax Enterprise 3.1.3 release notes

This release fixes the following issues:

  • The exception (array index out-of-bounds) that occurred when writing to a wide-row Thrift table using CqlRecordWriter has been resolved. (DSP-2334)
  • The problem causing a slow replay of the commit log has been resolved by optimizing flushing of SSTables, making inserts multi-threaded, and other improvements. (DSP-2405)
  • The problem causing an index to fail when an empty string is inserted into an indexed column has been resolved. (DSP-2429, CASSANDRA-5965)

DataStax Enterprise 3.1.2 release notes

This release includes changes, enhancements, and resolved issues.

Changes in the upgrade procedure

  • Upgrading to DataStax Enterprise 3.1.0 - 3.1.2 directly from some versions of DataStax Enterprise, DataStax Community, and Cassandra is not supported. See version restrictions.
  • The client_encryption_options for enabling client-to-node SSL have been removed from dse.yaml in 3.1.2. To enable client-to-node SSL, set the option in the cassandra.yaml file. If you are upgrading from an earlier version of 3.1, manually remove these settings from your dse.yaml.

Other changes and enhancements

Issues

A Cassandra issue can cause a problem when decommissioning a node that interferes with streaming data from the node to SSTables. The workaround is to decommission the node and then repair the cluster. The efficient and recommended way to repair a node, or cluster, is to use the subrange repair method.
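
As an illustration of the subrange method, the sketch below splits a token range into equal parts and prints one repair command per part. It assumes that nodetool repair in your Cassandra version accepts the -st (start token) and -et (end token) options; the function name print_subrange_repairs, the keyspace name, and the token values are hypothetical. The function only prints the commands so that you can review them before running any.

```shell
# Hypothetical helper: print one "nodetool repair" command per subrange.
# Assumes the -st/-et options are available and that the range width
# fits in the shell's signed integer arithmetic.
print_subrange_repairs() {
    keyspace="$1"   # keyspace to repair (caller-supplied)
    start="$2"      # start token of the range
    end="$3"        # end token of the range
    parts="$4"      # number of subranges to produce
    step=$(( ($end - $start) / $parts ))
    i=0
    st="$start"
    while [ "$i" -lt "$parts" ]; do
        if [ "$i" -eq $(( $parts - 1 )) ]; then
            # The last subrange ends exactly at the end token,
            # absorbing any remainder from the integer division.
            et="$end"
        else
            et=$(( $st + $step ))
        fi
        echo "nodetool repair $keyspace -st $st -et $et"
        st="$et"
        i=$(( $i + 1 ))
    done
}

# Example call with made-up values:
# print_subrange_repairs ks1 0 100 4
```

Smaller subranges keep each repair session short, at the cost of running more sessions.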

Running the DESCRIBE SCHEMA command on an Analytics node results in an error message on the cqlsh command line. (DSP-2268)

Issues resolved

This release resolves these issues:

  • Fixes the issue of sstableloader failing with Kerberos and SSL by adding support for Kerberos and SSL to the sstableloader command. (DSP-1168)
  • Fixed the issue that caused the sstableloader command to return 0 on success or failure. Now, the command returns 1 on failure and 0 on success. (DSP-2325)
  • Fixed the nodetool enablethrift and nodetool disablethrift commands that failed to enable and disable the Thrift transport. (DSP-2343)
  • Fixed the issue that prevented the Hive hwi service from starting. (DSP-2364)
  • Backported CASSANDRA-5855 that fixed the nodetool scrub command to handle the CQL compound primary key. (DSP-2367)
  • Fixed broken hive Views. (DSP-2369)
  • Fixed the issue associated with using the EC2MultiRegionSnitch that caused Solr to report unavailable shards for ranges. (DSP-2371)
  • Fixed the issue that caused Hive to return an error when you query a table having null values in any columns. (DSP-2372)
  • Fixed the issue with the pre-flight check tool that falsely reported a cassandra.yaml error. The pre-flight check tool is located in /usr/share/dse/tools of packaged installs and is a collection of tests that can be run on a node to ensure that it uses the best settings. The tool can fix most of the settings it finds if invalid, or not ideal. The tool is not available in tarball installations. (DSP-2273)
  • Fixed an issue that occurred under certain circumstances when a query selected multiple partitioned columns in a hive table. No results were returned. (DSP-2374)
  • Removed redundant versions of Jetty from the distribution files. (DSP-2378)
  • Fixes a problem using the Solr DataImportHandler to import into a Solr core running in DataStax Enterprise 3.1.1, which caused an exception when the SolrWriter submitted the document to Cassandra. (DSP-2381)
  • Fixed the issue that selected and re-indexed all data having a TTL property, even data set to expire in the future. (DSP-2385)
  • Fixed the shard availability problem after nodes go down and come up. (DSP-2400)
  • Backported CASSANDRA-5234 to fix a problem with counters. (DSP-2408)
  • Fixed the issue that caused Solr queries to fail when bootstrapping nodes are selected as shards. (DSP-2411)
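
Because sstableloader now returns 1 on failure and 0 on success (DSP-2325), scripts can check its exit status. The wrapper below is a minimal sketch: the function name run_bulk_load is hypothetical, and the host and directory in the example call are placeholders.

```shell
# Hypothetical wrapper: run a bulk load command and report its exit status.
# "$@" is the complete command line to run.
run_bulk_load() {
    if "$@"; then
        echo "bulk load succeeded"
    else
        # $? still holds the exit status of the failed command here.
        status=$?
        echo "bulk load failed with exit status $status" >&2
        return "$status"
    fi
}

# Example call (host and directory are placeholders):
# run_bulk_load sstableloader -d 10.0.0.1 /path/to/Keyspace1/Standard1
```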

DataStax Enterprise 3.1.1 release notes

Release 3.1.1 resolves this issue:

The issue that causes a Solr startup problem when a PropertyFileSnitch or GossipingPropertyFileSnitch is used is resolved. (DSP-2283)

Issues

Thread stack size. To avoid StackOverflowError exceptions, you may need to set the JVM option -Xss to 190k or higher in the cassandra-env.sh file:

JVM_OPTS="$JVM_OPTS -Xss190k"

DataStax Enterprise 3.1 release notes

DataStax Enterprise 3.1 includes updated components, enhancements, and changes. These release notes list issues and resolved issues.

Components

  • Apache Cassandra 1.2.6.1
  • Apache Hadoop 1.0.4.8
  • Apache Hive 0.9.0.1
  • Apache Pig 0.9.2
  • Apache Solr 4.3.0.1
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.3
  • Apache Mahout 0.6
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Enhancements and changes

  • General

    • Support for virtual nodes in Cassandra. Currently, DataStax recommends using virtual nodes only on data centers running purely Cassandra workloads. You should disable virtual nodes on data centers running either Hadoop or Solr workloads.
    • Support for the Murmur3 partitioner.
    • Tested on Oracle Java 1.7; no issues found.
    • Support for audit logging of queries and prepared statements submitted to the DataStax Java Driver, which uses the CQL binary protocol.
    • Support for the Cassandra sstableupgrade tool.
  • Solr

    • Capability to perform a relatively fast repair of a subrange instead of repairing the entire range, which can incur a time-consuming, full re-index of Solr data.

      Although announced in a minor release of DataStax Enterprise 3.0, this enhancement is listed here to raise awareness of its availability and use. Using the faster process to repair subranges is recommended for handling inconsistencies in Solr query results and for handling other problems.

    • Support for docValues, introduced in Solr 4.2, in the schema field definition.

    • Change to the DSE Search/Solr ttl rebuild timeout properties, which ensures purging of expired data from Solr indexes. New options are:

      • ttl_index_rebuild_options.initial_delay
      • ttl_index_rebuild_options.fixed_rate_period
      • ttl_index_rebuild_options.max_docs_per_batch

      The ttl (time-to-live) field format and query on the Lucene side have not changed, so upgrading to DataStax Enterprise 3.1 is not affected by this change.

    • Configurable TTL for a field or document using the Solr HTTP API.

    • Configurable column limit prevents out of memory errors by controlling the maximum number of indexed columns overall, not just dynamic field columns, as well as columns returned during queries. Effective only when using dynamic fields.

    • Per-segment caching for filters and docsets, which improves real-time search performance.

    • The dseTypeMapping version includes a force option (for use by experts only).

    • Solr shard routing has been changed, resulting in slightly improved throughput and query times on Solr clusters, but not when using vnodes.

    • DSE Search/Solr distributed delete and search performance has been improved using a technique that loads only the unique key and explicitly requested fields.

Major Issue: Solr integration problem using PropertyFileSnitch or GossipingPropertyFileSnitch

Solr integration will not work with PropertyFileSnitch or the GossipingPropertyFileSnitch. This limitation will be removed as soon as possible. (DSP-2283)

Other Issues

  • GLIBCXX_3.4.9 not found. This error may appear in older Linux distributions when installing DSE from the binary tarball. The workaround is to replace snappy-java-1.0.5.jar with snappy-java-1.0.4.1.jar. (DSP-2189)

  • Do not run MapReduce jobs while the cluster is in a partially upgraded state, and observe all other limitations during upgrading.

  • Before upgrading a cluster in which you have decommissioned a node, follow the relevant steps in Recommissioning a node.

  • Issuing a DESCRIBE SCHEMA command on an Analytics node results in an error message on the cqlsh command line that you can ignore: (DSP-2268)

    Don't know how to parse type string
    u'org.apache.cassandra.db.marshal.DynamicCompositeType
    (t=>org.apache.cassandra.db.marshal.TimeUUIDType,
    . . .
    
  • The cqlsh DESCRIBE command can produce DDL that contains the wrong parameters. If compression is not set for a table, cqlsh omits the compression attribute. If you rename the table and issue the DESCRIBE command again, the Snappy compression setting appears. This may occur with other parameters. (CASSANDRA-5766)

Issues resolved

  • Distributed search with spellcheck problems have been resolved. (DSP-2132) For usage information, see Querying using spellcheck.
  • Distributed search with Solr groupby and trie fields now works. (DSP-2130)
  • Performance problems executing CQL over Thrift in DataStax Enterprise have been resolved by refactoring CQL handling in DataStax Enterprise. (DSP-2054)
  • The reference to the fair-scheduler.xml file in the mapred-site.xml that caused problems with the fair scheduling assignment of resources to Hadoop jobs has been fixed. The fairscheduler jar has been updated. (DSP-1964)
  • Support for running MapReduce jobs on a remote cluster. (DSP-2113) See Configuration for running jobs on a remote cluster.
  • Datetime parsing in cqlsh 3 has been fixed. (DSP-2170)
  • The problem causing Hive queries to fail when connecting to local host when the rpc_address is not 0.0.0.0 has been resolved. (DSP-1996)
  • The dse shell script no longer spawns an extra parent process for java. (DSP-1779)