What's new in Apache Cassandra
Cassandra 1.2 introduced many improvements, which are described briefly in this section.
Key general features
- Cassandra 1.2.2 and later support CQL3-based implementations of IAuthenticator and IAuthorizer for use with these security features, which were introduced a little earlier:
  - Internal authentication based on Cassandra-controlled login accounts and passwords.
  - Object permission management using internal authorization to grant or revoke permissions for accessing Cassandra data through the familiar relational database GRANT/REVOKE paradigm.
- Client-to-node encryption, which protects data in flight between client machines and a database cluster, was also released in Cassandra 1.2.
- Virtual nodes: Prior to this release, Cassandra assigned one token per node, and each node owned exactly one contiguous range within the cluster. Virtual nodes change this paradigm from one token and range per node to many tokens per node. This allows each node to own a large number of small ranges distributed throughout the ring, which has a number of important advantages.
- Murmur3Partitioner: This new default partitioner provides faster hashing and improved performance.
- Faster startup times: The release provides faster startup/bootup times for each node in a cluster, with internal tests performed at DataStax showing up to 80% less time needed to start primary indexes. The startup reductions were realized through more efficient sampling and loading of indexes into memory caches. The index load time is improved dramatically by eliminating the need to scan the primary index.
- Improved handling of disk failures: In previous versions, a single unavailable disk had the potential to make the whole node unresponsive (while still technically alive and part of the cluster). Memtables were not flushed and the node eventually ran out of memory. If the disk contained the commitlog, data could no longer be appended to the commitlog. Thus, the recommended configuration was to deploy Cassandra on top of RAID 10, but this resulted in using 50% more disk space. New disk management solves these problems and eliminates the need for RAID as described in the hardware recommendations.
- Multiple independent leveled compactions in parallel: Increases the performance of leveled compaction. Cassandra's leveled compaction strategy creates data files of a fixed, relatively small size that are grouped into levels.
- Configurable and more frequent tombstone eviction: Tombstones are evicted more often and automatically in Cassandra 1.2 and are easier to manage. Configuring tombstone eviction instead of manually performing compaction can save users time, effort, and disk space.
- Support for concurrent schema changes: Cassandra 1.1 introduced modifying schema objects in a concurrent fashion across a cluster, but did not support programmatically and concurrently creating and dropping tables (permanent or temporary). Version 1.2 includes this support, so multiple users can add/drop tables, including temporary tables, in this way.
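The virtual nodes item above can be illustrated with a small toy model. The following Python sketch is not Cassandra code: the 32-bit token space, the MD5-based `token_for` helper, and the node names are assumptions made for illustration only (Cassandra 1.2 defaults to the Murmur3Partitioner, which is not used here). It shows how giving each node many tokens splits the ring into many small ranges per node instead of one contiguous range.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # toy token space, not Cassandra's real token range


def token_for(label):
    """Map a string to a toy token. MD5 is a stand-in hash, not Murmur3."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(nodes, tokens_per_node):
    """Return a sorted list of (token, node) pairs for the whole ring."""
    ring = [
        (token_for(f"{node}#{i}"), node)
        for node in nodes
        for i in range(tokens_per_node)
    ]
    ring.sort()
    return ring


def owner(ring, key):
    """Return the node whose token is the first one >= the key's token,
    wrapping around to the start of the ring if necessary."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(key))
    return ring[idx % len(ring)][1]


def ownership(ring):
    """Return the fraction of the token space owned by each node."""
    shares = {}
    # The first token also owns the range that wraps around from the last token.
    prev = ring[-1][0] - RING_SIZE
    for token, node in ring:
        shares[node] = shares.get(node, 0) + (token - prev)
        prev = token
    return {node: share / RING_SIZE for node, share in shares.items()}


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    for tokens_per_node in (1, 256):
        ring = build_ring(nodes, tokens_per_node)
        print(tokens_per_node, "token(s) per node:", ownership(ring))
```

In this model, one token per node leaves each node with a single range whose size depends entirely on where its token happens to fall, while many tokens per node typically spread ownership more evenly because each node's share is the sum of many small ranges.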
Key CQL features
CQL 3, which was previewed in Beta form in Cassandra 1.1, has been released in Cassandra 1.2.
CQL 3 is now the default mode for cqlsh. CQL 3 supports schemas that map Cassandra storage engine cells to a more powerful and natural row-column representation than earlier CQL versions and the Thrift API. CQL 3 transposes data partitions (sometimes called "wide rows") into familiar row-based result sets, dramatically simplifying data modeling. New features in Cassandra 1.2 include:
- Collections: Collections provide easier methods for inserting and manipulating data that consists of multiple items that you want to store in a single column; for example, multiple email addresses for a single employee. There are three types of collections: set, list, and map. Common tasks that previously required creating multiple columns or a separate table can now be accomplished intuitively using a single collection.
- CQL native/binary protocol: This frame-based transport, designed for CQL 3, is a flexible alternative to the Thrift API. To use the new binary protocol, change the start_native_transport option to true in the cassandra.yaml file. The open source DataStax Java Driver and .NET Driver support the CQL binary protocol. See the driver documentation for more information.
- Query profiling/request tracing: This cqlsh feature includes performance diagnostic utilities aimed at helping you understand, diagnose, and troubleshoot CQL statements sent to a Cassandra cluster. You can interrogate individual CQL statements in an ad hoc manner, or perform a system-wide collection of all queries/commands sent to a cluster. The nodetool utility adds probabilistic tracing, which collects a sample of all statements sent to a database so that you can isolate and tune the most resource-intensive statements.
- System information: You can easily retrieve details about your cluster configuration and database objects by querying tables in the system keyspace using CQL.
- Atomic batches: Prior versions of Cassandra allowed batch operations for grouping related updates into a single statement. If some of the replicas for the batch failed mid-operation, the coordinator would hint those rows automatically. However, if the coordinator itself failed mid-operation, you could end up with partially applied batches. In Cassandra 1.2, batch operations are guaranteed by default to be atomic, and are handled differently than in earlier versions of the database.
- Flat file loader/export utility: A new cqlsh utility facilitates importing and exporting flat file data to/from Cassandra tables. Although initially introduced in Cassandra 1.1.3, the new load utility wasn’t formally announced until now. The utility mirrors the COPY command from the PostgreSQL RDBMS. A variety of file formats are supported including comma-separated value (CSV), tab-delimited, and more, with CSV being the default.
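The statement that CQL 3 transposes wide rows into row-based result sets can be made concrete with a toy model. The Python sketch below is only an illustration, not Cassandra's storage code: it assumes each storage cell in a partition is named by a (clustering value, column name) pair, and the `transpose` function, the dictionary keys, and the sample data are all hypothetical.

```python
def transpose(partition_key, cells):
    """Group the cells of one partition into rows, one per clustering value.

    `cells` is a list of ((clustering_value, column_name), value) pairs,
    in the order a storage engine would keep them (sorted by cell name).
    """
    rows = {}
    for (clustering, column), value in cells:
        # Start a new row the first time a clustering value is seen.
        row = rows.setdefault(
            clustering,
            {"partition_key": partition_key, "clustering": clustering},
        )
        row[column] = value
    # Dicts preserve insertion order, so rows come back in clustering order.
    return list(rows.values())


if __name__ == "__main__":
    # One wide row holding two logical rows (hypothetical sample data).
    wide_row = [
        ((1, "author"), "ann"),
        ((1, "body"), "hello"),
        ((2, "author"), "bob"),
        ((2, "body"), "hi"),
    ]
    for row in transpose("thread-1", wide_row):
        print(row)
```

Each returned dictionary corresponds to one row of a result set, so a single partition with many cells reads like an ordinary table with one row per clustering value.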
Other enhancements and changes
A number of additional CQL 3 enhancements to Cassandra have been made. See the list of other features.