Recent releases of DSE have included several pivotal analytics innovations, notably the popular Always On Spark SQL and DataStax Enterprise File System (DSEFS) features. With the latest release of DSE, the analytics story continues with enhancements that make DSE Analytics more secure, faster, and easier to use.
In DSE 6.7, we’ve rounded out the DSE Analytics Kerberos authentication story by adding support for DSEFS REST API users. We’ve also added convenience options to DSEFS that enable recursive commands, and we’ve improved logging in the DataStax Spark Shell, a popular choice for developers working with DSE Analytics. Finally, we’ve implemented a throttling mechanism that limits how aggressively the Spark™ component of DSE Analytics reads from Apache Cassandra™.
Users of the DSEFS REST API can now access DSEFS data securely with DSE’s Unified Authentication functionality. In DSE 6.7, the DSEFS REST API has been integrated with Kerberos, providing peace of mind for architects, developers, and operators of applications built on DataStax’s distributed file system.
Setup and configuration of a secured DSEFS REST endpoint are easy thanks to the options provided by the integration of SPNEGO and Kerberos token delegation in DSE Analytics. Users choose the method they prefer to obtain a delegation token and then register that token with the REST endpoint. From then on, all DSEFS REST API calls to that endpoint are secured through Kerberos authentication.
DSEFS Recursive Commands
The first releases of DSEFS lacked a few convenience options for bulk file management. With DSE 6.7, DSEFS users can make bulk changes using the recursive flag -R for operations such as copy (cp), permissions management (chmod), and owner management (chown). No configuration is needed to take advantage of this feature; simply upgrade to DSE 6.7 and you can start using the recursive flag.
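As a rough illustration, the sketch below builds the command line for a recursive DSEFS operation without running it. The helper name `dsefs_command`, the example path and mode, and the placement of -R directly after the operation name are assumptions made for illustration; check the DSEFS command reference for the exact syntax each operation accepts.

```python
# Hypothetical helper: composes a non-interactive `dse fs` invocation.
# Only the three operations named in the text are accepted here.
RECURSIVE_OPERATIONS = {"cp", "chmod", "chown"}


def dsefs_command(operation, *operands, recursive=False):
    """Return the argv list for one DSEFS shell command (not executed)."""
    if operation not in RECURSIVE_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    parts = [operation]
    if recursive:
        # Assumption: the flag goes right after the operation name.
        parts.append("-R")
    parts.extend(operands)
    # The whole DSEFS command is passed to `dse fs` as a single argument.
    # Operands containing spaces are not handled by this sketch.
    return ["dse", "fs", " ".join(parts)]


if __name__ == "__main__":
    # Example path and mode are placeholders.
    print(dsefs_command("chmod", "750", "/data/reports", recursive=True))
```

The returned list can be handed to `subprocess.run` on a node where the `dse` command is available.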
Spark Shell Logging
Users who come to DSE Analytics from a Spark background have noticed that the logging and messages shown in the DSE Analytics shell differ from those of the open source Spark shell. These differences have been removed, and the messages in both environments are now in sync, providing a familiar experience for native Spark users in DSE Analytics.
At DataStax we’re constantly looking for ways to improve the lives of developers building applications against DSE. One small but impactful improvement included for DSE Analytics in 6.7 is the introduction of a rate-limiting feature for DSE Analytics read operations. This new feature ensures that developers don’t inadvertently overload DSE clusters by issuing overly aggressive queries that can consume all available DSE Analytics resources.
Working with this new feature is simple: users set the newly introduced spark.cassandra.input.throughputMBPerSec property, available per DSE Analytics task; once it is set, DSE Analytics limits reads to the rate specified.
This new property is especially useful in situations where a DSE Analytics task uses the joinWithCassandraTable method.
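One way to apply the property is to pass it as a Spark configuration setting at submit time. The sketch below only assembles the `dse spark-submit` arguments; the helper names, the job file name, and the 5 MB/s value are placeholders chosen for illustration, not recommendations.

```python
# Property name as given in the text above.
THROUGHPUT_PROPERTY = "spark.cassandra.input.throughputMBPerSec"


def throttled_read_conf(mb_per_sec):
    """Return Spark settings that cap read throughput at mb_per_sec."""
    if mb_per_sec <= 0:
        raise ValueError("throughput limit must be positive")
    return {THROUGHPUT_PROPERTY: str(mb_per_sec)}


def spark_submit_args(application, conf):
    """Return the argv list for `dse spark-submit` with the given settings."""
    args = ["dse", "spark-submit"]
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    args.append(application)
    return args


if __name__ == "__main__":
    # "join_job.py" is a hypothetical application file.
    print(spark_submit_args("join_job.py", throttled_read_conf(5)))
```

The same key and value could instead be set on the Spark configuration inside the application before the session is created.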
More to Come
The features contained in DSE 6.7 help strengthen the usability of DSE for analytics users. DataStax will continue to listen to our users and provide meaningful advances in DSE Analytics for years to come.