DataStax Enterprise is not a data warehouse platform like those offered by pure Hadoop vendors, but rather is designed to use Hadoop for analyzing line-of-business data stored in a distributed Cassandra database cluster. DataStax Enterprise smartly separates analytic operations from transactional workloads so that neither competes with the other for data management resources.
Further, there is no need to extract, transform and load (ETL) data between separate transactional and analytic systems, because Cassandra’s built-in replication automatically moves data from database cluster nodes marked for transactional (Cassandra) data to nodes specified for analytic operations using Hadoop. Data modifications made on analytic nodes can also be replicated back to transactional nodes with no manual involvement.
DataStax Enterprise supports analytic operations using MapReduce, Hive, Pig, Sqoop, and Mahout.
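As a rough illustration of the replication described above, the sketch below builds a CQL `CREATE KEYSPACE` statement that uses Cassandra's `NetworkTopologyStrategy` to keep copies of the same data in two groups of nodes. The helper name `create_keyspace_cql`, the keyspace name `sales`, the data center names `Cassandra` and `Analytics`, and the replica counts are all assumptions made for this example; the actual names depend on how a given cluster is configured.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to the number of replicas
    to keep there. The names used below are illustrative assumptions,
    not values taken from any particular cluster.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        if replica_count < 1:
            raise ValueError(f"replica count for {dc_name} must be >= 1")
        options.append(f"'{dc_name}': {replica_count}")

    replication_map = "{" + ", ".join(options) + "}"
    return f"CREATE KEYSPACE {keyspace} WITH replication = {replication_map};"


# Hypothetical layout: three replicas on transactional nodes,
# two replicas on analytic nodes.
statement = create_keyspace_cql("sales", {"Cassandra": 3, "Analytics": 2})
print(statement)
```

With a keyspace defined along these lines, writes accepted by the transactional nodes are replicated to the analytic nodes by Cassandra itself, which is why no separate ETL job appears in the sketch.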
Benefits of Integrated Analytics
The benefits you receive from running analytics on your Cassandra data in DataStax Enterprise include the following:
- Comprehensive workload management – DataStax Enterprise lets you run analytics on your Cassandra data using a number of Hadoop components including MapReduce, Hive, Pig, Sqoop, and Mahout. It also allows you to run mixed real-time transactions, analytics and enterprise search workloads in a single, seamlessly integrated database with no resource conflict.
- Continuously available analytics – Because DataStax Enterprise uses Cassandra as its storage foundation, you enjoy the benefit of having a continuously available analytics platform that doesn’t suffer from the drawbacks of a traditional Hadoop implementation, which has single points of failure and does not work across multiple data centers.
- Simpler deployment and management – To run Hadoop-style analytics on your Cassandra data, there is no need to set up a Hadoop name node, secondary name node, Zookeeper, and so on. Instead, you simply specify which nodes in your DataStax Enterprise cluster will be used for analytics and which will be used for transactional workloads, and that’s it. Moreover, you can visually create new clusters and bring new analytic nodes online in just minutes using DataStax OpsCenter.
Scenarios It Transforms
Here’s just a glimpse of some of the ways you can gain greater insight from your line-of-business data in less time using the integrated analytics capabilities in DataStax Enterprise:
- Large-scale batch analytics – Enterprises can make sense of enormous data volumes to quickly spot trends, identify bottlenecks and anticipate issues in practically any complex, large-scale system, including computer networks, communications infrastructure, transportation systems, supply chains, manufacturing processes, refining operations and scores of others. For example, eBay relies on the integrated analytics capabilities in DataStax Enterprise to analyze much of what buyers and sellers do on its site.
- Sales and marketing campaign analysis – Sales and marketing organizations can easily analyze torrents of data from a multitude of sources to gauge the effectiveness of their initiatives. For instance, internet advertising services firm ReachLocal uses the analytics capabilities of DataStax Enterprise to help produce reports on the effectiveness of its customers’ paid search, display advertising and social media marketing campaigns – data that consists of more than 50 million inputs a day.
- Buyer behavior analytics – By using the built-in analytics features in DataStax Enterprise, companies can glean crucial insights into what people are most likely to purchase by analyzing all data their customers generate through page views, clickstreams, searches, online comments, chats, tweets and other interactions.
- Customer recommendations – Enterprises can “connect the dots” for thousands, even millions, of individual customers by recommending products, services, books, movies, music and other goods they will likely enjoy. The suggestions are based on the preferences customers reveal through their engagements with companies over time as well as their online activities. Analytics in DataStax Enterprise can likewise point out people or groups that share common interests with customers.
- Fraud detection – Integrated analytics in DataStax Enterprise can help companies rapidly identify irregularities within massive amounts of unrelated customer data that may suggest suspicious activity, such as credit card or identity theft. Better still, analytics results are fed to the integrated Apache Cassandra component in DataStax Enterprise and are available for querying in real time, greatly accelerating response times and minimizing potential exposure to threats.
- Compliance/regulatory analysis – Similarly, companies can detect anomalies in machine and device sensor output to flag potential environmental compliance issues. Financial institutions, healthcare organizations and government agencies can use these same analytics capabilities to monitor their transactional systems for potential non-compliance problems and security issues. In all cases, integration with Cassandra extends continuous availability to monitoring efforts with no downtime – a capability not available in other analytic distributions.