Extracting meaningful business insights from ever-expanding volumes of data has never been more important, but it’s a very difficult task to achieve with traditional relational database systems. That’s where Apache Hadoop™ comes in. It’s the most effective technology available today for analyzing massive amounts of highly complex and diverse data. In DataStax Enterprise, an advanced distribution of Hadoop provides comprehensive batch analytics, and it does so in ways that are substantially more reliable and simpler to manage than other Hadoop solutions.
Why Apache Hadoop?
Apache Hadoop is an open source framework that’s widely regarded as the best means for performing batch analytics on big data workloads. With Hadoop, you can perform much deeper, more thorough analysis than you can with traditional RDBMS solutions. That’s because it enables you to use all types of data, both structured and unstructured, to glean timely insights into business processes, product performance, customer preferences (and complaints!), market dynamics and much more.
Two of Hadoop’s primary big data features, both also found in Apache Cassandra, are:
- Extreme scalability, speed and cost efficiency – Hadoop stores and distributes very large data sets across clusters of hundreds, even thousands, of inexpensive servers operating in parallel, which makes it enormously scalable, very fast and economical. Its main components are a distributed file system (HDFS) and a processing framework called MapReduce, which splits the workload into manageable chunks and spreads them across the servers in parallel.
- Flexible schema for more comprehensive analysis – Hadoop can consume nearly any kind of data: not only structured, table-based information like that used in ERP, BI and CRM applications, but also unstructured data from sources such as emails, user forums, customer reviews, blog posts and text documents. As a result, enterprises can analyze all of their data, no matter where it comes from or how it’s stored, and make eye-opening connections between disparate data sets that would otherwise stay hidden, locked inside incompatible enterprise systems and data silos.
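To make the MapReduce model concrete, here is a minimal, single-machine sketch of the classic word-count job. The input is split into chunks, a map function emits key/value pairs, a shuffle groups them by key, and a reduce function aggregates each group. It is illustrative only. The function names are hypothetical, and threads stand in for cluster nodes. A real job would use Hadoop’s own MapReduce API (Java) or Hadoop Streaming rather than this code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_chunk(lines):
    """Map phase: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, chunk_size=2, workers=4):
    # Split the input into manageable chunks (Hadoop calls these input splits).
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Run the map phase on the chunks concurrently; threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    reviews = [
        "great product fast shipping",
        "slow shipping",
        "great support",
        "product arrived broken",
    ]
    print(word_count(reviews))
```

Running it prints one count per word. For example, `great`, `product` and `shipping` each appear twice across the four reviews. Because each chunk is mapped independently, adding more workers (or, in Hadoop, more servers) lets the map phase scale out without changing the logic.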
Want to start putting more of your big data to work with Apache Hadoop? Get it today with DataStax Enterprise.
How we make it even better
Built into DataStax Enterprise is an enhanced Hadoop distribution that overcomes some key limitations of community open source Hadoop. Because it uses Apache Cassandra for many of its core services, Hadoop in DataStax Enterprise provides greater reliability, no single point of failure, higher performance, simpler deployment and lower total cost of ownership (TCO) than a traditional Hadoop solution.
With enhanced Apache Hadoop in DataStax Enterprise, you get:
- Continuously available Hadoop – Unlike the standard open source version, Hadoop in DataStax Enterprise scales more effectively and is always available, because it eliminates the complexity and single points of failure of the typical Hadoop HDFS layer. Operationally, there is no need to set up a Hadoop NameNode, Secondary NameNode, ZooKeeper and so on. Instead, Hadoop in DataStax Enterprise inherits the continuous availability made possible by Cassandra’s peer-to-peer architecture.
- Better performance – Hadoop in DataStax Enterprise provides better support for multiple data centers and the cloud through Cassandra’s genuine read/write-anywhere capabilities. It also eliminates the complex, performance-draining extract-transform-load (ETL) operations normally required to move data from real-time systems to analytic databases or data warehouses.
- Comprehensive workload integration – DataStax Enterprise integrates with existing HDFS, Hadoop, MapReduce, Hive and Pig tools and utilities. More importantly, it allows you to run mixed real-time transaction, analytics and enterprise search workloads in a single, seamlessly integrated database. You can even run real-time analytics on a Cassandra cluster and deeper batch analytics on a Hadoop cluster at the same time, with no resource conflict.
- Much easier deployment and management – Traditional Hadoop is notoriously difficult to deploy, configure and operate, which is enough to put off many enterprises that lack the time or skills to implement it. With DataStax Enterprise, you can bring new Hadoop clusters online in just minutes using DataStax OpsCenter, a browser-based management tool that has the dual benefit of lowering both your TCO and your stress.
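Cassandra’s continuous availability comes from its peer-to-peer design. Every node is an equal peer on a token ring, and each row is copied to several nodes. Because any node can compute where a key lives, no central name node is needed, and therefore none can be lost. The sketch below is a heavily simplified model of that idea, not Cassandra or DataStax code. The class and method names are hypothetical. MD5 stands in for Cassandra’s default Murmur3 partitioner. Replica placement mimics the simple clockwise walk of SimpleStrategy, ignoring data centers, racks and virtual nodes.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified consistent-hashing ring where every node is an equal peer."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.rf = replication_factor
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((self._token(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def _token(key):
        # MD5 is an illustrative stand-in for Cassandra's partitioner.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas(self, key):
        """Return the rf distinct nodes holding key: the owner, then the next nodes clockwise."""
        start = bisect.bisect_left(self.tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def read(self, key, down=frozenset()):
        """Serve a read from the first live replica; no single node is required."""
        for node in self.replicas(key):
            if node not in down:
                return node
        raise RuntimeError("all replicas are down")


if __name__ == "__main__":
    ring = TokenRing(["node1", "node2", "node3", "node4", "node5"], replication_factor=3)
    owners = ring.replicas("customer:42")
    print(owners)
    # Even with the first replica down, another peer still serves the read.
    print(ring.read("customer:42", down={owners[0]}))
```

The design choice to illustrate is that placement is computed rather than looked up. Every node hashes the key the same way, so losing any one node removes one copy of the data, not the map of where the data lives.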
Get DataStax Enterprise and experience a better Hadoop.
Scenarios it transforms
Here’s just a glimpse of some of the ways enterprises can gain greater insight from their big data in less time using the integrated Hadoop analytics capabilities in DataStax Enterprise:
- Large-scale batch analytics – Enterprises can make sense of enormous data volumes to quickly spot trends, identify bottlenecks and anticipate issues in practically any complex, large-scale system, including computer networks, communications infrastructure, transportation systems, supply chains, manufacturing processes, refining operations and scores of others. For example, SourceNinja relies on the integrated Hadoop capabilities in DataStax Enterprise to digest hundreds of millions of pieces of information related to its work with open source software.
- Sales and marketing campaign analysis – Sales and marketing organizations can easily analyze torrents of data from a multitude of sources to gauge the effectiveness of their initiatives. For instance, internet advertising services firm ReachLocal uses the DataStax distribution of Hadoop to help produce reports on the effectiveness of its customers’ paid search, display advertising and social media marketing campaigns – data that consists of more than 50 million inputs a day.
- Buyer behavior analytics – With Hadoop and DataStax Enterprise, companies are able to glean crucial insights into what people are most likely to purchase by readily analyzing all the data their customers generate through page views, clickstreams, searches, online comments, chats, tweets and other interactions.
- Customer recommendations – Using our implementation of Hadoop, enterprises can “connect the dots” for thousands, even millions, of individual customers by recommending products, services, books, movies, music and other goods they will likely enjoy. The suggestions are based on the preferences customers reveal through their engagements with companies over time, as well as on their online activities. Hadoop can likewise point customers to people or groups who share their interests.
- Fraud detection – Integrated Hadoop in DataStax Enterprise can help companies rapidly identify irregularities within massive amounts of unrelated customer data that may suggest suspicious activity, such as credit card or ID theft. Better still, analytics results are fed back to the integrated Apache Cassandra component in DataStax Enterprise, where they are available for querying in real time. This greatly accelerates response times and minimizes potential exposure to threats.
- Compliance/regulatory analysis – Similarly, companies can detect anomalies in machine and device sensor output to flag potential environmental compliance issues. Financial institutions, healthcare organizations and government agencies can use these same Hadoop analytics capabilities to monitor their transactional systems for potential non-compliance problems and security issues. In all cases, integration with Cassandra extends continuous availability to monitoring efforts with no downtime – a capability not available in other Hadoop distributions.
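Fraud detection and compliance monitoring both come down to spotting records that deviate from an established baseline. As a hedged illustration only, the sketch below applies a simple per-customer z-score rule. This is a generic statistical technique chosen for clarity, not a feature DataStax Enterprise ships. The function name, the input shape and the threshold of 3 standard deviations are all assumptions. In a Hadoop batch job, the per-customer grouping could come from the shuffle phase, with logic like this running in the reducer.

```python
from statistics import mean, pstdev


def flag_anomalies(amounts_by_customer, threshold=3.0):
    """Flag transactions far from each customer's own typical spending.

    amounts_by_customer maps a (hypothetical) customer ID to a list of amounts.
    Returns (customer_id, index, amount) tuples whose z-score exceeds threshold.
    """
    flagged = []
    for customer, amounts in amounts_by_customer.items():
        if len(amounts) < 2:
            continue  # not enough history to judge what is normal
        mu, sigma = mean(amounts), pstdev(amounts)
        if sigma == 0:
            continue  # every amount is identical, so nothing stands out
        for i, amount in enumerate(amounts):
            if abs(amount - mu) / sigma > threshold:
                flagged.append((customer, i, amount))
    return flagged


if __name__ == "__main__":
    history = {
        "cust-1": [20.0] * 19 + [1000.0],
        "cust-2": [50.0, 55.0, 45.0, 52.0],
    }
    print(flag_anomalies(history))
```

Here only cust-1’s 1000.0 purchase is flagged. With n transactions, the largest possible population z-score is (n − 1)/√n, so a customer needs at least 11 transactions before any amount can cross a threshold of 3. Production fraud models use far richer features and more robust statistics.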
