DataStax Blog

Solving a Major Gripe of Hadoop/Analytics Users

By Robin Schumacher - November 10, 2011

According to David Menninger, VP and Research Director at Ventana Research: “67% of Hadoop users see the lack of real-time capabilities as the #1 technology obstacle in analyzing big data on Hadoop.” From a broad perspective, this is the age-old problem that database professionals and users have faced for decades: “I really need data in this system over here in that system…” and vice versa. It is only now rearing its head in Hadoop implementations.

I have good news for those complaining about this issue: we’ve got it covered. With DataStax Enterprise, you get real-time and analytic capabilities coupled together in the same database, with smart workload isolation that ensures neither competes with the other for compute or data resources. Cassandra takes care of the real-time aspect, and our 100% compatible Hadoop feature set has the analytics covered.

To see what I mean, download and install a copy of DataStax Enterprise. It should take only 5 to 10 minutes, and you can use it free of charge in a development environment for as long as you like.

Bundled with DataStax Enterprise is a sample database and application (a stock portfolio use case) that showcases these capabilities in action. You can follow the setup instructions for the sample DB and app in our online docs to get things going.

The last part of setting up the sample database involves running a Hive job on Hadoop that kicks off a series of MapReduce tasks to calculate the worst 10-day loss for each portfolio tracked in the database.
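To make the calculation concrete, here is a minimal Python sketch of what a "worst 10-day loss" could look like outside of Hive. It is not the demo's actual query: it assumes each portfolio is represented by a list of daily total values in date order, and it defines the loss as the most negative change in value across any 10-day span. The function names and the input layout are illustrative assumptions, not part of DataStax Enterprise.

```python
from typing import Dict, List, Optional


def worst_n_day_loss(values: List[float], window: int = 10) -> Optional[float]:
    """Return the most negative change in value over any `window`-day span.

    `values` is assumed to hold one portfolio value per day, oldest first.
    Returns None when there are not enough data points to span the window.
    """
    # Change in value between day i and day i + window, for every valid i.
    changes = [values[i + window] - values[i] for i in range(len(values) - window)]
    return min(changes) if changes else None


def worst_loss_by_portfolio(
    history: Dict[str, List[float]], window: int = 10
) -> Dict[str, Optional[float]]:
    """Apply worst_n_day_loss to every portfolio in a name -> daily values map."""
    return {name: worst_n_day_loss(vals, window) for name, vals in history.items()}
```

In the demo this work is spread across MapReduce tasks by Hive; the sketch only shows the shape of the per-portfolio computation on a single machine.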

You can monitor the MapReduce tasks using the Hadoop JobTracker web GUI that is bundled with DataStax Enterprise. Once the MapReduce tasks complete, the results of the Hadoop analytic job are available for display in the Portfolio demo GUI:

Portfolio Demo

But here’s the great part: the output from the Hive/MapReduce job is now also available as a Cassandra column family and can be queried in real time:

CQL output

This means you have access to your analytic Hadoop output in your real-time database. This holds whether you have a single-node installation of DataStax Enterprise or a multi-node installation in which you configure some nodes for real-time work and others for analytics. Cassandra takes care of replicating the data among all nodes, so you have access to it from anywhere.

In other words, a major gripe of Hadoop users is solved in DataStax Enterprise.

Download a copy of DataStax Enterprise today and let us know what you think.


