DataStax Blog

The Five Minute Interview – NASA

By Robin Schumacher - August 27, 2012

This article is one in a series of quick-hit interviews with companies using Apache Cassandra and/or DataStax Enterprise for key parts of their business. For this interview, we talked to Chris Keller, who is a solutions architect for CSC and has NASA Ames as one of his customers.

DataStax: Chris, many thanks for chatting with us today. What can you tell us about your work at NASA?

Chris: One of my customers is NASA’s Advanced Supercomputing Division in Mountain View, CA, and I work mostly with the IT security group there.

DataStax: How are you using Cassandra at the supercomputing center?

Chris: We use Cassandra in a virtualized environment to manage the vast amounts of security data that we collect and maintain. We needed a way to take all our data feeds and correlate everything in an intelligent way that could be understood and analyzed to present a total picture.

DataStax: Did you start with Cassandra or with something else?

Chris: We had been using a commercial product with a relational database on the back end. I just wasn’t happy with its performance in retrieving large data volumes, so we decided to build our own solution. That’s when we decided to use Apache Cassandra.

DataStax: What types of data are you working with and what kind of analysis do you perform on it?

Chris: Quite a bit of the data is time series in nature: data that comes in with particular start/end times or a duration. We consume all of that and begin to slice and dice it from a point-in-time perspective. Our questions range from broad ones such as “Show me all the data associated with a particular time stamp” to much more involved analytics on unstructured data.

One key analysis we perform is to survey all of the potential security threats that are made public around the world and cross-reference them against our systems. This is more proactive than just being notified once a security risk is identified along with its patch or fix. Instead, we want to know the instant a particular issue is known so we can immediately understand our exposure.

DataStax: Why Cassandra for these use cases?

Chris: Cassandra’s NoSQL data model allows us to insert and query data much more naturally than what we had previously. There’s just so much more flexibility without having to model every query beforehand. For example, if I needed to ask the question: ‘At this exact second in time, show me all the information regarding a specific IP address’, it just wasn’t easy or fast to get an answer. With Cassandra, now it is. The analysts who routinely use this data were impressed with the flexibility and speed at which the queries came back.
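To illustrate the kind of lookup Chris describes, here is a minimal Python sketch. It is not NASA's schema or code, and it does not talk to Cassandra at all: it is an in-memory stand-in that assumes events are grouped by IP address (playing the role of a partition key) and kept sorted by timestamp within each group, so that "everything for this IP at this exact second" becomes a single lookup plus a binary search. The class name `EventStore`, its methods, and the sample data are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class EventStore:
    """Toy in-memory model: events grouped by IP, ordered by timestamp."""

    def __init__(self):
        self._times = defaultdict(list)   # ip -> sorted list of timestamps (seconds)
        self._events = defaultdict(list)  # ip -> events, aligned with _times

    def insert(self, ip, ts, event):
        # Find the slot that keeps the per-IP timestamps sorted; events with
        # an equal timestamp are placed after the ones already stored.
        idx = bisect_right(self._times[ip], ts)
        self._times[ip].insert(idx, ts)
        self._events[ip].insert(idx, event)

    def at_second(self, ip, ts):
        """Return every event recorded for `ip` at exactly second `ts`."""
        times = self._times.get(ip, [])
        events = self._events.get(ip, [])
        lo = bisect_left(times, ts)
        hi = bisect_right(times, ts)
        return events[lo:hi]


store = EventStore()
store.insert("10.0.0.5", 1346025600, "ssh login")
store.insert("10.0.0.5", 1346025601, "port scan")
store.insert("10.0.0.5", 1346025600, "dns query")
store.insert("10.0.0.9", 1346025600, "http get")

print(store.at_second("10.0.0.5", 1346025600))  # ['ssh login', 'dns query']
```

The point of the sketch is the access pattern: because the data is organized around the question being asked, the query touches one group and a narrow slice of it rather than scanning everything.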

DataStax: What else?

Chris: The built-in time-to-live (TTL) feature is huge for us. Being able to insert data with specific TTL parameters and have Cassandra automatically remove that data for us when it’s time vs. having to manually write, edit, and maintain routines to scrub obsolete data saves us a lot of time.
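As a rough illustration of what inserting with a TTL looks like, the Python sketch below builds the text of a CQL `INSERT ... USING TTL` statement. It only produces the statement string; running it against a cluster would require a Cassandra driver, which is left out here. The helper `ttl_insert`, the table name `security_events`, its columns, and the 90-day retention period are all assumptions made for the example, not details from the interview.

```python
def ttl_insert(table, columns, ttl_seconds):
    """Build a CQL INSERT that expires the written data after `ttl_seconds`.

    `table` and `columns` are interpolated directly, so they must come from
    trusted code, never from user input. Values are left as `?` bind markers,
    the form used with prepared statements.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be a positive number of seconds")
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks}) USING TTL {ttl_seconds}"


# Hypothetical retention policy: keep raw events for 90 days.
NINETY_DAYS = 90 * 24 * 60 * 60  # 7,776,000 seconds

statement = ttl_insert("security_events", ["ip", "ts", "detail"], NINETY_DAYS)
print(statement)
```

With the expiry attached at write time, no separate cleanup job has to track which data is due for removal.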

Being able to tune performance based on our data consistency requirements adds in a lot of flexibility. Much of our data doesn’t change much once it’s written, so we can retrieve data very quickly without having to use quorums for most of it.  For data that can change, we can add in quorums to the code without having to change Cassandra at all.
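The trade-off described here comes down to simple arithmetic. A quorum is a majority of the replicas, and a read is guaranteed to overlap with the most recent successful write only when the number of replicas consulted on the read plus the number acknowledged on the write exceeds the replication factor. The Python sketch below works through that rule; the replication factor of 3 is an assumption for illustration (the interview does not give one), and the function names are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


RF = 3  # assumed replication factor, for illustration only
print(quorum(RF))                                       # 2
print(reads_see_latest_write("QUORUM", "QUORUM", RF))   # True
print(reads_see_latest_write("ONE", "ONE", RF))         # False
```

For data that is written once and never updated, there is no newer version to miss, so reading at `ONE` is the cheaper choice; the overlap guarantee matters for data that can change.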

Lastly, we have data coming in from many different feeds, all at the same time around the clock, so Cassandra’s excellent write performance is something we really appreciate. In fact, anytime we’ve seen any kind of slowdown, it’s because we’ve started down the wrong path; it’s never been due to a Cassandra issue.

Cassandra has really become an enabling technology for what I’m building.

DataStax: What kind of tips would you give folks using Cassandra?

Chris: One big recommendation for virtualization shops is to use your VM tools to pin disks to certain Cassandra VMs if you have multiple VMs on the same box. Since we don’t need the flexibility of moving the VMs between physical nodes, we saw our performance increase when we did that.

Also, make sure you get out on the Cassandra IRC channel and mailing list and ask questions. DataStax engineers have been incredibly helpful whenever I had an issue or saw some anomalous behavior in our cluster.

DataStax: Chris, thanks for the time!

Chris: Sure thing.

 


