Shiyi Gu

Threat Stack Provides Continuous Security Monitoring with DataStax Enterprise

By Shiyi Gu, April 15, 2015

This post is one in a series of quick-hit interviews with companies using Apache Cassandra and/or DataStax Enterprise (DSE) for key parts of their business. For this interview, we talked with Sam Bisbee, CTO at Threat Stack.

“It definitely allowed us to come to market faster and scale more quickly. It gives us the ability to rule out the database for things like load spikes because we know Cassandra handles our load very well.”

Sam Bisbee
CTO, Threat Stack

DataStax: Sam, thanks for talking with us today. Can you give everyone a quick overview of Threat Stack?

Sam: Threat Stack provides continuous security monitoring for Linux servers that are typically deployed on clouds such as AWS. We give customers insight into their systems’ and operators’ activities, both as an IDS and as an audit tool, helping them understand everything that has occurred in their environment over time.

DataStax: What does your infrastructure look like?

Sam: Our solution is about 90% Node.js and 10% Scala. On the back end, we spread everything across three availability zones on Amazon and have multiple data platforms. We use DataStax Enterprise, Elasticsearch, Postgres, and Redis, and are bringing Spark Streaming on board very soon.

DataStax: What were the use cases that caused you to select DataStax Enterprise for your application?

Sam: We do a lot of pre-computation and roll materialized views. We deal with a lot of data that is collected continuously from our agents running on every monitored server. For example, we had a day recently where we consumed 4.8 billion data points, which added up to about 3TB before replication. That’s obviously a lot of data per day, so we work hard to create very efficient lookup tables that hold only the data that’s needed. We ingest all the data into RabbitMQ and then use Cassandra and DataStax Enterprise today to hold those lookup tables. We hold hot data for about 30 days and then archive it out to cold storage.

We love having a database like Cassandra that’s distributed and that can easily spread itself across multiple availability zones on AWS as it makes our work a lot easier.
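To put the peak-day figures Sam quotes in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and ingest spread evenly across the day, neither of which is stated in the interview, and the function names are purely illustrative.

```python
# Back-of-the-envelope figures for the peak day quoted in the interview.
# Assumptions (not from the interview): decimal units (1 TB = 10**12 bytes)
# and ingest spread evenly over 24 hours.

DATA_POINTS_PER_DAY = 4_800_000_000   # "4.8 billion data points"
BYTES_PER_DAY = 3 * 10**12            # "about 3TB before replication"
SECONDS_PER_DAY = 24 * 60 * 60


def average_bytes_per_point(total_bytes: int, points: int) -> float:
    """Average raw size of one data point, before replication."""
    return total_bytes / points


def average_points_per_second(points: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Average ingest rate if the day's points arrived at a constant pace."""
    return points / seconds


if __name__ == "__main__":
    size = average_bytes_per_point(BYTES_PER_DAY, DATA_POINTS_PER_DAY)
    rate = average_points_per_second(DATA_POINTS_PER_DAY)
    print(f"{size:.0f} bytes per data point on average")
    print(f"{rate:,.0f} data points per second on average")
```

Under those assumptions each data point averages a few hundred bytes, and the sustained ingest rate is in the tens of thousands of points per second. Real traffic is likely burstier than a flat average suggests.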

DataStax: So the Cassandra tables are visible to your customers, correct?

Sam: Right. Any time a customer goes to one of our dashboards and views, for example, a process’s history, its execution tree, the associated console session, and much more – that all comes from our Cassandra lookup tables.

DataStax: Did you evaluate other databases for this part of your application?

Sam: We started out trying to use Elasticsearch, but it’s really not built for this type of thing. So we turned to Cassandra and DataStax Enterprise, and now have a good deal of experience running it at scale in our application.

DataStax: What does your deployment of DataStax Enterprise look like?

Sam: We run 33 nodes on AWS for production, and each availability zone is a rack. We maintain tens of terabytes in the cluster on a rolling basis, where, again, we keep hot data and roll colder data out.
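The layout and rolling retention Sam describes can be illustrated with a small sketch. It assumes the 33 nodes are split evenly across the three availability zones and that data is grouped into daily buckets with an exact 30-day hot window; the interview confirms none of these details, and the helper names are hypothetical.

```python
from datetime import date, timedelta

TOTAL_NODES = 33          # from the interview
AVAILABILITY_ZONES = 3    # from the interview; each zone is treated as one rack
HOT_RETENTION_DAYS = 30   # "about 30 days", treated as exact here (assumption)


def nodes_per_rack(total_nodes: int, racks: int) -> int:
    """Nodes in each rack, assuming an even split across racks (assumption)."""
    if total_nodes % racks != 0:
        raise ValueError("nodes do not divide evenly across racks")
    return total_nodes // racks


def is_hot(bucket_day: date, today: date,
           retention_days: int = HOT_RETENTION_DAYS) -> bool:
    """True if a hypothetical daily bucket is still inside the hot window."""
    age = today - bucket_day
    return timedelta(0) <= age < timedelta(days=retention_days)


if __name__ == "__main__":
    print(nodes_per_rack(TOTAL_NODES, AVAILABILITY_ZONES), "nodes per rack")
    today = date(2015, 4, 15)
    for day in (date(2015, 4, 1), date(2015, 3, 1)):
        state = "hot" if is_hot(day, today) else "eligible for cold storage"
        print(day.isoformat(), "is", state)
```

With an even split, each availability zone would hold a third of the cluster, and any daily bucket older than the retention window becomes a candidate for archiving to cold storage.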

DataStax: How would you summarize the benefits you’ve realized from DataStax Enterprise?

Sam: It definitely allowed us to come to market faster and scale more quickly. It gives us the ability to rule out the database for things like load spikes because we know Cassandra handles our load very well.

DataStax: Thanks again for your time, Sam.

Sam: Sure.


