Five Minute Interview – Riptide IO
This article is one in a series of quick-hit interviews with companies using Apache Cassandra and DataStax Enterprise for key parts of their business. For this interview, we spoke with David Leimbrock, Co-Founder and Chief Technical Officer at Riptide IO.
DataStax: Can you tell us a little bit about what Riptide IO does for its customers?
David: Riptide IO is in the business of managing intelligent machines, which we define as “machines that are networkable and equipped with various sensors & actuators that are used to monitor and control the real world.” We play in the Internet of Things space.
Devices that would fall into that category include electrical meters, intelligent air conditioning machines, lighting control panels, and devices that are used to manage intelligent buildings and improve the performance of these buildings from an energy and operations perspective. Our system helps identify operational faults that would impact the environment. For example, a refrigerator failing in a pharmacy has a very real-world impact on that environment: failure to maintain proper cooling temperatures could place high-value medical inventory at risk.
The software and systems that we use are able to diagnose faults such as these and provide the detection needed to improve a building’s operational efficiency. Additionally, the system will measure energy impact and find ways to reduce energy utilization.
DataStax: Do you sell a software solution, consulting, or a combination of the two?
David: It’s a combination. We have a hardware platform called BrightEdge that’s typically deployed on premise and is used to interface with various on-premise networks. We have several large retail customers right now. A typical retail store may have half a dozen different control networks for their equipment that are all using different communication protocols. In some cases, they’re all serial buses. In other cases, they’re IT based.
The first thing we do is install BrightEdge, our Linux-based appliance that resides on premise. It’s used to communicate and normalize the behavior of all those different protocols and provide a common interaction. It allows us to retrieve all of the sensor data from these different machines. Then we push that information to our cloud application software that is called BrightWorks. BrightWorks is installed in a customer defined private cloud, or on our Google Cloud deployment. BrightWorks allows you to organize, analyze and control the devices.
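As a rough illustration of what "normalizing" different protocols into a common interaction can look like, here is a minimal Python sketch. It is not BrightEdge's implementation: the `Reading` type, the two payload layouts, and the function names are all hypothetical and were invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """Hypothetical protocol-independent sensor reading."""
    device_id: str
    metric: str
    value: float
    unit: str
    timestamp: datetime


def from_serial_frame(frame: str) -> Reading:
    # Assumed frame layout: "<device>;<metric>;<value>;<unit>;<epoch seconds>"
    device, metric, value, unit, epoch = frame.split(";")
    return Reading(device, metric, float(value), unit,
                   datetime.fromtimestamp(int(epoch), tz=timezone.utc))


def from_json_payload(payload: dict) -> Reading:
    # Assumed keys: id, point, val, units, ts (ISO 8601 string with offset)
    return Reading(payload["id"], payload["point"], float(payload["val"]),
                   payload["units"], datetime.fromisoformat(payload["ts"]))


# One normalizer per source protocol; everything downstream sees only Reading.
NORMALIZERS = {"serial": from_serial_frame, "json": from_json_payload}


def normalize(protocol: str, raw) -> Reading:
    """Convert a raw payload from a named protocol into a Reading."""
    return NORMALIZERS[protocol](raw)
```

With this shape, supporting another control network means adding one more function to `NORMALIZERS`; the code that ships readings to the cloud does not change.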
DataStax: How are you using DataStax software?
David: Cassandra’s at the core of the BrightWorks architecture. It’s our horizontally scalable big data distributed store. I would say probably 95% of the data that we are storing within Cassandra is all vector historical data. It’s immutable data, i.e. write-once sensor readings over time.
Typically, we’re doing 15-minute samples of current readings from these sensors. In some cases, like for main meter kWh energy demand for a particular space, we sample that data at a frequency of 1 minute. BrightWorks provides facilities for collecting the data from the on-premise devices and then stores it within Cassandra.
Our customers have clusters ranging from 3 to 10 machines. In a lot of cases, they’re not quite sure yet how long they want to preserve this data, so one of the things that they find really attractive about Cassandra is the idea that if they decide there’s value in keeping the information longer, it’s really easy for us to grow the cluster over time. We also use OpsCenter extensively for management.
They really like that flexibility from the perspective that they don’t have to know exactly what their use cases are going to be for the data. They’re comfortable that in a very economical fashion, they’re able to keep everything and then decide down the road what is producing value and what they have the ability to prune, or what they should keep.
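To make the sampling rates and the retention question concrete, the following Python sketch counts how many readings accumulate at the 15-minute and 1-minute intervals mentioned above. It also shows one way of grouping write-once readings by sensor and day. The sensor count, the retention period, and the day-sized bucket are assumptions chosen for illustration; the interview does not describe the actual BrightWorks schema.

```python
from datetime import datetime, timezone

MINUTES_PER_DAY = 24 * 60


def samples_per_day(interval_minutes: int) -> int:
    """Number of readings one sensor produces per day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


def samples_for_retention(interval_minutes: int, sensors: int, days: int) -> int:
    """Total readings kept for a number of sensors over a retention period."""
    return samples_per_day(interval_minutes) * sensors * days


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Group one sensor's readings by UTC day (an assumed bucketing choice).

    Expects a timezone-aware timestamp.
    """
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


print(samples_per_day(15))                   # 96 readings per sensor per day
print(samples_per_day(1))                    # 1440 readings per sensor per day
print(samples_for_retention(15, 1000, 365))  # 35040000 readings per year
```

Because each reading is written once and never updated, keeping another year of data only adds more rows of the same kind, which is why adding machines to the cluster is a natural way to extend retention.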
DataStax: So Cassandra serves as the analytics data store. What type of end result is communicated back to your customers or users?
David: Some customers like Walgreens and Ulta have an operations dashboard powered by BrightWorks which presents a number of Key Performance Indicators. A simple KPI may be energy consumption normalized by the square feet, or normalized by weather. For example, we’ll present a scatter plot where the facilities analyst can see high and low performance outliers within the building portfolio. This helps them to identify areas within the portfolio that they need to focus on.
The analyst is then able to drill down into a specific building user interface and pull a time series of sensor data so that he can see utility demand over a defined date range. He can also see what equipment was running when spikes take place as far as their utility utilization is concerned. Then he has the ability to access the raw equipment data and see exactly how the machines are operating and how they are impacting energy use. These customers are big believers in sustainability and use these tools to run more energy efficient stores.
We’ve focused on making it easy to gather and organize these real-time data streams so that analytics and distributed control are possible in an affordable manner. This problem alone is a necessary prerequisite for analytics, and a big missing piece from most solutions. You’ll see us coming out with more specialized cloud-based applications leveraging the data very soon.
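The normalized-energy KPI and the outlier view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the store names and numbers are made up, and flagging buildings by distance from the portfolio mean in standard deviations is a stand-in technique, not necessarily how BrightWorks identifies outliers.

```python
from statistics import mean, pstdev


def energy_per_sqft(kwh: float, sqft: float) -> float:
    """Energy consumption normalized by floor area (kWh per square foot)."""
    if sqft <= 0:
        raise ValueError("floor area must be positive")
    return kwh / sqft


def flag_outliers(intensity: dict, z_threshold: float = 1.5) -> dict:
    """Label buildings whose intensity is far from the portfolio mean.

    Returns {building: "high" | "low"} for buildings at least z_threshold
    population standard deviations away from the mean.
    """
    values = list(intensity.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}  # every building performs identically
    return {
        name: ("high" if value > mu else "low")
        for name, value in intensity.items()
        if abs(value - mu) / sigma >= z_threshold
    }


# Hypothetical portfolio: (monthly kWh, square feet) per store.
portfolio = {
    "store-a": (10000, 10000),
    "store-b": (11000, 10000),
    "store-c": (9000, 10000),
    "store-d": (12000, 12000),
    "store-e": (30000, 10000),
}
intensity = {name: energy_per_sqft(kwh, sqft)
             for name, (kwh, sqft) in portfolio.items()}
print(flag_outliers(intensity))  # {'store-e': 'high'}
```

Normalizing by weather would follow the same pattern with a different denominator, for example degree days instead of floor area.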
DataStax: Did you start with a NoSQL solution in Cassandra or had you been using something previously like a relational database? What drove you to NoSQL?
David: At first we were really focused on just the on-premise device and integration, just dealing with those heterogeneous device networks. We were relying on partners to aggregate the data or other players within the industry to aggregate the data.
What we found dealing with a number of large customers was that everyone was falling down flat there and their solutions primarily were application stacks that were designed to scale vertically. They were using either Oracle or MySQL as the back end. Just from observing what other competitors were doing, we knew that the technologies that they were applying weren’t working very well.
That being said, during my time at Cisco, I believe WebEx and some other people within Cisco were actually Cassandra users. Within Cisco, it’s pretty well known and there’s quite a bit of buzz around Cassandra, so it was on my radar. I knew that one of the areas where it really shined was in dealing with immutable time-series data, where you write the data once and want to read it back quickly. You want performant reads and writes. We did spend a little bit of time doing some benchmarking using MySQL, but at the same time, we were looking heavily at Cassandra.
I have to say in all honesty, I was a bit biased based on the research I had done, what I had heard speaking to other engineers, and what I had learned attending events such as the Cassandra Summit in 2012 and 2013. I felt very confident in the promise of Cassandra. Basically, we tried to break it and we tried to make sure that all the claims that had been made were truthful.
We spent quite a bit of time doing some early benchmarking and we found very quickly that as far as reads and writes were concerned, it provided much better performance than what we were getting from MySQL. We were also really impressed with the fact that no matter how hard we tried, we couldn’t break Cassandra. We would remove machines from the cluster and blow things away, but it was just indestructible.
We really liked the idea. We felt confident that from a maintenance perspective, it was going to be very resilient for us. We liked the idea of being able to easily grow over time, and performance was fantastic.
DataStax: Are there any particular tech stats that stand out to you?
David: I definitely do have some stats from our early benchmarking where one of the things that we were particularly paying attention to was, as the size of the data set grew, what were the performance characteristics? Definitely in the Cassandra case, we saw that as we grew the cluster, or even as the amount of data within a single machine or a handful of machines grew over time, the reads and writes stayed very consistent. In our benchmarking, when we were looking at MySQL, you could definitely see the gradual degradation.
DataStax: Why did you choose DataStax Enterprise instead of open-source Cassandra?
David: One of the things I was going to mention is we are actually a big fan of using open source, just from a philosophical perspective. We want to be supportive of companies that are investing their time and resources in making the product better because we know it’s in our best interest. Just from a philosophical perspective, that’s the way we operate, but we also tracked the mailing list extensively before going down this route.
As I mentioned, we attended some of the conferences, and we were very impressed by DataStax and the level of competency that exists there. We felt very confident that if there were ever issues that we needed to escalate, they would be handled.
That being said, we actually sent several developers to your admin and developer training. We’ve actually invested in some of the consulting services that you offer. It was really just for a level of comfort so that if there was ever an issue with these production systems, we knew that there was a partner we could call on to get it resolved quickly.
DataStax: You said that you use OpsCenter extensively. Is that your primary go-to solution for managing these clusters or do you use a combination of that and something else?
David: We’re using OpsCenter extensively for management at this point.
DataStax: If you were to boil it all down and summarize the benefits you believe your company has realized from DataStax Enterprise and Cassandra, how would you articulate it?
David: In this particular domain, working with these large data sets has been a huge industry problem for a long time. This has been a major differentiator for us from the perspective of being able to offer solutions where all of the data coming from these various systems can be handled by one architecture.
We’re most definitely a small startup company focused on shipping software, so after we did the initial evaluation, we spent some time in training, spent some time coding and getting things working. Since that point, we’ve actually spent very little time from a research perspective to revisit those decisions or do any sort of performance optimization. It’s just been a solid solution and we’ve been able to focus on the real value of our application software. Providing analytics that are useful to customers is helping them to reduce their energy usage and improve their building operations.
In the data store and using Cassandra there, that’s been a problem solved for well over a year now. It’s been solid and we have had to spend very little time there. It’s been great from the perspective of establishing a technology differentiator for us. It’s been a really low-impact one. We’ve been absolutely ecstatic about that choice.
Actually, when I hear what other companies are doing with time-series sensor data, I’m only shocked when they’re not using Cassandra, and I imagine that it’s only a matter of time before they go that route. It’s been such a no-brainer for us at this point.
For more information on Riptide IO, see: http://www.riptideio.com/.