SkyWatch Scales Space Data Management with DataStax
October 13, 2015
This post is one in a series of quick-hit interviews with companies using Apache Cassandra™ and/or DataStax Enterprise (DSE) for key parts of their business. For this interview, we talked with Roland Sing, Co-founder at SkyWatch.
DataStax: Tell us about SkyWatch and your role there.
Roland: SkyWatch is a Kitchener-Waterloo startup on a mission to become the leader in data and intelligence from space. There are hundreds of satellites orbiting the Earth capturing varying and diverse datasets and imagery that would be advantageous to dozens of industries all over the world. The SkyWatch API enables our users to find the satellite data they need more easily than ever before. We currently also provide data from space observatories and telescopes for some of the world’s top astronomers and astrophysicists through our application called Supernova.
My role at SkyWatch is primarily development of our data collection and back end technology, managing the infrastructure to support our architecture, and product management.
DataStax: What kind of data will people share in the exchange and how do others access it? What sets you apart from other solutions?
Roland: We consume and aggregate data from satellites both for Earth observation and for astronomy and astrophysics directly from the satellite operators. We then create a more manageable way through which users can access and use this data much more efficiently and effectively than they currently do. The data varies a lot, from measurement and observation data to imagery.
Current solutions are either siloed, as is the case with Earth observation satellites, or are “home-brewed” software written by astronomers when dealing with space observatories and telescopes. As a result, the end user experience isn’t great and requires a lot of effort when trying to incorporate multiple datasets.
DataStax: Did you use a different technology before DataStax Enterprise?
Roland: We tried MongoDB for a bit before Apache Cassandra™, but found it to be limiting in its ability to scale. Once you shard MongoDB, everything gets infinitely more complicated, and because there’s a master/slave setup, if one node goes down the whole thing goes down.
DataStax: Why did you pick DataStax Enterprise? What kind of data is stored there?
Roland: We chose it primarily for the ability to scale in Apache Cassandra™ and for its fault tolerance. As far as I’m aware, it’s the only database that is linearly scalable, and although it isn’t too much of an issue for us now, it is possible that within a year or two we will need to be able to store and serve data potentially on the petabyte scale. For the same reason, the ability to use Apache Spark™ with this database is very valuable to us.
DataStax: What features from the DataStax Enterprise stack are you using at the moment? What business use case do they fulfill?
Roland: For Supernova we are just using the Apache Cassandra™ and Apache Solr™ integration provided by DataStax Enterprise. DataStax Enterprise is our primary database, and we use Apache Solr™ to power our search features.
DataStax: Tell us about the future of your project. Will you be leveraging other parts of DSE to make it a reality?
Roland: Right now the project is focused on providing a real-time service, but as we collect more data we want to create tools for astronomers to work with larger, static datasets. We have every intention of using DSE Analytics to make this possible.
DataStax: What advice would you give to other startups that are thinking about using Apache Cassandra™ for the first time in their solutions?
Roland: Spend as much time as possible learning the data model. It’s unlike anything you’ve used before and it will throw you for a bunch of loops. Also, get over having just a single table for related data, or normalizing it, since there are no joins. You’re going to have to write to and read from multiple different tables and that’s okay.
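To make the point about writing to multiple tables concrete, here is a minimal sketch in plain Python. It is not SkyWatch's schema and it does not talk to a real cluster: each dictionary stands in for one query-specific table keyed by its partition key, and the `Observation` fields, table names, and function names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    # Hypothetical record shape, assumed for this sketch.
    satellite: str
    target: str
    captured_at: str  # ISO 8601 string, so string order matches time order
    url: str


# One "table" per query pattern, each keyed by its own partition key.
# These dicts are stand-ins for tables, not a real database.
observations_by_satellite = defaultdict(list)
observations_by_target = defaultdict(list)


def record_observation(obs):
    """Write the same row to every table that serves a query (no joins later)."""
    observations_by_satellite[obs.satellite].append(obs)
    observations_by_target[obs.target].append(obs)


def _latest(table, key, limit):
    # .get avoids creating an empty partition as a side effect of a read.
    rows = table.get(key, [])
    return sorted(rows, key=lambda o: o.captured_at, reverse=True)[:limit]


def latest_for_satellite(satellite, limit=10):
    """Read from the table partitioned by satellite."""
    return _latest(observations_by_satellite, satellite, limit)


def latest_for_target(target, limit=10):
    """Read from the table partitioned by target."""
    return _latest(observations_by_target, target, limit)
```

Each read touches exactly one table with one key; the cost is that every write is duplicated, which is the trade-off Roland describes.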