Cassandra databases at Spotify hold all sorts of interesting data sets. Naturally, we would like our data scientists to be able to tap into them.
Recent developments in cloud vendors' offerings have allowed us to engineer systems that address this use case in an unprecedented way.
In this talk we'll present how we turned the process of exporting data from Cassandra clusters into a trivially parallelizable problem. Using just a few basic cloud products, we've managed to dump our largest clusters, containing terabytes of data, on the order of minutes.