Kaiko – Real-Time Bitcoin Analytics with DataStax Enterprise
January 20, 2016
This post is one in a series of quick-hit interviews with companies using Apache Cassandra™ and/or DataStax Enterprise (DSE) for key parts of their business. For this interview, we talked with Vincent de Lagabbe at kaiko.com.
DataStax: Hello Vincent, thanks a lot for your time today. Could you please tell us a bit about kaiko.com, what exactly you offer and your role there?
Kaiko: Thanks. At Kaiko we do data and analytics for Bitcoin. On our site you can find a very comprehensive set of bitcoin data, all of which is also available through an API: price data, a blockchain explorer, exchange data, mempool data, asset protocols, and more. I do most of the devops tasks there and some backend development as well. By the way, we are looking for bitcoin-enthusiast engineers and data analysts to expand our team; we are headquartered in London and the R&D is in Paris, so don’t hesitate to get in touch!
DataStax: What makes your data source for digital currency and blockchain technology successful? What differentiates you from other applications?
Kaiko: Most players in this sector currently focus on one aspect of the bitcoin world (exchanges or the blockchain, for instance). We want to take a broader approach and find links between those data sets to come up with interesting analytics tools and indices.
DataStax: Did you use a different technology before you started using Cassandra?
Kaiko: We started from scratch, so we tried many things during a prototyping phase before settling on Cassandra: standard SQL, LevelDB, HyperDex, pure Redis…
DataStax: Why did you decide to use Cassandra? What kind of data is stored there?
Kaiko: We decided to use Cassandra because it’s pretty stable, and the available documentation and support are good. We needed a solution where we could start small and scale up in terms of data storage and access; one that was robust (replicated) and didn’t need lots of maintenance (we are a small team). Of course there are lots of tradeoffs to make, especially during the data modelling phases, and you have to understand very clearly what you can’t do with the technology.
We store all our data there: the whole deserialized bitcoin blockchain, values computed from blockchain data (address balances or miner details, for instance) and everything we pull from various bitcoin exchanges: trades (live and historical) and order book snapshots (those tend to use a lot of space). Everything is constantly growing at a slow but regular pace.
DataStax: How would you sum up the benefits you’ve achieved with DataStax Enterprise (DSE)?
Kaiko: We are taking advantage of some features offered by OpsCenter (mainly the integrated S3 backup and scheduled repair), but we are also very interested in using Spark and Solr for some features we have planned.
DataStax: What caused you to use DSE over open source Cassandra?
Kaiko: We used open source Cassandra for about two months. To put it pretty bluntly, we had tons of stability issues before the switch (we tried several stable/unstable versions) and discovered we didn’t have them with DSE. That is the main reason we kept using Cassandra. Now, I’m speaking about something that happened a year ago and was probably pretty specific to our use case.
DataStax: What features from the DataStax Enterprise (DSE) stack are you using at the moment? What business use case do they fulfil?
Kaiko: We use Cassandra (obviously), OpsCenter, and In-Memory as well. In fact, we use In-Memory for some of our short-lived data (with a TTL of one week) that needs very fast, regular access.
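To make the TTL idea concrete, here is a minimal pure-Python sketch of the semantics Cassandra provides with `USING TTL`: each value carries an expiry deadline, and reads past that deadline see nothing. The store, key names, and values below are hypothetical illustrations, not Kaiko's actual schema.

```python
import time

# One week in seconds -- the TTL mentioned above.
ONE_WEEK = 7 * 24 * 3600

store = {}

def put(key, value, ttl=ONE_WEEK, now=None):
    """Store a value with an expiry deadline, like a Cassandra TTL write."""
    now = time.time() if now is None else now
    store[key] = (value, now + ttl)

def get(key, now=None):
    """Return the value only if its TTL has not elapsed; otherwise None."""
    now = time.time() if now is None else now
    value, expires = store.get(key, (None, 0))
    return value if now < expires else None

put("btc_price", 420.5, now=0)
print(get("btc_price", now=100))           # 420.5 (still fresh)
print(get("btc_price", now=ONE_WEEK + 1))  # None (expired)
```

In real CQL this is a single clause on the write (`INSERT … USING TTL 604800`); the sketch only shows the read-side behavior that makes TTLs useful for short-lived, frequently accessed data.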
DataStax: Tell us about the future of your project, do you intend to leverage other parts of DSE to make it a reality?
Kaiko: We currently use numerous custom scripts to aggregate and compute values, and would love to leverage Spark for that. We have several analyses that require complex queries against a big-ish dataset, so we would need it anyway. We also want to allow users to search for chunks of information inside the blockchain, and will most probably use Solr to enable this.
DataStax: What advice would you give to other startups that are thinking about using Cassandra for the first time in their solutions?
Kaiko: When you design your data model, think first about how you will access your data (‘pagination? what info will be presented?’), not about how you want to store it. This is written everywhere, but it really is the main point. Do not hesitate to duplicate values in different column families: storage is cheaper than you think and internal compression works well. If you have write-heavy applications, use SSDs; don’t even think about magnetic drives.
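The query-first, duplicate-freely advice above can be sketched in a few lines of plain Python. The two "tables" and the trade fields below are hypothetical, but the pattern is the Cassandra one: one denormalized table per read pattern, with every write fanned out to all of them.

```python
from collections import defaultdict

# Two query-specific "tables" (stand-ins for Cassandra column families),
# each keyed by the partition it will be read from.
trades_by_exchange = defaultdict(list)  # query: all trades for an exchange
trades_by_day = defaultdict(list)       # query: all trades on a given day

def record_trade(exchange, day, price, volume):
    """Write the same trade once per read pattern -- duplication is the norm."""
    trade = {"exchange": exchange, "day": day, "price": price, "volume": volume}
    trades_by_exchange[exchange].append(trade)
    trades_by_day[day].append(trade)

record_trade("bitstamp", "2016-01-20", 420.5, 1.2)
record_trade("bitfinex", "2016-01-20", 421.0, 0.8)

# Each query is now a single cheap lookup, no joins needed.
print(len(trades_by_exchange["bitstamp"]))  # 1
print(len(trades_by_day["2016-01-20"]))     # 2
```

In CQL the equivalent would be two tables with different partition keys and a client (or materialized view) performing both writes; the point is that you pay extra storage on write so that every read matches exactly one table.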
Finally, do a PoC using a more-than-one-node cluster, and run the nodes on different machines: things are very different when you have consistency in place. By the way, QUORUM is your friend. Don’t use ONE, which is the default in most drivers. You’ll save yourselves a few months of pain.
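The arithmetic behind "QUORUM is your friend" is worth spelling out. A quorum is a majority of replicas, so a QUORUM write and a QUORUM read always overlap on at least one replica that has the latest data, which ONE cannot guarantee. A minimal sketch of the numbers:

```python
def quorum(rf):
    """Cassandra's quorum size for a given replication factor: floor(RF/2) + 1."""
    return rf // 2 + 1

for rf in (1, 3, 5):
    q = quorum(rf)
    tolerated = rf - q  # replicas that can be down while QUORUM still succeeds
    # Two quorums always overlap: q + q > rf, so reads see the latest write.
    print(f"RF={rf}: quorum={q}, tolerates {tolerated} down replica(s)")
```

With RF=3 (a common choice), QUORUM means 2 of 3 replicas, so the cluster tolerates one dead node while reads still reflect the most recent QUORUM write.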