



Offering Data Easily and Efficiently with ScrapeHero and DataStax Enterprise

By Diego Ferreira, May 31, 2017

This post is one of a series of quick-hit interviews with companies using DataStax Enterprise (DSE) for key parts of their business. In this interview, we talked with Manu Prasad, co-founder at ScrapeHero.

DataStax: Hello Manu, thanks a lot for your time today. Could you please tell us a bit about ScrapeHero and your role at the company?

ScrapeHero: Thanks for allowing us to highlight the work that we do through this interview. ScrapeHero is a “Data as a Service” provider. Like other SaaS providers, we hide the gory underlying details of the service from our customers and let them get the data they want as easily as possible. The service in our case is providing the data, not storing it (which is a service provided by some DaaS companies).

We are a bootstrapped startup, and as a co-founder, I wear a lot of hats at ScrapeHero, dabbling in technology, management, hiring, sales, and ordering coffee supplies on any given day.

DataStax: What differentiates ScrapeHero from similar services?

ScrapeHero: We are a technology-first company that believes in keeping customers “very happy” and not just “satisfied”. We have built, and continue to build, an amazingly scalable technology stack using some cutting-edge technologies. Our goal is to have a self-healing, scalable, microservices-based architecture that has total redundancy built in without being tied to a single hosting platform such as AWS.

We like to build on commodity hardware across various hosting providers. Being a bootstrapped startup, costs are a big driver, and the relentless and obsessive need for efficiency pushes us to extract the last ounce of performance for the dollars we burn.

Barring one of our main competitors, the rest of the companies in our space are focused on servicing customers without really investing in the technology stack or the platform. As our customers get to know us, they realize the long-term, tangible benefits our technology provides them at no additional cost.

At the end of the day, we are all data service providers, but how we provide the data matters to a good portion of our customer base. We also believe in automation and look at automating as many processes in our company as possible. This makes us more efficient, more scalable, and less error-prone than our competition.

DataStax: Did you use a different technology before you started using DataStax Enterprise (DSE)?

ScrapeHero: Yes, we used MongoDB and Apache Cassandra™. However, at the scale at which we were gathering data, reaching speeds of thousands of pages per second, MongoDB gave up on us multiple times; that is when we looked at alternatives and started using Cassandra. The DataStax startup program was a lifesaver for us due to our bootstrapped nature. We pushed Cassandra to its limits too, but with the help of some incredible DataStax engineers, we were able to overcome those issues.

DataStax: Why did you decide to use DataStax Enterprise? What kind of data is stored in DataStax Enterprise?

ScrapeHero: We use DataStax Enterprise for storing a variety of data gathered from all over the Internet. We then parse the unstructured data using Spark and index it using Solr. We moved over from MongoDB to the open source version of Cassandra because MongoDB could not handle our large write loads, and scaling MongoDB was not something we were able to accomplish with all the resources available online.

We spent some time trying to set up open source Cassandra, integrate it with Spark and Solr, and scale it up. That’s when we came across the DataStax Enterprise edition and the amazing Startup Program that you provide.

The initial setup was done in less than a day and we went into staging with DSE in the same week with the help of your talented engineers and your willingness to help a small startup like us.

DataStax: How would you sum up the benefits you’ve achieved with DataStax Enterprise?

ScrapeHero: Getting up and running with DSE was quick and painless. We received a lot of help from the DataStax team through the well-written documentation, instructor-led training videos, and technical calls. The DataStax technical team gave us a lot of assistance in setting up the databases for a heavy write load.

DataStax saved us countless hours which we would have otherwise spent fiddling with configuration files and monitoring tools.

DataStax: What caused you to use DSE over open source Apache Cassandra™?

ScrapeHero: Open source Apache Cassandra, like most other complex Apache projects, is a pain to set up and integrate with other Apache projects unless you have a lot of experience doing this. We spent weeks trying to set up open source Cassandra and integrate it with Solr and Spark. DSE had it all packaged and ready to go.

The initial setup was easy, and OpsCenter took care of most of the maintenance and monitoring. We hardly had to spend any time battling configs across nodes.

DataStax: What features from the DataStax Enterprise stack are you using at the moment? What business/ customer experience outcomes have you achieved by using DataStax Enterprise?

ScrapeHero: We use Cassandra, Analytics (Spark), and Search (Solr). The integration between these three is smooth. We have never had a problem with the integration in the two years we have been using DSE. And then there is OpsCenter, the best monitoring and maintenance tool ever. Almost all of the maintenance tasks, such as adding or removing nodes and data centers, are done easily through OpsCenter.

The business impact has been to avoid a large disruption as we scale. With the huge increase in customers that we sign up, our data needs grow exponentially, and DSE has allowed us to scale the data storage and management components easily and in step with our business growth.

DataStax: Tell us about the future of your project(s), do you intend to leverage other parts of DSE to make it a reality?

ScrapeHero: We always look at bleeding edge technologies. It is a daily exercise for us. It helps us stay on top of technology developments and enhance our platform with the latest and greatest. We would be happy to be beta testers for all other technologies that DataStax has to offer.

DataStax: What advice would you give to other startups that are thinking about using Apache Cassandra™ and DSE for the first time in their solutions?

ScrapeHero: We would definitely suggest that startups dealing with a huge volume of data explore DSE and avail themselves of the amazing support DataStax offers to startups to grow with them. There is pride in going it alone, but with the time and cost pressures startups face, there is no shame in asking for help sometimes.


