Enabling Real-time Fraud Detection At Scale with Cassandra and Spark – Simility
January 22, 2016
This post is one in a series of quick-hit interviews with companies using Apache Cassandra™ and/or DataStax Enterprise (DSE) for key parts of their business. For this interview, we talked with Uttam Phalnikar, Co-Founder and Chief Architect at Simility, a fraud detection company.
DataStax: Tell us about Simility and your role there?
Simility: Simility is a fraud detection company that combines the best of human analysis and machine learning in a seamless user interface. Customers can start fighting fraud immediately and write new fraud rules with no technical expertise. Our SaaS platform is highly customizable and provides data visualization capabilities to help fraud managers quickly and accurately root out fraudsters while protecting customers and company reputation.
As a Co-Founder & Chief Architect, my role is to build a system that is highly available, scalable and reliable. Our system is very critical to our customers and we need to make sure that it is handling large amounts of data at great speed.
DataStax: Did you use a different technology before Cassandra?
Simility: Prior to using Cassandra, we evaluated various options like Hadoop-HBase, Neo4j, MongoDB & OrientDB. While HBase is distributed in nature, it lacks the kind of availability we need. The rest of the databases couldn’t scale to our requirements.
DataStax: Why did you pick Cassandra and DataStax Enterprise? What kind of data is stored there?
Simility: It’s critical for us to be scalable and respond with lightning speed. Cassandra + Spark is a great combination. With Cassandra, the system is highly available, and Spark provides the means for distributed computing.
Using DataStax Enterprise means I don’t have to spend time figuring out how to make Spark work with Cassandra. DSE has provided the jump start necessary to build the system in no time. The Solr integration is very handy for text-based analysis.
Our data storage requirements range from short texts like emails, IPs, and usernames to large content blocks like HTML to binaries like image files. We run analysis on data provided by the customer to predict a fraud score.
DataStax: You currently use the DataStax Analytics feature. What business use case does it fulfill?
Simility: In order to predict a fraud score, we need to run various ETL processes over customer data. DataStax Analytics makes it easy for us to define and execute these transformations. Co-locating Spark with Cassandra reduces the latency of data transfer, and with Spark Streaming we can build a system that handles a large number of requests reliably.
DataStax: What advice would you give to other startups that are thinking about using Cassandra for the first time in their solutions?
Simility: If you need high availability, scalability & reliability, using Cassandra is a no-brainer. I would recommend using a distribution like DSE over installing Cassandra + Spark individually and spending the time to make them work together. DataStax Academy is a great place to learn best practices and avoid anti-patterns while using Cassandra. Feel free to reach out to DSE Support. They have been very helpful in overcoming the challenges we faced during the initial phase. For those who are not aware, DSE has a startup program that allows startups like ours to use DSE for free. Make use of all these resources for a successful Cassandra adoption.