Wealthport Provides AI-driven Data Preparation Technology with DataStax Enterprise
March 22, 2017
This post is one in a series of quick-hit interviews with companies using DataStax Enterprise (DSE) for key parts of their business. In this interview, we talked with Tobias Widmer, CTO of Wealthport.
DataStax: Hello Tobias, thanks a lot for your time today. Could you please tell us about Wealthport and your role at the company?
Wealthport: Wealthport offers artificial intelligence (AI)-powered data preparation software as a service (SaaS). Our service automatically integrates, transforms and enriches structured and semi-structured data from various sources, even exceeding human precision. Retail, e-commerce and online marketplace companies rely on Wealthport to integrate product data from hundreds of suppliers, to clean, normalise and repair the product database, and to identify product categories and product variants. We also offer solutions for various other industries.
As CTO of Wealthport, I lead our international R&D team of experts in natural language processing, machine learning and data integration, and oversee product development and operations activities.
DataStax: What differentiates Wealthport from similar applications? What makes your AI solution successful?
Wealthport: Unlike standard data preparation tools and applications, our solution is entirely based on probabilistic methods, machine learning and artificial intelligence. Data today comes in a huge variety and with varying veracity, both in terms of structure and content, and tends to change over time. Traditional, rule-based solutions do not cater well to these characteristics of data, resulting in expensive maintenance work to adapt processes and rule engines.
Wealthport’s probabilistic approach embraces these facts and introduces algorithms modelled after human behaviour in data preparation. In a nutshell, our algorithms learn millions of probabilistic rules on-the-fly. They are very good at detecting which parts in data sets are similar, and which parts can be made similar by applying a set of transformations, without being fooled by spelling errors, ambiguous abbreviations, different terminology, etc.
Our customers benefit from an exceptionally high degree of automation and super-human accuracy of more than 95% on the most common data preparation activities.
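To make the idea of similarity detection that tolerates spelling errors concrete, here is a minimal Python sketch. It is not Wealthport's algorithm, which the interview does not describe in detail; it simply uses the standard library's `difflib.SequenceMatcher`, and the function names, the 0.8 threshold and the sample product names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalised strings are identical."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(query: str, candidates: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the most similar candidate, or None if none reaches the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    if not candidates:
        return None
    score, candidate = max((similarity(query, c), c) for c in candidates)
    return candidate if score >= threshold else None


if __name__ == "__main__":
    catalogue = ["Blue Cotton T-Shirt", "Red Wool Sweater"]
    # A misspelt supplier entry still resolves to the intended catalogue item.
    print(best_match("Bleu Cotton T-Shirt", catalogue))
    # An unrelated item falls below the threshold and yields None.
    print(best_match("Garden Hose", catalogue))
```

A learned, probabilistic system would go well beyond character-level similarity, for example by handling abbreviations and differing terminology, but the sketch shows the basic pattern of scoring candidates and accepting a match only above a confidence threshold.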
DataStax: Why did you decide to use DataStax Enterprise? What kind of data is stored there?
Wealthport: We evaluated various NoSQL database management systems, but finally decided to use DataStax Enterprise because we place a high value on read/write performance, fault-tolerance and linear scalability. For our types of workloads, DataStax Enterprise turned out to be a clear winner. The always-on nature of DataStax Enterprise proved to be critical for long-running, data-intensive jobs we run in Apache Spark™.
Our DataStax Enterprise cluster mainly holds data sets ingested by customers and intermediate processing results like statistics, histograms and other probabilistic information computed from customer data.
DataStax: How would you sum up the benefits you’ve achieved with DataStax Enterprise (DSE)?
Wealthport: With DataStax Enterprise, we found an ideal database platform which offers a scalable persistence solution based on Apache Cassandra™, a distributed computation engine based on Apache Spark™, and a powerful operations centre used to manage our production clusters. Seamless integration of all these components spares us all the integration and deployment hassles and allows us to focus on our core business: integrating, transforming and enriching data to make our customers happy.
DataStax: Why did you choose DataStax Enterprise over open source Apache Cassandra™?
Wealthport: At Wealthport, we favour open-source software across our technology stack. But when it comes to reliable support, completeness of documentation and production readiness, commercial software still has its merits. When adopting a new DSE release, we know that it has been thoroughly tested by early-adoption customers in production-like settings for some time. The DSE cluster is at the heart of our data preparation service, so the absence of critical bugs is absolutely essential for zero down-time and reliable operation of our service.
DataStax: What features from the DataStax Enterprise (DSE) stack are you using?
Wealthport: We currently use DSE Analytics (including Spark Job Server) and DSE OpsCenter. Together, these features of DSE constitute a big part of our backend services, which cover everything from ingesting and transforming to persisting customer data.
DataStax: Tell us about the future of your project. Do you intend to leverage other parts of DSE to make it a reality?
Wealthport: In the next couple of months, we want to make our service available to individual users, not just organisations, in a complete self-service fashion. Users will be able to upload their data, then clean, integrate, transform and enrich it, without having to sit in front of their computer all the time. Unlike traditional data preparation tools, Wealthport favours automation over interactivity, feedback over commands. We believe that most data preparation activities are recurring and repetitive, and should ultimately be automated by machines. Human feedback is needed only to improve the accuracy of Wealthport’s data preparation service even further, adapting to the specific characteristics of our users’ data and how it changes over time. We could very well imagine using additional features of DSE, such as DSE Graph or DSE Search, in the future.
DataStax: What advice would you give to other startups that are thinking about using Cassandra for the first time in their solutions?
Wealthport: One of the most important aspects of adopting DataStax Enterprise is understanding really well how the whole system has been designed and how it achieves high fault-tolerance and linear scalability. This is particularly true for people who have a background in more traditional (relational) database management systems. DataStax Enterprise is different. Start small, follow some tutorials, master data modelling in DataStax Enterprise. In our experience, understanding the differences from traditional data modelling was crucial in achieving a reliable database cluster.
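One commonly cited difference from relational modelling is that Cassandra-style schemas are designed around the queries to be served, with data duplicated into one table per query rather than normalised and joined at read time. The following Python sketch imitates that idea with in-memory dictionaries; it does not use any DataStax driver, and the `Product` fields, the `QueryFirstStore` class and the two example queries are hypothetical names introduced purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Product(NamedTuple):
    product_id: str
    supplier: str
    category: str
    name: str


class QueryFirstStore:
    """Keeps one denormalised 'table' per query, each keyed by its partition key."""

    def __init__(self) -> None:
        # Table 1 answers "all products of a supplier" (partition key: supplier).
        self.by_supplier: Dict[str, List[Product]] = defaultdict(list)
        # Table 2 answers "all products in a category" (partition key: category).
        self.by_category: Dict[str, List[Product]] = defaultdict(list)

    def insert(self, product: Product) -> None:
        # The same row is written to every table: duplication at write time
        # replaces joins at read time.
        self.by_supplier[product.supplier].append(product)
        self.by_category[product.category].append(product)

    def products_for_supplier(self, supplier: str) -> List[Product]:
        # A read touches exactly one partition of one table.
        return list(self.by_supplier.get(supplier, []))

    def products_in_category(self, category: str) -> List[Product]:
        return list(self.by_category.get(category, []))


if __name__ == "__main__":
    store = QueryFirstStore()
    store.insert(Product("p1", "acme", "shirts", "Blue Cotton T-Shirt"))
    store.insert(Product("p2", "acme", "sweaters", "Red Wool Sweater"))
    store.insert(Product("p3", "globex", "shirts", "White Linen Shirt"))
    print([p.name for p in store.products_for_supplier("acme")])
    print([p.name for p in store.products_in_category("shirts")])
```

In a real cluster the two dictionaries would correspond to two tables with different partition keys, and the trade-off is the same: more storage and more writes in exchange for reads that never need a join.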