The Five Minute Interview – IgnitionOne
This article is one in a series of quick-hit interviews with companies using Apache Cassandra and DataStax Enterprise for key parts of their business. For this interview, we spoke with Andras Szerdahelyi, a Solutions Architect at IgnitionOne.
DataStax: Thanks, Andras, for speaking with our audience today. Can you please provide an overview of the services IgnitionOne provides its customers?
Andras: IgnitionOne is a digital marketing solutions company providing world-class proprietary technology and expert services to improve digital marketing performance. IgnitionOne’s integrated Digital Marketing Suite (DMS) helps marketers centralize, manage, and optimize digital media and understand cross-channel attribution, while also helping to optimize conversions on a marketer’s website.
DataStax: How does your application work, at a high level, and how does your database infrastructure support that application?
Andras: Our applications collect, filter, reduce, and make real-time decisions on behavioral user data. The inputs to these applications vary from users’ raw web session data to compact Profiles, augmented by external data sources and analytics processes. The output is usually the campaign creative, complex on-site interaction, web search ad, or other content on which the user is most likely to convert. Our database infrastructure ensures that all this data flows from collection points through analytics processes to the consuming application that selects and triggers the correct interaction.
DataStax: How do you support your business? Is it on-premises, or do you operate in the cloud?
Andras: The majority of our application is supported in a virtualized environment distributed across multiple datacenters per geographic region (US, Asia, Europe, Middle East and South America) with both a cloud and co-located presence to work more closely with partners and deliver speedy services to clients. Our DataStax Enterprise (DSE) nodes are distributed across these datacenters for both product delivery and heavy analytical processing.
DataStax: So you’ve got your own datacenters all over the globe. What technical and business challenges drove you to change your architecture?
Andras: The driving force behind this move was the business’s need to present our customers with a single solution by bringing our already successful individual products under the same roof. This presented us with an integration problem as we moved our high-resolution data closer together. This data is created, updated, and accessed in a very heterogeneous environment: our software architecture is a mix of LAMP and .NET stacks that all need to operate on the same data points.
Integration alone wouldn’t necessarily mean that we had to throw away our current databases, such as MySQL and Microsoft SQL Server. But because of the data volume this integration was bringing into the same database, the ETL processes moving data between the application servers, MySQL servers, and MS SQL servers were all becoming single points of failure. They were also beginning to perform poorly and cause availability problems.
It was a very complex architecture. At first we weren’t handling a particularly large amount of data, but after the integration it grew tremendously. So integration and data growth drove us to re-evaluate our software architecture.
DataStax: That’s a significant challenge. What caused you to select Cassandra?
Andras: We re-evaluated our existing relational databases for this purpose, and we also evaluated NoSQL solutions like CouchDB, Redis, MongoDB, and even Neo4j. We looked into all these technologies and found that only Cassandra was designed from the ground up to be distributed. Cassandra just feels as if it was built as an indestructible distributed system, which is not something you can say about its mostly master/slave counterparts.
During our evaluation, we found that replication as implemented by most NoSQL servers was not sufficient to support our data-locality requirements (one replica here, two there, et cetera), and maintaining a master/slave setup was a pain in general. As opposed to master/slave, Cassandra has a master-master architecture, so there is no single point of failure and you can write wherever you want; we very quickly took a liking to both. That really drove our decision. The other products also lack a rich data structure; they are unable to present a rich, flexible schema to the applications.
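To make the "one replica here, two there" requirement concrete: in Cassandra, per-datacenter replica counts are declared on a keyspace using the NetworkTopologyStrategy replication class. The Python sketch below only assembles such a CQL statement as a string. The keyspace name `profiles`, the datacenter names `us_east` and `eu_west`, and the helper `keyspace_cql` are hypothetical illustrations, not IgnitionOne's actual configuration.

```python
def keyspace_cql(name: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that pins a replica count per datacenter.

    Hypothetical helper for illustration; datacenter names must match the
    names your cluster's snitch reports.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    for dc, rf in replicas_per_dc.items():
        if rf < 1:
            raise ValueError(f"replication factor for {dc} must be >= 1")
    # Each datacenter gets its own replication factor, e.g. 'us_east': 1.
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # One replica in a (hypothetical) US datacenter, two in a European one.
    print(keyspace_cql("profiles", {"us_east": 1, "eu_west": 2}))
```

Because every node in Cassandra accepts writes, the replica placement above is the main lever for data locality; there is no separate master to configure per region.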
DataStax: For the data that was originally in MySQL and SQL Server, are you migrating that to Cassandra in parts or entirely? Is it new data that you’re now putting into Cassandra?
Andras: It’s a mix. We use custom-written middleware that sits between our applications, some of these legacy database systems, and our new ones. This middleware is designed to smooth out the transition. You could say that we are doing a rolling migration of sorts.
DataStax: Can you project how much data you will store in Cassandra versus other databases a year or two from now?
Andras: I’m pretty sure that it will increase tenfold within a year. Previously we were managing about 200 to 300 gigabytes of data in these systems. Due to the integration work we’re doing and fattening up our products, in a year I think we will have several terabytes. That would just be for the basic profile common to all of our point solutions. We have hundreds of terabytes of high-resolution data outside of Cassandra, but eventually we would like Cassandra to be our single store for anything profile-related.
DataStax: Is Cassandra handling externally facing portions of your application, or is it more of an internal function doing backend operations? Is it handling things your customers actually see, such as response-time queries or data input?
Andras: Externally, Cassandra is just a couple of layers away from a user browsing a pixeled webpage or one displaying an ad unit. Internally, we have our analytics and data maintenance jobs running on the store, too. I would say it does both.
DataStax: What caused you to look at DataStax Enterprise versus just open-source Cassandra?
Andras: The primary reason was that these internal services that we’re building as a part of this integration effort are created and maintained by a relatively small team of system and software engineers. We wanted to have the same kind of support that we have for our legacy, or I should say established, systems.
DataStax has the most committers and Project Management Committee members on its staff. It just gives us peace of mind that whatever Cassandra issues we encounter, we can go straight to you. On top of that, I found your support contract very welcoming. In particular, you can open a low-urgency general Cassandra question, like a development environment question, and it will be updated within a working day. Of course, the high-priority support, with its one-hour, 24/7 response time, is great.
From the software side, having Cassandra and the Hadoop stack integrated in a single software package was also a huge timesaver for us. It greatly reduced operational complexity: with very little additional configuration, we could start running analytics jobs against our data.
DataStax: How do you manage your DataStax Enterprise clusters?
Andras: We have an existing setup for managing our hardware architecture. We use Puppet to manage configuration. We have our own deployment system, into which the DataStax Debian package fits rather nicely. We use OpsCenter mainly for monitoring.
DataStax: If someone brand new to Cassandra and/or DataStax Enterprise, or just new to NoSQL, came to you and asked for advice on how to get started, what to watch out for, or best practices, what would you tell them?
Andras: Just start at the data model, at the needs of the data. Get that right and everything else will follow.
For more information on IgnitionOne, see: www.ignitionone.com.