Youneeq Drives Innovations in Media Industry with Real-time Personalized Content Recommendations
May 28, 2015
This post is one in a series of quick-hit interviews with companies using Apache Cassandra and/or DataStax Enterprise (DSE) for key parts of their business. For this interview, we talked with Mike Lally, Chief Software Architect at Youneeq.
“Spark in particular is a compelling offering; we don’t use Hadoop currently because we found it too slow at most of what we do, but having that functionality at a lower performance footprint, and without the setup headaches that come with manually building a Spark cluster, changes the game considerably.”
DataStax: Why don’t we start off with what Youneeq does and your role there?
Mike: Youneeq delivers web content personalized to every visitor of a site, updated with every click they make. If you’re familiar with Tim Berners-Lee’s vision of a Web 3.0 — a semantic web that uses data-driven standards to power an adaptive experience for every user — that’s the sort of work we do, though we build on the Web 2.0 standards and patterns that are common today. Most of our clients are digital publishers. For them we typically provide better targeting of their internal story content while helping to better align their ad content and campaigns with individuals, producing a more targeted and less congested experience for site visitors. Essentially, we organically improve site revenue through a more engaged user base that consumes more content. As Chief Software Architect I’m responsible for designing the technologies that make us “Youneeq”, and for ensuring that what we build consistently addresses our customers’ needs.
DataStax: How do you use Cassandra?
Mike: We use Cassandra to support our real-time analytics for personalizing content on the fly. We need to load results fast enough that our content regions on a site are largely indistinguishable from what’s being generated by the client’s own CMS, and we support ad integration options that impose similar performance constraints. We also provide custom report analytics that require deep mining of our data, which calls for an extensible data model that handles generic OLAP-type problems effectively. Overall there’s a similar volume of reads versus writes hitting our databases, but it varies tremendously on a per-table basis, so the tunable consistency of Cassandra and the ability to tweak per-table options (e.g. compaction) are features we leverage a lot.
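The per-table tuning Mike describes can be sketched in CQL. The keyspace and table names below are hypothetical (not Youneeq’s actual schema); the point is that compaction strategy is a table-level property, while consistency is chosen per request rather than per table:

```sql
-- Hypothetical schema for illustration only.
-- A write-heavy event log keeps the default size-tiered compaction:
CREATE TABLE analytics.page_views (
    site_id    text,
    visitor_id uuid,
    viewed_at  timestamp,
    story_id   text,
    PRIMARY KEY ((site_id, visitor_id), viewed_at)
) WITH CLUSTERING ORDER BY (viewed_at DESC)
  AND compaction = {'class': 'SizeTieredCompactionStrategy'};

-- A read-heavy serving table might prefer leveled compaction instead:
ALTER TABLE analytics.recommendations
  WITH compaction = {'class': 'LeveledCompactionStrategy'};

-- Consistency is tuned per query, e.g. in cqlsh:
CONSISTENCY ONE;     -- low-latency reads for serving content
CONSISTENCY QUORUM;  -- stronger guarantees for reporting queries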
DataStax: What led you to Cassandra in the first place and what other technologies was it up against?
Mike: I’ve personally been following the development of Cassandra for several years, but we only made the decision to evaluate it in earnest towards the end of 2013, when our MongoDB data store was bursting at the seams. When the company started out we were using a relational database back-end, but when we began trialing our services on a high-traffic wallpaper site we found that in order to meet their demands we’d either need to price the service higher than we felt was tenable or drastically reduce the quality of the product. To its credit, MongoDB helped us hold off many of the quality compromises we were considering for a while, but it just couldn’t scale out to our requirements. We had some initial issues in the switch to Cassandra, but once we got past those hurdles we never looked back, and we’re definitely benefiting from the additions DSE provides.
DataStax: What attracted you to the DataStax startup program? Which parts of DataStax Enterprise do you use?
Mike: Once we made the decision to move forward with Cassandra, the DataStax startup program was a no-brainer. DataStax Enterprise is an incredible addition to C*, and being able to use it for free while our company grows helps tremendously. We evaluated Cassandra sans DataStax, DataStax Community, and DSE; the latter was such a seamless value-add that we were hooked right away. In terms of key features, we are currently benefiting from the OpsCenter additions and integrated security, and we are actively evaluating the Solr and Spark integration. I spoke with someone from one startup at last summer’s Cassandra Summit whose company skipped the program in favor of regular DSE licensing so they could access premium support directly, so it’s not always the best fit for every company that qualifies. But for most companies in this space every nickel counts, and a program like this can be the difference between becoming a success and roadkill on the information superhighway.
DataStax: How do you like the DataStax Startup Program?
Mike: The program is excellent! The best part for us is that the DataStax team really listens to our input, helping us to get the most out of the program while also helping them improve both the Startup Program and the overall DSE product offering.
DataStax: How does DataStax / OpsCenter / integrated Solr/Spark benefit your application/use case/team?
Mike: When we first used OpsCenter, our primary use for it was as a quick and easy way to see all the JMX stats from our nodes, but as we explored its functionality in greater depth it became the go-to management tool for our Cassandra clusters. A good UI is hard to come by in server management tools, and I’m finding myself more comfortable running, for instance, various nodetool commands via OpsCenter than over a secure shell.
Independent of DSE we’ve been exploring Spark and Solr for a little while now, and implementing them for internal use cases. Spark in particular is a compelling offering; we don’t use Hadoop currently because we found it too slow at most of what we do, but having that functionality at a lower performance footprint, and without the setup headaches that come with manually building a Spark cluster, changes the game considerably.
With our infrastructure “in the cloud”, security is hugely important for us, but as a startup we can’t afford to sink countless hours into supporting that infrastructure ourselves, and this is where DSE security helps. Going NoSQL usually goes hand in hand with build-your-own security, but with DSE all the security features we need are right there and just work.
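To illustrate what “right there and just work” means in practice: in Cassandra, authentication, authorization, and client-to-node encryption are enabled through configuration rather than application code. The fragment below shows the standard open-source `cassandra.yaml` properties (the values are illustrative defaults, not Youneeq’s configuration); DSE layers its own authenticator and additional options, such as Kerberos support, on top of this same mechanism:

```yaml
# cassandra.yaml fragment -- illustrative values only
authenticator: PasswordAuthenticator   # require username/password login
authorizer: CassandraAuthorizer        # enforce per-keyspace/table permissions
client_encryption_options:
    enabled: true                      # encrypt client-to-node traffic
    keystore: conf/.keystore
    keystore_password: changeit        # placeholder; use a real secret
```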
DataStax: How does that benefit your customers / end users?
Mike: The expression “do more with less” is often tossed around, and in my opinion usually inappropriately, but in our case we can provide deeper analysis of our customers’ data in less time, allowing us to produce better-quality personalized content. Redundancy and tunable consistency in our Cassandra deployment also mean less worry about downtime or unexpected lag in producing results.
DataStax: Can you share some metrics?
Mike: Our production Cassandra cluster typically sees between 400 and 1,100 read/write requests per second at an average latency of under 0.2 milliseconds for both, with occasional peaks approaching 10,000 requests per second (which only bumps the average latency up to around 0.3 ms). This lets us deliver a set of personalized content usually in under a third of a second, with a substantial portion of that time being the round trip to and from our servers.
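As a quick sanity check on why latency barely moves at peak throughput, Little's law (L = λ × W) relates arrival rate and per-request latency to the average number of requests in flight. Plugging in the figures above (a back-of-the-envelope sketch, not Youneeq's actual capacity model), even the 10,000 req/s peak at 0.3 ms implies only about three requests in flight at the database at any instant:

```python
def in_flight(throughput_per_sec: float, latency_sec: float) -> float:
    """Little's law: average concurrent requests = arrival rate x latency."""
    return throughput_per_sec * latency_sec

# Typical load: ~1,100 req/s at 0.2 ms average latency
print(in_flight(1_100, 0.0002))   # ~0.22 requests in flight on average

# Peak load: ~10,000 req/s at 0.3 ms average latency
print(in_flight(10_000, 0.0003))  # ~3 requests in flight on average
```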
Our in-memory distributed analytics have always been fast, but before we went with Cassandra we had to do a ton of caching to match our front-end performance. Now the tables are turned and we’re looking to do more analytics closer to the Cassandra nodes since our front-end can’t match the phenomenal results our data nodes are producing.