NYC* Big Data Tech Day - Wednesday, March 20, 2013

Thanks to everyone who attended NYC* Cassandra Big Data Tech Day. Videos of presentations are available here.

Join NYC* Big Data Tech Day and take a deep dive into Apache Cassandra™, the massively scalable NoSQL database! This two-track event will feature over 14 interactive sessions, delivered by Apache Cassandra experts. Jonathan Ellis, Project Chair for Apache Cassandra, will kick off the day with an exciting keynote on what’s new in Cassandra 1.2. The sessions that follow will feature compelling use case discussions along with technical deep dives. Don’t miss out on this learning and sharing opportunity, with multiple breaks for networking, including a Meet the Experts area.

NYC* Big Data Tech Day will conclude with an evening social gathering. Cheers to your peers while enjoying food, networking and fun in the heart of Manhattan.

Register today! This event is great for developers and architects already familiar with, or wanting to know more about, Apache Cassandra.

Apache Cassandra, Cassandra, Apache Hadoop, Hadoop, Apache Solr, Solr and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission as of 2011. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by DataStax.


We have 2 rooms set aside for presentations and a third room to meet and talk to Cassandra Experts.


08:00AM-09:00AM
  Registration and Breakfast

09:00AM-09:45AM
  Jonathan Ellis (DataStax)
  The State of Cassandra

10:00AM-10:50AM
  Room 1: Thomas Pinckney (eBay)
          Graph-based Recommendation Systems at eBay
  Room 2: Matt Pfeil (DataStax), Rick Branson (Instagram), Russ Bradberry (SimpleReach), Matt Conway (Backupify), Jake Luciani (BlueMountain), Ed Capriolo (m6d)
          We F****d Up So You Don't Have To

11:05AM-11:55AM
  Room 1: Sameer Farooqui
          How to Analyze the Human Genome with Cassandra
  Room 2: Eric Lubow (SimpleReach)
          The Big Data Revolution is an Evolution

12:00PM-12:50PM
  Room 1: Rick Branson (Instagram)
          Exactly Why You Can't Have a Pony
  Room 2: Sanjay Sharma (Impetus), Jeff Siegman (Pitney Bowes), Michael Shaler (DataStax)
          Implement Big Data Now

12:50PM-02:00PM
  Lunch
  Room 3: Meet the Experts

02:00PM-02:50PM
  Room 1: Jake Luciani (BlueMountain)
          Building a Scalable Time-Series Database with Cassandra
  Room 2: Ed Capriolo (m6d)
          Advanced Data Processing: Beyond Queries and Slices

03:00PM-03:50PM
  Room 1: Ameet Chaubal and Mauricio Vacas (Accenture)
          Large Scale Data Ingestion, Processing and Analysis: Then, Now & Future
  Room 2: Dave Finnegan and Tupshin Harper (DataStax)
          NoSQL No Longer Equals No Security

04:10PM-05:00PM
  Room 1: Brian O'Neill (Health Market Science)
          A Big Data Quadfecta: Cassandra, Storm, Elastic Search and Kafka
  Room 2: Michael Figuiere (DataStax)
          New Cassandra Drivers in Depth

05:10PM-06:00PM
  Room 1: John McCann (Comcast)
          Using Cassandra for DVR Scheduling at Comcast
  Room 2: Nathan Milford (Outbrain)

06:00PM-08:30PM
  Drink Up
  Lightning Talks

Confirmed Sessions

Advanced Data Processing: Beyond Queries and Slices
Ed Capriolo, m6d
The ColumnFamily data model and wide-row support provide the ability to store and access data efficiently in a de-normalized state. Recent enhancements for CQL's sparse tables and built-in indexing provide the capability to store data in a manner similar to that of relational databases.

For many use cases, hybrid approaches are needed, because complete de-normalization is appropriate for some access patterns whereas more structured data is appropriate for others. At times a single logical event becomes multiple insertions across multiple column families. Likewise, a user request might require several reads across different column families.

This talk describes some of these scenarios and demonstrates how advanced operations such as multi-step procedures, filtering, intersection, and paging can be implemented client side or server side with the help of the IntraVert plugin.

A Big Data Quadfecta: Cassandra, Storm, Elastic Search and Kafka
Brian O'Neill, Health Market Science
A successful Big Data platform combines distributed processing and polyglot persistence into a single cohesive infrastructure. Over the past few years, Health Market Science has transitioned from traditional relational databases and enterprise systems to a massively scalable Big Data platform that combines Cassandra and Storm to ingest thousands of feeds of data from the health market industry to produce a single high-quality masterfile. Hear how we applied event processing and NoSQL to deliver real-time analytics, while accommodating structural change over time, and fuzzy/geospatial search.

The Big Data Revolution is an Evolution
Eric Lubow, SimpleReach
Dealing with data doesn't only require a data store; it requires an infrastructure. At SimpleReach, we have 5 data storage layers to service all of our data needs. These range from high volume, high velocity data ingestion with real-time analytics to ad-hoc style historical analysis with search capabilities. To communicate effectively between applications, data stores sit behind a service architecture for consistent data access patterns and failover/redundancy. This talk is a story of how we came to this architecture and some of the lessons we learned along the way.

Building a Scalable Time-Series Database with Cassandra
Jake Luciani and Carl Yeksigian, BlueMountain Capital
This talk will focus on our approach to building a scalable time-series database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes, as well as how we monitor and track the performance of the system.

Exactly Why You Can't Have a Pony
Rick Branson, Instagram
It's upsetting whenever we hear that we can't have things that we want. It'd be nice to live in a world where it was possible to have things like ACID transactions, uniqueness guarantees, and sequential counters that were globally and always available. What makes this worse is that when we're told we can't have them, people just wave their arms around in the air and shout things like "CAP theorem." In this talk, I'll walk through some of these "ponies" and demonstrate the points at which things start falling apart with practical, real-world examples.

Graph-based Recommendation Systems at eBay
Thomas Pinckney, eBay
Recommendation and personalization systems are an important part of many modern websites. Graphs provide a natural way to represent the behavioral data that is the core input to many recommendation algorithms. Thomas Pinckney and his colleagues at Hunch (recently acquired by eBay) built a large scale recommendation system, and then ported the technology to eBay. Thomas will be discussing how his team uses Cassandra to provide the high I/O storage of their fifty-billion-edge graphs and how they generate new recommendations in real time as users click around the site.

Large Scale Data Ingestion, Processing and Analysis: Then, Now & Future
Ameet Chaubal and Mauricio Vacas, Accenture
The presentation aims to highlight the challenges posed by large scale and near real-time data processing problems. In the past, such problems were solved using conventional technologies, primarily a database and a JMS queue. However, these solutions had their limits and presented serious problems in terms of scale and redundancy. The new breed of products, à la Cassandra and Kafka, is innately distributed by design and aims to tackle such challenges in a very elegant manner. The presentation will showcase some industry use cases of this genre and describe the increasingly sophisticated solutions applied to them.

New Cassandra Drivers in Depth
Michael Figuiere, DataStax
Cassandra 1.2 finalizes CQL3 and introduces a new binary protocol for client/server communication. These two components are the foundation of the new line of drivers developed by DataStax. Based on years of experience with Cassandra, these new drivers for Java, .NET and Python come with an asynchronous and lightweight architecture, a clean and simple API, and a standardized way to discover nodes and to manage load balancing and failover. This presentation will give an in-depth look at these new drivers, which will make your Cassandra-based applications even more robust, efficient and simple to write.

NoSQL No Longer Equals No Security
Dave Finnegan, DataStax
An April 2012 InformationWeek special report entitled "Why NoSQL Equals No Security" began by stating: "If it seems security is an afterthought at best in the big data ecosystem, you're right."

DataStax Enterprise 3.0 overcomes this perception and is the first big data platform in the NoSQL industry to bring to the big data/NoSQL market the type of enterprise security that traditional RDBMSs use to secure systems and important data. This presentation will describe each aspect of DataStax Enterprise 3.0's security feature set. Note that all security features are optional; the administrator can decide to use none, some, or all of them depending on their specific application.

Features described include Internal Authentication, External Authentication, Permission Management, Transparent Data Encryption, Data Auditing, and Client to Node Encryption.

Real Time Processing for Big Data
John Burke, GigaSpaces
Learn how integrating in-memory computing, such as the GigaSpaces platform, with disk-based NoSQL stores results in a platform of awesome power, blending in-memory speed, ACID transactions, and real-time event processing with horizontal elasticity of both processing and data.

Using Cassandra for DVR Scheduling at Comcast
John McCann, Comcast
Comcast is developing a highly scalable cloud DVR scheduling system on top of Cassandra. The system is responsible for managing all DVR data and scheduling logic for devices on the X1 platform. This talk will cover the overall architecture of the scheduling system, data model, message queue and notification software that have been developed as part of this ambitious project. We'll take a deep dive into the details of our data model and review the implementation of Comcast's open-source, Cassandra-based clones of Amazon SQS and SNS.

Event Sponsors

Gazzang
Gazzang provides data security solutions and operational diagnostics that help enterprises protect sensitive information and maintain performance in cloud environments. The company has over 200 customers across multiple industries including SaaS providers, Financial Services, Technology, Healthcare and public sector organizations. Gazzang is backed by Austin Ventures and Silver Creek Ventures. For more information, visit

Impetus
Impetus provides Big Data thought leadership and services, creating new ways of analyzing data to gain key business insights across enterprises. Impetus’ experience extends across the big data ecosystem including Hadoop, NoSQL, NewSQL, MPP databases, machine learning, and visualization. Impetus offers a Quick Start program, Architecture Advisory Services, Proof of Concept, and Implementation.

eBay
eBay Inc. pioneers communities built on commerce, sustained by trust, and inspired by opportunity. eBay brings together millions of people every day on a local, national and international basis through an array of websites that focus on commerce, payments and communications.

PalominoDB
For startups and established companies of all sizes, PalominoDB provides ongoing operational support and professional expertise in database architecture, performance and scale. With a focus on open-source and other best-in-class software components, and extensive experience in all major and emerging database technologies, PalominoDB engages with customers to develop custom, cost-effective projects and long-term support contracts in areas from system design to automation to business intelligence and more.

Rally Software
Rally Software is a leading global provider of cloud-based solutions for managing Agile software development. The Rally Agile application lifecycle management (ALM) platform transforms the way organizations manage the software development lifecycle by closely aligning software development and strategic business objectives, facilitating collaboration, increasing transparency and automating manual processes. Companies use Rally to accelerate the pace of innovation, improve productivity and more effectively adapt to rapidly changing customer needs and competitive dynamics. Rally supports 154,000 paid users and more than 1,000 customers, including 36 of the Fortune 100 companies.

Comcast
Comcast Cable is the nation's largest video, high-speed Internet and phone provider to residential customers under the XFINITY brand and also provides these services to businesses. Comcast has invested in technology to build an advanced network that delivers among the fastest broadband speeds, and brings customers personalized video, communications and home management offerings. Comcast Corporation (Nasdaq: CMCSA, CMCSK) is a global media and technology company. Visit for more information.

Knowledgent
Knowledgent is a purpose-built Industry Information Consultancy that provides advanced Information Management and Analytical solutions with industry-specific specialization in Financial Services, Life Sciences, Healthcare and Commercial markets. Knowledgent was founded with the mission to help enterprises become more information centric. We chose this mission because while information has been recognized as a strategic corporate asset, its importance has now reached an inflection point: corporations that don’t learn to innovate through information and compete on analytics will be left behind by those that do. That’s because the extreme volumes of data exhaust now being generated by systems, machines and the Internet represent an unexplored ocean of data with unlimited potential for mining insights that can be used for instructive predictions about products, markets, customers and human behaviors.

GigaSpaces
GigaSpaces Technologies is the pioneer of a new generation of application virtualization platforms and a leading provider of end-to-end scaling solutions for distributed, mission-critical application environments, and cloud enabling technologies. GigaSpaces' complementary solutions are XAP Elastic Application Platform and Cloudify - Easy Deployment of Mission Critical Applications to the Cloud.

Accenture
Accenture is a global management consulting, technology services and outsourcing company, with approximately 259,000 people serving clients in more than 120 countries. Combining unparalleled experience, comprehensive capabilities across all industries and business functions, and extensive research on the world’s most successful companies, Accenture collaborates with clients to help them become high-performance businesses and governments. The company generated net revenues of US $27.9 billion for the fiscal year ended Aug. 31, 2012.

Wednesday, March 20, 2013 at 8:00 AM (ET)


Metropolitan Pavilion
125 West 18th Street
New York, NY 10011