Why people care about big data and multi-datacenter replication
This is my final post in a series (here, here, and here) breaking down the following paragraph from our recent press release that touches on some of the key reasons why people choose Cassandra. It reads:
Customers this year chose Cassandra time and time again over competing solutions. The peer-to-peer design allows for high performance with linear scalability and no single points of failure, even across multiple data centers. Combine this with native optimization for the cloud and an extremely robust data model and Cassandra clearly stands apart from the competition for enterprise, mission-critical systems. [emphasis added]
For many, the idea of spanning multiple datacenters with a single database conjures images of late nights, amazing complexity, and delicate “bubble gum and shoestring” solutions that, once created, you would never even think about touching for fear of watching it all crumble. The architectures were so challenging that the cost and complexity simply outweighed the benefits. But in today’s world, multi-datacenter operation is becoming more than a benefit — it’s now a requirement. Let me share a few examples:
- Disaster avoidance. Some companies need to plan for a worst-case scenario where they lose contact with an entire datacenter (or “region” in Amazon’s cloud). During the outage, their application running on the database must not fail.
- Performance. Moving processing closer to where the application interacts with its users reduces latency for those users.
- Scale. One of our customers runs an application that simultaneously collects massive amounts of device data from its infrastructure in more than ten datacenters across the globe.
- Security. Certain industries have regulations that require local copies of the data, but nobody wants to wait on things like log shipping and batch loads.
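All of the use cases above rest on the same mechanism: in Cassandra, replication is configured per keyspace, and NetworkTopologyStrategy lets you set a replica count for each datacenter individually. As a minimal sketch (the keyspace name and the datacenter names here are placeholders — in practice the datacenter names must match those reported by your cluster’s snitch):

```sql
-- Keep three replicas of this keyspace in each of two datacenters.
-- "sensor_data", "us_east", and "eu_west" are illustrative names.
CREATE KEYSPACE sensor_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us_east': 3,
    'eu_west': 3
  };
```

With a keyspace defined this way, writes are replicated to both datacenters automatically, and clients in each region can read and write against their local replicas.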
The world has shrunk for businesses, and IT is often in a race to catch up. The fact is that big data now comes from a wide variety of geographic locations, so it only makes sense that developers and operations teams will require their underlying database to keep pace. But because of all the “scar tissue” formed over many years of struggling to keep even two datacenters in sync (let alone several), people have a healthy skepticism as to whether it can really be done.
Not only can it be done, it must be done, because this is going to be a common requirement in the big data world.
I think about it a little like wireless. When wireless networking became a reality, many were ultra skeptical. Remember how hard everyone fought to get their computers and phones in their homes and offices in just the right places with just the right wiring? Wireless, at first, seemed far too complicated and a “pie in the sky” vision. But once we started using it, there was no going back. The benefits were too great, and we all started demanding it.
When I talk to customers whose businesses now rely on and love multi-datacenter replication, I can’t help but think that in a few years, we’ll all be looking back and asking: “How did I ever build applications on just one, or even two, datacenters?”