Fixing live databases – “the stuff of nightmares”
RIM’s much-publicized blackout a few weeks ago was clearly an awful event of the kind every company hopes to avoid. It brings to mind a similar story from last year about an outage at JPMorgan Chase. Unfortunately, databases can and do get corrupted, more often than we’d care to think about. When I read about such failures, my heart genuinely goes out to those involved, because of the tremendous damage they can do to a company’s reputation with its customers.
According to the RIM article, one of their main Oracle databases became corrupted, which forced them to initiate a repair while it was still running. An unnamed network engineer was quoted as saying:
Working with a live database like that is the stuff of nightmares.
I couldn’t agree more. It’s like trying to change a jet engine in mid-flight, and it’s no fun. But do you really have a choice when a database becomes corrupt? Well, you do have at least one other option, which is to take the database down entirely. For the most part, in the relational world, those are about the only two choices you have, because of the way almost all RDBMS apps are architected.
With some NoSQL databases, there’s a third choice, and it’s often misunderstood. Let’s start by looking at a blog post on the JPMorgan Chase incident. The author makes the following observation, with which I agree:
One point that jumps out at me is this – not everything in that user profile database needed to be added via ACID transactions. The vast majority of updates are surely web-usage-log kinds of things that could be lost without impinging the integrity of JPMorgan Chase’s financial dealings, not too different from what big web companies use NoSQL (or sharded MySQL) systems for. Yes, some of it is orders for the scheduling of payments and so on – but on the whole, the database was probably over-engineered, introducing unnecessary brittleness to the overall system.
When we talk about ACID transactions and the NoSQL world, all sorts of confusion can arise. Even to a lot of relational folks, the meaning of each letter in ACID (atomicity, consistency, isolation, durability) has become obscured over the years. The layman’s way of thinking about an ACID transaction is that you are guaranteed no surprises in the state of your data when reading it from, or writing it to, the database. It will always be in a “consistent” state for every user of the database at all times.
For example, in a bank account transaction that debits one account and credits another, a “consistent” view of the data would mean that before, during and after the transaction, the sum of the funds in the two bank accounts is the same for anyone querying the database. The data is always “consistent” for everyone who is working with it.
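The bank-transfer example can be made concrete with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a relational database; the table layout, the account names, and the helper functions (open_bank, total, transfer) are assumptions made for illustration and do not come from either incident. The point is only that the credit and the debit succeed or fail together, so the sum of the two balances never changes.

```python
import sqlite3


def open_bank():
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def total(conn):
    """Sum of all balances: the invariant a transfer must preserve."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


def balance(conn, name):
    return conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()[0]


def transfer(conn, src, dst, amount):
    """Move funds atomically; return False if the transfer was rolled back."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            # Credit first, so an overdraft fails after a partial write
            # and the rollback is actually exercised.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False
```

Calling transfer(conn, "alice", "bob", 500) on the accounts above returns False and leaves both balances untouched, so total(conn) reports the same figure before and after the failed attempt.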
It makes perfect sense to do that for something like a bank transaction between accounts. The trouble is, when you put that kind of restriction on ALL of your transactions, it is often overkill. And, in the case of a corrupted database, that demand for consistency is largely responsible for forcing you into the two difficult choices of changing the jet engine in mid-flight or landing the plane to repair it.
This is why tunable consistency is such a powerful feature of Cassandra. By letting your developers set the consistency level for each individual read or write, you give them a tremendous amount of power when it comes time to fix (or make changes to) a live database. In both the RIM and JPMorgan Chase examples, tunable consistency would have allowed the application team to start fixing those problems in the database that did not involve complex transactions requiring strong consistency. With tunable consistency, you can decide to let changes to the data propagate over time throughout the entire system. The advantage is that the database never has to come down, and the way the changes propagate is completely transparent to the developers making them. It is decidedly NOT the stuff of nightmares. Quite the opposite, in fact.
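To make the trade-off a little more tangible, here is a toy model of how consistency levels relate to the number of replicas. It is a sketch, not Cassandra's driver API: the Level enum, both functions, and the LEVEL_FOR mapping are hypothetical names invented for this illustration. It relies on the general rule that a read is guaranteed to overlap the latest write when the replicas read plus the replicas written exceed the replication factor.

```python
from enum import Enum


class Level(Enum):
    """A subset of consistency levels, modeled as plain labels."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level, replication_factor):
    """How many replicas must acknowledge an operation at this level."""
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        return replication_factor // 2 + 1  # a strict majority
    return replication_factor  # Level.ALL


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# Hypothetical policy: relaxed for log-style data, strict for payments.
LEVEL_FOR = {
    "usage_log": Level.ONE,
    "payment_order": Level.QUORUM,
}
```

With a replication factor of 3, writing and reading payment orders at QUORUM keeps them strongly consistent, while usage logs written and read at ONE only become consistent over time. That difference is what lets the log-style data be repaired or changed without holding up everything else.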
Too often, those of us raised on the relational model dismiss the advantages of tunable consistency out of hand when we first hear about it. But when you consider these dreaded corrupted-database scenarios, and how much more elegantly a tunable consistency model can handle them, it becomes clear that there are real advantages to handling transactions under a different paradigm: one that allows for the right level of consistency based on the needs of each type of transaction.