Confessions of an Oracle DBA – Part 3
This post concludes a short series on how a hard-core Oracle guy came to see that NoSQL databases are here to stay and can handle things that Oracle was never meant to handle.
In Part 1 and Part 2, I covered the industry changes I’ve witnessed as an Oracle DBA and why I believe the data management industry has reached another turning point, one that calls for complementing, or in some cases replacing, Oracle. Let’s continue and look at how the new rules of data management have redefined what it takes to compete today, and what impact they have on RDBMSs like Oracle.
What I’d like to do is walk through some of the top requirements that have come up time after time in the customer visits I’ve made over the past five or so years, and compare how Oracle and a NoSQL database attempt to meet them.
All Kinds of Data Types and Structures
It’s becoming common knowledge that (1) almost every company has to manage a plethora of different data types (e.g., semi-structured, unstructured) and data formats, and (2) NoSQL is nearly always better able to tackle this need than an RDBMS like Oracle.
For example, a number of our customers at DataStax run businesses that track every interaction a user has with an online video or movie. That means one user might generate only a handful of interactions while another generates hundreds. There are ways of modeling this in an RDBMS, but they don’t come out as clean or as performant as they do in a NoSQL database like Cassandra, which lets you easily have rows in the same table with wildly different numbers of columns and datatypes.
Such ability helps with data model flexibility, storage efficiency, and speed, as one of our customers, NASA, told me: “Cassandra’s NoSQL data model allows us to insert and query data much more naturally than what we had previously. The analysts who routinely use this data were impressed with the flexibility and speed at which the queries came back.”
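To make that flexibility concrete, here is a minimal Python sketch. It assumes a toy in-memory model, not Cassandra’s real storage engine or driver API: each row is a map of column names to values, so two rows in the same hypothetical `video_events` table can carry different numbers and types of columns.

```python
from collections import defaultdict

# Hypothetical toy table: row key (user id) -> {column name: value}.
# Rows share no fixed schema; each holds only the columns written to it.
video_events: dict = defaultdict(dict)


def put(user_id: str, column: str, value: object) -> None:
    """Write one column into a user's row, creating the row on first use."""
    video_events[user_id][column] = value


# A light user: one interaction plus a numeric rating column.
put("alice", "2013-06-01T10:00:00", "play")
put("alice", "rating", 5)

# A heavy user: a hundred interactions, one timestamp-named column each.
for i in range(100):
    put("bob", f"2013-06-01T{10 + i // 60:02d}:{i % 60:02d}:00", "seek")

for user, row in video_events.items():
    print(user, len(row), "columns")
```

In an RDBMS the same data would typically need either a sparse, fixed set of columns or a separate child table joined back at read time.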
Or take one of our healthcare customers. You know the old joke about not being able to read doctors’ handwriting? Imagine having to scan doctors’ handwritten notes into your database and analyze them with such precision that you can use the information to bill back Medicare/Medicaid. That’s the kind of use case our customer tackles with Cassandra’s flexible data model, which handles all data formats equally well.
High Speed Data Consumption
Oracle has many different mechanisms for inserting data quickly (e.g., parallelized and direct-path loads, NOLOGGING inserts, etc.), but today’s high-velocity data, with extreme levels of both concurrency and volume, sometimes outstrips Oracle’s ability to ingest data as quickly and cost-efficiently as an application requires, especially when the load is mixed and includes lots of updates in addition to new data.
This is where a log-based NoSQL architecture like Cassandra can make a difference. For write-intensive systems, Cassandra’s append-style write path can meet the requirements of big data systems built on sensor or device data, time-series financial streams, web clickstreams, and similar data that arrives at blistering speed.
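The core idea behind an append-style write path can be sketched in a few lines of Python. This is a simplified, hypothetical model of a log-structured design like Cassandra’s (commit log, memtable, immutable sorted files); the class and method names are illustrative, and compaction, commit log recycling, and tombstones are omitted.

```python
from typing import Dict, List, Optional, Tuple


class LogStructuredStore:
    """Toy log-structured store: every write is an append, never an in-place update."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []      # sequential, append-only log for durability
        self.memtable: Dict[str, str] = {}               # recent writes held in memory
        self.sstables: List[List[Tuple[str, str]]] = []  # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        # Inserts and updates take the same path: no read-before-write, no seek.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Persist the memtable as a new immutable run sorted by key.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        # Newest data wins: check memory first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            for k, v in run:
                if k == key:
                    return v
        return None


store = LogStructuredStore(memtable_limit=2)
store.write("sensor-1", "20.1")
store.write("sensor-2", "19.8")   # memtable is full, so it is flushed
store.write("sensor-1", "20.4")   # an update is just another append
print(store.read("sensor-1"), store.read("sensor-2"))  # 20.4 19.8
```

Because writes never seek to modify existing data on disk, throughput stays high even under mixed insert/update loads; the cost shifts to reads and background compaction.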
As an example, one of our customers at DataStax is a very prominent supplier of music and movies. As the popularity of their service grew, their Oracle database became a bottleneck for their gift card redemption process. They’ve since replaced Oracle with Cassandra, and for the past two Christmases they’ve handled all incoming requests with no problem.
Another example is Gnip, which supplies social media data to 90% of the Fortune 500. From Twitter alone (they also ingest data from Facebook, WordPress, and others), they receive 20,000 or more tweets per second, and they can handle that kind of consistently blazing input only with a NoSQL database like Cassandra.
Something that can compound the data velocity problem is the need to handle data that is not only high-speed but also coming in from everywhere. This translates into the need for a truly location-independent design that can read and write anywhere, vs. the standard read-only scale-out sharding implementations that some try with RDBMSs like MySQL or Oracle.
The standard master-to-master or multi-master replication architectures of Oracle just aren’t designed for these use cases. That’s not a knock on them; they were never meant to handle ‘data everywhere’ scenarios. But some NoSQL solutions like Cassandra were built from the ground up for such scenarios, and that’s one of the reasons many of our customers (e.g., Adobe, eBay, Netflix) use it.
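One way to picture a read/write-anywhere design is a replica in each data center that accepts local writes and later exchanges them with its peers. The Python sketch below assumes a last-write-wins rule keyed on write timestamps, similar in spirit to how Cassandra reconciles conflicting versions of a column; the `Replica` class and `replicate` helper are hypothetical, and real replication (hinted handoff, read repair, anti-entropy) is far more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical per-data-center replica that accepts writes locally."""
    name: str
    cells: Dict[str, Tuple[int, str]] = field(default_factory=dict)  # key -> (timestamp, value)

    def apply(self, key: str, timestamp: int, value: str) -> None:
        current = self.cells.get(key)
        # Last write wins; on a timestamp tie, comparing values keeps replicas deterministic.
        if current is None or (timestamp, value) > current:
            self.cells[key] = (timestamp, value)

    def read(self, key: str) -> Optional[str]:
        cell = self.cells.get(key)
        return None if cell is None else cell[1]


def replicate(source: Replica, target: Replica) -> None:
    """Ship every version the source holds to the target, which keeps the newer one."""
    for key, (timestamp, value) in source.cells.items():
        target.apply(key, timestamp, value)


us_east = Replica("us-east")
eu_west = Replica("eu-west")
us_east.apply("profile:42", 100, "theme=dark")   # written in the US
eu_west.apply("profile:42", 105, "theme=light")  # later write in Europe
replicate(us_east, eu_west)
replicate(eu_west, us_east)
print(us_east.read("profile:42"), eu_west.read("profile:42"))  # theme=light theme=light
```

Both data centers accepted a write for the same key, and after exchanging versions they converge on the same value without any single master coordinating them.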
Future-Proof Scale That’s Always On
In his somewhat deceptively titled article “Why NoSQL is No Oracle Killer”, InfoWorld’s Andrew Oliver spends the bulk of his time explaining why, although Oracle will remain a staple in many corporations, NoSQL could indeed strike a mortal blow to Oracle RAC. In fact, Oliver ends his article by stating, “NoSQL is a RAC killer.”
When you need the kind of scale I term “future-proof,” coupled with a requirement for continuous (not just “high”) availability, you need to look past new Oracle mainframes or other such options to a NoSQL database like Cassandra.
I’ve used Oracle RAC, fully understand the pros and cons of Data Guard, and worked with the guys at GoldenGate before Oracle bought them. With all due respect to each of them, they aren’t designed to seamlessly add scaling capacity or guarantee uptime the way something like Cassandra can. Cassandra’s proven linear scaling, and its fairly universal acknowledgement as the gold standard for supporting multiple data centers and cloud availability zones, should turn an Oracle DBA’s head when they’re called upon to tackle projects that need both scale and constant uptime.
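Much of that linear scaling comes from partitioning data around a token ring, so that a new node takes over only a slice of the existing data instead of forcing a full reshuffle. The Python sketch below is a simplified illustration assuming one token per node and an MD5-based hash (in the spirit of Cassandra’s older RandomPartitioner; newer versions default to Murmur3 and usually assign many virtual-node tokens per machine). `TokenRing` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def token(name: str) -> int:
    """Map a key or node name onto the ring with an MD5 hash."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical ring: a node owns keys whose tokens fall after the previous node's token, up to its own."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.ring: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self.ring, (token(node), node))

    def owner(self, key: str) -> str:
        tokens = [t for t, _ in self.ring]
        # First node token at or after the key's token; wrap around past the end.
        index = bisect.bisect_left(tokens, token(key)) % len(self.ring)
        return self.ring[index][1]


keys = [f"user:{i}" for i in range(10_000)]
ring = TokenRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d:",
      all(after[k] == "node-d" for k in moved))
```

With a single token per node, the new node relieves only its immediate neighbor on the ring; virtual nodes spread that relief across the whole cluster, which is closer to how capacity is added in practice.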
What About Transactions?
Beyond the new rules above that define data management today, a few other issues always pop up when NoSQL is compared to something like Oracle, so I thought I would briefly address the ones I hear most often.
By far the number one question when comparing Oracle to NoSQL has to do with transaction support, and with NoSQL not being “ACID compliant.” Let me say just a few things on this subject, because I fully understand the spirit of the complaint.
First, as a post from the Oracle Alchemist points out, most databases aren’t really ACID compliant when it comes to adhering to the strictest definitions. Moreover, there are those such as tech consultant Dan McCreary who claim “Ninety-five percent of database-driven systems today don’t need ACID transactions”.
Second, some NoSQL databases like Cassandra can deliver what IT pros practically want and need in this area. Cassandra has durability . . . and atomicity . . . and isolation (the latter two at the row level). And it offers consistency, albeit not in the way RDBMSs do with referential integrity, but rather as eventual consistency across the database cluster, tunable per operation (i.e., each insert, update, or delete) to be as strong or as eventual as you need.
And it can extend atomicity to batch operations. So while Cassandra isn’t ACID compliant in the strictest sense of the term, it can deliver what many use cases need from a practical perspective.
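The arithmetic behind “tunable” consistency is simple enough to check by brute force. With a replication factor RF, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to share at least one replica, and so to see the latest acknowledged write, whenever R + W > RF. The Python sketch below assumes Cassandra’s usual definition of QUORUM as a majority of replicas (RF // 2 + 1); the helper names are hypothetical.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """Majority of replicas, e.g. 2 of 3."""
    return replication_factor // 2 + 1


def always_overlaps(rf: int, w: int, r: int) -> bool:
    """True if every set of w write replicas shares a member with every set of r read replicas."""
    replicas = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


rf = 3
levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
for write_level, w in levels.items():
    for read_level, r in levels.items():
        strong = always_overlaps(rf, w, r)
        print(f"write {write_level:<6} read {read_level:<6} -> {'strong' if strong else 'eventual'}")
```

Writing and reading at QUORUM gives strong consistency with RF = 3, while ONE/ONE trades that guarantee for lower latency, and the choice can be made per operation.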
The Learning Curve and Management Burden?
What about the learning curve and management overhead that come with bringing in a new technology like Cassandra? Actually, there is a fair amount of familiar ground in general terminology and function. Further, things like the Cassandra Query Language (CQL), which looks and feels very much like SQL, make it easy for developers and DBAs to get started.
Without a doubt, the number one educational hurdle is the change needed in data modeling. Building Cassandra tables just as you would Oracle tables isn’t normally going to produce the result you want, so you will absolutely need to take time to understand how to model data in the NoSQL world. Fortunately, good general tutorials and training are available, as are blog posts from companies like eBay on how they model data successfully with Cassandra.
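A common starting point for that shift is to design tables around the queries you need to answer rather than around normalized entities, writing the same fact into more than one table so that each query reads a single partition without a join. The Python sketch below is a hypothetical illustration of that idea using plain dictionaries; the table and function names are made up, and a real model would express these as CQL tables with partition and clustering keys.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Two hypothetical query tables, each partitioned by the value the query looks up.
views_by_user: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)   # "What did this user watch?"
views_by_video: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)  # "Who watched this video?"


def record_view(user_id: str, video_id: str, watched_at: str) -> None:
    # Denormalize on write: store the fact once per query pattern instead of joining at read time.
    views_by_user[user_id].append((watched_at, video_id))
    views_by_video[video_id].append((watched_at, user_id))


record_view("alice", "v1", "2013-06-01T10:00")
record_view("bob", "v1", "2013-06-01T10:02")
record_view("alice", "v2", "2013-06-01T10:05")

print(views_by_user["alice"])   # one lookup, no join
print(views_by_video["v1"])
```

The trade-off is extra writes and storage in exchange for fast, predictable reads, which suits a database whose write path is cheap.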
As for the management overhead associated with NoSQL, it goes without saying that your mileage will vary depending on your particular situation. Further, the claims some have made that NoSQL makes the DBA role obsolete are absolutely false.
But what I have observed with our customers is that the amount of babysitting and admin care a DBA has to give a Cassandra cluster is typically a lot less than for the Oracle systems I’ve managed in the past (some of which required a one-to-one ratio of DBAs to databases).
For example, two of our customers each have clusters of 700+ machines, yet each admin staff consists of only three people. Another customer has 200 nodes in production, and when I asked how big a staff they had devoted to their Cassandra cluster, I was told “none.” Instead, they have a DBA who spends an hour or so each week on it; beyond that, they consider the implementation one that is “set it and forget it.”
Stating the Obvious
I doubt anyone will be surprised when I say that a DataStax subscription costs orders of magnitude less per machine than Oracle, or that NoSQL databases like Cassandra are built to run on inexpensive commodity hardware, whether on premises or with a cloud provider like AWS.
Personnel costs were covered in the previous section and won’t be an issue with NoSQL databases, except perhaps for finding an experienced pro to sign on. Even that is getting easier: solutions like Cassandra have been in production for quite some time now, so more IT personnel are becoming schooled in the technology. Plus, training classes like those we offer at DataStax usually run at maximum capacity, so we’re doing our part to make sure there are plenty of skilled people in the market.
A couple of years ago, Oracle put out a white paper whose premise was that NoSQL was mostly hype and that the technology shouldn’t be trusted for serious enterprise applications.
Four months after that paper appeared, Oracle released its own NoSQL database.
At the time, graphics Oracle used in its white papers and presentations gave the impression that the company might be using NoSQL as no more than a funnel to eventually get customers to buy Exadata or something similar.
But regardless, let’s face it: Oracle didn’t get to be the behemoth it is today by being dumb; it recognizes what NoSQL brings to the table. Further, Oracle should get the credit it deserves. Much of its success is absolutely due to the fact that it produces and supports a world-class RDBMS that, in my opinion, sets the bar for relational technology.
I agree with Andrew Oliver’s statement in his InfoWorld article: Oracle will remain a top pick for any business that (1) needs a serious RDBMS and (2) has the cash to pay for it.
But the many use cases brought about by today’s new data rules and requirements, the ones that steamroll Oracle and really any RDBMS, have convinced this hard-core Oracle guy that one size really doesn’t fit all, and that complementing or even replacing Oracle with a NoSQL engine like Cassandra is sometimes the right thing to do.