Why does scalability matter, and how does Cassandra scale?
date: October 30, 2010
The term scalability is inherently a bit amorphous and typically dependent on a specific use case. For the sake of this discussion, we'll define scalability as the ability to add computational resources to a database in order to gain more throughput. We'll look specifically at the two types of scalability available - vertical and horizontal - and provide a discussion of each in this context.
Vertical scalability involves moving from one machine to another that has more capacity - whether that's RAM, CPU, storage or some combination thereof. Though this may seem like the most direct approach, scaling a database vertically is expensive in both financial outlay and resource dedication.
In terms of financial outlay, larger, more robust hardware is expensive to acquire and operate. With large amounts of data in particular, some form of attached storage infrastructure becomes necessary, because fitting the required number of disks into a single chassis is impractical.
In terms of resource dedication, if there are any requirements for maintaining uptime, significant operational planning and effort are usually required to migrate to the new system. If the volume of data is large, then the physical transfer from the old system to the new can take an inordinate amount of time depending on the load.
In contrast, a system is horizontally scalable if hardware can be added incrementally. If you need more capacity, you add more hardware. In an ideal horizontally scalable system, adding hardware should increase available capacity linearly, without requiring reconfiguration or downtime of existing nodes.
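To make the linear-scaling ideal concrete, here is a small worked example in Python. It assumes a perfectly linear model in which every node contributes the same throughput; the function name and the throughput figures are hypothetical and chosen only for illustration, and real clusters will typically fall somewhat short of this ideal.

```python
import math


def nodes_needed(target_ops_per_sec: float, ops_per_node: float) -> int:
    """Nodes required under an idealized, perfectly linear scaling model."""
    if ops_per_node <= 0:
        raise ValueError("ops_per_node must be positive")
    return math.ceil(target_ops_per_sec / ops_per_node)


# Hypothetical figures, for illustration only.
current = nodes_needed(24_000, 8_000)
future = nodes_needed(50_000, 8_000)
print(f"add {future - current} nodes to grow from {current} to {future}")
```

Under this model, capacity planning reduces to a division: growth in demand translates directly into a count of machines to add, rather than a migration to a larger machine.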
Apache Cassandra meets the requirements of an ideal horizontally scalable system by allowing for seamless addition of nodes. As you need more capacity, you add nodes to the cluster and the cluster will utilize the new resources automatically.
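One way to picture how a cluster can absorb a new node without reconfiguring the existing ones is a consistent-hashing ring. The sketch below is a toy model of that general technique, not Cassandra's implementation: the `Ring` class, the node names, and the `vnodes` parameter are all assumptions made for this example, and Cassandra's own partitioning and token management differ in detail.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used here only for even spread)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: a key belongs to the first node token at or after its own."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []  # sorted ring positions
        self._owners = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node takes several ring positions so load spreads more evenly.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owners[t] = node

    def owner(self, key: str) -> str:
        idx = bisect.bisect_left(self._tokens, token(key))
        # Wrap around to the first token when the key hashes past the last one.
        return self._owners[self._tokens[idx % len(self._tokens)]]


keys = [f"user:{i}" for i in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # existing nodes are left untouched
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved to the new node")
```

In this model, only the keys that now fall into the new node's ranges change owner, and every one of them moves to the new node. The rest of the data stays where it was, which is what lets capacity grow incrementally.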
This level of scaling flexibility lends itself to efficient deployment on either commodity hardware or cloud-based infrastructure. In the cloud, you can scale your Cassandra cluster on the fly as you need to, using services such as Rackspace's Cloud Servers and Amazon's EC2. As your demands grow, simply launch new instances to increase the size of your Cassandra cluster.