The Uptime Institute recently released a report on downtime trends that makes unsettling reading for any business that relies heavily on digital channels to interact with its customers (and these days that’s pretty much every company). The report, which Uptime says is the most comprehensive study on downtime ever performed across all industries, contains these sobering conclusions:
- IT service and data center outages around the world are common, which suggests that most SLAs are often broken, and outages may actually be increasing.
- Almost one third (31%) of those responding had experienced an IT downtime incident or severe degradation of service in the past year – an increase of 6% from the prior year. Moreover, about half (48%) said they had experienced at least one outage in the past three years either at one of their own sites or that of a service provider.
- Failures at third-party cloud, colocation, and hosting providers, when aggregated, are now the second most commonly cited reason for IT service failure (and you thought the cloud kept you safe!).
- The biggest cause of IT service outage is a data center power outage, closely followed by network problems, followed by an IT system failure.
- A number of survey respondents reported an outage that had cost over $1 million. One outage cost over $50 million.
- Eighty percent of Uptime survey respondents say that their most recent service outage could have been prevented.
- Many organizations have little understanding of the likely financial and overall business impact of particular IT service failures, nor have they carefully assessed the particular risks they face.
As Uptime’s data shows, not only are outages on the rise, but the consequences of failure can be more expensive and damaging than ever. These facts fly in the face of the 99.99% uptime guarantees made by various IT and cloud providers, leading the 451 Group to issue its own report titled “Is 99.99 an industry myth?”
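To put a 99.99% guarantee in perspective, it helps to convert the percentage into an annual downtime allowance. The short Python sketch below does that arithmetic. The function name and the list of targets are illustrative and do not come from the Uptime or 451 Group reports, and the calculation assumes a 365-day year.

```python
# Minutes in a non-leap year (assumption: 365 days).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year that an availability target allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Illustrative targets only; real SLAs define their own measurement windows.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.99% target leaves a little under an hour of downtime per year, so a single multi-hour outage is enough to break it.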
But the Cloud Will Save Me…
While many today think that simply signing on to a cloud provider will immediately cure the downtime virus, Uptime’s study showed a different reality. In fact, Uptime classified downtime experienced at third-party service providers (colocation, hosting, or cloud) as “a new factor reducing IT service availability.” Uptime found that cloud outages accounted for almost a third of all downtime, only slightly less than on-site power failures at enterprise data centers. This led Uptime to say, “third-party failures have become a critical issue; in hybrid environments, CIOs need to be as mindful of their data center suppliers as they are of their data center operations.”
The Strategy for Stopping Downtime
The only way to completely prevent downtime is to use a multi-home strategy, which Google outlined in a 2015 tech paper. Uptime’s report highlighted this approach and referred to it as “Distributed Resiliency”.
In its paper, Google is careful to distinguish this design from legacy failover-based approaches:
“Failover-based approaches, however, do not truly achieve high availability, and can have excessive cost due to the deployment of standby resources. Our teams have had several bad experiences dealing with failover-based systems in the past. Since unplanned outages are rare, failover procedures were often added as an afterthought, not automated and not well tested. On multiple occasions, teams spent days recovering from an outage, bringing systems back online component by component. . .tuning the system as it tried to catch up processing the backlog starting from the initial outage. These situations not only cause extended unavailability, but are also extremely stressful for the teams running complex mission-critical systems.”
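A rough way to see why serving from several sites at once helps is to estimate how often every site would be down at the same moment. The Python sketch below is a simplified model and is not taken from Google's paper or Uptime's report. It assumes site failures are statistically independent and that any single surviving site can carry the full load. Real deployments have correlated failures (shared networks, software bugs, operator error), so treat the result as an optimistic upper bound. The function name and the sample figure of 99.9% per site are hypothetical.

```python
def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up.

    Assumption: failures are independent and any one site can serve all traffic.
    """
    if sites < 1:
        raise ValueError("sites must be at least 1")
    all_sites_down = (1 - site_availability) ** sites
    return 1 - all_sites_down


if __name__ == "__main__":
    per_site = 0.999  # hypothetical availability of a single site
    for count in (1, 2, 3):
        print(f"{count} site(s): {combined_availability(per_site, count):.9f}")
```

The model only holds if every site is actively able to serve requests, which is the distinction the quote draws against standby resources that are rarely exercised.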
For a multi-home strategy to work, the primary component in the application stack, the database platform, must be multi-home as well. Yet making this happen with legacy database software, or with more recent DBMSs that are based on the same older architecture, is next to impossible.
But, as I highlighted in a recent blog post, there are companies using DataStax software that have not experienced a single second of downtime in over six years with their DBMS platform. The reason is that DataStax Enterprise (DSE) is multi-home at its very foundation. In fact, DSE is the only masterless architecture available from any mainstream database vendor today.
DSE’s masterless architecture provides the promise of true zero downtime at the database platform level as well as the ultimate in deployment freedom. Whether it’s multiple on-premises data centers, multiple regions with a single cloud provider, multiple cloud providers, or hybrid cloud, DSE enables real data autonomy and 100% continuous uptime. Plus, DSE’s OpsCenter makes it easy to provision, upgrade, and monitor a single cluster that spans multiple locations and clouds.