The Azure Failure Snowball

March 4

It must be tough to be a big business. They’ve got an awful lot to take care of. Everything from patent infringements to product development, support, and company softball leagues needs to be addressed. Yet it’s hard to feel bad for them when they do something really stupid.

In an ironic twist, three days after being declared the number one cloud provider in the world for its speed, availability, and uptime, and because it had never registered an error, public cloud storage service Microsoft Azure crashed and shut down service to thousands of users worldwide.

Leave it to clouds to ruin a sunny day.

But what would cause such a large outage? Surely it must be a massive hardware failure? A cyber-attack? Aliens? Actually, just like the recent outage of cloud competitor Amazon Web Services, Windows Azure crashed because of simple human error.

Microsoft reported on the Windows Azure blog that the disruption was caused by an expired secure sockets layer (SSL) certificate. The shutdown affected a number of Azure services that are dependent on storage, which meant that users of 52 different Microsoft services, everything from email clients to Xbox Live, were without service for several hours. General manager of Windows Azure business operations Steve Martin (no relation to the Grammy-winning banjo player) has issued a statement saying that because of the scope of the outage, Microsoft will be providing credits to impacted customers in accordance with their SLA.

It’s not clear exactly how many users were affected, but crediting them will be an awful expense. As always, it pays to think about the little things that can go wrong, because one little thing can light a firecracker fuse of issues.
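For the curious, here is what keeping an eye on one of those little things might look like. This is only a minimal sketch in Python, not a description of how Microsoft or anyone else monitors certificates. The function names and the 30-day warning window are assumptions made up for illustration; the only library calls used are from Python's standard `ssl`, `socket`, and `datetime` modules.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's notAfter string.

    The string looks like 'Jan 31 00:00:00 2030 GMT'. The default context
    verifies the certificate, so a certificate that has ALREADY expired makes
    the handshake raise ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter time.

    The result is negative once the certificate has expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(days_left: int, warn_days: int = 30) -> bool:
    """Hypothetical policy: start nagging once 30 days or fewer remain."""
    return days_left <= warn_days
```

Run on a schedule, something like `needs_renewal(days_until_expiry(fetch_not_after("example.com"), datetime.now(timezone.utc)))` would flag a certificate while there is still time to renew it. The hostname is a placeholder, and the expiry math is kept separate from the network call so it can be checked without touching a live server.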

Because of Microsoft’s one little oversight, thousands of users were without service for as long as twenty-four hours. The effect on Azure customers even trickled down to their customers. For example, StorageCraft was unable to send an important customer communication through our Azure-hosted email client. Microsoft’s failure seeped all the way down to us, even though we don’t use Azure directly.

(Luckily for us, as a backup and disaster recovery company, we were ready to send the communication via alternate, dare I say “backup,” methods, and it went out as planned.)

There are countless more little things than big ones that can go wrong, and forgetting the ones that need handling can cost money. Think of the cost of the ticket you’d get if you forgot to register your car, or the emotional price you’ll pay if you forget to register for our content marketing webinar on March 12. Not thinking ahead can really take a toll.

The takeaway here is:

  1. Remember the little things.
  2. Register for the content marketing webinar on March 12.

Photo Credit: redjar via Compfight cc