Good intentions, bad outcomes

A podcast about challenges and practices you might encounter in the workplace: things that were well intended but have outcomes that aren't so great. In most cases, the organizations aren't even aware of how bad the outcomes are.

In every episode, we discuss a situation that has something wrong with it: the what, the why, and what can be done to address it.

The AWS Outage That Broke Half the Internet: Your Cloud Isn't as Safe as You Think

October 21, 2025 • Xodiac • Season 1 • Episode 16

Ever moved to the cloud thinking you'd finally eliminate those dreaded outages? In this episode, Gino Marckx and Wayne Hetherington break down what happened when AWS went down and took half the world's services with it.

The intent behind cloud migration is solid. Move off your own hardware, get better reliability, scale as needed, and never worry about infrastructure again. The cloud provider handles redundancy, right? Except when AWS goes down, so does everything running on it. You've just traded one single point of failure for another.

We walk through why this keeps happening. Most organizations assume the cloud provider has built-in redundancy across regions and availability zones. And they do - within their own system. But if you're only on AWS, or only on Azure, or only on Google Cloud, you're still vulnerable when that one provider has issues.

The solution? Multi-cloud architecture. Spread your critical services across different providers. Yes, it costs more. Yes, it adds complexity. But if uptime actually matters for your business, it's the only real answer.
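To make the multi-cloud argument concrete, here is a minimal back-of-the-envelope sketch of how availability combines when a service can run on any one of several providers. It rests on two loud assumptions that are not from the episode: provider failures are statistically independent, and failover between providers always works. The 99.9% figures are illustrative, not any provider's actual SLA, and `combined_availability` is a hypothetical helper name.

```python
def combined_availability(*availabilities: float) -> float:
    """Chance that at least one provider is up.

    Assumes failures are independent and failover is perfect; real
    outages can be correlated, so treat the result as an upper bound.
    """
    if not availabilities:
        raise ValueError("need at least one provider")
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # Every provider must be down at once for the service to be down.
        all_down *= 1.0 - a
    return 1.0 - all_down


if __name__ == "__main__":
    single = combined_availability(0.999)        # one provider (illustrative)
    dual = combined_availability(0.999, 0.999)   # two independent providers
    print(f"single provider: {single:.6%}")
    print(f"two providers:   {dual:.6%}")
```

Under these assumptions, two providers at three nines each combine to roughly six nines. In practice the gain is smaller, because shared dependencies and imperfect failover break the independence assumption, which is part of why keeping environments consistent is its own challenge.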

We also talk about when it's okay to accept the risk. A pet grooming appointment booking site can probably survive a few hours of downtime per year. Medical services or air traffic control? That's a different calculation. It comes down to understanding how many nines of uptime you actually need and what you're willing to pay for it.
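The "nines" shorthand translates directly into a yearly downtime budget. The sketch below shows the arithmetic, assuming a 365-day year and treating "N nines" as an availability of 1 - 10^-N (so three nines is 99.9%). The function name `allowed_downtime_minutes` is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an 'N nines' uptime target."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)  # e.g. 3 nines -> 0.001
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = allowed_downtime_minutes(n)
        print(f"{n} nines: about {minutes:.1f} minutes per year")
```

By this arithmetic, three nines allows roughly 8.8 hours of downtime per year, which a booking site may tolerate, while five nines allows only a little over 5 minutes.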

Timestamps:

0:00 - Introduction

0:33 - AWS outage hits half the world

1:16 - Why organizations move to cloud in the first place

2:50 - The promise of always-available infrastructure

4:39 - So why did everything go down?

5:28 - You still have a single point of failure

6:25 - The assumption of built-in redundancy

7:21 - Building real backup plans across providers

8:40 - How unlikely is a multi-cloud failure?

9:41 - The challenge of keeping environments consistent

10:58 - Cost vs. redundancy: the eternal tradeoff