If last week began for you like it did for me, Monday morning was a parade of broken web pages and non-working apps. The root of the problem was that Amazon's premier AWS region had barfed all over itself and taken a lot of the internet down with it.

This was coming.

I called it a year ago, when Amazon started mandating a hard-line return to office (RTO). I connected the dots then, and now that picture is starting to become clear. Ooh, it's a picture of a server farm in flames.

Look. I'm totally speculating here, but as you'll read below, I'm no longer alone in this speculation. And Amazon wasn't alone in making the mistake of allegedly using an RTO mandate as the bluntest kind of employee weeding-out mechanism. A ton of companies did the same thing, from Fortune 500 megacaps to startups and SMBs. A ton of those companies are now paying the price for it.

There's no more painful example of that RTO mandate payback than when your company takes down a ton of other people's businesses — then finds itself frozen in place for far too long before being able to determine cause and correction, because no one in the room could figure it out.

Again, I'm only connecting dots, not taking a victory lap.

Although I think I could take that victory lap. Because it's becoming clear exactly why something like this was going to happen. And it's clear that it's going to happen again.

OK, it's a victory lap, but I also have a fix.

Amazon's Big RTO Mistake

A little over a year ago, I wrote about Amazon kicking off a much more hard-line wave of return-to-office mandates. In that article, I pointed to five reasons why their love-it-or-leave-us approach was a major mistake:

  1. Even before Amazon's RTO announcement, their competitors' recruiters were already using the forthcoming announcement to cherry-pick Amazon's most experienced talent.
  2. Amazon would effectively shrink their pool for recruiting new talent by up to 90 percent. This meant the senior talent available outside of Amazon's geography would be more likely to land somewhere else than to relocate.
  3. The morale hit that their workforce would take would impact their most senior talent the hardest, and those are the folks with the option to repair their morale by going somewhere else.
  4. The productivity hit they would take by adding commutes and removing focus time would also impact their most senior people the hardest, as their time was logically the most valuable to the company.
  5. The excuses spun up to cover the more obvious reason for RTO — Amazon's sunk investments in additional corporate HQs — would be most apparent to those folks who had been at Amazon when those HQ investments were made.

You don't have to be a distinguished engineer to see the common thread here. If a company like Amazon wanted to smack its most senior and experienced resources in the back of the head five times, its 2024 RTO mandate was the most efficient way to do it.

The Tech Industry Is Leaking Senior Experience

It's not just Amazon. And it's not necessarily just an RTO mandate that will push experienced talent out the door. It's actually the use of RTO mandates without exception, which could easily be interpreted as a "shoot first and separate later" way to trim an overhired workforce, that turns an RTO mandate into a self-inflicted wound.

It happened the same way with overarching AI adoption mandates, cut-at-all-costs calls for profitability, blanket adoption of tired product development practices — basically it seemed like every trendy corporate organizational move since 2022 was invoked to treat all talent as equal and equally expendable.

I came up through the industry as a developer. One thing I learned very early on in my corporate executive journey is that if you don't treat more talented resources with the respect their talent demands, you'll be left with the talent you deserve.

That statement irritates a lot of modern non-tech corporate executives to no end, until they discover that all their experienced employees got tired of their flat-org-for-thee-but-not-for-me bullshit, and they realize they fired all their junior employees to make room for AI productivity theater.

Now these executive leaders are left with managers who follow outdated methodologies, leading shell-shocked junior employees who are just fighting to stay on payroll, because without the job, they can't afford to live where the company HQ is.

Did this happen to Amazon? I don't know. But The Register seems to think it's a possibility.

A Strong Case for AWS Brain Drain as the Root Cause

As I was searching for why the hell it took so long for AWS to fix the glitch, I found this column in The Register by Corey Quinn, a "Chief Cloud Economist" no less, who took a stab at connecting even more dots. DNS dots.

"And so, a quiet suspicion starts to circulate: where have the senior AWS engineers who've been to this dance before gone? And the answer increasingly is that they've left the building — taking decades of hard-won institutional knowledge about how AWS's systems work at scale right along with them."

It took 75 minutes, which might as well be 75 days in low-level back-end time, just to figure out what was going on. Corey speculates why:

"When that tribal knowledge [experience with 'wonky' DNS issues] departs, you're left having to reinvent an awful lot of in-house expertise that didn't want to participate in your RTO games, or play Layoff Roulette yet again this cycle. This doesn't impact your service reliability — until one day it very much does, in spectacular fashion. I suspect that day is today."

Gangster.

Then, knowing the punches are coming, Corey pre-emptively lists "27,000+ Amazonians impacted by layoffs between 2022 and 2024, continuing into 2025," and Amazon suffering "from 69 percent to 81 percent regretted attrition" and "anecdata of senior Amazonians lamenting the hamfisted approach of their Return to Office initiative."

I'll break here to add that Amazon announced another 14K to 30K job cuts two days ago, and I'll have a take on that shortly.

Corey calls this a tipping point. I love a good tipping point. But I think this is just the first domino to fall. Because…

Brain Drain Caused the Problem, But Cost Cutting Made It Worse

My websites stayed up. My apps continued to work.

For me, the damage was limited to websites and platforms that I use as an extension of what I do, not my primary business. And every time I got stopped from doing what I do, reading error messages coming unfiltered directly from AWS, the only thing I could think of was:

"Where is your failover?"

In other words, why were these websites and app platforms unprepared for their lifeblood provider — hosting and processing — to go down in flames?

But my question was rhetorical. I didn't need an answer because I happen to know something about almost all the companies that provide the platforms that help me do what I do. And I can again speculate, comfortably, that every single one of them cut their costs to the bone, including cutting their own experienced talent.

Which made this quote from Corey hit home: "I want to be very clear on one last point. This isn't about the technology being old. It's about the people maintaining it being new. If I had to guess what happens next, the market will forgive AWS this time, but the pattern will continue."

This is exactly what I said a year ago. The pattern continues. And it will continue to get more painful until we bring experience back to the table and give it the respect it deserves.
