What Happened to AWS on Monday

Day 22/90

Stevie Emmit

October 22, 2025

What Happened to AWS Yesterday

On the 20th of October 2025, at about 07:11 GMT, Amazon Web Services (AWS) experienced a major outage that disrupted countless websites and apps around the world.

The incident originated in AWS's US-EAST-1 region (Northern Virginia, USA), one of its largest and most critical. The root cause was traced to a DNS resolution issue for the API endpoint of the DynamoDB service in that region. DNS is the system that translates human-readable domain names into the numerical IP addresses computers use to locate servers.
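To make the DNS step concrete, here is a minimal sketch of a name lookup using Python's standard library. The helper name `resolve` is an illustration only, not anything AWS uses; it simply asks the system resolver for the addresses behind a hostname and returns an empty list when the lookup fails.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses the system resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The lookup failed, so the caller has no address to connect to.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("dynamodb.us-east-1.amazonaws.com")` on a working network should return one or more IP addresses for the regional DynamoDB endpoint. When resolution breaks, a client gets nothing back and cannot reach the service, even if the servers behind the name are healthy.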

Because DynamoDB (a core database service) was impacted, numerous AWS services and customer applications that depend on it went down or experienced severe slowdowns.
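The way one failing service drags down everything built on top of it can be sketched as a walk over a dependency graph. The service names in `DEPENDS_ON` below are hypothetical and do not describe AWS's real internal dependencies; the point is only that anything depending on the failed service, directly or indirectly, is affected.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the services in its list.
DEPENDS_ON = {
    "dynamodb": [],
    "session-store": ["dynamodb"],
    "checkout-api": ["session-store"],
    "mobile-app": ["checkout-api"],
    "static-site": [],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed service.
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted
```

In this toy graph, `impacted_by("dynamodb", DEPENDS_ON)` returns the session store, the checkout API, and the mobile app, while the static site is untouched because nothing links it to the database.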

How Long Did It Last, and Was It Resolved?

AWS reported that it applied initial mitigations early in the incident and that by the afternoon of that day (US time) all of its services had returned to normal operations.

However, even after the return to "normal operations" there were still backlogs of internal processing to work through, and some smaller issues lingered for a while.

Who and What Was Impacted?

Many popular apps and services were affected, including social (e.g., Snapchat), gaming (e.g., Fortnite, Roblox), financial (e.g., Coinbase, Venmo), and streaming platforms.

Even internal AWS-backed services (including Amazon's own apps/devices) were disrupted because they rely on the same infrastructure.

Because AWS is a backbone for so many services, the outage had a cascading effect well beyond AWS itself.

How Did Amazon Respond?

AWS acknowledged the outage and said engineers were "immediately engaged" to fix the problem.

AWS said it worked on "multiple parallel paths to accelerate recovery". It also reported that the main issue had been fully resolved, though some users continued to face minor delays as systems recovered.

The company also said it would publish a detailed post-event summary explaining what happened.

Why Does This Matter?

The outage shows how much modern digital services depend on a small set of cloud providers and key data centers. A failure in one region can ripple globally.

For businesses, if your service runs on AWS (especially in US-EAST-1) or depends on services like DynamoDB or other AWS APIs, you may need to review your redundancy, failover, and regional backups.

For users, even if a particular app isn't obviously reliant on AWS, many services sit on AWS, so a problem there can show up in unexpected ways.

Why Diversification is Non-negotiable in the Cloud Landscape

The three major players, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud (GCP), together control around 63% of the global cloud infrastructure services market as of early to mid-2025.

Individual Market Shares

· AWS: about 29–30% in Q1/Q2 2025

· Azure: around 22–23% in Q1 2025

· Google Cloud: approximately 12–13% in the same period

· Smaller providers: for example, Alibaba Cloud holds roughly 4% globally in Q2 2025.

These figures make clear that the hyperscaler cloud market is highly concentrated, with a few large providers dominating global infrastructure.
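As a quick arithmetic check, the individual ranges quoted above can be added up. The snippet below uses only the figures from this article; the variable names are illustrative.

```python
# Share ranges (percent) as quoted in the list above: (low, high).
shares = {
    "AWS": (29, 30),
    "Azure": (22, 23),
    "Google Cloud": (12, 13),
}

low_total = sum(low for low, _ in shares.values())
high_total = sum(high for _, high in shares.values())

print(f"Combined share of the big three: {low_total}-{high_total}%")
# prints "Combined share of the big three: 63-66%"
```

The combined figure of around 63% therefore sits at the low end of the individual ranges, which is worth keeping in mind since the per-provider numbers come from slightly different quarters.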

Implications for Companies

Given the current market structure and growth trends, there are several key takeaways for organizations relying on cloud services:

1. Dependence risk: If most of your infrastructure or workloads sit with one provider or one region, you are exposed to vendor-specific or regional failures, like the AWS outage discussed earlier.

2. Failover and redundancy: To minimize exposure, companies should consider distributing workloads across multiple cloud providers (multi-cloud strategy) and/or across different regions within a single provider (multi-region setup).

3. Avoiding vendor lock-in: Relying solely on one cloud may seem simpler, but it increases switching costs, limits flexibility and can delay recovery when outages occur.

4. Cost vs. resilience trade-off: Multi-cloud or multi-region strategies can be more complex and costly, but they bring stronger resilience and reduce the risk of a full-system failure.

5. Strategic planning is key: Evaluate your critical workloads and ask: What happens if your provider's region goes down? Do you have backups, alternate regions, or another provider ready to take over?

6. Future-proofing: As cloud spending grows and AI-driven workloads surge, the ability to move seamlessly between multiple cloud ecosystems will be vital for operational continuity.
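The failover idea from the list above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production pattern: `call_with_failover` and `fake_read` are hypothetical names, and `fake_read` stands in for a real client call while simulating the primary region being unreachable.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    targets: Sequence[str],
    operation: Callable[[str], T],
) -> tuple[str, T]:
    """Try `operation` against each target in order; return the first success."""
    errors: list[str] = []
    for target in targets:
        try:
            return target, operation(target)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{target}: {exc}")
    raise RuntimeError("all targets failed: " + "; ".join(errors))


def fake_read(region: str) -> str:
    # Stand-in for a real client call; simulates the primary region being down.
    if region == "us-east-1":
        raise ConnectionError("endpoint unreachable")
    return f"data from {region}"


used_region, result = call_with_failover(["us-east-1", "us-west-2"], fake_read)
print(used_region, result)
# prints "us-west-2 data from us-west-2"
```

Routing requests to a second region only helps if the data and services are already there, so a real setup would also need replication and regular failover testing, which is where the cost and complexity trade-off mentioned above comes in.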

