A Postmortem of Three Recent Issues

Published: Sep 17, 2025 Author: Written by Sam McAllister, with thanks to Stuart Ritchie, Jonathan Gray, Kashyap Murali, Brennan Saeta, Oliver Rausch, Alex Palcuie, and many others. Source: Anthropic Engineering Blog

Overview

This technical report covers three infrastructure bugs that intermittently degraded Claude's responses between August and early September 2025. Anthropic states they "never reduce model quality due to demand, time of day, or server load" — the issues were infrastructure bugs alone.

How Claude Is Served

Claude is served via the first-party API, Amazon Bedrock, and Google Cloud's Vertex AI across multiple hardware platforms: AWS Trainium, NVIDIA GPUs, and Google TPUs. Each platform requires specific optimizations, and infrastructure changes require validation across all platforms and configurations.

Timeline of Events

User reports of degraded responses began in early August. Initially these were hard to distinguish from normal variation. By late August, increasing frequency prompted an investigation that uncovered three separate bugs. The overlapping nature of the bugs made diagnosis particularly challenging.

August 5: First bug introduced, affecting ~0.8% of Sonnet 4 requests
August 25-26: Two more bugs arose from deployments
August 29: A load balancing change increased affected traffic, causing broader user impact
August 31: Worst impacted hour — 16% of Sonnet 4 requests affected

The Three Bugs

1. Context Window Routing Error

Some Sonnet 4 requests were misrouted to servers configured for the upcoming 1M token context window. The bug started at 0.8% of requests on August 5. A routine load balancing change on August 29 unintentionally increased misrouted traffic. At peak, 16% of Sonnet 4 requests were affected. About 30% of Claude Code users had at least one message routed incorrectly during this period. Routing was "sticky," meaning once a request hit the wrong server, follow-ups likely did too.

Fix deployed: September 4
Rollout completed: September 16 (first-party + Vertex AI), September 18 (AWS Bedrock)

2. Output Corruption

A misconfiguration deployed to Claude API TPU servers on August 25 caused errors during token generation. A runtime performance optimization occasionally assigned high probability to tokens that should rarely appear — for example, Thai or Chinese characters appearing in English responses, or obvious syntax errors in code. This affected Opus 4.1 and Opus 4 (August 25-28) and Sonnet 4 (August 25–September 2). Third-party platforms were unaffected.

Fix: Rolled back on September 2. Detection tests for unexpected character outputs were added to the deployment process.

3. Approximate Top-K XLA:TPU Miscompilation

Code deployed August 25 to improve token selection inadvertently triggered a latent bug in the XLA:TPU compiler, confirmed to affect Haiku 3.5 and believed to potentially impact a subset of Sonnet 4 and Opus 3 requests. Third-party platforms were unaffected.

Fixes: Haiku 3.5 rolled back September 4; Opus 3 rolled back September 12; Sonnet 4 rolled back out of caution. Worked with the XLA:TPU team on a compiler fix and switched to exact top-k with enhanced precision.

Deep Dive: The XLA Compiler Bug

When Claude generates text, it calculates probabilities for next tokens and uses "top-p sampling" to avoid nonsensical outputs. On TPUs, models run across multiple chips, making probability sorting a distributed sort operation.

December 2024: A bug was discovered where the TPU implementation occasionally dropped the most probable token at temperature zero. A workaround was deployed.

Root cause: Mixed precision arithmetic. Models compute in bf16 (16-bit), but the TPU vector processor is fp32-native, so the XLA compiler can optimize by converting some operations to fp32. This caused a mismatch where operations disagreeing on the highest probability token, sometimes dropping it entirely from consideration.

August 26: A sampling code rewrite was deployed to fix precision issues and handle tokens near the top-p threshold. This fix removed the December workaround, believing the root cause was solved. However, this exposed a deeper bug in the approximate top-k operation — a performance optimization that quickly finds highest-probability tokens. The approximation sometimes returned completely wrong results, but only for certain batch sizes and model configurations. The December workaround had been inadvertently masking this problem.

The bug's behavior was inconsistent, changing depending on unrelated factors like surrounding operations or whether debugging tools were enabled.

Ultimately, Anthropic discovered that exact top-k no longer had the prohibitive performance penalty it once did, so they switched from approximate to exact top-k and standardized additional operations on fp32 precision. They note the corrected implementation "may result in slight differences in the inclusion of tokens near the top-p threshold."

Why Detection Was Difficult

Existing evaluations didn't capture the degradation users reported, partly because Claude often recovers well from isolated mistakes
Internal privacy controls limited engineers' ability to examine problematic interactions not reported as feedback
Each bug produced different symptoms on different platforms at different rates, creating confusing reports
They "relied too heavily on noisy evaluations" and lacked a way to connect online reports to specific changes
The August 29 load balancing change wasn't immediately connected to the negative report spike

What They're Changing

More sensitive evaluations that can better differentiate working from broken implementations
Quality evaluations in more places — running continuously on true production systems
Faster debugging tooling to better debug community-sourced feedback without sacrificing user privacy

Users can provide feedback via the /bug command in Claude Code, the "thumbs down" button in Claude apps, or by emailing feedback@anthropic.com.

A Postmortem of Three Recent Issues ​

Overview ​

How Claude Is Served ​

Timeline of Events ​

The Three Bugs ​

1. Context Window Routing Error ​

2. Output Corruption ​

3. Approximate Top-K XLA:TPU Miscompilation ​

Deep Dive: The XLA Compiler Bug ​

Why Detection Was Difficult ​

What They're Changing ​