๐จ The Most Dangerous Line of Code in Distributed Systems
@Retryable(maxAttempts = 3)
public PaymentResponse charge(PaymentRequest request) {
return paymentClient.charge(request);
}Looks responsible. Feels resilient. Passes code review.
And yetโฆ
This single annotation has caused:
- Duplicate payments
- Inventory corruption
- Cascading outages
- Customer trust loss
Retries don't fix failures. They repeat them.
๐ง The False Promise of Retries
The assumption:
"If it failed once, it might work next time."
The reality:
- What if the operation already succeeded?
- What if the failure is permanent?
- What if everyone retries at the same time?
Retries amplify uncertainty.
๐ฅ Bug #1: The Duplicate Side Effect Problem
Spring Boot Example
@PostMapping("/pay")
public void pay(@RequestBody PaymentRequest request) {
paymentService.charge(request);
}The request times out. The client retries.
But the database already committed.
Now:
- Card charged twice
- Order duplicated
- Support ticket created
The retry didn't fail.
It worked twice.
๐ฃ Bug #2: Retrying Makes Outages Worse
@Retryable(
maxAttempts = 5,
backoff = @Backoff(delay = 100)
)
public Inventory reserve(Long productId) {
return inventoryClient.reserve(productId);
}When inventory is down:
- Every request retries
- Thread pools fill
- Connection pools exhaust
- Latency explodes
You didn't add resilience.
You added pressure.
โ ๏ธ Bug #3: Retries Hide the Root Cause
QA:
- Network is stable
- Retry succeeds
- Bug disappears
Production:
- Network flaps
- Retry storms
- System collapses
Retries mask failures until traffic exposes them.
๐งจ Bug #4: Retries Break Ordering Guarantees
retry(() -> eventPublisher.publish(event));Retries don't preserve:
- Event order
- Timing guarantees
- Business sequence
Now your system says:
"Order shipped before payment completed"
No exception thrown. Just broken logic.
๐ Why Retries Pass QA but Fail Production
QA:
- Low traffic
- Fast services
- No contention
Production:
- High concurrency
- Partial failures
- Slow downstreams
Retries multiply load exactly when systems are weakest.
โ What Smart Spring Boot Teams Do Instead
๐น 1. Make Operations Idempotent
@Transactional
public void charge(PaymentRequest request) {
if (paymentRepository.existsByRequestId(request.getId())) {
return;
}
processPayment(request);
}Now retries are safe.
Without idempotency, retries are dangerous.
๐น 2. Fail Fast, Don't Retry Blindly
@Bean
public RestTemplate restTemplate() {
return new RestTemplateBuilder()
.setConnectTimeout(Duration.ofSeconds(2))
.setReadTimeout(Duration.ofSeconds(2))
.build();
}Fast failure > slow chaos.
๐น 3. Use Circuit Breakers, Not Just Retries
@CircuitBreaker(name = "inventory")
public Inventory reserve(Long productId) {
return inventoryClient.reserve(productId);
}Circuit breakers:
- Protect your app
- Protect downstream services
- Protect users
Retries protect nothing by default.
๐ง The Hard Truth About Retries
Retries without idempotency are data corruption with good intentions.
๐ Final Thought (This Is the Lesson)
Retries feel like safety nets.
But in distributed systems:
- They increase uncertainty
- They amplify failure
- They hide design flaws
Retries should be:
- Rare
- Controlled
- Intentional
Never automatic.