The Eight Fallacies of Distributed Systems, Revisited for 2025
Peter Deutsch's 1994 list of distributed systems fallacies remains relevant but incomplete. A new analysis adds three more fallacies specific to cloud-native and microservices architectures that systems engineers routinely encounter in modern distributed system design.
Distributed Systems Fallacies for the Cloud Era
Peter Deutsch's eight fallacies of distributed systems have been a staple of systems engineering curricula since the 1990s (Deutsch listed the first seven in 1994; James Gosling later added the eighth): the network is reliable; latency is zero; bandwidth is infinite; the network is secure; topology doesn't change; there is one administrator; transport cost is zero; the network is homogeneous.
These remain valid. But modern cloud-native and microservices architectures have introduced failure patterns that Deutsch's list, formulated in an era of enterprise intranets, does not capture.
Fallacy 9: Service Contracts Are Stable
In a microservices architecture with independent team ownership, each service's API is a contract. The fallacy is assuming that the consuming team and the providing team share an understanding of what that contract means — that schema compatibility, semantic compatibility, and performance expectations are all captured and stable.
In practice: schema evolution breaks consumers. Performance characteristics change with load. Implicit ordering guarantees are violated when the providing service scales horizontally. The remedy is explicit contract testing (consumer-driven contracts via tools like Pact), API versioning policies, and SLA documentation that includes performance bounds, not just functional specifications.
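The core idea of a consumer-driven contract can be sketched in a few lines, independent of any particular tool. In this hypothetical Python check (field names and the `satisfies_contract` helper are invented for illustration; Pact's actual API is not shown), the consuming team declares exactly the fields and types it depends on, and the provider's response is validated against that declaration:

```python
# A minimal consumer-side contract check (illustrative sketch, not Pact).
# The consumer declares only the fields it relies on; extra provider fields
# are allowed, because additive schema evolution is backward-compatible,
# while removals and type changes are not.

EXPECTED_CONTRACT = {
    "order_id": str,
    "status": str,
    "total_cents": int,
}

def satisfies_contract(response: dict, contract: dict) -> list:
    """Return a list of violations; an empty list means the contract holds."""
    violations = []
    for field, expected_type in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            violations.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return violations

# A compatible response passes; one that renames total_cents fails:
ok = satisfies_contract(
    {"order_id": "o-1", "status": "shipped", "total_cents": 1299},
    EXPECTED_CONTRACT,
)
broken = satisfies_contract(
    {"order_id": "o-1", "status": "shipped", "total": 12.99},
    EXPECTED_CONTRACT,
)
print(ok)      # []
print(broken)  # ['missing field: total_cents']
```

Running such checks in the provider's CI turns "we didn't know anyone used that field" from a production incident into a failed build.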
Fallacy 10: Retry Is Free
The knee-jerk response to transient failures in distributed systems is retry. Retry logic is often added without analysis of what happens when many services retry simultaneously against a failing dependency. The result is retry storms that prevent the failing dependency from recovering — a pattern that converted a transient fault into an extended outage in several high-profile incidents.
The correct pattern: exponential backoff with jitter; circuit breakers that stop retrying after a failure threshold; and load shedding at the dependency boundary. These are systems-level design requirements, not implementation details.
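The two patterns above can be sketched briefly. This Python example (the `CircuitBreaker` class, its thresholds, and the "full jitter" variant of backoff are illustrative choices, not from any particular library) shows why jitter matters: without it, clients that failed together retry together, re-creating the original load spike:

```python
import random
import time

class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the failing dependency."""

class CircuitBreaker:
    """Stop calling a dependency after `threshold` consecutive failures,
    then allow one trial call after `cooldown` seconds have elapsed."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; dependency not called")
            self.opened_at = None  # half-open: permit one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result

def backoff_with_jitter(attempt, base=0.1, cap=10.0):
    """Exponential backoff with 'full jitter': sleep a random duration up to
    the exponential bound, so simultaneous retriers spread out in time
    instead of hammering the dependency in synchronized waves."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would wrap each dependency invocation in `breaker.call(...)` and sleep `backoff_with_jitter(attempt)` between attempts; when the breaker is open, the request fails fast instead of adding load to the struggling dependency.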
Fallacy 11: Observability Is an Operational Concern
The fallacy is treating observability (metrics, logs, traces) as something to add during operations, or to bolt onto an existing system when diagnosing incidents. In practice, systems that were not designed for observability are opaque when they fail, and the missing information is precisely what would allow diagnosis.
Observability is a design requirement: what state does each component need to expose to allow diagnosis of failures? What does a healthy component look like in telemetry, and what deviations indicate what failure modes? Answering these questions at design time is dramatically more tractable than reverse-engineering them from a failing production system.
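One way to make this concrete at design time is to give each component an explicit health snapshot with a documented failure taxonomy, rather than raw counters alone. A hypothetical Python sketch (the `QueueWorker` class, its state names, and its thresholds are invented for illustration):

```python
import time

class QueueWorker:
    """A worker designed for observability: it exposes enough internal state
    to distinguish 'healthy', 'backlogged', and 'stuck' from telemetry alone,
    answering at design time the question 'what does failure look like?'"""

    def __init__(self, max_healthy_depth=100, max_idle_seconds=60.0):
        self.processed = 0
        self.failed = 0
        self.queue_depth = 0
        self.last_success_at = time.monotonic()
        self.max_healthy_depth = max_healthy_depth
        self.max_idle_seconds = max_idle_seconds

    def record_success(self):
        self.processed += 1
        self.queue_depth = max(0, self.queue_depth - 1)
        self.last_success_at = time.monotonic()

    def record_failure(self):
        self.failed += 1

    def health(self) -> dict:
        """Telemetry snapshot with an explicit, documented failure taxonomy."""
        idle = time.monotonic() - self.last_success_at
        if idle > self.max_idle_seconds and self.queue_depth > 0:
            status = "stuck"        # work is waiting but nothing completes
        elif self.queue_depth > self.max_healthy_depth:
            status = "backlogged"   # completing work, but falling behind
        else:
            status = "healthy"
        return {
            "status": status,
            "processed": self.processed,
            "failed": self.failed,
            "queue_depth": self.queue_depth,
            "seconds_since_last_success": round(idle, 3),
        }
```

The design choice is that the mapping from telemetry to failure mode is written down in code, reviewed with the component, and exportable to any metrics backend, instead of being reconstructed from logs during an incident.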