InfrastructureSource: githubstatus.comMay 27, 2026

Internal Analytics Overload Triggers CPU Saturation and Cascading Failures Across GitHub Git Infrastructure

On May 27, 2026, a 69-minute service degradation on GitHub impacted Git operations, Pull Requests, Issues, and GraphQL APIs. The incident stemmed from an internal analytics component that saturated host CPUs, causing elevated error rates for write operations on underlying Git file servers.

Incident Timeline and System Impact

On May 27, 2026, between 12:07 UTC and 13:16 UTC, GitHub experienced a service degradation affecting core components including Git operations, Pull Requests, Issues, and the GraphQL API. The outage degraded operations dependent on GitHub's underlying Git file servers, leading to elevated transaction failure rates.

While read-heavy operations survived the degradation—with zero reported failures for fetches or clones—write operations experienced measurable transaction loss. Specifically, 3.5% of HTTPS pushes failed during the incident window, while SSH-based pushes experienced a lower failure rate of 0.2%.

Root Cause Analysis and Cascading Failures

The degradation originated within an internal analytics subsystem. This component generated an unexpectedly high processing load, resulting in severe CPU saturation across the underlying compute infrastructure supporting GitHub's Git file servers.

Because Git operations depend directly on these file servers, the CPU starvation triggered cascading latency and timeouts. This degradation propagated upward through the service stack, impacting high-level application features such as Pull Requests, Issues, and API requests that query repository state. GitHub's engineering team detected the degradation and initiated an investigation at 12:10 UTC, confirming continued degradation at 12:54 UTC as they traced the load.

Mitigation and Remediation Engineering

Systems engineers mitigated the issue by identifying and stopping the offending internal analytics component. Following the termination of this process, CPU utilization normalized, and dependent services began recovery. Full service restoration was validated at 13:16 UTC.

Implementing strict resource limits to bound the maximum CPU and memory consumption of auxiliary and non-critical services.
Integrating dedicated kill switches to quickly isolate and disable analytical workloads without affecting production traffic paths.

Read the original article at githubstatus.com.