Modernizing Azure's Consensus Engine: Building a 300K Ops/Sec Multi-Paxos Implementation in Rust with AI Agents
Systems engineer Cheng Huang has rewritten Azure's decade-old Replicated State Library (RSL) in Rust, producing a multi-Paxos consensus engine whose throughput was tuned from 23K to 300K ops/sec. By combining AI-generated code contracts, property-based testing, and lightweight spec-driven development, the project shows how AI-assisted workflows can accelerate low-level distributed systems engineering without giving up correctness.
Modernizing Azure's Legacy Consensus Engine
Azure's Replicated State Library (RSL) implements the multi-Paxos consensus protocol and underpins critical replication infrastructure across cloud services. However, the original codebase lacks three capabilities that modern hardware rewards: pipelining, non-volatile memory (NVM) support, and RDMA awareness. Without pipelining, a new request has to wait while an earlier vote is still in flight, which raises latency under load. To address these limitations, systems engineer Cheng Huang rewrote the library from scratch as a high-throughput multi-Paxos consensus engine in Rust. Based on a simplified design by Jay Lorch of Microsoft Research, the new implementation currently addresses two of these three gaps.
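The effect of pipelining can be illustrated with a toy proposer that limits how many log slots may have votes outstanding at once. This is a minimal sketch under stated assumptions, not the RSL or the new engine's design: the `Proposer` type, the `max_in_flight` window, and the ack-counting scheme are all hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy proposer: assigns log slots and tracks votes in flight.
struct Proposer {
    next_slot: u64,
    /// Window size; 1 models "no pipelining".
    max_in_flight: usize,
    /// Acks needed before a slot counts as chosen.
    quorum: usize,
    /// slot -> number of acks received so far.
    in_flight: BTreeMap<u64, usize>,
    /// Slots that reached a quorum, in the order they were chosen.
    chosen: Vec<u64>,
}

impl Proposer {
    fn new(max_in_flight: usize, quorum: usize) -> Self {
        Proposer {
            next_slot: 0,
            max_in_flight,
            quorum,
            in_flight: BTreeMap::new(),
            chosen: Vec::new(),
        }
    }

    /// Start a vote for the next slot, or return None if the window is full
    /// and the request has to wait.
    fn propose(&mut self) -> Option<u64> {
        if self.in_flight.len() >= self.max_in_flight {
            return None;
        }
        let slot = self.next_slot;
        self.next_slot += 1;
        self.in_flight.insert(slot, 0);
        Some(slot)
    }

    /// Record one ack; the slot leaves the window once a quorum has acked.
    fn on_ack(&mut self, slot: u64) {
        if let Some(acks) = self.in_flight.get_mut(&slot) {
            *acks += 1;
            if *acks >= self.quorum {
                self.in_flight.remove(&slot);
                self.chosen.push(slot);
            }
        }
    }
}
```

With `max_in_flight` set to 1, a second `propose()` returns `None` until the first slot collects its quorum, which is the stall described above. A larger window lets several votes overlap, at the cost of tracking more state per proposer.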
High-Velocity Code Generation and Tooling Workflows
The project yielded over 130,000 lines of Rust code in roughly six weeks, covering core distributed state machine requirements including multi-Paxos, leader election, log replication, snapshotting, and dynamic configuration changes. Huang spent about four weeks on the core implementation and about three weeks on performance tuning. The tooling environment included GitHub Copilot, Claude Code, Codex, Augment Code, Kiro, and Trae, with Claude Code and Codex CLI serving as the primary drivers. To avoid hitting LLM rate limits during intensive coding sessions, the workflow paired an Anthropic maximum-tier subscription with two separate ChatGPT Plus subscriptions, each used for one half of the week.
Rigorous Correctness via AI-Generated Code Contracts
Ensuring distributed consensus safety required a multi-layered verification strategy comprising over 1,300 tests. The suite spans basic unit tests, isolated proposer-acceptor integration tests, and multi-replica cluster tests with simulated network and node failures. The primary mechanism for ensuring protocol safety was code contracts, which specify preconditions, postconditions, and invariants for a function. Generated by LLMs (Opus 4.1 and GPT-5 High), these contracts compile to runtime assertions during testing and are stripped from production builds.
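One way to get assertions that run under test but vanish from optimized builds in Rust is the standard `debug_assert!` macro, which is compiled out when debug assertions are disabled (the default for release builds). The source does not say which mechanism the project uses, so the following is only a sketch: the `Acceptor` type, its fields, and this single-slot `process_2a` body are hypothetical, and the three contracts shown are illustrative rather than any of the 16 the project generated.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Ballot(u64);

/// Hypothetical single-slot acceptor state.
struct Acceptor {
    promised: Ballot,
    /// (ballot, value) most recently accepted, if any.
    accepted: Option<(Ballot, u64)>,
}

impl Acceptor {
    /// Simplified phase 2a handler; returns true if the proposal is accepted.
    fn process_2a(&mut self, ballot: Ballot, value: u64) -> bool {
        let old_promised = self.promised;

        // Invariant on entry: an accepted ballot never exceeds the promise.
        debug_assert!(self.accepted.map_or(true, |(b, _)| b <= self.promised));

        let ok = ballot >= self.promised;
        if ok {
            self.promised = ballot;
            self.accepted = Some((ballot, value));
        }

        // Postcondition: the promised ballot never moves backwards.
        debug_assert!(self.promised >= old_promised);
        // Postcondition: on success, the stored state matches the message.
        debug_assert!(!ok || self.accepted == Some((ballot, value)));
        ok
    }
}
```

Because the checks sit next to the logic they guard, every unit, integration, and cluster test that reaches `process_2a` exercises the contracts for free, while a release build pays nothing for them.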
The contract pipeline has three phases: first, generating explicit contracts for critical functions, such as the 16 contracts written for the Paxos phase 2a message-handling method (`process_2a`); second, generating targeted test cases that exercise each postcondition; and third, translating the contracts into property-based tests. The randomized property-based tests uncovered a critical Paxos safety violation before deployment, preventing potential divergence of replicated state.
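To make the idea of a property-based safety test concrete, the sketch below drives single-decree Paxos through randomized schedules and checks the agreement property: every value that reaches a quorum must be the same value. It is an illustration under stated assumptions, not the project's test suite. The five-acceptor cluster, the message-drop rate, and the hand-rolled xorshift generator are all hypothetical stand-ins; in practice a crate such as `proptest` or `quickcheck` would supply the input generation and shrinking.

```rust
const N: usize = 5;
const QUORUM: usize = 3;

/// Tiny xorshift PRNG so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
    /// Each message is delivered with probability 3/4.
    fn reachable(&mut self) -> bool {
        self.below(4) != 0
    }
}

#[derive(Clone, Copy, Default)]
struct Acceptor {
    promised: u64,
    accepted: Option<(u64, u64)>, // (ballot, value)
}

/// Run one random schedule and return every value that reached a quorum.
fn run_schedule(seed: u64, rounds: u64) -> Vec<u64> {
    let mut rng = Rng(seed | 1); // xorshift state must be non-zero
    let mut acceptors = [Acceptor::default(); N];
    let mut chosen = Vec::new();
    for round in 0..rounds {
        // Unique ballot per round, issued in random order.
        let ballot = 1 + round + rounds * rng.below(50);

        // Phase 1: gather promises and the highest-ballot accepted value.
        let mut promises = 0;
        let mut best: Option<(u64, u64)> = None;
        for a in acceptors.iter_mut() {
            if rng.reachable() && ballot > a.promised {
                a.promised = ballot;
                promises += 1;
                if let Some((b, v)) = a.accepted {
                    if best.map_or(true, |(bb, _)| b > bb) {
                        best = Some((b, v));
                    }
                }
            }
        }
        if promises < QUORUM {
            continue; // this proposer gives up
        }
        // Must adopt the highest accepted value, else propose its own.
        let value = best.map_or(100 + round, |(_, v)| v);

        // Phase 2: some acceptors may miss the message.
        let mut accepts = 0;
        for a in acceptors.iter_mut() {
            if rng.reachable() && ballot >= a.promised {
                a.promised = ballot;
                a.accepted = Some((ballot, value));
                accepts += 1;
            }
        }
        if accepts >= QUORUM {
            chosen.push(value);
        }
    }
    chosen
}

/// The property: all chosen values in one schedule are identical.
fn agreement_holds(seed: u64) -> bool {
    let chosen = run_schedule(seed, 20);
    chosen.windows(2).all(|w| w[0] == w[1])
}
```

Checking `agreement_holds` across many seeds explores interleavings that hand-written cases tend to miss. Replacing the value-selection line with "always propose its own value" breaks the property, which is the kind of safety bug this style of test is designed to surface.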
Lightweight Spec-Driven Development
Early development followed a rigid Spec-Driven Development (SDD) model, moving from a markdown requirement specification to a formal design document and then to task lists. Maintainability problems and documentation drift led to a lighter-weight approach. Using the `/specify` command from Spec Kit, the workflow generates targeted markdown specifications consisting of concise user stories and acceptance criteria. Huang refined these stories through the `/clarify` prompt, instructing the LLM to self-critique, flag omissions, and suggest edge cases. Each user story then serves as the unit of work for an agent, which keeps context windows manageable and changes incremental.
Eliminating Bottlenecks: Tuning from 23K to 300K Ops/Sec
With functional correctness established, the work shifted entirely to throughput optimization. Over a three-week optimization phase, throughput rose from 23,000 to 300,000 operations per second on a single laptop, roughly a 13x improvement. The tuning loop repeated four steps:
- Instrument latency metrics across all execution paths.
- Execute high-throughput benchmarks and capture raw trace logs.
- Employ AI agents to parse logs using custom Python scripts to calculate latency quantiles and detect bottlenecks.
- Implement targeted optimizations and re-measure performance.
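The analysis step in this loop can be sketched as follows. The project used custom Python scripts for it; this version is written in Rust only for consistency with the other examples here. The log format (`stage=<name> latency_us=<n>`), the function names, and the choice of the nearest-rank quantile method are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Nearest-rank quantile of latency samples; `q` must be in (0, 1].
fn quantile(samples: &mut [u64], q: f64) -> Option<u64> {
    if samples.is_empty() || !(q > 0.0 && q <= 1.0) {
        return None;
    }
    samples.sort_unstable();
    // 1-based rank of the smallest sample covering a fraction q of the data.
    let rank = (q * samples.len() as f64).ceil() as usize;
    Some(samples[rank.max(1) - 1])
}

/// Parse one line of the hypothetical format `stage=<name> latency_us=<n>`.
fn parse_line(line: &str) -> Option<(String, u64)> {
    let mut stage: Option<String> = None;
    let mut latency: Option<u64> = None;
    for field in line.split_whitespace() {
        if let Some(v) = field.strip_prefix("stage=") {
            stage = Some(v.to_string());
        } else if let Some(v) = field.strip_prefix("latency_us=") {
            latency = v.parse().ok();
        }
    }
    Some((stage?, latency?))
}

/// Group samples by stage and return the stage with the highest q-quantile.
fn worst_stage(log: &str, q: f64) -> Option<(String, u64)> {
    let mut by_stage: HashMap<String, Vec<u64>> = HashMap::new();
    for (stage, latency) in log.lines().filter_map(parse_line) {
        by_stage.entry(stage).or_default().push(latency);
    }
    by_stage
        .into_iter()
        .filter_map(|(stage, mut v)| quantile(&mut v, q).map(|x| (stage, x)))
        .max_by_key(|(_, x)| *x)
}
```

Calling `worst_stage(log, 0.99)` on a benchmark trace names the stage with the worst tail latency, which is a reasonable first candidate for the next optimization pass. Lines that do not match the assumed format are skipped rather than treated as errors.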
This cycle isolated and eliminated lock contention across async paths, removed redundant memory copies, and trimmed unnecessary task spawns. Rust's safety guarantees made it possible to apply aggressive optimizations, including zero-copy techniques and the removal of async overhead, without introducing memory-safety bugs or data races.
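As one illustration of removing redundant copies, a payload that must go to several replicas can be shared through a reference-counted slice instead of being cloned per destination. The source does not describe how the engine achieves zero-copy, so this is only a common standard-library technique shown under that assumption; both function names are hypothetical.

```rust
use std::sync::Arc;

/// Baseline: every replica gets its own heap copy of the payload.
fn fan_out_copy(payload: &[u8], replicas: usize) -> Vec<Vec<u8>> {
    (0..replicas).map(|_| payload.to_vec()).collect()
}

/// Shared: every replica gets a handle to the same immutable buffer.
/// Cloning an Arc bumps a reference count; the bytes are never copied.
fn fan_out_shared(payload: Arc<[u8]>, replicas: usize) -> Vec<Arc<[u8]>> {
    (0..replicas).map(|_| Arc::clone(&payload)).collect()
}
```

Since `Arc<[u8]>` only hands out shared, immutable access, the compiler rejects any attempt to mutate the buffer while other handles exist. That is the sense in which this kind of optimization can be made without opening the door to data races.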
Future Paradigms for Autonomous Engineering Agents
Experience with current-generation coding tools highlights clear avenues for next-generation systems engineering workflows. The primary bottleneck remains the manual orchestration required to prompt agents, manage context, and direct test coverage expansion. An ideal future state includes fully autonomous user story execution where agents direct their own cycles, alongside automated contract workflows that can independently author, test, and debug code invariants. Furthermore, applying autonomous optimization frameworks directly to large-scale, end-to-end performance testing environments could automate the identification and resolution of micro-architectural bottlenecks.