Amdahl's law for AI agents
This article applies Amdahl's Law to AI agents, arguing that speedup from parallel agents is bounded by the fraction of the workflow that requires human judgment (H). It introduces the concept of 'self-liquidating H', where each human intervention produces an artifact that eliminates future similar interventions. It emphasizes 'configurancy' (explicit behavioral commitments and conformance suites) as the way to encode human knowledge so agents can operate autonomously. Examples from ElectricSQL, Gas Town, and the Ralph Loop illustrate the principles.
Multi-agent systems are delivering real results. Geoffrey Huntley's Ralph Loop runs autonomous coding agents in a while-loop until every PRD item is complete. Steve Yegge's Gas Town orchestrates 20–30 concurrent agents across seven specialized roles. Cursor ships an 8-agent parallel system. The throughput gains are real.
But the gains aren't automatic. Flask creator Armin Ronacher told The Pragmatic Engineer: "I sometimes kick off parallel agents, but not as much as I used to do. The thing is: it's only so much my mind can review!" The teams getting the most value share a pattern — and a law from 1967 explains exactly where the leverage is.
The Original Insight
In 1967, Gene Amdahl made a simple observation about parallel computing: the speedup of a program is limited by the fraction that must run sequentially.
His formula: speedup = 1 / (S + (1-S)/N)
Where S is the serial fraction and N is the number of parallel processors. The key insight isn't in the formula — it's in the limit. As N approaches infinity, speedup converges to 1/S. If 10% of your work is inherently serial, you will never exceed 10x speedup. Not with 100 cores. Not with a million.
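To make the limit concrete, here is a minimal Python sketch of the formula above. The function names `amdahl_speedup` and `speedup_ceiling` are illustrative choices, not part of any library.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup = 1 / (S + (1 - S) / N) for serial fraction S and N workers."""
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def speedup_ceiling(serial_fraction: float) -> float:
    """Limit of the speedup as N grows without bound: 1 / S."""
    return 1.0 / serial_fraction


# With 10% serial work, adding workers approaches 10x but never exceeds it.
for n in (10, 100, 1_000_000):
    print(n, round(amdahl_speedup(0.10, n), 2))
# prints: 10 5.26, then 100 9.17, then 1000000 10.0 (one pair per line)
```

Going from 10 to 100 workers buys less than a doubling, and the remaining gap to 10x can only be closed by shrinking S itself.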
This wasn't an argument against parallelism — it was an engineering guide for where to focus. The same logic applies to agents.
The Agent Version
The equivalent law for AI agents:
The maximum speedup from AI agents is bounded by 1/H, where H is the fraction of the workflow that requires human judgment.
H includes every moment the system blocks on a human: clarifying intent ("Did you mean the admin dashboard or the user-facing one?"), making judgment calls ("Should we refund this customer or offer credit?"), review cycles ("This isn't quite what I wanted, try again"), approval gates, resolving ambiguity ("The policy says 'handle escalations appropriately' — what does that mean?"), and applying taste ("This is technically correct but feels wrong").
If H = 40% of your total workflow time, no improvement in agent capability can ever get you past 2.5x speedup. At 50%, the ceiling is 2x. Even an optimistic 20% only gets you to 5x.
Human fraction (H) | Max possible speedup
50% | 2x
30% | 3.3x
20% | 5x
10% | 10x
5% | 20x
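The ceilings above are just 1/H, so they are easy to reproduce and to invert. The sketch below uses hypothetical helper names (`max_agent_speedup`, `required_human_fraction`) and assumes H is measured as a fraction of total workflow time.

```python
def max_agent_speedup(human_fraction: float) -> float:
    """Upper bound on speedup when a fraction H of the workflow blocks on a human."""
    if not 0.0 < human_fraction <= 1.0:
        raise ValueError("human_fraction must be in (0, 1]")
    return 1.0 / human_fraction


def required_human_fraction(target_speedup: float) -> float:
    """Largest H that still permits the target speedup (the same bound, inverted)."""
    if target_speedup < 1.0:
        raise ValueError("target_speedup must be at least 1")
    return 1.0 / target_speedup


for h in (0.50, 0.40, 0.30, 0.20, 0.10, 0.05):
    print(f"H = {h:.0%}: ceiling {max_agent_speedup(h):.1f}x")

# Inverted: a 10x target is only reachable if H is at most 10%.
print(f"{required_human_fraction(10):.0%}")
```

Read in the inverted direction, the bound becomes a budget: pick the speedup you want, and it tells you how much human-blocking time the workflow can afford.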
H dominates the speedup equation, not agent capability. But unlike Amdahl's original serial fraction S — a fixed property of the algorithm — H isn't static. Better models do shrink some components of H. An agent that needs less clarification and makes fewer errors directly reduces human time.
The catch: the components that models reduce aren't the ones that dominate at scale. Model improvements shrink clarification and verification. Taste and novel decisions — the components that become dominant once you've automated the mechanical parts — are largely irreducible by better models. These ceilings are real at any given moment.
The question is how fast H is declining — and what you do with the gains.
Self-Liquidating H
The highest-leverage investment isn't making agents faster or smarter. It's changing the character of H. And the tools for doing that aren't new — tests, specs, automation, documented decisions. These are engineering best practices that have been around for decades. What's new is the return on investment. When agents can act autonomously on every encoded decision, the payoff for writing that test or updating that spec is dramatically higher than it used to be.
The goal isn't to minimize human involvement. It's to make it self-liquidating: every human intervention should produce the artifact — the test, the spec update, the documented decision — that makes the same type of intervention unnecessary next time. (The term comes from finance: a self-liquidating loan generates the revenue to pay itself off. A self-liquidating intervention generates the artifact that eliminates its own recurrence.)
A team where 40% of workflow time is human but it's all taste and strategy is in a fundamentally different position than a team where 40% is human but it's all "did you mean X or Y?" and "let me re-check this output." Self-liquidating practices convert the second type into the first.
This requires what I've been calling configurancy — the smallest set of explicit behavioral commitments (and rationales) that allow a bounded agent to safely modify the system without rediscovering invariants. Specs, conformance suites (automated test suites that verify behavior against a spec), documented rationale. Every implicit assumption in your system is a future human-blocking event.
A conformance suite that makes human review unnecessary is crystallized cognition — human judgment about correctness, encoded at the moment it was made so agents don't have to rediscover it. The AGENTS.md file that prevents agents from stumbling into known gotchas is the same thing. Every good piece of scaffolding is human judgment captured as a durable, machine-readable artifact.
Which human involvement should you target? The test is concrete: "Is this intervention encodable?" When a human catches a bug, can that catch become a test case? When a human clarifies an ambiguity, can that clarification update the spec? When a human makes a taste call, can that call become a documented precedent? If agents keep requiring the same type of human intervention, your configurancy is incomplete.
Verification is highly encodable — catches become test cases. Clarification is highly encodable — resolutions become spec updates. Specification is partially encodable — patterns become reusable templates. Taste and novel decisions are the least encodable — which is fine, because they're the components where human judgment genuinely creates value. The system naturally converges toward a state where humans do only the work that can't be encoded, because everything encodable has been.
But capturing alone isn't enough. Naively appending every intervention creates its own problem — an AGENTS.md that's 400 lines of contradictory gotchas nobody reads, a test suite of overlapping cases encoding conflicting assumptions. Raw accumulation creates noise, not knowledge.
The real pattern is accumulate, then compress: individual judgments pile up, then periodically get integrated into coherent higher-level artifacts. Common law accumulates case decisions, then synthesizes them into principles and statutes. Science accumulates papers, then compresses them into review articles and textbooks. In practice: capture every intervention as a local artifact (a test case, an AGENTS.md entry, a decision record), then periodically integrate those artifacts into updated specs, refactored test suites, and revised skill definitions. Compression is where you ask whether the governing variables themselves are still right — whether 40 test cases encoding 40 variations of an assumption reveal that the assumption is wrong. Teams that only accumulate end up with drift and contradiction. Teams that only try to compress end up over-engineering upfront. The cycle needs both.
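One way to picture the accumulate-then-compress cycle is a small ledger of interventions. This is a hypothetical sketch: the `Intervention` and `Ledger` types, the field names, and the threshold of three are all assumptions made for illustration, not something the teams mentioned here are known to use.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Intervention:
    kind: str                # e.g. "verification", "clarification", "taste"
    topic: str               # what the human had to step in on
    artifact: Optional[str]  # test, spec update, decision record; None if nothing was captured


@dataclass
class Ledger:
    entries: List[Intervention] = field(default_factory=list)

    def record(self, intervention: Intervention) -> None:
        self.entries.append(intervention)

    def wasted_signal(self) -> List[Intervention]:
        """Interventions that produced no artifact and are therefore likely to recur."""
        return [e for e in self.entries if e.artifact is None]

    def compression_candidates(self, threshold: int = 3) -> List[str]:
        """Topics with enough local artifacts to be worth integrating into a spec."""
        counts = Counter(e.topic for e in self.entries if e.artifact is not None)
        return sorted(topic for topic, n in counts.items() if n >= threshold)


ledger = Ledger()
for i in range(3):
    ledger.record(Intervention("verification", "pagination", f"test_pagination_{i}"))
ledger.record(Intervention("clarification", "retry policy", None))

print(ledger.compression_candidates())            # ['pagination']
print([e.topic for e in ledger.wasted_signal()])  # ['retry policy']
```

The two queries map onto the two failure modes: `wasted_signal` finds interventions that were never captured, and `compression_candidates` finds topics where local artifacts have piled up enough to deserve a pass at integration.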
Two levers make the self-liquidation cycle practical:
- Capture the signal. When a human intervenes (catching a bug, clarifying a spec, making a taste call), the system should generate the artifact that encodes that intervention. A review that catches a bug but doesn't update the test suite is wasted signal. A clarification that doesn't update the spec will recur.
- Maintain high configurancy so agents operate autonomously everywhere else. When the system's knowledge is explicit (specs, invariants, conformance suites, documented rationale), agents don't block on humans for things that could have been encoded upfront.
Agent scaffolding is the serial code optimization of the AI era. It concentrates human time where it has maximum leverage and captures the signal from every intervention as durable knowledge the system can reuse.
What Scaffolding Looks Like
At ElectricSQL, an agent recently propagated a protocol change through 67 files — the spec, two server implementations, 10 client libraries across 10 languages — in 20–30 minutes. No human reviewed 67 files. The conformance suite is the review. Without it, that's hours of careful manual verification across 10 languages. With it, I spent a few minutes reviewing the PR. The human designed the protocol change; the suite automated everything downstream. And each new decision about correctness becomes another test case, making the next change even more autonomous.
Emil Stenström built a complete HTML5 parser with agents by hooking in the html5lib-tests conformance suite from the start. Then Simon Willison ported it to JavaScript in 4.5 hours by pointing a different agent at the same suite. The conformance suite made human review unnecessary because the spec was already encoded as executable verification.
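The mechanism behind both stories is a single set of spec-derived cases run against every implementation. The sketch below is a toy illustration under stated assumptions: the "spec" (header names are trimmed and lowercased), the cases, and both implementations are invented for this example and have nothing to do with the actual ElectricSQL or html5lib-tests suites.

```python
from typing import Callable, List, Tuple

# Hypothetical spec, encoded as data: header names are trimmed and lowercased.
CASES: List[Tuple[str, str]] = [
    ("Content-Type", "content-type"),
    ("  ETag ", "etag"),
    ("X-Request-ID", "x-request-id"),
]


def reference_impl(name: str) -> str:
    return name.strip().lower()


def buggy_port(name: str) -> str:
    # A port that forgot the trimming rule; the suite should catch this.
    return name.lower()


def run_suite(impl: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every case the implementation fails."""
    failures = []
    for raw, expected in CASES:
        actual = impl(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


print(run_suite(reference_impl))  # []
print(run_suite(buggy_port))      # [('  ETag ', 'etag', '  etag ')]
```

Because the cases are data rather than code tied to one implementation, a port in another language only needs its own small runner. A human correction to the spec lands once, as a new case, and every implementation inherits it.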
The model handles the task; the configurancy handles the trust. Together they compound.
Teams that treat agent deployment as "pick a model and write some prompts" plateau quickly. They haven't touched H. The teams seeing real speedup have invested heavily in the scaffolding layer — often spending more engineering effort on configurancy than on the agent integration itself.
Scaling Past the Bottleneck
As agents get faster, H feels like it's growing. When an agent takes 2 hours to research a topic and you spend 30 minutes reviewing its output, that 30 minutes is background noise. When the agent takes 30 seconds and you still spend 30 minutes reviewing, suddenly you are the bottleneck. The absolute time hasn't changed, but the relative weight has shifted dramatically. You're never waiting anymore — which means you're always the one being waited on.
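The shift is easy to quantify. This sketch assumes the simplest possible workflow, where agent time and human review happen back to back with nothing else in the loop; `human_fraction` is an illustrative name.

```python
def human_fraction(agent_minutes: float, human_minutes: float) -> float:
    """Share of total elapsed time spent waiting on the human (H)."""
    total = agent_minutes + human_minutes
    if total <= 0:
        raise ValueError("total time must be positive")
    return human_minutes / total


# Agent takes 2 hours, review takes 30 minutes.
print(f"{human_fraction(120, 30):.1%}")   # 20.0%

# Agent takes 30 seconds, review still takes 30 minutes.
print(f"{human_fraction(0.5, 30):.1%}")   # 98.4%
```

The same 30 minutes of review moves H from 20% to over 98%, which is why a faster agent makes the human feel like the only step left in the workflow.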
And then you add more agents, and it gets worse.
Amdahl's Law tells you there's a ceiling. Donald Reinertsen's The Principles of Product Development Flow tells you where the engineering problem is: naively adding parallel agents can degrade performance — but the fix is tractable.
Reinertsen applied queueing theory to product development and showed that queue size grows steeply and nonlinearly with capacity utilization. At 50% utilization, the queue is manageable. At 80%, it's 4x larger. At 90%, 9x. At 95%, 19x. (These multiples follow the ρ/(1−ρ) curve of a single-server queue, where ρ is utilization.) The human reviewing agent output is a single server in a queue. Five parallel agents quintuple the arrival rate, driving utilization toward 100% and queue times toward infinity.
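The utilization figures can be reproduced with the textbook single-server (M/M/1) queue, where the average number of items in the system is ρ/(1−ρ) at utilization ρ. The sketch below assumes that model. The per-agent output rate and the human review rate are made-up numbers chosen only to show the shape of the curve.

```python
def items_in_system(utilization: float) -> float:
    """Average items in an M/M/1 system at utilization rho: rho / (1 - rho)."""
    if utilization < 0:
        raise ValueError("utilization must be non-negative")
    if utilization >= 1.0:
        return float("inf")  # arrivals outpace the reviewer; the queue never drains
    return utilization / (1.0 - utilization)


def utilization(agents: int, per_agent_rate: float, review_rate: float) -> float:
    """Reviewer utilization: total arrival rate divided by review (service) rate."""
    return agents * per_agent_rate / review_rate


for rho in (0.50, 0.80, 0.90, 0.95):
    print(f"{rho:.0%} utilization: {items_in_system(rho):.0f} items")

# Hypothetical rates: each agent emits 1 PR/hour, the human can review 6 PRs/hour.
for n in (1, 5, 6):
    rho = utilization(n, per_agent_rate=1.0, review_rate=6.0)
    print(n, "agents:", round(rho, 2), "utilization,", items_in_system(rho), "items queued or in review")
```

Under these assumed rates, one agent leaves the reviewer mostly idle, five push the average backlog to about five items, and a sixth tips the system into a queue that grows without bound.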
Gas Town is the most vivid illustration. Yegge describes "palpable stress" as 20–30 agents run simultaneously at speeds too fast to comprehend. Early users describe their role as "keep your Tamagotchi alive" and note that "your management span of control is directly correlated to your attention span and memory." One user went from 5 PRs in 3 hours to 36 PRs in 4 hours — but at $100/hour in Claude tokens and with intense, unbroken cognitive engagement. The throughput is real, but so is the human queue saturation.
Reinertsen's prescription: manage queue size directly — work-in-progress limits, smaller batch sizes, faster feedback loops — rather than maximizing utilization. The agent equivalent: the way to scale parallel agents is to eliminate the friction that causes them to block on humans in the first place — invest in configurancy that lets agents verify their own work, so that when they do need a human, it's for judgment that actually matters.
The Ralph Loop gets this right. It works not because it runs agents in parallel but because it's a self-liquidation engine: a well-defined PRD as the spec, automated test verification as the acceptance criteria, and AGENTS.md files that accumulate discovered patterns across iterations. Each iteration where a human corrects an agent encodes that correction as an artifact the next iteration can consume. The same pattern shows up at every scale. shadcn describes running a /done
[truncated for AI cost control]