2026-06-26 09:59 UTC | In-site rewrite | 3 min read | Updated: 2026-06-26 10:14 UTC

The AI Productivity Trap Is More Output

AI makes it cheap to generate code and documents, but the extra output creates more review and verification work, which can slow delivery. A METR study found that tasks took 19% longer when developers were allowed to use AI tools, and generated work tends to arrive as larger diffs and more artifacts. Output increases while throughput and outcomes remain flat. Organizations need to measure review time, rework rate, and similar delivery metrics rather than token counts.

Source: Hacker News AI | Author: vincent_s

June 26, 2026 by ChatGPT

AI makes it dirt cheap to generate code, draft tickets, summarize meetings, and write proposals. But cheaper generation doesn't mean the work actually gets done any faster. More raw output inevitably means more review, more verification, and more downstream cleanup. If the overall time to reach a correctly merged pull request or a sound engineering decision does not improve, the engine is just producing more noise and the real speed of delivery doesn't change.

We have to distinguish output, throughput, and outcomes. Output is the raw artifacts we generate. Throughput is the validated work that successfully moves through the delivery system. An outcome is a correct decision or a safe production change. AI increases individual output but often leaves throughput and outcomes flat.

This gap is highlighted by a recent METR study of experienced open source developers. Going in, developers predicted AI would cut their task times by almost a quarter. Even after the study, they still felt they had saved time. But the measured results were the reverse: tasks took 19 percent longer when developers were allowed to use AI tools. Standard measures of output can be misleading, the researchers said. Generative tools tend to produce verbose but equivalent code, or to break a task into more pieces without actually reducing total cognitive effort. One bug report becomes five tickets, three pull requests, and a migration note. The result is larger diffs to review, more generated tests to run, and more artifacts to triage, all of which increase the chances of missing critical details.

What’s interesting about this research is that the developer experience was still very positive. AI is great at removing the friction of a blank page, and it makes the work feel a lot more fluid. It shifts human labor from creation to evaluation, which is why subjective speed and measured time diverge. Checking someone else's work feels easier than writing it yourself, even though it often takes longer. Developers feel unblocked while the downstream work of review, cleanup, and verification piles up. AI is still super useful. Controlled experiments with tools like Copilot show that developers can complete bounded programming tasks, such as writing boilerplate, generating API glue, or building test scaffolding, much faster.

The problem is that the extra review and verification time stalls the delivery cycle further down the line. The annual DORA report notes that AI adoption increases individuals’ feelings of productivity but can become a drag on software delivery stability and throughput. Faster code generation lengthens review queues and raises merge risk, which pushes teams toward larger batch sizes. AI tends to amplify the system it enters. Teams that have well-defined software ownership, rigorous review processes, and highly reliable deployment pipelines will benefit from the technology. But with weak incentives, fuzzy boundaries around production, and poor verification, AI just becomes an accelerant for low-quality output.

Measurement becomes a big problem when organizations reward heavy AI token use as a proxy for productivity, much like counting lines of code or commits. These metrics are easy to game and have virtually nothing to do with real business value.

You see this pattern in telemetry from thousands of developers. A Faros analysis of more than 10,000 developers found that while high AI adoption correlated with more tasks completed and more merged pull requests, it was also associated with much longer review times, larger pull requests, and more bugs per developer. The analysis did not find a meaningful association between high AI adoption and firm-level improvements in delivery metrics or quality key performance indicators.

That is not surprising. If a developer can code twice as fast but human review is still the bottleneck, the work just gets stuck in review. A large AI-generated diff greatly expands the search space a reviewer has to cover. Subtle mistakes in generated code that looks almost right demand more expert attention than hand-written code with a clear human design. Without strong ownership and robust verification, plausible output turns into review debt.
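The bottleneck argument above can be sketched with a toy queue model. This is only an illustration under stated assumptions: the function name `simulate_review_queue` and the rates of 4 and 8 changes per day are hypothetical, and real review capacity is not a fixed number.

```python
def simulate_review_queue(authoring_rate, review_rate, days):
    """Toy model: changes are authored at a fixed daily rate and
    reviewers can clear at most `review_rate` changes per day.

    Returns (merged_total, queue_length) after `days` days.
    All rates are hypothetical numbers chosen for illustration.
    """
    queue = 0.0   # changes waiting for review
    merged = 0.0  # changes that passed review
    for _ in range(days):
        queue += authoring_rate            # new changes arrive
        reviewed = min(queue, review_rate) # review capacity is the cap
        queue -= reviewed
        merged += reviewed
    return merged, queue


# Baseline: authoring and review capacity are balanced.
baseline = simulate_review_queue(authoring_rate=4, review_rate=4, days=10)

# Authoring doubles, review capacity stays the same.
with_ai = simulate_review_queue(authoring_rate=8, review_rate=4, days=10)

print("baseline (merged, queued):", baseline)
print("with AI  (merged, queued):", with_ai)
```

In this sketch, doubling the authoring rate leaves the number of merged changes unchanged after ten days, while the review queue grows by four changes every day. The model ignores that larger diffs may also lower the review rate itself, which would make the gap worse.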

We see this outside of code too, in the form of AI-generated workslop. This is slick material that reads well but doesn’t really move projects forward. It’s endless memos that don’t resolve anything, meeting notes that hide misalignment, and proposals that shift the burden of due diligence from the author to the readers. This creates a real hidden cost for everyone who has to spend time reading and processing low-substance output.

The actual work just gets pushed downstream to the people who have to verify the claims, reconcile the summaries, and determine whether the proposal is actually feasible. Instead of counting artifacts, we need to measure whether the team takes less total time to get to a correct merge, a solid decision, or a shipped result. This needs better metrics like review time per accepted change, rework rate, change failure rate, decision latency, and reviewer load. Gathering these measures is harder than gathering simple token counts or completed tasks, and as a result many organizations don’t gather them. Good delivery tracking needs to account for accepted work and the true cost of delivering it.
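As a rough sketch of what three of these metrics could look like in practice, the example below computes them from a list of change records. The `Change` fields, the `delivery_metrics` function, and the sample data are all hypothetical, and the definitions are simplified assumptions: review time spent on rejected changes is charged to the accepted ones, and change failure rate is counted per accepted change rather than per deployment.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One proposed change (hypothetical record shape)."""
    review_hours: float   # total reviewer time spent on this change
    accepted: bool        # merged after review
    needed_rework: bool   # required a follow-up fix after merge
    caused_failure: bool  # led to a production failure


def delivery_metrics(changes):
    """Compute simplified delivery metrics over a list of changes.

    Returns None when nothing was accepted, since the ratios
    are undefined in that case.
    """
    accepted = [c for c in changes if c.accepted]
    if not accepted:
        return None
    n = len(accepted)
    # Review time on rejected changes still costs reviewers time,
    # so it is included in the numerator.
    total_review_hours = sum(c.review_hours for c in changes)
    return {
        "review_hours_per_accepted_change": total_review_hours / n,
        "rework_rate": sum(c.needed_rework for c in accepted) / n,
        "change_failure_rate": sum(c.caused_failure for c in accepted) / n,
    }


# Made-up sample data for illustration only.
sample = [
    Change(review_hours=2.0, accepted=True, needed_rework=False, caused_failure=False),
    Change(review_hours=3.0, accepted=True, needed_rework=True, caused_failure=False),
    Change(review_hours=1.0, accepted=False, needed_rework=False, caused_failure=False),
    Change(review_hours=2.0, accepted=True, needed_rework=False, caused_failure=True),
    Change(review_hours=4.0, accepted=True, needed_rework=True, caused_failure=False),
]

metrics = delivery_metrics(sample)
print(metrics)
```

Dividing by accepted changes rather than submitted ones is a deliberate choice here: a flood of generated pull requests that get rejected raises the cost per accepted change instead of inflating a completion count.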

AI is a powerful tool for lowering the barrier to drafts, scaffolding, and first runs. After that, the bottleneck is still review.
