2026-06-19 · 6 min read · Updated: 2026-06-19

AI Flow Dynamics – The Loops Don't Get Faster on Their Own

AI makes writing code cheap and fast, but turning code into value requires three feedback loops: review, market measurement, and customer absorption. Customer absorption, in particular, does not speed up. The article discusses solutions like automating checks, separating CI from CD, and using simulations to manage bottlenecks.

Source: Hacker News AI · Author: flail

Jun 18, 2026

TL;DR

AI made writing code cheap and fast. Three feedback loops turn code into value: review, market measurement, customer absorption. Customer absorption, in particular, does not get faster.

We spent fifteen years building CI/CD so that a single-line code change could ship cheaply and quickly. Now we pump thousand-line, zero-cost changes through the same pipeline. The constraint has moved from writing to reviewing, measuring, and being absorbed, but the pipeline is mostly unchanged.

Small batches only ever helped where a loop was closed. CI/CD closed the technical loops; we rarely bothered with the market loop.

The work is flow discipline, not a limit: build everything, but level the input to each loop (set up a stop light system), build many options and ship few (option storming), and limit customer-facing change to what you can measure and what customers can absorb.
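The article does not spell out how the stop light system works. A minimal sketch, assuming each loop has a known capacity per period and that the light depends only on how full its input queue is relative to that capacity; the 70% yellow threshold and all example numbers are assumptions.

```python
from enum import Enum


class Light(Enum):
    GREEN = "green"    # loop has slack: it can take more input
    YELLOW = "yellow"  # loop near capacity: hold new input
    RED = "red"        # loop overloaded: stop feeding it, drain the queue first


def stoplight(queued: int, capacity_per_period: int, yellow_at: float = 0.7) -> Light:
    """Classify one feedback loop by queue length relative to its capacity.

    `yellow_at` is an assumed threshold, not a value from the article.
    """
    if capacity_per_period <= 0:
        raise ValueError("capacity must be positive")
    load = queued / capacity_per_period
    if load > 1.0:
        return Light.RED
    if load >= yellow_at:
        return Light.YELLOW
    return Light.GREEN


# Hypothetical loops: (items waiting, items the loop can handle per period).
loops = {
    "review": (42, 30),
    "measurement": (5, 6),
    "absorption": (2, 5),
}
for name, (queued, capacity) in loops.items():
    print(f"{name}: {stoplight(queued, capacity).value}")
```

Leveling then means: feed a loop only while its light is green, whatever the agents could produce in the meantime.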

Mature teams meter their output into the loops that validate it. The rest ship because they can. Your choice.

AI Flow Science: Three loops, not all speed up

Setup

Happily, the code now arrives faster than I can read it. An agent writes in twenty minutes what used to take me a week. The eternal busy beaver hands me four thousand lines and waits for me to approve them. Five features in six variants, if I want: I can judge them once they’re done, and I no longer have to agonize over whether to start at all. Cool. We spent fifteen years making code creation cheaper. And getting the code done was also what set the pace for everything before and after it.

The premise we built on for the last 20 years

Software is malleable in a way hardware is not. You can change it after it ships, cheaply, again and again. That property is the foundation under everything the product discipline learned and executed in the last twenty years.

In the physical world, by contrast, we cut the metal, start the CNC, and push the button on the expensive production line only after a correct design, mostly based on actual customer orders (i.e. the product is already validated). Hardware usually has a solid, validated economic model before production starts. Software, though, is mostly built on assumptions and ideas: about what users need, about what will work, about what will pay for itself. Those assumptions are not known to be true when the work starts. They have to be checked. (Bespoke software is the exception: there the deal is to deliver against a confirmed order. Difficult enough, but a different game.) The rest of the software world is a different beast: a whole huge organization, from Marketing and Sales through R&D to Product Development, trying to figure out what that strange animal, the customer, actually needs.

The check happens at two moments. The first is before building: code is expensive, in money and time, so it is worth asking how true an assumption is before betting on it. The second is after release, to find out whether the thing we shipped really creates value for anyone: pays the bill, gets adopted, changes behavior. Each check is a loop: do something, watch what happens, adjust, go again.

That’s what Eric Ries encoded in Lean Startup’s “Build, Measure, Learn”, and what John Boyd’s OODA loop had framed long before, more elegantly and with more far-reaching consequences. You get the idea.

The central lesson of that era was that smaller batches make all of this better: cleaner code, faster correction, a better and more precise market signal. Ideally each batch carries just one change, so when something moves you know what moved it, and whether it moved at all (did the output produce an outcome?). Despite all that knowledge, we mostly ignored the core requirement: even the smallest batch only helps if the loop around it exists and is closed. A small change without a measurement loop produces no learning.

While the two technical loops often got closed, if only to monitor what the people in IT were doing (probably the main reason so much effort went into those loops), the third one was mostly ignored. Closing it would also have reflected back on the higher-level decision makers.

CI/CD closed the technical loops. The inner loop is the developer’s own cycle (edit, run, see the result) and turns in seconds. The outer loop is the integration cycle (integrate, test, review, release) and turns in hours. Continuous Integration automated the testing; Continuous Delivery automated the release. Running the loops became cheap enough that a single-line change was worth shipping on its own. Small batches paid off, and feedback on correctness came back fast.

Hold that thought. For the last ten to fifteen years we optimised for the smallest code change to be “free”: low cost, low impact. We lowered the transaction cost of a release so that even the smallest change could ship on its own.
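The economics behind that optimisation can be sketched with Don Reinertsen’s transaction-cost versus holding-cost U-curve. The sketch below borrows the classic economic order quantity (EOQ) formula as a stand-in model; the article gives no formula, and all numbers are made up. Because of the square root, a release that becomes 100 times cheaper shrinks the optimal batch about 10 times, which is the pull toward single-change releases.

```python
import math


def optimal_batch_size(transaction_cost: float, holding_cost: float, demand: float) -> float:
    """EOQ-style optimum b* = sqrt(2 * D * T / H).

    transaction_cost (T): fixed cost of pushing one batch through the pipeline.
    holding_cost (H): cost of one finished-but-unreleased change waiting for one period.
    demand (D): changes produced per period.
    """
    return math.sqrt(2 * demand * transaction_cost / holding_cost)


def total_cost(batch: float, transaction_cost: float, holding_cost: float, demand: float) -> float:
    # Batches per period times cost per batch, plus the average half-batch left waiting.
    return (demand / batch) * transaction_cost + (batch / 2) * holding_cost


# Hypothetical numbers: 100 changes per period, holding cost 2 per change per period.
before = optimal_batch_size(transaction_cost=400, holding_cost=2, demand=100)  # manual release
after = optimal_batch_size(transaction_cost=4, holding_cost=2, demand=100)     # automated CI/CD
print(round(before), round(after))  # 200 20
```

The model treats batch size as a choice the team makes. With agents, production batch size grows on its own, whatever the pipeline was tuned for.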

The market loop, though, stayed open in most orgs. It is the slowest one: release, measure how customers behave, learn, adjust, over weeks and quarters. All lagging indicators, but required if we want to replace the crystal ball. Some teams approached it with the occasional A/B test and/or a product-analytics tool like Pendo, and little more. Closing that loop is genuinely hard. It takes tech, grit, patience, and a lot of delayed-gratification psychology versus the instant dopamine hit. So while we got very good at shipping small, correct batches quickly, most of us remained really bad at knowing whether, and which of them, were actually worth shipping.

What free code actually changed

Generated code is fast and nearly free. The step that used to be slow and expensive, writing the change, has collapsed: the little agent genies do it for us, remote-controlled from an iPhone on the porch. The only patience left is waiting for Claude to come back from its work.

What we ignored is that this raises the batch size of production. An agent produces a large, coherent change in the time a person once spent on a small one. Production no longer paces itself to how fast a human can type, and that human pace had set the rhythm of the whole system for decades.

We spent ten or fifteen years building this pipeline for one purpose: to make a single small change cheap to ship. AI now pushes thousand-line changes through the same pipeline at no cost, and they run straight through a machine we built for the opposite problem. Infrastructure tuned for the smallest possible batch is now hammered with the largest batches, produced overnight for free, and nothing in it complains: the checks go green, the deploys fire. The only thing that changed is the size of what flows through, and that stays invisible until it reaches a human. The human will suffer under the PR load until the infrastructure job of adding a gazillion automated tests to that step is done. At least it forces us to define what a PR actually is. But that’s the easy part, sorry to say.

So production got faster and the loops did not. Speed up one station and leave the rest alone, and the work piles up in front of the next one, as Goldratt described forty years ago: speeding up one station only relocates the constraint. Here it relocates to three places: review, measurement, market absorption.
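Goldratt’s point can be sketched as a toy, deterministic flow model: four stations in series (writing, review, measurement, absorption), each finishing at most a fixed number of changes per day. The station rates and the daily inflow are invented for illustration; only the shape of the result matters.

```python
def simulate_pipeline(days: int, rates: list, arrivals_per_day: int) -> list:
    """Deterministic flow through stations in series.

    rates[i] is the most work items station i can finish per day; whatever one
    station finishes feeds the next station the same day. Returns the queue
    still waiting in front of each station after `days` days.
    """
    queues = [0] * len(rates)
    for _ in range(days):
        incoming = arrivals_per_day
        for i, rate in enumerate(rates):
            queues[i] += incoming
            done = min(queues[i], rate)
            queues[i] -= done
            incoming = done
    return queues


# Stations: writing, review, measurement, absorption. 10 ideas per day want to become changes.
before_ai = simulate_pipeline(10, rates=[2, 5, 4, 2], arrivals_per_day=10)
with_ai = simulate_pipeline(10, rates=[20, 5, 4, 2], arrivals_per_day=10)
print(before_ai)  # [80, 0, 0, 0]: the jam sits in front of writing
print(with_ai)    # [0, 50, 10, 20]: the jam moved to review, measurement, absorption
```

In both runs customers receive two changes a day. The tenfold writing speed did not raise throughput; it only changed where the work waits.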

Thanks for reading The Intentful Company!

Breakpoint 1: review

Changes now reach human review faster than humans can read them, and a person reviews at a fixed rate. At the frontier, that is by design: on the Shapiro scale of AI-augmented coding, the radical “productivity” gains begin at level three, which means “I don’t review my code at the line level”. Keep the batches small, and the number of reviews climbs until review, not production, caps the throughput: you have relocated the jam to the next stage. Make the batches big, each one touching more code than a person can actually examine, and human review stops working and defects get through to production. Somewhere between those is a batch size that keeps the queue stable and review honest. It is larger than the small-batch optimum we are used to, and it is still bounded.
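A toy model of that window, assuming review capacity is a fixed number of reviews per day and that an honest review has a maximum size in lines; both parameters and the example numbers are hypothetical. The queue stays stable only if each batch has at least `lines_per_day / reviews_per_day` lines, and review stays honest only if it has at most `honest_limit`.

```python
from typing import Optional, Tuple


def review_batch_window(lines_per_day: float, reviews_per_day: float,
                        honest_limit: float) -> Optional[Tuple[float, float]]:
    """Return (min, max) lines per review that keep the queue stable and review honest.

    Stable: PRs arrive no faster than they are reviewed,
            i.e. lines_per_day / batch <= reviews_per_day.
    Honest: a reviewer can actually read the whole batch, i.e. batch <= honest_limit.
    Returns None when no batch size satisfies both.
    """
    smallest_stable = lines_per_day / reviews_per_day
    if smallest_stable > honest_limit:
        return None
    return smallest_stable, honest_limit


# Hypothetical team: agents produce 6,000 lines/day, reviewers manage 10 reviews/day,
# and nobody honestly reads more than 800 lines in one review.
print(review_batch_window(6_000, 10, 800))   # (600.0, 800)
# Double the agent output and the window closes.
print(review_batch_window(12_000, 10, 800))  # None
```

In this model, the only ways to reopen a closed window are fewer lines per day or a higher `honest_limit`, which is what automated checking buys.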

The way to raise that boundary is the same thing CI/CD did for deployment: automate the checking. Types, contracts, property-based tests, generated test cases: every machine-checkable aspect that gets automated takes load off the human. The human can then spend their time checking intent (does the feature do what it is supposed to do, what we designed it to do, and does the assumption hold, at least locally?) while checker agents confirm the correctness of the lines across a gazillion aspects. Building that infrastructure layer is the post-AI version of the investment we made in deployment pipelines a decade ago. Part of the old investment pays off, but we now add test infrastructure at the micro, code level, covering what until now the human reviewer guaranteed.
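What a property-based check looks like, as a stdlib-only sketch: instead of a reviewer reading every line, the checker states invariants and hammers the function with random inputs. `dedupe_keep_order` stands in for a hypothetical agent-written function. In practice a library such as Hypothesis handles input generation and shrinking far better than this hand-rolled loop.

```python
import random


def dedupe_keep_order(items):
    """Hypothetical agent-written function under test: drop repeats, keep first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def check_properties(trials: int = 500, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 30))]
        out = dedupe_keep_order(items)
        # Property 1: no duplicates survive.
        assert len(out) == len(set(out)), items
        # Property 2: nothing is invented or lost.
        assert set(out) == set(items), items
        # Property 3: order follows first occurrence in the input.
        assert out == sorted(set(items), key=items.index), items
        # Property 4: idempotent; running it twice changes nothing.
        assert dedupe_keep_order(out) == out, items


check_properties()
print("all properties hold")
```

None of these properties says whether deduplication is the right feature; that question of intent stays with the human.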

The second step is to stop fusing CI and CD. We have conflated them far too often. When integration and release are the same thing, a change goes live the moment it is built. That was the art, and we were proud of it. Once you read what I mean by option storming, you’ll see how much we now depend on breaking that connection: a judgment step, a filter, is required after CI. We can simply build far more than customers will tolerate, and their tolerance for change is small. Pull CI and CD apart, and a change can be built, run, and inspected without reaching a single customer. That built-but-unreleased state is the check and decision point for everything downstream, and most of what follows depends on that filter.
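A minimal sketch of CI and CD pulled apart, assuming a per-period release budget and an explicit approval decision; the names `Stage`, `Change`, and `ReleaseGate` are hypothetical. Passing CI only makes a change releasable. Reaching customers takes a separate, budgeted decision, and an approved change that exceeds the budget simply waits in the built state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    BUILT = "built"        # passed CI: runnable and inspectable, invisible to customers
    RELEASED = "released"  # promoted to customers by an explicit decision
    SHELVED = "shelved"    # built and inspected, deliberately not shipped


@dataclass
class Change:
    name: str
    ci_passed: bool
    stage: Stage = Stage.BUILT


@dataclass
class ReleaseGate:
    """CD as a separate decision: passing CI makes a change releasable, not released."""
    budget: int  # customer-facing changes allowed this period (an assumed policy)
    released: list = field(default_factory=list)

    def promote(self, change: Change, approved: bool) -> Stage:
        if not change.ci_passed:
            raise ValueError(f"{change.name} has not passed CI")
        if not approved:
            change.stage = Stage.SHELVED
        elif len(self.released) < self.budget:
            change.stage = Stage.RELEASED
            self.released.append(change.name)
        # Approved but over budget: the change stays BUILT and waits for the next period.
        return change.stage


gate = ReleaseGate(budget=1)
a = Change("checkout-variant-a", ci_passed=True)
b = Change("checkout-variant-b", ci_passed=True)
c = Change("checkout-variant-c", ci_passed=True)
print(gate.promote(a, approved=True).value)   # released
print(gate.promote(b, approved=True).value)   # built (budget used up)
print(gate.promote(c, approved=False).value)  # shelved
```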

Breakpoint 2: measurement

To say that a release caused a change in customer behavior, you need enough signal to separate it from noise. The sample you need grows with the inverse square of the effect you are trying to see: halve the effect, and you need four times the sample. On a 10% baseline conversion rate, detecting a 2% relative change (10.0% to 10.2%) takes roughly 350,000 observations per variant at conventional confidence and power. Ron Kohavi did the work to show how few real experiments have any impact; most are underpowered and mislead, resting on assumptions and guesses.
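The 350,000 figure can be reproduced with the standard normal-approximation sample-size formula for comparing two proportions. The sketch assumes a two-sided test at 5% significance and 80% power, one common reading of “conventional confidence”; it is not necessarily the calculation the author used.

```python
import math
from statistics import NormalDist


def samples_per_variant(baseline: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per variant to detect `relative_lift` on a conversion `baseline`.

    Normal approximation for a two-sided, two-proportion test:
    n = (z_{1 - alpha/2} + z_{power})^2 * 2 * p * (1 - p) / delta^2
    """
    delta = baseline * relative_lift  # absolute effect: 10% -> 10.2% is 0.002
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 * 2 * baseline * (1 - baseline) / delta ** 2)


print(samples_per_variant(0.10, 0.02))  # roughly 353,000 per variant
print(samples_per_variant(0.10, 0.01))  # about four times as many
```

Halving the lift from 2% to 1% roughly quadruples the requirement, which is the inverse-square law in action.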

How much traffic do you have, and how many changes can you measure with clear attribution in a quarter? That number is fixed, and often surprisingly small. Free code has no influence on it. But the number of changes you send to customers rises linearly with your speedup factor, so the gap between what you ship and what you can measure explodes. Everything you produce beyond what you can measure reaches customers with no way to attribute or measure its effect. So: zero learning. And now we’re back to judgment, opinion, and taste, the very things we spent the last ten years trying to get rid of, now drafted as the last line of defence against the AI. I doubt it will hold. The change gets decided by taste, or it ships and is never really evaluated.
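The gap can be put into numbers with a deliberately crude capacity model: each user sits in at most one experiment per quarter, and each experiment consumes a fixed number of users across its variants. Traffic, experiment size, and change counts below are hypothetical.

```python
def measurement_gap(users_per_quarter: int, users_per_experiment: int,
                    changes_before: int, speedup: float) -> dict:
    """Compare customer-facing changes shipped with changes that can be attributed.

    Crude assumptions: each user is in at most one experiment per quarter, and every
    experiment needs `users_per_experiment` users across all of its variants.
    """
    measurable = users_per_quarter // users_per_experiment  # set by traffic, not by code
    shipped = int(changes_before * speedup)                  # grows linearly with the speedup
    return {"measurable": measurable, "shipped": shipped,
            "unmeasured": max(0, shipped - measurable)}


# Hypothetical product: 5M users per quarter, ~700k users per two-variant test
# (about 350k per variant), 10 customer-facing changes per quarter before AI.
print(measurement_gap(5_000_000, 700_000, 10, speedup=1))
# {'measurable': 7, 'shipped': 10, 'unmeasured': 3}
print(measurement_gap(5_000_000, 700_000, 10, speedup=5))
# {'measurable': 7, 'shipped': 50, 'unmeasured': 43}
```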

You can test a built option against a model of the market before it ships (a simulation, a proxy metric, a panel reacting to the real artifact) and spend no live traffic doing it. That extends how much you can evaluate, so you can save your handful of real experiments for the genuinely new bets, the ones a model of the market cannot predict.
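One way to organise that triage, as a sketch: an option goes to a live experiment only if the offline model is not trusted for changes of its kind. `Option`, `triage`, the trust threshold, and the example options are all hypothetical, and the market model producing `predicted_lift` and `model_confidence` is assumed to exist.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    predicted_lift: float    # from a simulation, proxy metric, or panel (assumed to exist)
    model_confidence: float  # 0..1, how well the model predicted similar changes before


def triage(options, live_slots: int, trust_at: float = 0.7) -> dict:
    """Settle what the market model can predict offline; spend live traffic on the rest.

    `options` is assumed to be in the team's priority order.
    """
    result = {"passed_offline": [], "rejected_offline": [],
              "live_experiments": [], "parked": []}
    for opt in options:
        if opt.model_confidence >= trust_at:
            # The model is trusted for this kind of change: no live traffic spent.
            key = "passed_offline" if opt.predicted_lift > 0 else "rejected_offline"
        elif len(result["live_experiments"]) < live_slots:
            # A genuinely new bet the model cannot predict: use one scarce experiment.
            key = "live_experiments"
        else:
            key = "parked"
        result[key].append(opt.name)
    return result


options = [
    Option("pricing page copy tweak", 0.012, 0.9),
    Option("reordered onboarding steps", -0.004, 0.8),
    Option("usage-based plan", 0.05, 0.2),
    Option("assistant in the editor", 0.03, 0.3),
    Option("redesigned dashboard", 0.01, 0.4),
]
print(triage(options, live_slots=2))
```

Passing offline does not mean shipping: whatever passes still competes for the customers’ limited capacity to absorb change.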

Breakpoint 3: market absorption

The outer market loop is the toughest one to handle, the hardest bottleneck. It goes beyond statistics. Metrics come in late and fuzzy, attribution is hard to establish, and there is little signal in all the noise. Customers accept change only up to a limit, and that limit also depends on how radical the change is. Small changes with small impact are accepted more easily and quickly, and they are easier to measure. Bigger changes are the opposite. Hence a bias toward smaller changes: easier to manage, easier to measure, and we often do what is easier to measure. Fundamental product changes can take years to show an effect and to become reliably measurable. You know this from your own experience when the CFO asks why a feature released this summer only shows revenue next year.
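Limiting customer-facing change to what customers can absorb can be sketched as a budget problem: give each change a radicality score, give the period an absorption budget, and fill the budget greedily by expected value per unit of radicality. The scores, budget, and changes are hypothetical, and greedy selection is a heuristic, not an optimal knapsack solution.

```python
def plan_release(changes, absorption_budget: int):
    """Greedily pick customer-facing changes that fit the period's absorption budget.

    changes: list of (name, radicality, expected_value); radicality is an assumed
    integer score >= 1, where a bigger number means customers must relearn more.
    """
    # Highest expected value per unit of disruption first (the sort is stable for ties).
    ranked = sorted(changes, key=lambda c: c[2] / c[1], reverse=True)
    chosen, used = [], 0
    for name, radicality, _value in ranked:
        if used + radicality <= absorption_budget:
            chosen.append(name)
            used += radicality
    return chosen, used


changes = [
    ("button copy", 1, 2),
    ("search filters", 3, 9),
    ("new pricing model", 8, 30),
    ("navigation redesign", 6, 12),
]
print(plan_release(changes, absorption_budget=10))
# (['new pricing model', 'button copy'], 9)
```

In the example, one radical change consumes most of the period’s budget, which is the trade-off between a big bet and several easily absorbed small ones.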

Take eBay as an example. The company ha
