2026-06-16 | On-site rewrite | 7 min read | Updated: 2026-06-16

Ratchet: An AI Delivery Loop That Can Only Move Forward

Praveen Vijayan presents Ratchet, an AI delivery workflow designed to prevent backward slips. Built on GitHub, it constrains coding agents to move work only forward through a sequence of steps, with a human always approving the final merge. Ratchet is open-source, agent-agnostic, and focuses on auditability and safety.

Source: Hacker News AI | Author: praveenvijayan

Praveen Vijayan

Jun 14, 2026

I spent most of the last decade shipping software inside banks. So when I started letting coding agents into real repositories, I didn’t reach for more autonomy. I reached for a brake. This is the workflow I built, why I built it the way I did, and where it fits.

A ratchet turns one way and locks against the other. That’s the whole idea.

The thing nobody likes to admit

Gartner thinks more than 40% of agentic AI projects will be scrapped before 2027 is out. If you’ve watched a slick “autonomous developer” demo and then turned the same tool loose on a codebase you actually care about, that number won’t shock you.

The demos really are good. An agent reads a ticket, writes the code, runs the tests, opens a pull request, all on its own. Then you point it at your own repo — the one with the payment flows, the audit trail, the service nobody fully remembers writing — and it does one of three things instead.

It drifts. You asked for one change and it quietly “improves” four others while it’s in there.

It stalls. A test goes red, the agent gets confused, and the work just sits there with nobody’s name on it.

Or, worst of all, it ships. Something lands on the main branch that no human ever properly read.

I’ve sat in enough release post-mortems to know that the third one is the career-ending one. In a bank, “an AI merged it and no one looked” is not a sentence you ever want to say out loud.

The industry’s reflex is to fix all this by making the agent smarter: a better planner, a bigger sandbox, more freedom to act. I went the other way. I decided the agent should be allowed to do less — and that the system around it should make going backwards physically impossible.

I called it Ratchet.

Why that name

A ratchet is the little toothed mechanism inside a socket wrench. It turns freely one way and locks hard against the other. You can tighten; you can’t accidentally loosen.

That’s the exact property I wanted. Work should only ever move toward shipped. And when something goes wrong — a test fails, a reviewer pushes back — the work doesn’t roll the whole machine backwards or disappear into limbo. It drops back into the queue, gets picked up, and gets pushed forward again.

Tighten, never loosen. Once you’ve felt a delivery process that can silently slip backwards, the appeal of one that can’t is hard to overstate.

How it works, without the jargon

Picture a conveyor belt with seven steps. Work travels along it, and every step can only hand the work forward. The agent runs the middle of the belt. GitHub handles the ends. A human owns the one gate that matters.

The agent picks the highest-priority unblocked task in the queue — and rework always jumps ahead of new work, because finishing beats starting.
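A minimal sketch of that selection rule in Python. It assumes each task carries a numeric priority (lower means more urgent), a set of blocking task ids, and a rework flag. The `Task` fields and the `pick_next` name are hypothetical and are not taken from Ratchet's source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: int
    priority: int                      # assumption: lower number = more urgent
    rework: bool = False               # True when a reviewer requested changes
    blocked_by: set[int] = field(default_factory=set)


def pick_next(tasks: list[Task], done: set[int]) -> Optional[Task]:
    """Return the task to work on next, or None if nothing is unblocked."""
    # A task is unblocked once every task it waits on is done.
    unblocked = [t for t in tasks if t.blocked_by <= done]
    if not unblocked:
        return None
    # Rework sorts ahead of new work; priority, then id, breaks ties.
    return min(unblocked, key=lambda t: (not t.rework, t.priority, t.id))
```

The sort key puts the rework flag first, so a low-priority rework item still beats a high-priority new task.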

It claims the task by creating its own branch off the latest code. Creating that branch is the claim, so two agents can never grab the same thing at once. No locks, no scheduler, no coordination service — just a branch name.
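The claim works because creating a branch is all-or-nothing on the host: a name either gets created or already exists. The sketch below models that with an in-memory stand-in rather than a real GitHub call. `RefStore`, `try_claim`, and the `task/<id>` naming scheme are illustrative assumptions.

```python
import threading


class RefStore:
    """In-memory stand-in for a Git host's branch namespace (illustrative only)."""

    def __init__(self) -> None:
        self._refs: dict[str, str] = {}
        # This lock models the host's atomic ref creation.
        # The agents themselves hold no lock and never coordinate.
        self._lock = threading.Lock()

    def create_branch(self, name: str, sha: str) -> None:
        with self._lock:
            if name in self._refs:
                raise FileExistsError(f"branch {name!r} already exists")
            self._refs[name] = sha


def try_claim(store: RefStore, task_id: int, base_sha: str) -> bool:
    """Claim a task by creating its branch; False means another agent got there first."""
    branch = f"task/{task_id}"          # hypothetical naming scheme
    try:
        store.create_branch(branch, base_sha)
    except FileExistsError:
        return False
    return True
```

An agent that gets `False` back simply moves on to the next task in the queue.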

It builds, in small commits, sticking to the patterns already in the repo. If the task turns out to be bigger than it looked, it doesn’t barrel on. It stops, suggests a way to split it, and puts the pieces back in the queue.

It verifies by running your project’s own checks — formatting, types, tests, build. Two attempts to get them green. If it’s still red after that, the work goes back to the queue with a note saying why. Nothing broken ever leaves the bench, and red work never even reaches your CI.
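The two-attempt rule can be sketched as a bounded retry loop. This assumes each check is a shell command whose zero exit code means green. The `verify` function, its parameters, and the wording of the note are hypothetical.

```python
import subprocess
from typing import Callable, Sequence


def run_check(command: Sequence[str]) -> bool:
    """Run one project check; a zero exit code counts as green."""
    return subprocess.run(command, capture_output=True).returncode == 0


def verify(
    checks: dict[str, Sequence[str]],
    fix: Callable[[list[str]], None],
    max_attempts: int = 2,
    run: Callable[[Sequence[str]], bool] = run_check,
) -> tuple[bool, str]:
    """Return (green, note). The note explains why work went back to the queue."""
    failed: list[str] = []
    for attempt in range(1, max_attempts + 1):
        failed = [name for name, cmd in checks.items() if not run(cmd)]
        if not failed:
            return True, f"green on attempt {attempt}"
        if attempt < max_attempts:
            fix(failed)                 # let the agent try a repair before retrying
    return False, f"still red after {max_attempts} attempts: {', '.join(failed)}"
```

A `False` result is the signal to return the task to the queue with the note attached, instead of opening a pull request.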

Then it hands off: it opens a pull request and stops. That’s the end of the road for the agent.

If you ask for changes, it reworks the same pull request — re-runs the checks, answers your comments, points you at the commits that fixed each one. It never opens a second PR to dodge the conversation.

And when you merge, the system closes the loop: GitHub marks the task done, anything that was waiting on it becomes available, and any task an agent abandoned mid-flight gets swept back to the queue. The belt refills itself.
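The three effects of a merge can be sketched as one pass over the task list. How Ratchet actually detects an abandoned task is not stated here, so the idle-time threshold below is an assumption, as are the `Issue` fields and the `on_merge` name.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    id: int
    label: str                          # one of the workflow labels
    blocked_by: set[int] = field(default_factory=set)
    idle_hours: float = 0.0             # assumption: time since the last activity
    closed: bool = False


def on_merge(issues: dict[int, Issue], merged_id: int, stale_after_hours: float = 24.0) -> None:
    """Close the merged task, release its dependents, sweep abandoned work."""
    issues[merged_id].closed = True
    for issue in issues.values():
        if issue.closed:
            continue
        issue.blocked_by.discard(merged_id)
        if issue.label == "blocked" and not issue.blocked_by:
            issue.label = "ready"       # dependents become available
        elif issue.label == "in-progress" and issue.idle_hours >= stale_after_hours:
            issue.label = "ready"       # abandoned mid-flight: back to the queue
```

A task that waits on two others stays blocked until both have merged.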

Underneath all of that, each step is just a plain GitHub label you can read at a glance — ready, in-progress, in-review, changes-requested, blocked. Nothing is hidden in a database somewhere. If you can open your own repo, you can see exactly where every piece of work stands.
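Because the states are plain labels, the whole loop can be read as a small transition table. The table below is inferred from the description above, not copied from Ratchet's source, so treat every entry as an assumption.

```python
# Assumption: these moves are inferred from the article's description.
TRANSITIONS: dict[str, set[str]] = {
    "ready": {"in-progress"},                # an agent claims the task
    "in-progress": {"in-review", "ready"},   # PR opened, or checks stayed red
    "in-review": {"changes-requested"},      # a reviewer pushes back
    "changes-requested": {"in-review"},      # the agent reworks the same PR
    "blocked": {"ready"},                    # a dependency was merged
}


def relabel(current: str, target: str) -> str:
    """Return the new label, or raise if the move is not one the loop allows."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"{current} -> {target} is not an allowed move")
    return target
```

Merging does not appear in the table on purpose: it is the human's move, and it closes the task rather than relabelling it.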

There’s really only one rule that holds the whole thing together:

The merge is the human gate. No agent ever merges, approves, or closes anything. The pull request is the last thing it touches.

That single constraint is what lets me run this on a codebase that matters. The machine does all the dull forward motion — picking, claiming, coding, checking, opening the PR — but the decision to ship stays with a person turning the key.

What’s actually left for people to do

If the agents do the building, what’s the team for? Two things, and only two.

You write the plans. A plan is a short markdown file saying what needs to happen and why. Good plans in, good work out — and writing a sharp plan turns out to be a far better use of a senior engineer’s time than hovering over an agent’s keystrokes. This is where the real thinking lives.

And you review the pull requests. That’s the gate, and it’s where taste, experience, and accountability sit. Someone reads what the agent produced and decides whether it ships.

Everything in between — the branching, the typing, the test runs, the PR boilerplate — happens with nobody in the chair. That’s the trade I was after: keep the judgement, hand off the toil.

How it’s different from Devin and GitHub Copilot

Two names come up whenever I describe this, so let me be straight about where Ratchet sits next to them.

Devin, from Cognition, is the poster child for full autonomy: a sandboxed AI engineer that plans, codes, tests, and carries a task end to end inside its own cloud, priced for enterprises. It’s a genuinely impressive piece of engineering. But you’re trusting a long-running agent inside a box you can’t really see into, in its environment, running its loop.

GitHub’s Copilot coding agent is excellent if you live inside GitHub and Microsoft already. That’s also the catch: you’re buying a vendor’s surface and moving at a vendor’s pace.

Ratchet is a different kind of thing on purpose, and the differences are the whole point.

It’s a protocol, not a product. There’s no orchestrator, no database, no service to babysit. The entire workflow lives in things you already have — issues, branches, labels, Actions, pull requests. Read your repo and you’ve read the whole machine.

It doesn’t care which agent you use. Devin locks you to Devin; Copilot locks you to Copilot. Ratchet runs on anything that can use a command line and read a markdown file — Claude Code, Codex, Antigravity, or whatever’s winning a year from now. You’re not betting your delivery process on one company staying ahead.

It’s auditable because of how it’s built, not because someone bolted on a log. Every move is a real GitHub event with a name and a timestamp already attached. If you’ve ever had to explain a production change to a risk or compliance team, you know that’s not a nice-to-have — it’s the line between a tool you can use and one that never gets approved.

And it keeps the human at the gate deliberately. The fashionable pitch is “let the agent merge its own code.” Mine is the opposite: the agent earns trust by doing everything up to the gate cleanly, and a person always turns the key.

The shortest way I can put it: Devin and Copilot try to hand you a smarter worker. Ratchet hands you a safer factory floor — one that works with whatever workers you walk in with.

It gets better over time — and you can read why

There’s one quiet part of Ratchet I’m fonder of than any single step in the loop, and it follows the same rule as everything else: nothing gets written behind your back.

Ratchet keeps a small, curated memory of the project — the decisions, the gotchas, the “we always do it this way” conventions. It lives in two plain files in the repo. One belongs to the humans; the agent only ever reads it. The other is a distilled cache the agent is allowed to suggest edits to — but only ever inside a pull request. So when an agent thinks it’s learned something, that lesson gets reviewed exactly like code before it becomes part of the team’s shared memory.
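The two-file rule is simple enough to check mechanically. In the sketch below, both file paths and the `memory_violations` helper are hypothetical, and the real file names are not given here. It only shows the shape of the rule: one file is read-only for agents, and the other may change only inside a pull request.

```python
HUMAN_MEMORY = "docs/memory/decisions.md"    # hypothetical path: humans write, agents read
AGENT_CACHE = "docs/memory/learned.md"       # hypothetical path: agents may propose edits


def memory_violations(changed_paths: list[str], via_pull_request: bool) -> list[str]:
    """List the reasons an agent's change set breaks the memory rules."""
    problems = []
    if HUMAN_MEMORY in changed_paths:
        problems.append(f"{HUMAN_MEMORY} is read-only for agents")
    if AGENT_CACHE in changed_paths and not via_pull_request:
        problems.append(f"{AGENT_CACHE} may only change inside a pull request")
    return problems
```

An empty list means the change set respects both rules; anything else is a reason to reject it before review.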

The full record — every closed task, every merged PR, every line of history — stays right where it already is, in GitHub, and the agent searches it when it needs to. The curated memory stays deliberately tiny and just points back at the real source instead of trying to hoard it.

The result is the one you actually want. The project gets easier to work on. The agents stop repeating the mistakes your reviewers already corrected once. And you never have to trust a memory you can’t read — there’s no vector database, no hand-wave about how “the model just knows now.” If you want to know what the system has learned, you open a file.

Trying it

Ratchet is open source under MIT and ships as a GitHub template, so picking it up is closer to copying a folder than installing a platform. The whole working vocabulary is a handful of commands you run from your agent’s prompt:

/ratchet-init — run once per repo. Sets up the labels, works out your test and build commands, and gets the workspace ready.

/ratchet-plan — turn an idea you’ve been talking through into plan files, one per task, then stop so you can review. Nothing becomes a real task until you commit.

/ratchet-report — the disciplined “I found something” door. Spotted a bug or an improvement? It writes a proper task and stops. It will not quietly fix the thing, even when the fix is obvious. Finding work and doing work stay separate, on purpose.

/ratchet-sync — compile your plan files into live tasks right now, instead of waiting for the next push.

/ratchet-next — the heart of it. After you merge, it moves to the next task; after you request changes, it reworks the current one.

/ratchet-memory and /ratchet-update — housekeeping: tidy the project memory, and pull framework updates without ever touching your own files.

Two things I’d flag for anyone making the call, because they’re where the trust actually lives.

It runs locally, on your engineer’s own machine, using the GitHub login they already have. There’s no new service to stand up and no extra API key just to kick the tyres.

And by default it only notifies — your engineer decides when to advance. You can opt into having it move on automatically after each of your review decisions, but even then the gate doesn’t move. The merge is still yours. All you’re changing is how fast the queue refills behind it.
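The split between notifying and advancing can be sketched as a small dispatch on the review outcome. The `Review` enum, the action strings, and the `auto_advance` flag name are hypothetical stand-ins for whatever Ratchet uses internally.

```python
from enum import Enum


class Review(Enum):
    MERGED = "merged"
    CHANGES_REQUESTED = "changes-requested"
    PENDING = "pending"


def ratchet_next(review: Review) -> str:
    """Decide what /ratchet-next would do for a given review outcome."""
    if review is Review.MERGED:
        return "pick-next-task"
    if review is Review.CHANGES_REQUESTED:
        return "rework-current-pr"
    return "wait"


def on_review_event(review: Review, auto_advance: bool = False) -> str:
    """By default only notify the engineer; advance automatically only when opted in."""
    step = ratchet_next(review)
    if step == "wait":
        return "wait"
    return step if auto_advance else f"notify:{step}"
```

No branch of either function merges anything, so turning on `auto_advance` changes only how quickly the next task starts.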

If you want a fair test, it’s a small one: spin up a throwaway repo, run /ratchet-init, write one plan file (or just /ratchet-report something you want fixed), and watch a single task become a branch, a set of green checks, and a pull request waiting for you. Like what lands at the gate, and you widen it. Don’t, and you’ve spent an afternoon — not a quarter.

One honest limitation

I built and tested Ratchet as a single-developer, single-machine system, and I’d rather tell you that plainly than let you find out the hard way. The half that lives in GitHub — the queue, the states, the labels, the pull requests, the branch protection — is genuinely multi-user, and the branch-as-claim trick even doubles as real concurrency control: two people’s agents can’t grab the same task, because the second branch creation just fails.

The local half is the part built for one operator. Today, every developer runs their own watcher, which reacts to all of the repo’s pull-request events rather than just their own — so two people running it will see their agents cross-fire and race for the next task. The automation token is a single personal one, where a team really wants a dedicated bot account. And once several people work in parallel, you reintroduce the ordinary
