
Preparing Specs for AI Coding Agents

Now that AI coding agents edit repositories, run commands, and produce branches, writing a specification before the work begins becomes critical. This article outlines what makes a good spec: context, behavior change, constraints, examples, and validation evidence. Specs serve as an assignment layer that separates human intent from machine execution, improving team collaboration and code review. Unlike private prompts, specs provide durable memory and make changes inspectable.

Source: Hacker News AI. Author: pando85

AI coding agents now edit repositories, run commands, and produce branches. That makes the spec before the work more important: it carries the context, boundaries, and success criteria the agent needs.

What a good coding-agent spec includes

Specs are becoming more important because AI coding agents are no longer only answering questions. They are reading repositories, editing files, running commands, producing branches, and asking humans to review the result. That changes what a prompt needs to become.

When an assistant only answers a question, a private prompt can be enough. When an agent changes a shared codebase, the prompt becomes an assignment. And an assignment needs more than good wording. It needs the right context, boundaries, examples, and a way to judge whether the work matched the original intent.

That is the practical reason to prepare a spec before sending a coding agent into a repository. The spec does not need to be long. It does need to tell the agent what problem it is solving, what behavior should change, what must not change, and how the result will be reviewed.

At minimum, a good coding-agent spec should give the agent five things:

the context behind the task

the behavior that should change

the constraints the agent should preserve

examples or scenarios that define correctness

the validation evidence a reviewer should inspect

This is the useful idea behind spec-driven development, behavior scenarios, issue templates, lightweight design docs, OpenSpec, GitHub Spec Kit, and many internal engineering proposal formats. The specific framework matters less than the shape of the spec: the agent should receive enough context to act, and the team should receive enough structure to review the result.
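One way to make those five parts concrete is to treat the spec as a small structured record and check that none of the parts is empty before handing it to an agent. The sketch below is illustrative only: the `AgentSpec` class, its field names, and the `missing_parts` helper are hypothetical and are not taken from OpenSpec, GitHub Spec Kit, or any other framework.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AgentSpec:
    """Hypothetical container for the five parts of a coding-agent spec."""
    context: str = ""
    behavior_change: str = ""
    constraints: list[str] = field(default_factory=list)
    scenarios: list[str] = field(default_factory=list)
    validation: list[str] = field(default_factory=list)


def missing_parts(spec: AgentSpec) -> list[str]:
    """Return the names of any parts that are empty or whitespace-only."""
    missing = []
    for f in fields(spec):
        value = getattr(spec, f.name)
        if isinstance(value, str):
            empty = not value.strip()
        else:
            empty = not any(item.strip() for item in value)
        if empty:
            missing.append(f.name)
    return missing


# A draft that names the problem and the change but not how to judge it.
draft = AgentSpec(
    context="The login test fails intermittently.",
    behavior_change="Login creates exactly one session and redirects.",
    constraints=["Do not weaken auth checks."],
)
print(missing_parts(draft))  # ['scenarios', 'validation']
```

A check like this says nothing about whether the spec is good. It only catches the case where a part was skipped entirely, which is the cheapest failure to prevent.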

The spec is not a nicer prompt. It is the prepared assignment between human intent and machine execution.

Prompts are good at starting work. Specs are better at carrying it.

A private prompt is optimized for immediacy. It lives in a chat session. It can include shorthand, missing context, and assumptions the author understands but nobody else sees.

That can work for a local explanation or a throwaway script. It is weaker for team engineering work.

The problem is not that prompts are informal. Informality is often useful. The problem is that private prompts usually disappear from the workflow after the agent starts. They do not naturally become review criteria. They are hard to compare against a pull request. They do not help the next person understand why the change exists.

Specs solve a different problem. They give the assignment a visible shape the team can keep inspecting.

That spec can live in different places. It can be a repo-local spec, an issue with acceptance criteria, a BDD scenario, a small design note, a change proposal, or a pull request description that names the behavior being changed. OpenSpec is one useful implementation of this pattern, but it is not the only one. GitHub Spec Kit, Gherkin-style scenarios, team RFC templates, and ordinary issue templates can all carry the same discipline when they make context and review criteria explicit.

That is the shift teams should care about. A good spec does not merely instruct the agent. It gives humans and agents something shared to inspect before, during, and after implementation.

Figure 1. Assignment shape.

Private prompt: "fix this", "make it cleaner", "probably auth?", "you know what I mean". Useful for starting thought, weak as a shared review object.

Team-visible spec: proposal, spec delta, design notes, tasks, review criteria. Visible before implementation, useful during review, durable after the session ends.

Private prompts are fast but unstable. Team-visible specs give the assignment a structure reviewers can inspect later.

The assignment layer separates intent from execution

The strongest specs behave like small behavior contracts. Requirements say what the system should do. Scenarios give concrete examples, often in a Given/When/Then style. Design notes and task lists can describe the technical approach and implementation checklist, but those are not the same thing as the requirement.

This separation is one of the most useful disciplines for AI-assisted engineering.

If the intent and implementation are mixed together too early, the agent can optimize for the wrong thing. It may faithfully follow a suggested implementation detail while missing the behavior the team actually needed. Or it may produce a plausible design that is hard to review because the success criteria were never made explicit.

An assignment layer keeps three questions apart:

What behavior should change?

What constraints or examples define correctness?

What implementation path seems appropriate right now?

Those questions are connected, but they should not collapse into one blob of instructions. The implementation can evolve as the agent reads the codebase. The requirement should remain stable enough for a reviewer to ask: did the work satisfy this?

Figure 2. Behavior contract.

A spec is not a pipeline for the agent to follow blindly. It is a boundary for valid work.

That is also why delta-oriented formats are interesting for existing codebases. Most engineering work is not greenfield. Teams are changing behavior that already exists. A good spec says: here is the current contract, and here is the proposed change to that contract. Reviewers do not need to mentally diff a whole product document. They can look at the behavior delta.
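The idea of a behavior delta can be sketched as a diff between two small contracts. This is a minimal illustration, assuming each contract is a plain mapping from a requirement name to its statement; the `behavior_delta` function and the requirement names are hypothetical, and this is not how OpenSpec or any other tool stores or computes deltas.

```python
def behavior_delta(
    current: dict[str, str], proposed: dict[str, str]
) -> dict[str, dict[str, str]]:
    """Diff two contracts that map a requirement name to its statement."""
    return {
        "added": {k: v for k, v in proposed.items() if k not in current},
        "modified": {
            k: v for k, v in proposed.items() if k in current and current[k] != v
        },
        "removed": {k: v for k, v in current.items() if k not in proposed},
    }


# Hypothetical contracts for illustration only.
current_contract = {
    "login-redirect": "After login, redirect to the home page.",
    "logout": "Logout ends the session.",
}
proposed_contract = {
    "login-redirect": "After login, redirect to the original destination.",
    "logout": "Logout ends the session.",
    "session-unique": "Login creates exactly one session.",
}

delta = behavior_delta(current_contract, proposed_contract)
for kind, items in delta.items():
    for name, text in items.items():
        print(f"{kind}: {name}: {text}")
```

The unchanged `logout` requirement does not appear in the output, which is the point: the reviewer reads two lines instead of the whole contract.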

A concrete example

Consider a private prompt like this:

Fix the flaky login test and update whatever needs changing.

That might be enough for a developer working alone. It is weak as a team assignment. It does not say what failure is observed, which behavior should remain stable, which checks matter, or what kind of fix is out of scope.

A better spec would make the work narrower:

Observed problem: the login test intermittently fails when the callback request arrives before the session record is visible to the test assertion.

Expected behavior: login should create exactly one session and redirect the user to the original destination.

Constraints: do not weaken auth checks, do not add sleeps to the test, and keep the fix local to the callback/session path unless the codebase shows a wider issue.

Validation: run the affected auth test file and the relevant backend or frontend checks.

Review object: compare the resulting PR against the stated behavior, constraints, and validation.

That does not remove judgment from the work. It gives the agent a boundary and gives the reviewer a relationship to inspect. When the spec is visible to the team, the reviewer can compare the pull request against the same context the agent received.
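The expected behavior in that spec can also be written as a scenario-style check. The sketch below uses an in-memory stand-in, `FakeAuth`, so that it runs on its own; the class, its method, the user name, and the `/reports` path are all hypothetical, and a real test would exercise the project's own callback and session code. It is not a diagnosis of the flaky test, only an example of stating the behavior in Given/When/Then form without a sleep.

```python
class FakeAuth:
    """In-memory stand-in for the callback/session path (hypothetical)."""

    def __init__(self) -> None:
        self.sessions: list[str] = []

    def handle_callback(self, user: str, destination: str) -> str:
        # Record the session, then tell the caller where to redirect.
        self.sessions.append(user)
        return destination


def test_login_creates_one_session_and_redirects() -> None:
    # Given a user who asked for /reports before logging in
    auth = FakeAuth()
    # When the login callback is handled
    redirect = auth.handle_callback("ada", destination="/reports")
    # Then exactly one session exists and the user lands on /reports
    assert auth.sessions == ["ada"]
    assert redirect == "/reports"


test_login_creates_one_session_and_redirects()
print("scenario passed")
```

The assertions read state only after the call has returned, so the stand-in needs no sleep. How to get the same determinism in the real callback path is the part the agent and reviewer still have to work out.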

Specs are not waterfall if they stay small and revisable

Spec-driven work often triggers a reasonable objection: is this just waterfall with a new name?

It can be, if the team turns specs into ceremony. A giant document, months before implementation, is not suddenly better because an AI agent reads it.

The useful counter-pattern is lighter: fluid, iterative, easy to revise, and brownfield-first. Different frameworks express that differently. Some use proposals and delta specs. Some use issue checklists and acceptance criteria. Some use BDD scenarios. The important part is that these are actions around a change, not locked phases that delay learning.

That distinction is important. The assignment layer should reduce ambiguity, not freeze learning.

A good spec can change when implementation teaches the team something. If exploration reveals that the initial approach is wrong, the design should change. If a requirement was too broad, the scope should narrow. If a scenario exposed an edge case nobody considered, the spec should gain that scenario.

The discipline is not “write the perfect plan before code.” The discipline is “keep the visible intent and the implemented reality moving together.”

The review object changes

Without a visible assignment, reviewers mostly review the diff.

With an assignment layer, reviewers can review the relationship between four things:

the proposed behavior change

the implementation plan or task breakdown

the code and tests produced by the agent

the validation result from CI, local checks, or manual review

That relationship is where AI-assisted work becomes more manageable. The reviewer is not being asked to trust the agent’s confidence. They are comparing the spec, the implementation, and the evidence.
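Part of that comparison can be made mechanical. As a rough sketch, assume the spec lists its scenarios by name and the validation step records a pass or fail against each name; the `unverified_scenarios` helper and the scenario names below are hypothetical, not part of any framework mentioned in this article.

```python
def unverified_scenarios(scenarios: list[str], evidence: dict[str, bool]) -> list[str]:
    """Return scenarios with no passing check recorded against them."""
    return [name for name in scenarios if not evidence.get(name, False)]


# Scenarios named in the spec (hypothetical).
spec_scenarios = [
    "login creates exactly one session",
    "login redirects to the original destination",
]
# Results collected from CI or local checks (hypothetical).
recorded_evidence = {"login creates exactly one session": True}

print(unverified_scenarios(spec_scenarios, recorded_evidence))
# ['login redirects to the original destination']
```

A scenario with a failing check and a scenario with no check at all are both reported, since in either case the reviewer has no evidence that the behavior holds.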

Different frameworks make this explicit in different ways. OpenSpec has verification concepts around completeness, correctness, and coherence. GitHub’s Spec Kit takes a stricter specification-first position. BDD workflows use examples as executable or semi-executable behavior expectations. Issue-driven teams may use acceptance criteria, labels, reviewers, and CI requirements instead.

Those are not identical philosophies. The common lesson is narrower and more useful: the more powerful coding agents become, the more important it is to preserve the assignment they were supposed to satisfy.

For teams, this changes the review question from:

“Does this diff look okay?”

to:

“Does this diff satisfy the behavior change we agreed to, under the constraints we named, with evidence we can inspect?”

That is a better question.

The right spec becomes team context

A good spec helps the agent start. A shared spec helps the team stay aligned.

A runner should not receive vague private intent, disappear into an isolated execution environment, and return a diff that reviewers have to decode from scratch. The work should start from context the team can see, then return through evidence the team already understands: branch, commits, pull request, CI result, runner summary, model audit, and human review.

No tool needs to prescribe one spec framework for every team. Some teams will use issues. Some will use repo-local specs. Some will use lightweight design docs or behavior scenarios. The important boundary is that coding-agent work should remain tied to a visible spec and a reviewable result.

That is what separates a useful AI runner workflow from black-box autonomous output. Forkline follows this same principle, but the principle is broader than any one product: if agents act on shared code, the spec and the result should both be inspectable by the team.

The spec is also memory

AI coding sessions are temporary. Repositories are not.

One understated benefit of repo-local specs is that they can live with the code. They can be checked into the repository, organized by capability or change, and updated as work lands. That makes them useful to both people and agents later.

This matters because agent context is fragile. Chat history gets cleared. Context windows fill up. A different model or tool may handle the next task. The person who wrote the original prompt may not be available. If the only record of intent was a private chat, the team loses context as soon as the chat falls out of view.

Specs give that context a durable home. They do not replace code, tests, or documentation. They connect them. A new agent can read the current behavior. A new developer can understand what the system is expected to do. A reviewer can look back at an archived change and see not only what changed, but why the change was proposed.

This is not primarily an audit argument. It is a coordination argument. Teams need memory that survives individual sessions.

Bounded work is the right unit

Specs do not make every kind of work safe to delegate.

They are strongest when the work is bounded: a behavior change, a bug fix, a compatibility update, a small feature, a migration step, a CI repair, or a narrow refactor with clear constraints. In those cases, the team can describe the desired change and inspect whether the result mat
