AI coding jargon, explained in plain English
This article introduces an online dictionary that demystifies AI coding jargon in plain English, helping developers understand key terms like model, inference, context window, and more to improve their use of AI tools.
AI coding can feel like it's just for experts. Unexplained jargon. Mysterious failures. Bills that don't seem to match the work.
It isn't, really. A lot of the confusion is manufactured: there's a whole VC-funded economy that benefits from keeping it hard to understand.
The basic terms of engagement are learnable in an afternoon. Once you have them, the whole thing stops feeling like guesswork.
Why does context degrade? Why is the bill so high? Why does the same prompt behave differently from one day to the next?
Each has a clean answer, once someone tells you the words to use.
That's what this dictionary is for. The vocabulary of AI coding, translated into plain English.
Want more than the vocabulary? Join 62,000+ developers at aihero.dev/newsletter for my latest skills, thinking on AI engineering, and the resources that'll keep you ahead of the curve.
Table of contents
Section 1 — The Model
AI
Model
Parameters
Training
Inference
Effort
Token
Next-token prediction
Non-determinism
Model provider
Harness
Model provider request
Input tokens
Output tokens
Prefix cache
Cache tokens
Section 2 — Sessions, Context Windows & Turns
Stateless
Context
Context window
Stateful
Agent
System prompt
Session
Turn
Section 3 — Tools & Environment
Environment
Filesystem
Tool
Tool call
Tool result
MCP
Permission request
Permission mode
Agent mode
Sandbox
Section 4 — Failure Modes
Sycophancy
Hallucination
Parametric knowledge
Knowledge cutoff
Contextual knowledge
Attention relationship
Attention budget
Attention degradation
Smart zone
Section 5 — Handoffs
Clearing
Handoff
Primary source
Secondary source
Handoff artifact
Spec
Ticket
Compaction
Autocompact
Section 6 — Memory and Steering
Memory system
AGENTS.md
Progressive disclosure
Context pointer
Skill
Subagent
Section 7 — Patterns of Work
Human-in-the-loop
AFK
Automated check
Automated review
Human review
Vibe coding
Design concept
Grilling
Prototyping
DX
AX
Section 1 — The Model
AI
A moving label, not a technology. "AI" doesn't name a fixed thing the way model or token does — it points at whatever computers can newly, impressively do. Right now it points at large language models. It has pointed at very different things before:
| Era | What "AI" meant |
| --- | --- |
| 1950s | Symbolic reasoning: theorem provers, checkers programs. |
| 1960s–70s | Rule-based symbolic programs: ELIZA, SHRDLU. |
| 1980s | Expert systems: thousands of hand-written if-then rules encoding human expertise. |
| 1990s | Game-tree search: Deep Blue beating Kasparov (1997). Researchers avoided the word "AI" entirely. |
| 2000s | Statistical machine learning: spam filters, recommenders. Still sold as "machine learning", not "AI". |
| 2010s | Deep learning: image recognition (AlexNet, 2012), AlphaGo (2016). |
| 2020s | Large language models: ChatGPT (2022) made "AI" mean chatbots. |
The pointer moves by a known mechanism, sometimes called the AI effect: once a technique works reliably, it gets renamed — it's "just" search, "just" statistics — and "AI" slides forward to the next unsolved thing. The observation is old. Bertram Raphael put it this way in 1971: "AI is a collective name for problems which we do not yet know how to solve properly by computer." Larry Tesler's version, from around 1979: "Intelligence is whatever machines haven't done yet."
This is why conversations about AI so often talk past each other. A claim like "AI can't reason" or "AI is overhyped" carries a hidden timestamp — it may be about expert systems, about 2010s image classifiers, or about last month's LLM, and each reference supports a different conclusion. When a discussion about AI stalls, the fix is usually to swap the word for whichever precise term is actually meant: the model, the harness, the agent, the context it was given.
Avoid: "AI" in any technical claim — name the part you mean instead. "AI coding" as a label for the practice is fine; "the AI is hallucinating" is not.
Usage:
"The CTO wants to know whether AI could handle the triage queue."
"Translate that before scoping it — she means an LLM in a harness with access to the ticket system. 'AI' on its own isn't a spec."
Model
The parameters. Stateless — does next-token prediction and nothing else. "Claude Opus 4.x" and "GPT-5.x" are models. On its own a model can't do anything agentic; it has to be harnessed.
A model can't read files, run commands, browse the web, or remember yesterday. It takes tokens in and predicts tokens out, once per model provider request. Everything that feels like an agent working (choosing tools, reading results, looping until the task is done) is the harness orchestrating many of those predictions in a row.
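That division of labour can be sketched in a few lines. This is a toy under stated assumptions, not how any real harness is built: `fake_model`, `read_file`, `run_harness`, and the `CALL` / `TOOL_RESULT` / `DONE` text format are all hypothetical names invented for the illustration. The point is only that the model function maps text to text, while the loop around it does the file access and decides when to stop.

```python
from typing import Callable

# Hypothetical stand-in for one model provider request: text in, text out.
# A real model predicts tokens; this stub follows a fixed script so the loop is visible.
def fake_model(context: str) -> str:
    if "TOOL_RESULT" not in context:
        return "CALL read_file notes.txt"
    return "DONE The file says: " + context.split("TOOL_RESULT ", 1)[1]

# Hypothetical tool: the harness, not the model, touches the "filesystem".
FILES = {"notes.txt": "ship on Friday"}

def read_file(name: str) -> str:
    return FILES.get(name, "<missing>")

def run_harness(task: str, model: Callable[[str], str], max_requests: int = 5) -> tuple[str, int]:
    """Ask the model, run any tool it requests, feed the result back, repeat."""
    context = task
    for request_count in range(1, max_requests + 1):
        output = model(context)  # one model provider request
        if output.startswith("CALL read_file "):
            filename = output.removeprefix("CALL read_file ")
            # The harness runs the tool and appends the result to the context.
            context += "\nTOOL_RESULT " + read_file(filename)
        else:
            return output.removeprefix("DONE "), request_count
    return "gave up", max_requests

answer, requests_made = run_harness("What does notes.txt say?", fake_model)
print(answer, requests_made)
```

One task turned into two requests here: one where the model asked for the file, one where it answered. The model function never opened anything itself.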
Model providers ship models in tiers: a large one that's smartest but slow and expensive, and smaller ones that are faster and cheaper but less capable. Picking a tier is a real decision — heavyweight for planning and hard debugging, lightweight for mechanical changes — and harnesses let you switch mid-session.
Being strict about the word also sharpens diagnosis. "The model is bad at this" is a specific claim — the same model in a different harness, or with a different context, often behaves completely differently. Before blaming the model, check what it was given: most disappointing output traces back to context or harness, not parameters.
Usage:
"Should we switch the model from Sonnet to Opus for the planning step?"
"Try it — but the harness is doing most of the lifting on this task. The model swap won't help if the system prompt and tools are wrong."
Parameters
The numbers inside a model — often billions of them — tuned during training. Everything the model "knows" lives in them. Training sets them; inference uses them unchanged. Also called weights.
Mechanically, the parameters are what turn input into output. Next-token prediction is a giant calculation: the tokens in the context window go in, get multiplied through the parameters, and a prediction for the next token comes out. There is no database of facts inside the model, no code lookup table — just these numbers, arranged so that the calculation tends to produce useful output. Facts the model can recite from training, like a standard library API, are parametric knowledge: stored in the parameters, not retrieved from anywhere.
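As a rough illustration of "numbers in, prediction out", here is a toy predictor. Everything in it is an assumption made for the sketch: the five-token vocabulary, the hand-written `PARAMS` table, and the choice to look only at the last token and always pick the most likely one. A real model has billions of parameters, reads the whole context window, and usually samples rather than taking the top choice.

```python
import math

VOCAB = ["def", "foo", "(", ")", ":"]

# Hypothetical "parameters": one row of scores per previous token.
# There is no lookup table of facts here, only numbers that shape the calculation.
PARAMS = [
    [0.0, 3.0, 0.5, 0.0, 0.0],  # scores for what follows "def"
    [0.0, 0.0, 4.0, 0.0, 0.2],  # scores for what follows "foo"
    [0.0, 0.1, 0.0, 3.5, 0.0],  # scores for what follows "("
    [0.0, 0.0, 0.0, 0.0, 5.0],  # scores for what follows ")"
    [1.0, 0.0, 0.0, 0.0, 0.0],  # scores for what follows ":"
]

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(context: list[str]) -> tuple[str, float]:
    """Return the most likely next token and its probability (greedy choice)."""
    row = PARAMS[VOCAB.index(context[-1])]
    probs = softmax(row)
    best = max(range(len(VOCAB)), key=lambda i: probs[i])
    return VOCAB[best], probs[best]

tokens = ["def"]
for _ in range(4):
    next_token, _prob = predict_next(tokens)
    tokens.append(next_token)
print(tokens)
```

The output sequence is determined entirely by the numbers in `PARAMS`. Change the numbers and the same code "knows" something different.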
The detail worth internalising is that parameters are frozen after training. Nothing you do in a session changes them — no correction you make, no codebase you show it, no mistake it learns from. Every session runs on the same numbers. This is why the model is stateless, why its built-in knowledge stops at the knowledge cutoff, and why anything project-specific has to arrive via context instead. The only way parameters change is more training — which produces, in effect, a different model.
Usage:
"Can we fine-tune it on our codebase?"
"That'd update the parameters — different model afterwards. For one project it's almost always cheaper to load the codebase as context than to retrain."
Training
The process that sets a model's parameters, by exposing it to vast amounts of text and adjusting parameters to improve next-token prediction. A one-time, expensive process done by the model provider. Encompasses both pre-training (the bulk run) and post-training (later refinements like instruction-following and safety); the distinction doesn't matter at this glossary's level.
The mechanism is repetition at scale: show the model a stretch of text, have it predict the next token, nudge the parameters toward whatever the actual next token was, and repeat across trillions of tokens. Nothing is stored as facts or rules — everything the model "knows" is a side effect of getting better at prediction, compressed into the parameters as parametric knowledge.
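The predict-and-nudge loop can be shown at toy scale. This sketch assumes a three-token vocabulary, a model that looks only at the previous token, and a plain gradient step on cross-entropy loss; the names `params`, `train_step`, and the learning rate are illustrative choices, not a description of how any provider trains.

```python
import math

VOCAB = ["a", "b", "c"]

# Parameters start at zero, so every next token is equally likely.
params = [[0.0] * len(VOCAB) for _ in VOCAB]

def softmax(scores: list[float]) -> list[float]:
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(prev: str, actual: str, learning_rate: float = 0.5) -> None:
    """Predict the next token after `prev`, then nudge parameters toward `actual`."""
    row = params[VOCAB.index(prev)]
    target = VOCAB.index(actual)
    probs = softmax(row)
    for j in range(len(VOCAB)):
        # Softmax cross-entropy gradient: predicted probability minus 1 for the true token.
        gradient = probs[j] - (1.0 if j == target else 0.0)
        row[j] -= learning_rate * gradient

def prob_of(prev: str, nxt: str) -> float:
    return softmax(params[VOCAB.index(prev)])[VOCAB.index(nxt)]

corpus = ["a", "b", "c", "a", "b", "c", "a", "b"]
before = prob_of("a", "b")
for _ in range(50):  # repeat over the same tiny corpus
    for prev, actual in zip(corpus, corpus[1:]):
        train_step(prev, actual)
after = prob_of("a", "b")
print(round(before, 3), round(after, 3))
```

No rule saying "b follows a" is ever written down. The probability rises from one in three toward one only because the numbers were nudged repeatedly in that direction.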
Two consequences matter day to day. Training ends at a point in time, so the model has a knowledge cutoff — it hasn't seen the library version you upgraded to last month. And training is not something you can do: when the model doesn't know your codebase, your conventions, or your internal APIs, the fix is never "teach the model" — it's putting that material into context, the one input you control.
Usage:
"Can we get it to know our internal API?"
"Not via training — that's a months-long process by the model provider. Load the API docs into context instead, that's the lever you actually have."
Inference
Running a trained model to generate output — what happens on every model provider request. Parameters stay fixed; the model just does next-token prediction over the context it's given. Cheap relative to training, but billed per token and the dominant cost of using a model.
A model's life splits into two phases:
| Phase | When it happens | What it does | Parameters |
| --- | --- | --- | --- |
| Training | Once, before release | Produces the parameters from a training corpus | Being written |
| Inference | Every time anyone uses the model | Runs the frozen parameters over your context to generate tokens | Read-only |
Nothing you do at inference time writes back to the parameters, which is why a correction you make today doesn't stick tomorrow. The model that makes the same mistake next session, after you carefully explained the fix, hasn't ignored you; it's incapable of learning from the exchange. The model is stateless, so continuity has to come from outside it: from the context window or a memory system.
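A small sketch of why the correction doesn't stick, assuming a hypothetical `stub_model` whose output depends only on the context it is handed. The pnpm example and the `memory` list are made up for the illustration; a real memory system is more involved than prepending a line.

```python
def stub_model(context: list[str]) -> str:
    """Hypothetical stand-in for a model: no state, output depends only on `context`."""
    if any("use pnpm" in line for line in context):
        return "pnpm install"
    return "npm install"

# Session one: the correction is in the context, so the answer reflects it.
session_one = ["How do I install deps?", "Correction: use pnpm in this repo."]

# Session two: a fresh context. Nothing was written back to the model.
session_two = ["How do I install deps?"]

# Continuity has to be supplied from outside, for example by a saved project note.
memory = ["Project note: use pnpm in this repo."]

print(stub_model(session_one))
print(stub_model(session_two))
print(stub_model(memory + session_two))
```

The second call repeats the original mistake because the function has nowhere to keep what it was told. The third call gets it right only because the note was put back into the context.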
This mechanism also explains how you're billed. Every request runs the model over the full context, so cost scales with input tokens and output tokens, and an agent making dozens of tool calls pays for inference on each round trip. This is why context size is a cost question as well as a quality one.
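The arithmetic is worth seeing once. This sketch uses made-up prices and token counts (the two `PRICE` constants and every argument to `session_cost` are assumptions, not any provider's real rates), and it ignores caching discounts entirely. It only shows how resending a growing context on every request adds up.

```python
# Hypothetical prices in dollars per million tokens; real prices vary by provider and model.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def session_cost(
    start_context: int,
    requests: int,
    output_per_request: int,
    tool_result_tokens: int,
) -> tuple[int, int, float]:
    """Total input tokens, output tokens, and dollar cost for a run of tool-call round trips."""
    context = start_context
    total_input = 0
    total_output = 0
    for _ in range(requests):
        total_input += context  # the full context is sent on every request
        total_output += output_per_request
        # The model's output and the tool result both join the context for next time.
        context += output_per_request + tool_result_tokens
    cost = (
        total_input * INPUT_PRICE_PER_MTOK + total_output * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000
    return total_input, total_output, cost

total_input, total_output, cost = session_cost(
    start_context=10_000, requests=20, output_per_request=300, tool_result_tokens=1_200
)
print(total_input, total_output, round(cost, 3))
```

With these numbers the context never grows past 38,500 tokens, yet the 20 requests send 485,000 input tokens in total, because each round trip pays for everything that came before it.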
Usage:
"Why does the bill scale with usage instead of being a flat license?"
"You're paying for inference — every model provider request runs the model on the provider's hardware. Training already happened, but inference costs accrue per request, and a single turn can expand into many requests when tools are called."
Effort
Effort is a dial for how much reasoning a model does before it answers. Set per model provider request, it controls the length of the thinking the model works through before it starts writing the response you see. That thinking is generated at inference time like everything else; the harness often hides it, but it's real work the model is doing.
Higher effort costs more and runs slower. The reasoning is emitted as tokens, billed as output tokens even when you never see them, and produced one token at a time — so turning effort up lengthens the wait before the answer arrives and adds to the bill. The trade is more deliberation against speed and cost.
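To make the trade concrete, here is a back-of-the-envelope estimate. Every number in it is an assumption chosen for illustration: the thinking-token budgets per level, the output price, and the generation speed are hypothetical, and real harnesses and providers set their own.

```python
OUTPUT_PRICE_PER_MTOK = 15.00  # hypothetical dollars per million output tokens
TOKENS_PER_SECOND = 50.0       # hypothetical generation speed

# Hypothetical thinking budgets; real levels and sizes differ by provider.
THINKING_TOKENS = {"low": 0, "medium": 2_000, "high": 8_000, "max": 32_000}

def answer_estimate(effort: str, visible_tokens: int = 500) -> tuple[int, float, float]:
    """Billed output tokens, seconds of thinking before the answer starts, and dollar cost."""
    thinking = THINKING_TOKENS[effort]
    billed_output = thinking + visible_tokens  # hidden thinking is billed as output too
    wait_seconds = thinking / TOKENS_PER_SECOND  # produced one token at a time
    cost = billed_output * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return billed_output, wait_seconds, cost

for level in THINKING_TOKENS:
    print(level, answer_estimate(level))
```

Under these assumptions the visible answer is the same 500 tokens at every level. What changes is how many unseen tokens are generated, waited for, and paid for first.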
Most harnesses expose effort as a small ladder:
| Level | What it's for |
| --- | --- |
| Low | Mechanical edits, lookups, well-specified changes with one clear path. |
| Medium | Everyday coding: the usual default. |
| High | Tricky bugs, design decisions, multi-step plans. |
| Max | The hardest problems, where a wrong answer is expensive to unwind. |
The symptom of getting it wrong cuts both ways. Set effort too low on a hard problem and you get a confident, shallow answer that skipped the reasoning the problem needed — it reads fine and is wrong in a way that costs you later. Set it to max for a one-li
[truncated for AI cost control]