2026-07-05 15:10 UTCIn-site rewrite6 min readUpdated: 2026-07-05 15:44 UTC

AI coding jargon, explained in plain English

This article introduces an online dictionary that demystifies AI coding jargon in plain English, helping developers understand key terms like model, inference, context window, and more to improve their use of AI tools.

SourceHacker News AIAuthor: saikatsg

Notifications You must be signed in to change notification settings

Fork 299

Star 2.5k

BranchesTags

Open more actions menu

Folders and files

NameName

Last commit message

Last commit date

Latest commit

History

49 Commits

.github/workflows

.husky

.vscode

dictionary

internal

.gitignore

.lintstagedrc.json

.prettierignore

.prettierrc.json

CLAUDE.md

README.md

package-lock.json

package.json

Repository files navigation

AI coding can feel like it's just for experts. Unexplained jargon. Mysterious failures. Bills that don't seem to match the work.

It isn't, really. A lot of the confusion is manufactured: there's a whole VC-funded economy that benefits from keeping it hard to understand.

The basic terms of engagement are learnable in an afternoon. Once you have them, the whole thing stops feeling like guesswork.

Why does context degrade? Why is the bill so high? Why does the same prompt behave differently from one day to the next?

Each has a clean answer, once someone tells you the words to use.

That's what this dictionary is for. The vocabulary of AI coding, translated into plain English.

Want more than the vocabulary? Join 62,000+ developers at aihero.dev/newsletter for my latest skills, thinking on AI engineering, and the resources that'll keep you ahead of the curve.

Table of contents

Section 1 — The Model

Model

Parameters

Training

Inference

Effort

Token

Next-token prediction

Non-determinism

Model provider

Harness

Model provider request

Input tokens

Output tokens

Prefix cache

Cache tokens

Section 2 — Sessions, Context Windows & Turns

Stateless

Context

Context window

Stateful

Agent

System prompt

Session

Turn

Section 3 — Tools & Environment

Environment

Filesystem

Tool

Tool call

Tool result

MCP

Permission request

Permission mode

Agent mode

Sandbox

Section 4 — Failure Modes

Sycophancy

Hallucination

Parametric knowledge

Knowledge cutoff

Contextual knowledge

Attention relationship

Attention budget

Attention degradation

Smart zone

Section 5 — Handoffs

Clearing

Handoff

Primary source

Secondary source

Handoff artifact

Spec

Ticket

Compaction

Autocompact

Section 6 — Memory and Steering

Memory system

AGENTS.md

Progressive disclosure

Context pointer

Skill

Subagent

Section 7 — Patterns of Work

Human-in-the-loop

AFK

Automated check

Automated review

Human review

Vibe coding

Design concept

Grilling

Prototyping

Section 1 — The Model

A moving label, not a technology. "AI" doesn't name a fixed thing the way model or token does — it points at whatever computers can newly, impressively do. Right now it points at large language models. It has pointed at very different things before:

Era What "AI" meant

1950s Symbolic reasoning — theorem provers, checkers programs.

1960s–70s Rule-based symbolic programs — ELIZA, SHRDLU.

1980s Expert systems — thousands of hand-written if-then rules encoding human expertise.

1990s Game-tree search — Deep Blue beating Kasparov (1997). Researchers avoided the word "AI" entirely

2000s Statistical machine learning — spam filters, recommenders. Still sold as "machine learning", not "AI"

2010s Deep learning — image recognition (AlexNet, 2012), AlphaGo (2016).

2020s Large language models — ChatGPT (2022) made "AI" mean chatbots

The pointer moves by a known mechanism, sometimes called the AI effect: once a technique works reliably, it gets renamed — it's "just" search, "just" statistics — and "AI" slides forward to the next unsolved thing. The observation is old. Bertram Raphael put it this way in 1971: "AI is a collective name for problems which we do not yet know how to solve properly by computer." Larry Tesler's version, from around 1979: "Intelligence is whatever machines haven't done yet."

This is why conversations about AI so often talk past each other. A claim like "AI can't reason" or "AI is overhyped" carries a hidden timestamp — it may be about expert systems, about 2010s image classifiers, or about last month's LLM, and each reference supports a different conclusion. When a discussion about AI stalls, the fix is usually to swap the word for whichever precise term is actually meant: the model, the harness, the agent, the context it was given.

Avoid: "AI" in any technical claim — name the part you mean instead. "AI coding" as a label for the practice is fine; "the AI is hallucinating" is not.

Usage:

"The CTO wants to know whether AI could handle the triage queue."

"Translate that before scoping it — she means an LLM in a harness with access to the ticket system. 'AI' on its own isn't a spec."

Model

The parameters. Stateless — does next-token prediction and nothing else. "Claude Opus 4.x" and "GPT-5.x" are models. On its own a model can't do anything agentic; it has to be harnessed.

Models can't read files, run commands, browse the web, or remember yesterday — it takes tokens in and predicts tokens out, once per model provider request. Everything that feels like an agent working — choosing tools, reading results, looping until the task is done — is the harness orchestrating many of those predictions in a row.

Model providers ship models in tiers: a large one that's smartest but slow and expensive, and smaller ones that are faster and cheaper but less capable. Picking a tier is a real decision — heavyweight for planning and hard debugging, lightweight for mechanical changes — and harnesses let you switch mid-session.

Being strict about the word also sharpens diagnosis. "The model is bad at this" is a specific claim — the same model in a different harness, or with a different context, often behaves completely differently. Before blaming the model, check what it was given: most disappointing output traces back to context or harness, not parameters.

Usage:

"Should we switch the model from Sonnet to Opus for the planning step?"

"Try it — but the harness is doing most of the lifting on this task. The model swap won't help if the system prompt and tools are wrong."

Parameters

The numbers inside a model — often billions of them — tuned during training. Everything the model "knows" lives in them. Training sets them; inference uses them unchanged. Also called weights.

Mechanically, the parameters are what turn input into output. Next-token prediction is a giant calculation: the tokens in the context window go in, get multiplied through the parameters, and a prediction for the next token comes out. There is no database of facts inside the model, no code lookup table — just these numbers, arranged so that the calculation tends to produce useful output. Facts the model can recite from training, like a standard library API, are parametric knowledge: stored in the parameters, not retrieved from anywhere.

The detail worth internalising is that parameters are frozen after training. Nothing you do in a session changes them — no correction you make, no codebase you show it, no mistake it learns from. Every session runs on the same numbers. This is why the model is stateless, why its built-in knowledge stops at the knowledge cutoff, and why anything project-specific has to arrive via context instead. The only way parameters change is more training — which produces, in effect, a different model.

Usage:

"Can we fine-tune it on our codebase?"

"That'd update the parameters — different model afterwards. For one project it's almost always cheaper to load the codebase as context than to retrain."

Training

The process that sets a model's parameters, by exposing it to vast amounts of text and adjusting parameters to improve next-token prediction. A one-time, expensive process done by the model provider. Encompasses both pre-training (the bulk run) and post-training (later refinements like instruction-following and safety); the distinction doesn't matter at this glossary's level.

The mechanism is repetition at scale: show the model a stretch of text, have it predict the next token, nudge the parameters toward whatever the actual next token was, and repeat across trillions of tokens. Nothing is stored as facts or rules — everything the model "knows" is a side effect of getting better at prediction, compressed into the parameters as parametric knowledge.

Two consequences matter day to day. Training ends at a point in time, so the model has a knowledge cutoff — it hasn't seen the library version you upgraded to last month. And training is not something you can do: when the model doesn't know your codebase, your conventions, or your internal APIs, the fix is never "teach the model" — it's putting that material into context, the one input you control.

Usage:

"Can we get it to know our internal API?"

"Not via training — that's a months-long process by the model provider. Load the API docs into context instead, that's the lever you actually have."

Inference

Running a trained model to generate output — what happens on every model provider request. Parameters stay fixed; the model just does next-token prediction over the context it's given. Cheap relative to training, but billed per token and the dominant cost of using a model.

A model's life splits into two phases:

Phase When it happens What it does Parameters

Training Once, before release Produces the parameters from a training corpus Being written

Inference Every time anyone uses the model Runs the frozen parameters over your context to generate tokens Read-only

Nothing you do at inference time writes back to the parameters — that's the reason a correction you make today doesn't stick tomorrow. The model that makes the same mistake next session, after you carefully explained the fix, hasn't ignored you; it's incapable of learning from the exchange. The model is stateless — continuity has to come from outside it — from the context window or a memory system.

This mechanism also explains how you're billed. Every request runs the model over the full context, so cost scales with input tokens and output tokens, and an agent making dozens of tool calls pays for inference on each round trip. This is why context size is a cost question as well as a quality one.

Usage:

"Why does the bill scale with usage instead of being a flat license?"

"You're paying for inference — every model provider request runs the model on the provider's hardware. Training already happened, but inference costs accrue per request, and a single turn can expand into many requests when tools are called."

Effort

Effort is a dial for how much reasoning a model does before it answers. Set per model provider request, it controls the length of the thinking the model works through before it starts writing the response you see. That thinking is generated at inference time like everything else; the harness often hides it, but it's real work the model is doing.

Higher effort costs more and runs slower. The reasoning is emitted as tokens, billed as output tokens even when you never see them, and produced one token at a time — so turning effort up lengthens the wait before the answer arrives and adds to the bill. The trade is more deliberation against speed and cost.

Most harnesses expose effort as a small ladder:

Level What it's for

Low Mechanical edits, lookups, well-specified changes with one clear path.

Medium Everyday coding — the usual default.

High Tricky bugs, design decisions, multi-step plans.

Max The hardest problems, where a wrong answer is expensive to unwind.

The symptom of getting it wrong cuts both ways. Set effort too low on a hard problem and you get a confident, shallow answer that skipped the reasoning the problem needed — it reads fine and is wrong in a way that costs you later. Set it to max for a one-li

[truncated for AI cost control]