Your AI Agent Already Forgot Half of What You Told It
This article is the seventh in a series on agentic engineering and AI-driven development, focusing on context management in AI sessions. The author shares a personal experience with Gemini forgetting earlier notes, introduces the concept of context compaction, and provides four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges. These techniques apply to both developers and regular users, helping reduce frustration caused by AI forgetting.
This is the seventh article in a series on agentic engineering and AI-driven development. Read part one here, part two here, part three here, part four here, part five here, and part six here.
This is the latest article in my Radar series on AI-driven development and agentic engineering, and I have to admit that this one took a bit of a turn I wasn’t expecting.
In my last article, I talked about context and context management, and I promised to give you some real, practical tips for using it. This article was originally meant to be about specific context management techniques that were really helpful to me while building Octobatch and the Quality Playbook. Those are two open source projects where I work with AIs to plan and orchestrate all of the work, and where every line of code is written by AI tools like Claude Code and Cursor.
But as I was writing this, I found that I’d adapted those same techniques to my work writing articles like this one. Which is surprising! I’ve been doing all this work finding ways to help people developing AI skills improve context management, so their skills run more efficiently. It turns out that those same exact techniques apply to anyone using AI tools, even when you’re using chatbots like Claude.ai or ChatGPT.
Full disclosure: I use multiple AI tools to manage this article series. My primary tools are Claude Cowork, for brainstorming and managing my article research, notes, and backlog, and Gemini's mobile app, for reading drafts aloud and taking my notes while I'm away from my desk. And I want to tell you about something that happened while I was using those tools, because I think it really helps show why context management isn't just a problem for developers.
While I was writing this article, I was using Gemini’s mobile app to read the draft aloud and take my notes. Partway through the session I asked it to go back and check whether there were earlier notes it hadn’t incorporated yet. It told me it didn’t have access to the previous notes, which seemed weird and insane, since we had just taken those notes a few prompts earlier in the session. I could scroll back up and see them earlier in the conversation, but somehow it didn’t “know” about them.
Here’s what happened. Gemini had compacted our conversation without telling me, and the notes from the first half of the session were just… gone.
If you’ve ever had a web chat AI just seem to forget things you talked about earlier, you’ve experienced context compaction, just like I did. Understanding even the basics of context and context windows can make a big difference in preventing that kind of frustration.
This all reminded me of something I wrote more than two decades ago in Applied Software Project Management (back in 2005!): “Important information is discovered during the discussion that the team will need to refer back to during the development process, and if that information is not written down, the team will have to have the discussion all over again.”
Jenny Greene and I wrote that about human teams and project meetings, but it applies to AI sessions just as well.
Which brings me back to context, which I wrote about in my last article, and which I’ll write more about in the next one, because it’s one of the most important concepts to keep top of mind when working with AI.
Context loss may be invisible, but that doesn’t make it any less frustrating
Context is everything the AI is holding in its working memory during a conversation: what you’ve told it, what it’s told you, any files or instructions it’s read, and whatever internal notes the system has made along the way. All of that lives in a fixed-size context window—think of that as your AI’s short-term memory, the stuff it’s thinking about right now—and when the window fills up, the AI has to start letting things go. Different tools handle this differently: Some truncate older messages, some compress the conversation into a summary (which means details get lost even though the summary looks complete), and some just start behaving inconsistently so you can’t tell whether the AI forgot something or never understood it in the first place. The result is the same: The AI loses track of things you told it, decisions you made together, or details it noticed earlier in the session. And it won’t tell you it forgot. It’ll just keep generating confident-sounding output based on whatever it still has.
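To make the mechanism concrete, here's a minimal Python sketch of the simplest strategy, truncation. It's a toy under stated assumptions, not how any particular product works. `TruncatingContext` and `count_tokens` are hypothetical names, and counting words stands in for a real tokenizer.

```python
from collections import deque

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())

class TruncatingContext:
    """Toy context window that silently evicts the oldest messages
    once the running token count exceeds a fixed budget."""

    def __init__(self, budget: int):
        self.budget = budget
        self.messages = deque()
        self.used = 0

    def add(self, message: str) -> None:
        self.messages.append(message)
        self.used += count_tokens(message)
        # Evict from the front until everything fits. No warning, no error.
        while self.used > self.budget and self.messages:
            self.used -= count_tokens(self.messages.popleft())

    def knows(self, fragment: str) -> bool:
        # The model can only "remember" what is still inside the window.
        return any(fragment in m for m in self.messages)

ctx = TruncatingContext(budget=12)
ctx.add("note: tighten the intro paragraph")  # 5 tokens
ctx.add("note: cut the 2005 quote")           # 5 tokens, 10 total
ctx.add("please read section two aloud now")  # 6 tokens, 16 total > 12
print(ctx.knows("tighten the intro"))   # False: the first note was evicted
print(ctx.knows("cut the 2005 quote"))  # True
```

Nothing in `add` signals that the first note is gone, which is the same silent failure as in the Gemini session. A summarizing strategy would keep a condensed version instead of evicting the note outright. The details that don't survive the summary would be just as invisible, though.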
Before we dive in a little deeper, I want to do a quick jargon check. If you’ve seen the terms “skills” and “agents” floating around but aren’t sure what they are, think of skills as libraries for AIs and agents as interactive executables. Those aren’t perfectly precise definitions, but if you’re a developer they’re close enough for this discussion.
When you’re coding skills and agents, you run into context problems quickly. The work you’re asking the AI to do is often complex enough that the context window fills up, and the AI has to start compacting: compressing or dropping older parts of the conversation to make room for new ones. Compaction always seems to happen at the most frustrating and inconvenient time, which makes sense when you think about it. You hit context limits precisely when you’ve put the most information into the conversation, which is exactly when losing that information costs you the most.
That’s why I think it can often help to think of AIs as having the same shortcomings that human teams do, except those shortcomings are exaggerated by their AI nature. A person who forgets something from a meeting last week might remember it when you remind them. An AI that lost something to context compaction won’t, because the information is gone. But there’s something you can do about it, and it turns out the techniques that help are the same whether you’re building autonomous AI skills or just trying to get a chatbot to remember what you told it 20 minutes ago.
I’ve landed on four techniques that I come back to over and over again. Each one exists because at some point the AI forgot something important and I responded by putting that thing in a file where it couldn’t be forgotten. None of them require special tooling. And to my surprise, all of these techniques have turned out to be useful for both building software and managing a writing project like this one, whether I’m chatting with Claude, ChatGPT, or Gemini, or using a desktop tool like Claude Cowork or Codex. These are the techniques I find most valuable:
Split discovery from documentation: Don’t ask the AI to figure something out and produce polished output in the same pass.
Use handoff documents, not continuation prompts: Before closing a stale session, have the AI write down everything the next session needs to know.
Give the AI an acceptance criterion, not a procedure: Tell it what “done” looks like instead of spelling out the steps.
Use spec documents as the bridge between AI tools: Make a shared document the single source of truth that all your tools read from.
Split discovery from documentation
When you ask an AI to do something complex, you’re often asking it to do two things at once without realizing it. You’re asking it to figure something out and produce polished output at the same time. The problem is that figuring things out takes attention, and producing output takes attention, and the model only has so much of it. When you combine both tasks in the same prompt, the model starts cutting corners on one of them, and you can’t tell which one it shortchanged.
I ran into this with the Quality Playbook, an open source AI coding skill I built that runs structured code reviews against any codebase. One of the things it does is derive requirements from source code: It reads through the code, identifies what the code promises to do (I call these behavioral contracts), and then produces a requirements document. Originally this all happened in a single pass. The problem was that single-pass requirement generation ran out of attention after about 70 requirements. The model forgot behavioral contracts it had noticed earlier in the code, and the forgetting was completely invisible. There was no stack trace or error message, just incomplete output and no way to know what was missing. I fixed it by splitting the work into two separate prompts:
Read each source file and write down every behavioral contract you observe as a simple list in CONTRACTS.md.
Read CONTRACTS.md and the documentation, then derive requirements from them and write REQUIREMENTS.md.
Then a third pass checks whether every contract has a corresponding requirement, and if there are gaps, goes back to step one for the files with gaps.
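Here's a rough sketch of how the first two passes might be wired together in Python. It's an illustration under assumptions, not the Quality Playbook's actual implementation. `ask_model` is a hypothetical stand-in for whatever model call you use, and `discovery_pass` and `documentation_pass` are made-up names. The point is the shape: pass one makes one small request per file and writes every observation to disk. Pass two then starts fresh from that file.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for your model call (API client, CLI wrapper, etc.):
# it takes a prompt string and returns the model's text response.
AskModel = Callable[[str], str]

def discovery_pass(ask_model: AskModel, sources: dict[str, str], notes: Path) -> None:
    """Pass 1: one small prompt per file; observations go straight to disk."""
    with notes.open("w", encoding="utf-8") as f:
        for name, code in sources.items():
            observed = ask_model(f"List every behavioral contract in {name}:\n{code}")
            f.write(f"## {name}\n{observed}\n")

def documentation_pass(ask_model: AskModel, notes: Path, out: Path) -> None:
    """Pass 2: a separate prompt that works only from the written notes."""
    contracts = notes.read_text(encoding="utf-8")
    requirements = ask_model("Derive requirements from these contracts:\n" + contracts)
    out.write_text(requirements, encoding="utf-8")

def fake_model(prompt: str) -> str:
    # Echoes the prompt's first line so you can see which request produced what.
    return "- " + prompt.splitlines()[0]

with tempfile.TemporaryDirectory() as d:
    notes, out = Path(d, "CONTRACTS.md"), Path(d, "REQUIREMENTS.md")
    discovery_pass(fake_model, {"a.py": "def f(): ...", "b.py": "def g(): ..."}, notes)
    documentation_pass(fake_model, notes, out)
    notes_text = notes.read_text(encoding="utf-8")
    out_text = out.read_text(encoding="utf-8")

print(notes_text)
print(out_text)
```

The fake model just echoes each prompt's first line. That's enough to show that each file got its own request, and that the second pass worked only from what the first pass wrote down.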
The key idea is that CONTRACTS.md is external memory. When the model “forgets” about a behavioral contract it noticed earlier, that forgetting is normally invisible. With a contracts file, every observation is written down before any requirements work begins, so an uncovered contract is a visible, greppable gap. You can see what was forgotten and fix it.
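Because the contracts live in a file, the third-pass coverage check can be mechanical. This minimal sketch assumes a hypothetical convention in which each contract line carries an ID like `C-003` plus its source file in parentheses, and each requirement cites the IDs it covers. The real playbook's file format may differ. The sketch reports uncovered contracts and the files to send back to pass one.

```python
import re

# Hypothetical convention: contract IDs look like "C-001", and each contract
# line names its source file in parentheses, e.g. "- C-001 (parser.py): ...".
CONTRACT_ID = re.compile(r"\bC-\d{3}\b")
SOURCE_FILE = re.compile(r"\(([^)]+)\)")

def uncovered_contracts(contracts_md: str, requirements_md: str) -> list[str]:
    """Contract IDs listed in CONTRACTS.md that no requirement cites."""
    listed = set(CONTRACT_ID.findall(contracts_md))
    cited = set(CONTRACT_ID.findall(requirements_md))
    return sorted(listed - cited)

def files_with_gaps(contracts_md: str, gaps: list[str]) -> list[str]:
    """Source files whose contracts are uncovered; rerun pass one on these."""
    files = set()
    for line in contracts_md.splitlines():
        match = SOURCE_FILE.search(line)
        if match and any(cid in gaps for cid in CONTRACT_ID.findall(line)):
            files.add(match.group(1))
    return sorted(files)

contracts_md = """\
- C-001 (parser.py): parse() raises ValueError on empty input
- C-002 (parser.py): parse() preserves key order
- C-003 (cache.py): entries expire after the configured TTL
"""
requirements_md = """\
REQ-1: Empty input is rejected with a clear error. (covers C-001)
REQ-2: Output keeps keys in source order. (covers C-002)
"""

gaps = uncovered_contracts(contracts_md, requirements_md)
print(gaps)                                 # ['C-003']
print(files_with_gaps(contracts_md, gaps))  # ['cache.py']
```

A plain `grep` for each ID would do the same job. The point is that the gap shows up as data you can act on, not as something the model quietly dropped.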
The principle: Don’t ask the AI to figure out what exists and write formatted output in the same pass. The model runs out of attention trying to do both at once. Whenever you’re asking an AI to do something complex, consider whether you’re actually asking it to do two things at once. “Analyze this codebase and write a report” is two tasks. “Read this document and suggest improvements” is two tasks. Split them, and let the first pass write its observations to a file before the second pass starts working with them.
Use handoff documents, not continuation prompts
Anyone who's spent a long session with an AI coding tool has felt the moment when the context starts to go stale. The AI stops tracking details it was handling fine an hour ago, or it contradicts something it said earlier. The session gets slow, and you often end up restarting because the AI seems to have gotten bogged down by everything you've told it. You get the sense that if you keep going, you're going to spend more time correcting it than making progress.
Most developers respond to a session that's grown too long in one of two ways: They push through the problem, or they start a fresh session and try to reexplain everything from scratch. Both of those approaches can cause the AI to lose context. The first loses it to compaction; the second loses it to incomplete reexplanation. And both are frustrating, specifically because you just spent so much time building up all that context with the AI.
There’s a third option. Before you close the session, ask the AI to write a handoff document: a file that captures everything the next session needs to know, written while the current session still has full context. The key is that you’re asking the AI to write this while the relevant details are still fresh in the working context, and in a way that it or another AI can read.
I built this into the Quality Playbook as a core part of how phases communicate. When I split the playbook from a single prompt into independent phases, I needed each phase to run as a completely independent session with no context carryover. So each phase got its own kickoff prompt as a standalone file. Here's the structure each one follows:
Write a handoff document that a fresh session could use to pick up this work cold. Include everything it would need to know.
Every kickoff opens with what prior phases accomplished, includes explicit boundaries about what's frozen, and names which future phase owns each piece of remaining work. That last part matters, because without that ownership map the AI will helpfully start doing Phase 3 work while you're still in Phase 2. Each phase also ends with a required forward-looking handoff, where the completing agent writes down what the next session needs to know.
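Here's one way to make that structure explicit, as a sketch under assumptions. The `Handoff` dataclass and its field names are hypothetical, not a format the Quality Playbook prescribes. It renders a kickoff file listing what's done, what's frozen, and which phase owns each remaining item. It also spells out which work is out of scope, so the next session doesn't wander ahead.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Hypothetical schema for a phase kickoff/handoff file."""
    phase: int
    accomplished: list[str]    # what prior phases finished
    frozen: list[str]          # artifacts this phase must not change
    remaining: dict[str, int]  # work item -> number of the phase that owns it
    notes: list[str] = field(default_factory=list)  # forward-looking notes

    def to_markdown(self) -> str:
        lines = [f"# Phase {self.phase} kickoff", "", "## Already done"]
        lines += [f"- {item}" for item in self.accomplished]
        lines += ["", "## Frozen (do not modify)"]
        lines += [f"- {item}" for item in self.frozen]
        lines += ["", "## Remaining work and owning phase"]
        lines += [f"- {item} (Phase {owner})" for item, owner in self.remaining.items()]
        # Name later-phase work explicitly so this session doesn't start on it.
        later = [item for item, owner in self.remaining.items() if owner > self.phase]
        if later:
            lines += ["", "Out of scope for this phase: " + ", ".join(later)]
        lines += ["", "## Notes for the next session"]
        lines += [f"- {note}" for note in self.notes]
        return "\n".join(lines) + "\n"

handoff = Handoff(
    phase=2,
    accomplished=["Phase 1: CONTRACTS.md written for every source file"],
    frozen=["CONTRACTS.md"],
    remaining={"Derive REQUIREMENTS.md": 2, "Write review report": 3},
    notes=["cache.py had few contracts; double-check TTL behavior"],
)
doc = handoff.to_markdown()
print(doc)
```

In practice, you'd have the completing session fill in these fields while it still has full context. You'd then save the result to a file (a hypothetical `PHASE2_KICKOFF.md`, say) that the next session reads cold.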
The principle: Each handoff is a complete state snapshot. The incoming AI agent never needs to read prior kickoff prompts or chat history. Everything it needs is in the current handoff file: current state, uncommitted ch
[truncated for AI cost control]