The Reality of Long-Running AI Development — The Context Wall and Running a Memory

The longer you talk, the more the AI forgets the beginning—the context wall, and how to get past it

Spend long hours developing with an AI and you'll always hit one wall: context—the thread of everything said so far—gradually slips away.

As a conversation grows long, the AI summarizes and compresses the earlier exchanges (a mechanism called compacting). It can't hold the whole history verbatim, so this makes sense in itself. But summarizing, convenient as it is, drops the details. The fine numbers in a spec, the why behind a decision, a pitfall you once hit and resolved to avoid—those thin out in the course of summarizing.

The context wall

The tricky part is that if you keep working off a summarized conversation, you drift from the original spec without noticing.

The AI carries on, plausibly, from whatever (already summarized) context it has left. The output looks convincing on its own terms, yet it disagrees with the spec you set at the start. The most common rework in a long session is born right here. Even the feeling of "I checked that a moment ago, so it's fine" may rest on a memory that summarizing has already thinned.

There are two ways past it: keep the facts outside the conversation, and rebuild the context often. In order:

What you treat as truth: the spec and the plan

First, what counts as "true." Not the memory inside the conversation, but two documents—the spec and the plan.

The spec decides what to build: the design intent, and the "contract" to uphold—the shape of the exchange between screen and server, the definition of a database column, a function's inputs and outputs.
The plan is the order in which you implement it. You write each stage's definition of "done" up front, as numbers and commands ("this search returns zero," "this test passes")—in a form a machine can judge.
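A "done" definition that a machine can judge might look like the following Python sketch. The DoneCheck shape and its field names are hypothetical illustrations, not taken from the actual plan; each check simply runs a command and compares its exit code and output with what the plan states.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DoneCheck:
    """One machine-judgeable 'done' criterion for a plan stage (hypothetical shape)."""
    description: str
    command: List[str]
    expect_exit: int = 0                 # exit code that counts as "done"
    expect_stdout: Optional[str] = None  # required stdout (whitespace-stripped), if set


def run_check(check: DoneCheck) -> bool:
    """Run the command and compare the result with what the plan expects."""
    result = subprocess.run(check.command, capture_output=True, text=True)
    if result.returncode != check.expect_exit:
        return False
    if check.expect_stdout is not None and result.stdout.strip() != check.expect_stdout:
        return False
    return True


def stage_is_done(checks: List[DoneCheck]) -> bool:
    """A stage is done only when every one of its checks passes."""
    return all(run_check(c) for c in checks)
```

In a real plan the command would be something like a repository search whose match count must be zero, or a test runner invocation that must exit with 0. The point is that neither you nor the AI has to judge "done" from memory.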

The keystone is the Appendix at the end of the spec. Anything tied to a contract goes here, in one place, not scattered through the prose. Nothing in it comes from guesswork: each entry records the result of checking the real thing directly, stamped with what was verified and as of when, and it serves as the single source of truth. The body and the plan don't restate those values; they just point to it: "see Appendix A.1."

Why go this far? Because writing the same fact in two places means one of them eventually goes stale and the two disagree: they drift. Consolidate it in one place, and even when summarizing thins your memory, the appendix takes you back to the primary source. Before implementing, you re-read the spec, the plan, and the appendix every time. The moment you feel "I know it, it's in memory" or "I checked this earlier" is exactly when you stop and pull the real thing again. A fact written in a file is always more reliable than your own summarized memory. For that matter, even the "here's the current state" handed to you when a session starts can be a little stale, so before you branch off, refetch the real thing, then move.
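The appendix idea can be sketched as a small data structure, here in Python. The class names, fields, and the 30-day staleness window used in the usage note are assumptions made for illustration; the actual appendix is a section of the spec document, not code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class AppendixEntry:
    ref: str           # e.g. "A.1"; the only place this value is written down
    fact: str          # the contract value itself
    verified_by: str   # how the real thing was checked
    verified_on: date  # as of when


class Appendix:
    """Single source of truth: one entry per reference, never restated elsewhere."""

    def __init__(self) -> None:
        self._entries: Dict[str, AppendixEntry] = {}

    def add(self, entry: AppendixEntry) -> None:
        # Refuse a second definition so the same fact cannot live in two places.
        if entry.ref in self._entries:
            raise ValueError(f"{entry.ref} is already defined; update it in place")
        self._entries[entry.ref] = entry

    def lookup(self, ref: str) -> AppendixEntry:
        return self._entries[ref]

    def stale(self, today: date, max_age_days: int) -> List[str]:
        """References whose verification is older than max_age_days."""
        return [
            e.ref for e in self._entries.values()
            if (today - e.verified_on).days > max_age_days
        ]
```

The body and the plan would then carry only the reference ("A.1"), and a call such as stale(today, 30) would list the entries worth re-verifying against the real thing before you rely on them.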

Rebuild the context often

The other way is how you handle the context itself.

Compacting happens automatically—but you don't get to choose its summary; you can't control what stays and what drops. And the larger the context you carry, the more the model's attention scatters and the less of its real ability it can bring to bear—a swollen context is like trying to think at a cluttered desk.

So you develop while keeping an eye on how much context is left, and before it auto-summarizes, at a clean stopping point, you clear it yourself and start a new session. Clearing resets the context to blank, and the model—now travelling light—can perform at its best.

It looks wasteful at first, but clearing and starting over is far easier to handle than leaving it to automatic summarizing. You decide where to cut and what to carry forward. Stacking cleanly separated short sessions gives you more control—and keeps the quality up—than dragging one long session along under a hazy summary.
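The habit of watching the remaining context and clearing at a stopping point can be written down as a simple decision rule. This Python sketch is only an illustration: the thresholds (60% and 85%) and the function name are assumptions, and how you obtain the token counts depends on your tool.

```python
def next_action(used_tokens: int, window_tokens: int, at_stopping_point: bool,
                clear_at: float = 0.60, urgent_at: float = 0.85) -> str:
    """Decide what to do based on how full the context window is.

    clear_at and urgent_at are hypothetical thresholds, not values from any tool.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    if usage >= urgent_at:
        # Auto-summarizing is close: hand off and clear before it kicks in.
        return "write handoff and clear now"
    if usage >= clear_at and at_stopping_point:
        return "write handoff and clear"
    if usage >= clear_at:
        return "finish current step, then clear"
    return "continue"
```

The design choice is that you pick the cut point yourself, while there is still room, instead of letting the automatic summary pick it for you.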

Hand off through files — across PCs and offices

Before you clear, you leave a handoff for the next session. Here you use two tools that write out session state.

One is remember, a Claude Code plugin. It bundles the current session's work log into units like the day's entry or "the last few days" and writes them out, so you can read them back when the next session starts—a short-term-memory role for quickly recovering "how far you got."

The other is a homegrown one, memory-write. It came out of a specific problem: when a session ends, the learnings and decisions you earned vanish—and if you try to continue on another PC, none of it carries over. So, to keep long-lived facts as a permanent memory, we built our own plugin. Here's what it does: it classifies a fact you want to keep by kind—the project's rules, the state of work in progress, and above all, past mistakes and their lessons—writes it to a memory file, updates an index, and commits it to git.
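The steps described above (classify, write to a memory file, update an index, commit to git) can be sketched in Python as follows. This is not the actual memory-write plugin: the directory layout, the kind names, the INDEX.md file, and the commit message format are all assumptions made for illustration, and the sketch assumes the target directory is already a git repository.

```python
import subprocess
from datetime import date
from pathlib import Path

# Hypothetical kinds: project rules, work in progress, past mistakes and lessons.
KINDS = {"rule", "in_progress", "lesson"}


def write_memory(repo: Path, kind: str, title: str, body: str,
                 today: date, commit: bool = True) -> Path:
    """Write one fact to a memory file, update the index, and optionally commit."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")

    # 1. Classify by kind: each kind gets its own directory.
    memory_dir = repo / "memory" / kind
    memory_dir.mkdir(parents=True, exist_ok=True)

    # 2. Write the memory file, named by date and a slug of the title.
    slug = "".join(c if c.isalnum() else "-" for c in title.lower()).strip("-")
    path = memory_dir / f"{today.isoformat()}-{slug}.md"
    path.write_text(
        f"# {title}\n\nkind: {kind}\nrecorded: {today.isoformat()}\n\n{body}\n",
        encoding="utf-8",
    )

    # 3. Update the index, adding the entry only if it is not already listed.
    index = repo / "memory" / "INDEX.md"
    line = f"- [{kind}] {title} ({path.relative_to(repo).as_posix()})\n"
    existing = index.read_text(encoding="utf-8") if index.exists() else ""
    if line not in existing:
        index.write_text(existing + line, encoding="utf-8")

    # 4. Commit, so the fact can travel to other machines through git.
    if commit:
        subprocess.run(["git", "add", "memory"], cwd=repo, check=True)
        subprocess.run(["git", "commit", "-m", f"memory: {kind}: {title}"],
                       cwd=repo, check=True)
    return path
```

Keeping each fact in its own small file, with a one-line index entry, makes the history easy to review in git and easy for the next session to read selectively.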

The crux is putting it on git. On a different PC, or in a different office, a git pull brings the same learnings straight to hand. Write down once that "doing it this way breaks here" or "that assumption was wrong," and the next session's AI won't step on the same trap, no matter where you resume. You can't carry the conversation's context with you, but the facts and the handoff you offloaded into files can be picked back up from anywhere. The place changes; the thread doesn't break.
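Picking the learnings back up on another machine might look like this Python sketch. The memory/lesson directory layout and the function name are assumptions for illustration, not the real plugin's format; the only external command used is git pull with its --ff-only option.

```python
import subprocess
from pathlib import Path


def load_lessons(repo: Path, pull: bool = True) -> str:
    """Fetch the latest memory from git and return all lessons as one briefing text."""
    if pull:
        # Bring this machine up to date; fail rather than merge if history diverged.
        subprocess.run(["git", "pull", "--ff-only"], cwd=repo, check=True)

    lesson_dir = repo / "memory" / "lesson"  # hypothetical layout
    files = sorted(lesson_dir.glob("*.md")) if lesson_dir.is_dir() else []
    # Sorting by file name assumes a date-prefixed naming scheme.
    return "\n\n".join(f.read_text(encoding="utf-8").strip() for f in files)
```

The returned text can be handed to the new session as its first input, so the AI starts from the recorded lessons rather than from a blank context.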

What you can delegate, and what you can't

Put it all together and a line emerges.

The context wall doesn't go away. So don't use the AI on the premise that it "remembers everything forever." Keep the facts worth remembering outside the AI's memory: in the spec, the plan, and the memory store. What you delegate to the AI is the work built on those facts. Separate what you can hand off (moving the work along) from what you must not (holding the facts). Only once you've accepted that split can you keep running, long hours and all, without it falling apart.

Not a partner that magically remembers everything, but one that forgets by default, so you put what you can't afford to lose on the outside. Unglamorous, but it has been the most realistic discipline we've found for long-running AI development.

The road to arriving at this way of running a memory is in How a Micro-SaaS Tech Stack Changed in a Single Year, and how we assure quality is in Two-Track Review with Claude and Codex. Other posts on how we build are gathered in the dev category.

The micro-SaaS we keep building while getting along with a forgetful partner is PentaTrail—a CTEM service that uses AI to continuously grasp your externally visible attack surface. If you're curious, take a look at your own company's "externally visible attack surface."

See PentaTrail / CTEM
