Back to blog
ai workflow·6 min read·0 views

More Context Is Making Your AI Dumber

Prompt engineering taught us how to ask. Context engineering is the next layer: deciding what the model sees, when it sees it, and what it never needs to see at all.

More Context Is Making Your AI Dumber

For two years, the advice was the same: learn to write better prompts.

Phrase it carefully, add a few examples, give the model a role, and watch the quality jump. That advice was not wrong. But if you have moved past one-off chats and tried to build something real - an agent, a coding assistant, anything that runs for more than a single turn - you have probably hit a wall that no amount of clever wording fixes.

That wall is context.

And the discipline of getting it right has a name now: context engineering.

The simplest way to separate the two:

Prompt engineering is how you ask. Context engineering is what the model knows before it answers.

Prompts are still table stakes. But context is the multiplier. It is the difference between an AI that dazzles in a demo and one you can put in front of users without flinching.

This article is the version I wish someone had handed me earlier: no fluff, no buzzwords, just how models actually use context, why more of it can make things worse, and the concrete techniques that fix it.

First, A Myth Worth Killing

The instinct most people have is simple: if the model needs context, give it more.

Bigger window. More documents. Paste the whole file.

Context windows have ballooned to hundreds of thousands - even millions - of tokens, so why not use them?

Because the window is not storage. It is working memory.

And unlike the RAM in your computer, filling it up does not just slow things down. It actively degrades the model's ability to find and use what matters.

More context does not mean more intelligence. Past a point, it means less.

To engineer context well, you have to understand why.

How A Model Actually Uses Context

Two ideas explain almost everything.

1. The Attention Budget Is Finite

Under the hood, a transformer relates every token to every other token. For n tokens, that is on the order of n^2 relationships.

The model has a fixed pool of "attention" to spread across all of them. Double the length of your context and you have not doubled the model's focus. You have spread the same focus across far more material, much of it noise.

The signal you actually care about gets a thinner and thinner slice.

2. Position Matters More Than People Think

Models pay the most attention to the beginning and the end of the context, and the least to the middle.

This is the well-documented "lost in the middle" effect.

The critical instruction you buried halfway through a giant paste is, quite literally, the part the model is most likely to overlook.

Put those together and you get a phenomenon called context rot: as the token count climbs, the model's ability to accurately recall any specific fact inside that context goes down.

Your instruction does not get ignored because the model is incapable. It gets drowned.

This reframes the entire job.

The goal of context engineering is not to maximize what you put in. It is to find the smallest set of high-signal tokens that maximize the chance of the output you want.

Hold onto that sentence. Every technique below is just a way to serve it.

The Techniques That Move The Needle

These are the moves that separate people who fight their AI from people who ship with it.

1. Compaction

When a conversation approaches the context limit, do not just let it grow.

Summarize what happened and restart with a clean, compressed version. The art is in choosing what to keep - architectural decisions, unresolved bugs, key constraints - and what to throw away.

The lowest-hanging fruit here is clearing old tool outputs.

Once a tool ran twenty messages ago, the model almost never needs to re-read its raw result. Stripping those out can recover thousands of tokens without losing anything that matters.

It is the safest, lightest-touch optimization you can make, and most people never do it.

2. Structured Note-Taking

Instead of keeping every detail live in the window, have the agent write notes to an external file, like a running NOTES.md, and pull them back only when needed.

This gives the model a persistent external memory that survives across long tasks while keeping the active context lean.

The window holds what is relevant now. The file holds everything else.

3. Sub-Agent Architectures

For complex work, do not make one agent carry the entire history.

Spin up specialized sub-agents, each working in its own clean context window on a focused slice of the problem. The main agent receives only the condensed summaries back.

Every sub-agent stays sharp precisely because none of them is dragging the whole conversation around.

4. Just-In-Time Retrieval

Stop front-loading everything "just in case."

Pulling in five documents the model might need is five documents' worth of noise competing for that finite attention budget.

Instead, let the model retrieve context at the moment it actually needs it.

Less upfront clutter means sharper answers and a smaller bill.

The Token-Saving Angle Nobody Explains To Juniors

Most people treat trimming context as a cost optimization: fewer tokens, lower invoice.

That is true, but it badly undersells it.

Here is the insight:

Cost and quality move in the same direction.

Every token you do not add is both cheaper to process and one less distraction pulling on the model's attention.

You are not trading quality for savings. You are getting both at once.

That is why context engineering feels almost too good once it clicks: the lean version of your prompt is usually the more accurate one too.

Before And After

Picture a developer asking an AI to fix a bug.

The bloated way: paste the entire 800-line file, then somewhere in the middle write "fix the off-by-one error in the pagination."

The model now has to scan 800 lines, the real instruction is sitting in the dead zone of the context, and half the file is irrelevant to the bug.

The engineered way: include only the roughly 30-line pagination function, put the instruction right at the top - "There is an off-by-one error in this function's page boundary logic; find and fix it" - and add a one-line note pointing to where the function is called, instead of pasting all the callers.

Same model. Same bug. Wildly different odds of a correct fix, and a fraction of the tokens.

That gap is context engineering.

A Practical Checklist

Before your next serious prompt or agent run, ask:

  • Is my key instruction at the start or end of the context, never buried in the middle?
  • Did I paste a whole file when one function or section would do?
  • Am I carrying stale context - old tool results, finished sub-tasks, things I will never reuse?
  • Could the model retrieve this when needed instead of me front-loading it?
  • For long tasks, am I compacting and using external notes instead of letting the window grow unchecked?

Fix those five and you will feel the difference immediately - not in theory, in your very next session.

The Bottom Line

Prompt engineering taught us how to ask. It got everyone in the door.

But the leverage has moved.

As models got more capable, clever phrasing started delivering diminishing returns, and the real bottleneck became the information environment you build around the model.

Context engineering is that discipline: deciding what the model sees, when it sees it, and what it never needs to see at all.

It is less glamorous than a magic prompt and far more powerful.

Master it, and you stop hoping your AI gets it right. You start engineering it to.

Originally shared on X.


Enjoyed this?

Be the first to react

ShareXLinkedIn