Context Window Budgeting

Treating tokens as a finite resource, and knowing when to spawn agents versus work directly.

The context window feels infinite until it isn't.

We're deep into a refactoring session. The AI understands our codebase, our patterns, our goals. We're making rapid progress. Then, mid-conversation, the responses start getting fuzzy. The AI forgets what we discussed ten minutes ago. It suggests changes we already made.

We've hit the context ceiling. And now we have to start over.


The Invisible Budget

Every AI conversation has a token limit. For most current models, it's somewhere between 100,000 and 200,000 tokens. That sounds like a lot, and it is, for simple tasks.

But research work isn't simple. We're reading files, analyzing code, discussing approaches, iterating on solutions. Each of those actions consumes context:

  • Reading a 500-line file: ~2,000 tokens
  • A detailed explanation from the AI: ~500-1,000 tokens
  • Back-and-forth discussion: ~200 tokens per exchange

A complex session might involve reading 20 files, several rounds of discussion, and multiple iterations. Suddenly, we've used 50,000 tokens just on context, before any actual work.
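Using the ballpark figures above, the arithmetic can be sketched as a back-of-envelope estimator. The per-item costs are illustrative assumptions from this article, not measured values:

```python
# Rough per-item costs, taken from the estimates above (assumptions, not measurements).
TOKENS_PER_FILE = 2_000       # reading a ~500-line file
TOKENS_PER_EXPLANATION = 750  # midpoint of the 500-1,000 range
TOKENS_PER_EXCHANGE = 200     # one back-and-forth turn

def estimate_session_tokens(files_read: int, explanations: int, exchanges: int) -> int:
    """Back-of-envelope context cost for a session, before any actual work."""
    return (files_read * TOKENS_PER_FILE
            + explanations * TOKENS_PER_EXPLANATION
            + exchanges * TOKENS_PER_EXCHANGE)

# The complex session described above: 20 files, several explanations, many exchanges.
print(estimate_session_tokens(files_read=20, explanations=10, exchanges=25))  # → 52500
```

Even with generous rounding, a session like this lands around the 50,000-token mark before a single edit is made.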


The Cost of Everything

The first step in budgeting is understanding what things cost.

High cost:

  • Reading entire files (especially large ones)
  • Asking for comprehensive explanations
  • Requesting multiple alternatives
  • Keeping full conversation history

Low cost:

  • Reading specific line ranges
  • Asking focused questions
  • Making targeted edits
  • Clear, concise requests

The pattern: breadth is expensive, depth is cheap. Reading ten files costs more than deeply analyzing one. Asking "explain everything about this module" costs more than "explain why line 47 uses a generator."


When to Spawn Agents

Claude Code offers a powerful tool for context management: spawning agents. An agent runs in a separate context, does its work, and returns a summary.

This is context arbitrage. Instead of loading twenty files into our main conversation, we ask an agent to explore and report back. The agent uses its own context window. We only receive the summary.

Spawn an agent when:

  • We need to search across many files
  • The exploration might hit dead ends
  • We want results without the journey
  • We're not sure what we're looking for

Keep in main context when:

  • We need to iterate on the results
  • The details matter for subsequent decisions
  • We're building on previous conversation
  • We want to maintain continuity

The heuristic: if we need the process, keep it local. If we only need the answer, spawn an agent.
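The heuristic above can be written down as a small decision function. The signal names are our own shorthand for the bullet points, not a Claude Code API:

```python
def should_spawn_agent(need_process: bool, need_iteration: bool,
                       wide_search: bool, dead_ends_likely: bool) -> bool:
    """Sketch of the heuristic: keep work local when the process matters,
    spawn an agent when only the answer does."""
    if need_process or need_iteration:
        # Details feed later decisions: stay in the main context.
        return False
    # Only the summary matters: offload the exploration.
    return wide_search or dead_ends_likely
```

A broad codebase search with likely dead ends gets offloaded; an iterative refactor where each result shapes the next step stays local.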


The Summarization Trade-off

When context runs low, summarization becomes tempting. Compress the conversation so far, discard the details, continue with the summary.

This works, but at a cost. Summarization loses nuance. The AI might remember that we decided to use approach A, but forget the three reasons why approach B was rejected. Later, when we encounter a situation where approach B seems appealing, we don't have the context to remember why we ruled it out.

When summarization is worth it:

  • The session has been exploratory and we've found our direction
  • Early discussion is no longer relevant to current work
  • We're switching to a different part of the project

When to preserve full context:

  • We're iterating on a specific solution
  • The reasoning behind decisions matters
  • We might need to backtrack


Front-Loading vs. Just-in-Time

There are two philosophies for context loading:

Front-loading: Read everything relevant at the start. The AI has full context from the beginning.

Just-in-time: Read files only when needed. The AI requests context as questions arise.

Front-loading burns context early but enables faster responses. Just-in-time preserves context but requires more back-and-forth.

Our approach: front-load the minimum, just-in-time the rest.

Start with:

  • CLAUDE.md (always)
  • The specific file we're working on
  • Direct dependencies if they're small

Defer until needed:

  • Test files
  • Related but not immediately relevant modules
  • Documentation
  • Configuration

The AI can request additional context. We don't need to anticipate everything.
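The "front-load the minimum" rule can be sketched as a helper that builds the initial file list. The 200-line threshold for "small" dependencies is an arbitrary assumption for illustration:

```python
def initial_context(working_file: str, dep_line_counts: dict[str, int],
                    small_limit: int = 200) -> list[str]:
    """Minimal front-loaded context: CLAUDE.md, the working file, and only
    dependencies small enough to be cheap. Everything else is just-in-time.

    dep_line_counts maps dependency path -> line count (assumed precomputed).
    """
    files = ["CLAUDE.md", working_file]  # always load these two
    files += [dep for dep, lines in dep_line_counts.items() if lines <= small_limit]
    return files

# A 600-line dependency is deferred; an 80-line one is front-loaded.
print(initial_context("app.py", {"utils.py": 80, "models.py": 600}))
```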


The Checkpoint Pattern

For long sessions, we use checkpoints: planned summarization points that preserve key context while freeing up space.

Every hour or so, we ask:

"Summarize our progress so far, the key decisions we've made, and what we're working on next. Keep it concise. This is a checkpoint, not documentation."

We save this summary externally. If context gets tight, we can start a fresh session with just the checkpoint summary and continue where we left off.

The checkpoint isn't as good as full context. But it's much better than starting from zero.
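Saving checkpoints externally can be as simple as appending timestamped summaries to a file. This is a minimal sketch; the filename and record shape are our own choices:

```python
import json
import time

# The checkpoint prompt from the text, kept alongside the tooling.
CHECKPOINT_PROMPT = (
    "Summarize our progress so far, the key decisions we've made, "
    "and what we're working on next. Keep it concise. "
    "This is a checkpoint, not documentation."
)

def save_checkpoint(summary: str, path: str = "checkpoints.jsonl") -> None:
    """Append a timestamped checkpoint record so a fresh session can
    resume from the latest summary instead of from zero."""
    record = {"at": time.strftime("%Y-%m-%dT%H:%M:%S"), "summary": summary}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def latest_checkpoint(path: str = "checkpoints.jsonl") -> str:
    """Read back the most recent summary to seed a new session."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    return json.loads(lines[-1])["summary"]
```

Starting the next session with `latest_checkpoint()` pasted in is the warm start the multi-session mindset depends on.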


Recognizing Context Pressure

How do we know when context is running low? The signs:

Repetition: The AI suggests something we already discussed.

Forgetting: The AI asks for information we already provided.

Inconsistency: The AI's suggestions contradict earlier decisions.

Vagueness: Responses become less specific, more generic.

When we notice these signs, we have choices: checkpoint and continue, spawn agents for remaining work, or accept that this session has reached its natural end.


The Multi-Session Mindset

The most important shift: accepting that complex work spans multiple sessions.

We used to try to finish everything in one marathon session. That meant racing against context limits, making rushed decisions to avoid losing context, and ending exhausted with messy results.

Now we plan for multiple sessions. Each session has a clear goal. We capture context at the end. We start the next session with a warm-up rather than a cold start.

This is slower per-session but faster overall. We make better decisions when we're not racing the context clock.


Practical Budgeting

Before starting a complex task, we estimate:

  1. Files to read: How many? How large?
  2. Exploration needed: Are we searching or executing?
  3. Iteration expected: Will we go back and forth?
  4. Documentation required: Do we need to generate docs?

If the estimates suggest we'll exceed context, we plan for it. Maybe we spawn an agent for the exploration phase. Maybe we break the work into two sessions. Maybe we front-load less and just-in-time more.

The budget isn't precise. But having any budget beats discovering limits mid-session.
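The pre-task estimate can be folded into a rough planning check. The thresholds here are assumptions for illustration, using the window sizes cited earlier:

```python
CONTEXT_BUDGET = 150_000  # mid-range of the 100k-200k windows mentioned above

def plan_session(files_to_read: int, avg_file_tokens: int = 2_000,
                 iteration_tokens: int = 20_000,
                 budget: int = CONTEXT_BUDGET) -> str:
    """Imprecise but useful: pick a plan before the limits pick one for us."""
    estimate = files_to_read * avg_file_tokens + iteration_tokens
    if estimate <= budget // 2:
        return "one session"
    if estimate <= budget:
        return "one session, spawn agents for exploration"
    return "split into multiple sessions"
```

Ten files fits comfortably in one session; forty suggests offloading exploration to agents; eighty says plan for two sessions from the start.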


The Abundance Illusion

Context windows keep growing. It's tempting to think we'll soon have unlimited context and none of this will matter.

Maybe. But even large context windows have costs: slower responses, higher API costs, more opportunities for the AI to get confused by irrelevant information.

Good context hygiene goes beyond limits. It's about focus. The AI that has exactly the context it needs will outperform the AI that has everything and must figure out what matters.

Treat context as finite, even when it isn't.


This is part of our series on AI-assisted research workflows. Next: The Verification Tax, on building checks into the workflow.

Suggested Citation

Cholette, V. (2026, January 14). Context window budgeting: Treating tokens as a finite resource. Too Early To Say. https://tooearlytosay.com/research/methodology/context-window-budgeting/