Context Management (Implementation): Guards, Hygiene, Compaction, Snapshots
This page is an implementation guide for keeping long sessions stable.
Code entry points (optional)
- src/agents/pi-embedded-runner/run.ts
- src/agents/pi-embedded-runner/run/attempt.ts
- src/agents/pi-embedded-runner/history.ts
- src/agents/context-window-guard.ts
- src/agents/session-transcript-repair.ts
- src/agents/pi-embedded-runner/run/compaction-timeout.ts
- src/agents/pi-embedded-subscribe.ts
What you’re implementing (minimum)
- History growth stays bounded (no unbounded window blow-up).
- tool_call/tool_result stays paired (avoid provider 400s).
- Context overflow triggers recovery (not a hard crash).
- Compaction timeouts still return a usable snapshot.
The real execution order (build it in this sequence)
1) Guard in run (block obviously-doomed attempts)
Block models/sessions whose context window is below the hard minimum before entering an expensive attempt.
See: Context / Compaction.
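A minimal sketch of such a guard, assuming the check compares a window size against a configurable floor. The names `WindowGuardInput`, `checkContextWindow`, and `hardMinTokens` are hypothetical and are not taken from context-window-guard.ts.

```typescript
// Hypothetical shapes for illustration; the real guard may use different fields.
interface WindowGuardInput {
  contextWindowTokens: number; // window size reported for the model/session
  hardMinTokens: number;       // assumed configurable floor
}

type WindowGuardResult =
  | { ok: true }
  | { ok: false; reason: string };

// Returns a structured verdict so the caller can refuse to start an attempt
// and still show the user why.
function checkContextWindow(input: WindowGuardInput): WindowGuardResult {
  const { contextWindowTokens, hardMinTokens } = input;
  if (!Number.isFinite(contextWindowTokens) || contextWindowTokens <= 0) {
    return { ok: false, reason: "context window is unknown or invalid" };
  }
  if (contextWindowTokens < hardMinTokens) {
    return {
      ok: false,
      reason: `context window ${contextWindowTokens} is below the hard minimum ${hardMinTokens}`,
    };
  }
  return { ok: true };
}
```

Returning a reason string instead of throwing keeps the blocked case cheap: no attempt is started and the message can be surfaced as-is.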
2) Hygiene in attempt (order matters)
The safe ordering is: sanitize/validate → limit turns → repair tool pairing.
Repair has to run last because turn limiting can delete a tool_call while leaving its tool_result, creating orphan messages.
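The ordering can be sketched as a three-stage pipeline. The message shape and the helpers `sanitizeHistory`, `limitTurns`, and `repairToolPairing` are hypothetical, and this sketch assumes repair drops unpaired messages; the real repair may instead synthesize the missing half.

```typescript
// Hypothetical message shape for illustration only.
type HistoryMsg =
  | { role: "user" | "assistant"; text: string }
  | { role: "tool_call"; id: string; name: string }
  | { role: "tool_result"; id: string; output: string };

// Stage 1: drop obviously invalid entries (blank text, missing ids).
function sanitizeHistory(history: HistoryMsg[]): HistoryMsg[] {
  return history.filter((m) =>
    m.role === "tool_call" || m.role === "tool_result"
      ? m.id.length > 0
      : m.text.trim().length > 0,
  );
}

// Stage 2: keep only the most recent messages (a crude stand-in for turn limiting).
function limitTurns(history: HistoryMsg[], maxMessages: number): HistoryMsg[] {
  return maxMessages <= 0 ? [] : history.slice(-maxMessages);
}

// Stage 3: remove any tool_call or tool_result whose partner is gone.
function repairToolPairing(history: HistoryMsg[]): HistoryMsg[] {
  const callIds = new Set<string>();
  const resultIds = new Set<string>();
  for (const m of history) {
    if (m.role === "tool_call") callIds.add(m.id);
    if (m.role === "tool_result") resultIds.add(m.id);
  }
  return history.filter((m) => {
    if (m.role === "tool_call") return resultIds.has(m.id);
    if (m.role === "tool_result") return callIds.has(m.id);
    return true;
  });
}

// Repair runs last so it can clean up whatever the limiter cut in half.
function prepareHistory(history: HistoryMsg[], maxMessages: number): HistoryMsg[] {
  return repairToolPairing(limitTurns(sanitizeHistory(history), maxMessages));
}
```

If repair ran before limiting, a cut that lands between a tool_call and its tool_result would leave the orphan in the outgoing request.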
3) Make compaction retries awaitable
Don’t return “completed” while compaction retries are still in flight.
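One way to make retries awaitable is to register every retry promise and wait for the set to drain before reporting completion. `CompactionTracker` and `finishAttempt` below are hypothetical names for this sketch, not the project's actual API.

```typescript
// Tracks in-flight compaction retries so callers can wait for all of them.
class CompactionTracker {
  private inFlight = new Set<Promise<void>>();

  // Register a retry. Failures are swallowed here to keep the sketch short;
  // real code would log or record them.
  track(work: Promise<void>): void {
    const tracked: Promise<void> = work.then(
      () => { this.inFlight.delete(tracked); },
      () => { this.inFlight.delete(tracked); },
    );
    this.inFlight.add(tracked);
  }

  // Resolves once no retries remain, including retries registered while waiting.
  async waitForIdle(): Promise<void> {
    while (this.inFlight.size > 0) {
      await Promise.all(Array.from(this.inFlight));
    }
  }
}

// Only report "completed" after every tracked retry has settled.
async function finishAttempt(tracker: CompactionTracker): Promise<"completed"> {
  await tracker.waitForIdle();
  return "completed";
}
```

The loop in `waitForIdle` matters: a retry may schedule another retry, so a single `Promise.all` over the initial set is not enough.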
4) Overflow recovery is deterministic: compact → truncate → readable error
Keep the recovery path ordered and bounded:
- limited compaction retries
- truncate oversized tool outputs (lossy fallback)
- readable error (suggest reset or a larger window)
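The recovery ladder above can be sketched as a single function that tries each step once, in order. The types, the injected `fits` and `compact` callbacks, and the default limits are assumptions made for illustration.

```typescript
// Hypothetical shapes; token counting and compaction are injected by the caller.
interface RecoveryMsg { role: string; content: string }
interface RecoveryDeps {
  fits(messages: RecoveryMsg[]): boolean;          // does this fit the window?
  compact(messages: RecoveryMsg[]): RecoveryMsg[]; // one compaction pass
}

type RecoveryResult =
  | { ok: true; messages: RecoveryMsg[]; via: "none" | "compaction" | "truncation" }
  | { ok: false; error: string };

function recoverFromOverflow(
  messages: RecoveryMsg[],
  deps: RecoveryDeps,
  maxCompactionRetries = 2,   // assumed bound
  maxToolOutputChars = 2000,  // assumed bound
): RecoveryResult {
  if (deps.fits(messages)) return { ok: true, messages, via: "none" };

  // Step 1: a bounded number of compaction passes.
  let current = messages;
  for (let i = 0; i < maxCompactionRetries; i++) {
    current = deps.compact(current);
    if (deps.fits(current)) return { ok: true, messages: current, via: "compaction" };
  }

  // Step 2: lossy fallback, truncate oversized tool outputs.
  const truncated = current.map((m) =>
    m.role === "tool_result" && m.content.length > maxToolOutputChars
      ? { ...m, content: m.content.slice(0, maxToolOutputChars) + "\n[truncated]" }
      : m,
  );
  if (deps.fits(truncated)) return { ok: true, messages: truncated, via: "truncation" };

  // Step 3: give up with a message the user can act on.
  return {
    ok: false,
    error:
      "Context overflow: the conversation no longer fits the model's context window. " +
      "Reset the session or switch to a model with a larger window.",
  };
}
```

Because every step is bounded and the order is fixed, the same oversized input always ends in the same outcome, which makes the path straightforward to test.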
5) Compaction timeouts must pick a snapshot
Prefer a pre-compaction snapshot over returning a partially-compacted transcript.
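A sketch of snapshot selection, assuming compaction is an async function raced against a timer. `compactOrFallback` and the message shape are hypothetical; the actual logic in compaction-timeout.ts may differ.

```typescript
type SnapshotMsg = { role: string; content: string };

interface CompactionOutcome {
  messages: SnapshotMsg[];
  timedOut: boolean;
}

async function compactOrFallback(
  messages: SnapshotMsg[],
  compact: (m: SnapshotMsg[]) => Promise<SnapshotMsg[]>,
  timeoutMs: number,
): Promise<CompactionOutcome> {
  // Copy before compaction starts, so in-place edits cannot corrupt the fallback.
  const snapshot = messages.map((m) => ({ ...m }));

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<{ kind: "timeout" }>((resolve) => {
    timer = setTimeout(() => resolve({ kind: "timeout" }), timeoutMs);
  });
  const work = compact(messages).then((result) => ({ kind: "done" as const, result }));

  try {
    const outcome = await Promise.race([work, timeout]);
    return outcome.kind === "done"
      ? { messages: outcome.result, timedOut: false }
      : { messages: snapshot, timedOut: true }; // never a half-compacted transcript
  } finally {
    clearTimeout(timer);
  }
}
```

Note that `Promise.race` does not cancel the losing compaction; a real implementation would also need to stop it or discard its late result so it cannot overwrite the chosen snapshot.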
Failure modes and troubleshooting
- Provider 400s: verify tool pairing repair; see Tools and Session pruning.
- Sessions slow down over time: inspect compaction retries/timeouts in Logging.
Acceptance checks
- Long sessions remain stable with bounded growth.
- Every tool_call has a matching tool_result.
- Oversized tool results trigger compaction/truncation recovery.
- Compaction start/end is observable (logs/events).
- Timeout during compaction still returns a consistent snapshot.
- Too-small windows are blocked before attempt.