Context Management (Implementation): Guards, Hygiene, Compaction, Snapshots

This page is an implementation guide for keeping long sessions stable.

Code entry points (optional)

  • src/agents/pi-embedded-runner/run.ts
  • src/agents/pi-embedded-runner/run/attempt.ts
  • src/agents/pi-embedded-runner/history.ts
  • src/agents/context-window-guard.ts
  • src/agents/session-transcript-repair.ts
  • src/agents/pi-embedded-runner/run/compaction-timeout.ts
  • src/agents/pi-embedded-subscribe.ts

What you’re implementing (minimum)

  1. History growth stays bounded, so the transcript never outgrows the model's context window.
  2. Every tool_call stays paired with its tool_result (unpaired messages cause provider 400 errors).
  3. Context overflow triggers recovery instead of a hard crash.
  4. Compaction timeouts still return a usable snapshot.

The real execution order (build it in this sequence)

1) Guard in run (block obviously-doomed attempts)

Block models or sessions whose context window is below a hard minimum before entering an expensive attempt.

See: Context / Compaction.
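Here is a minimal sketch of the guard. It assumes the model's context window size is known in tokens before the attempt starts. `HARD_MIN_CONTEXT_TOKENS`, its value, and `evaluateContextWindow` are hypothetical names for illustration. They are not the contents of `src/agents/context-window-guard.ts`.

```typescript
// Hypothetical hard floor: it should leave room for the system prompt, tools, and at least one turn.
const HARD_MIN_CONTEXT_TOKENS = 16_000;

type GuardResult = { ok: true } | { ok: false; reason: string };

// Cheap pre-check that runs before an expensive attempt (hypothetical helper).
function evaluateContextWindow(modelId: string, contextWindowTokens: number): GuardResult {
  if (!Number.isFinite(contextWindowTokens) || contextWindowTokens <= 0) {
    return { ok: false, reason: `${modelId}: unknown or invalid context window` };
  }
  if (contextWindowTokens < HARD_MIN_CONTEXT_TOKENS) {
    return {
      ok: false,
      reason: `${modelId}: context window of ${contextWindowTokens} tokens is below the hard minimum of ${HARD_MIN_CONTEXT_TOKENS}`,
    };
  }
  return { ok: true };
}
```

When `ok` is false, a caller would surface `reason` to the user instead of starting the attempt. That costs nothing compared with a doomed provider call.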

2) Hygiene in attempt (order matters)

The safe ordering is: sanitize/validate → limit turns → repair tool pairing.

Repair runs last because turn limiting can delete a tool_call while keeping its tool_result, which leaves an orphan message that providers reject.
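The ordering can be sketched as a three-stage pipeline over a simplified message model. `sanitize`, `limitTurns`, `repairToolPairing`, and `prepareHistory` are hypothetical names, not functions taken from `history.ts` or `session-transcript-repair.ts`. The `limitTurns` here is a deliberately naive count-based cut, which is exactly the kind of limit that can separate a tool_call from its tool_result.

```typescript
// Simplified transcript model (an assumption; real transcripts carry more fields).
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCallIds: string[] }
  | { role: "tool_result"; toolCallId: string; text: string };

// 1. Sanitize/validate: drop malformed entries (here, tool results without an id).
function sanitize(history: Message[]): Message[] {
  return history.filter((m) => m.role !== "tool_result" || m.toolCallId.length > 0);
}

// 2. Limit turns: naive cut that keeps the last `maxMessages` entries.
//    It can drop a tool_call while keeping the matching tool_result.
function limitTurns(history: Message[], maxMessages: number): Message[] {
  const start = Math.max(0, history.length - maxMessages);
  return history.slice(start);
}

// 3. Repair pairing: drop orphan results and calls whose results were cut away.
function repairToolPairing(history: Message[]): Message[] {
  const called = new Set<string>();
  const answered = new Set<string>();
  for (const m of history) {
    if (m.role === "assistant") m.toolCallIds.forEach((id) => called.add(id));
    if (m.role === "tool_result") answered.add(m.toolCallId);
  }
  return history.flatMap((m): Message[] => {
    if (m.role === "tool_result") return called.has(m.toolCallId) ? [m] : [];
    if (m.role === "assistant") {
      return [{ ...m, toolCallIds: m.toolCallIds.filter((id) => answered.has(id)) }];
    }
    return [m];
  });
}

// Order matters: repair must run on the output of the turn limit.
function prepareHistory(history: Message[], maxMessages: number): Message[] {
  return repairToolPairing(limitTurns(sanitize(history), maxMessages));
}
```

Running repair before the limit would not help, because the limit can create new orphans after the repair pass. This sketch repairs by dropping unmatched calls and results. Inserting a synthetic placeholder result for a missing one is an alternative.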

3) Make compaction retries awaitable

Don’t report a run as “completed” while compaction retries are still in flight; await them first.
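One way to make retries awaitable assumes every compaction retry is started through a shared tracker. Each retry promise is registered, and the run waits until the set is empty before reporting completion. `CompactionTracker` and `finishRun` are hypothetical names.

```typescript
// Hypothetical tracker through which every compaction retry is started.
class CompactionTracker {
  private readonly inFlight = new Set<Promise<unknown>>();

  track<T>(work: Promise<T>): Promise<T> {
    this.inFlight.add(work);
    // Remove the entry once it settles, whether it resolved or rejected.
    const forget = () => {
      this.inFlight.delete(work);
    };
    work.then(forget, forget);
    return work;
  }

  // Resolves only when no retry is in flight, including retries started while waiting.
  async waitForIdle(): Promise<void> {
    while (this.inFlight.size > 0) {
      await Promise.allSettled(Array.from(this.inFlight));
    }
  }
}

// A run reports completion only after pending compaction retries have settled.
async function finishRun(tracker: CompactionTracker): Promise<"completed"> {
  await tracker.waitForIdle();
  return "completed";
}
```

`waitForIdle` loops rather than awaiting a single batch of promises. That way it also covers retries that start while the run is already waiting.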

4) Overflow recovery is deterministic: compact → truncate → readable error

Keep the recovery path ordered and bounded:

  1. a bounded number of compaction retries
  2. truncate oversized tool outputs (lossy fallback)
  3. a readable error (suggest a session reset or a model with a larger window)
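The recovery path can be sketched as a bounded loop. This assumes each attempt reports whether it failed because of context overflow. `runAttempt`, `compact`, `truncateToolResults`, and `MAX_COMPACTION_RETRIES` (including its value) are hypothetical.

```typescript
// Hypothetical outcome of one model attempt.
type AttemptResult =
  | { ok: true; text: string }
  | { ok: false; overflow: boolean; error: string };

interface RecoveryDeps {
  runAttempt: () => Promise<AttemptResult>;
  compact: () => Promise<void>; // summarize older history
  truncateToolResults: () => boolean; // true if any oversized tool output was shortened
}

const MAX_COMPACTION_RETRIES = 2; // hypothetical bound

async function runWithOverflowRecovery(deps: RecoveryDeps): Promise<string> {
  let compactions = 0;
  let truncated = false;
  for (;;) {
    const result = await deps.runAttempt();
    if (result.ok) return result.text;
    if (!result.overflow) throw new Error(result.error); // not an overflow: no recovery
    // 1. Limited compaction retries.
    if (compactions < MAX_COMPACTION_RETRIES) {
      compactions++;
      await deps.compact();
      continue;
    }
    // 2. Lossy fallback, applied at most once.
    if (!truncated && deps.truncateToolResults()) {
      truncated = true;
      continue;
    }
    // 3. Readable error instead of a crash.
    throw new Error(
      "The conversation no longer fits in the model's context window, even after compaction " +
        "and tool-output truncation. Reset the session or switch to a model with a larger context window.",
    );
  }
}
```

Every branch returns, throws, or consumes a finite budget (the compaction retries and one truncation pass), so the loop cannot spin forever. Errors that are not overflows are rethrown immediately instead of being sent through compaction.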

5) Compaction timeouts must pick a snapshot

Prefer a pre-compaction snapshot over returning a partially-compacted transcript.
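Here is a minimal sketch, assuming compaction is an async function that returns a new transcript. It copies the transcript before compaction starts, races compaction against a timer, and falls back to the copy on timeout. `compactWithSnapshot` and the string-array transcript are simplifications for illustration.

```typescript
// Simplified transcript: one string per message (an assumption for illustration).
type Transcript = readonly string[];

interface CompactionOutcome {
  transcript: Transcript;
  timedOut: boolean;
}

// Race compaction against a timer; on timeout, return the pre-compaction snapshot.
async function compactWithSnapshot(
  transcript: Transcript,
  compact: (input: Transcript) => Promise<Transcript>,
  timeoutMs: number,
): Promise<CompactionOutcome> {
  // Take the snapshot before compaction starts so it is always internally consistent.
  const snapshot: Transcript = Object.freeze([...transcript]);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const winner = await Promise.race([compact(snapshot), timeout]);
    if (winner === "timeout") return { transcript: snapshot, timedOut: true };
    return { transcript: winner, timedOut: false };
  } finally {
    clearTimeout(timer); // stop the timer if compaction won the race
  }
}
```

The sketch does not cancel the losing compaction. A real implementation would also abort it (for example with an `AbortSignal`) or ignore its late result so it cannot overwrite the snapshot.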

Failure modes and troubleshooting

  • Provider 400s: verify that tool pairing repair runs after turn limiting; see Tools and Session pruning.
  • Sessions slow down over time: inspect compaction retries/timeouts in Logging.
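To make compaction retries and timeouts visible in logs, wrap each compaction so it emits a start event and an end event carrying its duration and outcome. The event shape and `observeCompaction` are hypothetical; adapt them to the project's actual logging or event pipeline.

```typescript
// Hypothetical event shape; map it onto the project's real logger or event stream.
type CompactionEvent =
  | { type: "compaction_start"; sessionId: string; attempt: number }
  | {
      type: "compaction_end";
      sessionId: string;
      attempt: number;
      durationMs: number;
      outcome: "ok" | "error";
    };

// Wrap one compaction so its start and end (with duration and outcome) are always emitted.
async function observeCompaction<T>(
  sessionId: string,
  attempt: number,
  emit: (event: CompactionEvent) => void,
  work: () => Promise<T>,
): Promise<T> {
  emit({ type: "compaction_start", sessionId, attempt });
  const startedAt = Date.now();
  let outcome: "ok" | "error" = "error";
  try {
    const result = await work();
    outcome = "ok";
    return result;
  } finally {
    emit({ type: "compaction_end", sessionId, attempt, durationMs: Date.now() - startedAt, outcome });
  }
}
```

Because the end event is emitted in `finally`, a failed compaction still produces a paired start/end record. Growing `durationMs` or rising `attempt` counts across a session are the signals to look for when sessions slow down.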

Acceptance checks

  1. Long sessions remain stable with bounded growth.
  2. Every tool_call has a matching tool_result.
  3. Oversized tool results trigger compaction/truncation recovery.
  4. Compaction start/end is observable (logs/events).
  5. Timeout during compaction still returns a consistent snapshot.
  6. Too-small context windows are blocked before the attempt starts.
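The pairing check (item 2) is straightforward to automate. This sketch assumes a simplified message model in which a tool_result must appear after the assistant message that issued its tool_call. `findPairingProblems` is a hypothetical helper, not an existing test utility.

```typescript
// Simplified message model (an assumption): results must follow the call they answer.
type Msg =
  | { role: "user" }
  | { role: "assistant"; toolCallIds: string[] }
  | { role: "tool_result"; toolCallId: string };

// Returns readable problems; an empty array means every tool_call has a matching tool_result.
function findPairingProblems(history: Msg[]): string[] {
  const pending = new Set<string>();
  const problems: string[] = [];
  for (const m of history) {
    if (m.role === "assistant") {
      m.toolCallIds.forEach((id) => pending.add(id));
    } else if (m.role === "tool_result" && !pending.delete(m.toolCallId)) {
      problems.push(`orphan tool_result for ${m.toolCallId}`);
    }
  }
  pending.forEach((id) => problems.push(`tool_call ${id} has no tool_result`));
  return problems;
}
```

A test can run long scripted sessions and assert that this returns an empty array for every transcript sent to the provider.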