Prompt Caching

Prompt caching lets a provider reuse unchanged prompt prefixes across turns, reducing cost and latency. In usage summaries, this typically shows up as cacheWrite (first write) and cacheRead (later reuse).
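For illustration (the cacheWrite and cacheRead fields come from the usage summary described above; the other field names are placeholders, and the exact shape varies by provider), a first turn that writes the cache might report:

usage:
  inputTokens: 12034
  outputTokens: 512
  cacheWrite: 11800   # prompt prefix written to the cache on this turn
  cacheRead: 0

A follow-up turn that reuses the prefix would instead show the cached portion under cacheRead, with cacheWrite at or near zero.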

Primary knobs

cacheRetention

Set on a model’s params:

agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "short" # none | short | long

Per-agent override:

agents:
  list:
    - id: "alerts"
      params:
        cacheRetention: "none"

Legacy cacheControlTtl

Legacy TTL values may still be accepted and mapped to retention levels (for example, 5m maps to short and 1h maps to long). Prefer cacheRetention for new config.
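As a sketch of the migration (assuming the legacy key is still honored in the same params position), a config carrying the old key:

agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheControlTtl: "5m" # legacy; mapped to cacheRetention: "short"

is equivalent to writing cacheRetention: "short" directly, which is the preferred form.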

Session pruning with cache windows

If you keep caches warm, prune old context once a TTL window lapses, so an idle period doesn’t force a fresh cache write over an oversized history:

agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"

Related: Session pruning

Heartbeat keep-warm

Heartbeat can keep cache windows warm for long-lived agents:

agents:
  defaults:
    heartbeat:
      every: "55m"

Notes

  • Provider support varies. If a provider ignores caching params, you won’t see a benefit.
  • For token accounting and tuning, see the Token use and Costs and tokens pages.