Prompt Caching

Prompt caching lets a provider reuse unchanged prompt prefixes across turns, reducing cost and latency. In usage summaries, this typically shows up as cacheWrite (first write) and cacheRead (later reuse).
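For illustration (the cacheWrite and cacheRead fields come from the usage summary described above; the other field names are placeholders, and the exact shape varies by provider), a first turn that writes the cache might report:

usage:
  inputTokens: 12034
  outputTokens: 512
  cacheWrite: 11800   # prompt prefix written to the cache on this turn
  cacheRead: 0

A follow-up turn that reuses the prefix would instead show the cached portion under cacheRead, with cacheWrite at or near zero.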

Primary knobs

cacheRetention

Set on a model’s params:

agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "short" # none | short | long

Per-agent override:

agents:
  list:
    - id: "alerts"
      params:
        cacheRetention: "none"

Legacy cacheControlTtl

Legacy TTL values may still be accepted and mapped to retention levels (for example, 5m maps to short and 1h maps to long). Prefer cacheRetention for new config.
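As a sketch of the migration (assuming the legacy key is still honored in the same params position), a config carrying the old key:

agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheControlTtl: "5m" # legacy; mapped to cacheRetention: "short"

is equivalent to writing cacheRetention: "short" directly, which is the preferred form.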

Session pruning with cache windows

If you keep caches warm, prune old context once a TTL window lapses, so an idle period doesn’t force a fresh cache write over an oversized history:

agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"

Related: Session pruning

Heartbeat keep-warm

Heartbeat can keep cache windows warm for long-lived agents:

agents:
  defaults:
    heartbeat:
      every: "55m"

Notes

  • Provider support varies. If a provider ignores caching params, you won’t see a benefit.
  • For token accounting and tuning, see the Token use and Costs and tokens pages.