# Prompt Caching
Prompt caching lets a provider reuse unchanged prompt prefixes across turns, reducing cost and latency. In usage summaries, this typically shows up as `cacheWrite` (the first write of a prefix) and `cacheRead` (later reuse).
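As a rough sketch of the accounting, assuming a provider that surcharges cache writes and discounts cache reads (the `prompt_cost` helper and its multipliers are illustrative, not any provider's actual pricing):

```python
def prompt_cost(base_price_per_tok, fresh, cache_write, cache_read,
                write_mult=1.25, read_mult=0.10):
    """Estimate input cost when part of the prompt hits the cache.

    fresh: uncached input tokens; cache_write: tokens written to the
    cache on the first turn; cache_read: tokens reused on later turns.
    The multipliers are hypothetical, for illustration only.
    """
    return base_price_per_tok * (
        fresh + write_mult * cache_write + read_mult * cache_read
    )

# First turn: a 10k-token prefix is written to the cache (cacheWrite).
first = prompt_cost(1e-6, fresh=500, cache_write=10_000, cache_read=0)
# Later turn: the same prefix is read back far cheaper (cacheRead).
later = prompt_cost(1e-6, fresh=500, cache_write=0, cache_read=10_000)
assert later < first
```

The write surcharge is why caching only pays off when the same prefix is reused at least once before its retention window expires.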
## Primary knobs

### `cacheRetention`
Set on a model’s params:
```yaml
agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "short" # none | short | long
```

Per-agent override:

```yaml
agents:
  list:
    - id: "alerts"
      params:
        cacheRetention: "none"
```

### Legacy `cacheControlTtl`
Legacy TTL values may still be accepted and mapped (for example, `5m` → `short`, `1h` → `long`). Prefer `cacheRetention` for new config.
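A minimal sketch of how such a mapping might normalize legacy values (the lookup table beyond the `5m`/`1h` pairs above, the `none` fallback, and the function itself are assumptions, not the actual implementation):

```python
# Hypothetical mapping from legacy cacheControlTtl strings to
# cacheRetention tiers; only 5m -> short and 1h -> long come from
# the docs above.
LEGACY_TTL_MAP = {"5m": "short", "1h": "long"}

def normalize_retention(params: dict) -> str:
    """Prefer an explicit cacheRetention; otherwise map a legacy TTL.

    Unknown or absent values fall back to "none" (an assumption).
    """
    if "cacheRetention" in params:
        return params["cacheRetention"]
    return LEGACY_TTL_MAP.get(params.get("cacheControlTtl"), "none")

print(normalize_retention({"cacheControlTtl": "5m"}))   # → short
print(normalize_retention({"cacheRetention": "long"}))  # → long
```

Resolving both knobs through one function keeps old configs working while new configs stay on the preferred key.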
## Session pruning with cache windows
If you keep caches warm, prune old context after TTL windows so idle periods don’t re-cache oversized history:
```yaml
agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"
```

Related: Session pruning
## Heartbeat keep-warm
Heartbeat can keep cache windows warm for long-lived agents:
```yaml
agents:
  defaults:
    heartbeat:
      every: "55m"
```

## Notes
- Provider support varies. If a provider ignores caching params, you won’t see a benefit.
- For token accounting and tuning, also see Token use and Costs and tokens.