System architecture
This page gives you a “global map” of OpenClaw so you can locate issues quickly when reading logs, events, and configuration.
Overview (ingress → control plane → execution)
OpenClaw is best understood as a “multi-ingress, single-kernel” runtime:
- Ingress: Channels (WhatsApp/Telegram/Discord…) and automation entry points (Webhooks/Cron…)
- Control plane: Gateway (WebSocket/HTTP APIs, auth, routing, state/event broadcast)
- Execution plane: Agent (run/attempt lifecycle, lane/queue concurrency, streaming)
- Capability layer: Tools (Browser/Exec/Web…) and Providers (models + failover)
- Data layer: Sessions, Media, Config, logging/auditing
If you’ve just finished onboarding, these three entry points matter most:
- Observe & interact: Control UI
- Diagnose: doctor
- Troubleshoot: troubleshooting
Key components
Gateway (control plane)
Gateway is the long-running core process. Typical responsibilities:
- Manage channel/provider connections and their lifecycle
- Expose WebSocket/HTTP APIs and broadcast events
- Enforce auth boundaries (token / pairing / remote access)
- Coordinate sessions & concurrency (sessionKey / lane / queue)
Channels (message ingress)
Channels are where messages enter the system. Platforms differ in how they connect and in their message formats, but they should all converge on the same pipeline: ingress → routing → session.
Routing + sessionKey (how messages get “bucketed”)
The system needs a stable mapping from incoming messages to sessions. The sessionKey typically determines:
- What runs serially (same sessionKey)
- What can run in parallel (different sessionKeys)
- How you avoid “cross-talk” (groups, multi-account, multi-channel)
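As a rough illustration of the bucketing idea above, here is a minimal sketch of deriving a key from routing context. The `Inbound` type, its field names, and the key format are all hypothetical and chosen for this example; they are not OpenClaw's actual sessionKey scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Inbound:
    """Hypothetical, simplified routing context of an incoming message."""
    channel: str                  # e.g. "telegram"
    account: str                  # which of your accounts received the message
    peer: str                     # sender id, used for direct chats
    group: Optional[str] = None   # group/room id, if the message came from a group
    thread: Optional[str] = None  # thread/topic id, if the platform has threads


def session_key(msg: Inbound) -> str:
    """Derive a stable bucket key (illustrative format, not OpenClaw's real one)."""
    # Group messages share one bucket per group; direct chats get one per sender.
    scope = f"group:{msg.group}" if msg.group else f"dm:{msg.peer}"
    parts = [msg.channel, msg.account, scope]
    if msg.thread:
        parts.append(f"thread:{msg.thread}")
    return "/".join(parts)
```

With this sketch, two members of the same group map to the same key (their messages run serially), while the same sender on a different account or channel gets a different key, which is one way to avoid cross-talk.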
Agent (execution plane: run/attempt)
A typical execution looks like:
- An incoming message is routed to a sessionKey
- It enters lane/queue (serial per session, parallel across sessions)
- A run starts, potentially with multiple attempts (retry/failover)
- Tool calls and streaming events happen during the run
- The final reply is delivered via the channel
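The "serial per session, parallel across sessions" step can be sketched with one lock per sessionKey. This is a toy model under stated assumptions: the `Lanes` class and `demo` function are hypothetical names, and OpenClaw's real lane/queue implementation may work differently.

```python
import asyncio
from collections import defaultdict


class Lanes:
    """One lock per sessionKey: serial within a session, parallel across sessions."""

    def __init__(self) -> None:
        self._locks = defaultdict(asyncio.Lock)

    async def submit(self, session_key: str, job):
        # Jobs sharing a key wait for each other; other keys are unaffected.
        async with self._locks[session_key]:
            return await job()


async def demo() -> list:
    lanes = Lanes()
    events = []

    def make_run(name: str):
        async def run():
            events.append(f"start {name}")
            await asyncio.sleep(0.01)  # stand-in for model and tool work
            events.append(f"end {name}")
        return run

    await asyncio.gather(
        lanes.submit("s1", make_run("a1")),
        lanes.submit("s1", make_run("a2")),  # same session: waits for a1
        lanes.submit("s2", make_run("b1")),  # other session: runs alongside a1
    )
    return events
```

Running `asyncio.run(demo())` yields an event list in which a2 starts only after a1 has ended, while b1 starts without waiting for a1.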
Tools (turn “say” into “do”)
Tools execute side-effectful actions (system commands, browser automation, HTTP calls). The key is controllability, not just capability.
Providers (models + failover)
Providers turn prompts/context into model output, with controlled fallbacks for failure scenarios.
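A controlled fallback can be as simple as trying providers in a fixed order and recording why each one failed. The sketch below is an assumption-laden illustration: `ProviderError`, `complete_with_failover`, and the two stand-in providers are hypothetical and are not part of any OpenClaw API.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, output) of the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Each failure corresponds to one attempt; keep the reason for diagnosis.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    raise ProviderError("rate limited")  # simulated failure


def steady_backup(prompt: str) -> str:
    return f"echo: {prompt}"  # simulated model output
```

Calling `complete_with_failover("hi", [("primary", flaky_primary), ("backup", steady_backup)])` falls through to the backup. Only `ProviderError` triggers a fallback here, so unexpected exceptions still surface instead of being silently retried.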
A complete message journey (make it observable)
Breaking the flow into observable steps helps troubleshooting:
- Ingress: a channel receives a message (or Webhook/Cron triggers)
- Routing: build a sessionKey from account/group/thread context
- Concurrency: lane/queue (serial per sessionKey)
- Execution: agent run with tools + streaming events
- Egress: the reply returns via the channel
A practical acceptance check: in Control UI (and logs), you can follow the same runId through those milestones (see logging).
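The acceptance check can also be scripted. The sketch below assumes logs are available as JSON lines with `runId` and `stage` fields; that log shape and the stage names are assumptions made for this example, not OpenClaw's documented log format, so adapt the field names to what your logs actually contain.

```python
import json

# Hypothetical stage names mirroring the milestones listed above.
MILESTONES = ["ingress", "routing", "queue", "run", "egress"]


def trace_run(log_lines, run_id):
    """Return (stages seen for run_id, in log order; milestones never seen)."""
    seen = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain-text banners
        if not isinstance(event, dict):
            continue
        if event.get("runId") == run_id and event.get("stage") in MILESTONES:
            seen.append(event["stage"])
    missing = [m for m in MILESTONES if m not in seen]
    return seen, missing
```

An empty `missing` list means the run was observable end to end; a non-empty one points at the first milestone to investigate.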
Security boundaries (layer risk controls)
Avoid treating security as “prompt-only”. Separate concerns by layer:
- Ingress boundary: remote access, trusted proxy, token, pairing (who can connect)
- Execution boundary: sandbox / tool policy / approvals (what tool calls can do)
- Auditability: logs + diagnosis + minimal reproducible flows
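To make the execution boundary concrete, here is a minimal sketch of a deny-by-default tool policy with an approval step. `ToolPolicy`, `decide`, and the tool names are hypothetical illustrations; OpenClaw's actual sandbox, tool policy, and approval configuration are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy: anything not listed is denied."""
    allow: frozenset = frozenset()           # tools that may run unattended
    needs_approval: frozenset = frozenset()  # tools that require a human "yes"


def decide(policy: ToolPolicy, tool: str) -> str:
    """Return "allow", "ask", or "deny" for a requested tool call."""
    # Approval wins over allow, so a tool listed in both is still gated.
    if tool in policy.needs_approval:
        return "ask"
    if tool in policy.allow:
        return "allow"
    return "deny"


# Example policy: web fetches run freely, shell commands need approval.
policy = ToolPolicy(
    allow=frozenset({"web", "exec"}),
    needs_approval=frozenset({"exec"}),
)
```

The point of the sketch is that the decision is made in code outside the model, so a persuasive prompt cannot widen what a tool call is allowed to do.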
Next steps
- Want a verified first run: start at Quickstart
- Want internals: go to Deep dive (Track A/B)
- Want production: start from Gateway remote access + security baseline