Self-hosted coding agent runtime: build, buy, operate
A self-hosted coding agent runtime is a platform decision, not a CLI installation. The agent binary is only the worker. Production self-hosting means your team also owns isolated workspaces, task lifecycle, user identity, scoped secrets, network policy, audit logs, queues, snapshots, version-control identity, CI handoff, and cleanup.
The short answer: build only if you already operate Kubernetes, VMs, CI runners, or cloud development environments and can treat coding agents as remote-code-execution infrastructure. Buy or adopt a managed control plane if your main goal is developer productivity. Operate a self-hosted or bring-your-own-cloud runtime when data residency, internal network placement, local logs, and policy control outweigh the extra security and capacity burden.
The self-hosted coding agent runtime is the real product
Codex, Claude Code, OpenHands, Goose, and similar tools can run commands, edit files, execute tests, and produce branches. That does not make them a self-hosted runtime. A runtime has to turn those actions into repeatable, governable work: who started the task, which repository was mounted, which credentials were available, which commands ran, what network destinations were reached, what diff was produced, and when the environment was destroyed.
A useful mental model: a self-hosted runtime is CI runner plus cloud dev environment plus agent harness plus policy engine. CI runners already teach the security lesson. GitHub warns that self-hosted runners do not have the clean ephemeral VM guarantees of GitHub-hosted runners and can be persistently compromised by workflow code. GitLab describes self-managed runners as remote code execution infrastructure and calls out non-ephemeral runners shared across projects as especially risky.
Coding agents raise the stakes because they are designed to explore. They install dependencies, read files, start servers, run tests, call tools, use credentials, and sometimes run for a long time. That behavior is the point of the product. It is also why the runtime needs enforcement below the chat interface, at the execution layer.
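One concrete piece of execution-layer enforcement is making sure a timed-out or cancelled command cannot leave processes behind. The sketch below is a minimal illustration, assuming a POSIX worker and Python's standard subprocess module; run_task_command is a hypothetical name. It starts each command in a new session so the command and its children share one process group, then kills that whole group on timeout.

```python
import os
import signal
import subprocess


def run_task_command(argv: list[str], timeout_s: float) -> tuple[int, bool]:
    """Run one agent command and return (exit code, timed_out).

    start_new_session=True makes the child a session and process-group
    leader, so its process-group ID equals its PID and every process it
    spawns (unless that process calls setsid itself) shares the group.
    """
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole group, including background children,
            # not just the direct child.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited
        # After SIGKILL, Popen reports a negative signal number.
        return proc.wait(), True
```

A process that calls setsid itself escapes the group, which is one reason production runtimes also rely on cgroups or simply destroy the sandbox when a task ends.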
Split the architecture into two planes
The cleanest evaluation question is this: what runs in the control plane, and what runs in the worker plane?
The worker plane runs the repository and the agent's tool calls. It needs a workspace with shell access, package managers, Git, tests, preview servers, databases, browser sessions, MCP tools, and any project-specific dependencies. Coder Tasks, OpenHands, and many CLI-based workflows put the agent inside that workspace.
The control plane owns the things you cannot leave to an untrusted repository checkout: user authentication, task queue, model credentials, workspace creation, policy, approval state, logs, secrets, and version-control integration. Coder Agents is the clearest example among the tools discussed here. It separates the agent loop from the workspace, keeps LLM credentials in the Coder control plane, stores chat state in the Coder database, supports queued follow-up messages, and ties actions to the submitting user. Coder's wording matters here: "Every action the agent takes ... is tied to the user."
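A minimal sketch of that split, assuming a Python control plane; ControlPlane, TaskRecord, worker_spec, and the proxy URL are all hypothetical names, not Coder's API. It shows the shape of the boundary: the task record tied to the submitting user and the model credential stay in the control plane, and the worker receives only task coordinates plus an endpoint to call.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskRecord:
    """Control-plane record of who asked for what, kept outside the workspace."""
    task_id: str
    submitted_by: str
    repository: str
    branch: str


@dataclass
class ControlPlane:
    # Held here only; repr=False keeps it out of accidental log lines.
    model_api_key: str = field(repr=False)
    # Hypothetical endpoint the worker calls instead of holding the key.
    model_proxy_url: str
    tasks: dict[str, TaskRecord] = field(default_factory=dict)

    def submit(self, user: str, repository: str, branch: str) -> TaskRecord:
        record = TaskRecord(str(uuid.uuid4()), user, repository, branch)
        self.tasks[record.task_id] = record
        return record

    def worker_spec(self, task_id: str) -> dict[str, str]:
        """What the worker plane receives: task coordinates, no credentials."""
        record = self.tasks[task_id]
        return {
            "task_id": record.task_id,
            "repository": record.repository,
            "branch": record.branch,
            "model_endpoint": self.model_proxy_url,
        }
```

Because every task is created through submit, the record linking a user to a repository and branch exists before any workspace does, which is what lets the control plane answer questions after the workspace is gone.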
That distinction changes the build decision. Running a CLI in a container is a worker-plane prototype. Operating a runtime means being able to answer operational questions after the task finishes, after the branch is pushed, and after an incident review starts.
Isolation: pick by risk tier, not fashion
Docker and devcontainers are the ergonomic default. They are fast, familiar, and compatible with existing developer workflows. Anthropic's Claude Code devcontainer guidance covers persistent auth, organization policy, network egress restriction, and unattended mode. It also warns against mounting host secrets and notes that, when permission prompts are skipped, the container does not stop an agent from exfiltrating anything accessible inside it.
For higher-risk environments, containers may not be enough. OpenHands distinguishes Docker, process, and remote sandbox providers. Docker is recommended locally. Process mode is unsafe but fast. Remote mode is for hosted or managed setups. Operator pressure around stronger isolation shows up in OpenHands issues about Docker socket exposure and QEMU microVM backends, including the concern that mounting the Docker socket is effectively root access to the host.
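A runtime can refuse dangerous mounts before a workspace starts. This is a hypothetical pre-flight check with an illustrative denylist: it rejects host paths that expose the Docker socket or common credential directories, including parent directories (such as all of $HOME) that would expose them indirectly. It is lexical only and does not resolve symlinks.

```python
import posixpath

# Illustrative denylist; a real policy would be broader and site-specific.
SOCKET_PATHS = ("/var/run/docker.sock", "/run/docker.sock")
HOME_SECRET_DIRS = (".ssh", ".aws", ".kube", ".config/gcloud")


def _overlaps(a: str, b: str) -> bool:
    """True if path a equals b, sits inside b, or contains b."""
    return (
        a == b
        or a.startswith(b.rstrip("/") + "/")
        or b.startswith(a.rstrip("/") + "/")
    )


def mount_violations(mount_sources: list[str], home: str) -> list[str]:
    """Return requested host mounts that would expose the Docker socket or host secrets.

    Paths are normalized, but symlinks are not resolved.
    """
    denied = list(SOCKET_PATHS) + [posixpath.join(home, d) for d in HOME_SECRET_DIRS]
    violations = []
    for src in mount_sources:
        path = posixpath.normpath(src)
        if any(_overlaps(path, d) for d in denied):
            violations.append(src)
    return violations
```

Checking overlap in both directions matters: mounting "/" or a whole home directory exposes the socket or keys just as surely as mounting them by name.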
The isolation vocabulary buyers should recognize is gVisor, Kata Containers, Firecracker, and full VMs. gVisor and Kata sit between plain containers and heavier VM designs. Firecracker microVMs are common in serverless-style isolation discussions. Stanislas's Netclode implementation used k3s, Kata with Cloud Hypervisor microVMs, JuiceFS on S3, Redis Streams, Tailscale, and SDK runners for Claude Code, Codex, and OpenCode. He sums up the design directly: "sessions are sandboxes running inside microVMs."
The trade-off is practical. Containers reduce startup time and operational complexity. MicroVMs and VMs give stronger tenant boundaries but require more work around images, networking, storage, warm pools, metrics, debugging, and cleanup.
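Picking by risk tier can be written down as policy rather than left to habit. A sketch under stated assumptions: the three risk factors, the additive scoring, and the tier-to-backend mapping are illustrative, not a recommendation from any of the projects above.

```python
from enum import Enum


class Isolation(Enum):
    DEVCONTAINER = "devcontainer"
    GVISOR = "gvisor"
    MICROVM = "microvm"  # e.g. Kata Containers or Firecracker
    FULL_VM = "full-vm"


# Hypothetical policy: each risk factor present raises the tier by one.
ISOLATION_BY_TIER = {
    0: Isolation.DEVCONTAINER,
    1: Isolation.GVISOR,
    2: Isolation.MICROVM,
    3: Isolation.FULL_VM,
}


def choose_isolation(untrusted_code: bool, internal_network: bool, regulated_data: bool) -> Isolation:
    """Map a task's risk factors to an isolation backend."""
    tier = int(untrusted_code) + int(internal_network) + int(regulated_data)
    return ISOLATION_BY_TIER[tier]
```

An additive score is the simplest possible rule; many teams will instead treat a single factor, such as regulated data, as enough to require the strongest boundary.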
Secrets and identity are first-class runtime features
The hardest part is not invoking a model. It is deciding which credentials the agent can touch.
OpenAI's Codex documentation supports non-interactive automation through codex exec, with sandbox and approval settings, JSONL event output, and machine-readable schemas. The same documentation warns not to expose API keys to repository-controlled code in CI. Codex Action is positioned to reduce API-key exposure by proxying API access. The important design principle is broader than Codex: repository code should not receive long-lived secrets just because an agent needs to complete a task.
Claude Code's security posture points in the same direction. It defaults to read-only behavior and asks before edits, tests, and mutating shell commands. Claude Code web uses Anthropic-managed isolated VMs, scoped credential proxying, branch restrictions, audit logging, and cleanup. If you self-host, those become your requirements.
In practice, safer designs use short-lived tokens, OIDC, GitHub Apps, deploy keys with narrow scope, secret proxying, and per-task injection. Version-control identity also needs a decision. A runtime should know whether commits come from the human user, a bot, a GitHub App, or a task-specific identity, and whether branch restrictions apply.
Audit logs are part of the product surface
For platform leaders, auditability should not be treated as an enterprise add-on. It is the evidence trail that makes self-hosting defensible.
At minimum, the runtime should record actor, repository, branch, task prompt, approvals, commands, file writes, generated diff, artifacts, model or agent used, workspace ID, start and end time, and cleanup status. Higher-value fields include file reads, environment access, MCP or tool calls, network destinations, replay pointers, rollback pointers, and prompt or response retention according to policy.
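A minimal audit record covering the core fields might look like the following, assuming JSON Lines as the log format; AuditEvent and its field names are hypothetical. Event-specific data such as command arguments, file paths, or network destinations goes in detail, and construction fails if no human actor is named.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class AuditEvent:
    actor: str          # the human who submitted the task
    task_id: str
    workspace_id: str
    repository: str
    branch: str
    agent: str          # agent or model used
    kind: str           # e.g. "command", "file_write", "approval", "network", "cleanup"
    detail: dict[str, Any]
    timestamp: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        if not self.actor:
            raise ValueError("audit events must name a human actor")

    def to_jsonl(self) -> str:
        """Serialize as one JSON object per line for append-only log shipping."""
        return json.dumps(asdict(self), sort_keys=True) + "\n"
```

Keeping one flat record per action, rather than one blob per task, makes it possible to filter by actor, repository, or event kind during an incident review.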
This is where a self-hosted coding agent runtime differs from a developer running a local assistant. The platform team has to make agent work observable across many users and repositories. Queues, logs, metrics, and OpenTelemetry export, such as those exposed by E2B's sandbox platform, matter because agent tasks fail in ordinary infrastructure ways: stuck processes, missing dependencies, rate limits, broken networks, exhausted capacity, and orphaned workspaces.
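Orphaned workspaces are one of those ordinary failures, and detecting them is mostly bookkeeping. A sketch, assuming the worker reports a heartbeat and the control plane tracks task state; the Workspace type and the state names are hypothetical.

```python
from dataclasses import dataclass

TERMINAL_STATES = {"completed", "cancelled", "failed", "timed_out"}


@dataclass
class Workspace:
    workspace_id: str
    task_state: str        # "running" or one of TERMINAL_STATES
    last_heartbeat: float  # seconds since epoch, reported by the worker


def find_orphans(workspaces: list[Workspace], now: float, heartbeat_timeout_s: float) -> list[str]:
    """Return IDs of workspaces that should be reclaimed."""
    orphans = []
    for ws in workspaces:
        if ws.task_state in TERMINAL_STATES:
            # The task is over but its workspace still exists: cleanup never ran.
            orphans.append(ws.workspace_id)
        elif now - ws.last_heartbeat > heartbeat_timeout_s:
            # The task claims to be running but the worker has gone quiet.
            orphans.append(ws.workspace_id)
    return orphans
```

Running this on a schedule and recording each reclaim as a cleanup event closes the loop between lifecycle and audit.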
State lifecycle: ephemeral is safer, but agents like memory
Security prefers throwaway environments. Agents benefit from persistent state. Your runtime design has to choose where to sit between those goals.
Ephemeral workspaces reduce cross-task contamination and make cleanup simpler. That is why hosted CI systems prefer fresh environments. But agent tasks often need warm dependency caches, installed services, long-running preview servers, resumable sessions, and follow-up messages. Coder Agents supports queued follow-up messages and automatic workspace provisioning. Coder Tasks runs each task in its own Coder workspace with Claude Code, Goose, or custom agents installed there.
Snapshots are the middle path. E2B documents sandboxes with templates, Git integration, persistence, metrics, logs, SSH and terminal access, OpenTelemetry export, and bring-your-own-cloud entry points. Its snapshot feature captures filesystem and memory state and can fork new sandboxes from a point-in-time capture. For a platform team, snapshots support resume, debugging, warm starts, and rollback, but they also create retention and secrets-handling questions.
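Snapshot retention can start as a simple age-based rule with an exception for snapshots held for investigation. The Snapshot type, the pinned flag, and the cutoff are assumptions for illustration, not E2B's API.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    snapshot_id: str
    workspace_id: str
    created_at: float      # seconds since epoch
    pinned: bool = False   # e.g. held for an incident review


def snapshots_to_expire(snapshots: list[Snapshot], now: float, max_age_s: float) -> list[str]:
    """Return unpinned snapshots older than max_age_s.

    A memory snapshot can contain whatever was loaded in the sandbox,
    including tokens, so expiry bounds how long that material persists.
    """
    return [
        s.snapshot_id
        for s in snapshots
        if not s.pinned and now - s.created_at > max_age_s
    ]
```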
Build, buy, or operate: the decision matrix
Use this decision frame before committing to a runtime strategy.
| Option | Best fit | You own | Main risk |
|---|---|---|---|
| SaaS runtime | Productivity first, low infrastructure appetite | Policy configuration, vendor review, integration | Data residency, network access, audit boundaries |
| BYOC sandbox fleet | Need workloads near private systems, but can accept external control plane | Cloud account, sandbox capacity, network policy | Split responsibility across vendor and platform team |
| OSS or internal runtime | Need full control over policy, logs, isolation, and credentials | Control plane, worker plane, upgrades, incidents | Rebuilding CI, dev environments, and security controls |
| CLI in containers | Prototype or narrow internal workflow | Container image, runner host, secrets, cleanup | False confidence, weak lifecycle and audit model |
The right choice depends on the constraint that actually blocks adoption. If the blocker is developer workflow, a managed product may solve the problem faster. If the blocker is access to internal networks or regulated data, self-hosting or BYOC becomes more attractive. If the blocker is audit and policy, do not start with the agent CLI. Start with identity, task records, approval flow, and execution enforcement.
Evaluation checklist for platform teams
Before you approve a self-hosted coding agent runtime, ask these questions:
1. Where do model credentials live, and can repository code ever read them?
2. Does sandbox enforcement apply to all spawned commands, not only visible tool calls?
3. Are workspaces ephemeral, persistent, snapshot-based, or selectable by policy?
4. Which isolation modes are supported: devcontainer, Docker, gVisor, Kata, Firecracker, or full VM?
5. How are GitHub or GitLab permissions represented: user OAuth, GitHub App, bot account, deploy key, or short-lived token?
6. Can the runtime deny network egress by default and allow only approved destinations?
7. Are commands, file changes, approvals, artifacts, tool calls, and network activity logged with the human actor?
8. What happens when a task is cancelled, times out, or leaves background processes running?
9. Can CI handoff preserve branch identity, commit identity, artifacts, and audit links?
10. Who patches the base images, agent CLIs, sandbox layer, and control plane?
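Question 6 describes a default-deny egress policy, and the decision logic itself is small. This hypothetical sketch matches exact hosts and "*." subdomain wildcards; the allowlist entries are only examples. Evaluating it inside the agent process is not enforcement: to satisfy question 2, the same decision has to be applied at an egress proxy or network layer that every spawned command goes through.

```python
from urllib.parse import urlsplit

# Example allowlist; entries starting with "*." match subdomains only.
EGRESS_ALLOWLIST = {
    "pypi.org",
    "files.pythonhosted.org",
    "registry.npmjs.org",
    "*.githubusercontent.com",
}


def egress_allowed(url: str, allowlist: set[str]) -> bool:
    """Default deny: allow only hosts that match an allowlist entry."""
    host = urlsplit(url).hostname  # lowercased, with port and userinfo stripped
    if not host:
        return False
    host = host.rstrip(".")
    for entry in allowlist:
        if entry.startswith("*."):
            # entry[1:] keeps the leading dot, so "evilgithubusercontent.com" fails.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False
```

Exact-match comparison on the parsed hostname, rather than a substring test on the URL, is what keeps look-alike hosts such as pypi.org.attacker.example from slipping through.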
The practical conclusion is conditional. Self-hosting is worthwhile when control, residency, and internal placement are business requirements. It is expensive when the real need is simply to give developers a better coding assistant. Treat the runtime as shared execution infrastructure from day one, and the build-versus-buy discussion becomes clearer: you are not buying an agent, you are accepting or rejecting responsibility for the system that lets an agent act on your code.