MCP trust boundaries belong below the protocol
The Model Context Protocol’s threat model has a hole, and the fix isn’t another wrapper.
In April 2025, Trail of Bits documented line-jumping: a class of attack in which an MCP server stuffs prompt-injection payloads into the metadata it sends at connection time (tool descriptions and parameter schemas in the tools/list response, server instructions in the initialize response). The payload lands in the model’s context window before the user has approved a single tool call. Connection isolation, human-in-the-loop confirmation, allow-listing: none of it triggers. By the time any of those controls could fire, the attack has already happened.
ToB’s own answer, mcp-context-protector, is a wrapper. It sits between the model and the server, scrubs and pins tool descriptions, surfaces changes for review. It works. It’s exactly the right tool at the protocol layer.
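The pinning idea can be sketched in a few lines. This is a minimal illustration of trust-on-first-use pinning in general, not mcp-context-protector’s actual implementation; the function names (`fingerprint`, `changed_tools`, `approve`) and the in-memory `pins` dictionary are assumptions made for the example. The field names `name`, `description`, and `inputSchema` are those of an MCP tool entry.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash the fields of a tool entry that end up in the model's context."""
    pinned_fields = {
        "name": tool.get("name"),
        "description": tool.get("description"),
        "inputSchema": tool.get("inputSchema"),
    }
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(pinned_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(pins: dict, tools: list) -> list:
    """Return names of tools that are new or differ from the approved pin."""
    return [t["name"] for t in tools if pins.get(t["name"]) != fingerprint(t)]


def approve(pins: dict, tools: list) -> dict:
    """Record current fingerprints (call only after a human has reviewed them)."""
    updated = dict(pins)
    updated.update({t["name"]: fingerprint(t) for t in tools})
    return updated
```

A wrapper built this way would withhold any tool named by `changed_tools` from the model until the operator approves it. Note what it does not cover: anything the server emits outside the pinned fields.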
But the protocol layer is the wrong place to draw the trust boundary in the first place.
What the protocol layer assumes
MCP, as a protocol, assumes that the client and the server live in the same trust domain. The client consumes what the server says — tool list, prompts, resources, errors — and treats it as input to the model. The whole protocol is structured around minimising friction between the two. That is the right design choice for a useful protocol. It is the wrong design choice for an adversarial one.
Wrappers can sanitize the traffic on this channel. They cannot change the fact that everything the server emits is, by design, in scope for the model. Strip the tool description of injection payloads and the server can still encode intent in resource URIs, in tool names themselves, in error messages, in the timing and ordering of responses. The protocol has too many side channels for sanitization to be exhaustive.
That doesn’t mean wrappers are wrong. It means they are necessary but not sufficient. The categorical answer is to stop pretending the client and the server share a trust domain, and to enforce that mistrust in the structure of the system rather than in the protocol.
What structural mistrust looks like
The model needs capabilities. It needs to read files, run commands, attach devices, reach the network. The traditional MCP setup gives it those capabilities by handing the server the same authority the user has. Structural mistrust gives the model capabilities too, but routes every exercise of capability through a mediator that doesn’t trust the model and can’t be talked into trusting it.
In Qubes OS, that mediator is dom0. Dom0 is the management VM. It is the only thing that can start, stop, clone, or modify other qubes. Reaching it requires qrexec, a policy-mediated RPC system. The policy file is in dom0; it is not reachable from a guest. Whatever the guest says — however cleverly — the policy decides what gets through.
This is the substrate qubes-mcp builds on. The MCP server runs inside a normal qube. The agent connected to that server can ask it to do anything it likes. But anything it asks that touches another qube — list, spawn, clone, attach a device, run a command, copy a file — translates into a qrexec call into dom0. Dom0 checks the call against a policy that allows only a specific named source qube to invoke a specific named service against targets carrying a specific tag. Everything else gets a refusal that looks identical to “not found.”
+--------------------------+          +-------------------+
| mcp-control (untrusted)  |  ----->  | dom0              |
| - Claude / model         |  qrexec  | - policy gate     |
| - FastMCP server         |  ----->  | - invariant check |
| - tool implementations   |          | - calls qubesd    |
+--------------------------+          +-------------------+
                                                |
                                                v
                                    +------------------------+
                                    | ai-managed qubes       |
                                    | (the work surface)     |
                                    +------------------------+
The arrows matter. The model can reach dom0 only through qrexec, and only with calls dom0 was configured to accept. The model cannot reach qubesd directly. The model cannot enumerate qubes that don’t carry the ai-managed tag. The model cannot infer their existence from error messages: qmcp.GetPropertyAIManaged returns the same error whether the target qube doesn’t exist or simply isn’t tagged.
What’s in scope for the agent is what the operator decided to put in scope by tagging it. Everything else is structurally invisible.
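The uniform-error behaviour described above can be sketched as follows. This is an illustration of the idea only, not the project’s dom0 script: the `inventory` dictionary stands in for whatever qubesd reports, and `QubeUnavailable` and `get_property` are hypothetical names.

```python
AI_MANAGED_TAG = "ai-managed"


class QubeUnavailable(Exception):
    """Single error raised for both 'does not exist' and 'not tagged'."""

    def __init__(self):
        # Fixed message: it never echoes the requested name or the reason.
        super().__init__("qube unavailable")


def get_property(inventory: dict, name: str, prop: str):
    """Return a property of a tagged qube; reveal nothing about anything else."""
    qube = inventory.get(name)
    # Absent and untagged take the same branch and raise the same error.
    if qube is None or AI_MANAGED_TAG not in qube["tags"]:
        raise QubeUnavailable()
    return qube["properties"][prop]
```

The design choice worth noting is that the two conditions share one branch, so there is no second code path whose message or exception type could drift apart from the first.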
A concrete qrexec policy
Here is a fragment of the policy that gates the spawn service. The dom0-side RPC script implementing the service is the part of the system that enforces invariants — name patterns, label restrictions, prohibited template flags, network attachment rules.
# /etc/qubes/policy.d/30-qmcp.policy
qmcp.SpawnAIManagedQube * mcp-control @adminvm allow target=dom0
qmcp.SpawnAIManagedQube * @anyvm @adminvm deny
qmcp.LifecycleAIManaged * mcp-control @tag:ai-managed allow
qmcp.LifecycleAIManaged * @anyvm @anyvm deny
qmcp.AttachDeviceAIManaged * mcp-control @tag:ai-managed allow
qmcp.AttachDeviceAIManaged * @anyvm @anyvm deny
Two things are doing the work here. First, the source qube must be mcp-control — the one qube where the MCP server is allowed to run. Any other qube attempting the call is denied; the model has no way to spoof its source. Second, the destination qube must carry the ai-managed tag (for stateful ops) or be @adminvm itself (for spawn, which by definition has no destination qube yet). Dom0’s RPC scripts apply the rest of the invariants — that a spawn request can’t name a system qube, can’t pick an arbitrary template, can’t attach to a network that lets it phone home.
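To make the two conditions concrete, here is a toy evaluator for rules of this shape. It is a simplified reading of qrexec policy semantics (top-down, first match wins, deny when nothing matches) and not the real policy engine: it ignores the argument column, and the `Rule`, `matches`, and `decide` names are invented for the sketch.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    service: str
    source: str
    target: str
    action: str


# Mirrors part of the policy fragment above; the argument column (*) is omitted.
RULES = [
    Rule("qmcp.SpawnAIManagedQube", "mcp-control", "@adminvm", "allow"),
    Rule("qmcp.SpawnAIManagedQube", "@anyvm", "@adminvm", "deny"),
    Rule("qmcp.LifecycleAIManaged", "mcp-control", "@tag:ai-managed", "allow"),
    Rule("qmcp.LifecycleAIManaged", "@anyvm", "@anyvm", "deny"),
]


def matches(pattern: str, qube: str, tags: dict) -> bool:
    """Match one source/target pattern against a qube name."""
    if pattern == "@anyvm":
        return qube != "dom0"  # assumption: @anyvm excludes the admin VM
    if pattern == "@adminvm":
        return qube == "dom0"
    if pattern.startswith("@tag:"):
        return pattern[len("@tag:"):] in tags.get(qube, set())
    return pattern == qube


def decide(service: str, source: str, target: str, tags: dict) -> str:
    """Return the action of the first matching rule, or 'deny' if none match."""
    for rule in RULES:
        if (rule.service == service
                and matches(rule.source, source, tags)
                and matches(rule.target, target, tags)):
            return rule.action
    return "deny"
```

With `tags = {"work-1": {"ai-managed"}, "vault": set()}`, a lifecycle call from mcp-control to work-1 is allowed, the same call aimed at vault falls through to the deny rule, and any call originating from work-1 is denied regardless of target.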
The agent has plenty of room to do legitimate work inside this surface. It just can’t escape it.
What this actually forecloses
A line-jumping payload smuggled into a tool description still gets into the model’s context. That part doesn’t change; the model has to read tool descriptions to know what tools it has.
What changes is what the model can do once compromised.
A line-jumping payload that tells the agent to exfiltrate the user’s SSH keys requires the agent to read the user’s SSH keys. There are no SSH keys reachable in a tag-scoped Qubes setup. The agent’s filesystem is the mcp-control qube’s filesystem; the operator’s keys are in a different qube the agent cannot see or name. The payload’s instructions are physically unsatisfiable.
A payload that tells the agent to “clone the production-build VM and pivot from it” requires the production-build VM to be tagged ai-managed. It is not. qmcp.ListAIManagedQubes doesn’t return it. qmcp.GetPropertyAIManaged denies access without confirming the qube exists. The agent has no name to act on.
A payload that says “open a reverse shell to attacker.example.com” requires network egress that the policy allows. Stage C of the project routes all ai-managed qubes through a single egress qube (ai-net-router) with an explicit allow-list. The default-deny applies; the reverse-shell connection terminates at the router.
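The default-deny decision is small enough to write down. The sketch below shows only the decision logic; a real egress qube would enforce it in firewall rules rather than in Python, and the allow-list entries and the `egress_allowed` name are hypothetical.

```python
# Hypothetical allow-list: (host, port) pairs the operator has approved.
ALLOWED_EGRESS = frozenset({
    ("pypi.org", 443),
    ("files.pythonhosted.org", 443),
})


def egress_allowed(host: str, port: int, allow_list=ALLOWED_EGRESS) -> bool:
    """Default-deny: a destination passes only if it is explicitly listed."""
    # Normalise case and a trailing dot so 'PyPI.org.' matches 'pypi.org'.
    normalised = host.lower().rstrip(".")
    return (normalised, port) in allow_list
```

Under this list, `egress_allowed("attacker.example.com", 4444)` is false without anyone having had to anticipate that host, which is the property that makes an allow-list auditable.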
What does not get foreclosed: anything inside the agent’s tag-scoped surface. If the agent has been given two ai-managed qubes to play with and one of them holds something sensitive, the agent can absolutely be talked into damaging it. The model is no more trustworthy than before. What changed is the radius of what it can damage.
That’s the actual sell. Structural isolation does not make the agent safe to ignore. It bounds the worst case to a region you’ve explicitly handed it.
Implementation status
qubes-mcp (MIT) implements the dom0-mediated trust boundary in stages. Six of them (A through E2) are tested on Qubes R4.3-era systems; stages F–H are at the design stage:
| Stage | Capability | Status |
|---|---|---|
| A | Tag-scoped lifecycle, spawn, wrapped property access | Tested |
| B | Command execution, inter-qube file transfer | Tested |
| C | Single-egress network sandbox via ai-net-router | Tested |
| D | Cloning, DispVM klass support, dom0 lifecycle wrapper | Tested |
| E1 | Device attach/detach between ai-managed qubes | Tested |
| E2 | Ephemeral DispVMs + qubes_run_disposable one-shot | Tested |
| F–H | Feature wrapping, hardening, mobile reach | Designed |
The dom0-side surface is nine RPC services — qmcp.LifecycleAIManaged, qmcp.SpawnAIManagedQube, qmcp.CloneAIManagedQube, qmcp.AttachDeviceAIManaged, qmcp.DetachDeviceAIManaged, qmcp.GetPropertyAIManaged, qmcp.SetPropertyAIManaged, qmcp.ListAIManagedQubes, qmcp.SpawnDisposableAIManaged — each with its own invariant checks. The MCP server inside mcp-control exposes these to the agent as FastMCP tools.
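On the mcp-control side, a tool of this kind reduces to a thin shim over `qrexec-client-vm`. The sketch below is an assumption about shape, not the project’s code: the action set, the choice to pass the action as a qrexec service argument after `+`, and the function names are all hypothetical. The `run` parameter is injectable so the function can be exercised without a Qubes system; in the real server it would be registered as a FastMCP tool.

```python
import subprocess

# Hypothetical action set; the real service defines its own verbs.
ALLOWED_ACTIONS = {"start", "shutdown", "pause", "unpause"}


def lifecycle_argv(target: str, action: str) -> list:
    """Build the qrexec-client-vm command line for a lifecycle request."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Assumption: the action travels as the service argument after '+'.
    return ["qrexec-client-vm", target, f"qmcp.LifecycleAIManaged+{action}"]


def lifecycle(target: str, action: str, run=subprocess.run) -> bool:
    """Ask dom0 to perform the action; True only if the call was accepted."""
    result = run(lifecycle_argv(target, action),
                 capture_output=True, text=True, check=False)
    # A policy denial surfaces as a non-zero exit; no reason is passed on.
    return result.returncode == 0
```

The client-side action check is a convenience, not a control. Everything in this function runs in the untrusted qube, so the enforcement that matters is the policy and the invariant checks in dom0.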
What an auditor of an MCP product should be checking
A working trust-boundary review of any MCP-using product is roughly this checklist. None of it is Qubes-specific.
- Where is the trust boundary drawn? If the answer is “between the user and the server,” that’s a protocol-layer answer and line-jumping crosses it. The right answer names a substrate (process namespace, container, VM, hypervisor) that the model cannot influence.
- What can the model name? Anything the model can name, it can be talked into acting on. List every resource — files, network endpoints, services, credentials — that the model’s tool surface can reference. The set should be the set the operator explicitly chose, not the set the host happens to have.
- What does “denied” look like vs. “not found”? If they differ, the model has an existence oracle. Probe every read-shaped tool: same response shape on absent vs. forbidden? If not, that’s a finding.
- What is the egress surface? Default-deny network with an explicit allow-list, or default-allow with an attempt to block known-bad? The first is auditable; the second is not.
- Where does authentication originate? If the model holds long-lived credentials, line-jumping holds them too. Credentials should be issued per-task by a component the model cannot influence and scoped to the resource the task needs.
- What happens to disposable state? Ephemeral compute (DispVM, fresh container per task) should be the default. State persistence requires an affirmative choice by the operator, not by the model.
- What is loggable, and by whom? Logging in a trust domain the model controls is not logging — it’s the model writing fiction. Logs must live in a substrate the agent cannot reach.
- What happens when the model is wrong? Failure mode: blast radius bounded to one disposable surface, or unbounded. If unbounded, the design is wrong regardless of how the protocol layer is configured.
If a product can answer all eight in writing, it has a real threat model and an audit is mostly about probing the edges. If it can answer fewer than four, the audit is about getting it to answer the rest before anything else.
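The “denied” versus “not found” item lends itself to a mechanical probe. The following is a rough sketch of one, assuming the auditor can invoke each read-shaped tool as a Python callable that takes a resource name; `response_shape` and `has_existence_oracle` are names made up for the example.

```python
def response_shape(call, name: str) -> tuple:
    """Reduce a tool call's outcome to the parts a model could observe."""
    try:
        value = call(name)
    except Exception as exc:
        # Mask the requested name so a message that merely echoes it
        # does not count as a difference.
        message = str(exc).replace(name, "<name>")
        return ("error", type(exc).__name__, message)
    return ("ok", type(value).__name__)


def has_existence_oracle(call, forbidden_name: str, absent_name: str) -> bool:
    """True if a forbidden resource and an absent one are distinguishable."""
    return response_shape(call, forbidden_name) != response_shape(call, absent_name)
```

Run it once per read-shaped tool with one name that exists but is out of scope and one that does not exist at all. A `True` result is a finding. A `False` result covers only the static response shape and says nothing about timing.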
Open design questions
Three things I’d value pushback on from anyone who’s done this work before:
- Existence-oracle resistance under timing. Uniform “not found” responses cover the static-shape case. They don’t fully cover differential latency: a denied call to a tagged-but-protected qube and a denied call to a nonexistent qube may take measurably different paths through qubesd. How much should that worry an isolation argument?
- Single-egress sandbox idiom. Routing all tagged qubes through one egress qube with an allow-list is the obvious design. Is it the Qubes design, or am I reinventing something the project already has a more idiomatic answer to (templates with netvm properties, perhaps)?
- Wrapping the long tail. Nine RPC services cover the verbs an agent needs for everything I’ve thrown at it. The Qubes Admin API has more. Should wrapping be exhaustive (mirror every Admin API call with a tag-scoped version) or minimal (expose only what’s needed and document the gap)?
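For the timing question, one way to turn the worry into a measurement is to collect latency samples for both kinds of denied call and compare them. The sketch below is a deliberately crude heuristic, not a proper statistical test; the threshold `factor` and the function names are assumptions, and collecting the samples is left out.

```python
import statistics


def latency_gap(forbidden_samples: list, absent_samples: list) -> tuple:
    """Return (gap between medians, larger of the two standard deviations)."""
    gap = abs(statistics.median(forbidden_samples)
              - statistics.median(absent_samples))
    noise = max(statistics.pstdev(forbidden_samples),
                statistics.pstdev(absent_samples))
    return gap, noise


def timing_distinguishable(forbidden_samples: list, absent_samples: list,
                           factor: float = 3.0) -> bool:
    """Flag the pair if the median gap clearly exceeds the observed jitter."""
    gap, noise = latency_gap(forbidden_samples, absent_samples)
    return gap > factor * noise
```

A negative result here is weak evidence at best, since an attacker can take far more samples than an auditor is likely to.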
Closing
Wrappers like mcp-context-protector are not in competition with structural isolation. They are at different layers, and a real MCP-using product should have both — wrapper at the protocol layer to defuse the obvious injection content, isolation below it to bound what an injection can do when the wrapper misses something.