MCP trust boundaries belong below the protocol

Alex Schose · 25 May 2026

The Model Context Protocol’s threat model has a hole, and the fix isn’t another wrapper.

In April 2025, Trail of Bits documented line jumping: a vulnerability where a malicious MCP server embeds prompt-injection payloads in the tool descriptions returned by the tools/list method. The payload lands in the model’s context window the moment the client reads it — before the user has approved a single tool call. Connection isolation, human-in-the-loop confirmation, invocation controls — none of it triggers. The attack already happened.

Trail of Bits’ own answer, mcp-context-protector, is a wrapper the client launches around the MCP server. It pins server configurations on first use — tool descriptions, server instructions, and input schemas — and blocks downstream tool calls until the user approves any change. It works. It’s exactly the right tool at the protocol layer.
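
To make the pinning idea concrete, here is a minimal trust-on-first-use sketch. It is not mcp-context-protector's code: `fingerprint`, `PinStore`, and the tool dictionary shape (`name`, `description`, `inputSchema`) are assumptions chosen for illustration, and a real pin store would persist to disk and also cover server instructions.

```python
import hashlib
import json


def fingerprint(tools):
    """Hash every tool's name, description, and input schema, order-independently."""
    canonical = sorted(
        (
            tool["name"],
            tool.get("description", ""),
            json.dumps(tool.get("inputSchema", {}), sort_keys=True),
        )
        for tool in tools
    )
    blob = json.dumps(canonical).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class PinStore:
    """Hypothetical in-memory pin store keyed by a server identifier."""

    def __init__(self):
        self._pins = {}

    def check(self, server_id, tools):
        """Return 'pinned' on first use, 'ok' if unchanged, 'blocked' if changed."""
        seen = fingerprint(tools)
        if server_id not in self._pins:
            self._pins[server_id] = seen
            return "pinned"
        return "ok" if self._pins[server_id] == seen else "blocked"

    def approve(self, server_id, tools):
        """Record the user's approval of a changed configuration."""
        self._pins[server_id] = fingerprint(tools)
```

A client built this way would refuse to forward tool calls while `check` returns `"blocked"`, and call `approve` only after the user has reviewed the change.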

But the protocol layer is the wrong place to draw the trust boundary in the first place.

What the protocol layer assumes

MCP, as a protocol, assumes that the client and the server live in the same trust domain. The client consumes what the server says — tool list, prompts, resources, errors — and treats it as input to the model. The whole protocol is structured around minimising friction between the two. That is the right design choice for a useful protocol. It is the wrong design choice for an adversarial one.

Wrappers can sanitize the traffic on this channel. They cannot change the fact that everything the server emits is, by design, in scope for the model. Strip the tool description of injection payloads and the server can still encode intent in resource URIs, in tool names themselves, in error messages, in the timing and ordering of responses. The protocol has too many side channels for sanitization to be exhaustive.

That doesn’t mean wrappers are wrong. It means they’re necessary but not sufficient. The categorical answer is to stop pretending the client and the server share a trust domain, and to enforce that mistrust in the structure of the system rather than in the protocol.

What structural mistrust looks like

The model needs capabilities. It needs to read files, run commands, attach devices, reach the network. The traditional MCP setup gives it those capabilities by handing the server the same authority the user has. Structural mistrust gives the model capabilities too, but routes every exercise of capability through a mediator that doesn’t trust the model and can’t be talked into trusting it.

In Qubes OS, that mediator is dom0. Dom0 is the management VM. It is the only thing that can start, stop, clone, or modify other qubes. Reaching it requires qrexec, a policy-mediated RPC system. The policy file is in dom0; it is not reachable from a guest. Whatever the guest says — however cleverly — the policy decides what gets through.

This is the substrate qubes-mcp builds on. The MCP server runs inside a normal qube. The agent connected to that server can ask it to do anything it likes. But anything it asks that touches another qube — list, spawn, clone, attach a device, run a command, copy a file — translates into a qrexec call into dom0. Dom0 checks the call against a policy that allows only a specific named source qube to invoke a specific named service against targets carrying a specific tag. Everything else gets a refusal that looks identical to “not found.”

+--------------------------+         +-------------------+
|  mcp-control (untrusted) | ---->   |       dom0        |
|  - Claude / model        | qrexec  | - policy gate     |
|  - FastMCP server        | ----->  | - invariant check |
|  - tool implementations  |         | - calls qubesd    |
+--------------------------+         +-------------------+
                                              |
                                              v
                                  +------------------------+
                                  |  ai-managed qubes      |
                                  |  (the work surface)    |
                                  +------------------------+

The arrows matter. The model can reach dom0 only through qrexec, and only with calls dom0 was configured to accept. The model cannot reach qubesd directly. The model cannot enumerate qubes that don’t carry the ai-managed tag. The model cannot infer their existence from error messages — qmcp.GetPropertyAIManaged returns the same uniform error whether the target qube doesn’t exist or simply isn’t tagged.

What’s in scope for the agent is what the operator decided to put in scope by tagging it. Everything else is structurally invisible.
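
The uniform-error property can be sketched in a few lines. This is an illustration under assumptions, not the qmcp.GetPropertyAIManaged implementation: `inventory` is a hypothetical dictionary standing in for what dom0 learns from qubesd, and `OpaqueError` and `REFUSAL` are names invented here.

```python
TAG = "ai-managed"
REFUSAL = "qube not available"  # hypothetical wording; the point is that there is only one


class OpaqueError(Exception):
    """The single error type raised for every refused read."""


def get_property(inventory, name, prop):
    """Return a property of a tagged qube, or refuse without saying why."""
    qube = inventory.get(name)
    # Absent, untagged, and unknown-property cases all collapse into one branch,
    # so the caller cannot tell "does not exist" from "exists but is off limits".
    if qube is None or TAG not in qube["tags"] or prop not in qube["props"]:
        raise OpaqueError(REFUSAL)
    return qube["props"][prop]
```

The design choice is that the refusal carries no information derived from the lookup: same type, same message, same branch.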

A concrete qrexec policy

Here is the load-bearing structure of the policy file installed in dom0. Each qmcp.* wrapper enforces invariants inside its dom0-side script before touching qubesd: forced tagging on creation, cross-reference validation on template, netvm, and default_dispvm, and opaque error responses for any read that would otherwise leak existence.

# /etc/qubes/policy.d/30-mcp-control.policy

# State-changing calls route to dom0 (@adminvm); the wrapper checks the
# ai-managed tag in dom0 with qubesadmin authority.
qmcp.SpawnAIManagedQube      *  mcp-control  @adminvm         allow
qmcp.LifecycleAIManaged      *  mcp-control  @adminvm         allow
qmcp.AttachDeviceAIManaged   *  mcp-control  @adminvm         allow
qmcp.GetPropertyAIManaged    *  mcp-control  @adminvm         allow

# Calls that execute inside the target ai-managed qube directly use the
# @tag: selector to gate the destination at the policy layer.
qmcp.RunInAIManaged          *  mcp-control      @tag:ai-managed  allow  user=root
qmcp.CopyToAIManaged         *  mcp-control      @tag:ai-managed  allow  user=root
qubes.Filecopy               *  @tag:ai-managed  @tag:ai-managed  allow

# Catch-all backstop — every unlisted call from mcp-control is denied.
*  *  mcp-control  @anyvm     deny

Two patterns operate here. Calls that dom0 must carry out itself (spawn, lifecycle, device attach, property read/write) route to @adminvm, which is dom0. The dom0-side wrapper checks the ai-managed tag using qubesadmin authority and applies the invariants. This consolidation in dom0 is by necessity: qrexec’s @tag: selector does not match klass=DispVM targets on Qubes R4.3, so the wrapper provides uniform behaviour across all qube classes.
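
As a rough sketch of what "applies the invariants" might mean for a spawn request, the function below forces the tag and validates cross-references. It rests on an assumption the prose does not spell out: that cross-reference validation means every qube named in template, netvm, or default_dispvm must itself carry the ai-managed tag. `inventory`, `plan_spawn`, and `Refused` are hypothetical names, and the real wrapper talks to qubesd rather than to a dictionary.

```python
TAG = "ai-managed"
CROSS_REF_PROPS = ("template", "netvm", "default_dispvm")


class Refused(Exception):
    """Opaque refusal; the message never says which check failed."""


def plan_spawn(inventory, name, props):
    """Validate a spawn request and return the record dom0 would create."""
    for key in CROSS_REF_PROPS:
        target = props.get(key)
        if not target:  # unset or "" means the property references nothing
            continue
        referenced = inventory.get(target)
        # Assumption: a reference to a missing or untagged qube is refused,
        # with the same message in both cases.
        if referenced is None or TAG not in referenced["tags"]:
            raise Refused("request refused")
    # Forced tagging: the caller has no say in the tag set of the new qube.
    return {"name": name, "props": dict(props), "tags": {TAG}}
```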

Calls that execute inside the target qube (command execution, file copy, inter-qube qubes.Filecopy) match @tag:ai-managed at the policy layer directly; the target qube itself runs the service. Untagged targets do not match. Anything not explicitly listed falls through to the catch-all * * mcp-control @anyvm deny at the bottom of the file. That line is load-bearing: the entire model relies on no call path being able to bypass default-deny.
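
The first-match behaviour described above can be modelled with a toy evaluator. This is a deliberately simplified sketch, not the Qubes policy parser: it ignores the service-argument column, treats @anyvm as matching every destination (the real parser may be narrower), and takes tags from a hypothetical `tags` dictionary.

```python
# (service, source, destination, action), in file order; a subset of the policy above.
RULES = [
    ("qmcp.SpawnAIManagedQube", "mcp-control", "@adminvm", "allow"),
    ("qmcp.RunInAIManaged", "mcp-control", "@tag:ai-managed", "allow"),
    ("qubes.Filecopy", "@tag:ai-managed", "@tag:ai-managed", "allow"),
    ("*", "mcp-control", "@anyvm", "deny"),
]


def matches(selector, qube, tags):
    """Toy selector matching: @anyvm, @adminvm, @tag:<name>, or a literal qube name."""
    if selector == "@anyvm":
        return True  # simplification; see the lead-in
    if selector == "@adminvm":
        return qube == "dom0"
    if selector.startswith("@tag:"):
        return selector[len("@tag:"):] in tags.get(qube, set())
    return selector == qube


def evaluate(service, source, dest, tags, rules=RULES):
    """Return the action of the first matching rule; deny if nothing matches."""
    for rule_service, rule_source, rule_dest, action in rules:
        if rule_service not in ("*", service):
            continue
        if matches(rule_source, source, tags) and matches(rule_dest, dest, tags):
            return action
    return "deny"
```

With `tags = {"work-1": {"ai-managed"}, "vault": set()}`, a qmcp.RunInAIManaged call from mcp-control to work-1 is allowed, while the same call aimed at vault skips the allow rule and lands on the catch-all.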

The agent has plenty of room to do legitimate work inside this surface. It just can’t escape it.

What this actually forecloses

A line-jumping payload smuggled into a tool description still gets into the model’s context. That part doesn’t change; the model has to read tool descriptions to know what tools it has.

What changes is what the model can do once compromised.

A line-jumping payload that tells the agent to exfiltrate the user’s SSH keys requires the agent to read the user’s SSH keys. There are no SSH keys reachable in a tag-scoped Qubes setup. The agent’s filesystem is the mcp-control qube’s filesystem; the operator’s keys are in a different qube the agent cannot see or name. The payload’s instructions are physically unsatisfiable.

A payload that tells the agent to “clone the production-build VM and pivot from it” requires the production-build VM to be tagged ai-managed. It is not. qmcp.ListAIManagedQubes doesn’t return it. qmcp.GetPropertyAIManaged denies access without confirming the qube exists. The agent has no name to act on.

A payload that says “open a reverse shell to attacker.example.com” requires network egress. Stage C of the project designates a single ai-managed qube (ai-net-router) as the only egress all other ai-managed qubes route through. The operator picks ai-net-router’s upstream from dom0 — sys-firewall for clearnet, sys-whonix to force everything through Tor, a VPN qube, or "" for fully offline — and the AI cannot change that choice. If the operator picked offline or a Tor-only upstream, the reverse-shell connection has nowhere to land.

What does not get foreclosed: anything inside the agent’s tag-scoped surface. If the agent has been given two ai-managed qubes to play with and one of them holds something sensitive, the agent can absolutely be talked into damaging it. The model is no more trustworthy than before. What changed is the radius of what it can damage.

That’s the actual sell. Structural isolation does not make the agent safe to ignore. It bounds the worst case to a region you’ve explicitly handed it.

Implementation status

qubes-mcp (MIT) implements the dom0-mediated trust boundary in six stages (A through F), all tested on Qubes R4.3-era systems; stages G and H have been designed but not yet tested:

Stage	Capability	Status
A	Tag-scoped lifecycle, spawn, wrapped property access	Tested
B	Command execution, inter-qube file transfer	Tested
C	Single-egress network sandbox via `ai-net-router`	Tested
D	Cloning, DispVM klass support, dom0 lifecycle wrapper	Tested
E1	Device attach/detach between ai-managed qubes	Tested
E2	Ephemeral DispVMs + `qubes_run_disposable` one-shot	Tested
F1	Wrapped `feature.Set` with opaque cross-ref; `internal` denied	Tested
F2	Bounded-window admin event stream filtered by ai-managed tag	Tested
G, H	mcp-control hardening, mobile reach via Tor	Designed

The dom0-side surface is eleven RPC services — qmcp.LifecycleAIManaged, qmcp.SpawnAIManagedQube, qmcp.CloneAIManagedQube, qmcp.AttachDeviceAIManaged, qmcp.DetachDeviceAIManaged, qmcp.GetPropertyAIManaged, qmcp.SetPropertyAIManaged, qmcp.SetFeatureAIManaged, qmcp.ListAIManagedQubes, qmcp.SpawnDisposableAIManaged, qmcp.AIManagedEvents — each with its own invariant checks. The MCP server inside mcp-control exposes these to the agent as FastMCP tools.
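
On the mcp-control side, each tool implementation ultimately has to place a qrexec call. The sketch below shows one plausible shape using qrexec-client-vm, the stock client for calling a service from inside a qube. The helper names are hypothetical, and passing the request on stdin is an assumption; the wire format qubes-mcp actually uses is not described here.

```python
import subprocess

# The eleven dom0-side services named above.
DOM0_SERVICES = frozenset({
    "qmcp.LifecycleAIManaged",
    "qmcp.SpawnAIManagedQube",
    "qmcp.CloneAIManagedQube",
    "qmcp.AttachDeviceAIManaged",
    "qmcp.DetachDeviceAIManaged",
    "qmcp.GetPropertyAIManaged",
    "qmcp.SetPropertyAIManaged",
    "qmcp.SetFeatureAIManaged",
    "qmcp.ListAIManagedQubes",
    "qmcp.SpawnDisposableAIManaged",
    "qmcp.AIManagedEvents",
})


def build_call(service):
    """Return the argv for a qrexec call to dom0, refusing unknown service names."""
    if service not in DOM0_SERVICES:
        raise ValueError(f"not a wrapped service: {service}")
    return ["qrexec-client-vm", "dom0", service]


def call_dom0(service, payload=b"", timeout=30):
    """Run the call and return (exit code, stdout). Only works inside a qube."""
    result = subprocess.run(
        build_call(service),
        input=payload,  # assumption: request parameters travel on stdin
        capture_output=True,
        timeout=timeout,
        check=False,
    )
    return result.returncode, result.stdout
```

The client-side allow-list is a convenience, not a control. A compromised mcp-control can skip it entirely, which is why the decision that matters is the one the policy in dom0 makes.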

What an auditor of an MCP product should be checking

A working trust-boundary review of any MCP-using product is roughly this checklist. None of it is Qubes-specific.

Where is the trust boundary drawn? If the answer is “between the user and the server,” that’s a protocol-layer answer and line-jumping crosses it. The right answer names a substrate (process namespace, container, VM, hypervisor) that the model cannot influence.
What can the model name? Anything the model can name, it can be talked into acting on. List every resource — files, network endpoints, services, credentials — that the model’s tool surface can reference. The set should be the set the operator explicitly chose, not the set the host happens to have.
What does “denied” look like vs. “not found”? If they differ, the model has an existence oracle. Probe every read-shaped tool: same response shape on absent vs. forbidden? If not, that’s a finding.
What is the egress surface? Default-deny network with an explicit allow-list, or default-allow with an attempt to block known-bad? The first is auditable; the second is not.
Where does authentication originate? If the model holds long-lived credentials, line-jumping holds them too. Credentials should be issued per-task by a component the model cannot influence and scoped to the resource the task needs.
What happens to disposable state? Ephemeral compute (DispVM, fresh container per task) should be the default. State persistence requires an affirmative choice by the operator, not by the model.
What is loggable, and by whom? Logging in a trust domain the model controls is not logging — it’s the model writing fiction. Logs must live in a substrate the agent cannot reach.
What happens when the model is wrong? Failure mode: blast radius bounded to one disposable surface, or unbounded. If unbounded, the design is wrong regardless of how the protocol layer is configured.
Is every resource scoped to the caller’s tenant? A single-operator threat model assumes one tenant; a multi-tenant product carries a second boundary, and it’s the one that leaks. Every lookup should be filtered by the authenticated tenant at the data layer — not by an interface that merely hides the other tenants’ rows.
What leaves through telemetry? The exfiltration path nobody audits is the observability pipeline. Default-on error-reporting and analytics integrations have shipped full prompts and complete model outputs to third-party endpoints. Telemetry should be off by default or scrubbed end-to-end, and you should be able to name exactly which fields leave and who receives them.
Is authority graduated, or all-or-nothing? A single trust boundary means one injection inside it holds every right the product grants. Authority should graduate per action and per resource — read below execute below write below lifecycle — and the model must not be able to widen its own tier. The strongest version pins identity and limits to server-side values re-checked on every call, so no prompt can argue, inject, or template its way past them: authority the conversation cannot move.
Is the data the agent parses safe against hostile input? The trust-boundary questions assume the model is the threat; the parser is a surface the model doesn’t even have to touch. An agent ingests model files and query results — both attacker-influenced. Every parser should validate file-declared sizes and counts before allocating, use parameterised queries for anything reaching a database, and treat model files and tool results as hostile input.

If a product can answer all twelve in writing, it has a real threat model and an audit is mostly about probing the edges. If it can answer fewer than six, the audit is about getting it to answer the rest before anything else.
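
The “denied” versus “not found” question lends itself to a mechanical probe. The sketch below assumes the auditor can call a read-shaped tool as a plain Python function and knows one name that is absent and one that exists but is forbidden; `observe` and `existence_oracle_findings` are hypothetical helpers, not part of any MCP SDK.

```python
def observe(tool, name):
    """Call the tool and reduce the outcome to a comparable (kind, text) pair."""
    try:
        kind, text = "ok", repr(tool(name))
    except Exception as exc:  # the probe compares whatever the tool raises
        kind, text = type(exc).__name__, str(exc)
    # Mask the probed name so that echoing the input is not flagged by itself.
    return kind, text.replace(name, "<name>")


def existence_oracle_findings(tool, absent_name, forbidden_name):
    """Return a list with one finding if the two responses differ, else an empty list."""
    absent = observe(tool, absent_name)
    forbidden = observe(tool, forbidden_name)
    if absent == forbidden:
        return []
    return [{"absent": absent, "forbidden": forbidden}]
```

An empty list only shows that the static shape matches. It says nothing about timing, which needs its own measurement.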


Open design questions

Three things I’d value pushback on from anyone who’s done this work before:

Existence-oracle resistance under timing. Uniform “not found” responses cover the static-shape case. They don’t fully cover differential latency — a denied call to a tagged-but-protected qube and a denied call to a nonexistent qube may take measurably different paths through qubesd. How much should that worry an isolation argument?
Single-egress sandbox idiom. Routing all ai-managed qubes through one egress qube whose upstream is operator-locked in dom0 is the obvious design. Is it the Qubes-idiomatic answer, or is there an established cascade pattern (netvm chains, layered firewall qubes) I should be using instead?
Wrapping the long tail. Eleven RPC services cover the verbs an agent needs for everything I’ve thrown at it. The Qubes Admin API has more. Should wrapping be exhaustive (mirror every Admin API call with a tag-scoped version) or minimal (expose only what’s needed and document the gap)?
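
For the timing question, one candidate mitigation is a latency floor: every response, success or refusal, is held until a fixed minimum time has elapsed. The sketch below is an idea to argue about, not something qubes-mcp is stated to do; `with_latency_floor` is a hypothetical name, and a floor only hides differences that are smaller than the floor itself.

```python
import time


def with_latency_floor(handler, floor_seconds, clock=time.monotonic, sleep=time.sleep):
    """Wrap a handler so that it never returns or raises before floor_seconds."""

    def wrapped(*args, **kwargs):
        start = clock()
        try:
            return handler(*args, **kwargs)
        finally:
            # Runs on both the return path and the exception path.
            remaining = floor_seconds - (clock() - start)
            if remaining > 0:
                sleep(remaining)

    return wrapped
```

The `clock` and `sleep` parameters exist so the behaviour can be checked with a fake clock instead of real delays.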

Closing

Wrappers like mcp-context-protector are not in competition with structural isolation. They are at different layers, and a real MCP-using product should have both — wrapper at the protocol layer to defuse the obvious injection content, isolation below it to bound what an injection can do when the wrapper misses something.

This is one of two design studies for the practice. The companion piece, Authority an agent cannot rewrite, argues for the layer above the agent: a trust boundary the agent cannot rewrite, sitting above the agent just as the boundary in this post sits below it. Both layers are needed. A real audit checks both.