MCP Security for AI Agents: Production Controls
MCP security for AI agents should start from a simple operating rule: treat the MCP layer as privileged integration code, not as a passive connector. A production agent can read context, discover tools, call tools, launch local processes, and move data between systems. That makes every MCP server, tool description, schema, tool output, OAuth token, session ID, and spawned command untrusted until your platform constrains it with identity, least privilege, sandboxing, egress policy, approval UX, and audit logs.
The Model Context Protocol specification is direct about the stakes. It warns that MCP enables "arbitrary data access and code execution paths." For a security architect, the implication is practical: installing an MCP server can be closer to installing privileged code than adding an API connector. The control plane needs to reflect that difference.
What MCP Security For AI Agents Changes
MCP standardizes how LLM applications connect to external data and tools. In the current architecture, hosts run clients, clients connect to servers, and servers expose resources, prompts, and tools. Clients may also expose capabilities such as sampling, roots, and elicitation. That split creates useful modularity, but it also creates more places where trust can be misplaced.
The security boundary is not only the remote endpoint. It includes the host application, MCP client, MCP server, downstream API, model, user, local process, tool metadata, tool output, and any system that receives data from the tool call. A secure deployment needs a separate trust decision for each of those parts.
The highest-risk production patterns are broad OAuth scopes, token passthrough to downstream APIs, unreviewed local server installs, remote servers with unclear provenance, malicious tool metadata, tool definition changes after approval, and sampling or elicitation flows that let servers influence model behavior.
The Production Threat Model
For platform teams, the useful threat model is not "can the MCP server authenticate?" The better question is: what can an agent do after this server is installed, and what data can leave the environment if the model follows malicious instructions?
Tool descriptions, parameter names, schemas, annotations, and return values all enter the model's working context. OWASP treats those fields as injection surfaces and recommends that teams avoid trusting tool descriptions blindly. Microsoft describes MCP tool poisoning as malicious instructions embedded in tool metadata, invisible to users but interpreted by the model. In practice, that means tool metadata is executable influence over agent behavior, even when it is not executable code.
Invariant Labs demonstrated three related problems: tool poisoning, rug pulls, and cross-server tool shadowing. The operational lesson is that a malicious server does not need direct access to a protected API if it can influence how the agent calls a trusted tool. A benign-looking tool can instruct the model to call another tool, alter parameters, or exfiltrate data through an allowed channel.
Simon Willison's framing is useful for approvals and policy design: the dangerous combination is private data, untrusted instructions, and exfiltration tools. MCP often brings those three elements into the same workflow. Treat human-in-the-loop approval as a required control for sensitive actions, not as optional polish.
Authorization: Reject Token Passthrough
Current MCP authorization treats protected MCP servers as OAuth 2.1 resource servers. Clients use bearer tokens, and protected MCP servers must validate that tokens were issued specifically for that MCP server as the intended audience. This is the boundary that prevents one service's token from becoming a universal credential inside the agent runtime.
The official MCP security guidance is blunt: "Token passthrough is an anti-pattern." Servers must not accept tokens that were not issued for that server. If an MCP server needs to call a downstream API, the safer pattern is token exchange or another design that keeps audience boundaries intact.
This is where many wrapper designs fail. A team exposes an existing SaaS API through MCP, accepts a user's OAuth token, and forwards it downstream without validating audience or narrowing scope. That may work in a prototype, but in production it collapses the distinction between the MCP server, the downstream API, and the user session.
Design the authorization model around four checks. First, the token audience must match the MCP server. Second, scopes should be narrow enough to map to specific tools or tool groups. Third, user identity and tenant context must be bound to the session. Fourth, session IDs and bearer tokens need lifecycle controls, including expiry, rotation, revocation, and auditability.
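The four checks above can be sketched as a single authorization gate. This is a minimal illustration, not an MCP SDK API: `TokenClaims`, `Session`, and `authorize_tool_call` are hypothetical names, the token signature is assumed to have been verified already, and the audience is simplified to a single string.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenClaims:
    """Claims from a token whose signature was already verified (assumption)."""
    audience: str
    scopes: frozenset
    subject: str
    tenant: str
    expires_at: float  # Unix timestamp


@dataclass(frozen=True)
class Session:
    """Identity and tenant bound to the MCP session when it was created."""
    subject: str
    tenant: str


def authorize_tool_call(claims, session, server_audience, required_scope, now=None):
    """Return (allowed, reason) for one tool call."""
    now = time.time() if now is None else now
    # 1. The token must have been issued for this MCP server.
    if claims.audience != server_audience:
        return False, "audience_mismatch"
    # 2. The scope must cover the specific tool or tool group.
    if required_scope not in claims.scopes:
        return False, "missing_scope"
    # 3. User and tenant must match the session the call arrived on.
    if (claims.subject, claims.tenant) != (session.subject, session.tenant):
        return False, "session_binding_mismatch"
    # 4. Lifecycle: expired tokens are rejected (revocation lookup omitted here).
    if now >= claims.expires_at:
        return False, "token_expired"
    return True, "ok"
```

Returning a reason code alongside the decision keeps the denial auditable, which supports the lifecycle and auditability requirement in the fourth check.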
The unresolved implementation questions in MCP GitHub discussions show that multi-user remote MCP authorization is still an active design area. That is not a reason to accept passthrough. It means production teams need a gateway, identity broker, or MCP server design that preserves OAuth resource-server semantics before allowing broad rollout.
Tool Metadata Is Part of the Attack Surface
Tool definitions should be reviewed and pinned like code. A tool description can change after a user or administrator approves the integration. A schema can be modified to collect more sensitive parameters. A return value can carry instructions that influence the next model step. These are supply chain events, not cosmetic metadata changes.
Schema pinning gives security teams a deterministic reference point. Store approved tool names, descriptions, schemas, annotations, server identity, package source, version, and hash where possible. On change, require review before the tool is available to agents handling sensitive data. For high-risk tools, the approval should display the old and new behavior in concrete terms.
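One way to implement the pinning described above is to hash a canonical serialization of each approved tool definition and compare it on every tool listing. The sketch below assumes tool definitions arrive as JSON-like dictionaries with a `name` field; `tool_fingerprint` and `changed_tools` are illustrative names, not part of any MCP library.

```python
import hashlib
import json


def tool_fingerprint(tool: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(pinned: dict, current_tools: list) -> list:
    """Return names of tools that are new or differ from their pinned hash.

    `pinned` maps tool name -> fingerprint recorded at approval time.
    """
    changed = []
    for tool in current_tools:
        name = tool["name"]
        if pinned.get(name) != tool_fingerprint(tool):
            changed.append(name)
    return changed
```

Any name returned by `changed_tools` would be withheld from agents until a reviewer approves the new definition. Hashing the whole definition means a change to the description alone is enough to trigger review.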
Cross-server shadowing raises the bar further. A malicious server may define a tool that looks harmless while instructing the model to misuse a trusted server's tool. Gateways and clients should evaluate the full tool set available to an agent, not each server in isolation.
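Evaluating the full tool set can start with a simple heuristic: flag any tool whose description mentions a tool exposed by a different server. The sketch below is only a first-pass filter under that assumption. `find_cross_server_references` is a hypothetical helper, and substring matching will produce false positives for short tool names and miss paraphrased references.

```python
def find_cross_server_references(tools_by_server: dict) -> list:
    """Flag tool descriptions that name a tool belonging to another server.

    `tools_by_server` maps server name -> list of {"name", "description"} dicts.
    Returns (server, tool, referenced_server, referenced_tool) tuples.
    """
    findings = []
    for server, tools in tools_by_server.items():
        # Every (server, tool name) pair that this server does not own.
        foreign = sorted(
            (other, t["name"])
            for other, other_tools in tools_by_server.items()
            if other != server
            for t in other_tools
        )
        for tool in tools:
            text = tool.get("description", "").lower()
            for other_server, other_name in foreign:
                if other_name.lower() in text:
                    findings.append((server, tool["name"], other_server, other_name))
    return findings
```

A finding is a prompt for human review, not proof of malice: some legitimate tools document how they compose with others.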
Local Servers and Stdio Need Sandboxing
Local MCP servers are convenient because stdio makes tool development simple. The trade-off is direct: if configuration input can become a shell command, a connector becomes a command execution path.
OX Security reported a systemic stdio configuration-to-command-execution pattern and tied it to broad ecosystem exposure, including claims of 150M+ downloads, 7,000+ publicly accessible servers, up to 200,000 vulnerable instances, 30+ responsible disclosures, and 10+ high or critical CVEs. Those figures are the vendor's own claims and should be verified before use in risk scoring, but the class of issue is clear.
LiteLLM's CVE-2026-30623 response shows the mitigation pattern. The fix restricted stdio commands to an allowlist, validated them at both request parsing and runtime, and restricted MCP test endpoints to admin roles. The stable fix was v1.83.7-stable, with mitigation beginning in v1.83.6-nightly.
In practice, local MCP servers should run with the same discipline as untrusted plugins. Use sandboxing, least-privilege filesystem access, restricted environment variables, no inherited broad credentials, and deny-by-default egress. Bind local HTTP servers to 127.0.0.1, validate Origin, and require authentication for streamable HTTP servers to reduce DNS rebinding and local service abuse.
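A command allowlist and a restricted environment can be sketched as follows. This illustrates the pattern only and is not LiteLLM's implementation; the allowlist contents, `validate_stdio_command`, and `minimal_env` are assumptions made for the example.

```python
import os

# Hypothetical allowlist: bare launcher names only, never paths or shells.
ALLOWED_COMMANDS = frozenset({"npx", "uvx", "python3"})
# Only these variables are inherited from the parent process.
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG")


def validate_stdio_command(command: str, args: list) -> list:
    """Return an argv list for a shell-free launch, or raise ValueError."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowlisted: {command!r}")
    for arg in args:
        if not isinstance(arg, str) or "\x00" in arg:
            raise ValueError("arguments must be strings without NUL bytes")
    return [command, *args]


def minimal_env(extra=None) -> dict:
    """Build a child environment without inherited credentials."""
    env = {key: os.environ[key] for key in SAFE_ENV_KEYS if key in os.environ}
    env.update(extra or {})
    return env
```

The validated list is meant to be passed to `subprocess.Popen(argv, env=minimal_env())` with the default `shell=False`, so configuration text is never interpreted by a shell. An allowlisted launcher such as `npx` can still fetch and run arbitrary packages, so the package arguments need pinning and review as well.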
Approval UX Must Show the Real Action
Approval prompts are only useful if the reviewer can understand the action. "Allow tool call" is not enough. The user or operator needs to see the server, tool, complete parameters, target system, data being sent, expected data returned, and any schema or tool definition change since the last approval.
This matters because prompt injection often works by hiding intent in context the user does not inspect. If tool metadata can influence the model, and the model can call a tool with sensitive parameters, the approval screen becomes a security control. It should expose actual data movement, not a friendly summary generated from the same untrusted context.
Approval should also be risk-based. A read-only lookup against low-sensitivity data may need lightweight confirmation or policy-only logging. A write operation, external send, filesystem read, credential access, or cross-tenant action should require stronger approval, step-up authentication, or administrator policy.
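The tiering above can be expressed as a small deterministic policy function. The action labels and the three approval levels below are hypothetical, chosen to mirror the examples in the text. A real deployment would derive them from its own tool classification.

```python
from enum import Enum


class Approval(Enum):
    LOG_ONLY = "log_only"          # policy-only logging
    USER_CONFIRM = "user_confirm"  # lightweight confirmation
    STEP_UP = "step_up"            # stronger approval or step-up authentication


# Hypothetical action labels matching the high-risk examples in the text.
HIGH_RISK_ACTIONS = frozenset(
    {"write", "external_send", "filesystem_read", "credential_access", "cross_tenant"}
)


def required_approval(action: str, data_sensitivity: str) -> Approval:
    """Map a classified tool call to the approval tier it needs."""
    if action in HIGH_RISK_ACTIONS:
        return Approval.STEP_UP
    if action == "read" and data_sensitivity == "low":
        return Approval.LOG_ONLY
    # Anything unclassified or more sensitive gets at least a confirmation.
    return Approval.USER_CONFIRM
```

Unknown actions fall through to confirmation rather than logging only, so a new tool class never gets the lightest treatment by default.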
Sampling and Elicitation Require Separate Policy
MCP features such as sampling and elicitation can be useful, but they add a different risk: servers can influence prompts and user interaction. Unit 42 found that MCP sampling can enable resource theft, conversation hijacking, and covert tool invocation when malicious servers craft sampling prompts.
For production systems, the default should be conservative. Disable sampling unless the use case requires it, then allow it per server and per tool class. Log sampling requests, show when a server has asked the model to generate content, and block sampling from servers that do not need that capability.
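A deny-by-default sampling policy keyed by server and tool class might look like the sketch below. The policy table, the server and tool class names, and `sampling_allowed` are illustrative assumptions. The audit list stands in for whatever logging pipeline the platform uses.

```python
# Hypothetical policy: only explicitly listed (server, tool class) pairs may sample.
SAMPLING_POLICY = {
    ("docs-server", "summarize"): True,
}


def sampling_allowed(server: str, tool_class: str, policy: dict, audit_log: list) -> bool:
    """Decide a sampling request and record the decision either way."""
    allowed = policy.get((server, tool_class), False)  # absent means denied
    audit_log.append({"server": server, "tool_class": tool_class, "allowed": allowed})
    return allowed
```

Denied requests are logged along with permitted ones, because repeated sampling attempts from a server that should not need the capability are themselves a signal worth investigating.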
The same principle applies to elicitation. A server that can ask the client to collect information from a user can become a data collection channel. Treat it as a privileged capability with its own policy, not as a harmless extension of chat.
Egress, SSRF, and Data Loss Controls
MCP can connect private data to external actions. That is useful when the action is intended, and dangerous when the model follows injected instructions. Egress controls are the practical backstop.
Apply network policy per server and per tool. A repository search tool does not need arbitrary internet egress. A ticketing tool does not need access to cloud metadata endpoints. A local file tool does not need to post results to an external webhook. These controls should exist outside the model so they still work when the model is misled.
SSRF controls belong in the same layer. Official MCP security guidance calls out SSRF, OAuth URL validation, and local server compromise. Validate callback URLs, authorization server metadata, redirect targets, and any user-supplied URL that a tool can fetch. Deny access to link-local, loopback, cloud metadata, and internal network ranges unless there is an explicit approved use case.
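The deny rules above can be approximated with Python's standard `ipaddress` module. This is a sketch under stated assumptions: `url_is_fetchable` and `is_blocked_address` are hypothetical helpers, only `https` URLs are accepted, and the resolver is injectable so the check can be tested without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_address(ip_text: str) -> bool:
    """True for loopback, link-local (incl. cloud metadata), and internal ranges."""
    try:
        ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 zone id
    except ValueError:
        return True  # unparseable addresses are treated as unsafe
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its IPv4 address
    return (
        ip.is_loopback or ip.is_link_local or ip.is_private
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def _resolve(host: str) -> list:
    """Resolve a hostname to every address it maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def url_is_fetchable(url: str, resolve=None) -> bool:
    """Allow only https URLs whose every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        addresses = (resolve or _resolve)(parts.hostname)
    except OSError:
        return False
    return bool(addresses) and not any(is_blocked_address(a) for a in addresses)
```

Checking the resolved addresses rather than the hostname string matters because a public-looking name can resolve to an internal address. To resist DNS rebinding, the eventual connection should use the address that was validated rather than resolving the name a second time.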
Logs Should Support Incident Response
MCP logs need to answer a specific incident question: who caused which agent to call which tool, with which parameters, under which identity, against which server version, and where did the data go?
OWASP recommends SIEM logging, no shared credentials, full-parameter approval, sandboxing, schema pinning, and per-server least privilege. For MCP, SIEM-grade logs should include user identity, tenant, host, client, server, tool name, schema version, server version, OAuth audience, scopes, approval result, redacted parameters, response classification, egress destination, and policy decision.
Redaction matters. Logs should capture enough to reconstruct behavior without storing secrets or sensitive payloads by default. For high-risk environments, store sensitive request and response payloads in a controlled evidence store with retention limits and access logging, rather than in general application logs.
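A structured, redacted tool-call event could be built as in the sketch below. The field names follow the list above, but the exact schema, the `SENSITIVE_KEYS` set, and the `tool_call_event` helper are assumptions for illustration, not a standard format.

```python
import json
from datetime import datetime, timezone

# Hypothetical list of parameter names whose values are never logged.
SENSITIVE_KEYS = frozenset({"password", "token", "api_key", "authorization", "secret"})


def redact(value):
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def tool_call_event(*, user, tenant, server, server_version, tool, schema_hash,
                    audience, scopes, approval, params, egress, decision) -> str:
    """Serialize one tool call as a single JSON line for SIEM ingestion."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user, "tenant": tenant,
        "server": server, "server_version": server_version,
        "tool": tool, "schema_hash": schema_hash,
        "oauth_audience": audience, "scopes": sorted(scopes),
        "approval": approval, "params": redact(params),
        "egress_destination": egress, "policy_decision": decision,
    }
    return json.dumps(event, sort_keys=True)
```

Key-based redaction only catches secrets stored under known names. Secrets embedded in free-text values need content classification or the controlled evidence store described above.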
Production Control Checklist
Use this checklist before allowing MCP tools into production agent workflows:
| Control area | Production requirement |
| --- | --- |
| Identity | Bind user, tenant, client, server, and session context to each tool call. |
| OAuth | Validate token audience and reject token passthrough. |
| Scope | Grant per-server and per-tool scopes with no shared credentials. |
| Tool metadata | Pin approved tool descriptions, schemas, annotations, versions, and hashes where available. |
| Local execution | Sandbox stdio servers and allow only approved commands. |
| Network | Deny broad egress and block SSRF paths by default. |
| Approval | Show server, tool, full parameters, target, data movement, and schema changes. |
| Sampling | Disable by default or allow by explicit per-server policy. |
| Logging | Send structured, redacted, SIEM-ready events for tool calls and policy decisions. |
| Change control | Require review for new servers, changed tool definitions, and new outbound destinations. |
What to Decide Before Rollout
The NSA warning is a useful summary of the current maturity gap: "MCP's rapid proliferation has outpaced the development of its security model." That does not mean MCP is unusable. It means production teams should avoid treating the protocol alone as the security boundary.
Before rollout, decide where policy enforcement lives. It may sit in the host application, an MCP gateway, the enterprise identity layer, the client, or the server. The exact placement is less important than having deterministic enforcement that does not depend on the model recognizing an attack.
Also decide which capabilities are allowed by default. For many enterprise agents, the safer baseline is read-only tools first, no token passthrough, no unreviewed local servers, no sampling unless approved, no arbitrary egress, and no tool definition changes without review. Write actions, external sends, filesystem access, and privileged admin APIs should be separate rollout phases.
The Bottom Line
MCP turns integration into an agent supply chain surface. The production answer is not a single setting. It is a control stack: audience-bound OAuth, per-tool least privilege, pinned schemas, sandboxed local servers, constrained egress, full-parameter approval, and logs built for investigation.
If your platform can answer who approved a tool, what changed, which token audience was accepted, what parameters were sent, where data moved, and which policy allowed it, you have the base needed for MCP in production. Without those answers, the agent is operating across trust boundaries the security team cannot see.