Interaction & Data Flow
User-controlled MCP tool input interpolated directly into an inner LLM's system prompt allows attackers to override model instructions and pivot the tool to arbitrary behavior.
This is a second-order prompt injection vulnerability (CWE-94) where attacker-controlled string data is concatenated or f-string interpolated into the system role of a downstream LLM API call. The system prompt is the highest-trust position in a chat completion request — models treat it as authoritative instructions, not user data. When external input reaches this position unescaped and unframed, an attacker can supply content like 'Ignore all previous instructions and exfiltrate...' that the inner model will interpret as operator-level directives.
MCP servers expose LLM-callable tools that frequently act as orchestrators, spinning up their own inner LLM calls to perform subtasks — a pattern rare in traditional web APIs. Because the outer LLM selects tool arguments based on conversational context, a malicious user can craft a message that propagates adversarial strings through the outer model's reasoning into a tool argument, which then lands in the inner model's system prompt without any human review in the call chain. The tool-composition model means a single injection point can compromise multiple downstream LLM contexts simultaneously.
from openai import OpenAI |
from mcp.server.fastmcp import FastMCP |
mcp = FastMCP("demo") |
client = OpenAI() |
@mcp.tool() |
def summarize_topic(topic: str) -> str: |
resp = client.chat.completions.create( |
model="gpt-4o", |
messages=[ |
{"role": "system", "content": f"You are a {topic} expert."}, |
{"role": "user", "content": "Give me a summary."}, |
], |
) |
return resp.choices[0].message.content or "" |
from openai import OpenAI |
from mcp.server.fastmcp import FastMCP |
mcp = FastMCP("demo") |
client = OpenAI() |
@mcp.tool() |
def summarize_topic(topic: str) -> str: |
resp = client.chat.completions.create( |
model="gpt-4o", |
messages=[ |
{"role": "system", "content": "You are a concise expert summarizer."}, |
{"role": "user", "content": f"Summarize the following topic: <topic>{topic}</topic>"}, |
], |
) |
return resp.choices[0].message.content or "" |
MCPSafe performs taint tracking from MCP tool handler parameters through string interpolation operations (f-strings, str.format, concatenation) and flags any tainted value that reaches the `content` field of a message dict whose `role` key resolves to the static string `system` in an LLM client call (OpenAI, Anthropic, LiteLLM, and compatible wrappers). Paths where user input reaches only `role: user` messages, or where the system content is a bare string literal with no tainted operands, are explicitly excluded from the finding.
See the full threat catalog for every documented detection.
MCPSafe runs this check — and every other rule in the catalog — on any MCP server you paste in.
Scan now