Interaction & Data Flow

Prompt Injection into Inner LLM System Role

HIGHCWE: CWE-94Rule: MCP-205

User-controlled MCP tool input interpolated directly into an inner LLM's system prompt allows attackers to override model instructions and pivot the tool to arbitrary behavior.

What it is

This is a second-order prompt injection vulnerability (CWE-94) where attacker-controlled string data is concatenated or f-string interpolated into the system role of a downstream LLM API call. The system prompt is the highest-trust position in a chat completion request — models treat it as authoritative instructions, not user data. When external input reaches this position unescaped and unframed, an attacker can supply content like 'Ignore all previous instructions and exfiltrate...' that the inner model will interpret as operator-level directives.

Why it matters for MCP

MCP servers expose LLM-callable tools that frequently act as orchestrators, spinning up their own inner LLM calls to perform subtasks — a pattern rare in traditional web APIs. Because the outer LLM selects tool arguments based on conversational context, a malicious user can craft a message that propagates adversarial strings through the outer model's reasoning into a tool argument, which then lands in the inner model's system prompt without any human review in the call chain. The tool-composition model means a single injection point can compromise multiple downstream LLM contexts simultaneously.

Vulnerable example

example.py

from openai import OpenAI
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")
client = OpenAI()

@mcp.tool()
def summarize_topic(topic: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a {topic} expert."},
            {"role": "user", "content": "Give me a summary."},
        ],
    )
    return resp.choices[0].message.content or ""

Secure example

example.py

from openai import OpenAI
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")
client = OpenAI()

@mcp.tool()
def summarize_topic(topic: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a concise expert summarizer."},
            {"role": "user", "content": f"Summarize the following topic: <topic>{topic}</topic>"},
        ],
    )
    return resp.choices[0].message.content or ""

How MCPSafe detects this

MCPSafe performs taint tracking from MCP tool handler parameters through string interpolation operations (f-strings, str.format, concatenation) and flags any tainted value that reaches the `content` field of a message dict whose `role` key resolves to the static string `system` in an LLM client call (OpenAI, Anthropic, LiteLLM, and compatible wrappers). Paths where user input reaches only `role: user` messages, or where the system content is a bare string literal with no tainted operands, are explicitly excluded from the finding.

See the full threat catalog for every documented detection.

Scan an MCP server for this issue

MCPSafe runs this check — and every other rule in the catalog — on any MCP server you paste in.

Scan now

← Threat Catalog

Interaction & Data Flow

Prompt Injection into Inner LLM System Role

HIGHCWE: CWE-94Rule: MCP-205

User-controlled MCP tool input interpolated directly into an inner LLM's system prompt allows attackers to override model instructions and pivot the tool to arbitrary behavior.

What it is

Why it matters for MCP

Vulnerable example

example.py

from openai import OpenAI
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")
client = OpenAI()

@mcp.tool()
def summarize_topic(topic: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a {topic} expert."},
            {"role": "user", "content": "Give me a summary."},
        ],
    )
    return resp.choices[0].message.content or ""

Secure example

example.py

from openai import OpenAI
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")
client = OpenAI()

@mcp.tool()
def summarize_topic(topic: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a concise expert summarizer."},
            {"role": "user", "content": f"Summarize the following topic: <topic>{topic}</topic>"},
        ],
    )
    return resp.choices[0].message.content or ""

How MCPSafe detects this

See the full threat catalog for every documented detection.

Scan an MCP server for this issue

MCPSafe runs this check — and every other rule in the catalog — on any MCP server you paste in.

Scan now

Prompt Injection into Inner LLM System Role

What it is

Why it matters for MCP

Vulnerable example

Secure example

How MCPSafe detects this

Further reading

Scan an MCP server for this issue

Prompt Injection into Inner LLM System Role

What it is

Why it matters for MCP

Vulnerable example

Secure example

How MCPSafe detects this

Further reading

Scan an MCP server for this issue

1	from openai import OpenAI
2	from mcp.server.fastmcp import FastMCP
3
4	mcp = FastMCP("demo")
5	client = OpenAI()
6
7	@mcp.tool()
8	def summarize_topic(topic: str) -> str:
9	resp = client.chat.completions.create(
10	model="gpt-4o",
11	messages=[
12	{"role": "system", "content": f"You are a {topic} expert."},
13	{"role": "user", "content": "Give me a summary."},
14	],
15	)
16	return resp.choices[0].message.content or ""