Interaction & Data Flow
An MCP server invokes `sampling/createMessage` to call the client's LLM without setting a token, time, or call-count budget, letting a single tool run rack up unbounded LLM cost for the user. This is the MCP-sampling variant of the unbounded-cost family; the compute/memory variant is MCP-110, the paid-LLM-API variant is MCP-084, and oversized tool descriptions are MCP-252.
MCP's sampling primitive lets a server delegate LLM inference back to the client. Unlike a direct LLM API call, the client pays the bill, which means a misbehaving or malicious server can drain the user's API budget by calling `sampling/createMessage` in a loop with no `maxTokens` limit, no cap on sampling calls per tool invocation, and no rate limit. It is a denial-of-wallet attack scoped to MCP's sampling channel.
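The missing rate limit can be enforced on the server side before any sampling request leaves the tool. The sketch below is a minimal sliding-window limiter under stated assumptions: the class name `SamplingRateLimiter` and its parameters are hypothetical (not part of any MCP SDK), and the clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable


class SamplingRateLimiter:
    """Hypothetical sliding-window limiter: allows at most `max_calls`
    sampling requests in any `window_s`-second window."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._stamps: deque[float] = deque()  # start times of recent calls

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window; else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            return False  # over the limit; the call is not recorded
        self._stamps.append(now)
        return True
```

In a tool, a shared instance would be checked before each sampling request, and the tool would stop and return partial results once `try_acquire()` returns False. A sliding window is used here only because it is easy to reason about; a token bucket would serve equally well.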
Sampling is unique to MCP; most security frameworks have nothing to say about a server asking the client's LLM to think. Because the cost of a sampling call lands on the user's API key, a server that carefully respects token caps on its own outbound LLM calls may have no caps at all on sampling, simply because the author never considered it. A loop, or a retry driven by bad data, can fire off dozens of large-context calls in seconds.
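The arithmetic behind that warning is multiplicative: tokens billed to the client grow as calls × (input tokens + output tokens). The helper below is a hypothetical illustration of that product, and the numbers in the usage line are invented for illustration, not measurements.

```python
def sampling_token_exposure(calls: int, input_tokens: int, output_tokens: int) -> int:
    """Total tokens billed to the client for `calls` sampling requests,
    each sending `input_tokens` of context and receiving `output_tokens`."""
    return calls * (input_tokens + output_tokens)


# Hypothetical runaway loop: 30 calls, each with a 50,000-token context
# and a 4,096-token reply.
print(sampling_token_exposure(30, 50_000, 4_096))  # 1622880
```

Capping both the number of calls and the tokens per call turns this product into a fixed bound: the fixed example's constants (5 calls × 512 tokens) limit output to 2,560 tokens per tool invocation, although input context still counts toward the bill.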
```python
# Vulnerable: one sampling call per sentence, with no limits of any kind.
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("research")


@mcp.tool()
async def deep_research(question: str, ctx: Context) -> str:
    notes = []
    for chunk in question.split("."):
        # No maxTokens, no overall cap, no early-stop predicate:
        # every fragment of `question` triggers a sampling call billed to the user.
        result = await ctx.sample(messages=[{"role": "user", "content": chunk}])
        notes.append(result.text)
    return "\n".join(notes)
```
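```python
# Fixed: at most MAX_SAMPLES calls, each capped at MAX_TOKENS_PER_SAMPLE tokens.
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("research")

MAX_SAMPLES = 5
MAX_TOKENS_PER_SAMPLE = 512


@mcp.tool()
async def deep_research(question: str, ctx: Context) -> str:
    # Skip empty fragments (e.g. after a trailing ".") so they don't use up
    # a sampling call, then cap how many calls this tool can make.
    chunks = [c.strip() for c in question.split(".") if c.strip()][:MAX_SAMPLES]
    notes = []
    for chunk in chunks:
        result = await ctx.sample(
            messages=[{"role": "user", "content": chunk}],
            max_tokens=MAX_TOKENS_PER_SAMPLE,  # per-call output cap
        )
        notes.append(result.text)
    return "\n".join(notes)
```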
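The fixed example caps tokens and call count but not time, which the rule summary also lists as a budget. One way to keep all three limits in one place is a small per-invocation guard like the hypothetical `SamplingBudget` below. It is a sketch, not an SDK feature: `sample_fn` stands in for whatever adapter actually issues the sampling request, and each call reserves its full `max_tokens` up front because actual usage is only known after the reply arrives.

```python
import asyncio
import time
from typing import Awaitable, Callable


class BudgetExceeded(RuntimeError):
    """Raised when a sampling request would exceed the tool's budget."""


class SamplingBudget:
    """Hypothetical per-tool-invocation budget for sampling calls.

    Checked before each request: a call count, a cumulative output-token
    reservation (each call reserves its full `max_tokens`), and a
    wall-clock deadline.
    """

    def __init__(self, max_calls: int, max_total_tokens: int, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.max_total_tokens = max_total_tokens
        self.clock = clock
        self.deadline = clock() + max_seconds
        self.calls = 0
        self.tokens_reserved = 0

    async def sample(self, sample_fn: Callable[[str, int], Awaitable[str]],
                     prompt: str, max_tokens: int) -> str:
        if self.calls >= self.max_calls:
            raise BudgetExceeded("sampling call limit reached")
        if self.tokens_reserved + max_tokens > self.max_total_tokens:
            raise BudgetExceeded("sampling token budget exhausted")
        remaining = self.deadline - self.clock()
        if remaining <= 0:
            raise BudgetExceeded("sampling time budget exhausted")
        # Commit the reservation only after every check has passed.
        self.calls += 1
        self.tokens_reserved += max_tokens
        # Bound the individual request by the time left in the budget.
        return await asyncio.wait_for(sample_fn(prompt, max_tokens), timeout=remaining)
```

A tool would create one `SamplingBudget` per invocation, route every sampling request through `budget.sample(...)`, and catch `BudgetExceeded` to return whatever notes it has collected so far.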
MCPSafe pattern-matches calls to `ctx.sample`, `client.sample`, or `sampling.createMessage` and flags any invocation that has neither a `max_tokens=` (or `maxTokens:`) argument nor an enclosing per-tool call counter. Calls inside tools annotated with the `cost_capped=True` decorator are exempt.
See the full threat catalog for every documented detection.
MCPSafe runs this check — and every other rule in the catalog — on any MCP server you paste in.