MCPSafe.io

Security checks for MCP servers — public packages and private repos, fast or deep.


© 2026 MCPSafe. All rights reserved.


Interaction & Data Flow

MCP sampling without budget

Severity: MEDIUM | CWE: CWE-770 | Rule: MCP-211

An MCP server invokes `sampling/createMessage` to call the client's LLM without setting a token, time, or call-count budget, which lets a single tool run rack up unbounded LLM cost on the user. This is the MCP-sampling variant of the unbounded-cost family; compute/memory exhaustion is covered by MCP-110, paid LLM API calls by MCP-084, and oversized tool descriptions by MCP-252.

What it is

MCP's sampling primitive lets a server delegate LLM inference back to the client. Unlike a direct LLM API call made by the server, the client pays the bill. A misbehaving (or malicious) server can therefore drain the user's API budget by calling `sampling/createMessage` in a loop with no `maxTokens` cap per call, no limit on the number of calls, and no rate limit. It is a denial-of-wallet variant scoped to MCP's sampling channel.
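On the wire, a sampling request is a JSON-RPC message sent from the server to the client. The sketch below builds one by hand, assuming the request shape described in the MCP specification's sampling section (`messages` plus `maxTokens`); `build_create_message_request` is a hypothetical helper, not an SDK function. Note that `maxTokens` bounds only one call: nothing inside a single request limits how many requests the server sends.

```python
import json


def build_create_message_request(request_id: int, prompt: str, max_tokens: int) -> dict:
    """Hypothetical helper: build a sampling/createMessage JSON-RPC request with a per-call token cap."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "sampling/createMessage",
        "params": {
            "messages": [
                {"role": "user", "content": {"type": "text", "text": prompt}}
            ],
            # Upper bound on tokens the client's LLM may generate for THIS call only.
            "maxTokens": max_tokens,
        },
    }


request = build_create_message_request(1, "Summarize this paragraph.", 256)
print(json.dumps(request, indent=2))
```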

Why it matters for MCP

Sampling is unique to MCP; most security frameworks have nothing to say about "server asks client's LLM to think." Because the cost of a sampling call lands on the user's API key, a server that carefully respects token caps on its own outbound LLM calls may have no caps at all on sampling, simply because the author never considered it. A loop, or a retry driven by bad data, can spin up dozens of large-context calls in seconds.
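One way to keep a bad-data retry from turning into an open-ended loop is to give it a fixed attempt budget. In this minimal sketch, `sample` stands in for whatever coroutine issues the sampling request and `is_acceptable` is a validation predicate; both names, and the default of three attempts, are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def sample_with_retry_budget(
    sample: Callable[[str], Awaitable[str]],
    prompt: str,
    is_acceptable: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry a sampling call until its output validates, but never more than max_attempts times."""
    for _ in range(max_attempts):
        text = await sample(prompt)
        if is_acceptable(text):
            return text
    # Give up instead of retrying forever on input that will never validate.
    return None


calls = 0


async def always_bad(prompt: str) -> str:
    """Fake sampler that always returns output the validator rejects."""
    global calls
    calls += 1
    return "not json"


result = asyncio.run(
    sample_with_retry_budget(always_bad, "extract fields", lambda t: t.startswith("{"))
)
print(result, calls)  # None 3
```

Returning `None` lets the tool report a failure to the user rather than silently burning more of their budget.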

Vulnerable example

example.py

```python
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("research")

@mcp.tool()
async def deep_research(question: str, ctx: Context) -> str:
    notes = []
    for chunk in question.split("."):
        # No maxTokens, no overall cap, no early-stop predicate.
        result = await ctx.sample(messages=[{"role": "user", "content": chunk}])
        notes.append(result.text)
    return "\n".join(notes)
```
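To see why the loop is the problem, the sketch below runs the same loop against a hypothetical `CountingContext` that records sampling calls instead of reaching an LLM (the MCP decorator is dropped so the function runs standalone). The number of sampling requests equals the number of `.`-separated pieces in `question`, so whoever controls that text controls the bill: 1,000 sentences produce 1,001 calls, including one for the empty string after the final period.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResult:
    text: str


class CountingContext:
    """Hypothetical stand-in for the tool's Context: counts sampling calls, never calls an LLM."""

    def __init__(self) -> None:
        self.calls = 0

    async def sample(self, messages, **kwargs) -> FakeResult:
        self.calls += 1
        return FakeResult(text="ok")


async def deep_research(question: str, ctx) -> str:
    # Same loop as the vulnerable tool, minus the @mcp.tool() decorator.
    notes = []
    for chunk in question.split("."):
        result = await ctx.sample(messages=[{"role": "user", "content": chunk}])
        notes.append(result.text)
    return "\n".join(notes)


ctx = CountingContext()
asyncio.run(deep_research("a." * 1000, ctx))
print(ctx.calls)  # 1001
```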

Secure example

example.py

```python
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("research")
MAX_SAMPLES = 5              # hard cap on sampling calls per tool run
MAX_TOKENS_PER_SAMPLE = 512  # hard cap on output tokens per sampling call

@mcp.tool()
async def deep_research(question: str, ctx: Context) -> str:
    # Drop empty fragments (e.g. after a trailing "."), then keep at most MAX_SAMPLES.
    chunks = [c.strip() for c in question.split(".") if c.strip()][:MAX_SAMPLES]
    notes = []
    for chunk in chunks:
        result = await ctx.sample(
            messages=[{"role": "user", "content": chunk}],
            max_tokens=MAX_TOKENS_PER_SAMPLE,  # per-call token budget
        )
        notes.append(result.text)
    return "\n".join(notes)
```
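The secure example caps both the number of calls (5) and the tokens per call (512), so one run can request at most 5 × 512 = 2,560 output tokens. The rule also names time as a budget dimension. The sketch below is a hypothetical `SamplingBudget` helper (not part of any MCP SDK) that enforces a call count, a cumulative token cap, and a wall-clock deadline for one tool run. `make_call` is assumed to be a coroutine factory that issues the real sampling request with the given `max_tokens`. Because the deadline starts when the budget is created, create one budget per tool invocation.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class BudgetExceeded(RuntimeError):
    pass


class SamplingBudget:
    """Hypothetical per-run budget: caps call count, total requested tokens, and wall-clock time."""

    def __init__(self, max_calls: int, max_total_tokens: int, max_seconds: float) -> None:
        self.max_calls = max_calls
        self.max_total_tokens = max_total_tokens
        self.deadline = time.monotonic() + max_seconds
        self.calls = 0
        self.tokens_reserved = 0

    async def run(self, max_tokens: int, make_call: Callable[[int], Awaitable[T]]) -> T:
        """Reserve budget for one sampling call, then await it with the remaining time as timeout."""
        if self.calls >= self.max_calls:
            raise BudgetExceeded("call-count budget exhausted")
        if self.tokens_reserved + max_tokens > self.max_total_tokens:
            raise BudgetExceeded("token budget exhausted")
        remaining = self.deadline - time.monotonic()
        if remaining <= 0:
            raise BudgetExceeded("time budget exhausted")
        self.calls += 1
        self.tokens_reserved += max_tokens
        # wait_for cancels the call and raises a timeout error if it outlives the deadline.
        return await asyncio.wait_for(make_call(max_tokens), timeout=remaining)


async def fake_sample(max_tokens: int) -> str:
    """Stand-in for a real sampling request."""
    return f"<= {max_tokens} tokens"


async def demo() -> list:
    budget = SamplingBudget(max_calls=5, max_total_tokens=2048, max_seconds=30.0)
    out = []
    try:
        for _ in range(10):
            out.append(await budget.run(512, fake_sample))
    except BudgetExceeded as exc:
        out.append(str(exc))
    return out


print(asyncio.run(demo()))
```

Here the token cap (2048) trips before the call cap: four 512-token calls succeed and the fifth is refused. Reserving `max_tokens` up front charges the worst case, which keeps the check simple at the cost of being conservative when responses come back shorter.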

How MCPSafe detects this

MCPSafe pattern-matches calls to `ctx.sample`, `client.sample`, or `sampling.createMessage` and flags any invocation that has neither a `max_tokens=` (or `maxTokens:`) argument nor an enclosing per-tool call counter. Calls inside tools whose decorator sets `cost_capped=True` are exempt.
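As a rough illustration of this kind of check, and not MCPSafe's actual implementation, the sketch below walks a Python file's syntax tree with the standard `ast` module and reports calls to methods named `sample`, `create_message`, or `createMessage` that pass no `max_tokens`/`maxTokens` keyword. The method-name set is an assumption. The sketch covers only the per-call token half of the rule; recognizing an enclosing call counter or the `cost_capped=True` exemption would need more analysis than shown here.

```python
import ast

# Assumed method names that issue sampling requests, and the keywords accepted as a token cap.
SAMPLING_METHODS = {"sample", "create_message", "createMessage"}
TOKEN_KWARGS = {"max_tokens", "maxTokens"}


def find_uncapped_sampling_calls(source: str) -> list:
    """Return sorted line numbers of sampling calls that pass no token-cap keyword."""
    uncapped = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SAMPLING_METHODS
        ):
            # kw.arg is None for **kwargs; such calls are flagged conservatively.
            keywords = {kw.arg for kw in node.keywords}
            if not keywords & TOKEN_KWARGS:
                uncapped.append(node.lineno)
    return sorted(uncapped)


vulnerable = '''
async def tool(ctx):
    return await ctx.sample(messages=[])
'''
capped = '''
async def tool(ctx):
    return await ctx.sample(messages=[], max_tokens=512)
'''
print(find_uncapped_sampling_calls(vulnerable))  # [3]
print(find_uncapped_sampling_calls(capped))      # []
```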

See the full threat catalog for every documented detection.

Further reading

  • MCP Spec — Sampling primitive
  • CWE-770: Allocation of Resources Without Limits

Scan an MCP server for this issue

MCPSafe runs this check — and every other rule in the catalog — on any MCP server you paste in.
