MCPSafe.io
RegistryThreatsMethodologyDocsPricingScanSign in
MCPSafe.io

Security checks for MCP servers — public packages and private repos, fast or deep.

Legal

Privacy PolicyCookie PolicyTerms of ServiceSecurity disclosure

Resources

State of MCP SecuritySupportSystem statusMade in Germany 🇩🇪

© 2026 MCPSafe. All rights reserved.

GDPR — Privacy Policy
← Threat Catalog

Interaction & Data Flow

Unicode tag character smuggling

MEDIUMCWE: CWE-94Rule: MCP-223

Unicode tag characters (U+E0000–U+E007F) embedded in tool output or input pass through most filters invisibly but reach the LLM, which can be trained to interpret them as instructions.

What it is

The Unicode Tag block (U+E0000 to U+E007F) is a deprecated range originally meant for language tags. Most text renderers display them as zero-width or as the `?` glyph — so a human reviewer sees normal text. But LLMs tokenize these characters and can be prompted to act on hidden instructions encoded in the tag block. The result: invisible prompt injection that bypasses keyword filters.

Why it matters for MCP

MCP tool inputs and outputs flow through models. A retrieved document containing tag-encoded instructions reaches the model uncensored if the tool author's sanitization only filters visible content. The fix is structural — strip the entire U+E0000–U+E007F range before any LLM-bound output.

Vulnerable example

example.py
1
@server.tool()
2
def echo_with_summary(text: str) -> str:
3
    # If 'text' contains tag characters, they pass through to the model.
4
    return f"You said: {text}"

Secure example

example.py
1
import re
2
3
_TAG_RANGE = re.compile(r"[\U000E0000-\U000E007F]")
4
5
@server.tool()
6
def echo_with_summary(text: str) -> str:
7
    clean = _TAG_RANGE.sub("", text)
8
    return f"You said: {clean}"

How MCPSafe detects this

MCPSafe flags tool handlers that return strings derived from user/model-controlled input without an explicit strip of the U+E0000–U+E007F range. Inputs passed through `unicodedata.normalize` + a tag-range filter, or run through known sanitizers (e.g. `clean_unicode_tags`), are exempted.

See the full threat catalog for every documented detection.

Further reading

  • Unicode Tags character smuggling research
  • CWE-94: Improper Control of Generation of Code

Scan an MCP server for this issue

MCPSafe runs this check — and every other rule in the catalog — on any MCP server you paste in.

Scan now