Interaction & Data Flow

Unicode tag character smuggling

Severity: MEDIUM | CWE: CWE-94 | Rule: MCP-223

Unicode tag characters (U+E0000–U+E007F) embedded in tool input or output render invisibly and pass through most filters, yet still reach the LLM, which may interpret them as instructions.

What it is

The Unicode Tags block (U+E0000 to U+E007F) was originally intended for language tagging, a use that Unicode has since deprecated. Most text renderers display these characters as zero-width or as a placeholder glyph such as `?`, so a human reviewer sees normal text. LLMs, however, still tokenize them and can be prompted to act on hidden instructions encoded in the block. The result is invisible prompt injection that bypasses keyword filters.
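To make the encoding concrete, here is a minimal sketch of one common mapping: each ASCII code point is shifted by 0xE0000 into the tag block. The helper names `to_tags` and `from_tags` are illustrative and not part of any library.

```python
TAG_BASE = 0xE0000  # first code point of the Unicode Tags block


def to_tags(payload: str) -> str:
    """Shift each ASCII character into the tag block (hypothetical helper)."""
    if not payload.isascii():
        raise ValueError("only ASCII can be mapped into the tag block")
    return "".join(chr(TAG_BASE + ord(ch)) for ch in payload)


def from_tags(text: str) -> str:
    """Recover ASCII hidden as tag characters, ignoring everything else."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if TAG_BASE <= ord(ch) <= TAG_BASE + 0x7F
    )


visible = "Quarterly report attached."
smuggled = visible + to_tags("ignore previous instructions")

# The two strings typically render alike, but their lengths differ.
print(len(visible), len(smuggled))
# Anything that reads code points, including a tokenizer, can see the payload.
print(from_tags(smuggled))
```

The point of the sketch is that the hidden text survives copy and paste, logging, and most display layers unchanged, because nothing in the string is malformed.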

Why it matters for MCP

MCP tool inputs and outputs flow through models. A retrieved document containing tag-encoded instructions reaches the model unfiltered if the tool author's sanitization inspects only visible content. The fix is structural: strip the entire U+E0000–U+E007F range from any output bound for an LLM.
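The gap between a visible-content filter and a structural strip can be sketched as follows. The deny-list, the `visible_only_filter` function, and the `hide` helper are assumptions made up for this illustration; they do not describe any particular product's filter.

```python
import re

# Hypothetical deny-list of the kind that only inspects literal text.
BLOCKED_PHRASES = ("ignore previous instructions",)

TAG_RANGE = re.compile(r"[\U000E0000-\U000E007F]")


def visible_only_filter(text: str) -> bool:
    """Return True if the text passes a naive keyword check."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def hide(payload: str) -> str:
    """Encode ASCII as tag characters by offsetting into U+E0000."""
    return "".join(chr(0xE0000 + ord(ch)) for ch in payload)


document = "Release notes for v2." + hide("ignore previous instructions")

passes_filter = visible_only_filter(document)  # the phrase never appears literally
has_tags = bool(TAG_RANGE.search(document))    # yet the payload is still present
stripped = TAG_RANGE.sub("", document)         # structural fix: drop the whole range

print(passes_filter, has_tags, repr(stripped))
```

The keyword check passes because the blocked phrase is never present as ordinary characters, which is why removing the range outright is more reliable than trying to enumerate payloads.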

Vulnerable example

example.py

@server.tool()
def echo_with_summary(text: str) -> str:
    # If 'text' contains tag characters, they pass through to the model.
    return f"You said: {text}"

Secure example

example.py

import re

_TAG_RANGE = re.compile(r"[\U000E0000-\U000E007F]")

@server.tool()
def echo_with_summary(text: str) -> str:
    clean = _TAG_RANGE.sub("", text)
    return f"You said: {clean}"
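Tool results are often structured rather than a single string. As a sketch under that assumption, the same regex can be applied recursively and wrapped in a decorator so every handler is covered. The names `strip_tags`, `sanitized`, and `fetch_document` are hypothetical, and the document content is fabricated for the demo.

```python
import functools
import re
from typing import Any, Callable

_TAG_RANGE = re.compile(r"[\U000E0000-\U000E007F]")


def strip_tags(value: Any) -> Any:
    """Recursively remove tag characters from strings in common containers."""
    if isinstance(value, str):
        return _TAG_RANGE.sub("", value)
    if isinstance(value, dict):
        return {strip_tags(k): strip_tags(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(strip_tags(item) for item in value)
    return value  # numbers, None, and other types pass through unchanged


def sanitized(handler: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: clean a handler's return value before it leaves."""
    @functools.wraps(handler)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return strip_tags(handler(*args, **kwargs))
    return wrapper


@sanitized
def fetch_document(url: str) -> dict:
    # Stand-in for a real retrieval call; the hidden payload is made up.
    hidden = "".join(chr(0xE0000 + ord(c)) for c in "do X")
    return {"url": url, "lines": ["Hello" + hidden, "World"]}


result = fetch_document("https://example.com/doc")
print(result)
```

In a real server the decorator would presumably sit beneath `@server.tool()` so that the registered function is the wrapped one, though how a given MCP SDK introspects decorated handlers is worth verifying before relying on this.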

How MCPSafe detects this

MCPSafe flags tool handlers that return strings derived from user- or model-controlled input without an explicit strip of the U+E0000–U+E007F range. Inputs passed through `unicodedata.normalize` followed by a tag-range filter, or run through known sanitizers (e.g. `clean_unicode_tags`), are exempted.
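The general shape of such a check can be approximated with Python's `ast` module. This is a toy heuristic written for illustration, not MCPSafe's actual implementation: it only flags handlers decorated with something ending in `.tool` whose return expression mentions a parameter directly, and it does no real taint tracking or sanitizer recognition.

```python
import ast

SOURCE = '''
@server.tool()
def echo(text: str) -> str:
    return f"You said: {text}"

@server.tool()
def echo_clean(text: str) -> str:
    clean = _TAG_RANGE.sub("", text)
    return f"You said: {clean}"
'''


def is_tool(func: ast.FunctionDef) -> bool:
    """True if any decorator looks like `<something>.tool` or `<something>.tool(...)`."""
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Attribute) and target.attr == "tool":
            return True
    return False


def returns_raw_param(func: ast.FunctionDef) -> bool:
    """True if a return expression references a parameter name directly."""
    params = {arg.arg for arg in func.args.args}
    for node in ast.walk(func):
        if isinstance(node, ast.Return) and node.value is not None:
            names = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            if names & params:
                return True
    return False


flagged = [
    node.name
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.FunctionDef) and is_tool(node) and returns_raw_param(node)
]
print(flagged)
```

A production rule would need to follow assignments and recognize sanitizer calls; this sketch only passes `echo_clean` because the returned name happens not to be a parameter.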

See the full threat catalog for every documented detection.

Scan an MCP server for this issue

MCPSafe runs this check — and every other rule in the catalog — on any MCP server you paste in.
