Methodology
How we decide if an MCP server is safe.
No black box. Every grade on this site comes from a pipeline you can audit. Here’s the whole thing — the rules, the model votes, the scoring math, and what we explicitly don’t do.
The scan pipeline
When you submit a target, we fetch the source — an npm package, a PyPI package, or a GitHub repository — run it through our detection engine, and aggregate everything into a single letter grade with a signal-score breakdown. All infrastructure is hosted in the EU (Frankfurt).
A Fast scan runs static analysis, manifest checks, and CVE lookups; it targets a p95 of under 3 minutes, with a hard cap of 20. A Deep scan adds taint-flow analysis and an independent model consensus panel; it targets a p95 of under 20 minutes, with a hard cap of 30, and is available to signed-in users.
Supported targets
Paste any of the following into the scan box. The parser resolves bare names, URLs, version constraints, and official registry IDs.
npm
| Input | Resolves to |
|---|---|
| express | latest version on npm |
| @modelcontextprotocol/sdk | scoped package, latest |
| npm:fastify | explicit prefix, latest |
| npm:@modelcontextprotocol/server-filesystem | scoped with prefix, latest |
| npm:lodash@4.17.21 | pinned version |
| @modelcontextprotocol/sdk@1.0.0 | scoped, pinned version |
| https://www.npmjs.com/package/express | npm URL, latest |
| https://www.npmjs.com/package/@modelcontextprotocol/server-github/v/0.6.2 | npm URL, pinned version |
PyPI
Bare names (e.g. requests) default to npm. Use the pypi: prefix or a version constraint to target PyPI.
| Input | Resolves to |
|---|---|
| pypi:requests | latest on PyPI |
| pypi:mcp | Anthropic MCP Python SDK, latest |
| requests==2.31.0 | pinned — == operator detected as PyPI |
| mcp>=1.0.0 | range constraint, scans latest matching |
| httpx[http2]>=0.24.0 | extras stripped, range resolved to latest |
| https://pypi.org/project/mcp/ | PyPI URL, latest |
| https://pypi.org/project/requests/2.31.0/ | PyPI URL, pinned version |
GitHub
| Input | Resolves to |
|---|---|
| modelcontextprotocol/servers | HEAD of default branch |
| github:modelcontextprotocol/servers | explicit prefix, HEAD |
| https://github.com/modelcontextprotocol/servers | GitHub URL, HEAD |
| https://github.com/modelcontextprotocol/servers.git | .git suffix stripped, HEAD |
| https://github.com/modelcontextprotocol/servers/tree/main | pinned branch |
| https://github.com/modelcontextprotocol/servers/tree/v1.2.0 | pinned tag |
Docker
| Input | Resolves to |
|---|---|
| nginx:latest | Docker Hub image with tag |
| nginx:1.27-alpine | pinned tag |
| docker:mcp/fetch | explicit prefix, resolves :latest |
| ghcr.io/owner/image:tag | GitHub Container Registry |
| gcr.io/project/image:tag | Google Container Registry |
| mcr.microsoft.com/mcp-server:latest | Microsoft Container Registry |
| nginx@sha256:abc123 | pinned digest |
Official MCP Registry
Reverse-domain IDs from registry.modelcontextprotocol.io. io.github.* IDs resolve to the actual GitHub repo via the registry, so the version captured is the registry’s current release.
| Input | Resolves to |
|---|---|
| io.github.modelcontextprotocol/servers | looks up repo + version from MCP registry |
| io.github.punkpeye/fastmcp | resolves to github:punkpeye/fastmcp@<registry-version> |
| ai.anthropic/claude-code | MCP registry server ID (non-GitHub) |
| https://registry.modelcontextprotocol.io/servers/io.github.punkpeye/fastmcp | full registry URL |
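The resolution rules above can be approximated in a few lines. This is an illustrative sketch, not the production parser: `resolve_target` and the `mcp-registry` label are invented names, and Docker images plus several URL edge cases (version paths, registry hosts) are omitted.

```python
import re

def resolve_target(raw: str) -> tuple[str, str]:
    """Guess the ecosystem and canonical name for a raw scan input.
    Simplified sketch; Docker images and several edge cases are omitted."""
    s = raw.strip()
    # 1. Explicit prefixes always win.
    for eco in ("npm", "pypi", "github", "docker"):
        if s.startswith(eco + ":"):
            return eco, s[len(eco) + 1:]
    # 2. Registry URLs map to their ecosystem.
    if "npmjs.com/package/" in s:
        return "npm", s.split("npmjs.com/package/", 1)[1]
    if "pypi.org/project/" in s:
        return "pypi", s.split("pypi.org/project/", 1)[1].strip("/")
    if "github.com/" in s:
        return "github", s.split("github.com/", 1)[1].removesuffix(".git")
    # 3. Reverse-domain IDs (io.github.*, ai.anthropic/*) hit the MCP registry.
    if re.match(r"^[a-z]{2,}(\.[a-z0-9-]+)+/", s):
        return "mcp-registry", s
    # 4. A version operator (==, >=, ...) marks a PyPI requirement;
    #    extras like [http2] are stripped from the name.
    parts = re.split(r"==|>=|<=|~=|!=|>|<", s, maxsplit=1)
    if parts[0] != s:
        return "pypi", re.sub(r"\[[^\]]*\]", "", parts[0])
    # 5. owner/repo shorthand (no @scope) means GitHub.
    if re.fullmatch(r"[\w.-]+/[\w.-]+", s) and not s.startswith("@"):
        return "github", s
    # 6. Everything else defaults to npm: bare or @scoped names.
    return "npm", s
```

Note how order matters: the reverse-domain check must run before the owner/repo shorthand, or `io.github.punkpeye/fastmcp` would be mistaken for a GitHub slug.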
The rules
Every new rule goes through a precision review before any finding it produces affects a user-visible scan result.
What we check today: destructive-tool annotations without confirmation, runtime secret exfiltration, over-broad permissions, OAuth over-scoping, prompt injection into inner LLMs, over-broad input schemas, install-time remote-exec hooks, typosquat package names, known CVEs, containers running as root, plaintext secrets in environment files, and more. Browse the full detection rules list, or see how it maps to the MCP Top 10.
The model consensus (Deep scans only)
Five independent models from four different vendors vote on each tool handler. No single model can unilaterally move a score. We record every vote and show you the full judge panel on the result page — including disagreements.
- Anthropic Claude Haiku 4.5 (via Bedrock, Frankfurt)
- Anthropic Claude 3.7 Sonnet (via Bedrock, Frankfurt)
- Google Gemini 2.5 Flash (via Vertex AI, Frankfurt)
- Mistral Small (via Mistral La Plateforme, Paris)
- OpenAI GPT-4o-mini (via OpenAI API)
Per-judge verdicts are aggregated as a cross-judge median rather than a majority vote, so a single outlier vote can't move the score.
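A toy illustration of that property, assuming verdicts on a 0–10 risk scale (the actual verdict format may differ):

```python
from statistics import median

def aggregate_verdicts(scores: list[float]) -> float:
    """Cross-judge aggregation: the panel's median risk verdict."""
    return median(scores)

honest = [2.0, 2.5, 2.0, 3.0, 2.5]
with_outlier = [2.0, 2.5, 2.0, 3.0, 10.0]  # one judge flips to maximum risk

# The median is 2.5 in both cases; a mean would jump from 2.4 to 3.9.
```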
Model votes are not the scan. They are a second opinion on semantic intent. A rule finding alone will flag a server; model votes adjust the score, not the verdict.
The grade
We publish a 0–100 safety score and a letter grade. The score is a weighted average across signal categories including injection, secrets, permissions, supply chain, destructive actions, CVEs, typosquats, server configuration, and community signals.
The letter grade is derived from the package's AIVSS score (the maximum individual finding score). Grade thresholds: A (AIVSS < 2), B (2–3.9), C (4–6.9), D (7–8.9), F (≥ 9). A single high-severity finding (AIVSS ≥ 7) is enough to push the grade to D; a critical finding (AIVSS ≥ 9) pushes it to F.
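The threshold mapping can be sketched directly (the function name and input shape here are illustrative):

```python
def letter_grade(finding_scores: list[float]) -> str:
    """Map the worst individual finding's AIVSS score (0-10) to a letter
    grade, per the thresholds above. No findings at all grades as A."""
    worst = max(finding_scores, default=0.0)
    if worst < 2:
        return "A"
    if worst < 4:
        return "B"
    if worst < 7:
        return "C"
    if worst < 9:
        return "D"
    return "F"
```

For example, a single 7.2 finding among otherwise low-severity results yields a D, no matter how good the weighted average looks.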
Public vs Private scans
Public scans answer “is this MCP server safe to install?” for code anyone can pull from npm, PyPI, GitHub, or a container registry. Results are attested with a shareable URL and appear in the public /registry. They’re free to run, anonymously or signed in.
Private scans answer “is the server we’re shipping safe?” — the same engine pointed at code your users haven’t seen yet. Results are isolated to your account, encrypted at rest, never written to the public registry, and unreachable from any public-scan code path (enforced at the IAM policy layer). Private scans require a paid plan; see /pricing.
Both visibilities support Fast and Deep scan modes; the two axes are independent, so all four combinations are valid. The strongest pre-launch posture is a Deep scan on Private before the first public release.
What we don’t do
- We don’t execute your code. Static analysis only — no sandboxed runtime, no sample inputs, no side effects.
- We don’t store your source. Fetched packages are used for the duration of the scan only.
- We don’t harvest credentials. Secret detection flags secrets that leak out of the server; we never collect or reuse what we find.
- We don’t re-sell scan data. Public grades appear in /registry; private scans are isolated to your account and never visible to other users.
- We don’t claim perfect recall. Every scanner has blind spots — ours are listed below.
Limitations
Today we focus on Python, TypeScript, and JavaScript source files, Dockerfiles, and common manifest formats. Other languages pass through unscanned at the rule level, though CVE lookup and typosquat checks still apply.
A grade on MCPSafe reflects the code, not a running instance. Live endpoint probes — TLS enforcement, header audits, unauthenticated endpoint detection — are on the roadmap.
Cross-file data flow analysis is a future capability. Current taint-flow rules operate within a single file.
Found a problem?
A false positive is a bug, not acceptable noise. Email security@mcpsafe.io with the scan URL and we’ll add a fixture and retune the rule. The same goes for missing checks: if a published CVE or a class of attack isn’t caught, we want to know.