High risk. Don't ship without significant remediation.
Scanned 5/9/2026, 6:18:27 AM · Cached result · Deep Scan · 91 rules
AIVSS Score
High
Severity Breakdown
- Critical: 0
- High: 10
- Medium: 56
- Low: 14
MCP Server Information
Findings
This package receives a D grade with a safety score of 55/100, driven by 10 high-severity issues alongside 56 medium-severity findings. Most findings center on verbose error handling (45 instances) that could leak sensitive information, plus readiness gaps (14 instances) that may indicate incomplete security hardening. The 6 prompt-injection vulnerabilities and 3 tool-poisoning risks pose direct threats to safe operation, while server configuration weaknesses (9 findings) and potential data-exfiltration paths (2 findings) compound the overall concern.
Per-finding remediation (AI) generated by bedrock-claude-haiku-4-5 — 38 of 80 findings. Click any finding to read.
No known CVEs found for this package or its dependencies.
Scan Details
Done
80 findings
Tool handlers use module-level AWS clients (self.s3_client, self.glue_client) initialized from os.getenv() without consulting caller identity or per-request credentials, enabling confused deputy attacks on S3 and Glue resources.
Evidence
| 1 | """FastMCP server for SQL to PySpark conversion with code review and optimization.""" |
| 2 | |
| 3 | import json |
| 4 | import os |
| 5 | import re |
| 6 | from typing import Any, Dict, List, Optional, Union |
| 7 | |
| 8 | from fastmcp import FastMCP |
| 9 | |
| 10 | from .advanced_optimizer import AdvancedOptimizer |
| 11 | from .aws_glue_integration import ( |
| 12 | AWSGlueIntegration, |
| 13 | DataCatalogTable, |
| 14 | DataFormat, |
| 15 | GlueJobConfig, |
| 16 | GlueJobType, |
| 17 | ) |
| 18 | from .batch_processor import BatchProcessor |
| 19 | from .code_reviewer import PySparkCodeReviewer |
| 20 | from .data_source_anal |
Remediation (AI)
The problem is that module-level AWS clients (self.s3_client, self.glue_client) are initialized once from os.getenv() without validating caller identity, allowing any caller to use the server's ambient credentials to access S3 and Glue resources. Modify the FastMCP server to accept caller credentials (e.g., via request headers or context) and pass them to a new method that creates per-request boto3 clients using sts.assume_role() or explicit credential passing instead of relying on ambient credentials. This ensures each caller's requests are isolated to their own AWS identity and permissions. Verify by adding logging to confirm that each tool invocation creates a new client with the caller's credentials and that cross-caller access is denied.
LLM consensus
DataSourceAnalyzer initializes boto3 S3 and Glue clients at module import time without per-request credential validation, allowing handlers to access AWS resources using server's ambient credentials regardless of caller identity.
Evidence
| 57 | def __init__(self): |
| 58 | self.s3_client = None |
| 59 | self.glue_client = None |
| 60 | self._initialize_aws_clients() |
| 61 | |
| 62 | def _initialize_aws_clients(self): |
| 63 | """Initialize AWS clients if credentials are available.""" |
| 64 | if not BOTO3_AVAILABLE: |
| 65 | return |
| 66 | |
| 67 | try: |
| 68 | self.s3_client = boto3.client("s3") |
| 69 | self.glue_client = boto3.client("glue") |
| 70 | except (NoCredentialsError, Exception): |
| 71 | # AWS clients will be None if no credentials |
| 72 | |
Remediation (AI)
The problem is that DataSourceAnalyzer._initialize_aws_clients() creates boto3 clients at module import time using only os.getenv(), with no per-request credential validation, so all handlers share the same AWS identity. Refactor _initialize_aws_clients() to accept optional caller credentials as parameters and defer client initialization until tool invocation time, or implement a factory method that creates fresh clients per request using caller-provided credentials or STS assume-role. This prevents the confused deputy problem by ensuring each request uses only the caller's credentials. Verify by instrumenting the code to confirm that clients are created fresh per request and that attempting to access resources outside the caller's IAM policy fails.
MemoryManager uses a shared SQLite database at a fixed path (os.getenv PYSPARK_TOOLS_DB_PATH or ~/.cache/mcp/memory.sqlite) without per-caller isolation, allowing all callers to access and modify shared conversion history and metrics.
Evidence
| 67 | expires_at: Optional[str] = None |
| 68 | |
| 69 | |
| 70 | class MemoryManager: |
| 71 | """SQLite-based memory manager for storing conversion history and context.""" |
| 72 | |
| 73 | def __init__(self, db_path: Optional[str] = None): |
| 74 | if db_path is None: |
| 75 | db_path = os.getenv( |
| 76 | "PYSPARK_TOOLS_DB_PATH", |
| 77 | os.path.expanduser("~/.cache/mcp/memory.sqlite"), |
| 78 | ) |
| 79 | self.db_path = Path(db_path) |
| 80 | self.db_path.parent.mkdir(parents=True, exist_ok=True) |
| 81 | # Thread-local s |
Remediation (AI)
The problem is that MemoryManager uses a shared SQLite database at a fixed path without per-caller isolation, allowing all callers to read and modify each other's conversion history and metrics. Modify the MemoryManager.__init__() method to accept a caller_id parameter and include it in the database path (e.g., ~/.cache/mcp/memory_{caller_id}.sqlite) or add a caller_id column to all tables and filter queries by caller_id. This ensures each caller's data is isolated. Verify by creating two separate MemoryManager instances with different caller IDs and confirming that queries from one caller do not return data from the other.
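The per-caller database-path variant of this fix might look like the sketch below; `db_path_for_caller` is a hypothetical helper, and the caller-ID character whitelist is an assumption added so an attacker-controlled ID cannot escape the cache directory via path traversal.

```python
import re
from pathlib import Path

# Only allow simple identifiers so caller_id cannot inject "../" segments.
SAFE_CALLER = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def db_path_for_caller(caller_id: str, base_dir: str = "~/.cache/mcp") -> Path:
    # One SQLite file per caller isolates conversion history and metrics.
    if not SAFE_CALLER.match(caller_id):
        raise ValueError("invalid caller_id")
    return Path(base_dir).expanduser() / f"memory_{caller_id}.sqlite"
```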
SQL injection risk. SQL call receives a query built with string interpolation (%, +, f-string, or template literal) instead of placeholder parameters. Use parameterised queries.
Evidence
| 264 | for column, column_type in required_columns.items(): |
| 265 | if column not in columns: |
| 266 | conn.execute( |
| 267 | f"ALTER TABLE conversions ADD COLUMN {column} {column_type}" |
| 268 | ) |
| 269 | |
| 270 | conn.commit() |
Remediation (AI)
The problem is that the ALTER TABLE statement uses f-string interpolation for the column name and type, creating a SQL injection vulnerability if column or column_type contain malicious SQL. Replace the f-string with parameterized query syntax; however, note that column names cannot be parameterized in SQLite, so use a whitelist of allowed column names and validate column_type against a predefined set of safe types before interpolation. This prevents injection of arbitrary SQL. Verify by attempting to pass a column name like "x; DROP TABLE conversions; --" and confirming it is rejected or safely escaped.
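Since SQLite cannot parameterize identifiers, the whitelist approach described above can be sketched like this; the contents of `ALLOWED_COLUMNS` are illustrative placeholders, not the package's actual schema.

```python
# Hypothetical whitelist — replace with the columns the migration actually adds.
ALLOWED_COLUMNS = {"caller_id", "expires_at", "optimization_score"}
ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL", "BLOB"}


def safe_add_column(conn, column: str, column_type: str) -> None:
    # Identifiers cannot be bound as SQL parameters, so validate both the
    # column name and its type against closed sets before interpolation.
    if column not in ALLOWED_COLUMNS or column_type not in ALLOWED_TYPES:
        raise ValueError(f"refusing to add column {column!r}")
    conn.execute(f'ALTER TABLE conversions ADD COLUMN "{column}" {column_type}')
```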
BatchProcessor performs FILESYSTEM side effects (writes output files and directories via OutputManager) that may not be explicitly disclosed in tool descriptions focused only on 'processing' or 'converting'.
Evidence
| 1 | """ |
| 2 | Batch processor for handling multiple SQL files and directories with comprehensive |
| 3 | job management, status tracking, and error handling. |
Remediation (AI)
The problem is that BatchProcessor performs filesystem side effects (writing output files and directories) that are not disclosed in tool descriptions, violating the MCP principle of transparent side effects. Update the tool description for any BatchProcessor-based tool to explicitly document that it writes files to disk, including the output directory path and file types created. This allows callers to understand and consent to the side effects. Verify by reading the tool description and confirming it mentions filesystem writes and the output location.
DataSourceAnalyzer._initialize_aws_clients performs NETWORK side effects (initializes boto3 S3 and Glue clients) that may not be disclosed in tool descriptions focused only on 'analyzing' data sources.
Evidence
| 57 | def __init__(self): |
| 58 | self.s3_client = None |
| 59 | self.glue_client = None |
| 60 | self._initialize_aws_clients() |
| 61 | |
| 62 | def _initialize_aws_clients(self): |
Remediation (AI)
The problem is that DataSourceAnalyzer._initialize_aws_clients() performs network side effects (initializing boto3 clients) that are not disclosed in tool descriptions focused only on 'analyzing' data sources. Update the tool description to explicitly state that the tool may initialize AWS API clients and make network calls to S3 and Glue services. Alternatively, defer client initialization to lazy-load only when AWS operations are actually needed. Verify by checking the tool description and confirming it mentions potential AWS API calls.
MemoryManager.__init__ performs FILESYSTEM side effects (creates SQLite database and directories) not disclosed in typical tool descriptions that only mention 'storing' or 'caching' data.
Evidence
| 44 | @dataclass |
| 45 | class PerformanceMetric: |
| 46 | """Represents a performance metric.""" |
| 47 | |
| 48 | id: Optional[int] |
Remediation (AI)
The problem is that MemoryManager.__init__() performs filesystem side effects (creating SQLite database and directories) not disclosed in tool descriptions that only mention 'storing' or 'caching' data. Update the tool description to explicitly document that the tool creates a local SQLite database file and may create cache directories. Alternatively, defer database creation to lazy-load only when data is first stored. Verify by reading the tool description and confirming it mentions filesystem creation.
Tool 'convert_sql_to_pyspark' returns untrusted SQL query verbatim in pyspark_code output without provenance wrapper, enabling indirect prompt injection via user-supplied SQL strings.
Evidence
| 1 | """FastMCP server for SQL to PySpark conversion with code review and optimization.""" |
| 2 | |
| 3 | import json |
| 4 | import os |
| 5 | import re |
| 6 | from typing import Any, Dict, List, Optional, Union |
| 7 | |
| 8 | from fastmcp import FastMCP |
| 9 | |
| 10 | from .advanced_optimizer import AdvancedOptimizer |
| 11 | from .aws_glue_integration import ( |
| 12 | AWSGlueIntegration, |
| 13 | DataCatalogTable, |
| 14 | DataFormat, |
| 15 | GlueJobConfig, |
| 16 | GlueJobType, |
| 17 | ) |
| 18 | from .batch_processor import BatchProcessor |
| 19 | from .code_reviewer import PySparkCodeReviewer |
| 20 | from .data_source_anal |
Remediation (AI)
The problem is that convert_sql_to_pyspark echoes the untrusted, user-supplied SQL query verbatim in its pyspark_code output without markers identifying its provenance, enabling indirect prompt injection if the output is consumed by an LLM. Wrap the echoed SQL in provenance delimiters (e.g. comment markers around the original query) or return it in a structured field such as {"source": "user_sql", "content": query}. This clearly separates external input from generated code. Verify by checking the tool output and confirming the echoed SQL carries source identifiers.
LLM consensus
Tool 'review_pyspark_code' returns untrusted user-supplied PySpark code snippets in review output without provenance markers, allowing injection of malicious instructions into LLM context.
Evidence
| 1 | """FastMCP server for SQL to PySpark conversion with code review and optimization.""" |
| 2 | |
| 3 | import json |
| 4 | import os |
| 5 | import re |
| 6 | from typing import Any, Dict, List, Optional, Union |
| 7 | |
| 8 | from fastmcp import FastMCP |
| 9 | |
| 10 | from .advanced_optimizer import AdvancedOptimizer |
| 11 | from .aws_glue_integration import ( |
| 12 | AWSGlueIntegration, |
| 13 | DataCatalogTable, |
| 14 | DataFormat, |
| 15 | GlueJobConfig, |
| 16 | GlueJobType, |
| 17 | ) |
| 18 | from .batch_processor import BatchProcessor |
| 19 | from .code_reviewer import PySparkCodeReviewer |
| 20 | from .data_source_anal |
Remediation (AI)
The problem is that review_pyspark_code echoes untrusted, user-supplied PySpark snippets in its review output without provenance markers, allowing instructions embedded in a snippet to be injected into the consuming LLM's context. Wrap quoted code in provenance delimiters or return it in a structured field such as {"source": "user_code", "content": code}. This clearly identifies external content. Verify by checking the review output and confirming all quoted snippets carry source identifiers.
LLM consensus
Tool 'analyze_codebase' returns untrusted file content from third-party PySpark codebases (via data_source_analyzer.analyze_s3_location and file system reads) without delimiters identifying source provenance.
Evidence
| 1 | """FastMCP server for SQL to PySpark conversion with code review and optimization.""" |
| 2 | |
| 3 | import json |
| 4 | import os |
| 5 | import re |
| 6 | from typing import Any, Dict, List, Optional, Union |
| 7 | |
| 8 | from fastmcp import FastMCP |
| 9 | |
| 10 | from .advanced_optimizer import AdvancedOptimizer |
| 11 | from .aws_glue_integration import ( |
| 12 | AWSGlueIntegration, |
| 13 | DataCatalogTable, |
| 14 | DataFormat, |
| 15 | GlueJobConfig, |
| 16 | GlueJobType, |
| 17 | ) |
| 18 | from .batch_processor import BatchProcessor |
| 19 | from .code_reviewer import PySparkCodeReviewer |
| 20 | from .data_source_anal |
Remediation (AI)
The problem is that analyze_codebase returns untrusted file content from third-party S3 locations and filesystems without delimiters identifying source provenance, enabling prompt injection if the output is consumed by an LLM. Wrap all file content returned from external sources with provenance markers such as `<!-- FILE_START: {path} -->...<!-- FILE_END: {path} -->` or return it in a structured format with {"source": "file", "path": path, "content": content}. This clearly identifies external content. Verify by checking the tool output and confirming that all file content is wrapped with source identifiers.
LLM consensus
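The structured-envelope option mentioned in these remediations can be sketched as below; `wrap_untrusted` is a hypothetical helper name, and the exact field set is an assumption.

```python
import json


def wrap_untrusted(source: str, path: str, content: str) -> str:
    # Return external content in a structured envelope so a consuming LLM
    # can distinguish quoted data from instructions it should follow.
    return json.dumps({
        "source": source,        # e.g. "s3", "file", "user_sql"
        "path": path,
        "untrusted": True,       # explicit provenance flag
        "content": content,
    })
```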
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2813 | return result |
| 2814 | |
| 2815 | except Exception as e: |
| 2816 | return {"status": "error", "message": str(e)} |
| 2817 | |
| 2818 | |
| 2819 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
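The generic-message-plus-server-side-logging pattern recommended here can be sketched as follows; the `safe_error_response` name and the correlation-id field are illustrative additions, not the package's existing API.

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def safe_error_response(exc: Exception) -> dict:
    # Log the full traceback server-side; return only a generic message and
    # a short correlation id the operator can grep for in the logs.
    error_id = uuid.uuid4().hex[:8]
    logger.exception("tool error [%s]", error_id)
    return {
        "status": "error",
        "message": "An internal error occurred.",
        "error_id": error_id,
    }
```

Each `except Exception as e: return {"status": "error", "message": str(e)}` handler would then become `except Exception as e: return safe_error_response(e)`.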
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2783 | return result |
| 2784 | |
| 2785 | except Exception as e: |
| 2786 | return {"status": "error", "message": str(e)} |
| 2787 | |
| 2788 | |
| 2789 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2061 | } |
| 2062 | |
| 2063 | except Exception as e: |
| 2064 | return {"status": "error", "message": str(e)} |
| 2065 | |
| 2066 | |
| 2067 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1004 | } |
| 1005 | |
| 1006 | except Exception as e: |
| 1007 | return {"status": "error", "message": str(e)} |
| 1008 | |
| 1009 | def _get_input_format(self, format: Optional[DataFormat]) -> str: |
| 1010 | """Get input format for Glue table definition.""" |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 3070 | return result |
| 3071 | |
| 3072 | except Exception as e: |
| 3073 | return {"status": "error", "message": str(e)} |
| 3074 | |
| 3075 | |
| 3076 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1641 | } |
| 1642 | |
| 1643 | except Exception as e: |
| 1644 | return {"status": "error", "message": str(e)} |
| 1645 | |
| 1646 | |
| 1647 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 3100 | return result |
| 3101 | |
| 3102 | except Exception as e: |
| 3103 | return {"status": "error", "message": str(e)} |
| 3104 | |
| 3105 | |
| 3106 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 657 | return template |
| 658 | |
| 659 | except Exception as e: |
| 660 | return {"status": "error", "message": str(e)} |
| 661 | |
| 662 | def generate_data_catalog_table_definition( |
| 663 | self, |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 871 | } |
| 872 | |
| 873 | except Exception as e: |
| 874 | return {"status": "error", "message": str(e)} |
| 875 | |
| 876 | def generate_schema_evolution_strategy( |
| 877 | self, |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1713 | return result |
| 1714 | |
| 1715 | except Exception as e: |
| 1716 | return {"status": "error", "message": str(e)} |
| 1717 | |
| 1718 | |
| 1719 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 237 | } |
| 238 | |
| 239 | except Exception as e: |
| 240 | return {"status": "error", "message": str(e)} |
| 241 | |
| 242 | def _generate_imports(self, config: GlueJobConfig) -> str: |
| 243 | """Generate import statements based on configuration.""" |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 3019 | return result |
| 3020 | |
| 3021 | except Exception as e: |
| 3022 | return {"status": "error", "message": str(e)} |
| 3023 | |
| 3024 | |
| 3025 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 531 | } |
| 532 | |
| 533 | except Exception as e: |
| 534 | return {"status": "error", "message": str(e)} |
| 535 | |
| 536 | def _extract_dataframe_operations(self, pyspark_code: str) -> List[str]: |
| 537 | """Extract DataFrame operations from PySpark code.""" |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2173 | } |
| 2174 | |
| 2175 | except Exception as e: |
| 2176 | return {"status": "error", "message": str(e)} |
| 2177 | |
| 2178 | |
| 2179 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1491 | } |
| 1492 | |
| 1493 | except Exception as e: |
| 1494 | return {"status": "error", "message": str(e)} |
| 1495 | |
| 1496 | def generate_small_files_consolidation_job( |
| 1497 | self, |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1932 | } |
| 1933 | |
| 1934 | except Exception as e: |
| 1935 | return {"status": "error", "message": str(e)} |
| 1936 | |
| 1937 | |
| 1938 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2638 | return result |
| 2639 | |
| 2640 | except Exception as e: |
| 2641 | return {"status": "error", "message": str(e)} |
| 2642 | |
| 2643 | |
| 2644 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1623 | } |
| 1624 | |
| 1625 | except Exception as e: |
| 1626 | return {"status": "error", "message": str(e)} |
| 1627 | |
| 1628 | def _analyze_partitioning_strategy( |
| 1629 | self, table_info: DataCatalogTable, query_patterns: Optional[List[str]] |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2244 | } |
| 2245 | |
| 2246 | except Exception as e: |
| 2247 | return {"status": "error", "message": str(e)} |
| 2248 | |
| 2249 | |
| 2250 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 403 | return result |
| 404 | |
| 405 | except Exception as e: |
| 406 | return {"status": "error", "message": str(e)} |
| 407 | |
| 408 | |
| 409 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1903 | } |
| 1904 | |
| 1905 | except Exception as e: |
| 1906 | return {"status": "error", "message": str(e)} |
| 1907 | |
| 1908 | |
| 1909 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 3151 | return result |
| 3152 | |
| 3153 | except Exception as e: |
| 3154 | return {"status": "error", "message": str(e)} |
| 3155 | |
| 3156 | |
| 3157 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 3228 | } |
| 3229 | |
| 3230 | except Exception as e: |
| 3231 | return {"status": "error", "message": str(e)} |
| 3232 | |
| 3233 | |
| 3234 | # ============================================================================= |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1843 | } |
| 1844 | |
| 1845 | except Exception as e: |
| 1846 | return {"status": "error", "message": str(e), "pdf_path": pdf_path} |
| 1847 | |
| 1848 | |
| 1849 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1614 | return {"status": "success", "message": f"Context stored with key: {key}"} |
| 1615 | |
| 1616 | except Exception as e: |
| 1617 | return {"status": "error", "message": str(e)} |
| 1618 | |
| 1619 | |
| 1620 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1808 | } |
| 1809 | |
| 1810 | except Exception as e: |
| 1811 | return {"status": "error", "message": str(e)} |
| 1812 | |
| 1813 | |
| 1814 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 536 | } |
| 537 | |
| 538 | except Exception as e: |
| 539 | return {"status": "error", "message": str(e)} |
| 540 | |
| 541 | |
| 542 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1422 | } |
| 1423 | |
| 1424 | except Exception as e: |
| 1425 | return {"status": "error", "message": str(e)} |
| 1426 | |
| 1427 | def generate_s3_optimization_strategy( |
| 1428 | self, |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 766 | } |
| 767 | |
| 768 | except Exception as e: |
| 769 | return {"status": "error", "message": str(e)} |
| 770 | |
| 771 | def detect_schema_from_sample_data( |
| 772 | self, |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2914 | return result |
| 2915 | |
| 2916 | except Exception as e: |
| 2917 | return {"status": "error", "message": str(e)} |
| 2918 | |
| 2919 | |
| 2920 | @app.tool() |
Remediation (AI)
The problem is that exception handlers return str(e) which exposes full exception details including internal paths, library versions, and query structure useful for reconnaissance. Replace all `str(e)` with a generic error message such as "An error occurred processing your request" and log the full exception details server-side using logging.exception(). This hides sensitive information from callers. Verify by triggering an error and confirming the response contains only a generic message, not stack traces or internal paths.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2701 | return result |
| 2702 | |
| 2703 | except Exception as e: |
| 2704 | return {"status": "error", "message": str(e)} |
| 2705 | |
| 2706 | |
| 2707 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
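The correlation-ID variant of the remediation can be sketched with only the standard library; `run_tool` is a hypothetical wrapper name:

```python
import logging
import uuid

logger = logging.getLogger(__name__)

def run_tool(fn):
    """Execute a tool body; on failure, return an opaque error reference."""
    try:
        return fn()
    except Exception:
        # The ID ties the caller-visible error to the full server-side log entry.
        error_id = uuid.uuid4().hex
        logger.exception("tool failed [error_id=%s]", error_id)
        return {"error_id": error_id, "message": "internal error"}
```

Operators can grep the server log for the `error_id` a caller reports, without the caller ever seeing exception detail.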
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2869 | return result |
| 2870 | |
| 2871 | except Exception as e: |
| 2872 | return {"status": "error", "message": str(e)} |
| 2873 | |
| 2874 | |
| 2875 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2208 | } |
| 2209 | |
| 2210 | except Exception as e: |
| 2211 | return {"status": "error", "message": str(e)} |
| 2212 | |
| 2213 | |
| 2214 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2971 | return result |
| 2972 | |
| 2973 | except Exception as e: |
| 2974 | return {"status": "error", "message": str(e)} |
| 2975 | |
| 2976 | |
| 2977 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2294 | } |
| 2295 | |
| 2296 | except Exception as e: |
| 2297 | return {"status": "error", "message": str(e)} |
| 2298 | |
| 2299 | |
| 2300 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1568 | } |
| 1569 | |
| 1570 | except Exception as e: |
| 1571 | return {"status": "error", "message": str(e)} |
| 1572 | |
| 1573 | |
| 1574 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2227 | } |
| 2228 | |
| 2229 | except Exception as e: |
| 2230 | return {"status": "error", "message": str(e)} |
| 2231 | |
| 2232 | def _generate_timestamp_incremental_job( |
| 2233 | self, |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1594 | } |
| 1595 | |
| 1596 | except Exception as e: |
| 1597 | return {"status": "error", "message": str(e)} |
| 1598 | |
| 1599 | |
| 1600 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2115 | } |
| 2116 | |
| 2117 | except Exception as e: |
| 2118 | return {"status": "error", "message": str(e)} |
| 2119 | |
| 2120 | |
| 2121 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2006 | } |
| 2007 | |
| 2008 | except Exception as e: |
| 2009 | return {"status": "error", "message": str(e)} |
| 2010 | |
| 2011 | |
| 2012 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1758 | } |
| 1759 | |
| 1760 | except Exception as e: |
| 1761 | return {"status": "error", "message": str(e)} |
| 1762 | |
| 1763 | |
| 1764 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 638 | } |
| 639 | |
| 640 | except Exception as e: |
| 641 | return {"status": "error", "message": str(e)} |
| 642 | |
| 643 | |
| 644 | # Helper functions for the new tools |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 1970 | } |
| 1971 | |
| 1972 | except Exception as e: |
| 1973 | return {"status": "error", "message": str(e)} |
| 1974 | |
| 1975 | |
| 1976 | @app.tool() |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
LLM consensus
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2070 | } |
| 2071 | |
| 2072 | except Exception as e: |
| 2073 | return {"status": "error", "message": str(e)} |
| 2074 | |
| 2075 | def generate_change_data_capture_job( |
| 2076 | self, |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
Full exception detail or stack trace returned to the caller. Leaking tracebacks exposes internal paths, library versions, and query structure — useful recon for attackers.
Evidence
| 2022 | } |
| 2023 | |
| 2024 | except Exception as e: |
| 2025 | return {"status": "error", "message": str(e)} |
| 2026 | |
| 2027 | def generate_job_bookmark_configuration( |
| 2028 | self, |
Remediation
Log the full exception server-side with a correlation ID; return only {"error_id": id, "message": "internal error"} to the caller. Never enable Flask debug mode in production.
MCP tool input schema exposes an unconstrained string/any field with a risky name (command/query/sql/code/script/url/path/expr/eval). Any caller can pass arbitrary values, which typically widens the tool's blast radius well beyond its intent. Narrow the schema with `.enum()`, `.regex()`, `.max()`, `Literal[...]`, Pydantic `Field(max_length=..., pattern=...)`, or a JSON Schema `enum` / `pattern` / `maxLength`.
Evidence
| 16 | pattern_id: str |
| 17 | pattern_hash: str |
| 18 | description: str |
| 19 | code_template: str |
| 20 | parameters: List[str] |
| 21 | usage_count: int |
| 22 | examples: List[str] |
Remediation
Shape the schema to the tool's actual intent:
- Zod: chain `.enum([...])`, `.regex(/.../)`, or `.max(n)`; prefer `z.enum([...])` or `z.literal(...)` when the value set is small.
- Pydantic: use `Literal["a", "b"]` or `Field(max_length=..., pattern=r"...")`.
- JSON Schema: add `"enum"`, `"pattern"`, or `"maxLength"` to the property.
An overbroad schema is an "overpowered tool": the model has nothing to prevent it from calling the tool with input far beyond what the tool's prose contract promises.
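For a FastMCP server, the Pydantic side of this remediation might look like the sketch below (Pydantic v2 syntax; the model name, field names, size bound, and path pattern are illustrative policy choices, not taken from this package):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class ConvertSqlInput(BaseModel):
    # Free text, but bounded in size.
    query: str = Field(max_length=20_000)
    # Closed vocabulary instead of an open string.
    dialect: Literal["mysql", "postgres", "tsql"] = "mysql"
    # Constrain path-like fields to a safe shape.
    source_file: str = Field(max_length=255, pattern=r"^[\w./-]+\.(sql|pdf)$")
```

An out-of-vocabulary `dialect` or a path outside the pattern now fails validation before the handler runs, raising `ValidationError`.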
MCP tool input schema exposes an unconstrained string/any field with a risky name (command/query/sql/code/script/url/path/expr/eval). Any caller can pass arbitrary values, which typically widens the tool's blast radius well beyond its intent. Narrow the schema with `.enum()`, `.regex()`, `.max()`, `Literal[...]`, Pydantic `Field(max_length=..., pattern=...)`, or a JSON Schema `enum` / `pattern` / `maxLength`.
Evidence
| 50 | @dataclass |
| 51 | class ExtractedSQL: |
| 52 | """Represents an extracted SQL query with metadata.""" |
| 53 | |
| 54 | query: str |
| 55 | source_file: str |
| 56 | page_number: Optional[int] = None |
| 57 | line_number: Optional[int] = None |
Remediation
Shape the schema to the tool's actual intent:
- Zod: chain `.enum([...])`, `.regex(/.../)`, or `.max(n)`; prefer `z.enum([...])` or `z.literal(...)` when the value set is small.
- Pydantic: use `Literal["a", "b"]` or `Field(max_length=..., pattern=r"...")`.
- JSON Schema: add `"enum"`, `"pattern"`, or `"maxLength"` to the property.
An overbroad schema is an "overpowered tool": the model has nothing to prevent it from calling the tool with input far beyond what the tool's prose contract promises.
MCP tool input schema exposes an unconstrained string/any field with a risky name (command/query/sql/code/script/url/path/expr/eval). Any caller can pass arbitrary values, which typically widens the tool's blast radius well beyond its intent. Narrow the schema with `.enum()`, `.regex()`, `.max()`, `Literal[...]`, Pydantic `Field(max_length=..., pattern=...)`, or a JSON Schema `enum` / `pattern` / `maxLength`.
Evidence
| 38 | id: Optional[int] |
| 39 | pattern_hash: str |
| 40 | pattern_description: str |
| 41 | code_template: str |
| 42 | usage_count: int |
| 43 | created_at: Optional[str] = None |
Remediation
Shape the schema to the tool's actual intent:
- Zod: chain `.enum([...])`, `.regex(/.../)`, or `.max(n)`; prefer `z.enum([...])` or `z.literal(...)` when the value set is small.
- Pydantic: use `Literal["a", "b"]` or `Field(max_length=..., pattern=r"...")`.
- JSON Schema: add `"enum"`, `"pattern"`, or `"maxLength"` to the property.
An overbroad schema is an "overpowered tool": the model has nothing to prevent it from calling the tool with input far beyond what the tool's prose contract promises.
MCP tool input schema exposes an unconstrained string/any field with a risky name (command/query/sql/code/script/url/path/expr/eval). Any caller can pass arbitrary values, which typically widens the tool's blast radius well beyond its intent. Narrow the schema with `.enum()`, `.regex()`, `.max()`, `Literal[...]`, Pydantic `Field(max_length=..., pattern=...)`, or a JSON Schema `enum` / `pattern` / `maxLength`.
Evidence
| 42 | @dataclass |
| 43 | class ConversionResult: |
| 44 | """Result of converting a single SQL query to PySpark.""" |
| 45 | |
| 46 | sql_query: str |
| 47 | pyspark_code: str |
| 48 | optimizations: List[str] |
| 49 | success: bool |
Remediation
Shape the schema to the tool's actual intent:
- Zod: chain `.enum([...])`, `.regex(/.../)`, or `.max(n)`; prefer `z.enum([...])` or `z.literal(...)` when the value set is small.
- Pydantic: use `Literal["a", "b"]` or `Field(max_length=..., pattern=r"...")`.
- JSON Schema: add `"enum"`, `"pattern"`, or `"maxLength"` to the property.
An overbroad schema is an "overpowered tool": the model has nothing to prevent it from calling the tool with input far beyond what the tool's prose contract promises.
MCP tool input schema exposes an unconstrained string/any field with a risky name (command/query/sql/code/script/url/path/expr/eval). Any caller can pass arbitrary values, which typically widens the tool's blast radius well beyond its intent. Narrow the schema with `.enum()`, `.regex()`, `.max()`, `Literal[...]`, Pydantic `Field(max_length=..., pattern=...)`, or a JSON Schema `enum` / `pattern` / `maxLength`.
Evidence
| 29 | """Represents a match of a pattern in code.""" |
| 30 | |
| 31 | pattern_id: str |
| 32 | code_snippet: str |
| 33 | start_line: int |
| 34 | end_line: int |
| 35 | confidence: float |
Remediation
Shape the schema to the tool's actual intent:
- Zod: chain `.enum([...])`, `.regex(/.../)`, or `.max(n)`; prefer `z.enum([...])` or `z.literal(...)` when the value set is small.
- Pydantic: use `Literal["a", "b"]` or `Field(max_length=..., pattern=r"...")`.
- JSON Schema: add `"enum"`, `"pattern"`, or `"maxLength"` to the property.
An overbroad schema is an "overpowered tool": the model has nothing to prevent it from calling the tool with input far beyond what the tool's prose contract promises.
MCP manifest declares tools but no authentication field is present (none of: auth, authorization, bearer, oauth, mtls, apiKey, api_key, basic, token, authToken). Absence is a weak signal — confirm whether the server relies on network-layer or host-level auth, or declare the real mechanism explicitly so reviewers can audit it.
Evidence
| 1 | Metadata-Version: 2.4 |
| 2 | Name: pyspark-tools |
| 3 | Version: 0.0.4 |
| 4 | Summary: MCP server for SQL migration, AWS Glue job generation, and PySpark optimization |
| 5 | Author-email: Annas Mazhar <annas.mazhar10@gmail.com> |
| 6 | Project-URL: Homepage, https://github.com/AnnasMazhar/pyspark_mcp |
| 7 | Project-URL: Repository, https://github.com/AnnasMazhar/pyspark_mcp |
| 8 | Project-URL: Issues, https://github.com/AnnasMazhar/pyspark_mcp/issues |
| 9 | Requires-Python: >=3.10 |
| 10 | Description-Content-Type: text/markdown |
| 11 | License-File: LICENSE |
| 12 | Requires- |
Remediation
Declare a real authentication mechanism in the manifest, matching what the running server actually enforces:
- `"auth": "bearer"` with a token scheme documented for callers
- `"auth": "oauth"` / `"oauth2": { ... }` for delegated flows
- `"apiKey": { "header": "X-API-Key", "prefix": "..." }`
- `"mtls": true` when client certificates are required
If the server is intentionally unauthenticated (stdio-only, local developer tool, trusted-host network), state that assumption explicitly in the manifest so reviewers can audit it.
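A manifest fragment along these lines, sketched as a Python dict for illustration (the exact manifest schema depends on the registry you publish to; the field names follow the remediation text above):

```python
# Illustrative manifest fragment; keep whichever auth entry matches what
# the running server actually enforces.
manifest = {
    "name": "pyspark-tools",
    "version": "0.0.4",
    "auth": "bearer",  # bearer-token scheme documented for callers
    # Alternatives, depending on the deployment:
    # "oauth2": {"flows": "authorization_code"}   # delegated flows
    # "apiKey": {"header": "X-API-Key"}           # key-based access
    # "mtls": True                                # client certificates
}
```

Reviewers can then check the declared mechanism against the server's actual enforcement instead of guessing.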
MCP manifest declares tools but no authentication field is present (none of: auth, authorization, bearer, oauth, mtls, apiKey, api_key, basic, token, authToken). Absence is a weak signal — confirm whether the server relies on network-layer or host-level auth, or declare the real mechanism explicitly so reviewers can audit it.
Evidence
| 1 | Metadata-Version: 2.4 |
| 2 | Name: pyspark-tools |
| 3 | Version: 0.0.4 |
| 4 | Summary: MCP server for SQL migration, AWS Glue job generation, and PySpark optimization |
| 5 | Author-email: Annas Mazhar <annas.mazhar10@gmail.com> |
| 6 | Project-URL: Homepage, https://github.com/AnnasMazhar/pyspark_mcp |
| 7 | Project-URL: Repository, https://github.com/AnnasMazhar/pyspark_mcp |
| 8 | Project-URL: Issues, https://github.com/AnnasMazhar/pyspark_mcp/issues |
| 9 | Requires-Python: >=3.10 |
| 10 | Description-Content-Type: text/markdown |
| 11 | License-File: LICENSE |
| 12 | Requires- |
Remediation
Declare a real authentication mechanism in the manifest, matching what the running server actually enforces:
- `"auth": "bearer"` with a token scheme documented for callers
- `"auth": "oauth"` / `"oauth2": { ... }` for delegated flows
- `"apiKey": { "header": "X-API-Key", "prefix": "..." }`
- `"mtls": true` when client certificates are required
If the server is intentionally unauthenticated (stdio-only, local developer tool, trusted-host network), state that assumption explicitly in the manifest so reviewers can audit it.
MCP manifest declares tools but no authentication field is present (none of: auth, authorization, bearer, oauth, mtls, apiKey, api_key, basic, token, authToken). Absence is a weak signal — confirm whether the server relies on network-layer or host-level auth, or declare the real mechanism explicitly so reviewers can audit it.
Evidence
| 1 | # PySpark MCP Server |
| 2 | |
| 3 | |
| 4 | |
| 5 | SQL migration assistance, AWS Glue job generation, and Spark code optimization — as an MCP server. |
| 6 | |
| 7 | [](https://github.com/AnnasMazhar/pyspark_mcp/actions/workflows/ci.yml) |
| 8 | [](https://www.python.org/downloads/) |
| 9 | [](https://opensource.org/licenses/MIT |
Remediation
Declare a real authentication mechanism in the manifest, matching what the running server actually enforces:
- `"auth": "bearer"` with a token scheme documented for callers
- `"auth": "oauth"` / `"oauth2": { ... }` for delegated flows
- `"apiKey": { "header": "X-API-Key", "prefix": "..." }`
- `"mtls": true` when client certificates are required
If the server is intentionally unauthenticated (stdio-only, local developer tool, trusted-host network), state that assumption explicitly in the manifest so reviewers can audit it.
PySpark Tools FastMCP server emits notifications/tools/list_changed but tools/list response entries lack content-bound integrity fields (version, etag, digest, sha256, hash), enabling undetected tool list rotation attacks.
Evidence
| 1 | """FastMCP server for SQL to PySpark conversion with code review and optimization.""" |
| 2 | |
| 3 | import json |
| 4 | import os |
Remediation
Either:
1. Drop `notifications/tools/list_changed` from the server's capabilities and keep the tool list immutable for the lifetime of the connection, OR
2. Add a content-bound `version` / `etag` / `digest` field to each tool entry in `tools/list` responses. Recompute it whenever the handler / description / schema changes. The client can then surface an approval prompt on change.
LLM consensus
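Option 2 can be sketched with only the standard library. The entry fields (`name`, `description`, `inputSchema`) follow the MCP tools/list shape; `tool_digest` is a hypothetical helper, not part of FastMCP:

```python
import hashlib
import json

def tool_digest(name: str, description: str, input_schema: dict) -> str:
    """Content-bound digest: changes whenever the tool's contract changes."""
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable
    # across processes for identical content.
    canonical = json.dumps(
        {"name": name, "description": description, "inputSchema": input_schema},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A client that pins the digest at approval time can detect any later rotation of the tool's description or schema.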
MCP server holds a mutable shared container (cache / store / state / pool / registry / sessions / results / outputs) and mutates it via append / push / add / extend, but no caller-identity marker (user_id / session_id / caller_id / request_id / org_id / tenant_id / actor_id / subject / principal) appears anywhere in the file. A process-global list mutated from inside a tool handler with no caller partition leaks data across requests: a later caller can read what an earlier caller wrote.
Evidence
| 1 | """FastMCP server for SQL to PySpark conversion with code review and optimization.""" |
| 2 | |
| 3 | import json |
| 4 | import os |
| 5 | import re |
| 6 | from typing import Any, Dict, List, Optional, Union |
| 7 | |
| 8 | from fastmcp import FastMCP |
| 9 | |
| 10 | from .advanced_optimizer import AdvancedOptimizer |
| 11 | from .aws_glue_integration import ( |
| 12 | AWSGlueIntegration, |
| 13 | DataCatalogTable, |
| 14 | DataFormat, |
| 15 | GlueJobConfig, |
| 16 | GlueJobType, |
| 17 | ) |
| 18 | from .batch_processor import BatchProcessor |
| 19 | from .code_reviewer import PySparkCodeReviewer |
| 20 | from .data_source_anal |
Remediation
Partition shared state by caller identity, or eliminate it.

Wrong:

    _cache = {}

    @mcp.tool()
    def search(query: str) -> list:
        if query in _cache:
            return _cache[query]
        result = _expensive(query)
        _cache[query] = result
        return result

Right (key by caller):

    _cache: dict[str, dict] = {}

    @mcp.tool()
    def search(query: str, ctx) -> list:
        user_cache = _cache.setdefault(ctx.user_id, {})
        if query in user_cache:
            return user_cache[query]
        result = _expensive(query)
        user_cache[query] = result
        return result
LLM consensus
MCP server holds a mutable shared container (cache / store / state / context / pool / registry / sessions / results / outputs) and writes to it via subscript assignment, but no caller-identity marker (user_id / session_id / caller_id / request_id / org_id / tenant_id / actor_id / subject / principal) appears anywhere in the file. A process-global cache written from inside a tool handler with no caller partition is a cross-request data path: one user's tool result can be served to another user.
Evidence
| 1 | """FastMCP server for SQL to PySpark conversion with code review and optimization.""" |
| 2 | |
| 3 | import json |
| 4 | import os |
| 5 | import re |
| 6 | from typing import Any, Dict, List, Optional, Union |
| 7 | |
| 8 | from fastmcp import FastMCP |
| 9 | |
| 10 | from .advanced_optimizer import AdvancedOptimizer |
| 11 | from .aws_glue_integration import ( |
| 12 | AWSGlueIntegration, |
| 13 | DataCatalogTable, |
| 14 | DataFormat, |
| 15 | GlueJobConfig, |
| 16 | GlueJobType, |
| 17 | ) |
| 18 | from .batch_processor import BatchProcessor |
| 19 | from .code_reviewer import PySparkCodeReviewer |
| 20 | from .data_source_anal |
Remediation
Partition shared state by caller identity, or eliminate it.

Wrong:

    _cache = {}

    @mcp.tool()
    def search(query: str) -> list:
        if query in _cache:
            return _cache[query]
        result = _expensive(query)
        _cache[query] = result
        return result

Right (key by caller):

    _cache: dict[str, dict] = {}

    @mcp.tool()
    def search(query: str, ctx) -> list:
        user_cache = _cache.setdefault(ctx.user_id, {})
        if query in user_cache:
            return user_cache[query]
        result = _expensive(query)
        user_cache[query] = result
        return result
LLM consensus
Silent error swallowing detected. An except clause whose body is `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 1588 | col_name = str(select_expr) |
| 1589 | col_name = col_name.strip('"').strip("'") |
| 1590 | columns.append(f"col('{col_name}')") |
| 1591 | continue |
| 1592 | except: |
| 1593 | pass |
| 1594 | |
| 1595 | if isinstance(expr, sqlglot.expressions.Column): |
| 1596 | table = ( |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
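The recommended minimum can be sketched against the evidence above. The function and logger names are illustrative, not the package's real identifiers; the pattern is simply "catch, log with traceback, return a safe fallback":

```python
import logging
from typing import Optional

logger = logging.getLogger("pyspark_tools.converter")

def normalize_expr(select_expr) -> Optional[str]:
    try:
        return str(select_expr).strip('"').strip("'")
    except Exception:
        # Log with traceback instead of silently discarding the error;
        # logger.exception() attaches the active traceback automatically.
        logger.exception("failed to normalize select expression")
        return None

class Broken:
    def __str__(self):
        raise ValueError("unprintable expression")

print(normalize_expr("'col_a'"))  # → col_a
print(normalize_expr(Broken()))   # → None, with the failure in the log
```

Compared with `except: pass`, the fallback behavior is identical, but every failure now leaves a traceback for incident response.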
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 6 | """CLI entry point for pyspark-mcp server.""" |
| 7 | from pyspark_tools.server import app |
| 8 | try: |
| 9 | app.run() |
| 10 | except KeyboardInterrupt: |
| 11 | pass |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 1380 | sqlglot.expressions.Where |
| 1381 | ): |
| 1382 | where_expr = parsed_sql.find(sqlglot.expressions.Where) |
| 1383 | return self._convert_expression_to_filter(where_expr.this) |
| 1384 | except: |
| 1385 | pass |
| 1386 | return None |
| 1387 | |
| 1388 | def _convert_expression_to_filter(self, expr) -> str: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
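When ignoring an exception really is the intended behavior, the remediation asks for a comment; the standard library's `contextlib.suppress` makes the intent explicit and keeps the suppression narrow. The attribute names below are illustrative, not sqlglot's real API:

```python
import contextlib
from types import SimpleNamespace
from typing import Optional

def extract_where(parsed_sql) -> Optional[str]:
    # Intentional, documented suppression: an AST without the expected
    # shape simply means "no WHERE clause". suppress() catches only
    # AttributeError, unlike a bare `except: pass` that hides everything.
    with contextlib.suppress(AttributeError):
        return str(parsed_sql.where_node.this)
    return None

ast = SimpleNamespace(where_node=SimpleNamespace(this="amount > 100"))
print(extract_where(ast))       # → amount > 100
print(extract_where(object()))  # → None
```

A narrow exception type plus a comment turns "swallowed error" into "documented absence", which is what reviewers and scanners can verify.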
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 319 | try: |
| 320 | for cte in parsed_sql.find_all(sqlglot.expressions.CTE): |
| 321 | if hasattr(cte, "alias") and cte.alias: |
| 322 | cte_names.add(str(cte.alias).strip('"').strip("'")) |
| 323 | except Exception: |
| 324 | pass |
| 325 | |
| 326 | all_tables = self._extract_all_tables(parsed_sql) |
| 327 | if all_tables: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 1356 | if hasattr(table, "alias") and table.alias |
| 1357 | else table_name |
| 1358 | ) |
| 1359 | tables.append((table_name, table_alias)) |
| 1360 | except: |
| 1361 | pass |
| 1362 | return tables |
| 1363 | |
| 1364 | def _extract_all_tables(self, parsed_sql) -> List[str]: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 491 | for expr in parsed.expressions: |
| 492 | col_str = self._convert_expression_to_pyspark(expr, dialect) |
| 493 | cols.append(col_str) |
| 494 | return ", ".join(cols) if cols else "" |
| 495 | except Exception: |
| 496 | pass |
| 497 | return "" |
| 498 | |
| 499 | def _extract_cte_from(self, parsed) -> str: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 524 | group = parsed.find(sqlglot.expressions.Group) |
| 525 | if group and hasattr(group, "expressions"): |
| 526 | cols = [f"'{str(e)}'" for e in group.expressions] |
| 527 | return ", ".join(cols) |
| 528 | except Exception: |
| 529 | pass |
| 530 | return "" |
| 531 | |
| 532 | def _handle_subqueries(self, parsed_sql, dialect: str) -> List[str]: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 558 | sub_from = str(sub_table.name) |
| 559 | sub_where_node = subquery.this.find(sqlglot.expressions.Where) |
| 560 | if sub_where_node: |
| 561 | sub_where = str(sub_where_node.this) |
| 562 | except Exception: |
| 563 | pass |
| 564 | |
| 565 | code_lines.append(f"# Subquery {i+1}: {sub_spark[:80]}") |
| 566 | if sub_from: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 513 | try: |
| 514 | where = parsed.find(sqlglot.expressions.Where) |
| 515 | if where: |
| 516 | return str(where.this) |
| 517 | except Exception: |
| 518 | pass |
| 519 | return "" |
| 520 | |
| 521 | def _extract_cte_groupby(self, parsed) -> str: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 503 | if from_clause: |
| 504 | table = from_clause.find(sqlglot.expressions.Table) |
| 505 | if table: |
| 506 | return str(table.name) |
| 507 | except Exception: |
| 508 | pass |
| 509 | return "" |
| 510 | |
| 511 | def _extract_cte_where(self, parsed) -> str: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 1652 | col_ref += ".asc()" |
| 1653 | |
| 1654 | order_cols.append(col_ref) |
| 1655 | return ", ".join(order_cols) |
| 1656 | except: |
| 1657 | pass |
| 1658 | return None |
| 1659 | |
| 1660 | def _convert_where_to_filter(self, where_str: str) -> str: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 995 | if RESOURCE_MANAGEMENT_AVAILABLE: |
| 996 | try: |
| 997 | resource_manager = get_resource_manager() |
| 998 | resource_manager.cleanup_all() |
| 999 | except Exception: |
| 1000 | pass |
| 1001 | |
| 1002 | # Note: Individual connections are closed in their respective methods |
| 1003 | # using 'with' statements or explicit close() calls |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
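Cleanup paths are where swallowed exceptions hurt most, because a silent failure leaks the resource. A sketch of logging-and-continuing cleanup (the `cleanup_all` signature and resource classes below are illustrative, not the package's real resource manager):

```python
import logging

logger = logging.getLogger("pyspark_tools.cleanup")

def cleanup_all(resources) -> int:
    """Close every resource; log failures instead of hiding them."""
    failures = 0
    for res in resources:
        try:
            res.close()
        except Exception:
            failures += 1
            # Keep releasing the remaining resources, but leave a trace.
            logger.exception("cleanup failed for %r", res)
    return failures

class Good:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class Bad:
    def close(self):
        raise RuntimeError("boom")

good = Good()
print(cleanup_all([Bad(), good]))  # → 1
print(good.closed)                 # → True: later resources still closed
```

Returning the failure count also gives callers a hook for the "emit a metric" half of the remediation.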
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 1368 | if hasattr(parsed_sql, "find_all"): |
| 1369 | for table in parsed_sql.find_all(sqlglot.expressions.Table): |
| 1370 | if hasattr(table, "name"): |
| 1371 | tables.add(table.name) |
| 1372 | except: |
| 1373 | pass |
| 1374 | return sorted(list(tables)) |
| 1375 | |
| 1376 | def _extract_where_clause(self, parsed_sql) -> Optional[str]: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.
Silent error swallowing detected. An `except` clause whose body is only `pass` or `...` discards the exception with no log, no metric, and no trace. This blinds incident response and hides real failures.
Evidence
| 1608 | else: |
| 1609 | col_str = str(expr).strip('"').strip("'") |
| 1610 | columns.append(f"col('{col_str}')") |
| 1611 | return ", ".join(columns) |
| 1612 | except: |
| 1613 | pass |
| 1614 | return None |
| 1615 | |
| 1616 | def _extract_order_by_clause(self, parsed_sql) -> Optional[str]: |
Remediation
Log the exception at minimum (`logger.exception(e)`), emit a metric, or re-raise if the error is not recoverable. If you genuinely want to ignore an exception, say so with a comment.