DEEP DIVE

Apache Superset MCP Service: A Technical Deep Dive

Amin Ghadersohi
8 min read
1,584 words

This is the technical reference for Apache Superset's MCP service. Bookmark it. Come back when you're integrating, deploying, or extending.

If you want the narrative first — what MCP is, why we built it, and what it unlocks for analytics — start with Apache Superset MCP Service: A Library-First Architecture with FastMCP.

Four design decisions shape everything that follows:

Library-first architecture — The MCP service imports Superset as a Python library rather than proxying its web API. Independent scaling, clean async support, and unit tests without a full web context.

Preview-first workflow — generate_chart and update_chart default to non-destructive previews. LLM agents explore and iterate freely without polluting the database. Persistence is always explicit.

Zero privilege escalation — The MCP service cannot grant permissions that don't exist in Superset. Every tool call flows through the same DAOs and Security Manager as the web UI — RBAC, RLS, and column restrictions apply identically.

Token budget management — Responses are bounded at 25K tokens by default. The middleware warns at 80% capacity and blocks at the limit, keeping LLM context windows from being overwhelmed.

Everything else in this post is detail. If you have any questions, please reach out.


Framework: FastMCP 3.1+ · Transport: Streamable HTTP · Protocol: JSON-RPC 2.0 · Tools: 20 · Auth: JWT + RBAC · Python: 3.10+


How MCP Connects AI to Superset

The Model Context Protocol is an open standard released by Anthropic that defines how AI applications discover and invoke external tools. Think of it as a USB-C port for AI — a universal interface that lets any compliant LLM client talk to any compliant tool server.

The Superset MCP service exposes the full analytics capability — dashboards, charts, datasets, SQL execution — as 20 discrete tools that LLM agents can discover, understand, and invoke through structured JSON-RPC calls.
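Concretely, here is a hedged sketch of what one of those structured JSON-RPC calls looks like on the wire. The tools/call method comes from the MCP specification; the tool arguments shown are illustrative, not a required shape:

```python
import json

# JSON-RPC 2.0 envelope for invoking the list_charts tool via MCP's
# "tools/call" method. The argument values are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_charts",
        "arguments": {"page": 1, "page_size": 10, "search": "revenue"},
    },
}
wire_payload = json.dumps(request)  # sent over Streamable HTTP
```

Any compliant LLM client produces this envelope; the server replies with a matching JSON-RPC result containing the tool's output.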


System Architecture

The Library-First Decision

The most important architectural decision: Superset runs as an imported library, not as a Flask web app. The MCP service creates its own minimal Flask context and imports Superset's DAOs, models, and commands directly.

Original Approach

Embed MCP inside Flask via ASGI middleware.

  • Tight coupling to request lifecycles
  • Framework conflicts between sync and async
  • Scaling required scaling the whole web app
  • Testing required full web context

Current Approach

Standalone FastMCP service that imports Superset as a library.

  • Independent process and scaling
  • Clean async support via FastMCP
  • Unit testing without web framework
  • Separate deployment and monitoring

Full Architecture Diagram

Request Lifecycle

Every tool invocation follows this precise path through the system:

Every request passes through the auth hook, which loads the authenticated user into Flask's `g.user`. All subsequent DAO operations automatically apply that user's RBAC permissions and row-level security rules — the same permissions enforced in the web UI.
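A minimal sketch of that lifecycle step — here `g` stands in for Flask's request-scoped `g`, and `resolve_user_from_jwt` for the real JWT verification; only the shape of the flow is taken from the post:

```python
from types import SimpleNamespace

g = SimpleNamespace(user=None)  # stand-in for flask.g

def resolve_user_from_jwt(token: str) -> SimpleNamespace:
    # Real code verifies the JWT and loads the Superset user row;
    # we fake a Viewer-role user for illustration.
    return SimpleNamespace(username="alice", roles=["Viewer"])

def auth_hook(token: str) -> None:
    # Runs before the tool function; later DAO calls read g.user.
    g.user = resolve_user_from_jwt(token)

def dao_list_dashboards() -> str:
    # Stand-in DAO call: results are scoped to whoever is on g.user.
    return f"dashboards visible to {g.user.username} ({g.user.roles[0]})"

auth_hook("eyJhbGciOi...")  # placeholder token
summary = dao_list_dashboards()
```

Because the user is resolved once per request and every DAO call reads the same identity, the MCP path cannot see more than the web UI would show that user.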

Complete Tool Catalog

| Tool | Domain | Purpose | Key Parameters |
| --- | --- | --- | --- |
| list_charts | Chart | List charts with filtering, search, pagination | page, page_size, filters, search, select_columns |
| get_chart_info | Chart | Get chart details by ID, UUID, or slug | identifier, form_data_key |
| get_chart_data | Chart | Retrieve chart query results as text | identifier, format, limit, use_cache |
| get_chart_preview | Chart | Generate visual PNG preview of chart | identifier, format, width, height |
| generate_chart | Chart | Create chart with optional save | dataset_id, config, save_chart, chart_name, generate_preview |
| update_chart | Chart | Modify existing chart configuration | identifier, config, chart_name, generate_preview |
| update_chart_preview | Chart | Preview changes without saving | form_data_key, dataset_id, config, generate_preview |
| list_dashboards | Dashboard | List dashboards with filtering and search | page, page_size, filters, search |
| get_dashboard_info | Dashboard | Get dashboard details and chart positions | identifier |
| generate_dashboard | Dashboard | Create new dashboard from chart IDs | dashboard_title, chart_ids |
| add_chart_to_existing_dashboard | Dashboard | Add a chart to an existing dashboard | dashboard_id, chart_id |
| list_datasets | Dataset | List datasets with metadata control | page, page_size, filters, search |
| get_dataset_info | Dataset | Get dataset schema, columns, metrics | identifier |
| execute_sql | SQL Lab | Execute SQL with RLS enforcement | database_id, sql, schema_name, limit |
| open_sql_lab_with_context | SQL Lab | Generate SQL Lab URL with pre-filled query | database_connection_id, sql, schema_name, title |
| save_sql_query | SQL Lab | Save a SQL query to Saved Queries | database_id, label, sql, description |
| generate_explore_link | Explore | Generate interactive Explore URL | dataset_id, config |
| get_instance_info | System | Instance metadata, user info, entity counts | none |
| get_schema | System | Discover filterable and sortable columns | model_type |
| health_check | System | Connectivity and version check | none |

Chart Tools — 7 tools

7 tools covering the complete chart lifecycle: discovery, retrieval, creation, modification, and preview.

The generate_chart tool implements a 4-layer validation pipeline:

Schema → Dataset → Runtime → Column Normalization

Key patterns:

  • get_chart_info supports two identifier types: integer ID and UUID string
  • get_chart_data returns CSV-like text optimized for LLM consumption, not raw JSON
  • get_chart_preview uses a WebDriver pool for PNG screenshot generation
  • generate_chart runs a compilation check with row_limit=2 before saving
  • update_chart_preview is non-destructive — original chart remains unchanged
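The compilation check called out above can be sketched as follows — this is an illustration of the idea, not Superset's actual code; `fake_run_query` is a stand-in for the real query path:

```python
# Probe the chart config with row_limit=2 so a broken config fails fast,
# before anything is persisted to the database.
def compile_check(run_query, config: dict) -> bool:
    probe = {**config, "row_limit": 2}  # tiny, cheap probe query
    try:
        run_query(probe)
        return True
    except Exception:
        return False

def fake_run_query(cfg: dict):
    # Stand-in query runner: rejects configs without metrics.
    if not cfg.get("metrics"):
        raise ValueError("chart config has no metrics")
    return [{"value": 1}, {"value": 2}]

ok = compile_check(fake_run_query, {"metrics": ["count"], "row_limit": 10000})
bad = compile_check(fake_run_query, {"row_limit": 10000})
```

The probe costs at most two rows regardless of the user's configured row limit, which is what makes validating before save cheap.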

Dashboard Tools — 4 tools

4 tools for dashboard discovery, inspection, creation, and modification.

  • get_dashboard_info returns chart positions in the grid layout alongside chart metadata
  • generate_dashboard auto-generates layout with proper grid positioning
  • add_chart_to_existing_dashboard auto-calculates the new chart position in the grid

Dataset Tools — 2 tools

2 tools for dataset discovery and schema inspection.

  • list_datasets includes cache control flags: use_cache, refresh_metadata
  • get_dataset_info returns full column metadata including types, descriptions, and available metrics

SQL Lab and Explore Tools — 4 tools

4 tools bridging MCP and Superset's query engine.

  • execute_sql enforces Row-Level Security automatically — the same RLS rules apply as in the web UI
  • execute_sql supports Jinja2 template parameters for dynamic queries
  • open_sql_lab_with_context generates URLs with pre-filled query state
  • save_sql_query persists a query to Saved Queries with an optional label and description
  • generate_explore_link creates interactive Explore URLs for visual chart building
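As an illustration of a templated execute_sql call: database_id, schema_name, sql, and limit come from the tool catalog above, while the `template_params` key is an assumption about how template values might be passed:

```python
# Hypothetical argument shape for execute_sql with a Jinja2-templated
# query. Superset renders {{ ... }} placeholders server-side.
call = {
    "name": "execute_sql",
    "arguments": {
        "database_id": 1,
        "schema_name": "public",
        "sql": "SELECT region, SUM(sales) FROM orders "
               "WHERE country = {{ country }} GROUP BY region",
        "limit": 100,
        "template_params": {"country": "'US'"},  # assumed parameter name
    },
}
```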

Core Framework Patterns

The MCP service is built on four reusable core classes that standardize tool behavior and eliminate boilerplate.

ModelListCore — The Workhorse

The most-used core class, powering all list_* tools. A single instantiation handles:

  • Pagination: 0-based internally, 1-based in requests
  • Filtering: JSON filter objects or structured filter types
  • Full-text search: Across configured columns
  • Column selection: Return only requested columns
  • Sorting: Any sortable column, asc or desc
  • RBAC enforcement: Automatic via the DAO layer
# How list_dashboards uses ModelListCore
tool = ModelListCore(
    dao_class=DashboardDAO,
    output_schema=DashboardInfo,
    filter_type=DashboardFilter,
    list_field_name="result",
    search_columns=["dashboard_title", "slug", "uuid"],
    default_columns=["id", "dashboard_title", "slug", "uuid"],
    sortable_columns=["id", "dashboard_title", "slug", "changed_on"],
    item_serializer=serialize_dashboard_object,
    output_list_schema=DashboardList,
)
return tool.run_tool(page=request.page - 1, page_size=request.page_size)

App Factory Pattern

The MCP service uses a factory function to create FastMCP instances:

def create_mcp_app(
    name: str | None = None,
    instructions: str | None = None,
    auth: Any | None = None,
    middleware: list[Middleware] | None = None,
    config: dict | None = None,
    **kwargs,
) -> FastMCP:
    """Create FastMCP instance with custom auth, middleware, etc."""
    build_kwargs = {
        "name": name,
        "instructions": instructions,
        "auth": auth,
        "middleware": middleware,
        **kwargs,
    }
    # Drop unset options so FastMCP applies its own defaults; merging
    # `config` with Superset's app.config is elided here.
    return FastMCP(**{k: v for k, v in build_kwargs.items() if v is not None})

# Module-level singleton
mcp = create_mcp_app()

The factory enables:

  • Custom auth providers for different environments
  • Middleware composition for rate limiting, size guards, logging
  • Configuration inheritance from Superset's app.config

Flask Singleton — Smart Initialization

The MCP service needs a Flask app context for database access but runs as a separate process. The singleton uses intelligent detection:
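A minimal sketch of the detection idea — the function and stand-in objects below are illustrative, not Superset's actual API:

```python
# Reuse an active Flask app if one exists; otherwise build a minimal one.
def get_flask_app(active_app, create_minimal_app):
    if active_app is not None:   # embedded mode: inside the Superset web app
        return active_app
    return create_minimal_app()  # standalone mode: minimal app of our own

existing = object()  # stands in for a running Superset Flask app
minimal = object()   # stands in for the minimal standalone app

standalone_choice = get_flask_app(None, lambda: minimal)
embedded_choice = get_flask_app(existing, lambda: minimal)
```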

This enables the MCP service to work correctly whether started:

  • Standalone: Creates its own minimal Flask app
  • Embedded: Reuses the existing Superset Flask app

Authentication and Security

Authentication Flow

Authentication Modes

| Mode | Config | Use Case | Security |
| --- | --- | --- | --- |
| Development | MCP_DEV_USERNAME = "admin" | Local development, testing | Single user, no token |
| RS256 + JWKS | MCP_JWKS_URI · MCP_JWT_ISSUER | Production with IdP | Key rotation, standard OAuth2 |
| RS256 + Static Key | MCP_JWT_PUBLIC_KEY | Production with fixed key pair | Asymmetric, no shared secret |
| HS256 | MCP_JWT_SECRET | Simple deployments | Shared secret required |

The MCP service cannot grant permissions that don't exist in Superset. If a user is a Viewer in the web UI, they remain a Viewer through MCP. Row-level security, column restrictions, and RBAC all apply identically — because the same DAOs and Security Manager are used.
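For the RS256 + JWKS mode, a superset_config.py fragment might look like the sketch below. The setting names come from the configuration reference later in this post; the URLs and audience value are placeholders:

```python
# superset_config.py — hypothetical RS256 + JWKS setup
MCP_AUTH_ENABLED = True
MCP_JWT_ALGORITHM = "RS256"
MCP_JWKS_URI = "https://idp.example.com/.well-known/jwks.json"  # placeholder
MCP_JWT_ISSUER = "https://idp.example.com/"                     # placeholder
MCP_JWT_AUDIENCE = "superset-mcp"                               # placeholder
```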

Middleware Architecture

The middleware pipeline processes every request in order. Each layer can inspect, modify, or reject requests before they reach the tool function. There are 7 layers in total.

| Middleware | Purpose | Key Behavior | Default |
| --- | --- | --- | --- |
| RateLimitMiddleware | Prevent abuse | Sliding window algorithm, Redis or in-memory | Disabled |
| ResponseSizeGuardMiddleware | Protect LLM context | Estimates token count, warns at 80%, blocks at 25K | Enabled |
| FieldPermissionsMiddleware | Column-level security | Strips fields the user cannot access | Enabled |
| LoggingMiddleware | Audit trail | Logs tool calls with user, params, duration | Enabled |
| GlobalErrorHandlerMiddleware | Consistent errors | Sanitizes sensitive data from error messages | Enabled |
| PrivateToolMiddleware | Hide internal tools | Blocks tools tagged as private | Enabled |
| StructuredContentStripperMiddleware | FastMCP 3.x compat | Strips structuredContent from responses for Claude.ai bridge compatibility | Enabled |

Error Sanitization

The GlobalErrorHandlerMiddleware automatically redacts sensitive information before it reaches the LLM client:

  • Database connection strings — replaced with generic messages
  • API keys and tokens — redacted from error traces
  • File system paths — stripped to prevent information disclosure
  • IP addresses — removed from error context

All regex patterns are bounded to prevent ReDoS attacks.
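To make the bounding concrete, here is a hedged sketch of what such redaction patterns might look like — these are not the service's actual regexes:

```python
import re

# Note the bounded quantifiers ({1,256} instead of + or *): that is the
# property the post calls out for avoiding ReDoS.
PATTERNS = [
    (re.compile(r"postgresql://[^\s]{1,256}"), "[REDACTED DB URI]"),
    (re.compile(r"(?i)api[_-]?key[=:]\s?[\w-]{1,128}"), "[REDACTED KEY]"),
]

def sanitize(message: str) -> str:
    for pattern, replacement in PATTERNS:
        message = pattern.sub(replacement, message)
    return message

cleaned = sanitize("connect failed: postgresql://admin:hunter2@db:5432/prod")
```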

Token Budget Management

The ResponseSizeGuardMiddleware prevents overwhelming LLM context windows:

# Configuration in superset_config.py
MCP_RESPONSE_SIZE_CONFIG = {
    "enabled": True,
    "token_limit": 25000,
    "warn_threshold_pct": 80,
    "excluded_tools": ["health_check", "get_chart_preview", "generate_explore_link", "open_sql_lab_with_context"],
}

When a response exceeds the limit, the middleware returns a structured error suggesting the client use pagination or column selection to reduce the response size.
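A hedged sketch of the guard's decision logic — the ~4-characters-per-token heuristic below is a common rule of thumb, an assumption here rather than the middleware's documented estimator:

```python
TOKEN_LIMIT = 25_000      # "token_limit" from the config above
WARN_PCT = 80             # "warn_threshold_pct"

def check_response(payload: str) -> str:
    est_tokens = len(payload) // 4        # rough chars-per-token estimate
    if est_tokens >= TOKEN_LIMIT:
        return "block"                    # structured error back to client
    if est_tokens >= TOKEN_LIMIT * WARN_PCT // 100:
        return "warn"                     # warning attached to response
    return "ok"

print(check_response("x" * 1_000))    # ok
print(check_response("x" * 90_000))   # warn  (22,500 estimated tokens)
print(check_response("x" * 120_000))  # block (30,000 estimated tokens)
```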


Preview-First Workflow

A critical design pattern that respects the iterative, conversational nature of AI interactions.

Traditional API expects precise requirements upfront, persists every call to the database, and assumes minimal iteration. LLM conversations are exploratory — preview first, refine, then save, with no database clutter from exploration.

How it works in practice:

# Step 1: Explore (no database write)
response = generate_chart(dataset_id=42, config={...}, save_chart=False)
# Returns: explore URL + form_data_key for iteration

# Step 2: Iterate (still no database write)
response = update_chart_preview(form_data_key=response["form_data_key"], config={...})
# See changes instantly without committing

# Step 3: Save (explicit persistence)
response = generate_chart(dataset_id=42, config={...}, save_chart=True)
# NOW the chart is persisted to the database

Configuration Reference

| Parameter | Default | Description | Required |
| --- | --- | --- | --- |
| MCP_SERVICE_HOST | localhost | Host to bind the MCP server | No |
| MCP_SERVICE_PORT | 5008 | Port for the MCP server | No |
| MCP_DEV_USERNAME | none | Username for development mode auth | Yes in dev |
| MCP_DEBUG | False | Enable debug logging | No |
| MCP_AUTH_ENABLED | False | Enable JWT authentication | No |
| MCP_JWT_ALGORITHM | RS256 | JWT signing algorithm | No |
| MCP_JWKS_URI | none | JWKS endpoint for key rotation | If RS256 |
| MCP_JWT_PUBLIC_KEY | none | Static RSA public key | If RS256 static |
| MCP_JWT_SECRET | none | HMAC shared secret | If HS256 |
| MCP_JWT_ISSUER | none | Expected JWT issuer claim | Production |
| MCP_JWT_AUDIENCE | none | Expected JWT audience claim | Production |
| MCP_PARSE_REQUEST_ENABLED | True | Enable flexible JSON/object input parsing | No |
| MCP_RESPONSE_SIZE_CONFIG | 25K tokens | Token limit for response size guard | No |
| MCP_STORE_CONFIG | disabled | Redis config for multi-pod sessions | Multi-pod |
| MCP_CACHE_CONFIG | disabled | Response caching with TTL | No |

Deployment Guide

Standalone Process

The simplest deployment — run alongside the Superset web server:

# Terminal 1: Superset web server
superset run -p 9001 --with-threads --reload

# Terminal 2: MCP service
superset mcp run --port 5008 --debug

Docker Compose

Add an MCP service to your Docker Compose setup:

# Add to your docker-compose.yml
mcp:
  image: apache/superset:latest
  command: ["superset", "mcp", "run", "--host", "0.0.0.0", "--port", "5008"]
  ports:
    - "5008:5008"
  environment:
    - PYTHONPATH=/app
    - SUPERSET_CONFIG_PATH=/app/superset_config.py

Multi-Pod with Redis

For high-availability deployments, Redis provides shared session state across pods:

# superset_config.py
MCP_STORE_CONFIG = {
    "enabled": True,
    "CACHE_REDIS_URL": "redis://redis:6379/0",
    "event_store_prefix": "mcp_events_",
    "event_store_max_events": 100,
    "event_store_ttl": 3600,
}

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: superset-mcp
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: mcp-service
        image: apache/superset:latest
        command: ["superset", "mcp", "run", "--host", "0.0.0.0", "--port", "5008"]
        ports:
        - containerPort: 5008
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 5008
          initialDelaySeconds: 30

Extension System

Extensions can register custom MCP tools using the plugin decorator system — new tools automatically inherit the authentication, middleware, and security model.

from superset_core.mcp.decorators import tool

@tool(name="my_extension.custom_analytics")
def custom_analytics(dataset_id: int, metric: str) -> dict:
    """Run custom analytics on a dataset.

    Computes specialized metrics not available
    in standard Superset charts.
    """
    from superset.daos.dataset import DatasetDAO

    dataset = DatasetDAO.find_by_id(dataset_id)
    # Custom logic here...
    return {"result": computed_value}

Checklist for adding a new tool:

  1. Create tool file in superset/mcp_service/{module}/tool/{name}.py
  2. Add Apache license header
  3. Decorate with @tool from superset_core.mcp.decorators
  4. Create Pydantic request/response schemas
  5. Use @parse_request for flexible input handling
  6. Add dataset validation for chart-related tools
  7. Import the tool in app.py — tools won't register otherwise
  8. Write unit tests in tests/unit_tests/mcp_service/

Codebase Map

superset/mcp_service/
├── app.py                 # FastMCP factory + tool registration
├── server.py              # Server runner (uvicorn)
├── auth.py                # JWT auth + user resolution
├── mcp_core.py            # Generic core classes
├── middleware.py          # 7 middleware layers
├── flask_singleton.py     # Smart Flask app initialization
├── mcp_config.py          # Default configuration
├── jwt_verifier.py        # JWT token validation
├── caching.py             # Response caching
├── storage.py             # Redis store management
│
├── chart/
│   ├── schemas.py            # Pydantic models
│   ├── chart_utils.py        # Validation helpers
│   ├── preview_utils.py      # Screenshot generation
│   ├── validation/           # 4-layer validation pipeline
│   └── tool/                 # 7 chart tools
│
├── dashboard/
│   ├── schemas.py
│   └── tool/                 # 4 dashboard tools
│
├── dataset/
│   ├── schemas.py
│   └── tool/                 # 2 dataset tools
│
├── sql_lab/
│   ├── schemas.py
│   └── tool/                 # 3 SQL Lab tools
│
├── explore/
│   └── tool/                 # 1 explore tool
│
├── system/
│   ├── schemas.py
│   └── tool/                 # 3 system tools
│
├── common/                   # Shared error schemas
└── utils/                    # Schema parsing, permissions, caching

Performance and Observability

Connection Pooling

Each MCP service process maintains a SQLAlchemy connection pool. For multi-worker deployments, peak connections = workers × (pool_size + max_overflow), since each worker can open up to max_overflow extra connections beyond its base pool.

# superset_config.py
SQLALCHEMY_POOL_SIZE = 5
SQLALCHEMY_MAX_OVERFLOW = 10
SQLALCHEMY_POOL_TIMEOUT = 30
SQLALCHEMY_POOL_RECYCLE = 3600
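A quick worked example of the peak estimate — each worker can open up to max_overflow burst connections beyond its base pool, and the worker count here is an illustrative assumption:

```python
workers = 4                   # e.g. uvicorn workers per pod (assumption)
pool_size = 5                 # SQLALCHEMY_POOL_SIZE
max_overflow = 10             # SQLALCHEMY_MAX_OVERFLOW
peak_connections = workers * (pool_size + max_overflow)
print(peak_connections)       # 60 — size your database's max_connections accordingly
```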

Response Caching

Optional response caching reduces database load for repeated queries:

  • In-memory: Default backend, single-process only
  • Redis: For multi-pod deployments
  • TTL: 300 seconds default, configurable per operation
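If you enable caching, the configuration might look like the sketch below. Only the MCP_CACHE_CONFIG setting name appears in the configuration reference above; the inner keys mirror Flask-Caching conventions and are assumptions:

```python
# superset_config.py — hypothetical Redis-backed response cache
MCP_CACHE_CONFIG = {
    "enabled": True,
    "CACHE_TYPE": "RedisCache",                 # assumed key names
    "CACHE_REDIS_URL": "redis://redis:6379/1",  # placeholder URL
    "CACHE_DEFAULT_TIMEOUT": 300,               # the 300s default TTL noted above
}
```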

Tool Search Optimization

For context-constrained LLM clients, tools can be loaded in deferred categories:

  • core — Always loaded: list_charts, list_dashboards, list_datasets, get_instance_info, health_check
  • discovery — Load on demand: get info, get schema
  • data — Load on demand: previews, chart data
  • mutate — Load on demand: generate, update, execute
  • explore — Load on demand: URL generators

This yields up to 80% token savings in the initial tool discovery phase, reducing the payload from roughly 15–20K tokens down to 4–5K.


Contact our team to set up a demo and see what AI-native analytics looks like in practice.


For the full narrative on architectural decisions, see the companion article: Apache Superset MCP Service: A Library-First Architecture with FastMCP

Running Preset? See Preset MCP: From Open Source to Enterprise — how Preset extends this architecture with multi-tenant isolation, OAuth 2.0, and built-in Chatbot.

Source code: github.com/apache/superset · MCP Protocol: modelcontextprotocol.io · FastMCP: gofastmcp.com
