Open Source BI Governance & Security Without Vendor Lock-In
DATA ENGINEERING

Preset Team

Governance and security are the parts of a BI evaluation that look like checklist items right up until they're the line that kills the deal. Procurement won't approve a tool without SOC 2. Security won't approve it without SSO + SCIM. Data won't approve it without row-level security that survives scrutiny. Compliance won't approve it without an audit log that can still answer, two years later, who saw what and when. And legal won't approve it if the lock-in profile means you can't leave.

Open source BI has structural advantages on most of these — the source is auditable, the data stays in your warehouse, and the lock-in profile is genuinely lower than SaaS-only alternatives. But "open source" by itself doesn't satisfy enterprise procurement; the platform also has to ship the security model. This guide walks through what that model looks like, why each piece matters for compliance, and how the leading open source BI platforms — Apache Superset, Metabase, Lightdash, and Redash — compare.

What enterprise governance actually requires

The list that matters in real procurement reviews:

Role-based access control (RBAC)

Not just "admin / user" — proper RBAC means roles defined per workspace, per object type (dashboards, datasets, queries, databases), with permissions composable across roles. A dashboard editor in one workspace can be a viewer in another. A data engineer can manage datasets but not dashboards. The granularity matters because real organizations don't fit a flat permission model.
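As a rough illustration of what "composable, per-workspace, per-object" means, here is a minimal sketch in Python. Every name in it (`Permission`, `Role`, `is_allowed`, the workspace and role names) is hypothetical and chosen for illustration; it is not the permission model or API of Superset, Metabase, Lightdash, or Redash.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; not any specific BI platform's API.

@dataclass(frozen=True)
class Permission:
    workspace: str     # e.g. "sales", "finance"
    object_type: str   # "dashboard", "dataset", "query", "database"
    action: str        # "view", "edit", "manage"

@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset

def is_allowed(roles, workspace, object_type, action):
    """A user may act if ANY of their roles grants the exact permission."""
    wanted = Permission(workspace, object_type, action)
    return any(wanted in role.permissions for role in roles)

# Editor in one workspace, viewer in another.
sales_editor = Role("sales-dashboard-editor", frozenset({
    Permission("sales", "dashboard", "view"),
    Permission("sales", "dashboard", "edit"),
}))
finance_viewer = Role("finance-viewer", frozenset({
    Permission("finance", "dashboard", "view"),
}))
# Manages datasets but has no dashboard permissions.
data_engineer = Role("data-engineer", frozenset({
    Permission("sales", "dataset", "manage"),
}))

analyst_roles = [sales_editor, finance_viewer]
print(is_allowed(analyst_roles, "sales", "dashboard", "edit"))
print(is_allowed(analyst_roles, "finance", "dashboard", "edit"))
print(is_allowed([data_engineer], "sales", "dashboard", "edit"))
```

Permissions here are additive: a user's effective access is the union of their roles, which is one common way to make roles composable. Real platforms may also support deny rules or inheritance, which this sketch leaves out.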

Row-level security (RLS)

Two users hitting the same dashboard should see different rows when their access policies differ. RLS should be enforced at query time, against the warehouse, with attribute-based rules that can express "show this user only their region's data" or "filter to the customer this tenant ID belongs to." Critically, RLS that the BI tool evaluates after the warehouse has returned rows is not RLS; it is a UI filter that anyone with API access can bypass.
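The difference between query-time RLS and a UI filter can be sketched in a few lines of Python. This is a simplified illustration, not any platform's implementation: `apply_rls`, the rule format, and the `orders` table are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

def apply_rls(base_sql, rules, user_attrs):
    """Wrap a dataset query so the warehouse filters rows before returning them.

    rules: list of (column, attribute_name) pairs defined by an admin.
    user_attrs: attributes of the authenticated user, e.g. from the IdP.
    Returns (sql, params) for a parameterized query.
    """
    clauses, params = [], []
    for column, attr in rules:
        if attr not in user_attrs:
            # Fail closed: a user missing a required attribute sees no rows.
            return f"SELECT * FROM ({base_sql}) AS rls_base WHERE 1 = 0", []
        # Column names come from admin-defined rules, never from end-user
        # input; user-specific values are always bound as parameters.
        clauses.append(f"{column} = ?")
        params.append(user_attrs[attr])
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM ({base_sql}) AS rls_base WHERE {where}", params

# Stand-in warehouse with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("emea", 10), ("amer", 20), ("emea", 30)],
)

base = "SELECT region, amount FROM orders"
rules = [("region", "region")]  # orders.region must equal the user's region

sql, params = apply_rls(base, rules, {"region": "emea"})
emea_rows = conn.execute(sql, params).fetchall()

sql, params = apply_rls(base, rules, {})  # user has no region attribute
no_rows = conn.execute(sql, params).fetchall()
```

Because the predicate is part of the SQL the warehouse executes, rows outside the user's region never leave the warehouse, whether the request comes from the dashboard UI or the API.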

Audit logging at compliance-grade detail

What good audit logs look like, in regulated environments:

  • Who — authenticated user identity, not just session ID
  • What — every dashboard view, query execution, dataset access, permission change
  • When — timestamp with timezone
  • Where — IP / region / device fingerprint
  • Outcome — success, denied, error
  • Retention — at least 1 year, often longer for SOX / HIPAA; good platforms provide structured CSV/API export so compliance teams can meet this via their own archival pipeline rather than relying on in-product storage alone

Auditors don't accept "we have logs somewhere." They want structured, queryable, tamper-evident logs with predictable retention. The platforms that ship audit logging as a first-class feature (rather than a debug-log byproduct) are the ones that pass enterprise reviews on the first try.
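To make "structured and tamper-evident" concrete, here is a minimal Python sketch that records the fields listed above and links each record to the previous one with a SHA-256 hash chain. Hash chaining is one possible technique for tamper evidence, swapped in here for illustration; the function and field names are assumptions, not a description of how any particular BI platform stores its audit log.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def _digest(record):
    """SHA-256 over the record's fields, excluding its own hash."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()

def append_event(log, *, user, action, target, ip, outcome):
    """Append one structured audit record, chained to the previous one."""
    record = {
        "who": user,                                      # authenticated identity
        "what": action,                                   # e.g. "dashboard.view"
        "target": target,
        "when": datetime.now(timezone.utc).isoformat(),   # timestamp with timezone
        "where": ip,
        "outcome": outcome,                               # "success", "denied", "error"
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record["hash"] = _digest(record)
    log.append(record)
    return record

def verify(log):
    """Return True only if no record was altered, removed, or reordered."""
    prev = GENESIS
    for record in log:
        if record["prev_hash"] != prev or record["hash"] != _digest(record):
            return False
        prev = record["hash"]
    return True

audit_log = []
append_event(audit_log, user="ana@example.com", action="dashboard.view",
             target="dashboard:42", ip="203.0.113.7", outcome="success")
append_event(audit_log, user="bo@example.com", action="dataset.query",
             target="dataset:orders", ip="203.0.113.9", outcome="denied")
print(verify(audit_log))
```

Editing any earlier record changes its digest and breaks every later link, so an exported copy of the log can be checked independently of the system that produced it.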

Identity integration

Enterprise identity is centralized. The BI tool needs to support:

  • SSO via SAML, OAuth, or OIDC against the corporate IdP (Okta, Azure AD / Entra, Auth0, Google Workspace)
  • SCIM for automated user provisioning and de-provisioning, so that when an employee leaves, their BI access disappears automatically rather than at the next quarterly access review
  • Group / role mapping so IdP groups translate to BI roles without manual reconciliation

Tools that lean on their own user table for enterprise deployments are tools that fail audit. SCIM in particular is what separates "we support SSO" from "we support enterprise."
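The group mapping and de-provisioning behavior described above amounts to treating the IdP as the source of truth and reconciling the BI tool against it. A minimal Python sketch, assuming a hypothetical `GROUP_TO_ROLE` mapping and simplified in-memory user records rather than real SCIM payloads:

```python
# Hypothetical mapping from IdP group names to BI role names.
GROUP_TO_ROLE = {
    "bi-admins": "Admin",
    "analytics-eng": "DatasetManager",
    "all-staff": "Viewer",
}

def roles_for(idp_groups):
    """Translate IdP groups to BI roles; unmapped groups grant nothing."""
    return sorted({GROUP_TO_ROLE[g] for g in idp_groups if g in GROUP_TO_ROLE})

def reconcile(idp_users, bi_accounts):
    """Compute the state the BI tool should converge to.

    idp_users:   {email: {"active": bool, "groups": [group names]}}
    bi_accounts: {email: [current BI roles]}
    Returns (desired_roles, to_deactivate).
    """
    desired = {
        email: roles_for(user["groups"])
        for email, user in idp_users.items()
        if user["active"]
    }
    # Anyone holding a BI account who is inactive in, or missing from,
    # the IdP loses access.
    to_deactivate = sorted(email for email in bi_accounts if email not in desired)
    return desired, to_deactivate

idp_users = {
    "ana@example.com": {"active": True, "groups": ["all-staff", "analytics-eng"]},
    "bo@example.com": {"active": False, "groups": ["all-staff"]},  # has left
}
bi_accounts = {
    "ana@example.com": ["Viewer"],
    "bo@example.com": ["Viewer"],
    "old-contractor@example.com": ["Admin"],  # no longer in the IdP at all
}

desired, to_deactivate = reconcile(idp_users, bi_accounts)
```

In a SCIM deployment the IdP pushes these changes to the BI tool as they happen, so the reconciliation does not wait for a periodic review. The sketch only shows the end state that process should produce.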

Encryption and key management

In-flight encryption (TLS) is table stakes. At-rest encryption is also expected. The differentiator at the enterprise tier is Bring Your Own Key (BYOK / CMEK) — the customer holds the encryption key for their data, and revoking the key revokes the BI tool's access. Required for some regulated industries; nice-to-have everywhere else.

Network isolation

For environments where data can't leave the customer's network: VPC-native deployments, private-cloud installation, or self-hosting with the BI tool talking only to internal endpoints. Some industries (financial services, healthcare, government) treat this as non-negotiable. The open source advantage here is real — you can deploy the platform inside your own network with no external dependency.

Data lineage

Where the data came from, what transformations were applied, which dashboards consume it. BI tools are not the right place to own lineage — that belongs to dedicated tools (OpenLineage, OpenMetadata, dbt's lineage graph, Atlan, Monte Carlo) that span the whole stack. What a BI tool should do is integrate with these systems: surface upstream model lineage from dbt, expose downstream consumption (which dashboards depend on this dataset), and emit events that lineage tools can consume.
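Exposing downstream consumption is essentially a graph walk from an upstream model to the dashboards that depend on it. A small Python sketch, assuming a hypothetical edge map in which node names carry a `kind:name` prefix; real lineage tools and dbt artifacts use their own formats:

```python
from collections import deque

def downstream_dashboards(edges, start):
    """Breadth-first walk from an upstream node to every dependent dashboard.

    edges: {node: [direct children]}, with nodes named "kind:name".
    """
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            if child.startswith("dashboard:"):
                found.append(child)
            queue.append(child)
    return sorted(found)

# Hypothetical lineage: dbt models feed BI datasets, which feed dashboards.
edges = {
    "model:stg_orders": ["model:fct_orders"],
    "model:fct_orders": ["dataset:orders", "dataset:revenue"],
    "dataset:orders": ["dashboard:ops-overview"],
    "dataset:revenue": ["dashboard:ops-overview", "dashboard:finance-kpis"],
    "dataset:customers": ["dashboard:crm"],
}

impacted = downstream_dashboards(edges, "model:stg_orders")
```

This is the question an impact analysis asks before changing a model: which dashboards would break or change. The BI tool contributes the dataset-to-dashboard edges; the upstream edges come from the tools that own lineage.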

A BI vendor claiming to provide "comprehensive data lineage" by itself is usually overstating it. The honest answer is "we integrate with the tools that own lineage."

Compliance frameworks

The frameworks that come up in enterprise procurement, in rough order of frequency:

  • SOC 2 Type II — the baseline for almost any enterprise. Open source projects don't carry SOC 2 themselves; managed offerings on top of them do.
  • HIPAA-eligible deployments with BAAs — required for healthcare.
  • ISO 27001 / 27017 — increasingly common globally.
  • GDPR — for any EU-touching data flows.
  • FedRAMP — required for US federal customers.
  • Industry-specific — SOX for financial reporting, PCI for payment data, etc.

Note the asymmetry: the open source project itself doesn't get SOC 2 / HIPAA certified. The deployment does — either via a managed offering with the certification, or by you carrying the compliance burden internally with your own controls.

Lock-in: the open source advantage that matters most

Vendor lock-in in BI shows up in three places, and the ones with the highest cost are the ones procurement teams underestimate the most:

  • Data lock-in. With SaaS-only proprietary tools, your data often gets replicated into the vendor's storage. With open source, the data stays in your warehouse — the BI tool is purely a query/serving layer.
  • Modeling lock-in. Tools with proprietary modeling languages (LookML being the most prominent example) lock your business logic into a format that doesn't port. Open source platforms either use SQL directly, integrate with dbt, or use formats that are exportable.
  • Authoring lock-in. Dashboards and queries built in a proprietary tool are stranded in that tool's storage format. Open source platforms with documented schemas (or, in Lightdash's case, where the modeling layer is dbt) let you take your work with you.

The "without vendor lock-in" part of the title isn't decorative — it's the structural property that lets enterprises choose a BI tool and revisit the choice later if needs change. Per-viewer pricing on a SaaS-only proprietary tool that owns your data, modeling, and authoring is a 5–10 year commitment whether the procurement team intends it to be or not.

How the open source shortlist compares

| Capability | Apache Superset | Metabase | Lightdash | Redash |
|---|---|---|---|---|
| RBAC granularity | Per-workspace, per-object | Per-collection (paid tier deeper) | Per-project (paid tier deeper) | Per-group (basic) |
| Row-level security | Yes (rule-based, query-time) | Yes (paid tier deeper) | Yes (via dbt + warehouse) | Limited |
| Audit logging | Yes (Enterprise; 30-day retention + CSV/API export) | Yes (paid tier) | Yes (paid tier) | Limited |
| SSO via SAML / OIDC | Yes | Yes | Yes (paid tier) | Limited |
| SCIM provisioning | Yes (via managed) | Yes (paid tier) | Yes (paid tier) | Limited |
| Encryption at rest | Yes (warehouse-side) | Yes | Yes (warehouse-side) | Yes (warehouse-side) |
| Bring Your Own Key (BYOK) | Yes (via Managed Private Cloud) | Limited | Limited | Limited |
| VPC-native / private deployment | Yes (self-host or Managed Private Cloud) | Yes (self-host or Cloud Premium) | Yes (self-host) | Yes (self-host) |
| Lineage integration | Yes (dbt manifest) | Yes (dbt integration) | Native (dbt) | Limited |
| SOC 2 Type II via managed offering | Preset | Metabase Cloud | Lightdash Cloud | None primary |
| HIPAA-eligible via managed offering | Yes (Preset Cloud + MPC) | Limited | Limited | None |
| Lock-in profile | Low (warehouse-resident, exportable) | Low–medium (paid features create some pull) | Low (dbt is the modeling layer) | Low |

A note on the proprietary alternatives: Looker, Power BI, and Tableau all carry SOC 2 / HIPAA equivalents and have mature governance features at their enterprise tiers. The trade-offs are the lock-in axes above — particularly data and modeling lock-in for the SaaS-first tools, and per-viewer pricing that complicates the "let everyone see this" question that governance teams otherwise want to say yes to.

How to choose

A short decision tree for the governance/security lens:

  • If you're in a regulated industry (financial services, healthcare, government) and need SOC 2 + HIPAA + SSO + SCIM + RLS + audit logging without carrying the compliance burden internally: Apache Superset via Preset (or self-hosted with a dedicated platform team if you can absorb the compliance work).
  • If you need network isolation, meaning data can't leave your VPC or private cloud: Apache Superset self-hosted, or a managed offering with VPC-native / private-cloud deployment options such as Preset Managed Private Cloud.
  • If you need governance for an internal-only deployment at moderate scale: Metabase (Pro / Enterprise tier for the deeper governance features) or Apache Superset, depending on team preference and skills.
  • If you want a dbt-first analytics surface with lineage that follows your dbt models: Lightdash, with the understanding that some governance features are in the paid Cloud tier.

For most enterprise rollouts where governance, compliance, and lock-in profile are decisive — and especially for customer-facing analytics where audit and identity integration matter most — Apache Superset paired with a managed offering is the most complete open source answer today. The combination of source-level transparency, warehouse-resident data, exportable artifacts, and enterprise-grade security via the managed layer hits the requirements list without the lock-in tail.

Where Preset fits

Preset is the managed Apache Superset platform built for the governance and compliance requirements this guide covers: SOC 2 Type II, HIPAA-eligible deployments with BAAs, SSO via SAML, SCIM provisioning, audit logging, Bring Your Own Key (BYOK) via Managed Private Cloud, network isolation options, and lineage integration through dbt. The lock-in profile stays low — your data lives in your warehouse, your dashboards live in a documented schema, and you can move off Preset to self-hosted Superset (or vice versa) without re-platforming.

If you're scoping a BI rollout against an enterprise governance bar — or working through a specific compliance framework with procurement — the team is happy to walk through the controls in detail. For related angles, our companion guides cover open source embedded analytics platforms, self-service BI for non-technical teams, warehouse-native BI for the modern data stack, BI total cost of ownership, and scaling BI for enterprise growth.
