
Running Apache Superset on the Open Internet: A Report from the Fireline

Maxime Beauchemin

After years as an on-call and DevOps engineer for Apache Superset, Apache Airflow, and various services across Facebook, Airbnb, Lyft, and now Preset, I've learned that running mission-critical services is a bit like being a firefighter. You're always ready to respond to emergencies, address immediate issues, and work to prevent future incidents. And just like firefighting, experience teaches you that prevention is often more valuable than even the most heroic response.

The DevOps movement has taught us that the best people to run a service are often those who've contributed most to it. This isn't just theory – I've seen it play out repeatedly across different organizations. Even the most skilled cloud infrastructure engineers face significant challenges when operating software they haven't worked with extensively.

At Preset, Superset isn't just a service we run – it's part of our DNA. As core contributors, we're involved in every design decision, we tackle the most complex production issues, and we're generally the ones developing patches for critical vulnerabilities. When something breaks – and complex software eventually does – we understand not just how to fix it, but why it broke in the first place.

Stepping into the Wild: When Internal Tools Go Public

Recently, I've noticed a common pattern: teams who've built amazing internal analytics with Superset want to share these insights with their customers. It's a natural evolution – you've created valuable dashboards that your internal teams love, so why not extend them to your customers? The conversation usually starts simply enough: "We'll just put it behind a reverse proxy with authentication, and we're good to go, right?"

I've sat in those planning meetings, and I understand the appeal. You already have authentication in your main application, the dashboards work great internally, and the reverse proxy seems like a solid security barrier. Spin up an isolated instance using your current service as a model, poke a hole in the firewall, and you're done. But here's what often gets overlooked: when you move from an internal tool to a public-facing service, you're not just changing where the traffic comes from – you're fundamentally altering the security landscape.

Think about it: How hard is it to get authenticated access to your main application? Do you have a free tier? A trial period? An open signup process? For most products trying to grow, getting authenticated access is intentionally straightforward. That reverse proxy you're counting on? For a determined attacker, getting valid credentials might be the easiest part of their day.

What's different about external exposure isn't just the volume of traffic – it's the dramatic expansion of your threat surface. Internal users might make mistakes, but they rarely have malicious intent, and they leave a clear trail behind everything they do. External users? They might be competitors, automated bots, or bad actors specifically looking for vulnerabilities. The stakes aren't just higher – they're in a different league entirely.
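
To make that concrete: before any Superset instance faces the open internet, there's a layer of application-level hardening that a reverse proxy alone doesn't cover. Here's a minimal sketch of the kind of settings involved, written as a superset_config.py fragment. The flags are standard Superset and Flask-AppBuilder configuration knobs; the specific choices are illustrative assumptions, not a complete hardening checklist.

    # superset_config.py -- illustrative hardening sketch, not a complete checklist.
    # The specific values are assumptions for the example; tune them to your environment.

    from flask_appbuilder.security.manager import AUTH_OAUTH

    # Trust the X-Forwarded-* headers set by the reverse proxy in front of Superset.
    ENABLE_PROXY_FIX = True

    # Enforce HTTPS and strict security headers via flask-talisman.
    TALISMAN_ENABLED = True

    # Keep CSRF protection on for state-changing requests.
    WTF_CSRF_ENABLED = True

    # Session cookies should never travel over plain HTTP or be readable from JavaScript.
    SESSION_COOKIE_SECURE = True
    SESSION_COOKIE_HTTPONLY = True
    SESSION_COOKIE_SAMESITE = "Lax"

    # Delegate authentication to your identity provider rather than local passwords.
    AUTH_TYPE = AUTH_OAUTH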

The Real Complexity Behind the Scenes

Let me unpack, at a high level, what running Superset at scale actually looks like. This isn't theoretical – it's drawn from countless early-morning pages and late-night debugging sessions.

When you deploy Superset to production, you're orchestrating the following (a rough configuration sketch follows the list):

  • A Python web application with 200+ dependencies, fine-tuned for autoscaling to handle variable loads efficiently.
  • A React frontend with an extensive dependency tree that supports a modern, dynamic user interface.
  • A metadata database (Postgres or MySQL) for storing application state, complete with backups, failover mechanisms, and disaster recovery for high availability.
  • An asynchronous task processing system (Celery), configured with dynamic autoscaling rules based on workload.
  • Multiple caching layers, including Redis for chart caching and S3 for CSV exports, to improve performance.
  • A message queue system (Redis or RabbitMQ) for reliable communication between components and supporting Celery tasks.
  • Encrypted, tailored network communications ensuring secure data transfer between systems.
  • Authentication systems integrated with Identity Providers to support seamless and secure user access.
  • Analytics event handling to collect and send Superset usage data to a data ingestion system.
  • A reverse proxy for embedded use cases, securing exposure with an additional authentication layer.
  • Observability through statsd metrics, application logs, and stdout/stderr, sent to an observability platform or APM, backed by actionable alerts.
  • An on-call rotation supported by a comprehensive runbook for effective incident response.
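
To give a sense of how much of this comes together in configuration alone, here's a stripped-down superset_config.py sketch wiring up the metadata database, Redis caching, Celery, the async results backend, and statsd metrics. Hostnames, credentials, and sizing are placeholders; treat this as an illustration of the surface area, not a reference deployment.

    # superset_config.py -- a stripped-down sketch of how the moving parts get wired together.
    # Hostnames, credentials, and sizing below are placeholders, not recommendations.

    from cachelib.redis import RedisCache
    from superset.stats_logger import StatsdStatsLogger

    # Metadata database (application state): needs its own backups, failover, and DR story.
    SQLALCHEMY_DATABASE_URI = "postgresql+psycopg2://superset:change_me@metadata-db:5432/superset"

    # Chart and metadata caching in Redis.
    CACHE_CONFIG = {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": 300,
        "CACHE_REDIS_URL": "redis://cache:6379/0",
    }

    # Celery handles async queries, alerts and reports, and thumbnails.
    class CeleryConfig:
        broker_url = "redis://broker:6379/1"
        result_backend = "redis://broker:6379/2"
        worker_prefetch_multiplier = 1

    CELERY_CONFIG = CeleryConfig

    # Where async query results land while waiting to be picked up by the web tier.
    RESULTS_BACKEND = RedisCache(host="cache", port=6379, key_prefix="superset_results_")

    # Emit statsd metrics so the observability stack has something to alert on.
    STATS_LOGGER = StatsdStatsLogger(host="statsd", port=8125, prefix="superset")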

Each of these components requires attention, maintenance, and occasional middle-of-the-night care. Let me share a real incident from just last month: One of our large customers had their self-hosted analytics database (looking at you, Redshift...) hit a rough patch. As their dashboards started timing out, users got understandably frustrated and started hammering the "force refresh" button. What began as a simple database slowdown cascaded into our Celery service hitting its auto-scaling boundaries, and eventually led to our Postgres database reaching its maximum connection limit.

Thanks to our noisy-neighbor protections, the blast radius was contained, and we were able to quickly ease the limits and stabilize things. But finding and implementing the ultimate solution? That required the kind of deep Superset expertise held by a group of people you can count on one hand. It's not just about knowing what to fix – it's about understanding the intricacies of the codebase well enough to perform that kind of surgical operation and successfully get it merged into the main project.
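
I won't reproduce the actual patch here, but the general family of guardrail looks something like the fragment below: bounding how hard any single workload can lean on the metadata database and the query path. The numbers are invented for illustration; the right values depend entirely on your sizing.

    # superset_config.py -- illustrative guardrails against a "force refresh" cascade.
    # The numbers are invented for the example; real values depend on your sizing.

    # Cap how many metadata-database connections each web or worker process may hold,
    # so a stampede of retries fails fast locally instead of exhausting the database.
    SQLALCHEMY_ENGINE_OPTIONS = {
        "pool_size": 5,
        "max_overflow": 5,
        "pool_timeout": 30,    # seconds to wait for a connection before giving up
        "pool_recycle": 3600,  # recycle stale connections periodically
    }

    # Bound how long any single async analytics query may run before it's cut off.
    SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 10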

The Economics of Scale: Beyond Uptime

Service maturity isn't just about keeping the lights on – it's about doing it efficiently. At Preset, deploying Superset isn't just another item on our infrastructure checklist; it's our core focus. This specialization has taught us valuable lessons about infrastructure optimization that go far beyond basic uptime metrics.

When Superset is just one of many services your team maintains, it often gets lost in the larger cloud bill. You might be running oversized instances "just to be safe," or keeping extra capacity that sits idle most of the time. It's a common pattern I've seen: teams err on the side of over-provisioning because they don't have the time or data to optimize thoroughly.

Our approach at Preset is different because it has to be. We've invested heavily in each of the areas below; a small caching example follows the list:

  • Right-sizing every component of the deployment
  • Implementing sophisticated auto-scaling that matches real workload patterns
  • Fine-tuning resource allocation across all services
  • Optimizing database and caching configurations for different usage patterns
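
As one small example of what optimizing caching for different usage patterns looks like in practice, Superset exposes separate cache layers that can each get their own backend and timeout. The Redis layout and the numbers below are assumptions for illustration, not tuning advice.

    # superset_config.py -- separate cache layers tuned to how each is actually used.
    # The Redis layout and timeouts are illustrative assumptions, not recommendations.

    def redis_cache(db: int, timeout: int) -> dict:
        """Build a Flask-Caching config block pointing at a given Redis database."""
        return {
            "CACHE_TYPE": "RedisCache",
            "CACHE_DEFAULT_TIMEOUT": timeout,
            "CACHE_REDIS_URL": f"redis://cache:6379/{db}",
        }

    # Dashboard and chart metadata changes rarely: cache it for an hour.
    CACHE_CONFIG = redis_cache(db=0, timeout=3600)

    # Chart data is what users hammer "refresh" on: keep it fresher.
    DATA_CACHE_CONFIG = redis_cache(db=1, timeout=300)

    # Dashboard filter state is cheap and long-lived: give it a day.
    FILTER_STATE_CACHE_CONFIG = redis_cache(db=2, timeout=86400)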

But the real game-changer is the economy of scale that comes with running a multi-tenant service. With a broader user base, usage patterns become more predictable. Peak times from one customer offset quiet periods from another. This predictability, combined with our focused optimization efforts, creates efficiencies that we can pass on to our customers.

I've had fascinating conversations with customers who've switched to Preset and discovered something surprising: their Preset contract costs less than what they were spending just on cloud infrastructure for their self-hosted Superset installation. And that's before factoring in engineering time and attention.

The Operational Reality

Let's talk about what a typical week of running Superset at scale looks like. You might find yourself doing any of the following (a sketch of a few of the knobs involved appears after the list):

  • Investigating why certain dashboards are suddenly slower than usual
  • Debugging cache inconsistencies
  • Fine-tuning auto-scaling rules
  • Managing worker queues to prevent resource starvation
  • Tuning database connection pools
  • Addressing browser-specific rendering issues
  • Responding to intermittent timeouts under specific load patterns
  • Easing rate-limiting criteria for specific, legitimate heavy API-powered use cases
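
To show a few of the knobs behind "managing worker queues" and "fine-tuning auto-scaling rules", here's a sketch of Celery-level tuning: routing heavyweight task types onto their own queues and bounding per-worker behavior. The queue names and limits are invented for the example, though the rate limit on SQL Lab results mirrors the kind of default Superset ships with.

    # superset_config.py -- sketch of worker-queue tuning to prevent resource starvation.
    # Queue names, rate limits, and worker settings here are invented for illustration.

    class CeleryConfig:
        broker_url = "redis://broker:6379/1"
        result_backend = "redis://broker:6379/2"

        # Keep slow report and thumbnail work off the queue serving interactive queries.
        task_routes = {
            "sql_lab.get_sql_results": {"queue": "interactive"},
            "reports.*": {"queue": "reports"},
            "cache_chart_thumbnail": {"queue": "thumbnails"},
        }

        # Throttle the chattiest task type so a refresh stampede can't starve everything else.
        task_annotations = {
            "sql_lab.get_sql_results": {"rate_limit": "100/s"},
        }

        # Don't let a worker hoard tasks it isn't actively working on.
        worker_prefetch_multiplier = 1
        # Recycle worker processes periodically to keep memory usage predictable.
        worker_max_tasks_per_child = 100

    CELERY_CONFIG = CeleryConfig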

At Preset, we've built comprehensive systems to monitor and alert on these scenarios. We have playbooks for nearly every failure mode because we've encountered them in production. We've instrumented critical paths, implemented safety nets for edge cases, and developed recovery mechanisms for when things go wrong.

The Time and Attention Commitment

The infrastructure costs are often the most predictable part of running Superset. The real investment comes in:

  • Engineering time for maintenance and upgrades
  • On-call rotations and incident response
  • Security monitoring and patching
  • Performance optimization
  • System reliability engineering
  • Training and documentation

Each of these represents not just a cost, but an opportunity cost – time your team could be spending on your core business objectives.

Making an Informed Decision

If you're considering running Superset on the public internet, it's worth thinking carefully about your team's core focus and capabilities. Running and securing a BI platform at scale requires significant dedicated resources and expertise.

At Preset, our experience comes from both sides of this decision. We've built and maintained these systems from scratch, and we've helped organizations transition to managed solutions. What we've learned is that organizations often derive the most value from Superset when they can focus on creating impactful dashboards and deriving insights from their data, rather than managing infrastructure.

Looking Forward

Whether you choose to run Superset yourself or opt for a managed solution, understanding the operational landscape is crucial. Every organization's needs are different, and there's no one-size-fits-all answer. The key is making an informed decision based on your team's capabilities, resources, and core business objectives.

Remember, the goal isn't just to run Superset – it's to help your organization make better decisions through data. How you get there depends on where you need to focus your team's valuable time and expertise.

And whatever path you choose, keep a good coffee maker nearby. Some lessons you just have to learn from experience.
