What It Takes to Manage the Open Source BI Tool, Apache Superset
Apache Superset is a leading web application for data exploration and data visualization. Because it is open source software that allows for flexibility and customization, Superset is popular among companies built on modern data stacks that want to create business intelligence (BI) experiences cost-effectively and at scale. Companies of all sizes, from Fortune 500 firms to fast-growing startups, rely on Superset to enable data-driven decisions and workflows.
But managing open source components in-house is not free. It involves a significant amount of engineering resources and indirect costs to maintain secure, compliant, and high-quality software. So what exactly are the true cost of ownership and the expertise required to manage Superset?
In this blog post, we discuss the effort and cost of managing open source Superset and how Preset’s managed solution could enable your organization to deploy a robust and reliable analytics tool powered by Superset.
Working with open source software is a continuous process (Figure 1). Even after your engineering team invests their time in installing and configuring Superset during the initial deployment, an ongoing effort is required to maintain high-quality analytics software that can serve many teams and business units across the organization.
Figure 1. The lifecycle of managing open source software in-house.
The sections below describe, at a high level, what it takes to deploy and maintain Superset.
Spinning up a Superset instance for the first time requires installing the software, configuring database drivers, setting up role-based access control (RBAC) and permissions, connecting to databases, and creating the first set of visualizations. If users rely on dashboards to get insights on product, operations, and business metrics, they would benefit from setting up event-triggered notifications (alerts) and/or scheduled notifications (reports). Further, setting up caching is recommended to improve query performance and enable faster data access.
Configuring alerts, reports, and caching, however, requires more investment in setting up your infrastructure. Alerts and reports, for example, require enabling an asynchronous task backend for Superset, and caching requires a dedicated cache store. The entire installation, configuration, and tuning process could take a few weeks of engineering work. For this reason, most organizations deploy just one instance of Superset, although they may benefit from having multiple isolated environments that allow for more secure separation of data connections and access.
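As a rough illustration of what this configuration involves, the sketch below shows a fragment of a `superset_config.py` file that turns on a Redis-backed cache, a Celery broker, and the alerts and reports feature flag. The Redis URL, database numbers, key prefixes, and timeout are placeholder assumptions rather than recommended values, and a working setup also needs running Celery workers and a scheduler, which are not shown here.

```python
# superset_config.py (fragment): a minimal sketch, not a production config.
# Host names, ports, database numbers, and timeouts are placeholder assumptions.

REDIS_URL = "redis://localhost:6379"  # assumed local Redis instance

# Flask-Caching settings that Superset reads for its general-purpose cache.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds; tune for your workload
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": f"{REDIS_URL}/0",
}

# Separate cache for chart query results, kept in a different Redis database.
DATA_CACHE_CONFIG = {
    **CACHE_CONFIG,
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": f"{REDIS_URL}/1",
}


class CeleryConfig:
    """Broker and result store for asynchronous tasks (alerts, reports)."""

    broker_url = f"{REDIS_URL}/2"
    result_backend = f"{REDIS_URL}/3"


CELERY_CONFIG = CeleryConfig

# Alerts and reports are gated behind a feature flag.
FEATURE_FLAGS = {"ALERT_REPORTS": True}
```

Even in this reduced form, the fragment implies three extra moving parts to operate: a Redis service, Celery workers, and a scheduler.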
It is not easy for companies to quickly address security vulnerabilities in Superset’s codebase. When vulnerabilities are discovered, the Apache Software Foundation notifies the Apache Superset Project Management Committee (PMC) of such issues in private. Unless a company has employees who sit on the committee, it risks running vulnerable software without knowing until the next official Apache release, which typically happens every two to six months. Preset, on the other hand, is able to address security bugs within 24 hours because we have many PMC members who are dedicated to and capable of addressing such issues promptly.
As your organization increasingly depends on analytics, uptime, performance, and overall reliability become critical to your operations. SLAs and expectations grow, and a proportional investment in observability, monitoring, and engineering support will be required. Should your DevOps engineer(s) be on-call for yet another mission-critical system? Just ask how they would feel about it. Alternatively, you can trust the team that writes, patches, and releases the Superset software to provide the necessary support for your organization.
Apache Superset is one of the fastest-evolving open source projects, with new features, enhancements, and bug fixes released on an ongoing basis. To provide a sense of velocity, over 170 pull requests were merged in the month before this post was written.
To take advantage of Superset’s latest features and security updates, your internal engineering team would need to manage software upgrades. Upgrades to mission-critical software need to be managed carefully, with minimum downtime, a downgrade path (if needed), and the right amount of validation in isolated environments. Ensuring this requires knowledge of the software, a good understanding of the changelog and upgrade notes, and in some cases, user communication (e.g., release notes) to prevent surprises. Database migration with an appropriate data backup strategy may also be required to avoid possible data loss. These tasks are handled most effectively by dedicated engineers focused on managing the Superset software at hand.
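The upgrade concerns above can be sketched as an ordered plan. The example below only builds and prints the steps; it executes nothing. The `Step` class and `build_upgrade_plan` function are hypothetical names, and the backup command assumes the metadata database is PostgreSQL and is named `superset`. Validation in a staging environment and the downgrade path are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical upgrade runbook."""

    name: str
    command: str


def build_upgrade_plan(target_version: str, backup_file: str) -> List[Step]:
    """Return the upgrade steps in the order they should run.

    Nothing is executed here; the commands are plain strings meant to be
    reviewed by a person before they are run.
    """
    return [
        # Assumption: the metadata database is PostgreSQL, named "superset".
        Step(
            "back up metadata database",
            f"pg_dump --format=custom --file={backup_file} superset",
        ),
        Step(
            "install target version",
            f"pip install apache-superset=={target_version}",
        ),
        Step("apply schema migrations", "superset db upgrade"),
        Step("sync roles and permissions", "superset init"),
    ]


if __name__ == "__main__":
    # "X.Y.Z" is a placeholder version, not a recommendation.
    for number, step in enumerate(
        build_upgrade_plan("X.Y.Z", "superset_backup.dump"), start=1
    ):
        print(f"{number}. {step.name}: {step.command}")
```

Keeping the backup as the first step means a failed migration can be rolled back from a known state.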
Similar to most modern, fast-evolving, and complex software solutions, Superset uses feature flags extensively as a way to build, mature, and release individual features or feature sets. There are over 50 feature flags in Superset at the time of writing. There is some level of complexity in understanding what individual feature flags do and whether or not the gated features are stable enough for release. At Preset, we know exactly which features are fully baked and manage sensible defaults at all times.
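To make the feature flag problem concrete, here is a minimal sketch of keeping flag overrides explicit and reviewable. Superset reads a `FEATURE_FLAGS` dictionary from its configuration; the helper functions and the default values shown are illustrative assumptions, and the real defaults vary by Superset version.

```python
from typing import Dict, List

# Illustrative defaults only; check the defaults of your Superset version.
DEFAULT_FLAGS: Dict[str, bool] = {
    "ALERT_REPORTS": False,
    "DASHBOARD_RBAC": False,
    "EMBEDDED_SUPERSET": False,
}


def resolve_flags(
    defaults: Dict[str, bool], overrides: Dict[str, bool]
) -> Dict[str, bool]:
    """Merge overrides into defaults, rejecting misspelled or unknown flags."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown feature flags: {unknown}")
    return {**defaults, **overrides}


def changed_flags(
    defaults: Dict[str, bool], resolved: Dict[str, bool]
) -> List[str]:
    """List the flags whose value differs from the default, for review."""
    return sorted(
        name for name, value in resolved.items() if defaults[name] != value
    )


# The dictionary that would be placed in superset_config.py.
FEATURE_FLAGS = resolve_flags(DEFAULT_FLAGS, {"ALERT_REPORTS": True})
```

Listing only the flags that differ from the defaults gives reviewers a short set of gated features to evaluate before each release.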
To ensure appropriate role permissions (e.g., who can edit dashboards vs. have view-only access), your organization needs to build and integrate user management rules with Superset. Depending on the maturity of your organization, this may result in additional work that is relevant only to Superset. As your organization grows, managing role permissions becomes increasingly complicated and may require a more scalable solution (e.g., building a user management API).
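One common shape for such rules is a mapping from groups in your identity provider to Superset roles. The sketch below is a standalone illustration: the group names and the `roles_for_user` helper are hypothetical, while `Admin`, `Alpha`, `Gamma`, and `sql_lab` are roles that ship with Superset. How the result is applied (for example, at login) depends on your authentication setup and is not shown.

```python
from typing import Dict, Iterable, List

# Hypothetical identity-provider groups mapped to built-in Superset roles.
GROUP_TO_ROLES: Dict[str, List[str]] = {
    "analytics-admins": ["Admin"],
    "analysts": ["Alpha", "sql_lab"],
    "business-users": ["Gamma"],
}


def roles_for_user(groups: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated roles for a user's groups.

    Groups that are not in the mapping grant nothing, so a user with no
    recognized group gets an empty list rather than a default role.
    """
    roles = set()
    for group in groups:
        roles.update(GROUP_TO_ROLES.get(group, []))
    return sorted(roles)
```

Treating unknown groups as granting no access keeps the mapping safe by default: a new group has to be added deliberately before it unlocks anything.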
The challenge with estimating the total cost of managing open source software is often the unpredictability of that cost. There will be software bugs, regressions, unexpected issues, and database migrations associated with upgrades. The engineering resources required to maintain Superset are unpredictable and often not properly accounted for, as they are lumped into wider budget allocations for infrastructure. Furthermore, the maintenance work requires more experienced (and highly paid) engineers who can cover broad concepts ranging from database connections and storage to caching.
It may be difficult to gain economies of scale from cloud hosting due to unpredictable usage. Reaching economies of scale requires high infrastructure density. At Preset, we have built multi-tenancy into Preset’s version of Superset to enable efficiency and pass on cost savings to our customers.
There is, of course, a complex interplay between quality of service (QoS) and total cost of ownership (TCO). While installing a piece of software and getting it to work appears to be a one-off task, offering consistent QoS has intricate implications for TCO. Setting up different environments (e.g., sandbox, staging, and production environments), establishing good observability and alerts, tuning auto-scaling, and defining a proper runbook can be challenging. Realizing down the line that your organization needs higher QoS could turn out to be costly and force you to throw away your initial investment.
Over the years, Superset has accumulated a set of reusable constructs that make it easier for people to install and run Superset. These consist of a set of Docker images, Helm charts, a Docker Compose file, (not-so-comprehensive) documentation, and blog posts about reference implementations. While these are helpful for someone gearing up to start their journey of productionizing Superset, they are not drop-in solutions.
While many open source communities have tried to make it as easy as possible to install and operate their software at scale, every organization has a different set of constraints that makes it impossible to create assets that solve this challenge for everyone. These constraints are related to your infrastructure and practices for how you run services, namely your provisioning solution and standards (e.g., Terraform, CloudFormation, SaltStack), your observability stack (e.g., Datadog, Prometheus, Splunk, New Relic), your cloud provider, your preference for using these third-party services over going raw in Kubernetes, your network abstractions, your backup and disaster recovery strategy, your preferred secret management solutions, and many other important areas of concern. The reality of installing and running open source software is that it requires taking these assets and documentation and weaving them into your internal practices to align with your particular set of constraints.
If you're interested in Superset, you already understand that analytics are critical, and understanding how your organization uses analytics is important. While Superset is instrumented to serve observability and usage analytics, it is fairly tricky to configure that data to land in your data warehouse, and it takes additional effort to build charts that show how people are using Superset.
If a BI tool is widely adopted across your organization, there will be a broad base of end users requiring onboarding and troubleshooting support. While the Superset community provides a basic level of documentation, end users will need a dedicated individual or team versed in Superset’s codebase and functionality to help diagnose and resolve their unique issues.
Preset was founded by Maxime Beauchemin, the original creator of Apache Superset. Preset’s fully-managed cloud-based solution lets your organization explore data in Superset while ensuring quality, security, and compliance of the software (Figure 2).
Figure 2. Preset's SaaS offerings.
Table 1 summarizes the benefits offered by Preset on top of Superset.
Table 1. Benefits of Preset SaaS solutions.
Over the past few years, our team has steadily been pushing a majority of the contributions to the Superset codebase and has been running QA tests to ensure stable and secure software for Preset’s customers. If there are security vulnerabilities, we are the first to know and aim to deploy fixes on Preset within 24 hours. We have a defined release management process powered by a dedicated QA team to ensure that every software update released bi-weekly is reliable and running on the latest version of Superset.
We are committed to providing all features from the open source version of Superset while building additional Preset-only functionality to make your BI experiences more secure, performant, and seamless.
- Preset CLI and Preset API allow for managing users, workspaces, databases, and assets as code.
- Preset’s workspace and team management features let you deploy multiple fully-isolated Superset environments instantly, making it easier to separate teams without the need to involve your DevOps team.
- Preset’s embedded SDK lets your teams embed charts and dashboards in your custom applications quickly with limited code.
- Preset maintains a robust product roadmap to enhance your BI and embedded analytics experiences (e.g., in areas of usage tracking, theming and customizability, and more visualization options).
Our vision is to deliver interactive analytics for you, your team, and your customers, regardless of which apps they’re using. Stay tuned for exciting updates soon!
For individuals and small teams, we offer a free 14-day trial of the Preset Professional plan to test our fully managed cloud solution. Sign up here to get started.
For larger teams and organizations, we offer additional features and support to help deploy Preset at scale, including more workspaces, API access, managed private cloud deployment, and enterprise support with SLA. Contact our team to learn more.