
From Airflow to Superset: How One Data Engineer's Mission Became an Open Source Revolution
What does it take to reshape an entire industry? For Maxime Beauchemin, Preset CEO and creator of Apache Airflow and Apache Superset, it started with a simple belief: open source could take over business intelligence. But the journey from building tools at Facebook and Airbnb to founding Preset and leading the charge for modern data visualization involved far more than just writing code.
In a recent episode of the Data Renegades podcast, Max sat down to discuss the evolution of the data stack, the surprising inspiration behind Superset's visual capabilities, and why the hardest problems in data engineering still remain unsolved. If you're building data pipelines, creating dashboards, or just curious about where the industry is headed, this conversation is packed with insights you won't want to miss.
The Philosophy: Software Rewrites Itself Every 5-10 Years
One of Max's most compelling observations is that software goes through phases, and every 5-10 years we essentially rewrite the same stack on a new premise. He's lived through multiple eras—from desktop applications and OLAP cubes in the early 2000s at Ubisoft, to the web era, to today's cloud-native stack.
"There was some really cool stuff in the desktop era," Max reflects. "Languages like MDX—multi-dimensional SQL—were pretty evolved at the time. And modeling with cubes was kind of a thing." Technologies like SQL Server Analysis Services and Hyperion represented sophisticated approaches to analytics that many modern data practitioners have never encountered.
This historical perspective isn't just nostalgia. Understanding these cycles helps us see where we're headed and what patterns repeat across generations of technology.
The Airbnb Innovation Lab: Where Airflow and Superset Were Born
Max's time at Facebook (2012-2014) was transformative. He describes it as "a Cambrian explosion of data tools" with thousands of engineers building at a breakneck pace. But it was at Airbnb that theory became reality.
When interviewing at Airbnb, Max had a specific condition: "If I join, could I work on building something like Data Swarm?" Data Swarm was the internal Facebook pipeline tool that inspired Airflow. Airbnb agreed, and the rest is history.
The creation of Superset followed a similar path. An internal project at Airbnb became Apache Superset, and Max made the strategic decision to donate it to the Apache Software Foundation. Why? Not just for governance guarantees, but with an eye toward the future: "If I want to start a company around it in the future, not a bad thing for the IP to be in neutral territory—the Switzerland of software—at Apache."
The Unexpected Inspiration: Burning Man and Digital Art
Here's where the story takes a fascinating turn. When asked about his inspiration for data visualization, Max didn't cite other BI tools. Instead, he talked about Burning Man and digital art installations.
"I was doing electronic-type art, making projects with flashy, glowing things," Max explains. He built interactive LED installations, psychedelic mirrors with trailing effects, and other visual experiments that played with color, animation, and real-time interaction.
"If you join data engineering and this hobby, maybe somewhere in the middle fits data viz. The interactive nature, the colors, animation—it's probably in that conceptual space."
This creative background fundamentally shaped how Superset approaches visualization. It's not just about charts and graphs; it's about making data interactive, beautiful, and engaging.
Why Max Started Preset: Open Source Needs Champions
After building Superset at Airbnb, Max faced a choice: stay and maintain it with a small team, or start a company to accelerate its adoption. His mission was clear: "Take over BI with open source."
"At Airbnb, if I had stayed, maybe we'd have a team of two, three, four people working on Superset," he says. "On the other side, if I raised money and started a company, believing that commercial interests can really coexist positively with open source... this foster child of an open source project needs a parent-type organization."
With VCs eager to fund him and his 40th birthday approaching, Max took the leap. But his advice to aspiring founders is surprisingly contrarian: he actively challenges people who want to start companies, telling them they shouldn't do it. "I just want them to only do it if they have strong convictions," he explains. "Make sure you know what you're getting into."
The Unsolved Problem: Code Reuse in Data Engineering
Perhaps the most thought-provoking part of the conversation centers on a problem that Max believes is fundamental to data engineering: we don't share data pipeline code.
"You go on the front end, I just built a front end app in less than a month that's incredible," Max notes. "It's because there's all these frameworks and toolkits and component libraries. You need a date picker, plug it in. In data engineering, there's kind of nothing like that."
Every company is reinventing the wheel: building their own growth accounting pipelines, calculating DAU/MAU, creating experimentation frameworks. "A lot of that stuff feels like tech debt before it's even committed to a repo," he says.
The challenge isn't just technical; it's conceptual. Without unified data models and a proper medium for sharing transformations (an "NPM for data engineering"), reuse remains elusive. Even with modern tools like dbt, the dialect differences and data model variations make true modularity difficult.
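However, there's hope: with standardization in the EL/ELT layer through tools like Fivetran and Airbyte, we finally have a foundation. "Now that if you use Fivetran, I use Fivetran to sync Salesforce or HubSpot," Max says. When two companies load the same source with the same tool, the raw tables tend to arrive in the same shape, so transformations written on top of them can start building on common ground.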
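To make the reuse idea concrete, here is a minimal sketch of what a shareable DAU/MAU component might look like if teams agreed on one event shape. Nothing here comes from the episode: the `Event` record, the `dau_mau` function, and the 30-day trailing window are all assumptions chosen for illustration, and a real pipeline would more likely express this in SQL against warehouse tables.

```python
from datetime import date, timedelta
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """One activity record. This two-field schema is a hypothetical shared model."""
    user_id: str
    day: date


def dau_mau(events: Iterable[Event], as_of: date, window_days: int = 30) -> float:
    """Return the DAU/MAU ratio for `as_of`.

    DAU counts distinct users active on `as_of`.
    MAU counts distinct users active in the trailing `window_days` days,
    ending on `as_of` (inclusive). The window length is an assumption.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    daily_users = set()
    monthly_users = set()
    for event in events:
        if window_start <= event.day <= as_of:
            monthly_users.add(event.user_id)
            if event.day == as_of:
                daily_users.add(event.user_id)
    if not monthly_users:
        return 0.0  # no activity in the window, so report zero rather than divide by zero
    return len(daily_users) / len(monthly_users)


# Made-up sample data: two users active on the day, two more earlier in the window.
example_events = [
    Event("a", date(2024, 1, 31)),
    Event("b", date(2024, 1, 31)),
    Event("c", date(2024, 1, 15)),
    Event("d", date(2024, 1, 2)),
    Event("e", date(2023, 12, 1)),  # falls outside the 30-day window
]
ratio = dau_mau(example_events, as_of=date(2024, 1, 31))
print(ratio)
```

The function itself is trivial. The hard part, as Max describes it, is the agreement on the input shape: once every team's events look like `Event`, the same metric code can be plugged in anywhere, much like a date picker on the front end.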
The Modern Data Stack and What's Next
Throughout the conversation, Max touches on the evolution of data modeling—from Ralph Kimball's dimensional modeling to modern approaches—and how different philosophies shape our tools and workflows. He discusses the trade-offs between denormalized dimensional models and more normalized structures, always with an eye toward what serves analysts and end users best.
As AI continues to reshape every industry, Max sees opportunities for open source to play an even bigger role. "Because the models are trained on open source, I think there's a more natural bias towards working and using open source," he observes. "Software is eating the world. Open source is eating software. And AI is eating everything."
Listen to the Full Episode
This blog post only scratches the surface of a rich 30+ minute conversation covering:
- Max's journey from Ubisoft Montreal to Silicon Valley
- The Facebook data culture that shaped modern tools
- Deep dives into data modeling philosophy
- The intersection of art and analytics
- Founding advice and startup realities
- The future of the data stack
Listen to the full Data Renegades podcast episode to hear Max's complete story and insights.
Data Renegades is brought to you by Heavybit, the leading investor in enterprise infrastructure. Special thanks to Dory Wilson and the Recce team for hosting this insightful conversation.