
Apache Superset Now Supports Real-time Indexing Database Rockset

Srini Kadamati

Rockset is a real-time indexing database built for the cloud that uses RocksDB for fast storage. Organizations use Rockset to power data-intensive, customer-facing applications where the ability to index and query fresh data within seconds is essential.

Rockset automatically indexes semi-structured data via its Converged Index, where every field is indexed in three different ways: in a row index, a columnar index, and an inverted index. This is how Rockset achieves millisecond query latency on terabytes of nested semi-structured data.

Rockset Diagram

Apache Superset is a modern, open-source BI platform built on a current technology stack (Python, React / TypeScript, cloud-native backend, etc.). Because Superset is open source and was built with extensibility in mind, it can easily be extended to support any SQL-speaking data source. Superset ships with a powerful no-code chart builder that lets end users quickly build charts and assemble dashboards while Superset generates the queries for them.

The speed of crafting charts in Superset combined with the speed of ingesting and serving data in Rockset makes for a great pairing when building real-time dashboards. If you’re unsure what exactly real-time entails, I encourage you to read our blog post on real-time analytics, which explores the nuances further.

Adding Support for Rockset

Out of the box, Superset didn’t know how to talk to Rockset. As I discussed in my earlier post, Building New Database Connectors for Superset (https://preset.io/blog/building-database-connector/), Superset needs a SQLAlchemy dialect and a companion Python DB-API 2.0 library to query a database. In addition, each database needs its own database engine spec (referred to as db_engine_spec in the Superset codebase) to enable some database-specific features like time grains.

The Rockset team has built a Python client library that meets these requirements, so a lot of the heavy lifting is already done!

Rockset Python Driver

To better support Superset features like time grains, I created a minimal database engine spec for Rockset:

Rockset Database Engine Spec

and made Superset aware of Rockset as a queryable database.

Rockset DB Engine Spec 2
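To give a feel for what a minimal engine spec carries, here is a rough, standalone sketch. It is not the code that was merged into Superset. The attribute names engine, engine_name, and _time_grain_expressions follow the conventions of Superset's engine specs, but MinimalEngineSpec, RocksetLikeEngineSpec, and the time_grain_sql helper are hypothetical stand-ins so the example runs without Superset installed, and the DATE_TRUNC templates are an assumption about which SQL expressions suit Rockset.

```python
from typing import Dict, Optional


class MinimalEngineSpec:
    """Hypothetical stand-in for Superset's base engine spec (illustration only)."""

    engine = "base"
    engine_name = "Base"
    # Maps an ISO 8601 duration (None means "no truncation") to a SQL template.
    _time_grain_expressions: Dict[Optional[str], str] = {None: "{col}"}

    @classmethod
    def time_grain_sql(cls, column: str, grain: Optional[str]) -> str:
        """Render the SQL expression that truncates `column` to `grain`."""
        try:
            template = cls._time_grain_expressions[grain]
        except KeyError:
            raise ValueError(
                f"{cls.engine_name} does not support time grain {grain!r}"
            ) from None
        return template.format(col=column)


class RocksetLikeEngineSpec(MinimalEngineSpec):
    """Sketch of a Rockset-flavored spec; the templates are assumptions."""

    engine = "rockset"
    engine_name = "Rockset"
    _time_grain_expressions = {
        None: "{col}",
        "PT1S": "DATE_TRUNC('second', {col})",
        "PT1M": "DATE_TRUNC('minute', {col})",
        "PT1H": "DATE_TRUNC('hour', {col})",
        "P1D": "DATE_TRUNC('day', {col})",
        "P1M": "DATE_TRUNC('month', {col})",
        "P1Y": "DATE_TRUNC('year', {col})",
    }


if __name__ == "__main__":
    # Show the expression a chart grouped by hour would put in its query.
    print(RocksetLikeEngineSpec.time_grain_sql("event_time", "PT1H"))
```

The point of the design is that a time grain picked in the chart builder only has to be looked up in a per-database table of SQL templates, which is why a new database can get by with such a small spec.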

Adding Rockset Support in your Superset Deployment

There are two requirements for using Rockset with Superset:

  • Inclusion of the Rockset database engine spec in your Superset deployment
  • Inclusion of the Rockset library in the same Python context as your Superset deployment

The Rockset database engine spec was merged into Superset’s master branch 2 weeks ago at the time of writing. The last major release of Superset was version 1.3, which was released more than 2 weeks ago and therefore does not include the spec. This means that, for now, the only way to use the Rockset database engine spec is to run a close-to-master version of Superset or to cherry-pick the specific commits into your Superset deployment.

In the following sections, I’ll assume that you’re running a Superset version that has the database engine spec mentioned above.

Docker

To add the Rockset driver in your Docker-ized Superset deployment, you just need to add the following line to docker/requirements-local.txt:

rockset==0.7.68

and rebuild your Docker image.

Non-Docker / Native

If you’re not using Docker to run Superset, then you need to ensure that the rockset driver (version 0.7.68 is the last one I have personally tested) is installed in your Python environment / context.

Either run pip install rockset==0.7.68 manually or add rockset==0.7.68 to your pip requirements file.

Make sure to re-build or re-deploy if needed for your setup.
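If you want to confirm that the driver actually landed in the Python environment Superset runs in, a small check along these lines may help. It is only a sketch: installed_version and driver_status are hypothetical helper names, and the script relies solely on the standard library's importlib.metadata (Python 3.8+). Run it with the same interpreter that runs Superset.

```python
from importlib import metadata
from typing import Optional

TESTED_VERSION = "0.7.68"  # the driver version this post reports testing


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def driver_status(package: str = "rockset", tested: str = TESTED_VERSION) -> str:
    """Describe whether `package` is installed and matches the tested version."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed; try: pip install {package}=={tested}"
    if found != tested:
        return f"{package} {found} is installed, but only {tested} was tested"
    return f"{package} {found} is installed"


if __name__ == "__main__":
    print(driver_status())
```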

Preset Cloud

Rockset is available in Preset Cloud, our hosted Apache Superset product, where you will see Rockset as an option in the new Database modal.

Rockset Add DB

Connecting Rockset to Superset

If everything was installed correctly, you should see Rockset listed as an option in the new database modal.

Next, you need to generate an API key in the Rockset interface.

Rockset API Key

Once you have your Rockset API key, head back to Superset, select + Database, select Rockset from the drop-down, and type in the following value for the SQLAlchemy URI:

rockset://apikey:{your-apikey}@api.rs2.usw2.rockset.com/ 

Finally, click Test Connection to see if your Superset instance can talk to Rockset.
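If you script your setup, it can be handy to assemble the URI in code instead of pasting it by hand. The sketch below is one way to do that under a few assumptions: build_rockset_uri and mask_uri are hypothetical helper names, the default host is simply the one from the URI above, and percent-encoding the key is a precaution in case it contains characters such as / or @ that would otherwise confuse URI parsing.

```python
from urllib.parse import quote, urlsplit

DEFAULT_HOST = "api.rs2.usw2.rockset.com"  # API host taken from the URI above


def build_rockset_uri(api_key: str, host: str = DEFAULT_HOST) -> str:
    """Build a SQLAlchemy URI of the form rockset://apikey:<key>@<host>/."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    # Percent-encode the key so reserved characters cannot break the URI.
    return f"rockset://apikey:{quote(api_key, safe='')}@{host}/"


def mask_uri(uri: str) -> str:
    """Return the URI with the API key hidden, suitable for logging."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.username}:***@{parts.hostname}/"


if __name__ == "__main__":
    uri = build_rockset_uri("your-apikey")
    print(mask_uri(uri))
```

The masked form is what you would want in logs or screenshots, since the full URI embeds your API key.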

The Future of Rockset and Superset

This integration is still in its infancy! For example, Superset can’t yet use Rockset query lambdas as a data source. Certain Superset features, like CSV upload, don’t work yet either.

We’re hoping to collaborate with the Rockset team to close these gaps and unlock the full power of Rockset and Superset!