DATA ENGINEERING

Introduction to YugabyteDB and Apache Superset

Akshay Atrey and Balachandar Seetharaman

This post was originally published over at the YugabyteDB blog.

Enterprises today collect and store vast amounts of data. However, realizing business value from this data requires modern data exploration and visualization tools.

Apache Superset is a widely popular, open source data exploration and visualization platform used and supported by enterprises such as Airbnb, Lyft, and Nielsen. More specifically, Superset enables users to quickly and easily explore and visualize large amounts of data stored in YugabyteDB. As a result, this distributed SQL database connects seamlessly with Superset using the standard PostgreSQL connector.

In this post, we demonstrate how business users can use Superset to get valuable insights from their data stored in YugabyteDB. We’ll also walk through a retail operations use case that keeps track of order processing efficiency in a specific business territory, as highlighted below.

Yugabyte Superset Dashboard

Setup and Installation

Throughout this post, we’ll explore tables in the Northwind database (available as a sample database with YugabyteDB installation). Our goal is to create a chart showing the average time taken to ship an order once placed.

Step 1: Install YugabyteDB

You have multiple options to install or deploy YugabyteDBt if you don’t have one already available.

Step 2: Load the Northwind database

If using yugabyted, you can load the database with the ./bin/yugabyted demo connect command from your shell. Otherwise, you’ll need to follow these instructions.

Step 3: Install Superset and configure drivers

Install Superset from scratch using Python install (PIP) or Docker Compose (recommended).

Step 4: Launch Superset

After successful installation, launch Superset in the browser at http://<hostname/IP>:8088. In case of local installation on your laptop, navigate to localhost:8088 or 127.0.0.1:8088. The Superset landing page will look something like this:

Yugabyte Superset

Exploration

With setup and installation complete, you’re now ready to connect Superset and explore your database.

Step 1: Connect Superset with YugabyteDB

Superset connects to YugabyteDB with the standard PostgreSQL driver that comes with Superset. Otherwise, you need to manually install the psycopg2 driver. For this example, we’ll connect a single Yugabyte DB node using the standard PostgreSQL connector with the below configurations.

  • Navigate to Data → Databases → + Databases as shown below.

Yugabyte Superset 3

  • Choose PostgreSQL from the “Connect a database” menu highlighted below.

connect database superset

  • Finally, enter the hostname or IP Address of your YB-TServer with standard credentials.

superset db yugabyte

In addition, please refer to the settings captured in the above screenshot if you are using a localhost-based cluster installed on your laptop with Superset and docker-compose.

Note: As of Docker version 18.03, the host.docker.internal hostname connects to your Docker host from inside a Docker container.

Step 2: Explore the Northwind Dataset

Once you're connected to YugabyteDB, you need to load specific tables to explore as datasets.

  • Navigate to Data → Datasets → +Datasets and configure as shown below. Since we’re interested in order processing timelines, let’s explore the orders table.

add dataset yugabyte superset

  • Specifically, ‘orders’ will now be listed on the available datasets tab. Click on it to explore this dataset. You can explore this dataset in multiple charts, tables, and filtered rows without writing complex queries.
  • The below screenshot shows order placement and shipment dates for a particular region in the US within a specific date range.

explore orders table

  • Finally, explore some other tables and more columns to get a good understanding of the database. Try filters and date ranges.

Step 3: Create a report

In conclusion, let’s create a report to see how order processing times vary over a period of interest.

  • Navigate to Charts → +Chart from the main menu, select ‘orders’ as the dataset, and an appropriate chart style. To keep it simple, we’ll select ‘Time-Series Line Chart’ under ‘Popular’ and click the ‘Create New Chart’ button.
  • Configure it as shown below to visualize the varying average order processing times for U.S. orders over a period of 60 days. We’re using a simple ‘avg.(shipped_date – order_date)’ metric to calculate this. Play with time range, filters, and group by to visualize different periods, regions, or countries. And don’t forget to click ‘Run Query’ to refresh the chart.

create report chart superset

  • Once you have the desired chart, click ‘Save’ to save the chart and add it to a dashboard.

save chart superset yugabyte

  • Try creating another chart as shown below to visualize order distribution by regions within the U.S. Save this chart and add it to the dashboard you created earlier.

create report chart 2

  • Finally, navigate to ‘Dashboards’ from the main menu and click on the dashboard you created. Click on the ‘draft’ status button to publish the dashboard.

dashboard

Conclusion

In this blog post, we presented how to explore and visualize data stored in YugabyteDB with Apache Superset. YugabyteDB is PostgreSQL compatible and you can seamlessly connect Superset with YugabyteDB using the standard PostgreSQL connector that is part of the Superset docker installation.

More specifically, YugabyteDB and Superset are both designed for modern enterprises leading digital transformation. When combined, these tools create business value from large amounts of data they collect and store. Both come with great documentation and have excellent developer and community support.

If you haven’t already, check out our docs and blog blog sites to learn more about YugabyteDB. Any questions? Ask them in the YugabyteDB community Slack channel.

Subscribe to our blog updates

Receive a weekly digest of new blog posts

Close