Superset

Superset Announces Elasticsearch Support!

Daniel Gaspar

At Preset, we love open source.

The Superset open-source community is active with daily contributions flowing in from passionate contributors around the world. New features, fixes, code improvements or just quick asks for help - this is open source at it's best. Together, we generate a positive feedback loop that pushes innovation, quality and an inclusive community.

Before we begin, thanks to everyone from the Superset community who requested support for Elasticsearch. Your feedback made it clear that this is an important integration, and we’re excited to share what we’ve built.

Elasticsearch support for SQL Queries

You may already have heard, but Elastic recently released support for SQL queries. SQL being as widespread as it is, this enables all sorts of new integrations.

This got us thinking - what could be the simplest and best solution to add Elasticsearch support on Superset?

Connecting Python and Elasticsearch

Superset is written in Python and uses SQLAlchemy to connect to SQL-speaking databases. Until now, there was no option to connect to Elasticsearch from SQLAlchemy.

We chose the following process for building this:

  1. Implement a python database driver for Elasticsearch that follows the DBAPI specification
  2. Create a SQLAlchemy dialect for Elasticsearch. For context, SQLAlchemy a popular SQL toolkit and Python Object Relationship Mapper (ORM)
  3. Open-source the code for all this
  4. Publish 1 and 2 as a Pypi (Python Package Index) package

With this driver and dialect in the open, Elasticsearch is now accessible to a wide range of existing python applications (including Superset!). Below, we have provided a quick guide to installation with code snippets.

Let's take a look on how to use it.

Installation

pip install elasticsearch-dbapi

If you have an available elasticsearch cluster you can use it, if not you can clone the project and start one (Kibana included):

git clone git@github.com:preset-io/elasticsearch-dbapi.git

cd elasticsearch-dbapi

docker-compose up -d

Now, wait a few seconds and then:

Access Kibana (http://localhost:5601) to import the provided sample Flights data.

Finally, write a simple python script to query our cluster:

from es.elastic.api import connect

conn = connect(host='localhost')

curs = conn.cursor().execute(
    "select * from kibana_sample_data_flights LIMIT 10"
)

print([row for row in curs])

How easy was that?

As a follow up, if you're curious on how to write a Python Database API Specification (dbapi) check out PEP-249.

Integrating Python, Elasticsearch and Apache Superset

Now that you have set up your Python and Elasticsearch, it's time to integrate with Superset. Superset implements an abstract database (DB) specification. This generic DB integration can be specifically implemented to support any kind of DB that has a python driver and is compliant with PEP-249. This specification creates an abstract layer between Superset and any DB engine.

Using this set up, let's see what flight carrier has canceled more flights.

First, let's look at all of the data from the Flights dataset charts and dashboard.

Screenshot of What You can Build with Elasticsearch and Superset Using Flights Dataset

Navigate to SQL Lab.

Protip: there is no concept of schema on Elasticsearch so we expose a "default" schema

It's also important to remember that Indexes are treated like Relational Database Management System (RDBMS) tables.

SQL querying which flight carriers have the most cancelled flights

"JetBeats" is the dreaded winner. Happy Flying and thanks for reading our how to guide and example.

Here are some additional resources for you:

Preset will be available soon. Be the first to know.