SUPERSET

Intro To Jinja Templating in Apache Superset™

Shreesham Mukherjee

Intro to Jinja Templating

In the world of Preset and Superset, a dashboard is a collection of charts. Dashboards have the unique capability to tell a story by combining different types of charts to form a narrative.

When used correctly, a dashboard can empower teams to present powerful data-based presentations as well as enable organizations to monitor information based on dynamic data.

But what if we wanted to make our dashboards more interactive? What if we had a dashboard that measured the population of countries and we wanted to have the ability to filter the dashboard based on country or region? What if we had a dashboard that tracked support tickets with columns for associated actions, labels, etc. and we wanted the ability to filter by the different possible ticket label values? How would we achieve these things without the need to modify the underlying queries powering the dashboards?

What is Jinja?

Jinja is a templating engine for Python. The templates are traditionally used in web development to define and re-use common Python code snippets. Other popular software products that utilize Python in their backends such as Ansible, dbt, or Superset/Preset also leverage Jinja as a templating engine!

SQL is powerful — but there are a lot of features that are not supported. There's missing support for loops and variables, as SQL commands are by nature narrower than other conventional programming languages like Python. SQL statements specify what data operations should be performed rather than how to perform them. This often surfaces challenges around how to make SQL more dynamic.

All Superset/Preset queries pass through the Python backend and utilizing Jinja allows users to craft more dynamic SQL queries to increase the interactivity of their dashboards.

  • Dynamic SQL helps generalize the SQL
    • Increased legibility
      • Code used for the generation of SQL queries in templates increase legibility compared to building string statements
      • This may require different functions to generate parts of the SQL statements
        • With a template these components are rendered inline based on the initial variable inputs

In general, creating complex queries with loops and other types of abstractions with Jinja can increase the legibility of queries by reducing the number of times a user may copy paste the same things in statements

At a high level, templating is the ability to add programmatic capabilities to SQL with Jinja acting as the bridge allowing us to inject context into queries. Preset’s SQL Lab supports query templating via the Jinja Framework.

With and Without Jinja (a brief history of Superset)

Superset Without Jinja

Superset didn’t always have Jinja templating. Before Jinja, only basic “dynamic-ness” could be achieved. Superset originally had filter boxes, which moved on to become dashboard native filters.

Virtual datasets, the primary way to enhance this dynamic-ness pre-Jinja, are created via any valid SQL query (as long as the operations, joins, etc. were supported by the underlying database). At the end of the day a virtual dataset in Superset is simply a query — if the query was changed the virtual dataset would change. The dashboard filters worked by injecting state that the user provides and only whatever is in the filter state can be passed into the query. These are typically only main/outer query operations.

This limited how much control the user had over where the filter values could be applied to the query. More complicated filtering in sub/inner queries couldn’t be done.

In summary, the filter context would be created, injected into the query, that query is sent to the underlying database, and the final query was what Superset would create.

Superset With Jinja

With Jinja context, the filter state becomes a much more powerful tool. Now there is more available to pull from such as:

  • User state
    • the user_id of the person logged in
    • their username
  • URL state
    • you can add parameters to a Superset URL that can be inserted into your query
    • For example, if you have a sales dashboard, and the end user only wanted to see sales in Barcelona:
      • A dashboard can be sent where they can go to the filter and click Barcelona
      • Send a dashboard with a pre-filtered URL with the city in the URL parameter itself so when the dashboard loads they can immediately see what they want

Most importantly, you still have the filter state but instead of just being injected the way Superset thinks (typically in the WHERE clause) this virtual dataset can now be exposed as a variable to build queries on — so now further options become available such as JOIN only when year greater than some year value, something that wouldn’t be possible without Jinja.

It is important to remember that at the end of the day whether you add this dynamic-ness with or without Jinja everything is still a query: the same concepts apply — context is obtained, it is applied to the query, the query is sent to your database, and Superset surfaces the results of that query. This should always be the mental model to keep in mind when working with Superset!

SQL Templating with Jinja in Superset

SQL Lab and Explore supports Jinja templating in queries. To enable templating, the ENABLE_TEMPLATE_PROCESSING feature flag needs to be enabled either in:

  • superset/config.py
  • superset_config.py in the root directory (which you must manually create to over-ride values in superset/config.py

This enables you to extend the SQL queries Superset generates with short Python snippets.

In Preset Cloud, our is a fully hosted, hassle-free cloud service for Superset, Jinja templating is enabled for you already!

By default, the following variables are made available in the Jinja context:

  • columns: columns which to group by in the query
  • filter: filters applied in the query
  • from_dttm: start datetime value from the selected time range (None if undefined)
  • to_dttm: end datetime value from the selected time range (None if undefined)
  • groupby: columns which to group by in the query (deprecated)
  • metrics: aggregate expressions in the query
  • row_limit: row limit of the query
  • row_offset: row offset of the query
  • table_columns: columns available in the dataset
  • time_column: temporal column of the query (None if undefined)
  • time_grain: selected time grain (None if undefined)

For example, to add a time range to a virtual dataset, you can write the following:

SELECT * 
FROM tbl 
WHERE 
	dttm_col > '{{ from_dttm }}' AND dttm_col < '{{ to_dttm }}'

Common Jinja Use Cases

Before diving deeper into specific use cases with some of the pre-defined macros, let’s touch on two of the most common, broader categories of what people use Jinja templating in Superset for:

Performance

  • Dynamic Subqueries
    • One example of Jinja templating in the Superset context would be using highly dynamic subqueries for date filtering to avoid the naive outer query approach, allowing the user to be more methodical in their filtering.
  • Database Specific Functions
    • Another case where Jinja templating is extremely useful in the Superset context is when utilizing exotic database specific functions that have not been implemented in the underlying database engine specs.
      • For example, TimescaleDB extends PostgreSQL by adding new functionality, and packages and optimizes PostgreSQL for time-series data analytics. TimescaleDB has it’s own unique time_bucket() function that optimizes the default PostgreSQL date_trunc() function. The time_bucket() function is much faster and performant than date_trunc() when attempting to slice millions to billions of rows of date values. Here, Jinja templating becomes an escape valve which allows you to use the DB specific time_bucket() while avoiding use of the much slower date_trunc()

The inclusion of Jinja increases flexibility in the Superset filter functionality and some more specific examples of what Jinja can be utilized for are:

  • Displaying only the currently logged in user’s data.
  • Using a filter component to filter a query when the name of a filter column doesn't match one in the current query.
  • Applying filter constraints via Superset dashboard URL.
  • For tables containing arrays, being able to search for entries with a particular value in the array.
  • Ability to apply OR statements in a filter for two different columns (e.g. ‘name’ = “Matthew” OR ‘age’ = 12)
  • Personalized dashboards.
  • Controlling access to data using Row Level access based on User_ID.
  • Investigate access to Superset metadata by User_ID.

Pre-defined Jinja Macros in Superset

Current User ID

The {{ current_user_id() }} macro returns the user_id of the currently logged in user.

  • Macro: {{ current_user_id() }}
  • Parameters: None
  • Returns: Integer

If you have caching enabled in your Superset configuration, then by default the the user_id value will be used by Superset when calculating the cache key.

A cache key is a unique identifier that determines if there's a cache hit in the future and Superset can retrieve cached data.

You can disable the inclusion of the user_id value in the calculation of the cache key by adding the following parameter to your Jinja code:

{{ current_user_id(add_to_cache_keys=False) }}

Current Username

The {{ current_username() }} macro returns the username of the currently logged in user.

  • Macro: {{ current_username() }}
  • Parameters: None
  • Returns: String

Use Case: Retrieve Username

Entered SQL:

SELECT *
FROM jinja_username_demo
WHERE "username" = '{{ current_username() }}'

Result:

SELECT * FROM 
jinja_username_demo 
WHERE "username" = 'test@gmail.com

Just like the {{ current_user_id() }} macro, if you have caching enabled in your Superset configuration, then by default the the username value will be used by Superset when calculating the cache key.

You can disable the inclusion of the username value in the calculation of the cache key by adding the following parameter to your Jinja code:

{{ current_username(add_to_cache_keys=False) }}

Custom URL Parameters

The {{ url_param('custom_variable') }} macro lets you define arbitrary URL parameters and reference them in your SQL code.

  • Macro: {{ url_param('custom_variable') }}
  • Parameters: Takes in string with the name of parameter passed into URL. Pass specified parameter in query into URL such as url/?reg=North+America.
  • Returns: String

Use Case: Custom URL Parameters

Entered SQL:

SELECT count(*)
FROM ORDERS
WHERE country_code = '{{ url_param('countrycode') }}'

Scenario:

  • You're hosting Superset at the domain www.example.com
  • You send your coworker in Spain the following SQL Lab URL www.example.com/superset/sqllab?countrycode=ES and your coworker in the USA the following SQL Lab URL www.example.com/superset/sqllab?countrycode=US

Result:

For your coworker in Spain, the SQL Lab query will be rendered as:

SELECT count(*)
FROM ORDERS
WHERE country_code = 'ES'

For your coworker in the USA, the SQL Lab query will be rendered as:

SELECT count(*)
FROM ORDERS
WHERE country_code = 'US'

Explicitly Including Values in Cache Key

The {{ cache_key_wrapper() }} function explicitly instructs Superset to add a value to the accumulated list of values used in the the calculation of the cache key.

This function is only needed when you want to wrap your own custom function return values in the cache key. Additional context can be found in this file within the Superset repo.

Note that this function powers the caching of the user_id and username values in current_user_id() and current_username() function calls if caching is enabled.

Filter Values

You can retrieve the value for a specific filter as a list using the {{ filter_values() }} macro

This is useful if:

  • You want to use a filter component to filter a query where the name of the filter component column doesn't match the one in the select statement
  • You want to have the ability for filter inside the main query for performance purposes

Use Case: Filtering Values
Method One:

SELECT action, count(*) as times
FROM logs
WHERE
	action in {{ filter_values('action_type')|where_in }}
GROUP BY action

The  where_in filter converts the list of values from filter_values('action_type') into a string suitable for an IN expression.

Method Two:

SELECT action, count(*) as times
FROM logs
WHERE
	action in ({{ "'" + "','".join(filter_values('action_type')) + "'" }})
GROUP BY action

For both methods above, if NULL values need to be included with a filter, an additional OR clause may need to be added to the WHERE clause to include this case.
You can use this feature to reference the start & end datetimes from a time filter using:

  • Start datetime value: {{ from_dttm }}
  • End datetime value: {{ to_dttm }}

Boolean Value Filtering

Boolean filter values can also be used for filtering. However, the extra single quotes added to the filter values Jinja needs to be removed.

Use Case: Boolean Filter Values

SELECT name, count(*)
FROM "Slack Users"
WHERE
	is_bot in ({{",".join(filter_values('bot_filter'))}})
GROUP BY name

For this Jinja templated filter to operate, the values in column bot_filter of the dataset populating the filter must be strings, not booleans.

Filters for a Specific Column

The {{ get_filters() }} macro returns the filters applied to a given column. In addition to returning the values (similar to how filter_values() does), the get_filters() macro returns the operator specified in the Explore UI.

This is useful if:

  • You want to handle more than the IN operator in your SQL clause
  • You want to handle generating custom SQL conditions for a filter
  • You want to have the ability to filter inside the main query for speed purposes

Use Case: Filters for Specific Columns

WITH RECURSIVE
	superiors(employee_id, manager_id, full_name, level, lineage) AS (
	SELECT
		employee_id,
		manager_id,
		full_name,
		1 as level,
		employee_id as lineage
	FROM employees
	WHERE 1=1
	{%- for filter in get_filters('full_name', remove_filter=True) -%}
		{%- if filter.get('op') == 'IN' -%}
			AND full_name IN {{ filter.get('val')|where_in }}
		{%- endif -%}
	{%- if filter.get('op') == 'LIKE' -%}
		AND full_name LIKE {{ "'" + filter.get('val') + "'" }}
	{%- endif -%}
{%- endfor -%}
	UNION ALL
		SELECT
			e.employee_id,
			e.manager_id,
			e.full_name,
			s.level + 1 as level,
			s.lineage
		FROM 
			employees e, superiors s
		WHERE s.manager_id = e.employee_id
)
SELECT
	employee_id,
	manager_id,
	full_name,
	level,
	lineage
	FROM superiors
ORDER BY lineage, level

Using Jinja To Filter Data (Walkthrough)

General Overview

Here's a high-level overview of the workflow for using Jinja templating in Superset.

  1. Create a Virtual Dataset.
  2. Modify the Virtual Dataset query to include the Jinja Templating structure.
  3. Create a Chart from the Virtual Dataset and add it to a Dashboard.
  4. Configure the Dashboard filters.

Step 1: Creating a Virtual Dataset

For this example, we'll use the Vehicle Sales table from the examples database.

Consider the query below:

WITH calculation as (
	SELECT count(*), country
	FROM "Vehicle Sales"
	WHERE product_line in ('Classic Cars', 'Motorcycles')
	GROUP BY country
)
SELECT * FROM calculation

If we create a Virtual Dataset using this query, the dataset will have only two columns — count and country. We won't be able to create a dashboard filter for product_line so we can utilize Jinja to do just that!

If we execute the query including the Jinja templating in the SQL Lab directly, it won't return any results as we don't have the filters in there to pass the values, which means that the executed query end up looking like:

WITH calculation as (
	SELECT count(*), country
	FROM "Vehicle Sales"
	WHERE product_line in ('')
	GROUP BY country
)
SELECT * FROM calculation

To get around this we need to create the Virtual Dataset without Jinja first:

  • In the Toolbar, hover over SQL Lab, and in the sub-menu select SQL Editor.
  • In the SQL Editor, select the following database, schema, and table:

db schema table schema

Use the following SQL with a value for product_line to create a new virtual dataset:

WITH calculation as (
	SELECT count(*), country
	FROM "Vehicle Sales"
	WHERE product_line in ('Classic Cars')
	GROUP BY country
)
SELECT * FROM calculation
  • Select Run and then Explore. The Save or Overwrite Dataset window appears.
  • Name new virtual dataset "Filtering with Jinja" and then click Save & Explore.

save and explore

Step 2: Modify the Virtual Dataset to Include Jinja

A new tab will open in your browser featuring your new virtual dataset. The first thing to do is to include the Jinja templating in the dataset.

In the Dataset panel, select the vertical ellipses and, in the sub-menu, select Edit dataset.

edit dataset

  • The Edit Dataset window appears. In the Click the lock to make changes field, select the lock icon to enable dataset editing.

click lock

In the SQL text-entry panel, replace product_line in ('Classic Cars') with: 

product_line in ({{ "'" + "', '".join(filter_values('product_line')) + "'" }}) 

and then select Save.

select save

Lastly, in the Confirm Save window, select OK.

confirm save

Structure of Jinja Templating

  • The beginning — {{ "'" + "', '".join( — and ending — ) + "'" }} — are responsible for concatenating everything — this structure allows for filtering with multiple values.
  • filter_values('Column-Name') is the actual function that's going to look for the filter value. In this case, Column-Name is product_line — we'll talk more about this in the coming steps.

Step 3: Add Chart to Dashboard

Now that we have properly modified the Virtual Dataset, we can execute our chart by selecting Update Chart.

update chart

If you see “No results were returned for this query”

result error

It’s expected! The chart will not return any results because it has yet to be configured.

Select Save to save the chart to a new or existing dashboard.

save the chart

In this example, we titled the chart "Jinja Filtering Chart" and saved it to a new dashboard. When completed, select Save & Go to Dashboard.

save and go to dashboard

Step 4: Configure the Dashboard Filters

The dashboard will launch in your browser. After it appears, let's start the final step of configuring the dashboard's filters.

Start by selecting the right arrow to expand the Filter area, and then select + Add/Edit Filters.

add edit filters

The Add and edit filters window appears.

  • In the Filter Name field, enter a name for your filter. In this example, we will create a filter that displays options from our Jinja-revised product line, so we’ll call the filter "Product Line".
  • In the Dataset field, let's switch from our virtual dataset to the actual source for the inner query, which is Vehicle Sales.
  • In the Column field, we’ll select product_line because this is what defines our column name in the Jinja structure.
  • Next, select the Scoping tab to ensure that the filter is mapped to your chart.
  • When done, select Save.

select save prod line

Now let's see the filter in action!

In the new Product Line filter, we selected MotorcyclesPlanesTrains, and Ships, and then selected Apply Filters.

filter in action

If you'd like to see your inner query displayed via Jinja Templating — select the vertical ellipsis icon in the chart and then choose View query.

view query

Now the query updates to include the filter values of Product Line:

jinjified query

Conclusion

It should be re-emphasized that Jinja via Superset serves as an escape valve to perform actions against the underlying datasets. Complexity can become increasingly difficult to manage if Jinja templating is used to create intense custom queries and virtual datasets for every piece of advanced transformation logic, rather than having these datasets created and transformed accordingly further upstream in the data architecture, and making them available for proper visualization within Superset (where it really shines). All of this can result in much harder to debug issues, should they arise, because the data lineage and transformation logic becomes that much more obfuscated and harder to trace.

The core use case for Superset is to enable people to explore, visualize, and collaborate on their data effectively. If you've set up the transformation layer to power the datasets you need for your analysis, Superset aims to be the fastest way to craft a chart. Superset ideally serves fast-moving data teams and works best with a dataset-centric approach which you can learn more about from the Case for Dataset-Centric Visualization blog post by Max Beauchemin.

If you prefer an up-to-date documentation reference for Jinja templating in Superset / Preset, we recommend our Preset Documentation!

Subscribe to our blog updates

Receive a weekly digest of new blog posts

Close