dbt - pivoting on list of values

Say you have a massive table, but for ease of reporting you need to split numbers out. For example, suppose you provide numerous different services: each person gets a single form with a mark for each service provided, and the transactional system stores this as a row per form per service. For reporting you may well want to pivot these out and have a separate column for each service, with a 1/0 flag and a count for each.

In traditional databases and tools you have to create all your views in advance, and you cannot pivot on a list of values. The joy I am finding with dbt and Jinja is that you can generate the views dynamically. Whilst that may pose some risk, it means that in this scenario a new service code is catered for automatically, as you can generate the columns by looping through a list of values.

I have applied this logic to my strength exercise data, pivoting it from rows to columns. Below is the Jinja code and the resulting SQL.

The other thing I used here was the slugify utility from dbt_utils, to replace spaces with underscores in my strings so I can use them in column names.
{%- set activities = dbt_utils.get_column_values(
    table=ref('stg_exercise_star'),
    column='description'
) -%}
SELECT
    {{ dbt_utils.current_timestamp() }} last_refresh,
    date_done,
    {% for activity in activities -%}
    sum(case when description = '{{ activity }}' then act_count else 0 end)
        as {{ dbt_utils.slugify(activity) }}_count
    {%- if not loop.last -%},{% endif -%}
    {%- endfor %}
from {{ ref('stg_exercise_star') }}
group by 2
The compiled SQL:

SELECT
    current_timestamp::timestamp_ntz last_refresh,
    date_done,
    sum(case when description = 'Pull Up' then act_count else 0 end)
        as pull_up_count,
    sum(case when description = 'Crunch' then act_count else 0 end)
        as crunch_count,
    sum(case when description = 'Push Up' then act_count else 0 end)
        as push_up_count,
    sum(case when description = 'Dumbbells' then act_count else 0 end)
        as dumbbells_count,
    sum(case when description = 'Box Jump' then act_count else 0 end)
        as box_jump_count,
    sum(case when description = 'Plank Time' then act_count else 0 end)
        as plank_time_count
from KEBOOLA_7127.WORKSPACE_15661914.stg_exercise_star
group by 2
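The Jinja here is just string generation at compile time: loop over the distinct values, emit one sum(case ...) column per value. As a rough sketch of that expansion in Python (the slugify function below only approximates what dbt_utils.slugify does, and build_pivot_sql is a hypothetical helper, not part of dbt):

```python
import re

def slugify(text: str) -> str:
    # Approximation of dbt_utils.slugify: lowercase, collapse anything
    # non-alphanumeric into underscores, so the label is column-name safe.
    slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    # Column names shouldn't start with a digit, so prefix an underscore.
    return "_" + slug if slug and slug[0].isdigit() else slug

def build_pivot_sql(activities: list[str], relation: str) -> str:
    # One sum(case ...) column per distinct value, mirroring the Jinja loop.
    columns = ",\n    ".join(
        f"sum(case when description = '{a}' then act_count else 0 end)"
        f" as {slugify(a)}_count"
        for a in activities
    )
    return (
        "select\n"
        "    current_timestamp::timestamp_ntz last_refresh,\n"
        "    date_done,\n"
        f"    {columns}\n"
        f"from {relation}\n"
        "group by 2"
    )

print(build_pivot_sql(["Pull Up", "Box Jump"], "stg_exercise_star"))
```

Because the column list is rebuilt from the data on every dbt run, a new activity value shows up as a new column without any model changes.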