
dbt - more stuff

The more I use dbt, the more I like it. I am finding many of its features really useful, and I haven't even done the training on macros and packages yet, so I feel there is more to come. In the meantime I have started, just for the fun of it, to create some downstream views with dependencies on other steps and a function in SQL. Happy to say it is all working really well, and using Jinja (and my Snowflake function) has saved me heaps of time coding.

Sources yml: 


View using the source function (results in SQL)
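As a rough illustration of what the source function does when a view is compiled, here is a toy Python sketch. It is not how dbt is implemented; the source name `raw`, the table `results`, and the database and schema `RAW_DB.PUBLIC` are all made up for the example.

```python
import re

# Hypothetical source definitions, standing in for a dbt sources yml file.
# Keys are (source name, table name); values are the full table names.
SOURCES = {
    ("raw", "results"): "RAW_DB.PUBLIC.RESULTS",
}

# Matches {{ source('name', 'table') }} with flexible whitespace.
SOURCE_CALL = re.compile(
    r"\{\{\s*source\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}"
)


def render_sources(sql: str) -> str:
    """Replace each source(...) call with the table name it points at."""
    return SOURCE_CALL.sub(lambda m: SOURCES[(m.group(1), m.group(2))], sql)


model_sql = "select * from {{ source('raw', 'results') }}"
rendered = render_sources(model_sql)
print(rendered)
```

The point is only that the view itself never hard-codes the database or schema: that lives in one place, and the compiled SQL picks it up.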


View that references the output from previous steps, allows them to be linked: 


As shown above, the trick is to define your sources in the yml file and to reference previous steps with the ref function rather than calling the resulting table directly (dbt handles that for you). Do that and dbt will automatically work out the dependencies, run things in the right order and produce a lovely lineage graph like so.
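To show why using the reference function is enough to get the run order, here is a minimal Python sketch of the idea: collect the `ref` calls in each model, treat them as dependencies, and sort. The model names and SQL are hypothetical, and this is a simplification rather than dbt's actual implementation. It needs Python 3.9+ for `graphlib`.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text, as it might sit in a dbt project.
MODELS = {
    "results_summary": "select count(*) as n from {{ ref('results_clean') }}",
    "results_clean": "select * from {{ ref('stg_results') }}",
    "stg_results": "select * from {{ source('raw', 'results') }}",
}

# Matches {{ ref('model_name') }} with flexible whitespace.
REF_CALL = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def build_graph(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models it references."""
    return {name: set(REF_CALL.findall(sql)) for name, sql in models.items()}


def run_order(models: dict[str, str]) -> list[str]:
    """Return the models ordered so every dependency runs first."""
    return list(TopologicalSorter(build_graph(models)).static_order())


order = run_order(MODELS)
print(order)
```

The same graph of model-to-dependency links is also what a lineage diagram is drawn from, which is why hard-coding table names instead of using the reference function would break both the ordering and the picture.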



I am planning to pause here with what I know of dbt and maybe make some visuals based on the extra processing I have done, before playing with dbt some more by working through more of the training.


