
dbt training

One of the tools I am hoping to get to grips with is dbt. It appears to be very popular at the moment. With the trend of moving to ELT, having a good tool to perform your transformations is important, and from what I hear dbt is a good one.

I have signed up for the free dbt Cloud developer account and connected it to my Snowflake instance, but after that I am not quite sure what I am meant to be doing. dbt has its own training, so I am starting with the dbt Fundamentals course. The training is supposed to take several hours, with a few more hours for implementing the hands-on project, and gives you a badge for LinkedIn or something. I am more interested in trying out the tool and seeing what it can do, for free, for this project. I have looked into quite a few training courses over the last few months, covering all the tools I am using for this and things like AWS, and when it comes to actually being useful the dbt training is at the top so far. I skipped some of it as it was basic for someone with 10 years' experience, but for someone just starting out it is a really good introduction, not only to some of the processes but also to best practices around testing and version control.

There has already been an interesting article linked to on the dbt website. 

Working my way through the training, it has already been very easy to set up a link to Snowflake and perform a transformation. I am planning on using it to create a couple of mock rollup tables on the exercise data. Given the volumes I am dealing with it is completely pointless, but it gives me something to create. Initially I am going to remove the timestamp entry and group by day.
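As a rough illustration of that rollup logic, here is a minimal Python sketch. In dbt the transformation would be written as a SQL model; this only mirrors the idea of dropping the time part of a timestamp and grouping by day. The rows, the `event_ts` and `amount` fields, and the choice to sum the values are all assumptions made up for the example, not taken from the real data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (event_ts, amount). Not the real exercise data.
rows = [
    ("2024-03-01 07:15:00", 30),
    ("2024-03-01 18:40:00", 45),
    ("2024-03-02 06:55:00", 20),
]


def rollup_by_day(rows):
    """Drop the time part of each timestamp and sum the amounts per day."""
    totals = defaultdict(int)
    for event_ts, amount in rows:
        day = datetime.strptime(event_ts, "%Y-%m-%d %H:%M:%S").date()
        totals[day.isoformat()] += amount
    # Sort by day so the output is stable and easy to read.
    return dict(sorted(totals.items()))


daily = rollup_by_day(rows)
print(daily)  # {'2024-03-01': 75, '2024-03-02': 20}
```

The three input rows collapse to one row per day, which is the shape a daily rollup table would have.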

Going through the training I have been nodding along to a lot of the functionality that the tool provides: 

  • documentation built in
  • using yml files to configure a table so it only has to be changed in a single place
  • automatic lineage using the ref function, which is also used to automatically order the jobs to be run
I have used bespoke ETL tools where some of the above is not done, or is done only with great effort, so the fact that I don't have to worry about this for my little free play ETL is great.
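To make the lineage and ordering point concrete, here is a small Python sketch of the general idea: find the `ref('...')` calls in each model's SQL, treat them as dependencies, and sort the models so dependencies run first. This is not how dbt is implemented internally; the model names, the SQL text and the regular expression are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text containing dbt-style ref() calls.
models = {
    "weekly_rollup": "select * from {{ ref('daily_rollup') }}",
    "daily_rollup": "select event_date, count(*) from {{ ref('stg_events') }} group by 1",
    "stg_events": "select * from raw.events",
}

# Matches {{ ref('name') }} or {{ ref("name") }} and captures the name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_lineage(models):
    """Map each model to the set of models it references."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models):
    """Return model names ordered so that every dependency comes first."""
    return list(TopologicalSorter(build_lineage(models)).static_order())


print(build_lineage(models)["daily_rollup"])  # {'stg_events'}
print(run_order(models))  # ['stg_events', 'daily_rollup', 'weekly_rollup']
```

The same dependency map gives both the lineage diagram and the run order, which is why declaring references in one place pays off twice.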

By the end of the training I had: 
  • Connected everything to a Git repository.
  • Configured a transformation in my Snowflake DB (the rollup mentioned above).
  • Configured the data sources so they are referred to using config, so changes can be made in a single place.
    • This also gave me automatic lineage on the transformations (diagram below), but I will have to try something more complicated.
  • Configured source freshness, which gives a warning if data is stale.
  • Produced documentation of the sources and processes etc., utilising the yml files and auto documentation processes.
  • Used their testing as per this video.
    • Great that it comes with some built-in tests that run quickly and are easy to configure.
  • Scheduled a daily job to refresh the rollup tables modelled above.
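
Two of the checks in the list above, source freshness and simple column tests, can be sketched in plain Python. This is only an illustration of what such checks do, not dbt's code or configuration: the function names, the 24 hour threshold, the sample rows and the decision to ignore missing values in the uniqueness check are all assumptions for the example.

```python
from datetime import datetime, timedelta


def freshness_status(last_loaded_at, now, warn_after):
    """Return 'warn' if the newest data is older than the allowed age, else 'pass'."""
    return "warn" if now - last_loaded_at > warn_after else "pass"


def not_null_failures(rows, column):
    """Return the rows where the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return the values that appear more than once (missing values are skipped)."""
    seen, dupes = set(), set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


# Hypothetical rollup rows with one duplicate day and one missing count.
rollup = [
    {"event_date": "2024-03-01", "events": 2},
    {"event_date": "2024-03-02", "events": 1},
    {"event_date": "2024-03-02", "events": None},
]

status = freshness_status(
    last_loaded_at=datetime(2024, 3, 1, 9, 0),
    now=datetime(2024, 3, 3, 9, 0),
    warn_after=timedelta(hours=24),
)
print(status)  # warn
print(unique_failures(rollup, "event_date"))  # ['2024-03-02']
print(len(not_null_failures(rollup, "events")))  # 1
```

Each check returns the offending values or rows, so an empty result means the check passed.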

I plan on moving onto the more advanced training as I am interested to see what can be done and I really like the tool. 


