
dbt training

One of the tools I am hoping to get to grips with is dbt. It appears to be very popular at the moment. With the trend of moving to ELT, having a good tool to perform your transformations is important, and from what I hear dbt is good.

I have signed up for the free dbt Cloud developer account and connected it to my Snowflake instance, but after that I was not quite sure what I was meant to be doing. dbt has its own training, so I am starting with the dbt Fundamentals course. The training is supposed to take several hours, with a few more hours to implement the hands-on project, and gives you a badge for LinkedIn or something. I am more interested in trying out the tool and seeing what it can do, for free, for this project. I have looked into quite a few training courses over the last few months, covering all the tools I am using for this and things like AWS, and when it comes to actually being useful the dbt training is at the top so far. I skipped some of it as it was basic for someone with 10 years' experience, but for someone just starting out it is a really good introduction, not only to some of the processes but also to best practices around testing and version control.

There has already been an interesting article linked to on the dbt website. 

Working my way through the training, it has already been very easy to set up a link to Snowflake and perform a transformation. I am planning on using it to create a couple of mock rollup tables on the exercise data. Given the volumes I am dealing with it is completely pointless, but it gives me something to create. Initially I am going to drop the time part of the timestamp and group by day.
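
As a rough sketch of that daily rollup, here is the same idea run against an in-memory SQLite database from Python. The table and column names (exercise_log, logged_at, activity, minutes) are made up for illustration and are not my real data. In dbt the select statement would live in a model file and run against Snowflake, where the date function would probably look a little different.

```python
import sqlite3

# Hypothetical raw table: one row per exercise entry, with a full timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exercise_log (logged_at TEXT, activity TEXT, minutes INTEGER)"
)
conn.executemany(
    "INSERT INTO exercise_log VALUES (?, ?, ?)",
    [
        ("2023-01-01 07:30:00", "run", 30),
        ("2023-01-01 18:05:00", "walk", 20),
        ("2023-01-02 06:45:00", "run", 45),
    ],
)

# The rollup: keep only the date part of the timestamp and aggregate per day.
ROLLUP_SQL = """
    SELECT date(logged_at) AS activity_date,
           COUNT(*)        AS entries,
           SUM(minutes)    AS total_minutes
    FROM exercise_log
    GROUP BY date(logged_at)
    ORDER BY activity_date
"""

daily_rollup = conn.execute(ROLLUP_SQL).fetchall()
for row in daily_rollup:
    print(row)
```

The two entries on 1 January collapse into a single row, which is all the rollup is really doing.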

Going through the training I have been nodding along to a lot of the functionality that the tool provides: 

  • documentation built in 
  • using yml files to configure a table, so it only has to be changed in a single place
  • automatic lineage using the ref function, which is also used to automatically order the jobs to be run

I have used bespoke ETL tools where some of the above is not done, or is done only with great effort, so the fact that I don't have to worry about this for my little free play ETL is great.
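
To make the ref point concrete, here is a minimal sketch of how a run order can be worked out from ref calls. This is only my illustration of the idea, not how dbt implements it internally. The model names and SQL are hypothetical, and it needs Python 3.9 or later for graphlib.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text containing dbt-style ref() calls.
models = {
    "weekly_rollup": "select * from {{ ref('daily_rollup') }}",
    "daily_rollup": "select * from {{ ref('stg_exercise') }}",
    "stg_exercise": "select * from raw.exercise_log",
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_run_order(models):
    """Return model names ordered so each model runs after the models it refs."""
    # Map every model to the set of models it depends on (its upstream lineage).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


run_order = build_run_order(models)
print(run_order)
```

The same dependency map that gives the run order is also what a lineage diagram would be drawn from, which is why getting both from one function call is so handy.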

By the end of the training I had: 
  • Connected everything to a Git repository.
  • Configured a transformation in my Snowflake DB (the rollup mentioned above).
  • Configured the data sources so they are referred to via config, so any changes can be made in a single place.
    • This also gave me automatic lineage on the transformations (diagram below), but I will have to try something more complicated.
  • Configured source freshness, which gives a warning if data is stale.
  • Produced documentation of the sources and processes etc., utilising the yml files and auto documentation processes.
  • Used their testing as per this video
    • Great that it comes with some built-in tests that run quickly and are easy to configure.
  • Scheduled a daily job to refresh the rollup tables modelled above.
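
The source freshness check in the list above boils down to comparing the newest load timestamp against a threshold. Below is a minimal sketch of that logic in plain Python. The function name, the 12 and 24 hour thresholds and the timestamps are all assumptions for illustration. In dbt itself this is set up in the source yml with warn_after and error_after rather than written by hand.

```python
from datetime import datetime, timedelta


def freshness_status(max_loaded_at, now, warn_after, error_after):
    """Classify a source as 'pass', 'warn' or 'error' by the age of its newest row."""
    age = now - max_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Hypothetical thresholds: warn after 12 hours, error after 24 hours.
thresholds = {"warn_after": timedelta(hours=12), "error_after": timedelta(hours=24)}
now = datetime(2023, 1, 3, 9, 0)

fresh = freshness_status(datetime(2023, 1, 3, 6, 0), now, **thresholds)   # 3 hours old
stale = freshness_status(datetime(2023, 1, 2, 18, 0), now, **thresholds)  # 15 hours old
dead = freshness_status(datetime(2023, 1, 1, 9, 0), now, **thresholds)    # 48 hours old
print(fresh, stale, dead)
```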

I plan on moving on to the more advanced training, as I am interested to see what can be done and I really like the tool.


