One of the tools I am hoping to get to grips with is dbt. It appears to be a very popular tool at the moment. With the trend of moving to ELT, having a good tool to perform your transformations is important, and from what I hear dbt is good.
I have signed up for the free dbt Cloud developer account and connected it to my Snowflake instance, but after that I was not quite sure what I was meant to be doing. dbt has its own training, so I am starting with the dbt Fundamentals course. The training is supposed to take several hours, with a few more hours to implement the hands-on project, and gives you a badge for LinkedIn or something. I am more interested in trying out the tool and seeing what it can do, for free, for this project. I have looked into quite a few training courses over the last few months, covering all the tools I am using for this and things like AWS, and when it comes to actually being useful the dbt training is at the top so far. I skipped some parts as they were basic for someone with 10 years' experience, but for someone just starting out this is a really good introduction, not only to some of the processes but also to best practices with testing and version control.
There has already been an interesting article linked to on the dbt website.
Working my way through the training, it has already been very easy to set up a link to Snowflake and perform a transformation. I am planning on using it to create a couple of mock rollup tables on the exercise data. Given the volumes I am dealing with it is completely pointless, but it gives me something to create. Initially I am going to drop the time part of the timestamp and group by day.
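To make the rollup idea concrete, here is a minimal sketch of "drop the time and group by day". It runs against an in-memory SQLite database from Python rather than Snowflake, and the table name `raw_exercise` and its columns are made up for illustration, not my real schema. In dbt the `select` would live in its own model file, and on Snowflake the date handling would probably be a cast to date or `date_trunc` instead of SQLite's `date()`.

```python
import sqlite3

# Hypothetical raw exercise data: one row per logged entry, with a full timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table raw_exercise (logged_at text, exercise text, reps integer)"
)
conn.executemany(
    "insert into raw_exercise values (?, ?, ?)",
    [
        ("2023-01-01 07:15:00", "squat", 10),
        ("2023-01-01 18:40:00", "squat", 8),
        ("2023-01-02 07:05:00", "squat", 12),
    ],
)

# The rollup itself: strip the time part and aggregate per day and exercise.
ROLLUP_SQL = """
select date(logged_at) as activity_date,
       exercise,
       sum(reps)       as total_reps,
       count(*)        as entries
from raw_exercise
group by date(logged_at), exercise
order by activity_date, exercise
"""

daily_rollup = conn.execute(ROLLUP_SQL).fetchall()
for row in daily_rollup:
    print(row)
```

The two entries logged on 1 January collapse into a single row, which is all a daily rollup table really is.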
Going through the training I have been nodding along to a lot of the functionality that the tool provides:
- documentation built in
- using YAML (.yml) files to configure a table so it only has to be changed in a single place
- automatic lineage using the ref function, which is also used to automatically order the jobs to be run
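The lineage and run ordering are easier to picture with a toy example. The sketch below is not how dbt is implemented internally. It just illustrates the idea, assuming three hypothetical models (`stg_exercise`, `daily_rollup`, `weekly_rollup`): scan each model's SQL for `ref('...')` calls, treat those as dependencies, and sort the models so that each one runs after the models it depends on.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models, keyed by name. The SQL is only here to carry ref() calls.
models = {
    "daily_rollup": "select * from {{ ref('stg_exercise') }}",
    "stg_exercise": "select * from raw.exercise",
    "weekly_rollup": "select * from {{ ref('daily_rollup') }}",
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_lineage(models):
    """Map each model to the set of models it refers to (its parents)."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models):
    """Return model names ordered so that parents always come before children."""
    return list(TopologicalSorter(build_lineage(models)).static_order())


print(build_lineage(models))
print(run_order(models))
```

Because the dependencies come from the `ref` calls themselves, nobody has to maintain a separate list of which job runs first.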
So far I have:
- Connected everything to a Git repository.
- Configured a transformation in my Snowflake DB (the roll up mentioned above)
- Configured the data sources so they are referred to using config so changing things could be done in a single place.
- This also gave me automatic lineage on the transformations (diagram below), although I will have to try something more complicated.
- Configured source freshness, which gives a warning if the data is stale.
- Produced documentation of the sources and processes etc. utilising the yml files and auto documentation processes.
- Used their testing as per this video
- It is great that it comes with some built-in tests that run quickly and are easy to configure.
- Scheduled a daily job to refresh the roll up tables modelled above.
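As I understand it, the built-in tests boil down to queries that look for bad rows, and a test passes when the query returns nothing. Here is a rough sketch of that idea for the `unique` and `not_null` tests, again using in-memory SQLite from Python rather than Snowflake. The `daily_rollup` table, its columns and the two helper functions are made up for illustration, and the SQL that dbt actually generates will differ in detail.

```python
import sqlite3

# Hypothetical rollup table with two deliberate problems:
# a duplicated date and a missing date.
conn = sqlite3.connect(":memory:")
conn.execute("create table daily_rollup (activity_date text, total_reps integer)")
conn.executemany(
    "insert into daily_rollup values (?, ?)",
    [("2023-01-01", 18), ("2023-01-02", 12), ("2023-01-02", 5), (None, 3)],
)


def failing_rows_unique(conn, table, column):
    """Return values that appear more than once (nulls are ignored)."""
    sql = (
        f"select {column}, count(*) from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )
    return conn.execute(sql).fetchall()


def failing_rows_not_null(conn, table, column):
    """Return rows where the column is null."""
    sql = f"select * from {table} where {column} is null"
    return conn.execute(sql).fetchall()


unique_failures = failing_rows_unique(conn, "daily_rollup", "activity_date")
not_null_failures = failing_rows_not_null(conn, "daily_rollup", "activity_date")

# A test passes only when its query returns no rows.
print("unique:", "pass" if not unique_failures else f"fail {unique_failures}")
print("not_null:", "pass" if not not_null_failures else f"fail {not_null_failures}")
```

In dbt itself you do not write these queries by hand. You list the tests against a column in a yml file, which is a big part of why they are so quick to configure.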
I plan on moving on to the more advanced training, as I am interested to see what else can be done and I really like the tool.