
Showing posts with the label python

Streamlit

Over the last few days I have been playing with Streamlit, a really useful library that lets you build an interactive dashboard in Python. In theory, and to a fair extent in practice, I can now extract data from APIs and databases, transform it with pandas, load it into my final database and build a dashboard, all in Python. From what I have read it can also be set up as a form; if I could make it private, that might allow direct data entry into my source database as well, otherwise I can keep using Google Sheets but might implement it for fun.  The dashboard below is embedded, so it should update as I progress with my code. The code shown, however, is from a one-off gist: you don't seem to be able to embed straight from the GitHub repo, but you can find the latest version of the code here. The embedding of the code appears to have failed, so just check out the dashboard for now and I will try to get that working.
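While the embed is broken, here is a minimal sketch of the kind of Streamlit app I mean. The CSV file and column names are stand-ins for my real data source, not the actual dashboard code.

```python
# Minimal Streamlit dashboard sketch; workout_data.csv and its columns are
# hypothetical stand-ins for the real data source.
import pandas as pd
import streamlit as st

st.title("Training Dashboard")

# Load the data; in my case this comes from the database, here a CSV stands in.
df = pd.read_csv("workout_data.csv", parse_dates=["date"])

# Sidebar filter so the dashboard is interactive.
exercise = st.sidebar.selectbox("Exercise", sorted(df["exercise"].unique()))
filtered = df[df["exercise"] == exercise]

# A simple metric and chart built from the filtered data.
st.metric("Sessions", len(filtered))
st.line_chart(filtered.set_index("date")["weight"])
```

Run it with `streamlit run app.py` and Streamlit serves the page and reruns the script whenever a widget changes.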

prefect ETL tool in python

Having spent a lot of my time playing with Keboola and dbt to load and transform my data, I wanted to have a look at doing everything in pure Python. I previously built the full ETL pipeline for a company in Python, but haven't really had a need to touch it in over four years. Most of the work I did before was just pandas with a few connectors to various databases, producing reports in Excel using xlwings. It wasn't pretty, but it was effective and everyone was happy with the job it did.  This time I ended up using the Prefect library. Well, I built it all and then integrated it into Prefect once I found it. I found it OK and it has some useful features, but it is not brilliant, though that may be down to lack of use on my part. It does allow you to produce DAGs along with lots of other useful functionality. Script below. 
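For anyone who hasn't seen Prefect, a minimal flow looks something like the sketch below (Prefect 2.x style decorators). The task bodies are illustrative placeholders rather than my actual pipeline.

```python
# A minimal Prefect flow sketch using the @task/@flow decorators; the
# extract/transform/load bodies are placeholders, not my real pipeline steps.
import pandas as pd
from prefect import flow, task


@task
def extract() -> pd.DataFrame:
    # Pull raw rows from an API or database; a hard-coded frame stands in here.
    return pd.DataFrame({"activity": ["run", "lift"], "minutes": [30, 45]})


@task
def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Simple pandas transformation step.
    df["hours"] = df["minutes"] / 60
    return df


@task
def load(df: pd.DataFrame) -> None:
    # Write the result somewhere useful; printing stands in for a DB load.
    print(df)


@flow
def etl():
    # Chaining the tasks inside the flow is what gives Prefect the DAG to track.
    load(transform(extract()))


if __name__ == "__main__":
    etl()
```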

Loading my Strava Data using Python

I have wanted to load my Strava data into my data platform since I started loading the strength data. I found some really useful instructions that I used as my base here. I basically use the procedure shown to load my last 200 Strava activities. I load these into MySQL, find the new entries, which then get loaded into the main MySQL table, and then bulk load into Snowflake. My next step will be to process this into a more meaningful table, either using dbt or seeing if I can do something smart with Python and a view in Snowflake.
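As a rough outline of the activity pull, the sketch below hits the Strava v3 activities endpoint with requests. It assumes a valid access token is already sitting in a STRAVA_ACCESS_TOKEN environment variable (the instructions linked above cover the OAuth refresh step), so treat it as a sketch rather than my exact script.

```python
# Sketch of pulling the last 200 activities from the Strava API (v3); assumes a
# valid access token is available in the STRAVA_ACCESS_TOKEN environment
# variable (obtaining/refreshing the token is covered in the linked guide).
import os

import pandas as pd
import requests

ACTIVITIES_URL = "https://www.strava.com/api/v3/athlete/activities"


def fetch_activities(per_page: int = 200) -> pd.DataFrame:
    headers = {"Authorization": f"Bearer {os.environ['STRAVA_ACCESS_TOKEN']}"}
    resp = requests.get(ACTIVITIES_URL, headers=headers, params={"per_page": per_page})
    resp.raise_for_status()
    # Flatten the JSON response into a dataframe ready for loading into MySQL.
    return pd.json_normalize(resp.json())


if __name__ == "__main__":
    print(fetch_activities().head())
```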

Python Pipeline API to MySQL to Snowflake

I decided that I like doing some of my coding in Python, even if I have to kick it off manually at the moment. I might go with the safe option of a $5 a month PythonAnywhere package to run it on a schedule in the cloud, or I could put it in as an AWS Lambda function or in Azure, but I don't want to accidentally rack up a bill, so I might wait until I am further into training on them.  So in this code I have:

- Used dotenv to store all parameters and passwords as environment variables, so I can post my scripts without modification and store them in git (with the env file set to gitignore).
- Retrieved the values above and called the weather API.
- Flattened the JSON to get all the columns.
- Put the new rows into the table in MySQL.
- Retrieved the table from MySQL and done a drop and replace into Snowflake.

My code:

The table:
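In case the embedded gist and table don't render, here is a hedged sketch of those steps. The environment variable names and the weather endpoint parameters are placeholders rather than my actual .env contents, and writing to Snowflake this way assumes the snowflake-sqlalchemy package is installed.

```python
# Sketch of the pipeline steps listed above; env var names and the weather API
# parameters are placeholders, not the real .env contents.
import os

import pandas as pd
import requests
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()  # pull parameters and passwords from the gitignored .env file


def run_pipeline():
    # 1. Call the weather API using values held in environment variables.
    resp = requests.get(
        os.environ["WEATHER_API_URL"],
        params={"key": os.environ["WEATHER_API_KEY"]},
    )
    resp.raise_for_status()

    # 2. Flatten the JSON so every field becomes a column.
    df = pd.json_normalize(resp.json())

    # 3. Append the new rows to the table in MySQL.
    mysql_engine = create_engine(os.environ["MYSQL_URL"])
    df.to_sql("weather", mysql_engine, if_exists="append", index=False)

    # 4. Read the full table back and drop-and-replace it in Snowflake
    #    (needs snowflake-sqlalchemy for the Snowflake connection URL).
    full_table = pd.read_sql_table("weather", mysql_engine)
    snowflake_engine = create_engine(os.environ["SNOWFLAKE_URL"])
    full_table.to_sql("weather", snowflake_engine, if_exists="replace", index=False)


if __name__ == "__main__":
    run_pipeline()
```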

Creating a custom python data pipeline

Having pushed the data into MySQL using a Google Apps Script, I wanted to see whether I could then push the data to Snowflake without using one of the automated tools. I decided, initially at least, to use Python. Python is a very easy language to use and, to achieve the basics for this sort of project at least, you can plug and play with the packages you need. This is not going to be the most performant approach and probably won't fly in a proper enterprise environment, though I have previously used complex scripts to generate the majority of a company's (a start-up's) BI reporting.  Below is my script, minus the creation of the connection engines, because for the purposes of this I did not want to go to the trouble of masking them and it is very well documented how to create them. Currently I am running this manually, but for £5 a month I can get it scheduled on PythonAnywhere, although I am hoping I will pluck up the courage to run it on the Azure free tier.
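For completeness, creating the two engines looks roughly like the snippet below. This is a sketch built from assumed environment variable names and needs the pymysql and snowflake-sqlalchemy packages; it is not the exact code I left out.

```python
# Sketch of the connection engines omitted from the script above; the
# environment variable names are assumptions, and pymysql plus
# snowflake-sqlalchemy must be installed for these URLs to work.
import os

from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine

# MySQL engine built from credentials held in environment variables.
mysql_engine = create_engine(
    f"mysql+pymysql://{os.environ['MYSQL_USER']}:{os.environ['MYSQL_PASSWORD']}"
    f"@{os.environ['MYSQL_HOST']}/{os.environ['MYSQL_DB']}"
)

# Snowflake engine using the URL helper from snowflake-sqlalchemy.
snowflake_engine = create_engine(
    URL(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        database=os.environ["SNOWFLAKE_DATABASE"],
        schema=os.environ["SNOWFLAKE_SCHEMA"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    )
)
```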