Recently parkrun removed all their stats, and as a keen runner trying to work my way up the top 100 at my local parkrun, I wanted to get some of those stats back and have a bit of "fun" at the same time. So here is a little "ETL" process that I developed with the help of Gen AI.
The steps of my ETL:
- Copy and paste the results data into a Google Sheets template, where an AI-generated formula extracts URLs from the text and puts them into a new field. This effectively lets me pull out the parkrun athlete ID, my primary key, and use it in the analysis (see the extraction sketch after this list). I also have a column to autofill the data I am processing.
- Use a Gen-AI-generated Google Apps Script to move the raw rows into a processed sheet, which let me work through a backlog of events (I had over 500 to process); a sketch of the de-duplication idea is below.
- The processed sheet is then queried with a Gen-AI-written Google Sheets query to extract the key columns, format times, etc. (a sketch of the time handling follows this list).
- I then ingest the fully processed sheet into Keboola directly from Google Sheets.
- Within Keboola's Snowflake environment I then create views to summarise and further process the data.
- Present the data in Streamlit (a minimal sketch of this step is below).
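
To give a flavour of step one, here is a minimal Python sketch of the same idea the sheet formula implements: pulling the numeric athlete ID out of a parkrun profile URL. The URL pattern and the `extract_athlete_id` name are my own illustration, not the exact formula the template uses.

```python
import re

# Illustrative pattern only: assumes athlete profile URLs end in a numeric ID,
# e.g. https://www.parkrun.org.uk/parkrunner/1234567/ (an assumption, not the
# exact regex used in the sheet).
ATHLETE_URL_RE = re.compile(r"/parkrunner/(\d+)")

def extract_athlete_id(cell_text: str) -> str | None:
    """Return the first athlete ID found in a pasted cell, or None if absent."""
    match = ATHLETE_URL_RE.search(cell_text)
    return match.group(1) if match else None

print(extract_athlete_id("Jane DOE  https://www.parkrun.org.uk/parkrunner/1234567/"))
# -> 1234567
```

Inside the sheet itself the equivalent is a REGEXEXTRACT-style formula over the pasted column.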
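
The Apps Script itself is JavaScript living inside the spreadsheet, but the core move, appending freshly pasted rows to the processed sheet while skipping anything already captured, looks roughly like this. The column names and the (athlete_id, event_date) key are assumptions for illustration, not the script's actual schema.

```python
from typing import Iterable

def append_new_rows(processed: list[dict], raw_rows: Iterable[dict]) -> list[dict]:
    """Append rows to the processed table, skipping ones already seen.

    Assumes each row carries 'athlete_id' and 'event_date' fields, which
    together act as the de-duplication key (an assumption for illustration).
    """
    seen = {(row["athlete_id"], row["event_date"]) for row in processed}
    for row in raw_rows:
        key = (row["athlete_id"], row["event_date"])
        if key not in seen:
            processed.append(row)
            seen.add(key)
    return processed
```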
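
Finish times come through as text like "22:41" or "1:02:15", and normalising them to seconds makes them easy to sort and average downstream. A sketch of that conversion (the function name is mine, not something from the sheet):

```python
def time_to_seconds(raw: str) -> int | None:
    """Convert a finish time like '22:41' or '1:02:15' to total seconds.

    Returns None for blanks or unexpected formats rather than raising,
    since pasted results inevitably contain the odd messy cell.
    """
    parts = raw.strip().split(":") if raw else []
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        return None
    numbers = [int(p) for p in parts]
    if len(numbers) == 2:
        minutes, seconds = numbers
        return minutes * 60 + seconds
    hours, minutes, seconds = numbers
    return hours * 3600 + minutes * 60 + seconds

print(time_to_seconds("22:41"))    # 1361
print(time_to_seconds("1:02:15"))  # 3735
```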
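
And the Streamlit side is a thin layer over whatever summary view Snowflake exposes. A minimal sketch, assuming a Snowflake connection configured in Streamlit's secrets and a summary view called EVENT_SUMMARY; the view and column names are placeholders rather than the real schema:

```python
import streamlit as st

st.title("Eastbourne parkrun stats")

# Assumes a connection named "snowflake" in .streamlit/secrets.toml and a
# summary view called EVENT_SUMMARY -- both placeholders for illustration.
conn = st.connection("snowflake")
leaderboard = conn.query(
    "SELECT athlete_name, runs, best_time "
    "FROM EVENT_SUMMARY ORDER BY runs DESC LIMIT 100"
)

st.subheader("Top 100 by number of runs")
st.dataframe(leaderboard)
```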
The above is all saved into a GitHub repo and presented through Streamlit so anyone from Eastbourne parkrun can take a look. It still needs some work and I would love to do some more analysis for individual users.