
Data Cleansing View in MySQL

I have discussed before how I pick up parkrun data from my e-mails. parkrun don't have an API, as their system was never designed to cope with the millions of people who now take part. I only want my own data, so this works just fine for me. I use a Zap to pick up the e-mail and plonk it in a Google Sheet, and then Keboola to process the data into MySQL and maybe soon Snowflake. Actually, given the setup I have, it would only take 5 minutes in Keboola to add a step to the Flow that takes the output from the view below and puts it into Snowflake as a table. I am leaning more towards using Snowflake, as long as Retool stays free enough for me to use, because the free MySQL database has a very limited session pool and that limits the visualisations I can do.

Anyway, the raw data from the e-mail is useless for visuals, so I processed it in MySQL. There might be more elegant solutions, but for me it was good experience in how to code this in MySQL and what functions it has. Being primarily used to Oracle, I have found it interesting doing this in MySQL and Snowflake. The view uses the fact that the e-mail arrives in a standard format to pull out the details I am interested in, such as location, time, parkrun number etc., and allows me to then report on these.
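To illustrate the idea of pulling fields out of a fixed-format e-mail, here is a minimal sketch written in Python rather than MySQL. It is not the view itself. The sample e-mail wording, the regular expression and the `parse_result` function are all assumptions made up for this example, and the real parkrun e-mail will be worded differently.

```python
import re
from datetime import timedelta

# Invented sample text for illustration only; the real e-mail wording differs.
SAMPLE_BODY = (
    "Your time at Exampleville parkrun today was 00:27:45. "
    "This was your 42nd parkrun."
)

# Hypothetical pattern that matches the invented wording above.
RESULT_PATTERN = re.compile(
    r"Your time at (?P<location>.+?) parkrun today was "
    r"(?P<time>\d{2}:\d{2}:\d{2})\. "
    r"This was your (?P<run_number>\d+)(?:st|nd|rd|th) parkrun\."
)


def parse_result(body: str) -> dict:
    """Extract location, finish time in seconds and run count from one e-mail body."""
    match = RESULT_PATTERN.search(body)
    if match is None:
        # A changed e-mail layout should fail loudly rather than load bad rows.
        raise ValueError("e-mail body does not match the expected format")
    hours, minutes, seconds = (int(part) for part in match["time"].split(":"))
    finish = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return {
        "location": match["location"],
        "finish_seconds": int(finish.total_seconds()),
        "run_number": int(match["run_number"]),
    }


if __name__ == "__main__":
    print(parse_result(SAMPLE_BODY))
```

Storing the time as a number of seconds makes it easy to chart and average later. In MySQL the same kind of extraction could probably be done with string functions such as `SUBSTRING_INDEX` or, from version 8.0, `REGEXP_SUBSTR`, though the exact expressions depend on the real e-mail layout.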

The code for my view can be seen here: 
