
Posts

Showing posts with the label mySQL

Prefect ETL tool in Python

Having spent a lot of my time playing with Keboola and dbt to load and transform my data, I wanted to have a look at just doing stuff in pure Python. I have previously built the full ETL pipeline for a company in Python but haven't really had a need to touch it in over 4 years. Most of the work I did before was just using pandas with a few connectors to various databases and producing reports in Excel using xlwings. It wasn't pretty but it was effective and everyone was happy with the job that it did. This time I ended up using the Prefect library. Well, I built it all and then integrated it into Prefect once I found it. I found it OK and it has some useful features, but it is not brilliant, though that could be down to lack of use. It does allow you to produce DAGs and lots of other useful functionality. Script below.
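The basic Prefect pattern looks something like this minimal sketch (assuming the Prefect 2.x decorator API, with placeholder task bodies rather than my actual script):

```python
# Minimal Prefect 2.x-style flow: three placeholder tasks chained into a DAG.
from prefect import flow, task


@task
def extract() -> list[dict]:
    # Placeholder: in the real pipeline this would call an API or read a table.
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


@task
def transform(rows: list[dict]) -> list[dict]:
    # Placeholder transformation step.
    return [{**row, "value_doubled": row["value"] * 2} for row in rows]


@task
def load(rows: list[dict]) -> None:
    # Placeholder load step: swap in a database insert here.
    print(f"Loaded {len(rows)} rows")


@flow(name="example-etl")
def etl_flow() -> None:
    # Prefect builds the dependency graph from how task results are passed around.
    rows = extract()
    clean = transform(rows)
    load(clean)


if __name__ == "__main__":
    etl_flow()
```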

Loading my Strava Data using Python

I have wanted to load my Strava data into my data platform since I started loading the strength data. I found some really useful instructions that I used as my base here. I basically use the procedures shown to load my last 200 Strava activities. I load this into MySQL, find the new entries, which then get loaded into the main MySQL table, and then do a bulk load into Snowflake. My next step will be to process this into a more meaningful table, using either dbt or by seeing if I can do something smart with Python and a view in Snowflake.
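The core of the pull is just two HTTP calls: refresh the OAuth token, then hit the activities endpoint. A rough sketch using the requests library (credentials are placeholders read from environment variables, and the MySQL load is left out):

```python
# Sketch: refresh a Strava access token, then fetch the last 200 activities.
import os

import pandas as pd
import requests

# Placeholder credentials, assumed to be set as environment variables.
CLIENT_ID = os.environ["STRAVA_CLIENT_ID"]
CLIENT_SECRET = os.environ["STRAVA_CLIENT_SECRET"]
REFRESH_TOKEN = os.environ["STRAVA_REFRESH_TOKEN"]

# Exchange the long-lived refresh token for a short-lived access token.
token_resp = requests.post(
    "https://www.strava.com/oauth/token",
    data={
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "refresh_token": REFRESH_TOKEN,
        "grant_type": "refresh_token",
    },
    timeout=30,
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# Pull the most recent 200 activities in one page.
activities_resp = requests.get(
    "https://www.strava.com/api/v3/athlete/activities",
    headers={"Authorization": f"Bearer {access_token}"},
    params={"per_page": 200, "page": 1},
    timeout=30,
)
activities_resp.raise_for_status()

# Flatten the JSON into a DataFrame ready to be written to MySQL.
activities = pd.json_normalize(activities_resp.json())
print(activities[["id", "name", "distance", "start_date"]].head())
```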

Python Pipeline API to MySQL to Snowflake

I decided that I like doing some of my coding in Python, even if I have to kick it off manually at the moment. I might go with the safe option of a $5 a month PythonAnywhere package to run it on a schedule in the cloud, or I could put it in as an AWS Lambda function or in Azure, but I don't want to accidentally rack up a bill, so I might wait until I am further into training on them. So in this code I have:

- Used dotenv to store all parameters and passwords as environment variables, so I can post my scripts without modification and store them in git (with the env file added to gitignore).
- Retrieved the values above and called the weather API.
- Flattened the JSON to get all the columns.
- Put the new rows into the table in MySQL.
- Retrieved the table from MySQL and done a drop and replace into Snowflake.

My Code: The table:
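The steps above boil down to something like this sketch (not the exact script: the env var names and weather endpoint are placeholders, and it assumes SQLAlchemy with the PyMySQL driver plus the snowflake-sqlalchemy dialect):

```python
# Sketch: call a weather API, flatten the JSON, append to MySQL, then
# drop-and-replace the table in Snowflake. Names and URLs are placeholders.
import os

import pandas as pd
import requests
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()  # pulls the placeholder variables below from a gitignored env file

# Hypothetical environment variables; substitute whatever your env file defines.
WEATHER_URL = os.getenv("WEATHER_API_URL")
WEATHER_KEY = os.getenv("WEATHER_API_KEY")
LOCATION = os.getenv("WEATHER_LOCATION")

MYSQL_URL = os.getenv("MYSQL_URL")          # e.g. mysql+pymysql://user:pass@host/db
SNOWFLAKE_URL = os.getenv("SNOWFLAKE_URL")  # e.g. snowflake://user:pass@account/db/schema?warehouse=wh

# 1. Call the weather API and flatten the nested JSON into columns.
resp = requests.get(WEATHER_URL, params={"key": WEATHER_KEY, "q": LOCATION}, timeout=30)
resp.raise_for_status()
weather_df = pd.json_normalize(resp.json())

# 2. Append the new rows to the MySQL table.
mysql_engine = create_engine(MYSQL_URL)
weather_df.to_sql("weather_raw", mysql_engine, if_exists="append", index=False)

# 3. Read the full table back and drop-and-replace it in Snowflake.
full_df = pd.read_sql("SELECT * FROM weather_raw", mysql_engine)
snowflake_engine = create_engine(SNOWFLAKE_URL)
full_df.to_sql("weather_raw", snowflake_engine, if_exists="replace", index=False)
```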

Creating a custom Python data pipeline

Having pushed the data into MySQL using a Google Apps Script, I wanted to see whether I could then push the data to Snowflake without using one of the automated tools. I decided, initially at least, to use Python. Python is a very easy language to use, and to achieve the basics, for this sort of project at least, you can plug and play with the packages that you need. This is not going to be the most performant and probably won't fly in a proper enterprise environment, though I have previously used complex scripts to generate BI reports for the majority of a company's (a start-up's) reporting. Below is my script minus the creation of the connection engines, because for the purposes of this I did not want to go to the trouble of masking them, and it is very well documented how to create them. Currently I am running this manually, but for £5 a month I can get this scheduled on PythonAnywhere, though I am hoping I will pluck up the courage to run this on free-tier Azure.
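For anyone recreating it, the connection engines I left out can be built roughly like this (a sketch assuming SQLAlchemy with the PyMySQL driver and the snowflake-sqlalchemy dialect; every credential is a placeholder):

```python
# Sketch: the two SQLAlchemy engines omitted from the script above.
# All credentials below are placeholders.
from sqlalchemy import create_engine

# MySQL engine, assuming the PyMySQL driver is installed.
mysql_engine = create_engine(
    "mysql+pymysql://my_user:my_password@my-mysql-host:3306/my_database"
)

# Snowflake engine, assuming the snowflake-sqlalchemy package is installed.
snowflake_engine = create_engine(
    "snowflake://my_user:my_password@my_account/MY_DB/MY_SCHEMA"
    "?warehouse=MY_WH&role=MY_ROLE"
)

# Quick connectivity check for both.
for name, engine in [("MySQL", mysql_engine), ("Snowflake", snowflake_engine)]:
    with engine.connect() as conn:
        print(name, "connected")
```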

Getting Weather Data

I decided that I wanted to ingest some weather data through a different means, this time combining a Google Apps Script function to retrieve the data from the API with another script to connect to the MySQL database and deposit the data. Both steps are set up on a schedule. Setting up the API call: I followed the instructions from the following Medium post to create a function in Google Apps Script that would call an API for me. I use it to call the weather API website and retrieve the weather data for where I live. These two things combined import the data into my Google Sheet as per: I have then set this up to run on a 4-hour schedule within Google Sheets using one of their triggers. Getting the data into MySQL: So now that I have the data in Google Sheets I want to regularly import it into a database, as only a single row is stored in the Google Sheet, though I could probably get it to persist there as well. Looking at my options with the JDBC drivers available, MyS...

Zoho Analytics

Have I finally found my BI tool, one that lets me import data from Snowflake and share it for free? I know, no sooner had I posted about how hard it was to find a tool that could do anything with Snowflake than I came across Zoho. You can check out my dashboard on the following page. Below is a diagram that outlines the processes I have used to obtain this data. In summary, my parkrun e-mail is pushed to Google Sheets every week by Zapier, and Forms I submit every day are used to track the strength training I do. Keboola is then used to ingest this data into MySQL and/or Snowflake, where I then use views or the built-in transformation processes in Keboola to shift the data into a format for reporting. Google Data Studio then connects to MySQL, and Zoho to Snowflake, to visualise the data.

Data Cleansing View in MySQL

I discussed before how I pick up parkrun data from my e-mails; they don't have an API, as their system was never designed to cope with the millions of people that now take part. I only want my own data, so this works just fine for me. I use a Zap to pick up the e-mail and plonk it in a Google Sheet, and then Keboola to process the data into MySQL and maybe soon Snowflake. Actually, given the setup I have, it would only take 5 minutes in Keboola to add a step to the Flow to take the output from the view below and put it into Snowflake as a table. I am leaning more towards using Snowflake as long as Retool stays free enough for me to use, as the free MySQL database has a very limited session pool and therefore limits the visualisations I can do. Anyway, the raw data from the e-mail is useless for visuals, so I processed the data in MySQL. There might be more elegant solutions, but for me it was some experience in how to code this in MySQL and what functions it has. Being primarily used to ...

Google Data Studio Part 2

I have been working with Google Data Studio for some of my visualisations. Now I say only some because I still have the issue of not being able to get it to connect to PostgreSQL, and the Snowflake connector is a paid product. To address this I am looking at other products and am hopeful about Retool, but will see what functionality is left once the free trial runs out. I am hoping that I am only using the free stuff. Building my dashboard in Google Data Studio has been pretty easy; there are some things I don't like or can't work out how to do, but I am more of a technical data person than a visualisation person, at least for the last 5-6 years that has been the case. Honestly, the best thing to do is stick a data source in Google Data Studio and have a play. You basically pick a theme, drag and drop your dimensions and measures in, and watch it build the graphs on the fly. Below are some graphics of the menu options, inserting a graph and manipulating it.

Create table scripts in Oracle, MySQL and Snowflake

Not much text to put in this one, but a comparison between create table scripts in Oracle (my normal), MySQL (did not like) and Snowflake (was very easy). I think the newer versions of MySQL would have been easier to work with, but this is what I have to work with on my free instance. The next step of this project is to see how easy it is to build a little package that runs these controls. Who knows, if I can get it working in MySQL or Snowflake, perhaps it can be worked properly into this project with its own dashboards. From the Googling I have done so far I am not optimistic about MySQL, but I always have the backup of creating something in Python and seeing if it can work for all 3; you can even get free Python web servers (I think), so that might be an option. Note that for the Oracle code I have already created a basic version of the package. Oracle Code: MySQL Code: Snowflake Code:
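If I do end up going the Python route, the rough shape would be running the same control query against all three engines through SQLAlchemy. A sketch, where the connection strings, table name and control are all placeholders:

```python
# Sketch: run the same simple control (a row count) against Oracle, MySQL
# and Snowflake from one Python script. All connection details are placeholders.
from sqlalchemy import create_engine, text

ENGINES = {
    "oracle": "oracle+cx_oracle://user:pass@localhost:1521/?service_name=XEPDB1",
    "mysql": "mysql+pymysql://user:pass@my-mysql-host/my_database",
    "snowflake": "snowflake://user:pass@my_account/MY_DB/MY_SCHEMA?warehouse=MY_WH",
}

# Placeholder control: a simple row count on a hypothetical target table.
CONTROL_SQL = text("SELECT COUNT(*) FROM control_target_table")


def run_controls() -> None:
    for name, url in ENGINES.items():
        engine = create_engine(url)
        with engine.connect() as conn:
            row_count = conn.execute(CONTROL_SQL).scalar()
        print(f"{name}: control_target_table has {row_count} rows")


if __name__ == "__main__":
    run_controls()
```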

Universal Database IDE?

For this project I seem to have ended up with:

- An Oracle database on my local machine
- A Snowflake DB through Keboola, cloud hosted on Snowflake
- A MySQL DB through a free MySQL database website, hosted in the cloud
- PostgreSQL through ElephantSQL
- Plus a test DB on SQLite

Whilst this is great fun, logging into lots of different sites, browsers and tools to access all the different databases was a pain, especially as I am so used to working purely in SQL Developer. After a quick Google I found DBeaver. As you can see from the above screenshot, I have successfully connected to all the above databases from one tool. The tool even handled the downloading and installation of the drivers; I just had to work out the basic stuff and I was in. Now I just need to work out what I want to build in each database and how it fits into the overall project I am working on.

MySQL - Free

So I was looking at trying to get a cloud-based database that was always on. I wanted to build some visuals over whatever data I ended up building, and having the DB accessible from a cloud server seemed like the easy way. I wanted to keep it free because I hate spending when I don't need to, so that others could use it for free, and because I was sure there must be options out there. In the end my life was made much easier by spending £10, but you can go with the same free option on this site: https://www.freemysqlhosting.net/ Although not super fast or super sized, it gives you a free and easily accessible database. So far I have easily connected using phpMyAdmin, Beekeeper Studio, Python, Google Data Studio and Keboola. I have had no issues at all, unlike several other solutions I have tried, including Heroku. To set up the DB you just set your location and hit start, you will then be e-mailed the connection details, and then you use your favourite MySQL IDE and you are in. Above i...
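Connecting from Python is only a few lines. A quick sketch using PyMySQL, with the e-mailed connection details swapped in as placeholders:

```python
# Sketch: connect to the free MySQL instance with PyMySQL.
# Host, user, password and database name are placeholders for the
# connection details e-mailed by the hosting provider.
import pymysql

connection = pymysql.connect(
    host="my-host.example.com",  # placeholder host from the sign-up e-mail
    user="my_user",
    password="my_password",
    database="my_database",
    port=3306,
)

try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT VERSION()")
        print("Connected, server version:", cursor.fetchone()[0])
finally:
    connection.close()
```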