
Posts

Showing posts from April, 2022

Data Cleansing View in MySQL

I discussed before how I pick up parkrun data from my e-mails; they don't have an API, as their system was never designed to cope with the millions of people that now take part. I only want my own data, so this works just fine for me. I use a Zap to pick up the e-mail and plonk it in a Google Sheet, and then Keboola to process the data into MySQL and maybe soon Snowflake. Actually, given the setup I have, it would only take five minutes in Keboola to add a step to the Flow to pass the output from the view below into Snowflake as a table. I am leaning more towards using Snowflake, as long as Retool stays free enough for me to use, because the free MySQL database has a very limited session pool and therefore limits the visualisations I can do. Anyway, the raw data from the e-mail is useless for visuals, so I processed the data in MySQL. There might be more elegant solutions, but for me it was some experience in how to code this in MySQL and what functions it has. Being primarily used to
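Something along these lines gives a flavour of the approach; it is only a sketch, and parkrun_raw and its raw_text column are placeholder names rather than my actual schema:

CREATE OR REPLACE VIEW v_parkrun_results AS
SELECT
    -- pull the run date out of the raw e-mail text and convert it to a proper DATE
    STR_TO_DATE(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(raw_text, 'Date: ', -1), '\n', 1)), '%d/%m/%Y') AS run_date,
    -- finishing position as a number rather than text
    CAST(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(raw_text, 'Position: ', -1), '\n', 1)) AS UNSIGNED)    AS finish_position,
    -- finish time kept as a TIME value so it can be charted
    CAST(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(raw_text, 'Time: ', -1), '\n', 1)) AS TIME)            AS finish_time
FROM parkrun_raw;

SUBSTRING_INDEX and STR_TO_DATE do most of the heavy lifting, standing in for the INSTR/SUBSTR style of string slicing I would normally reach for.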

Keboola Flows

Really, finding Keboola was the thing that kickstarted this project; otherwise I would be trying to build custom code on a Python cloud server and building everything from scratch. In Keboola you build your data sources and destinations using connection details, which is fairly simple and something I will likely cover in another post; the same goes for transformations etc. Here, though, I am going to discuss Flows, which is where you bring everything together. On my free account there are some limitations. My easiest flow is very basic: pull the parkrun results e-mail from Gmail into Google Sheets (actually done by a Zap, not Keboola); Keboola will, as often as I like, in this case once a week, pull the data from the sheet into its storage; it will then transfer this to the target database. Currently I have this set up to be the MySQL database, but I can and might expand that to the Snowflake instance within Keboola. I then, outside of Keboola, connect to the MySQL database from Google Data Studio and ma

Free AWS Training

Whilst a lot of the information I am posting on here is about how to actually build a free cloud data platform for yourself, there is also some good training out there. Whilst you will generally need to pay for full access, a lot of places want to entice you in with certain bits for free. In the data engineering space a popular option for training is providers with hands-on labs. With the rise of these cloud platforms, training providers are able to spin up mini instances with lots of restrictions and let you do hands-on training without fear of using something outside of the free tier. As I said, most of this training is not free; I did a trial month at Whizlabs and got myself a certification on Snowflake and AWS with hands-on experience using their sandbox areas. Honestly, despite finishing that course I don't feel like I learnt a huge amount; the labs were too regimented and the trainers were not overly engaging. If money were no object I would give A Cloud Guru a go, but I am looking for free ma

Snowflake Scripting - SQL Cursors

Snowflake scripting in SQL seems to be in preview and I have decided to have a play with it. Given how new it is, there is limited documentation, so I am using a combination of what I can find on the Snowflake site and the odd blog that has been written about it. There appear to be a few quirks, at least when compared to Oracle PL/SQL (though that has been around for years). How many of these are intentional and how many are things to be ironed out I don't know. You can see the procedure I have created below; it: accepts an id as a parameter; creates a result set selecting from a table, using the parameter as a filter; loads the result set into a cursor; loops through the cursor, loading the id from the cursor into a variable; and calls a procedure, passing in the variable as the parameter. Then, as a proof of concept, I tried the Snowflake feature of allowing declaration of variables within the main begin and end block.
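A minimal sketch of a procedure along those lines is below; the table and procedure names (parkrun_results, process_result) are placeholders rather than my exact code, and the cursor is declared with LET inside the begin/end block, which is the in-body declaration feature mentioned above:

CREATE OR REPLACE PROCEDURE loop_results(p_event_id NUMBER)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  rs RESULTSET;
  v_result_id NUMBER;
BEGIN
  -- result set selecting from a table, using the parameter as a filter
  rs := (SELECT result_id FROM parkrun_results WHERE event_id = :p_event_id);
  -- load the result set into a cursor and loop through it
  LET cur CURSOR FOR rs;
  FOR rec IN cur DO
    -- load the id from the current cursor row into a variable
    v_result_id := rec.result_id;
    -- call another procedure, passing the variable as its parameter
    CALL process_result(:v_result_id);
  END FOR;
  RETURN 'Done';
END;
$$
;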

ETL Procedures in Snowflake

To continue on from my last post, I have kept on playing in Snowflake and am working towards building something that resembles ETL procedures in Snowflake. Now, as this is just something I am messing around with at home, it is much more about learning than doing things 100% correctly. For me, the below script was the first time in a Snowflake SQL procedure I have: used exception handling that works and inserts into a log table (now to see if I can log the error message); and performed a dynamic insert using data retrieved from one of my own tables. Whilst these are not earth-shattering, they are little things I was not confident on how to do this morning, and now I have a working example. I was not confident because I am honestly finding it hard to find real-world examples written in Snowflake using their SQL language; all the examples seem to be in the Javascript one. As with my previous post, below is the code and a little gif showing the code being called.
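As a rough sketch of those two pieces, and not the actual script from the post, something like the following builds a dynamic insert from a value read out of a config table and writes a row to a log table if anything fails (etl_config, etl_log and the stg_ prefix are all placeholder names; SQLERRM is where the error message would come from if I do log it):

CREATE OR REPLACE PROCEDURE run_etl_load(p_step_name VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  v_target_table VARCHAR;
  v_sql VARCHAR;
  v_error VARCHAR;
BEGIN
  -- retrieve the target table name from one of my own tables
  SELECT target_table INTO :v_target_table FROM etl_config WHERE step_name = :p_step_name;
  -- dynamic insert built from the retrieved value
  v_sql := 'INSERT INTO ' || v_target_table || ' SELECT * FROM stg_' || v_target_table;
  EXECUTE IMMEDIATE :v_sql;
  RETURN 'Loaded ' || v_target_table;
EXCEPTION
  WHEN OTHER THEN
    -- on any failure, write a row to the log table; SQLERRM holds the error text
    v_error := SQLERRM;
    INSERT INTO etl_log (step_name, logged_at, message) VALUES (:p_step_name, CURRENT_TIMESTAMP(), :v_error);
    RETURN 'Failed: ' || v_error;
END;
$$
;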

Google Data Studio Part 2

I have been working with Google Data Studio for some of my visualisations. Now, I say only some because I still have the issue of not being able to get it to connect to PostgreSQL, and the Snowflake connector is a paid product. To address this I am looking at other products and am hopeful about Retool, but will see what functionality is left once the free trial runs out. I am hoping that I am only using the free stuff. Building my dashboard in Google Data Studio has been pretty easy; there are some things I don't like or can't work out how to do, but I am more of a technical data person than a visualisation person, at least for the last 5-6 years that has been the case. Honestly, the best thing to do is stick a data source in Google Data Studio and have a play. You basically pick a theme, drag and drop your dimensions and measures in, and watch it build the graphs on the fly. Below are some graphics of the menu options, inserting a graph and manipulating it.

My First Snowflake Procedure

Having not created stored procedures in Snowflake before, I wanted to give it a go. My preference is for the more newly released SQL-style procedures over the Javascript ones, though I fully intend to give those a stab as well. Below is the code I used to create my first Snowflake procedure. This procedure just takes in a table name and returns the count of rows in said table. Not overly useful, but for me it has some useful proofs of concept: not only how to create a basic procedure in Snowflake, but also how to run dynamic SQL. Have a look through the code and feel free to ask any questions. Here is proof it works, with me running the procedure and it producing the correct result.
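A minimal version of that idea, with placeholder names rather than the exact code from the post, could look like this:

CREATE OR REPLACE PROCEDURE table_row_count(p_table_name VARCHAR)
RETURNS NUMBER
LANGUAGE SQL
AS
$$
DECLARE
  v_sql VARCHAR;
  v_count NUMBER;
  rs RESULTSET;
BEGIN
  -- build the statement dynamically from the table name passed in
  v_sql := 'SELECT COUNT(*) AS cnt FROM ' || p_table_name;
  rs := (EXECUTE IMMEDIATE :v_sql);
  -- read the single-row result back through a cursor
  LET cur CURSOR FOR rs;
  FOR rec IN cur DO
    v_count := rec.cnt;
  END FOR;
  RETURN v_count;
END;
$$
;

Calling it is then just CALL table_row_count('MY_TABLE'); which returns the row count for that table.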

Data Cleansing View in Snowflake

For part of one of my free ETLs I am using Zaps to transfer e-mails into Google Sheets and then Keboola to transfer the sheets into my Snowflake database. I am familiar with string searches and cleansing in Oracle and in Python, but have not had the chance to do this in Snowflake. I wanted to give it a go as a proof of concept if nothing else. There were some differences in functions between Oracle and Snowflake, no INSTR, with POSITION used instead, and some differences in working with dates / timestamps, but overall it was very similar. The code below is what I ended up using. I then want to use this to create some overview graphics to allow me to track the success or failure of my ETLs. Assuming the relevant aspects of Retool remain free, you can see how much ETL is going on at this link. In case things aren't working, here is a table of the output I am producing.
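A cut-down sketch of that sort of view is below; raw_emails and its email_body column are placeholder names, and the label offsets are illustrative, but it shows POSITION standing in for Oracle's INSTR and an explicit date format for the parsing:

CREATE OR REPLACE VIEW v_email_data AS
SELECT
    -- POSITION does the job INSTR would do in Oracle: find where a label starts
    TO_DATE(SUBSTR(email_body, POSITION('Date: ', email_body) + 6, 10), 'DD/MM/YYYY') AS run_date,
    -- grab the finish time as text starting just after its label
    SUBSTR(email_body, POSITION('Time: ', email_body) + 6, 8)                         AS finish_time
FROM raw_emails;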

Create table scripts in Oracle, MySQL and Snowflake

Not much text to put in this one, just a comparison between create table scripts in Oracle (my normal), MySQL (did not like) and Snowflake (was very easy). I think the newer versions of MySQL would have been easier to work with, but this is what I have on my free instance. The next step of this project is to see how easy it is to build a little package that runs these controls. Who knows, if I can get it working in MySQL or Snowflake, perhaps it can be worked properly into this project with its own dashboards. From the Googling I have done so far I am not optimistic about MySQL, but I always have the backup of creating something in Python and seeing if it can work for all 3; you can even get free Python web servers (I think), so that might be an option. Note that for the Oracle code I have already created a basic version of the package. Oracle Code: MySQL Code: Snowflake Code:
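As an illustration of the kind of differences being compared, and not the actual scripts from the post, the same simple table in each dialect might look like this (data_controls and its columns are placeholder names):

-- Oracle
CREATE TABLE data_controls (
    control_id   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    control_name VARCHAR2(100) NOT NULL,
    created_at   TIMESTAMP DEFAULT SYSTIMESTAMP
);

-- MySQL
CREATE TABLE data_controls (
    control_id   INT AUTO_INCREMENT PRIMARY KEY,
    control_name VARCHAR(100) NOT NULL,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Snowflake
CREATE OR REPLACE TABLE data_controls (
    control_id   NUMBER AUTOINCREMENT,
    control_name VARCHAR(100) NOT NULL,
    created_at   TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);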

Google Data Studio - Part 1

As part of the project to work on a free data solution I have been looking into data visualisation tools. I have already done a post on Snowflake's built-in option, which has limited capabilities; I have also used Power BI, however this has limited sharing options on the free plan. In my day jobs I have used various Oracle tools, including OAC utilising the RPD, and, really badly, I have used Excel. Personally the best free tool, honestly probably the best tool other than it missing the data modelling layer, is Google Data Studio. I have found it fairly intuitive and for the most part easy to get the results I wanted. Probably the bit I was most impressed with was the data visualisation stuff, see part 2 (on its way); however, setting up supported data sources is also very easy. Note that at the moment Snowflake is not supported by Google itself. To get going: create a data source, which can use a Google connector or one of the many other customer-produced connectors. Once you have picked a data conne

Universal Database IDE?

For this project I seem to have ended up with: an Oracle database on my local machine; a Snowflake DB through Keboola, cloud hosted on Snowflake; a MySQL DB through the free MySQL database website, hosted in the cloud; PostgreSQL through ElephantSQL; plus a test DB in SQLite. Whilst this is great fun, logging into lots of different sites, browsers and tools to access all the different databases was a pain, especially as I am used to working purely in SQL Developer. After a quick Google I found DBeaver. As you can see from the above screenshot, I have successfully connected to all the above databases from one tool. The tool even handled the downloading and installation of the drivers; I just had to work out the basic stuff and I was in. Now I just need to work out what I want to build in each database and how it fits into the overall project I am working on.

Keboola

Keboola is my new favourite toy and I have lost many hours playing with it, so far all for free; they even have free training, which I have mostly been able to follow with the free account, though the tool is clearly evolving quickly and the training is now over a year old. Finding Keboola and the free MySQL databases is what inspired me to start this project. What I like about Keboola (free): so far (over a week) completely free and no credit card required, and I am nowhere near using my allocation for this month; easy to connect to Snowflake, Google Drive, Google Sheets, MySQL and much more; free use of Snowflake, and with Snowflake being so hot at the moment this is great; lots of logging; decent training that is mostly useful; the dashboard. What I don't like: I couldn't get the Gmail connector to work due to Google security policies; the product is evolving so quickly that the training is already dated or can't be done on the free tier. In

MySQL - Free

So I was looking at trying to get a cloud-based database that was always on. I wanted to build some visuals over whatever data I ended up building, and having the DB accessible from a cloud server seemed like the easy way. I wanted to keep it free because I hate spending when I don't need to, so that others could use it for free, and because I was sure there must be options out there. In the end my life was made much easier by spending £10, but you can go with the same free option on this site: https://www.freemysqlhosting.net/ Although not super fast or super sized, it gives you a free and easily accessible database. So far I have easily connected using phpMyAdmin, Beekeeper Studio, Python, Google Data Studio and Keboola. I have had no issues at all, unlike several other solutions I have tried, including Heroku. To set up the DB you just set your location and hit start; you will then be e-mailed the connection details, and then you use your favourite MySQL IDE and you are in. Above is a snapsh

Snowflake Data Visualisation

Link to the visualisation here: I have been looking for a tool to do free data visualisation on any data source, but more recently on Snowflake. Whilst I can use Power BI, I cannot share with it. So I had a quick look at using the built-in Snowflake tool. It was very easy to create a couple of graphs off my basic model using a SQL query as the source, but make no mistake, it is, currently at least, nothing like the major reporting tools out there and cannot replace the likes of Google Data Studio, OAC or Power BI, with very limited options available.