
Data Cleansing View in Snowflake

For part of one of my free ETLs I am using Zapps to transfer e-mails into Google Sheets, and then Keboola to transfer the sheets into my Snowflake database. I am familiar with string searches and cleansing in Oracle and in Python, but had not had the chance to do this in Snowflake, so I wanted to give it a go as a proof of concept if nothing else. There were some differences in functions between Oracle and Snowflake (I used POSITION instead of INSTR) and some differences in working with dates and timestamps, but overall it was very similar.

The code below is what I ended up using: 

CREATE OR REPLACE VIEW V_ETL_LOGS AS
SELECT
    -- E-mail "Date" header: take the text between the ',' and the '+'
    TO_TIMESTAMP_NTZ(
        SUBSTR(
            "Date",
            POSITION(',', "Date") + 1,
            POSITION('+', "Date") + 1 - POSITION(',', "Date") - 2
        ),
        'DD MON YYYY HH24:MI:SS'
    ) AS datetime_log,
    -- Strip the fixed prefix and the status word to leave the ETL name
    TRIM(
        REPLACE(
            REPLACE(
                REPLACE("Subject", '[KBC] Garymanley orchestrator'),
                ' succeeded'
            ),
            ' error'
        )
    ) AS etl_group,
    CASE
        WHEN "Subject" LIKE '%succeeded%' THEN 'SUCCESSFUL'
        WHEN "Subject" LIKE '%error%' THEN 'ERROR'
        ELSE 'Unknown'
    END AS etl_status,
    -- Start and end times are the 16 characters after their labels
    TO_TIMESTAMP_NTZ(
        SUBSTR(
            "EmailText",
            POSITION('Start time', "EmailText") + 12,
            16
        ),
        'YYYY-MM-DD HH24:MI'
    ) AS start_time,
    TO_TIMESTAMP_NTZ(
        SUBSTR(
            "EmailText",
            POSITION('End time', "EmailText") + 10,
            16
        ),
        'YYYY-MM-DD HH24:MI'
    ) AS end_time,
    -- Run duration text that follows 'succeeded in'
    SUBSTR(
        "EmailText",
        POSITION('succeeded in', "EmailText") + 12,
        13
    ) AS run_time
FROM
    KEBOOLA_7127.WORKSPACE_15661914.ETL_LOGS;
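
To sanity-check the offset arithmetic outside Snowflake, the date and subject logic can be mirrored in Python. This is only a rough sketch: the helper names (`position`, `substr`, `parse_date_header`, `etl_group`) and the sample strings are made up for illustration and are not taken from the real data, and the guard for a missing ',' or '+' is an addition that the view itself does not have.

```python
from datetime import datetime


def position(needle: str, haystack: str) -> int:
    """1-based index of the first match, or 0 if absent (like SQL POSITION)."""
    return haystack.find(needle) + 1


def substr(text: str, start: int, length: int) -> str:
    """1-based substring for a positive start and non-negative length."""
    return text[start - 1 : start - 1 + length]


def parse_date_header(date_header: str) -> datetime:
    comma = position(",", date_header)
    plus = position("+", date_header)
    if comma == 0 or plus == 0:
        # Assumption: fail loudly when the header has no ',' or no '+' offset.
        raise ValueError(f"unexpected Date header: {date_header!r}")
    # Same arithmetic as the view: the text between the ',' and the '+'.
    raw = substr(date_header, comma + 1, plus + 1 - comma - 2)
    return datetime.strptime(raw.strip(), "%d %b %Y %H:%M:%S")


def etl_group(subject: str) -> str:
    # Mirror the nested REPLACE calls: drop each fragment, then trim.
    for fragment in ("[KBC] Garymanley orchestrator", " succeeded", " error"):
        subject = subject.replace(fragment, "")
    return subject.strip()


# Hypothetical sample values, shaped like a typical e-mail header and subject.
print(parse_date_header("Mon, 29 Apr 2024 10:15:30 +0000"))
print(etl_group("[KBC] Garymanley orchestrator Daily load succeeded"))
```

One thing the sketch makes visible: the length calculation relies on the Date header containing a '+', so a header with a negative UTC offset would need separate handling.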

I then want to use this to create some overview graphics that let me track the success or failure of my ETLs. Assuming the relevant aspects of Retool remain free, you can see how much ETL activity is going on at this link.

In case things aren't working, here is a table of the output I am producing. 


