I wanted to test the different Gen AI models out there and see if they could write a relatively simple SQL query: select against my table, as detailed in the prompts below, and produce a list of the fastest 1000 times at an event (which takes place weekly), along with the times and names of the athletes that ran them. Note that although I say "view" a lot I really mean "query": views are just stored queries anyway, and I use this one in my DB as a view.
Winner: Copilot
The original view can be seen below:
create or replace view stg_top_100_ag as
with w_fastest as (
    select "parkrunID", "Name", "Gender", "AgeBracket", min(TOTALSEC) Fastest_time
    from stg_parkrun_all_v
    where "AgeBracket" is not null
    group by "parkrunID", "Name", "AgeBracket", "Gender"
)
select RANK() OVER (partition by "AgeBracket" ORDER by Fastest_time) "Age Bracket Rank",
    "Name",
    "Gender",
    "AgeBracket" as "AgeBracket",
    floor(Fastest_time/60) TimeMin,
    mod(Fastest_time,60) TimeSec,
    floor(Fastest_time/60)||':'||LPAD(mod(Fastest_time,60),2,'0') "Time"
from w_fastest
QUALIFY "Age Bracket Rank" < 51
order by "AgeBracket", "Age Bracket Rank"
;
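A side note on the QUALIFY clause in that view: in Snowflake it filters on the result of a window function, so the rank filter doesn't need a wrapping subquery. A minimal illustration against the same table:

-- Illustrative only: QUALIFY filters rows by a window function result
select "Name", RANK() over (order by TOTALSEC) as overall_rank
from stg_parkrun_all_v
qualify overall_rank <= 10;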
The initial Prompt:
I can't find a good way to format and embed my full chats with the AI tools, so I will work with what I have. Here is the original prompt I used to get a starting point.
You have a table in a Snowflake database called
stg_parkrun_all_v
The tale has the following columns where event date and position are a unique key:
EVENTDATE - Date the event took place
EVENTNUM - The event number where 1 is the first event and so on
TOTALATHLETES - The total of number of people that ran the event
Position - The position of the athlete in this row within the event on the date shown
parkrunID - The unique ID of the athlete
Name - The name of the athlete - may not be unique
Gender - The gender of the athlete
AgeBracket - There 5 year age bracket of the athlete
Club - The running club of the athlete
TimeMin - the time they took in the event on the eventdate in minutes
TimeSec - The second portion of their time as per timeMin
TOTALSEC - The total seconds the athlete took in the event
DISPLAYTIME - A string representation of the athletes time on that date
MINS_5_BUCKET - An ID to place each time in a 5 minute bucket
Please write a query to get the best (minimum) time of each athlete over all events and then return just the top 1000 athletes with their ranking.
Please note that parkrunID and Name columns need double quotes round it to work. I would also like the athletes names to be returned in the query.
Please add a comment at the start of the query explaining what the query is doing that the source was a Gen AI model with your name included.
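For reference, this is roughly the shape of stg_parkrun_all_v as described in the prompt, written out as a DDL sketch. Only the column names and descriptions come from the prompt; the types and the table name used here are my own assumptions:

-- Sketch of the columns described in the prompt (types and table name assumed)
create or replace table stg_parkrun_all_sketch (
    EVENTDATE       date,      -- date the event took place
    EVENTNUM        number,    -- 1 for the first event, and so on
    TOTALATHLETES   number,    -- total number of people that ran the event
    "Position"      number,    -- position within the event (EVENTDATE + Position is the unique key)
    "parkrunID"     varchar,   -- unique ID of the athlete (mixed case, needs double quotes)
    "Name"          varchar,   -- athlete name, may not be unique (also needs double quotes)
    "Gender"        varchar,
    "AgeBracket"    varchar,   -- 5-year age bracket
    "Club"          varchar,
    TimeMin         number,    -- minutes portion of the time
    TimeSec         number,    -- seconds portion of the time
    TOTALSEC        number,    -- total time in seconds
    DISPLAYTIME     varchar,   -- string representation of the time
    MINS_5_BUCKET   number     -- ID placing each time in a 5-minute bucket
);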
Gemini:
Overall this chat wasn't very long. The first attempt was pretty close; it just missed the FLOOR function around the minutes column and some double quotes, so I asked it to add those. The kind of change I asked for is sketched below.
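For illustration only (this is not Gemini's actual first attempt), the difference the FLOOR makes looks like this:

-- Illustrative only: why FLOOR matters when deriving the minutes column
select
    1171 / 60        as mins_without_floor,  -- 19.516667, a decimal rather than whole minutes
    floor(1171 / 60) as mins_with_floor,     -- 19
    mod(1171, 60)    as secs;                -- 31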
Here is the final Gemini view, which I tested and which produced the right results.
The Gemini View:
-- This query, generated by Bard, a large language model from Google AI,
-- finds the fastest time for each athlete in the 'stg_parkrun_all_v' table
-- and ranks them based on their fastest time. It then returns the top 1000
-- ranked athletes with their ranking, parkrunID, name, fastest time in
-- whole minutes, and fastest time in seconds.
with ranked as (
    SELECT
        "parkrunID",
        "Name", -- replaced "name" with "Name" as requested
        MIN(TOTALSEC) as fastest_time,
        ROW_NUMBER() OVER (ORDER BY MIN(TOTALSEC) ASC) AS ranking
    FROM stg_parkrun_all_v
    GROUP BY "parkrunID", "Name"
)
SELECT
    ranking,
    "parkrunID",
    "Name",
    FLOOR(fastest_time / 60) AS mins,
    fastest_time % 60 AS secs
FROM ranked
WHERE ranking <= 1000
ORDER BY ranking;
Copilot:
Next up I tried Copilot. It gave me working SQL out of the gate; I did have to ask it to expand the comments at the top and to add a minutes/seconds column to the output, but I like the way it did it. Overall I was very impressed with it, with no problems at all.
Copilot View:
-- Query to get the best time for each athlete over all events with ranking
-- Generated by the **Copilot** AI model
WITH RankedTimes AS (
    SELECT
        "parkrunID" AS "Athlete ID",
        "Name" AS "Athlete Name",
        MIN("TOTALSEC") AS "Best Time (Seconds)",
        RANK() OVER (ORDER BY MIN("TOTALSEC")) AS "Rank"
    FROM
        stg_parkrun_all_v
    GROUP BY
        "parkrunID", "Name"
)
SELECT
    "Athlete ID",
    "Athlete Name",
    "Best Time (Seconds)",
    CONCAT(
        FLOOR("Best Time (Seconds)" / 60), ' min ',
        MOD("Best Time (Seconds)", 60), ' sec'
    ) AS "Best Time (Minutes:Seconds)",
    "Rank"
FROM
    RankedTimes
WHERE
    "Rank" <= 1000
ORDER BY
    "Rank";
ChatGPT:
Much the same story as Copilot, with slightly different output. I got a working view first time, but it lacked the additional columns that would have been useful, and it didn't get the comments at the top right; when I asked it to add them I think it went overboard!
ChatGPT View:
-- Query generated by ChatGPT
-- By OpenAI's ChatGPT
/*
View stg_parkrun_all_v:
- EVENTDATE: Date the event took place
- EVENTNUM: The event number where 1 is the first event and so on
- TOTALATHLETES: The total number of people that ran the event
- Position: The position of the athlete in this row within the event on the date shown
- "parkrunID": The unique ID of the athlete
- "Name": The name of the athlete - may not be unique
- Gender: The gender of the athlete
- AgeBracket: The 5-year age bracket of the athlete
- Club: The running club of the athlete
- TimeMin: the time they took in the event on the eventdate in minutes
- TimeSec: The second portion of their time as per TimeMin
- TOTALSEC: The total seconds the athlete took in the event
- DISPLAYTIME: A string representation of the athlete's time on that date
- MINS_5_BUCKET: An ID to place each time in a 5-minute bucket
*/
SELECT
    ROW_NUMBER() OVER (ORDER BY MIN(TOTALSEC)) AS ranking,
    "parkrunID",
    "Name",
    MIN(TOTALSEC) AS best_time,
    FLOOR(MIN(TOTALSEC) / 60) AS best_time_minutes,
    MOD(MIN(TOTALSEC), 60) AS best_time_seconds
FROM
    stg_parkrun_all_v
GROUP BY
    "parkrunID",
    "Name"
ORDER BY
    best_time
LIMIT 1000;
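To actually use any of these as a view, as I do with my original, the generated query just needs wrapping in a create or replace view statement. A minimal sketch using the ChatGPT query (the view name here is my own choice, not something the model produced):

-- Sketch: wrapping the generated query in a view (view name is mine, for illustration)
create or replace view stg_top_1000_genai as
select
    ROW_NUMBER() OVER (ORDER BY MIN(TOTALSEC)) AS ranking,
    "parkrunID",
    "Name",
    MIN(TOTALSEC) AS best_time,
    FLOOR(MIN(TOTALSEC) / 60) AS best_time_minutes,
    MOD(MIN(TOTALSEC), 60) AS best_time_seconds
from stg_parkrun_all_v
group by "parkrunID", "Name"
order by best_time
limit 1000;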
Summary:
Gemini - Original prompt plus one correction gave a working view. It provided some nice-to-haves that the others didn't.
Copilot - Original prompt worked; I asked for minor additions and it gave a working view.
ChatGPT - As with Copilot.
I think the overall winner is Copilot: the final solution it gave me worked and it put in nice comments. I would place Gemini second; even though it required a correction to add the FLOOR that the others didn't, it gave the view most closely matching my original and got the comments right. That leaves ChatGPT last. Honestly, though, the experience with all three was very similar, and they all produced more or less what I wanted within a handful of prompts. I wonder how they will cope with more complex data models.