
Loading my Strava Data using Python

I have wanted to load my Strava data into my data platform since I started loading the strength data. I found some really useful instructions that I used as my base here.

I basically use the procedures shown to load my last 200 Strava activities. I load these into MySQL, find the new entries, append them to the main MySQL table, and then bulk load the result into Snowflake. My next step will be to process this into a more meaningful table, using either dbt or seeing if I can do something smart with Python and a view in Snowflake.
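As a taste of what that processing step might look like, here is a minimal pandas sketch that turns the raw Strava units into friendlier ones (Strava reports speeds in m/s and times in seconds). The column names match the script below, but the conversions themselves are my assumption about what a "more meaningful" table would contain:

```python
import pandas as pd

def enrich_activities(df: pd.DataFrame) -> pd.DataFrame:
    """Convert raw Strava units into human-friendly ones."""
    out = df.copy()
    out['distance_km'] = out['distance'] / 1000            # metres -> km
    out['moving_time_min'] = out['moving_time'] / 60       # seconds -> minutes
    out['average_speed_kmh'] = out['average_speed'] * 3.6  # m/s -> km/h
    return out

# Hypothetical one-row sample shaped like the activities feed
sample = pd.DataFrame({
    'name': ['Morning Run'],
    'distance': [5000.0],      # metres
    'moving_time': [1500],     # seconds
    'average_speed': [3.333],  # m/s
})
print(enrich_activities(sample)[['distance_km', 'moving_time_min', 'average_speed_kmh']])
```

Something like this could live in a dbt model instead; the arithmetic would be identical, just expressed in SQL.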


# -*- coding: utf-8 -*-
"""
Version Number ||| Mod By ||| Version Date
---------------------------------------------------------
1.00           ||| GM     ||| 19 May 2022
"""
from dotenv import load_dotenv
load_dotenv()
import os
from sqlalchemy import create_engine
import requests
import urllib3
import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
import pandasql as ps
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
MySQLHost = os.environ.get("MYSQLHOST")
MySQLUser = os.environ.get("MYSQLUSER")
MySQLPwd = os.environ.get("MYSQLPWD")
MySQLDB = os.environ.get("MYSQLDB")
SnowAcc = os.environ.get("SNOWACC")
SnowUser = os.environ.get("SNOWUSER")
SnowPwd = os.environ.get("SNOWPWD")
SnowDB = os.environ.get("SNOWDB")
SnowWH = os.environ.get("SNOWWH")
SnowSchema = os.environ.get("SNOWSCHEMA")
StravaClient = os.environ.get("STRAVACLIENT")
StravaRefresh = os.environ.get("STRAVAREFRESH")
StravaSecret = os.environ.get("STRAVASECRET")
mySQL_conn = create_engine("mysql+mysqldb://" + MySQLUser + ":" + MySQLPwd + "@" + MySQLHost + "/" + MySQLDB)
Snowengine = create_engine(
    'snowflake://{user}:{password}@{account}/{db}/{schema}?warehouse={warehouse}'.format(
        user=SnowUser,
        password=SnowPwd,
        account=SnowAcc,
        warehouse=SnowWH,
        db=SnowDB,
        schema=SnowSchema))
def stravaPipe():
    auth_url = "https://www.strava.com/oauth/token"
    activities_url = "https://www.strava.com/api/v3/athlete/activities"
    payload = {
        'client_id': StravaClient,
        'client_secret': StravaSecret,
        'refresh_token': StravaRefresh,
        'grant_type': "refresh_token",
        'f': 'json'
    }
    # Swap the long-lived refresh token for a short-lived access token
    res = requests.post(auth_url, data=payload, verify=False)
    access_token = res.json()['access_token']
    header = {'Authorization': 'Bearer ' + access_token}
    # 200 activities per page is the Strava API maximum
    param = {'per_page': 200, 'page': 1}
    my_strava = requests.get(activities_url, headers=header, params=param).json()
    normStravaData = pd.json_normalize(my_strava)
    normStravaData = normStravaData.drop(['map.summary_polyline', 'flagged', 'gear_id', 'start_latlng', 'end_latlng'], axis=1)
    stravaLimited = ps.sqldf("select name, distance, moving_time, elapsed_time, total_elevation_gain, type, id, start_date_local, average_speed, max_speed, average_cadence, average_heartrate, max_heartrate, elev_high, elev_low, upload_id, upload_id_str FROM normStravaData")
    stravaLimited.to_sql('pre_stg_strava', con=mySQL_conn, if_exists='replace', index=False)
    # stg_strava_v is a MySQL view returning the pre-staging rows not yet in stg_strava
    stravaDelta = pd.read_sql_query('select * from stg_strava_v', mySQL_conn)
    delta = stravaDelta.shape[0]
    print(str(delta) + " rows inserted")
    stravaDelta.to_sql('stg_strava', con=mySQL_conn, if_exists='append', index=False)
    # Bulk (re)load the full staging table into Snowflake
    stravaSnow = pd.read_sql_query('select * from stg_strava', mySQL_conn)
    stravaSnow.columns = stravaSnow.columns.str.upper()  # Snowflake favours upper-case column names
    stravaSnow.to_sql('stg_strava', con=Snowengine, if_exists='replace', index=False, method=pd_writer)
    print('Snowflake Load Completed')

if __name__ == '__main__':
    stravaPipe()
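The script leans on a MySQL view, stg_strava_v, to pick out the rows in pre_stg_strava that are not yet in stg_strava, but the view definition itself isn't shown above. Here is a minimal sketch of the idea, using SQLite purely so it runs standalone (the real thing would be a CREATE VIEW in MySQL), and assuming id is the key:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE pre_stg_strava (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE stg_strava (id INTEGER, name TEXT)")
# The view: anything in the pre-staging table whose id is not already staged
cur.execute("""
    CREATE VIEW stg_strava_v AS
    SELECT p.* FROM pre_stg_strava p
    WHERE p.id NOT IN (SELECT id FROM stg_strava)
""")
cur.executemany("INSERT INTO pre_stg_strava VALUES (?, ?)",
                [(1, 'Old Run'), (2, 'New Ride')])
cur.execute("INSERT INTO stg_strava VALUES (1, 'Old Run')")
new_rows = cur.execute("SELECT id, name FROM stg_strava_v").fetchall()
print(new_rows)  # only the activity not yet staged
```

Because the script replaces pre_stg_strava wholesale on every run, a NOT IN against the staged ids is enough to isolate the delta.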
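One limitation worth noting: the script only asks for the most recent 200 activities (per_page=200, page=1). If history beyond that is ever needed, the page parameter can be walked until Strava returns an empty list. A sketch of that loop, with the HTTP call abstracted behind a fetch function (the names here are mine, not Strava's) so the paging logic itself can be exercised without the network:

```python
def fetch_all_activities(fetch_page, per_page=200):
    """Collect activities across pages until an empty page comes back.

    fetch_page(page, per_page) should return the parsed JSON list for one
    page -- in the real script that would be a requests.get(...).json() call.
    """
    activities, page = [], 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            break
        activities.extend(batch)
        page += 1
    return activities

# Stub fetcher standing in for the Strava API: three activities, paged
def fake_fetch(page, per_page):
    data = [{'id': 1}, {'id': 2}, {'id': 3}]
    start = (page - 1) * per_page
    return data[start:start + per_page]

print(fetch_all_activities(fake_fetch, per_page=2))
```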
