In a recent project, I dove into Melbourne’s public transport disruptions using data engineering and analysis. I wanted to get a better understanding of disruption patterns across the Metro Trains, Metro Trams, and Metro Bus networks. I built a data pipeline to gather, organise, and visualise data from the Public Transport Victoria API, resulting in an informative dashboard.
Gathering Data with Python and AWS S3
I started by writing a Python script to fetch real-time disruption data from the Public Transport Victoria API. This data was stored in an AWS S3 bucket, providing a reliable storage solution. I automated this process using Apache Airflow on an AWS EC2 instance.
Transforming Data for Analysis
Next, I transformed the raw JSON data into structured tables for analysis. This involved parsing and organising the data into disruptions, routes, and stops tables. I then migrated this data to Google BigQuery for further refinement.
Analysing Data with dbt and Looker Studio
Using dbt, I transformed the data in BigQuery into analysis-ready tables. These tables were then connected to Looker Studio, creating an interactive dashboard for stakeholders.
Empowering Decision-Making
By orchestrating Python scripts, AWS services, Google BigQuery, dbt, and Looker Studio, I gained insights into Melbourne’s public transport disruptions.
Conclusion
This project exposed me to several new challenges and taught me more about data engineering, and data analysis. I learned how to set up and configure Airflow on an EC2 instance, how to deal with transportation data, and even worked with geolocation data.
You may find the dashboard here. The code for the python extraction and transformations codes, as well as the Airflow files, can be found in my GitHub account in the ptv_disruptions repository. The dbt transformations can be found in my dbt repository in the ptv model.
If you’re interested in data, public transport, or have any suggestions I would love to talk further. Feel free to reach out to me on LinkedIn. You can find my profile in the link below.
Leave a Reply