For this project, I dove into Melbourne’s public transport disruptions using data engineering and analysis. I wanted to get a better understanding of disruption patterns across the Metro Trains, Metro Trams, and Metro Bus networks. I built a data pipeline to gather, organise, and visualise data from the Public Transport Victoria API, resulting in a Looker Studio dashboard.
Overall Architecture

Gathering Data with Python and AWS S3
I started by writing a Python script to fetch the latest disruption data from the Public Transport Victoria API. This data was stored in an AWS S3 bucket, providing a reliable storage solution. I automated this process using Apache Airflow on an AWS EC2 instance, and the script ran every 30 minutes to store the raw data.
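As a rough illustration of the storage step, the sketch below names each raw snapshot by its fetch time and hands it to an S3-style upload function. The key layout, the `raw/disruptions` prefix, and the function names are assumptions made for this example, not the code from the repository.

```python
import json
from datetime import datetime, timezone


def build_raw_key(fetched_at: datetime, prefix: str = "raw/disruptions") -> str:
    """Return a date-partitioned object key for one API snapshot (hypothetical layout)."""
    ts = fetched_at.astimezone(timezone.utc)
    return f"{prefix}/{ts:%Y/%m/%d}/disruptions_{ts:%Y%m%dT%H%M%SZ}.json"


def store_snapshot(payload: dict, put_object, bucket: str, fetched_at: datetime) -> str:
    """Serialise the payload to JSON and pass it to an S3-style put_object callable."""
    key = build_raw_key(fetched_at)
    body = json.dumps(payload).encode("utf-8")
    # put_object is injected so the function can be exercised without AWS access.
    put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

In a real run, `put_object` could be `boto3.client("s3").put_object`, which accepts the same `Bucket`, `Key`, and `Body` keyword arguments. Writing one timestamped object per run keeps every 30-minute snapshot, so later transformations can be re-run from the raw data.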
Transforming Data for Analysis
Next, I transformed the raw JSON data into structured tables for analysis. This involved parsing and organising the data into disruptions, routes, and stops tables. The transformed data was then pushed to Google BigQuery tables for further refinement. The transformation was set to run through Airflow every 6 hours.
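To give a sense of what this parsing looks like, here is a minimal sketch that flattens one raw snapshot into rows for the three tables. The JSON shape (disruptions grouped by mode, each with nested `routes` and `stops` lists) and every field name are assumptions for illustration; the real API response and my actual transformation code may differ.

```python
def flatten_disruptions(raw: dict) -> dict:
    """Split a nested snapshot into disruptions, routes, and stops rows."""
    disruptions, routes, stops = [], [], []
    # Assumed shape: {"disruptions": {"<mode>": [ {...disruption...}, ... ]}}
    for mode, items in raw.get("disruptions", {}).items():
        for item in items:
            d_id = item["disruption_id"]
            disruptions.append({
                "disruption_id": d_id,
                "mode": mode,
                "title": item.get("title"),
            })
            # Child rows carry disruption_id so the tables can be joined later.
            for route in item.get("routes", []):
                routes.append({
                    "disruption_id": d_id,
                    "route_id": route["route_id"],
                    "route_name": route.get("route_name"),
                })
            for stop in item.get("stops", []):
                stops.append({
                    "disruption_id": d_id,
                    "stop_id": stop["stop_id"],
                    "stop_name": stop.get("stop_name"),
                })
    return {"disruptions": disruptions, "routes": routes, "stops": stops}
```

Each list of rows maps onto one table, with `disruption_id` acting as the join key between disruptions and their affected routes and stops.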
Analysing Data with dbt and Looker Studio
Using dbt, I transformed the data in BigQuery into analysis-ready tables. These tables were then connected to Looker Studio, creating an interactive dashboard for users.
Conclusion
This project exposed me to several new challenges and taught me more about data engineering and data analysis. I learned how to set up and configure Airflow on an EC2 instance, how to deal with transportation data, and how to work with geolocation data.
You may find the dashboard here. The Python extraction and transformation code, as well as the Airflow files, can be found in my GitHub account in the ptv_disruptions repository. The dbt transformations can be found in my dbt repository in the ptv model.
I had this data pipeline running for a few months between October 2023 and April 2024, and it has been turned off since then. However, both the dashboard and the pipeline code are still available.
The Public Transport Victoria API has been updated since last year, so this setup would no longer work.
If you’re interested in data or public transport, or have any suggestions, I would love to talk further. Feel free to reach out to me on LinkedIn. You can find my profile in the link below.