Introduction

Traffic flow analysis is an active area of research, covering topics such as the relationship between train flow and fare structures, highway flow forecasting to mitigate congestion, and public transportation policy informed by traffic data. This study focuses on the Taipei Metro System and aims to characterize the temporal and spatial patterns of passenger flow at its stations. The relationship between socioeconomic development and commuter behavior has been explored at larger scales (e.g., statewide or nationwide), but few studies address it at the urban level, particularly within public transportation systems. Most existing studies concentrate on road networks, even though public transit carries a significant share of urban passenger volume.

This research leverages hourly origin-destination (OD) data to analyze temporal traffic patterns across stations. Additionally, it incorporates demographic statistics, household income data, and commercial activity indices to examine spatial characteristics. By exploring the interplay between passenger flow and socioeconomic factors, this study aims to identify key determinants of metro flow patterns, offering insights for traffic prediction and urban transportation planning. The findings are expected to inform subway operators and policymakers in optimizing public transportation systems.

Data Description

The study utilizes four datasets:

  1. Hourly OD-Flow Data (February–June 2020): Includes date, time (hourly), origin, destination, and passenger flow for 131 stations across 6 routes in the Taipei Metro System.
  2. Demographic Statistics (2018): Covers population data (total, gender, aboriginal) for each village in the Greater Taipei Area.
  3. Household Income Statistics (2018): Provides taxpayer units and income metrics (total, mean, median, Q1, Q3) for each village.
  4. Commercial Activity Index: Derived from OpenStreetMap (OSM) Point of Interest (POI) data, counting commercial POIs (e.g., shops, restaurants, cafes, business buildings) within a 240-meter buffer of each station.
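The buffer count behind the Commercial Activity Index can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: it assumes stations and commercial POIs are already available as latitude/longitude pairs, and it approximates the 240-meter buffer with a great-circle (haversine) distance test. The function names and the sample coordinates are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def commercial_index(stations, pois, radius_m=240.0):
    """Count POIs within radius_m of each station.

    stations: dict of name -> (lat, lon); pois: list of (lat, lon).
    Both structures are assumptions made for this sketch.
    """
    return {
        name: sum(
            1 for plat, plon in pois
            if haversine_m(lat, lon, plat, plon) <= radius_m
        )
        for name, (lat, lon) in stations.items()
    }


# Illustrative coordinates only, not taken from the study's data.
stations = {"S1": (25.0478, 121.5170)}
pois = [
    (25.0478, 121.5170),  # at the station
    (25.0488, 121.5170),  # roughly 111 m north
    (25.0498, 121.5170),  # roughly 222 m north
    (25.0508, 121.5170),  # roughly 334 m north, outside the buffer
]
index = commercial_index(stations, pois)
```

A brute-force scan like this is adequate for 131 stations; with a very large POI set, a spatial index or a GIS library would be the more usual choice.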

Literature Review

Although this dataset has not been previously analyzed, similar methodologies have been applied in other contexts. For instance, Truong et al. (2018) examined the spatiotemporal patterns of the Washington, DC Metro using passenger flow data, employing Principal Component Analysis (PCA) and K-means clustering to characterize stations. Their approach provides the framework for this study.

Methodology

  1. Data Processing: Hourly OD-flow data is aggregated to calculate in-flow and out-flow for each station, segmented by weekday (w/d) and weekend (w/e). Flow values are normalized for temporal analysis.
  2. Dimensionality Reduction: PCA is applied to reduce the dimensionality of the flow data, retaining six components (C=6) that together explain over 95% of the variance.
  3. Clustering: K-means clustering is used to group stations based on temporal flow patterns, with the optimal number of clusters (K) determined by the Elbow method.
  4. Socioeconomic Analysis: Boxplots are generated to compare demographic, income, and commercial activity indices across clusters.
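The aggregation, normalization, and clustering steps above can be sketched in plain Python. This is an illustrative outline under several assumptions, not the study's implementation: in-flow is taken to mean entries at the origin station and out-flow exits at the destination, each 24-hour block is normalized to hourly shares, and K-means uses a simple deterministic start. The PCA step is omitted here; in practice the profiles would typically be reduced with a library PCA before clustering. The names `station_profiles`, `sqdist`, and `kmeans` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def station_profiles(records):
    """Build a normalized 96-value flow profile for each station.

    records: iterable of (day, hour, origin, destination, flow), where day is
    a datetime.date and hour is 0-23. Assumed layout of each profile:
    [weekday in-flow | weekday out-flow | weekend in-flow | weekend out-flow],
    24 hourly shares per block.
    """
    totals = defaultdict(lambda: [0.0] * 96)
    for day, hour, origin, dest, flow in records:
        base = 48 if day.weekday() >= 5 else 0  # Saturday/Sunday count as weekend
        totals[origin][base + hour] += flow      # assumed in-flow: entries at origin
        totals[dest][base + 24 + hour] += flow   # assumed out-flow: exits at destination
    profiles = {}
    for station, vec in totals.items():
        shares = []
        for start in range(0, 96, 24):
            block = vec[start:start + 24]
            total = sum(block)
            # Convert each block to hourly shares so busy and quiet stations compare.
            shares.extend(v / total if total else 0.0 for v in block)
        profiles[station] = shares
    return profiles


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm, started from the first k points; returns (labels, inertia)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
    inertia = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia
```

For the Elbow method, `kmeans` would be run over a range of K values and the inertia plotted against K; the chosen K is where the curve stops dropping sharply. A production analysis would more likely use randomized or k-means++ initialization with several restarts.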

Results and Discussion

The analysis identifies five clusters with distinct temporal flow patterns:

[Figure: temporal flow patterns of the five station clusters]