Abstract

Short-term passenger flow prediction is critical to managing bus networks in real time, responding to emergencies quickly, making crowdedness-aware route recommendations, and adjusting service schedules over time. Some recent studies have attempted to predict passenger flow using deep learning models. The complexity of transportation networks, coupled with emerging real-time data collection and information dissemination systems, has increased the popularity of these approaches. There has also been growing interest in a newer deep learning approach, the graph neural network, which captures graph dependence by passing messages between nodes. Researchers in various transportation domains have used such tools for modeling and predicting transportation networks, as many of these networks consist of nodes and links and can naturally be represented as graphs. This paper develops a bus network graph convolutional long short-term memory (BNG-ConvLSTM) neural network model to forecast short-term passenger flows in bus networks. The proposed model is validated using real-world data collected from the Laval bus network in Canada. Comparisons between the proposed model and other popular deep learning approaches indicate that the BNG-ConvLSTM model is more scalable and robust than the baselines in making network-wide predictions of short-term passenger flows.

Keywords

passenger flow prediction, deep learning, bus network, graph neural network, BNG-ConvLSTM

Public transport, especially bus transport, can reduce private car usage and fuel consumption and alleviate traffic congestion. However, when traveling by bus, travelers care not only about waiting and travel time, but also about crowdedness in the bus itself. Excessively overcrowded buses may drive away travelers and make them reluctant to take buses ( 1 ). Therefore, predicting passenger flows on public transportation networks is of great importance and has always been one of the most challenging problems in intelligent transportation system applications ( 2 ). Expected passenger counts in transit systems help managers and urban planners manage travel behavior, diminish passenger congestion, and improve the quality of service in public transportation systems.

Accordingly, passenger flow prediction for public transport systems has been widely studied in the literature. With regard to time horizons, passenger flow forecasting studies can be categorized into short-term and long-term studies. Although long-term predictions are essential, especially for planning and development purposes, short-term predictions play an important role in the real-time monitoring and management of transit systems. Moreover, short-term variations in travel demand or system performance may cause unacceptable waiting times and congestion, which reduce the attractiveness of public transportation systems for users in the long term. As a result, short-term prediction of passenger flows has become increasingly popular among researchers in recent years. In this study, we predict passenger flow, defined as the number of passengers on board a bus when it leaves a stop (equivalently, the number of passengers on board when the bus arrives at the next stop). Following Luo et al. ( 3 ), this kind of passenger flow is called service-level passenger flow, meaning the total number of on-board passengers for each service of each bus line passing through each stop. As the focus of this study is the short-term prediction of passenger flow at bus stops, we briefly review studies on short-term prediction of passenger flows in public transportation systems, including bus and Metro systems. We build on the work of Cui et al. ( 4 ), who developed a graph neural network (GNN) for predicting link traffic states, and apply this framework in the context of transit ridership.

The rest of the paper is structured as follows. In the next section, we survey the literature on passenger flow prediction. In the methodology section, we explain the architecture of the proposed model, as well as some background required for understanding it. We then describe the data set and present the results of the proposed model alongside those of baseline deep learning models, including a simple long short-term memory (LSTM) model. We conclude the paper with some ideas for future research.

Literature Review

Different sources of data and various methodologies have been employed to predict passenger flows in public transportation systems. New technologies such as automatic vehicle location (AVL) and automatic passenger counters (APC) make large amounts of data available to transit planners and operators for this purpose ( 5 ). Automated fare collection (AFC) data, APC data, and AVL data have been among the most popular data sources used by researchers. The methodological attempts, however, have been even more diverse, ranging from statistical and mathematical models to simulation tools, to data-driven, and deep learning approaches ( 6 ). Existing methods can be divided into two main categories: linear and nonlinear methods. In the following, we briefly discuss the existing literature in these two categories.

Linear Methods

Initial attempts at passenger flow prediction have been mainly focused on linear models. Among classic linear methods, linear regression models and Kalman filter-based methods have been among the methods most frequently used in the passenger flow prediction literature. To name a few, Yang et al. ( 7 ) employed a general linear regression for passenger flow volume forecasting, using public transit smart card data. Zhang et al. ( 8 ) developed a Kalman filter-based method for short-term prediction of passenger flow at individual bus stops. Jiao et al. ( 9 ) also used three revised Kalman filtering models for short-term rail transit passenger flow prediction.

However, since most of the data used in passenger flow prediction, especially AFC and APC data, have a spatial-temporal nature, among different linear models, autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) have attracted the main attention of researchers in this field. Ma et al. ( 10 ) and Xue et al. ( 11 ) developed interactive, multiple models for passenger flow prediction. Milenković et al. ( 12 ) tested several SARIMA (seasonal ARIMA) models to choose the most appropriate one for predicting rail passenger flows on Serbian Railways. Zhou et al. ( 13 ) also predicted passenger demand on bus services using three predictive models: a time-varying Poisson model, a weighted time-varying Poisson model, and an ARIMA model. Gong et al. ( 14 ) proposed a framework consisting of three sequential stages, combining a seasonal ARIMA-based method, an event-based model, and a Kalman filter-based method to predict the waiting passenger counts at stops.

Nonlinear Methods

In recent years, nonlinear prediction methods, especially machine learning techniques, have been widely used by researchers in the field of short-term passenger flow prediction. Chen et al. ( 15 ) built a transit flow prediction model based on a least squares support vector machine (LS-SVM) to predict passenger flows on a bus route. Bhattacharya et al. ( 16 ) also proposed a Gaussian process-based approach for modeling and predicting bus ridership. Samaras et al. ( 17 ) developed predictive models using random forests and bagging of regression trees for predicting passenger demand using AVL and APC data from a bus fleet. Sun et al. ( 18 ) developed a hybrid model combining Wavelet and SVM models. Li et al. ( 19 ) employed random forest regression for the short-term prediction of passenger flow for a railway transit system. However, among different machine learning techniques, artificial neural networks (ANNs) have received particular attention from researchers, and various versions of these models, such as radial basis function (RBF) neural networks ( 20 , 21 ), multiple temporal unit neural networks (MTUNN), the parallel ensemble neural networks ( 22 ), and the combination of ANNs with other linear and nonlinear methods have been employed in the literature ( 23 , 24 ).

Deep Learning Methods

As members of the nonlinear modeling family, deep learning methods have become popular for short-term passenger flow forecasting in recent years. The advent of real-time data collection and information diffusion systems, on the one hand, and the ever-increasing complexity of transportation networks, on the other, have accelerated the popularity of these modeling tools. Liu et al. proposed a passenger flow prediction model using deep learning methods based on the passenger flows collected from AFC systems. Their model contained temporal features as well as real-time and historical passenger flows and was able to predict hourly passenger flows for any specified hour at Xiamen BRT stations using a three-stage deep learning architecture. Although the experimental results for some scenarios show room for improvement, their study showed that it is possible to develop a robust and universal prediction model by combining rich data with deep learning methods ( 25 ). Since then, researchers have employed various deep learning structures for short-term prediction of passenger flows. These methods include fully connected deep neural networks ( 26 , 27 ), convolutional neural networks ( 28 ), and LSTM-based models ( 29 – 33 ). In addition, to take different influencing factors and complex interdependencies into account, special attention has been given to combinations of different machine learning and deep learning models, such as the combination of convolutional neural networks and LSTM models ( 3 , 34 , 35 ).

In recent years, the graph family of neural networks has become very popular in different fields ( 4 , 36 , 37 , 39 ). Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs ( 36 ). Since many transportation networks consist of nodes and links, they can naturally be considered as directed or undirected graphs. Accordingly, researchers in different areas of transportation have used such tools to represent transportation networks for modeling and prediction purposes ( 4 , 39 – 41 ). That said, most such studies have addressed automobile traffic flow prediction and travel demand modeling ( 41 ), and studies on passenger flow prediction have mostly been concerned with Metro and urban railway systems, that is, relatively sparse transit systems. To name a few, Li et al. developed a graph convolutional neural network (G-CNN) for urban traffic passenger flow prediction. They evaluated their model with Beijing subway data and compared its performance with other traditional models on short-term passenger flow prediction tasks ( 42 ). Han et al. ( 43 ) proposed a deep-learning-based approach, named STGCNNmetro (spatio-temporal graph convolutional neural networks for Metro), to collectively predict two types of passenger flow volumes, inflow and outflow, in each Metro station of a city. Wang et al. developed a dynamic hypergraph convolution network for Metro passenger flow prediction. To consider the higher-order relationships between stations and travel patterns of passengers, they employed a dynamic hypergraph neural network instead of a graph convolutional neural network. In the prediction framework, the primary hypergraph is constructed from Metro system topology and then extended with advanced hyperedges discovered from pedestrian travel patterns of multiple time spans. Furthermore, hypergraph convolution and spatio-temporal blocks are proposed to extract spatial and temporal features to achieve node-level prediction ( 44 ). Other studies using graph neural networks for predicting passenger flow in rail transportation networks can be found in ( 41 , 42 , 45 , 46 ).

To summarize, most short-term passenger flow prediction attempts, especially those using graph-based neural networks, have focused on railway and Metro transportation systems. Although it might be assumed that Metro/railway and bus systems do not differ notably, there are significant operational and design distinctions between these two classes of transit systems. For instance, Metro/rail transit systems generally adhere to pre-planned timetables, as they usually operate underground or on dedicated infrastructure. In contrast, most bus transport systems share many of their routes with private vehicle traffic, which causes them to deviate from the expected timetables, especially during peak hours. These disruptions cause significant delays and have a direct impact on passenger flows at bus stops, which makes prediction in bus transport systems more complicated. In addition, bus transportation networks are usually more interconnected than Metro/railway systems, as they are sometimes designed as feeders or complementary systems for higher-level transit systems such as Metro systems. They also have more stops and more transfer points. This is especially important in Laval, a city that is highly dependent on bus transportation given its low number of Metro stations (three for the whole city).

Considering the above, the appropriateness and effectiveness of GNNs in the context of bus networks need to be tested and established. Accordingly, in this study, a graph convolutional LSTM recurrent neural network, based on the traffic state prediction work of Cui et al. ( 4 ), is proposed for the short-term prediction of passenger flows at 929 bus stops in Laval, Canada. Instead of representing bus stops by grids and employing conventional convolutional neural networks (CNNs) to capture spatio-temporal dependencies, we transform the bus network into a graph to take the spatio-temporal correlations between different stops and lines into account. To the best of our knowledge, this is the first time that a graph-based neural network has been used for network-based prediction of passenger flows in a bus transit system.

Methodology

Considering the complex interactions and uncertainties in bus transportation networks, choosing an appropriate modeling structure is of great importance. For the purpose of this study, and considering the different structures used by previous researchers, we found the one developed by Cui et al. ( 4 ), which was originally used in the context of traffic state estimation, well suited to bus network passenger flow prediction. This modeling structure has a network-based graph convolution operator that allows more realistic modeling of the transportation network than traditionally used convolutional neural networks (CNNs). Conventional CNNs are appropriate when dealing with spatial relationships in Euclidean space, whereas in a real transportation network, these spatial relationships are not necessarily distributed in Euclidean space ( 4 ). For instance, two bus stops might be physically close but not connected to each other, and therefore have low spatial dependencies. Accordingly, conventional CNN-based methods cannot deal with the topological structure and the physical attributes of transportation networks, especially bus transportation networks, which are not fully connected. The graph-based convolution operator solves this issue by extracting features from the graph-structured network. Moreover, the employed modeling structure includes an LSTM component to dynamically capture the temporal dependencies in passenger flow observations. We did not find the combination of the graph-based convolution operator and the LSTM model elsewhere in the relevant passenger flow literature, which motivated us to use this modeling framework for the purposes of this study. This integration enables the resulting model to capture both spatial and temporal dependencies among network-wide passenger flows at a station level.

Model Input Definitions

Before the model itself is presented, we define the bus network graph, its adjacency matrix, and the bus network proximity matrix, which are the model’s inputs.

The Bus Network Graph and its Adjacency Matrix

In general, a graph $G = (N, E)$ consists of a set $N$ whose elements are called nodes $n_{i} \in N$ , and a set $E$ with elements called edges $(n_{i}, n_{j}) \in E$ . In this study, bus stops are considered the nodes, and bus routes between stops are defined as the edges of the graph. There are two kinds of graph: directed and undirected ( 47 ). We consider the bus network to be a directed graph, since travel on a given route in a given direction is different from the other direction. In addition, there are even sometimes stops in one direction of a route but not in the other direction.

Connectivity between nodes in the graph is described by its adjacency matrix, which is an $N \times N$ matrix with the nodes of $G$ on its rows and columns. If we consider $A \in R^{N \times N}$ as an adjacency matrix, $A_{i, j}$ is 1 if node $i$ is connected to node $j$, and $A_{i, j}$ is 0 otherwise ( 47 ). In this study, $A_{i, i} = 1$ since we consider each node to be connected to itself.
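As a minimal sketch of this definition, the snippet below builds a directed adjacency matrix with self-connections from ordered stop sequences. The input format (one list of stop indices per route direction), the choice to link only consecutive stops, and the function name `build_adjacency` are assumptions made for illustration; the source does not specify how edges were extracted from the route data. Plain Python lists are used to keep the sketch dependency-free.

```python
def build_adjacency(num_stops, routes):
    """Return an N x N 0/1 adjacency matrix for a directed stop graph.

    routes: list of stop-index sequences, each listed in travel order
            (hypothetical input format, one entry per route direction).
    """
    # Start from the identity: each stop is considered connected to itself.
    adj = [[1 if i == j else 0 for j in range(num_stops)]
           for i in range(num_stops)]
    for stops in routes:
        # Assumption: an edge links each stop to the next stop served,
        # in the direction of travel only (directed graph).
        for origin, dest in zip(stops, stops[1:]):
            adj[origin][dest] = 1
    return adj


# Toy network: one route serves stops 0 -> 1 -> 2, another serves 3 -> 1.
routes = [[0, 1, 2], [3, 1]]
A = build_adjacency(4, routes)
```

In this toy network `A[0][1]` is 1 while `A[1][0]` is 0, which reflects the directed treatment of routes described above.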

Bus Network Proximity Matrix

In a real bus network, stops do not influence each other equally. For example, stops that are close to each other affect one another more than stops that are far apart. To capture this characteristic, we define the bus network proximity (BNP) matrix, whose cells indicate whether a given destination node can be reached from a given origin node within a given period of time. First, based on the distances between stops in the bus network, we define a distance matrix $Dist \in R^{N \times N}$, in which each cell $Dist_{i, j}$ represents the actual distance between stops $i$ and $j$. Then, using the distance matrix and the average bus speed on each edge, we define the BNP matrix as follows:

$$BNP_{i, j} = \begin{cases} 1, & \text{if } S_{i, j} \, \Delta t - Dist_{i, j} \geq 0 \\ 0, & \text{otherwise} \end{cases}$$

(1)

where $S_{i, j}$ is the average speed of the bus between stops $i$ and $j$, and $\Delta t$ is the time interval. Each element $BNP_{i, j}$ equals 1 if a passenger traveling by bus can reach stop $j$ from stop $i$ within the time interval $\Delta t$, and $BNP_{i, j} = 0$ otherwise. Each stop is also considered self-reachable, so all diagonal values in $BNP$ are equal to 1.
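Equation 1 can be sketched in a few lines. The function name `build_bnp`, the units (kilometres, km/h, minutes), and the toy distances are assumptions for illustration only. The 18 km/h speed is the value the paper reports using for all edges; taking $\Delta t$ to be 10 min is an assumption here, under which a bus covers 3 km per interval.

```python
def build_bnp(dist_km, speed_kmh, dt_minutes):
    """Return the 0/1 bus network proximity matrix of Equation 1.

    dist_km[i][j]   : distance between stops i and j (km)
    speed_kmh[i][j] : average bus speed between stops i and j (km/h)
    dt_minutes      : length of the time interval (minutes)
    """
    n = len(dist_km)
    bnp = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Distance a bus can cover within one interval.
            reach_km = speed_kmh[i][j] * dt_minutes / 60.0
            # Diagonal entries are 1: every stop is self-reachable.
            if i == j or reach_km - dist_km[i][j] >= 0:
                bnp[i][j] = 1
    return bnp


# Toy example with three stops (illustrative distances, not Laval data).
dist_km = [[0.0, 1.0, 4.0],
           [1.0, 0.0, 2.5],
           [4.0, 2.5, 0.0]]
speed_kmh = [[18.0] * 3 for _ in range(3)]
BNP = build_bnp(dist_km, speed_kmh, dt_minutes=10)
```

With these values, stops 0 and 2 are 4 km apart, which exceeds the 3 km reachable in one interval, so `BNP[0][2]` is 0.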

Short-Term Passenger Flow Prediction Problem

The model defined in this paper predicts short-term passenger flows. The prediction uses temporal features, namely the passenger flow state of the network before the target time, as well as spatial features, for example, the environmental characteristics of the network. Let $p_{t} \in R^{N}$ denote the vector of passenger flows at all stops of the bus network graph at time $t$. The aim of short-term passenger flow prediction is to learn a function $F(\cdot)$ that maps $T$ time steps of historical graph passenger flows, that is, $P_{T} = [p_{1}, p_{2}, \dots, p_{t}, \dots, p_{T}]$, to the graph passenger flows in the subsequent one or multiple time steps. In this study, passenger flow is predicted one step ahead, that is, $p_{T + 1}$, and $F(\cdot)$ is defined as follows:

$$F([p_{1}, p_{2}, \dots, p_{t}, \dots, p_{T - 1}, p_{T}]; \, G(N, E, A, BNP)) = p_{T + 1}$$

(2)

In this formula, $N$ is the set of nodes, $E$ is the set of edges, $A$ is the network adjacency matrix, and $BNP$ is the BNP matrix. Together, these four components define the bus network graph $G$.
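To make the input and output of $F(\cdot)$ concrete, the following sketch slices a sequence of flow vectors into (history, next-step) training pairs. The function name `make_windows` and the sliding-window construction are assumptions; the source states only that $T$ historical steps are mapped to $p_{T + 1}$. The toy example uses a history of 3 steps for brevity.

```python
def make_windows(flows, history):
    """Pair each run of `history` consecutive flow vectors with the next one.

    flows: list of per-interval vectors p_t, each holding one value per stop.
    Returns a list of (window, target) tuples, where window plays the role of
    [p_1, ..., p_T] and target the role of p_{T+1}.
    """
    samples = []
    for end in range(history, len(flows)):
        window = flows[end - history:end]  # the T most recent vectors
        target = flows[end]                # the vector to be predicted
        samples.append((window, target))
    return samples


# Toy series: 6 intervals, 2 stops (values are illustrative only).
flows = [[t, 10 * t] for t in range(6)]
samples = make_windows(flows, history=3)
```

Each sample holds a window of shape (history, number of stops) and a target vector with one entry per stop.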

Bus Network Graph Convolution

A convolution layer in a neural network extracts spatial features from input data that have a two- or three-dimensional matrix structure. By analogy, a graph convolution layer extracts spatial features from input data that have a graph structure ( 4 , 48 , 49 ). In our model, the graph convolution operation that extracts features from the neighborhood of each stop is defined as the product of the input data, the adjacency matrix, the BNP matrix, and a trainable weight matrix. The formula for the bus network graph convolution (BNGC) operation is as follows:

$$BNGC_{t} = (TW_{bngc} \odot A \odot BNP) \, p_{t}$$

(3)

where

$BNGC_{t} \in R^{N}$ is the extracted BNGC feature at time $t$,

$TW_{bngc} \in R^{N \times N}$ is a trainable weight matrix,

$A$ is the adjacency matrix,

$BNP$ is the BNP matrix, and

$p_{t} \in R^{N}$ is the vector of passenger flows for all stops in the network at time $t$ .

In this equation, the element-wise matrix multiplication (Hadamard product) is denoted by $\odot$. Because the adjacency matrix $A$ and the BNP matrix $BNP$ are sparse and contain only 0 and 1 elements, the result of $TW_{bngc} \odot A \odot BNP$ is also sparse: its nonzero entries are the trained weights at positions where both $A$ and $BNP$ equal 1. Therefore, the trained weight $TW_{bngc}$ can be used to measure the interactive influence between stops in the bus network graph, which enhances the interpretability of the model ( 4 ).
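The BNGC operation in Equation 3 amounts to a masked matrix-vector product. The sketch below is one way to write it with plain Python lists; the function name `bngc` and all numeric values are illustrative assumptions rather than trained weights or observed flows.

```python
def bngc(weights, adj, bnp, flows_t):
    """Compute (TW_bngc ⊙ A ⊙ BNP) p_t for one time step.

    weights : N x N trainable weight matrix (TW_bngc)
    adj     : N x N 0/1 adjacency matrix (A)
    bnp     : N x N 0/1 proximity matrix (BNP)
    flows_t : length-N vector of passenger flows (p_t)
    """
    n = len(flows_t)
    return [
        # Only entries where both A and BNP equal 1 contribute to stop i.
        sum(weights[i][j] * adj[i][j] * bnp[i][j] * flows_t[j]
            for j in range(n))
        for i in range(n)
    ]


# Illustrative values for a three-stop network.
TW = [[0.5, 0.2, 0.9],
      [0.1, 0.5, 0.3],
      [0.7, 0.4, 0.5]]
A = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
BNP = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
p_t = [10.0, 20.0, 30.0]
features = bngc(TW, A, BNP, p_t)
```

Here the weight `TW[0][2] = 0.9` has no effect on the output because the corresponding entries of `A` and `BNP` are 0, which is the masking behavior that keeps the effective weight matrix sparse.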

Bus Network Graph Convolution LSTM Neural Network Model

We use a bus network graph convolutional LSTM (BNG-ConvLSTM) neural network model for short-term passenger flow forecasting. This model learns both the spatial and temporal dependencies present in the bus network. The BNGC features are fed as inputs to a basic LSTM, whose gate structure and hidden state are left unchanged ( 50 ). The input gate $ig_{t}$, the output gate $og_{t}$, the forget gate $fg_{t}$, and the input cell state $\tilde{C}_{t}$ at time step $t$ are defined as follows:

$$ig_{t} = \sigma_{g}(TW_{ig} \cdot BNGC_{t} + U_{ig} \cdot h_{t - 1} + b_{ig})$$

(4)

$$og_{t} = \sigma_{g}(TW_{og} \cdot BNGC_{t} + U_{og} \cdot h_{t - 1} + b_{og})$$

(5)

$$fg_{t} = \sigma_{g}(TW_{fg} \cdot BNGC_{t} + U_{fg} \cdot h_{t - 1} + b_{fg})$$

(6)

$$\tilde{C}_{t} = \tanh(TW_{c} \cdot BNGC_{t} + U_{c} \cdot h_{t - 1} + b_{c})$$

(7)

In these equations, $\cdot$ represents the matrix multiplication operator. $TW_{ig}$, $TW_{og}$, $TW_{fg}$, and $TW_{c} \in R^{N \times N}$ are the weight matrices that map the input to the three gates and the input cell state, while $U_{ig}$, $U_{og}$, $U_{fg}$, and $U_{c} \in R^{N \times N}$ are the weight matrices applied to the preceding hidden state. $b_{ig}$, $b_{og}$, $b_{fg}$, and $b_{c} \in R^{N}$ are bias vectors. $\sigma_{g}$ is the gate activation function, taken here to be the sigmoid function, and $\tanh$ is the hyperbolic tangent function ( 4 ).

As mentioned before, in a bus network, each node is affected by previous states of itself and neighboring stops; therefore, the states of neighboring cells of each node in the graph should affect its LSTM cell state. Accordingly, a cell state gate is defined as follows and is added to the LSTM cell:

$$C_{t - 1}^{*} = TW_{N} \odot (A \odot BNP) \cdot C_{t - 1}$$

(8)

In this equation, the impacts of neighboring cell states are measured by $TW_{N}$, which is multiplied element-wise by the product of the adjacency and BNP matrices, $A \odot BNP$, to reflect the bus network structure. The impact of neighboring cell states is taken into account when the cell state is recurrently passed to the subsequent time step. Therefore, the final cell state $C_{t}$ and the hidden state $h_{t}$ are defined as follows:

$$C_{t} = fg_{t} \odot C_{t - 1}^{*} + ig_{t} \odot \tilde{C}_{t}$$

(9)

$$h_{t} = og_{t} \odot \tanh(C_{t})$$

(10)

Finally, at the last time step $T$, the hidden state $h_{T}$ is the output of the BNG-ConvLSTM model, that is, the predicted value $\hat{y}_{T} = h_{T}$. Let $y_{T} \in R^{N}$ be the label of the input passenger flow data $P_{T} \in R^{N \times T}$. For the sequence prediction problem in this study, the label of time step $T$ is the input of the next time step $(T + 1)$, that is, $y_{T} = p_{T + 1}$. Therefore, the loss during the training process is defined as:

$$Loss = L(y_{T}, \hat{y}_{T}) = L(p_{T + 1}, h_{T})$$

(11)

In Equation 11, $L(\cdot)$ is a function for calculating the residual between the predicted value $\hat{y}_{T}$ and the actual value $y_{T}$. In this study, we use the mean squared error (MSE) as the loss function since it is commonly used for predicting continuous values.
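Equations 3 to 11 can be read together as a single recurrent step followed by a loss evaluation. The sketch below is a forward pass only, written with plain Python lists. It is not the authors' implementation: the parameter dictionary layout, the helper names, and the toy weights are assumptions, and Equation 8 is read as $(TW_{N} \odot A \odot BNP) \cdot C_{t - 1}$.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def masked_matvec(weights, mask, vector):
    """(weights elementwise-times mask) applied to vector; mask = A ⊙ BNP."""
    return [sum(w * m * v for w, m, v in zip(w_row, m_row, vector))
            for w_row, m_row in zip(weights, mask)]


def gate(tw, u, b, x, h_prev, activation):
    """activation(TW · x + U · h_prev + b), the shared form of Eqs. 4 to 7."""
    wx, uh = matvec(tw, x), matvec(u, h_prev)
    return [activation(p + q + r) for p, q, r in zip(wx, uh, b)]


def bng_convlstm_step(p_t, h_prev, c_prev, mask, params):
    """One forward step; `params` is a hypothetical dict of weights and biases."""
    x = masked_matvec(params["tw_bngc"], mask, p_t)                 # Eq. 3
    ig = gate(params["tw_ig"], params["u_ig"], params["b_ig"], x, h_prev, sigmoid)
    og = gate(params["tw_og"], params["u_og"], params["b_og"], x, h_prev, sigmoid)
    fg = gate(params["tw_fg"], params["u_fg"], params["b_fg"], x, h_prev, sigmoid)
    c_tilde = gate(params["tw_c"], params["u_c"], params["b_c"], x, h_prev, math.tanh)
    c_star = masked_matvec(params["tw_n"], mask, c_prev)            # Eq. 8
    c_t = [f * cs + i * ct
           for f, cs, i, ct in zip(fg, c_star, ig, c_tilde)]        # Eq. 9
    h_t = [o * math.tanh(c) for o, c in zip(og, c_t)]               # Eq. 10
    return h_t, c_t


def mse(y_true, y_pred):
    """Mean squared error, the loss L(.) of Equation 11."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy two-stop setup: identity mask and graph weights, zero gate weights.
n = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {"tw_bngc": identity, "tw_n": identity}
for name in ("ig", "og", "fg", "c"):
    params["tw_" + name] = zeros
    params["u_" + name] = zeros
    params["b_" + name] = [0.0] * n

h_t, c_t = bng_convlstm_step([1.0, 2.0], [0.0, 0.0], [1.0, -1.0], identity, params)
loss = mse([0.3, -0.1], h_t)
```

With all gate weights set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is half of the masked previous cell state. This makes the role of Equation 8 in the recurrence easy to check by hand.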

Experiments

This section describes the data used and the results produced by our proposed model as well as three other previously and commonly used models in passenger flow prediction.

Data Used

The data set used in this study includes information of the passenger flow from the Société de transport de Laval (STL) in Laval, Canada, for one month (April 2020). The STL is the transit operator for Laval, Canada. Laval is the second largest city in the province of Quebec, but it is in the region of Greater Montreal. Montreal is Canada’s second largest city with a population of around 4 million according to Statistics Canada. While Laval has three stops of the Montreal Metro, its services are primarily bus-based. In this study, to test and verify many assumptions and scenarios, we decided to focus on a subnetwork of the Laval bus transportation network; nonetheless, we have chosen a relatively complex subset of the network including route 17-Northbound, as one of the busiest express routes in Laval, and all of its feeder lines. Thus, while this makes it easier for us to train and run our model for different scenarios, it also leaves us with a subnetwork that covers a significant portion of the city’s public transport demand and enables us to evaluate our proposed framework on a real network. It is worth mentioning that previous work had focused on one stop or one route, so the advantages of a graph approach could not be truly evaluated.

The data used in this study come from the automatic passenger counters (APC) of the Laval bus transportation system. Table 1 lists the route numbers, the number of stops on each route, and the number of observations per route for all routes in the modeled network of Laval. Each observation includes the number of passengers on board the bus when it leaves the stop (passenger flow). The “No. of observations” column is the total number of times buses appeared at the stops of each route, and the “No. of passengers” column is the total number of passengers on the buses when they left the stops of each route. In total, 929 stops from 15 routes have been studied. The studied network is shown in Figure 1. The distance matrix $(Dist)$, adjacency matrix $(A)$, and BNP matrix $(BNP)$ for the data set are calculated based on the real distances between the stops in the network and the average bus speed (which is assumed to be 18 km/h for all edges, according to the Société de transport de Montréal [ 51 ]).

Table 1.

A Description of Data Used for the Modeled Network (Laval, April 2020)

Route number and direction	No. of stops	No. of observations	No. of passengers
17-Northbound	51	62,943	122,478
70-Westbound	91	76,766	153,683
39-Northbound	75	58,050	58,051
73-Northbound	79	52,298	52,299
74-Eastbound	93	48,639	48,640
22-Eastbound	73	39,859	84,985
48-Eastbound	58	39,208	39,209
31-Northbound	66	38,874	70,950
27-Northbound	58	36,540	43,415
60-Northbound	56	33,600	58,629
58-Eastbound	57	32,490	53,417
45-Northbound	52	30,215	30,216
43-Northbound	37	23,423	23,424
222-Eastbound	63	18,574	18,575
2-Westbound	20	1,560	1,649
Total	929	593,039	1,139,433

Figure 1.

The modeled network of bus stations and routes, including route 17-Northbound and all routes intersecting it: Laval, Canada.

Experimental Setting

To evaluate the performance of the proposed model, in addition to comparing the predicted values with real observations, we compare the model’s performance with that of three other deep learning models: the multi-layer perceptron (MLP) model, the convolutional neural network (CNN) model, and the long short-term memory (LSTM) model. These models have been frequently used for passenger flow prediction in the literature. For the BNG-ConvLSTM model, the dimension of the hidden states is set to the number of nodes in the bus transportation network graph. The model is trained by minimizing the mean squared error with a batch size of 32 and an initial learning rate of $10^{-5}$. RMSProp is used as the gradient descent optimizer because it helps mitigate exploding and vanishing gradient problems ( 4 ).

Evaluation

To predict the passenger flows at each time interval, the historical data for the 10 previous intervals are used as the input time series. All intervals are 10 min long. The performance of the BNG-ConvLSTM model and the three other deep learning models is evaluated using two commonly used metrics: (1) mean absolute error (MAE); and (2) root mean squared error (RMSE). In addition, because the task is to predict passenger flows, we also calculate the percent accuracy of the predictions, as shown in Equation 14.

$$MAE = \frac{1}{n} \sum_{T = 1}^{n} | y_{T} - \hat{y}_{T} |$$

(12)

$$RMSE = \sqrt{\frac{1}{n} \sum_{T = 1}^{n} (y_{T} - \hat{y}_{T})^{2}}$$

(13)

$$\text{Percent Accuracy} = \frac{\text{Number of correct predictions}}{n} \times 100$$

(14)

where

${\hat{y}}_{T}$ is the predicted value,

$y_{T}$ is the actual value, and

n is the number of observations for all stops in the test data set.

The MAE, RMSE, and percent accuracy are computed from stop-level predictions and real-world values. To clarify, for each time interval of the test data set, the model predicts a value for each stop; therefore, if $n_{obs}$ is the number of observations in the data set and $n_{stops}$ is the number of stops in the network, then $n$ is equal to $n_{obs} \times n_{stops}$.
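The three metrics can be sketched as follows. The source does not define what counts as a correct prediction, so this sketch assumes that a prediction is correct when, after rounding to the nearest integer, it falls in the same bin as the observed count (bin size 1 for exact matches, 5 for the binned scenario). The function names and the toy values are likewise illustrative.

```python
import math


def flatten(rows):
    """Flatten an (observations x stops) table into n = n_obs * n_stops values."""
    return [value for row in rows for value in row]


def mae(y_true, y_pred):
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def percent_accuracy(y_true, y_pred, bin_size=1):
    """Share of predictions landing in the same bin as the observed count.

    Assumption: predictions are rounded to whole passengers before binning.
    """
    hits = sum(1 for a, p in zip(y_true, y_pred)
               if int(a) // bin_size == round(p) // bin_size)
    return 100.0 * hits / len(y_true)


# Toy test set: 2 time intervals x 2 stops (illustrative values).
actual = flatten([[0, 3], [7, 12]])
predicted = flatten([[0.2, 4.4], [6.8, 9.1]])

print(mae(actual, predicted))
print(rmse(actual, predicted))
print(percent_accuracy(actual, predicted))              # exact match
print(percent_accuracy(actual, predicted, bin_size=5))  # bins of size 5
```

In this toy case, two of the four rounded predictions match the observed count exactly, and three of the four fall in the correct bin of size 5, so the binned accuracy is higher than the exact accuracy, as in the reported tables.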

Experimental Results

The performance results, shown in Table 2, indicate that the BNG-ConvLSTM model outperforms the other models on all four evaluation metrics. The values in Table 2 are calculated over all test observations from the real and predicted values. The MAE and RMSE values for the BNG-ConvLSTM model are $0.251$ and $0.851$, respectively, which are lower than the other models’ values for these two metrics. The likely reason is that the proposed model accounts for spatial features and network topology as well as temporal features, whereas each of the other models ignores at least one of these kinds of features. The percent accuracy values of the different models are also shown in Table 2. We also define a scenario in which passenger flows are classified into bins of size 5, and we calculate the percent accuracy for this scenario too. The BNG-ConvLSTM model performs better than the other models even when the bin size is relatively small, which means the model can predict the variations in passenger flow with acceptable accuracy most of the time. Additionally, Table 3 compares the performance of the models for the peak hours on weekdays, as those are the busiest hours of the week. Here too, the proposed model outperforms the other models on all four metrics.

Table 2.

Performance Comparison between Different Models

Model	MAE	RMSE	Total percent accuracy (%)	5-size bins percent accuracy (%)
MLP	0.687	1.849	73.3	93.4
CNN	0.699	1.793	70.0	93.1
LSTM	0.337	1.030	80.2	96.1
BNG-ConvLSTM	0.251	0.851	85.3	97.3

Note: MAE = mean absolute error; RMSE = root mean square error; CNN = convolutional neural network; MLP = multi-layer perceptron; LSTM = long short-term memory; BNG-ConvLSTM = bus network graph convolutional long short-term memory.
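To put the error reductions in Table 2 in relative terms, the following Python sketch computes the percentage drop in MAE and RMSE of BNG-ConvLSTM against each baseline. The values are copied from Table 2; the helper name `relative_reduction` is a hypothetical choice for illustration.

```python
# MAE and RMSE values copied from Table 2.
table2 = {
    "MLP": {"MAE": 0.687, "RMSE": 1.849},
    "CNN": {"MAE": 0.699, "RMSE": 1.793},
    "LSTM": {"MAE": 0.337, "RMSE": 1.030},
    "BNG-ConvLSTM": {"MAE": 0.251, "RMSE": 0.851},
}

def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

proposed = table2["BNG-ConvLSTM"]
reductions = {
    model: {
        metric: relative_reduction(scores[metric], proposed[metric])
        for metric in ("MAE", "RMSE")
    }
    for model, scores in table2.items()
    if model != "BNG-ConvLSTM"
}

for model, result in reductions.items():
    print(f"{model}: MAE -{result['MAE']:.1f}%, RMSE -{result['RMSE']:.1f}%")
```

Against the strongest baseline, LSTM, this gives a reduction of roughly 25.5% in MAE and 17.4% in RMSE.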

Table 3.

Performance Comparison Between Different Models in Peak Hours on Weekdays

Model	MAE	RMSE	Total percent accuracy (%)	Percent accuracy with bins of size 5 (%)
MLP	$0.814$	$2.521$	$64.6$	$88.5$
CNN	$0.845$	$2.325$	$62.1$	$87.9$
LSTM	$0.573$	$1.413$	$72.2$	$93.2$
BNG-ConvLSTM	$0.461$	$1.18$	$76.6$	$95.6$

We also examine the training efficiency of the BNG-ConvLSTM model and the other deep learning models in this study. Figure 2 shows the validation loss versus the training epoch for all models. Because an early stopping mechanism is used during training, each model can train for a different number of epochs. As Figure 2 indicates, the validation loss of the BNG-ConvLSTM model decreases faster than that of the other models, and the model needs fewer epochs than LSTM to converge. However, the MLP and CNN models converged in fewer epochs than the BNG-ConvLSTM model.

Figure 2.

Validation loss versus training epoch.
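The early stopping mechanism mentioned above can be sketched as follows. This is a generic patience-based rule in plain Python; the `patience` and `min_delta` parameters and the loss values are hypothetical, since the stopping criteria actually used are not stated here.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has failed to improve on its
    best value by more than `min_delta` for `patience` consecutive epochs.
    If that never happens, the last epoch is returned.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation loss curves for two models.
plateau_early = [0.90, 0.60, 0.55, 0.56, 0.57, 0.58, 0.50]
still_improving = [0.90, 0.80, 0.70, 0.60, 0.50, 0.45, 0.44]

print(early_stopping_epoch(plateau_early))
print(early_stopping_epoch(still_improving))
```

Under such a rule, a model whose validation loss plateaus early stops after fewer epochs, which is why the number of training epochs differs between models.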

Figure 3 compares the computation time of the models. The running time per epoch is longer for the BNG-ConvLSTM model than for the other models. It is far longer than that of the MLP and CNN models, and because these two models also converged in fewer epochs, their total running time is much less than that of the BNG-ConvLSTM model. However, because the BNG-ConvLSTM model converged in fewer epochs than the LSTM model, its total running time is almost equal to that of the LSTM model. All of the models were implemented on an OMEN GT13-0090 30L gaming PC with an NVIDIA® GeForce RTX™ 3090 GPU, an Intel Core i9-10850K CPU, and 32 GB (2 × 16 GB) of HyperX® DDR4-3200 MHz XMP RAM.

Figure 3.

Comparing computation time between LSTM and BNG-ConvLSTM models.
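The comparison above rests on a simple decomposition, written here with assumed symbols. For a model $m$, let $E_m$ be the number of epochs run before early stopping and $t_m$ the average running time per epoch. The total training time is then approximately

$T_m \approx E_m \times t_m$ .

BNG-ConvLSTM has the largest $t_m$, but because it needs fewer epochs than LSTM, the two products can be nearly equal. MLP and CNN are smaller than BNG-ConvLSTM in both factors, so their $T_m$ is much smaller.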

Conclusion

This paper develops a novel graph-based deep learning approach, based on the GNN traffic forecasting framework of Cui et al. ( 4 ), for short-term prediction of passenger flows in a bus network. To do so, we define a bus network graph convolution (BNGC) operation that incorporates the adjacency and bus network proximity matrices to extract the spatial features of the network. We then propose a bus network graph convolutional LSTM (BNG-ConvLSTM) neural network to forecast network-wide passenger flow. The formulation of the model therefore considers both the spatial and temporal aspects of passenger flow.
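As a rough illustration of how an adjacency matrix and a proximity matrix can jointly restrict a graph convolution, the Python sketch below masks a weight matrix element-wise with both before applying it to stop-level flows, in the spirit of the traffic graph convolution of Cui et al. ( 4 ). The three-stop network, the binary proximity matrix, all numeric values, and the function name `masked_graph_convolution` are hypothetical; the exact BNGC formulation is the one given in the Methodology section, not this simplification.

```python
def masked_graph_convolution(flows, weights, adjacency, proximity):
    """Compute y[j] = sum_i flows[i] * weights[i][j] * adjacency[i][j] * proximity[i][j].

    A weight only contributes where both masks are non-zero, so each stop
    aggregates flow only from stops that are connected AND close enough.
    """
    n = len(flows)
    return [
        sum(
            flows[i] * weights[i][j] * adjacency[i][j] * proximity[i][j]
            for i in range(n)
        )
        for j in range(n)
    ]

# Hypothetical 3-stop line 0 - 1 - 2, with self-loops.
adjacency = [[1, 1, 0],
             [1, 1, 1],
             [0, 1, 1]]
# Hypothetical binary proximity: stop 2 is assumed too far from stop 1.
proximity = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 1]]
# Hypothetical weights (these would be learned in a real model).
weights = [[0.5, 0.2, 0.1],
           [0.3, 0.5, 0.2],
           [0.1, 0.3, 0.5]]
flows = [4.0, 2.0, 6.0]  # passenger flow at each stop

output = masked_graph_convolution(flows, weights, adjacency, proximity)
print(output)
```

In this toy case, stop 2 receives no contribution from stop 1 even though the two are adjacent, because the assumed proximity matrix zeroes out that pair.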

Moreover, a case study based on actual data from the bus network of Laval, Canada, compares the proposed BNG-ConvLSTM with other prediction methods, namely the multi-layer perceptron (MLP), the convolutional neural network (CNN), and the general long short-term memory (LSTM) models. The proposed model performs better than the other candidate models in terms of mean absolute error (MAE) and root mean square error (RMSE). In addition, we compare the percent accuracy of the models’ results, both overall and for a scenario in which passenger flow is classified into bins of size 5. The BNG-ConvLSTM model gives more accurate results than the other models, even when the bin size is relatively small, meaning it can accurately predict many passenger flow variations.

Although the proposed framework has demonstrated its ability to make robust predictions of short-term passenger flows at the network level, it is not yet complete, and further research is required. First, the model could be improved by considering other factors, such as weather, construction, festivals, and whether a day is a weekday or a weekend. Second, the origin-destination matrix in different periods could be considered to assess the spatial correlation of each stop in each time period. Finally, GTFS data, bus frequency, and bus capacity could be considered in calculating network proximity so that it is more realistic.

Footnotes

Acknowledgements

Our appreciation goes out to BusPas Inc and Mitacs Accelerate Program for supporting this research. We would also like to show our gratitude to Société de transport de Laval (STL) for providing the data.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: Asiye Baghbani, Nizar Bouguila, and Zachary Patterson; data collection: Nizar Bouguila and Zachary Patterson; analysis and interpretation of results: Asiye Baghbani, Nizar Bouguila, and Zachary Patterson; draft manuscript preparation: Asiye Baghbani, Nizar Bouguila, and Zachary Patterson. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Asiye Baghbani

Nizar Bouguila

Zachary Patterson

References

1. Zhou, Dai, Zhang. Passenger Demand Prediction on Bus Services. Proc., International Conference on Green Computing and Internet of Things (ICGCIoT), Noida, IEEE, New York, 2015, pp. 1430–1435.

2. Ceder. Bus Frequency Determination Using Passenger Count Data. Transportation Research Part A: General, Vol. 18, No. 5–6, 1984, pp. 439–453.

3. Luo, Zhao, You, Liu, Zhang, Zuo. Fine-Grained Service-Level Passenger Flow Prediction for Bus Transit Systems Based on Multitask Deep Learning. Proc., IEEE Transactions on Intelligent Transportation Systems, IEEE, New York, 2020, pp. 7184–7199.

4. Cui, Henrickson, Wang. Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting. IEEE Transactions on Intelligent Transportation Systems, Vol. 21, No. 11, 2019, pp. 4883–4894.

5. Hammerle, Haynes, McNeil. Use of Automatic Vehicle Location and Passenger Count Data to Evaluate Bus Operations: Experience of the Chicago Transit Authority, Illinois. Transportation Research Record: Journal of the Transportation Research Board, 2005. 1903: 27–34.

6. Zhai, Cui, Nie, Zhang. A Comprehensive Comparative Analysis of the Basic Theory of the Short Term Bus Passenger Flow Prediction. Symmetry, Vol. 10, No. 9, 2018, p. 369.

7. Yang, Zhao, Jin, Mao. Passenger Flow Volume Forecasting Method Based on Public Transit Intelligent Card (IC) Survey Data. Transport Research, Vol. 9, 2009, pp. 115–119.

8. Zhang C.-H., Song, Sun. Kalman Filter-Based Short-Term Passenger Flow Forecasting on Bus Stop. Journal of Transportation Systems Engineering and Information Technology, Vol. 11, No. 4, 2011, pp. 154–159.

9. Jiao, Sun, Hou, Ibrahim. Three Revised Kalman Filtering Models for Short-Term Rail Transit Passenger Flow Prediction. Mathematical Problems in Engineering, Vol. 2016, 2016, p. 10.

10. Xing, Mesbah, Ferreira. Predicting Short-Term Bus Passenger Demand Using a Pattern Hybrid Approach. Transportation Research Part C: Emerging Technologies, Vol. 39, 2014, pp. 148–163.

11. Xue, Sun D. J., Chen. Short-Term Bus Passenger Demand Prediction Based on Time Series Model and Interactive Multiple Model Approach. Discrete Dynamics in Nature and Society, Vol. 2015, 2015, p. 11.

12. Milenković, Švadlenka, Melichar, Bojović, Avramović. SARIMA Modelling Approach for Railway Passenger Flow Forecasting. Transport, Vol. 33, No. 5, 2018, pp. 1113–1120.

13. Zhou, Dai. The Passenger Demand Prediction Model on Bus Networks. Proc., IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, IEEE, New York, 2013, pp. 1069–1076.

14. Gong, Fei, Wang Z. H., Qiu Y. J. Sequential Framework for Short-Term Passenger Flow Prediction at Bus Stop. Transportation Research Record: Journal of the Transportation Research Board, 2014. 2417: 58–66.

15. Chen, Zhao. The Use of LS-SVM for Short-Term Passenger Flow Prediction. Transport, Vol. 26, No. 1, 2011, pp. 5–10.

16. Bhattacharya, Phithakkitnukoon, Nurmi, Klami, Veloso, Bento. Gaussian Process-Based Predictive Modeling for Bus Ridership. Proc., ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, Zurich, Switzerland, Association for Computing Machinery, New York, NY, 2013, pp. 1189–1198.

17. Samaras, Fachantidis, Tsoumakas, Vlahavas. A Prediction Model of Passenger Demand Using AVL and APC Data From a Bus Fleet. Proc., 19th Panhellenic Conference on Informatics, Athens, Greece, Association for Computing Machinery, New York, NY, 2015, pp. 129–134.

18. Sun, Leng, Guan. A Novel Wavelet-SVM Short-Time Passenger Flow Prediction in Beijing Subway System. Neurocomputing, Vol. 166, 2015, pp. 109–121.

19. L. H., Zhu J. S., Shan X. H., Zhang. Prediction Modeling of Railway Short-Term Passenger Flow Based on Random Forest Regression. Proc., International Conference on Green Intelligent Transportation System and Safety, Springer, Singapore, 2017, pp. 867–875.

20. J. T., Yang J. F. Prediction of Dalian Station Passenger Volume Based on RBF Neural Network. Journal of Dalian Jiaotong University, Vol. 28, 2007, pp. 32–34.

21. Wang, Qin, Zhang. Short-Term Passenger Flow Prediction Under Passenger Flow Control Using a Dynamic Radial Basis Function Network. Applied Soft Computing, Vol. 83, 2019, p. 105620.

22. Tsai T.-H., Lee C.-K., Wei C.-H. Neural Network Based Temporal Feature Models for Short-Term Railway Passenger Demand Forecasting. Expert Systems With Applications, Vol. 36, No. 2, 2009, pp. 3728–3736.

23. Zhao S.-Z., T.-H., Wang, Gao X.-T. A New Approach to the Prediction of Passenger Flow in a Transit System. Computers & Mathematics With Applications, Vol. 61, No. 8, 2011, pp. 1968–1974.

24. Teng, Chen. Modified Bus Passenger Flow Forecasting Model Based on Integrating ARIMA With Neural Network. Proc., 15th COTA International Conference of Transportation Professionals (CICTP), Beijing, China, American Society of Civil Engineers, Reston, VA, July 24–27, 2015, pp. 1300–1310.

25. Liu, Chen R.-C. A Novel Passenger Flow Prediction Model Using Deep Learning Methods. Transportation Research Part C: Emerging Technologies, Vol. 84, 2017, pp. 74–91.

26. Zhu, Yang, Wang. Prediction of Daily Entrance and Exit Passenger Flow of Rail Transit Stations by Deep Learning Method. Journal of Advanced Transportation, Vol. 2018, 2018, p. 11.

27. Gallo, De Luca, D’Acierno, Botte. Artificial Neural Networks for Forecasting Passenger Flows on Metro Lines. Sensors, Vol. 19, No. 15, 2019, p. 3424.

28. Short-Term Prediction of Passenger Demand in Multi-Zone Level: Temporal Convolutional Neural Network With Multi-Task Learning. IEEE Transactions on Intelligent Transportation Systems, Vol. 21, No. 4, 2019, pp. 1480–1490.

29. Hao, Lee D.-H., Zhao. Sequence to Sequence Learning With Attention Mechanism for Short-Term Passenger Flow Prediction in Large-Scale Metro System. Transportation Research Part C: Emerging Technologies, Vol. 107, 2019, pp. 287–300.

30. Liu, Jia. DeepPF: A Deep Learning Based Architecture for Metro Passenger Flow Prediction. Transportation Research Part C: Emerging Technologies, Vol. 101, 2019, pp. 18–34.

31. Lin, Wang, Gong. Passenger Flow Prediction Based on Land Use Around Metro Stations: A Case Study. Sustainability, Vol. 12, No. 17, 2020, p. 6844.

32. Yang, Xue, Ding, Gao. Short-Term Prediction of Passenger Volume for Urban Rail Systems: A Deep Learning Approach Based on Smart-Card Data. International Journal of Production Economics, Vol. 231, 2021, p. 107920.

33. Toqué, Khouadjia, Come, Trepanier, Oukhellou. Short & Long Term Forecasting of Multimodal Transport Passenger Flows With Machine Learning Methods. Proc., IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, IEEE, New York, 2017, pp. 560–566.

34. Zhang, Bao, Hong, Shi. A Hybrid Spatiotemporal Deep Learning Model for Short-Term Metro Passenger Flow Prediction. Journal of Advanced Transportation, Vol. 2020, 2020, p. 12.

35. Jing, Guo, Wang, Chen. Short-Term Prediction of Urban Rail Transit Passenger Flow in External Passenger Transport Hub Based on LSTM-LGB-DRS. IEEE Transactions on Intelligent Transportation Systems, Vol. 22, No. 7, 2020, pp. 4611–4621.

36. Zhou, Cui, Zhang, Yang, Liu, Wang, Sun. Graph Neural Networks: A Review of Methods and Applications. AI Open, Vol. 1, 2020, pp. 57–81.

37. Pan, Chen, Long, Zhang, Philip S. Y. A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, No. 1, 2020, pp. 4–24.

38. Leskovec, Jegelka. How Powerful are Graph Neural Networks? arXiv Preprint arXiv:1810.00826, 2018.

39. Zhang, Dong, Shang, Zhang, Wang. A Multi-Modal Graph Neural Network Approach to Traffic Risk Forecasting in Smart Urban Sensing. Proc., 17th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Como, Italy, IEEE, New York, 2020, pp. 1–9.

40. Zhou, Yang, Zhang, Trajcevski, Zhong, Khokhar. Reinforced Spatiotemporal Attentive Graph Neural Networks for Traffic Forecasting. IEEE Internet of Things Journal, Vol. 7, No. 7, 2020, pp. 6414–6428.

41. Jiang, Luo. Graph Neural Network for Traffic Forecasting: A Survey. arXiv Preprint arXiv:2101.11174, 2021.

42. Peng, Liu, Xiong, Wang, Bhuiyan M. Z. A. Graph CNNs for Urban Traffic Passenger Flows Prediction. Proc., IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), IEEE, New York, 2018, pp. 29–36.

43. Han, Wang, Ren, Wang, Gao, Chen. Predicting Station-Level Short-Term Passenger Flow in a Citywide Metro Network Using Spatiotemporal Graph Convolutional Neural Networks. ISPRS International Journal of Geo-Information, Vol. 8, No. 6, 2019, p. 243.

44. Wang, Zhang, Wei, Piao, Yin. Metro Passenger Flow Prediction via Dynamic Hypergraph Convolution Networks. IEEE Transactions on Intelligent Transportation Systems, Vol. 22, No. 12, 2021, pp. 7891–7903.

45. S. T., Zhang, Zhou, Yang. Attention-Based Graph Neural Network Enabled Method to Predict Short-Term Metro Passenger Flow. Proc., 5th International Conference on Universal Village (UV), Boston, MA, IEEE, New York, 2020, pp. 1–6.

46. Zhao, Wang, Tsui K. L. GC-LSTM: A Deep Spatiotemporal Model for Passenger Flow Forecasting of High-Speed Rail Network. Proc., IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, IEEE, New York, 2020, pp. 1–6.

47. Levi. A Note on the Derivation of Maximal Common Subgraphs of Two Directed or Undirected Graphs. Calcolo, Vol. 9, No. 4, 1973, p. 341.

48. Kipf T. N., Welling. Semi-Supervised Classification With Graph Convolutional Networks. arXiv Preprint arXiv:1609.02907, 2016.

49. Abu-El-Haija, Perozzi, Kapoor, Alipourfard, Lerman, Harutyunyan, Ver Steeg, Galstyan. Mixhop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing. Proceedings of the International Conference on Machine Learning. PMLR, Vol. 97, 2019, pp. 21–29.

50. Hochreiter, Schmidhuber. Long Short-Term Memory. Neural Computation, Vol. 9, No. 8, 1997, pp. 1735–1780.

51. STM. Société de transport de Montréal, The bus network and the schedules enlightened, 2021. https://www.stm.info/en/info/networks/bus-network-and-schedules-enlightened.

Short-Term Passenger Flow Prediction Using a Bus Network Graph Convolutional Long Short-Term Memory Neural Network Model
