Abstract

Predicting flow in taxi systems can improve taxi operations and reduce passengers’ waiting time. With the increasing availability of taxi mobility data sets, more studies have tried to analyze and model taxi demand. However, most existing works merely attempt to predict the number of pick-ups and drop-offs by using local observations. Predicting the passengers’ pair-wise flow through simultaneous consideration of local variations and network communities in the taxi system has been neglected. In this paper, we introduce a globally consistent periodic flow prediction for taxi systems that integrates communities in the matrix factorization model. We apply non-negative tensor factorization to capture the periodic variations in the passenger flow in different areas and predict the flow (and pick-ups and drop-offs) in the next periods by integrating the most recent observation with detected patterns from historical data. The results show an improvement in prediction accuracy over several baselines and a state-of-the-art method. We also propose a method for visualizing flow that can potentially improve taxi operations by assisting drivers with selecting passengers.

Keywords

data and data science, artificial intelligence and advanced computing applications, machine learning (artificial intelligence), geographic information science, sustainability and resilience, transportation and society, accessible transportation and mobility, taxi

Developments in positioning systems and communication technologies have improved our understanding of mobility in urban areas. Specifically, the availability of large taxi-trip data sets has provided an opportunity to deploy more efficient transportation services. In this regard, analyzing the taxi passenger flow and taxi demand in a spatio-temporal context can improve taxi operations and reduce passenger waiting time.

In recent years, scholars have studied graph structures known as “communities” to reveal major mobility patterns and divide transportation systems into multiple areas with different operational and functional characteristics ( 1 , 2 ). It seems that cluster/community detection is an ever-present part of most demand prediction methods. However, in the existing works, the application of community detection is limited to an early stage of modeling where smaller regions represented by network nodes are grouped into larger areas or meta-nodes that indicate the major demand areas (or major origins and destinations in general) in a transportation system.

While existing methods for demand prediction in taxi systems rely on revealing temporal patterns in local observations, considering communities as an integrated part of the prediction model has been ignored. Similar to modeling local variations, which is a key component in most state-of-the-art taxi demand prediction methods, considering communities can potentially lead to more-accurate predictions. In fact, we believe that communities should be leveraged beyond a (sometimes unnecessary) aggregation step and can be incorporated as a constraint in modeling the flow in taxi systems. It has been shown that communities are stable in taxi systems ( 1 ). Therefore, considering this property, known as “global consistency,” can potentially improve the predictions, especially when local spatio-temporal patterns are violated because of the abrupt changes in the taxi demand.

In this paper, we propose a matrix factorization model for globally consistent periodic flow prediction (GCPFP). GCPFP predicts real-time taxi demand and passenger flow in urban areas by using historical and most recent observations from the taxi system. The study area is divided into a grid, and multiple origin-destination passenger flow matrices (simply called flow matrices in this paper) are formed that describe the trips between all cells in past time intervals (e.g., every hour). The flow matrix is decomposed into three temporal factor matrices that encode the evolution and periodic variations in the flow. We use the concept of community belongingness to preserve the community structures in the predictions. By incorporating the community structures and capturing the periodic and most recent variations, the model is used to carry out real-time predictions. In comparison with existing methods that predict the demand or flow at the community level, the proposed method allows for denser spatial granularity and enables computing flow between arbitrary divisions (e.g., traffic analysis zones) by performing simple post-process aggregations. Passenger trips are the basis of analysis in the proposed method, which means that pick-ups and drop-offs and the intrinsic relationships between them are modeled simultaneously.

Some important related works are presented in the next section. This is followed by the problem definition and methodology section, where the definitions, notations, and proposed method are presented in detail. Finally, an extended discussion of results and conclusions is presented.

Related Work

Using trip generation models to predict taxi demand is a common approach in studies of taxi systems. Accordingly, the number of pick-ups/drop-offs in each region is predicted based on explanatory variables such as population, demographics, socioeconomic variables (e.g., income and education), and land use. Different regression models have been used for this purpose ( 3 , 4 ). Such prediction models can help forecast taxi demand in longer periods (i.e., future years and decades). However, they fail to model the short-term variations, as the used explanatory variables are relatively steady in shorter periods.

Time series forecasting techniques have been widely employed to forecast time-varying traffic-related variables such as vehicle counts and the number of pick-ups/drop-offs in different regions. Separate time series (e.g., number of drop-offs) are formed in different regions and the variable of interest is forecast using time series analysis techniques ( 5 , 6 ). Although using an independent time series for each region keeps the modeling simple, such models fail to capture the relations between the regions.

Incorporating spatio-temporal relations in taxi demand prediction has been the focus of some recent studies. Faghih et al. compared the performance of vector autoregressive and spatial-temporal autoregressive models and showed the effectiveness of considering spatial closeness in taxi demand prediction ( 7 ). Using convolutional and graph neural networks is another emerging approach for incorporating spatial relations, where each region is modeled by a cell (or a node) and the spatial relations between the neighbor cells are captured by applying a neural-network-based model on the data set ( 8 , 9 ). Although these works model the relations on a local scale, they do not consider the relations on a global scale in their predictions. Most works in this category model the pick-ups and drop-offs separately. In other words, the impact of “pick-ups at origins” on the “drop-offs at destinations” (and the opposite) is not taken into account. The latter shortcoming can also be seen in the previously mentioned approaches.

Communities in a taxi system have been studied in multiple works where an itinerary network generated from the mobility data (e.g., taxi trajectories) is divided into multiple regions with stronger intra-relationships. Detected communities are used as the basis of mobility analysis. Davis et al. used a multi-level clustering algorithm along with time series modeling to forecast taxi travel demand ( 10 ). The clustering algorithm groups the demand using the correlation between the neighbor cells, and a linear time series analysis is performed to forecast the demand at each cell in the future days. Tang et al. applied the Louvain algorithm on taxi trip data sets to divide the urban areas into multiple communities with the same functional properties ( 11 ). Detected communities are then used for forecasting the demand, based on a graph convolutional network.

Matrix factorization has been applied to time-varying data sets including mobile phone networks and shared mobility data such as taxi and shared bicycle trips ( 12 – 15 ). There is also a close relationship between the concepts involved in network community detection and matrix factorization ( 16 ). Matrix factorization has been widely used for analyzing large-scale and sparse data sets, as it effectively reduces the dimensionality of high-dimensional data sets and alleviates data noise and volatility.

Problem Definition

In this research we use the following definitions:

Passenger flow (flow): Given the origin $O_{i}$ and the destination $D_{j}$ , “passenger flow” ${flow}_{i, j}^{t}$ represents the number of taxi passengers that travel from $O_{i}$ to $D_{j}$ and leave their origin in the time interval $t$ . In the rest of this paper, we simply use the term “flow” to refer to this variable. It should be mentioned that, as there might be more than a single passenger in each taxi, taxi passenger flow is not necessarily equal to the number of taxis that have traveled from the same origin to the same destination.

Passenger pick-ups (pick-ups): The total number of passengers that are picked up by a taxi in the time interval $t$ at the origin $O_{i}$ .

Passenger drop-offs (drop-offs): The total number of passengers that are dropped off by a taxi in the time interval $t$ at the destination $D_{j}$ .

Taxi users are picked up at their origin and dropped off at their destination. The flow between different origins and destinations varies during the day because of various factors that affect the demand. For example, time of day has been shown to be closely related to taxi ridership. We also know that there are relationships between taxi demand at different locations on local and global scales that can affect taxi passenger flow. The core problem is to forecast the flow, pick-ups, and drop-offs for the next time period(s) (e.g., next hours) given past and most recent data on the taxi passenger trips by considering the communities and periodic variations in the taxi system.

Methodology

Modeling Assumptions and Annotations

The study area is divided into a grid of rectangular $d \times d$ cells (Figure 3) that represent the origins and destinations of the trips. The flow in the taxi system is denoted by $\{X (t)\}_{t = 1}^{c}$ where $X (t) \in R^{n \times n}$ is the “flow matrix” that indicates the flow between all $n$ cells in the $t^{th}$ time interval (with $t = 1$ representing the first and $t = c$ the current period). The length of the time interval $Δ t$ is called the “temporal granularity” of the model and determines the step length at which predictions are made.
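
As a minimal sketch of how the flow matrices could be assembled, the Python snippet below bins trips into grid cells and time intervals. The record layout (pick-up time in seconds, planar coordinates in meters, passenger count), the function names, and the toy values are assumptions made for illustration and are not taken from the paper's data pipeline. Indices are 0-based in the code, whereas the text counts intervals from $t = 1$.

```python
import numpy as np

def cell_index(x, y, x_min, y_min, d, n_cols):
    """Map planar coordinates (meters) to a row-major index on a grid of d x d cells."""
    col = int((x - x_min) // d)
    row = int((y - y_min) // d)
    return row * n_cols + col

def build_flow_matrices(trips, x_min, y_min, d, n_rows, n_cols, t0, dt, c):
    """Return an array of shape (c, n, n); entry [t, i, j] is the passenger flow i -> j in interval t.

    Trips are assumed to lie inside the grid (filter them beforehand).
    """
    n = n_rows * n_cols
    X = np.zeros((c, n, n))
    for pickup_time, ox, oy, dx, dy, passengers in trips:
        t = int((pickup_time - t0) // dt)      # interval in which the passenger leaves the origin
        if not 0 <= t < c:
            continue                           # outside the modeled time window
        i = cell_index(ox, oy, x_min, y_min, d, n_cols)
        j = cell_index(dx, dy, x_min, y_min, d, n_cols)
        X[t, i, j] += passengers               # count passengers, not taxis
    return X

# Toy example: a 2 x 2 grid of 1,500 m cells and two one-hour intervals.
trips = [
    (600, 100.0, 100.0, 1600.0, 200.0, 2),     # cell 0 -> cell 1, hour 0
    (1200, 200.0, 300.0, 1700.0, 100.0, 1),    # cell 0 -> cell 1, hour 0
    (4000, 1600.0, 1600.0, 100.0, 1700.0, 3),  # cell 3 -> cell 2, hour 1
]
X = build_flow_matrices(trips, 0.0, 0.0, 1500.0, 2, 2, t0=0, dt=3600, c=2)
```

Counting passengers rather than trips mirrors the definition of passenger flow given in the Problem Definition section.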

Modeling Periodic Variations

Matrix factorization can be used to model flow in a taxi system by representing the taxi flows as a time-varying adjacency matrix and learning a low-rank approximation that characterizes the underlying features of flow generation at different locations. Taxi activity has been shown to exhibit periodic variations. For example, we expect to observe relatively similar taxi usage in similar periods of the day (e.g., in the mornings). To capture these periodic (seasonal) patterns, we apply a modified version of the technique introduced by Hooi et al. for detecting seasonal patterns ( 17 ). Accordingly, a dynamic flow matrix $X$ at time $t$ can be approximated as

X (t) \approx UW (t) V (t)^{T}

(1)

where

$U \in R^{n \times k}$ = a constant matrix,

$V (t) \in R^{n \times k}$ = a smoothly varying factor matrix that models non-periodic (a.k.a. non-seasonal) variations in the flow matrix $X$ , and

$W (t) \in R^{k \times k}$ = a diagonal matrix containing the seasonal weights for each component in the factor matrices.

Here $U$ and $V (t)$ have $k$ components denoted by $u_{i}, i \in {1, 2, \dots, k}$ and $v_{i} (t), i \in {1, 2, \dots, k}$ where each component represents the derived encoding of a single feature for all origin or destination locations at time $t$ . Similarly, $w_{i} (t), i \in {1, 2, \dots, k}$ indicates the seasonal multiplier that applies to component $i$ at time $t$ (Figure 1). Since each row in $U \in R^{n \times k}$ and $V \in R^{n \times k}$ is associated with an area (i.e., cell) and encodes the flows from/to that area as a vector of hidden (also called “latent” or “implicit”) features, we call it a “feature vector” of a cell. It should be mentioned that in this model only $V (t) \in R^{n \times k}$ determines non-periodic changes in $X (t)$ . However, this could easily be extended to the case that both $U$ and $V$ are time-varying. Yet, we consider $U$ as a constant factor to reduce the complexity of parameter space in the proposed model.
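
To make the shapes in Equation 1 concrete, the following sketch reconstructs a flow matrix from the three factors and checks that it equals a weighted sum of $k$ rank-one matrices. The random factors and the toy sizes are placeholders assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3                      # toy sizes (assumed for illustration)

U = rng.random((n, k))           # constant factor: one feature vector (row) per origin cell
V_t = rng.random((n, k))         # time-varying factor V(t): one feature vector per destination cell
w_t = rng.random(k)              # seasonal multipliers w_i(t), one per component
W_t = np.diag(w_t)               # diagonal k x k matrix W(t)

X_hat = U @ W_t @ V_t.T          # Equation 1: approximate n x n flow matrix

# Equivalent view: each component i contributes w_i(t) * u_i v_i(t)^T.
X_sum = sum(w_t[i] * np.outer(U[:, i], V_t[:, i]) for i in range(k))
```

The second form shows why $W (t)$ acts as a per-component seasonal multiplier: changing $w_{i} (t)$ rescales only the contribution of component $i$.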

Figure 1.

An illustration of the proposed matrix factorization model.

The model includes two major units: initialization and online updating. Initialization is an offline process through which the factor matrices including the seasonal weights are calculated. It starts with stacking up a sequence of adjacency matrices from a few seasons of data $X (1), \dots, X (s), X (s + 1), \dots, X (q \times p)$ into an $n \times n \times (q \times p)$ tensor $M$ , where $q$ is the number of used seasons and $p$ is the period (e.g., 24 for hourly data with daily periodicity). As an example of the tensor indexing that we use in this paper, $M (:, :, 3 : 5)$ is the sub-tensor of $M$ with all its rows and columns on the 3rd to 5th pages. The initial tensor $M$ is then folded into an $n \times n \times p$ tensor ( 17 ):

M_{fold} = \frac{1}{q} \sum_{i = 1}^{q} M (:, :, (i - 1) . p + 1 : i . p)

(2)
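
The folding in Equation 2 amounts to averaging the slices that share the same position within a season. A minimal NumPy sketch, with an assumed function name and toy data, is:

```python
import numpy as np

def fold_tensor(M, p):
    """Average q consecutive seasons of length p (Equation 2).

    M has shape (rows, cols, q * p); the result has shape (rows, cols, p).
    """
    n_rows, n_cols, total = M.shape
    if total % p != 0:
        raise ValueError("the third dimension must be a multiple of the period p")
    q = total // p
    # Split the time axis into (season, position within season), then average over seasons.
    return M.reshape(n_rows, n_cols, q, p).mean(axis=2)

# Toy example: 2 x 2 matrices, period p = 3, q = 2 seasons.
M = np.arange(2 * 2 * 6, dtype=float).reshape(2, 2, 6)
M_fold = fold_tensor(M, p=3)
```

Here slice 0 of `M_fold` is the mean of slices 0 and 3 of `M`, slice 1 the mean of slices 1 and 4, and so on.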

Non-negative CP decomposition is then applied on $M_{fold}$ , and the first two resulting factors are used as $U$ and $V_{init}$ , the latter being an initial value for $V (t)$ for any given time ( 18 ). $W (t)$ is built by creating a diagonal matrix using the values in the third factor matrix at the corresponding time $t \in \{1, 2, \dots, p\}$ . This is then followed by normalizing the components $u_{i}$ and $v_{i} (0)$ by dividing them by their norms (e.g., $\| u_{i} \|$ ) and then multiplying the norms into each of the $w_{i} (t)$ for $t \in \{1, 2, \dots, p\}$ to compensate for the normalization.
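
The normalization step can be sketched as follows. The snippet assumes that the CP decomposition has already produced the three factor matrices (the decomposition itself is not shown), that the third factor is stored as a $p \times k$ array whose row $t$ holds the diagonal of $W (t)$, and that no component has zero norm. The function name is hypothetical.

```python
import numpy as np

def normalize_factors(U, V_init, W_all):
    """Scale each component of U and V_init to unit norm and absorb the norms into W.

    U, V_init: arrays of shape (n, k); W_all: array of shape (p, k).
    """
    u_norms = np.linalg.norm(U, axis=0)        # ||u_i|| for each component (column)
    v_norms = np.linalg.norm(V_init, axis=0)   # ||v_i|| for each component
    U_n = U / u_norms
    V_n = V_init / v_norms
    W_n = W_all * (u_norms * v_norms)          # compensate so that U W(t) V^T is unchanged
    return U_n, V_n, W_n

rng = np.random.default_rng(0)
U, V_init, W_all = rng.random((4, 2)), rng.random((4, 2)), rng.random((3, 2))
U_n, V_n, W_n = normalize_factors(U, V_init, W_all)
```

Because each $w_{i} (t)$ is multiplied by both norms, the product $UW (t) {V_{init}}^{T}$ is the same before and after normalization for every $t$.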

The updating algorithm is explained in the Optimization and Prediction section.

Network Communities with Matrix Factorization

Communities in a taxi system represent the areas with similar characteristics. For example, trips that originate from different cells in the same community may tend to have similar destinations. Therefore, if we represent each cell by a vector of latent features (i.e., feature vectors), the degree to which two vectors are similar indicates the tendency of their corresponding cells to have similar characteristics, or, in other words, it indicates their tendency to belong to the same community. On the contrary, cells with perpendicular feature vectors are less likely to be in the same community. Accordingly, detected communities reflect the relationship between the cells on a global scale.

Matrix factorization has been used for community detection ( 16 , 19 , 20 ). Specifically, communities can be determined by computing the similarities of connected objects (e.g., network nodes) using dot products of feature vectors ( 16 ). By computing the similarities for all pairs of objects, a similarity matrix is obtained that represents the tendency of all objects to be in the same community. This similarity measure is also known as “community belongingness.” By denoting the factor matrix of a rectangular adjacency matrix $X$ by $V \in R^{n \times k}$ , the matrix $V V^{T} \in R^{n \times n}$ indicates the community belongingness. The communities in the network can then be detected by applying a sorting algorithm on the similarity matrices (which can be found in Sarkar and Dong) ( 16 ).
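
As a small illustration of community belongingness, the sketch below computes $V V^{T}$ for three cells. The feature vectors are toy values assumed for illustration; the sorting-based community detection of Sarkar and Dong is not reproduced here.

```python
import numpy as np

# Toy factor matrix (assumed): rows are cells, columns are latent features.
V = np.array([
    [1.0, 0.0],   # cell 0
    [0.9, 0.1],   # cell 1, nearly parallel to cell 0
    [0.0, 1.0],   # cell 2, perpendicular to cell 0
])

belongingness = V @ V.T   # n x n matrix of pairwise dot products of feature vectors
```

Cells 0 and 1 obtain a large dot product, which indicates a tendency to share a community, while the perpendicular vectors of cells 0 and 2 give a dot product of zero.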

Proposed Flow Prediction Model

Flow prediction starts with calculating the time-varying factor matrix $V (t_{c})$ for the current period. By knowing the flow matrix at the end of the current period $X (t = t_{c})$ we have

X (t_{c}) \approx UW (t_{c}) V (t_{c})^{T}

(3)

As $W (t)$ is periodic, its values for any time interval have been determined through the initialization. Accordingly, we have

X (t_{c}) \approx UW (t_{c} - p) V (t_{c})^{T}

(4)

$X (t)$ , $U$ , and $W (t - p)$ are known for any given time from the initialization step, and we need to update $V (t_{c})$ using the most recent observations at the end of the current time interval $t_{c}$ . In the rest of this section, we propose a method to estimate $V (t_{c})$ . To this end, the calculated flow by using $V (t_{c})$ (i.e., $UW (t_{c} - p) V {(t_{c})}^{T}$ ) should be close to the observed flows in the current period $(X (t_{c}))$ . This brings us to the following optimization problem:

\begin{matrix} min_{V (t_{c})} J = ‖ X (t_{c}) - UW (t_{c} - p) V {(t_{c})}^{T} ‖_{F}^{2} \\ + α ‖ V_{init} - V (t_{c}) ‖_{F}^{2} \end{matrix}

(5)

where

$‖ \cdot ‖_{F}^{2}$ = the squared Frobenius norm of a matrix.

We also include the term $‖ V_{init} - V (t_{c}) ‖_{F}^{2}$ to prevent drastic changes in $V (t)$ over short periods. This is especially important because variations in $V (t)$ sometimes result from the optimization process minimizing the first term rather than from real abrupt changes in the observed variables. $α$ is a parameter that balances the significance of the two terms in the optimization problem.

As a reminder, the dot products of feature vectors associated with two objects (i.e., cells in our case) reveal the tendency of those objects to belong to the same community. As in our model $W (t_{c}) V {(t_{c})}^{T}$ represents the temporal components of the estimated flow matrix at each cell, the dot product of the feature vectors (components) of all cell pairs at a time $t_{c}$ can be computed as

\begin{matrix} {(W (t_{c}) V {(t_{c})}^{T})}^{T} (W (t_{c}) V {(t_{c})}^{T}) \\ = V (t_{c}) W {(t_{c})}^{T} W (t_{c}) V {(t_{c})}^{T} \end{matrix}

(6)

For our model to be globally consistent, we expect the communities in the next steps $(t_{i}, i \geq c)$ to change as little as possible with reference to the communities at similar times in previous periods $(t_{i} - p)$ . We explore the validity of this assumption in the Experiments section. Since communities can be detected by ordering the similarity matrices, scaling the similarity matrices does not affect the community belongingness ( 16 ): scaling does not change the ordering, so the detected communities remain unchanged. Therefore, to preserve communities between $t_{c} - p$ and $t_{c}$ , any change in the community belongingness that cannot be resolved by scaling should be minimized. Mathematically, we have

\begin{matrix} min_{V (t_{c}), s} ∥ V_{init} W {(t_{c} - p)}^{T} W (t_{c} - p) {V_{init}}^{T} \\ - s . V (t_{c}) W {(t_{c})}^{T} W (t_{c}) V {(t_{c})}^{T} ∥_{F}^{2} \end{matrix}

(7)

where

$s$ = a scaling scalar that is estimated through optimization.

Equation 7 can be presented as

min_{V (t_{c}), s} ‖ V_{init} G {V_{init}}^{T} - s . V (t_{c}) GV {(t_{c})}^{T} ‖_{F}^{2}

(8)

where

G = W {(t_{c} - p)}^{T} W (t_{c} - p) = W {(t_{c})}^{T} W (t_{c})

(9)

Thus, the objective function of the factorization model can be written as

\begin{matrix} min_{V (t_{c}), s} J = ‖ X (t_{c}) - UW (t_{c} - p) V {(t_{c})}^{T} ‖_{F}^{2} \\ + α ‖ V_{init} - V (t_{c}) ‖_{F}^{2} \\ + β ‖ V_{init} G {V_{init}}^{T} - s . V (t_{c}) GV {(t_{c})}^{T} ‖_{F}^{2} \end{matrix}

(10)

where

$α$ and $β$ = hyper-parameters that balance the significance of each term and are found by employing a grid search and using a sample of data.

Figure 9 indicates a set of parameters that have been used in the proposed model. A detailed description of the grid search algorithm can be found in Kelleher et al. ( 21 ).

Optimization and Prediction

We use a gradient descent algorithm to solve the optimization problem in Equation 10. By utilizing $‖ H ‖_{F}^{2} = tr (H^{T} H)$ , the gradients of the first term $(J_{1} = ‖ X (t_{c}) - UW (t_{c} - p) V {(t_{c})}^{T} ‖_{F}^{2})$ can be written as

\begin{matrix} \frac{\partial J_{1}}{\partial V (t_{c})} = - 2 X {(t_{c})}^{T} UW (t_{c} - p) \\ + 2 V (t_{c}) W {(t_{c} - p)}^{T} U^{T} UW (t_{c} - p) \end{matrix}

(11)

For the second term $(J_{2} = α ‖ V_{init} - V (t_{c}) ‖_{F}^{2})$ we have

\frac{\partial J_{2}}{\partial V (t_{c})} = 2 α (- V_{init} + V (t_{c}))

(12)

For the third term $J_{3} = β ‖ V_{init} G {V_{init}}^{T} - s . V (t_{c}) GV {(t_{c})}^{T} ‖_{F}^{2}$ , we use the following product rule for matrix functions $f (H)$ and $g (H)$ ( 22 ):

\begin{matrix} \frac{\partial tr (f {(H)}^{T} g (H))}{\partial H} \\ = {\frac{\partial [tr (f {(H)}^{T} g (Z)) + tr (g (H) f {(Z)}^{T})]}{\partial H} |}_{Z \leftarrow H} \end{matrix}

After simplification, we have

\begin{matrix} \frac{\partial J_{3}}{\partial V (t_{c})} = - 4 β . s V_{init} G {V_{init}}^{T} V (t_{c}) G \\ + 4 β . s^{2} V (t_{c}) GV {(t_{c})}^{T} V (t_{c}) G \end{matrix}

(13)

The derivative of $J$ with reference to $s$ is given by

\begin{matrix} \frac{\partial J}{\partial s} = \frac{\partial J_{3}}{\partial s} = β (- tr (V_{init} G {V_{init}}^{T} V (t_{c}) GV {(t_{c})}^{T}) \\ - tr (V (t_{c}) GV {(t_{c})}^{T} V_{init} G {V_{init}}^{T}) \\ + 2 s . tr (V (t_{c}) GV {(t_{c})}^{T} V (t_{c}) GV {(t_{c})}^{T})) \end{matrix}

(14)
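
As a side observation, and not a step stated in the paper: for a fixed $V (t_{c})$ , $J$ is quadratic in $s$ , so Equation 14 can also be solved for $s$ directly. With the shorthand $A = V_{init} G {V_{init}}^{T}$ and $B = V (t_{c}) GV {(t_{c})}^{T}$ (introduced here only for brevity) and the identity $tr (A B) = tr (B A)$ , setting the derivative to zero gives

\frac{\partial J}{\partial s} = 2 β (s . tr (B B) - tr (A B)) = 0 \Rightarrow s^{*} = \frac{tr (A B)}{tr (B B)}

which is well defined whenever $β > 0$ and $B \neq 0$ . Algorithm 1 instead updates $s$ by gradient steps alongside $V (t_{c})$ .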

The derivative of $J$ (Equation 10) with reference to factor matrices $V (t_{c})$ is trivially calculated by

\frac{\partial J}{\partial V (t_{c})} = \frac{\partial J_{1}}{\partial V (t_{c})} + \frac{\partial J_{2}}{\partial V (t_{c})} + \frac{\partial J_{3}}{\partial V (t_{c})}

(15)

Once $V (t_{c})$ has been updated using the most recent observations, the flow at $t_{c + 1}$ follows directly from the factorization. The pseudocode of the optimization and flow prediction is given in Algorithm 1.

Algorithm 1 Optimization and flow prediction
Input: $X (t_{c})$ , factor matrices from initialization ( $U$ , $W (t)$ , and $V_{init}$ ), and $k$
Output: $V (t_{c})$ and flow in the next period $X (t_{c + 1})$
1: Initialize $V (t_{c}) = V_{init}$ , $s = 1$
2: While not stopping criteria do
3:   Compute the partial derivatives $\frac{\partial J}{\partial V (t_{c})}$ and $\frac{\partial J}{\partial s}$ using the equations in the Optimization and Prediction section
4:   Determine the step size $λ$ by a line search
5:   Update $V (t_{c}) = V (t_{c}) - λ \frac{\partial J}{\partial V (t_{c})}$
6:   Update $s = s - λ \frac{\partial J}{\partial s}$
7: end
8: Compute the flow at $t_{c + 1}$ : $X (t_{c + 1}) = UW (t_{c + 1} - p) V (t_{c})^{T}$
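
Algorithm 1 can be sketched in NumPy as shown below. This is an illustrative sketch under stated assumptions and not the authors' implementation: the backtracking (Armijo) rule stands in for the unspecified line search, the stopping criterion is a relative decrease threshold, the seasonal weights are stored as a $p \times k$ array `W_all`, and all data are synthetic. The function names are hypothetical.

```python
import numpy as np

def objective(V, s, X, U, W, V_init, alpha, beta):
    """Equation 10, with W = W(t_c - p) passed as a diagonal k x k matrix."""
    G = W.T @ W
    A = V_init @ G @ V_init.T          # community belongingness one period earlier
    B = V @ G @ V.T                    # community belongingness at t_c
    return (np.sum((X - U @ W @ V.T) ** 2)
            + alpha * np.sum((V_init - V) ** 2)
            + beta * np.sum((A - s * B) ** 2))

def gradients(V, s, X, U, W, V_init, alpha, beta):
    """Return (dJ/dV, dJ/ds) following Equations 11 to 15."""
    G = W.T @ W
    A = V_init @ G @ V_init.T
    B = V @ G @ V.T
    P = U @ W
    g1 = -2 * X.T @ P + 2 * V @ P.T @ P                            # Equation 11
    g2 = 2 * alpha * (V - V_init)                                  # Equation 12
    g3 = -4 * beta * s * A @ V @ G + 4 * beta * s**2 * B @ V @ G   # Equation 13
    ds = beta * (-2 * np.trace(A @ B) + 2 * s * np.trace(B @ B))   # Equation 14
    return g1 + g2 + g3, ds

def fit_V(X, U, W, V_init, alpha, beta, max_iter=200, tol=1e-8):
    """Steps 1 to 7 of Algorithm 1 with an assumed backtracking line search."""
    V, s = V_init.copy(), 1.0
    J = objective(V, s, X, U, W, V_init, alpha, beta)
    for _ in range(max_iter):
        gV, gs = gradients(V, s, X, U, W, V_init, alpha, beta)
        sq_norm = np.sum(gV ** 2) + gs ** 2
        lam = 1.0
        while lam > 1e-12:
            V_new, s_new = V - lam * gV, s - lam * gs
            J_new = objective(V_new, s_new, X, U, W, V_init, alpha, beta)
            if J_new <= J - 1e-4 * lam * sq_norm:   # sufficient-decrease test
                break
            lam *= 0.5
        else:
            break                                   # no acceptable step size found
        converged = J - J_new < tol * max(1.0, J)
        V, s, J = V_new, s_new, J_new
        if converged:
            break
    return V, s, J

# Synthetic example (all values assumed).
rng = np.random.default_rng(0)
n, k, p = 6, 3, 4
U, V_init = rng.random((n, k)), rng.random((n, k))
W_all = rng.random((p, k)) + 0.5               # row t holds the diagonal of W(t)
t_c = 9                                        # current interval, 0-based
W_prev = np.diag(W_all[(t_c - p) % p])         # W(t_c - p), equal to W(t_c) by periodicity
X_c = U @ W_prev @ (V_init + 0.05 * rng.standard_normal((n, k))).T

J_start = objective(V_init, 1.0, X_c, U, W_prev, V_init, 0.1, 0.01)
V_c, s, J_end = fit_V(X_c, U, W_prev, V_init, alpha=0.1, beta=0.01)

W_next = np.diag(W_all[(t_c + 1 - p) % p])
X_next = U @ W_next @ V_c.T                    # step 8: predicted flow at t_c + 1
```

Using a single step size for both $V (t_{c})$ and $s$ follows steps 5 and 6 of Algorithm 1; the sufficient-decrease test guarantees that the objective never increases between iterations.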

Experiments

Data and Initial Analysis

We have used a data set from yellow taxis in New York City, U.S., to demonstrate the efficiency of the proposed method. The data set contains around 89 million records of taxi trips from January to June 2013 and includes information on the start and end time of the trips, geographical position of the passenger pick-ups and drop-offs, and the number of passengers. The majority of trips were made inside or between four boroughs of New York City: Bronx, Brooklyn, Queens, and Manhattan. We explored the periodicity in the data to define the seasons in the model. Figure 2 shows the hourly total number of trips on different days of the same week and on the same day (Thursday) of four consecutive weeks. Both Figure 2a and b show a daily periodicity in the number of trips, as this number seems to be repeated every day at the same time. However, it seems that a weekly season (Figure 2b) represents the periodicity in the data more accurately. To investigate this, we calculated the standard deviation (SD) of the hourly total number of trips under both seasonalities; the average SD is smaller in the weekly scenario (SD = 2,026 for weekly and SD = 2,482 for daily). Accordingly, we adopt a weekly seasonality for our model with a period p = 7 × 24 = 168. This is particularly helpful because the significant difference between taxi usage during weekdays and weekends is handled naturally in this way. There are also longer seasons (e.g., yearly periods) in which we are not interested in this research, as we look for short-term patterns in the data.
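
The text does not spell out how the SD values were computed. One possible reading, shown here purely as an assumption, is to take the SD across seasons at each position within the season and then average over positions. The hourly counts below are synthetic and serve only to show the mechanics.

```python
import numpy as np

def mean_seasonal_sd(counts, period):
    """Average, over positions in the season, of the SD taken across seasons."""
    counts = np.asarray(counts, dtype=float)
    q = len(counts) // period
    seasons = counts[: q * period].reshape(q, period)   # one row per season
    return seasons.std(axis=0).mean()

# Synthetic hourly totals for 4 weeks: a daily cycle whose level drops on weekends.
hours = np.arange(4 * 168)
daily_shape = 1000 + 500 * np.sin(2 * np.pi * (hours % 24) / 24)
weekend = (hours // 24) % 7 >= 5
counts = np.where(weekend, 0.6 * daily_shape, daily_shape)

sd_daily = mean_seasonal_sd(counts, period=24)     # compares every day with every other day
sd_weekly = mean_seasonal_sd(counts, period=168)   # compares the same hour of the same weekday
```

In this synthetic series the weekly SD vanishes because every week is identical, while the daily SD is inflated by the weekday/weekend difference, which is the effect the weekly period is meant to absorb.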

Figure 2.

Daily and weekly variations in the total number of taxi trips: (a) on different days of the same week and (b) on the same day (Thursday) of four consecutive weeks.

We also investigated the communities that can be detected at different times of the week. Figure 3 shows the communities detected using the flow matrix generated from the trips taken from 7:00 to 8:00 a.m. (morning traffic peak) on two consecutive Tuesdays, and Figure 4 shows the communities detected on two consecutive Wednesday evenings. We have used the Louvain algorithm to detect the communities, but any community detection algorithm can be used as we merely intend to demonstrate the periodicity of the communities at this stage ( 23 ). Detected communities reflect the relationship between the cells on a global scale. As a reminder, community detection is not the output of our proposed method, and instead we try to capture and preserve the tendency of the cells to belong to the same community. Therefore, interpretation of the number and extent of the communities is not in the scope of this paper. However, Figures 3 and 4 present examples of the existence of communities in the taxi trip data set and the periodic variations in the detected communities. There are four major communities in Figure 3, each representing an area with stronger relationships between its cells. It seems that the communities in the mornings of March 12 and March 19 (Tuesday) are fairly similar both in number and geographical extent. This becomes more evident when the detected communities on Tuesday morning are compared with the ones on Wednesday evening, where there are three major communities (community 4 is merged with community 3) and the geographical extent of the communities is different from that on Tuesday morning. This example, similar experiments carried out by the authors, and the evidence found in the literature are the basis for considering communities in the flow prediction model of the proposed method.

Figure 3.

Communities on Tuesday morning 7:00–8:00 a.m.: (a) March 12 and (b) March 19.

Figure 4.

Communities on Wednesday evening 7:00–8:00 p.m.: (a) March 6 and (b) March 13.

Visualization and Results

In this section, we report the accuracy and some examples of the output of the proposed method. We present an example of flow between an origin and a destination (Figure 5) and examples of pick-up and drop-off predictions in some random cells (Figures 7 and 8). We also illustrate an example of predicted flows on a map and show how predicted flows can potentially help drivers in making operational decisions. This is followed by an accuracy assessment of the proposed model and a comparison of the results with some baseline and state-of-the-art methods for flow prediction.

Figure 5 shows predicted, observed, and seasonal historical average (S-HA) flows between two pairs of cells during a whole week. The predicted flows are the flows in the next time interval (i.e., next hour) estimated by the proposed method. S-HA estimates the flow in the next time interval by averaging the flows in similar time intervals in the last n seasons (in this experiment n = 3). Observed flows are simply the ground truth information we use to evaluate the predictions made by the proposed method. As can be seen from both Figure 5a and b, predictions using the proposed method follow the observations more accurately compared with S-HA. Especially in Figure 5a, where the flow values are larger, the predicted and observed values stay close during the week. However, Figure 5b presents the flow along a less busy corridor (Mill Pond Park to Harlem), which shows noisy fluctuations. This can be explained in two ways. First, the flow values between these areas are small (usually less than 5) and random, which makes accurate predictions challenging and sometimes even impossible. Second, the model is mostly trained based on the larger values, and smaller values are partially ignored so that the objective function can be optimized. We can specifically see the latter effect by looking at the flow values during the weekends (Figure 5b day 31) where the predictions deviate from the observations and the historical average presents better predictions. Nevertheless, our accuracy assessment for the proposed method shows a significant improvement over S-HA (Tables 1 and 2).
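
The S-HA baseline described above can be sketched in a few lines. The function name and the array layout (time along the first axis, most recent interval last) are assumptions for illustration.

```python
import numpy as np

def seasonal_historical_average(history, p, n_seasons=3):
    """Estimate the next interval from the same position in the last n_seasons seasons.

    history: array of shape (T, ...) holding T observed intervals, most recent last.
    Returns the S-HA estimate for interval T (0-based), which needs T >= n_seasons * p.
    """
    history = np.asarray(history, dtype=float)
    T = history.shape[0]
    if T < n_seasons * p:
        raise ValueError("not enough history for the requested number of seasons")
    idx = [T - j * p for j in range(1, n_seasons + 1)]   # T - p, T - 2p, ...
    return history[idx].mean(axis=0)

# Toy example: a scalar series 0, 1, ..., 11 with period 4.
estimate = seasonal_historical_average(np.arange(12.0), p=4, n_seasons=3)
```

For the toy series, the estimate for interval 12 averages the values at intervals 8, 4, and 0. The same function applies to a stack of flow matrices of shape (T, n, n), since the average is taken along the time axis only.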

Figure 5.

Predicted, observed, and seasonal historical average hourly flow between pairs of cells: (a) Broadway to 10th Avenue and (b) Mill Pond Park to Harlem.

Table 1.

Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) of Predictions for 7:00–8:00 a.m.

Method	Flow MAE	Flow RMSE	Drop-off MAE	Drop-off RMSE	Pick-up MAE	Pick-up RMSE
S-HA	0.155	2.425	16.02	57.08	22.04	83.79
MA3	0.332	6.470	70.43	304.89	43.18	225.33
SARIMA	0.131	1.495	8.08	26.05	8.02	24.43
SMF	0.140	1.584	8.99	27.55	8.45	25.9
GCPFP	0.110	1.267	7.72	21.01	7.43	19.53

Note: GCPFP = globally consistent periodic flow prediction; MA3 = moving average 3; S-HA = seasonal historical average; SARIMA = seasonal autoregressive integrated moving average; SMF = Seasonal Matrix Factorization.
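
For reference, the two error measures can be computed as in the sketch below. How the errors were pooled over cells, cell pairs, and test intervals is not stated in this section, so averaging over all entries of the compared arrays is an assumption; the toy matrices are illustrative.

```python
import numpy as np

def mae(pred, obs):
    """Mean absolute error over all entries."""
    return np.mean(np.abs(np.asarray(pred, float) - np.asarray(obs, float)))

def rmse(pred, obs):
    """Root mean squared error over all entries."""
    return np.sqrt(np.mean((np.asarray(pred, float) - np.asarray(obs, float)) ** 2))

# Toy 2 x 2 flow matrices: the absolute errors are 1, 0, 2, and 0.
obs = np.array([[0, 2], [4, 0]])
pred = np.array([[1, 2], [2, 0]])
flow_mae, flow_rmse = mae(pred, obs), rmse(pred, obs)
```

RMSE weights large errors more heavily than MAE, so the two measures diverge when a few busy cells or cell pairs dominate the error.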

Table 2.

Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) of Predictions for 7:00–8:00 p.m.

Method	Flow MAE	Flow RMSE	Drop-off MAE	Drop-off RMSE	Pick-up MAE	Pick-up RMSE
S-HA	0.149	2.248	24.32	93.74	22.42	95.35
MA3	0.204	3.072	30.69	111.99	30.14	133.83
SARIMA	0.134	1.611	9.89	25.42	10.15	33.00
SMF	0.136	1.599	10.11	25.18	10.70	31.87
GCPFP	0.125	1.494	9.05	23.43	9.36	29.97

Figure 6 shows the most significant flows from Broadway to other cells. The labels and width of the arrows indicate the magnitude of the flows. As can be seen from the observed (Figure 6b) and predicted (Figure 6c) arrows, the model has successfully predicted the destination cells of the 10 most significant flows. The color indicates the hotness of the destination from the taxi driver’s point of view: red arrows point to areas with higher pick-ups than drop-offs (i.e., higher demand with reference to supply, or hot areas), and blue arrows point to areas with higher drop-offs than pick-ups (i.e., higher supply with reference to demand). Drivers are interested in hot areas because these are less competitive areas where there is a relatively higher demand for taxi cabs. Therefore, by driving a passenger who travels to a hot area, the taxi driver has a higher chance of finding new passengers after dropping off the current passenger. We believe that this combination of visualizing flow and hotness can provide taxi drivers with the information that is required to pick the right passengers. Especially these days, when e-hailing and internet taxis allow drivers to select their passengers based on their destination, informing taxi drivers of the expected destinations and their hotness enables them to make better decisions.
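
The quantities behind such a map can be derived from a single predicted flow matrix, as sketched below. Pick-ups and drop-offs follow from the definitions as row and column sums; using their difference as a hotness score is an assumption made here for illustration (the text only distinguishes areas with more pick-ups than drop-offs from the opposite case), and the function names and toy matrix are hypothetical.

```python
import numpy as np

def pickups_dropoffs(X):
    """Row sums give pick-ups per origin cell; column sums give drop-offs per destination cell."""
    return X.sum(axis=1), X.sum(axis=0)

def top_destinations(X, origin, top_k=10):
    """Indices of the top_k destinations with the largest flow from `origin`, largest first."""
    order = np.argsort(X[origin])[::-1]
    return order[:top_k]

def hotness(X):
    """Positive where pick-ups exceed drop-offs (hot), negative where drop-offs dominate."""
    pick, drop = pickups_dropoffs(X)
    return pick - drop

# Toy predicted flow matrix for three cells (entry [i, j] is the flow from i to j).
X_pred = np.array([
    [0.0, 5.0, 1.0],
    [2.0, 0.0, 0.0],
    [4.0, 3.0, 0.0],
])
pick, drop = pickups_dropoffs(X_pred)
best = top_destinations(X_pred, origin=0, top_k=2)
score = hotness(X_pred)
```

In the toy matrix, cell 2 is hot (7 pick-ups against 1 drop-off) and cell 1 is the opposite (2 against 8), so a driver at cell 0 choosing between the two largest flows would find the smaller flow, toward cell 2, ending in the hotter area.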

Figure 6.

The width of the arrows represents the magnitude of the flow, and the color indicates the hotness of the destination from the taxi driver's point of view. (c) shows that the proposed model has predicted the most significant (top 10 in this example) flows and their hotness correctly. Estimation based on historical average (a) deviates from the observations (b).
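The selection and coloring of arrows described for Figure 6 can be illustrated with a minimal sketch. It assumes that the flows from one origin cell and the pick-up and drop-off counts per cell are available as dictionaries; the function name, the cell names, the numbers, and the "neutral" label for ties are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def top_flows_with_hotness(
    flows_from_origin: Dict[str, float],
    pickups: Dict[str, float],
    dropoffs: Dict[str, float],
    k: int = 10,
) -> List[Tuple[str, float, str]]:
    """Return the k largest flows from one origin cell with a hotness label.

    A destination is "hot" (red arrow) when its pick-ups exceed its drop-offs
    and "cold" (blue arrow) when drop-offs exceed pick-ups. Ties are labelled
    "neutral", which is an assumption of this sketch.
    """
    # Rank destinations by flow magnitude, largest first, and keep the top k.
    ranked = sorted(flows_from_origin.items(), key=lambda kv: kv[1], reverse=True)[:k]
    result = []
    for cell, flow in ranked:
        balance = pickups.get(cell, 0.0) - dropoffs.get(cell, 0.0)
        if balance > 0:
            label = "hot"
        elif balance < 0:
            label = "cold"
        else:
            label = "neutral"
        result.append((cell, flow, label))
    return result


# Hypothetical example data for three destination cells.
example_flows = {"A": 12.0, "B": 30.5, "C": 4.0}
example_pickups = {"A": 100.0, "B": 80.0, "C": 50.0}
example_dropoffs = {"A": 90.0, "B": 120.0, "C": 50.0}

print(top_flows_with_hotness(example_flows, example_pickups, example_dropoffs, k=2))
# prints [('B', 30.5, 'cold'), ('A', 12.0, 'hot')]
```

In a flow map, the returned flow value would set the arrow width and the label would set its color.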

Figures 7 and 8 show the hourly predicted, observed, and S-HA drop-offs and pick-ups during the week in two cells, South Brooklyn and Broadway, respectively. As a reminder, S-HA estimates the pick-ups and drop-offs in the next time interval by averaging the values in the corresponding time intervals of the last n seasons (in this experiment, n = 3). As with the predicted flows, the plots indicate that the pick-up and drop-off predictions are accurate in regions with both heavy and light traffic, on weekdays and weekends alike.

Figure 7.

Hourly drop-offs (top) and pick-ups (bottom) during the week in South Brooklyn.

Figure 8.

Hourly drop-offs (top) and pick-ups (bottom) during the week in Broadway.

Finally, we compared the results from the proposed method (GCPFP) with some baseline and state-of-the-art techniques. We used the following methods and settings:

GCPFP (proposed method): A cell dimension ($d$) of 1,500 m was used, which divides the study area into 230 cells in total. Predictions are hourly ($Δt = 1$ hour) with weekly seasonality ($p = 168$). In the proposed model, $k$ is the number of components and can take any value less than the period (168). Although a larger $k$ usually leads to better predictions, our experiments showed that beyond some point the improvement is no longer significant, while a larger $k$ also requires more expensive computations. Since optimizing the computations is beyond the scope of this paper, we simply chose a sufficiently large value, $k = 150$, in all experiments. In each experiment, we used 3 weeks of data for initialization and 1 week for testing the predictions. Multiple experiments with data samples starting from different dates were conducted, and the average results are reported (Tables 1 and 2).

S-HA: The flow in the next period (e.g., 3:00–4:00 p.m.) is predicted by averaging the corresponding historical values. To this end, 3 weeks of historical data (three seasons) are considered.

Moving average 3 (MA3): A moving average model with a 3 h time window is used.

Seasonal autoregressive integrated moving average (SARIMA): A common tool for seasonal time series analysis. We use a SARIMA$(1,0,0)(1,1,1)_{168}$ model, selected empirically based on Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots.

Seasonal Matrix Factorization (SMF): Similar to the proposed model (GCPFP), SMF uses non-negative tensor factorization to model the seasonal data ( 17 ). Whenever possible, we applied the same parameters as the ones we have used in our model (e.g., $p = 168$ ) to have a fair comparison.
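The two simplest baselines above, S-HA and MA3, and the two error measures reported in Tables 1 and 2 can be sketched as follows. This is a minimal illustration that assumes a single hourly series stored as a list; the function names and the toy series are hypothetical, and the example uses a period of 4 instead of 168 to stay short.

```python
import math
from typing import Sequence


def seasonal_historical_average(
    series: Sequence[float], period: int = 168, n_seasons: int = 3
) -> float:
    """Predict the next value as the mean of the values observed exactly
    1, 2, ..., n_seasons periods before the time step being predicted."""
    t = len(series)  # index of the value to predict
    lags = [t - s * period for s in range(1, n_seasons + 1)]
    if min(lags) < 0:
        raise ValueError("series is shorter than n_seasons * period")
    return sum(series[i] for i in lags) / n_seasons


def moving_average(series: Sequence[float], window: int = 3) -> float:
    """Predict the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Toy series with three "seasons" of length 4 (hypothetical values).
history = [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6]
print(seasonal_historical_average(history, period=4, n_seasons=3))  # prints 2.0
print(moving_average(history, window=3))  # prints 5.0
```

With the settings reported here, the calls would use `period=168` and `n_seasons=3` for S-HA and `window=3` for MA3, applied to each flow, pick-up, or drop-off series.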

Tables 1 and 2 show that GCPFP achieves a lower root mean squared error (RMSE) and mean absolute error (MAE) than the baseline methods. S-HA and MA3 produce the least accurate predictions among the baselines, which can be attributed to their over-simplified modeling assumptions. The SARIMA model, on the other hand, produces more accurate predictions than S-HA and MA3 by including auto-regressive terms and modeling the periodicity in the data. The improvement of the proposed model over SMF can be attributed to the incorporation of global consistency in the model. This can also be seen in Figure 9, which presents the MAE of the predicted drop-offs for different $α$ and $β$ values and shows the best hyper-parameters ($α = 0.05$, $β = 0.0016$) for GCPFP found using a grid search. These parameters are only valid for this specific data set and modeling configuration; for a different data set or different settings (e.g., a different cell dimension), new parameters need to be found.

Figure 9.

Mean absolute error (MAE) of predicted drop-offs for 7:00–8:00 p.m. using different $α$ and $β$ values.
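A grid search of the kind used to select $α$ and $β$ can be sketched generically. The sketch below assumes a callable that trains the model for a given pair of values and returns a validation MAE; here that callable is replaced by a toy stand-in, and the candidate grids and function names are hypothetical rather than the values or code used in the experiments.

```python
import itertools
from typing import Callable, Iterable, Tuple


def grid_search(
    evaluate: Callable[[float, float], float],
    alphas: Iterable[float],
    betas: Iterable[float],
) -> Tuple[float, float, float]:
    """Return (alpha, beta, score) for the pair with the lowest score."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        score = evaluate(alpha, beta)
        if best is None or score < best[2]:
            best = (alpha, beta, score)
    if best is None:
        raise ValueError("both grids must contain at least one value")
    return best


def toy_validation_mae(alpha: float, beta: float) -> float:
    """Stand-in for 'fit the model and return the validation MAE'.
    Its minimum at (0.1, 0.01) is arbitrary and only serves the example."""
    return abs(alpha - 0.1) + abs(beta - 0.01)


best_alpha, best_beta, best_score = grid_search(
    toy_validation_mae, alphas=[0.01, 0.1, 1.0], betas=[0.001, 0.01, 0.1]
)
print(best_alpha, best_beta, best_score)  # prints 0.1 0.01 0.0
```

Because the search is exhaustive, its cost grows with the product of the two grid sizes, so a coarse grid followed by a finer one around the best pair is a common compromise.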

Conclusions

In this paper, we proposed and tested a matrix factorization model for periodic flow prediction in taxi systems. We analyzed the taxi trip data set and presented examples of the periodic variations in the total number of trips and in the communities, which formed the foundation of the basic assumptions of the proposed model. By retaining the community belongingness in the predicted flow matrix, we preserved the global consistency in the model. Comparing the results from the proposed method with some baselines shows lower MAE and RMSE for flow, pick-up, and drop-off predictions, which can be attributed to the incorporation of global consistency in the model. We also employed the proposed model to produce flow maps that can assist taxi drivers in making operational decisions.

As the proposed model produces predictions from historical data, future work can study the feasibility of using the model for traffic anomaly detection. Employing or extending the proposed matrix factorization model for other applications, including other mobility systems and recommender systems, and exploring the robustness of the model under different spatial and temporal granularities are also potential directions for future work.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: R. Forouzandeh Jonaghani, M. Wachowicz, T. Hanson; data collection: R. Forouzandeh Jonaghani; analysis and interpretation of results: R. Forouzandeh Jonaghani, M. Wachowicz, T. Hanson; draft manuscript preparation: R. Forouzandeh Jonaghani. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the NSERC/Cisco Industrial Research Chair, Grant IRCPJ 488403-1.

ORCID iDs

Rouzbeh Forouzandeh Jonaghani

Monica Wachowicz

Trevor Hanson

References

1. Nie W.-P., Zhao Z.-D., Cai S.-M., Zhou. Understanding the Urban Mobility Community by Taxi Travel Trajectory. Communications in Nonlinear Science and Numerical Simulation, Vol. 101, 2021, p. 105863.

2. Liu, Gong, Liu. Revealing Travel Patterns and City Structure with Taxi Trip Data. Journal of Transport Geography, Vol. 43, 2015, pp. 78–90.

3. Yang, Gonzales E. J. Modeling Taxi Trip Demand by Time of Day in New York City. Transportation Research Record: Journal of the Transportation Research Board, 2014. 2429: 110–120.

4. Varagouli, Simos T. E., Xeidakis. Fitting a Multiple Regression Line to Travel Demand Forecasting: The Case of the Prefecture of Xanthi, Northern Greece. Mathematical and Computer Modelling, Vol. 42, No. 7–8, 2005, pp. 817–836.

5. Shahabi, Demiryurek, Liu. Deep Learning: A Generic Approach for Extreme Condition Traffic Forecasting. Proc., SIAM International Conference on Data Mining, Houston, Texas, SIAM, Philadelphia, PA, 2017, pp. 777–785.

6. Schimek. Gasoline and Travel Demand Models Using Time Series and Cross-Section Data from United States. Transportation Research Record: Journal of the Transportation Research Board, 1996. 1558: 83–89.

7. Faghih, Safikhani, Moghimi, Kamga. Predicting Short-Term Uber Demand in New York City Using Spatiotemporal Modeling. Journal of Computing in Civil Engineering, Vol. 33, No. 3, 2019, p. 05019002.

8. Yao, Tang, Jia, Gong. Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018, pp. 2588–2595.

9. Liao, Zhou, Yuan, Xiong. Large-Scale Short-Term Urban Taxi Demand Forecasting Using Deep Learning. Proc., 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, Korea (South), IEEE, New York, NY, 2018, pp. 428–433.

10. Davis, Raina, Jagannathan. A Multi-Level Clustering Approach for Forecasting Taxi Travel Demand. Proc., IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, IEEE, New York, NY, 2016, pp. 223–228.

11. Tang, Liang, Liu, Hao, Wang. Multi-Community Passenger Demand Prediction at Region Level Based on Spatio-Temporal Graph Convolutional Network. Transportation Research Part C: Emerging Technologies, Vol. 124, 2021, p. 102951.

12. Graells-Garrido, Caro, Parra. Inferring Modes of Transportation Using Mobile Phone Data. EPJ Data Science, Vol. 7, No. 1, 2018, pp. 1–23.

13. Forouzandeh Jonaghani, Wachowicz, Hanson. A Matrix Factorization Model with Local and Global Consistency for Flow Prediction in Bike-Sharing Systems. International Journal of Geographical Information Science, Vol. 37, No. 2, 2023, pp. 360–379.

14. Kang, Qin. Understanding Operation Behaviors of Taxicabs in Cities by Matrix Factorization. Computers, Environment and Urban Systems, Vol. 60, 2016, pp. 79–88.

15. Cazabet, Jensen, Borgnat. Tracking the Evolution of Temporal Patterns of Usage in Bicycle-Sharing Systems Using Nonnegative Matrix Factorization on Multiple Sliding Windows. International Journal of Urban Sciences, Vol. 22, No. 2, 2018, pp. 147–161.

16. Sarkar, Dong. Community Detection in Graphs Using Singular Value Decomposition. Physical Review E, Vol. 83, No. 4, 2011, p. 046114.

17. Hooi, Shin, Liu, Faloutsos. SMF: Drift-Aware Matrix Factorization with Seasonal Patterns. Proc., SIAM International Conference on Data Mining, SIAM, Philadelphia, PA, 2019, pp. 621–629.

18. Shashua, Hazan. Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision. Proc., 22nd International Conference on Machine Learning, Bonn, Germany, Association for Computing Machinery, New York, NY, 2005, pp. 792–799.

19. Cao. GraRep: Learning Graph Representations with Global Structural Information. Proc., 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia, Association for Computing Machinery, New York, NY, 2015, pp. 891–900.

20. Wang, Cui, Wang, Pei, Zhu, Yang. Community Preserving Network Embedding. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31, 2017, pp. 203–209.

21. Kelleher J. D., Tierney, Becker. Applied Machine Learning. Wiley, Hoboken, NJ, 2019.

22. Dattorro. Convex Optimization and Euclidean Distance Geometry. Lulu.com, Meboo Publishing USA, Palo Alto, CA, 2010.

23. Blondel V. D., Guillaume J.-L., Lambiotte, Lefebvre. Fast Unfolding of Communities in Large Networks. Journal of Statistical Mechanics: Theory and Experiment, Vol. 2008, No. 10, 2008, p. P10008.

Matrix Factorization for Globally Consistent Periodic Flow Prediction in Taxi Systems
