An Online-Traffic-Prediction Based Route Finding Mechanism for Smart City

Abstract

Finding the fastest driving routes is important for intelligent transportation systems. Although predicting the online traffic conditions of road segments poses a variety of challenges, such prediction contributes greatly to the accuracy of travel time estimation. In this paper, we propose O-Sense, an online-traffic-prediction based route finding mechanism that jointly utilizes large scale taxi GPS traces and environmental information. O-Sense first applies a deep learning approach to spatial and temporal taxi GPS traces, which exhibit dynamic patterns. Meanwhile, we model the traffic flow state of a given road segment using a linear-chain conditional random field (CRF), a technique that forecasts temporal transitions well when supplementary environmental information is provided. O-Sense then fuses these outputs with a dynamic weighted classifier and generates a refined traffic condition vector for each road segment at each prediction time. Finally, we perform online route computing on these vectors to find the fastest path through consecutive road segments. Experimental results show that O-Sense estimates the travel time of driving routes more accurately.

1. Introduction

With the development of intelligent transportation technology in smart cities, fast route finding can be widely used for traffic flow coordination [1], traffic management planning, and urban computing [2–5]. Finding fast driving routes also helps save energy and relieve traffic congestion.

Existing fastest route finding approaches usually rely on real-time traffic conditions or infer the potential travel costs of roads by mining historical trajectory data. However, most of these works assume a static travel cost on each road and ignore its temporal dynamics. In fact, a driving time calculated from the current travel time of each road segment can deviate considerably from the actual driving time. The quality of fastest route finding depends heavily on how accurately the travel time on each road is predicted. Hence, it is necessary to predict the traffic conditions of road segments at the future moment when a vehicle will actually traverse them.

Urban traffic prediction usually uses historical and current traffic flow information to predict road conditions at future moments [6]. Most existing methods either derive prediction trends from statistics of the time-dependent evolution of the current road, or consider only the spatial relationships between road segments, ignoring the temporal sequence and temporal-spatial patterns. Although some previous works combine the available spatial-temporal information to model traffic network patterns, they do not exploit this information to its full potential. Traffic networks involve complicated relations in time and space; in particular, traffic flow data are high-dimensional, nonlinear, nonstationary, and random. In previous work [7], a trained temporal-spatial deep learning approach, DeepSense, was proposed to extract features from the high-dimensional, nonlinear, and random traffic flow in large scale taxi GPS traces. However, we found that its predictions are less accurate at certain times, especially when taxi GPS traces are insufficient. It is therefore appropriate to consider additional factors, such as environmental information with temporal variations, that influence the traffic flow on the current road.

In this paper, we propose O-Sense, an online-traffic-prediction based route finding mechanism for smart cities. It utilizes both the temporal-spatial dynamic patterns in the transportation network and temporally related environmental information to predict the travel cost of each route. First, we analyze the temporal and spatial information in taxi GPS traces with a Restricted Boltzmann Machine (RBM), which transforms this information into low-dimensional features. The extracted low-dimensional data are then fed into a support vector machine (SVM) to obtain a robust classifier for the subsequent prediction. Second, a linear-chain CRF, which is applied to each road segment independently, predicts the future traffic state from the previous state in the temporal sequence and from supplementary temporally related observations. With the auxiliary data used by the CRF, prediction accuracy is further enhanced through dynamic weighted classifier fusion. Finally, using the predicted traffic conditions of each road, our system performs fast route selection for users.

The major contributions of this paper are summarized as follows.

(1) We propose O-Sense, an online-traffic-prediction based route finding mechanism that adopts a trained temporal-spatial deep learning approach. Significant traffic features are extracted from the high-dimensional, nonlinear, and random information in large scale taxi GPS traces.

(2) By combining temporally related environmental information, used by a CRF as supplementary input, with dynamic weighted classifier fusion, we further improve prediction accuracy. By taking full advantage of temporal and spatial information for traffic forecasting, O-Sense effectively reflects the dynamic patterns of the transportation network.

(3) Using large scale taxi GPS traces and environmental information from Wuhan, we evaluate the mechanism systematically. Experimental results show that O-Sense achieves sound and robust prediction accuracy.

The remainder of this paper is organized as follows. Section 2 discusses related works. Section 3 presents the proposed O-Sense mechanism. Section 4 describes the traffic flow condition prediction process. The performance of O-Sense is evaluated in Section 5. Finally, Section 6 concludes the paper.

2. Related Works

2.1. Route Finding

Many works have addressed the fastest route finding problem, and most of them can be divided into two categories according to how they treat the travel cost of road segments. On the one hand, some works infer the potential travel costs of road segments from historical trajectory data [8]. In [9], Idé and Sugiyama formulate the potential cost of each road as a spatial proximity-regularized trajectory regression problem. Gonzalez et al. find the fastest route based on important driving and speed patterns learned from historical trajectory data [10]. However, because these works assume static travel costs, they cannot reflect dynamic variations in actual traffic conditions. On the other hand, some works take the temporal dynamics of road travel costs into account when computing the fastest route [11]. Zheng and Ni learn the temporal dynamics of road travel costs, but their method depends on integrated trajectories, which cannot provide enough information when traffic data are insufficient [12]. In general, the temporal dynamics of the latent costs of road segments are hard to predict, especially when trajectory data are insufficient.

2.2. Traffic Flow Condition Prediction

Most existing traffic flow condition prediction approaches rely only on the time-dependent evolution of traffic flow. In [13], the autoregressive integrated moving average (ARIMA) model relies solely on the historical traffic flow data of the forecasted point and considers only its temporal variation. The traffic flow prediction model based on an $m$th-order Markov chain counts the transition probabilities of traffic flow states from historical data of the forecasted road [14]; combined with the current traffic situation, these transition probabilities are used to predict the traffic condition over a period of time. However, because such a model is built on the time sequence of historical and real-time traffic patterns, the forecast horizon is extended to some extent, and as a purely statistical method it carries considerable uncertainty. Variance-entropy-based clustering approaches [15] estimate the travel time distribution in different time slots from historical data of the target sites, but they simply apply generic clustering analysis.

Other methods use spatial information in the transportation network to analyze traffic flow trends. Castro et al. [16] employ adjacent road information to forecast future traffic conditions, but do not take into account the influence of distant segments or the temporal-spatial correlation of the predicted road. Markov logic networks [17] have also been adopted to predict traffic conditions at multiple locations and for different future times.

The methods above model traffic flow trends from only one aspect; they do not use temporal-spatial information to characterize the nonlinearity and randomness of traffic flow across the whole network. A Bayesian network approach [18] for traffic flow forecasting integrates information from adjacent links together with spatial-temporal information in a transportation network. Sun and Zhang develop a selective random subspace predictor (SRSP) model [19] that forecasts the flow on a given road link using the traffic flows of the most closely correlated links, ranked by the Pearson correlation coefficient in the subspace. Although these approaches consider spatial-temporal information, they do not effectively extract its high-dimensional, nonlinear, and random characteristics.

3. O-Sense Overview

3.1. Preliminary

Definition 1 (route).

A route $R$ is a sequence of consecutive road segments, $R: r_{1} \to r_{2} \to \dots \to r_{n}$, where $r_{i}$ is the $i$th road segment ($1 \leq i \leq n$). In route $R$, the end point of each road segment is the start point of the segment that directly follows it; that is, $r_{i}.\mathrm{end} = r_{i+1}.\mathrm{start}$.

Definition 2 (taxi trajectory).

A trajectory is a time series of GPS points for a trip, where there is a geospatial coordinate set and a timestamp at each consecutive point.

Definition 3 (road segment/link).

A road link or segment is defined as a directed edge consisting of a direction symbol, two terminal points, and a length between crossroads.
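Definitions 1 to 3 map naturally onto simple data types. The following Python sketch is illustrative only: the class and function names (`GPSPoint`, `RoadSegment`, `is_valid_route`), the integer intersection IDs, and the units are assumptions, not part of O-Sense. The direction of a segment is encoded by the order of its `start` and `end` intersections, and `is_valid_route` checks the end-to-start condition of Definition 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GPSPoint:
    """One point of a taxi trajectory (Definition 2)."""
    lat: float
    lon: float
    timestamp: float  # e.g., seconds since epoch (assumed unit)


Trajectory = List[GPSPoint]  # time-ordered GPS points of one trip


@dataclass(frozen=True)
class RoadSegment:
    """A directed link between two crossroads (Definition 3)."""
    start: int       # ID of the start intersection; direction is start -> end
    end: int         # ID of the end intersection
    length_m: float  # segment length in meters (assumed unit)


def is_valid_route(route: List[RoadSegment]) -> bool:
    """Definition 1: each segment must end where the next one starts."""
    if not route:
        return False
    return all(a.end == b.start for a, b in zip(route, route[1:]))
```

For example, `[RoadSegment(1, 2, 350.0), RoadSegment(2, 5, 500.0)]` is a valid route, while the same two segments in reverse order are not.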

Definition 4 (traffic flow condition).

We use traffic flow to denote the traffic state of road segments. Because speed situations and flow limits differ across road segments, classifying traffic states by absolute speed would be inaccurate. According to the degree of traffic congestion, we categorize traffic conditions into five states, represented by $\Omega = \{\text{congested (Cg)}, \text{slow (Sl)}, \text{normal (Nm)}, \text{moderate (Md)}, \text{unimpeded (Un)}\}$.

In detail, for a road segment $r$, a traffic flow observation $o_{i}$ is collected every time interval $\Delta t$ (e.g., 15 minutes) from 0:00 to 24:00, yielding a series of traffic flow observations $O$ (e.g., $o_{1}, o_{2}, o_{3}, \dots, o_{96}$ for $\Delta t = 15$ minutes). To categorize these observations into the five states above, we adopt the K-means algorithm to cluster them into 5 subsets. By comparing the average values of these 5 subsets, we assign each subset its corresponding traffic state.
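A minimal sketch of this state labeling, assuming that the per-interval observations of one segment are average speeds (so the cluster with the lowest mean is labeled Cg and the one with the highest mean is labeled Un). The 1-D K-means with rank-based initialization and the helper names `kmeans_1d` and `label_observations` are illustrative choices, not details given in this paper.

```python
from typing import Dict, List

STATES = ["Cg", "Sl", "Nm", "Md", "Un"]  # most to least congested


def kmeans_1d(values: List[float], k: int = 5, iters: int = 100) -> List[float]:
    """Plain 1-D K-means; returns the k cluster centers in ascending order."""
    data = sorted(values)
    # Deterministic initialization at evenly spaced ranks of the sorted data.
    centers = [data[(2 * c + 1) * len(data) // (2 * k)] for c in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def label_observations(speeds: List[float]) -> Dict[str, object]:
    """Cluster one day of observations of a segment and label each interval."""
    centers = kmeans_1d(speeds, k=len(STATES))
    # Assumption: observations are speeds, so the lowest center maps to Cg.
    center_of = dict(zip(STATES, centers))
    labels = [min(STATES, key=lambda s: abs(v - center_of[s])) for v in speeds]
    return {"centers": center_of, "labels": labels}
```

With $\Delta t = 15$ minutes, one day yields 96 observations per segment; `label_observations` returns the five cluster centers keyed by state together with one state label per interval. Under the speed assumption, these centers are the per-state speed values that route computing later uses to build speed vectors.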

3.2. Framework

To learn online road travel costs for selecting the fastest routes for users, we propose an online-traffic-prediction based route finding mechanism, namely, O-Sense. As shown in Figure 1, implementing fastest route finding in our mechanism involves three procedures: preprocessing, comprehensive feature learning, and route computing.

Figure 1

Architecture of O-Sense.

3.2.1. Preprocessing

Spatial trajectories collected by GPS-equipped vehicles are mapped onto a road network using a map-matching algorithm [20] and then stored in a taxi traces database. Environmental information is extracted for use as temporally related features, as described in detail in Section 4.

3.2.2. Comprehensive Feature Learning

This process contains three parts: temporal-spatial feature deep learning, temporally related supplementary feature learning, and dynamic weighted classifier fusion. First, to learn temporal-spatial features, we apply a deep learning approach to temporal-spatial traffic flow information preextracted from the taxi traces database. The preextracted temporal-spatial features are used to train an RBM model, which is capable of capturing irregular and stochastic factors in traffic systems. With this trained model, the dimensionality and redundancy of the input features are reduced to an appropriate extent, so that the extracted features can be classified more effectively by an SVM classification engine trained on them. Second, to learn supplementary temporally related features, a CRF is trained to estimate the temporal state sequence of traffic flow from the extracted environmental information. The temporal state sequences obtained from the CRF are then matched against historical sequences to predict future traffic states. Finally, dynamic weighted classifier fusion combines the temporal-spatial information and the temporally related features. After real-time traffic flow data are preprocessed and the temporally related features of the forecasted time are preextracted, we obtain the final prediction result. Details are presented in Section 4.

3.2.3. Online Route Computing

Given the predictions for each road segment at different times, O-Sense performs online route computing to find the optimal route. This approach uses the traffic condition of each road segment at the time it is actually traversed. The process of online route computing is illustrated in Algorithm 1. In this paper, we assume that the transportation network exhibits the FIFO (First-In-First-Out) property; that is, if vehicles A and B visit node $n_{1}$ at times $t_{1}$ and $t_{2}$, respectively, where $t_{1} \leq t_{2}$, then for any arc $(n_{1}, n_{2})$, A always arrives at node $n_{2}$ no later than B. Our solution consists of the following steps.

Algorithm 1: Online route computing algorithm.

Input: A directed graph $G = \{V, E\}$, $V = \{1, 2, \dots, n\}$; departure time $t$

Output: The shortest distance from 1 to each node in G

Procedure:

(1) $X = \{1\}$; $Y = V - \{1\}$; $\lambda[1] = 0$;

(2) for $x = 1$ to $n - 1$

(3) for $y = 1$ to n

(4) if $x$ is adjacent to $y$ then $r_{xy}[60] = \{r_{xy}[t], r_{xy}[t + 2], \dots, r_{xy}[t + 120]\}$;

(5) end if;

(6) end for;

(7) end for;

(8) for $y = 2$ to n

(9) if y is adjacent to 1 then $λ [y] = r_{1 y} [t]$ ;

(10) else $λ [y] = \infty$ ;

(11) end if;

(12) end for;

(13) u = 1;

(14) for $j = 1$ to $n - 1$

(15) select $y \in Y$ such that $\lambda[y]$ is the least among all values;

(16) $X = X \cup \{y\}$;

(17) $Y = Y - \{y\}$;

(18) for every edge $(y, w)$

(19) if $w \in Y$ and $λ [y] + r_{y w} [t + r_{u y} [t]] < λ [w]$ then

(20) $λ [w] = λ [y] + r_{y w} [t + r_{u y} [t]]$ ;

(21) end if;

(22) end for;

(23) $t = t + r_{u y} [t]$ ;

(24) $u = y$ ;

(25) end for;

Step 1.

Given a starting point, a destination, and a departure time, O-Sense first continually predicts a group of traffic states for each road segment, one every two minutes over a specified period (e.g., the following two hours). Because the traffic states are derived by clustering, each traffic state takes a speed value as its cluster center. Hence, five speed values represent the five states and form a speed vector. Representing the predicted traffic states with this speed vector, we obtain the weight of each road segment at different moments.

Step 2.

For each road segment, O-Sense dynamically selects as its weight the predicted value from the predicted speed vector that corresponds to the time at which the user arrives at that segment, since that prediction is the closest to the actual value. The time-dependent fastest route can then be found with a modified Dijkstra algorithm [21, 22].

The fastest path through consecutive road segments is found when the search reaches the destination. Owing to the time-varying costs of road segments, different departure times may yield different fastest routes even for the same starting point and destination.
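Steps 1 and 2 can be sketched as a per-segment cost lookup. This Python sketch assumes that (i) each predicted state is mapped to its cluster-center speed, (ii) a segment's travel time is its length divided by that speed, and (iii) predictions are indexed by 2-minute slots from the departure time over a 120-minute horizon, as in line (4) of Algorithm 1. The function name, the units, and the nearest-slot rule are hypothetical.

```python
from typing import Dict, List

SLOT_MIN = 2.0        # prediction interval in minutes (Step 1)
HORIZON_MIN = 120.0   # prediction horizon in minutes (Step 1)


def travel_time_min(length_km: float, predicted_states: List[str],
                    center_speed_kmh: Dict[str, float],
                    depart_min: float, arrive_min: float) -> float:
    """Predicted travel time (minutes) on one segment entered at arrive_min.

    predicted_states[i] is the state predicted for time depart_min + i * SLOT_MIN.
    """
    offset = arrive_min - depart_min
    if not 0.0 <= offset <= HORIZON_MIN:
        raise ValueError("arrival time is outside the prediction horizon")
    # Step 2: use the prediction slot closest to the actual arrival time.
    idx = min(int(round(offset / SLOT_MIN)), len(predicted_states) - 1)
    speed_kmh = center_speed_kmh[predicted_states[idx]]
    return 60.0 * length_km / speed_kmh  # hours -> minutes
```

For example, if a 2 km segment is predicted Un (60 km/h) at the slot when the user arrives, its cost is 2 minutes; if the user instead reaches it in a slot predicted Cg (10 km/h), the cost becomes 12 minutes. This arrival-time dependence is why the departure time can change the fastest route.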

Example 1.

As shown in Figure 2, when starting from road intersection A at the current time $t_{ct}$, the road segment $r_{ab}$, whose time cost $r_{ab}[t_{ct}]$ is the least among all road segments leaving A, is chosen. The labels of the other nodes in the graph, which denote the shortest travel time from the departure point to each node, are then updated. On arriving at B, the time stamp is $t_{ct} + r_{ab}[t_{ct}]$. We compare the labels of nodes C and E, that is, the travel times through $r_{be}$ and $r_{bc}$; in this example, $r_{be}[t_{ct} + r_{ab}[t_{ct}]] < r_{bc}[t_{ct} + r_{ab}[t_{ct}]]$. The search then moves to E and updates the labels. Continuing in this way, the fastest route from A to G starting at $t_{ct}$ is A → B → E → F → G. However, if we start at $t_{ct} + t_{lr}$, where $t_{lr}$ denotes some delay after $t_{ct}$, the route A → B → C → F → G becomes the fastest, because on arriving at B the time stamp becomes $t_{ct} + t_{lr} + r_{ab}[t_{ct} + t_{lr}]$ and the travel time through $r_{be}$ is now larger than that through $r_{bc}$; that is, $r_{be}[t_{ct} + t_{lr} + r_{ab}[t_{ct} + t_{lr}]] > r_{bc}[t_{ct} + t_{lr} + r_{ab}[t_{ct} + t_{lr}]]$. Hence, our approach uses the traffic condition of each road segment at the time it is actually driven for route computing.

Figure 2

Example of online route computing.
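Below is a hedged Python sketch of a time-dependent (earliest-arrival) Dijkstra search in the spirit of Algorithm 1, under the FIFO assumption stated earlier. It is a standard formulation rather than a transcription of Algorithm 1: each edge cost is evaluated at the time the search actually reaches the edge's tail node (that node's label), and a binary heap selects the next node. The cost callback `cost(u, v, t)`, the function names, and the toy graph and costs (loosely modeled on Example 1, not taken from Figure 2) are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# cost(u, v, t): predicted travel time (minutes) on edge (u, v) when entered at time t.
CostFn = Callable[[str, str, float], float]


def td_dijkstra(graph: Dict[str, List[str]], cost: CostFn, source: str,
                depart: float) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Earliest-arrival search on a time-dependent graph with FIFO edges."""
    arrival: Dict[str, float] = {source: depart}
    parent: Dict[str, Optional[str]] = {source: None}
    settled = set()
    heap: List[Tuple[float, str]] = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry; u was already settled with an earlier time
        settled.add(u)
        for v in graph.get(u, []):
            t_v = t + cost(u, v, t)  # cost looked up at the time u is reached
            if v not in settled and t_v < arrival.get(v, float("inf")):
                arrival[v] = t_v
                parent[v] = u
                heapq.heappush(heap, (t_v, v))
    return arrival, parent


def path_to(parent: Dict[str, Optional[str]], target: str) -> List[str]:
    """Follow parent pointers from target back to the source."""
    path: List[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Hypothetical toy network loosely modeled on Example 1.
TOY_GRAPH = {"A": ["B"], "B": ["C", "E"], "C": ["F"], "E": ["F"], "F": ["G"]}
_STATIC = {("A", "B"): 5.0, ("B", "C"): 6.0, ("C", "F"): 4.0,
           ("E", "F"): 4.0, ("F", "G"): 2.0}


def toy_cost(u: str, v: str, t: float) -> float:
    if (u, v) == ("B", "E"):
        return 3.0 if t < 20.0 else 10.0  # r_be becomes congested from minute 20
    return _STATIC[(u, v)]


if __name__ == "__main__":
    for depart in (0.0, 30.0):
        arrival, parent = td_dijkstra(TOY_GRAPH, toy_cost, "A", depart)
        print(depart, path_to(parent, "G"), arrival["G"])
    # 0.0 ['A', 'B', 'E', 'F', 'G'] 14.0
    # 30.0 ['A', 'B', 'C', 'F', 'G'] 47.0
```

The toy costs make $r_{be}$ slow for vehicles reaching B at minute 20 or later, so departing at minute 0 yields A → B → E → F → G, while departing at minute 30 yields A → B → C → F → G, mirroring the behavior described in Example 1.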

4. Comprehensive Feature Learning Based Prediction

This section details the methodology of the comprehensive feature learning based online traffic condition prediction mechanism, which consists of temporal-spatial feature deep learning, temporally related supplementary feature learning, and dynamic weighted classifier fusion-based comprehensive prediction.

4.1. Temporal-Spatial Traffic Feature Deep Learning

To simplify and reduce the dimension of the input data so that significant traffic features can be learned efficiently, O-Sense employs a temporal-spatial feature deep learning approach with three main steps: feature preextraction, feature learning, and training an SVM model.

4.1.1. Feature Preextraction

The correlation coefficient approach is first used to select, from the road segments around the predicted point, those most correlated with it; the features extracted from these road segments are then fed into PCA to reduce the data dimension and remove redundant information for better prediction.

Correlation Coefficient Approach. Consider $N$ samples $\{x_{i}(k), y_{j}(t_{ct} + t_{l})\}$ ($i, j = 1, 2, \dots, N$) in a spatially accessible space, where $x_{i}(k)$ ($k = 1, 2, \dots, st$) represents the traffic flow of road segment $i$ at time stamp $k$ and $\bar{x}$ is its mean value, and $y_{j}(t_{ct} + t_{l})$ represents the traffic flow of road segment $j$ at the predicted time $t_{ct} + t_{l}$ and $\bar{y}$ is its mean value. Here $t_{ct}$ is the current time and $t_{l}$ is the prediction horizon. The Pearson correlation coefficient $r_{ij}(k)$ between $x_{i}(k)$ and $y_{j}(t_{ct} + t_{l})$ is defined as follows:

$$
\begin{aligned}
r_{ij}(k) &= \frac{\sum_{i=1}^{N} \bigl(x_{i}(k) - \bar{x}\bigr)\bigl(y_{j}(t_{ct} + t_{l}) - \bar{y}\bigr)}{\sqrt{\sum_{i=1}^{N} \bigl(x_{i}(k) - \bar{x}\bigr)^{2} \sum_{i=1}^{N} \bigl(y_{j}(t_{ct} + t_{l}) - \bar{y}\bigr)^{2}}} \\
&= \frac{N \sum_{i=1}^{N} x_{i}(k)\, y_{j}(t_{ct} + t_{l}) - \sum_{i=1}^{N} x_{i}(k) \sum_{i=1}^{N} y_{j}(t_{ct} + t_{l})}{\sqrt{N \sum_{i=1}^{N} x_{i}(k)^{2} - \bigl(\sum_{i=1}^{N} x_{i}(k)\bigr)^{2}}\, \sqrt{N \sum_{i=1}^{N} y_{j}^{2}(t_{ct} + t_{l}) - \bigl(\sum_{i=1}^{N} y_{j}(t_{ct} + t_{l})\bigr)^{2}}}.
\end{aligned}
\tag{1}
$$

The closer the absolute value of the spatial-temporal correlation coefficient is to 1, the higher the correlation. The information of the $m$ most correlated road segments and their corresponding time stamps is then chosen as the extracted features, described by six parameters ($F_{m}$, $c_{m}$, $\Delta_{mx}$, $d_{mx}$, $w_{m}$, $l_{m}$). $F_{m}$ and $c_{m}$ denote the average speed and the traffic state of the $m$th road segment at the corresponding time stamp. $\Delta_{mx}$ and $d_{mx}$ denote, respectively, the time stamp interval and the geodistance between the predicted road segment and its $m$th most correlated road segment. $w_{m}$ and $l_{m}$ denote the width and length of the $m$th most correlated road segment.
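The selection step can be sketched in Python as follows. The sketch assumes that each candidate is a (segment ID, time stamp $k$) pair with a series of $N$ historical samples aligned with the target segment's samples at $t_{ct} + t_{l}$ (for example, one sample per day); this data layout and the function names are hypothetical. `pearson` uses the computational form in the second line of (1), and `top_m_correlated` keeps the $m$ candidates with the largest $|r|$.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation via the computational form of Eq. (1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    # max(..., 0.0) guards against tiny negative values from round-off.
    denom = math.sqrt(max(n * sxx - sx ** 2, 0.0)) * math.sqrt(max(n * syy - sy ** 2, 0.0))
    return (n * sxy - sx * sy) / denom if denom > 0 else 0.0  # constant series -> 0


def top_m_correlated(candidates: Dict[Hashable, Sequence[float]],
                     target: Sequence[float], m: int) -> List[Tuple[Hashable, float]]:
    """Rank candidate series by |r| with the target series and keep the top m."""
    scored = [(key, pearson(series, target)) for key, series in candidates.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:m]
```

Ranking by $|r|$ keeps strongly negatively correlated segments as well, which matches the criterion stated above.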

Principal Component Analysis. After the features are extracted by the correlation coefficient approach, Principal Component Analysis (PCA) is adopted to reduce the data dimension and compress the data volume efficiently. The features of the $m$ most correlated road segments form the original data matrix:

$$C = [c(1)\; c(2)\; \cdots\; c(6)]^{T} \in \mathbb{R}^{6 \times m}, \tag{2}$$

where $c(i) = [c_{1}(i)\; c_{2}(i)\; \cdots\; c_{m}(i)] \in \mathbb{R}^{m}$. To remove the effect of differing scales in the original data, the raw data are normalized as follows:

$$
\begin{gathered}
x_{j}(i) = \frac{c_{j}(i) - e(c_{j})}{\delta(c_{j})} \quad \forall i, j, \\
X = [x(1)\; x(2)\; \cdots\; x(6)]^{T} \in \mathbb{R}^{6 \times m},
\end{gathered}
\tag{3}
$$

where $\delta^{2}(c_{j}) = \frac{1}{5} \sum_{i=1}^{6} [c_{j}(i) - e(c_{j})]^{2}$ is the sample variance and $e(c_{j}) = \frac{1}{6} \sum_{i=1}^{6} c_{j}(i)$ is the sample mean. The covariance matrix of traffic flow in the road segments is then constructed as follows:

$$S = \frac{1}{m - 1} \sum_{j} (x_{j} - \langle x \rangle)(x_{j} - \langle x \rangle)^{T}, \tag{4}$$

where $\langle x \rangle = \frac{1}{m} \sum_{j} x_{j}$. All eigenvalues of the covariance matrix $S$ are found through matrix computation and sorted as $\lambda_{1} \geq \lambda_{2} \geq \dots \geq \lambda_{6} \geq 0$.

Because the leading principal components contain most of the information, only some of them need to be extracted. The contribution rate of each eigenvalue is

$$\frac{\lambda_{i}}{\sum_{k=1}^{6} \lambda_{k}} = \frac{\lambda_{i}}{6}. \tag{5}$$

The cumulative contribution rate of the first $r$ principal components, denoted $\sum \alpha_{r}$, is computed as below. When this rate exceeds a constant threshold $\psi$, extracting only the first $r$ principal components suffices to reflect the main information of all variables:

$$\sum \alpha_{r} = \frac{\sum_{k=1}^{r} \lambda_{k}}{\sum_{k=1}^{6} \lambda_{k}}. \tag{6}$$

Finally, the samples are projected onto the $r$ selected eigenvectors, yielding an $r \times m$ feature matrix for feature learning.
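A NumPy sketch of (2) to (6), assuming the rows of `C` are the six feature types and the columns are the $m$ correlated segments, with each column standardized over its six entries as in (3). The default threshold `psi=0.9`, the function name, and the choice to project the mean-centered data are illustrative assumptions; columns with zero variance are not handled.

```python
import numpy as np


def pca_reduce(C: np.ndarray, psi: float = 0.9):
    """Reduce a 6 x m feature matrix following Eqs. (2)-(6).

    Returns the r x m projected feature matrix and the eigenvalues of S
    in descending order.
    """
    # Eq. (3): standardize each column c_j over its six entries (1/5 variance).
    X = (C - C.mean(axis=0)) / C.std(axis=0, ddof=1)
    # Eq. (4): 6 x 6 covariance over the m columns x_j, with 1/(m - 1).
    S = np.cov(X)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Eqs. (5)-(6): smallest r whose cumulative contribution rate exceeds psi.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    above = cumulative > psi
    r = int(np.argmax(above)) + 1 if above.any() else len(eigvals)
    W = eigvecs[:, :r]  # 6 x r matrix of selected eigenvectors
    # Project the mean-centered columns onto the selected eigenvectors.
    return W.T @ (X - X.mean(axis=1, keepdims=True)), eigvals
```

The returned matrix has shape $r \times m$, where $r$ depends on how quickly the eigenvalue spectrum of the given data decays relative to $\psi$.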

4.1.2. Feature Learning with RBM

The main idea of the RBM-based deep learning approach in O-Sense is that a large amount of preextracted temporal-spatial traffic data is processed into informative low-dimensional features, which are then used to train an efficient SVM classifier.

An RBM is a generative stochastic neural network that can learn a probability distribution (e.g., $p(v, h)$) over its set of inputs, which makes it well suited to reasoning about and predicting irregular and stochastic behavior in traffic flow. As shown in Figure 3, an RBM has the shape of a bipartite graph with no intralayer connections. Consequently, the hidden unit activations $h$ (the low-dimensional data) are mutually independent given the visible unit activations $v$; that is, $p(h \mid v) = p(h_{1} \mid v) \cdots p(h_{q} \mid v)$, and vice versa. Given the visible layer $v$, the hidden layer $h$ is obtained through $p(h \mid v)$; conversely, the visible units can be reconstructed from the hidden units through $p(v \mid h)$. By adjusting the parameters, the reconstructed visible layer $v_{1}$ can be made approximately equal to the original input layer $v$. From this perspective, the outputs of the hidden units are another representation of the visible units; namely, the original high-dimensional traffic information has been transformed into low-dimensional representative data.

Figure 3

Temporal-spatial feature deep learning model.

The standard type of RBM has binary-valued hidden and visible units and consists of a matrix of weights in which $w_{i j}$ denotes the connection weight between hidden unit $h_{j}$ and visible unit $v_{i}$ , $a_{i}$ denotes the bias of $v_{i}$ , and $b_{j}$ denotes the bias of $h_{j}$ . Given parameters $θ = (w_{i j}, a_{i}, b_{j})$ , the energy of a configuration $(v, h)$ is defined as

$$E(v, h; \theta) = -\sum_{i=1}^{rm} a_{i} v_{i} - \sum_{j=1}^{q} b_{j} h_{j} - \sum_{i=1}^{rm} \sum_{j=1}^{q} h_{j} w_{ij} v_{i}. \tag{7}$$

This energy function is analogous to that of a Hopfield network [23]. As in general Boltzmann machines, probability distributions over hidden and visible vectors are defined in terms of the energy function:

$$P(v, h) = \frac{1}{Z} e^{-E(v, h)}, \tag{8}$$

where $Z$ is the partition function, defined as the sum of $e^{-E(v, h)}$ over all possible configurations. Similarly, the marginal probability of a Boolean visible vector is the sum over all possible hidden layer configurations:

$$P(v) = \frac{1}{Z} \sum_{h} e^{-E(v, h)}. \tag{9}$$

The conditional probabilities that $h_{j} = 1$ given $v$ and that $v_{i} = 1$ given $h$ are as follows:

$$
\begin{aligned}
P(h_{j} = 1 \mid v; \theta) &= \sigma\Bigl(b_{j} + \sum_{i=1}^{rm} w_{ij} v_{i}\Bigr), \\
P(v_{i} = 1 \mid h; \theta) &= \sigma\Bigl(a_{i} + \sum_{j=1}^{q} w_{ij} h_{j}\Bigr),
\end{aligned}
\tag{10}
$$

where $\sigma(x) = 1/(1 + e^{-x})$ denotes the logistic sigmoid function, and the parameters $\theta = (w_{ij}, a_{i}, b_{j})$ can be learned with the gradient-based contrastive divergence algorithm [24].
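For concreteness, here is a minimal NumPy sketch of a binary RBM implementing (7) and (10), trained with one-step contrastive divergence (CD-1). The class name, learning rate, initialization scale, and the use of mean-field visible probabilities for the reconstruction are assumptions for illustration, not settings reported for O-Sense. In the pipeline above, `hidden_probs` applied to the flattened $rm$-dimensional PCA output would give the low-dimensional features passed to the SVM.

```python
import numpy as np


class BernoulliRBM:
    """Binary RBM with energy Eq. (7), trained by one-step contrastive divergence."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))  # w_ij
        self.a = np.zeros(n_visible)  # visible biases a_i
        self.b = np.zeros(n_hidden)   # hidden biases b_j

    @staticmethod
    def _sigmoid(x: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-x))

    def energy(self, v: np.ndarray, h: np.ndarray) -> float:
        """Eq. (7): E(v, h) = -a.v - b.h - v^T W h for single vectors v and h."""
        return float(-(self.a @ v) - (self.b @ h) - v @ self.W @ h)

    def hidden_probs(self, v: np.ndarray) -> np.ndarray:
        """Eq. (10), first line: P(h_j = 1 | v); also the learned features."""
        return self._sigmoid(self.b + v @ self.W)

    def visible_probs(self, h: np.ndarray) -> np.ndarray:
        """Eq. (10), second line: P(v_i = 1 | h)."""
        return self._sigmoid(self.a + h @ self.W.T)

    def cd1_step(self, v0: np.ndarray, lr: float = 0.1) -> float:
        """One CD-1 update on a batch (rows are samples); returns reconstruction MSE."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample h ~ P(h | v0)
        pv1 = self.visible_probs(h0)  # mean-field reconstruction of v
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        # Positive phase (data) minus negative phase (reconstruction) statistics.
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.a += lr * (v0 - pv1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)
        return float(((v0 - pv1) ** 2).mean())
```

Calling `cd1_step` repeatedly on a batch of binary training vectors adjusts $\theta$ so that reconstructions approach the inputs, which is the criterion described earlier in this section.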

4.1.3. Training a SVM Model

After the input data are compressed by the RBM, the resulting more representative features are used as the inputs of an SVM for better classification. The trained SVM classifier assigns traffic flow to the 5 states, and the probabilities of these states form the vector

$$P(t) = [p(\mathit{state} = \mathrm{Cg}), p(\mathit{state} = \mathrm{Sl}), p(\mathit{state} = \mathrm{Nm}), p(\mathit{state} = \mathrm{Md}), p(\mathit{state} = \mathrm{Un})]. \tag{11}$$

4.2. Temporally Related Supplementary Feature Learning

The data distribution shows that taxi GPS traces are plentiful during the day but insufficient around midnight. When traffic information is insufficient at certain times, a deep learning approach that relies only on big data for traffic condition prediction is less effective [25]. Through supplementary feature learning, environmental information relevant to the traffic condition of each road segment can be taken into account, making the prediction more accurate. This approach involves three main processes: environmental feature extraction, CRF real-time estimation, and sequence segment matching.

4.2.1. Environmental Feature Extraction

Traffic conditions are reflected in environmental information. Most road noise is emitted by vehicles, so observed noise patterns can reveal the traffic conditions under which the noise was produced [26, 27]. High rainfall intensity also influences traffic conditions [28]: owing to reduced visibility and pavement friction, effective travel times exceed the mean travel time, as expected. In addition, rainfall intensity influences travelers' path choices. If the effective travel time on one path is less sensitive to rainfall than on other paths, most travelers choose or switch to that path as rainfall intensity increases. High wind speed makes trips more hazardous, and travel times become longer and less reliable, which leads some travelers to defer or cancel trips [29]. The impact of temperature and PM2.5 is less clear, but good traffic conditions appear more likely when the temperature is high and PM2.5 is low.

When little traffic information is available around midnight, environmental information, especially road noise, reflects traffic conditions well. At night, road noise is affected mainly by traffic flow, whereas during the day it also contains interference from other sources. Hence, periods of sparse traffic information, especially around midnight, are well suited to supplementation with environmental information.

We select five traffic-related environmental features, namely road noise, temperature, wind speed, PM2.5, and rainfall, denoted by a 5-dimensional vector $E = [e_{1}, e_{2}, e_{3}, e_{4}, e_{5}]$, where $e_{i}$ ($i = 1, 2, \dots, 5$) is the value of the $i$th feature.

4.2.2. CRF Real-Time Estimation

The characteristic mode and temporal evolution rule of the traffic flow state are investigated by using linear-chain CRF method [30], considering the supplementary features such as E of the predicted road segment.

A linear-chain CRF is a discriminative, undirected probabilistic graphical model. The advantage of CRFs over hidden Markov models (HMMs) is that they relax the independence assumptions between features. HMMs are necessarily local in nature because they are constrained to binary transition and emission feature functions, which force each observation to depend only on the current label and each label to depend only on the previous label; a CRF, in contrast, can use more global features. Additionally, a CRF obtains a globally optimal value through global normalization over all features. In the special case of a linear-chain CRF, features are restricted to depend only on the current and previous labels rather than on arbitrary labels throughout the sequence.

As shown in Figure 4, the undirected graph $G$ contains two kinds of nodes. The white nodes $Y = \{Y_{1}, Y_{2}, \dots, Y_{n}\}$ represent traffic flow state variables at consecutive time stamps and are connected to the supplementary features represented by the blue nodes $X = \{X_{1}, X_{2}, \dots, X_{n}\}$, where $X_{t} = \{E, t\}$. Conditioned on the observation sequence $x$, each random variable $Y_{v}$ obeys the Markov property; that is, $p(Y_{v} \mid X, Y_{u}, u \neq v) = p(Y_{v} \mid X, Y_{u}, u \sim v)$, where $u \sim v$ means that $u$ and $v$ are adjacent nodes in $G$. Given the observation sequence $x$ at time $t$, the probability of each possible state of $Y_{t}$ is defined as follows:

$$P(y \mid x, \lambda) = \frac{\exp\bigl(\sum_{j} \lambda_{j} t_{j}(y_{t-1}, y_{t}, x, t) + \sum_{k} \mu_{k} s_{k}(y_{t}, x, t)\bigr)}{\sum_{y} \exp\bigl(\sum_{j} \lambda_{j} t_{j}(y_{t-1}, y_{t}, x, t) + \sum_{k} \mu_{k} s_{k}(y_{t}, x, t)\bigr)}, \tag{12}$$

where $t_{j}(y_{t-1}, y_{t}, x, t)$ is the transfer feature function, which depends on the current time $t$ and the previous time $t - 1$; $s_{k}(y_{t}, x, t)$ is the state feature function, which depends on time $t$; and $y_{t}$ denotes the traffic flow state at time $t$. When the known deterministic feature conditions are met, $t_{j}$ and $s_{k}$ take the value 1, and 0 otherwise. The weights $\lambda_{j}$ and $\mu_{k}$ can be estimated from training data.

Figure 4

Temporally related feature learning CRF model.

Assuming that the samples in the training dataset $\mathit{Data} = \{(x^{(z)}, y^{(z)})\}$ are mutually independent, the parameters $\lambda$ are learned by maximum likelihood estimation of $p(y \mid x, \lambda)$ using a gradient descent algorithm [31], with the log-likelihood

$$L(\lambda) = \sum_{z} \Bigl[ \log \frac{1}{\sum_{y} \exp\bigl[\sum_{j} \lambda_{j} f_{j}(y_{t-1}, y_{t}, x, t)\bigr]} + \sum_{j} \lambda_{j} f_{j}(y_{t-1}, y_{t}, x, t) \Bigr]. \tag{13}$$

By setting $s_{k} (y_{t}, x, t) = s_{k} (y_{t - 1}, y_{t}, x, t)$, the two kinds of feature functions are unified into $f_{j} (y_{t - 1}, y_{t}, x, t)$, which depends on the observation sequence x, the time t, the current state $y_{t}$, and the previous state $y_{t - 1}$. Assuming that there are $K_{1}$ transfer feature functions and $K_{2}$ state feature functions under the given deterministic feature conditions, with $K = K_{1} + K_{2}$, $f_{j} (y_{t - 1}, y_{t}, x, t)$ is represented as follows:

\begin{array}{l} f_{j} (y_{t - 1}, y_{t}, x, t) \\ = \begin{cases} t_{j} (y_{t - 1}, y_{t}, x, t), & j = 1,2, \dots, K_{1} \\ s_{k} (y_{t}, x, t), & j = K_{1} + k; \; k = 1,2, \dots, K_{2} . \end{cases} \end{array}

(14)

Assigning each feature function $f_{j}$ a weight $λ_{j}$ (so that $λ_{K_{1} + k} = μ_{k}$) and given an observation sequence x, (12) can be transformed as follows:

\begin{array}{l} P (y | x, λ) \\ = \frac{\exp [\sum_{j} λ_{j} f_{j} (y_{t - 1}, y_{t}, x, t)]}{\sum_{y} \exp [\sum_{j} λ_{j} f_{j} (y_{t - 1}, y_{t}, x, t)]} . \end{array}

(15)

Through the above equation, the probabilities of five possible states for $Y_{t}$ can be informally denoted as a five-dimensional vector. Consider

\begin{array}{l} P (t) = [p (y_{t} = Cg), p (y_{t} = Sl), p (y_{t} = Nm), \\ p (y_{t} = Mo), p (y_{t} = Un)] . \end{array}

(16)
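The learning step can be sketched in Python under the same per-time-step reading of (13) and (15): each sample contributes $\log P (y_{t} | y_{t - 1}, x)$, which equals the bracketed term of (13), and the derivative of L(λ) with respect to $λ_{j}$ is the observed value of $f_{j}$ minus its expectation under the current model. The sketch maximizes L(λ) by plain gradient ascent (equivalently, gradient descent on −L(λ)); the feature definitions, learning rate, epoch count, and toy samples are illustrative assumptions, not the paper's settings.

```python
import math

STATES = ["Cg", "Sl", "Nm", "Mo", "Un"]

def features(y_prev, y_cur, x, t):
    """Unified vector f_j as in (14): K1 transfer features, then K2 state features.
    All features here are hypothetical illustrations."""
    transfer = [1.0 if y_prev == y_cur else 0.0]                       # K1 = 1
    state = [1.0 if (x["noise"] > 70 and y_cur == "Cg") else 0.0,     # K2 = 2
             1.0 if (x["noise"] <= 50 and y_cur == "Un") else 0.0]
    return transfer + state

def probs(lam, y_prev, x, t):
    """Per-step P(y | x, lambda) from (15)."""
    scores = [sum(l * f for l, f in zip(lam, features(y_prev, y, x, t))) for y in STATES]
    m = max(scores)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def log_likelihood(lam, data):
    """L(lambda) from (13): log(1/Z) + sum_j lambda_j f_j equals log P(observed state)."""
    return sum(math.log(probs(lam, y_prev, x, t)[STATES.index(y_cur)])
               for y_prev, y_cur, x, t in data)

def train(data, k, lr=0.1, epochs=200):
    """Gradient ascent on L(lambda); dL/dlambda_j = observed f_j - expected f_j."""
    lam = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for y_prev, y_cur, x, t in data:
            p = probs(lam, y_prev, x, t)
            observed = features(y_prev, y_cur, x, t)
            candidate = [features(y_prev, y, x, t) for y in STATES]
            for j in range(k):
                expected = sum(p_i * f[j] for p_i, f in zip(p, candidate))
                grad[j] += observed[j] - expected
        lam = [l + lr * g for l, g in zip(lam, grad)]
    return lam

# Toy samples (y_{t-1}, y_t, x, t); values are made up for illustration.
data = [("Cg", "Cg", {"noise": 75.0}, 0), ("Cg", "Cg", {"noise": 72.0}, 1),
        ("Nm", "Un", {"noise": 45.0}, 2), ("Un", "Un", {"noise": 40.0}, 3)]
lam0 = [0.0, 0.0, 0.0]
lam = train(data, k=3)
print(log_likelihood(lam0, data), log_likelihood(lam, data))
```

Because this log-likelihood is concave in λ, a small fixed step is enough for the toy data; a real implementation would typically add regularization and use a quasi-Newton optimizer.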

4.2.3. Sequence Segments Matching

After the CRF classifier outputs a state vector of traffic flow at each time stamp, a state sequence segment is obtained. By finding similar state segments in history, we can infer the state of traffic flow at the predicted time stamp from the matched historical sequence segments. Since identical state sequence segments may not exist, sequence matching requires a measure of similarity between segments. The commonly used Euclidean distance is not suitable for measuring the distance between time-series sequences, so we use a nonlinear time alignment approach, namely, dynamic time warping (DTW).

In the process of state sequence segment matching, assume that the current time is $t_{ct}$. The traffic state sequence segment previously obtained from the CRF classifier is represented as $S = (s_{t_{ct} - n \cdot Δ t + 1}, \dots, s_{t_{ct} - Δ t}, s_{t_{ct}})$, and the predicted traffic state after time $t_{l}$ is denoted as $s_{t_{ct} + t_{l}}$. The DTW-based segment matching algorithm, shown in Algorithm 2, finds adequate matching segments by measuring the DTW distances between the given state sequence segment and historical segments $S^{*} = (s_{t_{ct} - n \cdot Δ t + 1}^{*}, \dots, s_{t_{ct} - Δ t}^{*}, s_{t_{ct}}^{*})$. In this algorithm, the given sequence segment is warped nonlinearly along the time dimension to determine its similarity to each historical segment. A constant threshold ε is used to filter adequate historical traffic flow state segments: a historical segment is kept if its DTW distance does not exceed the threshold, that is, ${DTW}^{(i)} [t_{ct}] [t_{ct}] \leq ε$. Assuming that hs adequate historical segments are chosen by the DTW-based segment matching algorithm, we obtain hs five-dimensional state probability vectors at the predicted time $t_{ct} + t_{l}$, where the i-th probability vector is represented as

\begin{array}{l} P^{(i)} (t = t_{ct} + t_{l}) \\ = [p^{(i)} (s_{t}^{*} = Cg), p^{(i)} (s_{t}^{*} = Sl), p^{(i)} (s_{t}^{*} = Nm), \\ p^{(i)} (s_{t}^{*} = Mo), p^{(i)} (s_{t}^{*} = Un)] . \end{array}

(17)

Algorithm 2: DTW-based segment matching algorithm.

Input: Historical segments:

$S^{* (i)} = (s_{t_{ct} - n \cdot Δ t + 1}^{* (i)}, \dots, s_{t_{ct} - Δ t}^{* (i)}, s_{t_{ct}}^{* (i)}) (i = 1,2, \dots, hs)$

Predicted segment: $S = (s_{t_{ct} - n \cdot Δ t + 1}, \dots, s_{t_{ct} - Δ t}, s_{t_{ct}})$

Output: Adequate historical segments

Procedure:

(1) for $S^{* (i)} (1 \leq i \leq hs)$ do

(2) for $t_{ct} - n Δ t + 1 \leq j, k \leq t_{ct}$ do

(3) ${Distance}^{(i)} [j] [k] = 0$ ;

(4) ${DTW}^{(i)} [j] [t_{ct} - n Δ t] = \infty$ ;

(5) ${DTW}^{(i)} [t_{ct} - n Δ t] [k] = \infty$ ;

(6) end;

(7) ${DTW}^{(i)} [t_{ct} - n Δ t] [t_{ct} - n Δ t] = 0$ ;

(8) for $t_{ct} - n Δ t + 1 \leq j, k \leq t_{ct}$ do

(9) ${Distance}^{(i)} [j] [k] = (s_{k} - s_{j}^{* (i)})^{2}$ ;

(10) end;

(11) for $t_{ct} - n Δ t + 1 \leq j, k \leq t_{ct}$ do

(12) ${DTW}^{(i)} [j] [k] = {Distance}^{(i)} [j] [k] + \min ({DTW}^{(i)} [j - 1] [k - 1], {DTW}^{(i)} [j] [k - 1], {DTW}^{(i)} [j - 1] [k])$ ;

(13) end;

(14) if ${DTW}^{(i)} [t_{ct}] [t_{ct}] \leq ε$ then

(15) output $S^{* (i)}$ ;

(16) end;

In (17), $S^{* (i)}$ is the i-th adequate historical sequence segment, and $p^{(i)} (s_{t}^{*} = s)$ ($s \in Ω$) represents the probability of each of the five possible states at the predicted time $t_{ct} + t_{l}$ according to the i-th historical segment $S^{* (i)}$. The weight $w^{(i)}$ of the i-th historical segment is calculated as

\begin{matrix} w^{(i)} = \frac{DT W^{(i)} [t_{ct}] [t_{ct}]}{\sum_{i = 1}^{hs} DT W^{(i)} [t_{ct}] [t_{ct}]}, \end{matrix}

(18)

Now, we can obtain the weighted vector at the predicted time $t = t_{ct} + t_{l}$ as the final state result. Consider

\begin{matrix} P (t = t_{ct} + t_{l}) = \sum_{i = 1}^{hs} w^{(i)} \cdot P^{(i)} (t = t_{ct} + t_{l}) . \end{matrix}

(19)
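Algorithm 2 and (17)–(19) can be sketched together in Python. The sketch uses standard DTW with 0-based lists in place of the time-stamp indices, encodes states by the numbers Cg = 1 through Un = 5 (an assumption; the paper does not state how states enter the squared difference in step (9)), and applies the weights of (18) literally, so a historical segment with a larger DTW distance receives a larger weight. The sequences, probability vectors, and threshold are made up.

```python
import math

def dtw_distance(seq, hist):
    """Standard O(n*m) DTW with squared state differences, mirroring Algorithm 2.
    Rows j index the historical segment, columns k the current segment."""
    n, m = len(hist), len(seq)
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]  # boundary cells, steps (4)-(5)
    dtw[0][0] = 0.0                                      # step (7)
    for j in range(1, n + 1):
        for k in range(1, m + 1):
            cost = (seq[k - 1] - hist[j - 1]) ** 2       # step (9)
            dtw[j][k] = cost + min(dtw[j - 1][k - 1],    # step (12)
                                   dtw[j][k - 1],
                                   dtw[j - 1][k])
    return dtw[n][m]

def match_and_fuse(seq, history, eps):
    """Keep historical segments with DTW <= eps (steps 14-15), then combine their
    five-dimensional future-state vectors with the weights of (18) as in (19)."""
    kept = []
    for hist_seq, future_vec in history:
        d = dtw_distance(seq, hist_seq)
        if d <= eps:
            kept.append((d, future_vec))
    total = sum(d for d, _ in kept)
    if not kept or total == 0:
        raise ValueError("need at least one match with a nonzero DTW distance")
    fused = [0.0] * 5
    for d, vec in kept:
        w = d / total  # (18) as written: weight proportional to DTW distance
        fused = [f + w * v for f, v in zip(fused, vec)]
    return fused

# Numeric state codes (Cg=1, ..., Un=5); all values are hypothetical.
current = [1, 1, 2, 3]
history = [
    ([2, 2, 3, 4], [0.1, 0.2, 0.5, 0.1, 0.1]),
    ([1, 1, 1, 2], [0.3, 0.4, 0.2, 0.05, 0.05]),
    ([5, 5, 4, 4], [0.0, 0.0, 0.1, 0.3, 0.6]),
]
print(match_and_fuse(current, history, eps=4.0))
```

Here the first two historical segments pass the threshold (DTW distances 3 and 1), while the third is filtered out; the guard against an all-zero total covers the case where (18) would divide by zero.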

4.3. Dynamic Weighted Classifier Fusion

During the daytime, taxi operation data provide ample coverage of arterial roads and can, to a certain extent, objectively reflect the real traffic conditions of the city. However, predictions that rely on supplementary information, especially road noise, may be affected by interference such as man-made noise during the daytime. In the early morning hours, road noise is purer and therefore more likely to reflect road traffic conditions, which makes it possible to achieve higher prediction accuracy than with the insufficient taxi GPS traces available at that time. Consequently, the two approaches achieve different prediction accuracies at different times of the day.

Considering both temporal-spatial features and supplementary features, we use a dynamic weighted classifier fusion to combine the two approaches and obtain a better result. We score the prediction performance of the two approaches for every hour of the day, denoted as ${score}_{CRF}^{a HR, b D}$ and ${score}_{DL}^{a HR, b D}$, where $a HR$ represents the a-th hour of the day and $b D$ denotes the b-th day in history. Sampling num times per hour, we compare the predicted state $s_{t}^{j}$ with the actual state $s_{t}^{j, g t}$ at each sample, where $s_{t}^{j}$ is the prediction result of the j-th sample in the hour to which time stamp t belongs and $s_{t}^{j, g t}$ is the corresponding ground truth, as defined in the evaluation section. If $s_{t}^{j}$ matches the ground truth, the score of the sample is the probability of the predicted state; otherwise, the score is the negative of that probability. Consider

\begin{matrix} {score}_{CRF}^{j, a HR, b D} = \begin{cases} P_{CRF}^{(b D)} (s_{t}^{*} = s_{t}^{j}), & \text{if } s_{t}^{j} = s_{t}^{j, g t} \\ - P_{CRF}^{(b D)} (s_{t}^{*} = s_{t}^{j}), & \text{if } s_{t}^{j} \neq s_{t}^{j, g t}, \end{cases} \end{matrix}

(20)

where ${score}_{CRF}^{j, a HR, b D}$ represents the score of the j-th sample in hour $a HR$ on day $b D$. The sum of all scores in hour $a HR$ of day $b D$ is calculated as follows:

\begin{matrix} {score}_{CRF}^{a H R, b D} = \sum_{j = 1}^{num} {score}_{CRF}^{j, a HR, b D} . \end{matrix}

(21)

Similarly, we obtain ${score}_{DL}^{j, a HR, b D}$ and ${score}_{DL}^{a HR, b D}$ as follows:

\begin{matrix} {score}_{DL}^{j, a HR, b D} = \begin{cases} P_{DL}^{(b D)} (s_{t}^{*} = s_{t}^{j}), & \text{if } s_{t}^{j} = s_{t}^{j, g t} \\ - P_{DL}^{(b D)} (s_{t}^{*} = s_{t}^{j}), & \text{if } s_{t}^{j} \neq s_{t}^{j, g t}, \end{cases} \\ {score}_{DL}^{a HR, b D} = \sum_{j = 1}^{num} {score}_{DL}^{j, a HR, b D} . \end{matrix}

(22)
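The scoring rule of (20)–(22) is the same for both approaches. A minimal Python sketch, assuming each sample is supplied as its predicted state, that state's predicted probability, and the ground-truth state (a hypothetical input layout; the function names are ours):

```python
def sample_score(pred_state, pred_prob, true_state):
    """Per-sample score of (20)/(22): +p if the prediction matches the ground truth, else -p."""
    return pred_prob if pred_state == true_state else -pred_prob

def hourly_score(samples):
    """Hourly score of (21)/(22): sum of the num per-sample scores in one hour of one day.
    Each sample is (predicted state, its probability, ground-truth state)."""
    return sum(sample_score(s, p, gt) for s, p, gt in samples)

# Four hypothetical samples in one hour (num = 4).
samples = [("Sl", 0.9, "Sl"), ("Sl", 0.6, "Nm"), ("Nm", 0.8, "Nm"), ("Cg", 0.7, "Cg")]
print(hourly_score(samples))
```

A confident wrong prediction costs as much as a confident correct one earns, so the hourly score rewards calibrated confidence rather than raw hit counts.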

The larger the expectation of the scores in a given hour $a HR$ over all historical days, the better the prediction; the larger the variance of these scores, the worse the prediction. Assuming that there are $e v$ days in the training dataset, the prediction effects $α_{a HR}$ and $β_{a HR}$ of the CRF and deep learning approaches, which account for the expectation and the spread of the scores at the corresponding hour over the $e v$ historical days, are defined as follows:

\begin{matrix} α_{a HR} = \frac{\bar{{score}_{CRF}^{a HR}}}{1 + \sqrt{\sum_{b D = 1}^{e v} {({score}_{CRF}^{a HR, b D} - \bar{{score}_{CRF}^{a HR}})}^{2}}}, \\ β_{a HR} = \frac{\bar{{score}_{DL}^{a HR}}}{1 + \sqrt{\sum_{b D = 1}^{e v} {({score}_{DL}^{a HR, b D} - \bar{{score}_{DL}^{a H R}})}^{2}}}, \end{matrix}

(23)

where $\bar{{score}_{CRF}^{a HR}}$ and $\bar{{score}_{DL}^{a HR}}$ denote the average scores in hour $a HR$ over all $e v$ days for the CRF and deep learning approaches, respectively.
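A short Python sketch of (23), applied to hypothetical hourly scores, shows how the denominator penalizes an approach whose scores fluctuate across days even when its mean is the same. It follows (23) as written, using the square root of the summed squared deviations rather than a standard deviation normalized by $e v$; the function name is ours.

```python
import math

def prediction_effect(daily_scores):
    """alpha_{aHR} (or beta_{aHR}) from (23): mean score over the ev days divided by
    1 + sqrt(sum of squared deviations), with no 1/ev inside the root, as in (23)."""
    ev = len(daily_scores)
    mean = sum(daily_scores) / ev
    spread = math.sqrt(sum((s - mean) ** 2 for s in daily_scores))
    return mean / (1.0 + spread)

# Hypothetical hourly scores for one hour aHR over ev = 4 historical days.
crf_scores = [2.0, 2.0, 2.0, 2.0]  # steady
dl_scores = [4.0, 0.0, 4.0, 0.0]   # same mean, but unstable
print(prediction_effect(crf_scores), prediction_effect(dl_scores))
```

Both score series have mean 2, but the unstable one is penalized by its spread of 4, giving 2 / (1 + 4) = 0.4 instead of 2.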

In the prediction process, we apply the CRF approach and the deep learning approach separately, generating two five-dimensional probability vectors, $P_{CRF}$ and $P_{DL}$. The prediction result of O-Sense (denoted as $P_{OS}$) at time t, which belongs to hour $a HR$, is calculated by fusing $P_{CRF}$ and $P_{DL}$ with their respective weights:

\begin{array}{l} P_{OS} = \frac{α_{a HR}^{2}}{α_{a HR}^{2} + β_{a HR}^{2}} * P_{CRF} + \frac{β_{a HR}^{2}}{α_{a HR}^{2} + β_{a HR}^{2}} * P_{DL}, \\ a HR = (1,2, \dots, 24) . \end{array}

(24)
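Equation (24) is a convex combination of the two probability vectors. The Python sketch below applies it with hypothetical values of $α_{a HR}$ and $β_{a HR}$; because the weights are squared and normalized, they are nonnegative and sum to one, so $P_{OS}$ remains a probability vector.

```python
def fuse(p_crf, p_dl, alpha, beta):
    """P_OS from (24): weights alpha^2 / (alpha^2 + beta^2) and beta^2 / (alpha^2 + beta^2)."""
    denom = alpha ** 2 + beta ** 2
    w_crf, w_dl = alpha ** 2 / denom, beta ** 2 / denom
    return [w_crf * a + w_dl * b for a, b in zip(p_crf, p_dl)]

# Hypothetical early-morning hour in which the CRF approach is more reliable (alpha > beta).
p_crf = [0.1, 0.1, 0.2, 0.2, 0.4]
p_dl = [0.3, 0.3, 0.2, 0.1, 0.1]
print(fuse(p_crf, p_dl, alpha=2.0, beta=1.0))
```

With α = 2 and β = 1, the CRF vector receives weight 0.8 and the deep learning vector 0.2; squaring sharpens the preference for the better-scoring approach compared with weighting by α and β directly.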

5. Evaluations

5.1. Datasets

The datasets of Wuhan city are used to evaluate our traffic flow prediction and route computing. A representative region is selected to verify the validity of our mechanism. The following four datasets are used.

(1) Taxi trajectories: the trajectory dataset was generated by 30,000 taxis over three months, from January to March 2013. The total distance covered is about 450 million kilometers, and the total number of GPS points reaches 890 million. The sampling interval is 4.5 minutes, with a distance of 500 meters between two consecutive points.

(2) Road network: the road network of Wuhan is used in the experiments. Figure 5 displays a snapshot of the road network of Wuchang district in Wuhan during the rush hour (5 pm).

(3) Environmental data: environmental information collected from a public website, including road noise, temperature, wind speed, PM2.5, and rainfall.

(4) Ground truth: the actual traffic flow observed by camera sensors on the road is used as the ground truth to measure prediction accuracy.

Figure 5

The road network of Wuchang district in Wuhan.

5.2. Performance

5.2.1. Evaluation of Temporal-Spatial Deep Learning Approach

Several parameters need to be determined for the temporal-spatial deep learning approach. Since different network structures noticeably influence the forecast results, the effect of these parameters needs to be tested. One of the most important problems in designing a neural network is determining its size. We choose 128 nodes for the input layer. Different numbers of output layer nodes yield different weighted mean accuracy (WMA) values, and, as shown in Figure 6, 64 nodes give the best result in our experiments. More nodes increase both the redundancy and the computational cost of training the RBM model, whereas fewer nodes cannot make full use of the neural network to capture the high-dimensional, nonlinear, and random characteristics of the transportation network.

Figure 6

Effect of nodes in output layer.

As shown in Figure 7, road segment $I_{3}$, chosen as the road segment to forecast, denotes the traffic flow from upstream link F to downstream road I. As shown in Table 1, to predict the target traffic flow $I_{3} (t_{ct} + T_{predict})$ at time $t_{ct} + T_{predict}$, the M most closely correlated traffic flows with different time indexes ($M = 32$, determined by the number of input nodes of the RBM) are selected from all available traffic flows: $\{A_{1} (t_{ct}), A_{1} (t_{ct} - 1), \dots, A_{1} (t_{ct} - T)\}$, $\{A_{3} (t_{ct}), A_{3} (t_{ct} - 1), \dots, A_{3} (t_{ct} - T)\}, \dots$, $\{I_{3} (t_{ct}), I_{3} (t_{ct} - 1), \dots, I_{3} (t_{ct} - T)\}, \dots, \{J_{3} (t_{ct}), J_{3} (t_{ct} - 1), \dots, J_{3} (t_{ct} - T)\}$, and so forth. In this paper, we choose 5 min as the time unit and $T = 20$.

Table 1

Most correlated traffic flows with road segment $I_{3} (t + T_{predict})$ .

Traffic flow	Correlation
$J_{1} (t)$	0.971911
$F_{3} (t)$	0.967859
$H_{2} (t)$	0.967743
$I_{3} (t)$	0.967213
$A_{3} (t - 1)$	0.966754
$A_{3} (t)$	0.966512
$B_{4} (t)$	0.964985
$A_{4} (t)$	0.962859
$D_{1} (t - 2)$	0.962543
$G_{1} (t - 1)$	0.962211
$D_{1} (t - 1)$	0.961019
$B_{4} (t - 1)$	0.960609
$G_{1} (t)$	0.960312
$A_{4} (t - 1)$	0.959412
$B_{2} (t - 1)$	0.959289
$F_{3} (t - 1)$	0.957864
$A_{3} (t - 2)$	0.957535
$H_{3} (t)$	0.957198
$B_{2} (t)$	0.955938
$A_{1} (t)$	0.955543
$E_{1} (t - 1)$	0.954375
$A_{1} (t - 2)$	0.954102
$G_{3} (t - 1)$	0.946793
$C_{2} (t - 1)$	0.946776
$H_{4} (t - 1)$	0.944312
$B_{2} (t - 2)$	0.944023
$J_{2} (t - 1)$	0.941005
$C_{2} (t)$	0.926284
$D_{3} (t - 2)$	0.924927
$G_{3} (t - 3)$	0.919505
$F_{1} (t - 1)$	0.918812
$C_{1} (t - 2)$	0.902242

Figure 7

Illustration of road network segment.
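The selection of the M most correlated lagged flows can be sketched as follows. The paper does not name its correlation measure; the Python sketch below assumes the Pearson correlation coefficient, and the segment names, series, and parameters in the usage lines are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def top_m_lagged(target, flows, max_lag, m, horizon):
    """Rank candidates X(t - lag), lag = 0..max_lag, by correlation with
    target(t + horizon) and keep the M best. `flows` maps a segment name to its series."""
    n = len(target)
    ys = [target[t + horizon] for t in range(max_lag, n - horizon)]
    candidates = []
    for name, series in flows.items():
        for lag in range(max_lag + 1):
            # Align X(t - lag) with target(t + horizon) over all valid t.
            xs = [series[t - lag] for t in range(max_lag, n - horizon)]
            candidates.append((pearson(xs, ys), f"{name}(t-{lag})"))
    candidates.sort(reverse=True)
    return candidates[:m]

# Hypothetical series: segment A leads the target by one time unit.
target = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
flows = {"I3": target, "A": target[1:] + [9]}
print(top_m_lagged(target, flows, max_lag=2, m=3, horizon=1))
```

Because A(t) equals the target one step later in this toy data, A(t-0) ranks first with a correlation of 1.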

In the PCA approach, we set ψ to 95%, and the six eigenvalues of the covariance matrix obtained are shown in Table 2.

Table 2

Eigenvalue and contribution rate of the covariance matrix.

Eigenvalue	Contribution rate
2.2928	58.03%
0.8010	20.27%
0.6322	16.00%
0.1594	4.03%
0.0406	1.03%
0.0250	0.63%

After computing the cumulative contribution rate, we find that the first four principal components reach a cumulative contribution rate of 98.33%. Hence, the first four principal components can be considered to represent nearly all of the information. The principal component structure of traffic flow is shown in Table 3.
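The contribution rates in Table 2 are each eigenvalue divided by the sum of all six eigenvalues. The Python sketch below recomputes them and, assuming ψ acts as a threshold on the cumulative contribution rate, finds the smallest number of leading components that reaches it; with ψ = 95% the first three components give only about 94.3%, so four are needed, which matches the text.

```python
eigenvalues = [2.2928, 0.8010, 0.6322, 0.1594, 0.0406, 0.0250]  # Table 2

def components_to_keep(eigs, psi):
    """Smallest number of leading components whose cumulative contribution reaches psi."""
    total = sum(eigs)
    cumulative = 0.0
    for count, lam in enumerate(eigs, start=1):
        cumulative += lam / total
        if cumulative >= psi:
            return count, cumulative
    return len(eigs), cumulative

rates = [round(100 * lam / sum(eigenvalues), 2) for lam in eigenvalues]
print(rates)
print(components_to_keep(eigenvalues, 0.95))
```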

Table 3

Principal component structure of traffic flow.

	Principal component
	1	2	3	4	5	6
$F_{m}$	−0.0072	−0.0116	−0.0631	−0.2166	0.0764	−0.7433
$c_{m}$	0.2076	−0.7524	−0.5373	0.2606	0.1545	−0.0683
$Δ_{m x}$	0.2358	−0.4925	0.3296	−0.5935	−0.4530	0.1713
$d_{m x}$	0.3975	−0.1156	0.5612	−0.0020	0.5919	−0.2532
$w_{m}$	0.5540	0.0862	0.1869	0.4959	−0.1112	0.2889
$l_{m}$	0.4671	0.2517	−0.1530	0.1169	−0.5413	−0.4488

To express the predicted state and its probability more clearly, we assign a number to each of the five states: $Ω = \{\text{congested (Cg)} = 1, \text{slow (Sl)} = 2, \text{normal (Nm)} = 3, \text{moderate (Mo)} = 4, \text{unimpeded (Un)} = 5\}$. The state of road segment r at time stamp t is valued by its state number $n_{r}^{t}$. When the traffic flow state of a road segment is predicted as state $n_{r}^{t}$ with probability p, the predicted state value of the road segment is defined as $n_{r}^{t} - 1 + p$. For example, if the deep learning approach predicts the state Sl with probability 0.9, the deep learning-based predicted state value is $2 - 1 + 0.9 = 1.9$.
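The state-value encoding above fits in a one-line Python function (the names are ours); the sketch reproduces the Sl example.

```python
STATE_NUMBER = {"Cg": 1, "Sl": 2, "Nm": 3, "Mo": 4, "Un": 5}

def predicted_state_value(state, prob):
    """Predicted state value n_r^t - 1 + p for a state predicted with probability p."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return STATE_NUMBER[state] - 1 + prob

print(predicted_state_value("Sl", 0.9))
```

The encoding keeps each state within its own unit interval (Sl maps into (1, 2], Nm into (2, 3], and so on), so the integer part identifies the state and the fractional part shows the confidence.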

Figure 8 shows the traffic state prediction of the deep learning approach with prediction time $φ = 15$ min. In two time slots (about 8 am and 6 pm), the predicted values are closer to the actual values. This suggests that, in the rush hours, there are sufficient taxi trajectories to learn the states of traffic flow and produce a more accurate prediction. The temporal-spatial deep learning approach can effectively extract the temporal-spatial features of high-dimensional, nonlinear, and random traffic flow. However, due to the lack of sufficient taxi GPS traces, the prediction accuracy in the early morning hours (from 1:00 am to 5:00 am) is significantly lower than that during the daytime. As the number of vehicles increases from 6 am, the prediction accuracy improves gradually. Thus, the deep learning approach cannot train an effective network to extract features when the data are sparse.

Figure 8

Traffic state prediction of temporal-spatial deep learning.

5.2.2. Evaluation of Temporally Related CRF Approach

Figure 9 shows the prediction result of the temporally related CRF approach with prediction time $φ = 15$ min. By our calculation, the prediction precision of the CRF approach is nearly 0.75. Comparing the results of the two approaches in Figures 8 and 9, the deep learning approach performs much better than the CRF approach in most cases. However, the temporally related CRF approach predicts better when data are sparse and can thus compensate for the weakness of the deep learning approach. It takes full advantage of supplementary information, especially road noise, which provides effective information to improve the prediction accuracy around midnight.

Figure 9

Traffic state prediction of temporally related CRF approach.

5.2.3. Evaluation of Dynamic Weighted Classifier Fusion

In the following experiments, Figure 10 presents the expectations and standard deviations of the scores of the two approaches for each hour of the day. Two observations can be made from Figure 10(a). First, the deep learning approach performs well in the daytime by using ample temporal-spatial traffic features. Second, the CRF approach performs better when data in the traffic network are sparse. As shown in Figure 10(b), the CRF approach performs unstably in the daytime because supplementary features such as road noise are produced by many active sources, not just vehicles, during the day.

Figure 10

Expectations and standard deviations of two approaches.

Figure 11 shows the dynamic weights of the two approaches for each hour of the day, revealing that fusing the two approaches with time-varying weights makes better use of the advantages of both classifiers. Figure 12 presents the better results obtained by fusion at different times of the day. For example, between 1 am and 5 am, the deep learning approach may have low recognition accuracy because of the sparse data in the traffic network, whereas the CRF approach achieves higher prediction precision during this period by extracting temporally related supplementary features. Hence, O-Sense, which combines the two approaches, obtains more accurate results.

Figure 11

Dynamic weights of two approaches.

Figure 12

Traffic state prediction of O-Sense.

Figure 13 shows the precision, recall, and F1-measure of the three approaches. As shown in Figures 13(a) and 13(b), under the same prediction time, O-Sense obtains 8% higher precision and recall than existing deep learning approaches. In Figure 13(c), the F1-measure, which combines precision and recall, also shows that O-Sense achieves high accuracy. O-Sense comprehensively uses temporal-spatial traffic flow information and temporally related supplementary features in the traffic network. We can conclude that the performance of O-Sense is closer to that of the deep learning approach; moreover, by incorporating the temporally related supplementary features used by the CRF approach to overcome the sparse-data problem of deep learning, O-Sense further improves the prediction result.

Figure 13

Study on three different approaches.
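For reference, per-state precision, recall, and F1 = 2PR/(P + R) can be computed as below. The paper does not state how the five per-state values are averaged; the Python sketch assumes an unweighted (macro) average over the states that occur in either the predictions or the ground truth, and the label sequences are made up.

```python
def prf(pred, truth, state):
    """Precision, recall, and F1 for one state treated as the positive class."""
    tp = sum(1 for p, t in zip(pred, truth) if p == state and t == state)
    fp = sum(1 for p, t in zip(pred, truth) if p == state and t != state)
    fn = sum(1 for p, t in zip(pred, truth) if p != state and t == state)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_prf(pred, truth, states=("Cg", "Sl", "Nm", "Mo", "Un")):
    """Unweighted mean over states that appear in pred or truth (an assumed averaging rule)."""
    present = [s for s in states if s in pred or s in truth]
    scores = [prf(pred, truth, s) for s in present]
    return tuple(sum(col) / len(present) for col in zip(*scores))

pred = ["Cg", "Cg", "Sl", "Nm", "Un"]
truth = ["Cg", "Sl", "Sl", "Nm", "Un"]
print(prf(pred, truth, "Cg"))
print(macro_prf(pred, truth))
```

Skipping states absent from both sequences (Mo here) avoids counting an undefined zero that would otherwise pull the average down.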

5.2.4. Evaluation of Online Route Computing

As shown in Figure 14, two routes are selected to evaluate the performance of our mechanism. Route 1 runs from Wuhan University to Hankou Railway Station, and route 2 runs from Wuhan University to Guangda Mansion. As Figure 14 shows, route 1 is longer and takes much more time to travel than route 2. Comparing Figures 14(a) and 14(b) yields several observations. Firstly, the traffic conditions remain almost unchanged at about 6 am and 12 am on both routes, and O-Sense and STR both estimate the travel cost of the routes relatively accurately at those times. Secondly, O-Sense predicts much better than STR during the time periods in which the traffic conditions on route 1 change (about 8 am and 4:30 pm). Starting at 8 am, driving route 1 takes more than one hour, during which its traffic condition changes from congested to slow and then to normal. Hence, the total travel cost of the whole route depends on the traffic condition of each road segment at the time the user actually drives on it. STR, however, assumes that the traffic condition of each road segment remains the same as at the departure time, so its estimated travel cost is much longer than the ground truth. The situation at 4:30 pm is the reverse of that at 8 am: the travel cost estimated by STR is much shorter than the ground truth. This suggests that the traffic on route 1 becomes congested after about 30 minutes of driving, so the actual travel cost of the remaining path is much longer than that computed by STR from the initial condition of each road segment.
Thirdly, route 2 takes less time than route 1, and its traffic conditions do not change much during that time; hence, STR performs better on route 2 at about 8 am and 4:30 pm.

Figure 14

Travel cost of two routes at different time.

Figure 15 shows the variation in driving speed at different places as the participant drives along route 1. Because of its static estimation policy, STR keeps the speed of each road segment the same as at the starting time, while O-Sense estimates the real-time speed at every place on the route based on the predicted arrival time. When the user departs from the downtown area at 8 am, the traffic is congested. After the user has driven about two-thirds of the whole path, the traffic flow has varied considerably over time, and the traffic condition on the current road segment becomes unimpeded. At the beginning of the trip, during the rush hour, both STR and O-Sense can estimate the traffic condition. As the driving distance increases, however, the traffic in the driving area becomes unblocked. Owing to its online prediction of the traffic condition of the current road segment, O-Sense can perceive this variation and estimate the travel cost more accurately.

Figure 15

The speed after different travel distance at 8 am.
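The difference between the two estimation policies can be illustrated with a toy Python sketch. It assumes a hypothetical speed predictor that returns one speed per segment for a given clock time, and it looks up each segment's speed only at the moment the vehicle enters that segment; the segment lengths and speeds are invented to mimic a trip that departs in congestion at 8 am and later becomes unimpeded.

```python
def static_travel_time(lengths_km, speed_at, depart_h):
    """STR-style estimate: every segment keeps its speed at the departure time."""
    return sum(length / speed_at(i, depart_h) for i, length in enumerate(lengths_km))

def online_travel_time(lengths_km, speed_at, depart_h):
    """O-Sense-style estimate: each segment uses its predicted speed at the
    predicted arrival time on that segment."""
    clock = depart_h
    for i, length in enumerate(lengths_km):
        clock += length / speed_at(i, clock)  # hours spent on segment i
    return clock - depart_h

def speed_at(segment, hour):
    """Hypothetical predicted speed in km/h: congested until 8:30 am, then unimpeded."""
    return 15.0 if hour < 8.5 else 45.0

lengths = [3.0] * 6  # six 3-km segments (made up)
print(static_travel_time(lengths, speed_at, 8.0))  # all 18 km at 15 km/h
print(online_travel_time(lengths, speed_at, 8.0))
```

The static policy charges the congested 8 am speed for the whole route (1.2 h), whereas the arrival-time lookup switches to the faster speed once the clock passes 8:30 and yields 0.8 h, mirroring the overestimate by STR described above.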

6. Conclusions

This paper proposes an online-traffic-prediction based route finding mechanism, namely, O-Sense, for fastest route finding with large scale taxi GPS traces and environmental information. In O-Sense, a temporal-spatial deep learning approach with preprocessed traffic-related features can effectively extract nonlinear, random, and high-dimensional characteristics from traffic flow changes. Using environmental features as supplementary features, the CRF classifier further reflects the dynamic patterns in the transportation network. A dynamic weighted classifier fusion approach is used to obtain a better prediction result, which helps to find the fastest route through online route computing. We use trajectory datasets generated by 30,000 taxis over a period of three months in Wuhan to evaluate our approach. The experimental results show that O-Sense effectively improves the accuracy of travel cost estimation.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was partially supported by National Key Basic Research Program of China “973 Project” (no. 2011CB707106), Development Program of China “863 Project” (no. 2013AA-122301), National Natural Science Foundation of China “NSFC” (nos. 41127901-06, 61303212, and 61373169), the Program for Changjiang Scholars and Innovative Research Team in University (no. IRT1278), and the Natural Science Foundation of Hubei Province of China (no. 2014CFB191).

References

1. Yuan N. J., Zheng, Zhang, Xie. T-finder: a recommender system for finding passengers and vacant taxis. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(10): 2390–2403. doi:10.1109/tkde.2012.153

2. Zheng, Liu, Yuan, Xie. Urban computing with taxicabs. In: Proceedings of the 13th International Conference on Ubiquitous Computing (UbiComp ′11), ACM, September 2011, pp. 89–98. doi:10.1145/2030112.2030126

3. Liu L. M., Fan. Towards mobility-based clustering. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ′10), July 2010, pp. 919–927. doi:10.1145/1835804.1835920

4. Liu, Zheng, Chawla, Yuan, Xie. Discovering spatio-temporal causal interactions in traffic data streams. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2011, pp. 1010–1018. doi:10.1145/2020408.2020571

5. Zhang, Wilkie, Zheng, Xie. Sensing the pulse of urban refueling behavior. In: Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ′13), September 2013, pp. 13–22. doi:10.1145/2493432.2493448

6. Thiagarajan, Ravindranath, LaCurts, Madden, Balakrishnan, Toledo, Eriksson. VTrack: accurate, energy-aware road traffic delay estimation using mobile phones. In: Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems (SenSys ′09), November 2009, pp. 85–98. doi:10.1145/1644038.1644048

7. Niu, Zhu, Zhang. DeepSense: a novel learning mechanism for traffic prediction with taxi GPS traces. In: Proceedings of the IEEE Global Communications Conference (GLOBECOM ′14), Austin, Tex, USA, December 2014, pp. 2745–2750. doi:10.1109/glocom.2014.7037223

8. Zheng L. M. Modeling heterogeneous routing decisions in trajectories for driving experience learning. In: Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ′14), September 2014, pp. 951–961.

9. Tsuyoshi, Sugiyama. Trajectory regression on road networks. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, August 2011, pp. 203–208.

10. Gonzalez, Han, Myslinska, Sondag J. P. Adaptive fastest path computation on a road network: a traffic mining approach. In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB ′07), September 2007, pp. 794–805.

11. Kanoulas, Xia, Zhang. Finding fastest paths on a road network with speed patterns. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE ′06), April 2006, pp. 1–10. doi:10.1109/icde.2006.71

12. Zheng L. M. Time-dependent trajectory regression on road networks via multi-task learning. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI ′13), July 2013, pp. 1048–1055.

13. van der Voort, Dougherty, Watson. Combining Kohonen maps with ARIMA time series models to forecast traffic flow. Transportation Research Part C: Emerging Technologies, 1996, 4(5): 307–318. doi:10.1016/s0968-090x(97)82903-8

14. Yuan, Zheng, Xie, Sun. Driving with knowledge from the physical world. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2011, pp. 316–324. doi:10.1145/2020408.2020462

15. Yuan, Zheng, Xie, Sun. T-drive: enhancing driving directions with taxi drivers' intelligence. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(1): 220–232. doi:10.1109/tkde.2011.200

16. Castro P. S., Zhang. Urban traffic modelling and prediction using large scale taxi GPS traces. In: Pervasive Computing, Lecture Notes in Computer Science, vol. 7319, Springer, Berlin, Germany, 2012, pp. 57–72. doi:10.1007/978-3-642-31205-2_4

17. Lippi, Bertini, Frasconi. Collective traffic forecasting. In: Machine Learning and Knowledge Discovery in Databases, vol. 6322, Springer, Berlin, Germany, 2010, pp. 259–273.

18. Sun, Zhang. A Bayesian network approach to traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 2006, 7(1): 124–133. doi:10.1109/TITS.2006.869623

19. Sun, Zhang. The selective random subspace predictor for traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 2007, 8(2): 367–373. doi:10.1109/TITS.2006.888603

20. Yuan, Zheng, Zhang, Xie, Sun G.-Z. An interactive-voting based map matching algorithm. In: Proceedings of the 11th IEEE International Conference on Mobile Data Management (MDM ′10), May 2010, pp. 43–52. doi:10.1109/mdm.2010.14

21. Yuan, Zheng, Zhang, Xie, Sun, Huang. T-drive: driving directions based on taxi trajectories. In: Proceedings of the 18th International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL (GIS ′10), November 2010, pp. 99–108. doi:10.1145/1869790.1869807

22. Dean B. C. Continuous-time dynamic shortest path algorithms [Master thesis]. Massachusetts Institute of Technology, 1999.

23. Jagota. Approximating maximum clique with a Hopfield network. IEEE Transactions on Neural Networks, 1995, 6(3): 724–735. doi:10.1109/72.377977

24. Hinton G. E. Training products of experts by minimizing contrastive divergence. Neural Computation, 2002, 14(8): 1771–1800. doi:10.1162/089976602760128018

25. Glorot, Bordes, Bengio. Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Proceedings of the 28th International Conference on Machine Learning (ICML ′11), July 2011, pp. 513–520.

26. Johnson D. R., Saunders E. G. The evaluation of noise from freely flowing road traffic. Journal of Sound and Vibration, 1968, 7(2): 287–309. doi:10.1016/0022-460x(68)90273-3

27. Calixto, Diniz F. B., Zannin P. H. T. The statistical modeling of road traffic noise in an urban setting. Cities, 2003, 20(1): 23–29. doi:10.1016/S0264-2751(02)00093-8

28. Lam W. H. K., Shao, Sumalee. Modeling impacts of adverse weather conditions on a road network with uncertainties in demand and supply. Transportation Research Part B: Methodological, 2008, 42(10): 890–910. doi:10.1016/j.trb.2008.02.004

29. Maze T. H., Agarwal, Burchett. Whether weather matters to traffic demand, traffic safety, and traffic operations and flow. Transportation Research Record, 2006, 1948: 170–176.

30. Lafferty J. D., McCallum, Pereira F. C. N. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (ICML ′01), June 2001, pp. 282–289.

31. Liu L.-K., Feig. A block-based gradient descent search algorithm for block motion estimation in video coding. IEEE Transactions on Circuits and Systems for Video Technology, 1996, 6(4): 419–422. doi:10.1109/76.510936