Travel time prediction of expressway based on multi-dimensional data and the particle swarm optimization–autoregressive moving average with exogenous input model

Abstract

To meet the fine-grained demands of different travelers, a multi-dimensional travel time prediction method is proposed that combines highway toll collection data and meteorological data. First, a logical model of a multi-dimensional database is designed, comprising a vehicle dimension, a meteorological dimension, and a time dimension. Second, to integrate the toll collection data with the meteorological data, a matching method on the spatial and temporal scales is presented, and the multi-dimensional database is constructed. Next, an autoregressive moving average with exogenous input model is built from the travel time series and traffic flow series. The maximum likelihood estimation method is used to solve for the parameters of the model. Because the maximum likelihood equations are complex and difficult to solve, the particle swarm optimization algorithm is used to optimize the solution process. Finally, the toll collection data of two road links on the Shenyang–Haikou expressway (G15) and the corresponding meteorological monitoring data are used to validate the algorithm. The results show that the prediction accuracy of the particle swarm optimization–autoregressive moving average with exogenous input model is acceptable under both normal and special conditions, and that the absolute percentage error for the road section between two neighboring toll stations is reduced by almost 5% after optimization.

Keywords

Transport engineering, travel time prediction, particle swarm optimization, autoregressive moving average with exogenous input model, multi-dimensional database model, toll collection data

Introduction

Travel time prediction is an important part of intelligent transportation systems (ITS). Predicting and publishing travel times is significant for traffic flow guidance and for improving road service quality. Travel time estimation and forecasting studies are mainly based on traffic data collected by loop vehicle detectors¹ and GPS.² In China, however, highways are closed and covered by a complete toll collection network, so the toll data are complete. The toll data record the vehicle type, entry time, departure time, entry station, and departure station, which makes travel time prediction possible. Some scholars have studied travel time prediction based on toll data. Faouzi and colleagues^3,4 investigated travel time estimation based on electronic toll collection (ETC) data, alone or fused with other traffic data. Yoshikazu et al.⁵ calculated travel time based on toll collection system data of Japan. Soriguera et al.⁶ presented a new approach for measuring travel times based on closed toll highway data. Yamazaki et al.⁷ used ETC data to calculate the average travel time and evaluate the network service level. Zhao et al.^8,9 proposed a method of travel time prediction using highway toll data. At present, there are few studies on travel time prediction based on toll data, and all of them predict the average travel time of all vehicles on the road.

When traveling on a highway, the travel time of a vehicle is affected by a variety of weather factors (rain, snow, wind), and weather conditions have a noticeable impact on traffic operations in various ways. Considering the impact of the weather on traffic conditions, Faouzi and Ben³ designed a prediction algorithm that accounts for the weather impact during the prediction process, and the result is considered accurate enough. Koesdwiady et al.¹⁰ investigated the relevance of weather parameters and traffic flow based on deep learning networks and data fusion methods to improve the accuracy of traffic flow forecasts in bad weather. Qiao et al.¹¹ used traffic and weather data from multiple data sources to develop an integrated model that could predict travel times under various weather conditions. Qiu et al.¹² proposed an integrated model for traffic flow parameter forecasting during inclement weather. Polson and Sokolov¹³ used a deep learning architecture to forecast traffic flow, focusing on special weather conditions and events.

At the same time, there are different speed limits for cars and trucks on the highway, and the travel time of different vehicle models is different. Chen¹⁴ analyzed the travel time data of highway, showing that there are significant differences in the travel time distribution between small passenger cars and buses. Zheng et al.¹⁵ provided a model based on data fusion of multisource data to estimate travel times of different vehicle types on urban streets. Xia¹⁶ analyzed the probability distribution of the travel time of different vehicle models and proposed that the probability distribution of expressway travel time is the mixed probability distribution model composed of the travel time of several vehicle models. Qian et al.¹⁷ achieved multi-dimensional (time, space, vehicle model) traffic volume prediction based on highway toll data and the online analytical mining (OLAM) method. Bastard et al.¹⁸ used the existing widespread inductive loop detector (ILD) network for realizing an estimation of individual travel time for a mixed population of cars and trucks.

However, most of the above literature only deals with the single impact of vehicle type or weather change on travel time prediction.

Research on travel time prediction started early, and scholars have proposed many different algorithms. Innamaa¹⁹ proposed a feedforward neural network prediction model of travel time, and the model was validated using the travel time data of the peak time interval on the weekend. Building on this, Vanajakshi and Rilett²⁰ proposed a support vector machine (SVM) prediction model and compared it with the neural network model; the results showed that the SVM model has better prediction precision. Zhao et al.^8,9 proposed a travel time prediction method using a self-adaptive interpolation Kalman filter. Xu²¹ applied the pattern matching method to establish a pattern library of travel time and used the K-nearest neighbor algorithm to search the library in order to predict travel time. Wu et al.²² proposed a travel time prediction method that combines K-means clustering with the autoregressive moving average with exogenous input (ARMAX) model, which was verified with vehicle detector data. Fei et al.²³ developed a Bayesian inference-based dynamic linear model to robustly predict the exogenous factors (e.g. accidents), and they emphasize that the Bayesian approach enables the efficient update of travel time prediction, contributing to online practicability. Some scholars^24–26 pointed out that abrupt changes in traffic phenomena (e.g. lane changes, capacity drop, merge behavior, oscillations) can affect congestion development and propagation in various ways that require physical rather than statistical models to explain. The autoregressive moving average (ARMA) model is a classic time series model for travel time prediction; it uses the inherent characteristics of the time series for modeling and prediction, so that a random sequence can be predicted with high accuracy and good tracking performance. The ARMAX model extends the ARMA model with regression variables to improve prediction and to mitigate the problem of prediction lag. However, solving the ARMAX model is complex and difficult. Many studies are devoted to reducing this complexity and difficulty, and some scholars use intelligent optimization algorithms to calculate the parameters of the ARMA model.^27–29 To the same end, this article introduces the particle swarm optimization (PSO) algorithm to optimize the solution process and uses the optimized algorithm for travel time prediction.

In summary, it is very meaningful to make full use of the information provided by the toll data in China’s highway and to carry out the study of travel time prediction considering the impact of vehicle type or weather change. Moreover, the prediction results under different weather dimensions, vehicle type dimensions, and time dimensions are given. In this article, first, a multi-dimensional toll data warehouse is constructed based on the toll collection data and meteorological data. Second, the ARMAX model is established by setting traffic flow factors as the regression variables. Then, considering the complexity and solving difficulty, the PSO algorithm is used to optimize the solving process of model parameters. Finally, the travel time prediction under multi-dimensional factors can be achieved. The research content and framework of this article are shown in Figure 1.

Figure 1.

Research content and structure.

Multi-dimensional database model

This study was carried out relying on the highway toll data and meteorological data. In the record of highway toll data, each line corresponds to the complete entry and exit information of a single vehicle. The information is described by 11 fields. The names and meanings of the fields are shown in Table 1.

Table 1.

Correspondence of field name of original toll collection data and its meaning.

Number	Name	Meaning	Number	Name	Meaning
1	NET_NO	Exit network number	7	KEY_IN_ENTR_PLAZ	Entry station number
2	PLAZ_NO	Exit station number	8	DHM_ENTR	Entry time
3	LANE_NO	Exit lane number	9	BILL_CODE	Toll collection manner
4	DHM	Departure time	10	AXLES_DETECTED	Axle number
5	KEY_IN_TYPE	Vehicle type	11	WEIGHT_DETECTED	Axle load
6	ENTR_NET_NO	Entry network number

The travel time can be obtained using the departure time minus the entry time, as shown in equation (1)

TRAVEL_TIME=DHM - DHM_ENTR

(1)
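As a minimal sketch of equation (1), the following Python snippet computes the travel time of a single toll record. The field names DHM and DHM_ENTR come from Table 1; the timestamp layout and the sample record are assumptions made only for illustration, since the storage format of the original data is not stated here.

```python
from datetime import datetime

# Assumed timestamp layout; the real toll records may use a different format.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def travel_time_minutes(record):
    """Equation (1): TRAVEL_TIME = DHM - DHM_ENTR, returned in minutes."""
    exit_time = datetime.strptime(record["DHM"], TIME_FORMAT)
    entry_time = datetime.strptime(record["DHM_ENTR"], TIME_FORMAT)
    return (exit_time - entry_time).total_seconds() / 60.0


# Hypothetical record used only to exercise the function.
sample = {"DHM_ENTR": "2015-01-05 08:00:00", "DHM": "2015-01-05 08:42:30"}
print(travel_time_minutes(sample))  # 42.5
```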

The meteorological data of the highway mainly record the types of weather, degree of weather (such as heavy snow, moderate snow, and light snow), start time, end time, route name, start location, end location, and so on.

Analysis of the influencing factors of travel time

This article studies the prediction of highway travel time, considering the impact of weather, vehicle type, period, and other factors. Therefore, the establishment of a complete highway toll database is the basis for completing the study. To build the database, first, we need to analyze the influencing factors of travel time.

Driving conditions on the same road section differ with weather, vehicle type, and period, and travel time fluctuates to varying degrees. The factors that affect the travel time between highway stations are summarized as follows:

Time. When the traffic flow exceeds the road capacity, traffic congestion will happen. There are a cyclical peak period and a low period of road traffic flow in 1 day. Similar changes are also present on different days of a week. Travel time changes in a regular manner with time.

Vehicle type. Driving characteristics differ between vehicle types on the same journey. The distributions of speed and traffic volume of different vehicle types also change over time. For instance, compared with trucks, small cars are faster and their travel time is relatively short under the same conditions; during the night, the proportion of trucks is significantly higher than during the day.

Meteorological conditions. In the rain, snow, fog, and other meteorological conditions, highway traffic flow will be affected due to the reduced visibility, slippery road, and other reasons, resulting in the reduced driving speed or even congestion. Then travel time will increase.

The database model

The toll collection data record a large amount of commute information of single entrance–exit vehicles, including the entrance and exit gate names, the entrance and exit times, and the types of vehicles. In addition, the Highway Authority also generally records the meteorological monitoring data of road network, including rain, snow, fog weather types, starting and ending times, and starting and ending locations.

According to the analysis in the previous section, for the same link, the travel time varies to different degrees with vehicle type, time, and weather conditions. On this basis, a multi-dimensional database model is built. The period dimension, vehicle type dimension, and weather dimension are taken as the three basic dimensions of the model. In addition, hierarchical dimensions are set up within each dimension. For instance, the period dimension includes the dimension of the day and the dimension of the week, the weather dimension includes rain, snow, and fog dimensions, and the vehicle type dimension includes the size dimension and the bus and truck dimensions. Figure 2 shows the basic structure of the multi-dimensional database model, where the leftmost side represents the fields of a complete database record and the right side represents the various dimensions of the multi-dimensional database model.

Figure 2.

Model of the multi-dimensional data warehouse.

Data processing

Data transformation and data cleaning

The original data contain many records with malformed fields or abnormal values. Before a complete highway traffic database is established, erroneous records should be removed according to the following rules:¹⁷ (1) the entrance or exit time is abnormal, so that the travel time is obviously out of range (more than 1 day) or negative; (2) the record is a duplicate; and (3) records with the same entrance and exit times.

The toll collection data with any characteristic mentioned above should be removed for data cleaning.
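The cleaning rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the timestamp layout is assumed, unparseable timestamps are treated as "abnormal", and the rule on "the same entrance and exit times" is read here as a record whose entry and exit times are identical.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; adjust to the real toll data format.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean_records(records):
    """Drop records with abnormal times, out-of-range travel times, or duplicates."""
    seen = set()
    kept = []
    for rec in records:
        try:
            entry = datetime.strptime(rec["DHM_ENTR"], TIME_FORMAT)
            exit_ = datetime.strptime(rec["DHM"], TIME_FORMAT)
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed timestamps
        travel = exit_ - entry
        # Negative, zero (identical entry/exit time), or longer than 1 day.
        if travel <= timedelta(0) or travel > timedelta(days=1):
            continue
        key = tuple(sorted(rec.items()))  # whole record as a hashable key
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        kept.append(rec)
    return kept
```

Records are compared field by field for duplicate detection, so two vehicles that happen to share entry and exit times but differ in any other field are both kept.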

Data integration

The establishment of the multi-dimensional database model needs to match the toll data and meteorological data first. The mapping relationship between the weather and toll data can be established based on the core correlation parameters in the highway toll data and meteorological data. It specifically refers to the fields with the corresponding temporal and spatial meanings, including the vehicle entry time and the weather start time, the exit stake number of vehicle and the start stake number of weather, and so on.

Spatial matching

In the table of toll data, the vehicle’s inbound and outbound locations can be positioned by the KEY_IN_ENTR_PLAZ and PLAZ_NO fields. Moreover, in the table of weather data, the specific weather is positioned by the start location, the end location, and the route name fields. However, there is a limit that the weather data were only recorded in the same route. If a vehicle does not enter and leave on the same route, the meteorological conditions cannot be matched. Therefore, we only match and study the toll data records of vehicles that enter and leave the highway on the same route in this research.

Temporal match

In the table of toll data, the vehicle’s entry time and exit time are given by the DHM_ENTR and DHM fields. In the table of weather data, the period during which the weather occurs is given by the start time and end time fields. The temporal match can then be completed.

The influence degree of the meteorological factors

The matching rules of the toll collection data and the meteorological data are determined according to the influence degree of the meteorological factors influencing the travel route in the time and space scale, and the influence level is calculated as follows and marked in the database.

Assume that the initial influence level is 7, when the travel route is totally covered by the meteorological phenomenon in the time and space scale. Otherwise, after the determination of the space–time scale, the influence level may degrade; then, the final influence level is obtained and marked in the database as well. The process of obtaining the influence level is presented in Figure 3.

Figure 3.

Data integration flow chart.

First, the influence level on the time scale is considered. The travel route is regarded as totally covered on the time scale when the entrance time is within the 5-min interval of the meteorological starting time, and the influence level remains 7. If the entrance time is outside the 5-min interval, the influence level declines by one level when the time scale coverage is more than 2/3, by two levels when the coverage is between 1/3 and 2/3, and by three levels when the coverage is less than 1/3.

After the time scale determination, the space scale determination is required. The travel route is regarded as totally covered on the space scale when the entrance station is within 2 km of the meteorological starting point. Otherwise, the influence level is reduced further: by one level when the space scale coverage is more than 2/3, by two levels when the coverage is between 1/3 and 2/3, and by three levels when the coverage is less than 1/3.

After both determinations, the resulting value is written to the database to represent the influence level of the meteorological factors.
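The grading rule can be illustrated with a short Python sketch. Several details are assumptions not fixed by the text: coverage is taken as the fraction of the trip interval (in time or along the route) overlapped by the weather record, the boundary values 2/3 and 1/3 are assigned to the milder grade, and the dictionary keys are hypothetical names. Trips that do not overlap the weather record at all are assumed to have been excluded beforehand.

```python
def overlap_fraction(a_start, a_end, b_start, b_end):
    """Fraction of [a_start, a_end] that is covered by [b_start, b_end]."""
    length = a_end - a_start
    if length <= 0:
        return 0.0
    overlap = min(a_end, b_end) - max(a_start, b_start)
    return max(0.0, overlap) / length


def decline(coverage):
    """Number of levels to subtract for a given coverage (boundaries assumed)."""
    if coverage >= 2 / 3:
        return 1
    if coverage >= 1 / 3:
        return 2
    return 3


def influence_level(trip, weather, time_tol_min=5.0, dist_tol_km=2.0):
    """Start from level 7, then degrade on the time scale and the space scale."""
    level = 7
    # Time scale: entry time compared with the weather start time (minutes).
    if abs(trip["entry_min"] - weather["start_min"]) > time_tol_min:
        level -= decline(overlap_fraction(
            trip["entry_min"], trip["exit_min"],
            weather["start_min"], weather["end_min"]))
    # Space scale: entry station compared with the weather start stake (km).
    if abs(trip["entry_km"] - weather["start_km"]) > dist_tol_km:
        t_lo, t_hi = sorted((trip["entry_km"], trip["exit_km"]))
        w_lo, w_hi = sorted((weather["start_km"], weather["end_km"]))
        level -= decline(overlap_fraction(t_lo, t_hi, w_lo, w_hi))
    return level
```

For example, a trip that enters 30 min after the weather starts and is half covered in both time and space would be graded 7 - 2 - 2 = 3 under these assumptions.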

Data filtering

Based on the literature,¹⁷ this article presents an improved quartile screening method. First, a relaxation coefficient is set for rough screening. Then the coefficient multiplication method is used to set the upper limit of the interval. Combined with the actual speed limit and mileage of the road link, the lower limit of the interval is set to complete the fine screening. The principle is shown as equations (2)–(8)

T_{mean_before} = \frac{T_{1} + T_{2} + \dots + T_{N}}{N}

(2)

T_{interval_before} = [0, C_{1} \times T_{mean_before}]

(3)

T_{mean} = \frac{T_{1} + T_{2} + \dots + T_{R}}{R}

(4)

T_{interval} = [T_{lower}, T_{upper}]

(5)

T_{lower} = \frac{CHARGR_MILEAGE}{v_{upper_limit}}

(6)

T_{upper} = C_{2} \times T_{mean}

(7)

C_{2} = \frac{T_{mean}}{T_{lower}}

(8)

where $N$ is the number of vehicles before rough screening, $T_{mean_before}$ is the average travel time before rough screening, and $T_{interval_before}$ is the travel time interval used for rough screening. $C_{1}$ is the rough screening coefficient and $R$ is the number of vehicles remaining after rough screening. $T_{mean}$ is the average travel time after rough screening. $T_{interval}$ is the effective travel time interval, and a vehicle travel time $T_{i} (i = 1, 2, \dots, R)$ is valid when it falls within this interval. $T_{upper}$ and $T_{lower}$ are the upper and lower limits of the effective interval, respectively. $CHARGR_MILEAGE$ is the charge mileage. $v_{upper_limit}$ is the maximum speed limit for the highway, which is usually set at 120 km/h. $T_{lower}$ is therefore the shortest theoretical travel time, and $C_{2}$ is the screening coefficient for the upper limit.
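Equations (2)–(8) can be sketched as a two-stage filter in Python. The default value of the rough screening coefficient (`c1=2.0`) is a placeholder, since no value is given above, and travel times are assumed to be in minutes with the mileage in kilometers.

```python
def improved_quartile_filter(travel_times, mileage_km, c1=2.0, v_upper_kmh=120.0):
    """Two-stage screening following equations (2)-(8); times in minutes."""
    if not travel_times:
        raise ValueError("travel_times must not be empty")
    # Rough screening, equations (2)-(3).
    mean_before = sum(travel_times) / len(travel_times)
    rough_upper = c1 * mean_before
    rough = [t for t in travel_times if 0 <= t <= rough_upper]
    # Fine screening, equations (4)-(8).
    t_mean = sum(rough) / len(rough)
    t_lower = mileage_km / v_upper_kmh * 60.0  # shortest theoretical time
    c2 = t_mean / t_lower
    t_upper = c2 * t_mean
    kept = [t for t in rough if t_lower <= t <= t_upper]
    return kept, (t_lower, t_upper)


# Hypothetical travel times (min) on a 30 km link: 5 and 200 are outliers.
times = [20, 22, 21, 23, 24, 19, 5, 200]
kept, (lower, upper) = improved_quartile_filter(times, mileage_km=30.0)
print(kept, lower, upper)
```

In this example the value 200 is removed by the rough screening, and the value 5 is removed by the fine screening because it is below the 15 min needed to cover 30 km at 120 km/h.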

The original toll collection data of the Zhoushuizi–Jinzhou section on the Shenyang–Haikou expressway were used to verify the feasibility of the method. The outliers, that is, the travel time data noise, are removed using the improved quartile screening. The data of 320 cars were filtered out by coarse screening, 131 car data were filtered out by fine filter, 107 truck data were filtered out by coarse screening, and 131 car data were filtered out by fine filter. Taking the data of 1 day as the example, the travel time series of the target link is shown in Figure 4(a). Two distinct outliers are circled. The filtered travel time series obtained by the improved quartile method is shown in Figure 4(b). Two outliers are filtered. It can be seen that the data filtering method can remove the outlier effectively.

Figure 4.

Travel time data processing: (a) before filtering and (b) after filtering.

Time series extraction

It is necessary to select a suitable time scale when obtaining the travel time series. The time periods of 10 and 15 min were set as the periods of cycle, respectively, and the time series were arranged in chronological order according to the entrance time. The travel time series of different time scales (10 and 15 min) are shown in Figure 5(a) and (b), respectively. Based on Figure 5(a) and (b), it can be seen that the 15-min period time series weakens the randomness of the data, truly reflects the variation characteristics of the travel time, and preserves the fluctuation characteristics. The 15-min period is enough to meet the actual prediction demand. Therefore, 15 min is selected as the time series cycle period in this article.

Figure 5.

Time series of different time scales: (a) 10-min time series and (b) 15-min time series.
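A minimal Python sketch of the series extraction is given below. It assumes that each period is represented by the mean travel time of the vehicles entering in that period; the aggregation statistic is not stated explicitly in the text, so this is an assumption, as is the input layout (entry minute of the day, travel time).

```python
from collections import defaultdict


def travel_time_series(records, period_min=15):
    """Group (entry_minute, travel_time) pairs into fixed periods by entry time.

    Returns a chronologically ordered list of (period_start_minute, mean_time).
    Periods without any vehicle are simply absent from the result.
    """
    buckets = defaultdict(list)
    for entry_min, travel_time in records:
        buckets[int(entry_min // period_min)].append(travel_time)
    return [(idx * period_min, sum(vals) / len(vals))
            for idx, vals in sorted(buckets.items())]


# Hypothetical records: (entry minute of the day, travel time in minutes).
demo = [(0, 20), (14, 22), (15, 30), (44, 25)]
print(travel_time_series(demo))              # 15-min series
print(travel_time_series(demo, period_min=10))  # 10-min series
```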

Travel time prediction principle

The ARMAX model

${Y_{t}}$ represents the observed time series and ${e_{t}}$ represents the unobserved white noise series. The general linear process ${Y_{t}}$ can then be expressed as a weighted linear combination of present and past white noise variables, as shown in equation (9)

Y_{t} = e_{t} + ψ_{1} e_{t - 1} + ψ_{2} e_{t - 2} + \dots

(9)

When only a finite number of the coefficients $ψ$ are non-zero, the so-called moving average process is obtained as equation (10)

Y_{t} = e_{t} - θ_{1} e_{t - 1} - θ_{2} e_{t - 2} - \dots - θ_{q} e_{t - q}

(10)

which is named as the q-order moving average process, abbreviated as $MA (q)$ .

The series themselves are used as the regression variables. Specifically, the p-order autoregressive process ${Y_{t}}$ satisfies equation (11)

Y_{t} = ϕ_{1} Y_{t - 1} + ϕ_{2} Y_{t - 2} + \dots + ϕ_{p} Y_{t - p} + e_{t}

(11)

If it is assumed that part of the series is autoregressive and the other part is moving average, a general time series model can be obtained as equation (12)

\begin{matrix} Y_{t} = ϕ_{1} Y_{t - 1} + ϕ_{2} Y_{t - 2} + \dots + ϕ_{p} Y_{t - p} \\ + e_{t} - θ_{1} e_{t - 1} - θ_{2} e_{t - 2} - \dots - θ_{q} e_{t - q} \end{matrix}

(12)

where the orders of the autoregressive moving average mixed process ${Y_{t}}$ are $p$ and $q$ , abbreviated as $ARMA (p, q)$ .
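To make the recursion in equation (12) concrete, the following Python sketch generates an ARMA(p, q) series from a given noise sequence. Zero initial conditions are assumed for the values before the start of the series, and the coefficient values in the example are arbitrary.

```python
def arma_series(phi, theta, noise):
    """Generate Y_t from equation (12) with zero initial conditions.

    phi   : [phi_1, ..., phi_p]     autoregressive coefficients
    theta : [theta_1, ..., theta_q] moving average coefficients
    noise : [e_0, e_1, ...]         white noise realisation
    """
    y = []
    for t, e_t in enumerate(noise):
        ar_part = sum(phi[i] * y[t - 1 - i]
                      for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * noise[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(ar_part + e_t - ma_part)
    return y


# Response of an ARMA(1, 1) process to a unit impulse (arbitrary coefficients).
print(arma_series([0.5], [0.4], [1.0, 0.0, 0.0, 0.0]))
```

Setting `theta=[]` gives the pure autoregressive process of equation (11), and setting `phi=[]` gives the moving average process of equation (10).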

The introduction of multiple correlation time series can help increase the fitting effect of the ARMA model and improve the prediction accuracy. The ARMAX model has the specific structure as follows

\begin{cases} y_{t} = μ + \sum_{i = 1}^{k} \frac{Θ_{i} (B)}{Φ_{i} (B)} B^{l_{i}} x_{it} + ε_{t} \\ ε_{t} = \frac{Θ (B)}{Φ (B)} a_{t} \end{cases}

(13)

In equation (13), $B$ is the backshift operator, $Φ_{i} (B)$ is the autoregressive coefficient polynomial of the ith input variable, and $Θ_{i} (B)$ is the moving average coefficient polynomial of the ith input variable. $l_{i}$ represents the delay order of the ith input variable. ${ε_{t}}$ represents the regression residual series. $Φ (B)$ is the autoregressive coefficient polynomial of the residual series. $Θ (B)$ is the moving average coefficient polynomial of the residual series. ${a_{t}}$ denotes the white noise series, the mean of which is zero. ${x_{it}}$ represents the multiple correlation time series.

Establishment of the ARMAX forecasting model

The basic idea of the ARMAX prediction model is as follows: first, the regression analysis of the relevant sequence and time series was done, obtaining the residuals and the linear relationship between the time series and its related series. Next, the ARMA model is used to model the residuals and then the short-time series is extracted as the model input. The prediction results of the two models are fused as the predicted results. This article extracts traffic flow series as the related series of the ARMAX model. The prediction flow chart is shown in Figure 6. This article mainly introduces the process of modeling the residual sequence using the ARMA model.

Figure 6.

Prediction principle of the ARMAX model.

In general, the establishment of the ARMA model needs to go through the following steps: first, calculate the sample autocorrelation coefficient (ACF) values and the sample partial autocorrelation coefficient (PACF) values of the observed sequence. Second, perform model identification. According to the ACF and PACF values obtained in the previous step, the ARMA(p,q) model is fitted with the appropriate order. Third, estimate the value of the unknown parameter in the model. Fourth, verify the validity of the model. Finally, use the fitting model to predict the future trend of the sequence.

Model identification and parameter estimation

First, the sample ACF $(ρ_{k}, 0 < k < n)$ values and the sample PACF $(ϕ_{kk}, 0 < k < n)$ values can be calculated according to the value of the sequence

ρ_{k} = \frac{\sum_{t = 1}^{n - k} (y_{t} - \bar{y}) (y_{t + k} - \bar{y})}{\sum_{t = 1}^{n} {(y_{t} - \bar{y})}^{2}}, \forall 0 < k < n

(14)

\begin{matrix} ϕ_{kk} = \begin{cases} ρ_{1}, & k = 1 \\ \frac{ρ_{k} - \sum_{j = 1}^{k - 1} ϕ_{k - 1, j} ρ_{k - j}}{1 - \sum_{j = 1}^{k - 1} ϕ_{k - 1, j} ρ_{j}}, & k = 2, 3, \dots \end{cases} \\ ϕ_{kj} = ϕ_{k - 1, j} - ϕ_{kk} ϕ_{k - 1, k - j} \end{matrix}

(15)

The order of the model can be judged by the truncation characteristic of the ACF and the sample PACF.
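Equations (14) and (15) translate directly into code. The sketch below is a plain Python illustration of the two formulas; the function names are chosen here for illustration only.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelation rho_k for k = 1..max_lag, equation (14)."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    return [sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def sample_pacf(rho):
    """Partial autocorrelations phi_kk from rho_1..rho_K, equation (15).

    rho[k - 1] holds rho_k; the result holds phi_11, phi_22, ..., phi_KK.
    """
    pacf = []
    prev = []  # prev[j - 1] = phi_{k-1, j}
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            cur = [phi_kk]
        else:
            num = rho[k - 1] - sum(prev[j - 1] * rho[k - j - 1] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * rho[j - 1] for j in range(1, k))
            phi_kk = num / den
            # phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
            cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: the PACF cuts off after lag 1.
print(sample_pacf([0.5, 0.25, 0.125]))
```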

In practice, because of the randomness of the sample, the correlation coefficient does not show the perfect situation of theoretical truncation. So a certain principle is needed to assist the order determination of the model. Quenouille proved that the sample ACF and the PACF obey the normal distribution

P (- \frac{2}{\sqrt{n}} \leq ρ_{k} \leq \frac{2}{\sqrt{n}}) \geq 0.95

(16)

P (- \frac{2}{\sqrt{n}} \leq ϕ_{kk} \leq \frac{2}{\sqrt{n}}) \geq 0.95

(17)

A range of two standard deviations can be used to judge truncation. If the first d sample ACF values are significantly greater than two standard deviations, almost 95% of the subsequent ACF values fall within two standard deviations, and the decay from significantly non-zero values to small values is very sudden, the ACF is usually regarded as truncated, and the truncation order is d. If more than 5% of the subsequent sample correlation coefficients fall outside two standard deviations, or if the decay from significantly non-zero values to small values is relatively slow or continuous, the correlation coefficient is not regarded as truncated.
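The truncation check can be approximated by a simple rule in Python. This is a simplified reading of the criterion, not a replacement for visual inspection: it looks for the smallest d such that the first d coefficients all lie outside the band ±2/√n and at least 95% of the remaining coefficients lie inside it. The "suddenness" of the decay is not tested separately.

```python
import math


def truncation_order(coeffs, n, inside_share=0.95):
    """Return the truncation order d, or None if no truncation is detected.

    coeffs[k - 1] is the sample ACF (or PACF) at lag k; n is the series length.
    """
    bound = 2.0 / math.sqrt(n)
    for d in range(len(coeffs)):
        head, tail = coeffs[:d], coeffs[d:]
        head_significant = all(abs(c) > bound for c in head)
        inside = sum(1 for c in tail if abs(c) <= bound)
        if head_significant and inside >= inside_share * len(tail):
            return d
    return None


# Hypothetical coefficients for a series of length 100 (band is +/-0.2).
print(truncation_order([0.8, 0.5, 0.05, -0.03, 0.02, 0.01], n=100))  # 2
print(truncation_order([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], n=100))       # None
```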

The maximum likelihood estimation method is used to estimate the parameters. Under the maximum likelihood criterion, the likelihood function of the data sequence is given as follows

l (\tilde{β}, \tilde{y}) = - \frac{n}{2} \ln (2 π) - \frac{n}{2} \ln (σ_{e}^{2}) - \frac{1}{2} \ln | Ω | - \frac{1}{2 σ_{e}^{2}} S (\tilde{β})

(18)

where $\tilde{β} = [a_{1}, a_{2}, \dots, a_{p}, b_{1}, b_{2}, \dots, b_{q}]^{T}$ ; $\tilde{y} = [y_{1}, y_{2}, \dots, y_{n}]^{T}$ ; $Ω = E (\tilde{y} \tilde{y}^{T}) / σ_{e}^{2}$ ; $S (\tilde{β}) = \tilde{y}^{T} Ω^{- 1} \tilde{y}$ ; and $σ_{e}^{2}$ is the noise variance.

The partial derivatives of the log-likelihood with respect to the unknown parameters $σ_{e}^{2}$ and $\tilde{β}$ are calculated separately

\begin{cases} \frac{\partial}{\partial σ_{e}^{2}} l (\tilde{β}, \tilde{y}) = - \frac{n}{2 σ_{e}^{2}} + \frac{1}{2 σ_{e}^{4}} S (\tilde{β}) \\ \frac{\partial}{\partial \tilde{β}} l (\tilde{β}, \tilde{y}) = - \frac{1}{2} \frac{\partial \ln | Ω |}{\partial \tilde{β}} - \frac{1}{2 σ_{e}^{2}} \frac{\partial S (\tilde{β})}{\partial \tilde{β}} \end{cases}

(19)

Setting them equal to zero and simplifying gives

\begin{cases} S (\tilde{β}) - n σ_{e}^{2} = 0 \\ \frac{\partial \ln | Ω |}{\partial \tilde{β}} + \frac{1}{σ_{e}^{2}} \frac{\partial S (\tilde{β})}{\partial \tilde{β}} = 0 \end{cases}

(20)

Since $Ω$ determines the value of $S (\tilde{β})$ , the key to solving formula (20) is to find the relationship between $Ω$ and $\tilde{β}$ . In fact, $Ω$ is uniquely determined by $\tilde{β}$

Ω = \begin{bmatrix} \sum_{i = 0}^{\infty} G_{i}^{2} & \sum_{i = 0}^{\infty} G_{i} G_{i + 1} & \dots & \sum_{i = 0}^{\infty} G_{i} G_{i + n - 1} \\ \sum_{i = 0}^{\infty} G_{i} G_{i + 1} & \sum_{i = 0}^{\infty} G_{i}^{2} & \dots & \sum_{i = 0}^{\infty} G_{i} G_{i + n - 2} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \sum_{i = 0}^{\infty} G_{i} G_{i + n - 1} & \sum_{i = 0}^{\infty} G_{i} G_{i + n - 2} & \dots & \sum_{i = 0}^{\infty} G_{i}^{2} \end{bmatrix}

(21)

where $G_{i}$ is called Green’s function of the ARMA(p,q) model and is obtained recursively from the p + q model parameters. Formula (20) consists of p + q + 1 transcendental equations. Denote the corresponding solution and objective functions by $Y = [a_{1}, a_{2}, \dots, a_{p}, b_{1}, b_{2}, \dots, b_{q}, σ_{e}^{2}]$ and $F_{i} (Y), i = 1, 2, \dots, p + q + 1$ , respectively.
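The recursive computation of Green’s function and the construction of $Ω$ in equation (21) can be sketched in Python as follows. The recursion uses the sign convention of equation (12), with `phi` and `theta` standing for the autoregressive and moving average parameters. The infinite sums are truncated after a fixed number of terms, which is an approximation that is only reasonable for a stationary model; the truncation length is a hypothetical default.

```python
def green_function(phi, theta, count):
    """First `count` values of Green's function G_0, G_1, ... of ARMA(p, q).

    G_0 = 1 and G_j = sum_{i=1}^{min(j, p)} phi_i * G_{j-i} - theta_j,
    where theta_j is taken as 0 for j > q.
    """
    g = [1.0]
    for j in range(1, count):
        value = sum(phi[i - 1] * g[j - i] for i in range(1, min(j, len(phi)) + 1))
        if j <= len(theta):
            value -= theta[j - 1]
        g.append(value)
    return g


def omega_matrix(phi, theta, n, terms=200):
    """Approximate the n x n matrix of equation (21) with truncated sums."""
    g = green_function(phi, theta, terms + n)
    gamma = [sum(g[i] * g[i + k] for i in range(terms)) for k in range(n)]
    return [[gamma[abs(r - c)] for c in range(n)] for r in range(n)]


# AR(1) with coefficient 0.5: G_j = 0.5 ** j, so the diagonal is 1 / (1 - 0.25).
print(green_function([0.5], [], 4))
print(omega_matrix([0.5], [], 2))
```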

The PSO-ARMAX prediction model

Maximum likelihood estimation can take full advantage of the relationships within the sample sequence, but the transcendental equations of this model are complex. Moreover, they are very difficult to solve and have no closed-form (analytical) solution. Therefore, this article uses an intelligent optimization algorithm: the PSO algorithm is used to solve the maximum likelihood equations, and the numerical solution of the model parameters is thereby obtained.

The basic principle of PSO is as follows.

In the $R^{n}$ space, a swarm contains N particles, and the position of each particle $Y_{i} = [y_{i 1}, y_{i 2}, \dots, y_{in}]^{T} (i = 1, 2, \dots, N)$ is a possible solution of the optimization problem. Assume that the position of particle i in generation k is $Y_{i}^{(k)} = [y_{i 1}^{(k)}, y_{i 2}^{(k)}, \dots, y_{in}^{(k)}]^{T} (i = 1, 2, \dots, N)$ , its velocity is $V_{i}^{(k)} = [v_{i 1}^{(k)}, v_{i 2}^{(k)}, \dots, v_{in}^{(k)}]^{T}$ , the local optimal position of each particle is $P_{i}^{(k)} = [p_{i 1}^{(k)}, p_{i 2}^{(k)}, \dots, p_{in}^{(k)}]^{T}$ , and the global optimal position of the swarm is $P_{g}^{(k)} = [p_{g 1}^{(k)}, p_{g 2}^{(k)}, \dots, p_{gn}^{(k)}]^{T}$ . The velocity update from generation k to generation k + 1 is

\begin{matrix} V_{i}^{(k + 1)} = ω V_{i}^{(k)} + c_{1} r_{i 1} (P_{i}^{(k)} - Y_{i}^{(k)}) \\ + c_{2} r_{i 2} (P_{g}^{(k)} - Y_{i}^{(k)}), i = 1, 2, \dots, N \end{matrix}

(22)

where $c_{1}$ and $c_{2}$ are the learning factors; $r_{i 1}$ and $r_{i 2}$ are random numbers distributed uniformly on [0, 1]; and $ω$ is the inertia weight.

The position update from generation k to generation k + 1 is

Y_{i}^{(k + 1)} = Y_{i}^{(k)} + V_{i}^{(k + 1)}

(23)

To drive every objective function toward $F_{i} (Y) = 0$ , the numerical optimization should make the largest $| F_{i} (Y) |$ as small as possible, so the fitness function of the PSO can be taken as

\begin{cases} \min_{Y \in R^{p + q + 1}} \max_{1 \leq i \leq p + q + 1} | F_{i} (Y) | \\ \text{s.t. } Y \in [Y_{0} - d, Y_{0} + d] \end{cases}

(24)

where $Y_{0}$ is the initial value of the optimization, obtained by the moment estimation method, and $d$ is the radius of the search.

Based on the above ideas, the solution space of the particle swarm is $ω$ . The estimated value of the parameter obtained by the moment estimation method is used as the initial value of the PSO algorithm. The iteration of the fitness function with two unknown parameters is repeated. When the predetermined accuracy is reached, the iteration will be stopped.

The specific process of the algorithm proposed in this article is shown in Figure 6, where Gbest represents the global optimal position of the swarm and Pbest represents the local optimal position of each particle.
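The update rules (22) and (23) together with the fitness function (24) can be sketched in Python as below. This is an illustrative implementation under several assumptions, not the exact procedure of this article: the parameter values (`w`, `c1`, `c2`, swarm size, iteration count) are placeholders, random numbers are drawn per dimension (a common variant of equation (22)), positions are clipped to the search box, and the two residual functions in the example are toy stand-ins for the likelihood equations.

```python
import random


def pso_minimax(residuals, y0, d, n_particles=30, n_iter=200,
                w=0.7, c1=1.5, c2=1.5, tol=1e-8, seed=0):
    """Minimise max_i |F_i(Y)| over the box [y0 - d, y0 + d], equation (24)."""
    rng = random.Random(seed)
    dim = len(y0)
    lo = [v - d for v in y0]
    hi = [v + d for v in y0]

    def fitness(y):
        return max(abs(f) for f in residuals(y))

    # The first particle starts at the initial estimate y0 (e.g. moment estimate).
    pos = [list(y0)] + [[rng.uniform(lo[j], hi[j]) for j in range(dim)]
                        for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        if gbest_fit < tol:  # predetermined accuracy reached
            break
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))       # eq. (22)
                pos[i][j] = min(hi[j], max(lo[j], pos[i][j] + vel[i][j]))  # eq. (23)
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy system F_1 = y_0 - 1, F_2 = y_1 + 2 (hypothetical, solution at (1, -2)).
solution, error = pso_minimax(lambda y: [y[0] - 1.0, y[1] + 2.0], [0.0, 0.0], d=3.0)
print(solution, error)
```

Because the first particle starts at `y0`, the returned fitness can never be worse than that of the initial estimate, which matches the role of the moment estimate as the starting point of the search.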

Evaluation index of prediction error

According to the literature,⁸ this article mainly uses the absolute percentage error and mean square error as the error evaluation indices of travel time prediction, which are calculated as follows

APE (t) = \frac{| T_{Y} (t) - T (t) |}{T (t)} \times 100 \%

(25)

SPE (t) = \frac{{(T_{Y} (t) - T (t))}^{2}}{T (t)} \times 100 \%

(26)

MAPE = \frac{1}{L} \sum_{t = 1}^{L} \frac{| T_{Y} (t) - T (t) |}{T (t)} \times 100 \%

(27)

MSPE = \frac{1}{L} \sum_{t = 1}^{L} \frac{{(T_{Y} (t) - T (t))}^{2}}{T (t)} \times 100 \%

(28)

Here, $T (t)$ represents the real travel time and $T_{Y} (t)$ represents the predicted travel time. $APE (t)$ is the absolute percentage error at prediction moment $t$ and $SPE (t)$ is the squared percentage error at prediction moment $t$ . $L$ represents the length of one cycle period, the mean absolute percentage error (MAPE) is the mean of $APE (t)$ over one cycle period, and the mean square percentage error (MSPE) is the mean of $SPE (t)$ over one cycle period.
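The four indices in equations (25) to (28) translate directly into code. The sketch below follows the formulas as written, including the division of the squared error by $T (t)$ in equation (26); the function names are illustrative, and $L$ is taken to be the number of prediction moments passed in.

```python
def ape(predicted, real):
    """Equation (25): absolute percentage error at one prediction moment."""
    return abs(predicted - real) / real * 100.0


def spe(predicted, real):
    """Equation (26): squared error divided by the real travel time, in percent."""
    return (predicted - real) ** 2 / real * 100.0


def mape(predicted_series, real_series):
    """Equation (27): mean of APE(t) over the L prediction moments."""
    pairs = list(zip(predicted_series, real_series))
    return sum(ape(p, r) for p, r in pairs) / len(pairs)


def mspe(predicted_series, real_series):
    """Equation (28): mean of SPE(t) over the L prediction moments."""
    pairs = list(zip(predicted_series, real_series))
    return sum(spe(p, r) for p, r in pairs) / len(pairs)
```

For instance, a predicted travel time of 110 against a real travel time of 100 (in any common unit) gives an APE of 10%.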

Test case of the prediction method

Three to five cycles of sample data are appropriate for model training. Therefore, the first five cycle periods of data were selected as the training samples of the prediction model, and the training samples are updated as the prediction moment advances. In this way, short-term, real-time prediction of travel time can be obtained.
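The rolling update of the training samples can be illustrated with a sliding window. This is only a sketch: it assumes that each element of the series is one cycle period, and `mean_predictor` is a hypothetical stand-in for the fitted prediction model, used here only so that the example runs.

```python
def rolling_forecast(series, fit_predict, window=5):
    """Predict each value from the `window` values that precede it.

    Returns a list of (t, prediction, actual) tuples for t >= window.
    """
    results = []
    for t in range(window, len(series)):
        train = series[t - window:t]  # the window slides with t
        results.append((t, fit_predict(train), series[t]))
    return results


def mean_predictor(train):
    """Placeholder model: predicts the mean of the training window."""
    return sum(train) / len(train)
```

Calling `rolling_forecast(travel_times, mean_predictor)` on a list of travel times yields one prediction per moment after the first five, each based only on the five most recent observations.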

Road sections for experiment

Several sections of the Shenyang–Haikou expressway (G15) are adopted for the experiment. Toll stations on the experiment road are shown in Figure 7.

Figure 7.

Toll stations on the experiment road.

Prediction results and error analysis

The travel time series of the Zhoushuizi–Jinzhou section and the Jinzhou–Sanshilipu section during 21 December 2014–21 January 2015 were extracted and used for rolling prediction to verify the validity of the prediction model. First, the ARMA model was built. Then the associated traffic flow series were used as the regression variable to construct the ARMAX model. The prediction accuracy was evaluated, and the prediction effectiveness of the PSO-ARMAX model under different conditions was analyzed. The prediction results for 5 January 2015, a day with several hours of snow, are chosen as the example to show more detail.

Regression variable settings

With 15 min selected as the cycle period of the traffic flow series, the entrance traffic flows were counted and arranged into a traffic flow series according to entrance time. A cross-correlation analysis of the travel time and traffic flow series was then performed. The cross-correlation can be calculated using equation (29)

R (τ) = \frac{1}{N} \sum_{t = 1}^{N} X (t) Y (t + τ)

(29)

where $X (t)$ and $Y (t)$ represent two time series of equal length, $N$ represents the number of time series terms, and $R (τ)$ represents the cross-correlation at lag $τ$ , which describes the degree of correlation between $X (t)$ and $Y (t)$ .
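Equation (29) can be evaluated directly for a given lag. The sketch below is an assumption-laden illustration: it uses zero-based indices, and it drops the terms for which $t + τ$ falls outside the observed series, a boundary convention that the equation itself does not specify.

```python
def cross_correlation(x, y, tau):
    """Equation (29): (1/N) * sum over t of x[t] * y[t + tau].

    Terms whose shifted index t + tau lies outside the series are skipped
    (an assumed boundary convention).
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have equal length")
    total = 0.0
    for t in range(n):
        s = t + tau
        if 0 <= s < n:
            total += x[t] * y[s]
    return total / n
```

Evaluating the function over a range of positive and negative lags and locating the largest value indicates which series leads the other.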

R software was used to analyze the cross-correlation between the travel time and traffic flow series of cars and trucks; the cross-correlation function graphs are shown in Figure 8(a) and (b). The two time series are correlated to a certain degree. In addition, the travel time series lags the flow series, so the flow series can be used as the regression variable to construct the ARMAX model.

Figure 8.

Correlation of travel time series and traffic flow series: (a) correlation coefficient of cars and (b) correlation coefficient of trucks.

Results and error analysis

Take the Jinzhou–Sanshilipu section as the target. The prediction results of trucks using the ARMAX model and the PSO-ARMAX model are shown in Figure 9. The solid line represents the real travel time, the dashed line represents the predicted travel time using the ARMAX model, and the circular marker lines represent the predicted travel time using the PSO-ARMAX model. The prediction errors are shown in Figure 10.

Figure 9.

The ARMAX and PSO-ARMAX model prediction results for the Jinzhou–Sanshilipu section on 5 January 2015.

Figure 10.

Prediction error of the ARMAX and PSO-ARMAX models for the Jinzhou–Sanshilipu section on 5 January 2015.

The prediction process was carried out using the time series of a whole month. The prediction error of the whole month is shown in Table 2.

Table 2.

Prediction error evaluation for the Jinzhou–Sanshilipu section.

Model	Vehicle type	Error interval	MAPE (%)	MSPE (%)
ARMAX	Cars	[−16.98%, 14.50%]	9.55	11.66
ARMAX	Trucks	[−17.89%, 16.51%]	10.92	13.28
PSO-ARMAX	Cars	[−9.81%, 8.29%]	4.71	6.70
PSO-ARMAX	Trucks	[−10.79%, 11.34%]	6.63	9.08

ARMAX: autoregressive moving average with exogenous input; PSO-ARMAX: particle swarm optimization–autoregressive moving average with exogenous input; MAPE: mean absolute percentage error; MSPE: mean square percentage error.

Take the Zhoushuizi–Jinzhou section as the target. The prediction results of trucks using the ARMAX model and the PSO-ARMAX model are shown in Figure 11. The solid line represents the real travel time, the dashed line represents the predicted travel time using the ARMAX model, and the circular marker lines represent the predicted travel time using the PSO-ARMAX model. The prediction errors are shown in Figure 12.

Figure 11.

The ARMAX and PSO-ARMAX model prediction results for the Zhoushuizi–Jinzhou section on 5 January 2015.

Figure 12.

Prediction error of the ARMAX and PSO-ARMAX models for the Zhoushuizi–Jinzhou section on 5 January 2015.

The prediction process was carried out using the time series of a whole month. The prediction errors of different vehicle types using the different models are shown in Table 3.

Table 3.

Prediction error evaluation for the Zhoushuizi–Jinzhou section.

Model	Vehicle type	Error interval	MAPE (%)	MSPE (%)
ARMAX	Cars	[−16.72%, 14.20%]	8.85	10.97
ARMAX	Trucks	[−17.61%, 15.98%]	10.23	12.58
PSO-ARMAX	Cars	[−9.62%, 8.20%]	4.05	6.21
PSO-ARMAX	Trucks	[−10.51%, 11.19%]	6.18	8.58
Method in Chen et al.²⁹	Cars	[−13.75%, 12.81%]	7.91	9.31
Method in Chen et al.²⁹	Trucks	[−14.07%, 16.83%]	9.76	11.04

It can be seen from the above tables and figures that both the ARMAX and PSO-ARMAX models give acceptable predictions, and that the PSO-ARMAX model gives better prediction results than the ARMAX model.

Then, we compared our method with that reported in Chen et al.,²⁹ which uses intelligent optimization algorithms to calculate the parameters. The prediction results and errors are shown in Figures 13 and 14. The prediction error for a whole month is shown in Table 3. It can be seen that the prediction accuracy of the PSO-ARMAX method is higher than that of the method reported in Chen et al.²⁹

Figure 13.

Prediction results of the PSO-ARMAX model and the method reported in Chen et al.²⁹ for the Zhoushuizi–Jinzhou section.

Figure 14.

Prediction errors of the method reported in Chen et al.²⁹ and the PSO-ARMAX model for the Zhoushuizi–Jinzhou section.

Take the travel time forecast for 6:00–22:00 on 5 January 2015 as an example. According to the weather monitoring record, there was snow during 14:30–19:00. The prediction results of the PSO-ARMAX model are shown in Figure 15 for cars and in Figure 16 for trucks. The solid line represents the real travel time, and the dashed line represents the travel time predicted by the PSO-ARMAX model. It can be seen that the PSO-ARMAX model gives good prediction results. For cars, the travel time increased significantly during the snow period, and the forecast model detected this trend and followed the real travel time. This suggests that the model can also be applied under accident or congestion conditions with good predictive effect. For trucks, the snowy weather had little effect on the overall travel time, and the PSO-ARMAX model gave acceptable predictions.

Figure 15.

Travel time prediction of cars for the Zhoushuizi–Jinzhou section.

Figure 16.

Travel time prediction of trucks for the Zhoushuizi–Jinzhou section.

Error analysis and model comparison

A comparison of the prediction errors shows that the prediction accuracy for cars is always higher than that for trucks. The main reason is that the proportion of trucks in traffic is lower than that of cars, so fewer toll collection records are available for trucks. Because of this scarcity, even after interpolation the truck data remain sparse and the travel time data fluctuate strongly; they cannot fully reflect the real road condition, which indirectly reduces the prediction effect of the model.

The results show that the PSO-ARMAX model is better than the ARMAX model. The mean absolute percentage error of the road section between two neighboring toll stations is reduced by almost 5 percentage points (Tables 2 and 3), a significant improvement in prediction accuracy. In this article, we focused on the travel time prediction of road sections, and the improvement effect is limited because of the short distance.
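The size of the improvement can be checked from the MAPE columns of Tables 2 and 3. The short script below copies those values and computes the difference between the ARMAX and PSO-ARMAX models for each section and vehicle type; the variable names are illustrative.

```python
# MAPE (%) copied from Tables 2 and 3 as (ARMAX, PSO-ARMAX).
mape_by_case = {
    ("Jinzhou-Sanshilipu", "Cars"): (9.55, 4.71),
    ("Jinzhou-Sanshilipu", "Trucks"): (10.92, 6.63),
    ("Zhoushuizi-Jinzhou", "Cars"): (8.85, 4.05),
    ("Zhoushuizi-Jinzhou", "Trucks"): (10.23, 6.18),
}

# Reduction in MAPE, in percentage points, rounded to two decimals.
reduction = {
    case: round(armax - pso, 2)
    for case, (armax, pso) in mape_by_case.items()
}

for case, points in reduction.items():
    print(case, points)
```

The reductions lie between 4.05 and 4.84 percentage points, so the improvement is a difference in percentage points rather than a relative change in the error.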

Conclusion

In this article, the prediction of highway travel time was studied using toll data. Some conclusions are as follows:

A multi-dimensional logic frame of database was constructed, and the integration method of the original toll collection data and the meteorological monitoring data was put forward; thus, a multi-dimensional data warehouse was built up.

This research used the ARMAX model for travel time prediction. The traffic flow series were set as the regression variables, and the PSO algorithm was used to optimize the solving process of the model parameters, which reduced the mean absolute percentage error by almost 5 percentage points.

The application of travel time prediction in the examples shows that the PSO-ARMAX model has high accuracy in the prediction of travel time and can provide supporting evidence for traffic control and travel guidance. However, there is still much room for improvement in the prediction accuracy, which can be achieved by improving the quality of the original data and by adding database dimensions that cover more of the factors influencing travel time.

These conclusions suggest some directions for future research. This article predicts travel time by combining highway toll data with meteorological data, so a data fusion method is already applied to the prediction research. Future work can add more data sources, as well as events and accidents, to enrich the multi-dimensional data warehouse and improve the prediction accuracy.

Footnotes

Handling Editor: Gang Chen

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by “the Fundamental Research Funds for the Central Universities (2016JBM053).”

ORCID iD

Jiandong Zhao

References

1. Liu, Cui, Cao, et al. Iterative Bayesian estimation of travel times on urban arterials: fusing loop detector and probe vehicle data. PLoS ONE 2016; 11: e0158123.

2. Cohen, Christoforou. Travel time estimation between loop detectors and FCD: a compatibility study on the Lille network, France. Transp Res Proc 2015; 10: 245–255.

3. Faouzi NEE, Ben. Motorway travel time prediction based on toll data and weather effect integration. IET Intell Transp Sy 2010; 4: 338–345.

4. Faouzi NEE, Klein, Mouzon. Improving travel time estimates from inductive loop and toll collection data with Dempster-Shafer data fusion. Transp Res Record 2009; 2129: 73–80.

5. Yoshikazu, Hideki, Masao. Travel time calculation method for expressway using toll collection system data. In: Proceedings of international conference on intelligent transportation systems, Tokyo, Japan, 5–8 October 1999, pp.471–475. New York: IEEE.

6. Soriguera, Rosas, Robuste. Travel time measurement in closed toll highways. Transport Res B: Meth 2010; 44: 1242–1267.

7. Yamazaki, Uno, Kurauchi. The effect of a new intercity expressway based on travel time reliability using electronic toll collection data. IET Intell Transp Sy 2012; 6: 306–317.

8. Zhao, Wang, Liu. Prediction of expressway travel time based on adaptive interpolation Kalman Filtering. J South China Univ Technol 2014; 42: 109–115.

9. Zhao, Wang, Liu. Highway travel time prediction between stations based on toll ticket data. J Tongji Univ 2013; 41: 1849–1854.

10. Koesdwiady, Soua, Karray. Improving traffic flow prediction with weather information in connected cars: a deep learning approach. IEEE T Veh Technol 2016; 65: 9508–9517.

11. Qiao, Haghani, Hamedi. Short-term travel time prediction considering the effects of weather. Transp Res Record 2012; 2308: 61–72.

12. Qiu, Liu. Integrated model for traffic flow forecasting under rainy conditions. J Adv Transport 2016; 50: 1754–1769.

13. Polson, Sokolov. Deep learning predictors for traffic flows. Transport Res C: Emer 2016; 79: 1–17.

14. Chen. Evaluation and analysis of highway traffic state under huge data. PhD Thesis, Chang’an University, Xi’an, China, 2016.

15. Zheng, et al. Estimation of travel time of different vehicle types at urban streets based on data fusion of multisource data. In: Proceedings of 14th COTA international conference of transportation professionals, Changsha, China, 4–7 July 2014, pp.452–466. New York: ASME.

16. Xia. Study on the key technologies of expressway network traffic surveillance systems. Master Thesis, South China University of Technology, Shanghai, China, 2013.

17. Qian. OLAM-based multi-dimensional prediction of expressway traffic volume. J Trans Syst Eng Inf Technol 2013; 13: 48–56.

18. Bastard, Guilbert, Delepoulle, et al. Vehicule identification from inductive loops application: travel time estimation for a mixed population of cars and trucks. In: Proceedings of international IEEE conference on intelligent transportation systems, Washington, DC, 5–7 October 2011, pp.507–512. New York: IEEE.

19. Innamaa. Short-term prediction of travel time using neural networks on an interurban highway. Transportation 2005; 32: 649–669.

20. Vanajakshi, Rilett. Support vector machine technique for the short term prediction of travel time. In: Proceedings of the 2007 IEEE intelligent vehicles symposium, Istanbul, Turkey, 13–15 June 2007, pp.600–605. New York: IEEE.

21. Highway traffic congestion detection and travel time prediction based on microwave data. Master’s Thesis, Beijing Jiaotong University, Beijing, China, 2016.

22. Schreiter, Horowitz. Multiple-clustering ARMAX-based predictor and its application to freeway traffic flow prediction. In: Proceedings of the American control conference, Portland, OR, 4–6 June 2014, pp.4397–4403. New York: IEEE.

23. Fei, Liu. A Bayesian dynamic linear model approach for real-time short-term freeway travel time prediction. Transport Res C: Emer 2011; 19: 1306–1318.

24. Leclercq, Laval J, Chiabaut. Capacity drops at merges: an endogenous model. Transport Res B: Meth 2011; 17: 1302–1313.

25. Peng, Ouyang. Measurement and estimation of traffic oscillation properties. Transport Res B: Meth 2010; 44: 1–14.

26. Treiber, Kesting, Helbing. Three-phase traffic theory and two-phase models with a fundamental diagram in the light of empirical stylized facts. Transport Res B: Meth 2010; 44: 983–1000.

27. Abo-Hammour, Alsmadi, Al-Smadi, et al. ARMA model order and parameter estimation using genetic algorithms. Math Comp Model Dyn 2012; 18: 201–221.

28. Cui, Shan. ARMA model parameter optimized estimate method. In: Proceedings of the first ACIS international symposium on cryptography and network security, data mining and knowledge discovery, E-commerce & ITS applications and embedded systems, Qinhuangdao, China, 23–24 October 2010, pp.22–26. New York: IEEE.

29. Chen, Fang, Chen. A system identification method based on genetic algorithm. J Wuhan Univ Sci Technol 2007; 30: 87–89.