Abstract

Short-term traffic flow prediction is an important part of intelligent transportation systems research and applications. For further improving the accuracy of short-time traffic flow prediction, a novel hybrid prediction model (multivariate phase space reconstruction–combined kernel function-least squares support vector machine) based on multivariate phase space reconstruction and combined kernel function-least squares support vector machine is proposed. The C-C method is used to determine the optimal time delay and the optimal embedding dimension of traffic variables’ (flow, speed, and occupancy) time series for phase space reconstruction. The G-P method is selected to calculate the correlation dimension of attractor which is an important index for judging chaotic characteristics of the traffic variables’ series. The optimal input form of combined kernel function-least squares support vector machine model is determined by multivariate phase space reconstruction, and the model’s parameters are optimized by particle swarm optimization algorithm. Finally, case validation is carried out using the measured data of an expressway in Xiamen, China. The experimental results suggest that the new proposed model yields better predictions compared with similar models (combined kernel function-least squares support vector machine, multivariate phase space reconstruction–generalized kernel function-least squares support vector machine, and phase space reconstruction–combined kernel function-least squares support vector machine), which indicates that the new proposed model exhibits stronger prediction ability and robustness.

Keywords

Short-term traffic flow prediction, particle swarm optimization, least squares support vector machine, combined kernel function, chaos

Introduction

Short-term traffic prediction is one of the most important areas in intelligent transportation system (ITS) research.^1–3 A number of ITS applications, such as dynamic route guidance (DRG) and urban traffic control (UTC), can benefit from accurate prediction of traffic variables (including but not limited to traffic flow, travel time, traffic speed, and occupancy) for the short-term future (less than 15 min). For traffic managers, short-term predictions make it possible to apply traffic control measures early enough to prevent congestion, rather than dealing with traffic problems after congestion has already occurred. For travelers, dynamic short-term predictions make it possible to plan trips in advance and adjust routes at any moment. Because of its importance, short-term prediction of traffic flow has generated great interest among researchers, and a significant number of methods exist in the literature. These methods include, but are not limited to, the auto-regressive integrated moving average (ARIMA) model,⁴ the local linear regression model,⁵ the Kalman filtering model,⁶ and the nonparametric regression model.⁷ Generally, these prediction methods can be categorized as statistical methods, which develop their predictions based on statistical analysis of historical data. One benefit of statistical methods is that they can make very good predictions when the traffic flow varies temporally. However, these methods often rely on several restrictive assumptions, such as the normality of residuals, the stationarity of the time series, and a predefined model structure, which are seldom satisfied in the case of nonlinear traffic flow. To overcome this problem, numerous studies have used machine learning methods such as support vector machines (SVM)^8–10 and artificial neural networks (ANNs)^11–13 as alternative predictors. A machine learning method can approximate traffic flow of any degree of complexity without prior knowledge of the problem.

With the development of traffic surveillance systems (a part of ITS), more and more real-time traffic data, such as traffic flow, speed, and occupancy, become available every few minutes or seconds. Machine learning methods have gained special attention. Compared with the statistical methods, the machine learning methods are more adaptable and better suited to short-term traffic variable prediction. SVM is a typical machine learning method and can effectively handle small-sample, nonlinear, high-dimensional, and local-minima problems. However, the SVM algorithm is computationally complex. To solve this problem, the least squares support vector machine (LSSVM) based on SVM was presented by Suykens and Vandewalle¹⁴ in 1999. Its essence is to turn a quadratic programming problem into a set of linear equations, consequently accelerating problem-solving and improving computational convergence. Until now, LSSVM has been successfully used in pattern recognition and nonlinear regression estimation problems. In this article, LSSVM is selected as the basic algorithm for short-term traffic flow prediction. At the same time, to obtain the optimal LSSVM model, it is important to choose a kernel function and determine the kernel parameters. If the kernel function and the parameters are not selected correctly, the LSSVM model will not perform well. Therefore, we construct a combined kernel function (CKF) and introduce the particle swarm optimization (PSO) algorithm¹⁵ to optimize the parameters of LSSVM.

Recent studies have found that the short-term traffic variables’ time series had nonlinear chaotic phenomena.^16,17 Chaos is a universal phenomenon of nonlinear dynamic systems. Some sudden and dramatic changes in nonlinear systems may give rise to complex behavior called chaos. A chaotic system, which may appear to be random but is in effect generated by a deterministic model, cannot be examined by standard statistical techniques.¹⁸ A chaotic time series appears stochastic, but it is actually generated by the deterministic system. Chaotic time series prediction has drawn significant attention of researchers over the past few years and some chaotic prediction methods have been developed for short-term traffic variables’ prediction, such as SVM method,¹⁹ Lyapunov exponent method,²⁰ and local polynomial prediction method,²¹ which are based on dynamics reconstruction technique called phase space reconstruction (PSR) and have also achieved some promising results. However, in these methods, scalar traffic variable is mostly used for PSR. Multivariate time series contains more dynamic information than a scalar time series; using available multivariate time series would improve the prediction performance.²²

In this article, a hybrid short-term traffic flow prediction model (multivariate phase space reconstruction (MPSR)–CKF-LSSVM) based on MPSR and CKF-LSSVM is proposed. First, the C-C method²³ is used to choose the optimal time delay and the optimal embedding dimension of the three traffic variables’ (flow, speed, and occupancy) time series for the PSR. Then, the correlation dimension is calculated by the G-P method²⁴; it gradually saturates as the embedding dimension increases, which indicates that the traffic variables’ time series are chaotic. Finally, the optimal input form of the CKF-LSSVM model is determined by MPSR, and the parameters of the model are optimized by the PSO algorithm.

The remainder of this article is organized as follows. In section “Methodology,” multivariate chaotic time series analysis theory and CKF-LSSVM model are described briefly, and the hybrid traffic flow prediction model MPSR–CKF-LSSVM is introduced. In section “Empirical analysis,” empirical analysis is performed, and the prediction results of several different prediction models are presented and discussed. In section “Discussion and conclusion,” a brief review of this article and the future research are presented.

Methodology

Multivariate chaotic time series analysis

MPSR

For M-dimensional time series: $X_{1}, X_{2}, \dots, X_{M}$ , where $X_{i} = (x_{i, 1}, x_{i, 2}, \dots, x_{i, N})$ , $i = 1, 2, \dots, M$ . A time delayed reconstruction can be made as follows

\begin{matrix} D_{n} = & [x_{1, n}, x_{1, n - τ_{1}}, \dots, x_{1, n - (m_{1} - 1) τ_{1}}, \\ x_{2, n}, x_{2, n - τ_{2}}, \dots, x_{2, n - (m_{2} - 1) τ_{2}}, \\ \dots \\ x_{M, n}, x_{M, n - τ_{M}}, \dots, x_{M, n - (m_{M} - 1) τ_{M}}] \end{matrix}

(1)

where $τ_{i}$ and $m_{i}$ are the time delay and the embedding dimension of the ith series, respectively. According to Takens’ embedding theorem,²⁵ if each $m_{i}$ is large enough, there generally exists a function $F: R^{m} \to R^{m}$ , $(m = \sum_{i = 1}^{M} m_{i})$ , such that

D_{n + 1} = F (D_{n})

(2)

The equivalent form of equation (2) can be written as

\begin{matrix} x_{1, n + 1} = F_{1} (D_{n}) \\ x_{2, n + 1} = F_{2} (D_{n}) \\ \dots \\ x_{M, n + 1} = F_{M} (D_{n}) \end{matrix}

(3)

The remaining problems are how to determine the time delay $τ_{i}$ and embedding dimension $m_{i}$ , so that equation (2) or equation (3) holds.

In this article, $X_{1}, X_{2}$ , and $X_{3}$ represent traffic flow, speed, and occupancy, respectively. If the function $F_{1}$ is achieved, the traffic flow prediction can be realized based on equation (3).
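
The reconstruction in equations (1) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mpsr_embed`, the toy series, and the convention that the first series is the prediction target are not from the article.

```python
def mpsr_embed(series, delays, dims):
    """Build (D_n, x_{1,n+1}) pairs following equations (1) and (3).

    series: list of M equal-length sequences (series[0] is the prediction target)
    delays: time delay tau_i for each series
    dims:   embedding dimension m_i for each series
    """
    length = len(series[0])
    # Earliest index n for which every lagged term x_{i, n-(m_i-1)tau_i} exists.
    start = max((m - 1) * tau for tau, m in zip(delays, dims))
    inputs, targets = [], []
    for n in range(start, length - 1):  # stop one early so x_{1,n+1} exists
        d_n = []
        for x, tau, m in zip(series, delays, dims):
            d_n.extend(x[n - k * tau] for k in range(m))
        inputs.append(d_n)
        targets.append(series[0][n + 1])
    return inputs, targets


# Toy data (hypothetical values, two variables only).
flow = [10, 12, 15, 14, 13, 16, 18, 17]
speed = [60, 58, 55, 56, 57, 54, 52, 53]
X, y = mpsr_embed([flow, speed], delays=[2, 1], dims=[2, 3])
print(X[0], y[0])  # [15, 10, 55, 58, 60] 14
```

Each input row concatenates the lagged values of every variable, so its length is the sum of the embedding dimensions.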

Determination of time delay and embedding dimension

There are two kinds of views about the selection of these two parameters. One view is that the two parameters are independent and could be determined separately. The methods of calculating delay time include average displacement method,²⁶ mutual information method,²⁷ and autocorrelation function method.²⁸ The methods of calculating embedding dimension include false nearest neighbor method,²⁹ Cao method,³⁰ and G-P method.³¹ Another view is that the two parameters are interrelated and should be determined simultaneously, such as C-C method.²³ The C-C method can simultaneously estimate the delay time and embedding dimension by applying the correlation integral. Although the C-C method is based on statistical results and has no solid theoretical basis, it is easy to use, requires a small amount of calculation, and also has strong anti-noise capability. Therefore, C-C method is employed to determine delay time $τ$ and embedding window width $τ_{w}$ and then the embedding dimension m is calculated according to $τ_{w} = (m - 1) τ$ . The main principle of C-C method is as follows.

The correlation integral is the following function

C (m, N, r, t) = \frac{2}{M (M - 1)} \sum_{1 \leq i < j \leq M} H (r - d_{ij})

(4)

where r is the search radius, $M = N - (m - 1) τ$ is the number of embedded points in m-dimensional space, and N is the data number of the time series. $d_{ij} = ‖ X_{i} - X_{j} ‖$ denotes the sup-norm and $H (z)$ is a Heaviside function

H (z) = \begin{cases} 1, & z > 0 \\ 0, & z \leq 0 \end{cases}

(5)

The correlation integral is a cumulative distribution function and denotes the probability of distance between any pairs of points in the phase space which is not greater than r. The distance is denoted by the sup-norm of the difference between two vectors.

For $m = 2, 3, 4, 5$ and $r_{j} = j σ / 2$ , $j = 1, 2, 3, 4$ , where $σ$ is the standard deviation of the time series, the statistic $S (m, N, r, t)$ is interpreted as the serial correlation of a nonlinear time series

S (m, N, r, t) = C (m, N, r, t) - C^{m} (1, N, r, t)

(6)

\bar{S} (t) = \frac{1}{16} \sum_{m = 2}^{5} \sum_{j = 1}^{4} S (m, r_{j}, t)

(7)

Δ \bar{S} (t) = \frac{1}{4} \sum_{m = 2}^{5} Δ S (m, t)

(8)

Scor (t) = Δ \bar{S} (t) + | \bar{S} (t) |

(9)

With t taken up to 200, the optimal delay time $τ$ is the first local minimum point of the $Δ \bar{S} (t) ~ t$ curve. The delay time window $τ_{w}$ is the global minimum point of the $Scor (t) ~ t$ curve.
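
The statistics in equations (4) to (9) can be sketched as follows. Several points are assumptions rather than statements from the article: the code follows equation (6) as written (the original C-C procedure additionally averages over t disjoint subseries), it takes $Δ S (m, t)$ to be the spread $\max_{j} S (m, r_{j}, t) - \min_{j} S (m, r_{j}, t)$ as in the standard C-C definition, it uses the population standard deviation for $σ$ , and the toy series and helper names are hypothetical.

```python
import math
import statistics


def correlation_integral(x, m, r, t):
    """Equation (4): fraction of embedded point pairs closer than r (sup-norm)."""
    n_points = len(x) - (m - 1) * t
    points = [[x[i + k * t] for k in range(m)] for i in range(n_points)]
    close = 0
    for i in range(n_points):
        for j in range(i + 1, n_points):  # each distinct pair counted once
            dist = max(abs(a - b) for a, b in zip(points[i], points[j]))
            if r - dist > 0:  # Heaviside step, equation (5)
                close += 1
    return 2.0 * close / (n_points * (n_points - 1))


def cc_statistics(x, t_max):
    """Return S-bar(t), Delta-S-bar(t) and Scor(t) for t = 1..t_max."""
    sigma = statistics.pstdev(x)
    radii = [j * sigma / 2 for j in (1, 2, 3, 4)]
    s_bar, delta_s_bar, s_cor = [], [], []
    for t in range(1, t_max + 1):
        c1 = [correlation_integral(x, 1, r, t) for r in radii]
        s_sum, delta_sum = 0.0, 0.0
        for m in (2, 3, 4, 5):
            s_values = [
                correlation_integral(x, m, r, t) - c ** m  # equation (6)
                for r, c in zip(radii, c1)
            ]
            s_sum += sum(s_values)
            delta_sum += max(s_values) - min(s_values)  # Delta S(m, t)
        s_bar.append(s_sum / 16)                          # equation (7)
        delta_s_bar.append(delta_sum / 4)                 # equation (8)
        s_cor.append(delta_s_bar[-1] + abs(s_bar[-1]))    # equation (9)
    return s_bar, delta_s_bar, s_cor


def first_local_minimum(values):
    """1-based index t of the first interior local minimum, or None."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k + 1
    return None


# Hypothetical toy series; real traffic data would be far longer.
series = [math.sin(0.4 * i) + 0.5 * math.sin(0.09 * i) for i in range(80)]
s_bar, delta_s_bar, s_cor = cc_statistics(series, t_max=6)
tau = first_local_minimum(delta_s_bar)   # delay time (may be None on short ranges)
tau_w = s_cor.index(min(s_cor)) + 1      # delay time window
print(tau, tau_w)
```

The pairwise distance loop is quadratic in the series length, so a practical implementation would likely vectorize or subsample it.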

Correlation dimension

In the chaotic time series analysis, an important step is determining the presence of chaos. One method to determine the presence of chaos uses correlation dimension,³¹ which is the most efficient for practical applications. If the correlation dimension is gradually saturated with the increase in the embedding dimension, the system under investigation is considered as chaotic. On the contrary, if the correlation dimension increases without bound with the increase in the embedding dimension, the system under investigation is considered as stochastic.

The G-P method proposed by Grassberger and Procaccia²⁴ is a classical method that is frequently used to estimate the correlation dimension. In the G-P method, the slope of a double logarithmic plot of the correlation integral against the radius gives an estimate of the correlation dimension. For an m-dimensional phase space, the correlation integral function $C (r)$ is given by

C (r) = \lim_{N \to \infty} \frac{2}{N (N - 1)} \sum_{1 \leq i < j \leq N} H (r - | Y_{i} - Y_{j} |)

(10)

where H is the Heaviside step function, with $H (u) = 1$ for $u > 0$ , and $H (u) = 0$ for $u \leq 0$ , where $u = r - | Y_{i} - Y_{j} |$ , N is the number of points on the reconstructed attractor, and r is the radius of the sphere centered on $Y_{i}$ or $Y_{j}$ . If the time series is characterized by an attractor, then for positive values of r, $C (r)$ is related to the radius r by the following relation

C (r) \propto α r^{D_{2}}

(11)

where $α$ is a constant, and $D_{2}$ is the correlation dimension or the slope of the $\ln C (r)$ versus $\ln r$ plot given by

D_{2} = \lim_{r \to 0} \frac{\ln C (r)}{\ln r}

(12)

The slope of the $\ln C (r)$ versus $\ln r$ curve within its linear scaling region approximates $D_{2}$ .
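
A minimal sketch of this slope estimate is given below. It is an illustration only: it assumes the Euclidean distance for $| Y_{i} - Y_{j} |$ , fits an ordinary least-squares line over radii supplied by the caller (choosing the linear scaling region is left to the user), and tests the idea on a synthetic set of points spread evenly along a line, for which a dimension near 1 is expected.

```python
import math


def correlation_sum(points, r):
    """Finite-N version of equation (10) with the Euclidean distance."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if r - math.dist(points[i], points[j]) > 0
    )
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, radii):
    """Least-squares slope of ln C(r) against ln r over the supplied radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(points, r)) for r in radii]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, ys))
    den = sum((a - x_mean) ** 2 for a in xs)
    return num / den


# Synthetic check (hypothetical data): evenly spaced points on a line segment.
n = 200
line = [(i / n,) for i in range(n)]
radii = [(k + 0.5) / n for k in (2, 4, 8, 16, 32)]
d2 = correlation_dimension(line, radii)
print(round(d2, 2))
```

For a reconstructed attractor, `points` would be the delay vectors for a given embedding dimension m, and the estimate would be repeated for increasing m to check for saturation.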

CKF-LSSVM model

Basic principle of LSSVM

Consider a given training set $D = \{(x_{i}, y_{i})\}, i = 1, 2, \dots, l$ with input data $x_{i} \in R^{m}$ and output data $y_{i} \in R$ . LSSVM formulates the regression as the following optimization problem

min J (w, e) = \frac{1}{2} w^{T} w + \frac{1}{2} γ \sum_{i = 1}^{l} e_{i}^{2}

(13)

subject to

y_{i} = w^{T} ϕ (x_{i}) + b + e_{i}, i = 1, 2, \dots, l

(14)

where w is the weight vector, $γ$ is the penalty parameter, $e_{i}$ is the approximation error, $ϕ (\cdot)$ is the nonlinear mapping function, and b is the bias term. The corresponding Lagrange function can be obtained by

L (w, b, e, α) = J (w, e) - \sum_{i = 1}^{l} α_{i} \{w^{T} ϕ (x_{i}) + b + e_{i} - y_{i}\}

(15)

where $α_{i}$ is the Lagrange multiplier. Based on the Karush–Kuhn–Tucker (KKT) conditions, the solutions can be obtained by partially differentiating with respect to w, b, $e_{i}$ , and $α_{i}$

\left\{ \begin{matrix} \frac{\partial L}{\partial w} = 0 \to w = \sum_{i = 1}^{l} α_{i} ϕ (x_{i}) \\ \frac{\partial L}{\partial b} = 0 \to \sum_{i = 1}^{l} α_{i} = 0 \\ \frac{\partial L}{\partial e_{i}} = 0 \to α_{i} = γ e_{i} \\ \frac{\partial L}{\partial α_{i}} = 0 \to w^{T} ϕ (x_{i}) + b + e_{i} - y_{i} = 0 \end{matrix} \right.

(16)

By eliminating w and $e_{i}$ , the equations can be changed into

[\begin{matrix} 0 & l_{v}^{T} \\ l_{v} & Ω + \frac{1}{γ} I \end{matrix}] [\begin{matrix} b \\ α \end{matrix}] = [\begin{matrix} 0 \\ y \end{matrix}]

(17)

where $y = [y_{1}, y_{2}, \dots, y_{l}]^{T}$ , $α = [α_{1}, α_{2}, \dots, α_{l}]^{T}$ , $l_{v} = [1, 1, \dots, 1]^{T}$ , and the Mercer condition has been applied to matrix $Ω$ with $Ω_{km} = ϕ (x_{k})^{T} ϕ (x_{m}) = K (x_{k}, x_{m})$ , $k, m = 1, 2, \dots, l$ . Therefore, the LSSVM for regression can be obtained from

y (x) = \sum_{i = 1}^{l} α_{i} K (x, x_{i}) + b

(18)

where $K (x, x_{i})$ is the kernel function. There are a number of kernel functions that can be chosen, and the following three kernel functions are common:

Sigmoid kernel function

K (x, x_{i}) = \tanh [b (x \cdot x_{i}) + c]

(19)

Polynomial kernel function

K (x, x_{i}) = (x \cdot x_{i} + 1)^{d}

(20)

Gaussian radial basis function (RBF) kernel function

K (x, x_{i}) = \exp (- \frac{{‖ x - x_{i} ‖}^{2}}{2 σ^{2}})

(21)
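
To make equations (17) and (18) concrete, the sketch below assembles the bordered linear system and solves it with a small Gaussian elimination routine. It is a hedged illustration, not the authors' implementation: the helper names, the toy sine data, and the values of $γ$ and $σ$ are assumptions, and a production version would normally rely on a numerical linear algebra library.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian RBF kernel, equation (21)."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve_linear(a, rhs):
    """Solve a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def lssvm_fit(xs, ys, gamma, kernel):
    """Assemble and solve the linear system of equation (17)."""
    n_train = len(xs)
    a = [[0.0] + [1.0] * n_train]
    for k in range(n_train):
        row = [1.0] + [kernel(xs[k], xs[m]) for m in range(n_train)]
        row[k + 1] += 1.0 / gamma  # the I / gamma term on the diagonal
        a.append(row)
    solution = solve_linear(a, [0.0] + list(ys))
    return solution[0], solution[1:]  # bias b, multipliers alpha


def lssvm_predict(x, xs, alpha, b, kernel):
    """Equation (18)."""
    return sum(a_i * kernel(x, x_i) for a_i, x_i in zip(alpha, xs)) + b


def kernel(u, v):
    return rbf_kernel(u, v, sigma=0.2)  # sigma chosen arbitrarily for the demo


# Hypothetical toy regression problem: one period of a sine wave.
xs = [[i / 10.0] for i in range(11)]
ys = [math.sin(2 * math.pi * x[0]) for x in xs]
b, alpha = lssvm_fit(xs, ys, gamma=10.0, kernel=kernel)
print(lssvm_predict([0.25], xs, alpha, b, kernel))
```

Because training reduces to one linear solve, the fitted multipliers satisfy $\sum_{i} α_{i} = 0$ and $α_{i} = γ e_{i}$ up to rounding, which is a convenient sanity check.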

Construction of CKF

The traditional LSSVM model mostly adopts single kernel function to complete the process of feature space mapping, which has achieved good performance in many practical applications. Gaussian RBF kernel function is the most effective one in nonlinear function estimation.³² However, the single kernel function has great limitations when the sample data contain heterogeneous information. Therefore, this article integrates the Gaussian RBF kernel function (which is a typical local kernel function with strong learning ability and weak generalization ability) and polynomial kernel function (which is a typical global kernel function with strong generalization ability and weak learning ability) to construct a new combination kernel function. The form of combination kernel function is as follows

K (x, x_{i}) = λ \cdot \exp (- \frac{{‖ x - x_{i} ‖}^{2}}{2 σ^{2}}) + (1 - λ) \cdot (x \cdot x_{i} + 1)^{d}

(22)

where $λ (0 \leq λ \leq 1)$ is the weight coefficient, $σ$ is the kernel width of Gaussian RBF kernel function, and d is the order of polynomial kernel function.
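
Equation (22) translates directly into code. The sketch below is illustrative; the function name and the sample vectors are assumptions. Since $0 \leq λ \leq 1$ , the result is a convex combination of two Mercer kernels and therefore remains a valid kernel.

```python
import math


def combined_kernel(x, x_i, lam, sigma, d):
    """Equation (22): lam * RBF + (1 - lam) * polynomial."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(x, x_i))
    dot = sum(p * q for p, q in zip(x, x_i))
    rbf = math.exp(-sq_dist / (2.0 * sigma ** 2))
    poly = (dot + 1.0) ** d
    return lam * rbf + (1.0 - lam) * poly


# Hypothetical sample vectors.
u, v = [0.2, 0.4], [0.1, 0.5]
print(combined_kernel(u, v, lam=0.5, sigma=0.12, d=3))
```

Setting `lam=1` recovers the pure Gaussian RBF kernel of equation (21) and `lam=0` recovers the polynomial kernel of equation (20).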

Different kernel functions have different advantages; if the weight coefficient of combination kernel function is inappropriate, the performance of combination kernel function may be lower than single kernel function. Therefore, proper weight coefficient is of great importance for the CKF.

When using the LSSVM method with the CKF, the selection of the CKF parameters $σ$ , d, $λ$ and the regularization parameter $γ$ has an important influence on the establishment of the model, and to obtain better prediction results, it is necessary to optimize these parameters. This article employs the PSO algorithm to obtain the optimal parameters.

PSO algorithm

PSO¹⁵ is a modern evolutionary computation algorithm based on the simulation of the flocking and swarming behaviors of birds and insects, and it is widely used for parameter optimization. Compared with other evolutionary computational algorithms, it can efficiently find optimal or near-optimal solutions to the problem under consideration.

In PSO, each particle has its own position and speed, and a potential solution to an optimization problem is represented by the position of one particle. In every iteration, the speed of each particle is updated according to two best positions. The first is the best position found so far by the particle itself, denoted as $p_{i} = (p_{i 1}, p_{i 2}, \dots, p_{id})$ for the ith particle in the d-dimensional search space. The second is the best position found so far by any particle in the whole swarm, denoted as $p_{g} = (p_{g 1}, p_{g 2}, \dots, p_{gd})$ . Let the position vector and speed vector of the ith particle be $x_{i} = (x_{i 1}, x_{i 2}, \dots, x_{id})$ and $v_{i} = (v_{i 1}, v_{i 2}, \dots, v_{id})$ , respectively. The update rules for the speed and position of each particle are given as

\begin{matrix} v_{i, j} (t + 1) = & w \cdot v_{i, j} (t) + c_{1} r_{1} \cdot (p_{i, j} - x_{i, j} (t)) \\ & + c_{2} r_{2} \cdot (p_{g, j} - x_{i, j} (t)) \end{matrix}

(23)

x_{i, j} (t + 1) = x_{i, j} (t) + v_{i, j} (t + 1), j = 1, 2, \dots, d

(24)

where t is the iteration number, c₁ and c₂ are constants called acceleration coefficients, r₁ and r₂ are two independent random numbers uniformly distributed in the range of [0, 1], and w is called the inertia factor. The inertia factor w can be set to a constant or a linearly decreasing variable. A linearly decreasing w can be determined by

w = w_{start} - \frac{w_{start} - w_{end}}{T_{max}} \times T

(25)

where $w_{start}$ and $w_{end}$ represent the start and end of w, respectively, T is the iteration number, and $T_{max}$ is the maximum iteration number.
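
The update rules in equations (23) to (25) can be sketched as below. This is a minimal illustration under assumptions: the function name, the seeded random generator, the zero initial velocities, and the clamping of positions to the search box are choices made here, and the sphere function is only a stand-in objective.

```python
import random


def pso_minimize(f, bounds, n_particles=20, t_max=100, c1=2.0, c2=2.0,
                 w_start=0.9, w_end=0.4, seed=0):
    """Minimize f over a box; returns best position, best value, and history."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]
    history = []
    for t in range(t_max):
        w = w_start - (w_start - w_end) / t_max * t  # equation (25)
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (p_best[i][j] - pos[i][j])
                             + c2 * r2 * (g_best[j] - pos[i][j]))  # equation (23)
                lo, hi = bounds[j]
                # Clamping to the box is an added safeguard, not part of the text.
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)  # equation (24)
            val = f(pos[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = pos[i][:], val
                if val < g_val:
                    g_best, g_val = pos[i][:], val
        history.append(g_val)
    return g_best, g_val, history


def sphere(p):
    return sum(v * v for v in p)


box = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_val, history = pso_minimize(sphere, box)
print(best, best_val)
```

In the setting of this article, `f` would be the cross-validated fitness of a CKF-LSSVM model and each position would encode $(γ, λ, σ, d)$ , with d rounded to an integer.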

MPSR–CKF-LSSVM model

The overall flowchart of MPSR–CKF-LSSVM model is illustrated in Figure 1. The main steps of the short-term traffic flow prediction based on the hybrid MPSR–CKF-LSSVM model are as follows:

Step 1: the C-C method is employed to determine the optimal time delay and the optimal embedding dimension of traffic variables’ (flow, speed, and occupancy) time series for PSR.

Step 2: the G-P method is used to calculate the correlation dimension of traffic variables’ (flow, speed, and occupancy) time series. The correlation dimension is saturated with the increase in embedding dimension, which indicates that these three traffic variables are chaotic.

Step 3: the optimal input form of CKF-LSSVM model is determined by PSR of the three traffic variables. Thus, the input and output of CKF-LSSVM model are as follows

\begin{matrix} input = [x_{1, n}, x_{1, n - τ_{1}}, \dots, x_{1, n - (m_{1} - 1) τ_{1}}, \\ x_{2, n}, x_{2, n - τ_{2}}, \dots, x_{2, n - (m_{2} - 1) τ_{2}}, \\ x_{3, n}, x_{3, n - τ_{3}}, \dots, x_{3, n - (m_{3} - 1) τ_{3}}] \end{matrix}

(26)

output = [x_{1, n + 1}]

(27)

Step 4: the parameters of CKF-LSSVM model are optimized by PSO algorithm

(4.1) Initialize the particle swarm and CKF-LSSVM model.

(4.2) Train CKF-LSSVM model and evaluate the fitness value of all particles.

(4.3) Update the particle’s position and velocity according to the position and velocity update formula.

(4.4) Judge whether the termination conditions are met (usually a preset calculation accuracy or number of iterations). If they are met, continue to Step 4.5; otherwise, return to Step 4.2 and continue the search.

(4.5) Output the optimal parameters of CKF-LSSVM model.

Figure 1.

Flowchart of MPSR–CKF-LSSVM model.

Empirical analysis

Data source

The experimental traffic data were obtained from two magnetic coil detectors (No. DC00004965 and No. DC00004966 collected traffic data from east to west and from west to east, respectively) installed at an expressway named Lianqian West Road in Xiamen, China. The traffic data (flow, average speed, and average occupancy) were collected every 5 min in 5 consecutive working days (5 January 2015 to 9 January 2015). Figure 2 gives the traffic data time series from No. DC00004965 detector. Figure 3 shows the three-dimensional scatterplot of traffic flow, speed, and occupancy from No. DC00004965 detector. Traffic data of the first 4 days are used for chaotic time series analysis and training the CKF-LSSVM model. Traffic data of the fifth day are used to test the CKF-LSSVM model.

Figure 2.

Traffic data time series from the No. DC00004965 detector: (a) traffic flow, (b) speed, and (c) occupancy.

Figure 3.

Three-dimensional scatterplot of traffic flow, speed, and occupancy from the No. DC00004965 detector.

In the subsequent section, the data of No. DC00004965 detector are used to illustrate the modeling process in detail, and the two detectors’ data are used to analyze the performance of the model.

Determination of time delay and embedding dimension for MPSR

The C-C method is adopted to calculate the time delay and embedding dimension of the traffic flow, speed, and occupancy time series. Figure 4 gives the curve graph between $Δ \bar{S} (t)$ and t. Figure 5 gives the curve graph between $Scor (t)$ and t. The variable t corresponding to the first minimum value of $Δ \bar{S} (t)$ is the time delay $τ$ of the time series. The variable t corresponding to the minimum value of $Scor (t)$ is the embedding window width $τ_{w}$ of the time series and the embedding dimension $m = τ_{w} / τ + 1$ . Therefore, the time delay and the embedding dimension can be obtained and are shown in Table 1.

Figure 4.

$Δ \bar{S} (t) ~ t$ of (a) traffic flow, (b) speed, and (c) occupancy.

Figure 5.

$Scor (t) ~ t$ of (a) traffic flow, (b) speed, and (c) occupancy.

Table 1.

Time delay and embedding dimension.

	Traffic flow	Speed	Occupancy
Time delay	$τ_{1} = 9$	$τ_{2} = 9$	$τ_{3} = 12$
Embedding dimension	$m_{1} = 15$	$m_{2} = 10$	$m_{3} = 7$
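
As a quick consistency check, the arithmetic implied by Table 1 can be worked through in code. The sketch assumes 288 five-minute samples per day with no gaps over the 4 training days; under that assumption the number of usable input–output pairs comes out as 1025, which agrees with the training set size reported in the parameter optimization section.

```python
# Values from Table 1: (time delay tau_i, embedding dimension m_i).
table_1 = {"flow": (9, 15), "speed": (9, 10), "occupancy": (12, 7)}

# Embedding window tau_w = (m - 1) * tau for each variable.
windows = {name: (m - 1) * tau for name, (tau, m) in table_1.items()}

# Length of the reconstructed input vector: m_1 + m_2 + m_3.
input_dim = sum(m for _, m in table_1.values())

# Assumption: 288 five-minute samples per day with no gaps, 4 training days.
n_train_points = 4 * 24 * 60 // 5

# Each pair needs max(tau_w) earlier samples and one later sample (the target).
n_pairs = n_train_points - max(windows.values()) - 1

print(windows, input_dim, n_pairs)
# {'flow': 126, 'speed': 81, 'occupancy': 72} 32 1025
```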

Correlation dimension

Figure 6 shows the relationship of $\ln C (r)$ versus $\ln r$ for increasing m, whereas the relationship between the correlation dimension values and the embedding dimension values m is shown in Figure 7. It can be seen that the correlation dimension value increases with the embedding dimension up to a certain value and then saturates beyond that value. The saturation of the correlation dimension beyond a certain embedding dimension value is an indication of the existence of deterministic dynamics. The saturated correlation dimension of traffic flow, speed, and occupancy is about 2.3, 3.2, and 2.6, respectively. The finite value of correlation dimension suggests the presence of chaos in the short-term traffic variables’ series.

Figure 6.

$\ln C (r) ~ \ln r$ curves of (a) traffic flow, (b) speed, and (c) occupancy.

Figure 7.

Relation between correlation dimension $D_{2} (m)$ and embedding dimension m of (a) traffic flow, (b) speed, and (c) occupancy.

Optimization parameters

The parameters of the CKF-LSSVM model are determined by PSO. To ensure the efficiency of the PSO algorithm, the mean absolute percentage error is adopted as the fitness function, which is calculated as follows

fitness = \frac{1}{n} \sum_{i = 1}^{n} | \frac{y_{i} - {\tilde{y}}_{i}}{y_{i}} |

(28)

where $y_{i}$ is the real value in time interval i, ${\tilde{y}}_{i}$ is the prediction value for time interval i, and n is the total number in the time series.

The traffic data of the first 4 days are used to construct a training set that contains a total of 1025 input–output pairs. Cross validation is used to prevent over-fitting and under-fitting. The training data are divided into K groups; K − 1 groups are used for training and the remaining group is used to verify the accuracy of the model. This is repeated K times until every group has been used for verification, and the results of the K runs are averaged, which is called K-fold cross validation. In this article, fivefold cross validation is used in the training. Model initialization: $γ \in (0.01, 1000]$ , $λ \in [0, 1]$ , $σ \in [0.01, 1]$ , $d = 1, 2, \dots, 10$ . PSO algorithm initialization: population number is 20, acceleration coefficients $c_{1} = c_{2} = 2$ , maximum iteration number $T_{max} = 100$ , the start inertia factor $w_{start} = 0.9$ , and the end inertia factor $w_{end} = 0.4$ .
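
The K-fold fitness evaluation described above can be sketched as follows. The article does not state how the folds are formed, so contiguous folds are an assumption here, and `fit_predict` is a hypothetical callable standing in for training a CKF-LSSVM model with one candidate parameter set; the toy mean predictor only demonstrates the mechanics.

```python
def mape(actual, predicted):
    """Equation (28): mean absolute percentage error, returned as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def kfold_fitness(inputs, targets, fit_predict, k=5):
    """Average validation MAPE over k contiguous folds.

    fit_predict(train_x, train_y, valid_x) is a hypothetical callable that
    trains a model and returns predictions for valid_x.
    """
    n = len(inputs)
    bounds = [i * n // k for i in range(k + 1)]
    scores = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        train_x = inputs[:lo] + inputs[hi:]
        train_y = targets[:lo] + targets[hi:]
        preds = fit_predict(train_x, train_y, inputs[lo:hi])
        scores.append(mape(targets[lo:hi], preds))
    return sum(scores) / k


def mean_model(train_x, train_y, valid_x):
    """Toy stand-in for CKF-LSSVM: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return [mean] * len(valid_x)


xs = [[float(i)] for i in range(10)]
ys = [10.0] * 5 + [20.0] * 5
print(kfold_fitness(xs, ys, mean_model, k=2))  # 0.75
```

A PSO particle would then be scored by calling `kfold_fitness` with a `fit_predict` that uses the particle's $(γ, λ, σ, d)$ values.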

The fitness curves are described in Figure 8. The best fitness value is 3.66% and the corresponding optimal parameter combination of CKF-LSSVM model is $γ = 57.8$ , $λ = 0.83$ , $σ = 0.12$ , and $d = 3$ .

Figure 8.

Fitness curves of PSO.

Evaluation performance indices

To evaluate the efficiency of the proposed hybrid model, three statistical indices are utilized to measure the prediction accuracy. These indices are the mean absolute error (MAE), mean absolute percent error (MAPE), and equal coefficient (EC). The smaller the values of MAE and MAPE, the more accurate the prediction results. If the value of EC is closer to 1, the prediction result is more accurate. The computing formulas of these indices are as follows

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\tilde{y}}_{i} |

(29)

MAPE = \frac{1}{n} \sum_{i = 1}^{n} | \frac{y_{i} - {\tilde{y}}_{i}}{y_{i}} |

(30)

EC = 1 - \frac{\sqrt{\sum_{i = 1}^{n} (y_{i} - {\tilde{y}}_{i})^{2}}}{\sqrt{\sum_{i = 1}^{n} y_{i}^{2}} + \sqrt{\sum_{i = 1}^{n} {\tilde{y}}_{i}^{2}}}

(31)

where $y_{i}$ is the real value in time interval i, ${\tilde{y}}_{i}$ is the prediction value for time interval i, and n is the total number of the time series.
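
The three indices in equations (29) to (31) are straightforward to compute. The sketch below uses hypothetical measured and predicted values purely for illustration.

```python
import math


def mae(actual, predicted):
    """Equation (29)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Equation (30), returned as a fraction (multiply by 100 for percent)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def equal_coefficient(actual, predicted):
    """Equation (31)."""
    err = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    norm_a = math.sqrt(sum(a ** 2 for a in actual))
    norm_p = math.sqrt(sum(p ** 2 for p in predicted))
    return 1.0 - err / (norm_a + norm_p)


# Hypothetical flow values (vehicles per 5 min).
measured = [100.0, 120.0, 80.0, 50.0]
predicted = [110.0, 114.0, 80.0, 45.0]
print(mae(measured, predicted))                # 5.25
print(mape(measured, predicted))
print(equal_coefficient(measured, predicted))
```

Note that MAPE divides by the measured value, so it is undefined for zero flow and becomes large when the measured flow is small.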

Model performance and analysis

Traffic data of the fifth day are used as test samples to evaluate the performance of the prediction model. In order to illustrate the prediction performance of the proposed method intuitively, Figures 9 and 10 show the prediction results of the proposed model based on No. DC00004965 detector’s data and No. DC00004966 detector’s data, respectively. Figures 9(a) and 10(a) present the curves of prediction data and measured data. The prediction curves are quite close to the measured curves. Figures 9(b) and 10(b) present the scatterplots between the predicted and observed traffic flow data. It is clear that these scatter points are distributed near the measured line (the red line) without large deviation. Figures 9(c) and 10(c) show the absolute percent error (APE) of the proposed model. The APEs are mostly within 15%. However, the APE for time intervals 24 to 48 (corresponding to 2:00–4:00) is high, because the actual traffic flow during that time period is small. Overall, the hybrid MPSR–CKF-LSSVM model achieves good prediction performance, which could meet the needs of short-term traffic flow prediction.

Figure 9.

Prediction performance of the proposed method based on No. DC00004965 detector’s data: (a) curves of predicted data and measured data, (b) scatterplot of predicted data and measured data, and (c) absolute percent error of predicted data.

Figure 10.

Prediction performance of the proposed method based on No. DC00004966 detector’s data: (a) curves of predicted data and measured data, (b) scatterplot of predicted data and measured data, and (c) absolute percent error of predicted data.

In order to demonstrate the superiority of the proposed model in detail, a comparative analysis is carried out. The comparative models are as follows: the single model (CKF-LSSVM), the hybrid model (MPSR–GKF-LSSVM) based on MPSR and Gaussian RBF kernel function LSSVM, and the hybrid model (PSR–CKF-LSSVM) using PSR and CKF-LSSVM. The input vector of the CKF-LSSVM model is the time series immediately preceding the predicted value, and its dimension is the same as that of the MPSR–CKF-LSSVM model. The input vector of the MPSR–GKF-LSSVM model is the same as that of the MPSR–CKF-LSSVM model, whereas the input vector of the PSR–CKF-LSSVM model is determined by the PSR of a single traffic variable (traffic flow is selected, because the aim of modeling is to predict traffic flow). Moreover, the parameters of the three comparative models are also optimized by PSO.

For the sake of comparison and analysis in terms of macroscopic and microscopic aspects, Figures 11(a) and 12(a) give the microscopic comparative results of different methods based on No. DC00004965 detector’s data and No. DC00004966 detector’s data, respectively. As shown in Figures 11(a) and 12(b), the prediction results of the MPSR–CKF-LSSVM model have the best fitting performance compared with the other three models, especially when the traffic flow changes greatly (as clearly shown in Figures 11(b)–(d) and 12(b)). Figure 13(a) and (b) show the absolute error (AE) of the four prediction models based on No. DC00004965 detector’s data and No. DC00004966 detector’s data, respectively. The range of AE of the MPSR–CKF-LSSVM model is the smallest, which illustrates that the prediction model has better stability. Therefore, the MPSR–CKF-LSSVM model could further improve the accuracy of short-term traffic flow prediction.

Figure 11.

(a) Prediction results of different models based on No. DC00004965 detector’s data and (b)–(d) its parts’ specific prediction results.

Figure 12.

(a) Prediction results of different models based on No. DC00004966 detector’s data and (b) its parts’ specific prediction results.

Figure 13.

Absolute error of different models based on (a) No. DC00004965 detector’s data and (b) No. DC00004966 detector’s data.

The prediction accuracy is shown in Table 2 using three statistical indices (MAE, MAPE, and EC). The overall improvement of the MPSR–CKF-LSSVM model over the other three models is obvious. More precisely, in terms of MAE, the MPSR–CKF-LSSVM model achieves a 41.5% improvement over the CKF-LSSVM model, a 23.9% improvement over the MPSR–GKF-LSSVM model, and a 34.0% improvement over the PSR–CKF-LSSVM model. In terms of MAPE, the MPSR–CKF-LSSVM model achieves a 36.4% improvement over the CKF-LSSVM model, a 20.0% improvement over the MPSR–GKF-LSSVM model, and a 26.2% improvement over the PSR–CKF-LSSVM model. Meanwhile, the MPSR–CKF-LSSVM model is also superior to the other three models in terms of EC. Comparison of the results of the MPSR–CKF-LSSVM model and the MPSR–GKF-LSSVM model shows that CKF-LSSVM performs better than Gaussian RBF kernel function LSSVM. Comparison of the results of the MPSR–CKF-LSSVM model, the PSR–CKF-LSSVM model, and the CKF-LSSVM model shows that using MPSR to determine the model’s input form can improve the model’s performance effectively. Furthermore, the experimental results also demonstrate that the MPSR–CKF-LSSVM model achieves good prediction performance for both No. DC00004965 detector’s data and No. DC00004966 detector’s data, which suggests that the MPSR–CKF-LSSVM model has strong generalization ability. Overall, the proposed model is an effective and accurate method for short-term traffic flow prediction, which can provide satisfactory prediction results.

Table 2.

Performance accuracy for different models.

Model	DC00004965 MAE	DC00004965 MAPE (%)	DC00004965 EC	DC00004966 MAE	DC00004966 MAPE (%)	DC00004966 EC	Mean (two detectors) MAE	Mean (two detectors) MAPE (%)	Mean (two detectors) EC
MPSR–GKF-LSSVM	6.96	8.28	0.967	7.19	8.54	0.951	7.08	8.41	0.959
CKF-LSSVM	9.16	10.50	0.930	9.25	10.61	0.923	9.21	10.56	0.927
PSR–CKF-LSSVM	8.30	9.27	0.955	8.04	8.92	0.938	8.17	9.10	0.947
MPSR–CKF-LSSVM	5.42	6.41	0.982	5.36	7.02	0.975	5.39	6.72	0.979

MAE: mean absolute error; MAPE: mean absolute percent error; EC: equal coefficient; MPSR: multivariate phase space reconstruction; GKF: generalized kernel function; LSSVM: least squares support vector machine; CKF: combined kernel function; PSR: phase space reconstruction.
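For readers who want to recompute the three indices, the sketch below uses the usual definitions of MAE and MAPE, and a commonly used form of the equal coefficient, EC = 1 − ‖y − ŷ‖ / (‖y‖ + ‖ŷ‖). That EC formula is an assumption here and should be checked against the definitions given in the evaluation section. The function names and the toy data are hypothetical and are not the detector measurements.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percent error, in percent; assumes no actual value is zero."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def equal_coefficient(actual, predicted):
    """Equal coefficient (assumed form): 1 - ||y - yhat|| / (||y|| + ||yhat||)."""
    error_norm = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    actual_norm = math.sqrt(sum(a * a for a in actual))
    predicted_norm = math.sqrt(sum(p * p for p in predicted))
    return 1.0 - error_norm / (actual_norm + predicted_norm)


# Made-up flow values, for illustration only.
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [95.0, 126.0, 93.0, 108.0]

print(mae(actual, predicted))
print(mape(actual, predicted))
print(equal_coefficient(actual, predicted))
```

Under this form, EC equals 1 for a perfect prediction and decreases as the prediction error grows, which matches the reading of Table 2 that higher EC is better.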

Discussion and conclusion

This article proposes a new short-term traffic flow prediction model, MPSR–CKF-LSSVM, which combines MPSR with CKF-LSSVM. Its performance is compared with that of the MPSR–GKF-LSSVM model, the PSR–CKF-LSSVM model, and the CKF-LSSVM model using the same real-world traffic data. The comparison shows that the MPSR–CKF-LSSVM model effectively improves prediction accuracy. It can therefore be concluded that the time series of short-term traffic variables exhibit chaotic characteristics and that PSR (especially MPSR) is necessary for improving short-term traffic prediction accuracy.
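As a reminder of what the reconstruction step produces, the following is a minimal sketch of multivariate delay embedding in its standard form: each phase point concatenates delayed samples of every variable. The function name `mpsr`, the toy series, and the delays and embedding dimensions are illustrative assumptions, not the values obtained with the C-C method in this study.

```python
def mpsr(series, delays, dims):
    """Build multivariate delay vectors.

    series: list of equally long sequences, one per traffic variable.
    delays: time delay tau_i for each variable.
    dims:   embedding dimension m_i for each variable.
    Returns one phase point per usable time index n, of the form
    [x_1(n), x_1(n - tau_1), ..., x_1(n - (m_1 - 1) tau_1), x_2(n), ...].
    """
    # First index for which every delayed sample exists.
    start = max((m - 1) * tau for tau, m in zip(delays, dims))
    length = len(series[0])
    points = []
    for n in range(start, length):
        point = []
        for x, tau, m in zip(series, delays, dims):
            point.extend(x[n - k * tau] for k in range(m))
        points.append(point)
    return points


# Made-up flow and speed samples, for illustration only.
flow = [10, 12, 15, 14, 13, 16, 18, 17]
speed = [60, 58, 55, 56, 57, 54, 52, 53]

points = mpsr([flow, speed], delays=[2, 1], dims=[3, 2])
for point in points:
    print(point)
```

In a prediction setting, each phase point would serve as a model input and the next flow value as the target, so the univariate PSR case corresponds to passing a single series.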

To obtain more general and robust conclusions, traffic data from different roadways need to be explored further. Future studies should also apply the model to data sets of other traffic variables (such as travel time, traffic speed, and average occupancy); this study uses traffic flow for demonstration. Moreover, it would be interesting to test the model on traffic data sets with different time intervals. Other advanced optimization algorithms should also be studied to search for more appropriate parameter combinations for the proposed model and to obtain more accurate short-term traffic flow predictions.

Footnotes

Acknowledgements

Authors Ciyun Lin and Zhaosheng Yang are also affiliated with the State Key Laboratory of Automobile Simulation and Control, Jilin University, Changchun, China.

Academic Editor: Nicolas Garcia-Aracil

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the National Key Technology Support Program (Grant No. 2014BAG03B03) and the National Natural Science Foundation of China (Grant Nos. 51408257 and 51308248).

Short-term traffic flow prediction model using particle swarm optimization–based combined kernel function-least squares support vector machine combined with chaos theory
