Abstract
To improve travel time estimation and to provide more precise and reliable traffic information to traffic management departments and travelers, we propose an arterial travel time estimation method that uses Sydney Coordinated Adaptive Traffic System (SCATS) traffic data and a K-nearest neighbor–least squares support vector regression (KNN-LSSVR) model. First, a virtual time series is constructed to handle the inconsistent time intervals of SCATS traffic data. Second, the K-nearest neighbor method is used to search for the K historical patterns most similar to the current traffic pattern and to obtain the corresponding K travel times. Then, the least squares support vector regression model is used to estimate travel time. Finally, the method is validated using data measured by the SCATS traffic control system in Shanghai, China. The results show that the proposed method estimates travel time more accurately than the two comparison methods, a multiple linear regression model and an existing KNN-based method.
Introduction
Travel time is an important measure for evaluating the performance of traffic management strategies. 1 From the traffic manager’s point of view, travel time information can be used to balance travel demand across different parts of a road network and to use existing traffic infrastructure more efficiently. From the traveler’s perspective, accurate travel time information supports better route and mode choices, and, when provided before a trip, it helps travelers choose a departure time, thus alleviating driver stress. Research has shown that about 77.7% of drivers will change their route based on travel time information, and that even when no alternative route exists, drivers find driving less stressful when they know what to expect ahead. 2 However, travel time estimation is well known to be difficult, because travel time is a complex dynamic parameter influenced by many factors, such as driver characteristics, weather conditions, traffic flow characteristics, and roadway conditions. Travel time estimation for arterials is particularly challenging because arterials are interrupted-flow facilities with traffic signals and other control devices. Therefore, travel time estimation, especially for arterials, has been recognized as a critical need for intelligent transportation systems.
At present, methods for collecting travel time data can generally be divided into two categories. The first category consists of direct approaches, which use automatic vehicle identification data,3,4 probe vehicle data,5–8 toll collection system data,9,10 and mobile phone data.11–13 These approaches can collect travel time data quickly, but most direct travel time measurement techniques are expensive and immature. The second category consists of indirect approaches, in which loop detectors are the most commonly used equipment.14–17 Indirect travel time estimation methods can generally be divided into four classes: speed-based estimation models,18,19 cumulative plot-based methods,20 regression models,21,22 and artificial intelligence methods.23–25 Li et al. 18 compared the performance of four speed-based travel time estimation methods: an instantaneous model, a time slice model, a dynamic time slice model, and a linear model. Van Lint and van der Zijpp 19 presented a travel time estimation algorithm based on a linear function of speed. Bhaskar et al. 20 presented a travel time estimation method based on cumulative plots for urban signalized intersections. Kwon et al. 21 formulated a regression model using flow, occupancy, time of departure, and day of week. Tang et al. 23 designed a travel time estimator based on an evolving fuzzy neural network using traffic flow data collected from existing loop detectors. Liu et al. 24 presented a neural network–based traffic flow model to estimate urban arterial travel time.
Travel time estimation has attracted great interest among researchers, and many methods exist in the literature. However, because of the particular locations of vehicle detectors and the particular data types in the Sydney Coordinated Adaptive Traffic System (SCATS) traffic control system, few related studies exist. Luk et al. 26 described the ARRB travel time model for estimating arterial travel times for general traffic and buses; the model retrieves traffic data from SCATS every minute and has been implemented on a server at VicRoads. Mazloumi et al. 27 presented an artificial neural network that used degree-of-saturation data collected by SCATS at intermediate signalized intersections, together with schedule adherence, to predict bus travel time. Cheu et al. 28 introduced a model to estimate the average link travel time of signalized arterials using data obtained from SCATS detectors. These studies often assumed that vehicle detectors obtain traffic data at fixed sampling intervals, and some even relied on data that are not currently available. Such assumptions do not match how the SCATS traffic control system actually operates. Jiang et al. 29 designed a travel time estimation method using SCATS data based on the k-NN algorithm, but this method considered only the linear relationship of travel time data, so its estimation accuracy needs further improvement. For these reasons, and with the goal of improving the accuracy of arterial travel time estimation, we propose an arterial travel time estimation method using SCATS traffic data based on a K-nearest neighbor (KNN)–least squares support vector regression (LSSVR) model. The remainder of this article is structured as follows: Section “Feature analysis and processing of SCATS data” presents the feature analysis and processing of SCATS data. Section “Arterial travel time estimation based on KNN-LSSVM model” describes arterial travel time estimation based on the KNN-LSSVR model. Section “Empirical analysis” describes the empirical analysis. Section “Conclusion” draws conclusions.
Feature analysis and processing of SCATS data
Characteristics analysis of SCATS traffic data
The SCATS traffic control system employs 16–32 loop detectors per intersection to obtain traffic parameter data. Most of these detectors are located at the downstream end of each lane, near the stop line, and record traffic information for every signal cycle. The traffic information recorded by the SCATS system is shown in Table 1.
The traffic information recorded by the SCATS system.
SCATS: Sydney Coordinated Adaptive Traffic System.
As Table 1 shows, the SCATS traffic control system provides two kinds of traffic information. One is signal setting information, such as cycle length, signal phase, the start time of each phase, and green times. The other is traffic parameter information, such as traffic counts and time headways during the green time of each phase.
Through extensive analysis of the basic data obtained from the SCATS traffic control system, the following main characteristics were found:
The traffic parameters provided by the SCATS traffic control system are traffic counts and average time headway; speed and occupancy data are not provided.
The detectors of the SCATS traffic control system are placed in front of the stop line of the intersection. Therefore, the traffic parameter data they obtain cannot reflect the influence of the number of queued vehicles on the road, which limits the application of the data to a certain extent.
The data sampling interval of the SCATS traffic control system is determined by the green signal phase, whose duration changes dynamically. Therefore, the traffic parameter data of different sampling intervals are not strictly comparable, which makes travel time estimation more difficult.
Construction of virtual time series for SCATS traffic data
To obtain SCATS data time series with a fixed sampling interval, this article proposes the concept of a virtual time series. The basic principle is to convert vehicle arrivals within each signal cycle from a random distribution to a uniform distribution. A virtual sampling interval (5 min, 10 min, etc.) is defined and inserted into the time axis of the SCATS traffic control system data. The actual sampling interval of the SCATS system remains the green signal phase, whereas the virtual sampling interval has the set length. The traffic data in each virtual sampling interval are obtained by converting the corresponding data in the actual sampling intervals. A schematic diagram of virtual sampling interval interpolation is shown in Figure 1, where C represents different signal cycles and
1. Conversion of traffic flow within the virtual sampling interval.

Schematic diagram of virtual sampling interval interpolation.
Traffic flow is a cumulative quantity. Assuming that vehicles arrive uniformly, the number of vehicles arriving per unit time in a signal cycle equals the actual number of arriving vehicles divided by the signal cycle length; this ratio is called the average traffic flow:
$$\bar{q}_i = \frac{q_{i,s}}{C_{i,s}}$$
where $q_{i,s}$ is the number of vehicles counted in actual sampling interval $i$ and $C_{i,s}$ is its signal cycle length.
The traffic flow mapping from the actual sampling intervals to the virtual sampling interval is as follows:
$$q_{j,x} = \sum_{i=1}^{N} \bar{q}_i \, t_i$$
where $q_{j,x}$ is the traffic flow within virtual sampling interval $j$, $t_i$ is the length of the part of virtual sampling interval $j$ that lies in actual sampling interval $i$, and $N$ is the number of actual sampling intervals spanned by virtual sampling interval $j$.
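The flow conversion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the tuple layout `(start_time_s, cycle_length_s, vehicle_count)` and the function name `virtual_flow` are hypothetical, all times are in seconds on a common clock, and each actual interval is taken to be one signal cycle. Each actual interval contributes its average flow multiplied by its overlap $t_i$ with the virtual interval.

```python
from typing import List, Tuple

# Hypothetical layout of one actual sampling interval (one signal cycle):
# (start_time_s, cycle_length_s, vehicle_count)
Cycle = Tuple[float, float, float]


def virtual_flow(cycles: List[Cycle], v_start: float, v_len: float) -> float:
    """Traffic flow q_{j,x} for the virtual interval [v_start, v_start + v_len).

    Assumes uniform arrivals within each cycle, so a cycle contributes
    (count / cycle_length) * t_i, where t_i is its overlap with the virtual interval.
    """
    v_end = v_start + v_len
    q = 0.0
    for start, length, count in cycles:
        end = start + length
        overlap = min(end, v_end) - max(start, v_start)  # t_i in seconds
        if overlap > 0:
            q += (count / length) * overlap  # average flow times t_i
    return q
```

Because each cycle's count is split across virtual intervals in proportion to overlap time, the flows of consecutive virtual intervals add up to the total count of the cycles they cover.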
2. Conversion of average time headway within the virtual sampling interval.
Within an actual sampling interval, the average time headway is the mean time interval between randomly arriving vehicles, which is equivalent to converting the vehicle arrival pattern from a random distribution to a uniform distribution. Therefore, the average time headway of the virtual sampling interval is calculated as follows
where $h_{j,x}$ is the average time headway of virtual sampling interval $j$, and $h_{i,s}$ is the average time headway of actual sampling interval $i$.
3. Conversion of traffic signal control parameters within the virtual sampling interval.
The SCATS traffic control system dynamically optimizes its timing parameters using a small-step online selection method, so the lengths of adjacent signal cycles differ by no more than 6 s, as shown in Figure 2.

Schematic of green time and cycle time.
Therefore, within virtual sampling interval $j$, the cycle length and green time can be approximated by the averages of the corresponding parameters over the actual sampling intervals $i$ that it spans:
$$g_{j,x} = \frac{1}{N}\sum_{i=1}^{N} g_{i,s}, \qquad C_{j,x} = \frac{1}{N}\sum_{i=1}^{N} C_{i,s}$$
where $g_{j,x}$ and $C_{j,x}$ are the average green time and average cycle length of virtual sampling interval $j$, respectively, and $g_{i,s}$ and $C_{i,s}$ are the green time and cycle length of actual sampling interval $i$.
In the SCATS traffic control system, the degree of saturation (DS) is the ratio of the effectively used green time to the total green time. The mapping from actual sampling interval $i$ to virtual sampling interval $j$ is as follows
where $DS_{j,x}$ is the average degree of saturation of virtual sampling interval $j$, and $DS_{i,s}$ is the degree of saturation of actual sampling interval $i$.
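The remaining conversions can be sketched in the same way. This is an illustration under an explicit assumption: the text states that cycle length and green time are approximated by averages over the spanned actual intervals, and this sketch assumes the same simple arithmetic mean for average time headway and DS, although the exact headway and DS mappings used in the article may differ. The `ActualInterval` fields and the function name `virtual_signal_params` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class ActualInterval:
    """One actual SCATS sampling interval (hypothetical field names)."""
    start: float    # start time, s
    cycle: float    # cycle length C_{i,s}, s
    green: float    # green time g_{i,s}, s
    headway: float  # average time headway h_{i,s}, s
    ds: float       # degree of saturation DS_{i,s}


def virtual_signal_params(intervals: List[ActualInterval],
                          v_start: float, v_len: float) -> Dict[str, float]:
    """Average parameters over the N actual intervals spanned by one virtual interval.

    Assumption: a simple arithmetic mean, as the text describes for cycle length
    and green time; the same rule is assumed here for headway and DS.
    """
    v_end = v_start + v_len
    spanned = [iv for iv in intervals
               if min(iv.start + iv.cycle, v_end) > max(iv.start, v_start)]
    if not spanned:
        raise ValueError("virtual interval does not overlap any actual interval")
    return {
        "C": mean(iv.cycle for iv in spanned),
        "g": mean(iv.green for iv in spanned),
        "h": mean(iv.headway for iv in spanned),
        "DS": mean(iv.ds for iv in spanned),
    }
```

An overlap-weighted mean (weighting each interval by $t_i$) would be an equally plausible alternative; the unweighted form is used here because it matches the "average" wording in the text most directly.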
Arterial travel time estimation based on KNN-LSSVM model
KNN search mechanism
The KNN algorithm is based on pattern recognition theory. Its premise is that the current pattern of the research object is similar to several historical patterns. The basic principle is to define a pattern similarity measure and then search a specified database for the K historical patterns most similar to the current pattern. The target travel time value corresponding to the current traffic pattern is then obtained from these patterns. Because traffic parameter data exhibit both stochastic fluctuations and long-term trends, they meet the requirements of the KNN algorithm to a certain extent.
1. The definition of the feature vector.
Feature vectors are the criteria for comparing the current traffic pattern with historical traffic patterns. There is no uniform standard for selecting feature vectors. Including as many factors as possible in the feature vector does not improve estimation accuracy and can increase the running time of the algorithm. Therefore, this article determines the composition of the feature vector based on the characteristics of SCATS traffic data.
2. Selection of similarity measure method.
At present, multiple similarity measures can be applied to KNN search, such as the Chebyshev, Mahalanobis, and Euclidean distances. Previous work has shown that the choice of spatial distance measure does not significantly affect the final outcome. 30 Therefore, this article uses the Euclidean distance between the feature vectors of the current traffic pattern and each historical traffic pattern to measure how well the patterns match:
$$d_i = \sqrt{\sum_{m=1}^{M} \left(V_m - V_{i,m}\right)^2}$$
where $d_i$ is the distance between the current data and the $i$th group of data in the historical database, $V_m$ is the value of the $m$th item of the current feature vector, $V_{i,m}$ is the value of the $m$th item of the $i$th historical group, and $M$ is the dimension of the feature vector.
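A minimal Python sketch of the KNN search step. It assumes each historical record is a `(feature_vector, travel_time)` pair; that record layout and the function names `euclidean` and `knn_search` are illustrative, not taken from the article. Historical patterns are ranked by the Euclidean distance $d_i$ and the K closest are returned.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(current: Sequence[float],
               history: List[Tuple[Sequence[float], float]],
               k: int) -> List[Tuple[float, Sequence[float], float]]:
    """Return the k (distance, feature vector, travel time) triples closest to
    `current`, nearest first; ties keep their order in `history`."""
    scored = [(euclidean(current, feat), feat, tt) for feat, tt in history]
    scored.sort(key=lambda item: item[0])
    return scored[:k]
```

A linear scan is adequate for a historical database of a few months of 5- or 10-min intervals; a spatial index would only pay off for much larger databases.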
3. The determination of the nearest neighbor number K.
The choice of K depends largely on the characteristics of the historical data and the composition of the feature vectors, and there are currently no rules to guide it. In this article, K is determined for the specific experimental setting as the value that minimizes the travel time estimation error.
The principle of LSSVR model
LSSVR is an improved algorithm based on SVR. By introducing equality constraints and a least squares loss function, it transforms the optimization problem into a system of linear equations, which reduces algorithmic complexity by avoiding the quadratic programming problem. Regression based on LSSVR can be described as follows.
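Consider a given training data set $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ is an input vector and $y_i \in \mathbb{R}$ is the corresponding output. LSSVR solves the optimization problem
$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^{\mathsf T} w + \frac{C}{2} \sum_{i=1}^{n} e_i^{2}$$
subject to
$$y_i = w^{\mathsf T} \varphi(x_i) + b + e_i, \quad i = 1, \dots, n$$
where $w$ is the weight vector, $C$ is the penalty factor, $e_i$ is the regression error, $b$ is the bias, and $\varphi(\cdot)$ is the nonlinear mapping into a high-dimensional feature space. The corresponding Lagrangian is
$$L(w,b,e,\alpha) = J(w,e) - \sum_{i=1}^{n} \alpha_i \left( w^{\mathsf T} \varphi(x_i) + b + e_i - y_i \right)$$
where $\alpha_i$ are the Lagrange multipliers. Setting the partial derivatives with respect to $w$, $b$, $e_i$, and $\alpha_i$ to zero gives $w = \sum_{i=1}^{n} \alpha_i \varphi(x_i)$, $\sum_{i=1}^{n} \alpha_i = 0$, and $e_i = \alpha_i / C$. By eliminating $w$ and $e_i$, the following linear system is obtained:
$$\begin{bmatrix} 0 & \mathbf{1}^{\mathsf T} \\ \mathbf{1} & \Omega + C^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$
where $\mathbf{1} = [1, \dots, 1]^{\mathsf T}$, $\alpha = [\alpha_1, \dots, \alpha_n]^{\mathsf T}$, $y = [y_1, \dots, y_n]^{\mathsf T}$, $I$ is the identity matrix, and $\Omega_{ij} = \varphi(x_i)^{\mathsf T}\varphi(x_j) = K(x_i, x_j)$. Therefore, the regression model of LSSVR is
$$y(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b$$
where $K(\cdot,\cdot)$ is the kernel function.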
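To make the LSSVR derivation concrete, here is a small pure-Python sketch that builds and solves the linear system for $b$ and $\alpha$ and then evaluates the regression model. Assumptions: the Gaussian RBF kernel is written as $\exp(-g\lVert x - z\rVert^2)$, in line with the (C, g) parameter pair used later in the article, although the article does not state its exact kernel parameterization; the function names are hypothetical; and dense Gaussian elimination is used for simplicity, which suits only small training sets such as K neighbors.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x: Sequence[float], z: Sequence[float], g: float) -> float:
    """Gaussian RBF kernel exp(-g * ||x - z||^2) (assumed parameterization)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lssvr_fit(xs: List[Sequence[float]], ys: List[float],
              c: float, g: float) -> Tuple[List[float], float]:
    """Solve [[0, 1^T], [1, Omega + I/C]] [b; alpha] = [0; y]; return (alpha, b)."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(xs[i], xs[j], g) + (1.0 / c if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + list(ys))
    return sol[1:], sol[0]


def lssvr_predict(x: Sequence[float], xs: List[Sequence[float]],
                  alpha: List[float], b: float, g: float) -> float:
    """Evaluate y(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(ai * rbf(x, xi, g) for ai, xi in zip(alpha, xs)) + b
```

A quick consistency check follows from the second block row of the system: at every training point the fitted value equals $y_i - \alpha_i / C$, and the multipliers sum to zero.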
Modeling of travel time estimation based on KNN-LSSVR
Based on the characteristics of the KNN and LSSVR models, this article combines the two methods into a KNN-LSSVR estimation model.
In this article, a traffic pattern in a specific time interval and the travel time in the same interval are together called a traffic pattern pair. After the K patterns most similar to the current traffic pattern are found, the corresponding K link travel times are determined through pattern matching and then used to train the LSSVR model. The framework of the KNN-LSSVR model is illustrated in Figure 3.
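The end-to-end estimation step can be sketched as follows: find the K nearest historical pattern pairs, fit an LS-SVR on just those K pairs, and evaluate it at the current pattern. This is a minimal, self-contained illustration rather than the authors' implementation; the data layout, the function names, and the RBF parameterization $\exp(-g\lVert x - z\rVert^2)$ are assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (traffic-pattern feature vector, link travel time in s).
Pattern = Tuple[Sequence[float], float]


def rbf(x: Sequence[float], z: Sequence[float], g: float) -> float:
    """Gaussian RBF kernel exp(-g * ||x - z||^2) (assumed parameterization)."""
    return math.exp(-g * math.dist(x, z) ** 2)


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting on the augmented matrix."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def knn_lssvr_estimate(current: Sequence[float], history: List[Pattern],
                       k: int, c: float, g: float) -> float:
    """Estimate travel time for `current` with a KNN search followed by LS-SVR."""
    # Step 1: KNN search, ranking historical patterns by Euclidean distance.
    nearest = sorted(history, key=lambda p: math.dist(current, p[0]))[:k]
    xs = [feat for feat, _ in nearest]
    ys = [tt for _, tt in nearest]
    n = len(xs)
    # Step 2: train LS-SVR on the K neighbors by solving
    # [[0, 1^T], [1, Omega + I/C]] [b; alpha] = [0; y].
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(xs[i], xs[j], g) + (1.0 / c if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + ys)
    b, alpha = sol[0], sol[1:]
    # Step 3: evaluate y(x) = sum_i alpha_i K(x, x_i) + b at the current pattern.
    return sum(al * rbf(current, x, g) for al, x in zip(alpha, xs)) + b
```

Training a fresh model on only K neighbors keeps each linear system tiny ((K + 1) × (K + 1)), so the per-estimate cost is dominated by the KNN search rather than by the regression.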

KNN-LSSVR modeling process.
Empirical analysis
Design of experimental scheme
The experimental data are measured data from the SCATS traffic control system in Shanghai, China, collected from May to July 2009. A segment of Changshou Road, an arterial, is selected as the test object. The experimental area consists of seven consecutive intersections, with 12 road sections counting both directions. The experimental area is shown schematically in Figure 4, where A denotes the intersection of Changshou Road and Jiaozhou Road and C denotes the intersection of Changshou Road and Shanxi Road. Figure 5 presents the traffic signal phases of intersections A and C.

Experimental area schematic.

Traffic signal phase diagram. (a) Traffic signal phase of intersection A. (b) Traffic signal phase of intersection C.
Intersections A and C are equipped with video detectors. The license plate recognition rate is 99%, and the recognition accuracy is 95% in the daytime and 90% at night. The video detectors at the east and west entrances of intersection C are coded 1 and 4, respectively, and those at the east and west entrances of intersection A are coded 2 and 3, respectively. The experimental segment is shown schematically in Figure 6.

The schematic diagram of the experimental segment.
In this article, the link travel time derived from the license plate recognition data of the video detectors is taken as the true value. Because of the particular layout of the license plate recognition video detectors, the section between two consecutive intersections is called a road section unit, and the section between two consecutive video detectors is called a combination road section. Because true travel times cannot be obtained for road section units, only the travel time estimates for combination road sections are evaluated. The combination road sections A–C and C–A are taken as the research objects, and the experimental data are divided into two parts, a calibration set and a test set, with the calibration set accounting for about two-thirds of the data.
The arterial road in the experimental area has two straight (through) lanes, one left-turn lane, and one shared straight-right lane. Because only the travel time estimates for combination road sections are evaluated, only the straight lanes are studied, and the turning lanes are not considered. When the feature vector of the traffic pattern is constructed, the traffic flow parameter data from the detector loops at the same position in the two straight lanes are first averaged. Because the two straight lanes belong to the same phase, the traffic signal control parameter data need no such processing.
Parameter calibration
Taking the combination road section A–C as an example, the parameter calibration process of the proposed method is illustrated as follows:
1. Definition of traffic pattern feature vector.
The averaged traffic data of the two straight lanes at the west entrance of intersection B are recorded as
2. The selection of the nearest neighbor number K.
Based on the Euclidean distance, the average of the K travel times corresponding to the similar patterns found by the search is used as the estimated value, and K is chosen to minimize the estimation error. The mean absolute percent error (MAPE) of the travel time estimates for different values of K is shown in Figure 7.

Error measures of travel time estimation under different K.
As Figure 7 shows, MAPE is relatively small when K = 4. Therefore, the number of nearest neighbors K is set to 4.
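The K-selection procedure can be sketched as a loop over candidate values: estimate each validation pattern's travel time as the mean travel time of its K nearest calibration patterns, compute MAPE, and keep the K with the smallest error. The data layout and the function names are illustrative assumptions, and the article does not state exactly which subset of the data is used for this comparison.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[Sequence[float], float]  # (feature vector, travel time in s)


def knn_average(current: Sequence[float], calib: List[Pattern], k: int) -> float:
    """Mean travel time of the k calibration patterns nearest to `current`."""
    nearest = sorted(calib, key=lambda p: math.dist(current, p[0]))[:k]
    return mean(tt for _, tt in nearest)


def mape(actual: Sequence[float], est: Sequence[float]) -> float:
    """Mean absolute percent error, in percent."""
    return 100.0 * mean(abs(a - e) / abs(a) for a, e in zip(actual, est))


def select_k(calib: List[Pattern], valid: List[Pattern],
             k_values: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Return the K with the smallest MAPE, plus the MAPE for every candidate K."""
    errors: Dict[int, float] = {}
    for k in k_values:
        est = [knn_average(feat, calib, k) for feat, _ in valid]
        errors[k] = mape([tt for _, tt in valid], est)
    best = min(errors, key=errors.get)
    return best, errors
```

Plotting the returned `errors` dictionary against K gives a curve of the same kind as the MAPE-versus-K plot described in the text.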
3. Parameter optimization.
To determine the best values of the penalty factor C and the kernel parameter g, a grid search is used to optimize the parameters. Meanwhile, k-fold cross-validation is used to guard against over-fitting and under-fitting: the training data set is randomly divided into k subsets, an LSSVR model is built using k − 1 subsets as the training set, and the performance of the parameters is checked on the remaining subset, with each subset serving once as the validation set. In this article, the Gaussian radial basis function (RBF) is used as the kernel function, and fivefold cross-validation is used. The parameter optimization results are shown in Figure 8.
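A sketch of grid search with fivefold cross-validation, written generically so that any `fit_predict` routine (for example, an LS-SVR fit followed by prediction) can be plugged in. Assumptions: the random fold split, the mean squared error used to score each (C, g) pair, the grid itself, and the function names are illustrative choices; the article does not state its grid or scoring criterion. The reported optimum C = 0.70711 ≈ $2^{-0.5}$ and g = 256 = $2^{8}$ is consistent with a base-2 grid like the one below, though that is an inference.

```python
import random
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: fit on (train_x, train_y) with parameters (c, g),
# then return predictions for test_x.
FitPredict = Callable[[list, List[float], list, float, float], List[float]]


def kfold_indices(n: int, folds: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition indices 0..n-1 into `folds` roughly equal subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]


def cv_mse(xs: list, ys: List[float], fit_predict: FitPredict,
           c: float, g: float, folds: int = 5, seed: int = 0) -> float:
    """Mean squared error over k-fold cross-validation; each subset is held out once."""
    sq_err, count = 0.0, 0
    for held in kfold_indices(len(xs), folds, seed):
        held_set = set(held)
        train = [i for i in range(len(xs)) if i not in held_set]
        preds = fit_predict([xs[i] for i in train], [ys[i] for i in train],
                            [xs[i] for i in held], c, g)
        sq_err += sum((p - ys[i]) ** 2 for p, i in zip(preds, held))
        count += len(held)
    return sq_err / count


def grid_search(xs: list, ys: List[float], fit_predict: FitPredict,
                c_grid: Sequence[float], g_grid: Sequence[float],
                folds: int = 5) -> Tuple[float, float]:
    """Return the (C, g) pair with the lowest cross-validated error."""
    return min(product(c_grid, g_grid),
               key=lambda cg: cv_mse(xs, ys, fit_predict, cg[0], cg[1], folds))


# Illustrative base-2 grid in half-power steps, from 2^-5 to 2^10.
C_GRID = [2.0 ** (p / 2) for p in range(-10, 21)]
G_GRID = [2.0 ** (p / 2) for p in range(-10, 21)]
```

Using the same fold split for every (C, g) pair (fixed `seed`) makes the comparison between grid points fair, since all candidates are scored on identical held-out subsets.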

Parameter optimization results.
As we can see from Figure 8, the optimal parameters of LSSVR model are C = 0.70711, g = 256.
Performance evaluation index
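To evaluate the effectiveness of the proposed approach, three statistical indices are used to measure estimation accuracy: the mean absolute error (MAE), MAPE, and the root mean square error (RMSE). They are defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{y_i}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
where $y_i$ denotes the actual value for the $i$th time interval, $\hat{y}_i$ denotes the estimated value for the $i$th time interval, and $n$ is the number of time intervals.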
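The three indices translate directly into code. This is a minimal sketch with hypothetical function names; MAPE is returned in percent and assumes all actual values are nonzero, which holds for travel times.

```python
import math
from typing import Sequence


def _check(y: Sequence[float], y_hat: Sequence[float]) -> None:
    """Reject empty or mismatched inputs."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error, in the units of y (e.g. seconds)."""
    _check(y, y_hat)
    return sum(abs(a - e) for a, e in zip(y, y_hat)) / len(y)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percent error, in percent; actual values must be nonzero."""
    _check(y, y_hat)
    return 100.0 * sum(abs(a - e) / abs(a) for a, e in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error, in the units of y."""
    _check(y, y_hat)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(y, y_hat)) / len(y))
```

For example, actual travel times of 100 s and 200 s estimated as 110 s and 180 s give MAE = 15 s and MAPE = 10%, while RMSE (about 15.8 s) weights the larger 20 s error more heavily.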
Analysis of experimental results
In order to illustrate the estimation performance of the proposed method intuitively, Figure 9 presents estimation results of combination road section A–C.

Travel time estimation results of combination road section A–C.
As Figure 9 shows, the estimated travel time for section A–C closely follows the trend of the true value. Most relative errors are within 10%, and only a few exceed 20%. Most absolute errors are within 20 s, but they are large between 17:25:00 and 18:45:00, which indicates that the estimation performs poorly under severe congestion.
Because traffic conditions differ across time periods, we tested the proposed method in three periods of the day: the morning peak (6:00–10:00), the noon off-peak (11:00–14:00), and the evening peak (16:00–20:00). In the model validation, we compare the travel time estimation performance of the proposed method with that of a multiple linear regression (MLR) model and the method of Jiang et al. 29 Table 2 provides the estimation results for combination road section A–C, and Table 3 provides those for combination road section C–A.
Comparison of the estimation results for the combination road section A–C.
MLR: multiple linear regression; MAE: mean absolute error; MAPE: mean absolute percent error; RMSE: root mean square error.
Comparison of the estimation results for the combination road section C–A.
MLR: multiple linear regression; MAE: mean absolute error; MAPE: mean absolute percent error; RMSE: root mean square error.
To account for the different travel time patterns at different times of day, we compare the estimation results in three periods: morning, noon, and evening. Tables 2 and 3 show that the estimation errors in the morning and evening periods were generally higher than those in the noon period. Furthermore, the proposed method outperforms the other two methods in all three periods and over the whole day. Similar results are obtained for both sections A–C and C–A, which indicates that the proposed method has good generalization ability.
Conclusion
Arterial travel time is an important performance measure for both road travelers and traffic engineers. This article proposes a KNN-LSSVR method to estimate arterial travel time from data commonly provided by SCATS loop detectors (flow and time headway) and the signal settings (cycle length, green times, and DS value) at each traffic signal. The main contribution of this article is the concept of a virtual time series for processing raw SCATS data, combined with a KNN-LSSVR model for estimating arterial travel time. The travel time estimates were validated against observed travel times from license plate recognition video detectors on an arterial road in Shanghai, China, with SCATS traffic control system data as the model input. The validation results indicate that the proposed method has good potential for further development and is suitable for arterial travel time estimation.
Owing to current engineering constraints, this article does not analyze the travel time estimation performance for natural road sections (the road section units between consecutive intersections). In addition, the method's performance on other roads needs further verification. Future research will consider the effects of other factors, such as adverse weather and traffic accidents.
Footnotes
Acknowledgements
The authors express their sincere appreciation to the National Natural Science Foundation of China (no. 51678320).
Handling Editor: Liping Jiang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a grant (no. 51678320) from the National Natural Science Foundation of China.
