Abstract
In order to improve the tracking accuracy and real-time performance of the optoelectronic tracking system, an improved kernelized correlation filter approach is developed to obtain precise tracking of a maneuvering object. The proposed strategy combines the merits of an adaptive threshold approach, the kernelized correlation filter method, and the Kalman filter algorithm. The adaptive threshold approach chooses a suitable threshold according to the size of the target in the image, improving the tracking performance of the kernelized correlation filter method. When the change between the previous position and the current position is larger than the distance threshold, the Kalman filter algorithm is used to predict the target position for tracking. The tracking accuracy of the proposed algorithm is further improved by updating the prediction of the target position with the trusted algorithm. Experimental comparisons with several state-of-the-art trackers, namely the kernelized correlation filter, Tracking-Learning-Detection, scale adaptive with multiple features, minimum output sum of squared error, and dual correlation filter, demonstrate that the proposed approach achieves both high tracking accuracy and real-time performance when tracking a maneuvering object.
Introduction
The optoelectronic tracking system has been widely used in military and civil domains in recent years.1,2 It is a comprehensive technology spanning optics, mechanics, electronics, automation, and sensing. As a representative real-time monitoring technology, the optoelectronic tracking system can track a moving target and record video in order to achieve the goals of target tracking and video forensics. The image tracking algorithm of the optoelectronic tracking system has to meet the requirements of real-time performance and tracking accuracy when tracking a maneuvering object.
Although a variety of object tracking algorithms3–5 have been developed in computer vision,6–10 object tracking is still a challenging problem due to scale variation, illumination change, occlusion, background clutter, and viewpoint change.
In general, effective appearance models can be categorized as generative or discriminative. Generative approaches treat tracking as a search for the image region most similar to the target representation. A novel object tracking method based on adaptive sparse representation11 was designed for tracking the target. A joint model of the appearance and spatial configuration of pixels12 was developed to estimate the local distortion of the object. The orthogonal matching pursuit strategy13 was designed to handle the optimization problems efficiently. Despite the verified success of these generative tracking methods, the drift problem remains unsolved.
The generative algorithms mentioned above cannot handle major appearance changes as the track evolves. An online boosting feature selection method14 was formulated for visual tracking; however, drift occurred because only one positive sample was used to update the classifier. Tracking performance was enhanced by exploiting the relationship between the target and the structured environment in Zhu et al.15 A semi-supervised learning algorithm16 was presented to choose positive and negative samples with an online classifier under structural constraints. Babenko et al.17 presented the idea of multiple instance learning (MIL) to alleviate drift and improved the stability of tracking with fewer parameter tweaks. A novel online weighted MIL tracker that was both fast and robust was presented in a study.18 It naturally integrates the sample significance into the learning process by assuming that the tracking position in the current frame corresponds to the most correct positive sample. A discriminative appearance model19 based on superpixels was introduced to handle occlusion and recover from drift.
Tracking-by-detection strategies have been widely utilized in computer vision in recent years. The classifier in these strategies is trained using positive and negative samples. By utilizing a series of structural constraints with a boosting classifier, the tracking task is formulated as Tracking-Learning-Detection (TLD).20 A novel online tracking method that integrated online MIL into recent sparse representation was proposed by Yan et al.21 Object tracking22 was also cast as a binary classification issue in which the correlation of object appearance and class labels from foreground and background was modeled by partial least squares analysis. A collaborative appearance model combining discriminative and generative modules23 was presented for robust object tracking. A robust visual tracking method24 based on online learning of sparse representation was designed to handle appearance variations.
Recently, methods based on correlation filters have shown advantages in both tracking accuracy and robustness. A novel extension of the correlation filter25 was developed to employ multichannel signals with efficient use of memory and computation. The kernelized correlation filter (KCF)26 uses histogram of oriented gradients (HOG) features instead of raw pixel values, together with Gaussian kernels, to improve tracking accuracy. The dual correlation filter (DCF)26 adopts a fast multichannel extension of linear correlation filters using a linear kernel to improve tracking performance. Scale adaptive with multiple features (SAMF)27 adds scale estimation to the KCF framework by sampling the original object at different scales and learning a model at each scale. Although SAMF integrates the HOG descriptor with a color-naming28 technique to improve tracking performance, it incurs a large computational cost. The minimum output sum of squared error (MOSSE) filter29 is a fast linear correlation filter that forms the basis of the fastest trackers available.
In this article, an improved kernelized correlation filter (IKCF) approach is proposed to achieve good tracking performance for a maneuvering target. The main contribution is that the adaptive threshold approach and the Kalman filter (KF)30 are incorporated into the KCF algorithm to strengthen the tracking accuracy and real-time performance of the optoelectronic tracking system. The adaptive threshold is selected by experiments to improve the tracking performance of the KCF algorithm. The KF algorithm is used to estimate the target position when the change in position is larger than the distance threshold. The tracking performance of the proposed algorithm is verified by experimental comparisons with state-of-the-art trackers such as KCF, TLD, SAMF, MOSSE, and DCF for tracking a maneuvering target.
The article is organized as follows. The second section describes the proposed visual tracking strategy, which contains the adaptive threshold approach, the KCF method, and the KF algorithm; the flow chart of the IKCF method is then described. In the third section, the composition of the system is first introduced, and the experimental results are then shown to verify the validity of the proposed algorithm. The fourth section gives the conclusions and future work.
Proposed visual tracking strategy
In this section, the principle of the proposed visual tracking strategy is introduced. It consists of the adaptive threshold approach, the KCF algorithm, and KF-based tracking with a novel position-updating algorithm. The flow chart of the IKCF method is then presented. The details are introduced below.
Adaptive threshold approach
The traditional fixed-size search window usually adopts a fixed threshold value for all video frames and does not consider any current information gained from the features of the object and the video frame. Consequently, this strategy is likely to fail when tracking a maneuvering object.
The bounding box is defined as the size of the extra area surrounding the object. The selection of its threshold value4 is critical for the KCF algorithm. If the threshold is too large, it expands the search scope of the object; the running efficiency of the KCF algorithm is then reduced, and tracking may even fail. If it is too small, the object may move out of the bounding box, again resulting in tracking failure. Therefore, an adaptive bounding box threshold is designed by studying several video sequences. The method is based on the image size LH × WH and the object size lh × wh, where LH and WH denote the length and width of the image, and lh and wh denote the length and width of the object. The threshold value H is computed from these dimensions by equation (1).
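Equation (1) itself is not reproduced in this text, so the scaling constants in the following sketch are purely illustrative assumptions; the sketch only conveys the idea that H adapts to the ratio of object size to image size, giving small targets a larger search area:

```python
def adaptive_padding(LH, WH, lh, wh, h_min=1.5, h_max=3.0):
    """Choose a bounding-box padding factor H from the ratio of the
    object size (lh x wh) to the image size (LH x WH).

    h_min and h_max are hypothetical bounds, not values from the paper:
    a tiny target gets H close to h_max (wide search area), a large
    target gets H close to h_min (narrow search area)."""
    area_ratio = (lh * wh) / float(LH * WH)
    # Interpolate between h_max (tiny target) and h_min (large target);
    # the factor 10.0 that saturates the ratio is an assumption.
    return h_max - (h_max - h_min) * min(area_ratio * 10.0, 1.0)
```

A small target in a 640 × 480 image thus receives a larger padding factor than one that nearly fills the frame.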
Summary of KCF method
The reason why the KCF algorithm can run at hundreds of frames per second (fps) is that the training samples are constructed by cyclic shifts, which makes the data matrix circulant. Owing to the properties of circulant matrices, the problem can be solved in the discrete Fourier transform (DFT) domain. This avoids explicit matrix inversion and reduces the complexity by several orders of magnitude. The KCF method is briefly summarized as follows.
The KCF classifier is trained by utilizing an image patch x centered on the target, together with its cyclic shifts.
Classifier
The classifier of the KCF tracker is trained by minimizing the following function

$$\min_{w}\ \sum_{t}\left(f(x_{t})-y_{t}\right)^{2}+\gamma\left\|w\right\|^{2} \quad (2)$$

where γ represents a regularization parameter. The solution of equation (2) can be expressed as a linear combination of the samples x_t

$$w=\sum_{t}\beta_{t}\,\varphi(x_{t}) \quad (3)$$

where φ(·) denotes the mapping to the kernel feature space and β_t are the coefficients to be learned.

Kernel matrix K with elements K_{t,s} is described as below

$$K_{t,s}=\kappa(x_{t},x_{s}) \quad (4)$$

F(β) is defined as follows

$$\mathcal{F}(\beta)=\frac{\mathcal{F}(y)}{\mathcal{F}(k^{xx})+\gamma} \quad (5)$$

where k^{xx} represents the first row of the kernel matrix K and F represents the DFT.
Fast detection
In the KCF algorithm, the kernel matrix between the learned target template x and a new image patch z is defined as

$$K^{z}=C\left(k^{xz}\right) \quad (6)$$

where C(·) denotes the circulant matrix generated by the kernel correlation vector k^{xz}. The response at cyclic shift p can be gained by equation (7)

$$g(p)=\mathcal{F}^{-1}\left(\mathcal{F}(k^{xz})\odot\mathcal{F}(\beta)\right) \quad (7)$$

where g(p) represents a 1 × n vector and F^{-1} represents the inverse DFT. The coordinates corresponding to the maximum value of g(p) are defined as the new position. The details of the KCF method can be obtained from the study by Henriques et al.26
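As a rough illustration rather than the authors' implementation, the training and detection steps above can be sketched in NumPy, assuming single-channel patches and a Gaussian kernel, and omitting HOG features, cosine windowing, and model interpolation:

```python
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    # Kernel correlation vector k^{xz}: the Gaussian kernel between x
    # and every cyclic shift of z, evaluated at once in the DFT domain.
    c = np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)).real
    d = (x ** 2).sum() + (z ** 2).sum() - 2.0 * c
    return np.exp(-np.maximum(d, 0) / (sigma ** 2 * x.size))

def train(x, y, gamma=1e-4):
    # DFT-domain ridge regression solution: F(beta) = F(y) / (F(k^xx) + gamma)
    kxx = gaussian_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(kxx) + gamma)

def detect(beta_f, x, z):
    # Response over all cyclic shifts p: g = F^{-1}(F(k^xz) .* F(beta));
    # the argmax of g gives the displacement of the target.
    kxz = gaussian_correlation(x, z)
    g = np.fft.ifft2(np.fft.fft2(kxz) * beta_f).real
    return np.unravel_index(np.argmax(g), g.shape)
```

Because the formulation is circulant, shifting the search patch cyclically shifts the response map by the same amount, which is what allows the new position to be read off from the peak of g.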
KF algorithm
In order to improve the robustness of the tracking algorithm, the KF algorithm, which estimates the position of the target in the next frame, is used when the difference in object position between the previous frame and the current frame is larger than a certain threshold value. The predicted position is developed under the following conditions. First, the distance threshold is selected according to the characteristics of the target, including its speed and position. Second, the predicted position is obtained from the predicted position of the previous frame.
The model of the object is usually considered as “constant velocity with random walk.” The state equation and the observation equation of the object are listed as follows

$$s_{i+1}=A\,s_{i}+r_{i} \quad (8)$$

$$z_{i}=H\,s_{i}+v_{i} \quad (9)$$

where the state vector is $s_{i}=[x_{i},\,y_{i},\,\dot{x}_{i},\,\dot{y}_{i}]^{T}$, and x_i and y_i represent the object coordinates at time step i. The transition and observation matrices are

$$A=\begin{bmatrix}1&0&T&0\\0&1&0&T\\0&0&1&0\\0&0&0&1\end{bmatrix},\qquad H=\begin{bmatrix}1&0&0&0\\0&1&0&0\end{bmatrix}$$

where T = 1 represents the sampling time. The process noise is $r_{i}=[0,\,0,\,r_{x_i},\,r_{y_i}]^{T}$, where r_{x_i} and r_{y_i} represent uncorrelated zero-mean Gaussian noise elements, and the measurement noise is $v_{i}=[v_{x_i},\,v_{y_i}]^{T}$, where v_{x_i} and v_{y_i} also represent uncorrelated zero-mean Gaussian noise elements.
The flow chart of the IKCF method
Figure 1 describes the flow chart of the IKCF method.

The flow chart of the IKCF method. IKCF: improved kernelized correlation filter.
When the target moves quickly over a long distance, scale changes and occlusion of the target can occur, and the target position then changes dramatically. The proposed IKCF algorithm plays an important role under this condition. During tracking, the difference in object position between the previous and current frame is compared against the distance threshold. When this difference is larger than the distance threshold, the target coordinates provided by the KCF algorithm are no longer trustworthy. Therefore, the coordinates predicted by the KF are taken as the current position of the target, and the KF is updated using its own prediction. When the difference is smaller than the distance threshold, the target coordinates provided by the KCF algorithm are trustworthy, and the proposed algorithm implements the KCF algorithm to track the target.
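The gating decision described above can be sketched as below; the function name and return convention are assumptions for illustration, not the paper's code:

```python
import math

def ikcf_select(prev_pos, kcf_pos, kf_pred_pos, dist_threshold):
    """Gate the KCF output against the distance threshold.

    Returns the position to report for the current frame and a flag
    indicating whether the KCF output was trusted (and hence whether
    the KF should be corrected with it or left on its own prediction).
    """
    dx = kcf_pos[0] - prev_pos[0]
    dy = kcf_pos[1] - prev_pos[1]
    jump = math.hypot(dx, dy)
    if jump > dist_threshold:
        # KCF result is untrustworthy: fall back on the KF prediction.
        return kf_pred_pos, False
    return kcf_pos, True
```

A small frame-to-frame displacement keeps the tracker on the KCF path, while an abrupt jump hands control to the KF prediction.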
The tracking performance is evaluated by two strategies. The first is the distance error, which measures the error between the tracked center of the target and the ground truth center. The second is the precision curve,31 which describes the percentage of correctly tracked frames over a range of thresholds. The distance error D_s is described as below

$$D_{s}=\sqrt{\left(x_{k}-x_{o}\right)^{2}+\left(y_{k}-y_{o}\right)^{2}} \quad (10)$$

where (x_k, y_k) and (x_o, y_o) are the tracked center of the target and the ground truth center of the target, respectively.
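Both evaluation strategies can be computed directly from the tracked and ground truth centers; a simple sketch:

```python
import numpy as np

def distance_error(track, truth):
    # Per-frame D_s: Euclidean distance between tracked centers and
    # ground truth centers, each given as a list of (x, y) pairs.
    track = np.asarray(track, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.sqrt(((track - truth) ** 2).sum(axis=1))

def precision(track, truth, threshold=20):
    # Fraction of frames whose center error is within `threshold`
    # pixels (one point of the precision curve; the common benchmark
    # reporting point is threshold = 20).
    return float((distance_error(track, truth) <= threshold).mean())
```

Sweeping `threshold` over a range of pixel values produces the full precision curve used in the comparisons below.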
System setup and experiments
In this section, first of all, the composition of the optoelectronic tracking system is introduced. Then, the performance of IKCF algorithm is compared with some state-of-the-art trackers such as KCF, TLD, SAMF, MOSSE, and DCF algorithms.
System setup
The system includes a control cabinet and a pointing unit. It can track a moving target with the vision camera or the infrared thermal imager in the pointing unit and record video for searching, tracking, and monitoring the target. The control cabinet contains an industrial personal computer, an image capture card, and a liquid crystal display. The pointing unit includes an infrared thermal imager, a vision camera, a bearing base, a laser, a torque motor, and a transformer. The optoelectronic tracking system is shown in Figure 2.

The optoelectronic tracking system.
The proposed IKCF algorithm has been implemented in the MATLAB environment. The designed code runs at dozens to hundreds of fps, depending on the size of the adopted template and the complexity of the background. Using the automatic focusing module, the target can be clearly imaged within the range of the lens.
Experiments
The effectiveness of the proposed tracking method is verified by experiments on challenging videos and a selected video. The challenging videos are Couple, Deer, Dancer, Football, CarScale, and Car4, which have been used for benchmarking in several papers.24,32,33 The selected video is a sequence of a civil aviation aircraft recorded by the authors.
For the sequences of Couple, Deer, Dancer, Football, CarScale, Car4, and Civil aviation aircraft, the ground truth location of the target for each frame is marked by the authors. The description of these videos is shown in Table 1.
Description of sequences.
Tracking results of the proposed algorithm are compared with the five state-of-the-art tracking algorithms, that is, KCF, 26 TLD, 20 SAMF, 27 MOSSE, 29 and DCF. 26
Figures 3, 5, 7, 9, 11, 13, and 15 describe precision, distance error, and time for the Couple, Deer, Dancer, Football, CarScale, Car4, and Civil aviation aircraft videos, respectively. The curves show the results of KCF, IKCF, TLD, SAMF, MOSSE, and DCF in red, black, blue, cyan, magenta, and green colors, respectively.

Target tracking performance of Couple.
Figures 4, 6, 8, 10, 12, 14, and 16 describe screenshots for the Couple, Deer, Dancer, Football, CarScale, Car4, and Civil aviation aircraft videos, respectively. The tracking box shows the results of KCF, IKCF, TLD, SAMF, MOSSE, and DCF in black, green, blue, red, yellow, and cyan colors, respectively.

Screenshots of Couple.

Target tracking performance of Deer.

Screenshots of Deer.

Target tracking performance of Dancer.

Screenshots of Dancer.

Target tracking performance of Football.

Screenshots of Football.

Target tracking performance of CarScale.

Screenshots of CarScale.

Target tracking performance of Car4.

Screenshots of Car4.

Target tracking performance of Civil aviation aircraft.

Screenshots of Civil aviation aircraft.
Figure 4 shows some sample tracked frames for Couple video sequence. The IKCF algorithm successfully tracks the target during fast motion and background clutter (e.g. Frames 91, 95, 102, and 122).
Figure 6 describes the performance of the IKCF tracking method for Deer video during motion blur (e.g. Frame 25), fast motion, and background clutter (as shown in Frames 27, 29, and 35).
Figure 8 explains that the IKCF tracking method successfully deals with the out-of-plane rotation of the target with scale variation (e.g. Frames 45 and 122) and deformation in Dancer video (e.g. Frames 135 and 142).
Figure 10 describes the tracking results of the IKCF algorithm on Football video. The video includes background clutter (e.g. appearance in Frame 277 and Frame 335) and occlusion (e.g. more than 50% of the head is occluded as shown in Frame 286).
Figure 12 depicts some frames from CarScale data set. The video includes fast motion, occlusion, and scale variation, which makes it difficult to track target. The situation becomes worse due to the occlusion (e.g. Frame 162) and scale variation (e.g. Frames 213 and 230).
Figure 14 describes some frames of Car4 video. The proposed algorithm treats scale variation (e.g. Frames 10 and 400) and illumination variation (e.g. Frames 240 and 645).
Figure 16 shows a few frames of Civil aviation aircraft video. The IKCF method successfully treats the scale variation effects on the target (e.g. Frames 27 and 98) and occlusion (e.g. Frame 38).
Table 2 shows the mean precision results, and Table 3 describes the mean fps. The best result for each video is described in bold-underline, the second best in bold, and the third best in underline style. The last line in each table describes the average score of the methods over all seven videos.
Mean precision (20 pixels).
KCF: kernelized correlation filter; IKCF: improved kernelized correlation filter; TLD: Tracking-Learning-Detection; SAMF: scale adaptive with multiple features; MOSSE: minimum output sum of squared error; DCF: dual correlation filter.
Mean fps.
fps: frames per second; KCF: kernelized correlation filter; IKCF: improved kernelized correlation filter; TLD: Tracking-Learning-Detection; SAMF: scale adaptive with multiple features; MOSSE: minimum output sum of squared error; DCF: dual correlation filter.
It can be observed from Table 2 that the proposed algorithm outperforms the other algorithms in mean precision. The tracking accuracy of the IKCF algorithm is improved by the use of the KF algorithm. The number of fps depends on the size of the template and the search window; depending on these sizes, the correlation is computed in either the Fourier domain or the spatial domain. Although the fps of the proposed algorithm is lower than that of the MOSSE and DCF algorithms, its accuracy is improved, which lays the foundation for accurate target tracking in the optoelectronic tracking system.
From the above seven experiments and the comparisons in Tables 2 and 3, it can be seen that the IKCF algorithm has advantages in tracking accuracy and mean fps over the KCF, TLD, and SAMF algorithms. Although the mean fps of the MOSSE algorithm is higher than that of the other five trackers, its tracking accuracy is poor compared with the IKCF tracker. Therefore, the IKCF algorithm can meet the tracking performance requirements of the optoelectronic tracking system for a maneuvering target.
Conclusions and future work
In this article, the IKCF approach is proposed to achieve good tracking performance for a maneuvering target. The proposed algorithm integrates the merits of the adaptive threshold approach, the KCF algorithm, and the KF algorithm. The adaptive threshold approach obtains the threshold of the bounding box on the basis of the size of the target in the image. The KF algorithm predicts the next target position when the difference in object position between the previous and current frame is larger than the distance threshold. The tracking performance of the IKCF algorithm is demonstrated by comparison with state-of-the-art trackers such as the KCF, TLD, SAMF, MOSSE, and DCF algorithms. The experimental results verify the stability and effectiveness of the proposed approach for tracking a maneuvering target.
Further work will focus on an adaptive distance threshold selection technique and the real-time application of the proposed algorithm in the optoelectronic tracking system.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the Foundation of the National Natural Science Foundation of China (No. 61427810 and 61733012) and by State Key Laboratory of Precision Measuring Technology and Instruments (No. PILT1713 and PILQ1703).
