Abstract
Complex backgrounds and volatile object shape and appearance often reduce the stability and accuracy of tracking algorithms, so accurate and robust object tracking remains a challenging research topic. Building on the methodologies of compressive tracking and spatio-temporal context, this paper proposes a simple yet robust object tracking method to address the drift and occlusion problems. It combines two existing classical ideas into a single framework: adaptive weighting and an occlusion detection mechanism. To weaken interference from the object background, the object area is first partitioned into equal-sized sub-patches, and each patch is assigned a different weight according to its location. Then, to improve robustness, the Bhattacharyya distance is adopted to find the samples with maximum discrimination. In addition, the proposed occlusion detection mechanism recaptures the tracked object when occlusion occurs. Simulation experiments show that the proposed algorithm achieves more favorable performance than existing state-of-the-art algorithms in handling various challenging infrared videos, especially those with occlusion and shape deformation.
Keywords
Introduction
Infrared imaging is resistant to interference, offers good concealment, and works in all weather conditions and at all times of day. It has therefore been widely used in civilian and military fields such as aerial defense detection systems, infrared guidance and wide-area monitoring. The goal of infrared object tracking is to estimate the location of an object over a sequence of frames given a known initial position. 1 It plays a critical role in many applications such as motion analysis, object recognition, guidance technology, photoelectric measurement and intelligent user interfaces.2,3 Object tracking algorithms have been studied comprehensively in recent decades, but designing efficient tracking methods remains a challenge because of appearance changes caused by occlusion, shape deformation, motion and illumination. 3
Essentially, there are three types of infrared object tracking methods: optimal-matching methods, decision-making methods and mixtures of the two. Optimal-matching methods search a local area for the candidate with the maximum response, that is, the one most similar to the object. In object tracking, subspace modeling is an effective way to locate the object, using features such as wavelet features, transform-domain features, LBP, LTP and pixel-level features.4–10 Early feature-based tracking methods relied on a fixed basis, while more advanced signal processing represents each image patch as a linear combination of a few elements from a basis set called a dictionary. In essence, these are all data-driven optimal-matching methods. Once the object feature is identified, classifier design determines whether the algorithm is effective during tracking. In the fast compression tracking algorithm, 3 a naive Bayesian classifier is used to classify sample features extracted from an image feature space, but it easily misclassifies edge background interference that closely resembles the object. The weighted multi-instance learning algorithm proposed in Zhang and Song 5 explored the concept of a “Bag of positive-negative” and assigned each patch in the positive bag a weight related to its importance, which improves the robustness of the classifier to a certain extent. In Zhang et al., 6 a dynamic object feature-learning model is proposed to select training features with strong discrimination ability, so as to distinguish the object from a complex background. Real-time object tracking via online discriminative feature selection can automatically select highly discriminative features to obtain a robust classifier. 7 However, some shortcomings remain. Because they lack background information about the object, these optimal-matching algorithms suffer from tracking-point drift, or even tracking failure, when occlusion occurs.
Decision-making models track the object on the basis of the correlation between the object and its background. Traditional tracking algorithms, such as background subtraction, frame differencing, particle filtering and optical flow, make use of the global information of the image, where the global parameters can be treated as equivalent to the object motion parameters. To find background features related to the object region, SIFT or SURF features are extracted from the background region, which greatly increases the computational complexity.9–16 Most decision-making tracking algorithms use only some features from a partial region to assist object tracking or to estimate the object state. Many tracking algorithms neglect the relationship between the object and its background and therefore lose a lot of important background information, even though the use of background is a key to success. The spatio-temporal context tracking algorithm 4 proposed by Zhang et al. takes full advantage of the local background information of the object and establishes the context model with the fast Fourier transform, which greatly improves the efficiency of the tracking algorithm.16,17
In this paper, we propose an object tracking method built upon the methodologies of “Compression Tracking” and “Spatio-temporal context”. It combines two classical ideas into a single framework: adaptive weighting and an occlusion detection mechanism. The result is an adaptive weighted compression tracking algorithm that uses background information for decision-making. To weaken interference at the object edge, the object area is first partitioned into equal-sized sub-patches, and each patch is assigned a different weight according to its location. Then, to improve robustness, the Bhattacharyya 15 distance is adopted to find the samples with maximum discrimination. At the same time, the proposed occlusion detection mechanism allows tracking to continue when occlusion occurs. Simulation experiments show that the proposed algorithm achieves more favorable performance than existing state-of-the-art algorithms in handling various challenging videos, especially those with occlusion and shape deformation.
The rest of the paper is organized as follows. We first review the related fast compression tracking algorithm and spatio-temporal context tracking algorithm in the next section. Our proposed tracking algorithm is detailed in the ‘Processed tracking algorithm’ section. In the ‘Experimental results’ section, we present the experimental results and a performance analysis against existing state-of-the-art algorithms. The last section concludes the paper.
Related work
Our work is motivated by recent advances in the fast compression tracking algorithm 3 and the spatio-temporal context tracking algorithm. 4 Thus, we discuss only the most closely related techniques.
In early tracking work, the patches most similar to a given reference patch are searched for within a search window sharing the same center as the reference patch, and are then grouped into a positive sample set. 18 In practical terms, distance is the sole criterion for judging the similarity between the given object area and a sample: the closer a sample is, the more similar its features are assumed to be. Nevertheless, because of appearance changes caused by occlusion, shape deformation and so on, dissimilar samples cannot be used to train a discriminative classifier for distinguishing the object from its surrounding background.19,20
The fast compressive tracking (FCT) algorithm is a fast, efficient tracking algorithm based on compressed sensing (CS) theory. It mainly adopts a random matrix
In this entries definition,
The compressive tracking algorithm takes full advantage of the low-dimensional feature space of the infrared object and is highly robust when the contour of the object is clear and complete. However, without a detection mechanism for the tracked object, positive and negative samples are misclassified when the object is occluded. If background information is not fully utilized, this causes tracking-point drift or even tracking failure, and the tracker is distracted to a wrong location. Although a distractor may look similar to the object, most of its surrounding background context looks different, which is useful for discriminating the object from the distractor. Wen et al. 4 therefore proposed a simple and robust algorithm that exploits the spatio-temporal relationships between the object and its locally dense context in a Bayesian framework for visual tracking, which is robust to appearance variations introduced by occlusion, distortion and pose variation. The tracking algorithm 4 is one of several algorithms based on decision-making models that make full use of spatial and background information.
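As an illustration of the random-matrix projection that compressive tracking relies on, the following minimal Python sketch maps an image patch to a low-dimensional feature vector with a very sparse random matrix. The matrix construction (entries of +sqrt(s), 0 or -sqrt(s)), the sparsity parameter `s`, and the function names are assumptions made for illustration, and the sketch projects raw pixel values for simplicity; it is not taken from the FCT implementation.

```python
import numpy as np

def sparse_measurement_matrix(n_features, n_pixels, s=3, seed=0):
    """Very sparse random matrix with entries sqrt(s) * {+1, 0, -1}.

    Entries are +1 or -1 with probability 1/(2s) each and 0 otherwise.
    The value of s is an assumed sparsity parameter.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice(
        [1.0, 0.0, -1.0],
        size=(n_features, n_pixels),
        p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)],
    )
    return np.sqrt(s) * signs

def compress(patch, R):
    """Project a flattened image patch onto the low-dimensional space."""
    return R @ patch.ravel().astype(float)

# Hypothetical 16x16 patch compressed to 50 features.
patch = np.arange(16 * 16, dtype=float).reshape(16, 16)
R = sparse_measurement_matrix(n_features=50, n_pixels=patch.size)
v = compress(patch, R)
```

Because most entries of `R` are zero, each compressed feature depends on only a small subset of pixels, which is what keeps this kind of projection cheap.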
Processed tracking algorithm
Adaptive weighted compressive tracking algorithm
Since the shape of the object is usually irregular, the interior of the bounding box contains less background information and more object texture, while the edge of the box brings in more background disturbance, which directly affects the classification of positive and negative samples. If the classifier model is updated on the basis of misclassified samples, tracking accuracy and reliability inevitably suffer. Therefore, we propose a feature extraction strategy in which object patches are assigned weights so as to weaken background interference, as shown in Figure 1. In Zhang and Song, 5 the strong classifier is constructed from all samples, where the object location with maximum likelihood probability is selected and weights based on the distance to the center of the object are assigned.

Figure 1. Confidence map: (a) weight map; (b) sub-patch extraction.
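The idea of partitioning the bounding box into equal-sized sub-patches and weighting them by location can be sketched in Python as follows. The Gaussian falloff with distance from the center, the `sigma` parameter and the function names are assumptions chosen for illustration; the exact weight used in this paper may differ.

```python
import numpy as np

def patch_weights(n_rows, n_cols, sigma=1.0):
    """Weights for an n_rows x n_cols grid of equal-sized sub-patches.

    Assumed form: Gaussian falloff with distance (in patch units) from
    the grid center, normalized so that the weights sum to 1.
    """
    rows = np.arange(n_rows) - (n_rows - 1) / 2.0
    cols = np.arange(n_cols) - (n_cols - 1) / 2.0
    d2 = rows[:, None] ** 2 + cols[None, :] ** 2
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()

def split_into_patches(box, n_rows, n_cols):
    """Split a 2-D bounding-box image into equal-sized sub-patches."""
    h = box.shape[0] // n_rows
    w = box.shape[1] // n_cols
    return [
        [box[r * h:(r + 1) * h, c * w:(c + 1) * w] for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical 30x60 bounding box split into a 3x3 grid.
box = np.zeros((30, 60))
patches = split_into_patches(box, 3, 3)
weights = patch_weights(3, 3)
```

With this choice the center patch receives the largest weight and the corner patches the smallest, which matches the intent of trusting the interior of the box more than its edge.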
Given that the center area of the search window has higher confidence and less background information than the edge area, the search window is divided
It is clear from the weight in equation (4) that larger weights are assigned at the center of the bounding area, so the confidence scores there are larger than those in the edge area. The core idea of feature-matching based object tracking is discrimination between the object feature and its background. Because of occlusion, background interference, motion blur and illumination changes, there is no guarantee that the low-dimensional features extracted with the random measurement matrix in compression tracking, especially those from the edge area, are well separable. Babenko et al. 13 proposed the concept of a “Bag of Instance”: the log-likelihood function of the bags is computed to select weak classifiers, and the best ones in the pool are then greedily picked to form a strong classifier. On the basis of Babenko et al., 13 Wen et al. 4 further improved the classifier and increased its robustness by assigning different weights to the selected weak classifiers. The dynamic object feature-learning model proposed in Tian et al. 15 used high divergence between two classes to train the classifier. Diaconis and Freedman, in Tian et al., 14 proved that random projections of high-dimensional random vectors are approximately Gaussian distributed, so the distribution can be described using 4 parameters as follows
These features include highly discriminative features as well as relatively weak or redundant ones. To magnify the discrimination between features and improve the accuracy of object tracking, we introduce the Bhattacharyya distance to measure the separability between the probability distributions of the low-dimensional features of positive and negative samples, so that features with larger separability can be adaptively selected to train the classifier. As shown in Figure 2, all samples follow Gaussian distributions. There are big differences between the features of positive-sample 2, 3 and those of the negative sample, and it was more difficult to distinguish positive-sample 2 from the negative sample. Let us suppose the probability density distribution (PDF) of the

Figure 2. Schematic diagram of the sample feature probability.
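To make the feature selection step concrete, the sketch below computes the Bhattacharyya distance between two one-dimensional Gaussian distributions in closed form and keeps the features with the largest distance. It assumes that each compressed feature is modeled by an independent Gaussian per class; the function names and the parameter `k` are hypothetical.

```python
import numpy as np

def bhattacharyya_gaussian(mu_p, sigma_p, mu_n, sigma_n, eps=1e-12):
    """Bhattacharyya distance between N(mu_p, sigma_p^2) and N(mu_n, sigma_n^2).

    Works elementwise on arrays, so it can score all features at once.
    eps guards against zero variance.
    """
    var_p = sigma_p ** 2 + eps
    var_n = sigma_n ** 2 + eps
    mean_term = 0.25 * (mu_p - mu_n) ** 2 / (var_p + var_n)
    var_term = 0.5 * np.log((var_p + var_n) / (2.0 * np.sqrt(var_p * var_n)))
    return mean_term + var_term

def select_features(pos, neg, k):
    """Return indices of the k most separable features.

    pos, neg: arrays of shape (n_samples, n_features) holding the
    compressed features of positive and negative samples.
    """
    d = bhattacharyya_gaussian(pos.mean(axis=0), pos.std(axis=0),
                               neg.mean(axis=0), neg.std(axis=0))
    return np.argsort(d)[::-1][:k]

# Synthetic data: only feature 0 separates the two classes.
rng = np.random.default_rng(0)
pos = rng.normal(loc=[5.0, 0.0, 0.0], scale=1.0, size=(200, 3))
neg = rng.normal(loc=[-5.0, 0.0, 0.0], scale=1.0, size=(200, 3))
best = select_features(pos, neg, k=1)
```

The distance is zero for identical distributions and grows with both the gap between the means and the mismatch between the variances, so ranking by it favors features whose positive and negative distributions overlap least.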
In either case,
In addition, a naive Bayesian classifier based on adaptive confidence weighting is proposed, as follows.
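One plausible form of such a classifier is a weighted sum of per-feature log-likelihood ratios. The Python sketch below assumes Gaussian class-conditional densities, equal class priors and one weight per feature; these choices and all names are assumptions for illustration and may differ from the formulation used in this paper.

```python
import numpy as np

def gaussian_log_pdf(v, mu, sigma, eps=1e-6):
    """Log density of N(mu, sigma^2) evaluated elementwise at v."""
    sigma = np.maximum(sigma, eps)  # avoid division by zero
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * ((v - mu) / sigma) ** 2

def weighted_nb_score(v, mu_pos, sigma_pos, mu_neg, sigma_neg, weights):
    """Weighted naive Bayes score: sum_i w_i * log(p(v_i | pos) / p(v_i | neg)).

    A positive score favors the object class, a negative one the background.
    """
    log_ratio = (gaussian_log_pdf(v, mu_pos, sigma_pos)
                 - gaussian_log_pdf(v, mu_neg, sigma_neg))
    return float(np.sum(weights * log_ratio))

# Hypothetical two-feature model.
mu_pos, sigma_pos = np.array([1.0, 1.0]), np.array([1.0, 1.0])
mu_neg, sigma_neg = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])
score_obj = weighted_nb_score(mu_pos, mu_pos, sigma_pos, mu_neg, sigma_neg, weights)
score_bg = weighted_nb_score(mu_neg, mu_pos, sigma_pos, mu_neg, sigma_neg, weights)
```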
Background decision-making mechanism
The compressive tracking algorithm takes full advantage of the low-dimensional feature space of the object and is highly robust when the contour of the object is clear and complete. However, without a detection mechanism for the tracked object, positive and negative samples are misclassified when the object is occluded, and the failure to fully utilize background information causes tracking-point drift or even tracking failure. To solve this problem, an object detection mechanism is proposed in this paper. The classification score of the edge patches at the vertices of the tracking bounding box usually serves as the detection criterion. The positive and negative samples stop updating if the classification score of an edge patch is lower than the threshold.
When occlusion has been detected by the detection module, the relationship between the object and its local background is first modeled by the conditional probability function of the spatial context model, where the PDF is defined as follows:
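The gating logic described above can be sketched in Python as follows. The sketch assumes that occlusion is flagged when any edge-patch score falls below the threshold, and it uses the running Gaussian parameter update commonly used in compressive tracking with an assumed learning rate `lam`; the function names are hypothetical.

```python
import numpy as np

def is_occluded(edge_scores, threshold):
    """Flag occlusion when any edge-patch score is below the threshold."""
    return bool(np.min(edge_scores) < threshold)

def update_gaussian(mu, sigma, mu_new, sigma_new, lam=0.85):
    """Blend old and new Gaussian parameters with learning rate lam."""
    mu_out = lam * mu + (1.0 - lam) * mu_new
    sigma_out = np.sqrt(lam * sigma ** 2 + (1.0 - lam) * sigma_new ** 2
                        + lam * (1.0 - lam) * (mu - mu_new) ** 2)
    return mu_out, sigma_out

def maybe_update(model, new_stats, edge_scores, threshold):
    """Update the (mu, sigma) model only when no occlusion is detected.

    Returns the model to use next and a flag telling whether the
    update was skipped because of occlusion.
    """
    if is_occluded(edge_scores, threshold):
        return model, True
    return update_gaussian(*model, *new_stats), False

# Hypothetical one-feature model and four edge-patch scores.
model = (np.array([0.0]), np.array([1.0]))
new_stats = (np.array([1.0]), np.array([1.0]))
frozen, occluded = maybe_update(model, new_stats, np.array([-5.0, 2.0, 3.0, 1.0]), 0.0)
updated, skipped = maybe_update(model, new_stats, np.array([1.0, 2.0, 3.0, 1.0]), 0.0)
```

Freezing the model while the flag is set keeps occluder appearance out of the positive and negative statistics, which is the drift source the text describes.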
Notice that
Simplifying equation (11), the confidence model of the object prior probability is written as
According to the spatio-temporal relationship between the object and the background in the context prior model (11) and the object confidence model (12), our proposed tracking model can be expressed as:
Since convolution in the time domain is equivalent to multiplication in the frequency domain, we obtain
After the spatio-temporal context model is transformed back by the inverse Fourier transform, we obtain the confidence value of each pixel, and the location of the maximum value is the object location, so the computation is denoted as
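The frequency-domain computation described above can be sketched in Python with NumPy's FFT routines. The confidence shape `exp(-(d/alpha)^beta)`, the Gaussian weighting of the context prior, and the values of `alpha`, `beta` and `sigma` are assumptions in the style of spatio-temporal context tracking; the sketch uses circular convolution, learns the model from a single frame, and omits the temporal update of the model.

```python
import numpy as np

def gaussian_window(shape, center, sigma):
    """Gaussian weight centered on the object position."""
    ys, xs = np.indices(shape)
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def learn_spatial_context(context, center, alpha=2.25, beta=1.0, sigma=5.0, eps=1e-8):
    """Solve conf = h (*) prior for h in the Fourier domain and return F(h)."""
    ys, xs = np.indices(context.shape)
    dist = np.sqrt((ys - center[0]) ** 2 + (xs - center[1]) ** 2)
    conf = np.exp(-(dist / alpha) ** beta)      # assumed confidence map shape
    prior = context * gaussian_window(context.shape, center, sigma)
    return np.fft.fft2(conf) / (np.fft.fft2(prior) + eps)

def locate(H, context, center, sigma=5.0):
    """Confidence map via frequency-domain multiplication, then argmax."""
    prior = context * gaussian_window(context.shape, center, sigma)
    conf = np.real(np.fft.ifft2(H * np.fft.fft2(prior)))
    peak = np.unravel_index(np.argmax(conf), conf.shape)
    return tuple(int(i) for i in peak)

# Hypothetical 32x32 context region with the object at its center.
rng = np.random.default_rng(0)
frame = 0.5 + rng.random((32, 32))
center = (16, 16)
H = learn_spatial_context(frame, center)
position = locate(H, frame, center)
```

Replacing the spatial convolution with a pointwise product of FFTs is what keeps the per-frame cost low, since the transform of an N-pixel region takes on the order of N log N operations.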
Occlusion detection
In this paper, we propose a simple yet robust object tracking algorithm that combines spatial context features with the adaptive weighted compression tracking algorithm. The algorithm treats the boundary and the center of the bounding box differently so as to overcome the interference of a complex background, using weights to describe the interference coefficient of each patch in the tracking bounding box. In other words, the closer a patch is to the object center, the larger its weight. To obtain distinguishable low-dimensional features of positive and negative samples for training the classifier, compression features are adaptively selected, which improves the robustness of the classifier.21–24 Since inaccurately updated parameters of positive and negative samples could cause tracking-point drift or even tracking failure, an object detection mechanism is built on 4 patches centered on the vertices of the tracking bounding box. The object template will stop updating and store the score
The main framework of proposed tracking scheme
A coarse-to-fine sliding window search strategy is adopted to reduce the computational cost. To conclude this section, we summarize the main steps of the proposed object tracking method. The concrete process is as follows:
The tracking bounding-box is searched and generated from a
Within the scope of
Stop updating positive and negative samples when occlusion is detected by the occlusion detection mechanism; the spatio-temporal context relationship between the object and its background is then modeled to obtain the classifier score
If
The sample is updated, where the range of positive samples is
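The coarse-to-fine sliding window search mentioned in this section can be sketched in Python as follows. The search radii, the step sizes and the `score_fn` callback (standing in for the classifier score of a candidate window) are assumptions for illustration.

```python
def candidate_centers(center, radius, step):
    """Grid of candidate centers within a circle around the given center."""
    cy, cx = center
    out = []
    for dy in range(-radius, radius + 1, step):
        for dx in range(-radius, radius + 1, step):
            if dy * dy + dx * dx <= radius * radius:
                out.append((cy + dy, cx + dx))
    return out

def coarse_to_fine_search(score_fn, center, coarse_radius=25, coarse_step=4,
                          fine_radius=10, fine_step=1):
    """Scan a large radius with a large step, then refine around the best hit."""
    coarse = candidate_centers(center, coarse_radius, coarse_step)
    best = max(coarse, key=score_fn)
    fine = candidate_centers(best, fine_radius, fine_step)
    return max(fine, key=score_fn)

# Toy score that peaks at a hypothetical true position (57, 43).
def toy_score(p):
    return -((p[0] - 57) ** 2 + (p[1] - 43) ** 2)

found = coarse_to_fine_search(toy_score, center=(50, 50))
```

The two passes evaluate far fewer candidate windows than a dense scan of the full coarse radius at step 1, which is where the saving in computation comes from.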
Experimental results
In this section, the performance of our proposed tracking scheme is tested and analyzed. Because infrared images lack object reference values, it is very difficult to evaluate the proposed algorithm quantitatively. Therefore, infrared image sequences obtained by an uncooled infrared detector are adopted in our experiment, and the object position and scale are measured manually to support the qualitative and quantitative analysis of tracking performance: 56 researchers in related fields calibrated and measured the position and size of the object frame by frame, and the averaged measurements serve as the baseline data of the infrared target. To verify the tracking ability of the proposed method more fully, 12 natural image sequences with reference values are also analyzed qualitatively and quantitatively. The main innovation of the framework is the combination of adaptive weighting and the occlusion detection mechanism.
Experiments configuration
All experiments are run under MATLAB v7.8 (R2009a) on PCs with an Intel quad-core i5 CPU at 3.3 GHz and 4 GB of memory. The center location error (CLE) and the overlap ratio (OR), both computed against the ground truth, are used to evaluate the performance of the tracking method quantitatively. Our proposed algorithm is evaluated against 4 other state-of-the-art algorithms on 12 challenging, publicly available sequences. All quantitative results of the proposed method are averaged over six independent trials. In addition, the infrared image sequences captured by the infrared detector are used for qualitative analysis.
The 4 evaluated trackers are the Fast Compressive Tracker (FCT), 3 the spatio-temporal context tracker (STC), 4 the MIL tracker 5 and the TLD tracker. 8 The most challenging sequences from existing works are used for evaluation. All parameters of the comparative algorithms are fixed across all experiments to demonstrate the robustness and stability of our proposed tracking method.
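For reference, the ground-truth based metrics can be sketched in Python as follows. The sketch assumes boxes given as (x, y, width, height), takes the overlap ratio to be the common intersection-over-union definition, and counts a frame as a success when the overlap exceeds an assumed threshold of 0.5; the function names are hypothetical.

```python
import math

def overlap_ratio(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_location_error(box_a, box_b):
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2.0) - (bx + bw / 2.0),
                      (ay + ah / 2.0) - (by + bh / 2.0))

def success_rate(tracked, truth, threshold=0.5):
    """Fraction of frames whose overlap ratio exceeds the threshold."""
    hits = [overlap_ratio(t, g) > threshold for t, g in zip(tracked, truth)]
    return sum(hits) / len(hits)

# Two hypothetical frames: a perfect hit and a box shifted by half its width.
truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
tracked = [(0, 0, 10, 10), (5, 0, 10, 10)]
rate = success_rate(tracked, truth)
```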
Parameters setup
The range of positive sample is set to
Comparison of quantitative evaluation for natural sequences
To evaluate the effectiveness of our proposed algorithm, the success rate and the center location error are adopted to measure the accuracy of the tracker. The former metric shows the coverage rate between the tracked bounding-box
Table 1. Quantitative comparison for natural sequences.
According to the experimental results in Table 1, our proposed algorithm achieves the highest success rate on most sequences. It is the most efficient algorithm among all evaluated methods, with the exception of the STC method, where the gap is only 0.1 on some sequences. In addition, our algorithm achieves much better results than the FCT algorithm in terms of center location error, which shows the effectiveness of adaptive weighting.
Even on challenging video sequences containing occlusion, shape deformation, motion blur and illumination changes, such as Singer1, Faceocc2 and Occlusion1, our proposed scheme outperforms the other existing algorithms. This demonstrates that adaptive weighting and the occlusion detection mechanism are well suited to tracking and recapturing an occluded object. For Surfer3, where the man is easily lost in the sea, the proposed method outperforms the other trackers. The reason is that adaptive weighting keeps the classifier from being disrupted, so the object can be recaptured as soon as it reappears. Even under strong occlusion, the corresponding local pattern of the object can still be captured. Therefore, our scheme outperforms FCT and STC on these sequences.
It is not surprising that our method performs well on test sequences containing background occlusion and shape deformation. For Surfer3 and Bird1, our proposed method outperforms the TLD and MIL trackers, and it is superior to FCT by 0.05 to 0.1 and to STC by 0.04. For Jumping, our proposed method outperforms FCT, STC, TLD and MIL; it is superior to FCT by 0.08, to STC by 0.1 and to TLD by 0.1.
Alongside the success rate (SR), the center location error (CLE) is a popular metric for measuring tracking accuracy.24,25 To understand how stably the combination of adaptive weighting and the occlusion detection mechanism tracks an object, we also compare this metric for the proposed algorithm and other leading tracking techniques in the literature on different videos. The CLE results are also shown in Table 1. The best and second-best results are highlighted in red and green font in each cell, respectively. We conclude that the proposed method achieves CLE performance competitive with FCT and STC on most test sequences. This success is owed to the fact that the proposed method aims to construct an optimal classifier and to weaken interference from background information. In addition, our tracker can accurately recapture the object when it reappears.
Note that the parameters of our tracker have not been optimized for speed or quality in this experiment; they are set on the basis of experience. Optimized parameters may further improve performance, which is another direction for future research. The proposed method is also faster than FCT and STC, because our algorithm reduces the data dimension and removes the multi-scale filtering module.
The objects in the Girl, Singer1, Occlusion1 and Carl sequences are partially occluded at times. The object in the women sequence also undergoes rotation, which makes the tracking task difficult. Only our proposed algorithm successfully tracks the object in most frames.
The Occlusion1 sequence contains deformation and heavy occlusion, as shown in Figure 3. All trackers other than our proposed algorithm fail to track the object successfully. Full sequences are used to better evaluate the performance of all algorithms. Only our proposed algorithm achieves favorable tracking results in terms of both accuracy and success rate, which can be attributed to the use of adaptive weighting and the occlusion detection mechanism.

Figure 3. Representative frames of tracking results for natural images: (a) Boat; (b) person; (c) car; (d) corridor.
The comparison experiments above show that our proposed algorithm has the best stability among the evaluated state-of-the-art object tracking algorithms. In the Boat and Car sequences, the objects are occluded and disturbed by the background. In these cases, our proposed algorithm still tracks accurately, which shows that our adaptive weighted method can effectively diminish the interference from background information. In the
Comparison of quantitative evaluation for infrared sequences
Because infrared images lack object reference values, the object position and scale in the acquired infrared image sequences are measured manually so as to evaluate the accuracy, stability and coverage of our proposed tracking algorithm, as shown in Figure 4 and Table 2. Sequence 1 (the first row of Figure 4) is a white-hot image sequence from the infrared detector, where the object is a van and the tracking process involves occlusion, background interference, rotation and a similar-looking background. Sequence 2 (the second row of Figure 4) is a black-hot image sequence from the infrared detector, where the object is a truck and the tracking process involves pose changes, occlusion, blur and other interference. The TLD and MIL tracking algorithms suffer from tracking-point drift due to interference from the poles and weeds. As the error accumulates, the tracking bounding-box gradually deviates from the object, mainly because the generalization ability of the tracking model is insufficient and it cannot fully adapt to a complex background and significant appearance changes. Our proposed algorithm, FCT and STC can all track the object, but our bounding-box always surrounds the object, whereas STC shows a certain drift. Qualitative analysis shows that our proposed method has better tracking stability than other existing state-of-the-art algorithms in handling various challenging infrared videos, especially those with occlusion and shape deformation.

Figure 4. Representative frames of sampled tracking results for infrared images.
Table 2. Quantitative comparison for infrared sequences.
Table 2 shows the quantitative comparison of the different tracking algorithms on infrared images. The data in Table 2 show that the proposed algorithm achieves good tracking results in 6 different infrared scenes. For example, when tracking IR sequence 1, the three methods other than TLD and MIL achieve better tracking accuracy, mainly because the TLD and MIL methods begin to drift at the 139th frame. The FCT algorithm drifts and the STC algorithm occasionally jumps because of similar background disturbances, which strongly affects the OR and the CLE. For Sequence 2, our proposed method outperforms FCT, STC, TLD and MIL; it is superior to FCT by 0.03, to STC by 0.08 and to TLD by 0.17. The proposed algorithm tracks the object better when it is occluded and obtains good tracking precision, which is mainly due to the generalization ability of the multi-cue model.
Conclusion
Building on compression-based infrared object tracking, the proposed algorithm mainly uses sub-patch feature extraction to reduce the size of the feature matrix and decrease the storage space of the random matrix. The weight of each candidate sample depends on the location of its feature patches, which weakens the interference from the background. We use the Bhattacharyya distance as the measure for adaptively extracting object features, so as to enlarge the difference between the object and its background and improve the robustness of the classifier. Meanwhile, we set a detection mechanism on the edge of the object; when the object is occluded, the object and its local background are modeled with the spatio-temporal context algorithm so as to achieve fine tracking. Experimental comparisons with existing state-of-the-art object tracking algorithms show that our proposed infrared object tracking algorithm has good stability and high robustness during tracking.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of Shanxi Province (No.2014021022-5), the Technological Project of State Grid Corporation of China (No.5205301500).
