Real-time fast moving object tracking in severely degraded videos captured by unmanned aerial vehicle

Abstract

Object tracking for unmanned aerial vehicle applications in outdoor scenes is a very complex problem. In videos captured by unmanned aerial vehicles, frequent variation in illumination, motion blur, image noise, deformation, lack of image texture, occlusion, fast motion, and other degradations cause most tracking methods to fail. This article focuses on object tracking in severely degraded videos. To deal with these degradations, a real-time object tracking method for highly dynamic backgrounds is developed. By integrating the histogram of oriented gradient, the RGB histogram, and the motion histogram into a novel statistical model, our method can robustly track the target in videos captured by unmanned aerial vehicles. Compared to existing methods, our proposed approach uses fewer resources during tracking and runs faster than most of the compared state-of-the-art methods. Our approach also achieved satisfactory tracking results on the challenging visual tracking benchmark, object tracking benchmark 2013, and supplementary experiments demonstrate that our method is more effective and accurate than other methods.

Keywords

Fast moving, degradation, optical flow, real-time tracking, correlation filter

Introduction

During the last 10 years, notable progress has been made in the technology and application of unmanned aerial vehicles (UAVs). Military drones play an important role in battlefield scouting. The reliability and efficiency of UAVs make them easy to operate and maintain in the battlefield. In the civilian field, UAVs have been applied to aerial surveillance, fast disaster monitoring, and short-distance parcel delivery. A large number of companies are developing UAV systems to keep their technology competitive, such as Amazon Prime Air and Google's Project Wing. Flight security will be a significant issue in the future; thus, the UAV must be able to sense its surroundings during flight.

Vision-based tracking systems are becoming ever more important in current UAV applications. Visual cameras are lightweight and inexpensive, and above all, they provide more useful information than other sensors. With the abundant information provided by the vision system, a drone can detect hidden military threats and take the appointed action. This technology has attracted a lot of attention in the last 10 years. A reliable UAV vision system should be able to track objects automatically. Research on such issues, as a central theme of computer vision, has been active for decades and has produced many notable achievements.

However, there are various challenges in UAV captured videos. As shown in Figure 1, because of cost limitations, UAV captured videos often contain many video quality degradations, such as defocus and motion blur. Even if the video is captured by a high frames per second (FPS) camera (1080p @ 60 FPS), severe degradation will occur in the video once the UAV performs an abrupt rotation or other motion. Thus, how to deal with those degradations becomes an important problem in object tracking on UAV captured videos.

Figure 1.

Tracking results of the proposed method in severely degraded video (the yellow rectangles annotate our tracking results).

The proposed approach has been compared with state-of-the-art methods on sequences with challenging degradations, including high dynamic background, video transmission error, violent rotation, illumination and contrast change, motion blur, and fast motion. For common object tracking methods, motion blur, camera rotation, and fast motion make it quite challenging to achieve high-quality tracking results on UAV captured video, because these methods do not further process the degradation information. Previously proposed trackers are seldom specialized to handle degradations like motion blur and fast motion. Figure 1 shows some successful tracking results of the proposed method in severely degraded video; the yellow rectangles indicate the position of the target.

To sum up, our contribution has three aspects: (1) the proposed method introduces three different features, the histogram of oriented gradient (HOG), the RGB histogram, and the histogram of the optical flow result, into the statistical model; (2) we propose a tracking method for motion-degraded video that analyzes the optical flow of the target; and (3) the proposed technique performs better than state-of-the-art trackers when evaluated on a widely applied benchmark, object tracking benchmark 2013 (OTB2013). The overview of our algorithm is given in Figure 2.

Figure 2.

Overview of our algorithm. First, we calculate the color, gradient, and optical flow distributions of the frame sequence. Second, we combine those distributions into our statistical degradation model with optical flow. Finally, we use a correlation filter to track the target.

The rest of the article is organized as follows: related work on object tracking is briefly reviewed in the “Related works” section; the “The proposed approach” section presents a detailed description of the proposed algorithm; the “Experiments” section gives the quantitative and qualitative experimental results and discusses some limitations of our approach; and the conclusion of this article is drawn in the last section.

Related works

UAV-related computer vision applications are very promising in the coming decades. Tracking a target in UAV captured videos requires an online object tracking algorithm. Among the algorithms aimed at improving the robustness of the results,¹ Struck² has a concise formulation whose purpose is to minimize a structured output objective for object localization.³ However, the variety of features and the number of training samples make those algorithms inefficient.

The correlation filter⁴ minimizes a least-squares loss over all cyclic shifts of the positive example. Although this may not seem to be a proper approximation of the real world, by working in the Fourier domain it can use densely sampled examples and high-dimensional feature images and still easily run in real time. The disadvantage of correlation filters is that they are limited to learning from all cyclic shifts. Several recent works⁵⁻⁷ have attempted to solve this problem; in particular, the spatially regularized formulation⁸ has shown good tracking results, although at the expense of real-time operation. Correlation filter tracking⁹ has been extended to multiple feature channels, so that HOG features¹⁰ enable the technique to achieve state-of-the-art performance¹¹ in VOT14. The challenge winner, DSST,⁸ incorporates a multiscale template using a one-dimensional correlation filter for discriminative scale space tracking.

Current correlation filter-based methods¹² are inherently limited to learning a rigid template. This is a problem when the target undergoes shape deformation over the course of a sequence. Perhaps the simplest way to achieve robustness to deformation is to use a representation¹³ that is insensitive to shape changes. Image histograms have this property because they discard the position of each pixel. In fact, histograms can be considered orthogonal to correlation filters because a correlation filter is learned from cyclic shifts, while a histogram is invariant to cyclic shifts.¹⁴,¹⁵ However, histograms alone are usually insufficient to distinguish the object from the background.

The primary alternative for achieving deformation robustness is to learn a deformable model. We think that learning a deformable model from a single video, in which the only supervision is the position in the first frame, is meaningful but very demanding, so a simple bounding box is adopted. Although our approach is superior to recent sophisticated parts-based models¹⁶,¹⁷ in benchmarks, deformable models have richer representations that could benefit tracking.

In most common situations, the target template selected from the frame can be used to generate the highest score for the target. However, the scores of the background are also unavoidably high. To overcome this issue, a variety of correlation filters have been proposed recently. The ASEF¹⁸ filter combines different learned filters by averaging over them, while the MOSSE⁹ filter is trained on all the images jointly. By solving a ridge regression problem with circulant matrices, kernelized correlation filters (KCF) were proposed for object tracking. In theory, KCF¹⁹ with a linear kernel is the same as the MOSSE⁹ filter if multiple single-channel samples are used for training. The general case of training filters on multichannel information from several samples requires expensive computation, which is quite inappropriate for real-time online visual tracking. STC²⁰ differs from other correlation filter-based trackers (CFTs) in the following aspects. First, STC²⁰ models the relationship between the object and its local spatial context, while normal CFTs just model the input appearance with trained filters. Second, the values of the confidence map in STC²⁰ can be considered as prior probabilities given the tracked object, while the values in the confidence maps of other CFTs are correlation responses. Third, STC²⁰ is capable of estimating scale variations, which is challenging for CFTs like MOSSE⁹ and KCF.¹⁹ More recently, discriminative correlation filter tracking algorithms have attracted great attention and accomplished some remarkable achievements.

The optical flow filter²¹ is a real-time optical flow algorithm implemented on a GPU. It obtains pixelwise motion information from the image, which helps us calculate the degradation information of the target.

The proposed approach

Statistical degradation model with optical flow

In our statistical degradation model, i denotes the frame index and l_i denotes the bounding box that determines the target position in frame x_i. The bounding box l_i is chosen from the set S_i to maximize the score

l_{i} = \underset{l \in S_{i}}{\arg\max} f (T (x_{i}, l); ρ_{i - 1})

where T is a transformation function of the frame and $f (T (x, l); ρ)$ assigns a score to the bounding box l in frame x according to the model parameters ρ. The model parameters are chosen to minimize a loss function $L (ρ; X_{i})$ that depends on the previous frames and the positions of the object in those frames, $X_{i} = \{(x_{j}, l_{j})\}_{j = 1}^{i}$

ρ_{i} = \underset{ρ \in Q}{\arg\min} \{L (ρ; X_{i}) + λ G (ρ)\}

Here Q is the parameter space. We use a regularization term G(ρ) with weight λ to limit the model complexity and prevent overfitting. The position l₁ is the object position in the first frame. To achieve real-time speed, the functions f and L are chosen so that the position of the object can be found efficiently, accurately, and reliably. The score function combines the template, histogram, and degradation scores

f (x) = γ_{grad} f_{grad} (x) + γ_{hist} f_{hist} (x) + γ_{degr} f_{degr} (x)

The overall model parameters are $ρ = (α, β, ω)$, because the coefficients γ_grad, γ_hist, and γ_degr are implicit in α, β, and ω. The training loss that is optimized to find the best parameters is a weighted linear combination of the losses in each image

L (ρ; X_{i}) = \sum_{j = 1}^{i} w_{j} U (x_{j}, l_{j}, ρ)

The loss function for each image is

U (x, l, ρ) = d (l, \underset{m \in S}{\arg\max} f (T (x, m); ρ))

in which d(l, m) denotes the cost of choosing rectangle m when the correct rectangle is l.
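The prediction rule and the per-image loss can be sketched in a few lines. The sketch assumes that boxes are (x, y, width, height) tuples and takes d(l, m) = 1 − IoU(l, m); the text does not fix the cost d, so this choice and the helper names `iou`, `predict_box`, and `frame_loss` are illustrative assumptions rather than the original implementation.

```python
from typing import Callable, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def predict_box(score: Callable[[Box], float], candidates: Iterable[Box]) -> Box:
    """Pick the candidate box with the highest score (the arg max over S)."""
    return max(candidates, key=score)


def frame_loss(score: Callable[[Box], float],
               candidates: Iterable[Box],
               true_box: Box) -> float:
    """U(x, l, rho) with the assumed cost d(l, m) = 1 - IoU(l, m)."""
    return 1.0 - iou(true_box, predict_box(score, candidates))
```

Here `score` plays the role of f(T(x, ·); ρ) for a fixed frame; any callable that maps a box to a number can be plugged in.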

The parameters of the three feature scores are learned in our model by solving

\begin{array}{l} α_{i} = \underset{α}{\arg\min} \{L_{grad} (α; X_{i}) + \frac{1}{2} λ_{grad} {‖ α ‖}^{2}\} \\ β_{i} = \underset{β}{\arg\min} \{L_{hist} (β; X_{i}) + \frac{1}{2} λ_{hist} {‖ β ‖}^{2}\} \\ ω_{i} = \underset{ω}{\arg\min} \{L_{degr} (ω; X_{i}) + \frac{1}{2} λ_{degr} {‖ ω ‖}^{2}\} \end{array}

where α denotes the parameters of the gradient distribution score, β those of the color distribution score, and ω those of the degradation distribution score.

Finally, we give an overall combination of the three scores, using $γ_{grad} = 1 - η_{1} - η_{2}$, $γ_{hist} = η_{1}$, and $γ_{degr} = η_{2}$. The location of the maximum final score is taken as the center of the target in the current frame.
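As a minimal sketch of this final merging step, assuming the three score maps have already been computed on a common grid, the weighted sum and its peak could be obtained as follows. The function name `merge_scores` is hypothetical, and the default values of η₁ and η₂ are taken from Table 3 under the assumption that η₁ weights the color histogram score and η₂ the optical flow score, as stated in the text.

```python
import numpy as np


def merge_scores(f_grad, f_hist, f_degr, eta1=0.5, eta2=0.15):
    """Combine three score maps with merge weights that sum to one.

    eta1 and eta2 default to the merge factors listed in Table 3
    (an assumption about how those table entries map to the weights).
    """
    f_grad = np.asarray(f_grad, dtype=float)
    f_hist = np.asarray(f_hist, dtype=float)
    f_degr = np.asarray(f_degr, dtype=float)
    gamma_grad = 1.0 - eta1 - eta2  # weight of the gradient (HOG) score
    final = gamma_grad * f_grad + eta1 * f_hist + eta2 * f_degr
    # The position of the largest combined score is the predicted target center.
    peak = np.unravel_index(np.argmax(final), final.shape)
    return final, (int(peak[0]), int(peak[1]))
```

Because the three weights sum to one, the final map stays in the same range as the individual maps whenever those are normalized to a common range.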

Statistical degradation features

In this article, three different statistical features are used to describe the target during tracking. The color-based statistical feature copes with deformation and defocus, the gradient distribution reduces the influence of illumination change, and the degradation distribution deals with fast motion and motion blur. Figure 3 shows these features for an image patch: (a) the original input image, (b) the optical flow result, which shows that the UAV is moving toward the top left, (c) the degradation color map of the optical flow, in which the color indicates the direction of motion of each patch and the RGB value indicates the amplitude of motion, and (d) the histogram of optical flow of the input image.

Figure 3.

(a) The original input image, (b) the optical flow result, which shows that the UAV is moving toward the top left, (c) the degradation color map of the optical flow, in which the color indicates the direction of motion of each patch and the RGB value indicates the amplitude of motion, and (d) the histogram of optical flow of the input image.

Gradient distribution

The gradient distribution is obtained from HOG features. With a least-squares correlation filter formulation, the image loss in each frame is

ℓ_{grad} (x, l, α) = {‖ \sum_{n = 1}^{N} α^{n} ⋆ ϕ^{n} - y ‖}^{2}

where α^n is channel n of the multichannel image α, ϕ is short for $ϕ_{T (x, l)}$, y is the desired response (usually a Gaussian function with a maximum value of 1), and ⋆ denotes periodic cross-correlation.
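For a single training patch, a least-squares correlation filter loss of this form with an added ridge term has a closed-form solution at each frequency of the Fourier domain. The sketch below illustrates that standard solution with NumPy; it is not the original Matlab code, random features stand in for HOG, and the names `gaussian_target`, `train_filter`, `respond`, and the regularization value `lam` are assumptions made for the example.

```python
import numpy as np


def gaussian_target(shape, sigma=2.0):
    """Desired response y: a Gaussian with peak value 1 at the patch center."""
    h, w = shape
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    return np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))


def train_filter(features, target, lam=1e-2):
    """Ridge-regression correlation filter from one patch.

    features: array of shape (H, W, N), one 2-D map per channel.
    Returns the complex conjugate of the filter spectrum, shape (H, W, N).
    """
    F = np.fft.fft2(features, axes=(0, 1))
    Y = np.fft.fft2(target)
    energy = np.sum(np.abs(F) ** 2, axis=2) + lam  # shared denominator per frequency
    return Y[..., None] * np.conj(F) / energy[..., None]


def respond(filter_conj, features):
    """Periodic cross-correlation response of the filter on a new patch."""
    Z = np.fft.fft2(features, axes=(0, 1))
    return np.real(np.fft.ifft2(np.sum(filter_conj * Z, axis=2)))


# Usage: a circular shift of the training patch moves the response peak
# by the same offset, which is how the translation is read off.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32, 3))
alpha = train_filter(patch, gaussian_target((32, 32)))
shifted = np.roll(patch, shift=(3, 5), axis=(0, 1))
response = respond(alpha, shifted)
peak = np.unravel_index(np.argmax(response), response.shape)
```

All operations are elementwise in the Fourier domain, which is what keeps this kind of filter cheap enough for real-time use.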

Color distribution

The RGB histogram score is learned from samples in each image, using the correct location as a positive sample. We use W to denote the set of pairs (m, y) of a rectangular box m and the corresponding regression target y ∈ R, including the positive sample (l, 1). The image loss in each frame is then

ℓ_{hist} (x, l, β) = \sum_{(m, y) \in W} {(β^{T} [\sum_{u \in H} ψ_{T (x, m)} [u]] - y)}^{2}
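The text does not state how this regression is solved in practice. One inexpensive option, used by the Staple tracker,⁴ is to treat every pixel independently, which gives each histogram bin a weight equal to the regularized ratio of its foreground frequency to its total frequency. The sketch below assumes that simplification; the function names, the foreground mask input, and the value of `lam` are hypothetical.

```python
import numpy as np


def bin_index(rgb, bins=32):
    """Map an 8-bit RGB image of shape (H, W, 3) to flat histogram bin indices."""
    q = (rgb.astype(np.int64) * bins) // 256
    return (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]


def learn_hist_weights(patch, fg_mask, bins=32, lam=1e-3):
    """Per-bin weights: foreground frequency over total frequency (assumed rule)."""
    idx = bin_index(patch, bins)
    n_bins = bins ** 3
    fg = np.bincount(idx[fg_mask], minlength=n_bins) / max(int(fg_mask.sum()), 1)
    bg = np.bincount(idx[~fg_mask], minlength=n_bins) / max(int((~fg_mask).sum()), 1)
    return fg / (fg + bg + lam)


def hist_score_map(patch, beta, bins=32):
    """Per-pixel score: look up the weight of each pixel's color bin."""
    return beta[bin_index(patch, bins)]
```

With 32 bins per channel this matches the 32 × 32 × 32 color histogram size listed in Table 3; the score of a box would then be the average of the per-pixel scores inside it.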

Degradation distribution

The degradation information D is calculated by the optical flow filter,²¹ which can obtain satisfactory optical flow results at 300 FPS.

The degradation distribution is the histogram of D, and the degradation score is calculated in the same way as the RGB histogram score. The image loss in each frame is

ℓ_{degr} (x, l, ω) = \sum_{(m, y) \in W} {(ω^{T} [\sum_{u \in D} ψ_{T (x, m)} [u]] - y)}^{2}
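To make the idea of a histogram over the flow field concrete, the sketch below builds a direction histogram from a dense flow array, weighting each pixel by its motion amplitude. It is an illustrative stand-in: Table 3 lists 32 × 32 × 32 degradation histogram bins, which suggests that the original histogram is taken over the color-coded flow map of Figure 3(c), whereas this example bins the direction only. The function name and the (H, W, 2) flow layout are assumptions.

```python
import numpy as np


def flow_direction_histogram(flow, bins=32):
    """Magnitude-weighted histogram of optical flow directions.

    flow: array of shape (H, W, 2) holding horizontal and vertical
    displacement per pixel (an assumed layout).
    Returns a histogram of length `bins` that sums to 1 when any motion exists.
    """
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(u, v)
    angle = np.arctan2(v, u) % (2.0 * np.pi)  # direction in [0, 2*pi)
    idx = np.minimum((angle / (2.0 * np.pi) * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=magnitude.ravel(), minlength=bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

A patch that moves uniformly in one direction concentrates all of its mass in a single bin, while a static patch yields an all-zero histogram.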

The location of the maximum final score is taken as the center of the target in the current frame. Figure 4 visualizes the individual score maps and the final score.

Figure 4.

(a) The input frame patch, (b) the per-pixel score of the color response map, (c) the gradient distribution score map, (d) the color distribution score map, (e) the optical flow distribution score map, and (f) the final score map. The maximum of the score map indicates the predicted position of the target.

Experiments

The proposed approach is implemented in Matlab and runs on a computer vision server with dual Intel Xeon E5-2670 2.60 GHz CPUs, 32 GB RAM, and a GTX1080Ti graphics card. The proposed tracker is evaluated on the popular OTB2013¹ and on some other sequences captured by our UAVs. The common frame format of our videos is $1920 \times 1080$ at 60 FPS.

The OTB2013 data set contains 50 sequences covering a variety of scenes with challenging conditions, such as in-plane rotation, out-of-plane rotation, out-of-view, background clutters, low resolution, illumination variation, scale variation, occlusion, deformation, motion blur, and fast motion.

Parameters evaluation

In our implementation, the input image patch is first resized to 150 × 150 to achieve real-time tracking. We compared the proposed approach with eight state-of-the-art trackers on the popular benchmark OTB2013. The OTB2013 sequences have been manually tagged with the 11 attributes listed above, which represent the challenging aspects of visual tracking, and 29 publicly available visual trackers have already been tested on the benchmark.

Two experiments are designed to determine the parameters of our method. In Figure 5, line charts show the results for different optical flow merging factors (from 0.10 to 0.30) and different optical flow learning rates (from 0.10 to 0.40). From the line charts, we find that the best optical flow merging factor is 0.15 and the best optical flow learning rate is 0.20.

Figure 5.

(a) The result of different optical flow merging factors (from 0.10 to 0.30) and (b) the result of different optical flow learning rates (from 0.10 to 0.40).

Comparison with the state-of-the-art trackers

In this section, we compare our tracker with some state-of-the-art trackers. The performance of our algorithm is evaluated quantitatively, following the method used by Kristan et al.¹¹ We evaluate the proposed method by comparing it with eight state-of-the-art trackers: Staple,⁴ Struck,² TLD,²² CXT,²³ TM,²⁴ LOT,²⁵ OAB,²⁶ and MTT.²⁷

As shown in Figure 6, our method outperforms the other state-of-the-art trackers. Our method obtains the best average precision (0.748), which is 0.138 (13.8 percentage points) higher than that of the second best tracker, Staple. In addition, our tracker achieved first or second place in 11 sequences. The success rate and execution speed on the 13 sequences are compared in Table 1. The best results are highlighted in boldface and the second best are underlined. The use of optical flow makes our method slightly slower than Staple,⁴ but it is still faster than the other seven algorithms.

Figure 6.

Results of some popular methods on OTB2013: (a) to (c) are the precision plots for the fast motion sequences, the motion blur sequences, and all sequences; (d) to (f) are the corresponding success rate plots. The red curve is the result of our method, which has the best performance; the green curve is the state-of-the-art method Staple;⁴ and the score in the legend is the average score on OTB2013. OTB2013: object tracking benchmark 2013.

Table 1.

Success rate and FPS on the 13 sequences.

Sequence	Our method	Staple⁴	Struck²	TLD²²	CXT²³	TM²⁴	LOT²⁵	OAB²⁶	MTT²⁷
Boy	1.000	1.000	0.985	1.000	0.849	0.995	0.656	0.990	0.522
CarScale	0.837	0.794	0.639	0.607	0.722	0.278	0.464	0.635	0.647
Couple	0.950	0.657	0.586	1.000	0.621	0.429	0.571	0.457	0.629
Dudek	0.737	0.721	0.746	0.474	0.695	0.586	0.509	0.486	0.662
Fleetface	0.465	0.448	0.553	0.328	0.397	0.397	0.457	0.325	0.335
Ironman	0.193	0.145	0.048	0.090	0.030	0.054	0.096	0.030	0.090
Jumping	0.930	0.278	1.000	0.949	0.799	0.907	0.958	0.051	0.099
Lemming	0.393	0.263	0.469	0.784	0.704	0.404	0.608	0.478	0.356
Liquor	0.909	0.967	0.384	0.526	0.207	0.346	0.892	0.403	0.195
Matrix	0.610	0.420	0.120	0.090	0.040	0.060	0.070	0.350	0.330
Soccer	0.732	0.265	0.240	0.107	0.176	0.120	0.265	0.087	0.173
SUV	0.972	0.975	0.560	0.903	0.905	0.743	0.786	0.753	0.519
Woman	0.997	0.993	0.972	0.188	0.328	0.198	0.142	0.571	0.201
Average precision rate	0.748	0.610	0.562	0.542	0.498	0.424	0.498	0.432	0.366
FPS	72.9	74.3	21.2	28.3	14.3	55.8	0.8	22.4	1.3

FPS: frames per second.
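The average row of Table 1 can be reproduced directly from the per-sequence values. The short check below copies the columns for our method and Staple from the table and recomputes the two averages and their gap; the variable names are chosen for this example only.

```python
# Per-sequence success rates copied from Table 1 (Boy ... Woman).
ours = [1.000, 0.837, 0.950, 0.737, 0.465, 0.193, 0.930,
        0.393, 0.909, 0.610, 0.732, 0.972, 0.997]
staple = [1.000, 0.794, 0.657, 0.721, 0.448, 0.145, 0.278,
          0.263, 0.967, 0.420, 0.265, 0.975, 0.993]


def mean3(values):
    """Arithmetic mean rounded to three decimals, as in the table."""
    return round(sum(values) / len(values), 3)


ours_avg = mean3(ours)
staple_avg = mean3(staple)
gap = round(ours_avg - staple_avg, 3)
print(ours_avg, staple_avg, gap)  # 0.748 0.61 0.138
```

The gap of 0.138 is an absolute difference between the two averages, that is, 13.8 percentage points.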

We can also see that most of the trackers perform well on common sequences. However, when a sequence contains perturbations such as motion blur and fast motion, many trackers fail to complete the tracking procedure. For example, many trackers perform well on the Boy sequence, but on the Soccer sequence, which contains severe degradation, our method achieves a satisfactory result (0.732) while the other trackers can hardly reach 0.3.

In fact, various kinds of degradation exist in videos captured by UAVs, and they are the most important issues to cope with in object tracking for UAV videos. Table 2 reports the success rate and execution speed of the proposed tracker and the eight competing trackers on the four UAV sequences.

Table 2.

Success rate and FPS on the UAV sequences.

Sequence	Our method	Staple⁴	Struck²	TLD²²	CXT²³	TM²⁴	LOT²⁵	OAB²⁶	MTT²⁷
UAV1	1.000	0.903	0.979	0.759	0.793	0.798	0.537	0.931	0.349
UAV2	0.783	0.694	0.497	0.585	0.596	0.248	0.363	0.458	0.531
UAV3	0.912	0.501	0.454	0.540	0.612	0.308	0.402	0.287	0.470
UAV4	0.641	0.610	0.737	0.460	0.516	0.438	0.506	0.337	0.563
Average precision rate	0.834	0.677	0.666	0.586	0.629	0.448	0.452	0.503	0.478
FPS	72.4	73.6	22.3	26.7	13.5	51.5	0.8	21.6	1.3

FPS: frames per second; UAV: unmanned aerial vehicle.

In summary, motion blur and fast motion are great challenges in object tracking, and our method combines three different features to overcome them. We outperform the other trackers in the experiments, especially on UAV videos. The OTB2013 results in Figure 6 show that our result is much better than the others on the fast motion and motion blur sequences. In the overall evaluation, we achieve a result similar to that of Staple, which suggests that we did not introduce additional error into the method.

Limitation

Our method obtains satisfactory results in most conditions, even when there is severe motion blur and camera distortion in the sequences (Figure 7). However, when it encounters long-term occlusion, our method may fail to track the target. That is because the occlusion could change the template and thereby influence the tracking result.

Figure 7.

Tracking results (first and third columns) and optical flow results (second and fourth columns) of the proposed method in severely degraded situations (the yellow rectangles annotate our results).

In Table 3, we report the values of the most important parameters we use.

Table 3.

Parameters in our method.

Parameter	Value
Color features	RGB
Color histogram bins	32 × 32 × 32
Degradation histogram bins	32 × 32 × 32
HOG Merge factor η₁	0.5
Optical flow Merge factor η₂	0.15
Color Learning rate ς_c	0.01
HOG Learning rate ς_g	0.04
Optical flow Learning rate ς_o	0.20

HOG: histogram of oriented gradient.
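The text does not spell out how the learning rates in Table 3 enter the model update. A common choice in correlation filter trackers such as Staple⁴ is a running linear interpolation between the previous model and the model estimated from the current frame, and the sketch below assumes that rule. The function name `update_model` and the dictionary keys are hypothetical; only the three rate values come from Table 3.

```python
import numpy as np

# Learning rates from Table 3; the key names are chosen for this example.
LEARNING_RATES = {"color": 0.01, "hog": 0.04, "optical_flow": 0.20}


def update_model(previous, current, rate):
    """Blend the previous model with the current-frame estimate (assumed rule)."""
    previous = np.asarray(previous, dtype=float)
    current = np.asarray(current, dtype=float)
    return (1.0 - rate) * previous + rate * current


# Usage: with a rate of 0.20, the model covers 1 - 0.8**5 (about 67%) of the
# distance to a constant new observation after five frames.
model = np.zeros(4)
observation = np.ones(4)
for _ in range(5):
    model = update_model(model, observation, LEARNING_RATES["optical_flow"])
```

Under this rule, the much larger optical flow rate means that the motion model follows the current frame closely, while the color and HOG models change slowly.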

Advantages

Our algorithm performs better when there is obvious variation in motion (Figure 6). The sequences captured by UAVs often contain severe degradation, as shown in Figure 6, and according to the optical flow, the motion in the frames is very severe.

Furthermore, severe movements, such as fast motion, motion blur, and nonuniform degradation, are usually a grim challenge for object tracking, because these abnormal movements usually do not follow the motion hypothesis. Most algorithms fail to track the target when it accelerates or changes its direction of motion in nonuniform degradation data sets. Staple⁴ also suddenly fails to track the target when the target abruptly turns in another direction, which makes the target very blurry and introduces a lot of useless information.

As shown in Figure 6, our method performs much better than other methods on severely degraded videos, because we apply degradation information in the tracking procedure. The degradation information is a misplaced resource that becomes useful when it is incorporated into the tracking process. In our algorithm, the degradation is treated as the motion direction; unlike the HOG feature, the motion direction gives more information about the moving target, which makes our method outperform others.

All the methods in OTB2013 are compared with ours, and only the best 10 methods are shown in Figure 6. Common tracking methods (such as Struck,² MIL,²⁸ TLD,²² CT,²⁹ and KCF¹⁹) inevitably fail to track the target in severely degraded videos, where our method clearly outperforms them. However, because of the limitations of the HOG and RGB histogram features, our method performs similarly to Staple⁴ when the sequence has no degradation.

Conclusion

We proposed a statistical degradation model in this article, in which three advantageous features are combined to make the model sensitive to deformation, color change, and degradation. The color distribution is generated simply by the RGB histogram, the gradient distribution is calculated by the HOG feature, and the degradation distribution of the target is obtained by calculating the histogram of motion direction. With those three features, our method achieves outstanding results when the video is degraded and performs as well as Staple⁴ when there is no degradation. Although the proposed tracker performs very well on most image sequences in our experiments, it cannot handle occluded scenes very well.

In future work, we plan to improve our model with features calculated by deep learning, which would further increase the overall performance of our algorithm. The speed of our algorithm is approximately 80 FPS. With the help of deep neural networks, we look forward to improving its results in the future.

Footnotes

Authors’ note

This paper was presented in part at the CCF Chinese Conference on Computer Vision, Tianjin, 2017.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Zhejiang Provincial Natural Science Foundation of China under Grant number LY15F020031 and LQ16F030007, National Natural Science Foundation of China (NSFC) under Grant numbers 11302195 and 61401397.

References

1. Lim, Yang. Online object tracking: a benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Portland, OR, 2013, pp. 2411–2418.

2. Hare, Golodetz, Saffari. Struck: structured output tracking with kernels. IEEE Trans Pattern Anal Mach Intell 2016; 38(10): 2096–2109.

3. Blaschko, Lampert. Learning to localize objects with structured output regression. In: Forsyth, Torr, Zisserman (eds) Computer vision – ECCV. Lecture notes in computer science. Vol. 5302. Berlin, Heidelberg: Springer, 2008.

4. Bertinetto, Valmadre, Golodetz. Staple: complementary learners for real-time tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, 2016, pp. 1401–1409.

5. Naresh Boddeti, Kanade, Vijaya Kumar. Correlation filters for object alignment. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Portland, OR, 2013, pp. 2291–2298.

6. Henriques, Carreira, Caseiro. Beyond hard negative mining: efficient detector learning via block-circulant decomposition. In: Proceedings of the IEEE international conference on computer vision, Sydney, NSW, 2013, pp. 2760–2767.

7. Kiani Galoogahi, Sim, Lucey. Multi-channel correlation filters. In: Proceedings of the IEEE international conference on computer vision, Sydney, NSW, 2013, pp. 3072–3079.

8. Danelljan, Shahbaz Khan, Felsberg. Adaptive color attributes for real-time visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Columbus, OH, 2014, pp. 1090–1097.

9. Bolme, Beveridge, Draper. Visual object tracking using adaptive correlation filters. In: 2010 IEEE conference on computer vision and pattern recognition (CVPR), San Francisco, CA, 2010, pp. 2544–2550. IEEE.

10. Dalal, Triggs. Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition, 2005. CVPR 2005, San Diego, CA, 2005, Vol. 1, pp. 886–893. IEEE.

11. Kristan, Matas, Leonardis. The visual object tracking vot2015 challenge results. In: Proceedings of the IEEE international conference on computer vision workshops (ICCVW), Santiago, 2015, pp. 1–23.

12. Yang, Zhang. Long-term correlation tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5388–5396.

13. Possegger, Mauthner, Bischof. In defense of color-based model-free tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, 2015, pp. 2113–2120.

14. Nummiaro, Koller-Meier, Van Gool. Object tracking with an adaptive color-based particle filter. Pattern Recognit 2002; 2449: 353–360.

15. Pérez, Hue, Vermaak. Color-based probabilistic tracking. Comput Vis ECCV 2002; 2350: 661–675.

16. Cai, Wen, Lei. Robust deformable and occluded object tracking with dynamic graph. IEEE Trans Image Process 2014; 23(12): 5497–5509.

17. Xiao, Stolkin, Leonardis. Single target tracking using adaptive clustered decision trees and dynamic multi-level appearance models. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, 2015, pp. 4978–4987.

18. Bolme, Draper, Beveridge. Average of synthetic exact filters. In: IEEE conference on computer vision and pattern recognition (CVPR), Miami, FL, 2009, pp. 2105–2112. IEEE.

19. Henriques, Caseiro, Martins. High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 2015; 37(3): 583–596.

20. Zhang, Liu. Fast visual tracking via dense spatio-temporal context learning. In: Fleet, Pajdla, Schiele (eds) Computer vision – ECCV. Lecture notes in computer science, Vol. 8693. Cham, Switzerland: Springer, 2014, pp. 127–141.

21. Adarve, Mahony. A filter formulation for computing real time optical flow. IEEE Robot Autom Lett 2016; 1(2): 1192–1199.

22. Kalal, Mikolajczyk, Matas. Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 2012; 34(7): 1409–1422.

23. Dinh, Medioni. Context tracker: exploring supporters and distracters in unconstrained environments. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR), Providence, RI, 2011, pp. 1177–1184. IEEE.

24. Collins, Liu, Leordeanu. Online selection of discriminative tracking features. IEEE Trans Pattern Anal Mach Intell 2005; 27(10): 1631–1643.

25. Oron, Bar-Hillel, Levi. Locally orderless tracking. Int J Comput Vis 2015; 111(2): 213–228.

26. Grabner, Bischof. Real-time tracking via on-line boosting. In: Proceedings of British machine vision conference (BMVC), Edinburgh, UK, 4–7 September 2006.

27. Zhang, Ghanem, Liu. Robust visual tracking via multi-task sparse learning. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR), pp. 2042–2049. IEEE.

28. Babenko, Yang, Belongie. Robust object tracking with online multiple instance learning. IEEE Trans Pattern Anal Mach Intell 2011; 33(8): 1619–1632.

29. Zhang, Yang. Real-time compressive tracking. In: Fitzgibbon, Lazebnik, Perona (eds) Computer vision – ECCV. Lecture notes in computer science, Vol. 7574. Berlin, Heidelberg: Springer, pp. 864–877.