Tracking objects using Grassmann manifold appearance modeling based on wireless multimedia sensor networks

Abstract

Visual object tracking based on wireless multimedia sensor networks is an active research topic. However, existing linear methods for processing feature vectors often cause tracking drift when objects with significant nonplanar pose variations are tracked through wireless sensor networks. In this article, we propose a novel nonlinear algorithm for tracking strongly deformable objects. The proposed tracking scheme has two filters. On one hand, because the Grassmann manifold is a homogeneous space of the orthogonal Lie group and can describe and process appearance feature data more accurately, one filter is designed on it to estimate the object appearance, making full use of the transformation relationship between a point on the manifold and its corresponding point on the tangent space. On the other hand, because object imaging is essentially a projective transformation, the other filter is designed on the projective transformation group SL(3) to describe the geometric deformation of the object. The two filters execute alternately to mitigate tracking drift. Extensive experiments show that the proposed method achieves stable and accurate tracking of targets with significant geometric deformation, even under occlusion and illumination changes.

Keywords

Multimedia wireless sensor networks, Grassmann manifold, projection transformation group, tangent space, particle filter

Introduction

A wireless multimedia sensor network is a new type of sensor network that supports multimedia applications; such networks are widely used in multimedia network monitoring, intelligent transportation, environmental monitoring, industrial control, and other fields. Visual object tracking based on wireless multimedia sensor networks is an active research topic.

Various methods have been designed to track dynamic targets. Ross et al.¹ presented a tracking algorithm that learns a low-dimensional subspace representation, adapts efficiently online to changes in object appearance, and updates the model with an incremental principal component analysis algorithm. Wang et al.² designed an object appearance model based on partial least squares (PLS) analysis. However, when the object undergoes large nonplanar pose variations, these methods often drift or even fail. The main reason is that the feature vectors describing object appearance do not lie in a single vector space, so traditional linear methods for processing feature vectors do not meet practical needs.

Manifold-based tracking algorithms^3–6 have become a major research topic in video target tracking. Early work on manifold subspaces applied conjugate gradient and Newton's methods to tracking on the Grassmann and Stiefel manifolds.⁷ Srivastava and Klassen⁸ designed piecewise geodesics on the Grassmann manifold and used projection matrices in the subspace to track synthetic signals in simulation. However, none of these methods was designed for video image tracking. Wang et al.⁹ applied a Kalman filter to velocity vectors in the tangent space of the Grassmann manifold to track objects. Because both the low-dimensional velocity model and the Kalman filter are inaccurate, this method can only track objects with moderate pose or appearance changes; it drifts or fails on objects with significant pose or appearance changes. The Grassmann manifold has also been applied to classification, human behavior analysis, object recognition,¹⁰ and other fields. Cetingul and Vidal¹¹ proposed mean shift on the Grassmann and Stiefel manifolds and performed object categorization and segmentation of multiple objects. Turaga et al.¹² computed distances and designed density estimators on the Grassmann and Stiefel manifolds, and applied conditional density estimation to shape classification, face recognition, and activity recognition. Begelfor and Werman¹³ built an affine-invariant clustering of image data subspaces using the structure of the Grassmann manifold. Subbarao and Meer¹⁴ proposed a nonlinear mean shift on Riemannian manifolds for nonlinear filtering and segmentation. Lui et al.¹⁵ learned a local linear appearance manifold online by applying coarse-to-fine sampling in a particle filter. Qiao et al.¹⁶ used an offline manifold learning strategy on a face data set to achieve face tracking. Covariance manifolds have also been used in object tracking.
Porikli et al.¹⁷ and Forstner and Moonen¹⁸ designed a feature covariance matrix to describe the object region and used the Lie group structure of positive definite matrices to measure the similarity of covariance matrices. Wu et al.¹⁹ applied incremental covariance tensor learning to design a tracking method on a Riemannian manifold, combined with a particle filter framework to handle background clutter efficiently. Khan and Gui^20,21 proposed visual object tracking methods using piecewise geodesics under a Bayesian framework and the geometric structure of Riemannian manifolds.

Although the above manifold-based methods have achieved reasonably good results, tracking objects in complex scenes remains challenging,^22,23 for example, objects with significant pose variations, heavy deformation, or long-term partial occlusion. The main weaknesses of these methods may be the low efficiency of online learning on the manifold or an inadequate updating strategy under occlusion.

The Lie group manifold framework combines the theory of differential manifolds with that of groups, and a differential manifold is a topological manifold endowed with a differential structure. Because the Grassmann manifold is a homogeneous space of a Lie group,²⁴ it not only provides a smooth-surface representation of the space but is also well suited to measuring distances between the feature spaces of deformable objects. Therefore, an appearance model and online updating strategy built on the intrinsic mean and geodesic distance on the Grassmann manifold can describe the appearance variations of deformable objects more accurately. In addition, particle filters handle nonlinear models and are not restricted to Gaussian noise, so a particle filter designed on a low-dimensional manifold can achieve more stable tracking under large nonplanar pose variations and can also improve real-time performance.

Motivated by the methods mentioned above, we present a new visual object tracking algorithm using Grassmann manifold appearance modeling and SL(3) group shape modeling. The proposed tracking scheme uses two particle filters: one on the Grassmann manifold, which estimates the object appearance by making full use of the intrinsic geometry of the state space, and the other on the SL(3) group, which describes the geometric deformation of the object. The two filters execute alternately to mitigate tracking drift. To handle occlusion, we use an online learning strategy that prevents abnormal information from entering the feature space, ensuring its accuracy.

Grassmann manifold and its metrics

Feature vectors describing object appearance do not lie in a single vector space, whereas a Lie group manifold can describe object appearance more effectively. In the proposed tracker, the similarity between two features is measured by their distance on the Grassmann manifold. The geodesic distance on the Grassmann manifold is formulated as follows.

A point on the Grassmann manifold $Gr(k, n)$ is an equivalence class of $n \times k$ matrices with orthonormal columns, that is

$$
Gr(k, n) = \{[Y]\}, \qquad [Y] = \{YV : V \in O(k)\} \tag{1}
$$

where $Y$ is an $n \times k$ matrix with orthonormal columns ($Y^{T} Y = I_{k}$), $[Y]$ denotes its equivalence class, and $V$ is a $k \times k$ orthogonal matrix, so $Y$ and $YV$ span the same subspace.

Equivalently, $Gr(k, n)$ is the set of all k-dimensional subspaces of the n-dimensional vector space $R^{n}$. It can also be written as the quotient space $Gr(k, n) = O(n) / (O(n - k) \times O(k))$, that is, the orthogonal Lie group $O(n)$ modulo the rotations within the subspace, $O(k)$, and within its orthogonal complement, $O(n - k)$.
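The equivalence-class view in (1) can be checked numerically. The following is a minimal NumPy sketch (NumPy and the helper names `random_orthonormal` and `projector` are our assumptions, not part of the article): two representatives $Y$ and $YV$ of the same point have the same orthogonal projector $YY^{T}$, which depends only on the subspace.

```python
import numpy as np

def random_orthonormal(n, k, rng):
    """Return an n x k matrix with orthonormal columns (a representative of a point on Gr(k, n))."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

def projector(Y):
    """Orthogonal projector Y Y^T onto span(Y); it depends only on the subspace."""
    return Y @ Y.T

rng = np.random.default_rng(0)
n, k = 6, 2
Y = random_orthonormal(n, k, rng)   # one representative of [Y]
V = random_orthonormal(k, k, rng)   # V in O(k)
Y2 = Y @ V                          # another representative of the same class [Y]

same_point = np.allclose(projector(Y), projector(Y2))
print(same_point)  # True
```

Comparing projectors rather than the matrices themselves is what makes the check independent of the chosen representative.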

The sectional curvature is

$$
K_{Gr(k, n)}(X, Y) = \| [X, Y] \|^{2} \tag{2}
$$

where $X$ and $Y$ are antisymmetric (skew-symmetric) matrices representing tangent directions and $[X, Y] = XY - YX$ is their commutator. A common way to define a metric structure on a manifold M is to assign an inner product $〈 \cdot, \cdot 〉$ to the tangent space $T_{p} M$ at every point $p \in M$; this is known as a Riemannian metric. For any point $p \in Gr(k, n)$, the tangent space is defined as

$$
T_{p} Gr(k, n) = \{ω \mid ω = p_{⊥} g, \; g \in R^{(n - k) \times k}\} \tag{3}
$$

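where $p_{⊥}$ is an $n \times (n - k)$ matrix whose columns form an orthonormal basis of the orthogonal complement of span(p), so that $p^{T} ω = 0$. The metric on $Gr(k, n)$ is defined by the inner product $〈 ω_{1}, ω_{2} 〉 = Tr(ω_{1}^{T} ω_{2})$, which gives the norm

$$
\| ω \|^{2} = Tr(ω^{T} ω) \tag{4}
$$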
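As an illustration of (3) and (4), the sketch below builds a tangent vector $ω = p_{⊥} g$ and evaluates the canonical inner product $Tr(ω_{1}^{T} ω_{2})$. It assumes NumPy; `orth_complement`, `tangent_vector`, and `inner` are hypothetical helper names, and $p_{⊥}$ is obtained here from a complete QR factorization, which is one of several valid choices.

```python
import numpy as np

def orth_complement(p):
    """Return p_perp: an n x (n-k) matrix whose columns are an orthonormal basis of span(p)^perp."""
    n, k = p.shape
    # Complete QR of p: the first k columns of Q span span(p), the last n-k span its complement.
    q, _ = np.linalg.qr(p, mode="complete")
    return q[:, k:]

def tangent_vector(p, g):
    """Build omega = p_perp g, a tangent vector at p as in equation (3)."""
    return orth_complement(p) @ g

def inner(omega1, omega2):
    """Canonical inner product <omega1, omega2> = Tr(omega1^T omega2)."""
    return float(np.trace(omega1.T @ omega2))

rng = np.random.default_rng(1)
n, k = 5, 2
p, _ = np.linalg.qr(rng.standard_normal((n, k)))
g = rng.standard_normal((n - k, k))
omega = tangent_vector(p, g)

print(np.allclose(p.T @ omega, 0.0))                    # True: omega is orthogonal to span(p)
print(np.isclose(inner(omega, omega), np.sum(g ** 2)))  # True: p_perp has orthonormal columns
```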

Let $γ: t \mapsto γ(t)$ be the geodesic with initial point $γ(0) = p$ and initial velocity $dγ/dt(0) = ω$. The exponential map $Exp_{p}(ω) = γ(1)$ gives the end point of the geodesic:

$$
Exp_{p}(ω) = p V \cos(θ) + U \sin(θ) \tag{5}
$$

Here $U θ V^{T}$ is the thin singular value decomposition (SVD) of $ω$, and the diagonal entries of $θ = diag(θ_{1}, \dots, θ_{k})$ are the principal angles between $p$ and the end point $q = Exp_{p}(ω)$; right-multiplying (5) by $V^{T}$ gives another representative of the same point. The corresponding inverse map is $Log_{p}(q) = U θ V^{T}$, where $θ = \arctan(S)$ and $U S V^{T} = p_{⊥} p_{⊥}^{T} q (p^{T} q)^{-1}$ is an SVD (note that $p_{⊥} p_{⊥}^{T} = I - p p^{T}$). Therefore, the geodesic distance between points p and q on the Grassmann manifold is
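The exponential and logarithm maps can be sketched in a few lines of NumPy. This is a minimal sketch of the standard Grassmann formulas above, under the assumption that all principal angles are below π/2 so that $p^{T} q$ is invertible; the function names are hypothetical. The exponential map here includes the trailing $V^{T}$ and a QR re-orthonormalization, neither of which changes the subspace.

```python
import numpy as np

def grassmann_exp(p, omega):
    """Exp_p(omega) = p V cos(theta) V^T + U sin(theta) V^T, with omega = U theta V^T (thin SVD)."""
    U, theta, Vt = np.linalg.svd(omega, full_matrices=False)
    q = p @ Vt.T @ np.diag(np.cos(theta)) @ Vt + U @ np.diag(np.sin(theta)) @ Vt
    # Re-orthonormalize to remove floating-point drift; this does not change span(q).
    q, _ = np.linalg.qr(q)
    return q

def grassmann_log(p, q):
    """Log_p(q) = U arctan(S) V^T, where U S V^T = (I - p p^T) q (p^T q)^{-1}."""
    n = p.shape[0]
    m = (np.eye(n) - p @ p.T) @ q @ np.linalg.inv(p.T @ q)
    U, s, Vt = np.linalg.svd(m, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt

rng = np.random.default_rng(2)
n, k = 6, 2
p, _ = np.linalg.qr(rng.standard_normal((n, k)))
# A small tangent vector at p: project a random matrix with (I - p p^T).
omega = 0.2 * (np.eye(n) - p @ p.T) @ rng.standard_normal((n, k))
q = grassmann_exp(p, omega)
print(np.allclose(grassmann_log(p, q), omega, atol=1e-8))  # True: Log inverts Exp
```

The round trip works even though QR changes the representative of $q$, because $(I - pp^{T}) q (p^{T} q)^{-1}$ is unchanged when $q$ is replaced by $qR$ for any invertible $R$.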

$$
d_{G}(p, q) = \left(\sum_{i = 1}^{k} θ_{i}^{2}\right)^{1/2} = \| θ \|_{2} \tag{6}
$$

Formula (6) gives the distance on the Grassmann manifold that is used throughout the tracker.
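A sketch of formula (6) follows. It assumes NumPy and computes the principal angles directly as $θ_{i} = \arccos σ_{i}(p^{T} q)$, a standard equivalent route that avoids the logarithm map; the helper names are hypothetical.

```python
import numpy as np

def principal_angles(p, q):
    """Principal angles between span(p) and span(q): theta_i = arccos(sigma_i(p^T q))."""
    sigma = np.linalg.svd(p.T @ q, compute_uv=False)
    # Clipping guards against singular values slightly above 1 from round-off.
    return np.arccos(np.clip(sigma, -1.0, 1.0))

def grassmann_distance(p, q):
    """Geodesic distance d_G(p, q) = ||theta||_2, formula (6)."""
    return float(np.linalg.norm(principal_angles(p, q)))

# Two planes in R^3 that share e1 and differ by a rotation of a = 0.5 rad in the (e2, e3) plane.
a = 0.5
p = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
q = np.array([[1.0, 0.0], [0.0, np.cos(a)], [0.0, np.sin(a)]])
print(round(grassmann_distance(p, q), 6))  # 0.5
```

For very small angles, arccos loses precision; a sine-based formulation is more accurate there, but the cosine form keeps the sketch short.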

Tracking model

Dynamic model

The accuracy of the dynamic model is one of the main factors affecting tracking performance, because the target state changes continuously over the sequence. Since the changes of a target across consecutive frames can be described as the motion of its corresponding points on a Riemannian manifold, the idea behind the dynamic model is to describe the transformation between two adjacent points on the manifold. In this article, a projective transformation describes the location and deformation of the target, and tangent vectors at points on the manifold describe the transformation between adjacent points. The dynamic model on the Riemannian manifold and its tangent space is

$$
S_{t} = S_{t - 1} + v_{t}, \qquad v_{t} \sim N(0, Ω), \quad \text{i.e.,} \quad p(S_{t} \mid S_{t - 1}) = N(S_{t - 1}, Ω) \tag{7}
$$

where $t = 1, 2, 3, \dots$ is the video frame number and $S_{t} = [x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}, x_{7}, x_{8}]^{T}$ is the projective transformation parameter vector of the object state. The velocity vector $v_{t}$ links the points $S_{t - 1}$ and $S_{t}$; it is the tangent vector starting at $S_{t - 1}$ and represents the object motion.
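The random-walk model (7) can be sketched as follows. Several choices here are assumptions rather than details given in the article: Ω is taken as diagonal, the eight standard deviations reuse the projection parameters reported for sequence 1 in the experiments, the number of particles is the N = 300 used there, and the mapping from $S_{t}$ to a 3 × 3 matrix (scaled to unit determinant so that it lies in SL(3)) is a hypothetical layout.

```python
import numpy as np

# Hypothetical standard deviations for v_t ~ N(0, Omega), Omega = diag(SIGMA**2).
SIGMA = np.array([0.04, 0.001, 0.001, 0.04, 4.0, 4.0, 0.0006, 0.0006])

def propagate(s_prev, num_particles, rng, sigma=SIGMA):
    """Draw S_t^j = S_{t-1} + v_t^j with v_t^j ~ N(0, diag(sigma^2)), equation (7)."""
    noise = rng.standard_normal((num_particles, s_prev.size)) * sigma
    return s_prev[None, :] + noise

def homography(s):
    """Map 8 parameters to a 3x3 matrix with unit determinant (an element of SL(3)).

    The layout [[1+x1, x2, x5], [x3, 1+x4, x6], [x7, x8, 1]] is a hypothetical choice.
    """
    x1, x2, x3, x4, x5, x6, x7, x8 = s
    h = np.array([[1.0 + x1, x2, x5], [x3, 1.0 + x4, x6], [x7, x8, 1.0]])
    # det(c h) = c^3 det(h) for 3x3 matrices, so dividing by cbrt(det) gives det = 1.
    return h / np.cbrt(np.linalg.det(h))

def warp_points(h, pts):
    """Apply a projective transform to an (m, 2) array of image points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

rng = np.random.default_rng(0)
s_prev = np.zeros(8)                      # identity transform as the previous state
particles = propagate(s_prev, 300, rng)
corners = np.array([[0.0, 0.0], [53.0, 0.0], [53.0, 53.0], [0.0, 53.0]])  # 53 x 53 box
candidate_boxes = [warp_points(homography(s), corners) for s in particles]
print(particles.shape)  # (300, 8)
```

Scaling a homography does not change the warp, so the unit-determinant normalization only fixes a canonical representative in SL(3).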

In the proposed method, the candidate points $S_{t}^{j}$ $(j = 1, \dots, N)$ are generated by a particle filter on the Riemannian manifold. Let $S_{t - 1}^{j}$ be a point on the manifold at time $t - 1$ and $v_{t}^{j}$ the corresponding velocity vector linking $S_{t - 1}^{j}$ and $S_{t}^{j}$, where $S_{t}^{j}$ is the other end of the geodesic starting from $S_{t - 1}^{j}$. In addition, $S_{t - 1}^{obj}$ denotes the tracked object state at frame $t - 1$, and $U_{t - 1}$ is the tracked basis matrix at frame $t - 1$.

Particle filter 2 predicts the geometric deformation of the target, as shown in Figure 1. Its inputs are $S_{t - 1}^{obj}$, $v_{t}$, and $U_{t - 1}$. Filter 2 first predicts the state vectors $S_{t}^{j}$, $j = 1, 2, \dots, M$, according to formula (7), where M is the number of samples. It then computes the image basis matrix $U_{t}^{j}$ corresponding to each $S_{t}^{j}$ according to formula (5), and the Log-Euclidean distance $d^{2}(U_{t}^{j}, U_{t - 1})$ between each $U_{t}^{j}$ and the tracked basis matrix $U_{t - 1}$ at time $t - 1$. Finally, it outputs the basis matrix $U_{t} = U_{t}^{j^{*}}$ and the state vector $S_{t}^{obj} = S_{t}^{j^{*}}$, where $j^{*} = \arg\min_{j} d^{2}(U_{t}^{j}, U_{t - 1})$, that is, the candidate whose appearance is closest to the tracked appearance. The variable $S_{t}^{obj}$ is thus the tracked geometric deformation of the target at frame t.
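The selection step of particle filter 2 can be sketched as below, assuming the chosen candidate is the one whose basis matrix is closest to $U_{t - 1}$, and using the Grassmann geodesic distance of formula (6) as the distance. How each candidate's basis matrix is extracted from the image is outside this sketch: the candidates are given directly as orthonormal basis matrices, and the function names are hypothetical.

```python
import numpy as np

def grassmann_distance_sq(a, b):
    """d^2 = sum_i theta_i^2, the square of formula (6), via principal angles."""
    sigma = np.linalg.svd(a.T @ b, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))
    return float(np.sum(theta ** 2))

def select_candidate(candidate_bases, u_prev):
    """Pick j* = argmin_j d^2(U_t^j, U_{t-1}); return j* and the chosen basis."""
    d2 = [grassmann_distance_sq(u, u_prev) for u in candidate_bases]
    j_star = int(np.argmin(d2))
    return j_star, candidate_bases[j_star]

rng = np.random.default_rng(4)
n, k = 50, 3
u_prev = np.linalg.qr(rng.standard_normal((n, k)))[0]
candidates = [np.linalg.qr(rng.standard_normal((n, k)))[0] for _ in range(5)]
# Candidate 2 spans the same subspace as u_prev (a different orthonormal representative).
candidates[2] = u_prev @ np.linalg.qr(rng.standard_normal((k, k)))[0]
j_star, u_t = select_candidate(candidates, u_prev)
print(j_star)  # 2
```

In the full tracker, the index $j^{*}$ would also select the matching state vector $S_{t}^{j^{*}}$.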

Figure 1.

The complete tracking scheme.

Observation model

Based on the metric and the exponential and logarithm maps on the Grassmann manifold, a particle filter is designed to estimate the state of the target appearance; it is hereinafter referred to as particle filter 1.

Suppose the object appearance feature space consists of the L most recently tracked appearances of the object. Its matrix form is²⁵

$$
Y_{t} = [I_{t - L + 1}^{obj}, \dots, I_{t}^{obj}]^{T} \tag{8}
$$

where the vector $I_{t}^{obj}$ is the tracked object appearance at frame t. The SVD $\tilde{U} \tilde{D} \tilde{V}^{T} = SVD(Y_{t})$ is computed, and the k dominant singular vectors are chosen to build the basis matrix $U_{t} \in ℜ^{n \times k}$. Because the basis matrix $U_{t}$ and its velocity vector describe the object appearance at frame t, the changes of the target between adjacent frames can be viewed as the motion of the corresponding points on the Grassmann manifold. The idea behind the observation model is therefore to describe the dynamic relationship between two adjacent points on the manifold, using tangent vectors, which live in the tangent space at each point. The observation model on the Grassmann manifold and the corresponding tangent space is formulated as
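Building the basis matrix $U_{t}$ from the L stored appearances can be sketched with NumPy's SVD. This is a minimal sketch under stated assumptions: appearance vectors are already flattened to length n, no mean-centering is applied (the article does not mention it), and `basis_from_appearances` is a hypothetical name. Because formula (8) stacks the vectors as rows, the sketch stacks them as columns and takes left singular vectors, which gives the same subspace as the dominant right singular vectors of $Y_{t}$.

```python
import numpy as np

def basis_from_appearances(appearances, k):
    """Return U_t (n x k): the k dominant singular vectors of the stacked appearance vectors.

    appearances: list of L flattened appearance vectors, each of length n.
    """
    a = np.column_stack(appearances)                 # n x L, one appearance per column
    u, s, _ = np.linalg.svd(a, full_matrices=False)  # singular values in descending order
    return u[:, :k]

rng = np.random.default_rng(6)
n, L, k = 20, 10, 2
b = np.linalg.qr(rng.standard_normal((n, k)))[0]          # a synthetic 2-D appearance subspace
appearances = [b @ rng.standard_normal(k) for _ in range(L)]
u_t = basis_from_appearances(appearances, k)
print(np.allclose(u_t @ u_t.T, b @ b.T))  # True: the basis spans the same subspace
```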

$$
U_{t} = U_{t - 1} \exp(V_{t - 1}) \tag{9}
$$

$$
V_{t} = V_{t - 1} + μ_{t - 1} \tag{10}
$$

where $V_{t}$ is the velocity vector linking $U_{t}$ and $U_{t + 1}$ and represents the target motion; it is the tangent vector starting from state $U_{t}$ on the manifold, and $U_{t - 1} \exp(V_{t - 1})$ denotes the exponential map of formula (5) applied at $U_{t - 1}$. Furthermore, the velocities $V_{1 : t}$ lie in the corresponding tangent spaces, and $μ_{1 : t}$ is Gaussian white noise.
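One propagation step of (9) and (10) can be sketched as follows, reading $U_{t - 1} \exp(V_{t - 1})$ as the Grassmann exponential map of formula (5). After adding the noise, the sketch re-projects the velocity onto the tangent space at the new point as a simple stand-in for parallel transport; that choice, the isotropic noise level, and the function names are assumptions, not details from the article.

```python
import numpy as np

def grassmann_exp(p, omega):
    """Exp_p(omega) = p V cos(theta) V^T + U sin(theta) V^T, with omega = U theta V^T."""
    u, theta, vt = np.linalg.svd(omega, full_matrices=False)
    return p @ vt.T @ np.diag(np.cos(theta)) @ vt + u @ np.diag(np.sin(theta)) @ vt

def project_tangent(p, z):
    """Project an arbitrary n x k matrix onto the tangent space at p: (I - p p^T) z."""
    return z - p @ (p.T @ z)

def step(u_prev, v_prev, noise_std, rng):
    """One step of (9)-(10): U_t = Exp_{U_{t-1}}(V_{t-1}), V_t = V_{t-1} + mu_{t-1}.

    V_t is re-projected onto the tangent space at U_t (an approximation of transport).
    """
    u_t = grassmann_exp(u_prev, v_prev)
    mu = noise_std * rng.standard_normal(v_prev.shape)  # Gaussian white noise
    v_t = project_tangent(u_t, v_prev + mu)
    return u_t, v_t

rng = np.random.default_rng(7)
n, k = 8, 2
u = np.linalg.qr(rng.standard_normal((n, k)))[0]
v = project_tangent(u, 0.1 * rng.standard_normal((n, k)))
for _ in range(5):
    u, v = step(u, v, noise_std=0.01, rng=rng)
print(np.allclose(u.T @ u, np.eye(k)), np.allclose(u.T @ v, 0.0))  # True True
```

Each state stays on the manifold (orthonormal columns) and each velocity stays in the tangent space at its state, which is the property the dual model relies on.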

Tracking algorithm

Based on the above models, the tracking algorithm is organized as shown in Figure 1, with particle filter 1 and particle filter 2 executed alternately. The steps are as follows: first, particle filter 2 predicts the geometric deformation of the target; next, the appearance of the object within each geometric box is computed, and particle filter 1 estimates the basis matrix of the object for the current frame; finally, the feature space is updated according to the proposed occlusion-handling strategy.

The algorithm thus couples two particle filters, one on the projective transformation group and one on the Grassmann manifold of basis matrices. The first dynamically updates the geometric transformation parameters of the object on the projective group, and the second updates the appearance model of the object online on the Grassmann manifold; applying the two filters alternately makes the tracking results more accurate.

Input: $S_{t - 1}$ , $U_{t - 1}$ , $V_{t - 1}$ , and video sequence

Output: the estimated object state $S_{t}^{obj}$ and the basis matrix $U_{t}^{obj}$ of the tracked object region

Compute the state vector $S_{t}^{obj}$ at time t as described in section “Dynamic model,” together with the corresponding basis matrix ${\hat{U}}_{t}^{obj}$ of the tracked region.

Generate M particles $V_{t - 1}^{i}$, $i = 1, 2, \dots, M$.

For each particle $V_{t - 1}^{i}$, compute the corresponding basis matrix $U_{t}^{i} = U_{t - 1}^{i} \exp(V_{t - 1}^{i})$, then compute the particle weight ${wb}_{t}^{i} = \exp(- λ \, d^{2}(U_{t}^{i}, {\hat{U}}_{t}^{obj}))$.

Compute the weighted Lie group mean $U_{t}^{obj} = \exp\left(\frac{1}{M} \sum_{j = 1}^{M} \log({wb}_{t}^{j} U_{t}^{j})\right)$.

Output $U_{t}^{obj}$ .
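The weighting and averaging steps can be sketched as below. Note that this sketch swaps in a different averaging technique: instead of the exponential-of-logarithms formula above, it computes a standard weighted intrinsic (Karcher) mean on the Grassmann manifold by repeatedly averaging logarithm-mapped particles in the tangent space and mapping back with the exponential map. The value of λ, the normalization of the weights, and all function names are assumptions.

```python
import numpy as np

def exp_map(p, omega):
    """Grassmann exponential map, formula (5), with QR to keep the columns orthonormal."""
    u, theta, vt = np.linalg.svd(omega, full_matrices=False)
    q = p @ vt.T @ np.diag(np.cos(theta)) @ vt + u @ np.diag(np.sin(theta)) @ vt
    return np.linalg.qr(q)[0]

def log_map(p, q):
    """Grassmann logarithm: U arctan(S) V^T with U S V^T = (I - p p^T) q (p^T q)^{-1}."""
    m = (q - p @ (p.T @ q)) @ np.linalg.inv(p.T @ q)
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u @ np.diag(np.arctan(s)) @ vt

def sq_distance(a, b):
    """Squared geodesic distance, the square of formula (6)."""
    sigma = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.sum(np.arccos(np.clip(sigma, -1.0, 1.0)) ** 2))

def particle_weights(candidates, u_ref, lam):
    """Normalized weights proportional to exp(-lambda * d^2(U^i, U_ref))."""
    w = np.exp(-lam * np.array([sq_distance(u, u_ref) for u in candidates]))
    return w / w.sum()

def weighted_karcher_mean(candidates, weights, iters=50, tol=1e-10):
    """Weighted intrinsic mean: average Log-mapped points at mu, then move mu with Exp."""
    mu = candidates[int(np.argmax(weights))]
    for _ in range(iters):
        t = sum(w * log_map(mu, u) for w, u in zip(weights, candidates))
        mu = exp_map(mu, t)
        if np.linalg.norm(t) < tol:
            break
    return mu

rng = np.random.default_rng(3)
n, k = 10, 2
base = np.linalg.qr(rng.standard_normal((n, k)))[0]
omega = 0.1 * (np.eye(n) - base @ base.T) @ rng.standard_normal((n, k))
cands = [exp_map(base, omega), exp_map(base, -omega)]  # symmetric about base
w = particle_weights(cands, base, lam=5.0)             # equal weights by symmetry
mean = weighted_karcher_mean(cands, w)
print(sq_distance(mean, base) < 1e-10)  # True: the mean is the geodesic midpoint
```

Averaging in the tangent space keeps the result on the manifold, which a plain Euclidean average of basis matrices would not.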

Feature space updating

We use the Grassmann distance between two basis matrices to measure the similarity of two regions. Let ds be the minimum distance between the basis matrix of the current frame and the basis matrices stored in the feature space. If ds is larger than a given threshold (in this article, $1/4 \, ‖ U_{t} ‖^{2}$), the object is considered severely occluded, and the feature space is not updated. Otherwise, the feature space is updated in one of the following two ways:

When the number of feature vectors is smaller than the predetermined size, the new feature vector is added to the feature space directly.

Otherwise, the feature vector of the current frame replaces the stored feature vector with the maximal distance.
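The feature-space update rule can be sketched as follows. Assumptions: the feature space stores basis matrices, $‖ U_{t} ‖$ is the Frobenius norm (so the threshold equals k/4 for an orthonormal $U_{t}$), and the entry to be replaced is the stored one farthest from the current basis matrix; all names are hypothetical.

```python
import numpy as np

def grassmann_distance(a, b):
    """Geodesic distance of formula (6) via principal angles."""
    sigma = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.linalg.norm(np.arccos(np.clip(sigma, -1.0, 1.0))))

def update_feature_space(space, u_t, max_size):
    """Return (new_space, updated) under the occlusion-aware update rule.

    space: list of stored basis matrices (hypothetical storage format).
    """
    if not space:
        return [u_t], True
    dists = [grassmann_distance(u_t, u) for u in space]
    ds = min(dists)
    threshold = 0.25 * np.linalg.norm(u_t, "fro") ** 2  # (1/4)||U_t||^2, = k/4 if orthonormal
    if ds > threshold:
        return list(space), False            # treated as severe occlusion: skip the update
    if len(space) < max_size:
        return list(space) + [u_t], True     # room left: add directly
    new_space = list(space)
    new_space[int(np.argmax(dists))] = u_t   # replace the most distant stored entry
    return new_space, True

rng = np.random.default_rng(5)
n, k = 50, 3
ref = np.linalg.qr(rng.standard_normal((n, k)))[0]
far = np.linalg.qr(rng.standard_normal((n, k)))[0]      # an unrelated (occluder-like) subspace
cur = ref @ np.linalg.qr(rng.standard_normal((k, k)))[0]  # same subspace as ref
space1, ok1 = update_feature_space([ref], cur, max_size=5)   # appended
space2, ok2 = update_feature_space([ref], far, max_size=5)   # rejected as occlusion
space3, ok3 = update_feature_space([ref, far], cur, max_size=2)  # replaces far
print(ok1, ok2, ok3)  # True False True
```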

Experimental results

We test the proposed algorithm on publicly available videos and on videos captured by cameras in wireless sensor networks. The experiments cover objects with obvious geometric deformation, objects under illumination variation, and partially occluded objects.

Experiments on objects with geometric deformation

First, we run experiments on targets undergoing dramatic geometric deformation. The bounding box in the first frame is marked manually.

We compare the proposed algorithm with two methods: covariance tracking using model update based on Lie algebra (CMUL), proposed by Porikli et al.,¹⁷ and object tracking via PLS analysis, proposed by Wang et al.² All three methods are run in the same experimental environment. For PLS and the proposed method, we use the same particle filter parameters for a fair comparison, with N = 300 samples. Each method is run 10 times on each video sequence. We compare the performance of the three algorithms in two ways.

For each frame, we compute the Euclidean distance between the center of the ground-truth object region and the center of the tracked object region. Let the tracked center at time t be $(x_{t}', y_{t}')$ and the ground-truth center be $(x_{t}, y_{t})$; the distance is then

$$
DisError_{t} = \sqrt{(x_{t}' - x_{t})^{2} + (y_{t}' - y_{t})^{2}} \tag{11}
$$
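Formula (11) is straightforward to evaluate over a whole sequence. The NumPy sketch below (the function name and the toy coordinates are illustrative, not experimental data) returns the per-frame center location error.

```python
import numpy as np

def center_errors(tracked, ground_truth):
    """DisError_t = sqrt((x'_t - x_t)^2 + (y'_t - y_t)^2) for every frame t, formula (11).

    tracked, ground_truth: sequences of (x, y) centers, one per frame.
    """
    tracked = np.asarray(tracked, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return np.linalg.norm(tracked - ground_truth, axis=1)

errors = center_errors([(10, 10), (13, 14), (20, 20)], [(10, 10), (10, 10), (20, 26)])
print(errors)  # [0. 5. 6.]
```

Plotting these values against the frame index gives curves of the kind compared in Figure 3.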

In the first video sequence, the object undergoes dramatic geometric transformation. The frame size is 240 × 320 pixels, and the initial size of the bounding box is 53 × 53 pixels. For the proposed algorithm, the eight-dimensional projection parameter is (0.04; 0.001; 0.001; 0.04; 4; 4; 0.0006; 0.0006). For PLS and the proposed method, the initial rotation is −11 in both cases. Figure 2 shows the results of the three methods on frames 2, 680, 780, 1000, and 1100. Figure 2(a) shows that CMUL nearly fails, because its target box keeps the same size throughout the tracking process and the method lacks an updating mechanism, whereas the tracked object actually undergoes dramatic appearance deformation.

Figure 2.

Experimental results for video sequence 1: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.

Figure 2(b) and (c) show that the tracking window size follows the size changes of the target, and that the results of the proposed method are clearly more accurate than those of PLS. Figure 3(a) shows the distance between the ground-truth center and the tracked center for each frame. Figure 3 shows that the distance of CMUL is much larger than those of the other two methods, which agrees with the results in Figure 2. The distance of PLS is also much larger than that of the proposed method. This is because the proposed method models the target state with the projective transformation group, which reflects the imaging projection process more accurately than the affine group.

Figure 3.

Results comparison for the distance between the ground-truth and the tracked regions: (a) sequence 1, (b) sequence 2, (c) sequence 3, and (d) sequence 4.

In the second sequence, the object also undergoes significant geometric deformation. The frame size is 240 × 320 pixels, and the initial size of the bounding box is 46 × 58 pixels. For the proposed algorithm, the eight-dimensional projection parameter is (0.05; 0.002; 0.002; 0.05; 5; 5; 0.005; 0.005). For PLS and the proposed method, the initial rotation is 0. Figures 3(b) and 4 show that the distances of CMUL and PLS are much larger than that of the proposed method.

Figure 4.

Experimental results for sequence 2: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.

Experiments on objects with illumination change

We also conducted several experiments to test tracking stability under large illumination changes. Owing to space limitations, we show one of these sequences in Figure 5. In this sequence, the frame size is 320 × 240 pixels, and the initial size of the template is 58 × 68 pixels. Figure 5 shows that when the target undergoes significant illumination change, our algorithm still tracks the target more accurately than the other two algorithms, because the Grassmann appearance manifold is relatively insensitive to illumination change. Figure 3(c) supports the same conclusion.

Figure 5.

Experimental results for sequence 3: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.

Experiments for occlusion

The video sequence in Figure 6 was captured by cameras in a wireless sensor network, and the object undergoes severe occlusion during the sequence. The frame size is 1920 × 1080 pixels, the initial size of the template is 135 × 442 pixels, and the initial rotation is 0. Figure 6 shows the results of the three methods. Both CMUL and PLS lose the target after full occlusion, whereas the proposed method continues to track the target well, even after the two people meet. Figure 3(d) shows the distance between the ground-truth target center and the tracked region center in each frame. The distance of CMUL is larger than those of the other two methods, and the distances of CMUL and PLS increase rapidly as the object becomes occluded. The proposed method recaptures the object when it re-emerges, and the distance returns to its normal value.

Figure 6.

Experimental results for sequence 4: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.

Experiments for face tracking

In this group of experiments, we compare the proposed method with the algorithm of Khan and Gui,²¹ Bayesian online learning on Riemannian manifolds using a dual model with applications to video object tracking (BDMT), which also uses two filters.

The face sequence in Figure 7 contains partial occlusion. The results show that the proposed algorithm tracks the face accurately.

Figure 7.

Experimental results for sequence 5: (a) experimental results of BDMT method and (b) experimental results of the proposed method.

Analysis on experiment results

The number of samples in every group is 400, and the computational complexity of the proposed algorithm is O(N). The good performance of our algorithm is due to the following advantages:

The projective transformation group SL(3) describes the geometric deformation of the object more accurately, because object imaging is essentially a projective transformation, of which the affine transformation is only an approximation.

A particle filter estimates the object appearance more accurately by making full use of the intrinsic geometry of the state space on the Grassmann manifold.

The dual-model design explicitly takes both the SL(3) group and the Grassmann manifold into account in a particle-filtering tracking method, which makes the tracking results more accurate.

Designing the occlusion strategy with the distance metric on the Grassmann manifold avoids introducing abnormal information when the feature space is updated, which preserves its effectiveness and accuracy.

Conclusion

In this article, we presented a visual object tracking method using joint Grassmann manifold appearance and SL(3) group shape modeling. On the Grassmann manifold, we proposed a novel object appearance modeling algorithm in which one particle filter estimates the object appearance by making full use of the intrinsic geometry of the state space. On the SL(3) group, the other particle filter predicts the geometric deformation of the object.

In the two filters, each candidate appearance was predicted by modeling both the appearance and its tangent space, so that each new predicted value remained on the manifold and moved along the shortest geodesic. The tracking algorithm executes the two filters alternately to obtain more effective tracking results. Using the distance metric on the Grassmann manifold, an updating strategy for the feature space was designed that effectively avoids introducing abnormal information.

Footnotes

Handling Editor: Li-Minn Ang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (No. 61503274, No. 61603415, No. 61701101, No. 61603080), Doctor Startup Fund Program funded by Liaoning Province Education Administration (No. 201501090), research funds by Shenyang City (No. 1787000), fundamental research funds from the Central Universities (No. 140403005), and the Normal Program of Education Commission of Liaoning Province of China (No. L2015558).

References

1. Ross, Lim, RS. Incremental learning for robust visual tracking. Int J Comput Vision 2008; 77: 125–141.

2. Wang, Chen, et al. Object tracking via partial least squares analysis. IEEE T Image Process 2012; 21: 4454–4465.

3. IYH, Khan. Grassmann manifold online learning and partial occlusion handling for visual object tracking under Bayesian formulation. In: Proceedings of the 21st international conference on pattern recognition, Tsukuba, Japan, 11–15 November 2012, pp.1463–1466. New York: IEEE.

4. Shirazi, Harandi, Lovell, et al. Object tracking via non-Euclidean geometry: a Grassmann approach. In: Proceedings of the IEEE winter conference on applications of computer vision, Steamboat Springs, CO, 24–26 March 2014, pp.901–908. New York: IEEE.

5. Akbulut, Urhan, Erturk. Region covariance descriptor based probabilistic object tracking using enhanced similarity criterion. In: Proceedings of the 21st international conference on signal processing and communications applications conference, Haspolat, 24–26 April 2013, pp.1–4. New York: IEEE.

6. Hsu, Kang, Liao, HYM. Cross-camera vehicle tracking via affine invariant object matching for video forensics applications. In: Proceedings of the 2013 IEEE international conference on multimedia and expo, San Jose, CA, 15–19 July 2013, pp.1–6. New York: IEEE.

7. Edelman, Arias, Smith, ST. The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal A 1999; 20: 303–353.

8. Srivastava, Klassen. Bayesian and geometric subspace tracking. Adv Appl Probab 2004; 36: 43–56.

9. Wang, Backhouse, IYH. Online subspace learning on Grassmann manifold for moving object tracking in video. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, Las Vegas, NV, 31 March–4 April 2008, pp.969–972. New York: IEEE.

10. Liu, Shi, ZL. Affine-variant shape recognition using Grassmann manifold. Acta Automat Sin 2012; 38: 248–258.

11. Cetingul, Vidal. Intrinsic mean shift for clustering on Stiefel and Grassmann manifolds. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, Miami, FL, 20–25 June 2009, pp.1896–1902. New York: IEEE.

12. Turaga, Veeraraghavan, Chellappa. Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, Anchorage, AK, 23–28 June 2008, pp.1–8. New York: IEEE.

13. Begelfor, Werman. Affine invariance revisited. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, New York, 17–22 June 2006, pp.2087–2094. New York: IEEE.

14. Subbarao, Meer. Nonlinear mean shift over Riemannian manifolds. Int J Comput Vision 2009; 84: 1–20.

15. Lui, Beveridge, Whitley, LD. Adaptive appearance model and condensation algorithm for robust face tracking. IEEE T Syst Man Cyb 2010; 40: 437–448.

16. Qiao, Zhang, et al. Learning an intrinsic-variable preserving manifold for dynamic visual tracking. IEEE T Syst Man Cyb 2010; 40: 868–880.

17. Porikli, Tuzel, Meer. Covariance tracking using model update based on Lie algebra. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, New York, 17–22 June 2006, pp.728–735. New York: IEEE.

18. Forstner, Moonen. A metric for covariance matrices. Technical report no. 13–128, 1999. Stuttgart: Department of Geodesy and Geoinformatics, University of Stuttgart.

19. Cheng, Wang, JQ. Real-time probabilistic covariance tracking with efficient model update. IEEE T Image Process 2012; 21: 2824–2837.

20. Khan, Gui, YH. Tracking visual and infrared objects using joint Riemannian manifold appearance and affine shape modeling. In: Proceedings of the IEEE international conference on computer vision workshops, Barcelona, 6–13 November 2011, pp.1847–1854. New York: IEEE.

21. Khan, Gui. Bayesian online learning on Riemannian manifolds using a dual model with applications to video object tracking. In: Proceedings of the IEEE international conference on computer vision workshops, Barcelona, 6–13 November 2011, pp.1042–1409. New York: IEEE.

22. Henriques, Caseiro, Martins. High-speed tracking with kernelized correlation filters. IEEE T Pattern Anal 2015; 37: 583–596.

23. Lim, Yang, MH. Object tracking benchmark. IEEE T Pattern Anal 2015; 37: 1834–1848.

24. Liu, Shi, ZL. Projective registration algorithm based on Riemannian manifold. Acta Automat Sin 2009; 35: 1378–1386.

25. Khan, IYH. Nonlinear dynamic model for visual object tracking on Grassmann manifolds with partial occlusion handling. IEEE T Cybernetics 2013; 43: 2005–2019.