Abstract
Visual object tracking based on wireless multimedia sensor networks is an active research topic. However, existing linear methods for processing feature vectors often cause tracking drift when the tracked object undergoes significant nonplanar pose variations. In this article, we propose a novel nonlinear algorithm for tracking significantly deformable objects. The proposed tracking scheme has two filters. On one hand, because the Grassmann manifold is one of the entropy manifolds of the Lie group manifold and can therefore describe and process appearance feature data more accurately, one filter is designed on it to estimate the object appearance, making full use of the transformation between a point on the manifold and its corresponding point in the tangent space. On the other hand, because object imaging is essentially a projection transformation process, the other filter is designed on the projection transformation group SL(3) to describe the geometric deformation of the object. The two filters execute alternately to mitigate tracking drift. Extensive experiments show that the proposed method achieves stable and accurate tracking of targets with significant geometric deformation, even under occlusion and illumination changes.
Introduction
Wireless multimedia sensor networks are a new type of sensor network supporting multimedia applications; they are widely used in multimedia network monitoring, intelligent transportation, environmental monitoring, industrial control, and other fields. Visual object tracking based on wireless multimedia sensor networks is an active research topic.
Various methods have been designed to track dynamic targets. Ross et al. 1 presented a tracking algorithm that learns a low-dimensional subspace representation, efficiently adapts online to changes in object appearance, and updates the model with an incremental principal component analysis algorithm. Wang et al. 2 designed object appearance models based on partial least squares (PLS) analysis. However, when the video object undergoes large nonplanar pose variations, these tracking methods often drift or even fail. The main reason is that the feature vectors describing the object appearance do not lie in a single vector space, so traditional linear methods for processing feature vectors do not meet the actual need.
Manifold-based tracking algorithms 3–6 have become a popular research topic in video target tracking. Early tracking work on manifold subspaces applied conjugate gradient and Newton's methods to track targets on the Grassmann and Stiefel manifolds. 7 Srivastava and Klassen 8 designed piecewise geodesics on the Grassmann manifold and used projection matrices in the subspace to track simulated synthetic signals. However, none of these methods applies to video image tracking. Wang et al. 9 applied a Kalman filter to the velocity vectors in the tangent space of the Grassmann manifold to achieve object tracking. Because both the low-dimensional velocity vector model and the Kalman filter are inaccurate, this method can only track objects with moderate pose or appearance changes; it drifts or fails on objects with significant pose or appearance changes. The Grassmann manifold has also been applied to classification, human behavior analysis, object recognition, 10 and other fields. Cetingul and Vidal 11 proposed mean shift on the Grassmann and Stiefel manifolds and performed object categorization and segmentation of multiple objects. Turaga et al. 12 computed distances and designed density estimators on the Grassmann and Stiefel manifolds, and applied conditional density estimators to shape classification, face recognition, and activity recognition. Begelfor and Werman 13 built an affine-invariant clustering subspace for image data using the structure of the Grassmann manifold. Subbarao and Meer 14 proposed nonlinear mean shift on Riemannian manifolds for nonlinear filtering and segmentation. Lui et al. 15 accomplished online learning of a local linear appearance manifold by applying coarse-to-fine sampling in particle filtering. Qiao et al. 16 used an offline manifold strategy on a face data set to realize face tracking.
Covariance manifolds have also been used in object tracking. Porikli et al. 17 and Forstner and Moonen 18 designed a feature covariance matrix to describe the object region and used the Lie group structure of positive definite matrices to estimate the similarity of covariance matrices. Wu et al. 19 applied incremental covariance tensor learning to design a tracking method on a Riemannian manifold, combined with a particle filter framework to handle background clutter efficiently. Khan and Gui 20,21 proposed a visual object tracking method that uses piecewise geodesics under a Bayesian framework and the geometric structure of the Riemannian manifold.
Although the above manifold-based methods have achieved reasonably good results, tracking objects in complex scenes remains a challenge, 22,23 for example, tracking objects with significant pose variations, heavy deformation, or long-term partial occlusion. The main problems of these methods may be the low efficiency of online learning on the manifold or the updating strategy under occlusion.
A Lie group manifold combines the structure of a differential manifold with the related theory of groups, and a differential manifold in turn combines a topological manifold with a differential structure. Because the Grassmann manifold is one of the entropy manifolds of the Lie group manifold, 24 it not only provides the spatial expression of smooth surfaces but is also well suited to measuring the distance between feature spaces of deformable objects. Therefore, establishing the appearance model and the online updating strategy using the intrinsic mean and geodesic distance on the Grassmann manifold can describe the appearance variations of a deformable object more accurately. In addition, the particle filter is nonlinear and is not limited to Gaussian noise, so designing a particle-filtering method on a low-dimensional manifold can not only track targets with large nonplanar pose variations more stably but also effectively improve real-time performance.
Motivated by the methods mentioned above, we present a new visual object tracking algorithm using Grassmann manifold appearance modeling and SL(3) group shape modeling. The proposed tracking scheme has two particle filters. One is on the Grassmann manifold and estimates the object appearance by making full use of the intrinsic geometry of the state space; the other is on the SL(3) group and describes the geometric deformation of the object. The two filters execute alternately to mitigate tracking drift. To handle object occlusion, we use an online-learning strategy that prevents abnormal information from entering the feature space, ensuring its accuracy.
Grassmann manifold and its metrics
The feature vectors describing the object appearance do not lie in a single vector space, whereas the Lie group manifold space can describe the object appearance more efficiently. When tracking on the Grassmann manifold, we represent the similarity between two features by the distance between them, computed on the Grassmann manifold. The geodesic distance on the Grassmann manifold is formulated as follows.
The points on Grassmann manifold
where
Grassmann manifold
The sectional curvature is
where
where
Let
where
We obtain the distance on the Grassmann manifold through formula (6).
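As an illustration of how such a distance can be evaluated numerically, the following minimal sketch computes the standard arc-length geodesic distance between two subspaces from their principal angles. It assumes that each point on the Grassmann manifold is represented by an n × k matrix with orthonormal columns. The function names and the NumPy implementation are illustrative assumptions and are not taken from the formulas of this section.

```python
import numpy as np

def orthonormal_basis(A):
    """Return an orthonormal basis (n x k) for the column space of A (n >= k)."""
    Q, _ = np.linalg.qr(A)
    return Q

def grassmann_distance(U1, U2):
    """Arc-length geodesic distance between span(U1) and span(U2).

    U1, U2: n x k matrices with orthonormal columns (assumed representation).
    """
    # The singular values of U1^T U2 are the cosines of the principal angles.
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    # Clip to guard against round-off slightly outside [-1, 1].
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))

# Usage: two lines in R^3 and a rotated basis of the same plane in R^6.
eye3 = np.eye(3)
line_x = eye3[:, :1]
line_y = eye3[:, 1:2]
d_same = grassmann_distance(line_x, line_x)   # identical subspaces
d_orth = grassmann_distance(line_x, line_y)   # orthogonal lines

rng = np.random.default_rng(0)
U = orthonormal_basis(rng.standard_normal((6, 2)))
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s], [s, c]])               # change of basis inside the subspace
d_rebased = grassmann_distance(U, U @ R)
print(d_same, d_orth, d_rebased)
```

The distance depends only on the subspaces, not on the particular bases chosen, which is why the rotated basis `U @ R` is at (numerically) zero distance from `U`.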
Tracking model
Dynamic model
The accuracy of the dynamic model is one of the main factors affecting tracking performance because the target state changes continuously throughout the tracking sequence. Because the changes of a target in a continuous sequence can be described as the motion of its corresponding points on a Riemannian manifold, the idea behind the dynamic model is to design the dynamic transformation between two adjacent points on the manifold. In this article, projection transformation is used to describe the location and deformation of the target, and the tangent vectors of the points on the manifold are used to design the dynamic transformation. The dynamic model on the Riemannian manifold and its tangent space is designed as
where
We can describe the proposed method as that the candidate points
Particle filter 2 is designed to predict the geometric deformation of the target, as shown in Figure 1. In this filter, the input variables are

The complete tracking scheme.
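To make the idea of a tangent-space dynamic model on SL(3) concrete, the sketch below propagates a particle by sampling a random tangent vector in the Lie algebra sl(3) and mapping it to the group with the matrix exponential. This is a common construction for particle filters on Lie groups and is given here only under stated assumptions: the choice of the eight basis matrices, the first-order random-walk form, the noise scales, and all function names are hypothetical and are not the dynamic model of this article.

```python
import numpy as np

def sl3_basis():
    """Eight traceless 3x3 matrices spanning sl(3) (one possible basis)."""
    basis = []
    for i in range(3):
        for j in range(3):
            if i != j:
                B = np.zeros((3, 3))
                B[i, j] = 1.0
                basis.append(B)
    basis.append(np.diag([1.0, -1.0, 0.0]))
    basis.append(np.diag([0.0, 1.0, -1.0]))
    return np.stack(basis)                      # shape (8, 3, 3)

def expm(A, terms=18):
    """Matrix exponential by scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(A, ord=1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A_scaled = A / (2.0 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A_scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def propagate(X_prev, sigma, rng):
    """Sample X_t = X_{t-1} exp(sum_i eps_i E_i) with eps_i ~ N(0, sigma_i^2)."""
    eps = rng.normal(0.0, sigma)                       # one coefficient per basis matrix
    tangent = np.tensordot(eps, sl3_basis(), axes=1)   # traceless 3x3 tangent vector
    return X_prev @ expm(tangent)

# Usage: propagate five particles from the identity transformation.
rng = np.random.default_rng(0)
sigma = np.full(8, 0.01)                               # illustrative noise scales
particles = [propagate(np.eye(3), sigma, rng) for _ in range(5)]
dets = [float(np.linalg.det(P)) for P in particles]
print(dets)
```

Because every tangent vector is traceless, its exponential has unit determinant, so each propagated particle stays on SL(3) without any re-projection step.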
Observation model
Based on the metric and solution methods on the Grassmann manifold, a particle-filtering algorithm is designed to estimate the state of the target appearance, hereinafter referred to as particle filter 1.
Suppose the object appearance feature space is made up of the
where vector
where
Tracking algorithm
Based on the above tracking models, the corresponding tracking algorithm is organized as in Figure 1, where particle filter 1 and particle filter 2 are executed alternately. The steps are as follows: first, particle filter 2 predicts the geometric deformation of the target; next, the appearance feature of the object within each geometric box is computed, and particle filter 1 is used to obtain the basis matrix of the object for the current frame; finally, the feature space is updated according to the proposed occlusion-handling strategy.
The algorithm builds two particle filters, combining the projection group with basis matrices on Grassmann manifolds. One dynamically updates the geometric transformation parameters of the object on the projection group, and the other updates the appearance model of the object online on Grassmann manifolds. Applying the two particle filters alternately makes the tracking results more accurate.
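The appearance step can be illustrated with a small sketch: a basis matrix is extracted from the features of each candidate region, and each candidate is weighted by its Grassmann distance to a reference basis. The use of a truncated SVD to obtain the basis matrix, the Gaussian-type weight exp(−d²/2σ²), the value of σ, and all function names are assumptions made for illustration; the observation model of this article may differ.

```python
import numpy as np

def subspace_basis(features, k):
    """Top-k left singular vectors of an n x m feature matrix.

    The returned n x k matrix has orthonormal columns and is used here as
    the (assumed) representation of a point on the Grassmann manifold.
    """
    U, _, _ = np.linalg.svd(features, full_matrices=False)
    return U[:, :k]

def grassmann_distance(U1, U2):
    """Arc-length distance from the principal angles between two subspaces."""
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.linalg.norm(np.arccos(np.clip(cosines, -1.0, 1.0))))

def particle_weights(candidates, reference, sigma=0.5):
    """Normalized weights proportional to exp(-d^2 / (2 sigma^2)) (assumed form)."""
    d = np.array([grassmann_distance(U, reference) for U in candidates])
    log_w = -d ** 2 / (2.0 * sigma ** 2)
    w = np.exp(log_w - log_w.max())            # subtract the max for numerical stability
    return w / w.sum()

# Usage: the candidate identical to the reference should receive the largest weight.
rng = np.random.default_rng(1)
reference = subspace_basis(rng.standard_normal((20, 6)), k=3)
candidates = [reference] + [
    subspace_basis(rng.standard_normal((20, 6)), k=3) for _ in range(4)
]
weights = particle_weights(candidates, reference)
print(int(weights.argmax()), float(weights.sum()))
```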
Computing state vector
Generating
For each sampling particle
Computing the weighted Lie group mean:
Output
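The weighted Lie group mean in the steps above can be sketched with the usual fixed-point iteration: map the particles to the tangent space at the current estimate, average them there with the particle weights, and map the average back to the group. The sketch is a hedged illustration only. The series-based helpers `expm_small` and `logm_near_identity` are hypothetical names and are valid only when the particles are concentrated near the mean, and the starting point and stopping rule are assumptions.

```python
import numpy as np

def expm_small(A, terms=20):
    """Taylor-series matrix exponential; adequate when the norm of A is below about 1."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k
        result = result + term
    return result

def logm_near_identity(M, terms=40):
    """Matrix logarithm via log(I + Y) = Y - Y^2/2 + ...; requires ||M - I|| < 1."""
    Y = M - np.eye(M.shape[0])
    result = np.zeros_like(Y)
    power = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        power = power @ Y
        result = result + ((-1) ** (k + 1)) * power / k
    return result

def weighted_lie_mean(particles, weights, iters=20, tol=1e-12):
    """Iterate mu <- mu exp(sum_i w_i log(mu^{-1} X_i)) until the update is negligible."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = particles[int(np.argmax(w))].copy()   # start from the heaviest particle
    for _ in range(iters):
        mu_inv = np.linalg.inv(mu)
        delta = sum(wi * logm_near_identity(mu_inv @ X) for wi, X in zip(w, particles))
        mu = mu @ expm_small(delta)
        if np.linalg.norm(delta) < tol:
            break
    return mu

# Usage: particles on a one-parameter subgroup exp(t A) with A traceless.
A = np.array([[0.1, 0.2, 0.0], [-0.1, 0.0, 0.05], [0.02, 0.0, -0.1]])
particles = [expm_small(t * A) for t in (0.0, 0.5, 1.0)]
weights = [0.2, 0.5, 0.3]
mean = weighted_lie_mean(particles, weights)
print(float(np.linalg.det(mean)))
```

In this usage example the particles commute, so the mean coincides with exp(t̄A), where t̄ = 0.55 is the weighted average of the parameters; this gives a simple check of the iteration.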
Feature space updating
We use the Grassmann distance between two basis matrices to measure the similarity of the two regions. Let
When the total number of feature vectors is smaller than the predetermined number, the feature vector is added to the feature space directly.
Otherwise, the feature space is updated by replacing the feature vector that has the maximal distance value with the current feature vector.
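The two update rules can be sketched as follows. The sketch assumes that the feature space is a list of basis matrices with orthonormal columns and that the "maximal distance value" is the Grassmann distance from each stored basis to the current one; both are assumptions, as are the function names and the `capacity` parameter standing for the predetermined number.

```python
import numpy as np

def grassmann_distance(U1, U2):
    """Arc-length distance from the principal angles between two subspaces."""
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.linalg.norm(np.arccos(np.clip(cosines, -1.0, 1.0))))

def update_feature_space(feature_space, U_new, capacity):
    """Return an updated copy of the feature space.

    Rule 1: below capacity, append the new basis directly.
    Rule 2: otherwise, replace the stored basis farthest from the new one.
    """
    updated = list(feature_space)
    if len(updated) < capacity:
        updated.append(U_new)
        return updated
    distances = [grassmann_distance(U, U_new) for U in updated]
    updated[int(np.argmax(distances))] = U_new
    return updated

# Usage: one-dimensional subspaces (lines) in R^3 with capacity 2.
e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])
near_e1 = np.array([[1.0], [0.1], [0.0]]) / np.sqrt(1.01)

space = update_feature_space([], e1, capacity=2)
space = update_feature_space(space, e2, capacity=2)
space = update_feature_space(space, near_e1, capacity=2)  # replaces e2, the farthest entry
print(len(space))
```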
Experimental results
We test the performance of the proposed algorithm on publicly available videos and on videos captured by cameras in wireless sensor networks. The experiments cover tracking objects with obvious geometric deformation, with illumination variations, and under partial occlusion.
Experiments on objects with geometric deformation
First, experiments are carried out on targets undergoing dramatic geometric deformation. In the first frame, we mark the bounding box manually.
We compare the results of the proposed algorithm with those of the method of Porikli et al., 17 namely covariance tracking using model update based on Lie group (CMUL), and with those of the method of Wang et al., 2 namely object tracking via PLS analysis. The three methods are run in the same experimental environment. For the PLS and the proposed methods, we set the same particle filter parameters for a fair comparison. And
For each image frame, the Euclidean distance between the coordinates of the ground-truth object region center and the coordinates of the tracked object region center at the current frame is computed. Suppose the coordinates of the object region center at moment
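The per-frame center error described above can be computed as in the following minimal sketch. The array layout (one (x, y) pair per frame) and the function name are assumptions, and the coordinates in the usage example are made-up numbers rather than data from the experiments.

```python
import numpy as np

def center_errors(gt_centers, tracked_centers):
    """Euclidean distance between ground-truth and tracked centers, per frame."""
    gt = np.asarray(gt_centers, dtype=float)
    tracked = np.asarray(tracked_centers, dtype=float)
    return np.linalg.norm(gt - tracked, axis=1)

# Usage with illustrative (not experimental) coordinates in pixels.
gt = [(100, 120), (102, 121), (105, 125)]
tracked = [(100, 120), (105, 125), (105, 130)]
errors = center_errors(gt, tracked)
print(errors)            # per-frame errors: 0, 5, and 5 pixels
print(errors.mean())     # average center error over the sequence
```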
In the first video sequence, the object undergoes dramatic geometric transformation; each frame is 240 × 320 pixels, and the initial bounding box is 53 × 53 pixels. For the proposed algorithm, the eight-dimensional projection parameter is (0.04;0.001;0.001;0.04;4;4;0.0006;0.0006). For both PLS and the proposed method, the initial rotation is −11. The experimental results of the three methods are shown in Figure 2 for frames 2, 680, 780, 1000, and 1100. Figure 2(a) shows that the CMUL method nearly fails: its target region box keeps the same size throughout tracking and the method lacks an updating mechanism, whereas the tracked object in fact undergoes dramatic appearance deformation.

Experimental results for video sequence 1: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.
Figure 2(b) and (c) show that the tracking window size changes with the size of the target, and that the results of the proposed method are clearly more accurate than those of the PLS method. Figure 3(a) shows the distance between the ground-truth center and the tracked center for each video frame. The distance obtained with the CMUL method is much larger than that of the other two methods, which is consistent with the experimental results in Figure 2. The distance of the PLS method is also much larger than that of the proposed method. This is because the proposed method models the target state with the projection transformation group, which reflects the projection process more accurately than the affine group.

Results comparison for the distance between the ground-truth and the tracked regions: (a) sequence 1, (b) sequence 2, (c) sequence 3, and (d) sequence 4.
In the second sequence, the object also undergoes significant geometric deformation; each frame is 240 × 320 pixels, and the initial bounding box is 46 × 58 pixels. For the proposed algorithm, the eight-dimensional projection parameter is (0.05;0.002;0.002;0.05;5;5;0.005;0.005). For PLS and the proposed method, the initial rotation equals 0. Figures 3(b) and 4 show that the distances of the CMUL and PLS methods are much larger than that of the proposed method.

Experimental results for sequence 2: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.
Experiments on objects with illumination change
We also run groups of experiments to test tracking stability for objects under large illumination changes. Owing to space limits, we show one of the sequences in Figure 5. In this sequence, each frame is 320 × 240 pixels, and the initial template is 58 × 68 pixels. Figure 5 shows that when the target undergoes significant illumination change, our algorithm still tracks the target more accurately than the other two algorithms. The reason is that the Grassmann appearance manifold is largely insensitive to illumination change. Figure 3(c) supports the same conclusion.

Experimental results for sequence 3: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.
Experiments for occlusion
The video sequence shown in Figure 6 was captured by cameras in a wireless sensor network, and the object undergoes serious occlusion during the sequence. Each frame is 1920 × 1080 pixels, the initial template is 135 × 442 pixels, and the initial rotation equals 0. The results of the three filters are shown in Figure 6. The CMUL and PLS filters both lose the target after full occlusion, whereas the proposed filter continues to track the target well, even after the two people meet. Figure 3(d) shows the distance between the ground-truth target center and the tracked region center for each frame. The distance of the CMUL filter is larger than that of the other two. The distances of the CMUL and PLS filters increase rapidly as the object is gradually occluded. The proposed method recaptures the object when it re-emerges, and the distance returns to a normal value.

Experimental results for sequence 4: (a) experimental results of CMUL method, (b) experimental results of PLS method, and (c) experimental results of the proposed method.
Experiments for face tracking
In this group of experiments, we compare the proposed method with the algorithm proposed by Khan and Gui, 21 namely Bayesian online learning on Riemannian manifolds using a dual model with applications to video object tracking (BDMT), which also has two filters.
The video sequence shown in Figure 7 is a face sequence in which the face is partially occluded. The results show that the proposed algorithm tracks the face accurately.

Experimental results for sequence 5: (a) experimental results of BDMT method and (b) experimental results of the proposed method.
Analysis on experiment results
The sample number of every group is 400. And the computational complexity of the proposed algorithm is O(
Projection transformation group SL(3) describes the geometric deformation of the object more accurately because object imaging is essentially a projection transformation, of which affine transformation is only an approximation.
The particle-filtering algorithm estimates the object appearance more accurately by making full use of the intrinsic geometry of the state space on the Grassmann manifold.
The dual-model design explicitly takes both the SL(3) group and Grassmann manifolds into account in a particle filtering–based tracking method, which makes the tracking results more accurate.
Using the distance metric on the Grassmann manifold to design the occlusion strategy avoids introducing abnormal information when updating the feature space, which ensures its effectiveness and accuracy.
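The first point can be illustrated numerically. The sketch below warps the corners of a unit square with an affine matrix (last row 0, 0, 1) and with a general projection matrix, each rescaled to unit determinant so that it lies in SL(3). The two matrices are arbitrary illustrative values, not parameters from the experiments, and the function names are assumptions.

```python
import numpy as np

def normalize_sl3(H):
    """Rescale a 3x3 matrix with positive determinant so that det(H) = 1."""
    det = np.linalg.det(H)
    if det <= 0:
        raise ValueError("expected a matrix with positive determinant")
    return H / np.cbrt(det)

def warp_points(H, points):
    """Apply the projection transformation H to an array of 2D points."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]      # divide by the third coordinate

square = [(0, 0), (1, 0), (1, 1), (0, 1)]

H_affine = normalize_sl3(np.array([[1.2, 0.3, 4.0],
                                   [-0.1, 0.9, 4.0],
                                   [0.0, 0.0, 1.0]]))
H_project = normalize_sl3(np.array([[1.2, 0.3, 4.0],
                                    [-0.1, 0.9, 4.0],
                                    [0.2, 0.1, 1.0]]))

quad_affine = warp_points(H_affine, square)
quad_project = warp_points(H_project, square)

# Opposite edges of the warped square: equal under the affine map only.
affine_parallel = np.allclose(quad_affine[1] - quad_affine[0],
                              quad_affine[2] - quad_affine[3])
project_parallel = np.allclose(quad_project[1] - quad_project[0],
                               quad_project[2] - quad_project[3])
print(affine_parallel, project_parallel)
```

The affine map always sends the square to a parallelogram, whereas the general element of SL(3) can produce the perspective foreshortening that arises when a planar object rotates out of the image plane.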
Conclusion
In this article, we presented a visual object tracking method using joint Grassmann manifold appearance modeling and SL(3) group shape modeling. Based on the Grassmann manifold, we proposed a novel object appearance modeling algorithm in which one particle filter estimates the object appearance by making full use of the intrinsic geometry of the state space. Meanwhile, based on the SL(3) group, the other particle filter was designed to predict the geometric deformation of the object.
In the two filters, each candidate appearance was predicted by modeling both the appearance and its tangent space, so that a new predicted value remained on the manifold and moved along the shortest geodesic. Combining the two filters, the tracking algorithm executes them alternately to obtain more effective tracking results. Using the distance metric on the Grassmann manifold, an updating strategy for the feature space was designed that effectively avoids introducing abnormal information.
Footnotes
Handling Editor: Li-Minn Ang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (No. 61503274, No. 61603415, No. 61701101, No. 61603080), Doctor Startup Fund Program funded by Liaoning Province Education Administration (No. 201501090), research funds by Shenyang City (No. 1787000), fundamental research funds from the Central Universities (No. 140403005), and the Normal Program of Education Commission of Liaoning Province of China (No. L2015558).
