Visual Object Servo Tracking Based on the Particle Filter Method Using a Pan-Tilt-Zoom Camera

Abstract

We present a servo control model in a particle filter to realize robust visual object tracking using Pan-Tilt-Zoom (PTZ) camera. The particle filter method has attracted much attention due to its robust tracking performance in cluttered environments. However, most methods are in the mode of moving object and stationary camera, as a result, the tracking will end in failure if the object goes out of the field of view of the camera. In this paper, a closed-loop control model based on speed regulation is proposed to drive the PTZ camera to keep the target at the centre of the camera angle. The experiment results show that our system can track the moving object well and can always keep the object in the middle of the field of view. The system is computationally efficient and can run in real-time completely.

Keywords

servo tracking particle filter Pan-Tilt-Zoom dynamic model

1. Introduction

Object tracking based on a video sequence plays a critical role in many applications such as intelligent robots, video surveillance and human computer interfaces [1 -3]. Numerous approaches have been proposed to track moving objects in a video sequence. Among the tracking methods, the particle filter method is frequently used. Particle filter is a parametric method which solves non-linear and non-Gaussian state estimation problems, and can deal with multi-modal probability density functions (PDF). As a type of powerful tool in dealing with the non-linear and non-Gaussian problems, particle filter has been extensively studied in recent years for visual tracking [4 -8]. The basic idea of the particle filter is that the posterior density is approximated by a set of discrete samples (called particles) with associated weights.

In the case of using a stationary camera to track a moving object, the traditional tracking approaches, such as background subtraction, temporal differencing, optical flow and blob detection, can track the moving target well due to the limited changes in the background. However, in this case the field of view of the camera is restricted, as a result, the PTZ camera is commonly used in many applications such as intelligent robots and video surveillance. In the case of using a PTZ camera to track a moving object, it is a challenging task to track the moving target in a complex dynamic environment due to the tremor of the camera, disturbances and illumination changes in the environment. This paper makes great efforts to find a robust and computationally efficient algorithm which can track the object in a complex environment smoothly with the help of the PTZ camera.

2. Colour-based Particle Filter

2.1 Particle Filter

Particle filter is a Monte Carlo approximation to the optimal Bayesian filter and provides robust tracking of moving objects in a cluttered environment. The Bayesian filter is a probabilistic framework for sequentially estimating the target's state. It is used in the case of non-linear and non-Gaussian problems where the interest lies in the detection and tracking of moving objects. It is a probabilistic framework for sequentially estimating the target's state and the goal of the Bayesian filter is to recursively computer the posterior density p(x_t | z_1:t) of current object state x_t conditioned on all observations z_1:t = (z₁, z₂……z_t) up to time t. The posterior density p(x_t | z_1:t) can be obtained recursively in two stages: prediction and update, which are, respectively, written as:

p (x_{t} | z_{1 : t}) = \frac{p (z_{t} | x_{t}) p (x_{t} | z_{1 : t - 1})}{p (z_{t} | z_{1 : t - 1})} = k p (z_{t} | x_{t}) p (x_{t} | z_{1 : t - 1})

(1)

p (x_{t} | z_{1 : t - 1}) = \int p (x_{t} | x_{t - 1}) p (x_{t - 1} | z_{1 : t - 1}) d x_{t - 1}

(2)

According to Equation (1) and (2), we have the following formula:

p (x_{t} | z_{1 : t}) = k p (z_{t} | x_{t}) \int p (x_{t} | x_{t - 1}) p (x_{t - 1} | z_{1 : t - 1}) d x_{t - 1}

(3)

where k is a normalizing constant that is independent of x_t, p(z_t | x_t) is the likelihood function, p(x_t | x_t–1) is the system dynamic model, and p(x_t | z_1:t–1) is the temporal prior over x_t given the prior observations.

The integral in (3) does not have a closed form solution, except in some of the most basic cases, so the particle filter is used to approximate (3) by using a set of weighted particles ${x_{t}^{(i)}, π_{t}^{(i)}} i = 1, …, n$ and each particle represents one hypothetical state of the object. Under this representation, the formula (3) can be approximated as follows:

p (x_{t} | z_{1 : t}) \approx k p (z_{t} | x_{t}) \sum_{i = 1}^{n} π_{t - 1}^{(i)} p (x_{t}^{​} | x_{t - 1}^{(i)})

(4)

where π_t⁽ⁱ⁾ is the weight for particle x_t⁽ⁱ⁾.

2.2 Colour Distribution

To achieve robustness against non-rigidity, rotation and partial occlusion, we choose the colour distributions as the target observation models. Because the intensity channel V is easily influenced by illumination variations, we choose histograms of 16×8 bins in H-S two-dimension colour space to represent the colour distribution.

The colour distribution p(y) = {p(y)^(u)}_{u = 1…m} at location y is calculated as:

p {(y)}^{(u)} = f \sum_{i = 1}^{n} ​ δ [h (x_{i}^{​}) - u]

(5)

where n is the number of pixels in the region, m = 16 × 8 is the bins used in the colour distribution, δ is the Kronecher delta function and f is the normalization factor that ensures $\sum_{u = 1}^{m} p {(y)}^{(u)} = 1$ .

To measure the similarity of two distributions p = {p^(u)}_{u = 1…m} and q = {q^(u)}_{u = 1…m}, we use the Bhattacharyya distance to evaluate the similarity:

d = \sqrt{1 - \sum_{u = 1}^{m} \sqrt{p^{(u)} q^{(u)}}}

(6)

2.3 Colour-based Particle Filter

As the proposed method is based on an adaptive colour-based particle filter [6], the tracking process is accomplished by choosing an appropriate bounding region to the area of interest from some candidate regions, by comparing the similarity of the colour distribution of each region. In this paper, we use two different appearance modes, rectangular model and circular model, according to the tracked object's appearance.

2.3.1 Rectangular Appearance Model

In this model, a candidate bounding box can be described as:

s = {x, y, w, h, θ}

(7)

where (x, y) represents the location of the rectangle, (w, h) the width and height of the rectangle, and θ the rotation angle, as described in Fig.1. Thus, the track region can be represented by four vertices {P₀, P₁, P₂, P₃} of the rectangle:

Figure 1.

Rectangular appearance model

[\begin{array}{l} x_{0} \\ x_{1} \\ x_{2} \\ x_{3} \end{array}] = [\begin{array}{l} x \\ x \\ x \\ x \end{array}] + [\begin{array}{l} R \sin (φ - θ) \\ R \sin (π - φ - θ) \\ R \sin (π + φ - θ) \\ R \sin (- φ - θ) \end{array}]

(8)

[\begin{array}{l} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \end{array}] = [\begin{array}{l} y \\ y \\ y \\ y \end{array}] + [\begin{array}{l} R \cos (φ - θ) \\ R \cos (π - φ - θ) \\ R \cos (π + φ - θ) \\ R \cos (- φ - θ) \end{array}]

(9)

where $R = \frac{1}{2} \sqrt{w^{2} + h^{2}}$ , φ=acrtan(w/h), $- \frac{π}{2} \leq θ \leq \frac{π}{2}$ .

2.3.2 Circular Appearance Model

The circular model is relatively simple, in which a candidate bounding circle can be described as:

s = {x, y, r}

(10)

where (x, y) represents the location of the circle, r the radius of the circle, as described in Fig.2. Then the tracking region can be represented by a circle centred in (x, y) with the radius r.

Figure 2.

Circular appearance model

The sample set is propagated through the application of a dynamic model:

s_{t} = A s_{t - 1} + B w_{t - 1}

(11)

where A defines the deterministic component of the dynamic model, S_t is the state vector of time t, w_t–1 ∈ (0,1) is the system noise and B is the propagation radius that determines how far away the particles propagated in each frame.

The probability of each sample is represented by weighting samples using the Bhattacharyya distance with a Gaussian variance σ:

π^{(n)} = \frac{1}{\sqrt{2 π} σ} e^{- \frac{d^{2}}{2 σ^{2}}} = \frac{1}{\sqrt{2 π} σ} e^{- \frac{(1 - ρ [p (s^{(n)}, q)])}{2 σ^{2}}}

(12)

Through resampling, the samples with high weights will be copied several times, while others with relatively low weights may not be chosen at all. In the end, the mean state of set S_t can be estimated as follows:

E [S t] = \sum_{n = 1}^{N} π t^{(n)} s_{t} ​^{(n)}

(13)

3. PTZ Camera Servo Control Model

Because the PTZ camera has pan-tilt-zoom three degrees of freedom, it can act as our eyes following an object and keeping the object at the centre of the field of view. As a result, the PTZ camera is commonly used for surveillance purposes. The PTZ camera used in our system is shown in Fig.3, featuring a resolution of 720×576 effective pixels, a 18× zoom set to the maximum focal length of 73.8mm, pan and tilt rotation angles of 0~360° and 0~90° respectively, and most importantly a very high maximum angular speed of 300°/s in pan direction and 180°/s in tilt direction.

Figure 3.

Pan-tilt-zoom camera

Precise control of the PTZ camera is an essential part of a tracking system. The camera should follow the movement of the target depending on how quick the target is moving, when the target moves fast, the camera should response quickly and accurately in order to keep the target in the field of view (FOV), meanwhile, when the target moves slowly, the camera should track the target smoothly with a lower speed. In our system, a servo control model based on speed regulation is applied to track the moving target.

As we want to keep the target in the middle of the FOV, the coordinates of the image centre and size (x_c, y_c, z_c) become the input of the control model, and the PTZ camera can be controlled through sending commands which contain the direction of pan/tilt/zoom and the corresponding speed, so the vector (ω_pan, ω_tilt, ω_zoom) become the output of the control model. The feedback of the control model is the location coordinates (x, y, z) of the moving target. According to the error signal e = (e_x, e_y, e_z) between (x, y, z) and (x_c, y_c, z_c), the vector (ω_pan, ω_tilt, ω_zoom) is decided by the following function:

[\begin{array}{l} ω_{p a n} \\ ω_{t i l t} \\ ω_{z o o m} \end{array}] = \frac{K}{F_{c}} [\begin{array}{l} e_{x} \\ e_{y} \\ e_{z} \end{array}] = \frac{K}{F_{c}} [\begin{array}{l} x - x_{c} \\ y - y_{c} \\ z - z_{c} \end{array}]

(14)

where K is the proportional gain decided by a polynomial approximation using many experiments and F_c is the focal length of the camera. The camera servo control model is given as Fig.4.

Figure 4.

PTZ camera servo control block diagram

4. Experimental results

To demonstrate the effectiveness and robustness of the proposed tracking scheme, three different colour videos have been used in our experiments, two of them for testing the rectangular appearance model and the other for the circular appearance model.

Our algorithm is developed in Microsoft VC++6.0 and run on a 1.8 GHz Pentium Dual-Core CPU, 2Gbyte DDR memory, using a video image size of 320×240. The camera is FCB-EX48CP from SONY and the image grabber is 10Moons SDK-2000. Computer controls the PTZ camera via RS-232C.

All the parameters used in our algorithm are shown in Table 1, where the parameters A and B of Equation (11) indicate that we use a fixed random walk model as the system dynamic model, the parameter σ of Equation (12) and K of Equation (14) are decided by a polynomial approximation using many experiments, and the parameter N indicates the how many particles are used in our algorithm.

Table 1.

Parameter values used in the experiments

video	A	B	σ	K	N
bottle	1	{15,15, 0.1, 0.1,0.1}	0.16	{0.3,0.15,0.1}	64
person	1	{15,15, 0.1, 0.1,0.1}	0.16	{0.3,0.15,0.1}	64
ball	1	{15,15, 0.1}	0.16	{0.3,0.15,0.1}	64

The tracking results are shown in Fig.5, Fig.6, Fig.7, Fig.8, Fig.9 and Fig.10. In the first experiment, as seen in Fig.5 and Fig6, the results show that the PTZ can always keep tracking and put the object (a green bottle) at the centre of the camera angle from the 1^st frame to the end. At the beginning, the tracked object was on the right hand side of the person, then when the person held the bottle and took it away from the original position, the particle filter tracker kept tracking and detected the how far away the object position was from the image centre, according to this information, the tracker sent commands to PTZ according to the error signal e = (e_x, e_y, e_z), and drive the PTZ to the right until the error signal e = (e_x, e_y, e_z) was small enough. When the object rotation takes place, our algorithm can also follow the rotation precisely and quickly, for example, in the 634^th, 866^th and 1402^th frame, the bottle was rotated obviously and the tracking rectangle still fitted the bottle very well. When the bottle was moved far away from the camera, the tracker judged the distance according to the rectangle size, when the size was smaller than the threshold, a zoom in command was sent to the PTZ until the size was appropriate, as seen in the 1402^th and 1569^th frame. Fig.6 shows the tracking results with all particle states, each white rectangle represents the one possible object state and the red rectangle means the estimated object with position (x, y), the width and height (w, h), and the rotation angle θ. This indicates how the particle filter is implemented and the object state is calculated.

Figure 5.

Tracking results of a green bottle

Figure 6.

Tracking results of a green bottle with all particle states

Figure 7.

Tracking results of a walking person

Figure 8.

Tracking results of a walking person with all particle states

Figure 9.

Tracking results of a green ball

Figure 10.

Tracking results of a green ball with all particle states

In the second experiment, the object was a person wandering in a parking lot, as seen in Fig.7 and Fig.8. The results also show that the camera can always maintain tracking and place the object at the centre of the camera angle from the start to the end. Unlike the first experiment where the bottle was always kept the same shape throughout the whole sequence, the wandering person did not have a regular shape and the appearance changed all the time as the person walked, turned and bent down. However, no matter what took place, the camera always followed the person well and kept an appropriate size of the person in the field of view.

The last experiment used the circular appearance model to track a green ball indoors, as seen in Fig.9 and Fig.10. From the experiment results, one can tell that our algorithm is also robust in tracking a circular object and the tracking process was accurate not only in position, but also in the object size, that means the tracker can always drive the camera correctly in all pan, tilt and zoom directions.

In order to evaluate the effectiveness of the proposed algorithm, two metrics have been used to analyse the experiments results, one is the Euclidian distance between the tracking position and the ground truth which is the centre of the video image, the other is the algorithm time the tracker used. The first metric was used to evaluate the tracking position accuracy and the second one measured the computational efficiency.

The Euclidian distance between the tracking position at time t and the ground truth position can be described as:

d_{t}^{​} = \sqrt{{(x_{t}^{​} - x_{t}^{G T})}^{2} + {(y_{t}^{​} - y_{t}^{G T})}^{2}}

(15)

where (x_t, y_t) is the object centre of the tracking result at time t and (x_t^GT, y_t^GT) is the ground truth in the same frame, as our goal is try to keep the object in the middle of the field of view with video size 320×240, so we have (x_t^GT, y_t^GT)=(160,120).

The averaged Euclidian distances and algorithm times of all three experiments are shown in Table 2, and Fig.11 shows the Euclidian distance of the green ball from the beginning to the end, while Fig.12 shows the corresponding time taken. From the averaged Euclidian distance in Table 2 and Fig.11 we can see that the tracking object always kept in the middle region of the image, although there were some moments where the Euclidian distances were up to nearly 70 pixels when the objects moved too fast, the averaged distance is less than 30 pixels. In addition, from the algorithm time in Table 2 and Fig.12, one can tell that the algorithm time is less than 25ms which is computationally efficient and completely real-time.

Table 2.

Experiment results

Video	Total Frames	Euclidian distance (pixels)	Algorithm time(ms)
bottle	1849	26.13	23.02
person	1550	28.64	24.52
ball	1143	24.51	23.40

Figure 11.

Euclidian distance of the green ball

Figure 12.

Algorithm time of tracking the green ball

5. Conclusion

This paper has presented a servo control model for moving object tracking based on the particle filter method using a pan-tilt-zoom camera. We discussed the principle theory of particle filter, how to obtain the required colour distribution for every possible state and the detailed information of the proposed servo control model. The experimental results demonstrate that our algorithm is effective and robust in dealing with moving object tracking, and can always keep the target at the centre of the camera angle to avoid tracking failure when an object moves out of the field of view.

Footnotes

6. Acknowledgements

This research was supported in part by the Natural Science Foundation of China (NSFC) (grant no: 51175459).

References

Yilmaz

Javed

, and Shah

, “Object tracking: A survey,” ACM Computer Surveys., vol.38, no.4, pp.1–45, 2006.

Cannons

, “A review of visual tracking,” York University, Department of Computer Science and Engineering, Tech. Rep., 2008.

Khan

S.M.

and Shah

, “A multiview approach to tracking people in crowded scenes using a planar homography constraint,” 9th European Conference on Computer Vision, Graz, Austria, vol.3954, pp.133–146, 2006.

Arulampalam

M.S.

Maskell

Gordon

and Clapp

, “A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking,” IEEE Transactions Signal Processing, vol.50, no.2, pp.174–188, 2002.

Gordon

N.J.

Salmond

D.J.

, and Smith

A.F.M.

, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” Radar and Signal Processing, IEE Proceedings F, vol.140, no.2, pp.107–113, 1993.

Nummiaro

Koller-Meier

, and Gool

L.V.

, “An adaptive color-based particle filter,” Image and Vision Computing, vol.21, no.1, pp.99–110, 2002.

Yamashita

Lao

Kawade

, “Tracking in low frame rate video: A cascade particle filter with discriminative observers of different life spans,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, no.10, pp.1728–1740, 2008.

Khan

Z.H.

I.Y.H.

, and Backhouse

A.G.

, “A Robust Particle Filter-Based Method for Tracking Single Visual Object Through Complex Scenes Using Dynamical Object Shape and Appearance Similarity,” Journal of Signal Processing Systems, vol.65, no.1, pp.63–79, 2011.