Abstract
Pedestrian detection and tracking is key to enabling autonomous vehicle navigation systems to avoid potentially dangerous situations. In the proposed method, the probability distribution of colour information is first established after a pedestrian is located in an image. The detection results are then used to initialize a Kalman filter that predicts the likely position of the pedestrian centroid in the next frame. Based on this prediction, a Camshift tracking algorithm tracks the pedestrian within a specific search window of the next frame. The actual position of the pedestrian centroid output by the Camshift algorithm is used to update the gain and error covariance matrix of the Kalman filter. Experimental results in real traffic situations show that the proposed pedestrian-tracking algorithm achieves good performance even when pedestrians are partly occluded and the illumination is inconsistent.
1. Introduction
In road traffic accidents, pedestrians are more likely to be injured than drivers and passengers because pedestrian protection countermeasures are poor. According to a World Health Organization (WHO) report, road traffic injuries were the ninth leading cause of death globally in 2011 and will become the fifth by 2030 unless action is taken [1]. To address this, the United Nations General Assembly adopted a resolution proclaiming the Decade of Action for Road Safety (2011–2020) to stabilize and then reduce road deaths [2]. The need to improve pedestrian safety is particularly urgent in developing countries such as China, where pedestrians account for a high proportion of road fatalities. Naci et al. [3] indicate that the magnitude of pedestrian fatalities varies across low-, middle- and high-income countries. Owing to the lack of automotive safety interventions and strict traffic law enforcement in developing countries, pedestrians tend to be at higher risk than drivers and passengers in a traffic accident [4]. According to a report from the Governors Highway Safety Association (GHSA), 4,884 pedestrians were killed in the United States in 2014, 19 percent more than in 2009. In China, pedestrians are the road users most likely to be killed, followed by vehicle passengers, motorcyclists, bicyclists and vehicle drivers [5].
To reduce pedestrian-vehicle accidents and protect pedestrians effectively, governments have introduced strict pedestrian protection regulations, and automotive companies and research institutions have developed a range of countermeasures. Above all, the characteristics of pedestrian-vehicle crashes and of pedestrian behaviour are analysed to develop suitable pedestrian protection systems. For example, Koh et al. [6] examined the crossing behaviour of pedestrians at signalized pedestrian crossings to further enhance their safety. Li et al. [7] investigated minibus-pedestrian collisions to better understand the injury patterns and injury risk of pedestrians in China. Using automotive safety technology, an active pedestrian protection system can help the driver detect pedestrians in front of the host vehicle and sense a crash before it happens. If the situation is dangerous, the system can stop or decelerate the vehicle, and it can also trigger the passive pedestrian protection system, reducing the severity of a crash or even avoiding it [8]. For example, Broggi et al. [9] presented an automatic braking application for a pedestrian detection system aimed at localizing potentially dangerous situations in critical urban scenarios. Dollár et al. [10] stated that automatically detecting pedestrians from moving vehicles could have a considerable economic impact and the potential to substantially reduce pedestrian injuries and fatalities. Fredriksson et al. [11] confirmed that an integrated pedestrian protection system combining active and passive countermeasures has the potential to further enhance pedestrian safety.
The paper is organized as follows. Section 2 reviews the literature concentrating on pedestrian tracking and presents the main purposes of the paper. Section 3 describes the pedestrian tracking methodology. Case study results of pedestrian tracking experiments are presented in Section 4. Section 5 concludes the paper.
2. Related Work
Pedestrian detection and tracking is key to pedestrian protection systems and has attracted considerable attention in recent years, using different kinds of passive and active sensors such as video cameras, infrared cameras, radar and lidar scanners [12–14]. Compared with other sensors, the video camera is the main solution for pedestrian protection applications because of its lower price, better signal-to-noise ratio and richer information. Pedestrian tracking aims to locate each detected pedestrian in successive frames of the analysed video [15]. Once detected, the target pedestrian is tracked so that the detection area in the next frame can be narrowed. Moreover, the tracking results provide the pedestrian's movement trajectory, which is essential for predicting the pedestrian's motion and estimating the probability of a collision [16].
Current pedestrian-tracking methods can be categorized into four classes: model-based, region-based, active contour-based and feature-based. Model-based tracking algorithms track pedestrians by matching projected linear, 2D or 3D pedestrian models, built from prior knowledge, to image data. For example, Munder et al. [17] combined a generative shape model, represented by a set of linear subspace models, with a discriminative texture classifier to achieve integrated pedestrian detection and tracking. Generally, a more complex human body model yields more accurate tracking, but at a higher computational cost. Region-based methods track pedestrians by searching for and matching the variations of the image regions corresponding to the moving targets. Region-based tracking algorithms assume small deformations and cannot reliably handle occlusion between objects [18]; colour or motion information can provide additional cues to cope with deformation and improve tracking performance. Active contour-based methods differ from region-based ones in that they directly extract the outlines of targets as bounding contours and update them dynamically in successive frames. Rathi et al. [19] proposed particle filter-based tracking of moving and deforming objects in a geometric active contour framework, which can deal with partial occlusions and track robustly even without a learnt model. Feature-based methods first extract global or local features of the tracked targets and cluster them into higher-level features, then match these features between adjacent frames [20]. Such methods can handle partial occlusion as long as part of the target's features remains visible.
Current pedestrian-tracking methods are mainly oriented to surveillance systems, which generally track moving objects in an image sequence captured by a static camera [21]. In pedestrian protection systems the camera is mounted on the vehicle and moves with it, which makes it harder to handle complex changes in pedestrian appearance caused by factors such as illumination variation, partial occlusion and camera motion. This paper proposes a technique to improve real-time pedestrian tracking in urban traffic environments, even under partial occlusion. Building on region-based object tracking, a robust pedestrian-tracking method integrating the Camshift and Kalman filter algorithms is presented. The traditional Camshift can track a target robustly as long as the target's colour distribution varies slowly. However, it easily loses the target when occlusion or interference from similar colours changes the colour probability distribution [22]. Therefore, a Kalman filter is used to correct these inaccurate tracking results by predicting the target's position in the next frame from its motion information.
3. Methodology
3.1. Overview of the pedestrian protection system
Pedestrian detection and tracking are the two main modules of our pedestrian protection system based on monocular machine vision, as shown in Fig. 1.

Fig. 1. Framework of the pedestrian protection system
Once the system is initialized, the processing unit captures images of the scene in front of the host vehicle and starts the pedestrian detection module, which uses a pedestrian classifier trained offline. The classifier is trained by combining AdaBoost and a support vector machine (SVM); as this is not the focus of this paper, it is described in previous work [23]. If a pedestrian is detected successfully in three consecutive frames, the tracking state is set to true and the pedestrian tracking module is triggered. In the pedestrian tracking module, which is the main subject of this paper, the detected pedestrian is tracked and their motion and colour information is recorded to analyse their trajectory. Fig. 2 shows the flowchart of the pedestrian-tracking module.

Fig. 2. Flowchart of the pedestrian tracking module
The Kalman filter is used to predict the position of the detected pedestrian in the next frame. Then, using the colour probability distribution map of the detected pedestrian, the Camshift algorithm tracks the pedestrian within the search window given by the position prediction. If the pedestrian is tracked successfully, their current position is updated, as are the state vector and error covariance matrix of the Kalman filter. If the target pedestrian is not tracked successfully in three consecutive frames, they are assumed to have moved out of sight or the tracking is considered invalid, and the pedestrian detection module restarts to find another pedestrian.
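The switching between detection and tracking described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the authors' code: the class `ModeSwitch` and its fields are hypothetical names, and only the two three-frame thresholds come from the text.

```python
from dataclasses import dataclass

# Hypothetical sketch of the detect/track switching in Fig. 2. The
# three-frame thresholds come from the text; all names are illustrative.
CONFIRM_FRAMES = 3  # consecutive detections needed to start tracking
LOST_FRAMES = 3     # consecutive tracking failures before re-detecting


@dataclass
class ModeSwitch:
    tracking: bool = False  # the "tracking state" flag
    hits: int = 0           # consecutive successful detections
    misses: int = 0         # consecutive tracking failures

    def step(self, success: bool) -> bool:
        """Feed one frame's outcome and return the tracking state.

        While detecting, `success` means the classifier found the pedestrian;
        while tracking, it means Camshift located the pedestrian.
        """
        if not self.tracking:
            self.hits = self.hits + 1 if success else 0
            if self.hits >= CONFIRM_FRAMES:
                self.tracking, self.hits = True, 0
        else:
            self.misses = 0 if success else self.misses + 1
            if self.misses >= LOST_FRAMES:
                self.tracking, self.misses = False, 0
        return self.tracking


s = ModeSwitch()
print([s.step(r) for r in [True, True, True, False, False, False]])
# [False, False, True, True, True, False]
```

A single failed detection resets the confirmation counter, and a single successful tracking frame resets the loss counter, so both transitions require three consecutive frames as described.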
3.2. Constructing colour probability distribution of pedestrians
Once a pedestrian has been located, their colour characteristics are generally little affected by their motion. Therefore, the detected pedestrian can be tracked, and the tracking window resized, on the basis of their colour probability distribution. RGB (red-green-blue) and HSV (hue-saturation-value) are the two classic colour spaces in machine vision applications. All components of the RGB colour space are sensitive to changes in brightness. The HSV colour space describes colour quantitatively by hue, saturation and brightness (value), which intuitively reflects how humans perceive colour. Because hue, saturation and brightness are separated in the HSV colour space, using this space increases the stability of the algorithm [24]. Therefore, the hue value $h$ of each pixel, obtained by converting the pedestrian colour image from RGB to HSV, is used to describe the pedestrian's colour.
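To illustrate why hue is robust to brightness, the sketch below converts 8-bit RGB pixels to hue in degrees with Python's standard `colorsys.rgb_to_hsv`. The helper names `hue_degrees` and `hue_image` are hypothetical, and the use of degrees in [0, 360) is an assumption; the paper does not state its hue range.

```python
import colorsys


def hue_degrees(r: int, g: int, b: int) -> float:
    """Hue of an 8-bit RGB pixel in degrees, in the range [0, 360)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


def hue_image(rgb_rows):
    """Convert rows of (r, g, b) tuples into rows of hue values."""
    return [[hue_degrees(*px) for px in row] for row in rgb_rows]


# A darker pixel of the same colour keeps the same hue.
print(round(hue_degrees(200, 100, 0), 6), round(hue_degrees(100, 50, 0), 6))  # 30.0 30.0
```

Note that hue is undefined for grey pixels (r = g = b); `colorsys` returns 0 for them, so a practical implementation would likely mask low-saturation pixels before building the histogram. That masking step is an assumption, not something the text specifies.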
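To find the colour histogram of the detected pedestrian, the range of image hue is divided into $m$ small bins, and the number of pixels whose hue falls within each bin is counted. Let the total number of image pixels be $n$ and the pixel locations be $\{\mathbf{x}_i\}_{i=1,\dots,n}$. The hue histogram $\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}$ is then

$$\hat{q}_u = \sum_{i=1}^{n} \delta\left[b(\mathbf{x}_i) - u\right], \quad u = 1, \dots, m$$

where $b(\mathbf{x}_i)$ is the index of the hue bin of the pixel at $\mathbf{x}_i$ and $\delta$ is the Kronecker delta function.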
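The histogram definition above amounts to counting pixels per bin. A minimal sketch, assuming hue in degrees and a hypothetical default of 16 bins (the text only says "several small bins"):

```python
def hue_histogram(hues, m=16, hue_range=360.0):
    """Count pixels per hue bin: q[u] = sum_i delta[b(x_i) - u].

    Bins are 0-based here (u = 0..m-1) rather than 1..m as in the text.
    """
    q = [0] * m
    for h in hues:
        # b(x_i): bin index of the pixel's hue; clamp h == hue_range into the last bin
        u = min(int(h * m / hue_range), m - 1)
        q[u] += 1
    return q


print(hue_histogram([0.0, 10.0, 30.0, 359.9], m=12))
# [2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Summing the Kronecker delta over all pixels is equivalent to incrementing one counter per pixel, which is what the loop does.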
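To emphasize the main colour information of the target pedestrian, hue histogram values lower than a certain threshold are set to zero, so that only the dominant bins are kept. With the maximum value of the hue histogram denoted $\max(\hat{q})$, the remaining values are rescaled to the range [0, 255]:

$$\hat{p}_u = \frac{255\,\hat{q}_u}{\max(\hat{q})}, \quad u = 1, \dots, m$$

where $\hat{p}_u$ is the probability value of bin $u$. Replacing each pixel with the probability value of its hue bin, $\hat{p}_{b(\mathbf{x}_i)}$, yields the colour probability distribution map of the pedestrian, as shown in Fig. 3.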
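The thresholding, rescaling and back-projection steps can be sketched as follows. This assumes the histogram is a plain list of counts; `threshold_ratio` is a hypothetical parameter standing in for the unspecified threshold, and the function names are illustrative.

```python
def probability_map(q, threshold_ratio=0.1):
    """Suppress weak hue bins, then rescale the histogram to [0, 255].

    Bins below threshold_ratio * max(q) are set to zero; the rest become
    p[u] = 255 * q[u] / max(q). threshold_ratio is a hypothetical choice.
    """
    q_max = max(q)
    if q_max == 0:
        return [0] * len(q)
    cut = threshold_ratio * q_max
    return [round(255 * qu / q_max) if qu >= cut else 0 for qu in q]


def back_project(hue_rows, p, hue_range=360.0):
    """Replace each pixel's hue with the probability value of its bin."""
    m = len(p)
    return [[p[min(int(h * m / hue_range), m - 1)] for h in row] for row in hue_rows]


p = probability_map([10, 4, 0, 1], threshold_ratio=0.2)
print(p)                                       # [255, 102, 0, 0]
print(back_project([[0.0, 100.0, 200.0]], p))  # [[255, 102, 0]]
```

Scaling to [0, 255] lets the probability map be stored as an 8-bit image, which is convenient for the moment computations used by the Camshift.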

Fig. 3. Hue histogram and probability distribution map of a pedestrian in the image
3.3. Position prediction based on the Kalman filter
The Kalman filter was first proposed by R.E. Kalman in 1960 as a recursive solution to the discrete-data linear filtering problem [25]. It consists of two sets of equations: time update and measurement update. The time update equations project the current state and error covariance estimates forward in time to obtain a priori estimates for the next time step. The measurement update equations incorporate a new measurement into the a priori estimate to obtain an improved a posteriori estimate.
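If a system is linear and its process and measurement noises are white Gaussian noise, it can be modelled by a pair of linear stochastic process and measurement equations [26]. The state equation of the system at time step $k$ is

$$\mathbf{x}_k = A\,\mathbf{x}_{k-1} + \mathbf{w}_{k-1}$$

where $\mathbf{x}_k$ is the state vector, $A$ is the state transition matrix, and $\mathbf{w}_{k-1}$ is the process noise, with $p(\mathbf{w}) \sim N(0, Q)$ and $Q$ the process (state) noise covariance matrix.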
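The system state is observed only indirectly, through the measurements. Taking $\mathbf{z}_k$ as the measurement vector at time step $k$, the measurement equation is

$$\mathbf{z}_k = H\,\mathbf{x}_k + \mathbf{v}_k$$

where $H$ is the measurement matrix and $\mathbf{v}_k$ is the measurement noise, independent of $\mathbf{w}_k$, with $p(\mathbf{v}) \sim N(0, R)$ and $R$ the measurement noise covariance matrix.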
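The a priori estimates for the next time step are obtained in the prediction step of the Kalman filter, which consists of the time update equations:

$$\hat{\mathbf{x}}_k^- = A\,\hat{\mathbf{x}}_{k-1}$$
$$P_k^- = A\,P_{k-1}\,A^\mathsf{T} + Q$$

where $\hat{\mathbf{x}}_k^-$ and $P_k^-$ are the a priori state estimate and error covariance at time step $k$, and $\hat{\mathbf{x}}_{k-1}$ and $P_{k-1}$ are the a posteriori state estimate and error covariance at time step $k-1$.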
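The correction step is carried out by the measurement update equations, which compute the Kalman gain and then correct the state estimate and error covariance:

$$K_k = P_k^- H^\mathsf{T}\left(H P_k^- H^\mathsf{T} + R\right)^{-1}$$
$$\hat{\mathbf{x}}_k = \hat{\mathbf{x}}_k^- + K_k\left(\mathbf{z}_k - H\,\hat{\mathbf{x}}_k^-\right)$$
$$P_k = \left(I - K_k H\right) P_k^-$$

where $K_k$ is the Kalman gain, $\hat{\mathbf{x}}_k$ and $P_k$ are the a posteriori state estimate and error covariance at time step $k$, and $I$ is the identity matrix.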
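The time and measurement update equations of the standard linear Kalman filter (without a control input) can be sketched in plain Python. This is a minimal, dependency-free illustration, not the authors' implementation: vectors are column matrices (lists of one-element lists), and all function names are hypothetical.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (fine for 2x2 or 4x4)."""
    n = len(a)
    aug = [[float(v) for v in row] + e for row, e in zip(a, identity(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kf_predict(x, P, A, Q):
    """Time update: x^-_k = A x_{k-1},  P^-_k = A P_{k-1} A^T + Q."""
    return mat_mul(A, x), mat_add(mat_mul(mat_mul(A, P), transpose(A)), Q)


def kf_update(x_prior, P_prior, z, H, R):
    """Measurement update: Kalman gain, a posteriori state and covariance."""
    Ht = transpose(H)
    S = mat_add(mat_mul(mat_mul(H, P_prior), Ht), R)   # H P^- H^T + R
    K = mat_mul(mat_mul(P_prior, Ht), inverse(S))      # K_k
    innovation = mat_sub(z, mat_mul(H, x_prior))       # z_k - H x^-_k
    x_post = mat_add(x_prior, mat_mul(K, innovation))
    P_post = mat_mul(mat_sub(identity(len(P_prior)), mat_mul(K, H)), P_prior)
    return x_post, P_post, K


# Scalar example: equal prior and measurement variance gives gain 0.5.
x, P = kf_predict([[0.0]], [[1.0]], [[1.0]], [[0.0]])
x, P, K = kf_update(x, P, [[2.0]], [[1.0]], [[1.0]])
print(x, P, K)  # [[1.0]] [[0.5]] [[0.5]]
```

In practice a numerical library would replace the hand-written matrix helpers; they are spelled out here only to keep the example self-contained.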
3.4. Pedestrian tracking based on Camshift
The Camshift (Continuously Adaptive Mean Shift) is a moving-object tracking algorithm based on the colour probability distribution in an image sequence. Its core is the Meanshift algorithm, a non-parametric density gradient estimation method that climbs to the local mode of a probability distribution [27]. Unlike the Meanshift, which operates on static distributions, the Camshift is designed for dynamically changing distributions: it tracks moving objects through consecutive video frames by adaptively adjusting the size and position of the search window, which saves search time and improves efficiency. The steps for pedestrian tracking based on the Camshift are as follows:
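1. Calculate the colour probability distribution map $I(x, y)$ within the predicted search window and obtain its zeroth-order and first-order moments:

$$M_{00} = \sum_x \sum_y I(x, y), \quad M_{10} = \sum_x \sum_y x\,I(x, y), \quad M_{01} = \sum_x \sum_y y\,I(x, y)$$

where $(x, y)$ are the pixel coordinates within the search window and $I(x, y)$ is the probability value at $(x, y)$.
2. Calculate the mass centre $(x_c, y_c)$ of the search window:

$$x_c = \frac{M_{10}}{M_{00}}, \quad y_c = \frac{M_{01}}{M_{00}}$$

3. Move the centre of the search window to the mass centre and repeat steps 1 and 2 (the Meanshift iteration) until the mass centre position converges to a point. Take the result as the pedestrian position at the current time step $k$.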
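Steps 1 to 3 above can be sketched as a fixed-size Meanshift loop over a probability map stored as a list of rows. This is a minimal illustration under stated assumptions: the names `window_moments` and `mean_shift`, the iteration cap and the convergence tolerance `eps` are hypothetical. The full Camshift additionally resizes the window from $M_{00}$ after convergence; that step is omitted here.

```python
def window_moments(prob, x0, y0, x1, y1):
    """Zeroth- and first-order moments of prob over [x0, x1) x [y0, y1)."""
    m00 = m10 = m01 = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            i = prob[y][x]
            m00 += i
            m10 += x * i
            m01 += y * i
    return m00, m10, m01


def mean_shift(prob, cx, cy, w, h, max_iter=20, eps=0.5):
    """Shift a fixed w x h window from (cx, cy) to the local mass centre."""
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        x0 = max(0, int(round(cx - w / 2)))
        y0 = max(0, int(round(cy - h / 2)))
        x1, y1 = min(cols, x0 + w), min(rows, y0 + h)
        m00, m10, m01 = window_moments(prob, x0, y0, x1, y1)  # step 1
        if m00 == 0:
            break  # no target colour in the window: tracking failure
        nx, ny = m10 / m00, m01 / m00  # step 2: mass centre
        done = abs(nx - cx) < eps and abs(ny - cy) < eps
        cx, cy = nx, ny                # step 3: recentre the window
        if done:
            break
    return cx, cy


# A 3x3 blob centred at (13, 6); the window starts at (9, 6), off-centre.
prob = [[0] * 20 for _ in range(20)]
for y in range(5, 8):
    for x in range(12, 15):
        prob[y][x] = 255
print(mean_shift(prob, 9, 6, 8, 8))  # (13.0, 6.0)
```

Starting the loop at the Kalman-predicted centroid, rather than at a manually chosen window, is what lets the tracker converge in a few iterations.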
Unlike the traditional Camshift, which requires the initial search window to be set manually, the proposed method initializes the search window automatically: the pedestrian detection module locates the pedestrian accurately, and the tracking module predicts the pedestrian's likely position in future frames. This also allows multiple objects to be tracked at the same time. With the Kalman filter prediction narrowing the search window, the Camshift finds the optimal position within that window. In turn, the Camshift output provides the measurement used to correct the prediction in the next frame, which speeds up the convergence of the tracking process and helps overcome the occlusion problem.
4. Case Study
To verify the performance of the proposed pedestrian-tracking algorithm, an intelligent vehicle prototype named DLUTIV-I is used as the test platform. The system runs on a Core2 2.66 GHz PC and exhaustively searches 320×240 images captured by an AVT Stingray camera [28]. The pedestrian classifiers are trained offline on samples manually selected from a variety of circumstances and loaded by the system for real-time pedestrian detection. The pedestrian detection results are used to initialize the parameters of the Kalman filter.
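The state vector of the Kalman filter consists of the centroid pixel position of the detected pedestrian and its velocity. Let $(x_k, y_k)$ be the centroid position and $(v_{x,k}, v_{y,k})$ its velocity at time step $k$; the state vector and the measurement vector are then

$$\mathbf{x}_k = \left[x_k,\; y_k,\; v_{x,k},\; v_{y,k}\right]^\mathsf{T}, \quad \mathbf{z}_k = \left[x_k,\; y_k\right]^\mathsf{T}$$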
Because the speed and direction of pedestrian motion do not change significantly, the motion is assumed to be uniform and rectilinear between two consecutive frames, that is:

$$x_k = x_{k-1} + v_{x,k-1}\,\Delta t, \quad y_k = y_{k-1} + v_{y,k-1}\,\Delta t, \quad v_{x,k} = v_{x,k-1}, \quad v_{y,k} = v_{y,k-1}$$
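Then the system state transition matrix and the measurement matrix can be written as:

$$A = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

where $\Delta t$ is the time interval between two consecutive frames.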
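The constant-velocity model can be checked by applying the transition and measurement matrices to a sample state. A minimal sketch with hypothetical function names; $\Delta t = 1$ is assumed so that velocity is in pixels per frame (use $1/\text{fps}$ for pixels per second).

```python
def constant_velocity_model(dt=1.0):
    """A and H for state [x, y, vx, vy]^T and measurement [x, y]^T."""
    A = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    H = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
    return A, H


def apply(M, v):
    """Matrix-vector product for plain lists."""
    return [sum(m * s for m, s in zip(row, v)) for row in M]


A, H = constant_velocity_model(dt=1.0)
state = [100.0, 50.0, 3.0, -2.0]  # centroid (100, 50), moving 3 px right and 2 px up per frame
print(apply(A, state))  # [103.0, 48.0, 3.0, -2.0]
print(apply(H, state))  # [100.0, 50.0]
```

$A$ advances the centroid by one frame of motion and leaves the velocity unchanged, while $H$ simply selects the centroid that the Camshift measures.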
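To realize pedestrian tracking, the Kalman filter must first be initialized. The tracking procedure is activated after the pedestrian has been detected successfully in three consecutive frames. Suppose two consecutive frames are at time steps $k-1$ and $k$, with detected centroids $(x_{k-1}, y_{k-1})$ and $(x_k, y_k)$; the initial state vector can then be taken as

$$\hat{\mathbf{x}}_k = \left[x_k,\; y_k,\; \frac{x_k - x_{k-1}}{\Delta t},\; \frac{y_k - y_{k-1}}{\Delta t}\right]^\mathsf{T}$$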
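One common way to initialize the state from two consecutive detections is to take the latest centroid as the position and the finite difference of the centroids as the velocity. The sketch below assumes exactly that; `initial_state` is a hypothetical name, and $\Delta t = 1$ frame is assumed by default.

```python
def initial_state(prev_centroid, curr_centroid, dt=1.0):
    """Hypothetical initial state [x, y, vx, vy] from two consecutive detections.

    Velocity is the finite difference of the two centroids (an assumption).
    """
    (xp, yp), (xc, yc) = prev_centroid, curr_centroid
    return [float(xc), float(yc), (xc - xp) / dt, (yc - yp) / dt]


print(initial_state((100, 80), (104, 79)))  # [104.0, 79.0, 4.0, -1.0]
```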
We also need to specify the initial error covariance matrix $P_0$ of the state estimate.
Based on observation of pedestrian motion, we set the standard deviation of the centroid position to 10 pixels and that of its velocity to 15 pixels. These values determine the state and measurement noise covariance matrices $Q$ and $R$.
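The stated standard deviations could be turned into covariance matrices in several ways. The sketch below shows one possible reading, assuming diagonal matrices whose entries are the squared deviations; it is an illustration, not a reproduction of the paper's matrices, and the function names are hypothetical.

```python
def diag(values):
    """Square diagonal matrix with the given diagonal entries."""
    return [[v if i == j else 0.0 for j, _ in enumerate(values)] for i, v in enumerate(values)]


def noise_covariances(sigma_pos=10.0, sigma_vel=15.0):
    """One possible reading of the stated deviations (an assumption):
    Q = diag(sp^2, sp^2, sv^2, sv^2) for state [x, y, vx, vy],
    R = diag(sp^2, sp^2) for the measured centroid [x, y]."""
    Q = diag([sigma_pos ** 2] * 2 + [sigma_vel ** 2] * 2)
    R = diag([sigma_pos ** 2] * 2)
    return Q, R


Q, R = noise_covariances()
print([Q[i][i] for i in range(4)], [R[i][i] for i in range(2)])
# [100.0, 100.0, 225.0, 225.0] [100.0, 100.0]
```

Diagonal matrices treat the x and y components as uncorrelated, which is a common simplifying choice when no cross-correlation is known.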
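Using the prediction and correction equations together with the initial conditions, the state vector $\hat{\mathbf{x}}_k$ is estimated recursively at each time step, and the predicted centroid position determines the Camshift search window in the next frame.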
During the Camshift search, the Bhattacharyya coefficient is used to determine whether the centroid position has converged to a point [29]. When searching for the centroid, the periodic movement of the legs may cause errors in the centroid position to accumulate over several frames. A pedestrian is commonly composed of a head, torso and legs. Once a pedestrian is located, the colour probability distribution of the torso changes less than that of the legs. Compared with the colour probability distribution of the whole pedestrian, that of the torso is more distinct, as shown in Fig. 4. Therefore, the colour probability distribution of the torso is taken as the input to the Camshift.
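The Bhattacharyya coefficient measures the overlap between two histograms, for example the target's hue histogram and the histogram inside the current search window. A minimal sketch: `bhattacharyya` implements the standard definition, while `has_converged` and its tolerance are a hypothetical stopping rule, since the exact convergence criterion is not given.

```python
import math


def bhattacharyya(p, q):
    """rho = sum_u sqrt(p_u * q_u) for histograms normalised to unit sum.

    rho = 1 for identical distributions and 0 for disjoint ones.
    """
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        return 0.0
    return sum(math.sqrt((pu / sp) * (qu / sq)) for pu, qu in zip(p, q))


def has_converged(rho_prev, rho_curr, tol=1e-3):
    """Hypothetical stopping rule: stop when rho stops improving."""
    return abs(rho_curr - rho_prev) < tol
```

Because both histograms are normalised inside the function, the coefficient compares shapes rather than pixel counts, so windows of different sizes can be compared directly.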

Fig. 4. Colour probability distribution based on different human parts
Fig. 5 shows tracking image sequences with pedestrians moving in front of a static experimental vehicle under inconsistent illumination. The pedestrians are located using the offline-trained pedestrian classifier, and their locations are passed to the pedestrian tracking module. The tracked pedestrian centroids indicate each pedestrian's trajectory in the image, as shown in Fig. 6.

Fig. 5. Tracking image sequences of pedestrians in front of a static vehicle
Fig. 7 shows tracking image sequences with pedestrians moving in front of a vehicle travelling at an average speed of 18 km/h. The pedestrians walk in front of the vehicle at normal speed and keep their direction. Once detected, a pedestrian is tracked successfully for 210 consecutive frames until they move out of sight. The system then judges that tracking has failed and restarts the pedestrian detection module. At the 318th frame, a new pedestrian is detected and tracked steadily until the 432nd frame, when another pedestrian appears. After both pedestrians are verified by the pedestrian detection module, they are tracked until one moves out of sight at the left of the image, while the other is tracked to the end of the experiment. The tracking results are shown in Fig. 8. These results show that the proposed pedestrian detection and tracking method is effective.

Fig. 6. Tracking results with pedestrians moving in front of a static vehicle

Fig. 7. Tracking image sequences with pedestrians moving in front of a running vehicle

Fig. 8. Tracking results with pedestrians moving in front of a running vehicle
5. Conclusions
Pedestrians are the most vulnerable group in road crashes. This paper aims to make autonomous vehicles safer by presenting a novel pedestrian-tracking method that combines the Camshift algorithm with the Kalman filter. Using the hue histogram and colour probability distribution helps stabilize the pedestrian-tracking module. The tracking method based on the colour probability distribution speeds up the convergence of the tracking process and helps overcome occlusion. Moreover, for the Camshift tracking algorithm, the colour probability distribution of the torso is more distinct than that of the pedestrian as a whole. The torso's colour probability distribution is therefore chosen as the input to the Camshift algorithm, which reduces the error accumulated over several frames.
Because the speed and direction of pedestrian motion do not change significantly between two consecutive frames, the Kalman filter is initialized once the pedestrian is located and is used to predict the pedestrian's future position. The Camshift then tracks the pedestrian within a specific search window in the next frame based on the prediction, which avoids searching the whole image and also improves tracking robustness. Several experiments were conducted in real traffic situations. The results show that the proposed method can track multiple pedestrians in complex scenarios and can cope with pedestrians suddenly appearing and disappearing and with inconsistent illumination. The algorithm provides a basis for research on collision-avoidance technology for autonomous vehicles.
6. Acknowledgements
This work was mainly supported by the National Natural Science Foundation of China under Grant Nos. 51575079 and 51305065, and by the Fundamental Research Funds for the Central Universities under Grant No. DUT16QY10. The authors are also grateful to the reviewers and the editor for their insightful comments, which helped improve the quality of the paper.
