Abstract
This paper reports an algorithm of traffic multiple targets occlusion processing and tracking based on both the prediction of Kalman filtering operators and the processing of the PCA-SIFT (Principle Component Analysis Scale Invariant Feature Transform) feature with the active operators. The Kalman filtering operators are used for both the prediction of occlusion event and the fast location of occlusion area. The template for each target in the occlusion area is built and the PCA-SIFT feature with the active contour operators is used to estimate and trace the multiple targets separately in the area. Through updating, the occlusion processing and tracking of multiple targets continue until the occlusion ends. Obtained results have shown that the algorithm in the paper can solve multiple targets occlusion and tracking effectively and accurately, comparing with the real trajectories.
1. Introduction
The tracking of multiple targets is necessary to extract a variety of parameters in traffic detection. The continuous traffic trajectory especially is quite important for the entire traffic behavior analysis, as discussed by Cavanagh and Alvarez [1], Liu et al. [2, 3], Bie et al. [4, 5], and Gavrila [6]. Based on the traffic trajectory, scholars have carried out a lot of works including pedestrians’ road-crossing behavior [7, 8], car following theory [9, 10], and prediction model of bus arrival time [11, 12]. Based on those researches, the traffic is becoming more secure and convenient. For further research, better tracking performance, faster tracking speed, and more extensive content are needed. So in the current year, in the field of traffic trajectory extracting, methods are often chosen like the Kalman filtering operators [13–15], active contour matching method, classic snake algorithm and Hausdorff distance estimation [16, 17], intelligent algorithm [18], and so on [19, 20]. Those algorithms can achieve real-time tracking in ordinary situation. But while occlusion is occurring, most of those algorithms cannot deal with it and some have to abnegate the real-time processing. Those problems restricted the application of multiple targets tracking technology.
For occlusion problems, many scholars have conducted a lot of researches. Such studies focused on occlusion projections, targets matching, and occlusion reduction aspects. For example, Erdem et al. [21] divided the target contour into subcontours and used the underlying features such as color edges and continuous shapes to trace and judge the number of contours. When the number changed sharply, the occlusion was judged to occur, and the subcontours were used to achieve the occlusion processing and tracking. But some subcontours often changed themselves, which brought the occlusion processing results with large errors. Iordanescu et al. [22] adopted the distribution to build the processing model in the similar region for occlusion processing. While the distribution had amounts of changes, it was considered as occlusion. And the similar distribution was taken to separate occlusion area on purpose. Nicolas and Gregory [23] used spatiotemporal templates to track the changes in area of occlusion. And the templates were updated to keep the occlusion processing accurately. However, when the occlusion continued for more than a certain number of frames, the templates would contain lots of errors which made the matching seriously wrong. Rathi et al. [24] proposed a method to target the affine transformation parameters, which was based on the combination of the particle filter and the level set. In the method, the contour level set was used to obtain the track characteristics based on the premise that the target contours changed continuously and are small. But the contours in the occlusion area were often lost, resulting in the particle filters being mismatched. At the year of 2004, Lowe [25] proposed SIFT (Scale-Invariant Feature Transform) feature operators for matching and tracking. Then, more and more scholars conducted in-depth studies based on the SIFT feature [26] and its expansion studies such as SURF (Speed Up Robust Features) [27, 28] and PCA-SIFT (Principle Component Analysis Scale Invariant Feature Transform) [29]. All algorithms of this kind had invariance in affine and scale, which made them widely applied. In particular, the PCA-SIFT matching algorithm could reduce the complexity of computation and keep the almost characters of SIFT, which had attracted much attention in recent years. But in the occlusion processing, only the PCA-SIFT feature sometimes would change or be lost resulting in the unstable and failure of the tracking.
Considering the advantage and disadvantage above, this paper presents an occlusion processing and tracking technique for multiple targets in traffic. Based on the motion trends, the Kalman filters for multiple objects are built to forecast occlusion. Then, the PCA-SIFT operator with similar active contour is introduced to process the separation of occlusion. After operation, multiple targets can be tracked rapidly and accurately.
2. Occlusion Processing Method
Predicting occlusion is the prerequisite for occlusion processing. Without predicting, the templates of multiple targets for matching might change and not be fit for separating, which would make occlusion processing more difficult.
In order to predict whether the occlusion happens, the Kalman filtering operators [15, 30] for multiple objects are established to calculate the current targets distance d
ij
k
and the prediction distance
where k is the frame number, i, j are the serial number of targets, p is the current coordinate value, and
In order to predict the distance between targets, the composite Kalman filtering operators are built:
where
After prediction targets are gotten, the occlusion event and area location are calculated as follows:
where Lthr is the threshold of occlusion event and w i , w j are the width of different image targets. If the prediction distance of the next frame is still near, the occlusion is assumed to occur. Then the separate templates are created and the area where the occlusion may occur is located.
3. Fast-Tracking Based on PCA-SIFT and Similar Active Contour
The method of SIFT is usually adopted in matching and tracking, but its calculation is difficult to meet the real-time requirements. To overcome that problem, Ke and Sukthankar [29] did not calculate the gradient histogram directly, but adopted the PCA (Principal Components Analysis) method of the reduction dimensionality to obtain stability and strong and robust characteristics. With the dimension reducing, the matching rate was increasing.
To achieve detection and matching by PCA-SIFT feature, key points need to be extracted. Key points typically have three parameters, which are scale, orientation, and size. In the SIFT feature, the key points are established on the DoG (Difference of Gaussian) pyramid. In order to build DoG pyramid, the two-dimensional Gaussian function is first used to normalize initial image I(x, y) to LoG (Laplace of Gaussian) of different scales. Then the LoG of different scales is subtracted to generate DoG pyramid, as is shown in
where D(x, y, s) is the difference of Gaussian image, G(x, y, ks) is the Gaussian function, L(x, y, ks) is the Laplace of Gaussian image, k is the octave number, and s is scale. The procedure of DoG pyramid establishment is shown in Figure 1.

DoG pyramid.
In order to find the key points in the image space, local extremum of DoG is searched among the adjacent points in both scale and pixel dimension, as is shown as Figure 2.

DoG extremum.
The middle point is compared with the adjacent 26 points to get the extreme point. Considering increasing the stability of key points and restraining sensitive noise, the Taylor formula is used to expand DoG function in scale space, such that
where D(X) is the fitting function of Taylor formula and X = (x, y, δ)
T
is the chosen pixel. The first derivative and second derivative can get approximately the solution by neighbor difference. When extremum is gotten, the offset of the extremum
After that, the locations of extreme points are exacted to satisfy the requirements of scale invariance. In order to remove those points with low contrast and unstable edges, the extreme points are calculated into the formula, leaving only the front two as shown in
By calculation, the gradient direction distribution characteristics of neighborhood pixel are used to give each key point a specified direction. Then, the key point is explained by a set of expression vectors, which considers the effects of neighborhood points. Based on the operation above, the image area around key points is divided into subregions. Through calculating the gradient histogram of the internal subregions, the feature vectors are set to be unique and fit to match. In the following, the histogram of oriented gradients in 8 directions is calculated in the 4 × 4 subregion. And each gradient direction is accumulated to form a point seed, and 4 seeds are combined to form 128 dimensional point descriptors, which contain matching information.
The process to describe the key points of PCA-SIFT is similar to standard SIFT, as is shown in Figure 3. But through principal component analysis, a low-dimensional subspace can represent high-dimensional data to complete the dimensionality reduction of the 128-dimensional SIFT descriptors. The key subprocess steps of the establishment of PCA-SIFT operators are shown as follows.

SIFT descriptor.
Step 1. Get the n dimensional key points and their descriptors of SIFT as the elements {x1, x2, …, x n }.
Step 2. According to the covariance matrix R, 128 key values λ and key vector e are gotten. And sort the key values λ in descending order followed with key vector e:
Step 3. Select the front m eigenvectors in {e1, e2, e3, …, e128} as a main component direction, and m is the dimension of the main component.
Step 4. On this basis, matrix A of 128 × m is established and becomes the main component mapping model. Then the original SIFT key {x1, x2, …, x n } is mapped to m dimensional space y i through the matrix A:
In the paper, after the expression of PCA-SIFT key points and their descriptors, the matching processing is operated by setting a match condition. And the corresponding optimal matching points are extracted as shown in Figure 4.

Matching based on PCA-SIFT.
It can be seen from Figure 4 that the PCA-SIFT feature operators can effectively overcome the influence of rotation, translation, scaling, and affine to realize the good matching result in the nonocclusion status. While occlusion occurs, the movements of multiple targets need to be predicted by the Kalman filter for better processing results. The single target template is created before occlusion occurs, and the composite area will be detected through the PCA-SIFT method in the next frame, as is shown in Figure 5.

Multiple targets matching in the composite area.
On the basis of registration, segmentation is executed correspondingly to the two targets in the composite region. However, due to the occlusion, the direct use of segmental blocks may affect the identification of characteristics and reduce the accuracy of parameter extraction. With the occlusion continuing, errors will be accumulated in the subsequent match, segmentation, and recognition, which would result in the difficulty of the effective links between subsequent trajectory and historical information.
Therefore, the active contour of corresponding feature is chosen for assistance. Active contour models usually adopt continuous curve to express target edge. And a functional energy is defined to let its independent variable contain edge curve, which can reveal the potential relation to ensure segmenting and tracking correctly. The segmentation process of active contour is to find the minimum value of functional energy which is obtained by solving the corresponding Euler equations. While the energy of the curve reaches the minimum, the position of target curve is obtained. With the target curve, multiple targets can be separated of high exactitude.
Referring to the Snake model [16, 31], the optimal similar algorithm of active contour is established in the paper. Traditional snake model is a curve which consists of a set of control points:
These points are both linked in straight constituting contours. C(s) is the curve contour, x(s) and y(s) are the coordinate position of each control point in image, and s is the variable of contour descriptor in Fourier transform form.
The energy function, which reflects the relationship between energy and contours, is defined as follows:
The first derivative is elastic energy, the second derivative is bending energy, and the third derivative is external energy. In the basic Snake model, the gradient features of image are generally chosen for control point or connection position, as is shown in the following:
When the contour C is approaching the image edge of target, the gradient value of C in grayscale increases. Then the energy of the formula above will reach minimum, and the speed of this point will become zero according to curve evolution equation, which means the movement stops. In this way, the contour C stops at the edge of image which means the segment is completed. The Snake model can separate multitargets, but the accuracy is low and the adaptation is poor while occlusion continues. So, in this paper, the edge contour of separate template is extracted as initial contour before occlusion occurs. Then, combining with the representation relationship among the features obtained by PCA-SIFT, the trends and the functional energy equation are determined. Finally, considering the relevance and the nearest transforming course, an optimal similarity model is built to make active contour extracted and segmented more accurately, such as
where C(s), C n (s) are the active contours, α, β are the distortion factors, Eext is the extension function gotten by Kalman prediction, and ΔC n is the similar rate which determines the trend of active contour. The contours in the composite are assimilated to build new active contours. And the new active contours are updating to separate the occlusion area. The extracting and segmenting result is shown in Figure 6. The blue and red contours are the last contours for pedestrian and vehicle, and the yellow ones are the updated contours which can segment the composite template into separate ones.

Active contour extract and segment.
4. Experimental Results
In order to verify the effect of the algorithm, some multitargets tracking experiments in different traffic scenes are tested. And the specific procedures of experiment are as follows.
Step 1. Retrieve the video frame and establish multitargets Kalman filter combined with calibration algorithm. The multitargets tracking should be done in the nonocclusion status. As is shown in Figure 7, the foreground objects are not in occlusion, and their tracking trajectories are shown by lines.

Multitargets trajectories.
Step 2. Through the establishment of the occlusion prediction algorithm, it is examined whether the occlusion occurs. Although the occlusion does not happen in Figure 7(b), the prediction by Kalman filter shows that the occlusion is going to come in the next frame. So the templates of pedestrian and vehicle need to be established in independent state at that time. Then the occlusion area is quickly locked and the templates of composite targets are extracted, as is shown in Figure 8.

Detection template.
Step 3. When the occlusion occurs, PCA-SIFT descriptors are built for the separate targets and the composite area. Then, in order to match the PCA-SIFT descriptors between both the separate template and the composite template, the paper chose the BBF (Best Bin First) searching mechanism of k-d tree [32]. The k-d tree is a binary tree in which every node is a k-dimensional point, which has leaf node and non-leaf-node. Non-leaf-node can generate a splitting hyperplane implicitly and divide the space into two parts. Points to the each part of this hyperplane are represented by the corresponding part of subtree of that node. And the hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k-dimensions, with the hyperplane perpendicular to that dimension's axis. Based on the k-d tree, the BBF (Best Bin First) algorithm is taken to optimize retrieve and matching. BBF starts searching from the root and non-leaf-node to the leaf straightly while plugging the missed points into the priority queue. Then retrieve the key of minimum value from the current queue, and repeat the processing until the search finishes. Through the model above, the optimal matches are linked, as is shown in Figure 9.

Matching effect.
Step 4. After matching, each active contour corresponded with the PCA-SIFT is picked up to extract the optimal similar contour in the composite area. Then separate the area by the active contours, and get the tracking trajectory of each target, as is shown in Figure 10.

Extracted contour marking.
Step 5. Use occlusion processing algorithm and Kalman filter to predict whether the occlusion ends. And after the end of the occlusion, the tracking is continuing with rapid Kalman filter to record the whole trajectory, as is shown in Figure 11.

Tracking results.
The experiments show that when the occlusion does not occur, using only Kalman filter can achieve good prediction and tracking. While multitargets occlusion is occurring, the PCA-SIFT features with the optimal similar active contours are used to process the area located by the prediction of Kalman filter. Then the occlusion of multitargets can be separated to keep on the correct and rapid tracking.
Then, the paper extracts 20 trajectories in one period while many occlusions happen. The extract results are shown in Figure 12. For the research of traffic behavior, the spatiotemporal trajectories are considered as the key point for analysis. And based on the algorithm, the spatiotemporal trajectories can be extracted, which are shown in Figure 13.

Trajectories in the area.

Spatiotemporal trajectories.
Then comparing the demarcated trajectories with the real trajectories, which are extracted by manual work, the errors are shown in Table 1.
Trajectories comparison.
From Figures 12 and 13 and Table 1, it can be seen that the algorithm of this paper can extract the continuous traffic trajectory and express the entire traffic behavior overcoming the occlusion, which are important for the further research of behavior. And the accuracy and efficiency can satisfy the necessary of the traffic detection and parameter extraction.
5. Conclusion
In the paper, the basic algorithm is built for the problems of multitargets tracking in traffic video detection by the Kalman filtering prediction combined with PCA-SIFT matching method, which achieves fast occlusion area location and multitargets matching. Then the algorithm is expanded by both the PCA-SIFT description and the optimal similar active contour, which separates the occlusion into independent targets and achieves correct and effective multitargets tracking. Finally, comparing the trajectories extracted by the algorithm with the real trajectories, the effectiveness and accuracy of the extracted trajectory in this paper are verified, and the mean error is less than 5 pixels in the occlusion state which can satisfy the need of multitargets tracking. In the future, the extracted traffic parameters such as velocity, acceleration, and behaviors could be in-depth studied for further research.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Footnotes
Acknowledgments
This research was funded by the National Natural Science Foundation of China (nos. 51278220, 51108208, and 51278520), the Postdoctoral Science Foundation Funded Project of China (no. 2013T60330), the Fundamental Research Fund for the Central Universities of China (no. 201103146), the Science and Technology Development Project of Jilin Province (no. 20130522121JH), and Knowledge Innovation Program of the Chinese Academy of Sciences (no. YYYJ-1122).
