Collaboration calibration and three-dimensional localization in multi-view system

Abstract

This work addresses collaboration calibration and real-time three-dimensional (3D) localization in a multi-view system. The proposed 3D localization method fuses the two-dimensional image coordinates from multiple views and provides the 3D space location in real time, which is a fundamental step in obtaining the 3D location of a moving object in computer vision. An improved common perpendicular centroid algorithm is presented to reduce the side effect of shadow detection and improve localization accuracy. Collaboration calibration is used to generate the intrinsic and extrinsic parameters of the multi-view cameras synchronously. The experimental results show that the algorithm achieves accurate positioning in indoor multi-view monitoring while reducing complexity.

Keywords

Multi-view system, collaboration calibration, three-dimensional localization, improved common perpendicular centroid algorithm, wireless multimedia sensor networks

Introduction

Object tracking is a challenging task for various data sets; therefore, it is still a hot topic of research in computer vision.^1–5

To extract the metric information from two-dimensional (2D) images, a flexible calibration technique is proposed.⁶ This technique requires a planar pattern shown at two or more different orientations. It is a fundamental theory and is used in practice to locate the target. To the best of our knowledge, the three-dimensional (3D) localization algorithm for the multi-view system still needs further research.^7–9

In disaster environments, dynamic localization is the key for a mobile robot to carry out various rescue missions. The key tasks are monitoring the real-time 3D location of the mobile robot and providing rescue crews with the real situation of the disaster scene. Current localization technology focuses on localization by single-view^3,5,10 or binocular vision,^6,11,12 and its accuracy needs to be improved. The main problems can be summarized as follows. First, the collaboration of multi-views is only used for object detection and extraction. There is no efficient method to fuse data from multi-views to obtain the 3D space location of targets. Second, the target is located on the 2D ground plane by a single view or in 3D space by binocular vision, so the multi-view information does not fully participate in the localization algorithm. Last but not least, shadow detection can damage the localization results: the shadow on the floor can cause significant localization errors.

Camera calibration is a key issue in a camera system. Many studies focus on the accuracy of the parameters and localization results.^6,13–15 However, the collaboration calibration of multi-view cameras is convenient and efficient at the deployment phase of indoor surveillance. It can reduce the heavy work of multi-camera calibration in visual sensor networks. This article pays more attention to the collaboration calibration.

To optimize the deployment of multi-view cameras, the position and orientation of cameras are predetermined and well ordered. Assuming the cameras are calibrated by the initial placement, the collaborative target localization with fault tolerance is presented to solve the target localization problem.¹⁶ To tolerate potential sensor faults, a voting mechanism is adopted and a threshold value needs to be specified, which is the key to the realization of the distributed solution. Analytical study is conducted to derive the lower and upper bounds for the threshold such that the probability of faulty sensors that negatively impacts the localization performance is less than a small value. Correspondence between multiple cameras is critical to get the panoramic view of the environment. A state-of-the-art method for people tracking is presented for the multi-views.¹⁷ It is a simple and robust method based on principal axes of people. The positions of people can be localized in the partial occlusion scenarios.

The localization of mobile robots is the frontier field of wireless sensor networks (WSNs) in disaster relief system.^18,19 For the indoor network blind areas, an autonomous dynamic localization algorithm is proposed.²⁰ This method chooses neighbor beacon node and sets up grids with received signal strength indication for the distance measurement.

The 2D localization algorithm is popular in pedestrian monitoring.^17,21,22 First, the correspondence between multi-view cameras is used to match the pairs of objects with the minimum distance of various cost functions. The pedestrians are extracted by an object detection technique. Second, the data fusion from multi-view cameras is used to improve the tracking results in each view. The centroid of the foot position is located within the common ground plane. Third, the 3D space location of the pedestrians is obtained by the 2D localization algorithm, which assumes that the height is 0 on the Z-axis of the ground plane. The obvious shortcoming is that the foot position must be as accurate as possible, because the side effect of shadow detection can damage the localization results.

Camera networks (CNs) have been widely employed in many real-world applications, such as intelligent transportation, smart city, and building surveillance.²³ With the development of camera-based technologies, image-based localization may be employed in an indoor environment where the Global Positioning System signal is weak.²⁴ Localization of a robot relative to its environment using vision information (i.e. appearance-based localization) has received extensive attention over the past few decades from the robotic and computer vision communities. Vision-based robot positioning may involve two steps. The first step involves learning some properties of vision data (features) with respect to the spatial position where the observation is made (so-called mapping). The second step is to find the best match for the new spatial position corresponding to the newly observed features (so-called matching). The mapping from these visual features to the domain of the associated spatial position is highly nonlinear and sensitive to the type of selected features.²⁵

The next generation of sensor nodes, such as the Lotus mote, offers higher performance, low power consumption, more storage and memory, and faster processing than the older generation, such as the TelosB or MicaZ motes (Crossbow Technology), which facilitates multimedia data preprocessing and compression in wireless multimedia sensor networks (WMSNs).²⁶ Target localization estimates the location of a target in the world coordinate system based on the visual information of camera nodes. Target localization in WMSNs faces great challenges. First, image processing is in general costly to implement in local nodes, because their computing capabilities are limited.²⁷ Second, the bandwidth resources are also restricted in WMSNs. Thus, there are constraints on transmitting the huge amount of visual data generated by cameras to a central node or a base station. Third, since the sensing capability of a camera is characterized by directional sensing, the location information of a target in the depth dimension is lost in an image. Fourth, due to the cost limitation, visual nodes in WMSNs are equipped with low-resolution optical sensors. Thus, the accuracy of filtering and extracting the information relevant to the target's position cannot be guaranteed at the local sensor level. Vision-based surveillance by multiple cameras receives considerable attention, because multiple cameras enlarge the monitored area and information from multiple views can be used to solve many problems.²⁸

In the indoor multi-view system, it is a challenging task to obtain the real-time 3D space location of the mobile robot by the predetermined and well-ordered multi-camera system. The main contributions of this article can be summarized as follows. First, to obtain the real-time location of the mobile robot, a 3D localization method is proposed that detects the moving object and obtains its 3D space location by fusing the 2D image coordinates from multi-views. Second, the improved common perpendicular centroid (ICPC) algorithm is introduced to provide an error correction mechanism for the side effect of shadow detection and to improve localization accuracy in indoor intelligent monitoring. Third, the collaboration calibration of multi-view cameras is used to improve the efficiency of deploying indoor surveillance. Experimental results show that the proposed method can realize real-time 3D localization reliably and efficiently.

System descriptions

Figure 1 shows the flowchart of 3D localization in multi-view system. The final purpose of this work is to realize the real-time 3D localization. Therefore, camera calibration must be fulfilled, and the 3D space location of mobile robot is determined by the data fusion of 2D image coordinates from multi-views.

Figure 1.

The flowchart of 3D localization in multi-view system. 3D: three-dimensional.

Collaboration calibration is to find out the intrinsic and extrinsic parameters of multi-view cameras by the coordinates of the reference points. The coordinates of the reference points can be divided into the 3D world coordinates and 2D image coordinates. The image coordinate is obtained by Harris corner detection. The position of the calibration board must be predetermined and measured to obtain the 3D world coordinates of the corner points.

The CPC algorithm is the proposed 3D localization solution. It takes advantage of the multi-view cameras and determines the real-time 3D space locations. The ICPC algorithm is an improved version of the CPC algorithm. Since the estimated 3D locations are always lower than the ground truths of the object, a base plane is employed by the ICPC algorithm to slightly move up the estimated locations.

CNs have been widely employed in many real-world applications, such as intelligent transportation, smart city, and building surveillance.²³ The first experimental platform in this work deploys four charge-coupled device (CCD) cameras to monitor the overlapping field of view in the area of interest. It is suitable for the proposed 3D localization algorithms to realize the data fusion from multi-views. The next generation of sensor nodes, such as the Lotus mote and PIXY, offers higher performance and faster processing than the older generation, such as the TelosB or MicaZ motes, which facilitates multimedia data preprocessing and compression in WMSNs.^26,29 The second experimental platform in this work deploys four PIXY visual sensors in the room and localizes the mobile robots by the proposed algorithms. WSNs are employed for the data transmission.²⁰

To obtain the 2D image coordinates from multi-views, background subtraction is used for object detection in the CNs platform. It is capable of tracking a single object in an indoor environment.³⁰ Since lighting and illumination changes can cause high false alarm rates, it is not employed for multi-object tracking. The PIXY visual sensor is the latest product in the robotics field.²⁶ It employs a color algorithm to distinguish between different colors in the real world. Since the color algorithm provides very low false alarm rates, it is used for multi-object tracking by the proposed 3D localization algorithm. The red boxes in Figure 1 show the main contributions of this work.

Harris corner detection and extraction

For the issue of the corner point detection, the technology of Harris corner is employed. Due to the image edge effect and adjacent corner point phenomenon, an improved Harris algorithm is presented to detect and extract the image coordinate. It can handle the effect of the image edge and double-peak phenomenon efficiently.

Image edge effects processing

The matrix obtained by the corner response function (CRF) covers all the characteristics of the gray variance of the image. Among them, locating the corner points is relatively complex. First, the method uses a 5 × 5 search window, moved in steps of 1 pixel across the entire image, to look for small-scale local maxima. Then, the positions of the corner points are determined through a fixed threshold. The formula is as follows

{\tilde{M}}_{i, j} = \sum_{i = 3}^{row} \sum_{j = 3}^{col} \max [\begin{matrix} {\tilde{R}}_{i - 2, j - 2} & \dots & {\tilde{R}}_{i - 2, j + 2} \\ \vdots & {\tilde{R}}_{i, j} & \vdots \\ {\tilde{R}}_{i + 2, j - 2} & \dots & {\tilde{R}}_{i + 2, j + 2} \end{matrix}]

where row and col are the numbers of rows and columns of the image, and ${\tilde{M}}_{i, j}$ is the local maximum over the 5 × 5 search window centered on the cell ${\tilde{R}}_{i, j}$ in the ith row and jth column of the CRF matrix.

Search results of the local maximum matrix in small scale are shown in Figure 2. Before repair, detected corner points with the red cross mark are away from image edge because there are many 0’s in the edge area of the matrix.

Figure 2.

The small-scale peak search.

Therefore, the method focuses on the data repair of the matrix edge, and an adjacent coverage principle is presented. The first and second rows of the CRF matrix are replaced with the data of the corresponding cells in the third row, and the other edges are treated similarly. After repair, the new corner points, marked with blue circles, are shown in the magnified region of Figure 2; when the CRF data are greater than a certain threshold, the corner points can be extracted.
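The edge repair and small-scale peak search can be illustrated with a minimal sketch. It assumes the CRF matrix is a list of lists, that corner cells are repaired by clamping both indices (the text does not specify the corners), and that ties within a window count as maxima; the function names `repair_edges` and `local_maxima` are hypothetical.

```python
def repair_edges(crf, border=2):
    """Copy the nearest interior value into the outer `border` rows and columns.

    Rows 0 and 1 take the values of row 2, and likewise for the other edges.
    Clamping both indices at the corners is an assumption of this sketch.
    """
    rows, cols = len(crf), len(crf[0])

    def clamp(k, n):
        return min(max(k, border), n - 1 - border)

    return [[crf[clamp(i, rows)][clamp(j, cols)] for j in range(cols)]
            for i in range(rows)]


def local_maxima(crf, threshold, half=2):
    """Return (row, col) cells that are the maximum of their 5 x 5 window
    and whose response exceeds the fixed threshold."""
    rows, cols = len(crf), len(crf[0])
    peaks = []
    # Window centres start at the third row/column, as in the formula above.
    for i in range(half, rows - half):
        for j in range(half, cols - half):
            value = crf[i][j]
            if value <= threshold:
                continue
            window_max = max(crf[a][b]
                             for a in range(i - half, i + half + 1)
                             for b in range(j - half, j + half + 1))
            if value >= window_max:
                peaks.append((i, j))
    return peaks
```

In use, `local_maxima(repair_edges(crf), threshold)` would return the candidate corner cells; the threshold value is application dependent and is not given in the text.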

Center point principle of adjacent points

In the acquired images, the double-peak phenomenon is clearly visible in the magnified area of adjacent points in Figure 2. The image distance between adjacent points is less than 5 pixels. Since the distance in world coordinates between the feature points on the calibration board is about 10 cm, the image distance between true feature points is greater than 5 pixels.

To eliminate the adjacent points, a distance threshold is introduced. If the distance between two points is less than the threshold, they are replaced by their center point, that is, $P_{center} = mean (P_{1}, P_{2})$ , where P ₁ and P ₂ are the adjacent points.
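The center point principle can be sketched as follows. The default threshold of 5 pixels is an assumption taken from the distances quoted above, and the greedy pairing order and the function name `merge_adjacent` are illustrative choices rather than the implementation used in the experiments.

```python
import math


def merge_adjacent(points, min_dist=5.0):
    """Replace each pair of points closer than min_dist by its midpoint.

    points: list of (u, v) image coordinates of detected corner candidates.
    """
    remaining = list(points)
    merged = []
    while remaining:
        p = remaining.pop(0)
        # Look for one partner of p that lies within the distance threshold.
        partner = next((q for q in remaining if math.dist(p, q) < min_dist), None)
        if partner is None:
            merged.append(p)
        else:
            remaining.remove(partner)
            merged.append(((p[0] + partner[0]) / 2, (p[1] + partner[1]) / 2))
    return merged
```

For example, the double peak (10, 10) and (12, 10) is replaced by the single point (11, 10), while an isolated point such as (40, 40) is kept unchanged.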

Extraction of the effective corner points

In the experimental environment, the corner points around the calibration board need to be filtered out automatically. According to the position feature of the corner points on the calibration board, the slope formula between two points is used. All corner points whose slope is greater than a certain threshold are eliminated, so that only the corner points of the calibration board are retained. First, 2 × 9 corner points on both sides of the calibration board are selected manually. Second, the slope of each pair of points on both sides is calculated. Third, the corner points of the calibration board are determined by comparing the slope with a fixed threshold. To prevent the edge effect, the 9 × 9 corner points inside the calibration board are selected.

Collaboration calibration

The imaging model is a key link in the conversion process of image coordinates and world coordinates.⁶ This work adopts four cameras with an angle of 40°, which is applicable to the pinhole imaging principle. Therefore, the linear model can be used for coordinate transformation.

Linear model

To estimate the 3D camera motion from 2D images, vision techniques are usually based on the pinhole camera model described in Figure 3, where $[X_{C}, Y_{C}, Z_{C}]$ represents the camera coordinate system and $[X_{W}, Y_{W}, Z_{W}]$ represents the world coordinate system. The distance between the camera center C and image frame center C ₀ is the focal length f. The dotted line through the image frame center C ₀ is the principal axis.

Figure 3.

Frame definition.

The linear model between the coordinates of a physical 3D point $P = {[X_{W}, Y_{W}, Z_{W}]}^{T}$ expressed in the world frame and its projection in the image plane $p = {[u, v]}^{T}$ is given by³¹

s \tilde{p} = K \tilde{P}

where s is a scale factor, $\tilde{p} = {[u, v,1]}^{T}$ and $\tilde{P} = {[X_{W}, Y_{W}, Z_{W},1]}^{T}$ are the homogeneous coordinates of p (a point in $ℝ^{2}$ ) and P, respectively, and K is a (3 × 4) projection matrix defined up to a scale factor. The homogeneous coordinates are used in order to express the projection as a linear transformation. The projection matrix depends on both the intrinsic and extrinsic parameters of the camera. Intrinsic parameters do not depend on the camera location, but rather on the internal camera parameters such as the focal length f, the numbers of pixels per distance unit in the u and v directions, k_u and k_v , the skew factor γ, which equals zero if and only if the u and v directions are perfectly orthogonal, and the image frame coordinates of the intersection between the optical axis and the image plane, called the principal point $c_{0} = (u_{0}, v_{0})$ . These parameters define the calibration matrix $\tilde{K}$ of the camera, which expresses the linear transformation between the camera frame and the image frame, given by³²

\tilde{K} = [\begin{matrix} k_{u} f & γ & u_{0} \\ 0 & k_{v} f & v_{0} \\ 0 & 0 & 1 \end{matrix}]

Homogeneous coordinates simplify the notation needed to describe perspective projections and allow for projective-geometric concepts such as points and lines at infinity.³³
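As a small worked illustration of the calibration matrix, the sketch below projects a point given in the camera frame onto the image. The numerical values in the usage note are made up for illustration, and the function names are hypothetical.

```python
def calibration_matrix(f, k_u, k_v, u0, v0, gamma=0.0):
    """Build the 3 x 3 calibration matrix from the intrinsic parameters."""
    return [[k_u * f, gamma, u0],
            [0.0, k_v * f, v0],
            [0.0, 0.0, 1.0]]


def project_camera_point(k_tilde, point_c):
    """Project a camera-frame point [Xc, Yc, Zc] to pixel coordinates (u, v)."""
    xc, yc, zc = point_c
    if zc <= 0:
        raise ValueError("point must lie in front of the camera (Zc > 0)")
    hom = [sum(k_tilde[r][c] * x for c, x in enumerate((xc, yc, zc)))
           for r in range(3)]
    s = hom[2]  # scale factor of the homogeneous image point, equal to Zc here
    return hom[0] / s, hom[1] / s
```

With a hypothetical camera for which k_u f = k_v f = 1000 pixels and principal point (320, 240), the camera-frame point (0.1, −0.05, 2.0) projects to approximately (370, 215), and any point on the optical axis projects to the principal point.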

Linear model of collaboration calibration

If the projection matrix is $M^{τ}$ , which integrates the intrinsic and extrinsic parameters and $τ = 1, \dots,4$ is the τth camera, the relationship between the image coordinate and world coordinate system of cameras is described as follows

Z_{c i}^{τ} [\begin{matrix} u_{i}^{τ} \\ v_{i}^{τ} \\ 1 \end{matrix}] = [\begin{matrix} m_{11}^{τ} & m_{12}^{τ} & m_{13}^{τ} & m_{14}^{τ} \\ m_{21}^{τ} & m_{22}^{τ} & m_{23}^{τ} & m_{24}^{τ} \\ m_{31}^{τ} & m_{32}^{τ} & m_{33}^{τ} & m_{34}^{τ} \end{matrix}] [\begin{matrix} X_{w i} \\ Y_{w i} \\ Z_{w i} \\ 1 \end{matrix}]

where $(u_{i}^{τ}, v_{i}^{τ},1)$ is the homogeneous vector of the image coordinates of the ith feature point, $Z_{c i}^{τ}$ is the Z-axis coordinate of the feature point in the coordinate system with the τth camera as the origin, and $(X_{w i}, Y_{w i}, Z_{w i},1)$ is the homogeneous vector of the feature point in the world coordinate system. Expanding the formula above and dividing the first two equations by the third eliminates $Z_{c i}^{τ}$ , and the following equations are obtained after rearrangement

\left\{ \begin{array}{l} X_{w i} m_{11}^{τ} + Y_{w i} m_{12}^{τ} + Z_{w i} m_{13}^{τ} + m_{14}^{τ} - u_{i}^{τ} X_{w i} m_{31}^{τ} - u_{i}^{τ} Y_{w i} m_{32}^{τ} - u_{i}^{τ} Z_{w i} m_{33}^{τ} = u_{i}^{τ} m_{34}^{τ} \\ X_{w i} m_{21}^{τ} + Y_{w i} m_{22}^{τ} + Z_{w i} m_{23}^{τ} + m_{24}^{τ} - v_{i}^{τ} X_{w i} m_{31}^{τ} - v_{i}^{τ} Y_{w i} m_{32}^{τ} - v_{i}^{τ} Z_{w i} m_{33}^{τ} = v_{i}^{τ} m_{34}^{τ} \end{array} \right.

Because the projection matrix is defined only up to a scale factor, setting $m_{34}^{τ} = 1$ does not affect the results. The above formula then becomes a pair of linear equations in the elements of $M^{τ}$ , which is the mapping between the image coordinates and world coordinates of the ith feature point. The number of unknown elements of $M^{τ}$ is 11. Stacking the equations of all feature points gives the following formula

K^{τ} M^{τ} = U^{τ}

where $K^{τ}$ is a $2 n \times 11$ matrix composed of the image coordinates and world coordinates of the n feature points on the left side of formula (3), and $U^{τ}$ is a 2n-dimensional vector composed of the image coordinates on the right side. In this formula, $M^{τ}$ denotes the 11-dimensional vector of unknown elements. If $rank (K^{τ}) = 11$ , the solution can be obtained through maximum likelihood estimation (MLE)

M^{τ} = {((K^{τ})^{T} (K^{τ}))}^{- 1} {(K^{τ})}^{T} U^{τ}
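The linear solution can be sketched in plain Python as below. The column ordering of the unknowns (the first two rows of the projection matrix followed by the first three elements of the third row) and the small Gaussian elimination routine are assumptions of this sketch, and the function names are hypothetical; a numerical library would normally be used for the solve step.

```python
def build_system(world_pts, image_pts):
    """Stack two equations per reference point, with m34 fixed to 1."""
    K, U = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        K.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        K.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        U.extend([u, v])
    return K, U


def solve_normal_equations(K, U):
    """Solve (K^T K) m = K^T U by Gaussian elimination with partial pivoting."""
    n = len(K[0])
    A = [[sum(row[i] * row[j] for row in K) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * u for row, u in zip(K, U)) for i in range(n)]
    aug = [A[i] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("K^T K is singular; reference points may be coplanar")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    m = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][c] * m[c] for c in range(i + 1, n))
        m[i] = (aug[i][n] - tail) / aug[i][i]
    return m


def project(m, point):
    """Project a world point with the 11 estimated elements (m34 = 1)."""
    X, Y, Z = point
    depth = m[8] * X + m[9] * Y + m[10] * Z + 1.0
    u = (m[0] * X + m[1] * Y + m[2] * Z + m[3]) / depth
    v = (m[4] * X + m[5] * Y + m[6] * Z + m[7]) / depth
    return u, v
```

With noise-free reference points taken at two different heights, the solver should recover the 11 elements used to generate the image coordinates. If all reference points have Z = 0, the third, seventh, and eleventh columns of the stacked matrix vanish and the routine reports a singular system.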

Nonsingular deployment of calibration boards

The linear model is invariant to translation and rotation. As a result, when all reference points are taken from a calibration board at a single height, the matrix K^τ is unable to achieve full rank, that is, $rank (K^{τ}) < 11$ . The MLE solution is then not unique, which leads to calibration failure.

However, when the calibration board is deployed at different heights, the third and seventh columns of matrix K^τ combine two groups of data and become linearly independent of the other columns. That is, $rank (K^{τ}) = 11$ . Matrix K^τ then has full column rank, so that formula (5) has a unique solution.

To remove the linear dependence in matrix K^τ , the calibration boards are placed horizontally at heights of 27 and 130 mm above the ground level.

Since the angles and orientations of the four cameras are predetermined, they share the common ground plane of the monitoring area. In the calibration stage, the cameras can capture and extract the feature points of the same calibration board. Therefore, the calibration of the four cameras can be completed synchronously, which is the presented collaboration calibration method.

3D localization based on ICPC

The 3D localization was implemented in the indoor environment. The robot does not need to carry any wireless sensor to realize the indoor localization. The image coordinate of the target is extracted by the foreground segmentation of the predetermined multi-view cameras. And then ICPC is used for real-time 3D localization in the world coordinate system.

Adaptive background mixture model

The mathematical model of the background subtraction is adopted to detect the mobile robot. First, three Gaussian models are established to describe the background model and foreground model. Second, the first 40 frames are used to train the background model. Third, the threshold of the background model is set to 0.7. It is robust to the most noises in the indoor environment.

To train the background model, 40 frames without foreground objects are used to build the model. Then, the background model is updated by the incoming frames. At any moment N, the mathematical models of the pixel $x_{N}$ are as follows

p (x_{N}) = \sum_{j = 1}^{K} w_{j} η (x_{N}; θ_{j})

η (x_{N}; θ_{j}) = η (x_{N}; μ_{j}, Σ_{j}) = \frac{1}{{(2 π)}^{\frac{D}{2}} {| Σ_{j} |}^{\frac{1}{2}}} e^{- \frac{1}{2} {(x_{N} - μ_{j})}^{T} Σ_{j}^{- 1} (x_{N} - μ_{j})}

where w_j is the weight value of the jth Gaussian model and $j = 1, \dots, K, K = 3$ . $η (x_{N}; θ_{j})$ is the normal distribution of the jth Gaussian model. $μ_{j}$ is the mean value of the j th Gaussian model, $Σ_{j} = σ_{j}^{2} I$ is the covariance matrix of the j th Gaussian model, $σ_{j}$ is the standard deviation of the j th Gaussian model, I is the identity matrix, and D is the dimension of $x_{N}$ .

The computational complexity is $O (row \times col \times K)$ .
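The mixture density for a single pixel can be evaluated as in the following sketch, which assumes isotropic covariance matrices as defined above. The function names are hypothetical, and the model update and the background/foreground decision are not shown.

```python
import math


def gaussian_density(x, mean, sigma):
    """Isotropic Gaussian density with covariance sigma^2 I for a D-dimensional pixel."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    # For an isotropic covariance, the square root of its determinant is sigma^D.
    norm = (2.0 * math.pi) ** (d / 2.0) * sigma ** d
    return math.exp(-0.5 * sq_dist / sigma ** 2) / norm


def mixture_density(x, weights, means, sigmas):
    """Weighted sum of the K Gaussian components evaluated at pixel value x."""
    return sum(w * gaussian_density(x, m, s)
               for w, m, s in zip(weights, means, sigmas))
```

Evaluating `mixture_density` once per pixel costs K component evaluations, which is consistent with the complexity stated above for a full frame.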

The mask image produced by the adaptive background mixture model (ABMM) is shown in Figure 4(a). The smallest rectangle method is then used to localize the white region, and the localization result is shown in Figure 4(b).

Figure 4.

The localization method of ABMM. (a) Mask result and (b) localization result. ABMM: adaptive background mixture model.
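A minimal sketch of the smallest rectangle step is given below. It assumes an axis-aligned rectangle and a binary mask stored as a list of rows in which nonzero cells are foreground; the function names are hypothetical.

```python
def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of the nonzero cells, or None."""
    coords = [(i, j) for i, row in enumerate(mask)
              for j, value in enumerate(row) if value]
    if not coords:
        return None  # no foreground detected in this view
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    return min(rows), min(cols), max(rows), max(cols)


def box_centroid(box):
    """Centre of the bounding box, used as the 2D image coordinate of the target."""
    r0, c0, r1, c1 = box
    return (r0 + r1) / 2.0, (c0 + c1) / 2.0
```

A view that returns `None` would simply not contribute a ray to the localization step.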

Common perpendicular centroid

The 2D image coordinate of the detected target in each view of the monitoring cameras is the most important evidence for localizing the mobile robot. The experimental platform of this article is shown in Figure 5, and the straight line between the 2D image coordinate and the mobile robot is the specific ray of one view through the optical center of the camera. Because of the environmental noise,^6,16 the two rays shown as the dotted lines do not intersect at the same point. The bold solid line is the common perpendicular between the two dotted lines in 3D space. The red cross mark is its center point. For any two rays in the experimental platform, the center point of the common perpendicular can be found. The centroid of all center points is the estimated 3D location of the robot.

Figure 5.

The center point of common perpendicular.

The ray equations from two cameras to the robot are shown as follows

l_{1}^{β} : \left\{ \begin{array}{l} a_{1}^{β} X + a_{2}^{β} Y + a_{3}^{β} Z = a_{4}^{β} \\ b_{1}^{β} X + b_{2}^{β} Y + b_{3}^{β} Z = b_{4}^{β} \end{array} \right.

l_{2}^{β} : \left\{ \begin{array}{l} c_{1}^{β} X + c_{2}^{β} Y + c_{3}^{β} Z = c_{4}^{β} \\ d_{1}^{β} X + d_{2}^{β} Y + d_{3}^{β} Z = d_{4}^{β} \end{array} \right.

where $a_{i}^{β}$ , $b_{i}^{β}$ , $c_{i}^{β}$ , and $d_{i}^{β}$ , $i = 1, \dots, 4$ , are determined by the coefficients of the camera calibration, and $β = 1, \dots, C_{Obj}^{2}$ indexes the β th pair among all combinations of the rays. The number of rays is Obj, which means that the mobile robot is successfully detected by Obj cameras.

Assuming the direction vector of line $l_{i}^{β}$ is $r_{i}^{β} = (A_{i}^{β}, B_{i}^{β}, C_{i}^{β}), i = 1, 2$ , the following formula is obtained

\left\{ \begin{array}{l} r_{1}^{β} = (a_{1}^{β}, a_{2}^{β}, a_{3}^{β}) \times (b_{1}^{β}, b_{2}^{β}, b_{3}^{β}) \\ r_{2}^{β} = (c_{1}^{β}, c_{2}^{β}, c_{3}^{β}) \times (d_{1}^{β}, d_{2}^{β}, d_{3}^{β}) \end{array} \right.

Assuming the intersection point between line $l_{i}^{β}$ and the ground is $(x_{i}^{β}, y_{i}^{β}, z_{i}^{β})$ , the coordinate can be determined by $z_{i}^{β} = 0$ and formulas (8) and (9). The symmetrical equation of line $l_{i}^{β}$ is as follows

l_{i}^{β} : \frac{X - x_{i}^{β}}{A_{i}^{β}} = \frac{Y - y_{i}^{β}}{B_{i}^{β}} = \frac{Z - z_{i}^{β}}{C_{i}^{β}}
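The construction of one ray from a calibrated view can be sketched as follows. The sketch assumes that the two plane equations of a ray are obtained by substituting the detected image coordinate (u, v) into the rearranged projection equations of the calibration section; the function names are hypothetical, and M is a 3 × 4 projection matrix stored as nested lists.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_from_pixel(M, u, v):
    """Return (ground_point, direction) of the ray through pixel (u, v)."""
    # Plane 1: a . (X, Y, Z) = a4, from the u equation.
    a = [M[0][k] - u * M[2][k] for k in range(3)]
    a4 = u * M[2][3] - M[0][3]
    # Plane 2: b . (X, Y, Z) = b4, from the v equation.
    b = [M[1][k] - v * M[2][k] for k in range(3)]
    b4 = v * M[2][3] - M[1][3]
    direction = cross(a, b)
    # Ground intersection: set Z = 0 and solve the 2 x 2 system by Cramer's rule.
    det = a[0] * b[1] - a[1] * b[0]
    if abs(det) < 1e-12:
        raise ValueError("ray is parallel to the ground plane")
    x = (a4 * b[1] - a[1] * b4) / det
    y = (a[0] * b4 - a4 * b[0]) / det
    return (x, y, 0.0), direction
```

The determinant tested before the division equals the Z component of the direction vector, so the check fails exactly when the ray never reaches the ground plane.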

Therefore, the direction vector of the common perpendicular of lines $l_{1}^{β}$ and $l_{2}^{β}$ is

e^{β} = (A_{1}^{β}, B_{1}^{β}, C_{1}^{β}) \times (A_{2}^{β}, B_{2}^{β}, C_{2}^{β}) = (e_{1}^{β}, e_{2}^{β}, e_{3}^{β})

Let $π_{i}^{β}, i = 1, 2$ , be the planes that contain line $l_{i}^{β}$ and are parallel to the common perpendicular direction, as follows

π_{i}^{β} : \det [\begin{matrix} X - x_{i}^{β} & Y - y_{i}^{β} & Z - z_{i}^{β} \\ A_{i}^{β} & B_{i}^{β} & C_{i}^{β} \\ e_{1}^{β} & e_{2}^{β} & e_{3}^{β} \end{matrix}] = 0

The parametric form of line $l_{i}^{β}$ is shown as follows

\left\{ \begin{array}{l} X = x_{i}^{β} + A_{i}^{β} t_{i}^{β} \\ Y = y_{i}^{β} + B_{i}^{β} t_{i}^{β} \\ Z = z_{i}^{β} + C_{i}^{β} t_{i}^{β} \end{array} \right.

Substituting line $l_{1}^{β}$ into $π_{2}^{β}$ gives $t_{1}^{β}$ . Similarly, substituting line $l_{2}^{β}$ into $π_{1}^{β}$ gives $t_{2}^{β}$ .

Therefore, the center point of the common perpendicular and the estimated coordinate of the robot are as follows

{\tilde{P}}^{β} = [\begin{matrix} (x_{1}^{β} + A_{1}^{β} t_{1}^{β} + x_{2}^{β} + A_{2}^{β} t_{2}^{β}) / 2 \\ (y_{1}^{β} + B_{1}^{β} t_{1}^{β} + y_{2}^{β} + B_{2}^{β} t_{2}^{β}) / 2 \\ (z_{1}^{β} + C_{1}^{β} t_{1}^{β} + z_{2}^{β} + C_{2}^{β} t_{2}^{β}) / 2 \end{matrix}]

\hat{P} = \frac{1}{num} \sum_{β = 1}^{num} {\tilde{P}}^{β}, num = C_{Obj}^{2}
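The derivation above can be sketched in code as follows. Each ray is assumed to be given as a point on the line and a direction vector, and the plane containing a line and parallel to the common perpendicular is represented by its normal vector, the cross product of the line direction and the perpendicular direction. The function names are hypothetical.

```python
from itertools import combinations


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def perpendicular_midpoint(p1, r1, p2, r2):
    """Center point of the common perpendicular of two non-parallel lines."""
    e = cross(r1, r2)  # direction of the common perpendicular
    if dot(e, e) < 1e-12:
        raise ValueError("rays are parallel; the common perpendicular is not unique")
    n1 = cross(r1, e)  # normal of the plane through line 1 parallel to e
    n2 = cross(r2, e)  # normal of the plane through line 2 parallel to e
    d = [b - a for a, b in zip(p1, p2)]  # vector from p1 to p2
    t1 = dot(d, n2) / dot(r1, n2)        # line 1 substituted into plane 2
    t2 = -dot(d, n1) / dot(r2, n1)       # line 2 substituted into plane 1
    q1 = [p + t1 * r for p, r in zip(p1, r1)]
    q2 = [p + t2 * r for p, r in zip(p2, r2)]
    return tuple((a + b) / 2.0 for a, b in zip(q1, q2))


def cpc_estimate(rays):
    """Centroid of the center points over all pairs of rays.

    rays: list of (point, direction) tuples, one per camera that sees the target.
    Returns the centroid and the list of pairwise center points.
    """
    mids = [perpendicular_midpoint(p1, r1, p2, r2)
            for (p1, r1), (p2, r2) in combinations(rays, 2)]
    n = len(mids)
    return tuple(sum(m[k] for m in mids) / n for k in range(3)), mids
```

With four rays, `combinations` yields the six pairs used in this work; at least two rays are needed for an estimate.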

Improved common perpendicular centroid

The main challenge for the background model is the shadow on the floor cast by the moving robot. It can cause significant errors between the bounding box detected by ABMM and the actual bounding box of the mobile robot. Figure 6 depicts the errors in detail. The yellow bounding box in Figure 6(a) is generated by ABMM. The green bounding box is the actual size of the robot. The figure clearly shows the size fluctuation caused by the shadow on the floor. The consequence is that there is a standard deviation between the estimated and actual position of the mobile robot. Figure 6(b) describes the situation. The red plus is the estimated position of the robot and the blue circle is the actual location; both are obtained from the centroids of the bounding boxes. The distance between the two positions is the standard deviation. Since the shadow on the floor has a great impact on the detection results, the estimated position is lower than the actual position in the majority of cases. Correcting this error is the motivation of the ICPC algorithm.

Figure 6.

The standard deviation of ABMM. Standard deviation of the (a) bounding boxes and (b) centroids. ABMM: adaptive background mixture model.

The error correction is provided by the ICPC algorithm. First, the six center points are generated by the CPC algorithm in the “Common perpendicular centroid” section. They are distributed in 3D space according to the four rays obtained from the four views. The four blue balls and two green balls in Figure 7 represent the six center points. Second, the centroid of the six center points is located, which is represented by the red ball in Figure 7. The base plane is the horizontal plane through the centroid. Third, the six center points are divided into two groups by the base plane. The four blue balls are lower than the base plane and the two green balls are higher than the base plane. Last but not least, only the red ball and the two green balls are used by the ICPC algorithm. Among them, the red ball is the centroid of the six center points and the two green balls are higher than the base plane. Then, the centroid of the three balls is the estimated position of the ICPC algorithm.

Figure 7.

The demonstration of ICPC algorithm. ICPC: improved common perpendicular centroid.

Experiments confirm that the estimated position of the CPC algorithm is always below the ground truth of the robot. Therefore, the estimated position of the CPC algorithm is used as a reference point. The center points of the common perpendiculars that lie above the horizontal plane through the reference point are selected, and the centroid is refined as follows

\hat{\hat{P}} = \frac{1}{card (S) + 1} \left( \sum_{j = 1}^{card (S)} {\tilde{P}}^{S_{j}} + \hat{P} \right), S = \{ β \mid {\tilde{Z}}^{β} > \hat{Z} \}

where ${\tilde{Z}}^{β}$ is the Z-axis coordinate of the $β$ th center point ${\tilde{P}}^{β}$ of the common perpendicular, $\hat{Z}$ is the Z-axis coordinate of the estimated position of CPC algorithm, and $card (\cdot)$ is the number of the elements of the set S.
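The refinement formula can be sketched as follows, assuming the center points of the common perpendiculars are already available as (X, Y, Z) tuples. The function name `icpc_estimate` is hypothetical.

```python
def icpc_estimate(center_points):
    """Refine the CPC centroid using only the center points above the base plane.

    center_points: list of (X, Y, Z) center points of the common perpendiculars.
    Returns (icpc_position, cpc_position).
    """
    n = len(center_points)
    # CPC estimate: centroid of all center points; its Z value defines the base plane.
    cpc = tuple(sum(p[k] for p in center_points) / n for k in range(3))
    # Set S: center points strictly above the base plane.
    above = [p for p in center_points if p[2] > cpc[2]]
    # Average the selected points together with the CPC estimate itself.
    total = [sum(p[k] for p in above) + cpc[k] for k in range(3)]
    return tuple(t / (len(above) + 1) for t in total), cpc
```

When no center point lies above the base plane, the set S is empty and the refined position coincides with the CPC estimate.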

The ICPC algorithm is a data fusion method that takes the 2D image coordinates from the multi-view cameras and produces the estimated 3D space location of the mobile robot. The data fusion has two phases. The first phase is the CPC algorithm. The multi-view system in the indoor environment has four side-view cameras in this work. When the mobile robot is moving in the monitoring area, each side-view camera locates the detected bounding box of the mobile robot in its own image coordinate system by the background subtraction method. Thus, for each side view, a ray is generated from the optical center through the centroid of the detected bounding box. The four side-view cameras give four rays from their optical centers to the mobile robot. The four rays form six unique pairs according to $C_{Obj}^{2}$ . Each unique pair of rays gives one 3D space location of the mobile robot by the CPC algorithm, so the six pairs give six 3D space locations, as indicated in Figure 7. The four blue balls below the base plane and the two green balls above the base plane are the six 3D space locations given by the CPC algorithm. Some of the locations are close to the ground truth of the mobile robot, but the rest are not, and it is very difficult to determine which one is the best choice. Thus, the centroid of the six locations is used as the estimated 3D space location of the mobile robot by the CPC algorithm. The red ball in Figure 7 shows the centroid of the six locations. This data fusion method guarantees that the centroid of the six locations is at least not the worst choice.

The second phase of the data fusion method is the error correction of the ICPC algorithm. Since shadow detection can cause significant errors between the detected bounding box and the actual bounding box of the mobile robot, there is a standard deviation between the estimated and actual position of the mobile robot. As shown in Figure 6(b), the estimated position is lower than the ground truth of the mobile robot in the majority of cases. The error correction mechanism of the ICPC algorithm filters out the 3D space locations below the base plane. Figure 7 shows that the four blue balls below the base plane are filtered out. Only the centroid of the six locations and the locations above the base plane are used for the data fusion of the ICPC algorithm. As shown in Figure 7, the red ball and the two green balls are used for data fusion. Their centroid is the estimated 3D space location of the mobile robot by the ICPC algorithm, which is indicated by the black ball in Figure 7. The data fusion method can guarantee that the estimated 3D space location of the ICPC algorithm is always higher than that of the CPC algorithm. Thus, the standard deviation caused by shadow detection can be compensated by the error correction mechanism of the ICPC algorithm.

ICPC algorithm is a parameter-free method^34–36 that automatically computes the estimated 3D space location of mobile robot. However, when the location geometry is not desirable, the constrained weighted algorithm avoids the ill-conditioning problem efficiently.³⁷ The relationship between the source position and the auxiliary variables is explicitly incorporated in order to overcome problems such as accurate sensor location information not being available. In this work, four side-view cameras present six locations of mobile robot. Since the cameras are deployed manually and calibrated by Harris corner technology, their locations and extrinsic parameters may not be accurate. The prior knowledge of a weighting matrix is helpful to compensate the errors caused by sensor position uncertainty. Because of the dynamic changes of illumination and environment noises over time, it is very difficult to obtain the prior knowledge of the weighting matrix over time. Thus, ICPC algorithm automatically determines the centroid of the six locations of CPC algorithm as the estimated location of mobile robot. There is no prior knowledge of the weighting matrix and no need to tune the parameters manually. It can achieve a good balance between model complexity and localization accuracy.

3D localization in WMSNs

The smart devices architecture of WMSN is a mesh network, including PIXY sensors, Arduino UNO, XBee S2, and PC. The image-processing task is completed by PIXY. The processing results include image coordinates and bounding boxes of detected mobile robot. In addition, XBee sensor network is responsible for data transmission, and the coordinator is used for data aggregation from multi-views of visual sensors synchronously.

Mesh network

The mesh network of the proposed WMSNs is shown in Figure 8. The predetermined four visual sensors are located at the four corners of the room. The red star is the coordinate origin, and the ground plane coordinates of the visual sensors are measured in millimeters. All visual sensors are directed to the center of the room. About 1.5 × 1.5 m² area of the room is covered by all four visual sensors, and the rest of the room is covered by fewer than four visual sensors. The mobile robot carries the PC that is connected with the coordinator and moves at the center of the room. The visual sensors record the target with the interval of 1 s synchronously, and each transmits the predefined package to the coordinator in its own time slot asynchronously. In addition, the state of the visual sensor is shifted from sleep mode to active mode when the mobile robot is detected. If the mobile robot is not detected by any visual sensors, the dynamic localization of the mobile robot in WSN will provide the navigation service.^20,38

Figure 8.

The mesh network of the proposed WMSN. WMSN: wireless multimedia sensor network.

Target detection

In fact, it is difficult to detect potential targets with low false alarm rates. In addition, the methods need to be implemented by smart devices with limited hardware resources like PIXY. Thus, color algorithm, which tracks the color blobs by hue saturation value color space, is adopted by PIXY to detect single color information.^29,39–41 The 2D image coordinates and bounding boxes are transmitted to Arduino UNO. The detected results are shown in Figure 9. The cooked image in Figure 9(a) shows the red helmet fixed at the tripod on the mobile robot. Because of the rotation and shape invariance, the color information on the helmet is discriminative from other objects. Further, its height is greater than most of the obstacles in the room. In addition, the color information can be detected successfully whether the mobile robot is stationary or moving around. The bounding box in Figure 9(b) describes the centroid and size of the helmet. The minimum block of the bounding box is 200 pixels. Therefore, most of the false alarms can be filtered out. The signature $s = 1$ represents the identity of the mobile robot, which can be identified by the multiple visual sensors without image matching when the color information is detected by the multiple visual sensors.
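The minimum-block rule can be illustrated with a short Python sketch. It is not PIXY firmware code: the `Blob` record and `filter_blobs` function are hypothetical, and the block size is assumed here to be the bounding-box area (width times height in pixels), which the text does not state explicitly.

```python
from typing import NamedTuple


class Blob(NamedTuple):
    """Hypothetical record for one detected color blob."""
    signature: int  # color model identity, e.g. s = 1
    x: int          # centroid column in the image
    y: int          # centroid row in the image
    width: int      # bounding-box width in pixels
    height: int     # bounding-box height in pixels


MIN_BLOCK_PIXELS = 200  # threshold quoted in the text


def filter_blobs(blobs, min_pixels=MIN_BLOCK_PIXELS):
    """Keep only blobs whose bounding box reaches the minimum block size.

    ASSUMPTION: block size is measured as width * height.
    """
    return [b for b in blobs if b.width * b.height >= min_pixels]
```

Small patches of the target color elsewhere in the room then fall below the threshold and are discarded before localization.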

Figure 9.

The detected mobile robot by the color algorithm. (a) Cooked image and (b) bounding box.

The differences between 3D cameras and 2D cameras can be summarized in four aspects. First, 3D cameras are more expensive than off-the-shelf 2D cameras.³⁹ It is much easier and less expensive to deploy 2D cameras in a large-scale area. Second, 2D CNs have been widely employed in many real-world applications, such as intelligent transportation, smart city, building surveillance, and crime prevention.²³ Third, the computer vision techniques in 2D CNs are mature technologies in real life, which are capable of intelligent processing using multiple cameras, such as target detection, localization, identification, tracking, and detection of events of interest, for public security. For example, the 2D cameras in an airport environment can implement the face recognition and tracking algorithms for the passengers at important locations of interest.⁴² The next generation of sensor nodes such as Lotus mote offers higher performance, lower power consumption, larger storage/memory, and higher processing speed than the older generation, such as TelosB or MicaZ motes, which facilitates multimedia data preprocessing and compression in WMSNs.^26,29 WMSNs are an important and exciting new technology with great potential for strengthening the traditional WSN applications, as well as creating a series of new multimedia applications such as multi-camera surveillance, visual target tracking, location-based multimedia services, and situation awareness.^40,41,43 Currently, 3D cameras have not been employed in WMSNs successfully.

Experiment and analysis

Experimental platform in CNs

The position and orientation of four CCD cameras are predefined and well ordered in the indoor environment. The focal length and photosensitive area of the cameras are 8 mm and 1/3, respectively. The video capture card is the model of 9508AV with the video compression algorithm H.264. The frame rate and resolution of the video are 25 fps and $352 \times 288$ pixels, respectively. The aim of the experiment is to evaluate the accuracy of camera calibration and positioning. The robot is placed along a predetermined trajectory. First, 40 frames of background images are captured by four cameras. Second, the robot is placed at a set of predetermined positions and is shot by multiple cameras. Third, the proposed algorithm is used to estimate the 3D world coordinate of the robot.

The fundamental task is the collaboration calibration. First, the calibration board is placed horizontally at the center of the monitoring area and shot by the predetermined four cameras. The calibration board is placed at heights of 27 and 130 cm in order to avoid the singularity of the projection matrix $M^{τ}$. Figures 10 and 11 show the detected corner points of the 27 and 130 cm height calibration boards, respectively, by four cameras. Since the position and orientation of the side-view cameras are predetermined and well ordered, the detected corner points are located at the center of the four side-view images and cover the whole field of views. Some of the corner points are close to the image edges. Due to the image edge effects processing in the “Image edge effects processing” section, the corner points at the image edge can be located accurately. Because of the center point principle of adjacent points in the “Center point principle of adjacent points” section, the redundant corner points are eliminated successfully. All the corner points are located accordingly. The corner points are neatly arranged into regular rows and columns of the 9 × 9 matrix. However, because of lens distortion, there is a large distortion at the edge of the field of views that is far away from the center of lens, so the shape of the matrix is not a regular square. Each corner point of the calibration board has four detected corner points in the four side-view images. Thus, a ray from the optical center of one side-view camera to the detected corner point of the same side-view image can be generated accordingly. Four detected corner points of four side-view images generate four rays individually. Four rays can be divided into six unique pairs of rays. Further, six 3D space locations can be determined based on six pairs of rays by CPC algorithm. The data fusion of the six locations is the estimated location of one corner point of the calibration board.

Figure 10.

The corner points of the 27-cm height calibration board. (a) First view, (b) second view, (c) third view, and (d) fourth view.

Figure 11.

The corner points of the 130-cm height calibration board. (a) First view, (b) second view, (c) third view, and (d) fourth view.

Figure 12 depicts the localization results of CPC algorithm. The estimated 3D space location of any corner point of calibration board is located by four detected corner points of four side-view images in Figures 10 and 11. It is the centroid of six 3D space locations by CPC algorithm. The estimated locations are close to the ground truth of corner points. The shapes of the matrix formed by the estimated locations are close to regular squares. However, because of lens distortion, the localization accuracy at the center is better than that at the edge of the monitoring area. Even in the worst cases, the estimated 3D space locations represented by the red plus markers are clearly around the ground truths of corner points of calibration board represented by the blue circles.

Figure 12.

The localization of CPC algorithm of calibration boards.

Figure 13 describes the localization errors of CPC algorithm. Since the calibration board has obvious feature points, the side effects of the illumination change can be kept to a minimum, and the estimated locations of the detected corner points can be as accurate as possible. The majority of the localization errors are around 2 mm. All the errors are less than 7 mm. The curve shows regular fluctuation. Some of the locations have a violent fluctuation and the rest of the locations are relatively stable. This is due to lens distortion, which has a significant impact on the localization accuracy at the edge of the side-view images. However, the overall localization errors are controlled within a reasonable range.

Figure 13.

The localization errors of corner points.

Figure 14 depicts the cumulative distribution of the localization errors of CPC algorithm. Over 90% of the errors are less than 4 mm. Around 10% of the errors are less than 1 mm. It indicates that CPC algorithm has a good performance for the localization of the 3D space points. Since the calibration board has distinct feature points and there are no side effects of shadow detection, the 3D space points can be accurately located by Harris corner detection technology. CPC algorithm can precisely localize the corner points based on the detected corner points of four side-view images. This localization accuracy is sufficient for most applications.
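The quantities read from such a cumulative distribution can be computed as in the sketch below. It assumes that the localization error is the Euclidean distance between the estimated and ground-truth 3D points, which the text does not spell out; the function names are hypothetical, and no measured data from the experiment are reproduced.

```python
import math


def localization_error(estimate, truth):
    """Euclidean distance between an estimated and a true 3D point."""
    return math.dist(estimate, truth)


def fraction_below(errors, threshold):
    """Empirical cumulative distribution: share of errors below threshold."""
    if not errors:
        raise ValueError("errors must not be empty")
    return sum(1 for e in errors if e < threshold) / len(errors)


def mean_error(errors):
    """Average localization error."""
    return sum(errors) / len(errors)
```

For a list of per-corner errors in millimeters, `fraction_below(errors, 4.0)` gives the value of the cumulative distribution curve at 4 mm.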

Figure 14.

The cumulative distribution of the localization errors.

The experimental environment has the characteristics of the majority of robot application scenarios in the indoor lab. It is situated in a densely populated area, and the light source is complex, including sunlight and daylight lamps. Figure 15 describes the detection results from four side views in real time. Each column presents a side view of the monitoring area. Each set of two rows forms a group of real-time detection results and masks at a sampling time. Figure 15 clearly indicates that the majority of the errors are generated by the shadow on the floor. The yellow bounding boxes are the detected results of mobile robot by background subtraction. There are obvious tails at the foot of mobile robot. The centroids of the bounding boxes significantly deviate from the ground truths of mobile robot, and the positions of the centroids are lower than the positions of the ground truths. Since the 2D image coordinates of the detected results from four side-view images are lower than the actual image coordinates of mobile robot, the estimated 3D space locations of CPC algorithm are lower than the ground truths of mobile robot accordingly. It is crucial to compensate the errors caused by the shadow on the floor. The error correction mechanism of ICPC algorithm is used to minimize the side effects of shadow detection.

Figure 15.

The detection results and masks of four side views. (a) First view, (b) second view, (c) third view, (d) fourth view, (e) first view, (f) second view, (g) third view, (h) fourth view, (i) first view, (j) second view, (k) third view, (l) fourth view, (m) first view, (n) second view, (o) third view, (p) fourth view, (q) first view, (r) second view, (s) third view, (t) fourth view, (u) first view, (v) second view, (w) third view, and (x) fourth view.

In the “Adaptive background mixture model” section, ABMM is presented to detect the 2D image coordinate of the mobile robot in the area of interest. In the “Common perpendicular centroid” section, CPC is proposed to provide the methodology to fuse the 2D image coordinates from multi-views to acquire the 3D space location of the mobile robot. Then, centroid ABMM CPC (CACPC) is used to localize the 3D location of the centroid of the robot in the world coordinate system. To improve localization accuracy and overcome the errors in 3D space, ICPC is proposed to reduce the side effects in the “Improved common perpendicular centroid” section. Further, centroid ABMM ICPC (CAICPC) method is presented to localize the real-time location of the centroid of the mobile robot.

For comparison with other related works, the principal axis-based tracking (PAT) algorithm¹⁷ is used at the experimental platform. It is a state-of-the-art method for people tracking in computer vision. Since the proposed experiment has a scenario similar to that of the study by Hu et al.,¹⁷ PAT algorithm can be implemented in this platform. Because of the geometric symmetry of the robot in this experiment, the bottom point of the bounding box can be treated as the intersection in one view for a robot. The centroid of these intersections is selected as the ground point of the robot.

The positioning errors are shown in Figure 16. The localization errors of CACPC and CAICPC algorithms are smaller than those of PAT algorithm. In terms of the mean error, the best accuracy is 44.66 mm, achieved by the proposed CAICPC. Some of the localization errors of CAICPC algorithm are less than 10 mm. It clearly indicates that CAICPC method has improved the localization accuracy of CACPC method at every sampling point. Since CAICPC algorithm can compensate the errors caused by shadow detection, the estimated 3D space locations of CAICPC algorithm are higher than those of CACPC algorithm, and the localization errors of CAICPC algorithm are less than those of CACPC algorithm accordingly. Therefore, the overall mean error has around 1 cm improvement. Because PAT algorithm cannot effectively overcome the side effects of shadow detection, its positioning errors are greater than the others. The performance of PAT is not competitive with either of the proposed methods.

Figure 16.

The errors distribution of the localization algorithms.

The cumulative error distribution of the localization algorithms is shown in Figure 17. The localization errors of CACPC and CAICPC algorithms can be limited within 10 cm. It indicates that localizing the centroid part of the robot is more robust than localizing the foot part of the robot. The localization results of the foot part of robot are the worst because the side effects of shadow detection are serious in the experimental environment. The accurate positioning is achieved by the proposed CAICPC algorithm. It can successfully overcome the side effects of shadow detection and provide the real-time 3D space locations of mobile robot. The errors of the other algorithms are relatively greater. The worst results come from PAT, which is far behind the others.

Figure 17.

The cumulative distribution of the localization algorithms.

Experimental platform in WMSNs

As shown in Figure 8, the proposed smart devices architecture of WMSNs is implemented on the self-developed platform. PIXY processes $320 \times 200$ images at 50 fps with 24 clock cycles per pixel. The image size of $320 \times 200$ with the resolution of $640 \times 400$ is adopted for the real-time 2D image coordinates localization. In addition, PIXY requires only 20 ms to fulfill the target detection.

The synchronization message is transmitted to each visual sensor by the coordinator with the interval of 5 s. The key frame is recorded synchronously by the visual sensors with the interval of 500 ms. Further, two key frames are encapsulated into the delivery package, and transmitted to the coordinator at the sensor's own time slot with the interval of 1 s asynchronously. The real-time data are clustered into discriminative groups by rules of spatial and temporal correlations.
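The temporal part of this grouping can be sketched as below. The sketch is hypothetical: the tuple layout `(sensor_id, timestamp_ms, u, v)` and the function name are illustrative assumptions rather than the actual package format, and the spatial-correlation rule is not modeled. Each detection is assigned to the nearest 500 ms key-frame slot, so that detections of the same key frame arriving in different time slots end up in one group.

```python
from collections import defaultdict

KEY_FRAME_INTERVAL_MS = 500  # key-frame interval quoted in the text


def group_by_key_frame(detections, interval_ms=KEY_FRAME_INTERVAL_MS):
    """Group detections from several sensors by key-frame slot.

    detections: iterable of (sensor_id, timestamp_ms, u, v) tuples
                (hypothetical layout, timestamps on a common clock).
    Returns {slot_index: {sensor_id: (u, v)}}.
    """
    groups = defaultdict(dict)
    for sensor_id, timestamp_ms, u, v in detections:
        # Nearest slot, using integer arithmetic to avoid rounding surprises.
        slot = (timestamp_ms + interval_ms // 2) // interval_ms
        groups[slot][sensor_id] = (u, v)
    return dict(groups)
```

A group that holds coordinates from at least two sensors provides at least one pair of rays for localization.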

The predetermined four visual sensors are placed at the four corners of the room. The angles of cameras are predefined. Thus, the two calibration planes are displayed at the centers of multi-views. Figure 18 shows the detected corner points of the low plane with the height of 1630 mm. They clearly indicate that the $12 \times 12$ corner points are covering perfectly the views of cameras. Figure 19 shows the detected results of the high plane with the height of 1732 mm. Even though the positions and angles of four cameras are fixed and the height of calibration plane is modified, the $12 \times 12$ corner points are still covering the views of cameras. The locations of the detected points are around the ground truth of the corner points. If the distances between the camera and corner points are shorter, the density of corner points is greater.

Figure 18.

The corner points of the 1630-mm height calibration board. (a) First view, (b) second view, (c) third view, and (d) fourth view.

Figure 19.

The corner points of the 1732-mm height calibration board. (a) First view, (b) second view, (c) third view, and (d) fourth view.

The angles and orientations of the four cameras are predetermined and well ordered in the indoor environment. The angles and orientations are measured by the two calibration boards. Figures 10, 11, 18, and 19 show the snapshots of four cameras in two platforms. Since the proposed 3D localization algorithms demand object tracking across multiple cameras with overlapping views, the overlapping field of view should be as big as possible. Thus, the calibration boards are employed as a tool to measure the angles and orientations of the four cameras. In the CNs platform, the calibration boards have $1 \times 1$ m² area and are placed at the center of the monitoring region. The cameras need manual adjustment in order to place the two calibration boards at the center of the camera views in Figures 10 and 11. In this way, $1 \times 1$ m² overlapping field of view by the four cameras can be guaranteed at the center region of the monitoring area. Similarly, since the size of 3D space in the WMSNs platform is 3.6 × 4.5 × 2.5 m³, which is larger than that in the CNs platform, the size of calibration boards is 1.5 × 1.5 m², which is also bigger than that in the CNs platform. The cameras need manual adjustment in order to place the two calibration boards at the center of the camera views in Figures 18 and 19. Therefore, 1.5 × 1.5 m² overlapping field of view by the four cameras can be guaranteed at the center region of the monitoring area. As shown in the figures, the calibration boards can fill the entire camera views, which represents the maximum overlapping field of view by the four cameras.

Figure 20 depicts the localization results of CPC algorithm. The estimated 3D space location of any corner point of calibration board is located by four detected corner points of four side-view images in Figure 18 or 19. It is the centroid of six 3D space locations by CPC algorithm. The estimated locations are close to the ground truth of corner points. The shapes of the matrix formed by the estimated locations are close to regular squares. However, because of lens distortion, the localization accuracy at the center is better than that at the edge of the monitoring area. Even in the worst cases, the estimated 3D space locations represented by the red plus markers are clearly around the ground truths of corner points of calibration board represented by the blue circles.

Figure 20.

The localization of CPC algorithm of calibration boards.

Figure 21 describes the localization errors of CPC algorithm. Since the calibration board has obvious feature points, the side effects of the illumination change can be kept to a minimum, and the estimated locations of the detected corner points can be as accurate as possible. The majority of the localization errors are around 5 mm. All the errors are less than 11 mm. The curve shows regular fluctuation. Some of the locations have a violent fluctuation and the rest of the locations are relatively stable. This is due to lens distortion, which has a significant impact on the localization accuracy at the edge of the side-view images. However, the overall localization errors are controlled within a reasonable range.

Figure 21.

The localization errors of corner points.

Figure 22 depicts the cumulative distribution of the localization errors of CPC algorithm. Over 90% of the errors are less than 7 mm. Around 10% of the errors are less than 2 mm. It indicates that CPC algorithm has a good performance for the localization of the 3D space points. Since the calibration board has distinct feature points and there are no side effects of shadow detection, the 3D space points can be accurately located by Harris corner detection technology. CPC algorithm can precisely localize the corner points based on the detected corner points of four side-view images. This localization accuracy is sufficient for most applications.

Figure 22.

The cumulative distribution of the localization errors.

The size of 3D space in the WMSNs platform is 3.6 × 4.5 × 2.5 m³, whereas that in the CNs platform is 2.2 × 2.5 × 1.8 m³. Since the monitoring area in the WMSNs platform is much larger than that in the CNs platform, the localization errors are reasonably bigger. The maximum errors in WMSNs are around 11 mm and the maximum errors in CNs are around 7 mm. The lights and illuminations are also different in the two platforms. Since the lights and illuminations are the major sources of the environmental noises, they can affect the detection results in computer vision techniques. In these experiments, CPC algorithm shows excellent localization performance. Even in the bigger monitoring area, CPC algorithm can keep the average errors within 5 mm. There are no serious errors in the whole process.

Triangulation is a fundamental estimation problem and is important in various engineering applications, for example, surveying, navigation, metrology, astrometry, binocular vision, and target tracking. Triangulation algorithm (TA)³¹ is a conventional 3D localization method and is used for the comparison and performance evaluation with the proposed algorithms.

Figure 23 shows 3D localization results of the mobile robot moving along a circle trajectory in the room. The estimated 3D locations of CPC and ICPC algorithms are both consistent with the trend of the circle trajectory. However, since the false alarms at the target detection have significant impacts on the data fusion of CPC and ICPC algorithms, there are still a few 3D locations far away from the circle trajectory. The estimated locations are slightly lower than the ground truths in the whole process. This is a common phenomenon in the 3D localization, since the object is always viewed by the cameras mounted on the ceiling. 3D localization algorithms cannot determine the exact height of the object, even though ICPC algorithm can use the base plane in Figure 7 to make the estimated locations higher than those of CPC algorithm.

Figure 23.

3D localization of circle trajectory. 3D: three-dimensional.

Figure 24 shows the cumulative distribution of localization errors for the performance evaluation of localization algorithms. All the errors of the proposed CPC and ICPC algorithms are less than 100 mm. Since only 85% of the errors of TA algorithm are less than 100 mm, the localization accuracy of the proposed algorithms is much better than that of the state-of-the-art method. Further, the maximum values of the errors of TA algorithm are more than 500 mm. It indicates that TA algorithm cannot provide smooth localization results. The average value of the errors of the proposed ICPC algorithm is 89.3281 mm, which is the best accuracy among the compared localization algorithms. The localization accuracy of CPC algorithm is slightly worse than that of ICPC algorithm, since the estimated heights of ICPC algorithm are slightly higher than those of CPC algorithm.

Figure 24.

The cumulative distribution of the localization errors of single object.

Figure 25 depicts the localization errors of localization algorithms. The trends of the proposed CPC and ICPC algorithms are smooth and stable. The magnitudes of the errors of TA algorithm are very big. It indicates that TA algorithm cannot provide stable localization results. Especially at the beginning, the localization performance of TA algorithm is very poor due to the changes of lights and illuminations.

Figure 25.

The localization errors of single object.

Figure 26 describes the packet transmission of wireless communication in WMSNs. Because of the instability and package loss of wireless communication, only 24 samplings contain the synchronous detected 2D coordinates from four visual sensors. There are 176 samplings containing the synchronous data from three visual sensors with a missing view. Two hundred and thirty-six samplings only contain the synchronous data from two visual sensors with two missing views. Because of the bottleneck at the coordinator, some of the transmission packages are missing accordingly.²⁰ It can seriously affect the localization accuracy of the proposed CPC and ICPC algorithms, since the proposed algorithms require the detection results from multi-views synchronously. If the received detection results are only from two views, there is only one center point instead of six center points in Figure 7, which makes the estimated location deviate seriously from the ground truth.
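The effect of missing views on the number of center points follows directly from the number of unique pairs of rays. A minimal sketch, using only the sampling counts quoted above (the function and variable names are illustrative):

```python
from math import comb


def center_points(num_views):
    """Number of unique ray pairs, and hence CPC center points, for n views."""
    return comb(num_views, 2)


# Sampling counts reported for Figure 26: {number of views: samplings}.
samplings = {4: 24, 3: 176, 2: 236}

total = sum(samplings.values())
for views, count in sorted(samplings.items(), reverse=True):
    share = 100.0 * count / total
    print(f"{views} views: {center_points(views)} center point(s), "
          f"{count} samplings ({share:.1f}% of the synchronized samplings)")
```

Four views give six center points, three views give three, and two views give a single center point, which leaves nothing to average over.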

Figure 26.

The wireless transmission in WMSNs. WMSN: wireless multimedia sensor network.

Figure 27 shows the multi-object detections from multi-views by the color algorithm. There are two mobile robots moving in the room, which have red and blue helmets fixed at the tripods on the mobile robots. The color algorithm builds the red and blue color models based on the color appearance of the helmets. PIXY visual sensor can build hundreds of color models based on the differences of color appearances. It means PIXY visual sensor can track hundreds of objects based on the differences of the color appearances. Because of the dynamic changes of lights and illuminations, if the color appearance has no salient features, the false alarm rate can be very high, which leads to a poor localization performance. Therefore, red and blue colors are employed in this experiment, since their salient features can be distinguished and detected with a very low false alarm rate. The signatures $s = 1$ and $s = 2$ represent the red and blue color models, respectively, which can be used to match the objects among multi-views. Figure 27(b) and (d) both have the signature $s = 1$, which means that the object with the red color appearance is detected by both the first view and second view cameras. The signature $s = 2$ is only detected by the second view camera. In this particular scenario, the red helmet occludes the blue helmet in the first view. Even though the blue helmet partially appears in the first view, the detected blue pixels cannot reach the threshold of 200 pixels, which is the minimum block of the bounding box, so the color algorithm fails to detect the blue color model in Figure 27(a). When there are four cameras in the multi-view system and an object is occluded in one view, it does not mean that the object is occluded in all the other views. $s = 2$ in Figure 27(d) shows that the blue helmet can be detected successfully by the second view, even though it is occluded and fails to be detected by the first view.

Figure 27(c) shows that some of the detected blue pixels are outside the bounding box. It is very common in the real world. Since the indoor environment cannot exclude all the red and blue colors on purpose, there are always some false alarms in the system. However, because there is a threshold of minimum block of the bounding box, which is 200 pixels, any small part of the detected color pixels can be excluded from the object detection successfully. It leads to a very low false alarm rate.
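Matching objects among views by signature, without image matching, can be sketched as follows. The data layout and function names are hypothetical, and the pixel coordinates in the usage note are made up for illustration; the requirement of at least two views reflects the fact that a center point needs one pair of rays.

```python
from collections import defaultdict


def match_by_signature(view_detections):
    """Regroup per-view detections by color signature.

    view_detections: {view_id: {signature: (u, v)}}, where (u, v) is the
                     bounding-box centroid in that view (hypothetical layout).
    Returns {signature: {view_id: (u, v)}}.
    """
    by_signature = defaultdict(dict)
    for view_id, detections in view_detections.items():
        for signature, centroid in detections.items():
            by_signature[signature][view_id] = centroid
    return dict(by_signature)


def localizable(by_signature, min_views=2):
    """Keep only the signatures seen by enough views to form a ray pair."""
    return {s: views for s, views in by_signature.items()
            if len(views) >= min_views}
```

For the situation in Figure 27, where the first view reports only $s = 1$ and the second view reports both signatures, `match_by_signature({1: {1: (150, 90)}, 2: {1: (170, 95), 2: (120, 100)}})` groups the red helmet under two views and the blue helmet under one, so the blue helmet needs a detection from one of the remaining cameras before it can be localized.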

Figure 27.

The multi-object detections from multi-views by the color algorithm. (a) Cooked image of the first view, (b) bounding box of the first view, (c) cooked image of the second view, and (d) bounding box of the second view.

Figure 28 depicts the multi-object tracking trajectories of the proposed algorithms. In order to distinguish between two objects, the heights of two objects are 1773 and 1643 mm. The two black cross trajectories show the ground truths of two objects. When two objects are moving along the cross paths simultaneously, they can occlude each other in front of the cameras. Even though an object is occluded by the other object in one view, it can still be detected by the other three cameras. It means the multi-view monitoring system can overcome occlusions. Furthermore, the cross trajectories include several sharp turns, which can be employed to evaluate the localization performance in difficult circumstances. The estimated locations of the proposed algorithms are close to the ground truths. It clearly shows that the red and blue markers form the cross shapes at the high and low levels. It indicates that the proposed algorithms are suitable for localization of multiple objects.

Figure 28.

3D localization of cross trajectory. 3D: three-dimensional.

Figure 29 depicts the cumulative distribution of localization errors of the proposed algorithms. Around 60% of the errors of the proposed algorithms are within 100 mm. Around 90% of the errors of the proposed algorithms are within 400 mm. However, some errors are still more than 500 mm. TA algorithm can only keep 20% of the errors within 300 mm, and almost half of its errors are over 400 mm. The average errors of the proposed algorithms are both around 190 mm. Many transmission packages are missing due to the bottleneck at the coordinator, which can seriously affect the localization performance of the proposed algorithms. The average localization error of TA algorithm is 366.72 mm, which is still the worst localization accuracy.

Figure 29.

The cumulative distribution of the localization errors of single object.

Figure 30 depicts the error distribution of the proposed algorithms. The fluctuation is similar to the trends of one object. The proposed algorithms provide better localization accuracy than TA algorithm at every sampling point. It clearly indicates that the proposed algorithms always perform better than TA algorithm.

Figure 30.

The localization errors of single object.

Figure 31 depicts the wireless communication environment. The majority of samplings have received transmission packages from two views. Only eight samplings have received data from three views. It means that, for most samplings, the proposed algorithms can only provide one center point in Figure 7. It greatly affects the localization accuracy of the proposed algorithms. In particular, ICPC algorithm cannot function well, since there are no center points above the base plane. It leads to the same estimated location for CPC and ICPC algorithms, as shown in Figure 29.

Figure 31.

The wireless transmission in WMSNs. WMSN: wireless multimedia sensor network.
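With data from only two views there is a single pair of back-projected rays and therefore a single common perpendicular, whereas three views give three pairs. The sketch below illustrates this counting under the assumption that each pair of views contributes the midpoint of the common perpendicular between its two rays and that the estimate is the centroid of those midpoints. The function names and the (origin, direction) ray representation are illustrative and are not taken from the implementation used in the experiments.

```python
from itertools import combinations


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def common_perpendicular_midpoint(p1, d1, p2, d2, eps=1e-12):
    """Midpoint of the shortest segment joining lines p1 + s*d1 and p2 + t*d2.

    Returns None when the two lines are (nearly) parallel.
    """
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom  # parameter of the foot point on line 1
    t = (a * e - b * d) / denom  # parameter of the foot point on line 2
    q1 = tuple(p + s * k for p, k in zip(p1, d1))
    q2 = tuple(p + t * k for p, k in zip(p2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))


def pairwise_midpoints(rays):
    """One midpoint per pair of non-parallel rays; rays are (origin, direction)."""
    points = []
    for (p1, d1), (p2, d2) in combinations(rays, 2):
        m = common_perpendicular_midpoint(p1, d1, p2, d2)
        if m is not None:
            points.append(m)
    return points


def centroid(points):
    """Arithmetic mean of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical rays that all pass through the point (1, 2, 3).
rays = [
    ((0.0, 2.0, 3.0), (1.0, 0.0, 0.0)),
    ((1.0, 0.0, 3.0), (0.0, 1.0, 0.0)),
    ((1.0, 2.0, 0.0), (0.0, 0.0, 1.0)),
]
print(len(pairwise_midpoints(rays[:2])))  # number of midpoints from two views
print(len(pairwise_midpoints(rays)))      # number of midpoints from three views
print(centroid(pairwise_midpoints(rays)))
```

Under this assumption, a sample with two views leaves the centroid step with a single point to average, so there is nothing for it to refine.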

Conclusions

A 3D localization algorithm based on the concept of the common perpendicular has been presented for the experimental platforms of CNs and WMSNs. It is extended to the dynamic localization of mobile robots, in the form of single- or multi-object tracking. The proposed single-object tracking relies on background subtraction, which has been shown to be valid for suitable camera setups and object motions. The proposed multi-object tracking employs the color algorithm to build color models based on color appearance. The performance of the localization algorithm has been demonstrated not only for camera calibration under different types of lights and illuminations but also for real-time single-object and multi-object 3D localization from multiple views. The experimental results show that it achieves high localization accuracy, overcomes the side effects of shadow detection, and can be extended to follow-up experiments. Future work will cover extending the proposed algorithm to improve the smoothness and continuity of trajectories by means of filters, as well as a comparative study of these approaches for WMSN-based 3D localization and collaborative calibration.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China under grant no. 61772018, the General Research Project of Zhejiang Provincial Department of Education under grant no. Y201839944, the Public Welfare Technology Research Project of Zhejiang Province under grant no. GG19F020015, the Public Welfare Technology Application Research Project of Shaoxing City under grant no. 2018026059, and the Research Foundation for Talented Scholars of Shaoxing University under grant no. 20185001.

References

1. Lin. Nus-pro: a new visual tracking challenge. IEEE Trans Pattern Anal Mach Intell 2016; 38(2): 335–349.

2. Mazzu, Morerio, Marcenaro. A cognitive control-inspired approach to object tracking. IEEE Trans Image Process 2016; 25(6): 2697–2711.

3. Jiang, Wang. Individual adaptive metric learning for visual tracking. Neurocomputing 2016; 191: 273–285.

4. Shen, Dick. Online metric-weighted linear representations for robust visual tracking. IEEE Trans Pattern Anal Mach Intell 2015; 38(5): 931–950.

5. Shen, Sui, Pan. Adaptive pedestrian tracking via patch-based features and spatial–temporal similarity measurement. Pattern Recognit 2015; 53: 163–173.

6. Otero, Sánchez. Local iterative DLT soft-computing vs. interval-valued stereo calibration and triangulation with uncertainty bounding in 3D reconstruction. Neurocomputing 2015; 167: 44–51.

7. Razavi, Valkama, Lohan. Robust statistical approaches for RSS-based floor detection in indoor localization. Sensors 2016; 16(6): 793.

8. Ishizu, Seo, Igarashi. Noninvasive localization of accessory pathways in Wolff-Parkinson-White syndrome by three-dimensional speckle tracking echocardiography. Circ Cardiovasc Imaging 2016; 9(6).

9. Zheng, Zhou, Tang. A 3D indoor positioning system based on low-cost MEMS sensors. Simul Model Pract Theory 2016; 65: 45–56.

10. Filko, Cupec, Nyarko. Evaluation of color and texture descriptors for matching of planar surfaces in global localization scheme. Robot Auton Syst 2016; 80(C): 55–68.

11. Sánchez, Taddei, Ceriani. Localization and tracking in known large environments using portable real-time 3D sensors. Comput Vision Image Underst 2016; 149: 197–208.

12. De Silva, Uneri, Ketcha. 3D–2D image registration for target localization in spine surgery: investigation of similarity metrics providing robustness to content mismatch. Phys Med Biol 2016; 61(8): 3009–3025.

13. Wen, Cheng, Wang. Court reconstruction for camera calibration in broadcast basketball videos. IEEE Trans Visual Computer Graphics 2016; 22(5): 1517–1526.

14. Heinze, Spyropoulos, Hussmann. Automated robust metric calibration algorithm for multifocus plenoptic cameras. IEEE Trans Instrum Measure 2016; 65(5): 1197–1205.

15. Wang, Fan. Photometric calibration and image stitching for a large field of view multi-camera system. Sensors 2016; 16(4): 516.

16. Karakaya. Collaborative localization in visual sensor networks. ACM Trans Sens Netwk 2014; 10(2): 127–146.

17. Zhou. Principal axis-based correspondence between multiple cameras for people tracking. IEEE Trans Pattern Anal Mach Intell 2006; 28(4): 663–671.

18. Sheng, Zhang. Dynamic localization of mobile robot based on asynchronous Kalman filter. Dongbei Daxue Xuebao J North Univ 2013; 34(3): 312–316.

19. Rashvand, Abedi, Alcaraz-Calero. Wireless sensor systems for space and extreme environments: a review. IEEE Sens J 2014; 14(11): 3955–3970.

20. Feng, Zhang. Grid-based improved maximum likelihood estimation for dynamic localization of mobile robots. Int J Distrib Sens Netwk 2014; 10(3): 1–15.

21. Berclaz, Fleuret, Turetken. Multiple object tracking using k-shortest paths optimization. IEEE Trans Pattern Anal Mach Intell 2011; 33(9): 1806–1819.

22. Milan, Roth, Schindler. Continuous energy minimization for multitarget tracking. IEEE Trans Pattern Anal Mach Intell 2014; 36(1): 58–72.

23. Piciarelli, Esterle, Khan. Dynamic reconfiguration in camera networks: a short survey. IEEE Trans Circ Syst Video Technol 2016; 26(5): 965–977.

24. Yan, Ren. Where am I in the dark: exploring active transfer learning on the use of indoor localization based on thermal imaging. Neurocomputing 2016; 173: 83–92.

25. Jadaliha, Choi. Feature selection for position estimation using an omnidirectional camera. Image Vision Comput 2015; 39: 1–9.

26. Ahmed. An optimal complexity H.264/AVC encoding for video streaming over next generation of wireless multimedia sensor networks. Signal Image Video Proc 2016; 10(6): 1143–1150.

27. Alhilal, Soudani, Al-Dhelaan. Image-based object identification for efficient event-driven sensing in wireless multimedia sensor networks. Int J Distrib Sens Netwk 2015; 2015: 24.

28. Pahlavan, Krishnamurthy, Geng. Localization challenges for the emergence of the smart world. IEEE Access 2015; 3: 3058–3067.

29. Rowe, Goode, Goel. CMUcam3: an open programmable embedded vision sensor. Tech Rep, RI-TR-07-13, Carnegie Mellon Robotics Institute, 2007.

30. Kaewtrakulpong, Bowden. An improved adaptive background mixture model for real-time tracking with shadow detection. Boston, MA: Springer, 2002.

31. Houssineau, Clark, Ivekovic. A unified approach for multi-object triangulation, tracking and camera calibration. IEEE Trans Signal Proc 2016; 64(11): 2934–2948.

32. Ben-Afia, Deambrogio, Salós. Review and classification of vision-based localisation techniques in unknown environments. IET Radar Sonar Navi 2014; 8(9): 1059–1072.

33. Hartley, Zisserman. Multiple view geometry in computer vision. Cambridge, United Kingdom: Cambridge University Press, 2001, pp. 233–236.

34. Panagiotakis, Argyros. Parameter-free modelling of 2D shapes with ellipses. Pattern Recognit 2016; 53: 259–275.

35. Masching, Bletzinger. Parameter free structural optimization applied to the shape optimization of smart structures. Finite Elem Anal Des 2016; 111: 33–45.

36. Davoudi, Sadeh, Kamyab. Parameter-free fault location for transmission lines based on optimisation. IET Gen Trans Distrib 2015; 9(11): 1061–1068.

37. Xie. An efficient convex constrained weighted least squares source localization algorithm based on TDOA measurements. Signal Proc 2016; 119(C): 142–152.

38. Feng, Zhang. Dynamic localization of mobile robot based on asynchronous Kalman filter. J North Univ (Natural Sci) 2013; 3: 312–316.

39. Djelouah, Franco, Boyer. Sparse multi-view consistency for object segmentation. IEEE Trans Pattern Anal Mach Intell 2015; 37(9): 1890–1903.

40. Dinh, Inanc. Low cost mobile robotics experiment with camera and sonar sensors. In: American control conference 2009, St. Louis, MO, 10–12 June 2009, pp. 3793–3798. New York: IEEE.

41. Wibowo, Purwacandra. Object tracking using initial data to count object image based-on wireless sensor network. Adv Sci Lett 2015; 21(1): 112–116.

42. Rinner, Esterle, Simonjan. Self-aware and self-expressive camera networks. Computer 2015; 48(7): 21–28.

43. Shen, Bai. Routing in wireless multimedia sensor networks: a survey and challenges ahead. J Netwk Comput Appl 2016; 71: 30–49.