A robust and accurate camera pose determination method based on geometric optimization search using Internet of Things

Abstract

We propose a robust and accurate camera pose determination method based on geometric optimization search using the Internet of Things (IoT). The central idea is to (1) obtain image information through IoT technology, (2) obtain a first pose by minimizing the error function, and (3) use geometric relationships and constraint conditions to obtain appropriate attitude angles as a new initial value for the next iteration. The method has two main features. First, it can deal with a large amount of uncertain data, such as arbitrary shooting angles, arbitrary reference points, and a small number of feature points. Second, because it uses IoT technology, it can complete data processing and transmission quickly. Experimental results show that, compared with state-of-the-art methods, our approach performs well on both synthetic and real data and can provide accurate and stable data for subsequent applications.

Keywords

Camera pose determination, Internet of Things, uncertain data, pose ambiguity, arbitrary initial value

Introduction

With the wide application of image sensors in various fields, a large amount of image information is transmitted through the Internet of Things (IoT). Meanwhile, with the development of image data acquisition and processing techniques, there is an increasing demand for processing uncertain data. Real-time positioning of the camera has therefore become increasingly important. Camera pose determination, also called pose estimation, restores the pose of a photograph from a series of spatial reference points and their corresponding image points. It plays an important role in computer vision¹–³ and photogrammetry,⁴ and pose estimation with visual sensors is widely applied in robot control,⁵ unmanned aerial vehicles (UAVs),⁶ augmented reality,⁷ aerospace,⁸ and other areas. In practice, all these applications require the pose determination algorithm to give robust solutions at arbitrary camera attitude angles (especially large angles), with few feature points (n ≤ 5), and for both planar and non-planar point configurations. In addition, data acquisition must be completed rapidly over a wide area, for example by distributed acquisition, and data processing needs cloud servers to coordinate the work, which speeds up processing and improves efficiency.

Existing approaches to this problem are generally classified as direct methods and iterative methods. Direct methods are fast because they are linear. However, they may be unstable because they ignore some constraints, especially when the data are noisy. The classical direct method is the direct linear transformation (DLT) proposed by Abdel-Aziz et al.,⁹ which solves for a set of variables from a set of equations. Because the orthogonality constraints are neglected, this method requires a large number of known reference points as control points. Hu and Wu¹⁰ obtained an unstable linear solution from four points. By exploiting data redundancy, Quan and Lan¹¹ proposed a family of linear solutions. Ansar and Daniilidis¹² developed linear solutions for both n points and n lines. Hmam and Kim¹³ proposed a global optimization solution using a semi-definite positive relaxation (SDR) program. Its main drawback is that the relaxation lacks tightness and that it depends on an off-the-shelf SDR solver, which makes it unsuitable for real-time applications. Tang et al.¹⁴ used five points to obtain a linear solution. Zheng et al.¹⁵ solved the functional minimization problem with the Gröbner basis technique to find the globally optimal solution. Lepetit et al.¹⁶ proposed an efficient and accurate method named EPnP (efficient perspective-n-point) with a time complexity of $O (n)$ , which is considered to be the best direct method. Its basic idea is to represent the coordinates of the reference points through four virtual control points, whose coordinates in the camera coordinate system are solved from the image points. However, it is not suitable for the case where the reference points are coplanar. In brief, most direct methods are sensitive to data noise because they ignore the non-linear relationship and lack constraints when the number of reference points is insufficient.

When redundant reference points are not available, higher accuracy can be achieved through an iterative algorithm. Oberkampf et al.,¹⁷ Dementhon and Davis,¹⁸ and Gramegna et al.¹⁹ proposed the POSIT method, which solves camera pose determination by repeatedly updating the scale factor of the orthogonal projection. Most iterative algorithms transform camera pose determination into a non-linear least-squares problem solved by non-linear optimization. Garro et al.²⁰ proposed establishing a minimum residual error function in image space. An orthogonal iterative (OI) algorithm that establishes the error function in the reference point space was proposed by Lu et al.²¹ This method offers high accuracy and fast convergence, but it needs a good initial value to converge to the correct solution; if the initialization is not appropriate, the result is easily trapped in a local extremum. Schweighofer and Pinz²² then proposed an improved method that considers multiple solutions but is only suitable for planar reference points. Olsson et al.²³ presented a branch-and-bound method to search for the globally optimal solution. Hartley and Kahl²⁴ used the infinity norm to solve the global optimization. Unfortunately, these algorithms can hardly be used in practical applications because of their large computational cost. In summary, although the accuracy of iterative algorithms is usually better than that of direct methods, they also have some shortcomings: (1) An iterative algorithm usually needs an appropriate initialization to find the correct solution; otherwise, it may fall into a local extremum. (2) Most existing pose determination methods are not suitable for both coplanar and non-coplanar reference points. (3) The computational cost is relatively high, particularly for some robust algorithms,²³,²⁴ which need to search the whole solution space for the optimal solution.

Based on the above description and discussion, we propose a robust and accurate camera pose determination method based on geometric optimization search using the IoT. This method uses a non-metric camera to obtain image information and IoT technology to transfer and process image data. In data processing, the proposed iterative method uses the object space collinearity error, directly computes the rotation matrix, and is suitable for planar as well as non-planar reference point configurations. The central idea contains three main steps: (1) obtain image information through IoT technology; (2) estimate a first pose by minimizing an object space collinearity error as the error function; (3) use geometric relationships and constraint conditions to obtain appropriate attitude angles as a new initial value for the next iteration. Finally, the final pose is checked through residual error comparison and physical meaning.

The outline of the rest of this article is structured as follows: section “Methodology” introduces the methodology that includes the image acquisition and transmission and the optimal value for pose. In section “Pose ambiguity,” we briefly explain the characteristic geometric relationship between the correct solution and the local extremum of error function. Section “Robust pose determination algorithm” proposes a robust and accurate camera pose determination method. The experimental results on synthetic and real data are presented in section “Experiments.” Section “Conclusion” presents the conclusion.

Methodology

Image acquisition and processing based on IoT

With the development of computer vision and visual measurement, there is an increasing need for real-time camera pose determination. Higher requirements are put forward for speed of image transmission and processing. Therefore, the image acquisition and processing methods are based on the IoT.

Site personnel use a non-metric camera to collect image information in a distributed way, transmit it to a mobile phone via Bluetooth, and then upload it to the cloud through the 4G network. Data processing personnel retrieve the image information from the cloud on a PC for processing, and the processed data are finally returned to the field staff’s mobile phone. In this way, the pose estimation of the camera can be completed quickly, which meets the engineering requirements of fast data processing and fast data transfer. The IoT flowchart for image acquisition and processing is shown in Figure 1.

Figure 1.

IoT flowchart for image acquisition and processing.

The camera model

The goal of pose determination is to find the six exterior parameters of the camera relative to a reference frame, namely three position coordinates $p_{s} = (X_{S}, Y_{S}, Z_{S})^{T}$ and three attitude angles $R = f (α, β, γ)$ , from a set of space reference points and their corresponding image points.

As shown in Figure 2, let $t = (t_{x}, t_{y}, 1)^{T}$ be the coordinates of an image point in the normalized image plane, and let $p_{i} = (X_{i}, Y_{i}, Z_{i})^{T}$ , $i = 1, 2, …, n$ with $n > 3$ , be a set of non-collinear three-dimensional (3D) reference points expressed in an object reference frame. The collinearity equation in photogrammetry can be expressed as follows

t = \frac{q}{q_{(3)}}

(1)

q = R (p - p_{s})

(2)

where q is the projection point, q₍₃₎ is the third component of vector q, and R is the rotation matrix of 3 × 3.

Figure 2.

Object space collinearity error.

Another way to represent collinearity is that the orthogonal projection of q in the direction of t should be equal to itself. Specific expressions are presented using the following equations

q = Vq

(3)

R (p - p_{s}) = VR (p - p_{s})

(4)

where $V = t t^{T} / t^{T} t$ is the projection operator.²⁵
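
Equations (3) and (4) can be checked numerically with a few lines of code. The following is a minimal sketch in plain Python, not the authors' implementation; the helper names `projection_operator` and `mat_vec` and the sample point `q` are illustrative assumptions.

```python
def projection_operator(t):
    """Return V = t t^T / (t^T t) as a 3x3 nested list."""
    norm_sq = sum(c * c for c in t)
    return [[a * b / norm_sq for b in t] for a in t]


def mat_vec(A, v):
    """Multiply a 3x3 matrix (nested list) by a 3-vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Illustrative camera-frame point q and its normalized image point t = q / q_(3),
# as in equation (1). The numbers are arbitrary.
q = [0.4, -0.2, 2.0]
t = [c / q[2] for c in q]

V = projection_operator(t)
Vq = mat_vec(V, q)

# With noise-free data q lies on the line of sight through t, so q - Vq vanishes.
residual = max(abs(a - b) for a, b in zip(q, Vq))
print("max |q - Vq| =", residual)
```

When the image point is perturbed by measurement noise, `q` no longer lies on the line of sight and the residual becomes non-zero, which is the quantity penalized by the error function.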

The error function

Because of measurement error, equation (4) does not hold exactly. We therefore define the error function as the sum of squares of the resulting collinearity errors, which is to be minimized

E = \sum_{i = 1}^{n} {‖ R (p_{i} - p_{s}) - V_{i} q_{i} ‖}^{2} = \sum_{i = 1}^{n} {‖ (I - V_{i}) R (p_{i} - p_{s}) ‖}^{2}

(5)

where n is the number of reference points and i is the index of a reference point.
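
As an illustration of equation (5), the sketch below evaluates the object space collinearity error for a given pose. It is a hedged example in plain Python under stated assumptions: the function name `collinearity_error`, the identity rotation, and the camera position `p_s_true` are chosen for illustration only; the four coplanar reference points are the square used later in the pose ambiguity section.

```python
def projection_operator(t):
    """Return V = t t^T / (t^T t) as a 3x3 nested list."""
    norm_sq = sum(c * c for c in t)
    return [[a * b / norm_sq for b in t] for a in t]


def mat_vec(A, v):
    """Multiply a 3x3 matrix (nested list) by a 3-vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def collinearity_error(R, p_s, points, image_points):
    """Sum over i of ||(I - V_i) R (p_i - p_s)||^2, as in equation (5)."""
    total = 0.0
    for p, t in zip(points, image_points):
        q = mat_vec(R, [a - b for a, b in zip(p, p_s)])
        Vq = mat_vec(projection_operator(t), q)
        total += sum((a - b) ** 2 for a, b in zip(q, Vq))
    return total


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0], [1.0, -1.0, 0.0]]
p_s_true = [0.0, 0.0, -5.0]  # assumed position, chosen so that all depths are positive

# Noise-free image points generated with equations (1) and (2).
image_points = []
for p in points:
    q = mat_vec(IDENTITY, [a - b for a, b in zip(p, p_s_true)])
    image_points.append([c / q[2] for c in q])

error_at_truth = collinearity_error(IDENTITY, p_s_true, points, image_points)
error_shifted = collinearity_error(IDENTITY, [0.1, 0.0, -5.0], points, image_points)
print(error_at_truth, error_shifted)
```

The error vanishes at the true pose and grows once the position is shifted, which is the behaviour the minimization relies on.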

The optimal value for position

Note that the error function is quadratic in the position. For a fixed attitude, the optimal camera position can therefore be calculated in closed form by setting the derivative of function (5) with respect to $p_{s}$ to zero, which gives

\begin{matrix} p_{s} (R) = {[\sum_{i = 1}^{n} R^{T} {(I - V_{i})}^{T} (I - V_{i}) R]}^{- 1} \\ [\sum_{i = 1}^{n} R^{T} {(I - V_{i})}^{T} (I - V_{i}) R p_{i}] \end{matrix}

(6)

which leads to an error function only determined by the rotation R.
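
The closed-form position of equation (6) can be sketched as follows. This is an illustrative plain-Python example, not the authors' code: the helper names, the Cramer's-rule solver `solve3`, and the test pose are assumptions. It uses the fact that $I - V_{i}$ is a symmetric projector, so ${(I - V_{i})}^{T} (I - V_{i}) = I - V_{i}$ .

```python
import math


def projection_operator(t):
    """Return V = t t^T / (t^T t) as a 3x3 nested list."""
    norm_sq = sum(c * c for c in t)
    return [[a * b / norm_sq for b in t] for a in t]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def solve3(A, b):
    """Solve the 3x3 system A x = b with Cramer's rule."""
    d = det3(A)
    x = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = b[i]
        x.append(det3(Aj) / d)
    return x


def optimal_position(R, points, image_points):
    """Evaluate p_s(R) from equation (6) for a fixed rotation R."""
    Rt = transpose(R)
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, t in zip(points, image_points):
        V = projection_operator(t)
        W = [[(1.0 if i == j else 0.0) - V[i][j] for j in range(3)]
             for i in range(3)]
        # (I - V)^T (I - V) = I - V because I - V is a symmetric projector.
        K = mat_mul(mat_mul(Rt, W), R)
        Kp = mat_vec(K, p)
        for i in range(3):
            b[i] += Kp[i]
            for j in range(3):
                A[i][j] += K[i][j]
    return solve3(A, b)


# Assumed test pose: rotation of 0.3 rad about the Z-axis and an arbitrary position.
c, s = math.cos(0.3), math.sin(0.3)
R_true = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
p_s_true = [0.3, -0.2, -5.0]
points = [[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0], [1.0, -1.0, 0.0]]

image_points = []
for p in points:
    q = mat_vec(R_true, [a - b for a, b in zip(p, p_s_true)])
    image_points.append([v / q[2] for v in q])

p_s_est = optimal_position(R_true, points, image_points)
print(p_s_est)
```

With noise-free image points and the true rotation, the recovered position agrees with `p_s_true` up to rounding.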

The optimal value for attitude

We need to find the best position and attitude through the corresponding reference point 3D coordinates $p_{i}$ and projection points q.

This is a constrained least-squares problem²⁶ that can be solved by singular value decomposition (SVD). First, the data need to be centralized. The following representations are used to process the data

\begin{matrix} {\bar{p}}_{i} = p_{i} - \frac{1}{n} \sum_{i = 1}^{n} p_{i} \\ {\bar{q}}_{i} = q_{i} - \frac{1}{n} \sum_{i = 1}^{n} q_{i} \end{matrix}

(7)

where ${\bar{p}}_{i}$ and ${\bar{q}}_{i}$ are the centralized data.

In this case, the sample cross-covariance matrix for $p_{i}$ and $q_{i}$ can be calculated as follows

M = \sum_{i = 1}^{n} {\bar{q}}_{i} {\bar{p}}_{i}^{T}

(8)

According to Horn’s method,²⁶ the error function is minimized when the rotation matrix satisfies

R^{*} = \arg \max_{R} tr (R^{T} M)

(9)

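The SVD of M is $U^{T} MV = \Sigma$ , where U and V here denote the orthogonal factors of the decomposition (V is not the projection operator of equation (4)) and $\Sigma$ is the diagonal matrix of singular values. Then the optimal value for the rotation matrix can be expressed as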

R^{*} = U V^{T}

(10)
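
The step from equation (9) to equation (10) can be made explicit with a short derivation. The following is a sketch of the standard argument; the auxiliary matrix Z is notation introduced here and does not appear in the source.

```latex
\begin{aligned}
M &= U \Sigma V^{T}, \qquad \Sigma = \mathrm{diag}(\sigma_{1}, \sigma_{2}, \sigma_{3}), \quad \sigma_{k} \geq 0 \\
\mathrm{tr}(R^{T} M) &= \mathrm{tr}(R^{T} U \Sigma V^{T})
  = \mathrm{tr}(V^{T} R^{T} U \Sigma)
  = \mathrm{tr}(Z \Sigma)
  = \sum_{k = 1}^{3} \sigma_{k} z_{kk}, \qquad Z = V^{T} R^{T} U \\
z_{kk} &\leq 1 \ \text{because } Z \text{ is orthogonal}
  \;\Rightarrow\; \mathrm{tr}(R^{T} M) \leq \sum_{k = 1}^{3} \sigma_{k},
  \ \text{with equality for } Z = I \\
Z = I &\;\Rightarrow\; R^{T} = V U^{T} \;\Rightarrow\; R^{*} = U V^{T}
\end{aligned}
```

One caveat that the text does not discuss: $U V^{T}$ is orthogonal but may have determinant −1, in which case it is a reflection and not a rotation. A common remedy, stated here as an assumption and not as part of the original method, is to use $R^{*} = U \, \mathrm{diag}(1, 1, \det (U V^{T})) \, V^{T}$ .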

The rotation matrix is then converted into Euler angles, which have a direct geometric meaning. The Euler angles that represent the attitude of an object relative to a reference frame are three angles, namely, $α, β, γ$ , obtained by rotating sequentially about the Y-, X-, and Z-axis. They are recovered from the elements of the rotation matrix as follows

\begin{matrix} α = \arctan (\frac{- R_{13}}{R_{33}}) \\ β = \arcsin (- R_{23}) \\ γ = \arctan (\frac{R_{21}}{R_{22}}) \end{matrix}

(11)
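
Equation (11) can be sketched in code as follows. This is a hedged illustration: `math.atan2` is used in place of the plain arctangent so that the quadrant is preserved (the two agree when the denominators are positive), and the helper `rotation_from_euler` encodes one Y-X-Z convention chosen here because it is consistent with equation (11); the source does not spell out the forward formula, so that helper is an assumption.

```python
import math


def euler_from_rotation(R):
    """Recover (alpha, beta, gamma) in radians from R using equation (11)."""
    alpha = math.atan2(-R[0][2], R[2][2])
    beta = math.asin(-R[1][2])
    gamma = math.atan2(R[1][0], R[1][1])
    return alpha, beta, gamma


def rotation_from_euler(alpha, beta, gamma):
    """Assumed forward convention (Y, then X, then Z) consistent with equation (11)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cg - sa * sb * sg, -ca * sg - sa * sb * cg, -sa * cb],
        [cb * sg, cb * cg, -sb],
        [sa * cg + ca * sb * sg, -sa * sg + ca * sb * cg, ca * cb],
    ]


# Round trip for a large-angle attitude.
angles_deg = (60.0, 40.0, 20.0)
R = rotation_from_euler(*(math.radians(a) for a in angles_deg))
recovered_deg = tuple(math.degrees(a) for a in euler_from_rotation(R))
print(recovered_deg)
```

The round trip returns the input angles up to rounding as long as $β$ stays away from ±90°, where the denominators in equation (11) vanish.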

Pose ambiguity

Geometrical explanation

We use a geometric relation to illustrate that a local minimum of the error function leads to an ambiguous pose estimate. Figure 3 shows the camera at three different locations, together with the corresponding image planes and a coplanar reference point model. The center of the coplanar model, C_M, is assumed to coincide with the origin of the object reference coordinate system. When the camera rotates clockwise around the X-axis from point C to point C₁, the pose C₁ can be obtained by minimizing the error function. However, it can be seen from Figure 3 that the pose C₂, reached by rotating counterclockwise around the X-axis, may also minimize the error function, which results in the singularity of pose estimation. Therefore, inadequate error function constraints and non-linearity lead to multiple solutions of pose estimation.

Figure 3.

Multiple solutions of pose estimation using geometric relation.

Next, we vary only one angle and fix the other parameters. The reference points are set to $p_{1} = [1, 1, 0]^{T}$ , $p_{2} = [- 1, 1, 0]^{T}$ , $p_{3} = [- 1, - 1, 0]^{T}$ , $p_{4} = [1, - 1, 0]^{T}$ , and the camera’s position is $p_{s} = (0, 0, 5)^{T}$ . Figure 4 displays the error function for different rotation angles around the Y-axis with zero noise. When the rotation angles are α = 40°, 50°, 60°, and 70°, local minima of the error function are found at −34.38°, –42.56°, –51.49°, and −60.82°, respectively. However, when the rotation angle is less than 38.6°, the error function has no local minimum.

Figure 4.

Error function values of different rotation angles.

The error function values in the X- and Y-axis directions are shown in Figure 5 for the case where the attitude angle around the Z-axis is fixed at 20° and the attitude angles $α$ and $β$ are small. Figure 6 shows the distribution of error function values when the angles are large. When the attitude angles are small, (α = 20°, β = 10°) and (α = 30°, β = 25°), the error function has only one minimum and no local minimum. However, when the attitude angles increase to (α = 60°, β = 40°) and (α = 70°, β = 50°), the error function has a local minimum at (α = –56.62°, β = –38.58°) and (α = –68.76°, β = –46.02°), respectively, which may lead to the failure of pose estimation.

Figure 5.

Distribution of error function under small attitude angles: (a) α = 20°, β = 10°; (b) α = 30°, β = 25°.

Figure 6.

Distribution of error function under large attitude angles: (a) α = 60°, β = 40°; (b) α = 70°, β = 50°.

Numerical calculation

There are at most four solutions to the problem of camera pose determination.²⁷,²⁸ In previous studies, the distance between the camera center and the reference points is described by polynomials, whose solutions are used to judge the solutions of pose determination.

However, because the judgment criteria are complex, this approach is very difficult to apply in engineering practice. In this article, we propose a numerically stable algorithm that converges rapidly to the correct solution by using attitude angles with definite geometric characteristics. The stationary points of the error function can be found by the interval splitting approach.²⁹ First, the camera’s three attitude angles are split into intervals in the range of 0°–90°. Then the stationary points of the error function are estimated by Jacobian and Bernstein expansions. Because the rotation matrix expressed in Euler angles contains trigonometric functions, the Euler angles are converted to a quaternion in order to simplify the computation. The rotation matrix in quaternion form is as follows²⁶

R = [\begin{matrix} {r_{0}}^{2} + {r_{1}}^{2} - {r_{2}}^{2} - {r_{3}}^{2} & 2 (r_{1} r_{2} - r_{0} r_{3}) & 2 (r_{1} r_{3} + r_{0} r_{2}) \\ 2 (r_{1} r_{2} + r_{0} r_{3}) & {r_{0}}^{2} - {r_{1}}^{2} + {r_{2}}^{2} - {r_{3}}^{2} & 2 (r_{2} r_{3} - r_{0} r_{1}) \\ 2 (r_{3} r_{1} - r_{0} r_{2}) & 2 (r_{2} r_{3} + r_{0} r_{1}) & {r_{0}}^{2} - {r_{1}}^{2} - {r_{2}}^{2} + {r_{3}}^{2} \end{matrix}]

(12)

where $r_{0}, r_{1}, r_{2}, r_{3}$ are the components of the unit quaternion that represents the rotation.
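
Equation (12) translates directly into code. The sketch below is illustrative; the function name is an assumption, and the explicit normalization of the quaternion is added here so that the result is a rotation matrix even for non-unit input.

```python
import math


def rotation_from_quaternion(r0, r1, r2, r3):
    """Build the rotation matrix of equation (12) from a quaternion."""
    # Normalization is an added safeguard; equation (12) assumes a unit quaternion.
    n = math.sqrt(r0 * r0 + r1 * r1 + r2 * r2 + r3 * r3)
    r0, r1, r2, r3 = r0 / n, r1 / n, r2 / n, r3 / n
    return [
        [r0 * r0 + r1 * r1 - r2 * r2 - r3 * r3,
         2 * (r1 * r2 - r0 * r3),
         2 * (r1 * r3 + r0 * r2)],
        [2 * (r1 * r2 + r0 * r3),
         r0 * r0 - r1 * r1 + r2 * r2 - r3 * r3,
         2 * (r2 * r3 - r0 * r1)],
        [2 * (r3 * r1 - r0 * r2),
         2 * (r2 * r3 + r0 * r1),
         r0 * r0 - r1 * r1 - r2 * r2 + r3 * r3],
    ]


# A rotation of 30 degrees about the Z-axis corresponds to the quaternion
# (cos(theta / 2), 0, 0, sin(theta / 2)).
theta = math.radians(30.0)
R_z = rotation_from_quaternion(math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))

# An arbitrary non-unit quaternion still yields an orthonormal matrix.
R_any = rotation_from_quaternion(1.0, 2.0, 3.0, 4.0)
print(R_z)
```

The entries are polynomials in the quaternion components, which is what makes the later derivative and Hessian computations free of trigonometric functions.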

If expression (6) is substituted into the error function, the error function contains only the four variables $r_{0}, r_{1}, r_{2}, r_{3}$ . Each stationary point of the error function, which may be a saddle point, a maximum, or a minimum, is classified using the Hessian matrix, which is expressed as follows

H_{r_{0}, r_{1}, r_{2}, r_{3}} = [\begin{matrix} E_{r_{0}, r_{0}} & E_{r_{0}, r_{1}} & E_{r_{0}, r_{2}} & E_{r_{0}, r_{3}} \\ E_{r_{1}, r_{0}} & E_{r_{1}, r_{1}} & E_{r_{1}, r_{2}} & E_{r_{1}, r_{3}} \\ E_{r_{2}, r_{0}} & E_{r_{2}, r_{1}} & E_{r_{2}, r_{2}} & E_{r_{2}, r_{3}} \\ E_{r_{3}, r_{0}} & E_{r_{3}, r_{1}} & E_{r_{3}, r_{2}} & E_{r_{3}, r_{3}} \end{matrix}]

(13)

where $E_{r_{j}, r_{k}}$ denotes the second partial derivative of E with respect to $r_{j}$ and $r_{k}$ . When all four eigenvalues of the Hessian matrix are positive, the corresponding stationary point is a minimum.

The eigenvalues are the roots of the characteristic polynomial $\det (H_{r_{0}, r_{1}, r_{2}, r_{3}} - λ I) = 0$ . The camera attitudes expressed as quaternions are then converted into Euler angles with geometric meaning. We find the following numerical rules:

A correct attitude angle has a lower error function value than the local extremum.

The error function has no local minimum when the attitude angles are small. However, when the attitude angle increases, the error function has a local minimum which may cause the singularity of pose determination.

If the local minimum of the error function exists, the absolute value of the camera attitude angle corresponding to the local minimum is approximately equal to the absolute value of the correct solution.

Robust pose determination algorithm

The analysis in the previous section shows that, when the attitude angle increases, the solution may become trapped in an unsuitable local minimum, which may lead to the failure of pose determination and affect subsequent applications. To avoid this situation, the pose determination method should be able to find the global minimum and avoid local minima.

Based on the above description and discussion, a robust and accurate pose determination method using geometric optimization search is proposed. First, a first solution for the camera pose is obtained by minimizing the error function. At this point, it may have converged to a local minimum. The correctness of the first solution is verified using the value of the error function. In theory, this value should be zero at the correct pose. In practice, it cannot be zero because of measurement error. Thus, we set an appropriate threshold to determine whether the solution is a global minimum or a local minimum. If the error function value of the corresponding camera pose is less than the threshold and the pose is physically meaningful, the solution is considered to be the correct solution. Otherwise, the solution is a local minimum.

According to the numerical rules we found in the last section, if the local minimum of the error function exists, the absolute value of the angle corresponding to the local minimum is approximately equal to the absolute value of the correct value. Besides, the attitude angle revolving around the Z-axis is only revolving around the optical axis, which does not affect the relationship between the image plane and the spatial reference point and only affects the coordinates of the image points, so it can be ignored.

Now, we explain how the algorithm avoids the local minimum and obtains the stable solution:

First, we impose an additional constraint expressed as $\sum_{i = 1}^{n} V_{i} R (p_{i} - p_{s}) \geq 0$ which ensures that all the reference points are in front of the camera lens.

Then the three column vectors of the matrix $G = [\begin{matrix} - α_{1} & - α_{1} & α_{1} \\ - β_{1} & β_{1} & - β_{1} \\ γ & γ & γ \end{matrix}]$ , where $α_{1}$ and $β_{1}$ are the attitude angles of the first solution, are used as attitude angles to construct three rotation matrices R, which are substituted into the error function. The column vector of G that gives the smallest error function value and satisfies the constraint condition is taken as the initial value for the next iteration.

Finally, we check the final pose through residual error comparison and physical meaning. Figure 7 presents the flow of the proposed algorithm for pose determination.

Figure 7.

The flowchart of the proposed method.
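
The candidate search over the columns of G can be sketched as follows. This is a simplified illustration and not the authors' implementation: `error_fn` and `is_feasible` are hypothetical callables standing in for the error function of equation (5) and the in-front-of-camera constraint, and the toy error used in the example is only a stand-in. The angle values are the local minimum and true attitude reported for Figure 6(a).

```python
def candidate_attitudes(alpha1, beta1, gamma):
    """Return the three columns of G as (alpha, beta, gamma) tuples."""
    return [
        (-alpha1, -beta1, gamma),
        (-alpha1, beta1, gamma),
        (alpha1, -beta1, gamma),
    ]


def select_initial_value(alpha1, beta1, gamma, error_fn, is_feasible):
    """Pick the feasible candidate with the smallest error, or None if none is feasible."""
    feasible = [c for c in candidate_attitudes(alpha1, beta1, gamma) if is_feasible(c)]
    if not feasible:
        return None
    return min(feasible, key=error_fn)


# First solution trapped in the local minimum reported for Figure 6(a), in degrees.
first_solution = (-56.62, -38.58, 20.0)
true_attitude = (60.0, 40.0, 20.0)


def toy_error(candidate):
    """Stand-in error: squared distance to the true attitude (illustration only)."""
    return sum((a - b) ** 2 for a, b in zip(candidate, true_attitude))


def always_feasible(candidate):
    """Stand-in for the constraint that all reference points lie in front of the camera."""
    return True


best = select_initial_value(*first_solution, toy_error, always_feasible)
print(best)
```

In an actual implementation the selected candidate would be converted to a rotation matrix and used to restart the iteration, after which the residual check described above decides whether the result is accepted.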

Experiments

Experiments using simulated data

In order to verify the accuracy and robustness of our proposed algorithm, we compare it with POSIT,¹⁷ RPnP (robust perspective-n-point),²² and SDR¹³ in the case of coplanar reference points and EPnP,¹⁶ PPnP (Procrustes perspective-n-point),²⁰ and OI algorithms²¹ in the case of non-coplanar reference points based on the computer simulation experiment.

We used the following settings before the experiment:

For ease of calculation, it is assumed that the intrinsic parameter matrix of the camera is a unit matrix.

The non-coplanar and coplanar reference points are distributed uniformly in the ranges $[- 1, 1] \times [- 1, 1] \times [0, 2]$ and $[- 1, 1] \times [- 1, 1] \times [0, 0]$ , respectively, in the object space.

When the coordinates of spatial reference points are known, the camera’s position $p_{s, true}$ and attitude $R_{true}$ are generated randomly, and then the coordinates $t = (t_{x}, t_{y}, 1)^{T}$ of image points are calculated by collinearity equation.

Multiple levels of Gaussian noise are added to the image points, and 1000 test data sets are generated for each noise level.

At this time, the camera’s position $p_{s}$ and attitude R are calculated using a variety of pose estimation algorithms.

To evaluate the accuracy of the various algorithms, the position error is defined as $e_{position} (\%) = ‖ p_{s, true} - p_{s} ‖ / ‖ p_{s} ‖ \times 100$ and the attitude error as $e_{attitude} (degrees) = \max_{k = 1}^{3} \arccos (dot (r_{true}^{k}, r^{k})) \times 180 / π$ , where $dot (\cdot, \cdot)$ and $\arccos (\cdot)$ denote the dot product and the arccosine operation, and $r_{true}^{k}$ and $r^{k}$ are the kth columns of $R_{true}$ and R, respectively.
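
The two error measures can be computed as in the following sketch. The function names and the sample poses are illustrative assumptions; the dot product is clamped to [−1, 1] before the arccosine, an added safeguard against rounding that is not mentioned in the text.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def position_error_percent(p_true, p_est):
    """Relative position error in percent: ||p_true - p_est|| / ||p_est|| * 100."""
    diff = [a - b for a, b in zip(p_true, p_est)]
    return norm(diff) / norm(p_est) * 100.0


def attitude_error_degrees(R_true, R_est):
    """Largest angle, in degrees, between corresponding columns of R_true and R_est."""
    worst = 0.0
    for k in range(3):
        d = sum(R_true[i][k] * R_est[i][k] for i in range(3))
        d = max(-1.0, min(1.0, d))  # guard against rounding outside [-1, 1]
        worst = max(worst, math.degrees(math.acos(d)))
    return worst


# Illustrative check: the estimate is off by 1 unit in depth and 10 degrees about Z.
e_pos = position_error_percent([0.0, 0.0, 5.0], [0.0, 0.0, 4.0])

c, s = math.cos(math.radians(10.0)), math.sin(math.radians(10.0))
R_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_est = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
e_att = attitude_error_degrees(R_true, R_est)
print(e_pos, e_att)
```

Note that the position error is normalized by the estimated position, as in the definition above, so it is undefined when the estimate is exactly the origin.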

The graphs in Figures 8 and 9 show the mean pose error as a function of the number of reference points in the non-coplanar and coplanar cases, respectively. The Gaussian noise is fixed at 2 pixels. As shown in Figure 8, the accuracy of our algorithm is significantly better than that of the other algorithms¹⁶,²⁰,²¹ under non-coplanar conditions, especially when the number of reference points is small. As shown in Figure 9, our algorithm performs slightly better than RPnP²² and clearly better than the others.¹³,¹⁷ The reason is that local minima produce multiple solutions, which reduce the accuracy of attitude estimation. In particular, when the number of reference points is small, the constraints are insufficient and the other algorithms fall into local minima. Besides, although its accuracy is relatively high, RPnP²² is only suitable for the coplanar case.

Figure 8.

Results of point tests (non-coplanar): (a) position error and (b) attitude error.

Figure 9.

Results of point tests (coplanar): (a) position error and (b) attitude error.

Figures 10 and 11 show the mean error as a function of the image pixel error in the non-coplanar and coplanar cases, respectively. The number of reference points is fixed at 5. As illustrated in Figure 10, the accuracy and stability of the proposed method are better than those of the others, especially when the noise level exceeds one pixel. Furthermore, our method is insensitive to noise: as the noise level increases, the errors rise quite slowly. As shown in Figure 11, the results of the proposed method are slightly better than those of RPnP and clearly better than those of the others. Generally speaking, iterative algorithms perform better than direct methods in the noise test. This can be explained as follows: because of insufficient constraints, a direct method cannot refine its solution as an iterative algorithm does, so direct methods are sensitive to data noise. Our method is an iterative algorithm with strong noise resistance. It obtains a first solution for the camera pose by minimizing the error function. The most appropriate attitude angle is then found from geometric relationships and physical constraints and used to update the initial value, which ensures the accuracy and stability of the calculation.

Figure 10.

Results of noise tests (non-coplanar): (a) position error and (b) attitude error.

Figure 11.

Results of noise tests (coplanar): (a) position error and (b) attitude error.

In the simulation experiments, our algorithm maintains high accuracy and stability under various conditions that are planar as well as non-planar point configurations, different noise levels, and different number of reference points.

To test the correctness rates of various algorithms for pose determination at different attitude angles, we vary only the attitude angle $α$ , from 0° to 80°, in the coplanar reference point case. For each value of $α$ , the camera position and the other two attitude angles are varied randomly to generate 1000 models. The accuracy of the various algorithms¹²,¹⁷,²² under these conditions is then compared. Figure 12 shows the rates of correct pose of the four algorithms as the angle $α$ increases. The noise level was fixed at 2 pixels during the test. As mentioned in the previous section, there is no local minimum when the attitude angle is small. Therefore, the correctness rates of the four algorithms within 0°–10° are above 92%.

Figure 12.

Correctness rate at different attitude angles.

However, as the attitude angle $α$ increases, the Ansar and POSIT algorithms become much more sensitive to the angle. In particular, the rate of correct pose of the Ansar algorithm drops to 50%. The rate of our algorithm remains above 90% for almost all angles $α$ , except from 35° to 43°. Our algorithm and the RPnP algorithm are both stable at large angles. Thus, the proposed method is highly robust at almost all attitude angles; it is slightly better than the RPnP algorithm and significantly outperforms all the other algorithms. This is very important for subsequent applications.

Experiments using real image

To further verify the effectiveness of the proposed algorithm, several sets of real image experiments were completed. The camera used to take the four images shown in Figure 13 is a Canon 5D Mark III. The camera parameters used in the test are shown in Table 1.

Figure 13.

Real images: (a) camera pose 1, (b) camera pose 2, (c) camera pose 3, and (d) camera pose 4.

Table 1.

The camera parameters.

Resolution	5616 × 3744
Pixel dimension	6.25 μm

The camera was calibrated using the MATLAB Camera Calibrator toolbox provided by Heikkilä.³⁰

As shown in Figure 13, the calibration block is a cube, and three of its planes are visible in each image. Each plane has 49 black-and-white intersections, giving a total of 147 points. To verify the applicability of the algorithm with few reference points, the number of reference points is set to 4, and the four reference points are randomly selected from the 147 points. A total of 100 reference points were randomly selected for each image. In addition, Gaussian noise with a standard deviation in the range [0, 1] and a step size of 0.5 is added to the coordinates of the image points, and 1000 independent random experiments are carried out for each noise level. The intersection point of the perpendicular faces of the calibration block is the origin of the world coordinate system, and the three perpendicular edges are its X-, Y-, and Z-axis. The coordinates of the reference points in the world coordinate system are obtained from the structure of the calibration block. The real image is corrected using the distortion factor, and a variety of algorithms are then used to calculate the pose of the camera.

Since the real pose of the camera cannot be obtained, the accuracy of the algorithm is indirectly measured using the square root of the reprojection error. Figure 14 shows the results of various algorithms.

Figure 14.

The results of experiments on real data.
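
A reprojection error of this kind can be computed as in the sketch below. It rests on two assumptions that the text does not confirm: the "square root of the reprojection error" is interpreted as the root mean square distance between observed and reprojected image points, and the coordinates are normalized image coordinates as in equation (1), not pixels. The function names and sample data are illustrative.

```python
import math


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def project(R, p_s, p):
    """Project reference point p with equations (1) and (2); returns (t_x, t_y)."""
    q = mat_vec(R, [a - b for a, b in zip(p, p_s)])
    return q[0] / q[2], q[1] / q[2]


def reprojection_rmse(R, p_s, points, observed):
    """Root mean square distance between observed and reprojected image points."""
    total = 0.0
    for p, (u, v) in zip(points, observed):
        pu, pv = project(R, p_s, p)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(points))


# Illustrative data: four non-coplanar points, identity rotation, assumed position.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_s = [0.0, 0.0, -5.0]
points = [[1.0, 1.0, 0.0], [-1.0, 1.0, 0.5], [-1.0, -1.0, 1.0], [1.0, -1.0, 2.0]]

# Observations offset by (0.003, 0.004), that is, 0.005 from the exact projection.
observed = []
for p in points:
    u, v = project(R, p_s, p)
    observed.append((u + 0.003, v + 0.004))

rmse = reprojection_rmse(R, p_s, points, observed)
print(rmse)
```

Because the true camera pose is unknown for real images, a small value of this measure indicates only that the estimated pose is consistent with the observed image points.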

As illustrated in Figure 14, compared with the other algorithms, the proposed algorithm has higher accuracy and stability with few reference points and different poses. The experimental results verify the accuracy and robustness of our method and show that it has stronger applicability and generality.

Conclusion

In this study, we described a robust and accurate pose determination algorithm based on geometric characteristic search using the IoT. This method uses a non-metric camera to obtain image information and uses IoT technology to transfer and process uncertain data. In data processing, using the object space collinearity error, we derive an iterative algorithm based on absolute orientation and orthogonal projection, which directly computes the rotation matrix. In the solution process, we also consider the phenomenon of multiple solutions. Through geometric interpretation and numerical calculation, we describe the characteristic geometric relationship between the correct solution and the local extremum of the error function. We then propose a fast and convenient way to resolve the multiple solutions, which can provide accurate and stable data for subsequent applications.

The main features of our algorithm are as follows:

Our method can quickly complete data processing and transmission because of using the IoT technology.

It can handle the arbitrary attitude of the camera, including large angles.

The algorithm can be applied to planar as well as non-planar point configurations and can handle any number of feature points (n ≥ 4); its computational complexity is O(n).

The proposed algorithm is compared with state-of-the-art methods on simulated data as well as real image data. The experimental results show that accuracy and stability are improved and that no severe pose jumps occur. The proposed algorithm can be applied in many relevant applications in AR⁷,³¹ and UAV.⁶,³² Thus, further research will be valuable.

Footnotes

Acknowledgements

The authors express their thanks to the editor and anonymous reviewers for their help in revising the paper. Z.Z. built the mathematical model and developed the general methodological framework. Q.W. performed the research, analyzed the data, discussed the results, and wrote the article. All authors have read and approved the final manuscript.

Handling Editor: Behrouz Jedari

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors express their thanks to the National Research and Development Project of the People’s Republic of China (grant nos 2016YFF0203105-3 and 2017YFC0805703) for their support.

ORCID iD

Qi Wang

References

1. Hartley, Zisserman. Multiple view geometry in computer vision. 2nd ed. Cambridge: Cambridge University Press, 2003.

2. Forsyth, Ponce. Computer vision: a modern approach. Washington, DC: Electronics Industry Press, 2012.

3. Urban, Wursthorn, Leitloff, et al. MultiCol bundle adjustment: a generic method for pose estimation, simultaneous self-calibration and reconstruction for arbitrary multi-camera systems. Int J Comput Vision 2017; 121(2): 234–252.

4. McGlone. Manual of photogrammetry. 6th ed. Bethesda, MD: ASPRS Press, 2013.

5. Kim. Dynamic ultrasonic hybrid localization system for indoor mobile robots. IEEE Trans Indus Electron 2013; 60(10): 4562–4573.

6. Michal, Paulina. Fast orientation of video images of buildings acquired from a UAV without stabilization. Sensors 2016; 16(7): E951.

7. Schall, Wagner, Reitmayr, et al. Global pose estimation using multi-sensor fusion for outdoor augmented reality. In: International symposium on mixed and augmented reality, Orlando, FL, 19–22 October 2009, pp.153–162. New York: IEEE.

8. Zhang, Jiang, Elgammal. Vision-based pose estimation for cooperative space objects. Acta Astronaut 2013; 91(10): 115–122.

9. Abdel-Aziz, Karara, Hauck. Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. Photogram Eng Remote Sens 2015; 81(2): 103–107.

10. A note on the number of solutions of the noncoplanar P4P problem. IEEE Trans Pattern Anal Mach Intell 2002; 24(4): 550–555.

11. Quan, Lan. Linear n-point camera pose determination. IEEE Trans Pattern Anal Mach Intell 1999; 21(8): 774–780.

12. Ansar, Daniilidis. Linear pose estimation from points or lines. IEEE Trans Pattern Anal Mach Intell 2003; 25(5): 578–589.

13. Hmam, Kim. Optimal non-iterative pose estimation via convex relaxation. Image Vision Comput 2010; 28(11): 1515–1523.

14. Tang, Chen, Wang. A novel linear algorithm for P5P problem. Appl Math Comput 2008; 205(2): 628–634.

15. Zheng, Kuang, Sugimoto, et al. Revisiting the PnP problem: a fast, general and optimal solution. In: International conference on computer vision, Sydney, NSW, Australia, 1–8 December 2013, pp.2344–2351. New York: IEEE.

16. Lepetit, Moreno-Noguer, Fua. EPnP: an accurate O(n) solution to the PnP problem. Int J Comput Vision 2009; 81(2): 155–166.

17. Oberkampf, Dementhon, Davis. Iterative pose estimation using coplanar points. In: Proceedings of IEEE conference on computer vision and pattern recognition, New York, 15–17 June 1993, pp.626–627. New York: IEEE.

18. Dementhon, Davis. Model-based object pose in 25 lines of code. Int J Comput Vision 1995; 15(1–2): 123–141.

19. Gramegna, Venturino, Cicirelli, et al. Optimization of the posit algorithm for indoor autonomous navigation. Robot Autonom Syst 2004; 48(2): 145–162.

20. Garro, Crosilla, Fusiello. Solving the PnP problem with anisotropic orthogonal Procrustes analysis. In: Proceedings of the international conference on 3D imaging, modeling, processing, visualization & transmission, Zurich, 13–15 October 2012, pp.262–269. New York: IEEE.

21. Hager, Mjolsness. Fast and globally convergent pose estimation from video images. IEEE Trans Pattern Anal Mach Intell 2000; 22(6): 610–622.

22. Schweighofer, Pinz. Robust pose estimation from a planar target. IEEE Trans Pattern Anal Mach Intell 2006; 28(12): 2024–2030.

23. Olsson, Kahl, Oskarsson. Branch-and-bound methods for Euclidean registration problems. IEEE Trans Pattern Anal Mach Intell 2009; 31(5): 783–794.

24. Hartley, Kahl. Global optimization through rotation space search. Int J Comput Vision 2009; 82(1): 64–79.

25. Banerjee, Roy. Linear algebra and matrix analysis for statistics. Boca Raton, FL: CRC Press, 2014.

26. Horn BKP. Closed-form solution of absolute orientation using unit quaternions. J Opt Soc Am A 1987; 5(7): 1127–1135.

27. Gao, Hou, Tang, et al. Complete solution classification for the perspective-three-point problem. IEEE Trans Pattern Anal Mach Intell 2003; 25(8): 930–943.

28. Vynnycky, Kanev. Mathematical analysis of the multisolution phenomenon in the P3P problem. J Math Imag Vision 2015; 51(2): 326–337.

29. Zettler, Garloff. Robustness analysis of polynomials with polynomial parameter dependency using Bernstein expansion. IEEE Trans Automat Control 1998; 43(3): 425–431.

30. Heikkilä. Geometric camera calibration using circular control points. IEEE Trans Pattern Anal Mach Intell 2000; 22(10): 1066–1077.

31. Fan, Gausemeier, et al. Virtual reality & augmented reality in industry. Shanghai, China: Shanghai Jiao Tong University Press, 2011.

32. Mozhdeh, Gunho, Jérôme, et al. Development and evaluation of a UAV-photogrammetry system for precise 3D environmental modeling. Sensors 2015; 15(11): 27493–27524.