Abstract
We propose a robust and accurate camera pose determination method based on geometric optimization search using the Internet of Things (IoT). The central idea is to (1) obtain image information through IoT technology, (2) obtain the first pose by minimizing the error function, and (3) use the geometric relationship and constraint condition to obtain appropriate attitude angles as a new initial value for the next iteration. The method has two main features. First, it can deal with a large amount of uncertain data, such as arbitrary shooting angles, arbitrary reference points, and a small number of feature points. Second, because it uses IoT technology, it can complete data processing and transmission quickly. Compared with state-of-the-art methods, our approach performs well on both synthetic and real data in the experiments and can provide accurate and stable data for subsequent applications.
Introduction
With the wide application of image sensors in various fields, a large amount of image information is transmitted through the Internet of Things (IoT). Meanwhile, with the development of image data acquisition and processing techniques, there is an increasing demand for processing uncertain data, and real-time positioning of the camera has become increasingly important. Camera pose determination, also called pose estimation, restores the pose of a photograph from a set of spatial reference points and their corresponding image points. It plays an important role in computer vision 1–3 and photogrammetry, 4 and visual sensors are widely used for pose estimation in robot control, 5 unmanned aerial vehicles (UAVs), 6 augmented reality (AR), 7 aerospace, 8 and other areas. In practice, all these applications require the pose determination algorithm to give robust solutions at arbitrary camera attitude angles (especially large angles), with few feature points (n ≤ 5), and for both planar and non-planar point configurations. In addition, data acquisition must be completed quickly over a wide area, for example by distributed acquisition, and data processing relies on cloud servers to coordinate the work, which speeds up processing and improves efficiency.
The existing approaches to this problem are generally classified as direct methods and iterative methods. Direct methods are fast because they rely on linear algorithms. However, they may be unstable because they ignore some constraints, especially when the data are noisy. The classical direct method is the direct linear transformation (DLT) proposed by Abdel-Aziz et al., 9 which solves a set of variables from a set of equations. Since the orthogonality constraints are neglected, this method requires a large number of known reference points as control points. Hu and Wu 10 obtained a linear solution from four points, but it is unstable. By exploiting data redundancy, Quan and Lan 11 proposed a family of linear solutions. Ansar and Daniilidis 12 developed linear solutions for both n points and n lines. Hmam and Kim 13 proposed a global optimization solution using a semi-definite positive relaxation (SDR) program. Its main drawbacks are that the relaxation is not always tight and that its dependence on an off-the-shelf SDR solver makes it unsuitable for real-time applications. Tang et al. 14 used five points to obtain a linear solution. Zheng et al. 15 solved the functional minimization problem with the Gröbner basis technique to find the globally optimal solution. Lepetit et al. 16 proposed an efficient and accurate method named EPnP (efficient perspective-n-point) with a time complexity of O(n).
When redundant reference points are not available, higher accuracy can be achieved through an iterative algorithm. Oberkampf et al., 17 Dementhon and Davis, 18 and Gramegna et al. 19 proposed the POSIT method, which solves camera pose determination by repeatedly updating the scale factor of the orthogonal projection. Most iterative algorithms transform camera pose determination into a non-linear least-squares problem solved by non-linear optimization. Garro et al. 20 proposed minimizing a residual error function established in image space. An orthogonal iterative (OI) algorithm, which establishes the error function in the reference point space, was proposed by Lu et al. 21 This method has high accuracy and fast convergence, but it needs a good initial value to converge to the correct solution; with an inappropriate initialization, the result is easily trapped in a local extremum. Schweighofer and Pinz 22 then proposed an improved method that considers multiple solutions but is only suitable for planar reference points. Olsson et al. 23 presented a branch-and-bound method to search for the globally optimal solution. Hartley and Kahl 24 used the infinity norm to solve the global optimization. Unfortunately, these algorithms can hardly be used in practical applications because of their large computational cost. In summary, although the accuracy of iterative algorithms is usually better than that of direct methods, they also have some shortcomings: (1) An iterative algorithm usually needs an appropriate initialization to find the correct solution; otherwise, it may fall into a local extremum. (2) Most existing pose determination methods are not suitable for both coplanar and non-coplanar configurations of reference points. (3) The computational cost is relatively high, particularly for some robust algorithms, 23,24 which must traverse the search space to find the optimal solution.
Based on the above discussion, we propose a robust and accurate camera pose determination method based on geometric optimization search using the IoT. This method uses a non-metric camera to obtain image information and IoT technology to transfer and process image data. In data processing, the proposed iterative method uses the object space collinearity error to compute the rotation matrix directly and is suitable for planar as well as non-planar reference point configurations. The central idea contains three main steps: (1) obtain image information through IoT technology; (2) estimate the first pose by minimizing an object space collinearity error as the error function; (3) use the geometric relationship and constraint condition to obtain appropriate attitude angles as a new initial value for the next iteration. Finally, the resulting pose is checked through residual error comparison and physical meaning.
The rest of this article is structured as follows. Section “Methodology” introduces the methodology, including image acquisition and transmission and the optimal value for the pose. Section “Pose ambiguity” briefly explains the characteristic geometric relationship between the correct solution and the local extremum of the error function. Section “Robust pose determination algorithm” proposes a robust and accurate camera pose determination method. The experimental results on synthetic and real data are presented in section “Experiments,” and section “Conclusion” concludes the article.
Methodology
Image acquisition and processing based on IoT
With the development of computer vision and visual measurement, there is an increasing need for real-time camera pose determination. Higher requirements are put forward for speed of image transmission and processing. Therefore, the image acquisition and processing methods are based on the IoT.
Site personnel use non-metric cameras to collect image information in a distributed way. The images are transmitted to a mobile phone via Bluetooth and then to the cloud through the 4G network. Data processing personnel retrieve the image information from the cloud on a PC for data processing, and the processed data are finally returned to the field staff’s mobile phone. In this way, the pose estimation of the camera can be completed quickly, which meets the engineering need for fast data processing and fast data transfer. The IoT flowchart for image acquisition and processing is shown in Figure 1.

Figure 1. IoT flowchart for image acquisition and processing.
The camera model
The goal of pose determination is to find the six exterior parameters of the camera: three position coordinates and three attitude angles.
As shown in Figure 2, assume the coordinate of the image points
where q is the projection point, q(3) is the third component of vector q, and R is the rotation matrix of 3 × 3.

Figure 2. Object space collinearity error.
Another way to represent collinearity is that the orthogonal projection of q in the direction of t should be equal to itself. Specific expressions are presented using the following equations
where
The error function
Because measurement error makes the collinearity error in equation (4) non-zero, we define the error function as the sum of the squares of these errors
where i is the index of the reference points.
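As an illustration of this error function, the following is a minimal sketch that assumes the object space collinearity error takes the form commonly used with the orthogonal iteration algorithm, 21 namely e_i = (I − V_i)(R p_i + t) with the line-of-sight projector V_i = v_i v_iᵀ / (v_iᵀ v_i), where v_i = (u_i, v_i, 1) is the normalized image point. The function names and the array layout are hypothetical and are not taken from the article.

```python
import numpy as np

def line_of_sight_projectors(image_points):
    """Build V_i = v_i v_i^T / (v_i^T v_i) for normalized image points (u, v).

    Assumption: intrinsics have already been removed, so v_i = (u, v, 1).
    """
    v = np.column_stack([image_points, np.ones(len(image_points))])
    outer = np.einsum("ni,nj->nij", v, v)          # v_i v_i^T, shape (n, 3, 3)
    norm_sq = np.einsum("ni,ni->n", v, v)          # v_i^T v_i, shape (n,)
    return outer / norm_sq[:, None, None]

def object_space_error(R, t, ref_points, V):
    """Sum over i of ||(I - V_i)(R p_i + t)||^2."""
    q = ref_points @ R.T + t                       # camera-frame points q_i
    residual = q - np.einsum("nij,nj->ni", V, q)   # (I - V_i) q_i
    return float(np.sum(residual ** 2))
```

With noise-free image points the residual vanishes at the true pose, so the returned value can be compared with a small threshold to judge a candidate pose.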
The optimal value for position
Note that the error function is quadratic in position. Given a fixed attitude, the optimal camera position can be obtained by setting the derivative of function (5) with respect to position to zero, which is expressed as follows
which leads to an error function only determined by the rotation R.
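The closed-form position can be sketched as follows, assuming the quadratic error is the sum of ||(I − V_i)(R p_i + t)||² over the reference points. Setting its gradient with respect to t to zero gives the linear system (n I − Σ V_i) t = Σ (V_i − I) R p_i. The function name and argument layout are hypothetical.

```python
import numpy as np

def optimal_translation(R, ref_points, V):
    """Solve (n I - sum V_i) t = sum (V_i - I) R p_i for the position t.

    R          : (3, 3) rotation matrix, held fixed
    ref_points : (n, 3) reference points p_i in the object frame
    V          : (n, 3, 3) line-of-sight projectors V_i (assumed given)
    """
    n = len(ref_points)
    Rp = ref_points @ R.T                               # rows are R p_i
    A = n * np.eye(3) - V.sum(axis=0)                   # n I - sum V_i
    b = np.einsum("nij,nj->ni", V, Rp).sum(axis=0) - Rp.sum(axis=0)
    return np.linalg.solve(A, b)
```

The system is solvable whenever the lines of sight are not all parallel, which holds for any non-degenerate set of image points.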
The optimal value for attitude
We need to find the best position and attitude through the corresponding reference point 3D coordinates
This is a constrained least-squares problem 26 that can be solved by singular value decomposition (SVD). First, the data need to be centered. The following representations are used to process the data
where
In this case, the sample cross-covariance matrix for
According to Horn’s method, 26 if the error function is at its minimum, then the rotation matrix should satisfy
The SVD calculation of M is
Then the rotation matrix is transformed into Euler angle with geometric meaning. Euler angles used to represent the attitude of an object relative to a reference frame are three angles, namely,
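The SVD-based attitude step can be sketched as follows. This is the standard absolute orientation solution (center both point sets, form the cross-covariance matrix, take its SVD), written under the assumption that the camera-frame points q_i are already available; the determinant correction that guards against a reflection is a common safeguard and is an addition here, not something stated in the text. The function name is hypothetical.

```python
import numpy as np

def rotation_from_correspondences(p, q):
    """Least-squares rotation R such that q_i - mean(q) ~ R (p_i - mean(p)).

    p : (n, 3) reference points in the object frame
    q : (n, 3) corresponding points in the camera frame
    """
    p_centered = p - p.mean(axis=0)
    q_centered = q - q.mean(axis=0)
    M = q_centered.T @ p_centered            # cross-covariance sum q'_i p'_i^T
    U, _, Vt = np.linalg.svd(M)
    # Force det(R) = +1 so the result is a proper rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```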
Pose ambiguity
Geometrical explanation
We use a geometric relation to illustrate how a local minimum of the error function leads to ambiguity (singularity) in pose estimation. Figure 3 shows the camera at three different locations together with the corresponding image planes and the coplanar reference point model. It is assumed that the center of the coplanar model, CM, coincides with the origin of the object reference coordinate system. When the camera rotates clockwise around the X-axis from point C to point C1, we can obtain the camera pose C1 by minimizing the error function. However, Figure 3 shows that the pose C2, obtained by rotating counterclockwise around the X-axis, may also minimize the error function, which makes the pose estimate ambiguous. Therefore, inadequate constraints in the error function and its non-linearity lead to multiple solutions of pose estimation.

Figure 3. Multiple solutions of pose estimation using geometric relation.
Then, change only one angle and assign values to the other parameters. The reference points are set to

Figure 4. Error function values of different rotation angles.
The error function values in the X- and Y-axis directions are shown in Figure 5, when the attitude angle rotates around the Z-axis fixed at 20° and the attitude angles

Figure 5. Distribution of error function under small attitude angles: (a) α = 20°, β = 10°; (b) α = 30°, β = 25°.

Figure 6. Distribution of error function under large attitude angles: (a) α = 60°, β = 40°; (b) α = 70°, β = 50°.
Numerical calculation
There are at most four solutions to the problem of camera pose determination.27,28 In previous studies, the distance between the camera center and the reference point is constructed by polynomials, the solutions of which are used to judge the solution of pose determination.
However, because the judgment criteria are complex, this approach is very difficult to apply in engineering practice. In this article, we propose a new method that is numerically stable and converges rapidly to the correct solution by using attitude angles with definite geometric characteristics. The stationary points of the error function can be found by the interval splitting approach. 29 First, the camera’s three attitude angles are divided into intervals within the range of 0°–90°. Then the stationary points of the error function are estimated by Jacobian and Bernstein expansions. Because the rotation matrix expressed in Euler angles contains trigonometric functions, the rotation is instead expressed in quaternion form to simplify the computation. The expression is as follows 26
where
If the expression (6) is introduced into the error function, then the error function contains only four variables
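For illustration, the standard conversion from a unit quaternion to a rotation matrix can be sketched as follows. The scalar-first ordering (q0, q1, q2, q3) and the function name are assumptions made here; the article may use a different ordering or sign convention.

```python
import math

def quaternion_to_rotation(q0, q1, q2, q3):
    """Rotation matrix of the quaternion q0 + q1 i + q2 j + q3 k.

    The input is normalized first, so any non-zero quaternion is accepted.
    The entries are polynomials in the four components, with no
    trigonometric functions involved.
    """
    norm = math.sqrt(q0 * q0 + q1 * q1 + q2 * q2 + q3 * q3)
    q0, q1, q2, q3 = (c / norm for c in (q0, q1, q2, q3))
    return [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2 * (q1*q2 - q0*q3), 2 * (q1*q3 + q0*q2)],
        [2 * (q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2 * (q2*q3 - q0*q1)],
        [2 * (q1*q3 - q0*q2), 2 * (q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]
```

For example, the quaternion (cos 45°, 0, 0, sin 45°) corresponds to a 90° rotation about the Z-axis.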
When all four eigenvalues of the Hessian matrix are positive, the corresponding stationary point is the minimum.
These eigenvalues can be calculated by the roots of the characteristic polynomial
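The second-order test for a minimum can be sketched numerically as follows. This is only an illustration: it approximates the Hessian by central finite differences and uses a symmetric eigenvalue solver rather than the roots of the characteristic polynomial, and the function names, step size, and tolerance are hypothetical choices.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def is_local_minimum(f, x, h=1e-4, tol=1e-6):
    """True if every Hessian eigenvalue at the stationary point x exceeds tol."""
    H = numerical_hessian(f, x, h)
    eigenvalues = np.linalg.eigvalsh((H + H.T) / 2.0)   # symmetrize first
    return bool(np.all(eigenvalues > tol))
```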
A correct attitude angle has a lower error function value than the local extremum.
The error function has no local minimum when the attitude angles are small. However, when the attitude angle increases, the error function has a local minimum which may cause the singularity of pose determination.
If the local minimum of the error function exists, the absolute value of the camera attitude angle corresponding to the local minimum is approximately equal to the absolute value of the correct solution.
Robust pose determination algorithm
From the analysis of the previous section, when the attitude angle increases, the solution may become trapped in an unsuitable local minimum, which may cause pose determination to fail and affect subsequent applications. To avoid this situation, the pose determination method should be able to find the global minimum and avoid local minima.
Based on the above discussion, we propose a robust and accurate pose determination method using geometric optimization search. First, a first solution for the camera pose is obtained by minimizing the error function. At this point, the iteration may have converged to a local minimum, so the correctness of the first solution is verified by evaluating the error function. In theory, the value of the error function at the correct pose should be zero; in practice, it cannot be zero because of measurement error. We therefore set an appropriate threshold to determine whether the solution is a global minimum or a local minimum. If the error function value of the camera pose is less than the threshold and the pose is physically meaningful, the solution is considered correct. Otherwise, the solution is treated as a local minimum.
According to the numerical rules we found in the last section, if the local minimum of the error function exists, the absolute value of the angle corresponding to the local minimum is approximately equal to the absolute value of the correct value. Besides, the attitude angle revolving around the Z-axis is only revolving around the optical axis, which does not affect the relationship between the image plane and the spatial reference point and only affects the coordinates of the image points, so it can be ignored.
Now, we explain how the algorithm avoids the local minimum and obtains the stable solution:
First, we impose an additional constraint expressed as
Then three column vectors of the matrix
Second, we check the final pose through residual error comparison and physical meaning. Figure 7 presents the flow of the proposed algorithm for pose determination.

Figure 7. The flowchart of the proposed method.
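The restart logic described in this section can be sketched as follows. The sketch rests on two loudly stated assumptions: (1) since the local minimum and the correct solution have approximately equal absolute attitude angles, new initial values are generated by flipping the signs of the two angles that are not about the optical axis, and (2) the pose with the lowest error below the threshold that also passes a physical check is accepted. The exact constraint used in the article is not reproduced here, and `refine`, `is_physical`, and all other names are hypothetical placeholders for the iterative solver and the physical-meaning test.

```python
from itertools import product

def candidate_attitudes(alpha, beta, gamma):
    """Sign-flipped initial attitudes; gamma (about the optical axis) is kept."""
    candidates = []
    for sign_a, sign_b in product((1.0, -1.0), repeat=2):
        candidate = (sign_a * alpha, sign_b * beta, gamma)
        if candidate not in candidates:        # drop duplicates when an angle is 0
            candidates.append(candidate)
    return candidates

def select_pose(first_attitude, refine, threshold, is_physical=lambda pose: True):
    """Re-run the solver from each candidate and keep the best accepted pose.

    refine(initial_attitude) -> (pose, error) stands in for the iterative solver.
    Returns (pose, error), or None if no candidate passes both checks.
    """
    best = None
    for start in candidate_attitudes(*first_attitude):
        pose, error = refine(start)
        if error < threshold and is_physical(pose):
            if best is None or error < best[1]:
                best = (pose, error)
    return best
```

A caller would pass the Euler angles of the first solution as `first_attitude`; a return value of `None` signals that every restart stayed above the threshold.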
Experiments
Experiments using simulated data
In order to verify the accuracy and robustness of our proposed algorithm, we compare it with POSIT, 17 RPnP (robust perspective-n-point), 22 and SDR 13 in the case of coplanar reference points and EPnP, 16 PPnP (Procrustes perspective-n-point), 20 and OI algorithms 21 in the case of non-coplanar reference points based on the computer simulation experiment.
We used the following settings before the experiment:
For ease of calculation, it is assumed that the intrinsic parameter matrix of the camera is a unit matrix.
The non-coplanar and coplanar reference points are distributed uniformly in the range
When the coordinates of spatial reference points are known, the camera’s position
Gaussian noise at multiple levels was added to the image points, and 1000 test data sets were generated for each noise level.
At this time, the camera’s position
In order to evaluate the accuracy of various algorithms, the error evaluation of position is
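A simulation setup of this kind can be sketched as follows, assuming an identity intrinsic matrix as stated above. The point range, the relative position error, and the geodesic rotation angle used as the attitude error are illustrative assumptions; they are not the article's exact settings or error definitions, and all names are hypothetical.

```python
import numpy as np

def simulate_observations(R, t, n_points, noise_sigma, rng, half_range=2.0, planar=False):
    """Draw uniform reference points, project them, and add Gaussian image noise.

    half_range is a hypothetical bound on the point coordinates.
    """
    P = rng.uniform(-half_range, half_range, size=(n_points, 3))
    if planar:
        P[:, 2] = 0.0                        # coplanar configuration
    q = P @ R.T + t                          # camera-frame points
    uv = q[:, :2] / q[:, 2:3]                # identity intrinsic matrix assumed
    return P, uv + rng.normal(0.0, noise_sigma, size=uv.shape)

def position_error_percent(t_est, t_true):
    """Relative position error in percent (assumed metric)."""
    return 100.0 * np.linalg.norm(t_est - t_true) / np.linalg.norm(t_true)

def rotation_error_deg(R_est, R_true):
    """Angle of the residual rotation R_est^T R_true in degrees (assumed metric)."""
    cos_angle = (np.trace(R_est.T @ R_true) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
```

Repeating the draw 1000 times per noise level and averaging the two error measures would give curves of the kind reported in the experiments.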
The graphs in Figures 8 and 9 show the mean pose error as a function of the number of reference points in the non-coplanar and coplanar cases, respectively. The Gaussian noise level is fixed at 2. As shown in Figure 8, the accuracy of our algorithm is significantly better than that of the other algorithms 16,20,21 under non-coplanar conditions, especially when the number of reference points is small. As shown in Figure 9, our algorithm performs slightly better than RPnP 22 and clearly better than the others. 13,17 The reason is that local minima produce multiple solutions, which reduce the accuracy of attitude estimation. When the number of reference points is small, the constraints may be insufficient, which causes the other algorithms to fall into a local minimum. Besides, although its accuracy is relatively high, RPnP 22 is only suitable for the coplanar case.

Figure 8. Results of point tests (non-coplanar): (a) position error and (b) attitude error.

Figure 9. Results of point tests (coplanar): (a) position error and (b) attitude error.
Figures 10 and 11 show the mean error as a function of image noise in the non-coplanar and coplanar cases, respectively. The number of reference points is fixed at 5. As illustrated in Figure 10, the accuracy and stability of the proposed method are better than those of the others, especially when the noise level exceeds one pixel. Furthermore, our method is insensitive to noise: as the noise level increases, the errors rise quite slowly. As shown in Figure 11, the results of the proposed method are slightly better than those of RPnP and clearly better than those of the others. Generally speaking, the iterative algorithms perform better than the direct methods in the noise test. This can be explained as follows: because of insufficient constraints, a direct method cannot refine its estimate the way an iterative algorithm does, so direct methods are sensitive to data noise. Our method is an iterative algorithm with strong noise resistance. It obtains the first solution of the camera pose by minimizing the error function and then finds the most appropriate attitude angle through the geometric relationship and physical constraints to update the initial value, which ensures the accuracy and stability of the calculation.

Figure 10. Results of noise tests (non-coplanar): (a) position error and (b) attitude error.

Figure 11. Results of noise tests (coplanar): (a) position error and (b) attitude error.
In the simulation experiments, our algorithm maintains high accuracy and stability for planar as well as non-planar point configurations, at different noise levels, and with different numbers of reference points.
In order to test the correctness rates of various algorithms for pose determination at different attitude angles, our experimental setup is to change only the attitude angle

Figure 12. Correctness rate at different attitude angles.
However, as the attitude angle
Experiments using real image
To further verify the effectiveness of the proposed algorithm, several sets of real image experiments were carried out. The four images shown in Figure 13 were taken with a Canon 5D Mark III camera. The camera parameters used in the test are shown in Table 1.

Figure 13. Real images: (a) camera pose 1, (b) camera pose 2, (c) camera pose 3, and (d) camera pose 4.
Table 1. The camera parameters.
The camera calibration has been accomplished through MATLAB Camera Calibrator toolbox provided by Heikkilä. 30
As shown in Figure 13, the calibration block is a cube, and three of its faces are visible in each image. Each face has 49 black-and-white intersections, for a total of 147 points. To verify the applicability of the algorithm with few reference points, the number of reference points is set to 4, and the four reference points are randomly selected from the 147 points. A total of 100 reference points were randomly selected for each image. In addition, Gaussian noise with a standard deviation ranging from 0 to 1 in steps of 0.5 is added to the image point coordinates, and 1000 independent random experiments are carried out for each noise level. The corner where the three mutually perpendicular faces of the calibration block meet is the origin of the world coordinate system, and the three perpendicular edges are its X-, Y-, and Z-axes. From the structure of the calibration block, the coordinates of the reference points in the world coordinate system can be obtained. The real images are corrected using the distortion coefficients, and the camera pose is then calculated with each of the algorithms.
Since the real pose of the camera cannot be obtained, the accuracy of the algorithm is indirectly measured using the square root of the reprojection error. Figure 14 shows the results of various algorithms.

Figure 14. The results of experiments on real data.
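The indirect accuracy measure used for the real images can be sketched as follows, assuming it is the root mean square of the reprojection distances in pixels and that lens distortion has already been removed from the measured image points. The function name and the intrinsic matrix K passed by the caller are hypothetical.

```python
import numpy as np

def reprojection_rmse(R, t, K, ref_points, image_points):
    """Root-mean-square distance between projected and measured image points.

    R, t         : estimated pose (object frame to camera frame)
    K            : (3, 3) intrinsic matrix from calibration
    ref_points   : (n, 3) reference points in the world coordinate system
    image_points : (n, 2) measured, undistorted pixel coordinates
    """
    q = ref_points @ R.T + t                 # camera-frame points
    projected = q @ K.T                      # homogeneous pixel coordinates
    uv = projected[:, :2] / projected[:, 2:3]
    squared_dist = np.sum((uv - image_points) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_dist)))
```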
As illustrated in Figure 14, compared with the other algorithms, the proposed algorithm has higher accuracy and stability with few reference points and at different poses. The experimental results verify the accuracy and robustness of our method and indicate that it has stronger applicability and generality.
Conclusion
In this study, we described a robust and accurate pose determination algorithm based on geometric characteristic search using the IoT. The method uses a non-metric camera to obtain image information and IoT technology to transfer and process uncertain data. In data processing, using the object space collinearity error, we derive an iterative algorithm based on absolute orientation and orthogonal projection that directly computes the rotation matrix. In the solution process, we also consider the phenomenon of multiple solutions. Through geometric interpretation and numerical calculation, we describe the characteristic geometric relationship between the correct solution and the local extremum of the error function. We then propose a fast and convenient way to resolve multiple solutions, which can provide accurate and stable data for subsequent applications.
The main features of our algorithm are as follows:
Our method can quickly complete data processing and transmission because of using the IoT technology.
It can handle the arbitrary attitude of the camera, including large angles.
The algorithm can be applied to planar as well as non-planar point configurations, handles any number of feature points (n ≥ 4), and has a computational complexity of O(n).
The proposed algorithm was compared with state-of-the-art methods on simulated data as well as real image data. The experimental results show that accuracy and stability are improved and that no severe pose jumps occur. The proposed algorithm can be applied in many relevant applications in AR 7,31 and UAV 6,32 systems. Thus, further research will be valuable.
Footnotes
Acknowledgements
The authors express their thanks to the editor and anonymous reviewers for their help in revising the paper. Z.Z. built the mathematical model and developed the general methodological framework. Q.W. performed the research, analyzed the data, discussed the results, and wrote the article. All authors have read and approved the final manuscript.
Handling Editor: Behrouz Jedari
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors express their thanks to the National Research and Development Project of the People’s Republic of China (grant nos 2016YFF0203105-3 and 2017YFC0805703) for their support.
