Hybrid visual natural landmark–based localization for indoor mobile robots

Abstract

Localization is a crucial part of autonomous navigation for indoor mobile robots. Natural features of the ceiling and the surrounding environment can serve for position estimation. Based on these natural features, a hybrid visual natural landmark–based localization method is proposed, which combines landmark-based positioning with ceiling-based visual odometry. In the visual odometry, the orientation is computed from parallel features in adjacent frames, and the position is calculated from corresponding point features in two consecutive images using the perspective-n-point method. In the natural landmark–based localization, an orientation filter is used to obtain the global orientation. Then, feature points are determined by a Compute Unified Device Architecture–based scale-invariant feature transform algorithm. Finally, the position is estimated from the computed orientation and point features. Various experiments were conducted to evaluate the effectiveness of the proposed method. The experimental results show that the proposed localization method outperforms other methods in accuracy and efficiency.

Keywords

Mobile robot, natural landmark, indoor localization, visual odometry

Introduction

Self-localization is an important capability of autonomous mobile robots, especially indoor service robots. Accurate position estimates are necessary for navigation and trajectory planning, which play important roles when a mobile robot works in an unknown indoor environment. Various localization methods have been proposed, based on, for example, LiDAR, ultrasonic sensors, Wi-Fi, and ultra-wideband (UWB). Recently, vision-based localization has been intensively researched. Visual localization generally detects existing features in the environment, ranging from artificial markers to geometric features such as corners and walls.^1,2 The position and orientation are determined from corresponding features in the previous and current frames.³

It is common to place known objects, such as barcodes, in the environment as markers; the position is then estimated from the relative position of the markers and the robot.⁴ These methods require accurate calibration and recognition of the markers, and placing markers is not feasible in some applications. Another localization strategy is to employ natural landmarks. Vision-based simultaneous localization and mapping (VSLAM), for example, uses corner point and line features for 3-D reconstruction and a Kalman filter for motion estimation.^5–8

VSLAM can be generally classified into two categories: the forward-looking approach⁹ and the side-looking approach.¹⁰ In the forward-looking approach, the scale of the features in the view varies from frame to frame, so the scale-invariant feature transform (SIFT) algorithm is adopted for feature detection.^11,12 The side-looking approach uses corner-like features, since moving along a wall causes only small scale variations. However, VSLAM is computationally expensive because features must be searched for and matched.

Related works

Earlier methods used a global camera mounted on the ceiling and a marker on top of the robot to localize the robot. In Michel et al.’s study,¹³ an overhead camera was used to estimate the pose of the robot, indicated by a marker. Andersen et al.¹⁴ employed a lamp as the marker and computed the position of an automatic guided vehicle with a global camera. Global vision-based localization is also very popular in robot soccer.¹⁵ In the work of Brezak et al.,¹⁵ the position of the robot was calculated from Bayer-format images acquired by the global vision system, achieving high precision.

To localize the robot without artificial markers attached to the environment, natural landmarks have been proposed. Some researchers used corner, lamp, and door features to determine the position of the robot, which improved the stability of the algorithm.¹⁰ In the study by Wu et al.,¹⁶ line-based dynamic vision was adopted to detect line-based visual features. This approach achieved results comparable to those of other natural landmark–based methods and can also be extended to the case of dependent lines.

Another typical natural landmark–based localization method is VSLAM. Heng et al.⁹ proposed a multi-camera VSLAM system that used stereo vision to estimate the pose with absolute scale. In Kim et al.’s work,⁵ a Bayesian filtering–based VSLAM was proposed that reduced error accumulation by selecting key frames. However, these methods have significant weaknesses: first, their accuracy decreases noticeably when the camera frame rate is low; second, localization fails when the velocity changes suddenly; finally, real-time performance is hard to achieve. To address these problems, Liu et al.¹⁷ proposed Conditional SLAM, which outperforms other methods in accuracy and computational efficiency.

Earlier stereo SLAM methods produce dense 3-D maps. Engel et al.¹⁸ presented a semi-dense SLAM method that implements loop closure detection and image alignment, in which stereo and multi-view stereo are used to estimate the semi-dense map based on pose graph optimization. RGBD-SLAM is an important field of visual SLAM research, but the limited field of view (FOV) and depth range of RGBD cameras restrict its application in large-scale environments. Yousif et al.¹⁹ combined an RGBD camera with a monocular camera to extract 2-D and 3-D feature points simultaneously, which addressed the unstable performance caused by a limited FOV and textureless scenes.

Other SLAM methods based on the direct approach, such as large-scale direct (LSD) SLAM,²⁰ have also been proposed. In LSD-SLAM, the global map is represented as a pose graph, and depth maps are estimated using filtering and direct image alignment. In Engel et al.’s work,²¹ LSD-SLAM was extended to stereo vision. Recently, stereo LSD-SLAM has been integrated with an inertial measurement unit (IMU) by solving an energy minimization problem, which can be challenged by photometric errors. Leutenegger et al.²² fused visual and inertial cues by combining the visual reprojection error and the IMU error in a single cost function and then minimizing this joint cost to estimate the motion.

Motivations

In artificial marker–based methods, mismatching of the markers results in large localization errors. The markers must therefore be designed with specified shapes and colors and placed at preselected positions. Triangulation of the landmarks is generally needed to measure the position of the robot, which decreases accuracy and efficiency. The main concerns of VSLAM methods are visual odometry and loop closure. However, visual odometry is extremely sensitive to environmental changes and cannot solve the problem of error accumulation, and image processing and data fusion are required to achieve loop closure.

Recently, ceiling-based localization methods have been proposed because ceiling features are rarely occluded and their scale is stable. Moreover, such localization is not affected by dynamic changes in the environment, such as moving people. In the study by Xu et al.,³ a perspective-n-point (PnP)-based localization method was presented in which the position and orientation are computed from point and line features on the ceiling. Similarly, in the research of Jeong and Lee,²³ VSLAM used line and corner features as landmarks, and the hybrid landmarks outperformed single-type landmarks.

Based on this survey and analysis of existing localization algorithms, a hybrid visual natural landmark (HVNL)–based method is proposed. First, a landmark library for absolute positioning is built from features extracted from ceiling images and panoramic images. Then, the robot determines whether a natural landmark is present. If it is, the panoramic image is used to compute the orientation and the ceiling image is used to estimate the position of the robot; otherwise, ceiling-based visual odometry is adopted, which localizes the robot from the displacement of matched features between two adjacent frames. During feature selection and matching, Graphics Processing Unit (GPU) acceleration is used to improve efficiency and accuracy. Comprehensive experiments were conducted, and the results demonstrate that the proposed method outperforms visual odometry in accuracy and VSLAM in efficiency.

Organization of the article

The rest of the article is organized as follows. The second section introduces the pipeline of the proposed method. The establishment and recognition of the natural landmarks are presented in the third section. The ceiling-based visual odometry is described in the fourth section. In the fifth section, the natural landmark–based localization is introduced. The experimental results are presented and analyzed in the sixth section. The seventh section gives the conclusion.

The proposed localization system

In the indoor environment considered here, parts of the ceiling consist of chessboard-like blocks. An RGB camera and an omnidirectional camera^24,25 are mounted on top of the robot, as shown in Figure 1. The world frame is attached to the ceiling, with its X and Y axes parallel to the edge lines of the blocks. The camera frame is aligned with the image frame, so its Z axis is normal to the ceiling plane.

Figure 1.

Diagram of the HVNL-based localization system. HVNL: hybrid visual natural landmark.

The proposed localization system consists of four main parts: (1) landmark library establishment; (2) natural landmark recognition; (3) ceiling-based visual odometry; and (4) hybrid natural landmark–based localization. The flow diagram of the localization algorithm is shown in Figure 2. As Figure 2 shows, visual odometry provides relative localization, whereas natural landmark–based positioning provides absolute localization. Because the absolute fixes can correct the drift that accumulates in the relative estimates, the hybrid strategy can achieve better performance than other methods.

Figure 2.

Block diagram of the proposed localization method.
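The switching between the two branches of Figure 2 can be pictured as a simple dispatch loop. This is an illustrative sketch, not the authors' implementation: `recognize_landmark`, `landmark_pose`, and `vo_step` are hypothetical callables standing in for the recognition, landmark-based localization, and visual odometry modules, and a pose is assumed to be an `(x, y, phi)` tuple.

```python
from typing import Callable, List, Optional, Tuple

Pose = Tuple[float, float, float]  # (x, y, phi) in the world frame


def hybrid_localize(
    frames,
    initial_pose: Pose,
    recognize_landmark: Callable[[object], Optional[int]],  # hypothetical: landmark ID or None
    landmark_pose: Callable[[object, int], Pose],           # hypothetical: absolute localization
    vo_step: Callable[[object, Pose], Pose],                # hypothetical: ceiling-based VO
) -> List[Pose]:
    """Return one pose per frame, using a landmark fix whenever a landmark is recognized."""
    pose = initial_pose
    trajectory: List[Pose] = []
    for frame in frames:
        landmark_id = recognize_landmark(frame)
        if landmark_id is not None:
            # Absolute localization: the pose is recomputed from the landmark,
            # discarding any drift accumulated by visual odometry.
            pose = landmark_pose(frame, landmark_id)
        else:
            # Relative localization: propagate the previous pose with visual odometry.
            pose = vo_step(frame, pose)
        trajectory.append(pose)
    return trajectory
```

In this sketch, every landmark fix overwrites the pose, so the drift carried by `vo_step` is bounded by the distance between recognizable landmarks along the path.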

Natural landmark establishment and recognition

Error accumulation is unavoidable when a robot localizes itself with visual odometry over long periods of motion. In particular, when the robot moves along a closed trajectory, the error is not removed even when the robot returns close to the starting point. In previous research,⁴ artificial markers were adopted to eliminate this error. In this study, hybrid natural landmarks are used instead. First, the robot traverses the environment in a mapping run to establish the landmark library. The most important parts of building and recognizing natural landmarks are feature detection and matching. Various methods have been proposed to detect and match image features; they fall mainly into two categories: intensity-based methods²⁶ and feature-based methods.^11,27,28

Intensity-based methods are usually limited by illumination differences, textureless regions, and computational cost.²⁹ Feature-based methods, which use points, lines, contours,^30,31 and other primitives, are therefore widely used. Representative feature-based methods in recent research include SIFT,^12,32 SURF,³³ GLOH,³⁴ and BRISK.³⁵ SIFT outperforms the other algorithms because it is invariant to image rotation and scaling and robust to illumination changes. In this article, a discrete Gaussian–Hermite moment (DGHM)^36,37–based feature detection method is proposed to build the landmark library. Then, the Compute Unified Device Architecture (CUDA)–based SIFT algorithm is adopted to recognize the natural landmarks, which greatly speeds up localization.

Establishment of the natural landmark library

Yang and Dai³⁶ proposed the Gaussian–Hermite moments based on weighted Hermite polynomials, which were invariant to rotation and translation.

In this article, a DGHM-based feature detection algorithm is proposed; further details on GH moments are given in Appendix 1. The DGHM at (i, j) can be expressed as

{\overset{⌢}{η}}_{p, q} (i, j, m_{M}, m_{N}) = \frac{4}{(M - 1) (N - 1)} \sum_{u = 0}^{K_{M} - 1} \sum_{v = 0}^{K_{N} - 1} I \left(i + m_{M} u - \frac{M}{2} + 1, \; j + m_{N} v - \frac{N}{2} + 1\right) {\overset{⌢}{H}}_{p} (x, σ) {\overset{⌢}{H}}_{q} (y, σ)

Let t(u, v) denote the image mask of size $M \times N$, with $0 \leq u \leq M - 1$ and $0 \leq v \leq N - 1$. We define the approximate matrix of DGHM as

M_{approx} (X, σ) = [\begin{matrix} {\overset{⌢}{η}}_{p, 0} (X, σ) & {\overset{⌢}{η}}_{p, q} (X, σ) \\ {\overset{⌢}{η}}_{p, q} (X, σ) & {\overset{⌢}{η}}_{0, q} (X, σ) \end{matrix}]

The feature points detected by the DGHM-based method are more stable and contain more geometric information.³⁰ The determinant of $M_{approx} (X, σ)$ can be approximated by

det (M_{approx} (X, σ)) = {\overset{⌢}{η}}_{p, 0} (X, σ) {\overset{⌢}{η}}_{0, q} (X, σ) - {(ω \cdot {\overset{⌢}{η}}_{p, q} (X, σ))}^{2}

where ω is a weighting coefficient. A point is then classified by the sign of the determinant: if it is positive, the current point is taken as a feature point.
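The determinant test can be prototyped with separable correlations. The NumPy sketch below rests on several assumptions that the text does not fix: the normalized Gaussian–Hermite functions use physicists' Hermite polynomials; the moments are obtained by correlating the image with $\hat{H}_p(x,σ)\hat{H}_q(y,σ)$ over a window of radius ⌈3σ⌉ with zero padding (x along columns); the constant factor $4/((M-1)(N-1))$ is omitted because it does not change the sign of the determinant; and `p`, `q`, `sigma`, and `omega` are placeholder values. Because a bare sign test also fires inside uniform bright regions, the sketch additionally keeps only strict 3 × 3 local maxima of the response; that non-maximum suppression step is an assumption of the sketch, not part of the described method.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermval


def gh_function(p, x, sigma):
    """Normalized Gaussian-Hermite function of order p (physicists' Hermite polynomial H_p)."""
    coeffs = np.zeros(p + 1)
    coeffs[p] = 1.0
    norm = 1.0 / math.sqrt(2.0 ** p * math.factorial(p) * math.sqrt(math.pi) * sigma)
    return norm * hermval(x / sigma, coeffs) * np.exp(-x ** 2 / (2.0 * sigma ** 2))


def _correlate_separable(img, ky, kx):
    """Correlate img with the separable kernel ky[:, None] * kx[None, :] (zero padding).

    np.convolve with a flipped kernel is a correlation; each image side must be
    at least as long as the kernel so that mode="same" keeps the image size.
    """
    rows = np.apply_along_axis(lambda r: np.convolve(r, kx[::-1], mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, ky[::-1], mode="same"), 0, rows)


def dghm_response(img, p=2, q=2, sigma=1.5, omega=0.9):
    """det(M_approx) = eta_{p,0} * eta_{0,q} - (omega * eta_{p,q})**2 at every pixel."""
    r = int(math.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    h0, hp, hq = gh_function(0, x, sigma), gh_function(p, x, sigma), gh_function(q, x, sigma)
    img = np.asarray(img, dtype=float)
    eta_p0 = _correlate_separable(img, h0, hp)  # order p along x (columns), 0 along y (rows)
    eta_0q = _correlate_separable(img, hq, h0)  # order 0 along x, q along y
    eta_pq = _correlate_separable(img, hq, hp)  # order p along x, q along y
    return eta_p0 * eta_0q - (omega * eta_pq) ** 2


def detect_features(img, threshold=0.0, **kwargs):
    """(row, col) of pixels whose response exceeds `threshold` and is a strict 3x3 maximum."""
    resp = dghm_response(img, **kwargs)
    h, w = resp.shape
    padded = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones_like(resp, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                is_max &= resp > padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return np.argwhere(is_max & (resp > threshold))
```

The orthonormality of `gh_function` (a sum of products times the grid spacing close to 1 for equal orders and 0 otherwise) is a quick sanity check on the basis before tuning the detector.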

In this study, a hybrid natural landmark is proposed that contains both ceiling image features and panoramic image features, as shown in Figure 3. The selection of natural landmarks strongly affects the accuracy and efficiency of localization: too many landmarks reduce efficiency, and too few cause low accuracy. The starting point is set as the first landmark. As the robot moves, features are detected in the panoramic image and matched against the features in the landmark library. If the number of matched feature points is less than a threshold T, the current position is considered far from the existing natural landmarks; it is then defined as a new landmark and assigned an ID number.
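The landmark-selection rule can be written as a short loop over the mapping run. In this sketch, `count_matches` is a hypothetical callable (for example, wrapping a DGHM/SIFT matcher) that returns the number of matched feature points between two feature sets, and "less than T" is interpreted as fewer than T matches with every landmark already stored; both are assumptions.

```python
from typing import Callable, Dict, Iterable


def build_landmark_library(
    panoramic_features: Iterable[object],
    count_matches: Callable[[object, object], int],  # hypothetical matcher
    T: int,
) -> Dict[int, object]:
    """Mapping-run sketch: returns {landmark ID: features of that landmark's panoramic image}."""
    library: Dict[int, object] = {}
    for features in panoramic_features:
        if not library:
            library[0] = features  # the starting point is the first landmark
            continue
        # Best match count against every landmark already in the library.
        best = max(count_matches(features, stored) for stored in library.values())
        if best < T:
            # Far from all existing landmarks: store as a new landmark with the next ID.
            library[len(library)] = features
    return library
```

Raising T places landmarks more densely (better accuracy, lower efficiency), which is the trade-off described above.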

Figure 3.

Hybrid natural landmark library. (a) Panoramic images and (b) ceiling images.

Recognition of the natural landmark

To localize the robot, the natural landmarks must be recognized accurately. Once the natural landmark library is built, each natural landmark must be matched separately. Feature detection and matching are well suited to natural landmark recognition because features are widely distributed in the environment. In previous research,^21,26 SURF and SIFT were usually adopted. However, SIFT is time-consuming, and SURF is less accurate than other methods. In this article, a CUDA-based DGHM-SIFT algorithm is proposed to recognize the natural landmarks. First, the DGHM-based feature detection method is used to find the feature points; then, CUDA-based SIFT description and matching are applied to determine the referenced natural landmark for localization. Because the DGHM has good edge-detection properties, the acquired feature points carry significant geometric information and are more stable, which makes the proposed method more robust.

After the key points are detected, the gradient table of each image in the Gaussian pyramid is prepared, and each image in the Gaussian pyramid is treated as one octave. The descriptors of all key points belonging to the same octave are then generated in parallel. The GPU implementation of the proposed method is shown in Figure 4, and the recognition results are shown in Figure 5.
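For illustration, recognition can be reduced to counting descriptor matches against each stored landmark. The NumPy sketch below runs on the CPU rather than with CUDA and uses a nearest-neighbour ratio test with a 0.8 threshold and a minimum of 10 matches; these choices and the function names are assumptions, not details given in the text. On a GPU, the pairwise-distance computation is the step that parallelizes most naturally.

```python
import numpy as np


def ratio_test_matches(query, ref, ratio=0.8):
    """(i, j) index pairs of query/ref descriptors passing the nearest-neighbour ratio test."""
    query = np.asarray(query, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if len(ref) < 2:
        return []
    # Pairwise Euclidean distances, shape (n_query, n_ref).
    d = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(len(query)):
        j1, j2 = order[i, 0], order[i, 1]
        # Accept only if the best match is clearly better than the second best.
        if d[i, j1] < ratio * d[i, j2]:
            matches.append((i, int(j1)))
    return matches


def recognize_landmark(query, library, min_matches=10):
    """ID of the landmark with the most ratio-test matches, or None if too few match."""
    best_id, best_count = None, 0
    for landmark_id, ref in library.items():
        n = len(ratio_test_matches(query, ref))
        if n > best_count:
            best_id, best_count = landmark_id, n
    return best_id if best_count >= min_matches else None
```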

Figure 4.

GPU implementation on the recognition of the natural landmark. GPU: graphics processing unit.

Figure 5.

Natural landmark recognition results. (a and b) Successful recognition and (c) failed recognition.

Ceiling-based visual odometry

Ceiling features can be detected reliably because they are rarely occluded. A ceiling commonly consists of many blocks, which provide parallel lines and corner points. These point and line features are very useful for localization: combined with the orientation information, they determine the position of the robot. However, no single point is guaranteed to remain visible in the camera image at all times, so the robot cannot be localized with fixed feature points alone; instead, feature points must be selected dynamically as the robot moves. The block diagram of the ceiling-based visual odometry is shown in Figure 6, and the procedure is as follows.

(1) The raw image is undistorted using the intrinsic camera parameters.

(2) The current image is set as the reference image. Subsequent images are used to estimate the position relative to this reference.

(3) The feature lines are extracted, and the intersection of feature lines closest to the image center is selected as the initial feature point.

(4) As the robot moves, the feature lines and point are tracked and updated in a neighborhood of their locations in the previous image.

(5) The position is estimated from the feature lines and point.

(6) If the feature point is far from the image center, the algorithm returns to step (2); otherwise, it returns to step (4).
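The six steps can be condensed into the following loop. It is a sketch under stated assumptions: images are already undistorted (step (1)), and `detect_main_point`, `track_point`, and `estimate_pose` are hypothetical callables standing in for the line extraction, ROI tracking, and pose computation described in this section. When the tracked point moves more than `max_offset` pixels from the image center, the current image becomes the new reference and carries its estimated pose with it.

```python
import math
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]


def run_ceiling_vo(
    images: Iterable[object],
    detect_main_point: Callable[[object], Point],      # hypothetical: steps (2)-(3)
    track_point: Callable[[object, Point], Point],     # hypothetical: step (4), ROI tracking
    estimate_pose: Callable[[tuple, Point], object],   # hypothetical: step (5)
    image_center: Point,
    max_offset: float,
    initial_pose: object,
) -> List[object]:
    """Return one pose per image (images assumed undistorted, step (1))."""
    poses: List[object] = []
    ref = None    # (feature point in the reference image, pose of the reference image)
    point = None
    for img in images:
        if ref is None:
            # Steps (2)-(3): the first image becomes the reference.
            point = detect_main_point(img)
            pose = initial_pose
            ref = (point, pose)
        else:
            point = track_point(img, point)    # step (4)
            pose = estimate_pose(ref, point)   # step (5): relative to the reference only
            if math.dist(point, image_center) > max_offset:
                # Step (6): re-select the feature point and make this image the new reference.
                point = detect_main_point(img)
                ref = (point, pose)
        poses.append(pose)
    return poses
```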

Figure 6.

Block diagram of the ceiling-based visual odometry.

Image processing

Because of illumination differences, the lines and corners of the ceiling blocks are often blurred, which results in incorrect feature extraction and recognition. Filtering is therefore needed to obtain more accurate features. In this article, the weighted guided filter (WGF)³⁸ is adopted because of its good edge-preserving property. Because its computational complexity is independent of the size of the support window, it can run in real time. The kernel function can be written as

\hat{Z} (q) = a_{p} G (q) + b_{p}, \forall q \in Ω_{ζ 1} (p)

where

\begin{array}{l} a_{p} = \frac{μ_{G \cdot I, ζ 1} (p) - μ_{G, ζ 1} (p) μ_{I, ζ 1} (p) + \frac{λ}{{\hat{Γ}}_{G} (p)} η_{p}}{σ_{G, ζ 1}^{2} (p) + \frac{λ}{{\hat{Γ}}_{G} (p)}} \\ b_{p} = μ_{I, ζ 1} (p) - a_{p} μ_{G, ζ 1} (p) \end{array}

In equation (4), G is the guidance image and I is the input image; here, the input image itself is used as the guidance image. $\hat{Z}$ is a linear transform of G in the window $Ω_{ζ 1} (p)$. $μ_{G, ζ 1} (p)$, $μ_{I, ζ 1} (p)$, and $μ_{G \cdot I, ζ 1} (p)$ are the mean values of G, I, and $G \cdot I$ in that window, and $σ_{G, ζ 1}^{2} (p)$ is the variance of G in that window. In equation (5), ${\hat{Γ}}_{G} (p)$ is defined as

{\hat{Γ}}_{G} (p) = \frac{1}{N} \sum_{q = 1}^{N} \frac{σ_{G, ζ 1}^{2} (p) + ε}{σ_{G, ζ 1}^{2} (q) + ε}

where q ranges over the N points of the window centered at p, Z(p) is the output of the filter at p, and ε is a constant.
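The filter can be sketched in NumPy with box (window-mean) filters. Assumptions: square windows of radius `r` stand in for $Ω_{ζ1}(p)$, borders are edge-padded, the average in $\hat{Γ}_G(p)$ is taken over the same window, the parameter values are placeholders, and the term $η_{p}$ in equation (5), which is not defined in this section, defaults to zero, which reduces $a_{p}$ to the usual weighted guided filter form. As in the standard guided filter, each output pixel averages the coefficients of all windows that contain it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(x, r):
    """Mean of x over a (2r+1) x (2r+1) window centred on each pixel (edge padding)."""
    padded = np.pad(x, r, mode="edge")
    return sliding_window_view(padded, (2 * r + 1, 2 * r + 1)).mean(axis=(-2, -1))


def weighted_guided_filter(I, G=None, r=2, lam=0.01, eps=1e-4, eta=0.0):
    """Sketch of the WGF: Z(q) = a_p G(q) + b_p, with coefficients averaged per pixel."""
    I = np.asarray(I, dtype=float)
    G = I if G is None else np.asarray(G, dtype=float)  # the text uses G = I
    mu_G = box_mean(G, r)
    mu_I = box_mean(I, r)
    mu_GI = box_mean(G * I, r)
    var_G = np.maximum(box_mean(G * G, r) - mu_G ** 2, 0.0)
    # Gamma_G(p) = (1/N) sum_q (var(p) + eps) / (var(q) + eps), q over the window at p.
    gamma = (var_G + eps) * box_mean(1.0 / (var_G + eps), r)
    reg = lam / gamma
    a = (mu_GI - mu_G * mu_I + reg * eta) / (var_G + reg)
    b = mu_I - a * mu_G
    # Each pixel lies in several windows: average their (a, b) before applying the kernel.
    return box_mean(a, r) * G + box_mean(b, r)
```

Since every step is a window mean, the cost per pixel does not depend on `r` in an integral-image implementation; the `sliding_window_view` version here favours readability over that property.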

After the image is filtered with the WGF, the Hough transform³⁹ is used to detect the feature lines. Because the Hough transform finds a specified shape by voting, it is computationally expensive. A hybrid feature detection scheme is therefore adopted to reduce the computational cost: global feature detection is applied only to the reference image, and for the other images, local feature detection is applied near the feature point found in the previous frame.

For the reference image, all lines are extracted from the binary image. The line representing the X axis of the world frame that is closest to the image center is defined as the main line. The line that is closest to the image center and perpendicular to the main line is defined as the secondary line. The intersection of these two lines is defined as the main feature point. For the other images, a region of interest (ROI) is used to improve computational efficiency: a rectangular window centered at the main feature point of the previous image is determined, and local feature detection is performed within this window. The main and secondary lines and the feature point are tracked and updated. If the feature point is not in the current image, the previous image is set as the reference image, and new feature lines and a new feature point are determined.
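The selection of the main and secondary lines and of their intersection can be sketched on Hough lines in normal form, $x\cos θ + y\sin θ = ρ$. This is an illustration under assumptions: it simply takes the line nearest the image center as the main line (it does not check which line family corresponds to $X_{w}$), treats lines within 5° of perpendicular as candidates for the secondary line, and returns `None` when no valid pair exists.

```python
import math
from typing import List, Optional, Tuple

Line = Tuple[float, float]  # Hough normal form (rho, theta): x*cos(theta) + y*sin(theta) = rho


def distance_to_line(pt: Tuple[float, float], line: Line) -> float:
    rho, theta = line
    return abs(pt[0] * math.cos(theta) + pt[1] * math.sin(theta) - rho)


def intersect(l1: Line, l2: Line) -> Optional[Tuple[float, float]]:
    """Solve the two normal-form equations with Cramer's rule."""
    (r1, t1), (r2, t2) = l1, l2
    det = math.cos(t1) * math.sin(t2) - math.sin(t1) * math.cos(t2)  # = sin(t2 - t1)
    if abs(det) < 1e-9:
        return None  # parallel lines
    x = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    y = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return x, y


def _direction_gap(t1: float, t2: float) -> float:
    """Smallest angle between two line normals, modulo pi."""
    d = (t1 - t2) % math.pi
    return min(d, math.pi - d)


def main_feature_point(lines: List[Line], center, tol=math.radians(5)):
    """Intersection of the main line (nearest the center) and the nearest perpendicular line."""
    if not lines:
        return None
    main = min(lines, key=lambda l: distance_to_line(center, l))
    perpendicular = [l for l in lines if abs(_direction_gap(l[1], main[1]) - math.pi / 2) < tol]
    if not perpendicular:
        return None
    secondary = min(perpendicular, key=lambda l: distance_to_line(center, l))
    return intersect(main, secondary)
```

For the non-reference images, the same selection would run only on lines detected inside the ROI window around the previous main feature point.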

Positioning based on visual odometry

The transformation matrix from the world coordinate to the camera coordinate can be expressed as

[\begin{matrix} x_{c} \\ y_{c} \\ z_{c} \\ 1 \end{matrix}] = [\begin{matrix} n_{x} & o_{x} & a_{x} & p_{x} \\ n_{y} & o_{y} & a_{y} & p_{y} \\ n_{z} & o_{z} & a_{z} & p_{z} \\ 0 & 0 & 0 & 1 \end{matrix}] [\begin{matrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{matrix}] = {}^{c}T_{w} [\begin{matrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{matrix}]

where $(x_{w}, y_{w}, z_{w})$ are the world coordinates; $\vec{n} = (n_{x}, n_{y}, n_{z})$, $\vec{o} = (o_{x}, o_{y}, o_{z})$, and $\vec{a} = (a_{x}, a_{y}, a_{z})$ denote the direction vectors of the $X_{w}$, $Y_{w}$, and $Z_{w}$ axes, respectively; and $\vec{p} = (p_{x}, p_{y}, p_{z})$ is the position vector from the origin of the camera frame to the origin of the world frame, expressed in the camera frame. The relationships between these three frames are shown in Figure 7.

Figure 7.

Frames of reference.

Because the mobile robot only translates along the $X_{w}$ and $Y_{w}$ axes and rotates about the $Z_{w}$ axis, the transformation matrix from the world frame to the camera frame of the ith image can be expressed as

\begin{array}{l} {}^{c i}T_{w} = {}^{c}T_{w} {}^{w}T_{i r}^{- 1} = [\begin{matrix} n_{x} cos φ_{i} - o_{x} sin φ_{i} & n_{x} sin φ_{i} + o_{x} cos φ_{i} & a_{x} & {}^{c i}p_{w x} \\ n_{y} cos φ_{i} - o_{y} sin φ_{i} & n_{y} sin φ_{i} + o_{y} cos φ_{i} & a_{y} & {}^{c i}p_{w y} \\ n_{z} cos φ_{i} - o_{z} sin φ_{i} & n_{z} sin φ_{i} + o_{z} cos φ_{i} & a_{z} & {}^{c i}p_{w z} \\ 0 & 0 & 0 & 1 \end{matrix}] = [\begin{matrix} {}^{i}n_{x} & {}^{i}o_{x} & {}^{i}a_{x} & {}^{i}p_{x} \\ {}^{i}n_{y} & {}^{i}o_{y} & {}^{i}a_{y} & {}^{i}p_{y} \\ {}^{i}n_{z} & {}^{i}o_{z} & {}^{i}a_{z} & {}^{i}p_{z} \\ 0 & 0 & 0 & 1 \end{matrix}] \end{array}

{\begin{matrix} {}^{c i}p_{w x} = n_{x} (p_{x i} cos φ_{i} + p_{y i} sin φ_{i}) + o_{x} (- p_{x i} sin φ_{i} + p_{y i} cos φ_{i}) + p_{x} \\ {}^{c i}p_{w y} = n_{y} (p_{x i} cos φ_{i} + p_{y i} sin φ_{i}) + o_{y} (- p_{x i} sin φ_{i} + p_{y i} cos φ_{i}) + p_{y} \\ {}^{c i}p_{w z} = n_{z} (p_{x i} cos φ_{i} + p_{y i} sin φ_{i}) + o_{z} (- p_{x i} sin φ_{i} + p_{y i} cos φ_{i}) + p_{z} \end{matrix}

where $(p_{x i}, p_{y i})$ is the translation vector of the camera and $φ_{i}$ is the rotation angle of the robot about the $Z_{w}$ axis.

The orientation of the robot is crucial for pose estimation. Because the ceiling contains lines parallel to the $X_{w}$ and $Y_{w}$ axes, the camera-frame direction of such an axis (${}^{i}\vec{n}$ for $X_{w}$, ${}^{i}\vec{o}$ for $Y_{w}$) must lie on the image line passing through any two points of a ceiling line parallel to that axis. Writing this image line as the cross product of the two homogeneous normalized points gives equation (10)

{\begin{matrix} {}^{i}n_{x} (y_{l x 1} - y_{l x 2}) + {}^{i}n_{y} (x_{l x 2} - x_{l x 1}) + {}^{i}n_{z} (x_{l x 1} y_{l x 2} - x_{l x 2} y_{l x 1}) = 0 \\ {}^{i}o_{x} (y_{l y 1} - y_{l y 2}) + {}^{i}o_{y} (x_{l y 2} - x_{l y 1}) + {}^{i}o_{z} (x_{l y 1} y_{l y 2} - x_{l y 2} y_{l y 1}) = 0 \end{matrix}

where $(x_{l x 1}, y_{l x 1})$ and $(x_{l x 2}, y_{l x 2})$ are the normalized coordinates of two points on a line parallel to the $X_{w}$ axis, and $(x_{l y 1}, y_{l y 1})$ and $(x_{l y 2}, y_{l y 2})$ are those of two points on a line parallel to the $Y_{w}$ axis. The normalized coordinates are obtained from the pixel coordinates as follows (shown for the first point; the others are analogous):

{\begin{matrix} x_{l x 1} = (u_{l x 1} - u_{0}) / k_{x} \\ y_{l x 1} = (v_{l x 1} - v_{0}) / k_{y} \end{matrix}

Substituting the columns ${}^{i}\vec{n}$ and ${}^{i}\vec{o}$ of ${}^{c i}T_{w}$ into equation (10), we have

{\begin{matrix} \begin{array}{l} [n_{x} (y_{l x 1} - y_{l x 2}) + n_{y} (x_{l x 2} - x_{l x 1}) + n_{z} (x_{l x 1} y_{l x 2} - x_{l x 2} y_{l x 1})] \cos φ_{i} \\ = [o_{x} (y_{l x 1} - y_{l x 2}) + o_{y} (x_{l x 2} - x_{l x 1}) + o_{z} (x_{l x 1} y_{l x 2} - x_{l x 2} y_{l x 1})] \sin φ_{i} \end{array} \\ \begin{array}{l} [o_{x} (y_{l y 1} - y_{l y 2}) + o_{y} (x_{l y 2} - x_{l y 1}) + o_{z} (x_{l y 1} y_{l y 2} - x_{l y 2} y_{l y 1})] \cos φ_{i} \\ = - [n_{x} (y_{l y 1} - y_{l y 2}) + n_{y} (x_{l y 2} - x_{l y 1}) + n_{z} (x_{l y 1} y_{l y 2} - x_{l y 2} y_{l y 1})] \sin φ_{i} \end{array} \end{matrix}

Equation (12) has the form $k_{1} \cos φ_{i} = k_{2} \sin φ_{i}$ and $k_{3} \cos φ_{i} = k_{4} \sin φ_{i}$. Adding the two equations, the rotation angle of the robot is obtained as

φ_{i} = \operatorname{atan2} (k_{1} + k_{3}, k_{2} + k_{4})

where

{\begin{matrix} k_{1} = n_{x} (y_{l x 1} - y_{l x 2}) + n_{y} (x_{l x 2} - x_{l x 1}) + n_{z} (x_{l x 1} y_{l x 2} - x_{l x 2} y_{l x 1}) \\ k_{2} = o_{x} (y_{l x 1} - y_{l x 2}) + o_{y} (x_{l x 2} - x_{l x 1}) + o_{z} (x_{l x 1} y_{l x 2} - x_{l x 2} y_{l x 1}) \\ k_{3} = o_{x} (y_{l y 1} - y_{l y 2}) + o_{y} (x_{l y 2} - x_{l y 1}) + o_{z} (x_{l y 1} y_{l y 2} - x_{l y 2} y_{l y 1}) \\ k_{4} = - [n_{x} (y_{l y 1} - y_{l y 2}) + n_{y} (x_{l y 2} - x_{l y 1}) + n_{z} (x_{l y 1} y_{l y 2} - x_{l y 2} y_{l y 1})] \end{matrix}
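The orientation step can be checked numerically. The sketch below computes $φ_{i}$ from one image line parallel to each world axis: it forms each image line as the cross product of the two homogeneous normalized points (so the middle coefficient is $x_{2} - x_{1}$), takes dot products with $\vec{n}$ and $\vec{o}$ to obtain the cos/sin balance of each constraint, and adds the two constraints before applying `atan2`. It assumes the two points of each line are ordered along the positive direction of the corresponding world axis; reversing the order on both lines shifts the result by π. The function names are illustrative.

```python
import math


def _image_line(p1, p2):
    """Homogeneous image line through two normalized points: (x1, y1, 1) x (x2, y2, 1)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def rotation_from_ceiling_lines(n, o, x_line_pts, y_line_pts):
    """phi_i from one image line parallel to X_w and one parallel to Y_w.

    n, o: first two columns (direction vectors of X_w and Y_w) of the rotation part of cT_w.
    x_line_pts, y_line_pts: pairs of normalized image points on the respective lines.
    """
    lx = _image_line(*x_line_pts)
    ly = _image_line(*y_line_pts)
    k1, k2 = _dot(n, lx), _dot(o, lx)    # k1 cos(phi) = k2 sin(phi)
    k3, k4 = _dot(o, ly), -_dot(n, ly)   # k3 cos(phi) = k4 sin(phi)
    return math.atan2(k1 + k3, k2 + k4)
```

Using `atan2` rather than a plain arctangent keeps the full (−π, π] range, which matters whenever the robot turns by more than 90° relative to the world axes.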

For the (i+1)th frame, we have

\begin{array}{l} [\begin{matrix} k (u_{i + 1} - u_{0}) \\ k (v_{i + 1} - v_{0}) \\ 1 \end{matrix}] = [\begin{matrix} cos φ_{i + 1} & - sin φ_{i + 1} & p_{x, i + 1} \\ sin φ_{i + 1} & cos φ_{i + 1} & p_{y, i + 1} \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} x_{w i} \\ y_{w i} \\ 1 \end{matrix}] = [\begin{matrix} cos φ_{i} & - sin φ_{i} & p_{x i} \\ sin φ_{i} & cos φ_{i} & p_{y i} \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} cos Δ φ_{i} & - sin Δ φ_{i} & Δ p_{x i} \\ sin Δ φ_{i} & cos Δ φ_{i} & Δ p_{y i} \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} x_{w i} \\ y_{w i} \\ 1 \end{matrix}] \end{array}

where $(u_{0}, v_{0})$ are the coordinates of the principal point, $(u_{i}, v_{i})$ are the image coordinates of the feature point in the ith frame, $(p_{x i}, p_{y i})$ are the coordinates of the origin of the camera frame, and $(x_{w i}, y_{w i})$ are the coordinates of the feature point in the world frame. $k = h / k_{x}$ is the scale factor between image and world coordinates. $Δ φ_{i}$ denotes the orientation difference between the ith and (i+1)th frames, and $Δ p_{x i}, Δ p_{y i}$ are the translation differences between them.

Then, with $u_{d i} = u_{i} - u_{0}$ and $v_{d i} = v_{i} - v_{0}$, we have

{\begin{matrix} p_{x, i + 1} = k (u_{d, i + 1} - u_{d i} cos Δ φ_{i} + v_{d i} sin Δ φ_{i}) + p_{x i} cos Δ φ_{i} - p_{y i} sin Δ φ_{i} \\ p_{y, i + 1} = k (v_{d, i + 1} - u_{d i} sin Δ φ_{i} - v_{d i} cos Δ φ_{i}) + p_{x i} sin Δ φ_{i} + p_{y i} cos Δ φ_{i} \end{matrix}
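The frame-to-frame update above has a compact vector form, $\vec{p}_{i+1} = k\,\vec{m}_{i+1} - R(Δφ_{i})\,(k\,\vec{m}_{i} - \vec{p}_{i})$, where $\vec{m} = (u_{d}, v_{d})$ is the principal-point-centered pixel position of the tracked feature and $R(\cdot)$ is a planar rotation. The sketch below implements the two scalar equations directly; the function name and argument layout are illustrative assumptions.

```python
import math
from typing import Tuple


def update_translation(
    uv_next: Tuple[float, float],  # (u_d, v_d) of the feature in frame i+1
    uv_prev: Tuple[float, float],  # (u_d, v_d) of the same feature in frame i
    p_prev: Tuple[float, float],   # (p_xi, p_yi)
    dphi: float,                   # rotation change between the two frames
    k: float,                      # scale factor k = h / k_x
) -> Tuple[float, float]:
    """Translation (p_{x,i+1}, p_{y,i+1}) of frame i+1 computed from frame i."""
    c, s = math.cos(dphi), math.sin(dphi)
    ud_next, vd_next = uv_next
    ud, vd = uv_prev
    px, py = p_prev
    px_next = k * (ud_next - ud * c + vd * s) + px * c - py * s
    py_next = k * (vd_next - ud * s - vd * c) + px * s + py * c
    return px_next, py_next
```

Passing the reference frame's feature position and translation in place of the previous frame's gives the reference-based variant used in this article, which avoids chaining one estimate per frame.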

In previous research,³ the position was estimated from the previous and current frames and then accumulated. In this article, the position is determined only from the reference frame and the current frame, which effectively reduces the accumulated error. Therefore, the pose of the robot can be written as

{\begin{matrix} {}^{W}p_{x, i + 1} = - p_{x, i + 1} cos Δ φ_{i + 1} - p_{y, i + 1} sin Δ φ_{i + 1} \\ {}^{W}p_{y, i + 1} = p_{x, i + 1} sin Δ φ_{i + 1} - p_{y, i + 1} cos Δ φ_{i + 1} \end{matrix}

where

{\begin{matrix} p_{x, i + 1} = k (u_{d, i + 1} - u_{d 0} cos Δ φ_{i} + v_{d 0} sin Δ φ_{i}) + p_{x 0} cos Δ φ_{i} - p_{y 0} sin Δ φ_{i} \\ p_{y, i + 1} = k (v_{d, i + 1} - u_{d 0} sin Δ φ_{i} - v_{d 0} cos Δ φ_{i}) + p_{x 0} sin Δ φ_{i} + p_{y 0} cos Δ φ_{i} \end{matrix}

where $u_{d 0} = u_{f} - u_{0}, v_{d 0} = v_{f} - v_{0}$ . $u_{f}, v_{f}$ are the image coordinates of the feature point of the reference image. $p_{x 0}, p_{y 0}$ are the image coordinates in the world frame.

Natural landmark–based localization

In artificial landmark–based localization, landmark recognition can be affected by human activity, and landmarks may even be occluded by objects, which leads to localization failure. Visual odometry is efficient and can achieve real-time performance, but its accumulated error cannot be removed as the robot moves, and it cannot determine the initial position of the robot. To solve these problems, the hybrid visual landmark–based localization method is proposed, which offers robustness and absolute localization.

Orientation estimation

An orientation filter is adopted to estimate the orientation angles. In our study, the indirect Kalman filter⁴⁰ is utilized, which defines the state space in terms of the orientation error. In other filters, the state vector usually has to be of unit norm or positive, but this is not necessary in the adopted filter. Moreover, its state propagation and measurement models are simpler than those of other filters. Data fusion is carried out on the error quaternion, whose space is approximately linear.

Assume that the relative orientation from the current image I to the reference image G of the natural landmark is denoted by the unit quaternion ${}_{I}^{G}q$. If the estimate of ${}_{I}^{G}q$ is ${}_{I}^{G}{\overset{⌢}{q}}$ and the estimation error is $δ q$, then

{}_{I}^{G}q = {}_{I}^{G}{\overset{⌢}{q}} \otimes δ q

where ⊗ denotes quaternion multiplication.

We use Δq to denote the rotation between two adjacent frames. Then, we obtain

{}_{I}^{G}{\overset{⌢}{q}}_{k} = {}_{I}^{G}{\overset{⌢}{q}}_{k - 1} \otimes Δ q_{k - 1}
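The prediction step above, and the error propagation derived next, rely only on quaternion products and conjugates. The sketch below uses scalar-first quaternions and the Hamilton product (an assumed convention; the text writes the conjugate as $^{T}$) and lets one check numerically that $δq_{k}$, computed directly from the propagated estimate and the true orientation, equals $Δq_{k-1}^{T} \otimes δq_{k-1} \otimes Δq_{k-1}$. All function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (q0, q1, q2, q3), scalar first


def qmul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a ⊗ b."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
    )


def qconj(q: Quat) -> Quat:
    """Conjugate, i.e. the inverse of a unit quaternion (written q^T in the text)."""
    return (q[0], -q[1], -q[2], -q[3])


def axis_angle(axis: Sequence[float], angle: float) -> Quat:
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    x, y, z = axis
    s = math.sin(angle / 2) / math.sqrt(x * x + y * y + z * z)
    return (math.cos(angle / 2), x * s, y * s, z * s)


def propagate(q_hat_prev: Quat, dq: Quat) -> Quat:
    """Prediction: q_hat_k = q_hat_{k-1} ⊗ Δq_{k-1}."""
    return qmul(q_hat_prev, dq)


def propagate_error(err_prev: Quat, dq: Quat) -> Quat:
    """Error propagation: δq_k = Δq^T ⊗ δq_{k-1} ⊗ Δq."""
    return qmul(qmul(qconj(dq), err_prev), dq)
```

For rotations about a single axis, such as the robot's yaw, `propagate` simply adds the angles: two rotations of 0.2 and 0.3 rad about z give the quaternion of a 0.5 rad rotation.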

Assuming that the estimation error δq is normally distributed, we have

{\begin{matrix} {}_{I}^{G}q_{k} = {}_{I}^{G}q_{k - 1} \otimes Δ q_{k - 1} = {}_{I}^{G}{\overset{⌢}{q}}_{k - 1} \otimes δ q_{k - 1} \otimes Δ q_{k - 1} \\ δ q_{k} = {({}_{I}^{G}{\overset{⌢}{q}}_{k})}^{T} \otimes {}_{I}^{G}q_{k} \\ = {({}_{I}^{G}{\overset{⌢}{q}}_{k - 1} \otimes Δ q_{k - 1})}^{T} \otimes ({}_{I}^{G}{\overset{⌢}{q}}_{k - 1} \otimes δ q_{k - 1} \otimes Δ q_{k - 1}) \\ = Δ q_{k - 1}^{T} \otimes δ q_{k - 1} \otimes Δ q_{k - 1} \end{matrix}

where $δ q_{k - 1}$ can be written as

δ q_{k - 1} = {[δ q_{0} δ q_{1} δ q_{2} δ q_{3}]}_{k - 1}^{T} = {[δ q_{0} 0 0 0]}_{k - 1}^{T} + {[0 δ q_{1} δ q_{2} δ q_{3}]}_{k - 1}^{T}

Combining equations (21) and (22), we have

\begin{array}{l} δ q_{k} = Δ q_{k - 1}^{T} \otimes {[δ q_{0} 0 0 0]}_{k - 1}^{T} \otimes Δ q_{k - 1} + Δ q_{k - 1}^{T} \otimes {[0 δ q_{1} δ q_{2} δ q_{3}]}_{k - 1}^{T} \otimes Δ q_{k - 1} \\ = {[δ q_{0} 0 0 0]}_{k - 1}^{T} + [\begin{matrix} 0 & 0 \\ 0 & Δ R_{k - 1}^{T} \end{matrix}] {[0 δ q_{1} δ q_{2} δ q_{3}]}_{k - 1}^{T} = [\begin{matrix} {(δ q_{0})}_{k - 1} \\ Δ R_{k - 1}^{T} {[\begin{matrix} δ q_{1} \\ δ q_{2} \\ δ q_{3} \end{matrix}]}_{k - 1} \end{matrix}] \end{array}

where $Δ R_{k - 1}$ is the rotation matrix of $Δ q_{k - 1}$. The scalar part $δ q_{0}$ remains unchanged from one state to the next, so equation (23) can be simplified as

δ e_{k} = Δ R_{k - 1}^{T} δ e_{k - 1}

where

δ e = [δ q_{1} / (1 + δ q_{0}) δ q_{2} / (1 + δ q_{0}) δ q_{3} / (1 + δ q_{0})]^{T}
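The modified Rodrigues parameters and the propagation $δe_{k} = ΔR_{k-1}^{T} δe_{k-1}$ can be sketched as follows. The quaternion-to-matrix formula assumes scalar-first unit quaternions with the Hamilton convention, and the function names are illustrative. For a rotation by θ about a single axis, the MRP magnitude equals $\tan(θ/4)$, which stays well behaved for the small errors tracked by the filter.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mrp(dq: Sequence[float]) -> Vec3:
    """Modified Rodrigues parameters: vector part divided by (1 + scalar part)."""
    q0, q1, q2, q3 = dq
    return (q1 / (1 + q0), q2 / (1 + q0), q3 / (1 + q0))


def rot_matrix(q: Sequence[float]):
    """Rotation matrix of a unit quaternion (scalar first, Hamilton convention)."""
    q0, q1, q2, q3 = q
    return [
        [1 - 2 * (q2 * q2 + q3 * q3), 2 * (q1 * q2 - q0 * q3), 2 * (q1 * q3 + q0 * q2)],
        [2 * (q1 * q2 + q0 * q3), 1 - 2 * (q1 * q1 + q3 * q3), 2 * (q2 * q3 - q0 * q1)],
        [2 * (q1 * q3 - q0 * q2), 2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 * q1 + q2 * q2)],
    ]


def propagate_mrp(de_prev: Sequence[float], dq: Sequence[float]) -> Vec3:
    """Error-state propagation: delta_e_k = dR^T delta_e_{k-1}, with dR the matrix of dq."""
    R = rot_matrix(dq)
    return tuple(sum(R[r][c] * de_prev[r] for r in range(3)) for c in range(3))
```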

The state model can be expressed as

{\begin{cases} X_{k} = Δ R_{k - 1}^{T} X_{k - 1} + w_{k - 1} |_{X = δ e} \\ Y_{k} = X_{k} + v_{k} \end{cases}

where w is the process noise and v is the observation noise. As noted above, the model for $X_{k}$ and $Y_{k}$ is close to linear, so their components can be treated as decoupled, and the covariance matrices of w and v, denoted by Q and W, can be set to be diagonal. In this article, an IMU is adopted to obtain a more accurate orientation estimate: Q is estimated from the visual odometry and W from the IMU. Finally, the output of the filter can be written as

{\overset{⌢}{X}}_{k} = {\overset{⌢}{X}}_{k}^{-} + K_{k} (Y_{k} - {\overset{⌢}{X}}_{k}^{-}) = δ {\overset{⌢}{e}}_{k}^{-} + K_{k} (δ {\overset{⌢}{e}}_{{IMU}_{k}}^{-} - δ {\overset{⌢}{e}}_{{VO}_{k}}^{-}) \approx δ {\overset{⌢}{e}}_{VO}^{-} + K_{k} δ e_{{IMU}_{k}}^{VO}

where $δ e_{{IMU}_{k}}^{VO}$ is the modified Rodrigues parameters of $δ q_{{IMU}_{k}}^{VO}$ , which is given by

δ q_{{IMU}_{k}}^{VO} = Δ q_{VO}^{T} \otimes Δ q_{IMU}

where $Δ q_{IMU}$ and $Δ q_{VO}$ represent the rotations computed by the IMU and visual odometry (VO), respectively. Because the robot only rotates about the $Z_{w}$ axis, rotations about the other axes can be ignored.
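With the linear state model above, one filter cycle reduces to a standard Kalman prediction and an update with an identity measurement matrix. The NumPy sketch below follows that structure under assumptions: the diagonal covariances `Q` (attributed to visual odometry in the text) and `W` (attributed to the IMU) and all numeric values are placeholders, and the measurement `y` stands for the MRP of the rotation error between the IMU and VO estimates.

```python
import numpy as np


def kf_predict(x, P, dR, Q):
    """Propagate the MRP error state: X_k = dR^T X_{k-1} + w."""
    F = dR.T
    return F @ x, F @ P @ F.T + Q


def kf_update(x_pred, P_pred, y, W):
    """Update with the measurement model Y_k = X_k + v (identity measurement matrix)."""
    S = P_pred + W                           # innovation covariance
    K = np.linalg.solve(S.T, P_pred.T).T     # Kalman gain K = P_pred S^{-1}
    x = x_pred + K @ (y - x_pred)
    P = (np.eye(len(x_pred)) - K) @ P_pred
    return x, P, K
```

Because the robot only rotates about $Z_{w}$, in practice only the third state component carries information, and the same filter could be run as a scalar yaw filter.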

Position

To compute the position of the robot, the feature points must be determined in both the current image and the reference image of the natural landmark. In this study, lamp features are used to determine the feature points because they are easy to detect. In some previous research,¹⁰ lamps could only be detected when they were turned on, which is not always the case. Thus, the DGHM-SIFT algorithm described in the “Recognition of the natural landmark” section is adopted to determine the feature point.

First, lamps are detected by their intensity, which differs markedly from that of the other blocks. Next, an ROI is selected with a window centered at a corner of the lamp; the corner nearest to the image center is chosen as the main feature point. Finally, the DGHM-SIFT algorithm is used to find the corresponding lamp in the ceiling image of the natural landmark, and the lamp corner is taken as the feature point. The feature point detection results are shown in Figure 8, where the white circle marks the feature point.

Figure 8.

The feature point detection results (the lines connect the corresponding matching points).
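The first two steps (intensity-based lamp detection and selecting the lamp corner nearest the image center) can be sketched as follows. This is an illustrative NumPy/SciPy sketch, not the authors' code: the threshold and minimum-area values, the use of each blob's axis-aligned bounding-box corners as the lamp corners, and the name `main_feature_point` are assumptions, and the DGHM-SIFT matching step is not reproduced. The `lamp_is_brighter` flag is a hypothetical switch for scenes in which a lamp that is off appears darker than the surrounding blocks.

```python
import numpy as np
from scipy import ndimage


def main_feature_point(gray, thresh=200, min_area=50, lamp_is_brighter=True):
    """Return (row, col) of the lamp corner nearest the image center, or None.

    Lamps are segmented by an intensity threshold; each lamp's corners are
    taken as the corners of its axis-aligned bounding box (an assumption).
    """
    gray = np.asarray(gray)
    mask = gray >= thresh if lamp_is_brighter else gray <= thresh
    labels, _ = ndimage.label(mask)                  # connected lamp candidates
    center = np.array([(gray.shape[0] - 1) / 2.0, (gray.shape[1] - 1) / 2.0])

    best, best_dist = None, np.inf
    for sl in ndimage.find_objects(labels):
        if sl is None:
            continue
        r0, r1 = sl[0].start, sl[0].stop - 1
        c0, c1 = sl[1].start, sl[1].stop - 1
        if (r1 - r0 + 1) * (c1 - c0 + 1) < min_area:
            continue                                 # skip small bright spots
        for corner in ((r0, c0), (r0, c1), (r1, c0), (r1, c1)):
            dist = np.hypot(corner[0] - center[0], corner[1] - center[1])
            if dist < best_dist:
                best, best_dist = corner, dist
    return best


# Synthetic 100 x 100 ceiling image with one lamp and one small reflection.
img = np.zeros((100, 100), dtype=np.uint8)
img[20:40, 60:90] = 255     # lamp: rows 20-39, columns 60-89
img[5:7, 5:7] = 255         # 2 x 2 reflection, filtered out by min_area
corner = main_feature_point(img)
```

In this synthetic example the lamp corner (39, 60) is closest to the image center (49.5, 49.5), so it is returned as the main feature point.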

Once the orientation and the feature point are determined, the position is computed as in the “Positioning based on visual odometry” section:

\begin{aligned} p_{x, i+1} &= k (u_{d, i+1} - u_{d, ref} \cos Δφ_{i, ref} + v_{d, ref} \sin Δφ_{i, ref}) + p_{x, ref} \cos Δφ_{i, ref} - p_{y, ref} \sin Δφ_{i, ref} \\ p_{y, i+1} &= k (v_{d, i+1} - u_{d, ref} \sin Δφ_{i, ref} - v_{d, ref} \cos Δφ_{i, ref}) + p_{x, ref} \sin Δφ_{i, ref} + p_{y, ref} \cos Δφ_{i, ref} \end{aligned}

where $p_{x, ref}, p_{y, ref}$ are the coordinates of the feature point in the world frame, $u_{d, ref}, v_{d, ref}$ are the image coordinates of the feature point in the reference image of the natural landmark, $u_{d, i+1}, v_{d, i+1}$ are its image coordinates in the current image, and $Δ φ_{i, ref}$ is the rotation angle relative to the natural landmark.
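The two scalar equations are the components of the compact form p_{i+1} = R(Δφ_{i,ref}) p_ref + k (d_{i+1} − R(Δφ_{i,ref}) d_ref), where R is the 2-D rotation matrix and d = (u_d, v_d) denotes image coordinates. A small NumPy sketch of that form follows; the function name is hypothetical, angles are assumed to be in radians, and k is treated as a given input from the earlier section.

```python
import numpy as np


def position_from_landmark(k, d_cur, d_ref, p_ref, dphi):
    """Robot position p_{i+1} from the landmark feature point.

    k     : scale factor from the 'Positioning based on visual odometry' section (given)
    d_cur : (u_d, v_d) of the feature point in the current image
    d_ref : (u_d, v_d) of the feature point in the landmark reference image
    p_ref : (p_x, p_y) of the feature point in the world frame
    dphi  : rotation angle relative to the natural landmark (rad)
    """
    c, s = np.cos(dphi), np.sin(dphi)
    R = np.array([[c, -s], [s, c]])
    d_cur, d_ref, p_ref = (np.asarray(a, dtype=float) for a in (d_cur, d_ref, p_ref))
    # p_{i+1} = R p_ref + k (d_{i+1} - R d_ref); expanding gives the two scalar equations.
    return R @ p_ref + k * (d_cur - R @ d_ref)


# Illustrative values: k = 2, landmark feature point at (1000, 500) in the world frame.
p_same = position_from_landmark(2.0, (110, 50), (100, 50), (1000, 500), 0.0)
p_turned = position_from_landmark(2.0, (110, 50), (100, 50), (1000, 500), np.pi / 2)
```

With Δφ = 0 the result reduces to p_ref + k (d_{i+1} − d_ref), that is (1020, 500) here; with Δφ = 90° it gives (−180, 900), which matches a term-by-term evaluation of the two scalar equations.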

Experimental results

The experimental system consists of a mobile robot (TurtleBot, Yujin Robot, YMR-K01-W1), an RGB camera, a fish-eye camera, and an IMU. Both cameras point toward the ceiling, as shown in Figure 9.

Figure 9.

Experimental system.

Visual odometry experiment

An indoor experiment was conducted to verify the effectiveness of the visual odometry. Before the robot moves, its initial position must be determined. A PnP-based method⁴¹ was used to compute the initial pose from the vertices of the ceiling blocks, with the transformation matrix determined from

\begin{aligned} x_{wv} n_{x} + y_{wv} o_{x} - x_{1c} x_{wv} n_{z} - x_{1c} y_{wv} o_{z} + p_{x} - x_{1c} p_{z} &= 0 \\ x_{wv} n_{y} + y_{wv} o_{y} - y_{1c} x_{wv} n_{z} - y_{1c} y_{wv} o_{z} + p_{y} - y_{1c} p_{z} &= 0 \end{aligned}

where $(x_{wv}, y_{wv}, z_{wv})$ are the world coordinates of the vertex (the vertices lie on the ceiling plane, so $z_{wv} = 0$ and no $z_{wv}$ terms appear) and $(x_{1c}, y_{1c})$ are its normalized image coordinates. The resulting initial pose was

{}^{c}T_{w} = [\begin{matrix} - 0.342 & - 0.94 & 0 & 143.6 \\ 0.94 & - 0.342 & 0 & 16.3 \\ 0 & 0 & 1 & - 2810.5 \\ 0 & 0 & 0 & 1 \end{matrix}]
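For readers who want to reproduce this step, the two equations per vertex can be stacked for several ceiling vertices and solved as a homogeneous linear system. The sketch below is a plain DLT-style least-squares solution (SVD null vector, then scaling so that ‖n‖ = 1), not the recursive method of ref. 41. It assumes coplanar vertices with z_wv = 0, at least four vertices not all collinear, and fixes the overall sign by requiring positive depth, which may differ from the convention behind the matrix above. The function names and the synthetic pose are hypothetical.

```python
import numpy as np


def initial_pose_linear(world_xy, img_norm):
    """Solve the two linear equations per vertex for n, o, p (vertices on z_wv = 0).

    world_xy : (N, 2) vertex coordinates (x_wv, y_wv), N >= 4, not all collinear
    img_norm : (N, 2) normalized image coordinates (x_1c, y_1c)
    Returns a 4x4 matrix [n o n×o p; 0 0 0 1].
    """
    world_xy = np.asarray(world_xy, dtype=float)
    img_norm = np.asarray(img_norm, dtype=float)
    rows = []
    for (X, Y), (x, y) in zip(world_xy, img_norm):
        # Unknown vector: [n_x, n_y, n_z, o_x, o_y, o_z, p_x, p_y, p_z]
        rows.append([X, 0.0, -x * X, Y, 0.0, -x * Y, 1.0, 0.0, -x])
        rows.append([0.0, X, -y * X, 0.0, Y, -y * Y, 0.0, 1.0, -y])
    A = np.array(rows)
    sol = np.linalg.svd(A)[2][-1]          # null vector (smallest singular value)
    sol = sol / np.linalg.norm(sol[0:3])   # fix scale: n is a unit rotation column
    n, o, p = sol[0:3], sol[3:6], sol[6:9]
    depth = world_xy[0, 0] * n[2] + world_xy[0, 1] * o[2] + p[2]
    if depth < 0.0:                        # fix sign: vertices in front of the camera
        n, o, p = -n, -o, -p
    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = n, o, np.cross(n, o), p
    return T


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


# Synthetic check (mm): project five ceiling vertices with a known pose, then recover it.
R_true = rot_z(np.deg2rad(30.0)) @ rot_x(np.deg2rad(10.0))
t_true = np.array([100.0, 20.0, 2800.0])
verts = np.array([[0, 0], [600, 0], [0, 600], [600, 600], [300, 150]], dtype=float)
cam = verts @ R_true[:, :2].T + t_true      # camera-frame coordinates of each vertex
img = cam[:, :2] / cam[:, 2:3]              # normalized image coordinates
T_est = initial_pose_linear(verts, img)
```

With noisy corner detections, using more vertices helps, and the recovered n and o are then only approximately orthonormal; they can be re-orthogonalized, for example with an SVD of the 3 × 3 block.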

After the initial pose of the robot was obtained, the robot started moving. The experimental results are shown in Figure 10. Figure 10(a) shows the positions: as the travelled distance increases, the computed trajectory gradually deviates from the actual one. Figure 10(b) presents the position errors, defined as the distance from each measured point to the corresponding actual point. The maximum distance error was 49.71 mm, and the average position errors along the X_w and Y_w axes were 18.30 mm and 18.20 mm, respectively. The error accumulation nevertheless remained small, because localization is computed relative to the reference image rather than the previous image.

Figure 10.

Position results of the visual odometry. (a) Positions and (b) position errors in distance.

The orientation results and their errors are shown in Figure 11. The orientation is expressed as the direction angle in the world frame and is not affected by error accumulation. The average orientation error was 0.78° in our experiment, and the maximum error of 1.12° is attributed to image noise and illumination differences. The experimental results demonstrate that the visual odometry has a small orientation error, but accumulation of position error is unavoidable.

Figure 11.

Orientation results of the visual odometry. (a) Direction angles and (b) orientation errors in angles.

Initial pose estimation

The main difference between visual odometry and landmark-based methods is that landmark-based methods can estimate the initial pose at any position instead of deriving it from previous images. To evaluate this capability, initial pose estimation experiments were conducted at four randomly selected initial positions; the localization results are listed in Table 1. The maximum distance error of the initial pose estimation is 29.2 mm, which is accurate enough to serve as the initial position for subsequent images.

Table 1.

Localization results in initial points (mm).

Position	Computed x	Actual x	Absolute error in x	Computed y	Actual y	Absolute error in y
1	4056.1	4050	6.1	−27.7	0	27.7
2	4968.2	4950	18.2	−16.8	0	16.8
3	5125.7	5100	25.7	3768.5	3780	11.5
4	4223.9	4200	23.9	3793.6	3780	13.6

Experiments on VNL-based method

After the initial pose in the world frame was determined, the robot was localized using the VNLs, applying the method described in “The proposed localization system” section repeatedly for position estimation. The measured and actual poses of the robot are listed in Table 2. The average errors along the x and y axes are 12.9 mm and 13.0 mm, respectively, the maximum distance error is 29.5 mm, and the average direction-angle error is 0.7°. Unlike visual odometry, the proposed method does not accumulate position error, so it maintains high accuracy and outperforms visual odometry over long-distance movement.

Table 2.

Localization results of the VNL-based method.

No	Measured x (mm)	Measured y (mm)	Error in x (mm)	Actual x (mm)	Actual y (mm)	Error in y (mm)	Measured angle (°)	Actual angle (°)	Angle error (°)
1	4021.9	−61.5	8.6	4030.5	−51.8	9.7	1.2	0.9	0.3
2	4208.7	−22.9	9.6	4218.3	−10.8	12.1	1.6	0.9	0.7
3	4485.1	−44.4	8.4	4493.5	−37.5	6.9	0.9	0.6	0.3
4	4785.5	−49.9	11.8	4797.2	−41.3	8.6	−0.4	−0.9	0.5
5	5114.7	−61.4	13.8	5128.5	−72.4	11.0	−0.5	−1.1	0.6
6	5393.3	−45.3	16.5	5376.8	−37.4	7.9	9.9	10.7	0.8
7	5711.8	−62.8	17.6	5729.4	−52.7	10.1	24.8	24.3	0.5
8	6172.8	−33.9	19.0	6191.8	−31.9	2.0	36.8	37.5	0.7
9	6245.3	6.7	12.3	6257.6	−5.8	12.5	52.7	52.2	0.5
10	6239.6	7.1	10.8	6250.4	17.9	10.8	70.5	69.9	0.6
11	6234.4	305.9	13.9	6248.3	292.4	13.5	91.4	90.7	0.7
12	6269.9	613.1	17.4	6252.5	601.4	11.7	91.8	90.8	1.0
13	6282.4	901.7	18.7	6263.7	913.5	11.8	90.2	90.6	0.4
14	6257.6	1203.8	15.8	6241.8	1217.7	13.9	90.1	90.9	0.8
15	6294.2	1837.2	15.6	6278.6	1822.9	14.3	88.3	89.0	0.7
16	6276.9	1979.2	14.1	6262.8	1998.6	19.4	89.9	90.8	0.9
17	6255.7	2273.4	12.2	6243.5	2258.7	14.7	90.2	90.7	0.5
18	6267.9	2555.6	7.2	6260.7	2543.1	12.5	88.5	89.1	0.6
19	6270.4	2838.5	11.0	6259.4	2826.9	11.6	87.6	88.8	1.2
20	6256.3	3146.3	10.6	6245.7	3135.9	10.4	88.1	87.1	1.0
21	6262.5	3458.2	11.9	6250.6	3440.5	17.7	89.4	90.5	1.1
22	6251.7	3652.3	7.4	6244.3	3641.6	10.7	128.2	128.7	0.5
23	6183.0	3746.9	15.8	6167.2	3760.1	13.2	141.7	140.8	0.9
24	5879.3	3756.5	7.1	5886.4	3739.5	16.9	159.7	160.1	0.4
25	5590.6	3449.2	11.3	5579.3	3468.7	19.5	189.2	189.5	0.3
26	5292.1	3353.1	16.3	5308.4	3367.9	14.8	195.8	196.4	0.6
27	4979.3	3243.6	16.4	4995.7	3257.4	13.8	210.1	210.6	0.5
28	4564.0	2929.4	7.8	4556.2	2945.7	16.3	227.9	227.3	0.6
29	4224.0	2613.3	14.1	4238.1	2639.2	25.9	237.8	236.8	1.0
30	3827.1	2366.9	18.7	3808.4	2355.1	11.8	248.7	248.0	0.7
31	3642.4	2067.4	9.1	3651.5	2052.8	14.6	255.1	255.7	0.6
32	3335.8	1729.0	12.9	3348.7	1743.8	14.8	268.2	268.8	0.6

VNL: visual natural landmark.
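As a consistency check, the summary statistics quoted for Table 2 can be recomputed from its error columns. The NumPy sketch below assumes that the distance error at each point is the Euclidean norm of the x and y errors; the array and key names are illustrative.

```python
import numpy as np

# Error columns transcribed from Table 2 (mm for x and y, degrees for the angle).
err_x = np.array([8.6, 9.6, 8.4, 11.8, 13.8, 16.5, 17.6, 19.0, 12.3, 10.8, 13.9, 17.4,
                  18.7, 15.8, 15.6, 14.1, 12.2, 7.2, 11.0, 10.6, 11.9, 7.4, 15.8, 7.1,
                  11.3, 16.3, 16.4, 7.8, 14.1, 18.7, 9.1, 12.9])
err_y = np.array([9.7, 12.1, 6.9, 8.6, 11.0, 7.9, 10.1, 2.0, 12.5, 10.8, 13.5, 11.7,
                  11.8, 13.9, 14.3, 19.4, 14.7, 12.5, 11.6, 10.4, 17.7, 10.7, 13.2, 16.9,
                  19.5, 14.8, 13.8, 16.3, 25.9, 11.8, 14.6, 14.8])
err_ang = np.array([0.3, 0.7, 0.3, 0.5, 0.6, 0.8, 0.5, 0.7, 0.5, 0.6, 0.7, 1.0,
                    0.4, 0.8, 0.7, 0.9, 0.5, 0.6, 1.2, 1.0, 1.1, 0.5, 0.9, 0.4,
                    0.3, 0.6, 0.5, 0.6, 1.0, 0.7, 0.6, 0.6])

dist = np.hypot(err_x, err_y)            # Euclidean position error at each point
summary = {
    "mean_err_x_mm": err_x.mean(),
    "mean_err_y_mm": err_y.mean(),
    "max_dist_mm": dist.max(),
    "worst_point_no": int(dist.argmax()) + 1,   # table rows are numbered from 1
    "mean_err_angle_deg": err_ang.mean(),
}
```

Run as written, it yields mean errors of about 12.9 mm (x) and 13.0 mm (y), a maximum distance error of about 29.5 mm at point 29, and a mean angle error of about 0.7°, in agreement with the values given in the text.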

Comparison experiments

To evaluate the performance of the proposed method more comprehensively, comparison experiments were conducted with three methods: visual odometry, ORB-SLAM,⁴² and the proposed method. ORB-SLAM is an accurate and versatile visual SLAM system that achieves real-time performance in pose estimation and 3-D reconstruction, and it uses a keyframe-based strategy to improve map reuse. The experimental results, shown in Figures 12 and 13, demonstrate that the proposed method achieves higher accuracy than both visual odometry and ORB-SLAM. The computation time of each method is listed in Table 3: running the natural landmark localization on the GPU reduces its runtime from 1026 ms to 48 ms (about 21 times faster), and the complete proposed method takes 30 ms on average, giving real-time performance with high accuracy.

Figure 12.

Positioning results with the odometry, ORB-SLAM, and proposed method. (a) Positions and (b) position errors in distance. SLAM: simultaneous localization and mapping.

Figure 13.

Positioning results. (a) Positions, (b) position errors in distance, (c) orientation angle, and (d) angle error.

Table 3.

Average runtime for each method.

Method	Computation time (ms)
Visual odometry	20
ORB-SLAM	35
HVLN (CPU-based, natural landmark only)	1026
HVLN (GPU-based, natural landmark only)	48
Proposed method	30

HVLN: hybrid visual natural landmark; SLAM: simultaneous localization and mapping; GPU: graphics processing unit; CPU: central processing unit.

To verify the robustness of the proposed algorithm to illumination variation, we conducted experiments under three lighting conditions: natural light, weak light (evening), and strong light (sunny, noon). The results are listed in Table 4. Conventional ORB feature points are less robust to changes in illumination: features extracted at the same location at different times can differ, causing map mismatches, and the ORB-SLAM error grows from 35 mm to 72 mm. By contrast, the error of the proposed method changes little across conditions (18 mm to 28 mm).

Table 4.

Localization error under different conditions (mm).

Localization method	Natural light	Weak light (evening)	Strong light (sunny/noon)
Visual odometry	55	82	94
ORB-SLAM	35	60	72
HVLN	18	25	28

HVLN: hybrid visual natural landmark; SLAM: simultaneous localization and mapping.

Conclusion

In this article, ceiling features and environmental features are detected and used as natural landmarks. A natural landmark library is built before localization, and during localization the robot must determine whether a natural landmark is present. The CUDA-based SIFT method and the orientation filter are used for natural landmark–based localization, while line and point features on the ceiling are used for visual odometry–based pose estimation. The proposed HVLN method was validated by various experiments, from which the following conclusions are drawn:

The ceiling-based visual odometry detects line and point features from the blocks and parallel lines on the ceiling. The orientation angle is determined from corresponding lines in two adjacent frames, and the position is estimated from the orientation and the feature point. This approach is efficient and accurate over short distances.

In natural landmark–based localization, the orientation filter is used to compute the orientation angle, and the CUDA-based SIFT method finds the corresponding feature point in the current frame and the library frame. This method is somewhat more time-consuming than visual odometry, but it can compute the initial position and does not accumulate error.

The proposed HVLN algorithm combines relative and absolute localization. In the experiments, it was more accurate than visual odometry and ORB-SLAM while maintaining real-time performance (30 ms on average).

Future work will analyze the localization error, in particular the relationship between orientation error and position error, with the aim of improving position accuracy.

Footnotes

Acknowledgements

The authors would like to thank the National Natural Science Foundation of China and the Hangzhou Civic Significant Technological Innovation Project of China (two grants) for supporting this work.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant no. 51521064), the Hangzhou Civic Significant Technological Innovation Project of China (no. 20131110A04), and the Hangzhou Civic Significant Technological Innovation Project of China (no. 20142013A56).

ORCID iD

Xuequn Zhang

Supplemental material

Supplemental material for this article is available online.

Appendix 1

The Hermite polynomial can be written as

The polynomials $H_{p}(x)$ are orthogonal with respect to the weight function $\exp(-x^{2})$, that is,

\int_{-\infty}^{\infty} \exp(-x^{2}) H_{p}(x) H_{q}(x)\, dx = 2^{p} p! \sqrt{\pi}\, δ_{pq}

where $δ_{pq}$ denotes the Kronecker delta. For computational efficiency, the Hermite polynomials can also be obtained from the recurrence

H_{p+1}(x) = 2x H_{p}(x) - 2p H_{p-1}(x)

where the initial conditions are $H_{0} (x) = 1, H_{1} (x) = 2 x$ . An orthonormal version of the Hermite polynomial is written as

With a scale parameter σ, the Gaussian–Hermite polynomial can be defined as

The Gaussian–Hermite moments are formed by a set of Gaussian–Hermite polynomials. Given an image I with intensity function $f (x, y)$, the Gaussian–Hermite moments are given by

Given the intensity function $f (x, y)$ , which is defined over a domain of $0 \leq x, y \leq K - 1$ , the image coordinates can be written as

where $- 1 \leq i, j \leq 1$ . Then the discrete implementation of the Gaussian–Hermite moments can be defined as

The gravity center over [−1, 1] can be obtained by the geometric moments

Then, the central Gaussian–Hermite moments can be expressed as
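A compact sketch of the appendix pipeline follows. It uses orthonormal Gaussian–Hermite functions in the commonly used form (cf. Yang and Dai³⁶), Ĥ_p(x; σ) = exp(−x²/(2σ²)) H_p(x/σ) / √(2^p p! √π σ), with H_p generated by the recurrence H_{p+1}(x) = 2x H_p(x) − 2p H_{p−1}(x), H_0 = 1, H_1 = 2x. The pixel-to-[−1, 1] mapping x_i = (2i − K + 1)/(K − 1), the Riemann-sum discretization, and the function names are assumptions for illustration rather than the exact equations of this appendix.

```python
import math
import numpy as np


def hermite(p_max, x):
    """Physicists' Hermite polynomials H_0..H_{p_max} evaluated at a 1-D array x.

    Uses H_{p+1}(x) = 2x H_p(x) - 2p H_{p-1}(x) with H_0 = 1 and H_1 = 2x.
    """
    x = np.asarray(x, dtype=float)
    H = np.empty((p_max + 1, x.size))
    H[0] = 1.0
    if p_max >= 1:
        H[1] = 2.0 * x
    for p in range(1, p_max):
        H[p + 1] = 2.0 * x * H[p] - 2.0 * p * H[p - 1]
    return H


def gauss_hermite(p_max, x, sigma):
    """Orthonormal functions exp(-x^2 / 2 sigma^2) H_p(x / sigma) / sqrt(2^p p! sqrt(pi) sigma)."""
    x = np.asarray(x, dtype=float)
    H = hermite(p_max, x / sigma)
    norm = np.array([math.sqrt(2.0 ** p * math.factorial(p) * math.sqrt(math.pi) * sigma)
                     for p in range(p_max + 1)])
    return H * np.exp(-x ** 2 / (2.0 * sigma ** 2)) / norm[:, None]


def central_gh_moments(f, p_max, sigma):
    """Central Gaussian-Hermite moments M[p, q] of a K x K image f (rows = y, columns = x)."""
    f = np.asarray(f, dtype=float)
    K = f.shape[0]
    if f.shape != (K, K):
        raise ValueError("expected a square K x K image")
    coords = (2.0 * np.arange(K) - K + 1) / (K - 1)   # assumed mapping onto [-1, 1]
    d = 2.0 / (K - 1)                                  # grid spacing on [-1, 1]
    m00 = f.sum()
    xc = (f * coords[None, :]).sum() / m00             # gravity center m10 / m00
    yc = (f * coords[:, None]).sum() / m00             # gravity center m01 / m00
    Gx = gauss_hermite(p_max, coords - xc, sigma)      # shape (p_max + 1, K), indexed by x
    Gy = gauss_hermite(p_max, coords - yc, sigma)      # shape (p_max + 1, K), indexed by y
    # M[p, q] = sum_x sum_y Ghat_p(x - xc) Ghat_q(y - yc) f(x, y) d^2
    return Gx @ f.T @ Gy.T * d * d


# Illustrative use: a symmetric blob has (numerically) zero odd central moments.
v = np.exp(-((np.arange(31) - 15) / 5.0) ** 2)
M = central_gh_moments(np.outer(v, v), p_max=3, sigma=0.3)
```

Mapping the pixel grid onto [−1, 1] keeps σ independent of the image size K, and checking that the sampled functions are orthonormal on a fine grid is a quick way to catch normalization mistakes.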

References

1. Thrun. Bayesian landmark learning for mobile robot localization. Machine Learning 1998; 33: 41–76.

2. Babinec, Jurišica, Hubinský. Visual localization of mobile robot using artificial markers. Proc Eng 2014; 96: 1–9.

3. Han, Tan. Ceiling-based visual positioning for an indoor mobile robot with monocular vision. IEEE Trans Ind Electron 2009; 56: 1617–1628.

4. Luo. Multilevel multisensor-based intelligent recharging system for mobile robot. IEEE Trans Ind Electron 2008; 55: 270–279.

5. Kim, Yoon, Kweon. Bayesian filtering for keyframe-based visual SLAM. Int J Robot Res 2015; 34: 517–531.

6. Lee, Hwang, Okopal. Ground-moving-platform-based human tracking using visual SLAM and constrained multiple kernels. IEEE Trans Intell Trans Syst 2016; 17: 3602–3612.

7. Lee. Embedded visual SLAM: applications for low-cost consumer robots. IEEE Robot Autom Mag 2013; 20: 83–95.

8. Zhou, Zou, Pei. StructSLAM: visual SLAM with building structure lines. IEEE Trans Veh Technol 2015; 64: 1364–1375.

9. Heng, Lee, Pollefeys. Self-calibration and visual SLAM with a multi-camera system on a micro aerial vehicle. Auton Robots 2014; 39(3): 259–277.

10. Hwang, Song. Monocular vision-based SLAM in indoor environment using corner, lamp, and door features from upward-looking camera. IEEE Trans Ind Electron 2011; 58: 4804–4812.

11. Gong, Zhao, Jiao. A novel coarse-to-fine scheme for automatic image registration based on SIFT and mutual information. IEEE Trans Geosci Remote Sens 2014; 52: 4328–4338.

12. Kupfer, Netanyahu, Shimshoni. An efficient SIFT-based mode-seeking algorithm for sub-pixel registration of remotely sensed images. IEEE Geosci Remote Sens Lett 2015; 12: 379–383.

13. Michel, Chestnutt, Kuffner. Vision-guided humanoid footstep planning for dynamic environments. In: 5th IEEE-RAS international conference on humanoid robots, Tsukuba, Japan, 5 December 2005, pp. 13–18. IEEE.

14. Andersen, Henriksen, Ravn. Visual positioning and docking of non-holonomic vehicles. In: Experimental robotics IV: the 4th international symposium, Stanford, California, 30 June–2 July 1995, pp. 140–149.

15. Brezak, Petrović, Ivanjko. Robust and accurate global vision system for real time tracking of multiple mobile robots. Robot Auton Syst 2008; 56: 213–230.

16. Technical communique: observability analysis of rotation estimation by fusing inertial and line-based visual information: a revisit. Automatica 2006; 42: 1809–1812.

17. Liu, Cheng. Conditional simultaneous localization and mapping: a robust visual SLAM system. Neurocomputing 2014; 145: 269–284.

18. Forster, Zhang, Gassner. SVO: semidirect visual odometry for monocular and multicamera systems. IEEE Trans Robot 2017; 33: 249–265.

19. Yousif, Taguchi, Ramalingam. MonoRGBD-SLAM: simultaneous localization and mapping using both monocular and RGBD cameras. In: IEEE international conference on robotics and automation, Singapore, 29 May–3 June 2017, pp. 4495–4502. IEEE.

20. Engel, Schöps, Cremers. LSD-SLAM: large-scale direct monocular SLAM. In: Computer vision – ECCV (eds Fleet, Pajdla, Schiele, Tuytelaars), Lecture notes in computer science, Vol. 8690, pp. 834–849. Cham, Switzerland: Springer.

21. Engel, Stückler, Cremers. Large-scale direct SLAM with stereo cameras. In: IEEE/RSJ international conference on intelligent robots and systems, Hamburg, Germany, 28 September–2 October 2015, pp. 1935–1942. IEEE.

22. Leutenegger, Lynen, Bosse. Keyframe-based visual-inertial odometry using nonlinear optimization. Int J Robot Res 2015; 34: 314–334.

23. Jeong, Lee. Visual SLAM with line and corner features. In: IEEE/RSJ international conference on intelligent robots and systems, Beijing, China, 9–15 October 2006, pp. 2570–2575. IEEE.

24. Gaspar, Winters, Santos Victor. Vision-based navigation and environmental representation with omni-directional camera. IEEE Trans Rob Autom 2001; 16: 890–898.

25. Song, Wang. Self-localization and control of an omni-directional mobile robot based on an omni-directional camera. In: 7th Asian control conference, Hong Kong, China, 27–29 August 2009, pp. 899–904. IEEE.

26. Wong, Clausi. ARRSI: automatic registration of remote-sensing images. IEEE Trans Geosci Remote Sens 2007; 45: 1483–1493.

27. Wen. Remote sensing image registration with modified SIFT and enhanced feature matching. IEEE Geosci Remote Sens Lett 2017; 14: 3–7.

28. Zhang, Zhu. Unsupervised image clustering with SIFT-based soft-matching affinity propagation. IEEE Signal Process Lett 2017; 24: 461–464.

29. Wang, Liu. Robust scale-invariant feature matching for remote sensing image registration. IEEE Geosci Remote Sens Lett 2009; 6: 287–291.

30. Talbi, Batouche. Particle swarm optimization for image registration. In: International conference on information and communication technologies: from theory to applications, 2004. Proceedings, 2004, pp. 397–398.

31. Senthilnath, Kalro, Benediktsson. Accurate point matching based on multi-objective genetic algorithm for multi-sensor satellite imagery. Appl Math Comput 2014; 236: 546–564.

32. Lowe. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004; 60: 91–110.

33. Siddiqui, Mammeri, Boukerche. Real-time vehicle make and model recognition based on a bag of SURF features. IEEE Trans Intell Trans Syst 2016; 17: 3205–3219.

34. Mikolajczyk, Schmid. A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell 2005; 27: 1615–1630.

35. Dán, Khan, Fodor. Characterization of SURF and BRISK interest point distribution for distributed feature extraction in visual sensor networks. IEEE Trans Multimedia 2015; 17: 591–602.

36. Yang, Dai. Image analysis by Gaussian-Hermite moments. Signal Process 2011; 91: 2290–2303.

37. Hosny. Fast computation of accurate Gaussian-Hermite moments for image processing applications. Digit Signal Process 2012; 22: 476–485.

38. Zheng, Zhu. Weighted guided image filtering. IEEE Trans Image Process 2015; 24: 120–129.

39. Dahyot. Statistical Hough transform. IEEE Trans Pattern Anal Mach Intell 2009; 31: 1502–1509.

40. Suh. Orientation estimation using a quaternion-based indirect Kalman filter with adaptive estimation of external acceleration. IEEE Trans Instrum Meas 2010; 59: 3296–3305.

41. Tan. A general recursive linear method and unique solution pattern design for the perspective-n-point problem. Image Vision Comput 2008; 26: 740–750.

42. Mur-Artal, Montiel, Tardós. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans Robot 2015; 31(5): 1147–1163.
