
Abstract

Loop closure detection is a key technique for robots to minimize the localization and mapping errors that accumulate during long-term simultaneous localization and mapping. However, existing methods do not yet satisfy the efficiency and accuracy requirements of mobile robot applications. In this article, we propose a fast and accurate loop closure detection method that exploits both pose-based and appearance-based information in a probabilistic manner, inspired by the complementarity between the two. Our approach formulates a probabilistic framework combining the pose-based loop closure detection probability and the appearance-based loop closure detection probability. In the proposed framework, the pose-based loop closure detection model is first derived from the nonlinear optimization model of odometry. Then the appearance-based loop similarity and the pose-based loop similarity are combined into a joint framework to improve the loop closure detection performance. We implemented our approach using C++ and ROS and thoroughly tested it on publicly available datasets. The experiments presented in this article suggest that the proposed method can achieve high efficiency and accuracy in loop closure detection.

Keywords

Loop closure detection SLAM pose covariance appearance similarity

Introduction

Visual loop closure detection is “the revisiting problem” for a robot to recognize a previously visited area.¹ So the loop closure detection (LCD) issue is mostly regarded as a scene matching or image retrieval problem.² An image acquired by a camera is matched to a map, where the map contains a set of previously acquired images. Once the match is successful, a loop closure event is confirmed. However, in contrast to general scene matching issues, the LCD method must deal with a growing dataset as the robot explores more scenes, leading to a linear increase in the computational cost, as shown in Figure 1(a). Therefore, the LCD algorithm must be efficient enough to keep up with the newly acquired images that continuously arrive during real-time simultaneous localization and mapping (SLAM).

Figure 1.

Time consumption of loop closure detection. (a) The computational cost of traditional loop closure detection increases linearly with ongoing SLAM, whereas that of the proposed method remains stable and low. (b) Different procedures of loop closure detection take different amounts of time to process a loop candidate. The proposed pose-based probability module (PP) runs faster than the other procedures (SC is the sequential consistency check, and GC is the geometrical consistency check). SLAM: simultaneous localization and mapping.

To perform faster and more accurate LCD, we present a novel approach that exploits both pose-based and appearance-based information, inspired by the complementarity between these two kinds of information. Image similarity works well in most cases but may confuse different places with similar scenes. Think about how human beings recognize previously visited places. People can recognize a previously visited place on the basis of its location, even if the scene has changed greatly. Conversely, people will not confuse two different places with similar scenes during a journey. In other words, both kinds of information contribute to the LCD task, and they are complementary to each other.

Inspired by the complementarity of pose- and appearance-based information, we propose a novel pose-appearance-based probabilistic LCD method. This study is the first attempt to combine these two kinds of information into a joint framework to improve the LCD performance.

Consequently, this work is an actual “loop” closure detection method, rather than a general re-localization method or scene matching method. In other words, the kidnapping issue is not taken into account.

The joint framework of the proposed method is illustrated in Figure 2. The pose-based method calculates the probability of loop closure using the proposed probability model. In this process, a fast pose-based filter is also designed to efficiently acquire the loop closure candidates. The appearance-based method is a replaceable component built on an existing appearance-based LCD method, such as bag-of-words (BoW) or convolutional neural networks (CNN). The product of the pose-based probability and the appearance-based similarity is then used as the final loop probability. A sequential consistency (SC) check and a geometrical consistency (GC) check, both commonly used in LCD applications, are also introduced into the proposed framework as optional modules. The SC check is very efficient, while the GC check achieves remarkable precision at the price of a high time consumption.

Figure 2.

The joint LCD framework combining the pose-based method and the appearance-based method. The first block indicates the proposed pose-based loop closure probability calculation. The appearance-based method is based on other appearance-based LCD algorithms, such as BoW and CNN. The dashed blocks are optional for different applications. LCD: loop closure detection; BoW: bag-of-words.

The main contribution of this article is the probability combination of pose and appearance in LCD. This allows the proposed method to achieve higher efficiency and accuracy performance on LCD. In sum, we make two key claims:

The introduction of pose-based information into the framework will accelerate the LCD process.

The proposed LCD pipeline is compatible with many other methods and easy to use in real-time SLAM applications.

The rest of this article focuses mainly on the probabilistic derivation of the pose similarity, so that the appearance similarity and the pose similarity can be combined to form a novel LCD method.

The remainder of this article is structured as follows. The related work is introduced in section “Related works,” followed by the relevant mathematical concepts and notations in section “Preliminaries.” As the main mathematical derivation of this article, in section “Pose-based loop closure probability,” the pose similarity is derived probabilistically, where the accumulative pose covariance matrix is incrementally calculated and used for this pose similarity. Then, in section “Pose-appearance-based LCD,” the pose similarity is introduced in the new LCD pipeline, named as the pose-appearance-based LCD. Finally, section “Experimental evaluation” presents the experimental settings and results to validate the contributions of the proposed method, and this article is concluded in section “Conclusions.”

Related works

After years of research, LCD methods have formed four main types: traditional feature based, learning based, sequence based, and pose based.

Methods based on traditional feature

Visual features represent the appearance information of an image frame well and are widely used for image matching. So it is a natural idea to detect loop closures using visual features.³ Visual features can also be coded as words, an approach named BoW, to accelerate the feature matching process.⁴ The BoW model represents image features as words of a dictionary, which is generated offline from a large number of visual features with the k-means algorithm.⁵ The key idea is then to accelerate the image matching procedure by comparing words instead of raw feature descriptors.

Based on the BoW model, the appearance-based LCD methods proposed by Angeli et al.⁶, Cummins and Newman⁷, Schöps and Cremers⁸, Gálvez-López and Tardós⁹, and Labbé and Michaud¹⁰ are widely used. Incremental appearance-based mapping (IAB-MAP)⁶ extends the BoW method by adding color histogram features together with keypoint features in the dictionary. The epipolar constraint between similar images is checked, and this procedure is referred to as the GC check. Fast appearance-based mapping (FAB-MAP)^7,8 also adopts the BoW idea and uses the Chow–Liu tree¹¹ to model the distributions of words, which in turn provides a way of calculating the probability of a loop closure event. DLoopDetector⁹ employs both SC and GC checks after the BoW image match. The sequential check keeps a series of images matching the same scene in the map as a candidate loop closure, while the GC check employs random sample consensus (RANSAC) to find a fundamental matrix between the image pair. The loop closure is confirmed only if both consistency checks pass. Real-time appearance-based mapping (RTAB-MAP)¹⁰ focuses on memory management for large-scale and long-term online SLAM applications; it also proposes a Bayesian estimation approach for LCD problems. ORB-SLAM¹² improves DLoopDetector by replacing the binary robust independent elementary features (BRIEF) with the rotation-invariant oriented FAST and rotated BRIEF (ORB) features, which deal better with perspective changes when performing LCD. Bampis et al.¹³ use the BoW to produce a general vector representing the whole scene and improve the ability to handle LCD for image sequences.

The BoW model can be extended to be incremental, to avoid the vocabulary training stage.¹⁴ The robot can then detect loop closures for image sequences online and incrementally.^15–17

Methods based on learning

In addition to the methods based on traditional features, with the fast development of machine learning, LCD based on deep learning is also gaining in popularity.^18–20 For example, Jégou et al. proposed the vector of locally aggregated descriptors (VLAD) algorithm²¹ based on the Fisher kernel, which computes the gradients of the input features in a trained model and generates a joint fixed-size vector as the representation of the input image.^22,23 NetVLAD²⁴ applies soft assignment of VLAD descriptors to the offline learned clusters as an end-to-end visual place recognition method. Merrill and Huang proposed the CALC²⁵ method by training a network comprising a semantic segmentation and an auto-encoder to extract descriptors. Based on the traditional histogram-of-oriented-gradients (HOG) LCD method,²⁶ Zaffar et al. proposed CoHOG²⁷ by integrating the convolutional scanning and region-based feature extraction of CNNs. Zhang et al., on the other hand, proposed a CNN-based LCD method named CNN-LCD.²⁸ An et al. combined CNN global features and CNN local features to improve the LCD performance.²⁹

Methods based on sequential information

Since the image frames are captured sequentially, the loop closure images are also sequentially consistent. Then the SC check or temporal consistency check can be employed to filter the LCD results to improve the accuracy.

SeqSLAM³⁰ is a typical representative method using sequential information. Instead of matching to a single image, SeqSLAM calculates matching location within every local navigation sequence to enhance the performance in challenging environments.

DOseqSLAM^31,32 uses a dynamic rather than fixed sequence length for the SC check, achieving better adaptability to different applications. Incremental LCD methods, which make similar use of sequential information, have also been proposed.^29,33,34

Methods based on pose information

Although the appearance-based LCD methods are powerful and successful, they suffer from high computational demands, especially since the computational cost increases linearly with the map size. In contrast, pose-driven methods, on the other side of the spectrum, can provide an efficient solution with less computational demand.^35,36 For example, Neira and Tardós³⁵ recognized the importance of pose-based information and used it for data association in the mapping process. Li et al.³⁶ model the accumulative visual odometry error based on the Kalman filter and the RGB-D camera model. The error in the pose is used as a constraint in LCD. The disadvantage of these two approaches is their reliance on the Kalman filter model, which is not applicable to nonlinear-optimization-based SLAM, such as the sparse methods^37–40 and dense methods.^41–43

In this article, we introduce a novel pose-appearance-based method by exploiting both pose-based and appearance-based information in a probabilistic manner, which performs fast and accurate LCD.

Preliminaries

This work combines the pose-based loop probability and the appearance-based loop probability. The calculation of the pose-based loop probability needs the estimation of the covariance matrix of poses, where the pose is represented as Lie algebra, and its covariance matrix is calculated using nonlinear optimization. So in this section, some basic formulas are introduced as preliminaries.

Lie algebra

Consider a 3D rotation matrix $R \in SO(3)$ and its corresponding Lie algebra element $ϕ^{\land} \in \mathfrak{so}(3)$. Then, the relationship between $R$ and $ϕ$ is

$$R = \exp(ϕ^{\land}), \quad ϕ = (\ln R)^{\lor} \tag{1}$$

where $^{\land}$ is the antisymmetric matrix operator, and $^{\lor}$ is its inverse operator:

$$ϕ^{\land} = \begin{bmatrix} ϕ_{1} \\ ϕ_{2} \\ ϕ_{3} \end{bmatrix}^{\land} \doteq \begin{bmatrix} 0 & -ϕ_{3} & ϕ_{2} \\ ϕ_{3} & 0 & -ϕ_{1} \\ -ϕ_{2} & ϕ_{1} & 0 \end{bmatrix} \tag{2}$$

The operator $^{\land}$ has the property

$$a^{\land} b = -b^{\land} a, \quad a^{\land}, b^{\land} \in \mathfrak{so}(3) \tag{3}$$
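The two operators and the antisymmetry property can be checked numerically. The following is a minimal sketch in plain Python; the function names `hat`, `vee`, and `matvec` are illustrative choices and are not taken from the authors' implementation.

```python
def hat(phi):
    """Map a 3-vector to its antisymmetric matrix (the ^ operator)."""
    p1, p2, p3 = phi
    return [[0.0, -p3, p2],
            [p3, 0.0, -p1],
            [-p2, p1, 0.0]]


def vee(m):
    """Inverse of hat: recover the 3-vector from an antisymmetric matrix."""
    return [m[2][1], m[0][2], m[1][0]]


def matvec(m, v):
    """Product of a 3x3 matrix (nested lists) and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


a = [1.0, 2.0, 3.0]
b = [-4.0, 0.5, 2.0]
lhs = matvec(hat(a), b)                 # a^ b, i.e. the cross product a x b
rhs = [-x for x in matvec(hat(b), a)]   # -(b^ a)
```

Here `lhs` and `rhs` agree component by component, and `vee(hat(a))` returns `a`.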

If a small increment $δϕ$ is added to $ϕ$, the corresponding Lie group element has the linear approximation

$$\exp((ϕ + δϕ)^{\land}) \approx \exp((J_{l} δϕ)^{\land}) \exp(ϕ^{\land}) \tag{4}$$

Conversely, for a small increment $ΔR = \exp(δϕ^{\land})$ of $R$, the Lie algebra has the linear approximation in equation (5)

$$(\ln(ΔR \cdot R))^{\lor} \approx ϕ + J_{l}^{-1} δϕ \tag{5}$$

where $J_{l}$ is the left Jacobian matrix of the Lie algebra. If $ϕ$ is represented in norm and direction form as $ϕ = ϕ a$, where (with a slight abuse of notation) the scalar $ϕ$ is the norm and $a$ is the unit direction vector, the left Jacobian matrix together with its inverse can be calculated as

$$\begin{aligned} J_{l} &= \frac{\sin ϕ}{ϕ} I + \frac{1 - \cos ϕ}{ϕ} a^{\land} + \left(1 - \frac{\sin ϕ}{ϕ}\right) a a^{T} \\ J_{l}^{-1} &= \frac{ϕ}{2} \cot\frac{ϕ}{2} I - \frac{ϕ}{2} a^{\land} + \left(1 - \frac{ϕ}{2} \cot\frac{ϕ}{2}\right) a a^{T} \end{aligned} \tag{6}$$
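As an illustration, the closed forms of the left Jacobian and its inverse can be evaluated as in the sketch below. It uses plain Python lists rather than a matrix library. The helper names and the small-angle fallback (a first-order series, $J_{l} \approx I + \frac{1}{2} ϕ^{\land}$) are assumptions added here to keep the example well defined near zero rotation.

```python
import math


def hat(v):
    """Antisymmetric matrix of a 3-vector (the ^ operator)."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]


def _combine(k_eye, k_hat, k_outer, a):
    """Return k_eye * I + k_hat * a^ + k_outer * a a^T as a 3x3 nested list."""
    a_hat = hat(a)
    return [[k_eye * (1.0 if r == c else 0.0)
             + k_hat * a_hat[r][c]
             + k_outer * a[r] * a[c]
             for c in range(3)] for r in range(3)]


def left_jacobian(phi):
    """Left Jacobian J_l for a rotation vector phi."""
    theta = math.sqrt(sum(x * x for x in phi))
    if theta < 1e-8:
        # Assumed small-angle fallback: J_l ~ I + 0.5 * phi^
        return _combine(1.0, 0.5, 0.0, phi)
    a = [x / theta for x in phi]
    s = math.sin(theta) / theta
    return _combine(s, (1.0 - math.cos(theta)) / theta, 1.0 - s, a)


def left_jacobian_inv(phi):
    """Inverse left Jacobian J_l^{-1} for a rotation vector phi."""
    theta = math.sqrt(sum(x * x for x in phi))
    if theta < 1e-8:
        # Assumed small-angle fallback: J_l^{-1} ~ I - 0.5 * phi^
        return _combine(1.0, -0.5, 0.0, phi)
    a = [x / theta for x in phi]
    half = 0.5 * theta
    k = half / math.tan(half)  # (theta / 2) * cot(theta / 2)
    return _combine(k, -half, 1.0 - k, a)


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]
```

Multiplying `left_jacobian(phi)` by `left_jacobian_inv(phi)` for a moderate rotation vector should return the identity up to rounding, which is a quick sanity check on both formulas.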

Nonlinear optimization for pose estimation

Consider a six-DoF camera pose $T$ written in vector form as $ξ$

$$T = \exp(ξ^{\land}), \quad ξ^{\land} = \begin{bmatrix} ϕ^{\land} & ρ \\ 0^{T} & 0 \end{bmatrix} \in \mathfrak{se}(3), \quad ϕ^{\land} \in \mathfrak{so}(3) \tag{7}$$

The landmarks observed by the camera are 3D points in the map, $x_{wj}$, and the corresponding observations in the image are $x_{oj}$. Then, the camera pose can be estimated based on the nonlinear optimization model as

$$\begin{aligned} & \underset{ξ}{\arg\max}\; p((x_{o1}, x_{w1}), (x_{o2}, x_{w2}), \dots, (x_{oN}, x_{wN}) \mid ξ) \\ &= \underset{ξ}{\arg\max}\; \ln \prod_{j=1}^{N} \frac{1}{σ_{oj} \sqrt{2π}} \exp\left(-\frac{e(x_{oj}, π(ξ, x_{wj}))^{2}}{2 σ_{oj}^{2}}\right) \\ &= \underset{ξ}{\arg\min} \sum_{j=1}^{N} \frac{e(x_{oj}, π(ξ, x_{wj}))^{2}}{σ_{oj}^{2}} \end{aligned} \tag{8}$$

Here $π(\cdot)$ is the observation function, and $e(x_{oj}, π(ξ, x_{wj}))$ is the observation error, which is simply written as $e_{j}$. $σ_{oj}$ is the standard deviation of $e_{j}$ (so $σ_{oj}^{2}$ is its variance), which can be calculated online based on the observation errors. $N$ is the total number of observed landmarks.

Then, equation (8) can be rewritten in matrix form as follows

$$\underset{ξ}{\arg\min}\; f = \underset{ξ}{\arg\min}\; e^{T} W e \tag{9}$$

where $e \doteq [e_{1}, e_{2}, \dots, e_{N}]$, and $W$ is the diagonal matrix $\operatorname{diag}\{σ_{o1}^{-2}, σ_{o2}^{-2}, \dots, σ_{oN}^{-2}\}$.

Based on the Gauss–Newton nonlinear optimization algorithm, equation (9) can be solved iteratively, and the t-th iterative increment of $ξ$ is calculated as in equation (10)⁴⁴

$$δξ_{t} = -(J_{t}^{T} W_{t} J_{t})^{-1} J_{t}^{T} W_{t} e_{t} \tag{10}$$

where $J_{t}$ is the Jacobian matrix $J_{t} = \partial e_{t} / \partial ξ_{t}^{T}$.

After several iterations, the optimal pose $\hat{ξ}$ is obtained when $δξ_{t} \to 0$.
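To make the iteration of equation (10) concrete, the sketch below applies it to a deliberately simplified toy problem: estimating a single planar rotation angle from noise-weighted 2D point observations. The one-parameter pose, the observation function (a pure rotation), and all names are assumptions made for illustration; the method in this article optimizes a six-DoF pose.

```python
import math


def gauss_newton_angle(landmarks, observations, sigmas,
                       theta0=0.0, iters=20, tol=1e-12):
    """Weighted Gauss-Newton for a 1-parameter toy pose (a planar angle).

    Residual per landmark: e_j = R(theta) x_wj - x_oj (two components),
    weighted by 1 / sigma_j^2. Returns the estimate and the final J^T W J.
    """
    theta = theta0
    H = 0.0
    for _ in range(iters):
        H = 0.0  # J^T W J (a scalar, since there is one parameter)
        g = 0.0  # J^T W e
        c, s = math.cos(theta), math.sin(theta)
        for (x, y), (u, v), sigma in zip(landmarks, observations, sigmas):
            w = 1.0 / sigma ** 2
            e = (c * x - s * y - u, s * x + c * y - v)   # observation error
            J = (-s * x - c * y, c * x - s * y)          # d e / d theta
            H += w * (J[0] ** 2 + J[1] ** 2)
            g += w * (J[0] * e[0] + J[1] * e[1])
        delta = -g / H          # equation (10) for the scalar case
        theta += delta
        if abs(delta) < tol:    # stop when the increment vanishes
            break
    return theta, H


# Noise-free synthetic data generated with a true angle of 0.4 rad.
true_theta = 0.4
landmarks = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
ct, st = math.cos(true_theta), math.sin(true_theta)
observations = [(ct * x - st * y, st * x + ct * y) for x, y in landmarks]
theta_hat, H_final = gauss_newton_angle(landmarks, observations, [0.1] * 3)
```

On this noise-free data the estimate converges to the true angle within a few iterations; with real observations the residual would not vanish, but the update rule is the same.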

Covariance of pose

The term $J_{t}^{T} W_{t} J_{t}$ in equation (10) can be written as $H_{t}$, which is an approximate Hessian matrix for the optimization. Then the covariance matrix of the optimal estimation $Σ_{ξ}$ can be obtained from the final $H$

$$Σ_{ξ} = H^{-1} = (J^{T} W J)^{-1} \tag{11}$$

$Σ_{ξ}$ is the relative pose covariance with respect to the previous frames. So the accumulative covariance $Σ_{i}$ for the i-th frame should be calculated based on the propagation law of uncertainty

$$Σ_{i} = \frac{\partial ξ_{i}}{\partial ξ_{inc}^{T}} Σ_{inc} \left(\frac{\partial ξ_{i}}{\partial ξ_{inc}^{T}}\right)^{T} + \frac{\partial ξ_{i}}{\partial ξ_{i-1}^{T}} Σ_{i-1} \left(\frac{\partial ξ_{i}}{\partial ξ_{i-1}^{T}}\right)^{T} \tag{12}$$

where $ξ_{inc}$ is the incremental pose between $ξ_{i}$ and $ξ_{i - 1}$ , and $Σ_{inc}$ is its covariance matrix from the incremental pose estimation, that is, the $Σ_{ξ}$ in equation (11).

Equation (11) is a fast but less accurate first-order estimate of the incremental covariance. More accurate methods⁴⁵ exist as alternatives. For the joint optimization of sequential frames, there are also methods for calculating the pose covariance,^46,47 which can be used for global trajectory optimization.

In this work, the covariance of the pose is employed for LCD, but how the covariance is calculated is not the main concern. Users may adopt any of the methods mentioned above for a specific SLAM system. However, computational efficiency should be taken into consideration when choosing a covariance calculation method, since the LCD process is executed in real time alongside SLAM.
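The accumulation in equation (12) can be sketched for a simplified planar case, where the two Jacobians have short closed forms. The planar pose $(x, y, θ)$, the composition rule, and all function names below are assumptions made for illustration; the article itself works with six-DoF poses in Lie algebra form, whose Jacobians are more involved.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def propagate(J, S):
    """First-order uncertainty propagation J S J^T."""
    return matmul(matmul(J, S), transpose(J))


def compose(pose, inc):
    """Apply an incremental planar motion, expressed in the previous frame."""
    x, y, th = pose
    dx, dy, dth = inc
    c, s = math.cos(th), math.sin(th)
    return (x + c * dx - s * dy, y + s * dx + c * dy, th + dth)


def accumulate_covariance(pose_prev, cov_prev, inc, cov_inc):
    """One step of equation (12) for a planar pose (x, y, theta)."""
    _, _, th = pose_prev
    dx, dy, _ = inc
    c, s = math.cos(th), math.sin(th)
    # Jacobian of the new pose with respect to the previous pose.
    J_prev = [[1.0, 0.0, -s * dx - c * dy],
              [0.0, 1.0, c * dx - s * dy],
              [0.0, 0.0, 1.0]]
    # Jacobian of the new pose with respect to the incremental pose.
    J_inc = [[c, -s, 0.0],
             [s, c, 0.0],
             [0.0, 0.0, 1.0]]
    A = propagate(J_inc, cov_inc)
    B = propagate(J_prev, cov_prev)
    cov = [[A[r][k] + B[r][k] for k in range(3)] for r in range(3)]
    return compose(pose_prev, inc), cov


# Two straight 1 m steps, each with the same incremental covariance.
cov_inc = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.001]]
pose, cov = (0.0, 0.0, 0.0), [[0.0] * 3 for _ in range(3)]
for _ in range(2):
    pose, cov = accumulate_covariance(pose, cov, (1.0, 0.0, 0.0), cov_inc)
```

After the second step the lateral variance exceeds twice the per-step value, because the heading uncertainty of the first step leaks into the lateral position. This is the kind of growth the accumulative covariance is meant to capture.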

Pose-based loop closure probability

In this section, we propose the pose-based method to calculate the loop closure probability of any two poses, shown as the first block in Figure 2.

Two images with exactly the same pose definitively indicate a loop closure. However, in most cases, these two poses are not exactly the same, but they share similar views. In this case, they are called images of covisibility, and they can generate a loop.

Let the current pose be $\hat{ξ}_{i}$ and the candidate loop pose be $\hat{ξ}_{j}$. Then the loop closure probability of the two poses is

$$p_{j}(\mathrm{Loop}) = \frac{|A^{-1}|^{1/2}}{|Σ_{i}|^{1/2}} \exp\left\{\frac{1}{2} b^{T} A b - \frac{1}{2} \hat{ξ}_{j}^{T} Σ_{co}^{-1} \hat{ξ}_{j} - \frac{1}{2} \hat{ξ}_{i}^{T} Σ_{i}^{-1} \hat{ξ}_{i}\right\} \tag{13}$$

All the variables in equation (13) are related to poses and covariance matrices; $A$ and $b$ are defined in the derivation, which is detailed as follows.

Probability distribution of pose

To calculate the pose-based loop closure probability, the probability distribution model of the pose needs to be derived first. For any pose $ξ_{i}$ with the optimal estimate $\hat{ξ}_{i}$ and the covariance matrix $Σ_{i}$, based on the assumption of zero-mean Gaussian noise, the probability distribution of the pose $ξ_{i}$ is fully parametrized as

$$ρ_{\hat{ξ}_{i}, Σ_{i}}(ξ_{i}) = \frac{1}{(2π)^{dim/2} |Σ_{i}|^{1/2}} \exp\left(-\frac{1}{2} (ξ_{i} - \hat{ξ}_{i})^{T} Σ_{i}^{-1} (ξ_{i} - \hat{ξ}_{i})\right) \tag{14}$$

Covisibility of two poses

The covisibility of two poses is the basis for visual odometry to calculate their incremental transformation. So the covisibility matrix can be estimated during the odometry process, and then be employed to detect pose-based loop closure.

The covisibility matrix $Σ_{co}$ is calculated using a group of covisible poses during the odometry, as

$$Σ_{co} = \frac{1}{n} \sum_{i=1}^{n} ξ_{co,i}\, ξ_{co,i}^{T} \tag{15}$$

The $ξ_{co, i}$ indicates the relative pose of any two images with covisibility, which are used for odometry. And $Σ_{co}$ is incrementally updated online when new covisible image pairs are acquired with the progress of the odometry.
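One way to realize the incremental online update is a running mean of outer products, which reproduces equation (15) without storing past relative poses. The sketch below is an illustration under that assumption; the class name `CovisibilityMatrix` is hypothetical and not taken from the authors' code.

```python
class CovisibilityMatrix:
    """Running estimate of (1/n) * sum of xi_co xi_co^T over covisible pairs."""

    def __init__(self, dim=6):
        self.dim = dim
        self.n = 0
        self.sigma = [[0.0] * dim for _ in range(dim)]

    def update(self, xi_co):
        """Fold in the relative pose of one new covisible image pair."""
        if len(xi_co) != self.dim:
            raise ValueError("relative pose has the wrong dimension")
        self.n += 1
        for r in range(self.dim):
            for c in range(self.dim):
                outer = xi_co[r] * xi_co[c]
                # Running-mean update: mean_n = mean_{n-1} + (x - mean_{n-1}) / n
                self.sigma[r][c] += (outer - self.sigma[r][c]) / self.n
        return self.sigma


# Two-dimensional toy usage (the article uses 6D relative poses).
co = CovisibilityMatrix(dim=2)
co.update([1.0, 0.0])
sigma_co = co.update([0.0, 2.0])
```

After the two updates, `sigma_co` equals the batch average of the two outer products.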

Pose-based loop closure probability

Then we can calculate the loop closure probability of any two poses based on the covariance matrix of pose and the covisibility matrix. In other words, the two poses are considered as loop closure only when they are covisible under the uncertainty of $Σ_{i}$ and $Σ_{co}$ .

Based on the marginal probability model, the probability of loop closure, that is, the probability of covisibility, can be calculated as

$$p_{j}(\mathrm{Loop}) = p_{j}(\mathrm{Covis}) = \int_{-\infty}^{\infty} p_{\hat{ξ}_{j}, Σ_{co}}(\mathrm{Covis} \mid ξ)\, ρ_{\hat{ξ}_{i}, Σ_{i}}(ξ)\, dξ \tag{16}$$

Note the difference between $p_{*}$ and $ρ_{*}$ , where $p_{*}$ indicates the probability and $ρ_{*}$ indicates the probability density. $p_{_{{\hat{ξ}}_{j}, Σ_{co}}} (Covis | ξ)$ is the probability of the covisibility of ${\hat{ξ}}_{j}$ and $ξ$ . $ρ_{_{{\hat{ξ}}_{i}, Σ_{i}}} (ξ)$ is the probability density function of $ξ$ calculated in equation (14).

Based on Bayes’ theorem, $P(A | B) = P(B | A) P(A) / P(B)$, we can get $p_{j}(Covis | ξ) = c_{1} ρ_{j}(ξ | Covis)$, where $c_{1}$ is a constant factor and $ρ_{j}(ξ | Covis)$ is the probability density function of $ξ$ under the constraint of covisibility. With the Gaussian distribution centered at $\hat{ξ}_{j}$ with covariance $Σ_{co}$, we can obtain

$$p_{j}(\mathrm{Covis} \mid ξ) = \frac{c_{1}}{(2π)^{3} |Σ_{co}|^{1/2}} \exp\left(-\frac{1}{2} (ξ - \hat{ξ}_{j})^{T} Σ_{co}^{-1} (ξ - \hat{ξ}_{j})\right) \tag{17}$$

Considering that the probability of covisibility should be normalized to 1 when the two poses are exactly the same, that is, $p_{j}(Covis | ξ = \hat{ξ}_{j}) = 1$, we can obtain

$$p_{j}(\mathrm{Covis} \mid ξ) = \exp\left(-\frac{1}{2} (ξ - \hat{ξ}_{j})^{T} Σ_{co}^{-1} (ξ - \hat{ξ}_{j})\right) \tag{18}$$
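The normalization step can be written out explicitly. At $ξ = \hat{ξ}_{j}$ the exponent vanishes, so the condition fixes the constant $c_{1}$ as follows (a short worked step added for clarity):

```latex
p_{j}(\mathrm{Covis} \mid ξ = \hat{ξ}_{j})
  = \frac{c_{1}}{(2π)^{3} |Σ_{co}|^{1/2}} \exp(0) = 1
\quad\Longrightarrow\quad
c_{1} = (2π)^{3} |Σ_{co}|^{1/2}
```

Substituting this value of $c_{1}$ cancels the Gaussian prefactor and leaves only the exponential term.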

Finally, the probability of loop closure can be calculated as

$$\begin{aligned} p_{j}(\mathrm{Loop}) &= \int_{-\infty}^{\infty} p_{j}(\mathrm{Covis} \mid ξ)\, ρ_{\hat{ξ}_{i}, Σ_{i}}(ξ)\, dξ \\ &= \frac{1}{8π^{3} |Σ_{i}|^{1/2}} \int_{-\infty}^{\infty} \exp\Big\{ -\frac{1}{2} (ξ - \hat{ξ}_{j})^{T} Σ_{co}^{-1} (ξ - \hat{ξ}_{j}) \\ &\qquad - \frac{1}{2} (ξ - \hat{ξ}_{i})^{T} Σ_{i}^{-1} (ξ - \hat{ξ}_{i}) \Big\}\, dξ \end{aligned} \tag{19}$$

Let $A = Σ_{co}^{-1} + Σ_{i}^{-1}$ and $b = A^{-1} (Σ_{co}^{-1} \hat{ξ}_{j} + Σ_{i}^{-1} \hat{ξ}_{i})$. Using the symmetry of $Σ_{i}$ and $Σ_{co}$, equation (19) can be rewritten as

$$\begin{aligned} p_{j}(\mathrm{Loop}) &= \frac{1}{8π^{3} |Σ_{i}|^{1/2}} \int_{-\infty}^{\infty} \exp\Big\{ -\frac{1}{2} (ξ - b)^{T} A (ξ - b) + \frac{1}{2} b^{T} A b \\ &\qquad - \frac{1}{2} \hat{ξ}_{j}^{T} Σ_{co}^{-1} \hat{ξ}_{j} - \frac{1}{2} \hat{ξ}_{i}^{T} Σ_{i}^{-1} \hat{ξ}_{i} \Big\}\, dξ \\ &= \frac{|A^{-1}|^{1/2}}{|Σ_{i}|^{1/2}} \exp\left\{ \frac{1}{2} b^{T} A b - \frac{1}{2} \hat{ξ}_{j}^{T} Σ_{co}^{-1} \hat{ξ}_{j} - \frac{1}{2} \hat{ξ}_{i}^{T} Σ_{i}^{-1} \hat{ξ}_{i} \right\} \end{aligned} \tag{20}$$
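The final expression can be evaluated directly. The sketch below assumes, purely for brevity, that both $Σ_{i}$ and $Σ_{co}$ are diagonal, so that every determinant and quadratic form factorizes per axis; a full implementation would use general matrices. As a cross-check it also evaluates the algebraically equivalent form $\left(|Σ_{co}| / |Σ_{co} + Σ_{i}|\right)^{1/2} \exp\{-\frac{1}{2} (\hat{ξ}_{i} - \hat{ξ}_{j})^{T} (Σ_{co} + Σ_{i})^{-1} (\hat{ξ}_{i} - \hat{ξ}_{j})\}$, which follows from the standard integral of a product of two Gaussians. All function names are illustrative.

```python
import math


def loop_probability_diag(xi_i, var_i, xi_j, var_co):
    """Equation (20) for diagonal covariances given as per-axis variances."""
    log_p = 0.0
    for mi, vi, mj, vc in zip(xi_i, var_i, xi_j, var_co):
        A = 1.0 / vc + 1.0 / vi            # per-axis entry of A
        b = (mj / vc + mi / vi) / A        # per-axis entry of b
        log_p += 0.5 * math.log((1.0 / A) / vi)   # |A^-1|^(1/2) / |Sigma_i|^(1/2)
        log_p += 0.5 * b * b * A - 0.5 * mj * mj / vc - 0.5 * mi * mi / vi
    return math.exp(log_p)


def loop_probability_closed_form(xi_i, var_i, xi_j, var_co):
    """Equivalent form in terms of the pose difference and Sigma_co + Sigma_i."""
    log_p = 0.0
    for mi, vi, mj, vc in zip(xi_i, var_i, xi_j, var_co):
        log_p += 0.5 * math.log(vc / (vc + vi))
        log_p -= 0.5 * (mi - mj) ** 2 / (vc + vi)
    return math.exp(log_p)


# Illustrative 6D poses and variances (made-up numbers).
xi_i = [0.20, -0.10, 0.05, 0.00, 0.30, -0.20]
xi_j = [0.10, 0.00, 0.00, 0.05, 0.25, -0.10]
var_i = [0.04, 0.04, 0.04, 0.01, 0.01, 0.01]
var_co = [0.09, 0.09, 0.09, 0.02, 0.02, 0.02]
p_eq20 = loop_probability_diag(xi_i, var_i, xi_j, var_co)
p_alt = loop_probability_closed_form(xi_i, var_i, xi_j, var_co)
```

The two values agree up to rounding. The second form also shows that the probability depends on the poses only through their difference and decreases as the poses move apart.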

Fast loop closure candidate filtering

To improve the efficiency of the LCD procedure, the loop closure candidates are selected using a simple threshold operation based on the pose and the covariance matrix, before calculating the probability of equation (20).

We set a probabilistic threshold of 95%. For a Gaussian distribution, the 95% confidence interval of $ξ_{i}$ is $[\hat{ξ}_{i} - 1.96 σ_{i}, \hat{ξ}_{i} + 1.96 σ_{i}]$, and the 95% confidence interval of covisibility with $\hat{ξ}_{j}$ is $[\hat{ξ}_{j} - 1.96 σ_{co}, \hat{ξ}_{j} + 1.96 σ_{co}]$. Here $σ_{i}$ and $σ_{co}$ are 6D vectors holding the square roots of the diagonal elements of $Σ_{i}$ and $Σ_{co}$, respectively, and the intervals are evaluated component-wise.

Thus, the current pose $\hat{ξ}_{i}$ may close a loop with $\hat{ξ}_{j}$ in the map only if it satisfies the relationship $\hat{ξ}_{j} \in [\hat{ξ}_{i} - 1.96 σ_{i} - 1.96 σ_{co}, \hat{ξ}_{i} + 1.96 σ_{i} + 1.96 σ_{co}]$.

Because this condition is symmetric in the two poses, it is equivalent to requiring that $\hat{ξ}_{i}$ lie in $[\hat{ξ}_{j} - 1.96 σ_{i} - 1.96 σ_{co}, \hat{ξ}_{j} + 1.96 σ_{i} + 1.96 σ_{co}]$, which is the final 95% confidence interval of loop closure. Only the poses satisfying this condition are considered loop closure candidates and are then used for the pose-appearance-based LCD.
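The threshold test amounts to a per-component interval check, as in the following sketch. The function names, the flat-list pose representation, and the use of only the diagonal variances are assumptions for illustration.

```python
import math


def is_loop_candidate(xi_i, sigma_i, xi_j, sigma_co, z=1.96):
    """True if xi_j lies within the combined interval around xi_i on every axis."""
    return all(abs(pj - pi) <= z * (si + sc)
               for pi, si, pj, sc in zip(xi_i, sigma_i, xi_j, sigma_co))


def filter_candidates(xi_i, var_i, map_poses, var_co, z=1.96):
    """Return indices of map poses that pass the fast pose-based filter.

    var_i and var_co hold the diagonal elements of Sigma_i and Sigma_co.
    """
    sigma_i = [math.sqrt(v) for v in var_i]
    sigma_co = [math.sqrt(v) for v in var_co]
    return [idx for idx, xi_j in enumerate(map_poses)
            if is_loop_candidate(xi_i, sigma_i, xi_j, sigma_co, z)]


# Toy usage: the per-axis bound is 1.96 * (0.2 + 0.1) = 0.588.
current = [0.0] * 6
map_poses = [
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],   # inside the interval
    [0.7, 0.0, 0.0, 0.0, 0.0, 0.0],   # outside on the first axis
    [0.1] * 6,                        # inside on every axis
]
kept = filter_candidates(current, [0.04] * 6, map_poses, [0.01] * 6)
```

Only the surviving indices need to be passed on to the more expensive probability and appearance computations.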

Pose-appearance-based LCD

Based on the pose-based loop closure probability, the pose-appearance-based LCD framework can be built, as illustrated in Figure 2. The appearance similarity method, such as BoW-based or CNN-based appearance similarity, can be employed as the appearance-based model in the joint framework.

Pose-appearance-based loop closure probability

To detect loop closures using both the appearance and the pose information, the appearance similarity $s$ is multiplied by the pose-based loop closure probability $p_{p}$, which gives the pose-appearance-based loop closure probability: $p_{pa} = p_{p} \cdot s$. $p_{p}$ is calculated using equation (20), and $s$ can be calculated using any appearance-based LCD method, such as BoW-LCD or CNN-LCD.

For the BoW-based appearance similarity, let $v$ and $w$ be two pre-normalized BoW vectors. Then, the $\ell_{1}$-norm of their difference⁴ can be calculated quickly as

$$\| v - w \|_{1} = 2 + \sum_{v_{i} \neq 0,\, w_{i} \neq 0} \left(| v_{i} - w_{i} | - | v_{i} | - | w_{i} |\right) \tag{21}$$

Since $\| v - w \|_{1} \in [0, 2]$, the similarity of the two images can be calculated from the $\ell_{1}$-norm as

$$s = 1 - 0.5\, \| v - w \|_{1}, \quad s \in [0, 1] \tag{22}$$
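The benefit of the fast form is that only the words present in both images need to be visited. A minimal sketch follows, assuming each BoW vector is stored sparsely as a dictionary from word id to weight and is $\ell_{1}$-normalized beforehand; the function names are illustrative.

```python
def l1_normalize(vec):
    """Scale a sparse vector {word_id: weight} so its absolute values sum to 1."""
    total = sum(abs(x) for x in vec.values())
    return {k: x / total for k, x in vec.items()}


def fast_l1_distance(v, w):
    """||v - w||_1 for L1-normalized sparse vectors, visiting shared words only."""
    d = 2.0
    for k in v.keys() & w.keys():
        d += abs(v[k] - w[k]) - abs(v[k]) - abs(w[k])
    return d


def bow_similarity(v, w):
    """Appearance similarity s = 1 - 0.5 * ||v - w||_1, in [0, 1]."""
    return 1.0 - 0.5 * fast_l1_distance(v, w)


v = l1_normalize({1: 2.0, 5: 1.0, 9: 1.0})   # -> {1: 0.5, 5: 0.25, 9: 0.25}
w = l1_normalize({1: 1.0, 7: 3.0})           # -> {1: 0.25, 7: 0.75}
s = bow_similarity(v, w)
```

For these two vectors the direct sum of absolute differences is 0.25 + 0.25 + 0.25 + 0.75 = 1.5, which matches the fast form and gives a similarity of 0.25.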

For CNN-based appearance similarity, the output of the CNN, that is, the output of the activation function, can also be normalized and substituted for the appearance similarity $s$.

Finally, the pose-appearance-based loop closure probability $p_{pa}$ is used to detect loop closures. A threshold can be set on this probability to refine the loop closure candidates, after which the consistency checks determine the loop closure.
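The combination and thresholding step can be sketched as follows. The tuple layout of the candidates, the function names, and the threshold value used in the example are hypothetical; a suitable threshold would in practice be read off the precision–recall curve.

```python
def pose_appearance_probability(p_pose, similarity):
    """Joint loop closure probability p_pa = p_p * s."""
    return p_pose * similarity


def select_loop_candidates(candidates, threshold):
    """Keep candidates whose joint probability reaches the threshold.

    candidates: iterable of (frame_id, p_pose, similarity).
    Returns (frame_id, p_pa) pairs sorted by decreasing p_pa.
    """
    scored = [(fid, pose_appearance_probability(p, s))
              for fid, p, s in candidates]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Frame 12 looks similar but is unlikely given the pose; frame 40 is the reverse.
candidates = [(12, 0.05, 0.95), (37, 0.90, 0.80), (40, 0.85, 0.10)]
selected = select_loop_candidates(candidates, threshold=0.5)
```

In this toy example only frame 37 survives, because a candidate must score well on both the pose and the appearance term for the product to stay high.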

SC check and the GC check

Similar to the appearance-based LCD method, the SC check and the GC check are introduced in the proposed method. These two checks are optional and commonly used for LCD methods.⁴⁸

The SC check is designed to work very efficiently but with limited performance. A loop candidate is kept only if two or more sequential images are detected as the same loop closure candidate. The SC check is an optional module for most LCD methods, as well as for the proposed method. It only works for continuous SLAM without kidnapping. Considering that the proposed method relies on continuous pose estimation and does not deal with the kidnapping case, the SC check is recommended.
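A simple way to realize such a check is sketched below. It assumes that each query frame reports at most one best-matching map frame, and that two matches count as "the same" loop when their map-frame indices differ by at most `max_gap`; both the data layout and the `max_gap` parameter are illustrative assumptions rather than details given in this article.

```python
def sequential_consistency(matches, min_length=2, max_gap=3):
    """Flag query frames whose loop match is supported by preceding frames.

    matches: best-matching map frame id (or None) for consecutive query frames.
    Returns one boolean per query frame.
    """
    accepted = []
    run = 0        # length of the current run of mutually consistent matches
    prev = None    # match of the previous query frame
    for m in matches:
        if m is None:
            run = 0
        elif prev is not None and abs(m - prev) <= max_gap:
            run += 1
        else:
            run = 1
        prev = m
        accepted.append(m is not None and run >= min_length)
    return accepted


flags = sequential_consistency([None, 10, 11, None, 50, 12, 13])
```

In this example the isolated match to frame 50 is rejected, while the second frame of each consistent pair (10, 11) and (12, 13) is accepted.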

The GC check is mostly considered a preprocessing step of the global optimization after LCD, rather than part of the loop detection. During the GC check, the fundamental matrix of the image pair is checked based on their keypoint matches. The loop closure candidate is confirmed only if there are enough inliers for the fundamental matrix. Then the keypoint matches are employed as the constraints of the global optimization. For this reason, regardless of whether the GC check is part of the LCD method, it is always necessary for LCD tasks. The GC check performs remarkably well at filtering out false loops, albeit at a high computational cost.

Experimental evaluation

The main focus of this work is to improve the efficiency and accuracy of the LCD process using the complementarity of pose and appearance. Our experiments are designed to show the capabilities of our method and to support our key claims, which are: (i) The introduction of pose-based information into the framework will accelerate the LCD process. (ii) The proposed LCD pipeline is compatible with many other methods and easy to use in real-time SLAM applications.

Methodology

Datasets

Two publicly available datasets are used in the experiments, namely, the KITTI dataset⁴⁹ and the TUM dataset.⁵⁰ The KITTI dataset is a dataset of outdoor driving, and the TUM dataset is captured in an indoor environment with a handheld RGB-D camera. Note that only some image sequences of the two datasets consist of loop closures: KITTI00, KITTI02, KITTI05, TUMfreiburg2desk, and TUMfreiburg3longoffice. These image sequences are employed for the LCD experiments. All loop image pairs of the ground truth are manually labeled; some image pairs are illustrated in Figure 3.

Figure 3.

Some image pairs of the loop closure ground truth, where the ground truth is manually labeled. (a) KITTI dataset and (b) TUM dataset.

Metrics

The performance of the proposed method is evaluated with the precision and recall metrics. The precision is defined as the ratio of true positive LCDs to the total number of detections. The recall is defined as the ratio of true positive LCDs to the number of ground truth loop closures. An LCD result is considered to be truly positive only if the loop image pair is listed in the ground truth; otherwise, this LCD result is considered to be false. The threshold of the pose-appearance-based probability is continuously tuned during the experiments, and the precision–recall curves are drawn and analyzed.

For the precision–recall curve, the area under the curve (AUC) represents the overall performance of the method. So the AUC is calculated for each compared LCD method. The average time consumption per frame is also recorded to evaluate the real-time performance.
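The metrics can be computed as in the following sketch. The representation of a loop as a (query frame, map frame) pair, the convention that precision is 1 when nothing is detected, and the trapezoidal rule for the AUC are assumptions made for illustration; the data in the example are made up.

```python
def precision_recall(detections, ground_truth):
    """Precision and recall for a set of detected loop image pairs."""
    detections, ground_truth = set(detections), set(ground_truth)
    tp = len(detections & ground_truth)
    precision = tp / len(detections) if detections else 1.0  # assumed convention
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def pr_curve(scored_pairs, ground_truth, thresholds):
    """Sweep the probability threshold; returns (recall, precision) points."""
    points = []
    for t in thresholds:
        detected = [pair for pair, score in scored_pairs.items() if score >= t]
        precision, recall = precision_recall(detected, ground_truth)
        points.append((recall, precision))
    return points


def auc(points):
    """Area under a precision-recall curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((r1 - r0) * (p0 + p1) / 2.0
               for (r0, p0), (r1, p1) in zip(pts, pts[1:]))


scored = {(10, 2): 0.9, (11, 3): 0.8, (20, 7): 0.4}
truth = {(10, 2), (11, 3), (30, 5)}
curve = pr_curve(scored, truth, thresholds=[0.5, 0.3])
```

At a threshold of 0.5 both detections are correct, while lowering it to 0.3 admits one false positive without recovering the missed loop, so precision drops and recall stays the same.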

Program and settings

We implement the proposed LCD method in the framework of ORB-SLAM2,³⁹ where the LCD process is replaced by the proposed method. The proposed method and all methods used for comparison run together in the program, and the trajectory optimization of the loop closure is disabled in the program to maintain consistency within the experimental conditions among all methods. The program runs on a computer with a 2.4 GHz quad-core processor. The ORB feature is used in all experiments, and the number of features per image is set to be 1000 in consideration of the computational capability of the computer. The program runs at 30 FPS with these settings. The code is open source and can be downloaded from https://github.com/nubot-nudt/PALoop

Efficiency

The first experiment is designed to show the efficiency of our approach and to support the claim that the introduction of pose-based information will accelerate the LCD process. The time consumption of each of the four components is recorded during the LCD, and the average time consumption per loop image pair is illustrated in Figure 1(b).

As shown in Figure 1(b), the time consumption of the GC check is excessively high. Thus, the total time consumption increases linearly with the number of loop candidates when detecting loop closures, as shown in Figure 1(a). Note that most LCD methods do not include the GC check module themselves. However, the GC check is necessary for LCD in real-time SLAM, because it provides the observation constraints for the global optimization after the loop closure. So there are as many GC checks as loop candidates. That is why the time consumption of GC checks is taken into consideration in this real-time LCD experiment.

With the introduction of pose-based information into the framework, the loop candidates are efficiently refined, and the time consumption of the proposed method remains much lower.

The pose-based loop candidates together with the appearance-based loop candidates for one current frame are illustrated in Figure 4. Both the pose-based and the appearance-based modules detect many loop candidates. But the two kinds of candidates are distributed differently and in a complementary way: the pose-based candidates congregate around the current pose, while the appearance-based candidates are dispersed over the whole map. The two kinds of candidates overlap around the ground truth. With the help of the fast pose-based loop closure candidate filtering, only part of the map needs to undergo the loop closure consistency check. Thus, the proposed method, which combines the pose-based and the appearance-based information, is efficient.

Figure 4.

Loop closure candidates of one demonstrative query frame using the pose-based probability module (PP) and the appearance-based similarity module (AP). The pose-based candidates and the appearance-based candidates are distributed differently and in a complementary way. (a) Query frame, (b) correct frame of GT, (c) wrong frame of BoW, and (d) wrong frame of PP. GT: ground truth; BoW: bag-of-words.

The pose covariance matrix is calculated along with the odometry. We visualize the calculation results of the pose variance to validate the algorithm. The 95% translation confidence interval is drawn in Figure 5.

Figure 5.

The 95% confidence intervals of the pose illustrated as cubes along the trajectory.

The pose variance increases as the odometry progresses, as shown in Figure 5. This means that more loop candidates are kept by the pose-based module, which increases the time cost. However, the time cost is still low compared with that of the appearance-based module, as shown in Figure 1. Note that if the robot estimates the pose variance with low precision in some challenging environments, the 95% threshold can be raised to obtain a larger confidence interval. Be aware that a higher threshold increases the time cost; the extreme case is an infinite confidence interval, which reduces the method to traditional appearance-based global LCD. This case mostly occurs when localization fails, which leads to an unbounded localization error. On the other hand, a smaller threshold may make the confidence interval too small to detect any loop closures. All the experiments in this work use 95% as the threshold.
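The effect of the threshold can be illustrated with a small Python sketch. It assumes, for simplicity, independent zero-mean Gaussian errors on each translation axis (a diagonal covariance), which gives an axis-aligned box like the cubes in Figure 5; the covariance in the article is derived from the odometry optimization and need not be diagonal. The function names and the numeric values are hypothetical.

```python
import math
from statistics import NormalDist


def half_widths(variances, level=0.95):
    """Half-width of the two-sided `level` interval on each axis."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return [z * math.sqrt(v) for v in variances]


def inside_interval(current_xyz, candidate_xyz, variances, level=0.95):
    """True if the candidate lies inside the per-axis interval around the current pose."""
    widths = half_widths(variances, level)
    return all(abs(c - q) <= w
               for c, q, w in zip(candidate_xyz, current_xyz, widths))


# Made-up translation variances along x, y, z.
variances = [0.04, 0.04, 0.01]
current = (0.0, 0.0, 0.0)
candidate = (0.5, 0.0, 0.0)

print(inside_interval(current, candidate, variances, level=0.95))
print(inside_interval(current, candidate, variances, level=0.99))
```

Raising `level` widens the box, so more map frames survive the gate and more appearance comparisons are needed. As `level` approaches 1 the half-widths grow without bound, which corresponds to the appearance-only global search described above.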

Accuracy

The next set of experiments is designed to show the accuracy of our approach and to support the claim that the proposed LCD pipeline is compatible with many other methods and easy to use in real-time SLAM applications. We therefore tested our approach on two public datasets, running it in real time together with a state-of-the-art SLAM method, ORB-SLAM.³⁹

The LCD methods used for comparison are employed as the appearance-based module in the proposed framework: FAB-MAP,⁵¹ DBOW,⁹ CNN-LCD,²⁸ NetVLAD,²⁴ AlexNet,²⁵ CALC,²⁵ HOG,²⁶ and CoHOG.²⁷ The LCD experiments are conducted for the three pose-appearance-based examples and compared with the original methods. The experimental results are listed in Table 1.

Table 1.

The area under curve (AUC) and the average time consumption performance of different LCD methods with/without the proposed LCD pipeline.

Method	KITTI00 AUC	KITTI00 TpF (ms)	KITTI02 AUC	KITTI02 TpF (ms)	KITTI05 AUC	KITTI05 TpF (ms)	TUMfreiburg2desk AUC	TUMfreiburg2desk TpF (ms)	TUMfreiburg3longoffice AUC	TUMfreiburg3longoffice TpF (ms)	Mean AUC	Mean TpF (ms)
DBOW	0.60	112.67	0.39	405.66	0.53	231.20	0.17	3.05	0.20	4.86	0.38	151.49
DBOW+SC	0.66	103.98	0.43	375.18	0.61	213.44	0.18	2.52	0.24	4.03	0.43	139.83
DBOW+PP	0.71	4.05	0.63	10.77	0.63	6.24	0.43	0.20	0.60	0.71	0.60	4.40
DBOW+SC+PP	0.71	3.58	0.57	9.48	0.65	5.68	0.39	0.18	0.61	0.58	0.59	3.90
FAB-MAP	0.26	30.76	0.06	31.31	0.20	25.72	0.05	13.28	0.04	13.63	0.12	22.94
FAB-MAP+SC	0.27	29.87	0.04	30.38	0.21	24.50	0.04	10.09	0.51	10.03	0.22	20.97
FAB-MAP+PP	0.56	1.11	0.21	0.77	0.51	0.64	0.50	0.41	0.49	0.87	0.46	0.76
FAB-MAP+SC+PP	0.59	0.99	0.45	0.71	0.54	0.57	0.45	0.34	0.68	0.58	0.54	0.64
CNN	0.00	16.29	0.01	16.11	0.02	18.05	0.09	57.93	0.09	63.33	0.04	34.34
CNN+SC	0.00	15.72	0.01	15.52	0.08	17.29	0.40	57.06	0.06	58.76	0.11	32.87
CNN+PP	0.01	0.53	0.01	0.40	0.03	0.37	0.52	2.22	0.45	5.82	0.20	1.87
CNN+SC+PP	0.01	0.49	0.01	0.37	0.10	0.35	0.53	2.02	0.53	5.26	0.24	1.70
VLAD	0.78	1007.79	0.72	1010.93	0.68	1006.51	0.54	1012.97	0.42	996.90	0.63	1007.02
VLAD+SC	0.75	983.27	0.67	987.24	0.68	983.53	0.67	982.66	0.41	956.08	0.64	978.56
VLAD+PP	0.78	38.96	0.72	28.66	0.74	30.93	0.62	57.30	0.56	146.82	0.68	60.53
VLAD+SC+PP	0.75	36.54	0.67	26.60	0.76	29.70	0.73	52.84	0.53	137.16	0.69	56.57
AlexNet	0.71	693.70	0.72	686.02	0.66	682.31	0.15	679.13	0.27	710.75	0.50	690.38
AlexNet+SC	0.68	688.01	0.66	679.93	0.64	676.91	0.19	670.95	0.24	698.51	0.48	682.86
AlexNet+PP	0.72	23.66	0.72	17.27	0.67	16.08	0.48	26.31	0.59	71.19	0.64	30.90
AlexNet+SC+PP	0.70	22.56	0.67	16.24	0.66	15.67	0.52	24.57	0.58	67.55	0.62	29.32
CALC	0.69	50.24	0.64	51.74	0.62	49.34	0.18	46.42	0.23	45.62	0.47	48.67
CALC+SC	0.65	49.95	0.58	51.40	0.60	49.05	0.13	46.14	0.28	45.34	0.45	48.38
CALC+PP	0.71	1.67	0.64	1.30	0.65	1.11	0.53	1.78	0.57	4.37	0.62	2.05
CALC+SC+PP	0.67	1.60	0.59	1.23	0.63	1.08	0.50	1.68	0.61	4.20	0.60	1.96
CoHOG	0.38	3680.42	0.52	4284.12	0.50	2349.37	0.59	326.02	0.51	288.77	0.50	2185.74
CoHOG+SC	0.33	3678.99	0.50	4282.21	0.49	2348.25	0.58	325.32	0.49	287.73	0.48	2184.50
CoHOG+PP	0.45	119.87	0.53	103.88	0.67	49.81	0.70	12.44	0.75	27.37	0.62	62.67
CoHOG+SC+PP	0.41	115.13	0.50	98.63	0.67	48.86	0.69	11.81	0.75	26.39	0.61	60.16
HOG	0.17	93.25	0.41	107.49	0.39	59.37	0.10	13.56	0.04	12.20	0.22	57.17
HOG+SC	0.10	93.03	0.41	107.30	0.35	59.24	0.06	13.53	0.24	12.16	0.23	57.05
HOG+PP	0.40	3.03	0.45	2.62	0.44	1.27	0.40	0.51	0.32	1.14	0.40	1.72
HOG+SC+PP	0.31	2.91	0.43	2.48	0.40	1.25	0.41	0.48	0.52	1.10	0.41	1.64

TpF: time consumption per frame; LCD: loop closure detection; HOG: histogram-of-oriented-gradients; SC: sequential consistency check; PP: pose-based probability. The bold values indicate the outstanding results.

Selected precision and recall curves are shown in Figures 6 and 7. The separate precision curves and recall curves are provided alongside the precision–recall curves so that the precision and the recall performance can each be analyzed in detail.

Figure 6.

Precision, recall, and precision–recall curves of different methods on the KITTI00 image sequence (PP indicates the pose-based probability. SC indicates the sequential consistency check).

Figure 7.

Precision, recall, and precision–recall curves of the CNN-based method on the TUMfreiburg2desk (the top row) and the TUMfreiburg3longoffice (the bottom row) image sequences.
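For readers who want to reproduce an AUC figure of the kind reported in Table 1, the following Python sketch integrates a precision–recall curve with the trapezoidal rule. The integration rule and the function name `pr_auc` are assumptions made for illustration; the exact procedure used for Table 1 is the one given in the Metrics section. The sample points are made up.

```python
def pr_auc(recalls, precisions):
    """Area under a precision-recall curve by the trapezoidal rule.

    The points may be given in any order; they are sorted by recall first.
    """
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Made-up curve: perfect precision up to 50% recall, then a linear drop to 0.5.
print(pr_auc([0.0, 0.5, 1.0], [1.0, 1.0, 0.5]))
```

A curve that keeps high precision out to high recall encloses a larger area, which is why AUC summarizes both quantities in one number.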

Note that the SC check and GC check modules are optional and are widely used in many other methods. These two modules are not contributions of this work, so they are not discussed in this experiment. Moreover, the GC module is strict but time-consuming. Introducing GC yields 100% precision for most methods at the cost of a heavy computational burden, as already illustrated in Figure 3.

As shown in Table 1, the introduction of the pose-based probability improves the LCD performance. The mean AUC (i.e., precision and recall performance) increases for every tested method. The time consumption per frame (TpF) decreases by more than an order of magnitude (roughly 17 to 35 times in the mean column of Table 1), showing good real-time performance. Note that this TpF is the average time over the whole SLAM run, and the time consumption grows as the map grows, as shown in Figure 1. On the other hand, the introduction of the SC check can also improve the LCD performance, but the improvement is limited compared with that of the pose-based probability.
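The size of the speed-up can be checked directly from the mean TpF column of Table 1. The short Python sketch below divides the mean TpF of each original method by that of its +PP variant; the numbers are copied from the table, while the dictionary layout and the function name `speedups` are only illustrative.

```python
# Mean TpF in ms from Table 1: (original method, method + PP).
mean_tpf_ms = {
    "DBOW": (151.49, 4.40),
    "FAB-MAP": (22.94, 0.76),
    "CNN": (34.34, 1.87),
    "VLAD": (1007.02, 60.53),
    "AlexNet": (690.38, 30.90),
    "CALC": (48.67, 2.05),
    "CoHOG": (2185.74, 62.67),
    "HOG": (57.17, 1.72),
}


def speedups(table):
    """Ratio of original TpF to +PP TpF for every method."""
    return {name: base / with_pp for name, (base, with_pp) in table.items()}


for name, factor in sorted(speedups(mean_tpf_ms).items(), key=lambda kv: kv[1]):
    print(f"{name}: {factor:.1f}x")
```

The smallest ratio in this column belongs to VLAD and the largest to CoHOG, so every method gains more than a factor of ten on average.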

The precision curves in Figures 6 and 7 show that the pose-based probability and the SC check can improve the precision performance of the appearance-based method. However, the recall performance, as shown in the recall curves in Figures 6 and 7, cannot be better than the appearance-based results, because any component introduced after the appearance-based component can only filter out candidates and cannot add any. Overall, the large increase in precision and the small decrease in recall result in an improvement of the precision–recall performance.
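The argument that a downstream filter can raise precision but never recall can be made concrete with a toy Python example. The frame indices are invented, and `precision_recall` is a hypothetical helper; returning 1.0 for an empty set is a convention chosen here, not something taken from the article.

```python
def precision_recall(candidates, truth):
    """Precision and recall of a candidate set against the true loop frames."""
    cand, true = set(candidates), set(truth)
    hits = len(cand & true)
    precision = hits / len(cand) if cand else 1.0  # convention for empty sets
    recall = hits / len(true) if true else 1.0
    return precision, recall


# Invented indices: true loop frames, appearance candidates, and the
# frames that a pose-based gate would keep.
truth = {7, 12, 13}
appearance_candidates = {3, 7, 12, 40, 55}
pose_kept = set(range(5, 16))

before = precision_recall(appearance_candidates, truth)
after = precision_recall(appearance_candidates & pose_kept, truth)
print(before, after)
```

Because the filtered set is a subset of the appearance candidates, the number of true positives cannot grow, so recall after filtering is at most the recall before it. Precision rises here because the removed candidates are all false positives.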

In summary, our evaluation suggests that our method provides competitive efficiency and accuracy in LCD. The introduction of pose-based information into the framework greatly accelerates the LCD process. Meanwhile, the proposed method achieves good precision–recall performance in real-time SLAM applications.

Conclusions

In this article, we presented a novel approach to detect loop closures for SLAM. Our approach exploits both pose-based and appearance-based information in a probabilistic manner, inspired by the complementarity between these two kinds of information. This allows the proposed method to achieve higher efficiency and accuracy in LCD. We implemented and evaluated our approach on different datasets and provided comparisons to other existing techniques. The experimental evaluation suggests that our method provides competitive efficiency and accuracy in LCD.

However, certain aspects of the proposed method need to be improved. First, the introduction of the pose information limits the application of the method: it cannot deal with the kidnapping of the robot. Our plan is to introduce a more powerful but slower appearance method, such as a deep learning-based method, to re-localize in the global map. Second, online incremental updating of the appearance vocabularies or other appearance models can be adopted to enhance the adaptability of long-term SLAM.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Fund for key Laboratory of Space Flight Dynamics Technology (2022-JYAPAF-F1028), Major Project of Natural Science Foundation of Hunan Province (No. 2021JC0004), and the National Science Foundation of China (U1913202, U22A2059, and 62203460).

ORCID iDs

Qinghua Yu

Huimin Lu

References

1. Tsintotas, Bampis, Gasteratos. The revisiting problem in simultaneous localization and mapping. Cham: Springer International Publishing, 2022, pp. 1–33. DOI: 10.1007/978-3-031-09396-8_1.

2. Lowry, Sünderhauf, Newman, et al. Visual place recognition: a survey. IEEE Trans Robot 2016; 32(1): 1–19.

3. Tsintotas, Giannis, Bampis, et al. Appearance-based loop closure detection with scale-restrictive visual features. In: Tzovaras, Giakoumis, Vincze, et al. (eds) Computer vision systems. Cham: Springer International Publishing, pp. 75–87. ISBN 978-3-030-34995-0.

4. Nister, Stewenius. Scalable recognition with a vocabulary tree. In: IEEE Computer Society conference on computer vision and pattern recognition, New York, NY, USA, 17–22 June 2006, pp. 2161–2168. Los Alamitos, CA, USA: IEEE Computer Society. DOI: 10.1109/CVPR.2006.264.

5. Hartigan, Wong. Algorithm AS 136: a K-means clustering algorithm. J R Stat Soc C: Appl Stat 1979; 28(1): 100–108.

6. Angeli, Filliat, Doncieux, et al. Fast and incremental method for loop-closure detection using bags of visual words. IEEE Trans Robot 2008; 24(5): 1027–1037.

7. Cummins, Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. Int J Robot Res 2010; 30(9): 1100–1123.

8. Schöps JET, Cremers. LSD-SLAM: large-scale direct monocular SLAM. In: European conference on computer vision, 2014, pp. 834–849. Cham: Springer International Publishing. DOI: 10.1007/978-3-319-10605-2_54.

9. Gálvez-López, Tardós. Real-time loop detection with bags of binary words. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, San Francisco, CA, USA, 25–30 September 2011, pp. 51–58. IEEE. DOI: 10.1109/IROS.2011.6094885.

10. Labbé, Michaud. Appearance-based loop closure detection for online large-scale and long-term operation. IEEE Trans Robot 2013; 29(3): 734–745.

11. Chow, Liu. Approximating discrete probability distributions with dependence trees. IEEE Trans Inform Theory 1968; 14(3): 462–467.

12. Mur-Artal, Tardós. Fast relocalisation and loop closing in keyframe-based SLAM. In: 2014 IEEE international conference on robotics and automation, Hong Kong, China, 31 May 2014–7 June 2014, pp. 846–853. Springer International Publishing. DOI: 10.1109/ICRA.2014.6906953.

13. Bampis, Amanatiadis, Gasteratos. Fast loop-closure detection using visual-word-vectors from image sequences. Int J Robot Res 2018; 37(1): 62–82.

14. Garcia-Fidalgo, Ortiz. iBoW-LCD: an appearance-based loop-closure detection approach using incremental bags of binary words. IEEE Robot Autom Lett 2018; 3(4): 3051–3057.

15. Tsintotas, Bampis, Gasteratos. Modest-vocabulary loop-closure detection with incremental bag of tracked words. Robot Auton Syst 2021; 141: 103782.

16. Khan, Wollherr. Ibuild: incremental bag of binary words for appearance based loop closure detection. In: 2015 IEEE international conference on robotics and automation (ICRA), Seattle, WA, USA, 26–30 May 2015, pp. 5441–5447. IEEE. DOI: 10.1109/ICRA.2015.7139959.

17. Tsintotas, Bampis, Gasteratos. Probabilistic appearance-based place recognition through bag of tracked words. IEEE Robot Autom Lett 2019; 4(2): 1737–1744.

18. Babenko, Slesarev, Chigorin, et al. Neural codes for image retrieval. In: Fleet, Pajdla, Schiele, Tuytelaars (eds) European conference on computer vision. Zurich, Switzerland, 6–12 September 2014. Springer International Publishing. ISBN 978-3-319-10590-1, 2014. pp. 584–599. DOI: 10.1007/978-3-319-10590-1_38.

19. Wan, Wang, Hoi SCH, et al. Deep learning for content-based image retrieval: a comprehensive study. In: Proceedings of the 22nd ACM international conference on multimedia, Orlando, FL, USA, 3–7 November 2014, pp. 157–166. New York, NY, USA: ACM. DOI: 10.1145/2647868.2654948.

20. Hou, Zhang, Zhou. Convolutional neural network-based image representation for visual loop closure detection. In: IEEE international conference on information and automation, Vol. 15, Lijiang, China, 8–10 August 2015, pp. 2238–2245. IEEE. DOI: 10.1109/ICInfA.2015.7279659.

21. Jégou, Perronnin, Douze, et al. Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell 2011; 34(9): 1704–1716.

22. Jaakkola, Haussler. Exploiting generative models in discriminative classifiers. In: Advances in neural information processing systems, Vol. 11, 1998, pp. 487–493. Cambridge, MA, USA: MIT Press. DOI: 10.5555/340534.340715.

23. Perronnin, Dance. Fisher kernels on visual vocabularies for image categorization. In: IEEE conference on computer vision and pattern recognition, Minneapolis, MN, USA, 17–22 June 2007, pp. 1–8. IEEE. DOI: 10.1109/CVPR.2007.383266.

24. Cieslewski, Choudhary, Scaramuzza. Data-efficient decentralized visual slam. In: 2018 IEEE international conference on robotics and automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018, pp. 2466–2473. IEEE.

25. Merrill, Huang. Lightweight unsupervised deep loop closure. In: Proceedings of robotics: science and systems (RSS). Pittsburgh, PA, 26–30 June 2018.

26. Dalal, Triggs. Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society conference on computer vision and pattern recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005, pp. 886–893. IEEE.

27. Zaffar, Ehsan, Milford, et al. CoHOG: a light-weight, compute-efficient, and training-free visual place recognition technique for changing environments. IEEE Robot Autom Lett 2020; 5(2): 1835–1842.

28. Zhang, Zhu. Loop closure detection for visual SLAM systems using convolutional neural network. In: 2017 23rd international conference on automation and computing, Huddersfield, UK, 7–8 September 2017, pp. 1–6. IEEE. DOI: 10.23919/IConAC.2017.8082072.

29. Zhu, Wei, et al. Fast and incremental loop closure detection with deep features and proximity graphs. J Field Robot 2022; 39(4): 473–493.

30. Milford, Wyeth. SeqSLAM: visual route-based navigation for sunny summer days and stormy winter nights. In: 2012 IEEE international conference on robotics and automation, Saint Paul, MN, USA, 14–18 May 2012, pp. 1643–1649. IEEE. DOI: 10.1109/ICRA.2012.6224623.

31. Tsintotas, Bampis, Gasteratos. DOSeqSLAM: dynamic on-line sequence based loop closure detection algorithm for SLAM. In: 2018 IEEE international conference on imaging systems and techniques (IST), Krakow, Poland, 16–18 October 2018, pp. 1–6. IEEE. DOI: 10.1109/IST.2018.8577113.

32. Tsintotas, Bampis, Gasteratos. Tracking-DOSeqSLAM: a dynamic sequence-based visual place recognition paradigm. IET Comput Vis 2021; 15(4): 258–273.

33. Tsintotas, Bampis, Gasteratos. Assigning visual words to places for loop closure detection. In: 2018 IEEE international conference on robotics and automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018, pp. 5979–5985. IEEE. DOI: 10.1109/ICRA.2018.8461146.

34. Tsintotas, Bampis, Rallis, et al. SeqSLAM with bag of visual words for appearance based loop closure detection. In: Aspragathos, Koustoumpardis, Moulianitis (eds) Advances in service and industrial robotics. Cham: Springer International Publishing, pp. 580–587. ISBN 978-3-030-00232-9.

35. Neira, Tardos. Data association in stochastic mapping using the joint compatibility test. IEEE Trans Robot Autom 2001; 17(6): 890–897.

36. Zhang, et al. Improved loop closure detection algorithm for VSLAM with spatial coordinate index. In: International Conference on Smart Materials and Nanotechnology in Engineering (SMNE), Sanya, China, 1–3 March 2016, pp. 225–230. SMNE. DOI: 10.12783/dtmse/smne2016/10591.

37. Forster, Zhang, Gassner, et al. SVO: semidirect visual odometry for monocular and multicamera systems. IEEE Trans Robot 2017; 33(2): 249–265.

38. Mur-Artal, Montiel JMM, Tardós. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans Robot 2015; 31(5): 1147–1163.

39. Mur-Artal, Tardós. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans Robot 2017; 33(5): 1255–1262.

40. Engel, Koltun, Cremers. Direct sparse odometry. IEEE Trans Pattern Anal Mach Intell 2018; 40(3): 611–625.

41. Engel, Schöps, Cremers. LSD-SLAM: large-scale direct monocular SLAM. In: Fleet, Pajdla, Schiele, et al. (eds) Computer vision – ECCV 2014, Zurich, Switzerland, 6–12 September 2014, pp. 834–849. Cham: Springer International Publishing. ISBN 978-3-319-10605-2. DOI: 10.1007/978-3-319-10605-2_54.

42. Engel, Sturm, Cremers. Semi-dense visual odometry for a monocular camera. In: 2013 IEEE international conference on computer vision, Sydney, NSW, Australia, 1–8 December 2013, pp. 1449–1456. IEEE. DOI: 10.1109/ICCV.2013.183.

43. Xiao, et al. Hybrid-residual-based RGBD visual odometry. IEEE Access 2018; 6: 28540–28551.

44. Triggs, McLauchlan, Hartley, et al. Bundle adjustment – a modern synthesis. In: Triggs, Zisserman, Szeliski (eds) International workshop on vision algorithms. Berlin, Heidelberg: Springer, pp. 298–372. DOI: 10.1007/3-540-44480-7_21.

45. Barfoot, Furgale. Associating uncertainty with three-dimensional poses for use in estimation problems. IEEE Trans Robot 2014; 30(3): 679–693.

46. Ila, Polok, Solony, et al. Fast covariance recovery in incremental nonlinear least square solvers. In: 2015 IEEE international conference on robotics and automation, Seattle, WA, USA, 26–30 May 2015, pp. 4636–4643. IEEE. DOI: 10.1109/ICRA.2015.7139841.

47. Ila, Polok, Solony, et al. Fast incremental bundle adjustment with covariance recovery. In: 2017 international conference on 3D vision, Qingdao, China, 10–12 October 2017, pp. 175–184. IEEE. DOI: 10.1109/3DV.2017.00029.

48. Tsintotas, Bampis, Gasteratos. Visual place recognition for simultaneous localization and mapping. In: Romil Rawat, Purvee Bhardwaj, Upinder Kaur, Shrikant Telang, Mukesh Chouhan, K. Sakthidasan Sankaran (eds) Autonomous Vehicles. Volume 2: Smart Vehicles for communication. Wiley, 2022, pp. 47–79. ISBN 978-1-394-15261-2. DOI: 10.1002/9781394152636.ch4.

49. Geiger, Lenz, Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE conference on computer vision and pattern recognition, Providence, RI, USA, 16–21 June 2012, pp. 3354–3361. IEEE. DOI: 10.1109/CVPR.2012.6248074.

50. Sturm, Engelhard, Endres, et al. A benchmark for the evaluation of RGB-D SLAM systems. In: 2012 international conference on intelligent robot systems, Vilamoura-Algarve, Portugal, 7–12 October 2012, pp. 573–580. IEEE. DOI: 10.1109/IROS.2012.6385773.

51. Glover, Maddern, Warren, et al. OpenFABMAP: an open source toolbox for appearance-based loop closure detection. In: 2012 IEEE international conference on robotics and automation, Saint Paul, MN, USA, 14–18 May 2012, pp. 4730–4735. IEEE. DOI: 10.1109/ICRA.2012.6224843.

Fast loop closure detection using probabilistic integration of pose and appearance similarity
