A lifelong mapping framework with multi-view projection in a changing environment

Abstract

Simultaneous Localization and Mapping (SLAM) systems typically rely on a prior map constructed during an initial deployment. In real-world environments, however, structural and semantic changes gradually invalidate this map, leading to degraded localization accuracy and, in severe cases, localization failure. This limitation poses a major obstacle to the long-term deployment of mobile robots in dynamic environments. This paper proposes a lifelong mapping framework with multi-view projection fusion (LLMF) that enables efficient map maintenance while preserving a consistent global coordinate system. The framework introduces two key design components. First, a multi-view point cloud projection alignment strategy based on Bird’s-Eye View (BEV) and frontal view (FV) projections is employed to align point cloud maps acquired at different times without re-labeling previously defined operational points. Second, an image-based change detection and map update mechanism is developed, transforming computationally expensive 3D point cloud comparisons into efficient 2D image processing operations. The proposed framework is evaluated through qualitative experiments on the open-source MulRan dataset and quantitative long-term experiments conducted over more than nine months in a real farm environment. Experimental results demonstrate that LLMF maintains localization accuracy while significantly reducing the computational cost of change detection, lowering processing time from several hours to a few minutes. These results indicate that the proposed framework provides a practical and scalable engineering solution for long-term robot operation in changing environments.

Keywords

changing environment; lifelong mapping; multiple views projection fusion; point cloud change detection

1. Introduction

Simultaneous Localization and Mapping (SLAM) is a core technology for autonomous mobile robots, allowing them to localize and navigate in unknown environments while incrementally constructing a spatial representation of their surroundings. SLAM techniques have been widely adopted in robotic autonomy and navigation systems, including multi-robot coordination and inspection tasks.^1–3 In recent years, LiDAR–inertial odometry systems such as Fast-LIO⁴ and Faster-LIO⁵ have significantly improved real-time pose estimation accuracy and robustness, while publicly available datasets such as FusionPortable facilitate the development and evaluation of generalized SLAM frameworks.⁶

With the continuous development of optimization and estimation methods, including metaheuristic algorithms such as the spiral dynamics algorithm and the harmony search family,^7–9 modern SLAM systems have achieved increasingly accurate pose estimation and 3D mapping capabilities. These advances support a wide range of engineering applications such as inspection, monitoring, and autonomous operation.

In practical deployments, a common industrial paradigm is to treat the 3D map constructed during an initial operation as a fixed reference for subsequent tasks. Recent research has explored robust map alignment and lifelong mapping strategies to maintain localization accuracy over extended deployments, including open-set object map alignment approaches,¹⁰ lifelong LiDAR–IMU SLAM frameworks with autonomous map updating,¹¹ and multi-session LiDAR mapping systems designed for long-term operation.¹²

This paradigm is particularly prevalent in Light Detection and Ranging (LiDAR) robotic systems, where three-dimensional sensing enables accurate perception and mapping of the surrounding environment. Public datasets such as SemanticKITTI have facilitated the development and evaluation of LiDAR-based perception and mapping methods.¹³ In such systems, localization is commonly performed using scan-to-map matching, where real-time LiDAR scans are aligned with a previously constructed global map to estimate the robot pose in three-dimensional space. LiDAR-based perception and mapping techniques have been widely applied in infrastructure inspection, object detection, and scene understanding tasks,^14–16 while SLAM and odometry frameworks such as tracked robot odometry, active SLAM strategies, and gravity-constrained SLAM systems provide robust localization capabilities for mobile robots.^17–19 While effective in static or slowly changing environments, this approach exhibits fundamental limitations when deployed over long time horizons in real-world settings.

Real environments are inherently dynamic. Object relocation, structural modifications, and seasonal or operational changes progressively invalidate the prior map, leading to increasing discrepancies between the reference map and the actual environment.^20–22 As these discrepancies accumulate, localization accuracy degrades and may ultimately result in localization failure. A straightforward response is to reconstruct the map; however, this solution introduces a critical engineering drawback: the newly reconstructed map is generally inconsistent with the original coordinate system. As a consequence, previously defined semantic annotations, inspection points, and operational locations must be re-labeled, resulting in substantial time overhead and additional economic cost.^23,24 These limitations significantly constrain the practical value of mobile robots in long-term and large-scale deployments.^25–27 Addressing this challenge has given rise to the research problem commonly referred to as lifelong mapping, defined as the continuous maintenance and adaptive updating of maps over extended operational periods in dynamic environments.

In this work, we propose a lifelong mapping framework with multi-view projection fusion, termed LLMF, aimed at enabling efficient map maintenance while preserving a consistent global coordinate system. Rather than relying on direct and computationally expensive 3D point cloud comparisons, the proposed framework adopts a global-to-local strategy in which point clouds acquired at different times are projected into complementary two-dimensional views. Specifically, Bird’s-Eye View (BEV) and frontal view (FV) projections are jointly exploited to align point cloud maps across different operational periods without altering the original map coordinates. Environmental changes are then identified through image-based change detection, allowing map updates to be performed efficiently while avoiding the re-labeling of existing operational points.

The main contributions of this work are summarized as follows:

A lifelong mapping framework is developed by extending traditional SLAM with a dedicated map maintenance mechanism that preserves global coordinate consistency across long-term deployments.

A multi-view point cloud projection and alignment strategy is introduced, combining BEV and FV projections with image-based change detection to transform computationally intensive 3D operations into efficient 2D image processing.

The proposed framework is validated through qualitative experiments on the open-source MulRan dataset²⁸ and long-term quantitative evaluations conducted over nine months in a real farm environment, comprising 32 operational runs, demonstrating both localization robustness and computational efficiency.

2. Related work

Long-term localization and mapping in changing environments has attracted significant attention from both industry and academia. Existing approaches can be broadly categorized according to their map representation, update strategy, and change detection mechanism.

Early industrial solutions have primarily focused on sub-map management and replacement strategies. For example, Gaussian Robotics proposed a long-term localization and mapping method for 2D environments in which the global map is decomposed into multiple sub-maps; when environmental changes are detected, outdated sub-maps are replaced while attempting to preserve associated information.²⁹ While effective in structured 2D settings, such approaches are difficult to extend to large-scale 3D environments.

In LiDAR-based 3D mapping, Kim et al. introduced Long-term Mapper (LT-Mapper).³⁰ This method first removes highly dynamic objects from the environment,³¹ then uses Scan Context descriptors³² to align trajectories across multiple sessions, and finally performs point-wise change detection through kd-tree-based set difference operations.³³

Baidu Apollo proposed a joint vehicle localization framework³⁴ that combines LiDAR–inertial odometry with global matching and detects environmental changes based on grid-level statistics such as point density and height variance. Walcott et al. developed Dynamic Pose Graph SLAM (DPG-SLAM),²⁶ incorporating environmental dynamics directly into the pose graph formulation by selectively removing inactive scans and adding new observations. Other studies have explored point cloud registration and change detection by exploiting density variations in urban scenes³⁵ or by leveraging prior Building Information Modeling (BIM) data to align and update multi-session maps.²³ Lifelong Localization (LiLoc)³⁶ retains factor graph information from the prior map and distributes historical constraints during subsequent operations to support lifelong localization.

Compared with existing 3D lifelong localization and mapping methods, the proposed LLMF differs in its core design in the following respects. (1) LT-Mapper combines LiDAR keyframe matching with kd-tree-based 3D point cloud differencing. Its change detection therefore depends entirely on accurate localization and matching, and accumulated matching errors can produce map ghosting (the ghosting effect shown in Figure 1). Moreover, its local update mechanism can compromise the consistency of the global coordinate system, invalidating the original semantic annotations. (2) BIM-SLAM aligns multi-session maps with the aid of a predefined Building Information Model (BIM). It therefore cannot be applied to dynamic scenes without BIM support, such as farmland and temporary work areas, and its heavy reliance on offline models limits its real-time applicability in engineering scenarios. (3) LiLoc focuses solely on localization consistency, retaining historical constraints in a factor graph to achieve high-precision short-term localization. However, it lacks an active map update mechanism, so it cannot follow the long-term evolution of environmental structures and is ill-suited to continuous robot operation in dynamic scenes. In contrast, LLMF uses a BEV+FV multi-view projection fusion strategy that converts computationally intensive 3D point cloud processing into efficient 2D image processing, removing the main efficiency bottleneck of direct 3D comparison. Because all updates are applied under a fixed global coordinate constraint, LLMF also avoids the coordinate drift and inconsistency caused by local updates.
Throughout the process, neither re-annotation of semantic information nor a predefined model is required, which matches the efficiency, stability, and versatility requirements of practical engineering deployment. Table 1 compares the key characteristics of these methods.

Figure 1.

In the farm environment, the LT-Mapper algorithm exhibited a map ghosting issue (The three red boxes in the figure indicate areas where map ghosting occurred after the map was updated).

Table 1.

Comparison of lifelong localization and mapping methods.

Method	Global coordinate consistency guarantee	Change detection approach
LT-Mapper	Relies on local matching, prone to accumulated drift (ghosting effect)	3D point cloud differencing via kd-tree
BIM-SLAM	Weak adaptability to environments, dependent on pre-defined BIM	Point cloud registration constrained by BIM
LiLoc	Only guarantees localization consistency, no active map update mechanism	No explicit change detection, dependent on relocalization
LLMF	Fixed global coordinates, no alteration to original framework during updates	Deep learning (MTKD) combined with 2D image differencing

Despite these advances, two fundamental challenges remain in lifelong mapping for 3D robotic systems. First, accurately aligning the coordinate systems of point cloud maps acquired at different times remains non-trivial, particularly when large-scale environmental changes occur. Second, efficient and reliable change detection over large-scale 3D point clouds is computationally demanding. Most existing methods adopt a local-to-global update strategy, incrementally updating the global map through frame-level alignment or local region comparison. This strategy is sensitive to local matching errors and may lead to accumulated drift or inconsistencies in the global coordinate system. In practice, such errors can cause shifts in previously labeled operational points, as illustrated by the ghosting phenomenon observed in LT-Mapper updates (Figure 1).

An alternative strategy is to perform global point cloud comparison to avoid local alignment errors; however, direct 3D matching methods, such as Iterative Closest Point (ICP),³⁷ incur prohibitively high computational costs when applied to large-scale maps. Although grid-based partitioning and multi-threading techniques have been proposed to accelerate change detection,^38,39 their computational demands remain incompatible with long-term, resource-constrained robotic operation.

To alleviate these limitations, several studies have explored the use of image-based alignment and change detection techniques. Classical feature-based methods, such as Scale-Invariant Feature Transform (SIFT),⁴⁰ Speeded Up Robust Features (SURF),^41,42 and Oriented FAST and Rotated BRIEF (ORB),⁴³ have demonstrated robust performance in image matching and are widely adopted in visual SLAM systems.^44,45 More recently, deep learning-based change detection models have achieved strong performance in remote sensing applications by identifying differences between temporally separated images, including methods based on multi-teacher knowledge distillation,⁴⁶ bi-temporal adapter networks,⁴⁷ and general feature interaction architectures.⁴⁸

Motivated by these developments, this work leverages image-based alignment and change detection techniques to address the computational and robustness limitations of 3D point cloud processing, extending their application to lifelong map maintenance in large-scale robotic environments.

3. Proposed lifelong mapping framework

This section presents the proposed lifelong mapping framework with multi-view projection fusion (LLMF). The overall architecture of the framework is illustrated in Figure 2. LLMF is designed as a modular system composed of four tightly integrated components: (i) a global localization and mapping module, (ii) a multi-view point cloud projection alignment module, (iii) an image-based point cloud change detection module, and (iv) a map update module. In the proposed framework, the global map constructed during the robot’s initial operation is treated as a persistent reference map for subsequent long-term deployment. During each new operation, the robot performs global localization and mapping using this prior map as a reference. Upon completion of the task, both the historical map and the newly constructed map are jointly processed by the multi-view projection module, where 3D point clouds are projected into complementary two-dimensional representations. These projected images are then passed to the change detection module, which identifies regions corresponding to environmental changes through image comparison. Based on the resulting change maps, the map update module selectively removes obsolete point cloud data and integrates newly observed structures into the prior map. Crucially, this update process preserves the original global coordinate system, ensuring that existing semantic annotations, inspection points, and operational locations remain valid. Through this pipeline, LLMF enables efficient and consistent long-term map maintenance in dynamic environments.

Figure 2.

The system framework of LLMF.

3.1. Global localization and mapping module

Accurate estimation of the robot’s global pose during motion is a prerequisite for reliable localization and task execution in large-scale environments.^49,50 Conventional global localization approaches typically rely on direct frame-to-map matching against a prior global map. In dynamic environments, however, this strategy becomes increasingly fragile, as discrepancies between the current scene and the reference map can degrade accuracy or cause localization failure. In this framework, global localization and mapping are implemented using a hybrid strategy consistent with established LiDAR-based SLAM systems,^19,51 which combines multiple complementary estimation stages. First, scan-to-scan matching between consecutive LiDAR frames is performed to estimate the local motion of the robot. Second, scan-to-map matching is used to anchor the local estimates to the global reference map and recover the global pose. Finally, factor graph optimization integrates these constraints to produce a globally consistent and accurate pose estimate. Based on the optimized trajectory, a point cloud map of the current environment is incrementally constructed. This newly generated map serves as the input for the multi-view projection, change detection, and map update processes described in the following subsections.
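To illustrate why scan-to-scan and scan-to-map constraints complement each other, the following is a minimal sketch of factor graph optimization reduced to one dimension, assuming linear factors and scalar poses. Relative "odometry" factors stand in for scan-to-scan matching and absolute "map" factors stand in for scan-to-map matching. The function names (`optimize_poses`, `solve_linear`) and the weights are hypothetical, and this toy is not the 6-DoF LiDAR SLAM back end referenced above.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def optimize_poses(odom, map_fix, w_odom=1.0, w_map=1.0):
    """Linear least squares over 1D poses x_0..x_n (hypothetical toy factor graph).

    odom[i]   : relative measurement x_{i+1} - x_i (scan-to-scan stand-in)
    map_fix   : {i: absolute measurement of x_i} (scan-to-map stand-in)
    """
    n = len(odom) + 1
    H = [[0.0] * n for _ in range(n)]  # information matrix J^T W J
    g = [0.0] * n                      # J^T W z
    for i, d in enumerate(odom):
        # Residual x_{i+1} - x_i - d has Jacobian -1 at i and +1 at i+1.
        H[i][i] += w_odom
        H[i + 1][i + 1] += w_odom
        H[i][i + 1] -= w_odom
        H[i + 1][i] -= w_odom
        g[i] -= w_odom * d
        g[i + 1] += w_odom * d
    for i, m in map_fix.items():
        # Residual x_i - m has Jacobian +1 at i; it anchors the graph globally.
        H[i][i] += w_map
        g[i] += w_map * m
    return solve_linear(H, g)
```

With drifting odometry, `optimize_poses([1.1, 1.1, 1.1], {0: 0.0, 3: 3.0})` distributes the 0.3 m disagreement over the whole trajectory and returns approximately `[-0.06, 0.98, 2.02, 3.06]`. Without any map factor the information matrix is singular (the trajectory can slide freely), which mirrors why the scan-to-map stage is needed to fix local estimates in the global frame.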

3.2. Point cloud multi-view projection alignment module

Let the global 3D point cloud map be denoted as the set

P = \{ p_{i} \mid p_{i} = (x_{i}, y_{i}, z_{i})^{\top}, \; i = 1, 2, \dots, N \},

where $N$ is the total number of points and $x_{i}, y_{i}, z_{i} \in R$ are scalar coordinates. Let $x_{min}, x_{max}, y_{min}, y_{max}, z_{min}, z_{max}$ denote the extrema of the point cloud along each axis.

To establish a mapping between the 3D point cloud and a 2D image representation, a 3D bounding box is first defined using these extrema. The width $w$ and height $h$ of the bounding box on the horizontal plane are given by

\begin{aligned} w & = x_{max} - x_{min}, \\ h & = y_{max} - y_{min} . \end{aligned}

(1)

Given a grid resolution $r$ (in metres per pixel; 0.05 m by default), projection onto a 2D grid yields an image of width $W = w / r$ and height $H = h / r$. The resolution is the core link between the 3D point cloud and its 2D projection: the default value suits the point density of the 16-line LiDAR and retains 0.1 m-level details while avoiding the computational overload caused by excessively large images. Each 3D point $p_{i}$ is mapped to grid coordinates $(μ_{i}, ν_{i})$ according to

(μ_{i}, ν_{i}) = (\left\lfloor \frac{x_{i} - x_{min}}{r} \right\rfloor, \left\lfloor \frac{y_{i} - y_{min}}{r} \right\rfloor) .

(2)
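The grid mapping of Eqs. (1) and (2) can be sketched as below. This is a minimal illustration assuming points are given as `(x, y, z)` tuples; the name `bev_grid` is hypothetical, and the extra pixel added to the image size is our own assumption so that points lying exactly on $x_{max}$ or $y_{max}$ still receive a valid index.

```python
import math


def bev_grid(points, r=0.05):
    """Map 3D points to BEV grid indices following Eqs. (1)-(2).

    points: list of (x, y, z) tuples; r: grid resolution in metres per pixel.
    Returns (W, H, indices) where indices[i] = (mu_i, nu_i) for point i.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    w = x_max - x_min  # Eq. (1): bounding-box width
    h = y_max - y_min  # Eq. (1): bounding-box height
    # Assumption: one extra pixel per axis so the upper extrema map inside the image.
    W = math.floor(w / r) + 1
    H = math.floor(h / r) + 1
    # Eq. (2), discretized with floor to obtain integer pixel indices.
    indices = [(math.floor((x - x_min) / r), math.floor((y - y_min) / r))
               for x, y, _ in points]
    return W, H, indices
```

For example, `bev_grid([(0.0, 0.0, 0.0), (1.0, 0.5, 2.0), (0.75, 0.2, 1.0)], r=0.5)` yields a 3 × 2 image with indices `[(0, 0), (2, 1), (1, 0)]`.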

To improve robustness against variations in scene scale, height-related features are normalized to the interval $[0, 1]$ . Let $Z_{avg} (μ, ν)$ denote the mean height of points falling into grid cell $(μ, ν)$ . The normalized average height is defined as

Z_{avg}^{*} (μ, ν) = \frac{Z_{avg} (μ, ν) - z_{min}}{z_{max} - z_{min}} .

(3)

Similarly, let $Z_{range} (μ, ν)$ represent the height range within the grid cell, computed as the difference between the maximum and minimum $z$ -coordinates of the contained points. The normalized height range is given by

Z_{range}^{*} (μ, ν) = \frac{Z_{range} (μ, ν)}{z_{max} - z_{min}} .

(4)

These normalized features are encoded into a color image using the HSV color space. Specifically, the hue channel is determined by $Z_{avg}^{*}$ (mapped to the interval $[0, 120]$ , corresponding to a red-to-green transition), the saturation is fixed at 0.8 to ensure color vividness, and the value channel is determined by $Z_{range}^{*}$ (mapped to $[0.5, 1]$ , where larger height variations correspond to higher intensity).
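A minimal sketch of the per-cell features of Eqs. (3) and (4) and their HSV encoding follows, assuming the cell assignment of each point has already been computed with Eq. (2) and that $z_{max} > z_{min}$. It uses Python's standard `colorsys` module, which expects the hue as a fraction of the full circle (so 120° becomes 1/3); the function name `encode_bev_colors` is hypothetical.

```python
import colorsys
from collections import defaultdict


def encode_bev_colors(points, indices, z_min, z_max, saturation=0.8):
    """Colour each occupied grid cell from Eqs. (3)-(4) via HSV.

    points: list of (x, y, z); indices: list of (mu, nu) cells, one per point.
    Returns {(mu, nu): (R, G, B)} with channels in [0, 1].
    """
    cells = defaultdict(list)
    for (_, _, z), cell in zip(points, indices):
        cells[cell].append(z)
    span = z_max - z_min
    colors = {}
    for cell, zs in cells.items():
        z_avg_n = (sum(zs) / len(zs) - z_min) / span  # Eq. (3)
        z_range_n = (max(zs) - min(zs)) / span         # Eq. (4)
        hue_deg = 120.0 * z_avg_n                      # 0 deg = red (low), 120 deg = green (high)
        value = 0.5 + 0.5 * z_range_n                  # map [0, 1] onto [0.5, 1]
        # colorsys takes hue as a fraction of 360 degrees.
        colors[cell] = colorsys.hsv_to_rgb(hue_deg / 360.0, saturation, value)
    return colors
```

A cell whose points span the full height range and sit at mid-height is rendered as bright yellow (roughly `(1.0, 1.0, 0.2)`), whereas a flat cell at the top of the map is rendered as dark green (roughly `(0.1, 0.5, 0.1)`).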

Using this formulation, Bird’s-Eye View (BEV) projections are generated by projecting onto the $x$-$y$ plane, capturing horizontal structural features for planar alignment. Complementarily, frontal view (FV) projections are generated by projecting onto the $x$-$z$ plane, emphasizing vertical structure and height information. Both views share the same grid-mapping principle, differing only in projection plane and feature emphasis.

Let the reference projection image and the current projection image be denoted by $I_{r}$ and $I_{c}$ , respectively. Their alignment is achieved via a homography matrix $H_{m} \in R^{3 \times 3}$ , such that the aligned reference image shares a common coordinate system with the current image. In homogeneous coordinates, corresponding pixel locations satisfy

[\begin{matrix} μ_{r} \\ ν_{r} \\ 1 \end{matrix}] \sim H_{m} [\begin{matrix} μ_{c} \\ ν_{c} \\ 1 \end{matrix}],

(5)

where “

\sim

” denotes equality up to scale. The homography matrix is expressed as

H_{m} = [\begin{matrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{matrix}] .

(6)

Eliminating the scale factor yields the non-homogeneous form

\begin{aligned} \begin{aligned} μ_{r} & = \frac{h_{11} μ_{c} + h_{12} ν_{c} + h_{13}}{h_{31} μ_{c} + h_{32} ν_{c} + h_{33}}, \\ ν_{r} & = \frac{h_{21} μ_{c} + h_{22} ν_{c} + h_{23}}{h_{31} μ_{c} + h_{32} ν_{c} + h_{33}} . \end{aligned} \end{aligned}

(7)
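Eq. (7) translates directly into code. The sketch below uses a hypothetical `apply_homography` helper that maps a current-image pixel into the reference image. Because the division cancels any common factor, multiplying $H_{m}$ by a nonzero constant leaves the result unchanged, which is what "equality up to scale" in Eq. (5) means in practice.

```python
def apply_homography(H, mu_c, nu_c):
    """Map pixel (mu_c, nu_c) of the current image into the reference image, Eq. (7)."""
    den = H[2][0] * mu_c + H[2][1] * nu_c + H[2][2]
    mu_r = (H[0][0] * mu_c + H[0][1] * nu_c + H[0][2]) / den
    nu_r = (H[1][0] * mu_c + H[1][1] * nu_c + H[1][2]) / den
    return mu_r, nu_r
```

For a pure translation `[[1, 0, 5], [0, 1, -3], [0, 0, 1]]`, the pixel `(10, 10)` maps to `(15, 7)`, and the same holds after scaling every entry by 2.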

ORB features are extracted from $I_{r}$ and $I_{c}$, and corresponding point pairs $\{(μ_{r, k}, ν_{r, k}), (μ_{c, k}, ν_{c, k})\}$ are identified using the Hamming distance and the nearest-neighbor distance ratio (NNDR)⁵² criterion with a threshold of 0.8. This threshold filters valid ORB matches, balancing match quantity against match purity under the feature variations found in dynamic scenes. A Random Sample Consensus (RANSAC)⁵³ procedure is then employed: six non-collinear correspondences are randomly sampled to estimate a candidate homography $H_{init}$ via least squares; inliers are determined using a reprojection error threshold $ϵ$ of 1–3 pixels, and the candidate with the largest inlier set is retained as $H_{best}$. This inlier threshold tolerates minor errors while rejecting outliers that would otherwise distort the estimated matrix. Finally, the optimal homography $H_{m}$ is obtained by refining $H_{best}$ using weighted least squares over all inliers. (The probability threshold of 0.5 used for change detection in Section 3.3 corresponds to a pixel value of 128 in single-channel images, which separates genuine changes from noise in the difference images.) The pseudocode of this procedure is given in Algorithm 1.
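The matching step can be sketched as follows, assuming binary descriptors are stored as Python integers (real ORB descriptors are 256-bit strings). The brute-force search and the helper names `hamming` and `nndr_match` are illustrative assumptions; a match is kept only when the nearest Hamming distance is below 0.8 times the second-nearest one, as in the NNDR criterion above.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nndr_match(desc_r, desc_c, ratio=0.8):
    """Brute-force matching with the nearest-neighbour distance ratio test.

    Returns a list of (index_in_desc_r, index_in_desc_c) pairs.
    """
    matches = []
    for i, dr in enumerate(desc_r):
        dists = sorted((hamming(dr, dc), j) for j, dc in enumerate(desc_c))
        if len(dists) < 2:
            continue  # the ratio test needs a second-nearest neighbour
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:  # unambiguous nearest neighbour
            matches.append((i, j1))
    return matches
```

The ratio test discards descriptors whose best and second-best candidates are nearly equally close, which is typical of repetitive structures such as crop rows.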

To ensure no regional bias in sampling and the statistical consistency of RANSAC iterations, Algorithm 1 adopts an unbiased random sampling without replacement strategy from the set of valid correspondences filtered through ORB feature matching and NNDR thresholding. Considering that homography estimation (a $3 \times 3$ matrix with 8 degrees of freedom) requires at least 4 non-collinear correspondences, this algorithm samples 6 points to increase redundancy and significantly reduce the probability of collinearity. The non-collinearity check is performed on the reference projection image: 3 arbitrary points are selected from the 6 candidate correspondences to fit a line, and the Euclidean distances from the remaining 3 points to this line are calculated. If all distances are less than 1 pixel (consistent with the reprojection error threshold), the set of points is deemed collinear and resampled, ensuring the robustness of homography estimation.
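The non-collinearity test can be sketched as below. This is a minimal interpretation assuming that the first three sampled reference-image points define the line, which is fitted by total least squares through their centroid; the name `is_degenerate` is hypothetical, and the check mirrors the description above rather than a complete degeneracy test for homography estimation.

```python
import math


def is_degenerate(pts, tol=1.0):
    """Collinearity check on six reference-image points (sketch of the test above).

    Fits a total-least-squares line to pts[:3] and returns True when every
    point in pts[3:] lies closer than tol pixels to that line.
    """
    fit, rest = pts[:3], pts[3:]
    cx = sum(p[0] for p in fit) / len(fit)
    cy = sum(p[1] for p in fit) / len(fit)
    sxx = sum((p[0] - cx) ** 2 for p in fit)
    syy = sum((p[1] - cy) ** 2 for p in fit)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in fit)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # principal direction angle
    nx, ny = -math.sin(theta), math.cos(theta)      # unit normal of the fitted line
    return all(abs(nx * (x - cx) + ny * (y - cy)) < tol for x, y in rest)
```

Six points on the line $y = 2x$ are flagged as degenerate and resampled, while a set containing points well off that line passes.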

By applying this procedure independently to BEV and FV projections, two homography matrices, $H_{m}^{h}$ and $H_{m}^{v}$ , are obtained. These matrices jointly encode the spatial relationship between the current and prior point cloud maps and provide essential geometric constraints for the image-to-point cloud mapping process described in Section 3.4.
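A compact, dependency-free sketch of the RANSAC homography estimation summarized in Algorithm 1 is given below. It assumes a direct linear transform (DLT) with $h_{33}$ fixed to 1 (which fails only if the true $h_{33}$ is zero), solved by ordinary least squares through the normal equations, and it replaces the weighted least-squares refinement with an unweighted refit on all inliers; coordinate normalization and the collinearity test are omitted for brevity. All function names are hypothetical. In practice a library routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag would normally be used instead.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]  # ZeroDivisionError on an exact zero pivot
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_homography(src, dst):
    """Least-squares DLT with h33 = 1; src: current pixels, dst: reference pixels."""
    AtA = [[0.0] * 8 for _ in range(8)]
    Atb = [0.0] * 8
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per pair, obtained by clearing the denominator of Eq. (7).
        for row, rhs in (([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u], u),
                         ([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v], v)):
            for i in range(8):
                Atb[i] += row[i] * rhs
                for j in range(8):
                    AtA[i][j] += row[i] * row[j]
    h = solve_linear(AtA, Atb)  # normal equations (A^T A) h = A^T b
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def reproj_error(H, p, q):
    """Distance between H applied to p and the observed point q."""
    x, y = p
    den = H[2][0] * x + H[2][1] * y + H[2][2]
    if den == 0:
        return math.inf
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / den
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / den
    return math.hypot(u - q[0], v - q[1])


def ransac_homography(src, dst, n_sample=6, eps=2.0, iters=500, seed=0):
    """Keep the candidate with the most inliers, then refit on all of them."""
    rng = random.Random(seed)
    idx = range(len(src))
    best_H, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(idx, n_sample)  # sampling without replacement
        try:
            H = fit_homography([src[i] for i in sample], [dst[i] for i in sample])
        except ZeroDivisionError:  # singular system from a degenerate sample
            continue
        inliers = [i for i in idx if reproj_error(H, src[i], dst[i]) < eps]
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    if len(best_inliers) >= 4:  # unweighted refinement over all inliers
        best_H = fit_homography([src[i] for i in best_inliers],
                                [dst[i] for i in best_inliers])
    return best_H, best_inliers
```

Running this once on BEV correspondences and once on FV correspondences yields the two matrices $H_{m}^{h}$ and $H_{m}^{v}$; the fixed `seed` only makes the sketch reproducible.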

3.3. Point cloud change detection based on image comparison

In this section, we directly adopt the Multi-Teacher Knowledge Distillation (MTKD) framework for change detection proposed by Liu et al.,⁴⁶ but train the models on our self-collected dataset instead of the datasets used in the original paper. Our dataset comprises 47,901 image pairs obtained by cropping the Bird’s-Eye View (BEV) images, with a unified input resolution of 512 $\times$ 512 pixels. As a general optimization framework, MTKD is compatible with diverse backbone architectures, and we employ the same backbone types supported by the original framework: CNN-based architectures (e.g., ResNet-18, VGG-16, and lightweight models such as TinyCD with simplified layer structures), Transformer-based architectures (e.g., MiT-b0 with a basic Transformer encoder and MiT-b1 with deeper layers and 13.941M parameters), and Foundation Model (FM)-based architectures (e.g., ViT-L with 261.12M parameters and SAM with 361.472M parameters).
For the training scheme, we follow the original MTKD settings: data augmentation is implemented using RandomRotate, RandomFlip, and PhotoMetricDistortion to improve generalization; the AdamW optimizer is adopted with $β_{1} = 0.9$ and $β_{2} = 0.99$, and the batch size is set to 8 (16 for ChangeStar-FarSeg, consistent with the original configuration); the learning rate schedule includes a warm-up phase of 1,000 iterations (5 epochs for TTP), during which the learning rate rises linearly from $10^{- 6}$ to the initial value (e.g., $10^{- 4}$ for FC-EF, $3.57 \times 10^{- 3}$ for TinyCD, and $4 \times 10^{- 4}$ for TTP), followed by linear decay to 0 using LinearLR for most models (CosineAnnealingLR for TTP); the teacher models ($M_{O}$) are trained for 200,000 iterations (300 epochs for TTP) on the original and partitioned datasets, while the student model ($M_{S}$) undergoes an additional 100,000 iterations (100 epochs for TTP) of training on the original dataset to distill complementary knowledge from the teacher models. Here, the BEV inputs are the coordinate-aligned 2D projection images of the prior and current maps, denoted by $I_{r} \in R^{H \times W \times 3}$ and $I_{c} \in R^{H \times W \times 3}$, respectively.

The proposed framework employs a lightweight convolutional neural network $N (\cdot; θ)$ , which takes the normalized differential features as input and produces a pixel-wise change probability map $\hat{Y} \in [0, 1]^{H \times W}$ :

\hat{Y} = N (F_{norm}; θ),

(8)

where $θ$ denotes the network parameters. The network follows an encoder–decoder architecture, in which the encoder extracts hierarchical features via successive downsampling, while the decoder restores spatial resolution using upsampling and skip connections. A Sigmoid activation function is applied at the output layer to obtain per-pixel change probabilities.

The network is trained using a weighted binary cross-entropy loss to address class imbalance:

\begin{aligned} L_{CE} & = - \frac{1}{H W} \sum_{u = 1}^{H} \sum_{v = 1}^{W} [α Y_{GT} (u, v) \log \hat{Y} (u, v) \\ + (1 - α) (1 - Y_{GT} (u, v)) \log (1 - \hat{Y} (u, v))], \end{aligned}

(9)

where the weighting factor $α$ is defined as

α = \frac{\sum_{u = 1}^{H} \sum_{v = 1}^{W} (1 - Y_{GT} (u, v))}{\sum_{u = 1}^{H} \sum_{v = 1}^{W} Y_{GT} (u, v) + \sum_{u = 1}^{H} \sum_{v = 1}^{W} (1 - Y_{GT} (u, v))} .
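The loss of Eq. (9), with $α$ computed from the ground truth as above, can be sketched in plain Python over nested lists. The function name and the clamping constant `eps` are assumptions; the clamp only prevents evaluating log(0). Because the denominator of $α$ equals $HW$, $α$ is simply the fraction of unchanged pixels, so the rare changed pixels receive the larger weight.

```python
import math


def weighted_bce(y_true, y_prob, eps=1e-7):
    """Weighted binary cross-entropy of Eq. (9).

    y_true: H x W nested lists with values in {0, 1}.
    y_prob: H x W nested lists of predicted change probabilities.
    Returns (loss, alpha).
    """
    flat_t = [t for row in y_true for t in row]
    flat_p = [p for row in y_prob for p in row]
    n = len(flat_t)                         # H * W
    alpha = sum(1 - t for t in flat_t) / n  # fraction of unchanged pixels
    loss = 0.0
    for t, p in zip(flat_t, flat_p):
        p = min(max(p, eps), 1.0 - eps)     # clamp to avoid log(0)
        loss += alpha * t * math.log(p) + (1 - alpha) * (1 - t) * math.log(1 - p)
    return -loss / n, alpha
```

For a 2 × 2 map with one changed pixel and a uniform prediction of 0.5, $α = 0.75$ and the loss equals $0.375 \ln 2$.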

During inference, the continuous probability map $\hat{Y}$ is converted into a binary change map $Y$ using a threshold $τ$ :

Y (u, v) = \begin{cases} 1, & \hat{Y} (u, v) > τ, \\ 0, & \text{otherwise} . \end{cases}

(10)

The threshold $τ$ is selected on a validation set according to application requirements and is set to 0.5 by default.

To quantify the degree of scene change and guide network training, we follow the MTKD change detection model and define the Change Area Ratio (CAR) as

{CAR}_{GT} = \frac{1}{H W} \sum_{u = 1}^{H} \sum_{v = 1}^{W} Y_{GT} (u, v),

(11)

where $Y_{GT} \in \{0, 1\}^{H \times W}$ is the ground-truth binary change map.

Based on ${CAR}_{GT}$ , the training samples are categorized into three subsets corresponding to different change intensities:

\begin{aligned} D_{S} & = \{(I_{1}, I_{2}, Y_{GT}) \mid {CAR}_{GT} \leq 0.3\}, \\ D_{M} & = \{(I_{1}, I_{2}, Y_{GT}) \mid 0.3 < {CAR}_{GT} \leq 0.7\}, \\ D_{L} & = \{(I_{1}, I_{2}, Y_{GT}) \mid 0.7 < {CAR}_{GT} \leq 1\} . \end{aligned}

(12)
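Eqs. (11) and (12) amount to bucketing training samples by their change area ratio. A minimal sketch, with hypothetical function names and ground-truth maps stored as nested lists of 0/1 values:

```python
def change_area_ratio(y):
    """Eq. (11): fraction of pixels marked as changed in a binary map."""
    total = sum(len(row) for row in y)
    return sum(sum(row) for row in y) / total


def partition_by_car(samples):
    """Split (I1, I2, Y_GT) triples into D_S, D_M, D_L following Eq. (12)."""
    d_s, d_m, d_l = [], [], []
    for sample in samples:
        car = change_area_ratio(sample[2])
        if car <= 0.3:
            d_s.append(sample)  # small changes
        elif car <= 0.7:
            d_m.append(sample)  # medium changes
        else:
            d_l.append(sample)  # large changes
    return d_s, d_m, d_l
```

A 2 × 2 map with one, two, or three changed pixels (CAR of 0.25, 0.5, or 0.75) falls into $D_{S}$, $D_{M}$, or $D_{L}$, respectively.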

3.4. Mapping update module

Given the binary change map $Y \in \{0, 1\}^{H \times W}$ produced by the change detection module, the mapping update module performs selective modification of the prior point cloud map while strictly preserving the global coordinate system.

For each grid cell $(μ, ν)$ in the projected image, the corresponding 3D spatial region is defined as

\begin{aligned} x & \in [x_{min} + μ r, x_{min} + (μ + 1) r], \\ y & \in [y_{min} + ν r, y_{min} + (ν + 1) r], \\ z & \in [z_{min}, z_{max}] . \end{aligned}

(13)

Conversely, a 3D point $p = (x, y, z)^{⊤}$ is mapped to its corresponding grid indices by

μ (p) = \left\lfloor \frac{x - x_{min}}{r} \right\rfloor, \quad ν (p) = \left\lfloor \frac{y - y_{min}}{r} \right\rfloor .

(14)

Using this mapping, the binary change map $Y$ is translated into point-level update operations. Points from the prior map that correspond to changed regions and are therefore removed are defined as

P_{remove} = \{p \in P_{r} \mid Y (μ (p), ν (p)) = 1\} .

(15)

Similarly, points from the current map that correspond to changed regions and should be added to the global map are defined as

P_{add} = \{p \in P_{c} \mid Y (μ (p), ν (p)) = 1\} .

(16)

The updated point cloud map is then obtained via set operations:

P_{new} = (P_{r} \setminus P_{remove}) \cup P_{add} .

(17)

For bookkeeping and downstream processing, each point involved in the update is assigned a label:

\operatorname{label} (p) = \begin{cases} \text{removed}, & p \in P_{remove}, \\ \text{added}, & p \in P_{add} . \end{cases}

(18)

A key constraint of the proposed update mechanism is that the global coordinate system remains unchanged throughout the process:

\operatorname{CoordSys} (P_{new}) \equiv \operatorname{CoordSys} (P_{r}) .

(19)

This constraint guarantees that all semantic annotations, navigation targets, and operational paths defined in the original map remain valid, thereby eliminating the need for costly re-annotation.

The computational complexity of the mapping update procedure is $O (N_{r} + N_{c})$, where $N_{r}$ and $N_{c}$ denote the number of points in the prior and current point clouds, respectively. This linear complexity ensures scalability to large-scale environments and long-term deployments. The pseudocode corresponding to the point cloud map update module is summarized in Algorithm 2.
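The update of Eqs. (14) to (18) can be sketched as below; Eq. (19) holds by construction because no point is transformed. The sketch assumes that the current map $P_{c}$ has already been expressed in the prior map's coordinate frame, that both maps share the grid origin $(x_{min}, y_{min})$ and resolution $r$, and that the changed cells of $Y$ are given as a set of `(mu, nu)` indices. The names are hypothetical and this is not the authors' Algorithm 2. Set membership tests take constant time on average, so each point is visited once, consistent with the stated $O(N_{r} + N_{c})$ complexity.

```python
import math


def update_map(P_r, P_c, changed, x_min, y_min, r):
    """Grid-masked map update sketching Eqs. (14)-(18).

    P_r, P_c: lists of (x, y, z) points in the common (prior) coordinate frame.
    changed:  set of (mu, nu) cells where the binary change map Y equals 1.
    Returns (P_new, labels) where labels is a list of (point, label) pairs.
    """
    def cell(p):  # Eq. (14)
        return (math.floor((p[0] - x_min) / r), math.floor((p[1] - y_min) / r))

    kept, P_remove = [], []
    for p in P_r:  # single pass over the prior map, Eq. (15)
        (P_remove if cell(p) in changed else kept).append(p)
    P_add = [p for p in P_c if cell(p) in changed]  # single pass, Eq. (16)
    P_new = kept + P_add  # Eq. (17): (P_r minus P_remove) union P_add
    labels = [(p, "removed") for p in P_remove] + [(p, "added") for p in P_add]  # Eq. (18)
    return P_new, labels
```

For instance, with `r = 1.0` and `changed = {(1, 0)}`, a prior point at `(1.5, 0.5, 1.0)` is removed, a current point at `(1.2, 0.8, 2.0)` is added, and points in unchanged cells are left exactly where they were.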

4. Experimental results and analysis

To evaluate the effectiveness of the proposed lifelong mapping framework with multi-view projection fusion (LLMF), a comprehensive experimental study was conducted from three complementary perspectives. All experiments were performed on a workstation equipped with an Intel Core i7-6600U CPU and 16 GB RAM.

Two datasets were used for evaluation: the open-source MulRan dataset and a self-collected long-term farm dataset. The farm dataset was acquired using the inspection robot shown in Figure 3. The robot is equipped with a 16-line Robosense LiDAR and a 6-axis inertial measurement unit (IMU). Data collection was carried out over a period of nine months, resulting in 32 operational runs, with an average frequency of approximately one run per week.

Figure 3.

The inspection robot used in the experiments, equipped with a 16-line Robosense LiDAR and a 6-axis IMU.

The experimental evaluation focuses on the following three aspects:

qualitative assessment of multi-view projection-based coordinate system alignment to verify the effectiveness of global map unification across different operational periods;

quantitative analysis of computational performance, including change detection runtime, memory consumption, and CPU usage;

long-term evaluation of localization performance in real-world deployment scenarios.

4.1. Qualitative verification of multi-view projection alignment

To qualitatively verify the effectiveness of the proposed projection-based map alignment strategy, two representative inspection locations were selected from point cloud maps acquired in different sessions of the MulRan dataset and in different runs of the farm dataset. For each dataset, the spatial coordinates of the two inspection points, denoted as $p_{A}$ and $p_{B}$, were examined before and after alignment.

Specifically, for the MulRan dataset, $p_{A}^{r}$ and $p_{B}^{r}$ denote the inspection point vectors in the prior map $M_{r}^{m}$ , while $p_{A}^{c}$ and $p_{B}^{c}$ denote the corresponding point vectors in the current map $M_{c}^{m}$ . After applying the proposed multi-view projection alignment, the transformed inspection points are denoted as ${\hat{p}}_{A}^{c}$ and ${\hat{p}}_{B}^{c}$ in the aligned map ${\hat{M}}_{c}^{m}$ . The same notation is adopted for the farm dataset and is not repeated here for brevity.

The qualitative alignment results are illustrated in Figure 4. Figure 4(A) to (C) correspond to the MulRan dataset, showing the point cloud maps before and after alignment and the comparison of inspection point vectors. Figure 4(D) to (F) present the corresponding results for the farm dataset. Table 2 reports the inspection point coordinates before and after alignment, together with the projection alignment time (127 s for MulRan and 85 s for the farm dataset). Together, these results show that the proposed alignment strategy unifies the global coordinate systems of maps from different periods. It brings the inspection points substantially closer to their reference positions at a cost of roughly one to two minutes per map pair, which is essential for long-term map maintenance.

Figure 4.

Qualitative results of coordinate system alignment. (A,B) and (D,E) show point cloud maps of the MulRan and farm datasets before and after alignment, respectively. (C) and (F) illustrate the spatial correspondence of inspection points after alignment.

Table 2.

Map alignment based on multi-view projection.

Dataset	Point	Reference map ($M_{r}$)	Current map before alignment ($M_{c}$)	Current map after alignment (${\hat{M}}_{c}$)	Alignment time t (s)
MulRan	$p_{A}$	(-1.13, -45.79, 16.93)	(-0.30, -46.55, 16.06)	(-1.20, -45.69, 16.39)	127
MulRan	$p_{B}$	(21.13, 18.07, 11.76)	(20.96, 17.06, 10.36)	(20.97, 18.08, 10.89)	127
Farm	$p_{A}$	(-74.86, 133.52, 0.27)	(-73.87, 133.60, -1.76)	(-74.79, 133.50, 0.31)	85
Farm	$p_{B}$	(10.40, 97.93, 3.55)	(11.69, 97.90, 2.01)	(10.51, 98.12, 3.35)	85
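The deviations implied by Table 2 can be checked with a few lines of Python. This small sketch copies the tabulated coordinates and computes the Euclidean distance from each current-map inspection point to its reference counterpart, both before and after alignment. The coordinates are treated as being in the maps' native units, which the table does not state.

```python
import math

# Coordinates copied from Table 2: (reference, before alignment, after alignment)
TABLE_2 = {
    ("MulRan", "p_A"): ((-1.13, -45.79, 16.93), (-0.30, -46.55, 16.06), (-1.20, -45.69, 16.39)),
    ("MulRan", "p_B"): ((21.13, 18.07, 11.76), (20.96, 17.06, 10.36), (20.97, 18.08, 10.89)),
    ("Farm", "p_A"): ((-74.86, 133.52, 0.27), (-73.87, 133.60, -1.76), (-74.79, 133.50, 0.31)),
    ("Farm", "p_B"): ((10.40, 97.93, 3.55), (11.69, 97.90, 2.01), (10.51, 98.12, 3.35)),
}


def deviations(table):
    """Euclidean distance to the reference point before and after alignment."""
    return {key: (math.dist(ref, before), math.dist(ref, after))
            for key, (ref, before, after) in table.items()}


for (dataset, point), (d_before, d_after) in deviations(TABLE_2).items():
    print(f"{dataset} {point}: {d_before:.2f} -> {d_after:.2f}")
# MulRan p_A: 1.42 -> 0.55
# MulRan p_B: 1.73 -> 0.88
# Farm p_A: 2.26 -> 0.08
# Farm p_B: 2.01 -> 0.30
```

For these two points, alignment reduces the deviation by a factor of roughly 7 to 27 on the farm dataset. On MulRan, a residual of about 0.5 to 0.9 remains, and it lies almost entirely along the vertical axis.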

4.2. Change detection

This subsection presents a quantitative evaluation of the proposed image-based change detection module, with a particular focus on computational efficiency. Experiments were conducted on both the long-term farm dataset and the open-source MulRan dataset.

For the farm dataset, the prior point cloud map $P_{p}^{F}$ occupies 103.9 MB of memory and contains 6,672,069 points. The corresponding current map $P_{c}^{F}$ occupies 106.8 MB and consists of 6,493,678 points. For the MulRan dataset, the prior map $P_{p}^{M}$ occupies 160.6 MB and includes 4,604,272 points, while the current map $P_{c}^{M}$ occupies 205.6 MB and contains 5,818,631 points.

Table 3 reports a comparative evaluation of three change detection approaches: a point cloud matching-based method,³⁷ a grid division-based method,³⁸ and the proposed method. The comparison considers three performance indicators that are critical for long-term robotic deployment: computation time, system memory usage, and CPU utilization. All methods were evaluated on both the MulRan dataset and the farm dataset under identical hardware conditions.

As shown in Table 3, the proposed approach reduces computation time by roughly an order of magnitude relative to the grid division-based method (73 s vs. 706 s on the farm dataset, and 59 s vs. 638 s on MulRan). The point cloud matching-based method required more than 3600 s on both datasets. LLMF also has the lowest CPU utilization on both datasets (8.6% and 8.8%).

Memory usage is close to saturation for all three methods (95.7–99.5%). LLMF uses the least memory on the farm dataset but slightly more than the baselines on MulRan, so memory consumption is best regarded as comparable. These results indicate that transforming 3D point cloud change detection into 2D image-based processing yields substantial savings in computation time and CPU load. This makes the proposed approach well suited for long-term map maintenance in large-scale and dynamic environments.

Table 3.
Comparison table of computation time, memory usage, and CPU usage for change detection algorithms.

Algorithm	Farm: computation time (s)	Farm: memory usage (%)	Farm: CPU usage (%)	MulRan: computation time (s)	MulRan: memory usage (%)	MulRan: CPU usage (%)
Point cloud matching-based³⁷	$> 3600$	96.9	12.1	$> 3600$	99.2	9.9
Grid division-based³⁸	706	96.0	10.9	638	98.7	9.3
LLMF	73	95.7	8.6	59	99.5	8.8

4.3. Global localization performance testing

To evaluate the practical value of the proposed framework in real robotic engineering scenarios, a systematic long-term localization evaluation was conducted using the self-collected farm dataset.

First, the complete map update and maintenance process of the proposed framework is illustrated qualitatively in Figures 5 and 6, providing an intuitive validation of its feasibility and effectiveness in dynamic environments. Figure 5 presents the front-view (FV) projection and alignment results, demonstrating the consistency of vertical structural alignment across different operational periods. Figure 6 visualizes the full Bird’s-Eye View (BEV)-based pipeline, including projection alignment, change detection, and map update.
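As a rough illustration of the projections shown in Figures 5 and 6, this sketch rasterizes a point cloud onto two 2D grids by counting the points that fall in each pixel:

- a BEV grid over (x, y);
- an FV grid over (y, z).

The density encoding and the grid `origin`, `resolution`, and image `shape` are assumptions made for this example. The paper's own projection (Algorithm 1) may encode pixels differently, for example by height.

```python
import numpy as np


def project_density(points, axes, origin, resolution, shape):
    """Count the points falling into each pixel of a 2D grid.

    axes: the two coordinate columns mapped to (column, row), e.g. (0, 1) for a
          BEV image over (x, y) or (1, 2) for a frontal view over (y, z).
    Row 0 corresponds to the smallest value of the second axis (no vertical flip).
    """
    col = np.floor((points[:, axes[0]] - origin[0]) / resolution).astype(int)
    row = np.floor((points[:, axes[1]] - origin[1]) / resolution).astype(int)
    inside = (row >= 0) & (row < shape[0]) & (col >= 0) & (col < shape[1])
    counts = np.zeros(shape, dtype=np.int64)
    # np.add.at is unbuffered, so several points in the same pixel all accumulate
    np.add.at(counts, (row[inside], col[inside]), 1)
    return counts


def to_uint8(counts):
    """Scale a count image to 0-255 so that 8-bit image feature detectors can use it."""
    peak = counts.max()
    if peak == 0:
        return np.zeros(counts.shape, dtype=np.uint8)
    return (255.0 * counts / peak).astype(np.uint8)


if __name__ == "__main__":
    cloud = np.array([[0.05, 0.05, 1.0], [0.06, 0.04, 2.0], [0.95, 0.15, 0.0]])
    bev = project_density(cloud, (0, 1), origin=(0.0, 0.0), resolution=0.1, shape=(10, 10))
    fv = project_density(cloud, (1, 2), origin=(0.0, 0.0), resolution=0.1, shape=(30, 10))
    print(bev[0, 0], bev[1, 9], to_uint8(bev)[0, 0])
```

The BEV image emphasizes horizontal layout, while the FV image preserves vertical structure. This is consistent with the text's use of FV to check vertical structural alignment.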

Figure 5.

Front-view (FV) projection and alignment results for point cloud maps acquired at different operational periods.

Figure 6.

Bird’s-Eye View (BEV)-based map maintenance pipeline. The figure illustrates projection alignment, change detection, image-to-point cloud mapping, and the resulting updated map, verifying the feasibility of the proposed scheme in dynamic environments.

Specifically, in Figure 6, panels (A) and (B) show the BEV projection images of the point cloud maps acquired at different times. Panel (C) illustrates ORB feature extraction and homography estimation between the two projections. Panel (D) presents the change detection results obtained using the proposed image-based change detection network, where red regions indicate structures that have disappeared in the current environment and green regions correspond to newly added structures. Based on these results, image-to-point cloud mapping is performed in panel (E), where obsolete points are removed and newly observed points are added to the prior map. The resulting updated point cloud map is shown in panel (F), with a magnified view of representative changed regions provided in panel (G). Importantly, all operations are performed within the original global coordinate system, ensuring that the coordinates of previously labeled task points remain unchanged throughout the update process.
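Panel (C) relies on a homography between the two BEV images. As an illustration only, this NumPy sketch estimates a homography from already-matched pixel pairs using the basic direct linear transform (DLT).

It deliberately omits ORB detection, descriptor matching, coordinate normalization, and outlier rejection. In practice these steps are usually handled by a library routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag. We do not claim this is how the authors implemented the step.

```python
import numpy as np


def estimate_homography(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ [x, y, 1] for each matched pair."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if len(src) < 4:
        raise ValueError("at least four correspondences are required")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map (N, 2) pixel coordinates through H, including the perspective divide."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


if __name__ == "__main__":
    h_true = np.array([[0.98, -0.17, 12.0], [0.17, 0.98, -5.0], [0.0, 0.0, 1.0]])
    src = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [50, 20], [30, 70]], dtype=float)
    dst = apply_homography(h_true, src)
    print(np.round(estimate_homography(src, dst), 4))
```

With noise-free correspondences, the DLT recovers the generating matrix. With real ORB matches, a robust estimator is needed, because a few wrong matches can dominate the least-squares solution.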

Second, the localization performance of the proposed framework is quantitatively evaluated by comparison with widely used state-of-the-art LiDAR-based SLAM systems. Specifically, Tightly-coupled LiDAR Inertial Odometry via Smoothing and Mapping (LIO-SAM)⁵⁴ and Fast-LIO2,⁵⁵ which do not incorporate map update mechanisms, and LT-Mapper,³⁰ which supports long-term map maintenance, are selected as benchmark methods. Localization accuracy is assessed using the Absolute Trajectory Error (ATE) metric.¹⁹
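For reference, this sketch computes translational ATE statistics of the kind listed in Table 4 (max, median, min, RMSE, std). It assumes the estimated and reference positions are already time-associated. It aligns them with a rigid least-squares fit (the Kabsch/Umeyama solution without scale) before measuring errors. Both the alignment convention and the use of the population standard deviation are assumptions, since the section does not specify them.

```python
import numpy as np


def rigid_align(est, ref):
    """R, t minimizing sum ||ref_i - (R est_i + t)||^2 (Kabsch/Umeyama without scale)."""
    mu_e, mu_r = est.mean(axis=0), ref.mean(axis=0)
    cov = (est - mu_e).T @ (ref - mu_r)         # cross-covariance sum_i x_i y_i^T
    u, _, vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # flip the last axis if SVD yields a reflection
    fix = np.diag([1.0] * (est.shape[1] - 1) + [d])
    rot = vt.T @ fix @ u.T
    return rot, mu_r - rot @ mu_e


def ate_statistics(est, ref):
    """Translational ATE statistics after rigid alignment of time-associated positions."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    rot, trans = rigid_align(est, ref)
    err = np.linalg.norm(ref - (est @ rot.T + trans), axis=1)
    return {
        "max": float(err.max()),
        "median": float(np.median(err)),
        "min": float(err.min()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "std": float(err.std()),  # population standard deviation of per-pose errors
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.uniform(-50.0, 50.0, size=(200, 3))
    yaw = np.deg2rad(30.0)
    rz = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                   [np.sin(yaw), np.cos(yaw), 0.0],
                   [0.0, 0.0, 1.0]])
    est = ref @ rz.T + np.array([1.0, -2.0, 0.5])  # same trajectory in another frame
    est += rng.normal(scale=0.1, size=est.shape)   # plus isotropic position noise
    print(ate_statistics(est, ref))
```

Aligning before measuring removes the arbitrary offset between the estimator's frame and the reference frame. The remaining error therefore reflects trajectory shape rather than frame choice.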

The evaluation focuses on the robot’s seventh operational run. This run was conducted approximately two months after the initial mapping, by which time substantial environmental changes had occurred in the farm. The corresponding localization trajectories and quantitative ATE results are reported in Figure 7 and Table 4, respectively.

LLMF achieves the lowest error on every statistic except the minimum, where it ties with LT-Mapper. Its RMSE of 0.16 m is less than half that of LT-Mapper (0.34 m) and less than a third of those of LIO-SAM (0.58 m) and Fast-LIO2 (0.56 m). Its maximum error (0.47 m) is also well below those of the baselines (0.82–2.14 m). These results indicate that the proposed framework maintains accurate and stable localization even after significant environmental changes, supporting its effectiveness for practical robotic applications.

Figure 7.

Localization trajectories of the compared methods during the robot’s seventh run in the farm dataset.

Table 4.

ATE comparison for the seventh run (m).

Method	Max	Median	Min	RMSE	Std
LIO-SAM⁵⁴	1.46	0.47	0.04	0.58	0.27
Fast-LIO2⁵⁵	2.14	0.46	0.05	0.56	0.26
LT-Mapper³⁰	0.82	0.27	0.01	0.34	0.14
LLMF	0.47	0.14	0.01	0.16	0.11

Finally, to further evaluate the long-term localization stability of the proposed framework in dynamic environments, the root mean square error (RMSE) of the Absolute Trajectory Error (ATE) is statistically analyzed over 32 operational runs conducted in the real farm environment across a nine-month period. ATE-RMSE is widely adopted in the SLAM community. It summarizes the localization error along an entire run in a single value, which makes runs conducted at different times directly comparable.

Figure 8 presents a heatmap and a line chart summarizing the RMSE results for all evaluated algorithms. In the heatmap, darker colors indicate larger localization errors, and the value “-1” (shown in grey) denotes localization failure. The line chart shows how the trajectory error evolves across repeated runs. LIO-SAM, Fast-LIO2, and the proposed method without map maintenance localize well in the early runs, but none of them can update the prior map. Consequently, their localization error grows as environmental changes accumulate, eventually leading to localization failure.

Figure 8.

Heatmap and line chart of RMSE values for the Absolute Trajectory Error (ATE) over 32 operational runs spanning nine months in a farm environment. Darker colors indicate larger errors, while “-1” denotes localization failure.

The LT-Mapper algorithm, which incorporates a map maintenance mechanism, is also evaluated for comparison. While LT-Mapper is able to adapt to environmental changes to some extent, its performance is consistently weaker than that of the proposed framework, primarily due to the local update limitations discussed in Section 3 and illustrated in Figure 1. This observation is further supported by the boxplots shown in Figure 9, where larger interquartile ranges indicate poorer localization stability over long-term operation.

Figure 9.

Boxplot comparison of RMSE values for different localization algorithms over 32 runs in a farm environment. Larger values and wider distributions indicate lower localization accuracy and stability, while “-1” denotes localization failure.

To further assess the contribution of the map update module within the proposed lifelong mapping framework, an ablation study was conducted. Figure 10 compares localization performance across 32 repeated runs under two configurations: with map update enabled and with map update disabled. When the map update module is disabled, localization error accumulates steadily over time, and localization failure occurs in later operational stages. In contrast, enabling the map update module allows the system to continuously correct deviations induced by dynamic environmental changes, thereby maintaining stable and accurate localization over long-term deployment.

Figure 10.

Comparison of localization error over 32 long-term runs in the farm environment with the map update module enabled and disabled. Localization accuracy is evaluated using the RMSE of the Absolute Trajectory Error (ATE).

To quantify the localization advantages of LLMF in long-term dynamic scenarios more comprehensively, Table 5 reports a statistical analysis of the ATE-RMSE over the 32 farm runs.

In terms of sample coverage, LLMF completed all 32 runs without any localization failure, as did LT-Mapper. In contrast, localization failures in later runs left only 25, 17, and 26 valid samples for LIO-SAM, Fast-LIO2, and the LLMF variant without map update, respectively.

The mean RMSE of LLMF is 0.2567 m, substantially lower than those of LIO-SAM (1.2701 m), Fast-LIO2 (0.7652 m), and LT-Mapper (0.4021 m). LLMF also has the smallest standard deviation (0.0584 m) and the narrowest 95% confidence interval ([0.2357 m, 0.2778 m]), indicating more consistent localization accuracy than the compared algorithms. Significance testing shows that the differences between LLMF and every compared method are statistically significant ($p < 0.01$ or $p < 0.001$).

The accuracy gain is defined as the reduction in mean RMSE relative to the compared method's mean RMSE. It is 79.79% over LIO-SAM, 66.45% over Fast-LIO2, 36.15% over LT-Mapper, and 47.35% over the LLMF variant without map update. These statistics further support the effectiveness of the map update module and the multi-view projection fusion strategy. They indicate that the robot can maintain stable, high-precision localization amid environmental changes over a nine-month period.

Table 5.

Statistical analysis of ATE-RMSE for 32 farm runs.

Algorithm	Sample size	Mean RMSE (m)	Std (m)	95% CI (m)	p-value vs. LLMF	Significance	RMSE diff. (m)	Accuracy gain (%)
LIO-SAM	25	1.2701	1.0050	[0.8553, 1.6850]	0.000037	$p < 0.001$	1.0134	79.79
LLMF (no map update)	26	0.4876	0.3564	[0.3437, 0.6316]	0.003	$p < 0.01$	0.2309	47.35
LLMF	32	0.2567	0.0584	[0.2357, 0.2778]	n/a	n/a	n/a	n/a
Fast-LIO2	17	0.7652	0.6224	[0.4452, 1.0852]	0.003932	$p < 0.01$	0.5085	66.45
LT-Mapper	32	0.4021	0.0703	[0.3767, 0.4274]	< 0.001	$p < 0.001$	0.1454	36.15

These results demonstrate that the map update module is a critical component of the proposed framework. Removing this module significantly degrades the system’s ability to adapt to environmental dynamics, leading to reduced localization accuracy and stability, particularly in long-duration and large-scale operational scenarios.
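The interval and gain columns of Table 5 can be reproduced from the tabulated means and standard deviations. This stdlib-only sketch rests on two assumptions:

- The intervals are two-sided Student-t intervals with n - 1 degrees of freedom. This is an assumption, but one that agrees with the reported intervals to within about 0.0001 m.
- Accuracy gain is the relative reduction in mean RMSE.

The sketch also shows how runs marked “-1” (failures, as in Figure 8) would be excluded before computing the statistics. The p-values are not reproduced, because the section does not state which test produced them.

```python
import math
import statistics

FAILURE = -1.0  # value used in Figure 8 to mark a run in which localization failed


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson's rule on [0, |x|] plus the symmetry of the density."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Upper quantile (0.5 < p < 1) of Student's t, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ci_from_summary(mean, std, n, level=0.95):
    """Two-sided t interval for the mean, given sample mean, sample std and sample size."""
    half = t_ppf(0.5 + level / 2, n - 1) * std / math.sqrt(n)
    return mean - half, mean + half


def summarize_runs(rmse_per_run, level=0.95):
    """Drop failed runs, then return (n, mean, sample std, confidence interval)."""
    valid = [r for r in rmse_per_run if r != FAILURE]
    n = len(valid)
    mean = statistics.mean(valid)
    std = statistics.stdev(valid)
    return n, mean, std, ci_from_summary(mean, std, n, level)


def accuracy_gain(baseline_mean, ours_mean):
    """Relative reduction (%) of the mean RMSE with respect to a baseline."""
    return 100.0 * (baseline_mean - ours_mean) / baseline_mean


if __name__ == "__main__":
    print(ci_from_summary(0.2567, 0.0584, 32))  # LLMF row of Table 5
    print(accuracy_gain(1.2701, 0.2567))        # LIO-SAM row of Table 5
```

Excluding failed runs makes the means of LIO-SAM, Fast-LIO2, and the no-update variant optimistic. Their statistics describe only the runs that still localized, which is why the sample-size column matters when reading the table.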

5. Discussion

The experimental results demonstrate that the proposed lifelong mapping framework achieves robust localization performance while substantially improving the computational efficiency of change detection and map maintenance. Compared with the point cloud matching-based and grid division-based methods, the proposed image-based change detection strategy reduces computation time from more than an hour to about one minute on identical hardware. The matching-based baseline exceeded 3600 s on both datasets, whereas LLMF took 73 s on the farm dataset and 59 s on MulRan.

This improvement can be attributed primarily to two factors: first, the transformation of computationally expensive 3D point cloud operations into efficient 2D image processing tasks; and second, the ability of deep learning models to capture structural changes in projected representations. From an engineering perspective, these characteristics make the proposed framework particularly suitable for long-term robotic deployment in large-scale and dynamic environments. The nine-month experimental validation in a real farm confirms that the framework can cope with gradual and substantial environmental changes over extended operational periods without requiring frequent manual intervention.

A key strength of the proposed approach lies in its adoption of a global-to-local map maintenance strategy. By performing change detection and updates at the global map level while preserving the original coordinate system, the framework enables incremental map evolution without introducing coordinate drift or inconsistency. As a result, existing semantic annotations, navigation targets, and task-specific reference points defined in the initial map remain valid throughout long-term operation, eliminating the need for costly re-annotation. This design choice distinguishes the proposed framework from many existing long-term mapping approaches that rely on local or incremental updates, which may accumulate alignment errors over time. By maintaining coordinate system consistency, the framework provides a practical pathway toward sustainable, long-term autonomy, supporting continuous robotic operation in environments subject to ongoing structural changes.

Although the proposed approach ensures accurate long-term robot localization in dynamic environments, it has certain limitations. The core idea of this paper is to project two point cloud maps (stored as PCD files) acquired at different times into 2D Bird’s-Eye View (BEV) and Frontal View (FV) representations. Changed regions are then identified through feature registration and difference comparison, so that the prior map can be updated selectively. This update mechanism operates in a post-task mode: it is performed offline after the robot finishes a task and cannot update the map during task execution.

The applicability of this design is therefore limited in two types of scenarios. The first is extremely dynamic environments, such as warehouses with frequent cargo handling and rapidly moving temporary obstacles, where an offline update cannot respond to high-frequency environmental changes in real time. The second is scenes where extreme weather causes large-scale, sudden changes in spatial geometry, such as heavy snow covering the original terrain or heavy rainfall altering the surface morphology. Such drastic changes substantially reduce the number of feature matches available for projection registration, which undermines the reliability of change detection and map update.

Map maintenance and localization under such extreme changes remain an open challenge for LiDAR-based robot localization in general. In future engineering applications, map maintenance under these conditions could be supported by auxiliary positioning aids such as QR codes or magnetic induction, or by RTK-aided positioning in outdoor scenarios free from magnetic field interference.

6. Conclusion

This paper addresses the degradation of robotic localization performance caused by the failure of prior maps in long-term dynamic environments. To this end, a modular lifelong mapping framework (LLMF) is proposed, which performs map maintenance through change detection based on point cloud projection into the image domain. The framework is designed to support long-term robotic operation without altering the global coordinate system of the original map.

The main contributions of this work can be summarized as follows:

A complete lifelong mapping framework is developed by integrating global localization, multi-view point cloud projection alignment, image-based change detection, and point cloud map update into a unified system. This modular design enables the systematic perception of environmental changes and the automated maintenance of global maps during long-term operation.

An efficient integrated strategy for point cloud registration and change detection is proposed, consisting of three collaboratively designed algorithms. Algorithm 1 projects 3D point clouds into 2D representations and estimates the relative spatial relationship between maps from different periods through feature-based registration. The change detection algorithm uses a deep learning-based change detection model to efficiently identify structurally altered regions in the environment. Algorithm 2 fuses the outputs of the previous two, back-projects image-level differences into the point cloud space, and adds and removes the changed points while preserving the original global coordinate system, thereby achieving adaptive map maintenance. This strategy avoids the high computational cost of direct 3D point cloud comparison and significantly improves the efficiency and practicality of map maintenance in long-term dynamic environments.

Extensive experimental evaluation is conducted using both an open-source dataset and a long-term real-world farm dataset spanning nine months. The results demonstrate that the proposed framework preserves localization accuracy before and after map updates. It also reduces the computation time required for change detection from more than an hour to about one minute, highlighting its suitability for practical engineering deployment.

Regarding the two fundamental challenges in lifelong mapping, the extent to which each is addressed can be summarized as follows:

Fully addressed: accurate alignment of point cloud map coordinate systems across different periods and maintenance of coordinate consistency. LLMF adopts a BEV+FV multi-view projection fusion strategy, combined with ORB feature matching and homography estimation, which enables registration without re-annotating semantic information. The global-to-local update mechanism eliminates coordinate drift and map ghosting by construction. Over 32 farm runs, LLMF achieved the lowest mean ATE-RMSE (0.2567 m) with the narrowest 95% confidence interval ($[0.2357 m, 0.2778 m]$), indicating the best long-term stability among the compared methods.

Effectively addressed from an engineering standpoint: efficient change detection for large-scale 3D point clouds. By transforming 3D point cloud change detection into 2D image processing and integrating the MTKD model, LLMF reduces detection time from more than an hour to about one minute, with lower CPU usage than the compared methods. The scheme, however, adopts an offline update mode. This leaves room for improvement in high-frequency dynamic scenarios and under sudden extreme weather changes, which remain common open challenges in the field.

Future work will focus on extending the robustness and applicability of the proposed framework. In particular, we plan to investigate more resilient projection and alignment strategies for handling extreme environmental changes, to explore multi-modal change detection methods that incorporate additional sensory cues, and to extend the framework to multi-robot scenarios to support distributed lifelong map maintenance.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Sichuan Science and Technology Program (2025YFHZ0103, 2025HJRC0022, 2023NSFSC1985), and Jiangsu Distinguished Professor Programme.

Conflicts of interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Footnotes

ORCID iDs

Kaiyi Xian

Duo Liu

Gexiang Zhang

Ferrante Neri

Song Chen

References

1. González, Villar, Tan, et al. An efficient multi-robot path planning solution using A* and coevolutionary algorithms. Integr Comput Aided Eng 2022; 30: 41–52.

2. Jeon, Moon, Jeong, et al. Autonomous flight strategy of an unmanned aerial vehicle with multimodal information for autonomous inspection of overhead transmission facilities. Comput-Aided Civ Infrastruct Eng 2024; 39: 2159–2186.

3. Kamjoo, Rostami, Fakhrmoosavi, et al. A simulation-based approach for optimizing the placement of dedicated lanes for autonomous vehicles in large-scale networks. Comput Aided Civ Infrastruct Eng 2024; 39: 3011–3029.

4. Zhang. Fast-lio: a fast, robust LiDAR-inertial odometry package by tightly-coupled iterated Kalman filter. IEEE Robot Autom Lett 2021; 6: 3317–3324.

5. Bai, Xiao, Chen, et al. Faster-lio: lightweight tightly coupled LiDAR-inertial odometry using parallel sparse incremental voxels. IEEE Robot Autom Lett 2022; 7: 4861–4868.

6. Wei, Jiao, et al. FusionPortableV2: a unified multi-sensor dataset for generalized slam across diverse platforms and scalable environments. Int J Robot Res 2025; 44: 1093–1116.

7. Siddique, Adeli. Spiral dynamics algorithm. Int J Artif Intell Tools 2014; 23: 1430001.

8. Siddique, Adeli. Harmony search algorithm and its variants. Int J Pattern Recognit Artif Intell 2015; 29: 1539001:1.

9. Siddique, Adeli. Hybrid harmony search algorithms. Int J Artif Intell Tools 2015; 24: 1530001:1.

10. Peterson, Jia, Tian, et al. Roman: open-set object map alignment for robust view-invariant global localization. arXiv preprint arXiv:2410.08262 (2024).

11. Chen, Zhong, Xie, et al. SLAM-RAMU: 3D LiDAR-IMU lifelong slam with relocalization and autonomous map updating for accurate and reliable navigation. Ind Robot 2024; 51: 219–235.

12. Yang, Prakhya, Zhu, et al. Lifelong 3D mapping framework for hand-held & robot-mounted LiDAR mapping systems. IEEE Robot Autom Lett 2024; 9: 9446–9453.

13. Behley, Garbade, Milioto, et al. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences. In: IEEE/CVF international conference on computer vision (ICCV), Seoul, Korea (South), 2019, pp.9296–9306. IEEE. DOI: 10.1109/ICCV.2019.00939.

14. Yamaguchi, Mizutani. Quantitative road crack evaluation by a u-net architecture using smartphone images and LiDAR data. Comput-Aided Civ Infrastruct Eng 2023; 39: 963–982.

15. Zhang, Wang, Han, et al. Deep learning framework with local sparse transformer for construction worker detection in 3D with LiDAR. Comput Aided Civ Infrastruct Eng 2024; 39: 2990–3007.

16. Esmorís, Vilariño, Fernández-Arango, et al. Characterizing zebra crossing zones using LiDAR data. Comput-Aided Civ Infrastruct Eng 2023; 38: 1767–1788.

17. Kubelka, Reinstein, Svoboda. Tracked robot odometry for obstacle traversal in sensory deprived environment. IEEE ASME Trans Mechatron 2019; 24: 2745–2755.

18. Chen, Huang, Fitch. Active SLAM for mobile robots with area coverage and obstacle avoidance. IEEE ASME Trans Mechatron 2020; 25: 1182–1192.

19. Xian, Liu, Zhang, et al. Gravity-constrained simultaneous localization and mapping for suppressing map warping in complex large-scale environments. Integr Comput Aided Eng 2025.

20. Krajník, Vintr, Molina, et al. Warped hypertime representations for long-term autonomy of mobile robots. IEEE Robot Autom Lett 2019; 4: 3310–3317.

21. Santos, Krajník, Duckett. Spatio-temporal exploration strategies for long-term autonomy of mobile robots. Rob Auton Syst 2017; 88: 116–126.

22. Tang, Wang, Ding, et al. Topological local-metric framework for mobile robots navigation: a long term perspective. Auton Robots 2019; 43: 197–211.

23. Torres MAV, Braun, Borrmann. BIM-SLAM: integrating BIM models in multi-session slam for lifelong mapping using 3D LiDAR. arXiv preprint arXiv:2408.15870 (2024).

24. Yuan. MM-LINS: a multi-map LiDAR-inertial system for over-degenerate environments. IEEE Trans Intell Veh 2025; 10: 472–482.

25. Kunze, Hawes, Duckett, et al. Artificial intelligence for long-term robot autonomy: a survey. IEEE Robot Autom Lett 2018; 3: 4023–4030.

26. Walcott-Bryant, Kaess, Johannsson, et al. Dynamic pose graph SLAM: long-term mapping in low dynamic environments. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp.1871–1878.

27. Wang, Guo, Chen, et al. An in-depth examination of slam methods: challenges, advancements, and applications in complex scenes for autonomous driving. IEEE Trans Intell Transp Syst 2025; 26: 11066–11087. DOI: 10.1109/TITS.2025.3545479.

28. Kim, Park, Cho, et al. Mulran: multimodal range dataset for urban place recognition. In: 2020 IEEE international conference on robotics and automation (ICRA), pp.6246–6253. IEEE. DOI: 10.1109/ICRA40945.2020.9197298.

29. Zhao, Guo, Song, et al. A general framework for lifelong localization and mapping in changing environment. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp.3305–3312.

30. Kim. LT-mapper: a modular framework for LiDAR-based lifelong mapping. In: 2022 international conference on robotics and automation (ICRA). IEEE, pp.7995–8002.

31. Kim. Remove, then revert: static point cloud map construction using multiresolution range images. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp.10758–10765. IEEE. DOI: 10.1109/IROS45743.2020.9340856.

32. Kim. Scan context: egocentric spatial descriptor for place recognition within 3D point cloud map. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp.4802–4809.

33. Wang, Zhang, et al. Lidar-camera fusion based on KD-tree algorithm. In: 2022 2nd international conference on electronic information engineering and computer technology (EIECT), pp.410–414. IEEE. DOI: 10.1109/EIECT58010.2022.00086.

34. Ding, Hou, Gao, et al. Lidar inertial odometry aided robust LiDAR localization system in changing city scenes. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp.4322–4328.

35. Zováthi, Nagy, Benedek. Point cloud registration and change detection in urban environment using an onboard LiDAR sensor and MLS reference data. Int J Appl Earth Obs Geoinf 2022; 110: 102767.

36. Fang, Qian, et al. LiLoc: lifelong localization using adaptive submap joining and egocentric factor graph. In: 2025 IEEE international conference on robotics and automation (ICRA), pp.8041–8047. IEEE. DOI: 10.1109/ICRA55743.2025.11127560.

37. Besl, McKay. A method for registration of 3-D shapes. IEEE Trans Pattern Anal Mach Intell 1992; 14: 239–256.

38. Han, Zhang. Grid graph-based large-scale point clouds registration. Int J Digit Earth 2023; 16: 2448–2466.

39. Hornung, Wurm, Bennewitz, et al. OctoMap: an efficient probabilistic 3D mapping framework based on octrees. Auton Robots 2013; 34: 189–206.

40. Al-ma’adeed, Bouridane, Crookes, et al. Partial shoeprint retrieval using multiple point-of-interest detectors and SIFT descriptors. Integr Comput Aided Eng 2015; 22: 41–58.

41. Łażewski, Cyganek. Highly compressed image representation for classification and content retrieval. Integr Comput Aided Eng 2023; 31: 1–18.

42. Bay, Tuytelaars, Van Gool. SURF: speeded up robust features. In: European conference on computer vision. Springer, pp.404–417.

43. Rublee, Rabaud, Konolige, et al. ORB: an efficient alternative to SIFT or SURF. In: 2011 international conference on computer vision, pp.2564–2571. IEEE. DOI: 10.1109/ICCV.2011.6126544.

44. Bai, Yang, Liu, et al. Research on LiDAR vision data fusion algorithm based on improved ORB-SLAM2. In: 2025 4th international conference on artificial intelligence, internet of things and cloud computing technology (AIoTC), pp.174–179. IEEE. DOI: 10.1109/AIoTC66747.2025.11198798.

45. Anebarassane, PDK, et al. Enhancing ORB-SLAM3 with YOLO-based semantic segmentation in robotic navigation. In: 2023 IEEE world conference on applied intelligence and computing (AIC), pp.874–879. IEEE. DOI: 10.1109/AIC57670.2023.10263892.

46. Liu, Zhu, Gao, et al. JL1-CD: a new benchmark for remote sensing change detection and a robust multi-teacher knowledge distillation framework. arXiv preprint arXiv:2502.13407 (2025).

47. Cao, Meng. A new learning paradigm for foundation model-based remote-sensing change detection. IEEE Trans Geosci Remote Sens 2024; 62: 1–12.

48. Fang. Changer: feature interaction is what you need for change detection. IEEE Trans Geosci Remote Sens 2023; 61: 1–11.

49. Chen, et al. Tightly-coupled LiDAR-based lifelong mapping using wheels and a MEMS gyroscope for mobile robots in dynamic environment. In: Proceedings of the 2024 3rd international symposium on robotics, artificial intelligence and information engineering, pp.6–14. ACM.

50. Yuan, Cai, et al. LL-localizer: a lifelong localization system based on dynamic i-octree. IEEE Trans Instrum Meas 2025; 74: 1–11.

51. Shan, Englot. LeGO-LOAM: lightweight and ground-optimized LiDAR odometry and mapping on variable terrain. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp.4758–4765. IEEE. DOI: 10.1109/IROS.2018.8594299.

52. Yan, Ren, et al. Improved nearest neighbor distance ratio for matching local image descriptors. In: International CCF conference on artificial intelligence. Springer, pp.185–197.

53. Xie, Chen, et al. An improved adaptive threshold RANSAC method for medium tillage crop rows detection. In: 2021 6th international conference on intelligent computing and signal processing (ICSP), pp.1282–1286. IEEE. DOI: 10.1109/ICSP51882.2021.9408744.

54. Shan, Englot, Meyers, et al. LIO-SAM: Tightly-coupled lidar inertial odometry via smoothing and mapping. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), 2020, pp.5135–5142.

55. Cai, et al. FAST-LIO2: fast direct LiDAR-inertial odometry. IEEE Trans Robot 2022; 38: 2053–2073.
