
Abstract

The accuracy of agricultural unmanned ground vehicles’ localization directly affects the accuracy of their navigation. However, due to the changeable environment and sparse features in agricultural scenes, it is challenging for these unmanned ground vehicles to localize precisely in global positioning system-denied areas with a single sensor. In this article, we present an efficient and adaptive sensor-fusion odometry framework based on simultaneous localization and mapping to handle the localization problems of agricultural unmanned ground vehicles without the assistance of a global positioning system. The framework leverages three kinds of sub-odometry (lidar odometry, visual odometry and inertial odometry) and automatically combines them depending on the environment to provide accurate pose estimation in real time. The combination of sub-odometry is implemented by trading off the robustness and the accuracy of pose estimation. The efficiency and adaptability are mainly reflected in the novel surfel-based iterative closest point method for lidar odometry we propose, which utilizes a changeable surfel radius range and an adaptive iterative closest point initialization to improve the accuracy of pose estimation in different environments. We test our system in various working zones of agricultural unmanned ground vehicles and on some open data sets, and the results show that the proposed method performs better than state-of-the-art methods, mainly in accuracy, efficiency and robustness.

Keywords

Sensor-fusion SLAM localization agricultural UGV

Introduction

Artificial intelligence (AI) is making agriculture simpler and more efficient, and many unmanned ground vehicles (UGVs) used in agricultural production help reduce farmers’ labor and facilitate farming procedures.¹ Localization is one of the most crucial modules for the navigation of agricultural UGVs in many precision agriculture scenarios, such as cultivating, sowing, fertilizing and watering. A global positioning system (GPS) is widely used in UGVs’ localization to obtain accurate (centimeter-level) longitude, latitude and altitude. However, in some open areas where the signal is blocked by buildings, and in some indoor scenes that GPS satellites cannot cover, it is hard for UGVs to get their poses by relying on GPS alone.² Some indoor localization methods, such as ultra-wideband (UWB)³ and quick response code positioning,⁴ can replace GPS in industrial scenarios, but they need special equipment to be installed at known positions in advance and are therefore not flexible enough for agriculture. An efficient, adaptive and flexible localization method is therefore needed for these UGVs in GPS-denied areas. Lidar, camera and inertial measurement unit (IMU) are common sensors mounted on UGVs to assist localization, and simultaneous localization and mapping (SLAM) is a family of algorithms that processes and integrates the data coming from these sensors to estimate the pose.

SLAM has long been considered a reliable approach to help mobile robots localize and perceive in unknown environments.⁵ Visual-based and lidar-based SLAM methods are the two main categories, and they are often implemented with the assistance of an IMU to increase robustness and accuracy. Lidar-based SLAM can estimate the pose of a mobile robot over a large range owing to the physical properties of a laser beam, but it may fail to give an accurate pose in large planar scenes, where the differences between two consecutive lidar point clouds are too small. Visual-based SLAM can exploit the abundant visual features of the environment and output the position corrected by the IMU; however, it is hard to maintain robustness in light-changing and feature-less environments. To get robust, accurate and real-time information about agricultural UGVs’ 6-degree of freedom (DOF) poses, sensor-fusion odometry that combines the advantages of both SLAM methods is needed to maintain high localization performance in various agricultural environments.

We present the efficient and adaptive lidar–visual–inertial odometry (EALVIO) system, which aims to help UGVs localize in various agricultural environments without GPS and can be used as an extension of GPS to cover most agricultural scenes. Because of the changeable terrain and the lack of feature texture in agricultural scenes, three kinds of sensors are integrated into the odometry system, and three kinds of sub-odometry are automatically selected according to the environmental change. We set principles to check the state of the current environment and each odometry’s current timestamp, ensuring that the odometry can adapt to various scenes efficiently while meeting the need for real-time data processing. To enhance the efficiency and accuracy of pose estimation, we propose a novel surfel-based iterative closest point (ICP) algorithm, in which the surfel radius range changes to get better estimates of UGVs’ motion in various scenes. The ICP algorithm adaptively initializes with pose increments coming from different modules of EALVIO, which yields the different sub-odometry. In well-structured areas, where the surrounding scene contains adequate information so that none of the sensors degrades, the lidar–visual–inertial odometry (LVIO) is used, and the lidar odometry (LO) initializes with the poses provided by visual odometry (VO) with the assistance of IMU prediction. In feature-less areas where VO may fail, the LO estimates poses from an initialization pre-calculated from inertial information, and the lidar–inertial odometry (LIO) takes the place of LVIO. The pure LO always runs at the bottom of the system to provide a 6-DOF pose estimation even in tough environments where visual and inertial information is not usable.
VO in EALVIO is an indirect method based on detecting and matching image features, and visual features can obtain depth information from lidar point clouds to keep a unified metric scale. High-frequency IMU information is used to compensate for the time interval the system cannot cover and serves as an initial guess for LO when the system cold starts without any other available information. Finally, EALVIO can run in real time and output UGVs’ 6-DOF poses at IMU’s frequency. In addition, the system can run at high speed to guarantee real-time exhibition.

In short, EALVIO makes three contributions.

An efficient and adaptive sensor-fusion odometry framework used in agricultural scenes is proposed for agricultural UGVs’ pose estimation, and a series of modules in EALVIO are designed to enhance the robustness in different agricultural environments.

A surfel-based ICP method used for LO utilizing the changeable surfel radius range and the adaptive initialization from three sub-odometry is proposed to maintain efficiency and accuracy.

A series of experiments is conducted to test whether the system achieves the expected efficiency, accuracy and robustness compared with state-of-the-art methods. Ablation studies analyse the performance improvements brought by the corresponding modules in EALVIO, and various agricultural environments are considered in the process of pose estimation.

The rest of this article is organized as follows. The second section mainly introduces the related work about SLAM and sensor-fusion localization systems. The third section shows the entire structure of EALVIO and explains in detail about the modules and the algorithms. The fourth section shows the results and analysis of the experiments. The last section gives the conclusion and future work.

Related work

Recently, sensor-fusion seems to attract more and more researchers’ attention. There are several successful sensor-fusion localization systems based on SLAM, which have been proved to be quite practical in open data sets and real-world testing.⁶ Among these advanced sensor-fusion systems, some are mainly based on lidar. Lightweight and ground-optimized lidar odometry and mapping (Lego-LOAM)⁷ is a lightweight SLAM algorithm, which can output moving robots’ pose by tracking lidar point clouds’ geometry feature, and an IMU is optional to be fused into the system as it can eliminate the error caused by lidar’s spinning motion, which means lidar and IMU are loosely coupled. Lidar–inertial odometry via smoothing and mapping (LIO-SAM)⁸ is a tightly coupled LIO, which can integrate the data from the two sensors efficiently, IMU pre-calculation serves as an initial guess for LO’s pose optimization, the LO is also based on feature-matching of point clouds, and its output can be used to correct IMU’s bias. Visual information could not be neglected and plays a key role in SLAM and sensor-fusion research field, and there are lots of reliable visual–inertial systems during the past decade.⁹ Based on extended Kalman filter (EKF), Bloesch et al. presented a monocular visual–inertial odometry¹⁰ directly using pixel intensity errors of image patches, with the help of EKF, the camera and IMU can yield a closely coupled visual–inertial framework, IMU measurements are used to propagate the state of the filter and then visual measurements are utilized in the update stage. 
Monocular visual-inertial system (VINS-Mono)¹¹ is another typical example of a visual–inertial system using a pose graph, it is a robust and versatile visual–inertial state estimation method to obtain accurate poses of moving robots, a low-cost mono-camera and an IMU are tightly coupled to form a minimum sensor suite and the IMU is used to provide metric scale information¹² to the VO, which is based on visual-structure matching and pose-graph optimization.

However, neither LIO nor visual–inertial odometry can provide complete and accurate 6-DOF poses in changing agricultural environments: lidar degrades in large planar areas, and the camera may suffer from changing weather and feature-less terrain, both of which are common in agricultural scenes. The minimum sensor suite for agricultural UGVs therefore needs to be extended. Zhang and Singh proposed a sensor-fusion SLAM framework¹³ integrating three-dimensional (3-D) lidar, camera and IMU to estimate the ego-motion of mobile devices from coarse to fine using a multilayer processing pipeline; the final pose estimation is obtained through IMU prediction, VO and lidar scan matching sequentially, and a bypass module handles sensor degradation. Lidar–visual–inertial odometry via smoothing and mapping (LVI-SAM)¹⁴ is another multi-sensor-fusion localization and mapping system designed for UGVs and hand-held devices; it is composed of two sub-systems (lidar–inertial and visual–inertial) and stays robust when either of them breaks down. SuperOdometry¹⁵ is a lidar–visual–inertial estimator for perceptually degraded environments, whose IMU-centric sensor-fusion architecture gives accurate pose estimation by exploiting the independence of the IMU measurement. Pose-graph-based methods are commonly used in the systems mentioned above, and some methods¹⁶,¹⁷ use a pose graph to integrate even more types of sensors, including GPS, IMU, odometer, lidar and camera. In addition, a multi-state constraint Kalman filter (MSCKF)¹⁸ framework can also be used to fuse multiple sensors for localization; lidar–inertial–camera fusion¹⁹ and the multi-sensor aided inertial navigation system (MINS)²⁰ are typical representatives that use MSCKF.

These systems leverage at least three kinds of sensors to estimate UGVs’ poses and cover most localization problems in various scenes, but they pay little attention to adaptability to the environment. This shortcoming lies in how the sensors are fused: existing methods are mainly based on the Kalman filter or on nonlinear optimization, and both rely on the uncertainty of the sensors’ measurements. Kalman filter-based methods need the covariance, which measures the uncertainty, to propagate the noise introduced by different sensors during pose estimation over a continuous time period; nonlinear optimization-based methods such as pose-graph optimization also use the covariance to determine the weights of the constraints from the corresponding sensors’ measurements. Since uncertainty is the only factor used to fuse the sensors’ measurements, and the environment is not perceived during sensor fusion, the accuracy of pose estimation varies considerably across scenes.

Due to the explanation above, the localization system needs to be efficient, accurate and robust enough to handle problems challenged by agricultural environments. In EALVIO, we have done some meaningful trade-offs among these factors. Inspired by surfel-based mapping (SuMa)²¹, which is astonishingly efficient LO proposed by Jens Behley and Cyrill Stachniss, an ICP method based on surfel²² is used to drive the LO to meet the efficiency requirement, and we innovatively add a surfel radius range changeable mechanism by perceiving the surroundings to increase accuracy in changing environment. In addition, the ICP initialization is taken from three individual sources adaptively to fuse pose estimation from different sensors. Sensor degradation can be recognized through the current environment’s complexity analysed by visual information and the uncertainty of sensors’ measurement. Different from SuperOdometry,¹⁵ which is an IMU-centric system, several odometry modules are working based on the output of the IMU odometry module, and we choose LO as the fundamental module for the reason that there are many bumpy road conditions in the agricultural environment where the measurement of IMU may drift a lot while the measurement of lidar is relatively stable.

Approach

System overview

The complete framework of EALVIO is shown in Figure 1. We use four frames in this article: W represents the world frame, whose origin is at the start pose of the mobile device, and L, C and I denote the lidar frame, the camera frame and the IMU frame, respectively. A subscript such as k in L_k denotes the index of the received lidar scan. As this is a real-time system, we use t_k to denote the time elapsed from the start of the odometry to the lidar scan L_k. The transformation from frame A to B is denoted as $T_{B A} \in ℝ^{4 \times 4}$ , and it is composed of the rotation matrix $R_{B A} \in S O (3)$ and the translation vector $t_{B A} \in ℝ^{3}$ . A lidar state vector X_k describes the rotation, position and velocity of a lidar mounted on a UGV moving in the world frame

X_{k} = [R_{k} p_{k} v_{k}]

Figure 1.

Overview of EALVIO’s framework. EALVIO: efficient and adaptive lidar–visual–inertial odometry.

As mentioned previously, EALVIO can leverage three kinds of odometry’s advantages along with an environmental change in agricultural scenes. There is an abstract structure described in Figure 2 showing the sub-odometry’s construction in EALVIO. LO, VO and IMU modules can be combined arbitrarily to construct LO, LIO and LVIO. The lower module could provide initial pose guessing for the adjacent upper module, and this combining process can be found in the supplementary video.

Figure 2.

Structure of sub-odometry in EALVIO. EALVIO: efficient and adaptive lidar–visual–inertial odometry.

The following contents will introduce briefly the processing procedure of the raw data from the different sensors and the function of each module in the framework of EALVIO:

The visual-feature detection module receives raw images from the camera and detects the features in these images, then visual features are sent to the VO and the number of them is used to analyse the complexity of the current environment by the environment judgement module.

The surfel generation module receives raw lidar points from the lidar and generates surfels of the current lidar scan, and the radius range of these surfels generated will be affected and modified according to the output of the environment judgement module and the lidar state from the LO. The lidar points are also projected into the camera frame to provide feature-depth information for VO.

The IMU pre-calculation module receives an inertial message from the IMU and provides a coarse initial pose transformation for LO, the inertial message is also leveraged in the IMU integration module to be fused with the output of LO, and the fused pose estimation serves as the initialization of VO.

The VO leverages visual features with depth information and the initialization provided by IMU integration to compute an optimized pose transformation, and the output will be finally sent to LO as an initialization after timestamp checking and correction.

The LO module is the core of EALVIO as it combines the output of other modules and gives the final odometry, it maintains and updates a local map constructed by surfels and continually performs ICP based on surfels, which have changeable radius range in different environments, the initialization of ICP is chosen from different modules in different scenes according to a series of principles and the final output serves as the feedback for the surfel generation module.

Surfel generation with a changeable radius range

A 3-D lidar receives millions of reflected points per second, and it is difficult to process these 3-D points and obtain a relatively accurate pose estimation in real time. Efficiency is therefore the first consideration in making LO practical. We leverage surfels to reduce the time complexity of ICP-based point cloud registration; in addition, the surfel radius range is changeable so that it adapts to the current environment according to the lidar state and the visual-feature detection module. Given the 3-D positions of the received points in the lidar frame, the corresponding surfels are generated according to the following procedure.

Firstly, a projection function $\prod$ is used to project the points $p_{i}^{L} \in ℝ^{3}$ in the current lidar frame into a spherical coordinate to get two-dimensional descriptions $V_{i} \in ℝ^{2}$ of these points. We also store a vertex map V to search for the corresponding $p_{i}^{L}$ of each $V_{i}$ . w and h are the data width and height of the lidar scan, respectively

\begin{array}{l} V_{i} = \prod (x_{i}, y_{i}, z_{i}) = (u_{i}, v_{i}) = \\ \left( \frac{w}{2} \cdot (1 - \arctan (y_{i}, x_{i})), \; h \cdot (1 - \arcsin (z_{i} \cdot {‖ p_{i}^{L} ‖}^{- 1})) \right) \end{array}

V = {V_{i}, p_{i}^{L}}
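The projection and the vertex map can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `project_scan`, the normalization of the azimuth by π, and the vertical field-of-view parameters `fov_up_deg` and `fov_down_deg` are assumptions borrowed from common spherical range-image projections, since the formula above leaves the angular normalization implicit.

```python
import numpy as np

def project_scan(points, w, h, fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project lidar points (N, 3) to pixel coordinates and build a vertex map.

    Assumption: azimuth is normalized by pi and elevation by the vertical
    field of view; the default field-of-view values are placeholders.
    """
    points = np.asarray(points, dtype=float)
    fov_down = np.radians(fov_down_deg)
    fov = np.radians(fov_up_deg) - fov_down

    depth = np.linalg.norm(points, axis=1)
    valid = depth > 0.0                      # skip degenerate points at the origin
    pts, depth = points[valid], depth[valid]
    yaw = np.arctan2(pts[:, 1], pts[:, 0])
    pitch = np.arcsin(pts[:, 2] / depth)

    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch - fov_down) / fov) * h

    # Clamp to valid pixel indices.
    ui = np.clip(np.floor(u), 0, w - 1).astype(int)
    vi = np.clip(np.floor(v), 0, h - 1).astype(int)

    # Vertex map: one 3-D point per pixel, NaN where nothing was observed.
    vertex_map = np.full((h, w, 3), np.nan)
    # Visit far points first so that the nearest point is kept at each pixel.
    for i in np.argsort(-depth):
        vertex_map[vi[i], ui[i]] = pts[i]
    return np.stack([ui, vi], axis=1), vertex_map
```

The returned vertex map plays the role of V: indexing it with (v, u) recovers the 3-D point behind a pixel.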

Then, a surfel s_i can be generated for each $V_{i}$ . Each s_i consists of a normal $n_{s_{i}} \in ℝ^{3}$ and a radius $r_{s_{i}} \in ℝ$ . The normal $n_{s_{i}}$ depends on the local position gradient around $p_{i}^{L}$ , and we store a normal map N to look up the corresponding $n_{s_{i}}$ of each $V_{i}$

\begin{array}{l} n_{s_{i}} = (V ((u_{i} + 1, v_{i})) - V ((u_{i}, v_{i}))) \times \\ (V ((u_{i}, v_{i} + 1)) - V ((u_{i}, v_{i}))) \end{array}

r_{s_{i}} = \frac{k \cdot ‖ V (s_{i}) ‖}{{(V (s_{i}) \cdot {(‖ V (s_{i}) ‖)}^{- 1})}^{T} \cdot n_{s_{i}}}

where k is a constant that makes the surfels cover the current lidar scanning range. $r_{s_{i}}$ is restricted to a range $r_{s_{min}} \leq r_{s_{i}} \leq r_{s_{max}}$ , which can be adjusted along with the environmental change. The radius is an indicator of the accuracy of the lidar measurement, as it reflects the difference between the laser beam angle and the local surface normal. In addition, the surfel radius range is affected by the lidar state: the ICP-based²³ LO may take a long time to converge when the lidar undergoes a sudden rotation. Since the measurements are less reliable in these unstable cases, we increase the upper limit of the surfel radius range to maintain LO’s robustness

r_{s_{max}} = \begin{cases} r_{{init}_{max}}, & \text{if } roll, pitch, yaw < threshold \\ r_{{init}_{max}} + p_{1} \cdot roll + p_{2} \cdot pitch + p_{3} \cdot yaw, & \text{otherwise} \end{cases}

where $p_{1}$ , $p_{2}$ and $p_{3}$ are the constant coefficients corresponding to $roll$ , $pitch$ and $yaw$ of the lidar pose X_t coming from LO. $p_{3}$ has a larger weight than $p_{1}$ and $p_{2}$ because $yaw$ indicates the heading direction of the UGV, so its estimation has a greater influence than that of $roll$ and $pitch$ when agricultural UGVs move on an open field.
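The rotation-dependent upper limit can be written as a small function. The sketch below is illustrative only: the names `max_surfel_radius` and `clamp_radius` are hypothetical, the default coefficients are placeholders (the text only states that the yaw coefficient is the largest), and comparing the angles by magnitude is an assumption.

```python
def max_surfel_radius(r_init_max, roll, pitch, yaw, threshold, p=(0.1, 0.1, 0.3)):
    """Upper limit of the surfel radius range given the rotation reported by LO.

    Assumptions: angles are compared by magnitude, and the values in `p`
    are placeholders chosen so that the yaw coefficient is the largest.
    """
    angles = (abs(roll), abs(pitch), abs(yaw))
    if all(a < threshold for a in angles):
        return r_init_max
    p1, p2, p3 = p
    return r_init_max + p1 * angles[0] + p2 * angles[1] + p3 * angles[2]


def clamp_radius(r, r_min, r_max):
    """Keep a surfel radius inside the current range [r_min, r_max]."""
    return max(r_min, min(r, r_max))
```

With these placeholder coefficients, a yaw of 0.2 rad above the threshold raises an initial limit of 0.5 m to about 0.56 m, while a nearly static lidar keeps the initial limit.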

Meanwhile, the surfel radius range is further modified along with the environmental change to generate suitable surfels for LO in different scenes, and this mechanism is implemented in the visual-feature detection module. The visual-feature detection module has two main uses in the EALVIO system. Firstly, it continuously performs corner detection²⁴ in the current keyframe to pick stable and uniformly distributed visual features for the VO; keyframes are selected depending on the number of stable visual features and the parallax between the current keyframe and the last keyframe. Secondly, it interacts with the surfel generation module: visual information serves as a heuristic for adjusting the surfel radius range in various scenes, as it contains more details about the surroundings than the lidar point clouds. As discussed above, the surfel radius represents the stability of the lidar measurement. The surfel radius range is increased in feature-less environments because ICP is hard to converge in these scenes and surfels need to be updated frequently according to (8) at runtime to handle LO’s degradation. Conversely, we decrease the surfel radius range in well-structured scenes. Figure 3 illustrates this behaviour. To maintain real-time performance, the number of visual features is capped at a maximum N. N_t is the number of valid visual features in the frame received at timestamp t, where valid visual features are those that obtain depth from the lidar points projection module. The environment state judgement module is activated once every fixed time period $Δ t$ to compute the surfel radius range at the current timestamp. Supposing that the camera captures p frames per second, its strategy is described below

r_{s_{max}} = \frac{Δ t \cdot p}{\sum_{t_{0}}^{t_{0} + Δ t} \frac{N_{t}}{N}} \cdot r_{{init}_{max}}
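A direct transcription of this rule into Python is given below as a hedged sketch. The function name `max_radius_from_features` is hypothetical, the list `valid_counts` is assumed to hold one N_t per frame received in the window (so its length stands for Δt · p), and returning infinity when no valid feature was seen is an assumption that a practical system would replace with a cap.

```python
def max_radius_from_features(valid_counts, n_max, r_init_max):
    """r_max = (number of frames) / sum(N_t / N) * r_init_max.

    valid_counts: N_t for every frame in the window of length delta_t.
    n_max:        the cap N on the number of visual features.
    """
    if n_max <= 0:
        raise ValueError("n_max must be positive")
    ratio_sum = sum(min(n_t, n_max) / n_max for n_t in valid_counts)
    if ratio_sum == 0:
        return float("inf")   # assumption: no valid feature in the window
    return len(valid_counts) / ratio_sum * r_init_max
```

When every frame reaches the cap N, the limit stays at its initial value; when only half of the features are valid on average, the limit doubles, which matches the intent of enlarging the range in feature-less scenes.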

Figure 3.

Surfels generated by EALVIO in different scenes. The upper pictures show the surfels generated in the corresponding lower scenes. The color represents the stability of surfels, yellow means stable, whereas green means unstable. The left column is a well-structured scene, and the right column is a feature-less scene. We can see that there exist more stable surfels in the right scene so we need to increase the surfel radius range for a better update of stable surfels. EALVIO: efficient and adaptive lidar–visual–inertial odometry.

A surfel map is composed of the surfels of recent lidar scans, and a map updating strategy updates old surfels and adds new surfels to maintain a stable local map. The corresponding surfels of a lidar scan and the local map are matched using the vertex map V and the normal map N. When a pair of matching surfels is found, we compare their radii; if the new surfel radius $r_{s_{new}}$ is larger than the current surfel radius $r_{s_{cur}}$ , which means the new measurement of the surfel is more reliable, the corresponding $V_{s}$ in the vertex map and n_s in the normal map are replaced with the new ones

V_{s} = V_{s_{new}}, \; n_{s} = n_{s_{new}}, \; r_{s} = r_{s_{new}}, \quad \text{if } r_{s_{new}} > r_{s_{cur}}
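The update rule amounts to a single comparison per matched pair. A minimal Python sketch follows; the `Surfel` container and the function name `update_surfel` are hypothetical and only illustrate the rule stated above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Surfel:
    vertex: Tuple[float, float, float]
    normal: Tuple[float, float, float]
    radius: float


def update_surfel(current: Surfel, new: Surfel) -> Surfel:
    """Return the surfel to keep in the map, following the radius rule above."""
    return new if new.radius > current.radius else current
```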

Table 1 presents the translation and rotation error of poses estimated by the SuMa system, which uses a fixed surfel radius range in the KITTI odometry data set²⁵; as the surfel radius range varies, the final accuracy will change a lot, which indicates that an unsuitable surfel radius range may weaken the performance of LO using surfels. To this end, as an adaptive sensor-fusion odometry, the surfel radius range changing mechanism is necessary for EALVIO since there is no need to tune the range in different environments manually.

Table 1.

Average translation and rotation errors of poses evaluated by SuMa with different maximal surfel radius range in KITTI odometry data set.

KITTI 00
Maximal surfel radius range (m)	0.1	0.2	0.3	0.4	0.5
Average sequence translation RMSE (%)	1.332	0.880	0.772	0.736	0.662
Average sequence rotation error (deg/m)	0.006	0.004	0.003	0.003	0.002
KITTI 01
Maximal surfel radius range (m)	0.1	0.2	0.3	0.4	0.5
Average sequence translation RMSE (%)	72.231	59.552	14.833	16.627	3.48
Average sequence rotation error (deg/m)	0.027	0.016	0.007	0.009	0.005

RMSE: root mean square error.

Visual odometry

We leverage the tightly coupled framework of VINS¹¹ to handle visual information and inertial information. The details of the camera pose estimation can be seen in their article. Here, we mainly introduce how the visual–inertial odometry cooperates with other modules of EALVIO.

It is known that pure VO suffers from scale drift¹² due to the lack of feature depth information. An IMU can help solve the problem, as it provides 6-DOF pose transformation over a short time period, but IMU bias and white noise are introduced into the system at the same time. Although VINS tries to eliminate the effect of the IMU’s uncertainty, it is still challenging to offer a good initialization for the visual–inertial odometry with a large IMU bias. In EALVIO, the VO uses LO’s output integrated with raw IMU information as an initialization, because LO provides a stable pose estimation in different scenes. We use a factor graph²⁶ to integrate LO and the IMU measurement in the IMU integration module; the merged pose information contains the transformation between the two most recent consecutive images and the bias of the IMU measurement, which are required by the VINS system. VINS maintains a sliding window composed of a series of state vectors to perform visual–inertial alignment and bundle adjustment. The full-state vector $χ$ can be written as follows

χ = [x_{0}, x_{1}, \dots x_{n}, T_{C I}, λ_{0}, λ_{1}, \dots λ_{m}]

x_{k} = [p_{k}^{W}, v_{k}^{W}, q_{k}^{W}, b_{a_{k}}, b_{ω_{k}}]

$x_{k} (k \in n)$ is the IMU state vector, including the IMU position, velocity and orientation in the world frame and the biases in the IMU frame, where n is the number of keyframes. The initial x_k can be obtained from LO integrated with inertial information. $T_{C I}$ is the transformation matrix from the IMU frame to the camera frame, following the $T_{B A}$ convention defined above. $λ_{i} (i \in m)$ is the inverse depth²⁷ of a visual feature, where m is the number of visual features in the sliding window. The depth information can be initialized with the assistance of the lidar points projection module.

When a visual picture comes at a time instant t, the position of the visual feature in the camera frame can be obtained. Since the visual feature only has a two-dimensional position, we reuse the vertex map V_M provided by the surfel generation module to perform feature-depth association. We set the z value of the visual feature to a unit and then project the visual features from the camera frame to the lidar frame through the transformation matrix $T_{L_{t} C_{t}}$

T_{L_{t} C_{t}} = T_{L I} \cdot T_{C_{t} I}^{- 1}

$T_{C_{t} I}$ is the online-calibrated transformation matrix derived from the state vector $χ$ in (9), and $T_{L I}$ is a constant matrix calibrated in advance, representing the pose transformation from IMU to lidar. As explained in Kelly and Sukhatme,²⁸ the precise calibration of the transformation between multiple sensors is crucial to the robustness of pose estimation in sensor-fusion systems, and offline calibration methods are usually inconvenient to implement and may introduce additional errors. To minimize the error caused by offline calibration, in EALVIO, we only calibrate the pose transformation between lidar and IMU manually and utilize the online-calibrated $T_{C_{t} I}$ .

After calculating by (11), we get the position $f_{i} (x_{i}, y_{i}, z_{i})$ of each visual feature in the lidar frame. To match the vertex map V_M to acquire depth information, f_i needs to be transformed into the sphere coordinate using the projection function $\prod$ mentioned in (2). Then, we search V_M to find the corresponding lidar points of feature f_i and obtain the inverse depth $λ_{i}$ . Figure 4 describes the association process in detail.
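The association steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `associate_depth` is a hypothetical name, `project` stands in for the projection function and is expected to return integer pixel indices, the vertex map is assumed to be already expressed in the current lidar frame, and the lidar range is used directly as the feature depth, which ignores the offset between the two sensors.

```python
import numpy as np

def associate_depth(feature_uv1, T_LC, vertex_map, project):
    """Look up the inverse depth of one visual feature in a lidar vertex map.

    feature_uv1: normalized camera coordinates (x, y, 1), i.e. depth set to one.
    T_LC:        4x4 transform taking camera-frame points into the lidar frame.
    vertex_map:  (h, w, 3) array of lidar points, NaN where empty.
    project:     callable mapping a 3-D lidar-frame point to integer (u, v).
    Returns the inverse depth, or None if no lidar point is stored there.
    """
    f_cam = np.append(np.asarray(feature_uv1, dtype=float), 1.0)  # homogeneous
    f_lidar = (np.asarray(T_LC, dtype=float) @ f_cam)[:3]
    u, v = project(f_lidar)
    h, w, _ = vertex_map.shape
    if not (0 <= u < w and 0 <= v < h):
        return None
    point = vertex_map[v, u]
    if np.isnan(point).any():
        return None
    depth = np.linalg.norm(point)   # assumption: lidar range used as depth
    return 1.0 / depth if depth > 0 else None
```

Features that fall on an empty pixel return `None`; in the terminology above they would not count as valid visual features.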

Figure 4.

Feature depth association. (a) Visual feature in the camera frame, we set the depth to the unit. (b) The visual feature transformed from the camera frame to the lidar frame. (c) The visual features are projected into sphere coordinate to find the corresponding depth in V_M , and V_M is transformed from W to L_t in advance using $T_{W L_{k}}$ from LO.

A non-linear bundle adjustment based on probability is performed to minimize the residuals of visual and IMU measurements, and the VO can finally provide a maximum posterior pose estimation together with a timestamp for the LO module as an initialization. Since LO and VO are asynchronous in output frequency, we need to leverage the timestamp checking and correction module to modify the VO output finely. If the VO provides an initialization $T_{L_{k} L_{k - 1}}$ at the timestamp $t_{k}^{'}$ , but the latest lidar scan L_k is actually received at the timestamp t_k , we do the timestamp checking and correction to correct $T_{L_{k} L_{k - 1}}$ according to the following principle

T_{L_{k} L_{k - 1}} = (1 + \frac{t_{k} - t_{k}^{'}}{t_{k} - t_{k - 1}}) \cdot T_{L_{k} L_{k - 1}}
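A small Python sketch of this correction is shown below. Because multiplying a 4×4 homogeneous matrix by a scalar does not in general yield a valid rigid transformation, the sketch represents the increment as a 6-vector (rotation vector followed by translation) and scales that instead; this representation, the implied constant-velocity assumption and the name `correct_increment` are assumptions that are not stated in the text.

```python
import numpy as np

def correct_increment(increment, t_k, t_k_prime, t_k_minus_1):
    """Rescale a VO pose increment so that it ends at the lidar timestamp t_k.

    increment:   6-vector (rotation vector, translation); an assumed
                 representation that can be scaled linearly.
    t_k_prime:   timestamp at which VO produced the increment.
    t_k_minus_1: timestamp of the previous lidar scan.
    """
    if t_k == t_k_minus_1:
        raise ValueError("consecutive lidar scans must have distinct timestamps")
    factor = 1.0 + (t_k - t_k_prime) / (t_k - t_k_minus_1)
    return factor * np.asarray(increment, dtype=float)
```

If VO reports its result slightly before the lidar scan arrives, the factor is slightly above one and the increment is extrapolated; if both timestamps coincide, the increment is returned unchanged.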

Surfel-based LO with adaptive ICP initialization

We leverage a surfel map to perform frame-to-model ICP to estimate 6-DOF lidar poses during the time period between two consecutive lidar scans. The ICP cost function E is based on point-to-plane errors

\begin{array}{l} E = \sum_{u \in V_{t}} (N_{M} (\prod (T_{L_{k - 1} L_{k}}^{n} u)))^{T} \cdot \\ (T_{L_{k - 1} L_{k}}^{n} \cdot u - (V_{M} (\prod (T_{L_{k - 1} L_{k}}^{n} u))))^{2} \end{array}

A surfel map contains a vertex map V_M and a normal map N_M of recent lidar scans. The superscript n denotes the ICP iteration index, and V_t represents the vertex map of the current scan. We iteratively optimize the increment $T_{L_{k - 1} L_{k}}^{n}$ using the Gauss–Newton method to minimize the point-to-plane error. Since a good ICP initialization greatly reduces the number of ICP iterations, the initial increment is chosen automatically, depending on the environment, among three individual sources: VO, the IMU pre-calculation module and the last pose increment.
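To make the cost function concrete, the following sketch evaluates the point-to-plane error for a given estimate of the increment. It assumes that the projective data association has already been carried out, so that each scan point comes with its matched map vertex and unit normal; the function name `point_to_plane_error` is hypothetical, and the Gauss–Newton update itself is not shown.

```python
import numpy as np

def point_to_plane_error(T, src_points, map_vertices, map_normals):
    """Sum of squared point-to-plane residuals for given correspondences.

    T:            4x4 current estimate of the pose increment.
    src_points:   (N, 3) points of the current scan.
    map_vertices: (N, 3) matched map points (association done elsewhere).
    map_normals:  (N, 3) unit normals of the matched map surfels.
    """
    T = np.asarray(T, dtype=float)
    src = np.asarray(src_points, dtype=float)
    transformed = src @ T[:3, :3].T + T[:3, 3]
    diff = transformed - np.asarray(map_vertices, dtype=float)
    # Project each difference vector onto the normal of its matched surfel.
    residuals = np.einsum("ij,ij->i", np.asarray(map_normals, dtype=float), diff)
    return float(np.sum(residuals ** 2))
```

Only displacement along the surface normal contributes to the error, so sliding a point within its matched plane leaves the cost unchanged. This is one way to see why large planar scenes constrain the pose weakly.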

In well-structured environments, a reasonable pose increment given by the VO in (12) is the best choice to initialize ICP, whereas in feature-less areas, where VO may degrade, the IMU pre-calculation module provides a coarse pose increment for the ICP initialization using the high-frequency IMU raw data. Firstly, we estimate the lidar velocity by utilizing $v_{k - 1}$ in lidar state $X_{k - 1}$ denoted in (1) of the last lidar scan $L_{k - 1}$ from the LO’s last output

v_{k - 1} = \frac{T_{L_{k - 2} L_{k - 1}}}{t_{k - 1} - t_{k - 2}}

Then, the increment between the current lidar scan L_k and the last lidar scan $L_{k - 1}$ is calculated using the initial velocity $v_{k - 1}$ and IMU measurement, which contains the gyroscope ${\hat{ω}}_{t}$ and accelerometer ${\hat{a}}_{t}$ at the timestamp t

{\hat{ω}}_{t} = ω_{t} + b_{ω_{t}} + n_{ω}

{\hat{a}}_{t} = a_{t} + b_{a_{t}} + n_{a}

where $n_{ω}$ and n_a are additive noise terms modeled as Gaussian white noise. $b_{ω_{t}}$ and $b_{a_{t}}$ are the gyroscope bias and the accelerometer bias, respectively, each modeled as a random walk. The biases can be obtained from the IMU state vector (10) optimized by VO.

The increment is defined as $T_{L_{k - 1} L_{k}} \in ℝ^{4 \times 4}$ , and it is composed of the rotation part $R_{L_{k - 1} L_{k}} \in ℝ^{3 \times 3}$ and the translation part $t_{L_{k - 1} L_{k}} \in ℝ^{3}$ . Since the IMU output frequency is much higher than that of LO, plenty of IMU messages are available in the time period between two consecutive lidar scans, and the two parts are obtained by integrating them

R_{L_{k - 1} L_{k}} = EulerToRotationMat (\int_{t_{k - 1}}^{t_{k}} ω_{t} d t)

t_{L_{k - 1} L_{k}} = \int_{t_{k - 1}}^{t_{k}} v_{k - 1} d t + \iint_{t_{k - 1}}^{t_{k}} a_{t} d t^{2}
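The two integrals can be approximated by summing over the discrete IMU samples that arrive between two lidar scans. The sketch below is only an illustration under several assumptions that the text does not state: the samples are already bias-corrected, the accelerations are gravity-compensated and expressed in the frame of $L_{k - 1}$, and the Euler angles follow a yaw-pitch-roll (ZYX) convention. All function names are hypothetical.

```python
import math


def euler_to_rotation_mat(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def imu_precalculation(v_prev, samples):
    """Integrate IMU samples between two lidar scans.

    v_prev:  initial velocity v_{k-1} as a 3-vector (m/s).
    samples: list of (omega, accel, dt) with angular rate (rad/s),
             acceleration (m/s^2) and sample period dt (s).
    Returns (R, t): 3x3 rotation and 3-vector translation of the increment.
    """
    angles = [0.0, 0.0, 0.0]          # integral of omega over time
    vel_from_accel = [0.0, 0.0, 0.0]  # integral of accel over time
    t = [0.0, 0.0, 0.0]
    for omega, accel, dt in samples:
        for i in range(3):
            angles[i] += omega[i] * dt
            # Position gained in this step: velocity term plus acceleration term.
            t[i] += (v_prev[i] + vel_from_accel[i]) * dt + 0.5 * accel[i] * dt * dt
            vel_from_accel[i] += accel[i] * dt
    return euler_to_rotation_mat(*angles), t


# 1 s of data at 100 Hz: constant yaw rate of 90 deg/s, 1 m/s^2 along x.
samples = [((0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0), 0.01)] * 100
R, t = imu_precalculation((2.0, 0.0, 0.0), samples)
```

For this constant-acceleration example the translation along x is 2 × 1 + 0.5 × 1 × 1² = 2.5 m, and the rotation is a 90° yaw.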

$T_{L_{k - 1} L_{k}}$ from IMU pre-calculation can serve as an initial guess for the LO in stable cases. However, some low-cost IMUs are sensitive to shock and lose accuracy in bumpy agricultural scenes, where UGVs may suffer sharp and frequent shocks; when the bias changes too much, the IMU measurements must be repropagated, which costs additional time.²⁹ Hence, we discard the results calculated from IMU data in these unstable cases and use the last stable increment $T_{L_{k - 2} L_{k - 1}}$ as the initialization.

The following criteria are used to decide which pose increment serves as the initialization of ICP. Here, N_t is the number of stable visual features reported by the visual-feature detection module, which is used to measure the quality of VO’s output, $t h_{N_{t}}$ is the corresponding threshold, and $t h_{ω}$ and $t h_{a}$ are the constant thresholds of the maximum variation of bias during a unit time period

① $N_{t} > t h_{N_{t}}$

② $b_{ω}^{k} - b_{ω}^{k - 1} < \frac{t h_{ω}}{t_{k} - t_{k - 1}} \lor b_{a}^{k} - b_{a}^{k - 1} < \frac{t h_{a}}{t_{k} - t_{k - 1}}$

T_{L_{k - 1} L_{k}} = \begin{cases} T_{L_{k - 1} L_{k}} \text{ from visual odometry}, & \text{if } ①② \\ T_{L_{k - 1} L_{k}} \text{ from IMU pre-calculation}, & \text{if } \neg ①② \\ T_{L_{k - 2} L_{k - 1}} \text{ from lidar odometry}, & \text{otherwise} \end{cases}
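The selection rule can be sketched as a small decision function. This is one possible reading of the criteria, not the authors' implementation: the second case is read as "criterion ① fails and criterion ② holds", the bias variations are treated as scalar magnitudes, and all names and threshold values are hypothetical.

```python
def bias_is_stable(db_gyro, db_accel, dt, th_gyro, th_accel):
    """Criterion 2 as written in the text.

    db_gyro, db_accel: magnitude of the bias change between two scans
                       (treated as scalars here, which is an assumption).
    dt:                time between the two scans (s).
    """
    return db_gyro < th_gyro / dt or db_accel < th_accel / dt


def select_icp_initialization(n_stable, th_n, bias_stable):
    """Return the source of the initial increment: 'VO', 'IMU' or 'LO'."""
    visual_ok = n_stable > th_n  # criterion 1: enough stable visual features
    if visual_ok and bias_stable:
        return "VO"   # increment from visual odometry
    if not visual_ok and bias_stable:
        return "IMU"  # increment from IMU pre-calculation
    return "LO"       # reuse the last stable lidar odometry increment


# Hypothetical thresholds, for illustration only.
stable = bias_is_stable(db_gyro=0.001, db_accel=0.5, dt=0.1,
                        th_gyro=0.01, th_accel=0.02)
choice_structured = select_icp_initialization(120, 50, stable)
choice_featureless = select_icp_initialization(10, 50, stable)
choice_shock = select_icp_initialization(10, 50, False)
```

Under this reading, the VO increment is only used when the IMU bias is also stable, which matches the later remark that the VO in EALVIO cannot work without IMU integration.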

Finally, $T_{L_{k - 1} L_{k}}$ is optimized further by minimizing the point-to-plane error in (13), and the pose of lidar scan L_k in the world frame W is computed as the LO’s output through the following equation. As the ICP method is initialized with the increment from different sources, LVIO, LIO and LO are combined adaptively to handle different situations

T_{W L_{k}} = T_{W L_{k - 1}} \cdot T_{L_{k - 1} L_{k}}
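As a small worked example of this pose update, the sketch below chains a previous world pose with an increment by multiplying 4 × 4 homogeneous matrices. The matrices are made-up values chosen so that the result can be checked by hand.

```python
def compose(A, B):
    """Matrix product of two 4x4 homogeneous transforms (nested lists)."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


# Previous pose: located at (1, 0, 0) in the world frame, yawed by 90 degrees.
T_W_prev = [[0.0, -1.0, 0.0, 1.0],
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# Increment: 2 m forward along the local x axis, no rotation.
T_increment = [[1.0, 0.0, 0.0, 2.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]]

T_W_curr = compose(T_W_prev, T_increment)
position = [T_W_curr[r][3] for r in range(3)]
```

Because the previous pose is yawed by 90°, the 2 m of forward motion in the local frame appears as motion along the world y axis, so the new position is (1, 2, 0).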

Experiments

We implement EALVIO in C++ with ROS Kinetic on a low-cost laptop with an Intel i5-9300H CPU @ 2.4 GHz and 8 GB of RAM. The data sets are taken from two sources: one was gathered by LVI-SAM’s¹⁴ author using a Velodyne-16 lidar, a fish-eye camera and a MicroStrain IMU, and the other was gathered by ourselves using a RoboSense-16 lidar, a mono camera and an LMPS IMU in several agricultural environments. We run EALVIO on these data sets to evaluate the accuracy of the estimated poses and the efficiency and robustness of the system in real time. In addition, Lego-LOAM, LIO-SAM and LVI-SAM are taken as the main comparative sensor-fusion odometry systems.

Ablation study

To validate the methods and the strategies we presented in EALVIO, the following experiments are done to analyse the corresponding function and performance.

Ablation study of changeable surfel radius range

We disable the surfel radius range changing mechanism in EALVIO and use a fixed range for a contrast test on the garden data set, which was collected by LVI-SAM’s author with handheld devices in a large garden where the GPS signal is poor. The bird’s-eye view of the trajectories estimated by different methods is shown in Figure 5. Since the data set starts and stops at the same position and GPS ground truth is not available, the end-to-end translation and rotation errors are computed to evaluate the accuracy in Table 2. We found that the surfel radius range changing mechanism enhances EALVIO’s adaptability to environmental change and reduces the translation error by 82% (from 45.87 m to 8.34 m) in the long-term, 2 km garden data set. The change of the surfel radius range during run time in the garden data set is shown in Figure 6; as the surrounding environment changes, the surfel generation module of EALVIO can always choose a suitable surfel radius range with the support of the visual-feature detection module and the feedback of the lidar state. With these adaptive surfels, the ICP method of LO obtains better estimates of agricultural UGVs’ poses.
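When a trajectory starts and stops at the same place, the end-to-end error is simply the gap between the first and the last estimated pose. The text does not spell out how the rotation error is computed, so the sketch below assumes the common choice of the angle of the relative rotation between the two poses; the function name and the sample poses are hypothetical.

```python
import math


def end_to_end_errors(T_start, T_end):
    """Translation (m) and rotation (degree) gap between two 4x4 poses."""
    diff = [T_end[i][3] - T_start[i][3] for i in range(3)]
    translation_error = math.sqrt(sum(d * d for d in diff))
    # Trace of the relative rotation R_start^T * R_end.
    trace = sum(T_start[k][i] * T_end[k][i] for i in range(3) for k in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return translation_error, math.degrees(math.acos(cos_angle))


T_first = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]

# Made-up final pose: drifted by (3, 4, 0) m and yawed by 90 degrees.
T_last = [[0.0, -1.0, 0.0, 3.0],
          [1.0, 0.0, 0.0, 4.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

trans_err, rot_err = end_to_end_errors(T_first, T_last)
```

In this made-up case the translation error is 5 m and the rotation error is 90°.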

Figure 5.

Trajectories of garden data set evaluated by different methods.

Table 2.

End-to-end translation and rotation errors of garden data set.

Error type	EALVIO (complete)	EALVIO without surfel radius range changing	EALVIO without visual odometry	EALVIO without IMU	LVI-SAM without loop closing	LIO-SAM without loop closing	Lego-LOAM without loop closing
Estimated length (m)	2083.21	2079.04	2150.85	2203.03	2085.53	2137.52	2315.77
Translation error (m)	8.34	45.87	100.85	246.88	32.18	79.15	83.61
Rotation error (degree)	6.32	7.91	10.73	14.52	7.66	7.33	8.53

EALVIO: efficient and adaptive lidar–visual–inertial odometry; IMU: inertial measurement unit; LVI-SAM: lidar–visual–inertial odometry via smoothing and mapping; LIO-SAM: lidar–inertial odometry via smoothing and mapping; Lego-LOAM: lightweight and ground-optimized lidar odometry and mapping.

The bold emphasis represents the optimal value of the corresponding index.

Figure 6.

The surfel radius range changing process of EALVIO as time flows of the garden data set. The pictures are the visual features at the corresponding time; the green points represent the stable features, whereas the red ones are unstable. From the pictures, we can see that the more unstable visual features could result in a bigger surfel radius range. EALVIO: efficient and adaptive lidar–visual–inertial odometry.

Ablation study of VO

The VO is disabled to analyse the contribution of visual information to EALVIO, which effectively turns the system into LIO. The trajectories estimated by EALVIO without VO can be found in Figure 5, and the corresponding end-to-end errors are presented in Table 2. EALVIO without VO shows a much larger translation error than the complete framework (100.85 m versus 8.34 m) because VO offers better pose estimates for the initialization of LO in well-structured environments containing adequate stable visual features. VO plays a key role in localization, especially in agricultural scenes where the ground is bumpy and geometric features are few.

Ablation study of IMU

Both IMU and VO are disabled in the ablation study of IMU because the VO in EALVIO cannot work on its own without IMU integration; the system is thus simplified to pure LO. As the LO cannot get a reliable initialization for surfel-based ICP in scenes where LO may degrade a lot, the accumulated translation and rotation errors of the LO may become unacceptably large. On the other hand, the pure LO does not break down during the whole estimation procedure in the garden data set; hence, it is a good choice to run at the bottom of EALVIO to maintain robustness.

Comparison with state-of-art methods

The results of Lego-LOAM, LIO-SAM and LVI-SAM are also used for comparison; loop closure in these methods is disabled because loops may not be detected in many agricultural environments. Lego-LOAM represents loosely coupled lidar–inertial fusion systems, and LIO-SAM represents tightly coupled odometry; in the garden data set, they achieve similar accuracy, which is lower than that of LVI-SAM and EALVIO due to the lack of visual information, and they may also fail to estimate the sudden and frequent rotations of UGVs. Compared with LVI-SAM without loop closure, which is the newest state-of-art algorithm, the complete EALVIO reduces the translation error by 74.08% and the rotation error by 17.49% in the garden data set.
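The reported percentages follow directly from the entries of Table 2. The short check below recomputes them as relative reductions; the helper name is hypothetical, and the numbers are copied from the table.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0


# End-to-end errors on the garden data set (Table 2).
translation_vs_lvi_sam = relative_reduction(32.18, 8.34)        # metres
rotation_vs_lvi_sam = relative_reduction(7.66, 6.32)            # degrees
translation_vs_fixed_radius = relative_reduction(45.87, 8.34)   # metres

print(f"{translation_vs_lvi_sam:.2f}% {rotation_vs_lvi_sam:.2f}% "
      f"{translation_vs_fixed_radius:.0f}%")
```

The first two values reproduce the 74.08% and 17.49% quoted for the comparison with LVI-SAM, and the third reproduces the roughly 82% quoted for the surfel radius range ablation.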

Comparative experiments are also done on the handheld and jackal data sets of LVI-SAM. The handheld data set, which is feature-less, is gathered by handheld devices, and the jackal data set, which is feature-rich, is gathered by a UGV called jackal; both data sets start and end at the same position. The trajectories estimated by different methods are compared with GPS ground truth in Figure 7. In Table 3, the average sequence translation root mean square error (RMSE) with respect to GPS is computed to analyse how well drift is restrained, and the end-to-end error is computed to analyse the accuracy. The results show that EALVIO achieves better accuracy and robustness in the handheld data set than the other three methods, as it can handle the frequent environmental change in feature-less areas. In the well-structured scenes of the jackal data set, LIO-SAM achieves the lowest translation RMSE as it makes better use of geometric features, but EALVIO still achieves performance similar to the state-of-art methods.

Figure 7.

Trajectories of handheld and jackal data sets evaluated by different methods compared with ground truth. The upper pictures are trajectories evaluated in the handheld data set, and the lower pictures are trajectories evaluated in the jackal data set. The GPS ground truth in the jackal data set may drift in some regions, which indicates that the GPS signal is poor there, and we discard the results of these regions in the following analysis. GPS: global positioning system.

Table 3.

Analysis of trajectories of handheld and jackal data set evaluated by different methods.

Data set	Error type	Lego-LOAM without loop closing	LIO-SAM without loop closing	LVI-SAM without loop closing	EALVIO
Handheld	Estimated length (m)	Fail	Fail	2293.46	2288.92
	Average sequence translation RMSE (%)	Fail	Fail	7.87	7.14
	End-to-end translation error (m)	Fail	Fail	7.57	6.53
	End-to-end rotation error (degree)	Fail	Fail	24.82	12.55
Jackal	Estimated length (m)	4677.23	4670.73	4670.36	4671.51
	Average sequence translation RMSE (%)	8.96	3.54	4.05	4.88
	End-to-end translation error (m)	10.44	5.48	4.69	4.12
	End-to-end rotation error (degree)	5.17	2.18	2.28	2.49

EALVIO: efficient and adaptive lidar–visual–inertial odometry; RMSE: root mean square error; LIO-SAM: lidar–inertial odometry via smoothing and mapping; LVI-SAM: lidar–visual–inertial odometry via smoothing and mapping; Lego-LOAM: lightweight and ground-optimized lidar odometry and mapping.

The bold emphasis represents the optimal value of the corresponding index.

Real-time performance analysis

The average processing time per lidar scan is measured to compare the real-time performance of EALVIO with that of the other state-of-art sensor-fusion systems. As the odometry receives lidar point clouds at a constant rate in real time, we record the time period between each scan input and the corresponding odometry output. The results in Figure 8 show that the average processing time of EALVIO is 0.12 s shorter than that of LVI-SAM in the garden data set tests, which means that EALVIO runs around three times faster than LVI-SAM while achieving a similar accuracy of pose estimation. We can also see from the figure that Lego-LOAM costs the least time among the compared methods, but its odometry relies on a mapping process that runs in the backend of the system at a low frequency to refine the pose estimation, which costs additional time. In addition, since we do not use much nonlinear optimization and instead fuse poses from different sensors through the different initializations of ICP, obtaining the fused pose rarely takes a long time, and the processing times of individual lidar scans are closer to each other in EALVIO than in the other methods. We also play the data set at twice its original rate and find that EALVIO still works well, as shown in the supplementary video.
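The measurement described above can be sketched as a timing loop around the odometry callback. This is only an illustration: `process_scan` stands for a hypothetical function that consumes one scan and returns when the pose is available, and the dummy workload below replaces a real odometry pipeline.

```python
import statistics
import time


def processing_time_stats(process_scan, scans):
    """Mean and spread (s) of the time from scan input to odometry output."""
    durations = []
    for scan in scans:
        start = time.perf_counter()
        process_scan(scan)  # hypothetical odometry callback
        durations.append(time.perf_counter() - start)
    # The population standard deviation describes how close the per-scan
    # processing times are to each other.
    return statistics.mean(durations), statistics.pstdev(durations), durations


# Dummy workload standing in for the odometry: sum the points of a fake scan.
fake_scans = [list(range(1000)) for _ in range(20)]
mean_s, spread_s, per_scan = processing_time_stats(sum, fake_scans)
```

Reporting the spread next to the mean reflects the observation that the per-scan processing times of EALVIO lie closer together than those of the other methods.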

Figure 8.

Processing time of different methods in the garden data set.

Robustness in various agricultural environments

The robustness experiments apply EALVIO to real agricultural scenes to examine its adaptability to environmental change. Several typical agricultural scenes (greenhouse, fields and factory buildings) are considered in our experiments. The greenhouse and field scenes are challenging for sensor-fusion systems as they contain bumpy roads, flat fields and bare farmlands, which may cause sensor degradation, whereas the factory buildings scene is used to examine the performance of EALVIO in well-structured areas as it contains distinct geometric features and good road conditions. The robustness of EALVIO is evaluated by computing the translation and rotation errors of the trajectories with respect to ground truth using the methods proposed by Geiger et al.²⁵ Figure 9 shows the UGV trajectories estimated by EALVIO in the different scenes, and Table 4 presents the corresponding errors. LVI-SAM’s performance on our data sets is also included in Table 4 for comparison. In greenhouses, where the GPS signal is totally blocked, the ground truth is provided by the UWB module, whereas in fields and factory buildings, the real-time kinematic global positioning system is regarded as ground truth. We find that the average translation and rotation errors increase as the trajectory length grows, but they stay within an acceptably small range in the different environments. This means that EALVIO remains robust in most agricultural scenes and that the accuracy of pose estimation does not change dramatically between scenes, thanks to the surfel radius range changing and the adaptive initialization of ICP. In greenhouses and fields, which are typical agricultural scenes, EALVIO gives better pose estimates than LVI-SAM according to the indicators in Table 4. In the well-structured factory buildings, EALVIO also achieves accuracy and robustness similar to the state-of-art methods.
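The idea behind a segment-based relative translation error can be illustrated with a simplified, translation-only variant in the spirit of the KITTI benchmark. This is not the evaluation code used in the experiments: the real metric works on full relative poses, whereas the sketch assumes time-aligned 3D positions that are already expressed in the same frame, and the segment length is a hypothetical parameter.

```python
import math


def norm(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def segment_translation_error(est, gt, segment_length):
    """Average relative translation error (%) over fixed-length segments."""
    # Cumulative path length along the ground truth.
    dist = [0.0]
    for a, b in zip(gt, gt[1:]):
        dist.append(dist[-1] + norm([b[d] - a[d] for d in range(3)]))
    errors = []
    for i in range(len(gt)):
        # First index whose travelled distance reaches the segment length.
        j = next((k for k in range(i, len(gt))
                  if dist[k] - dist[i] >= segment_length), None)
        if j is None:
            break  # no later start index can complete a segment either
        delta_est = [est[j][d] - est[i][d] for d in range(3)]
        delta_gt = [gt[j][d] - gt[i][d] for d in range(3)]
        err = norm([delta_est[d] - delta_gt[d] for d in range(3)])
        errors.append(err / (dist[j] - dist[i]) * 100.0)
    return sum(errors) / len(errors) if errors else None


# Straight 10 m ground-truth path and an estimate with 2% scale drift.
gt_path = [[float(k), 0.0, 0.0] for k in range(11)]
est_path = [[1.02 * k, 0.0, 0.0] for k in range(11)]
error_percent = segment_translation_error(est_path, gt_path, 5.0)
```

A uniform 2% scale drift yields an error of about 2% for every segment, which shows why this kind of metric is reported as a percentage of the distance travelled.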

Figure 9.

Trajectories evaluated by EALVIO in various agricultural scenes compared with ground truth (greenhouse, fields and factory building from left to right). EALVIO: efficient and adaptive lidar–visual–inertial odometry.

Table 4.

Average translation and rotation errors of trajectories evaluated by EALVIO and LVI-SAM compared with ground truth in various agricultural scenes.

Scene and ground truth type	Error type	EALVIO	LVI-SAM without loop closing
Greenhouse (UWB ground truth)	Trajectory length (m)	89.57	88.25
	Average sequence translation RMSE (%)	0.39	0.43
	Average sequence rotation error (deg/m)	0.03	0.03
Field (GPS-RTK ground truth)	Trajectory length (m)	237.82	237.26
	Average sequence translation RMSE (%)	1.57	1.88
	Average sequence rotation error (deg/m)	0.05	0.05
Factory building (GPS-RTK ground truth)	Trajectory length (m)	369.14	368.74
	Average sequence translation RMSE (%)	1.85	1.76
	Average sequence rotation error (deg/m)	0.08	0.07

The bold emphasis represents the optimal value of the corresponding index.

EALVIO: efficient and adaptive lidar–visual–inertial odometry; RMSE: root mean square error; LVI-SAM: lidar–visual–inertial odometry via smoothing and mapping; UWB: ultra-wideband; GPS-RTK: real-time kinematic global positioning system.


Conclusion and future work

In this article, we proposed a sensor-fusion odometry framework that outputs agricultural UGVs’ poses in various agricultural environments by combining three kinds of sub-odometry. To maintain real-time performance, we utilized an efficient surfel-based LO to speed up pose estimation. We also introduced a surfel radius range changing mechanism to increase the accuracy of EALVIO, and the adaptive ICP initialization is used to fuse the odometry from multiple sensors and maintain the robustness of pose estimation. The experiments show that EALVIO achieves better efficiency and similar accuracy on open data sets compared with LVI-SAM, which is one of the state-of-art sensor-fusion SLAM systems, and it also shows the advantage of adaptability, as it stays accurate and stable in various agricultural environments over long runs. However, EALVIO may be slowed down because it relies on VO, which is based on visual-feature detection. In future work, we intend to use a direct VO method such as direct sparse odometry,³⁰ which does not detect visual geometric features, to improve the real-time performance further.

Footnotes

Authors’ contributions

Conceptualization: Zixu Zhao; Methodology: Zixu Zhao; Software Programming: Zixu Zhao and Zaiwang Lu; Data Curation: Zixu Zhao, Zaiwang Lu, Long Long and Yucheng Zhang; Validation: Zixu Zhao, Zaiwang Lu and Long Long; Writing – Original Draft: Zixu Zhao; Writing – Review and Editing: Long Long, Yucheng Zhang and Jinglin Shi; and Supervision: Yucheng Zhang and Jinglin Shi.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Zixu Zhao

Supplemental material

Supplemental material for this article is available online.

References

1. Pretto, Aravecchia, Burgard, et al. Building an aerial–ground robotics system for precision farming: an adaptable solution. IEEE Robot Automat Magazine 2020; 28(3): 29–49.

2. Mohamed, Haghbayan, Westerlund, et al. A survey on odometry for autonomous navigation systems. IEEE Access 2019; 7: 97466–97486.

3. Zhou, Law, Guan, et al. Indoor elliptical localization based on asynchronous UWB range measurement. IEEE Trans Instrument Measure 2010; 60(1): 248–257.

4. Nazemzadeh, Fontanelli, Macii, et al. Indoor localization of mobile robots through QR code detection and dead reckoning data fusion. IEEE/ASME Trans Mechatron 2017; 22(6): 2588–2599.

5. Cadena, Carlone, Carrillo, et al. Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Trans Robot 2016; 32(6): 1309–1332.

6. Debeunne, Vivet. A review of visual-lidar fusion based simultaneous localization and mapping. Sensors 2020; 20(7): 2068.

7. Shan, Englot. Lego-LOAM: lightweight and ground-optimized lidar odometry and mapping on variable terrain. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), Madrid, Spain, 1–5 October 2018, pp. 4758–4765. IEEE.

8. Shan, Englot, Meyers, et al. LIO-SAM: tightly-coupled lidar inertial odometry via smoothing and mapping. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2021, pp. 5135–5142. IEEE.

9. Servières, Renaudin, Dupuis, et al. Visual and visual-inertial SLAM: state of the art, classification, and experimental benchmarking. J Sensors 2021; 2021: Article ID 2054828.

10. Bloesch, Omari, Hutter, et al. Robust visual inertial odometry using a direct EKF-based approach. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), Hamburg, Germany, 28 September–2 October 2015, pp. 298–304. IEEE.

11. Qin, Shen. VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Trans Robot 2018; 34(4): 1004–1020.

12. Strasdat, Montiel, Davison. Scale drift-aware large scale monocular SLAM. Robot: Sci Syst VI 2010; 2(3): 7.

13. Zhang, Singh. Laser–visual–inertial odometry and mapping with high robustness and low drift. J Field Robot 2018; 35(8): 1242–1264.

14. Shan, Englot, Ratti, et al. LVI-SAM: tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. In: 2021 IEEE international conference on robotics and automation (ICRA), Xi’an, China, 30 May–5 June 2021, pp. 5692–5698. IEEE.

15. Zhao, Zhang, Wang, et al. Super odometry: IMU-centric lidar-visual-inertial estimator for challenging environments. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS), Prague, Czech Republic, 27 September–1 October 2021, pp. 8729–8736. IEEE.

16. Chang, Niu, Liu. GNSS/IMU/ODO/LiDAR-SLAM integrated navigation system using imu/odo pre-integration. Sensors 2020; 20(17): 4702.

17. Qin, Cao, Pan, et al. A general optimization-based framework for global pose estimation with multiple sensors. arXiv preprint arXiv:190103642 2019.

18. Mourikis, Roumeliotis. A multi-state constraint Kalman filter for vision-aided inertial navigation. In: Proceedings 2007 IEEE international conference on robotics and automation, Rome, Italy, 10–14 April 2007, pp. 3565–3572. IEEE.

19. Zuo, Geneva, Lee, et al. LIC-Fusion: lidar-inertial-camera odometry. In: 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), Venetian Macao, Macau, China, 4–8 November 2019, pp. 5848–5854. IEEE.

20. Lee, Yang, Huang. Efficient multi-sensor aided inertial navigation with online calibration. In: 2021 IEEE international conference on robotics and automation (ICRA), Xi’an, China, 30 May–5 June 2021, pp. 5706–5712. IEEE.

21. Behley, Stachniss. Efficient surfel-based SLAM using 3D laser range data in urban environments. Robot: Sci Syst 2018; 2018: 59.

22. Stückler, Behnke. Multi-resolution surfel maps for efficient dense 3D modeling and tracking. J Visual Commun Image Represent 2014; 25(1): 137–147.

23. Paul, Neil. A method for registration of 3-D shapes. IEEE Trans Pattern Anal Machine Intellig 1992; 14(2): 239–256.

24. Rublee, Rabaud, Konolige, et al. ORB: an efficient alternative to sift or surf. In: 2011 international conference on computer vision, Barcelona, Spain, 6–13 November 2011, pp. 2564–2571. IEEE.

25. Geiger, Lenz, Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE conference on computer vision and pattern recognition, Providence, RI, USA, 16–21 June 2012, pp. 3354–3361. IEEE.

26. Carlone, Kira, Beall, et al. Eliminating conditionally independent sets in factor graphs: a unifying perspective based on smart factors. In: 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong, China, 31 May–7 June 2014, pp. 4290–4297. IEEE.

27. Civera, Davison, Montiel. Inverse depth parametrization for monocular SLAM. IEEE Trans Robot 2008; 24(5): 932–945.

28. Kelly, Sukhatme GS. Visual-inertial sensor fusion: localization, mapping and sensor-to-sensor self-calibration. Int J Robot Res 2011; 30(1): 56–79.

29. Forster, Carlone, Dellaert, et al. On-manifold preintegration for real-time visual–inertial odometry. IEEE Trans Robot 2016; 33(1): 1–21.

30. Engel, Koltun, Cremers. Direct sparse odometry. IEEE Trans Pattern Anal Mach Intellig 2017; 40(3): 611–625.


Efficient and adaptive lidar–visual–inertial odometry for agricultural unmanned ground vehicle
