Abstract
The York University Teledyne Optech (YUTO) Mobile Mapping System (MMS) Dataset, encompassing four sequences totaling 20.1 km, was assembled through two data collection expeditions on August 12, 2020, and June 21, 2019. Acquisitions were performed using a vehicle equipped with a panoramic camera, a tilted LiDAR, a Global Positioning System (GPS) receiver, and an Inertial Measurement Unit (IMU), driven through two strategic locations: the York University Keele Campus in Toronto and the Teledyne Optech headquarters in the City of Vaughan, Canada. This paper not only provides a comprehensive overview of the YUTO MMS dataset, covering the collection procedure, sensor configuration, synchronization, and data structure and format, but also presents a benchmark of prevailing Simultaneous Localization and Mapping (SLAM) systems. By evaluating these systems on the introduced dataset, this research establishes a baseline for subsequent studies, thereby contributing to advancements in SLAM-integrated mobile mapping systems. The dataset can be downloaded from: https://ausmlab.github.io/yutomms/.
1. Introduction
SLAM is a crucial technology in the field of self-driving vehicles and robotics. It enables autonomous systems to navigate, avoid obstacles, and interact with the environment. The development of SLAM has led to the creation of various benchmark datasets designed to improve robotic capabilities. Recent SLAM datasets by Cortés et al. (2018); Helmberger et al. (2022); Zhang et al. (2022); Gao et al. (2022) have tackled challenges in visual and LiDAR-based SLAM. These include rapid camera movements, dynamic object presence, navigation in textureless environments, and adaptation to varying lighting conditions. These datasets provide standardized data and evaluation metrics essential for impartially assessing SLAM algorithms. Such benchmarks help identify algorithmic strengths and weaknesses and steer innovation towards greater efficiency, precision, and robustness.
However, SLAM datasets do not fully cater to the specific needs of MMS aimed at creating detailed maps for autonomous vehicles and smart city initiatives. MMS traditionally requires expensive sensors and significant processing power to generate high-quality, survey-grade maps. Recent advancements have shifted towards adopting more economical, lower-grade sensor designs in MMS to balance quality with affordability and computational efficiency (Elhashash et al., 2022). Integrating SLAM technology with MMS, called SLAM-Integrated MMS, utilizes SLAM’s accuracy in environments with unreliable GPS signals to produce precise maps (Li et al., 2020). This integrated approach addresses several traditional MMS challenges, such as unreliable localization in complex environments due to GPS and INS limitations, mapping inconsistencies leading to point cloud ghosting, and the heavy computational demands of traditional mapping techniques (Ahmadi et al., 2023; Kang et al., 2021; Zhang et al., 2023). Despite the critical role of SLAM-Integrated MMS, there is a noticeable gap in datasets specifically tailored for this application, underscoring the significance of developing a dedicated dataset.
The YUTO MMS dataset addresses this gap by providing a meticulously collected dataset explicitly designed for SLAM-Integrated MMS applications while offering utility for general SLAM and odometry research. This dataset is distinct from others primarily aimed at autonomous driving and robotic navigation, as it is designed to fulfill the intricate requirements of accurately mapping urban and natural landscapes on a large scale. Captured using a state-of-the-art (SOTA) sensor suite mounted on a vehicle, the dataset encompasses four sequences from two distinct collection dates: August 12, 2020, and June 21, 2019.
Covering a distance of over 20.1 km across varied environments, this dataset has been meticulously assembled to provide a comprehensive resource for developing, testing, and benchmarking advanced systems.
The collection of the YUTO MMS dataset involved extensive fieldwork at strategic locations, including York University’s Keele Campus in Toronto and the Teledyne Optech headquarters in Vaughan, Ontario, Canada. The data was gathered under typical daily conditions, capturing the dynamism of urban life, with busy roads filled with cars, buses, pedestrians, and cyclists. The dataset includes several loops, multiple turning points, traffic lights, and stop signs, introducing motion complexities such as repeated acceleration and deceleration. This rich detail makes the dataset invaluable for assessing the effectiveness and resilience of visual SLAM algorithms in dynamic urban settings.
In addition to its comprehensive environmental coverage, the YUTO MMS dataset is noted for its unprecedented level of detail in documentation. It meticulously covers aspects such as the data collection procedure, sensor configurations and synchronization, as well as the data structure and format. This granularity ensures researchers have the necessary information to utilize the dataset fully for their specific research needs.
A significant feature of the dataset is its tilted LiDAR sensor, which introduces both unique challenges and opportunities for SLAM applications. The tilt of the LiDAR enables it to detect and capture features at greater heights around the road, such as traffic signs, traffic lights, light poles, highway billboards, and tall buildings. These mostly static tall features can provide valuable information for accurate localization and mapping. However, the performance of SLAM may be compromised at lower heights, where dynamic objects are prevalent. Despite these challenges, the ability of the tilted LiDAR to utilize static tall features can meaningfully contribute to the overall effectiveness of SLAM systems, enhancing their robustness and accuracy in urban environments (Xu et al., 2019).
The tilted configuration of the LiDAR sensor presents specific challenges for both LiDAR-centric and coupled visual-LiDAR SLAM methodologies. In LiDAR-centric SLAM, tracking objects becomes difficult due to the limited coverage width, as objects remain within the LiDAR-covered regions for only brief periods. Additionally, the tilted setup complicates point matching and accurate geometry modeling, since it does not provide full coverage of the road ahead and offers sparse coverage, especially at lower heights. Coupled visual-LiDAR SLAM systems also face difficulties due to the non-rectangular overlapping region between the camera and the tilted LiDAR sensor, which results in a restricted and challenging area for data integration. Furthermore, the topmost LiDAR beams often fail to intersect with objects due to the absence of tall structures, leading to sparse data in the upper parts of the LiDAR scans. To address these unique challenges, the dataset utilizes the Maverick MMS developed by Teledyne Optech, a system specifically designed for mapping applications in environments with tall buildings. The Maverick MMS combines the Ladybug 5 panoramic camera with the Velodyne HDL-32E LiDAR, which features a spinning axis positioned at a 45-degree angle to the camera’s optical axis.
This configuration enables the system to capture a comprehensive view of the environment, both visually and through LiDAR data. The Maverick system’s use of real-time kinematic (RTK) positioning, based on GPS receiver signals, ensures the establishment of accurate ground truth for the dataset.
This dataset primarily focuses on SLAM-Integrated MMS applications. However, its diverse and rich data can be used for general SLAM and odometry research and beyond. The dataset offers potential for various applications, including depth estimation, object detection and tracking, place recognition, and visual feature descriptor learning. The objects in the dataset are diverse, encompassing both dynamic and static entities, which makes it suitable for various computer vision tasks.
The YUTO MMS dataset is a valuable asset for advancing SLAM-Integrated MMS, SLAM, and odometry research, setting a new benchmark in this field. It enables a deeper understanding of the complexities involved in mobile mapping and showcases the potential of integrating advanced SLAM technologies to overcome traditional limitations in data collection and mapping accuracy. In summary, this work makes significant contributions to the domains of SLAM and MMS, offering:
• An innovative dataset for SLAM-Integrated MMS that directly addresses the specialized needs of SLAM-integrated mobile mapping, filling an essential gap in research resources.
• Comprehensive environmental data from over 20.1 km of varied urban and natural landscapes, enhancing SLAM and odometry research with a focus on dynamic urban complexities.
• A SOTA sensor suite whose tilted LiDAR introduces novel data acquisition challenges and opportunities, significantly enriching mapping applications with detailed urban and natural landscape data.
• A granular dataset presentation providing unprecedented detail on collection procedures, sensor configurations, and data structure, supporting the development, testing, and benchmarking of advanced systems.
• Support for a wide range of applications beyond traditional SLAM, such as depth estimation, contributing to advancements in autonomous navigation and environmental mapping.
2. Related works
Comparison of related datasets in terms of dataset name, publication time, sensors, carrier, data acquisition environment, total distance, and ground truth estimation method.
The Canadian Adverse Driving Conditions (CADC) dataset (Pitropov et al., 2021) is the first autonomous-vehicle dataset that focuses specifically on adverse driving conditions. This dataset was collected during winter within the Region of Waterloo, Canada. It contains 7000 frames of annotated data from eight Ximea MQ013CG-E2 cameras, a VLP-32C LiDAR, and a NovAtel OEM638 GNSS+INS system. The sensors are time-synchronized and calibrated. Scale AI provides LiDAR frame annotations that represent ground truth for 3D object detection and tracking.
MADMAX (Meyer et al., 2021) focuses on the development of a dataset specifically designed for planetary rovers. The data was collected from various Mars analog sites in the Moroccan desert, featuring diverse environmental conditions. The dataset comprises 36 navigation experiments and incorporates multiple types of sensors, including monochrome stereo cameras, a color camera, stereo omnidirectional cameras, and an IMU. Notably, the dataset does not include a 3D LiDAR sensor. The authors evaluate the performance of the ORB-SLAM2 and VINS-Mono algorithms on the dataset, providing a valuable baseline for future research. To establish ground truth pose information, they utilize an RTK-GPS with two antennas, enabling precise measurements of both position and orientation.
KAIST (Choi et al., 2018) introduces a multi-spectral dataset that covers regions from urban to residential for autonomous systems. The KAIST dataset provides different perspectives of the world captured across sunrise, morning, afternoon, sunset, night, and dawn time slots. A multi-sensor platform is developed, which includes a co-aligned RGB/thermal camera, an RGB stereo pair, a 3D LiDAR, and a GPS/IMU sensor, together with a related calibration technique. The dataset can be applied to a wide range of visual perception tasks, including object detection, drivable-region detection, localization, image enhancement, depth estimation, and colorization using a single/multi-spectral approach.
CUL (Jeong et al., 2018, 2019) was collected using a mobile mapping vehicle in South Korean cities: Gangnam, Pangyo, and Daejeon. The data collection utilized a range of sensors: four LiDARs (two 3D LiDARs for maximum data acquisition range and two 2D LiDARs for effective information gathering on buildings and roads), GPS, Virtual Reference Station (VRS)-GPS, a 3-axis Fibre Optic Gyroscope (FOG), an IMU, and an altimeter.
Notably, the dataset includes two 3D LiDAR sensors tilted at 45°, positioned on the right and left sides of the vehicle, enabling comprehensive coverage of high-rise buildings. The dataset comprises six sequences with a total distance of approximately 44.81 km. Although ground truth data is unavailable, the authors utilized the most accurate sensors in conjunction with pose graph SLAM, semi-automatic ICP (Iterative Closest Point) registration, and manual loop closure selection to generate a reliable baseline for benchmarking and evaluation purposes. The dataset serves as a valuable resource for researchers in the fields of urban LiDAR data analysis and SLAM algorithm development.
NCLT (Carlevaris-Bianco et al., 2016) is a useful dataset for the advancement in fields such as autonomous driving, robotics, and 3D scene reconstruction. This dataset comprises a large collection of indoor and outdoor data, including LiDAR scans and high-resolution panoramic images captured by a variety of diverse sensors. The sensor suite includes an upright Velodyne HDL-32E 3D LiDAR, a panoramic camera, and two 2D Hokuyo planar LiDARs, as well as IMU, GPS, and FOG. The dataset provides diverse scenes and environmental conditions, with over 15 months of data collected during 27 collection sessions, each varying in length, time, sky, foliage, and snow. The repeated exploration of the same environment and using a Segway robot to collect the data ensures a comprehensive and diverse dataset. For their ground truth, they have used GPS signals and SLAM wherever the GPS signal is unavailable.
MPO (Jung et al., 2016) provides two specialized datasets, Dense MPO and Sparse MPO, aimed at advancing place categorization. Dense MPO is a collection of static images that are not suitable for SLAM purposes. Sparse MPO, however, comprises real-time panoramic scans of sparse 3D reflectance point clouds recorded while driving a car, making it suitable for SLAM. Sparse MPO was gathered by driving a vehicle at speeds between 30 and 50 kph through different areas of Fukuoka city. The sensors used to acquire the Sparse MPO dataset include an upright 3D LiDAR, an omnidirectional camera, an IMU, and a GPS. The data was collected under different weather conditions and at various times of the day, providing a diverse range of scenes and environmental conditions. Nonetheless, the sequences in the dataset are too short for SLAM, as noted in the authors’ follow-up journal paper (Martínez Mozos et al., 2019).
In TUM-Omni (Caruso et al., 2015), in addition to their proposed methods, the authors curated a dataset for evaluation purposes. The dataset was captured in indoor environments using a handheld global-shutter USB3 camera equipped with a 185° field-of-view fisheye lens. The authors publicly provide five indoor sequences on their website for research purposes. Ground truth data was acquired using a motion capture system that covers an area of approximately 7 × 12 m. However, errors were only computed on the parts of the trajectories where ground truth data was available, since some sequences left the motion-capture-covered region. The authors also generated two additional sequences using the Gazebo simulator, which were not released publicly.
KITTI dataset (Geiger et al., 2012, 2013) introduces a novel real-world dataset captured for use in mobile robotics and autonomous driving research. Various sensor modalities are employed for data collection, including high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner, and a high-precision GPS/IMU inertial navigation system. The dataset comprises 6 h of diverse traffic scenarios featuring numerous static and dynamic objects recorded at 10 to 100 Hz. The KITTI data is well-calibrated, synchronized, and timestamped, and it also provides object labels in the form of 3D tracklets. Additionally, KITTI offers online benchmarks for stereo, optical flow, object detection, and other tasks.
In the Ford Campus dataset (Pandey et al., 2011), data collection was conducted in an urban environment encompassing both the Ford research campus and downtown Dearborn, Michigan, U.S.A. The dataset was acquired using a modified Ford F-250 vehicle equipped with a variety of sensors, including an upright 3D LiDAR, a panoramic camera, two 2D Riegl LMS-Q120 LiDARs, GPS, and an IMU. The vehicle traversed the test environment multiple times, resulting in two distinct trajectories that provided extensive coverage of the surroundings. The authors have not provided the total sequence length, but we estimated it using Google Maps distance calculations at around 4 km in total. In the dataset directory, the images folder comprises five subfolders, each dedicated to an undistorted camera view captured by the panoramic camera without any additional processing.
In the San Francisco dataset (Chen et al., 2011), the authors employed a range of sensors, including an upright 3D LiDAR, Ladybug3 panoramic camera, IMU, GPS, high-definition cameras (Prosilica), and a Distance Measurement Instrument (DMI), mounted on a mobile mapping vehicle. The dataset was collected in San Francisco and comprises approximately 150,000 panoramic images captured at 4-meter intervals, which were subsequently converted into approximately 1.7 million perspective images. It is important to note that the dataset’s sampling rate, though beneficial for landmark identification, may not be suitable for SLAM due to the relatively sparse capture intervals. The authors aligned the LiDAR data and the panoramas with the building outlines, facilitating more accurate landmark identification and localization. This dataset provides a valuable resource for research in mobile mapping, landmark recognition, and related areas.
Bovisa (Bonarini et al., 2006) presents a RAWSEEDS dataset collected using a variety of sensors mounted on a robot: a two-camera setup for binocular vision, a single camera for monocular vision, an omnidirectional camera, two pairs of 2D Laser Range Finders (LRFs), an IMU, and GPS. The data was collected in three sequences from the same location at the Bovisa campus of Politecnico di Milano, focusing on outdoor environments. The ground truth for the dataset was obtained from Real-Time Kinematic (RTK)-GPS. While the authors have not provided the exact sequence length, an estimated length of approximately 4 km was determined using Google Maps.
In conclusion, the literature review has provided an in-depth analysis of the most relevant work in the field of outdoor SLAM datasets. While each reviewed dataset offers valuable insights and contributes to the advancement of SLAM research, each also lacks certain aspects of particular interest to our study. The RAWSEEDS, TUM-Omni, NCLT, and MADMAX datasets use non-vehicle carriers, such as handheld devices or robots, which may not fully capture the complexities of vehicular motion that are crucial for realistic SLAM analysis. Additionally, the RAWSEEDS, San Francisco, Ford Campus, TUM-Omni, Sparse MPO, KAIST, and MADMAX datasets have limited coverage in terms of total distance, which restricts the evaluation of SLAM performance in expansive environments. Furthermore, the TUM-Omni dataset focuses on indoor scenarios rather than the challenging conditions encountered in outdoor environments. Regarding ground truth accuracy, the CUL, Sparse MPO, and TUM-Omni datasets do not provide precise calculations, hindering accurate benchmarking. Moreover, the absence of a tilted LiDAR configuration limits the ability to effectively capture and model high-rise buildings and other tall structures, and the absence of a panoramic or omnidirectional camera restricts comprehensive visual perception. In contrast, our newly introduced YUTO MMS dataset stands out for its unique combination of a vehicle-based carrier, considerable total distance coverage, outdoor scenarios, precise ground truth, a tilted LiDAR configuration, and a panoramic camera. These distinctive characteristics make the YUTO MMS dataset a valuable resource for advancing research and development in SLAM-Integrated MMS and outdoor SLAM applications.
3. Maverick mobile mapping system
3.1. System overview
The development of the Maverick MMS by Teledyne Optech was driven by the demand for a versatile, high-performance mobile mapping unit that can be easily transported. The Maverick MMS has proven to be a valuable tool in a wide range of industries and applications, owing to its robust and versatile data collection capabilities. Its datasets have been utilized in projects related to transportation safety, construction, asset management, rail, utilities, and 3D modeling. Figure 1 illustrates the configuration of the Maverick MMS along with the planned routes for data collection: (a) the sensor configuration of the Maverick MMS; (b) an example of the vehicle platform utilized for data collection; (c) the planned route within the York University campus, overlaid on Google Maps; (d) the planned route within the Teledyne Optech headquarters, overlaid on Google Maps.
3.2. System characteristics
Maverick MMS specifications.
3.2.1. Configuration
In the Maverick MMS, the sensors are mounted such that there is a 45-degree angle between the optical axis of the camera and the spinning axis of the LiDAR. When mounted on a car, this results in the camera’s optical axis being parallel to the ground, while the LiDAR’s spinning axis is tilted 45° with respect to the ground. In the experiments, images were captured at an average of 7.5 frames per second (FPS), while LiDAR points were acquired at a spinning rate of 15 revolutions per second (RPS).
3.3. Sensors
The Maverick MMS comprises a Ladybug 5 panoramic camera as the imaging component, a Velodyne HDL-32E LiDAR as the LiDAR component, and a high-precision GPS/IMU. It also includes a post-processing software called LMS (LiDAR Mapping Suite) that generates trajectories with centimeter-level precision by performing offline bundle adjustment with the aid of high-precision GPS/IMU data and LiDAR points.
3.3.1. LiDAR
The LiDAR component used in the Maverick MMS is the Velodyne HDL-32E, which has 32 detector pairs and a vertical field of view ranging from +10° to −30°. It provides a 360-degree horizontal field of view and can scan up to 100 m. The absolute accuracy of this LiDAR sensor is typically within ±3 cm, making it a reliable and high-precision tool for capturing 3D spatial data from a moving vehicle.
3.3.2. Camera
The imaging component of the Maverick MMS is the Ladybug 5 camera, which uses six Sony ICX655 CCD sensors, each in 2/3” format. The Ladybug 5 covers 90% of the full sphere, making it capable of capturing panoramic images from all directions. With a focus distance of approximately 200 cm, it can capture high-quality images with a resolution of up to 30 megapixels (5 MP × 6 sensors). The camera’s optics consist of six 4.4-mm focal length lenses, and it has been calibrated for spherical distance from 2 m to infinity. These specifications make the Ladybug 5 camera a powerful tool for capturing high-resolution panoramic imagery in a mobile mapping context.
3.3.3. GPS/IMU
The Maverick carries a NovAtel SPAN-IGM-S1, which serves as its GNSS/INS unit. It represents a powerful solution for navigation and positioning applications, seamlessly merging raw inertial measurements with comprehensive GPS information through the SPAN system. It is equipped with three accelerometers, three gyroscopes, and a NovAtel OEM615 receiver. The dual-frequency GPS antenna ensures reliable satellite signal reception, while the integrated STIM300 IMU further enhances accuracy. In the dataset collection process, the average values of the IMU accelerometer bias were [0.014, −0.006, 0.022] m/s².
4. Data acquisition and processing
4.1. Operation
Characteristics of the four sequences, including the environmental conditions for each sequence, data acquisition rates, total time and distance covered, and the average driving speed.

The workflow for data acquisition and processing involves the collection of raw data, including camera images, LiDAR scans, and GNSS and IMU data, by the Maverick MMS. Base stations are used to correct the GPS/IMU data through RTK positioning, simultaneously improving the accuracy of the IMU data. The optimized GPS/IMU trajectory and the initial LiDAR boresight values are then used to perform LiDAR self-calibration with advanced least-squares algorithms. This yields reliable and repeatable calibration parameters, which are in turn used for camera self-calibration. For camera self-calibration, a photogrammetric bundle block adjustment using GNSS and INS position and orientation information is applied, with automated tie-point measurement used to estimate corrections for interior orientation, boresight parameters, and trajectory position. This process yields accurate camera interior and exterior orientation parameters, which are then used for anomaly checking by comparison with ground control before and after calibration. Finally, the corrected LiDAR and camera calibration parameters, the optimized ground truth trajectory, and the synchronized indices are processed to generate the output data.
4.2. Calibration
Calibration is a crucial step in the utilization of the Maverick MMS, ensuring accurate sensor integration. The calibration of the Maverick MMS is conducted using the professional LMS software, applying base stations and control points.
Calibration parameters. The relative transformation between the body and IMU frames remains constant for all sequences. Prior to each data collection, the extrinsics of the LiDAR and camera are re-calibrated; as such, the extrinsics for all sequences are listed below. Note that Sequences B, C, and D were collected on the same date and therefore share the same extrinsics.

Schematic diagram of the relative positions of LiDAR, IMU and camera with respect to the local frame.
4.2.1. IMU calibration
In the tightly coupled NovAtel SPAN on OEM6 system, the IMU undergoes an initial calibration process. To determine the lever-arm offset between the IMU and the GPS antenna, as well as the body-to-IMU rotation, the base station coordinates of the trajectory are utilized. During trajectory processing, the output is set to the NAD83 (North American Datum of 1983) processing datum in the selected processing frame. This ensures precise alignment and compatibility between the IMU and GPS data. The calibration process in the GPS/IMU system begins with an initial value of the IMU-to-antenna offset. Next, the self-calibration engine is employed to control the calibration of the IMU-to-antenna lever arm. The calibration process continues for a duration of 600 seconds or until the standard deviation of the estimated lever-arm values falls below 0.05 m. This convergence criterion ensures that the lever-arm calibration reaches an acceptable level of accuracy and stability.
4.2.2. LiDAR calibration
To accurately determine the LiDAR’s position relative to the body frame, a selection of control points on the wall and ground is performed, as depicted in Figure 4. For adding control sites, the reference frame is NAD83, UTM zone 17N. To ensure precise LiDAR boresight calibration, optimal GPS conditions and sufficient planar surface geometry, both horizontal and vertical, are essential. Additionally, capturing opposing passes enables scanning of objects from different angles and relative positions; the mission data utilized for calibration comprises short and long loops executed in opposing directions. As illustrated in the right figure, the calibration process involves the use of a checkerboard target to detect the control points; the control points are denoted by red triangles, while the blue lines represent the LiDAR surveyed lines and the green lines display the camera lines. Firstly, the LiDAR boresight adjustment is conducted during the self-calibration process with control points. This adjustment treats the offsets in the x-y-z directions of the position corrections as free unknowns and the drifts and accelerations in the X-Y-Z directions as constrained unknowns. The adjustment encompasses time corrections, channel corrections, boresight corrections, position corrections, and orientation corrections. To refine the boresight results, the adjustment process is repeated until no estimated correction to the boresight parameters exceeds 0.009° for the angles and 0.009 m for the lever arms in two consecutive runs. Subsequently, to solve for the sensor calibration parameters while keeping the boresight angles fixed, a LiDAR sensor calibration is performed. The quality analysis of the self-calibration results is shown in Table 5. Upon completion of the LiDAR sensor calibration, the final calibration.lcp file is generated. A noise-level check procedure is then carried out, and planar fit statistics with respect to the standard deviation values are reported to assess calibration accuracy. Figure 5 displays the planar fit statistics before and after self-calibration for Sequence A, highlighting the absolute accuracy of data collected by the LiDAR and camera systems; the separation between individual control points and the LiDAR data is calculated before and after self-calibration, providing a clear visualization of the improvement in accuracy.
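To make the stopping rule concrete, the following minimal Python sketch expresses the two-consecutive-runs convergence check; `run_adjustment` is a hypothetical stand-in for a single LMS adjustment pass, not part of the actual software:

```python
ANGLE_TOL_DEG = 0.009   # max allowed boresight-angle correction, degrees
LEVER_TOL_M = 0.009     # max allowed lever-arm correction, metres

def converged(corrections) -> bool:
    """True when no estimated correction exceeds its tolerance."""
    return (all(abs(a) <= ANGLE_TOL_DEG for a in corrections["angles_deg"])
            and all(abs(t) <= LEVER_TOL_M for t in corrections["lever_arm_m"]))

def calibrate_boresight(run_adjustment, max_runs: int = 20) -> bool:
    """Repeat adjustment passes until the criterion holds in two consecutive runs."""
    consecutive_ok = 0
    for _ in range(max_runs):
        corrections = run_adjustment()  # one self-calibration pass (hypothetical)
        consecutive_ok = consecutive_ok + 1 if converged(corrections) else 0
        if consecutive_ok == 2:
            return True
    return False
```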

4.3. Panoramic camera calibration
Quality analysis of the self-calibration results encompasses key metrics, including the number of iterations, channel correction offset and standard deviation, precision for LiDAR and camera self-calibrations, statistics on planar fit, and the optimized GPS/IMU trajectory.
4.4. Synchronization
4.4.1. Trajectory refinement
The ground truth trajectory is obtained by performing post-processing techniques to refine the GNSS trajectory. Prior to trajectory refinement, the trajectory file and its corresponding precision file are readily available. Furthermore, the LiDAR and camera systems are calibrated to ensure accurate data acquisition. During the refinement process, painted control points, which can be precisely identified and measured in the LiDAR intensity data, are carefully selected. The LiDAR surveyed lines are then chosen, while the camera lines are automatically synchronized with the LiDAR lines. Next, the images captured by the camera system are processed, with an image mask applied to exclude certain areas, such as those depicting the survey vehicle, from influencing the colorization of the point cloud. The centers of the control points in the LiDAR point cloud are measured, ensuring that they are unobstructed and contain sufficient laser points for accurate measurements. Once all the control points have been measured, the desired survey lines are selected for further processing. Given that the instrument used for the sample data is already calibrated, the channel corrections and boresight corrections remain fixed in the correction settings. The position offsets are set as “free unknown,” while the drift and accuracy corrections are set as “constrain unknown.” The resulting optimized trajectory serves as the ground truth, representing the refined and accurate trajectory of the system. For a visual depiction of this process, please refer to Figure 2.
4.4.2. Time synchronization
In the Maverick MMS, the GPS/IMU system operates at a high frequency of 125 Hz, providing precise timing information. To ensure synchronization, GPS timestamps are used as a reference throughout the trajectory refinement process. By refining the trajectory, we generate ground truth data for all image frames, effectively synchronizing the GPS/IMU system with the panoramic camera. Given that the LiDAR and the panoramic camera operate on different time scales, with images captured at an average rate of 7.5 frames per second and LiDAR scans acquired at a spinning rate of 15 revolutions per second, careful synchronization is required. We achieve this synchronization by aligning the LiDAR data with the GPS/IMU system as part of the trajectory refinement process. The timestamps are recorded using GPS time for accurate temporal alignment. To precisely synchronize the laser scans and panoramic images, we employ an interpolation method to determine the closest timestamp of each laser scan relative to the corresponding image. This interpolation technique ensures that the LiDAR data and panoramic images are temporally aligned, facilitating accurate sensor fusion and subsequent data analysis.
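As an illustration of this nearest-timestamp association, the following minimal Python sketch (array names are hypothetical; this is not the released toolchain) matches each image timestamp to its closest LiDAR scan:

```python
import numpy as np

def match_scans_to_images(image_ts: np.ndarray, lidar_ts: np.ndarray) -> np.ndarray:
    """Return, for each image timestamp (GPS time, seconds), the index of the
    LiDAR scan with the closest timestamp. Both arrays must be sorted."""
    idx = np.searchsorted(lidar_ts, image_ts)
    idx = np.clip(idx, 1, len(lidar_ts) - 1)
    left, right = lidar_ts[idx - 1], lidar_ts[idx]
    idx -= (image_ts - left) < (right - image_ts)  # step back where the left neighbour is nearer
    return idx

# Images arrive at ~7.5 FPS and LiDAR revolutions at ~15 RPS,
# so roughly two scans fall within each image interval.
image_ts = np.arange(0.0, 2.0, 1 / 7.5)
lidar_ts = np.arange(0.0, 2.0, 1 / 15.0)
scan_for_image = match_scans_to_images(image_ts, lidar_ts)
```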
4.5. Translation
Considering that certain SLAM methods discussed in the Experimental Results depend on the projection of LiDAR point clouds onto the corresponding panoramic images, this section elucidates the essential steps involved in this process. The following subsections elaborate on the mappings between the 3D camera coordinate system and the pixel coordinate space of the panoramic image.
4.5.1. Mapping between 3D points and pixels in the panoramic image
To find correspondences between a pixel in the panoramic image and the camera 3D coordinate system, we use the unit-sphere projection and the polar coordinate system as intermediate representations.
4.5.2. 3D - unit sphere
Going from the 3D camera coordinate system to the unit sphere, for a 3D point we calculate the range and normalize:

$$ r = \sqrt{X^2 + Y^2 + Z^2}, \qquad (x_s, y_s, z_s) = \left( \frac{X}{r}, \frac{Y}{r}, \frac{Z}{r} \right), \tag{1} $$

where $(X, Y, Z)$ is the point in the camera frame and $(x_s, y_s, z_s)$ is its projection onto the unit sphere. Reversely, going from the unit sphere back to the camera 3D coordinate system requires the range $r$, which can be obtained from:
• Depth from the LiDAR point cloud
• SLAM-based methods (e.g., triangulation)
The 3D point is then recovered as $(X, Y, Z) = r \, (x_s, y_s, z_s)$.
4.5.3. Unit sphere - polar
Mapping from the unit sphere to the polar coordinate system is done by the following equation:

$$ \theta = \arccos(z_s), \qquad \phi = \operatorname{atan2}(y_s, x_s), \tag{2} $$

where $\theta \in [0, \pi]$ is the polar angle and $\phi \in [-\pi, \pi]$ is the azimuth. The inverse of equation (2) is calculated by

$$ x_s = \sin\theta \cos\phi, \qquad y_s = \sin\theta \sin\phi, \qquad z_s = \cos\theta. \tag{3} $$
4.5.4. Polar - image
Lastly, mapping from the polar coordinate system to a pixel $(u, v)$ in the image is shown in the following equation:

$$ u = \frac{\phi + \pi}{2\pi} \, W, \qquad v = \frac{\theta}{\pi} \, H, \tag{4} $$

where $(W, H)$ are the width and height of the panoramic image in pixels.
We use the extrinsic parameter matrix between the panoramic camera and the LiDAR sensor in the MMS to transform the LiDAR points into the camera frame. Then, using Equations (1), (2), and (4), we can project the 3D points onto the corresponding panoramic image. Figure 6 presents a visual representation of a captured 3D laser scan, showcasing the collected point cloud data, and demonstrates the accurate projection of the laser points onto the corresponding panoramic image, allowing a comprehensive fusion of visual and depth information: the 3D laser points (top) and the projection of these laser points onto the corresponding panoramic image (bottom).
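As a worked example, the Python sketch below chains equations (1), (2), and (4) to project LiDAR points, already transformed into the camera frame via the extrinsics, onto an equirectangular panorama. The angle and image-origin conventions follow the reconstruction above and are assumptions, not the exact LMS conventions:

```python
import numpy as np

def project_to_panorama(pts_cam: np.ndarray, width: int, height: int):
    """Project N x 3 points in the camera frame onto an equirectangular panorama."""
    x, y, z = pts_cam[:, 0], pts_cam[:, 1], pts_cam[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)             # eq. (1): range
    xs, ys, zs = x / r, y / r, z / r            # eq. (1): unit-sphere projection
    theta = np.arccos(np.clip(zs, -1.0, 1.0))   # eq. (2): polar angle
    phi = np.arctan2(ys, xs)                    # eq. (2): azimuth
    u = (phi + np.pi) / (2 * np.pi) * width     # eq. (4): column
    v = theta / np.pi * height                  # eq. (4): row
    return np.stack([u, v], axis=1), r          # pixel coordinates and depth
```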
5. Dataset
The dataset is available and can be downloaded from https://ausmlab.github.io/yutomms/.
5.1. Benchmark data
5.1.1. Panoramic images
General information of the YUTO MMS dataset, including sequence names and the number of images, LiDAR scans, and GPS/IMU records. Additionally, the total directory size for downloading is provided.
5.1.2. LiDAR
The dataset includes LiDAR point cloud data, with comprehensive details provided in Table 6. The data is provided in the widely adopted .bin file format, which is consistent with the format used in the publicly available KITTI dataset. This ensures compatibility and ease of integration with existing LiDAR processing pipelines and algorithms. Similar to the panoramic images, Sequence A may differ slightly in size compared to the other sequences due to its capture at a different time. However, the essential information pertaining to point cloud data, such as point density and coordinate representations, remains consistent across all sequences.
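Because the .bin layout follows the KITTI convention, a scan can be read as a flat float32 buffer. A minimal loader might look as follows; the file path is hypothetical, and the per-point layout (x, y, z, intensity) is assumed from the KITTI convention:

```python
import numpy as np

def load_scan(path: str) -> np.ndarray:
    """Read one KITTI-style .bin scan as an N x 4 float32 array."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

points = load_scan("SequenceA/lidar/000000.bin")  # hypothetical path
xyz, intensity = points[:, :3], points[:, 3]
```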
5.2. Reference data
5.2.1. LiDAR Mapping Suite
Teledyne Optech’s LiDAR Mapping Suite Pro (LMS Pro) serves as a robust data processing solution specifically designed for the Maverick MMS. The software enables comprehensive data processing for the entire project, encompassing trajectory, camera, and LiDAR data. The ground truth trajectories were obtained using LMS Pro, which is capable of producing trajectories with centimeter-level accuracy. The LMS utilizes high-precision GPS/IMU data and LiDAR points to perform offline bundle adjustment, ensuring precise alignment and synchronization between the ground truth trajectories and the collected data. By leveraging the capabilities of the LMS, we are able to provide ground truth trajectories that serve as a reliable reference for evaluating and validating SLAM and odometry algorithms on our dataset.
5.2.2. Ground Truth
The ground truth for our dataset is saved in .txt format, providing precise and accurate information on the vehicle’s pose in a 6-degree-of-freedom (6-DoF) representation. Each line of the file corresponds to a specific image frame captured during the data collection process. The line contains the east, north, and height coordinates representing the position of the vehicle, while the omega, phi, and kappa values denote the orientation of the vehicle at that particular point.
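Given this layout, a pose file can be parsed in a few lines of Python; the column order below is assumed from the description above and should be verified against the released files:

```python
import numpy as np

def load_ground_truth(path: str):
    """Parse the 6-DoF ground truth: one line per image frame."""
    poses = np.loadtxt(path)       # shape (N, 6), column order assumed
    position = poses[:, :3]        # east, north, height
    orientation = poses[:, 3:6]    # omega, phi, kappa
    return position, orientation
```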
5.3. Benchmark data structure
5.3.1. Raw data
Figure 7 shows the details of the directory structure of our dataset. Each sequence has its related data, and we have also included auxiliary files for data transformation, sensor calibration, and performance evaluation of future SLAM systems run on our dataset. In the directory structure of the YUTO MMS dataset, rounded rectangles represent folders and rectangles represent files; the “Sequence X” folder contains the data for a particular sequence.
5.3.2. Synchronization data
The synchronization data of the camera and LiDAR is saved in a .txt format, where each line of the file corresponds to a specific image frame and lists the corresponding LiDAR frame.
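A minimal reader for this mapping, assuming one whitespace-separated LiDAR frame index per line as described, could be:

```python
def load_sync(path: str) -> dict[int, int]:
    """Map image frame index -> LiDAR frame index (layout assumed)."""
    with open(path) as f:
        return {i: int(line.split()[-1]) for i, line in enumerate(f)}
```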
5.3.3. Calibration data
The IMU calibration parameters of the Maverick MMS remain fixed due to the tight coupling of the GPS and IMU sensors, as illustrated in Table 4. However, for accurate localization, it is necessary to perform recalibration of the panoramic camera and LiDAR prior to each data collection. The camera calibration parameters are saved in a .ccp format, while the LiDAR calibration parameters are stored in a .lcp format.
5.3.4. Transformation data
In addition, we have included the depth map corresponding to each panoramic image. The depth values in the map are derived through the process explained in the Data Acquisition and Processing section, where the raw ranges obtained from LiDAR points are projected onto the panoramic image space. The depth maps are saved in JPG format and share the same dimensions as the panoramic images.
5.3.5. Auxiliary codes (data transformation, sensor calibration, etc)
We have developed MATLAB scripts that facilitate the loading and visualization of panoramic images and LiDAR scans within the MATLAB environment. These scripts enable the re-projection of LiDAR ranges onto the corresponding panoramic images using the calibration parameters. Furthermore, C++ code that generates rosbag format from our dataset is also provided. In order to evaluate the performance of Odometry or SLAM systems, we employed the rpg trajectory evaluation toolbox (Zhang and Scaramuzza 2018).
6. SOTA SLAMs
6.1. Visual SLAM
ORB-SLAM2 (Mur-Artal and Tardós 2017) is a widely recognized visual SLAM approach that exploits visual features for the purpose of simultaneous tracking and mapping. It employs the ORB descriptor to extract distinctive visual features within the scene. The keyframe-based methodology of ORB-SLAM2 involves the initial detection of visual features in keyframes, followed by their tracking across subsequent frames. Loop closing, a crucial component of this SLAM system, benefits from the utilization of bags of binary words. Notably, ORB-SLAM2 exhibits real-time performance, with all three modules, tracking, mapping, and loop closing, operating in parallel. Moreover, this SLAM framework is not limited to monocular cameras alone, as it is compatible with both stereo and RGB-D cameras.
6.1.1. Visual-inertial SLAM
VINS-Mono (Qin et al., 2018) is a versatile and robust monocular visual-inertial state estimator that leverages a tightly coupled, nonlinear optimization-based method to achieve precise visual-inertial odometry. By integrating pre-integrated IMU measurements and feature observations, VINS-Mono achieves high-precision estimation. The system incorporates loop detection and loop closure modules, which facilitate efficient re-localization with minimal computational overhead. Moreover, VINS-Mono introduces an online temporal calibration technique (Huai et al., 2015) to enhance sensor fusion performance. This approach jointly optimizes time offset, camera and IMU states, as well as feature locations in a SLAM system.
ROVIO (RObust Visual Inertial Odometry) (Bloesch et al., 2015) is a monocular visual-inertial odometry algorithm that is suitable for use in dynamic handheld experiments and unmanned aerial vehicles (UAVs). ROVIO employs an extended Kalman filter (EKF) to track multilevel patch features and uses pixel intensity errors as innovation terms during the update step. The landmark representation in ROVIO involves a bearing vector and inverse-distance parametrization, which are refined as part of the iterated EKF states, eliminating the need for an initialization procedure. Maplab (Schneider et al., 2018), an open and research-oriented visual-inertial mapping framework, includes ROVIOLI (ROVIO with localization integration), which serves as a mapping and localization frontend based on the ROVIO algorithm.
6.2. RGB-D SLAM
RPV-SLAM (Range-augmented Panoramic Visual SLAM) (Kang et al., 2021) is an innovative RGB-D SLAM that enhances the capabilities of the ORB-SLAM2 (Mur-Artal and Tardós 2017) and OpenVSLAM (Sumikura et al., 2019) frameworks by integrating a 360° panoramic camera and sparse LiDAR ranges. This approach leverages bi-interpolation to create a dense depth map for a specific region of the image plane, which is then merged with the RGB image to create an RGB-D representation. By incorporating depth information, some map points are directly generated, while others are triangulated based on the proximity of keyframes. The resulting RGB-D data is subsequently processed by ORB-SLAM2’s tracking, mapping, and loop-closing modules.
6.3. LiDAR SLAM
LOAM (Zhang and Singh 2014) proposes an innovative methodology that integrates scan matching techniques with a robust motion estimation framework, enabling real-time odometry estimation and map generation. Leveraging the inherent geometric properties of LiDAR point clouds, specifically exploiting point-to-plane correspondence, the LOAM approach achieves optimized odometry estimation and accurate mapping outcomes. Comprehensive experimental evaluations conducted in diverse environments, encompassing both indoor and outdoor scenarios with varying motion dynamics, demonstrate the exceptional accuracy and real-time performance of LOAM. With its practical implementation and significant contributions to real-time LiDAR-based SLAM, the paper serves as a guiding resource for the advancement of robotic perception and autonomous systems, driving further research and innovation in the field.
6.4. LIDAR-Inertial SLAM
Google’s Cartographer is a LiDAR-centric SLAM system originally developed for a backpack mapping platform, which achieves real-time mapping and loop closure at a 5 cm resolution. It uses a grid-map-based representation with flexible resolution and sensor choices. The Cartographer technique is divided into two processes. The first, called local SLAM, uses a Ceres scan matcher on a small map, known as a submap, to estimate the pose and orientation of the vehicle. The second, known as global SLAM, optimizes the pose by utilizing a larger scan matcher on the global map (a map generated by assembling all submaps).
6.5. Sensor-fusion-based SLAM
The PVL-Cartographer SLAM (Zhang et al., 2023) is a sensor-fusion-based SLAM system specifically designed for the Maverick MMS. This approach effectively integrates multiple sensors, including a panoramic camera, LiDAR, and IMU, within a pose graph framework, resulting in a robust and accurate SLAM solution. The system comprises two key components: the early fusion range-augmented panoramic visual odometry system, RPV, and the middle fusion approach based on pose graph integration. The performance of the PVL-Cartographer surpasses that of camera-centric, camera-inertial, LiDAR-centric, and LiDAR-inertial SLAM systems, even when only a limited number of visual features are augmented with ranges due to the limited overlap between the image and LiDAR points. These findings underscore the superior capabilities of the sensor-fusion-based SLAM system, particularly in challenging outdoor scenarios involving localization and mapping.
6.6. Dynamic SLAM
Dynamic SLAM, which is designed to handle environments with moving objects or dynamic elements, has been studied for several decades (Wang et al., 2003). MID-Fusion (Xu et al., 2019) proposed a new multi-instance dynamic RGB-D SLAM system using an object-level, octree-based volumetric representation. MID-Fusion achieves robust camera tracking in dynamic environments while continuously estimating geometric, semantic, and motion properties for arbitrary objects. Kimera (Rosinol et al., 2021) is the first fully automatic method to build a novel 3D Dynamic Scene Graph (DSG) from visual-inertial data, seamlessly capturing the metric and semantic aspects of a dynamic environment. SuMa++ (Chen et al., 2019) facilitates laser-based semantic segmentation of point clouds in complex and uncertain circumstances. We believe that our YUTO MMS dataset provides real-world scenarios with numerous dynamic objects in outdoor road environments, which enhances the proposed dataset’s utility for dynamic SLAM development.
7. Experimental Results
In this section, we present the experimental results of the aforementioned methods. First, we describe our evaluation metrics.
7.1. Evaluation metrics
7.1.1. Absolute Trajectory Error
Absolute Trajectory Error (ATE) is an evaluation metric in SLAM that quantifies the positional discrepancy between the estimated trajectory and ground truth. It measures the accuracy of the estimated trajectory by computing the Euclidean distance or RMSE between corresponding pose points. ATE provides a global assessment of localization performance, enabling researchers to compare SLAM algorithms and drive advancements in accurate trajectory estimation.
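For reference, a minimal Python computation of ATE as positional RMSE, assuming the two trajectories are already associated frame-to-frame and expressed in a common frame (e.g., after the alignment performed by the rpg trajectory evaluation toolbox), is:

```python
import numpy as np

def ate_rmse(est_xyz: np.ndarray, gt_xyz: np.ndarray) -> float:
    """ATE as the RMSE of Euclidean distances between corresponding positions."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```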
7.1.2. Relative Trajectory Error
Relative Trajectory Error (RTE) in SLAM quantifies the positional deviation between estimated and ground truth trajectories at intervals of 100, 200, …, and 800 m. It measures the Euclidean distance or RMSE between corresponding pose points to evaluate the accuracy of relative motion estimation. RTE provides valuable insights into the performance of SLAM algorithms in maintaining accurate inter-frame motion estimation at different distances, aiding in the assessment and improvement of localization and mapping systems.
7.1.3. Relative Rotation Error
Relative Rotation Error (RRE) in SLAM measures the angular deviation between estimated and ground truth relative rotations, often expressed in degrees per meter. It quantifies the rotational accuracy of the system, enabling the evaluation of rotation estimation performance at varying scales. By considering the rotational error relative to the translation distance, RRE provides valuable information on the consistency and precision of rotation estimates in SLAM algorithms.
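The sketch below illustrates, in simplified position-only form, how RTE can be evaluated over sub-trajectories of a given length; the full metric, as implemented in the rpg trajectory evaluation toolbox, additionally aligns each sub-trajectory at its start pose, and RRE is obtained analogously by dividing the relative rotation error in degrees by the distance travelled:

```python
import numpy as np

def rte_percent(est_xyz: np.ndarray, gt_xyz: np.ndarray, length: float = 100.0) -> float:
    """Mean end-point drift over all sub-trajectories of ~`length` metres,
    as a percentage of the distance travelled (position-only simplification)."""
    step = np.linalg.norm(np.diff(gt_xyz, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(step)])  # path length along ground truth
    errs = []
    for i in range(len(gt_xyz)):
        j = int(np.searchsorted(dist, dist[i] + length))
        if j >= len(gt_xyz):
            break
        drift = (est_xyz[j] - est_xyz[i]) - (gt_xyz[j] - gt_xyz[i])
        errs.append(np.linalg.norm(drift) / length * 100.0)
    return float(np.mean(errs)) if errs else float("nan")
```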
7.2. Quantitative evaluation
Comparison of positioning accuracy w.r.t. ATE [m].
Comparison of positioning and orientation accuracy w.r.t. RTE [%] and RRE [deg/m].
7.3. Qualitative evaluation
Figure 8 displays the trajectories generated by the SOTA SLAMs, with trajectory comparisons presented for each sequence. Furthermore, the YUTO MMS dataset has proven to be a valuable resource not only for odometry and SLAM research but also for a wide range of applications in depth estimation, mapping, and 3D reconstruction. The panoramic camera has emerged as a powerful tool with distinct advantages for mapping applications. Its wide field of view enables the capture of a comprehensive and continuous visual representation of the surrounding environment, and this panoramic imagery provides crucial contextual information, allowing for a more accurate and detailed mapping process. Figure 9 showcases the color-textured point cloud generated from our YUTO MMS data, which provides a detailed representation of the urban scenarios: GPS/IMU trajectories overlaid on satellite images and the corresponding color-textured point cloud (LAS data) of the YUTO MMS dataset at the York University campus and the Teledyne Optech headquarters. This high-quality dataset holds great potential for advancing various fields, such as robotics, computer vision, and geospatial analysis. Researchers can leverage it to explore novel algorithms, validate existing methodologies, and drive innovation in visual perception and spatial understanding. The availability of comprehensive and well-curated datasets like YUTO MMS contributes significantly to the advancement of scientific knowledge and the development of robust and reliable intelligent systems.

8. Discussion
Our findings highlight the strengths of different SLAM systems in specific aspects of estimation. Camera-centric SLAM systems, RPV-SLAM and HDPV-SLAM, demonstrate superior performance in orientation estimation. On the other hand, LiDAR-centric systems exhibit better accuracy in position estimation. Among the collected data, Sequence C presents significant challenges due to its long distance and intricate loop structures. Nonetheless, the sensor-fusion-based SLAM system, PVL-Cartographer, outperforms other methods in terms of ATE, RTE, and RRE. This achievement is attributed to the effective loop closure and pose graph optimization modules, which minimize errors through map optimization when multiple loops are detected. Despite the challenges posed by this scenario, the PVL-Cartographer system showcases robustness and exceptional performance.
The comparative analysis demonstrates the superiority of RPV-SLAM, HDPV-SLAM, and PVL-Cartographer over existing methods, emphasizing the effectiveness of sensor-fusion-based SLAM. In contrast, ORB-SLAM2, VINS-Mono, LOAM, and Cartographer can only produce partial or inaccurate trajectories due to insufficient feature matching. The evaluation of SLAM systems using our YUTO MMS dataset reveals the robustness of sensor-fusion-based approaches compared to camera-only or LiDAR-only methods, especially when dealing with challenging data collected by an MMS equipped with a panoramic camera and a tilted LiDAR. As a result, our YUTO dataset, with its comprehensive range of sensors, holds great promise for advancing research in the field of SLAM-integrated mobile mapping applications. The dataset’s diverse sensor suite, encompassing panoramic cameras, LiDAR and GPS/IMU system, provides an invaluable resource for the development and evaluation of robust SLAM algorithms. The availability of such a dataset will facilitate in-depth investigations into the challenges of autonomous navigation, enabling researchers to explore novel techniques and methodologies for achieving accurate and reliable mapping, localization, and trajectory estimation in complex environments. Ultimately, these advancements will contribute to the realization of fully autonomous self-driving systems, enhancing their safety, efficiency, and overall performance in real-world scenarios.
9. Conclusions
In conclusion, this study introduces a comprehensive dataset captured using the Maverick MMS, tailored to address the intricate demands of mapping in challenging urban environments. This dataset integrates synchronized input from the Ladybug 5 panoramic camera, the tilted Velodyne HDL-32E LiDAR sensor, and the GPS/IMU unit, providing a rich source of information for SLAM and odometry research. The unique configuration of the Maverick system, with its tilted LiDAR sensor, enhances the capture of elevated structures and enables a more comprehensive understanding of the environment. This dataset not only offers researchers an opportunity to explore novel approaches to the integration of SLAM and urban mobile mapping but also extends its utility to a spectrum of applications beyond this core focus. The dataset provides accurate ground truth trajectories through RTK positioning, which is useful for rigorous evaluation and benchmarking, and it covers a diverse range of environments that enable comprehensive testing and validation across various scenarios. Beyond SLAM, the rich, synchronized sensor data can be used for depth estimation, object detection and tracking, place recognition, and other computer vision endeavors. The aim for future development of the SLAM-Integrated MMS dataset is to acquire long-term data that encompasses a variety of weather and illumination conditions in areas where GPS and IMU signals are compromised, such as tunnels and densely built-up downtown regions. This expansion will not only address the current dataset’s limitation of focusing mainly on outdoor settings under favorable weather conditions but also enhance its real-world applicability and relevance. By incorporating challenging scenarios, including adverse weather, low-light conditions, and texture-deficient areas, as well as data from indoor and hybrid environments, the dataset can become an even more valuable resource for the research community. This work underscores the vital contributions made towards SLAM-Integrated MMS, paving the way for future advancements in autonomous navigation and mapping.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Sciences and Engineering Research Council of Canada; (CRDPJ 537080-18).
