Multi-sensors multi-baseline mapping system for mobile robot using stereovision camera and laser-range device

Abstract

Countless applications today are using mobile robots, including autonomous navigation, security patrolling, housework, search-and-rescue operations, material handling, manufacturing, and automated transportation systems. Regardless of the application, a mobile robot must use a robust autonomous navigation system. Autonomous navigation remains one of the primary challenges in the mobile-robot industry; many control algorithms and techniques have been recently developed that aim to overcome this challenge. Among autonomous navigation methods, vision-based systems have been growing in recent years due to rapid gains in computational power and the reliability of visual sensors. The primary focus of research into vision-based navigation is to allow a mobile robot to navigate in an unstructured environment without collision. In recent years, several researchers have looked at methods for setting up autonomous mobile robots for navigational tasks. Among these methods, stereovision-based navigation is a promising approach for reliable and efficient navigation. In this article, we create and develop a novel mapping system for a robust autonomous navigation system. The main contribution of this article is the fuse of the multi-baseline stereovision (narrow and wide baselines) and laser-range reading data to enhance the accuracy of the point cloud, to reduce the ambiguity of correspondence matching, and to extend the field of view of the proposed mapping system to 180°. Another contribution is the pruning the region of interest of the three-dimensional point clouds to reduce the computational burden involved in the stereo process. Therefore, we called the proposed system multi-sensors multi-baseline mapping system. The experimental results illustrate the robustness and accuracy of the proposed system.

Keywords

Mobile robot stereovision three-dimensional point clouds multi-baseline

Introduction

The main focus of autonomous mobile-robot navigation research in indoor environment is to allow a mobile robot to navigate without collision. Robots are equipped with many sensors, such as cameras, lasers, infrared sensors, global positioning system (GPS), and ultrasonic sensors, which allow them to perceive their environment. In recent years, several researchers have looked at methods for setting up mobile robots for autonomous navigation task. Autonomous navigation includes several interrelated steps including mapping (Figure 1).

Figure 1.

Autonomous navigation steps.

Mapping in robotic is the task of constructing a spatial representation of a robot environment. Until the 1990s, the field of robotics mapping was separated into two categories either metric or topological approaches.^1,2

Metric maps are based on absolute reference frames and numerical estimation of where objects are in the space. In the metric framework, the environment is represented as a set of objects with coordinates in a two-dimensional (2D)/three-dimensional (3D) space. In other words, metric maps capture the geometric properties of the environment. The topological maps describe the connectivity of different places. They represent the map by determining the occupied and free space grid in the map of the environment. Later in 2000s, the probabilistic techniques became dominant in the field of robotics. Mapping in the robotics is referred as the simultaneous localization and mapping (SLAM)^3,4 or concurrent mapping and localization (CML).^5,6 Regardless the types of mapping, robot must be equipped with sensors that enable it to observe its environment. Robot’s sensors that are used in mapping can be divided as follows:

Non-vision sensors (e.g. laser, ultrasonic, infrared, or radar);

Vision sensors (e.g. monocular camera, stereo camera, or Kinect camera);

Multi-sensors (vision sensors with non-vision sensors).

Non-vision sensors, such as ultrasonic and laser, are range distance measurements. Ultrasonic sensors depend on sound propagation, whereas laser-range distance sensors are based on propagation of electromagnetic waves. Ultrasonic and laser-range sensors have been used for mapping operation in the mobile-robot navigation.^7–9 However, ultrasonic, laser rangefinder, or infrared sensors cannot capture color and texture features of the environment and suffer from other propagation sounds, electromagnetic waves, or light.^10,11 In addition, they are affected by soft surfaces and specular reflection and have a high sensitivity to external electric fields.^10,11

Among autonomous navigation methods, vision-based systems have been growing in recent years due to rapid gains in computational power and the reliability of visual sensors. In this research, the major enhancements concern the main aspects of the autonomous navigation system: the mapping operation. This involves building an accurate, real-time mapping system for mobile robots to be used in any unstructured, unknown environment.

This article is organized as follows. First, literature review is presented; second, the proposed multi-sensors multi-baseline mapping system (3MS) system is explained; third, system integration is presented; fourth, the experiments of 3MS are explained; fifth, the 3MS with other systems is compared; and finally, conclusion is given.

Literature review

Stereovision and mapping involve many algorithms and techniques, such as multi-baseline stereo, feature-point detection, obstacle detection, and accurate floor detection. Much research has dealt with the shortcomings of stereovision. Some approaches depend on using multiple images,^12–15 while others use filters for false pixels.^15,16 Still other methods use very small baselines, called micro-baselines.^12,13 Micro-baselines of a few millimeters provide the advantage that they can be captured using a single camera. That is, a “static” camera, either hand-held or mounted on a tripod, will typically vibrate some small amount while capturing video. The aim of stereovision algorithms is to rebuild a 3D image of a scene from multiple images. Multi-baseline is one type of binocular stereo. Standard binocular stereo works using two views of a scene, while multi-baseline stereo uses three or more. More views not only provide a better signal-to-noise ratio, but also eliminate most repetitive structures. Multi-baseline stereo has been used in the past to optimize grid-maps. The first practical use of occupancy grids may be attributed to A Elfes.¹⁷ Another use of occupancy grids was applied by D Murray and C Jennings.¹⁸ However, these approaches suffer from typical shortcomings of stereovision-based mapping, including sensitivity to false positives when observation time of the environment is short (due to a slower convergence rate while updating occupancy probability). In Lategahn et al.,¹⁹ an interesting occupancy grid-mapping technique was presented that uses both dense and sparse point clouds from stereovision and SLAM, respectively (structure and motion). In this case, although the algorithm appears to address false positives relatively well, there is a weakness in encountering outliers, as no multi-view, multi-baseline stereo-matching technique is utilized. Several other techniques in the literature attempt to utilize multi-baseline stereo at the level of local stereo matching or multi-view stereo in order to filter noise, such as Kuhn et al.,²⁰ Milella et al.,²¹ Gallup et al.,²² and Wang et al.,²³ but no technique was found to use higher level, multi-baseline filtering or to use such filtering in combination with multi-view stereo filtering. Regardless of the number of baselines or views, a good feature-point detector is absolutely essential for vision-based navigation.²⁴ The 3D point cloud retrieved via stereovision can be further filtered to detect good feature points. An accurate approach used for mapping applications is to detect 3D feature points within the point cloud provided by the stereovision camera. Well-known techniques for such purposes are point feature histogram (PFH),²⁵ normal aligned radial feature (NARF) points,²⁶ spin-image,²⁷ and Eigen-cascading style sheets (CSS),²⁸ as well as many other techniques.^29,30 These feature points help in the process of “point-cloud registration”²⁵ and thus result in point clouds that are well-aligned (stitched), although these have been gathered at temporally and spatially varying poses of a stereovision camera. If point-cloud registration is performed accurately, the resultant point cloud can be used for effective navigation; otherwise, a common problem, termed the “sliding problem,”²⁵ renders the overall mapping process erroneous. Many methods deal with the accuracy of 3D reconstruction, but ground-plane detection using minimal or noisy 3D data³¹ and detection of dominant planes in the environment is a detailed area of research in itself. Plane fitting is a common method for ground estimation. Among surveys, very few works suggest a method that detects both dominant and subdominant planes along with the height-based classification of obstacles. Among other obstacle-detection methods, some researchers such as Ambrosch and Kubinger³² and Matthies and Shafer³³ completely ignore the significance of detecting the ground plane and variations within it, while others such as Fazl-Ersi and Tsotsos³⁴ use pixel-based region segmentation and classification techniques that may or may not be able to classify floor anomalies. The random sample consensus (RANSAC) plane filter is a commonly used method to find disparities in ground pixels.³⁵ A road-detection algorithm called plane-fitting errors was developed by X Li et al.³⁶ This algorithm utilizes road features. In recent years, the V-disparity image has become common for ground-plane estimation.^37–39 V-disparity and plane fitting both depend on distinct road features. An additional issue in stereovision-based navigation is obstacle detection. The authors in Ghayalod and Hall⁴⁰ gave an introduction to an obstacle-detection approach based on stereovision proposed for mobile-robot navigation. The aim of the approach is to detect a real environment in order to allow a mobile robot to find a safe path even in complex scenarios. The approach is novel in terms of the two-stage perception structure. The detection stage infers the relationship between obstacles and the ground and determines the region of interest (ROI). Based on this, the confirmation stage focuses on characterizing the contours and positions of obstacles and removing the majority of artifacts. A vision-equipped, apelike robot based on the remote-brained approach was presented by Inaba et al.,⁴¹ meant to study a variety of vision-based behaviors. The robot does not bring its own brain with its body, but leaves the brain in the mother environment and communicates with it by radio link. The brain is raised in the mother environment, which is inherited over generations. In this framework, the robot system can have a powerful vision system in the brain environment. They have applied this approach toward the formation of vision-based, dynamic, and intelligent behavior of a multi-limbed mobile robot.

Proposed 3MS system

3MS enhances the mapping operation by building an accurate and real-time 3D map to use in any unstructured, unknown environment, regardless of the kind of application. 3MS fuses multi-baseline stereovision (narrow and wide baselines) and laser-range reading data to enhance the accuracy of the point cloud. Therefore, it combines the advantages of multi-baseline stereovision, a laser-range device, and multi-view (stereovision camera and laser-range device), as follows:

The wide baseline and high resolution provide more precise 3D points at longer ranges.

The narrow baseline improves close-range matching and solves minimum-range limitations.

The laser-range device extends the 2D field of view (FOV) to 180°.

The laser-range device allows the system to work under different light conditions.

Combining wide/narrow baselines and a laser reduces the ambiguity of correspondence matching, enhances its accuracy, and extends the FOV of 3MS to 180°.

Combining stereovision and a laser allows 3MS to prune the horizontal angle of the ROI of the 3D point clouds to 66° and to work within a 180° FOV.

Pruning the ROI of the 3D point clouds reduces the computational burden involved in the stereo process.

The overall vision of the 3MS is a horizontal angle of 180° on the x-axis and a vertical angle of 32° on the front (66°) y-axis, and 7 m on the z-axis.

The system therefore constructs better 3D point clouds to map the environment. 3MS consists of four modules, each executing specific tasks: (1) the initial processing module (IPM), (2) the pre-processing module (PPM), (3) the stereo processing module (SPM), and (4) the post-processing module (PM), as illustrated in Figure 2. In this section, we discuss in detail each component and module in the 3MS system.

Figure 2.

General modules of the proposed system.

ROI of 3MS

The ROI of the proposed 3MS is composed of four intersecting FOVs. Three FOVs belong to the stereovision camera (wide at 24 cm, left narrow at 12 cm, and right narrow at 12 cm), as illustrated in Figure 3(a), and one belongs to the laser device (Figure 3(b)). $R_{1}$ is the ROI of the left narrow baseline, with maximum horizontal angle of 43° on the x-axis, maximum vertical angle of 32° on the y-axis, and a z-axis of 0.2–5.0 m. $R_{2}$ is the ROI of the wide baseline, with maximum horizontal angle of 66° on the x-axis, maximum vertical angle of 32° on the y-axis, and z-axis of 0.9–7.0 m. $R_{3}$ is the ROI of the right narrow baseline, with maximum horizontal angle of 43° on the x-axis, maximum vertical angle of 32° on the y-axis, and z-axis of 0.2–5.0 m. The fourth ROI $R_{4}$ is of the laser-range device, with horizontal angle from 33° to 90° and from −33° to −90° on the x-axis, maximum vertical angle of 1° on the y-axis, and z-axis of 0.0 m to the working range. To the best of our knowledge, our system is the first proposed to use such a fusion.

Figure 3.

ROIs of 3MS: (a) ROI of three stereo baselines (R₁ is red, R₂ is green, and R₃ is blue) and (b) ROI of the laser device (R₄).

Initial processing module

IPM is the startup module for the system. It initiates system execution by starting up and configuring the system components (mobile robot, stereovision camera, and laser-range device), as illustrated in Figure 4.

Figure 4.

Initial processing module.

Pre-processing module

PPM enhances the captured raw images from the stereo camera for the SPM. It comprises four main steps: (1) determining the ROI (R₁–R₄), see Figures 3 and 9; (2) capturing the stereo images; and (3) image rectification and edge detection. As can be seen in Figure 5, the first step is to determine the ROI of the three baselines of the stereovision camera. The second step is to capture images with three distinct baseline configurations: wide (24 cm), left narrow (12 cm), and right narrow (12 cm). The third step is to rectify the images and detect the images’ edges.

Figure 5.

Pre-processing module.

Stereo processing module

SPM receives the stereo images from the PPM, generates the point clouds of each baseline, and then merges these clouds. It comprises four main steps (Figure 6): (1) stereo matching, (2) point-cloud extraction, (3) point-cloud filtering, and (4) point-cloud fusion. The first step performs the matching processing of each pair of images (left narrow, wide, and right narrow). The second step extracts the point cloud of each pair of images (according to the proposed stereo ROI, R₁–R₃, see Figure 3(a)), and then the third step filters each point cloud for unnecessary data. The fourth merges all stereo point clouds (R₁–R₃) into one point cloud, as illustrated in Figure 7. The matching process is performed via the sum of absolute differences (SAD) algorithm. Surface- and texture-validation filters are used to enhance the stereo-matching operation. Many parameters affect matching processing, such as image resolution, disparity range, mask size, surface validation, and texture validation. Two image resolutions are used, 1280 × 960 and 640 ×480 for wide and narrow baselines, respectively. Maximum disparity ranges of 240 are used for both baselines. The mask size is set to 13, the threshold of the texture-validation filter is set to 0.8, and the size of the surface-validation filter is set to 120.

Figure 6.

Stereo processing module.

Figure 7.

Fused point cloud.

Post-processing module

PM fuses the laser reading and point clouds from the stereovision to produce the output map. It comprises two main steps (see Figure 8): (1) location adjustment and (2) fusion. The location adjustment step adjusts the position of the laser-reading points and point clouds with respect to the position of the robot. This enhances the fusion of the laser-reading data with the point clouds from the stereovision of 3MS to deduce reliable localization information. All point clouds, R₁–R₄ are fused into one point cloud, as illustrated in Figure 9(a)–(c). Figure 9(a) illustrates the rear view of the fused point cloud, while Figure 9(b) illustrates the side view of the fused point cloud and Figure 9(c) illustrates the top view of the fused point cloud.

Figure 8.

Post-processing module.

Figure 9.

Fused point cloud of the proposed 3MS: (a) rear view of the proposed ROI; (b) side view of the proposed ROI; (c) top view of the proposed ROI.

System integration

3MS comprises four modules—the IPM, PPM, SPM, and PM—that, as illustrated in Figure 10, each execute specific tasks and interact with other modules. The interaction between all these modules represents the whole system. As in the flowchart shown in Figure 10, 3MS begins with IPM to prepare the system components (mobile robot, stereovision camera, and laser-range device) and then switches control of the 3MS to the PPM. The PPM enhances the captured raw images from the stereo camera for the SPM. The PPM determines the ROI, captures stereo images, rectifies the images, and detects the images’ edges. Thereafter, control switches the flow of 3MS to the SPM. SPM receives the stereo images from the PPM, generates the point clouds of each baseline, and then merges these clouds. The SPM performs stereo-matching, point-cloud extraction, point-cloud filtering, and point-cloud fusion steps.

Figure 10.

3MS integration.

After that, 3MS switches control to the PM. PM merges the laser readings and point clouds from the stereovision to produce the output map. It performs two main steps: (1) location adjustment and (2) fusion. Combining wide/narrow baselines and laser-range data reduces the ambiguity of correspondence matching, enhances accuracy, and extends the FOV of 3MS to 180°. And, the combination allows the 3MS to prune the horizontal angle of the ROI of the 3D point clouds to 66° and to work within a 180° FOV. In addition, pruning the ROI of the 3D point clouds reduces the computational time required by the stereo process.

Experiments

Experimental platform

The proposed system integrates three different types of hardware: a mobile robot, a stereo camera, and a laser-range device.

Mobile-robot platform

We used the PowerBot as mobile-robot platform⁴² (Figure 11). PowerBot is a wheeled mobile robot from Pioneer Mobile Robots, Inc. It is an automated, differential-drive-guided vehicle specially designed and equipped for autonomous, intelligent delivery, and handling of large payloads. It is a member of the Pioneer series of mobile robots, which are research and development platforms. It can carry a load of up to 100 kg on its flat-top surface and navigate with a speed of up to 1.6 m/s. It comes with ultrasonic sensors, an optional vision system, and laser-rangefinder accessories attached to its front and integrates a full-size PC with limited specifications. We improved these specifications to the following: Intel Core i7 Processor, 16 GB RAM, 40 GB Solid-State Hard Drive, and Windows Embedded Standard 7 64-bit Operating System.

Figure 11.

PowerBot mobile robot.

Stereo camera

We used a Bumblebee XB3 camera⁴³ as a stereo camera (Figure 12). The Bumblebee XB3 is mounted on top of the robot to provide information about the environment. The Bumblebee XB3 is a three-sensor, multi-baseline, IEEE-1394b (800Mb/s) camera. It features 1.3 megapixel sensors and has two baselines available for stereo processing. Its wide baseline and high resolution provide more precision at longer ranges, while the narrow baseline improves close-range matching and reduces minimum-range limitations. We used a Bumblebee camera with 1280 × 960@7.5 FPS resolution for the wide baseline and 640 × 480@15 FPS resolution for the narrow baseline.

Figure 12.

Bumblebee XB3 camera.

Laser-range device

3MS uses the laser-range device that is mounted on the top of the front bumper of the PowerBot (SICK LMS-200, see Figure 13(a)).⁴² The laser-range beam is 220 mm above the floor (Figure 13(b)). Data from the scanner are serial, acquired through the COM2 serial port on the onboard computer. SICK LMS-200 is located in the horizontal plane, relative to the center of the robot. This provides an offset from the laser readings’ origin to the robot’s position. X is the forward-back position (positive forward, negative backward), Y is the left-right position (positive right, negative left), and theta is the angle offset (counter-clockwise, usually 0). Left FOV and Right FOV indicate the total FOV or angle of sweep of the laser in degrees (typically, these are equivalent in magnitude since the FOV of the SICK LMS-200 is always centered). The integrated system platform is shown in Figure 14.

Figure 13.

Laser-range device: (a) SICK LMS-200 and (b) laser mounted location.

Figure 14.

Integrated system platform.

Experimental results

3MS was tested successfully many times in different indoor environments and in scenarios with varying degrees of obstacle complexity. The fused ROI depends on the laser reading and the stereovision output, as explained in the following:

R₁ is the ROI of the left narrow baseline, with maximum horizontal angle of 43° on the x-axis, maximum vertical angle of 32° on the y-axis, and z-axis of 0.2–5.0 m.

R₂ is the ROI of the wide baseline, with maximum horizontal angle of 66° on the x-axis, maximum vertical angle of 32° on the y-axis, and z-axis of 0.9–7.0 m.

R₃ is the ROI of the right narrow baseline, with maximum horizontal angle of 43° on the x-axis, maximum vertical angle of 32° on the y-axis, and z-axis of 0.2–5.0 m.

R₄ is the laser-range device, with maximum horizontal angle of 180° on the x-axis, maximum vertical angle of 1° on the y-axis, and z-axis of 0.0 m to the working range.

The first scenario included differently sized and shaped obstacles at different distances. The first obstacle was rectangular. Its center was located at X: 0.7 m, Y: 0.25 m, and Z: 1.5 m. The second obstacle was a chair. Its center was located at X: −0.33 m, Y: 0.42 m, and Z: 2.5 m. Some snapshots of the 3D point clouds from some experiments with the first scenario using the 3MS are displayed in Figure 15. Figure 15(a) shows the snapshot from a front angle, and Figure 15(b) shows the snapshot from a right angle, while Figure 15(c) and (d) shows the front angle snapshots of the chair and rectangular obstacles, respectively.

Figure 15.

Snapshot results of the proposed system (first scenario): (a) first scenario (front angle), (b) first scenario (right angle), (c) chair (front angle), and (d) rectangle (front angle).

The second scenario included a wall. Its center was located at X: 0.1 m, Y: 0.33 m, and Z: 1.5 m. Some snapshots of the 3D point clouds from some experiments with second scenario using the 3MS are displayed in Figure 16. Figure 16(a) shows the snapshot from a front angle, and Figure 16(b) shows the snapshot from a right angle. Table 1 shows the accuracy of the 3MS with respect to ground truth. The absolute errors were X: 0.0 m, Y: 0.03 m, and Z: 0.02 m for the rectangular obstacle; X: 0.02 m, Y: 0.01 m, and Z: 0.01 for the chair; and X: 0.01 m, Y: 0.02 m, and Z: 0.01 for the wall.

Figure 16.

Snapshot results of the proposed system (second scenario): (a) second scenario (front angle) and (b) second scenario (right angle).

Table 1.

3MS accuracy with respect to ground truth.

Scenario	Obstacle	Actual center location (m)	Measured center location (m)	Absolute error (m)	Maximum coverage area
First scenario	Rectangle	X: 0.7 Y: 0.25 Z: 1.5	X: 0.7 Y: 0.26 Z: 1.52	X: 0.0 Y: 0.01 Z: 0.02	H. angle: 180° V. angle: 32°
First scenario	Chair	X: −0.33 Y: 0.42 Z: 2.5	X: −0.34 Y: 0.43 Z: 2.51	X: 0.01 Y: 0.01 Z: 0.01	H. angle: 180° V. angle: 32°
Second scenario	Wall	X: 0.1 Y: 0.33 Z: 1.5	X: 0.11 Y: 0.35 Z: 1.51	X: 0.01 Y: 0.02 Z: 0.01	H. angle: 180° V. angle: 32°

3MS: multi-sensors multi-baseline mapping system.

As can be seen in Figure 15 and in Table 1, the proposed 3MS system extracted very clear, noiseless, accurate, and high-quality 3D point clouds of the environment and obstacles. The fusion between multi-baseline stereovision and laser-range data constructs an accurate map of the environment using 3D point clouds.

Comparison

Testing the whole system is more important than just testing each module individually. To the best of our knowledge, most work in the literature lacks such testing. Hence, in this research, we present the accuracy of our complete integrated system in different environments. To show the accuracy and robustness of the proposed 3MS, we do the following:

Compare the proposed 3MS with several systems;

Compare the accuracy of the proposed 3MS system with the work of H-J Yoon et al.,⁴⁴ and RH Thaher and ZK Hussein.⁴⁵ (what work we were able to find in the literature).

3MS comparison

In this section, we will compare the proposed 3MS with three different systems. The first uses a laser-range device (laser system), the second uses a stereovision camera with wide baselines (wide-baseline system), and the third uses a stereovision camera with narrow baselines (narrow-baseline system). We executed all four systems (the proposed 3MS, laser system, wide-baseline system, and narrow-baseline system) in the same environment using the same computer, scenarios, and obstacle configurations. All systems were tested in the same environment with differently sized and shaped obstacles at different distances, the same setup as described above. As illustrated in Table 2, all systems detected the obstacles (except the narrow-baseline system in the second scenario), but in terms of accuracy, angle of vision (both horizontal and vertical), and noiselessness, the proposed 3MS prevails.

Table 2.

Comparison of results between proposed system, laser system, wide-baseline system, and narrow-baseline system.

Scenario	System	Obstacle	Actual center location (m)	Measured centerlocation (m)	Absolute error (m)	Maximum coverage area
First scenario	3MS	Rectangle	X: 0.7 Y: 0.25 Z: 1.5	X: 0.7 Y: 0.26 Z: 1.52	X: 0.0 Y: 0.01 Z: 0.02	H. angle: 180° V. angle: 32°
	3MS	Chair	X: −0.33 Y: 0.42 Z: 2.5	X: −0.34 Y: 0.43 Z: 2.51	X: 0.01 Y: 0.01 Z: 0.01	H. angle: 180° V. angle: 32°
	Laser system	Rectangle	X: 0.7 Z: 1.5	X: 0.62 Z: 1.57	X: 0.07 Z: 0.07	H. angle: 180° V. angle: 1°
	Laser system	Chair	X: −0.33 Z: 2.5	X: −0.38 Z: 2.82	X: 0.05 Z: 0.32	H. angle: 180° V. angle: 1°
	Wide-baseline system	Rectangle	X: 0.7 Y: 0.25 Z: 1.5	X: 0.63 Y: 0.26 Z: 1.52	X: 0.07 Y: 0.01 Z: 0.02	H. angle: 66° V. angle: 32°
	Wide-baseline system	Chair	X: −0.33 Y: 0.42 Z: 2.5	X: −0.34 Y: 0.49 Z: 2.38	X: 0.01 Y: 0.07 Z: 0.12	H. angle: 66° V. angle: 32°
	Narrow-baseline system	Rectangle	X: 0.7 Y: 0.25 Z: 1.5	X: 0.63 Y: 0.16 Z: 1.52	X: 0.07 Y: 0.09 Z: 0.02	H. angle: 43° V. angle: 32°
	Narrow-baseline system	Chair	X: −0.33 Y: 0.42 Z: 2.5	X: −0.17 Y: 0.39 Z: 2.84	X: 0.16 Y: 0.03 Z: 0.34	H. angle: 43° V. angle: 32°
Second scenario	3MS	Wall	X: 0.1 Y: 0.33 Z: 1.5	X: 0.11 Y: 0.35 Z: 1.51	X: 0.01 Y: 0.02 Z: 0.01	H. angle: 180° V. angle: 32°
	Laser system	Wall	X: 0.1 Z: 1.5	X: 0.08 Z: 1.49	X: 0.02 Z: 0.01	H. angle: 180° V. angle: 1°
	Wide-baseline system	Wall	X: 0.1 Y: 0.33 Z: 1.5	X: 0.07 Y: 0.44 Z: 1.44	X: 0.06 Y: 0.11 Z: 0.06	H. angle: 66° V. angle: 32°
	Narrow-baseline system	Wall	Fail	Fail	Fail	H. angle: 66° V. angle: 32°

For the rectangular obstacle of the first scenario, the absolute errors were X: 0.0 m, Y: 0.01 m, and Z: 0.01 m for 3MS; X: 0.07 m and Z: 0.07 m for the laser system; X: 0.07 m, Y: 0.01 m, and Z: 0.02 m for the wide-baseline system; and X: 0.07 m, Y: 0.09 m, and Z: 0.02 m for the narrow-baseline system. Figures 17 –19 illustrate the x, y, and z values, respectively, regarding the compared systems detection of the rectangular obstacle.

Figure 17.

The location of the x-axis of the rectangular obstacle in the first scenario.

Figure 18.

The location of the y-axis of the rectangular obstacle in the first scenario.

Figure 19.

Location of the z-axis of the rectangular obstacle in the first scenario.

For the chair obstacle in the first scenario, the absolute errors were X: 0.01 m, Y: 0.01 m, and Z: 0.01 for the 3MS; X: 0.05 m and Z: 0.32 for the laser system; X: 0.01 m, Y: 0.07 m, and Z: 0.12 for the wide-baseline system; and X: 0.16 m, Y: 0.03 m, and Z: 0.34 for the narrow-baseline system. Figures 20 –22 illustrate the x, y, and z values, respectively, detected for the chair obstacle by the compared systems.

Figure 20.

Location of the x-axis of the chair obstacle in the first scenario.

Figure 21.

Location of the y-axis of the chair obstacle in the first scenario.

Figure 22.

Location of the z-axis of the chair obstacle in the first scenario.

For the wall obstacle in the second scenario, the absolute errors were X: 0.01 m, Y: 0.02 m, and Z: 0.01 for the 3MS; X: 0.02 m and Z: 0.01 m for the laser system; and X: 0.06 m, Y: 0.11 m, and Z: 0.06 m for the wide-baseline system. The narrow-baseline system failed to detect the obstacle. Figures 23 –25 illustrate the x, y, and z values, respectively, detected for the wall obstacle by the compared systems.

Figure 23.

Location of the x-axis of the wall obstacle in the second scenario.

Figure 24.

Location of the y-axis of the wall obstacle in the second scenario.

Figure 25.

Location of the z-axis of the wall obstacle in the second scenario.

As we can see in Figures 17 –24 and in Table 2 –4, compared to the other systems, the proposed 3MS extracted very clear, noiseless, and accurate 3D point clouds of the environment and obstacles with large vision angles (horizontal angle of 180° and vertical angle of 32°). The wide-baseline system follows the 3MS in its accuracy, but with some noise in certain snapshots and with small vision angles (horizontal angle of 66° and vertical angle of 32°). The narrow-baseline system was the worst, failing in the second scenario. The laser system produced somewhat accurate depictions of the environment, but in 2D (horizontal angle of 180° and vertical angle of 1°).

Table 3.

Snapshots resulting from the proposed 3MS, laser system, wide-baseline system, and narrow-baseline system.

System	Point clouds (first scenario)	Point clouds (second scenario)
3MS
Laser system
Wide-baseline system
Narrow-baseline system		Failed

Table 4.

Snapshots of obstacles by the proposed 3MS, laser system, wide-baseline system, and narrow-baseline system.

Module	Rectangle	Chair	Wall
3MS
Laser system
Wide-baseline system
Narrow-baseline system

3MS: multi-sensors multi-baseline mapping system.

Comparison with other work

We compared the accuracy of the depth estimation by the proposed 3MS module with the work of H-J Yoon et al.,⁴⁴ and with that of RH Thaher and ZK Hussein.⁴⁵ The results of this comparison are shown in Table 5 and Figure 26, from far 3 m to near 0.5 m. The proposed 3MS module shows better results than the compared systems,⁴⁴ with a maximum error of 0.02 m compared to 0.2 m in the other method.

Table 5.

First comparison, accuracy of depth estimation.

Real depth (m)	Proposed 3MS		Compared method⁴⁴
Real depth (m)	Estimated depth (m)	Errors (m)	Estimated depth (m)	Errors (m)
3	3.01	0.01	2.94	0.06
2.5	2.51	0.01	2.47	0.03
2	2.01	0.01	2.05	0.05
1.8	1.81	0.02	1.87	0.07
1.7	1.71	0.01	1.79	0.09
1.5	1.52	0.02	1.60	0.10
1.3	1.32	0.02	1.47	0.17
1	1.02	0.02	1.20	0.20

3MS: multi-sensors multi-baseline mapping system.

Figure 26.

First comparison, accuracy of depth estimation.

The results of the comparison shown in Table 6 and Figure 27 are from far 313 m to near 121 m. The proposed 3MS module shows better results than the compared method.⁴⁵ The maximum error of the proposed 3MS was 2 cm, while the compared method had an error of 10.58 cm.

Table 6.

Second comparison, accuracy of depth estimation.

Experiment no.	Real depth (cm)	Proposed 3MS		Compared method⁴⁵
Experiment no.	Real depth (cm)	Estimated depth (cm)	Errors (cm)	Estimated depth (cm)	Errors (cm)
1	121	123	2	121.88	0.88
2	160	161	1	159.02	0.98
3	236	237	1	234.59	1.41
4	297	298	1	302.7	5.7
5	313	314	1	323.58	10.58

3MS: multi-sensors multi-baseline mapping system.

Figure 27.

Second comparison, accuracy of depth estimation.

Time execution

In terms of execution time, pruning point clouds in the PPM reduce the computational operations required by the SPM and PM. We executed 3MS 10 times with and without pruning the point clouds. As can be seen in Table 7 and Figure 28, pruning the point clouds in the PPM reduces average execution times from 882.7 to 771.5 ms, 637.8 to 377.2 ms, and 22.3 to 15.8 ms for PPM, SPM, and PM, respectively, along with reducing the total run time from 1.54 to 1.16 s.

Table 7.

Execution time of 3MS.

Execution time of 3MS (ms)					Execution time of 3MS without pruning point clouds (ms)
	PPM	SPM	PM	Total run time	PPM	SPM	PM	Total run time
1	780	405	16	1201	870	642	21	1533
2	780	358	17	1155	890	658	24	1572
3	765	354	15	1134	904	598	19	1521
4	764	401	16	1181	898	656	22	1576
5	765	386	16	1167	865	687	29	1581
6	764	346	16	1126	903	611	21	1535
7	780	388	16	1184	821	641	21	1483
8	780	351	15	1146	872	622	23	1517
9	766	381	15	1162	880	664	24	1568
10	771	402	16	1189	924	599	19	1542
Avg (std. dev)	771.5 ms (7.5)	377.2 ms (22.9)	15.8 ms (0.6)	1.164 s (24.59)	882.7 ms (28.4)	637.8 ms (29.6)	22.3 ms (2.9)	1.542 s (31.4)

3MS: multi-sensors multi-baseline mapping system; PPM: pre-processing module; SPM: stereo processing module; PM: post-processing module.

Figure 28.

Compression of execution time.

Conclusion

In this research, we have enhanced the efficacy of the mapping operation by fusing multi-baseline stereovision with laser-range data. This fusion builds an accurate, real-time mapping. The proposed system is used to build an accurate, real-time map for mobile robots. The proposed system could be used in any kind of mobile-robot application, such as housework, security patrolling, search-and-rescue operations, material handling, manufacturing, or automated transportation systems. A comparison of the proposed system with three other mapping systems (laser system, wide-baseline system, and narrow-baseline system) shows the accuracy and robustness of the proposed system. The proposed system has the following features: (1) 3D vision, (2) noiseless point clouds, (3) accurate point clouds, (4) 180° horizontal angle and 66° vertical angle, and (5) works under different light conditions.

Footnotes

Academic Editor: Anand Thite

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RG-1437-018.

References

Thrun

Robotic mapping: a survey. In: Lakemeyer

Nebel

(eds) Exploring artificial intelligence in the new millennium. San Francisco, CA: Morgan Kaufmann Publishers, 2002, pp.1–35.

Dudek

Jenkin

Computational principles of mobile robotics. Cambridge: Cambridge University Press, 2010.

Durrant-Whyte

Majumder

Thrun

. A Bayesian algorithm for simultaneous localisation and map building. In: Thrun

Brooks

Durrant-Whyte

(eds) Robotics research. Berlin: Springer, 2003, pp.49–60.

Dissanayake

Durrant-Whyte

Bailey

A computationally efficient solution to the simultaneous localisation and map building (SLAM) problem. In: Proceedings ICRA’00 IEEE international conference on robotics and automation, San Francisco, CA, 24–28 April 2000. New York: IEEE.

Leonard

Feder

HJS

. A computationally efficient method for large-scale concurrent mapping and localization. In: Robotics research international symposium, 2000, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.4705&rep=rep1&type=pdf

Thrun

Burgard

Fox

A probabilistic approach to concurrent mapping and localization for mobile robots. Auton Robot 1998; 5: 253–271.

Elfes

Sonar-based real-world mapping and navigation. IEEE J Robot Autom 1987; 3: 249–265.

Kurz

Constructing maps for mobile robot navigation based on ultrasonic range data. IEEE T Syst Man Cy B 1996; 26: 233–242.

Hoppen

Knieriemen

Von Puttkamer

. Laser-radar based mapping and navigation for an autonomous mobile robot. In: Proceedings of 1990 IEEE international conference on robotics and automation, Cincinnati, OH, 13–18 May 1990. New York: IEEE.

10.

Siegwart

Nourbakhsh

Scaramuzza

Introduction to autonomous mobile robots. Cambridge, MA: MIT Press, 2011.

11.

Fiorillo

AS.

Design and characterization of a PVDF ultrasonic range sensor. IEEE T Ultrason Ferr 1992; 39: 688–692.

12.

Zhang

Liu

Dai

Light field from micro-baseline image pair. Displays 2013; 32: 221.

13.

Joshi

Zitnick

CL.

Micro-baseline stereo. Technical report MSR-TR-2014-73, Microsoft Research, 2014.

14.

Ito

Ishii

. Range and shape measurement using three-view stereo analysis. In: Proceedings of the international conference on computer vision and pattern recognition, Miami Beach, FL, 1986, pp.9–14.

15.

Yang

Tang

Ahuja

Efficient and robust specular highlight removal. IEEE T Pattern Anal 2015; 37: 1304–1311.

16.

Blake

Brelstaff

Specular stereo. In: IJCAI, 18 August 1985, pp. 973–976.

17.

Elfes

Using occupancy grids for mobile robot perception and navigation. Computer 1989; 22: 46–57.

18.

Murray

Jennings

. Stereo vision based mapping and navigation for mobile robots. In: Proceedings of 1997 IEEE international conference on robotics and automation, Albuquerque, NM, 20–25 April 1997. New York: IEEE.

19.

Lategahn

Derendarz

Graf

. Occupancy grid computation from dense stereo and sparse structure and motion points for automotive applications. In: 2010 IEEE intelligent vehicles symposium (IV), San Diego, CA, 21–24 June 2010. New York: IEEE.

20.

Kuhn

Hirschmüller

Mayer

Multi-resolution range data fusion for multi-view stereo reconstruction. In: Weickert

Hein

Schiele

(eds) Pattern recognition. Berlin: Springer, 2013, pp.41–50.

21.

Milella

Reina

Foglia

. A multi-baseline stereo system for scene segmentation in natural environments. In: 2013 IEEE international conference on technologies for practical robot applications (TePRA), Woburn, MA, 22–23 April 2013. New York: IEEE.

22.

Gallup

Frahm

J-M

Mordohai

. Variable baseline/resolution stereo. In: 2008 CVPR IEEE conference on computer vision and pattern recognition, Anchorage, AK, 23–28 June 2008. New York: IEEE.

23.

Wang

Guzman

. Real time obstacle detection for AGV navigation using multi-baseline stereo. In: Rus

Singh

(eds) Experimental robotics VII. Berlin: Springer, 2001, pp.561–568.

24.

Lemaire

Berger

Jung

I-K

. Vision-based slam: stereo and monocular approaches. Int J Comput Vision 2007; 74: 343–364.

25.

Rusu

Blodow

Beetz

Fast point feature histograms (FPFH) for 3D registration. In: 2009 ICRA’09 IEEE international conference on robotics and automation, Kobe, Japan, 12–17 May 2009. New York: IEEE.

26.

Steder

Rusu

Konolige

. Point feature extraction on 3D range scans taking into account object boundaries. In: 2011 IEEE international conference on robotics and automation (ICRA), Shanghai, China, 9–13 May 2011. New York: IEEE.

27.

Johnson

Hebert

Using spin images for efficient object recognition in cluttered 3D scenes. IEEE T Pattern Anal 1999; 21: 433–449.

28.

Stiene

Lingemann

Nuchter

. Contour-based object detection in range images. In: Third international symposium on 3D data processing, visualization, and transmission, Chapel Hill, NC, 14–16 June 2006. New York: IEEE.

29.

Zach

Penate-Sanchez

Pham

M-T

. A Dynamic programming approach for fast and robust object pose recognition from range images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, 7–12 June 2015. New York: IEEE.

30.

Martínez

Loncomilla

Ruiz-del-Solar

. Object recognition for manipulation tasks in real domestic settings: a comparative study. In: Bianchi

RAC

Akin

Ramamoorthy

. (eds) RoboCup 2014: robot World Cup XVIII. Berlin: Springer, 2015, pp.207–219.

31.

Shi

Wang

Yin

. High-accuracy stereo matching based on adaptive ground control points. IEEE T Image Process 2015; 24: 1412–1423.

32.

Ambrosch

Kubinger

Accurate hardware-based stereo vision. Comput Vis Image Und 2010; 114: 1303–1316.

33.

Matthies

Shafer

Error modeling in stereo navigation. IEEE J Robot Autom 1987; 3: 239–248.

34.

Fazl-Ersi

Tsotsos

JK.

Region classification for robust floor detection in indoor environments. In: Kamel

Campilho

(eds) Image analysis and recognition. Berlin: Springer, 2009, pp.717–726.

35.

Fischler

Bolles

RC.

Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981; 24: 381–395.

36.

Yao

Murphey

. A real-time vehicle detection and tracking system in outdoor traffic scenes. In: ICPR 2004 proceedings of the 17th international conference on pattern recognition, Cambridge, 23–26 August 2004. New York: IEEE.

37.

Rebut

Toulminet

Bensrhair

. Road obstacles detection using a self-adaptive stereo vision sensor: a contribution to the ARCOS French project. In: 2004 IEEE intelligent vehicles symposium, Parma, 14–17 June 2004. New York: IEEE.

38.

Labayrade

Aubert

Tarel

J-P

. Real time obstacle detection in stereovision on non flat road geometry through “v-disparity” representation. In: 2002 IEEE intelligent vehicle symposium, Versailles, 17–21 June 2002. New York: IEEE.

39.

Broggi

Caraffi

Fedriga

. Obstacle detection with stereo vision for off-road vehicle navigation. In: 2005 CVPR workshops IEEE computer society conference on computer vision and pattern recognition workshops, San Diego, CA, 25 June 2005. New York: IEEE.

40.

Ghayalod

Hall

EL.

Intelligent robot control using omnidirectional vision. In: Intelligent robots and computer vision XI: algorithms, techniques, and active vision, Boston, MA, 15 November 1992. Bellingham, WA: SPIE.

41.

Inaba

Kanehiro

Kagami

. Vision-equipped apelike robot based on the remote-brained approach. In: Proceedings of 1995 IEEE international conference on robotics and automation, Nagoya, 21–27 May 1995. New York: IEEE.

42.

Adept Mobile Robots. http://www.mobilerobots.com/ResearchRobots/PowerBot.aspx

43.

Point Grey. Bumblebee^® XB3, 2014, https://www.ptgrey.com/bumblebee-xb3-1394b-stereo-vision-camera-systems-2

44.

Yoon

H-J

Hwang

Y-C

Cha

E-Y

. Real-time container position estimation method using stereo vision for container auto-landing system. In: 2010 international conference on control automation and systems (ICCAS), Seoul, South Korea, 27–30 October 2010. New York: IEEE.

45.

Thaher

Hussein

ZK.

Stereo vision distance estimation employing SAD with canny edge detector. Int J Comput Appl T 2014; 107: 38–43.