Sage Journals: Discover world-class research

Abstract

In this paper, we propose a simple method to obtain an object’s 3D coordinate information in one image, using one monocular camera and two 2D LiDAR sensors, which are widely used low-cost sensors. An extrinsic calibration method is used for each LiDAR sensor to transfer LiDAR coordinate to camera pixel coordinate in one image. The proportion factor $η$ is calculated, based on the relationship of pixel from calibration points and distance from detected points of two LiDAR sensors. In order to create a correct 3D map, a deep learning algorithm for real-time object contour detection with a fully Convolutional Neural Network is proposed, through gathering data, labeling data, training model, and testing object contour detector. For 3D map building, by the proportion factor $η$ and the detector contour feature pixels, the height of the object is obtained. Combining with LiDAR sensor’s detection information, the 3D posture of the object is obtained. Based on the proposed extrinsic sensor calibration method and the object contour detect result, virtual points of the object are calculated. Finally, the experiment results show the 3D coordinate information of the detected single or multiple objects. Assembling sensors and a controller on a Kobuki robot, the indoor 3D map is built based on the proposed methods.

Keywords

Sensor fusion simple extrinsic calibration method of a monocular camera and 2D LiDAR sensors object’s 3D coordinate information real-time object contour detector fully convolutional neural network 3D map building

Introduction

In self-driving system or robot control system, perception, planning and control are three core parts.¹ Sensor, controller and actuator are connected to form a close-loop network control system.² For perception, LiDAR, camera, and GNSS sensors are used. About motion control, there are sliding mode control algorithm,^3,4 adaptive control algorithm,⁵ and so on. Sensor data processing are the prerequisites for the successful operation of the system. Recently, Neural Network control has been an important research topic recently.⁶ Machine vision has a wide detection range and abundant signals. It is widely used in object detection. However, it is easily affected by external factors. Using a monocular camera, the image contains only two-dimensional information of the measured point, and lacks the distance information from the measured point to the lens center. Therefore, it is needed to use other types of sensors to assist measurement. LiDAR for object detection uses the scanner to obtain the two-dimensional or three-dimensional position information of the object, but it lacks the color information of the object. LiDAR and camera jointly calibrate and recognize objects, mainly using the depth information of the object measured by LiDAR and the object image information collected by camera. Merging the depth and image information to identify the object, the object’s color, size, and position are obtained. High-precision localization and 3-dimensional mapping in an indoor environment has been hot research issues in the field of mobile measurement. LiDAR and vision sensors are integrated into the indoor mobile measurement system as the main sensors, in order to solve the above two issues. The monocular camera can obtain color information in the environment, and the laser can obtain spatial position information. The combination of these two sensors achieves complementary advantages.⁷ In order to combine LiDAR sensor and monocular camera, calibration is the first issue needed to be solved. Calibration is mainly the problem of data point matching between the camera coordinate system and the laser coordinate system. It includes line-to-planar matching, point-to-line matching, and planar-to-planar matching. The calibration board includes checkerboard calibration board,⁸ triangle calibration board,⁹ polygonal planar boards,¹⁰ and V-shaped calibration board.¹¹ Checkerboard is the most popular pattern design. Firstly, by binarizing the camera image and finding the black chessboard fields, chessboard corners candidates are found. Secondly, a filtering step is used to obtain quadrilaterals, which satisfied with certain size criteria and are organized in a regular grid structure. In order to be detected, the entire checkerboard should be visible in all images. It is difficult to obtain the information of the very edges of images. However, these areas provide good information, because they constrain the lens distortion model properly.¹² In addition, using the checkerboard, the calibrations include intrinsic and extrinsic calibrations. Two steps calibrations require two measurements from the checkerboard, which cause two sources of error.¹⁰ The triangle calibration board or polygonal planar board estimate the vertices from the scanned range data. The vertices are used as reference points between the color image and the 3D scanned data.¹⁰ However, it is difficult to obtain all polygonal vertices using 2D LiDAR sensor. In this paper, in order to obtain the 3D coordinate information of the object on the image, two V-shaped calibration boards are used for two LiDAR sensors to found the corresponding points on scan planes and an image.

Applying computer vision (CV) and machine learning (ML)¹³ in robotics has become hot research topics recently.¹⁴ Along with the growing number of images in public and private collections, the requirement of object detection is increasing. Object detection systems play important role in reaching higher-level autonomy for robots.¹⁵ TensorFlow’s Object Detection API is a powerful tool that makes it easy to construct, train, and deploy object detection models. In most of the cases, training an entire convolutional network from scratch is time consuming and requires large datasets. This problem can be solved by using the advantage of transfer learning with a pre-trained model using the TensorFlow API. However, using this method, we can only obtain boxes around specific objects. It is hard to obtain the correct 3D map only based on object detection boxes. Object contour detection is a good solution to build the correct 3D map. Classic OpenCV Canny¹⁶ or Sobel¹⁷ object edge detectors design simple filters to detect pixels with highest gradients in their local neighborhood. These filters only focus on the color or brightness differences between adjacent pixels but not on the texture differences.¹⁸ Although object contour detection is the basis requirement for performing practical tasks,¹⁹ the exploration of object contour detection in the literature is relatively insufficient.²⁰ Deep convolutional neural networks have demonstrated remarkable ability for object contour detection.^18,21 For training contour detectors in Uijlings and Ferrar,²¹ because it is very difficult to gather high quality contour annotations, the available data set is limited. In Asmaidi et al.,¹⁸ it focuses on detecting foreground objects. When using this method to create 3D map, it may ignore some important objects.²² showed a novel method for real-time 3D traffic cone detection, however, the method is only designed for traffic cone detection, but not for other objects. In this paper, we propose a deep learning algorithm for real-time object contour detection with a fully Convolutional Neural Network. In the indoor environment, the robot should detect cones, different color paper or plastic boxes, tables, and cabinets. Based on these objects, we create image data sets with different background and lighting conditions. The gray label image is created for each image. The fully encoder and decoder network is designed for object contour detection. The proposed method yields good performance for real-time object contour detection.

Mapping the environment for indoor and outdoor robots is very important for navigation, manipulation, and planning.²³ When implementing manipulation or path planning for robots, it will reduce obstacle or collision risk, if the robot uses 3D object information instead of 2D object information. Some 3D sensors are available for 3D mapping. The Velodyne LiDAR is the popular 3D LiDAR sensors. Due to high resolution and high acquisition rate, a lot of self-driving cars use them. However, the cost of this LiDAR is high.²⁴ RGB depth cameras, which provide color and depth information, are used as a substitute for 3D LIDAR sensor.²⁵ RGB-D cameras are low cost, but are limited in sensing range (0.5–4 m), accuracy, and being affected by lightning conditions.²⁶ In addition, depth information or feature points cannot be processed in real-time due to large calculation costs.²⁷ Similarly, the output of a stereo camera²⁸ is greatly affected by the lighting conditions and the appearance of the texture of the object. A combination of a 2D laser and a pan-tilt unit is used to create 3D map, but this system spends much time to move the pan-tilt unit.²⁹ There are a lot of research on 3D map building based on object detection,³⁰ but few research on 3D map building on object contour. It is hard to obtain correct 3D object information only based on object detection boxes. Object contour detection is a good solution to build correct 3D map. In Athavale et al.,²⁷ an edge-based object tracking for dynamic projection mapping is proposed. However, if the contour does not change depending on the posture, the method will fail. Therefore, the method is not suitable for completely objective shape. In Xie and Tu,²² a novel method for real-time 3D traffic cone detection is proposed, however, the method is only designed for traffic cone detection, but not for other objects. In this paper, based on the proposed extrinsic sensor calibration method and the object contour detect result, a simple 3D map building method is proposed. The 3D map building method includes two steps. Firstly, the height of the object is obtained, and then the 3D posture of the object is obtained by combining with LiDAR sensor’s detection information. Secondly, based on the extrinsic sensor calibration results and object contour detection result, virtual points of the object are calculated. Based on the proposed methods, experiment results show effective 3D map building results detecting single or multiple different objects.

The contribution of this paper is listed as: 1. For sensor fusion, using two low-cost 2D LiDAR sensors and a monocular camera, a simple extrinsic sensor calibration method is proposed to obtain an object’s 3D coordinate information. 2. In order to obtain correct 3D object information, a real-time object contour detection based on a fully Convolutional Neural Network is proposed, through gathering data, labeling data, training model, and testing object contour detector. 3. Based on extrinsic sensor calibration method and real-time object contour detector, 3D map building method is proposed by calculating object’s 3D posture and adding virtual points. 4. The experiment results show the effectiveness and efficiency of the proposed methods.

The rest of this paper is organized as: section 2 discusses a simple extrinsic sensor calibration method; section 3 discusses object contour detection method with a fully Convolutional Neural Network; section 4 describes the 3D map building method; section 5 shows the simulation and experiment results.

A simple extrinsic sensor calibration method to obtain an object’s 3D coordinate information

Coordinate system definition

The world-robot-LiDAR-camera system is shown in Figure 1. We define there are four coordinate systems: world coordinate system, robot coordinate system, LiDAR coordinate system, and camera coordinate system. The center of the robot coordinate system, presented as ${[\begin{matrix} x^{r} & y^{r} & z^{r} \end{matrix}]}^{T}$ , is the origin of the robot. The movement direction of the robot is the x-axis, the y-axis points to the left, and the z-axis points to up, in order to form a Cartesian right-handed coordinate system. The world coordinate system ${[\begin{matrix} x^{w} & y^{w} & z^{w} \end{matrix}]}^{T}$ describes the relative position relationship among different sensors. The origin of the world coordinate is defined on the ground. The directions of the axes are the same as the axes of the robot coordinate system. Two z-axes coincide. The polar coordinate system presents the detected target information by LiDAR, including distance d and angle $α$ . In this paper, in order to obtain 3D calibration coordinate result, two Slamtech RPLiDAR A2 sensors are used. Two z-axes coincide. The direction of x-axes and y-axes of two LiDAR sensors are same. Each LiDAR follows the left-hand rule to form its coordinate system. The front of the sensor is defined as the x-axis of the coordinate system, the origin of the coordinate system is the center of rotation of the ranging core, and the rotation angle increases with the clockwise rotation. The pixel coordinate system takes the top left corner of the image collected by the camera as the origin. The x-axis and y-axis are parallel to the x-axis and y-axis of the image coordinate system, respectively. ${[\begin{matrix} u^{c} & v^{c} \end{matrix}]}^{T}$ is used to represent its coordinate value.

Figure 1.

World-robot-LiDAR-camera coordinate system.

2D LiDAR coordinate system and robot coordinate system conversion

Generally, the LiDAR coordinate $X^{l} = [x^{l}, y^{l}, z^{l}]^{T}$ is transferred to robot coordinate $X^{r} = [x^{r}, y^{r}, z^{r}]^{T}$ , based on (1), where $φ_{x}$ , $φ_{y}$ , and $φ_{z}$ are the rotation angle with respect of x-axes, y-axes, and z-axes, counter-clock-wise, respectively. $P_{x}$ , $P_{y}$ , and $P_{z}$ are the translation distance in x-axes, y-axes, and z-axes, respectively.

\begin{matrix} X^{r} = [\begin{matrix} \cos φ_{z} & - \sin φ_{z} & 0 \\ \sin φ_{z} & \cos φ_{z} & 0 \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} \cos φ_{y} & 0 & \sin φ_{y} \\ 0 & 1 & 0 \\ - \sin φ_{y} & 0 & \cos φ_{y} \end{matrix}] \\ [\begin{matrix} 1 & 0 & 0 \\ 0 & \cos φ_{x} & - \sin φ_{x} \\ 0 & \sin φ_{x} & \cos φ_{x} \end{matrix}] [\begin{matrix} x^{l} \\ y^{l} \\ z^{l} \end{matrix}] + [\begin{matrix} P_{x} \\ P_{y} \\ P_{z} \end{matrix}] \end{matrix}

(1)

As shown in Figure 1, the 2D Slamtech RPLiDAR is equipped in the middle front of the horizontal robot plate. The LiDAR detection plane and the robot coordinate plane are approximately on a two-dimensional horizontal plane as shown in Figure 2. For the LiDAR detection point $p$ , the ${[\begin{matrix} x^{l} & y^{l} & z^{l} \end{matrix}]}^{T}$ represents the LiDAR cartesian coordinate system, where $x^{l}$ and $y^{l}$ are calculated based on $d$ and $α$ as (2) and (3). $z^{l}$ represents the height of LiDAR from the ground, so $z^{l}$ is equal to (4).

x^{l} = d \times \cos α

(2)

y^{l} = d \times \sin α

(3)

z^{l} = z^{r}

(4)

Figure 2.

2D coordinate system of LiDAR and robot.

In order to easily implement robot navigation or map building processes, all sensors’ coordinate should convert to the robot coordinate. For the target detected by LiDAR, the coordinates converted to the robot coordinate system are as:

x^{r} = x_{0} + x^{l} = x_{0} + d \times \cos α

(5)

y^{r} = - y^{l} = - d \times \sin α

(6)

z^{l} = z^{r}

(7)

Simple extrinsic sensor calibration method

Transfer one LiDAR coordinate to camera pixel coordinate

The scanning points of the LiDAR makes a line in the 3D space. The transformation from 2D LiDAR coordinates to image pixel coordinate can be regarded as a planar projective transformation. The 2D LiDAR coordinate multiplies an identity matrix to convert to the image pixel coordinate, as (8).

[\begin{matrix} u^{c} \\ v^{c} \\ 1 \end{matrix}] = [\begin{matrix} h_{1} & h_{2} & h_{3} \\ h_{4} & h_{5} & h_{6} \\ h_{7} & h_{8} & h_{9} \end{matrix}] [\begin{matrix} x^{l} \\ y^{l} \\ 1 \end{matrix}]

(8)

Define scaling factor $λ = \frac{1}{h_{9}}$ and $n_{i} = \frac{h_{i}}{h_{9}}$ ( $i = 1, 2, 3, . . ., 8$ ), (8) can be written as:

λ [\begin{matrix} u^{c} \\ v^{c} \\ 1 \end{matrix}] = [\begin{matrix} n_{1} & n_{2} & n_{3} \\ n_{4} & n_{5} & n_{6} \\ n_{7} & n_{8} & 1 \end{matrix}] [\begin{matrix} x^{l} \\ y^{l} \\ 1 \end{matrix}]

(9)

In (9), $n_{i}$ ( $i = 1, 2, 3, . . ., 8$ ) is the required transformation matrix parameter. Based on (9), the following three equations (10)–(12) can be obtained.

λ u^{c} = n_{1} x^{l} + n_{2} y^{l} + n_{3}

(10)

λ v^{c} = n_{4} x^{l} + n_{5} y^{l} + n_{6}

(11)

λ = n_{7} x^{l} + n_{8} y^{l} + 1

(12)

Substitute $λ$ into (10) and (11), (13) and (14) can be obtained.

u^{c} = (n_{1} x^{l} + n_{2} y^{l} + n_{3} - n_{7} x^{l} u^{c} - n_{8} y^{l} u^{c})

(13)

v^{c} = (n_{4} x^{l} + n_{5} y^{l} + n_{6} - n_{7} x^{l} v^{c} - n_{8} y^{l} v^{c})

(14)

Assuming that $j$ sets of LiDAR points and corresponding image coordinate points are collected, the following matrix can be obtained, based on (13) and (14).

[\begin{matrix} x_{1}^{l} & y_{1}^{l} & 1 & 0 & 0 & 0 & - x_{1}^{l} u_{1}^{c} & - y_{1}^{l} u_{1}^{c} \\ 0 & 0 & 0 & x_{1}^{l} & y_{1}^{l} & 1 & - x_{1}^{l} v_{1}^{c} & - y_{1}^{l} v_{1}^{c} \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ x_{j}^{l} & y_{j}^{l} & 1 & 0 & 0 & 0 & - x_{j}^{l} u_{j}^{c} & - y_{j}^{l} u_{j}^{c} \\ 0 & 0 & 0 & x_{j}^{l} & y_{j}^{l} & 1 & - x_{j}^{l} v_{j}^{c} & - y_{j}^{l} v_{j}^{c} \end{matrix}] [\begin{matrix} n_{1} \\ n_{2} \\ n_{3} \\ n_{4} \\ n_{5} \\ n_{6} \\ n_{7} \\ n_{8} \end{matrix}] = [\begin{matrix} u_{1}^{c} \\ v_{1}^{c} \\ ⋮ \\ u_{j}^{c} \\ v_{j}^{c} \end{matrix}]

(15)

Calculate proportion factor

In Section 2.2, two LiDAR sensors are equipped in the middle front of the horizontal robot plate, and the heights of the LiDAR sensors from the ground are $z^{l 1}$ and $z^{l 2}$ , respectively. Therefore, the distances from the scanning horizontal line to the ground are $z^{l 1}$ and $z^{l 2}$ , which are positive constants. Based on section 2.3.1.1, define $[u^{c 1}, v^{c 1}]$ and $[u^{c 2}, v^{c 2}]$ are the calibration points in the image with respect of the LiDAR’s detected points $[x^{l 1}, y^{l 1}]$ and $[x^{l 2}, y^{l 2}]$ , respectively. $[x^{l 1}, y^{l 1}]$ and $[x^{l 2}, y^{l 2}]$ are calculated from set of $d^{1}$ and $α^{1}$ , and set of $d^{2}$ and $α^{2}$ , respectively. Define two detected points are selected, based on $α^{1} = α^{2}$ . Then as shown in Figure 3, the relationship of the pixel from two calibration points and the distance in z-axes from two detected points is described as:

η = \frac{v^{c 1} - v^{c 2}}{z^{l 2} - z^{l 1}}

(16)

Figure 3.

Relationship of the pixel from two calibration points and the distance in z-axes from two detected points.

where $η$ is the proportion factor, which represents the distance in z-axes of two conjunct pixels. Therefore, the distance in z-axes in real world of any two points in the image can be obtained, according to the proportion factor $η$ and the points’ pixels.

distance = η \times pixel

(17)

Real-time object contour detection method based on a fully convolutional neural network

Image collection and labeling

In this section, deep learning algorithm is applied to detect object contour based on fully Convolution Neural Network. Thousand images are collected, including cones, paper boxes, plastic boxes, sofas, basketballs, and toys. The selected objects have different colors, different texture, and different material. Each image randomly has one, two, or three objects. If one image has more than one object, the objects are placed next to each other or separately. The images have different background and lighting conditions. In addition, the objects in the image are detected in different directions. As shown in Figure 4(a), the shape of the collected image is 320 * 320 * 3. The original collected image includes box and plastic cone, basketball, sofa, and toy truck. Among the 1000 images, 800 images are used for training, and the other 200 images are used as test images. For all collected images, we label the desired objects, based on Figure 5. The camera acquires an image. And the image is resized to 320 ×320 × 3. Color threshold block coverts the color image to a binary image. Canny edge block obtains object’s edge. In the gray morphology block, the bright area of the edge dilates around the black regions of the background. We create the getCounter function to obtain the object’s contour and label image. As shown in Figure 4(b), the label image is a grayscale image. The contour of the desired object is white and the pixel is 255. The other pixels are filled in 0, and shown as black color.

Figure 4.

The original collected images: box and plastic cone, basketball, sofa, and toy truck (a), and the labeling images shown the desired object contour: box and plastic cone, basketball, sofa, and toy truck (b).

Figure 5.

Dataset creation diagram.

Neural network architecture

The fully encoder and decoder network is designed for object contour detection. As shown in Figure 6, the neural network architecture includes 16 convolutional layers, 6 max pooling layers, 6 unpooling layers, and 16 deconvolution layers. The input is a 320 ×320 ×3 color image. The output is a 320 × 320 ×1 gray image, which includes the desired object contour information. Each convolutional layer works with a batch normalization. The rectified linear units (ReLU) is the non-linear activation. Each deconvolution layer also includes a batch normalization followed by ReLU activation. The shape for each layer is depicted in Figure 6, including channels {32, 64, 128, 256, 512, 1024}. As described in De Gregorio et al.,³¹ as the convolutional layer increases, the tensor volume has more channels, but the spatial dimension is reduced. This means the tensor contains more global and advanced information than specific local information. Using the designed network architecture, the tensor has more global and advanced information to increase the output accuracy.

Figure 6.

The fully connected Neural Network Architecture.

Loss function and training

The pixel-wise logistic loss function is applied to calculate the pixel-to-pixel loss between the prediction image and the label image. In the program, the TensorFlow function is used to calculate the cross entropy of the sigmoid for the given prediction. In order to ensure stability and avoid overflow, the loss function is described as:

\begin{matrix} max (prediction, 0) - prediction * label \\ + \log (1 + \exp (- | prediction |) \end{matrix}

(18)

The momentum optimizer is used to train the model. The learning rate is 0.00001. The momentum equals 0.9. The power value used in the poly learning policy is 0.9. The value used for clipping is 1. The network is trained for 100 epochs. The maximum number of checkpoints to be saved is 50.

3D map building method

In this section, based on the proposed extrinsic sensor calibration method and the object contour detect result, a simple 3D map building method is proposed. The 3D map building method consists of two steps. Firstly, by the proportion factor $η$ and the detector contour feature pixels, the height of the object is obtained. Combining with LiDAR sensor’s detection information, the 3D posture of the object is obtained. Secondly, based on the extrinsic sensor calibration results and object contour detection result, virtual points of the object are calculated.

Calculate the 3D posture of the object

Based on (17), $η$ is the proportion factor, which represents the distance in z-axes of two conjunct pixels. If the maximum difference of the object pixel along height is obtained, the height of the object in the real world can be calculated. As shown in Figure 6, the result of the object contour detection method is an 8-bits gray level image, which only has black and white. The black pixel is 0, and the white pixel is 255. The size of the image is defined as $[x_{image}, y_{image}]$ . If multiple objects are detected, ROI (region of interest) for each object is created. The ROI can be described as $[x_{roi_lu}, y_{roi_lu}, x_{roi_rd}, y_{roi_rd}],$ where $[x_{roi_lu}, y_{roi_lu}]$ is the left-up point position and $[x_{roi_rd}, y_{roi_rd}]$ is the right-down point position. Firstly, sum of pixels in each raw is calculated, and the result is defined as $p y_{j}$ , where $j = 0, 1, \dots, y_{roi} - 1$ is the pixel raw number of the image. Based on the value of $p y_{j}$ , two sub_ROI rectangles are selected, based on (19), as the white rectangles shown in Figure 7. In (19), $α$ and $β$ are positive integer constant. If $p y_{j} > α * 255$ , all $j$ values are recorded. Indexing the $j$ value, the minimum value $j_{\min}$ , and maximum value $j_{\max}$ are used to create ROIs. For the four values used to describe the ROI, the first two values are the ROI’s left top point position, and the rest two values are the ROI’s right bottom point position.

\begin{matrix} if p y_{j} > α * 255, then \\ {\begin{matrix} ROI - 1 [x_{roi_lu}, j_{\min} - β, x_{roi_rd}, j_{\min} + β] j_{\min} \neq y_{roi_lu} \\ ROI - 2 [x_{roi_lu}, j_{\max} - β, x_{roi_rd}, j_{\max} + β] j_{\max} \neq y_{roi_rd} - 1 \end{matrix} \end{matrix}

(19)

Figure 7.

Calculation of 3D posture of the object.

When the recording value $j_{\min} = y_{roi_lu}$ and maximum value $j_{\max} = y_{roi_rd} - 1$ , two sub_ROI rectangles ROI_1 and ROI_2 should be calculated based on (20).

\begin{matrix} if p y_{j} > α * 255, then \\ {\begin{matrix} ROI - 1 [x_{roi_lu}, j_{\min}, x_{roi_rd}, j_{\min} + β] j_{\min} = y_{roi_lu} \\ ROI - 2 [x_{roi_lu}, j_{\max} - β, x_{roi_rd}, j_{\max}] j_{\max} = y_{roi_rd} - 1 \end{matrix} \end{matrix}

(20)

For each ROI, pixels in each column are calculated, and the result is defined as $p x_{k}$ , where $k = x_{roi_lu}, x_{roi_lu} + 1, \dots, x_{roi_rd} - 1$ is the column number of the image. Based on the value of $p x_{k}$ , the location of the central point is calculated based on (21), as the blue point shown in Figure 7. In (21), $γ$ is a positive integer constant. If $p x_{k} > γ * 255$ , all $k$ values are recorded. Then in (21), $\sum k$ means the sum up of the recorded $k$ .

if p x_{k} > γ * 255, then {\begin{matrix} [int (\frac{\sum k}{number of k}), j_{\min}], for ROI - 1 \\ [int (\frac{\sum k}{number of k}), j_{\max}], for ROI - 2 \end{matrix}

(21)

Based on (17), the height of the object is equal to $η * (j_{\max} - j_{\min})$ . The angle $θ$ can be calculated as (22). Based on the LiDAR information, the desired object’s information in x-axes and y-axes can be obtained. Based on (22), the desired object’s different angle in z-axis is calculated. Therefore, the 3D posture of the object is obtained.

θ = \frac{π}{2} - atan 2 (- (j_{m i n} - j_{m a x}), (ROI_1 (i n t (\frac{\sum k}{n u m b e r o f k})) - ROI_2 (i n t (\frac{\sum k}{n u m b e r o f k}))))

(22)

Calculate virtual points

In order to draw the virtual points of the detection object, the result from (19) and (20) is used. The pixel of the figure is considered as an 2D array box. As shown in Figure 8, for each block, the pixel is defined as $p [m] [j]$ , where $j$ is in range [ $j_{\min}, j_{\max}$ ] and $m$ is in range [ $x_{roi_lu}, x_{roi_rd} - 1$ ]. The virtual point pixel is calculated row by row. For each row $j$ , if $p [m] [j] > δ * 255$ ( $δ is a constant$ ), all column value $m$ values are recorded. Indexing the $m$ value, the minimum value $m_{\min}$ and maximum value $m_{\max}$ are used to create virtual points. Between $p [m_{\min}] [j]$ and $p [m_{\max}] [j]$ , the virtual points are filled.

Figure 8.

Image pixel array box.

Simulation and experiment

Extrinsic sensor calibration result

In this section, V-shaped calibration boards are used to found the corresponding points on scan planes and an image, as shown in Figure 9. Five pieces of right-angled trapezoid foam core boards are stitched together. The angle (∠ $p_{1}^{1} p_{2}^{1} p_{3}^{1}$ ) made by two pieces of board is 90°, in a V-shaped. Assume that LiDAR 1 detection points are $p_{1}^{1}, p_{2}^{1}, p_{3}^{1}, p_{4}^{1}, p_{5}^{1}, p_{6}^{1}$ , and LiDAR 2 detection points are $p_{1}^{2}, p_{2}^{2}, p_{3}^{2}, p_{4}^{2}, p_{5}^{2}, p_{6}^{2}$ .

Figure 9.

V-shaped calibration board.

Based on (15), to find the optimal solution for an overdetermined equation, six sets of data are required. Two LiDAR sensors and camera are mounted on the Kobuki robot as shown in Figure 1. Using LiDAR 1 and LiDAR 2 to detect the V-shaped board, detection lines, made by points, are obtained as shown in Figures 10 and 11 respectively, where the unit of x-axes and y-axes is meter. LiDAR 1 are in origin point $(0, 0)$ . Based on (2)–(4), $x^{l 1}$ and $y^{l 1}$ of detection points are calculated using distance $d^{1}$ and angle $α^{1}$ . In order to calculate $n$ parameters, the data $x^{l 1}$ and $y^{l 1}$ of detection points $p_{1}^{1}, p_{2}^{1}, p_{3}^{1}, p_{4}^{1}, p_{5}^{1}, p_{6}^{1}$ is gathered. To minimize detection error, the calibration board is detected 10 times, and the average of measurement value is used for points $p_{1}^{1}, p_{2}^{1}, p_{3}^{1}, p_{4}^{1}, p_{5}^{1}, p_{6}^{1}$ . Using the same method, the data $x^{l 2}$ and $y^{l 2}$ detection points $p_{1}^{2}, p_{2}^{2}, p_{3}^{2}, p_{4}^{2}, p_{5}^{2}, p_{6}^{2}$ is gathered. Adding the height of LiDAR 1 from the ground $z^{l 1}$ and the height of LiDAR 2 from the ground $z^{l 2}$ , the measurement value of points for calibration is shown in Table 1.

Figure 10.

Detection points by a 2D LiDAR 1.

Figure 11.

Detection points by a 2D LiDAR 2.

Table 1.

Measurement value of points $p_{1}^{1}, p_{2}^{1}, p_{3}^{1}, p_{4}^{1}, p_{5}^{1}, p_{6}^{1}$ of LiDAR 1 and $p_{1}^{2}, p_{2}^{2}, p_{3}^{2}, p_{4}^{2}, p_{5}^{2}, p_{6}^{2}$ of LiDAR 2 for calibration.

	Point 1	Point 2	Point 3	Point 4	Point 5	Point 6
$x^{l 1}$ (m)	1.087	1.205	1.075	1.199	1.075	1.183
$y^{l 1}$ (m)	−0.255	−0.129	−0.001	0.129	0.282	0.369
$z^{l 1}$ (m)	0.18	0.18	0.18	0.18	0.18	0.18
$x^{l 2}$ (m)	1.087	1.214	1.084	1.195	1.071	1.163
$y^{l 2}$ (m)	−0.237	−0.078	0.026	0.173	0.312	0.395
$z^{l 2}$ (m)	0.26	0.26	0.26	0.26	0.26	0.26

A monocular camera is equipped in front of the robot. Using the camera, the robot detects the V-shaped calibration board. The camera acquires the image and resized the image to 320 * 320 for image processing. The result is shown in Figure 12. The pixel value of points $p_{1}^{1}, p_{2}^{1}, p_{3}^{1}, p_{4}^{1}, p_{5}^{1}, p_{6}^{1}$ and $p_{1}^{2}, p_{2}^{2}, p_{3}^{2}, p_{4}^{2}, p_{5}^{2}, p_{6}^{2}$ is collected for following calibration process. The pixel value is shown in Table 2.

Figure 12.

Extract calibration board using OpenCV function.

Table 2.

Measurement pixel value of points $p_{1}^{1}, p_{2}^{1}, p_{3}^{1}, p_{4}^{1}, p_{5}^{1}, p_{6}^{1}$ and $p_{1}^{2}, p_{2}^{2}, p_{3}^{2}, p_{4}^{2}, p_{5}^{2}, p_{6}^{2}$ for calibration by a camera.

	Point 1	Point 2	Point 3	Point 4	Point 5	Point 6
$u^{c 1}$	69	109	143	177	217	240
$v^{c 1}$	217	212	217	210	215	208
$u^{c 2}$	69	110	143	175	214	243
$v^{c 2}$	188	184	186	182	185	180

After collecting all data of LiDAR 1, LiDAR 2, and a camera, as shown in Tables 1 and 2, based on (15), the calibration parameter can be calculated as Table 3. $n_{i}^{1} (i = 1, 2, \dots, 9)$ and $n_{i}^{2} (i = 1, 2, \dots, 9)$ represent calibration parameter for LiDAR 1 and camera, and for LiDAR 2 and camera, respectively.

Table 3.

Calibration parameter for LiDAR 1 and camera, and for LiDAR 2 and camera, respectively.

$n^{1}$	$n_{1}^{1}$	$n_{2}^{1}$	$n_{3}^{1}$	$n_{4}^{1}$	$n_{5}^{1}$	$n_{6}^{1}$	$n_{7}^{1}$	$n_{8}^{1}$	$n_{9}^{1}$
value	−19.5344	230.839	132.1308	−80.1167	24.605	255.0797	−0.2044	0.1419	1
$n^{2}$	$n_{1}^{2}$	$n_{2}^{2}$	$n_{3}^{2}$	$n_{4}^{2}$	$n_{5}^{2}$	$n_{6}^{2}$	$n_{7}^{2}$	$n_{8}^{2}$	$n_{9}^{2}$
value	−84.2739	118.2467	150.4878	−106.868	1.4718	197.8957	−0.5165	0.0275	1

Using calibration parameters in Table 3 and Figure 13 shows the calibration result. The detection points’ coordinate information of LiDAR sensors is transferred into pixel coordinate information, and then drawn as dots in the image. The image size is 320 * 320. The lower points are calculated from LiDAR sensor 1. There are nine points. The upper points are computed from LiDAR sensor 2. There are 13 points. LiDAR 1 and LiDAR 2 detection points are shown in Figures 14 and 15, respectively. In Figure 13, the shape of the calibration points is similar as the shape of LiDAR sensor’s detection points shown in Figures 14 and 15. Due to the calibration error from the V-shaped board and the LiDAR sensor’s noise, error occurs in Figure 13.

Figure 13.

Calibrate Two LiDAR sensors detected points into one image.

Figure 14.

Detection points by a 2D LiDAR 1.

Figure 15.

Detection points by a 2D LiDAR 2.

Real-time object contour detection result based on a fully convolutional neural network

Based on description in Figure 6, the model is trained. The training image shape is 320 * 320 * 3. The momentum value is 0.9 to use. The power value in the poly learning policy is selected as 0.9. The learning rate is equal to 0.00001. The Epochs is set as 100. The maximum number of checkpoints to be save is 50. For hardware, the NVIDIA GeForce GTX 1030 graphic card is installed in the training PC. For software, CUDA toolkit 11.0 and cuDNN 8.0 are installed. The training time for each step is around 0.4213 s. For each step of training, the loss is reported. It starts at about 12,500. After 15,790 steps training, the total loss dropped to 0.00581. Using TensorBoard, the progress of the training is shown as Figure 16. Based on the highest training checkpoint files, the object contour detector is tested, using a monocular camera. The object contour detector shows the object contour detection result in Figure 17(a). The detection frame per second (FPS) is 8. The original figure is shown in Figure 17(b). Comparing with the test label image, the mean square error (MSE) is 0.0469.

Figure 16.

Total loss of training process.

Figure 17.

(a) Object contour detection result based on fully CNN and (b) original image for object contour detection.

3D map building based on extrinsic sensor calibration result and object contour detector result

As shown in Figure 18, combining the extrinsic sensor calibration method and real-time object contour detector, the object’s 3D coordinate information is obtained and the map is built. LiDAR 1, LiDAR 2, and camera are multi-thread processed. After processing LiDAR data as (5) and (6), LiDAR data is transferred to camera pixel and printed in the image as (15). Based on LiDAR data (17) and transferring results, the proportion factor is calculated as (16). Based on real-time object contour detection result and proportion factor, object’s height and different angle in the z-axis are calculated. Finally, the object’s 3D coordinate information is obtained and a 3D map is built. As shown in Figure 19(a), the object is detected. In order to calculate the correct height of the object, the fully contour of the object should be detected. In Figure 19(b), the white dots are presented as two LiDAR detection points. Transferring LiDAR cartesian coordinate of LiDAR detection points to world coordinate, the object’s information in x-axis and y-axis is obtained as shown in Figure 20. Based on real-time object contour detector based on a fully Convolutional Neural Network, the object’s contour is detected. Using (17) and (22), the object’s height and the different angle in z-axis is calculated as 0.3 m. The object is placed in the ground, the ideal different angle in z-axis is 0. The ground is 100% even, so error occurs. The object real height is 0.27 m. The height error is 0.03 m. The result is acceptable. Based on description in Section 4.2, virtual points are filled in the object as the gray points in Figure 19(b). Finally, the 3D map is built.

Figure 18.

3D map building system structure.

Figure 19.

(a) Original image for 3D map building and (b) 3D map building result.

Figure 20.

Detection points by a 2D LiDAR.

3D map building including multiple-objects

When multiple-objects are detected as shown in Figure 21(a), based on the description in Section 4, the object’s 3D coordinate information can be obtained. The effective 3D map is built as shown in Figure 21(b). Because of shadow noise in Figure 21(a), an error occurs in Figure 21(b). The left object’s height and the different angle in the z-axis are calculated as 0.4 m and −0.24 rad. The real height of the object is 0.37 m. The height and different angle in the z-axis of the right multi-objects are 0.49 m and −0.21 rad. Since the top of the cone is thin, there is a detection error. The object’s information in the x-axis and the y-axis from LiDAR 1 and LiDAR 2 are shown in Figures 22 and 23.

Figure 21.

(a) Original image for 3D map building and (b) 3D map building result.

Figure 22.

Detection points by a 2D LiDAR 1.

Figure 23.

Detection points by a 2D LiDAR 2.

Conclusion

In this paper, 3D map building method is proposed, based on extrinsic sensor calibration method and real-time object contour detection method. The proposed methods are designed for low-cost sensors: one monocular camera and two 2D LiDAR sensors. The extrinsic calibration method is used to transfer each LiDAR coordinate to camera pixel coordinate in one image. In order to obtain the height and the different angle of the object in z-axis, the proportion factor $η$ is designed, based on the relationship of the pixel from two calibration points and the distance in z-axes from two detected points. A deep learning algorithm for real-time object contour detection with a fully Convolutional Neural Network is proposed, through gathering data, labeling data, training model, and testing object contour detector. To build 3D map including single or multi-objects, virtual points for the detected object is created. The experiments show effective of the proposed methods.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yanyan Dai

References

Dai

Lee

. Perception, planning and control for self-driving system based on on-board sensors. Adv Mech Eng 2020; 12(9): 1–13.

Tong

Liu

Qian

, et al. Design of a networked tracking control system with a data-based approach. IEEE/CAA J Autom Sin 2019; 6(5): 1261–1267.

Qian

Ding

Lee

, et al. Suppression of chaotic behaviors in a complex biological system by disturbance observer-based derivative-integral terminal sliding mode. IEEE/CAA J Autom Sin 2020; 7(1): 126–135.

Deng

. Frequency regulation of power systems with a wind farm by sliding-mode-based design. IEEE/CAA J Autom Sin 2022; PP: 1–10.

Bagheri

Karafyllis

Naseradinmousavi

, et al. Adaptive control of a two-link robot using batch least-square identifier. IEEE/CAA J Autom Sin 2021; 8(1): 86–93.

Zhang

Jing

Wang

. Bioinspired nonlinear dynamics-based adaptive neural network control for vehicle suspension systems with uncertain/unknown dynamics and input delay. IEEE Trans Ind Electron 2021; 68(12): 12646–12656.

Cortes

Shahbazi

Ménard

. UAV-LiCAM system development: Calibration and geo-referencing. In: The international archives of the photogrammetry, remote sensing and spatial information sciences, Karlsruhe, Germany: Copernicus Publications, 10–12 October 2018, pp.107–114.

Zhang

Pless

. Extrinsic calibration of a camera and laser range finder (improves camera calibration). In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS), Sendai, Japan, 28 September–2 October 2004, pp.2301–2306. New York, NY: IEEE.

Liu

, et al. An algorithm for extrinsic parameters calibration of a camera and a laser range finder using line features. In: 2007 IEEE/RSJ international conference on intelligent robots and systems, San Diego, CA, USA, 29 October–2 November 2007, pp.3854–3859. New York, NY: IEEE.

10.

Park

Yun

Won

, et al. Calibration between color camera and 3D LIDAR instruments with a polygonal planar board. Sensors 2014; 14: 5333–5353.

11.

Kwak

Huber

Badino

, et al. Extrinsic calibration of a single line scanning lidar and a camera. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, San Francisco, CA, USA, 25–30 September 2011, pp.25–30. New York, NY: IEEE.

12.

Jakob

. Calibration patterns explained, https://calib.io/blogs/knowledge-base/calibration-patterns-explained (2018).

13.

Zhang

Jing

. Adaptive neural network tracking control for double-pendulum tower crane systems with non-ideal inputs. IEEE Trans Syst Man Cybern Syst 2022; 52: 2514–2530.

14.

Sun

Zhan

She

, et al. Object detection from the video taken by drone via convolutional neural networks. Math Probl Eng 2020; 2020: 1–10.

15.

Hui

Jiang

, et al. A novel autonomous navigation approach for UAV power line inspection. In: Proceedings of the 2017 IEEE international conference on robotics and biomimetics (ROBIO), Macau, China, 5–8 December 2017. New York, NY: IEEE. pp. 634-639.

16.

Akbari Sekehravani

Babulak

Masoodi

. Implementing canny edge detection algorithm for noisy image. Bull Electr Eng Inform 2020; 9(4): 1404–1410.

17.

Asmaidi

Putra

Risky

, et al. Implementation of sobel method based edge detection for flower image segmentation. J Publ Inform Eng Res 2019; 3(2): 161–166.

18.

Yang

Price

Cohen

, et al. Object contour detection with a fully convolutional encoder-decoder network. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016, pp.193–202. New York, NY: IEEE.

19.

Gong

, et al. An overview of contour detection approaches. Int J Autom Comput 2018; 15(6): 656–672.

20.

Uijlings

Ferrar

. Situational object boundary detection. In: 2015 IEEE conference on computer vision and pattern recognition, Boston, MA, USA, 7–12 June 2015, pp.4712–4721. New York, NY: IEEE.

21.

Xie

. Holistically-nested edge detection. In: 2015 IEEE conference on computer vision (ICCV), Santiago, Chile, 7–13 December 2015, pp.1395–1403. New York, NY: IEEE.

22.

Dhall

Dai

Gool

. Real-time 3D traffic cone detection for autonomous driving. In: 2019 IEEE intelligent vehicles symposium, Paris, France, 9–12 June 2019, pp.494–501. New York, NY: IEEE.

23.

Tran

Becker

Grzechca

. Environment mapping using sensor fusion of 2D laser scanner and 3D ultrasonic sensor for a real mobile robot. Sensors 2021; 21(9): 1–20.

24.

Jang

Hyun

, et al. A lane-level road marking map using a monocular camera. IEEE/CAA J Autom Sin 2022; 9(1): 187–204.

25.

Endres

Hess

Sturm

, et al. 3-D mapping with an RGB-D camera. IEEE Trans Robot 2014; 30(1): 177–187.

26.

Athavale

Harish Ram

Nair

. Low cost solution for 3D mapping of environment using 1D LIDAR for autonomous navigation. IOP Conf Ser Mater Sci Eng 2019; 561: 1–7.

27.

Morikubo

Hashimoto

. Edge-based object tracking for dynamic projection mapping. In: 2018 International workshop on advanced image technology (IWAIT), Chiang Mai, Thailand, 7–9 January 2018, pp.1–4. New York, NY: IEEE.

28.

Kitayama

Touma

Hagiwara

, et al. 3D map construction based on structure from motion using stereo vision. In: International conference on informatics, electronics, and vision (ICIEV), Fukuoka, Japan, 15–18 June 2015, pp.1–5. New York, NY: IEEE.

29.

Llamazares

Molinos

Ocaña

, et al. 3D map building using a 2D laser scanner, computer aided systems theory. Berlin, Heidelberg: Springer, 2012.

30.

De Gregorio

Cavallari

Di Stefano

. SkiMap++: real-time mapping and object recognition for robotics. In: 2017 IEEE international conference on computer vision workshops (ICCVW), Venice, Italy, 22–29 October 2017, pp.660–668. New York, NY: IEEE.

31.

Ronneberger

Fischer

Brox

. U-net: convolutional networks for biomedical image segmentation. LNCS 2015; 9351: 234–241.

3D map building based on extrinsic sensor calibration method and object contour detector with a fully convolutional neural network

Abstract

Keywords

Introduction

A simple extrinsic sensor calibration method to obtain an object’s 3D coordinate information

Coordinate system definition

2D LiDAR coordinate system and robot coordinate system conversion

Simple extrinsic sensor calibration method

Transfer one LiDAR coordinate to camera pixel coordinate

Calculate proportion factor

Real-time object contour detection method based on a fully convolutional neural network

Image collection and labeling

Neural network architecture

Loss function and training

3D map building method

Calculate the 3D posture of the object

Calculate virtual points

Simulation and experiment

Extrinsic sensor calibration result

Real-time object contour detection result based on a fully convolutional neural network

3D map building based on extrinsic sensor calibration result and object contour detector result

3D map building including multiple-objects

Conclusion

Footnotes

Declaration of conflicting interests

Funding

ORCID iD

References