Lane-level localization system using surround-view cameras adaptive to different driving conditions

Abstract

This article presents a lane-level localization system adaptive to different driving conditions, such as occlusions, complicated road structures, and lane-changing maneuvers. The system uses surround-view cameras, other low-cost sensors, and a lane-level road map which suits for mass deployment. A map-matching localizer is proposed to estimate the probabilistic lateral position. It consists of a sub-map extraction module, a perceptual model, and a matching model. A probabilistic lateral road feature is devised as a sub-map without limitations of road structures. The perceptual model is a deep learning network that processes raw images from surround-view cameras to extract a local probabilistic lateral road feature. Unlike conventional deep-learning-based methods, the perceptual model is trained by auto-generated labels from the lane-level map to reduce manual effort. The matching model computes the correlation between the sub-map and the local probabilistic lateral road feature to output the probabilistic lateral estimation. A particle-filter-based framework is developed to fuse the output of map-matching localizer with the measurements from wheel speed sensors and an inertial measurement unit. Experimental results demonstrate that the proposed system provides the localization results with submeter accuracy in different driving conditions.

Keywords

Lane-level localization deep learning surround-view cameras autonomous vehicle vision-based localization

Introduction

Vehicle self-localization is one of the essential functions in advanced driver-assistance systems. The most accessible localization technique is the global navigation satellite system (GNSS). However, GNSS cannot always have submeter accuracy when the signal is blocked by surrounding buildings. Much effort has been made to increase the localization accuracy, including map-matching methods based on light detection and ranging (LiDAR) or cameras. LiDAR-based approaches^1
–3 are able to achieve centimeter-level accuracy. Nevertheless, LiDAR is not feasible for mass deployment because of its high price currently. As cameras are relatively cheaper, and images contain abundant information, vision-based methods have gained a lot of attention.⁴

Despite decades of research, most vision-based methods can hardly perform self-localization adaptive to different driving conditions without much manual effort. Image-feature-based methods⁵ have poor performance during heavy traffic condition. Semantic-feature-based methods (such as lane, ground, and curb detection) gain much attention recent years. Rule-based methods^6,7 always have high accuracy for some specific driving conditions, but a perfect rule is difficult to design to suit for all kinds of real driving conditions, especially when many arrows exist, lanes are splitted, or vehicle is changing lanes. Deep-learning-based methods^8,9 extract features in diverse driving conditions. However, manual labeling in a large data set is required for most deep-learning-based methods. It costs much and limits the application of deep-learning-based methods on a large scale. In this article, we develop a cost-effective localization system adaptive to different driving conditions (see Figure 1).

Figure 1.

Illustrations of different driving conditions. (a) The ego vehicle (the red rectangle) is driving on a lane-splitting multi-lane road. Another vehicle (the yellow rectangle) is at the left side of the ego vehicle. (b) Arrows are painting on the ground. (c) The vehicle is changing lanes. There are many other driving conditions not shown here. It is difficult for rule-based methods to take into account all the real driving conditions.

In the proposed system, a map-matching localizer (MML) is proposed to estimate the probabilistic lateral position based on raw images from surround-view cameras and the lane-level road map. MML contains a sub-map extraction module, a perceptual model, and a map-matching model. The proposed system fuses the information from MML and low-cost sensors including inertial measurement unit (IMU), wheel speed sensors (WSS), and a GNSS receiver to provide the lane-level localization. The key contributions of our article are the following:

We devise a probabilistic lateral road feature (PLRF) as a sub-map which contains the lateral information of the lane-level map. PLRF is a general feature which requires no assumptions of road structures.

We design a deep learning network as the perceptual model to process raw images from surround-view cameras. Unlike conventional deep-learning-based methods, the perceptual model is trained by auto-generated labels from the map information to reduce manual effort.

We develop a particle-filter-based framework to fuse the results of the MML and the measurements from low-cost sensors. This framework provides robust lane-level localization results in different driving conditions.

Related work

Related work about map-based localization systems can be divided into three parts: the choice of maps, detection, and localization.

Map

Many different types of digital maps have been used for localization, including the roadway map,¹⁰ the 3D dense map,¹¹ the feature map,¹² and the lane-level road map.¹³ The roadway map, for example, OpenStreetMap, models roads as polygons without the information of width, so it is not sufficient for a localization method to achieve the lane-level accuracy. The 3D dense map contains many points that can construct the surface of surrounding objects with centimeter-level accuracy, but it requires too much storage when the area is large. The feature map is designed including features related to detection results. Users often have to self-generate them in advance, which is infeasible on a large scale. The lane-level road map contains the positions of lane lines, centerlines, and curbs, which does not require much storage and has lane-level information. Besides, many researchers focus on auto-generation of lane-level road maps.^14,15 Thus, we design our localization system using the lane-level road map.

A sub-map should be extracted for efficiently matching with observations. Typical sub-maps of lane-level road maps fit lane lines as splines,¹⁶ clothoids,¹⁷ or cubic polynomials,¹⁸ leading to inevitable errors when the shapes of lanes are irregular. In this article, we propose a PLRF that represent the distribution of lateral features without limitations of road structures.

Detection

Vision-based map-matching methods exploit visual features from images, for example, SIFT,¹⁹ SURF,²⁰ DIRD,²¹ ORB,²² CNN features,¹² and so on. However, these features suffer from heavy traffic conditions, since surrounding vehicles also have these local features which are noises for map matching. To provide robust localization, some researchers detect semantic features and then match them with the lane-level road map. Brown and Brennan⁶ presented mapped lane features for lateral localization using a mono camera. The mono camera is frequently blocked by front vehicles. To avoid the influence of occlusions, some researchers²³ utilize surround-view cameras: four fish-eye cameras mounted around the vehicle. Raw images from surround-view cameras have the strong distortion, and therefore, most methods use the composite bird’s-eye view synthesized from raw images. Dongwook Kim et al.⁷ presented a rule-based method that detects the lane lines and the stop lines in the bird’s-eye view image. Alexandru Gurghian et al.⁹ designed an end-to-end method to estimate lane positions in the bird’s-eye view image.

To generate bird’s-eye view image, it is supposed that the camera intrinsics and model are accurate and that the ground is flat. As it is hard to avoid the error of projection and undistortion, the features far from the cameras are always vague in the bird’s-eye view image. Therefore, most surround-view camera-based methods restrict the observation region of bird’s-eye view images to only detect 2 m or 3 m away from the ego vehicle (see Figure 2). However, it is important for lane-level localization system to get the observation of the curbs on multilane roads. For accurate features of the curb on the road, we design the perceptual model using raw images from surround-view cameras instead of bird’s-eye view image.

Figure 2.

An example of raw images and the bird’s-eye view image from surround-view cameras. (a) The front raw image. (b) The rear raw image. (c) The left raw image. (d) The right raw image. (e) The bird’s-eye view image. The valid region is inside the blue rectangle.

Localization

As noises and outliers are inevitable in the observed data, robust localization methods are developed for consistent position estimation. The Kalman filter²⁴ is the most widely used localization method on linear systems. To work on nonlinear systems, the extended Kalman filter²⁵ and the unscented Kalman filter²⁶ were developed. Optimization-based methods^27,28 have been demonstrated to have better performance than the Kalman filter and its variants on sophisticated nonlinear systems. Most localization methods suppose that all errors are Gaussian. Particle filters,^29,30 however, generate samples without the assumptions of the state-space model and the state distribution.

Lane-level localization system

In this section, we present the lane-level localization system which outputs the robust positioning result using surround-view cameras, other low-cost sensors, and a lane-level road map (see Figure 3).

Figure 3.

Framework of the lane-level localization system. The MML contains a sub-map extraction module, a perceptual model, and a matching model. The system fuses the result of the MML with measurements from other low-cost sensors based on particle filters. MML: map-matching localizer.

An MML is designed to provide the probabilistic lateral estimation as the observation. As the distribution of the MML outputs are non-Gaussian, we develop this system based on particle filters to fuse with the prediction from the IMU and WSS. The particle filter includes initialization, prediction, observation, weight update, resampling, and output estimation.

The steps of the proposed system are listed as follows:

The system initializes with several particles. Each particle contains a state of the vehicle and the probability of being this state.

Each particle predicts the motion estimation using the measurements of IMU and WSS.

The local PLRF is output from the perceptual model and matched with the sub-map to get the position observation.

The weight of the particles is updated by the distribution of observation.

The system decides if the resampling step should be performed, and outputs the posterior position estimation.

Initialization

An example of initialization is illustrated in Figure 4.

Figure 4.

The illustration of initial particles. The purple lines are the curbs of the road. The yellow line is the lane line of the road. The green dot represents position which cannot be observed in the system. The red dot represents the position from the low-cost GNSS. The white triangles indicate the position and direction of initial particles.

During initialization, we sample a number of particles close to the measurement from a low-cost GNSS receiver. Specifically, the state of the ith particle at time k, denoted as $x_{k}^{(i)}$ , contains the Cartesian XY coordinates $p x_{k}^{(i)}, p y_{k}^{(i)}$ , and the heading angle $θ_{k}^{(i)}$ . Each particle has its weight, $ω_{k}^{(i)}$ , showing the possibility that the ego vehicle is at the particle’s position. The GNSS coordinates are firstly transformed to local coordinates ${[p x_{gnss}, p y_{gnss}]}^{T}$ using universal transverse mercator coordinates. According to the accuracy of the low-cost GNSS receiver, we sample N particles within the valid region with the same weights. The initial direction of each particle is set to the direction of the road.

Prediction

At time $k, k > 0$ , we use IMU, WSS, and the last state to predict the current state based on dead reckoning (DR).³¹ We add white noises whose standard deviations are $σ_{x}, σ_{y}, σ_{θ}$ to the state prediction of the particles according to the accuracy of IMU and WSS. The position from a low-cost GNSS receiver is also used to determine the valid region. The particles out of the valid region are deleted while others keep their weights.

Map-matching localizer

The MML estimates the probabilistic lateral position using three main components: a sub-map extraction module, a perceptual model, and a matching model. At time step k, each particle gets a sub-map $m_{k}^{(i)}$ from the sub-map extraction module according to the prediction ${\hat{x}}_{k}^{(i)}$ . The perceptual model processes raw images from surround-view cameras to generate the feature $f_{obs, k}$ . The sub-map and the feature are given to the matching model to calculate the probabilistic lateral estimation.

Sub-map extraction

Given the predicted position of a particle and a lane-level road map, the sub-map extraction module outputs a sub-map as prior knowledge. Unlike 3D dense maps, a lane-level road map only requires low-cost sensors, small storage, and little manual effort.

As most open-source digital maps only contain information at the road level, we generate a lane-level road map¹⁵ including the precise position of road centerlines, lane lines, and curbs. Unlike typical sub-maps, we devise a PLRF as our sub-map. The PLRF (see Figure 5) is a mixed representation of lane lines and the road surface in the lateral direction without limitations of driving conditions.

Figure 5.

The illustration of a PLRF and a local PLRF. A PLRF is a vector which contains the data correlated with the probability of the existence of lane lines and road surface on a lateral line segment whose midpoint is on the road centerline. A local PLRF is a vector shorter than the PLRF, and it represents the data on a lateral line segment whose midpoint is a possible ego position. PLRF: probabilistic lateral road feature.

The sub-map is generated as follows.

Given a lane-level road map and the expected position of all the particles, we draw a line l on the road through the expected position, perpendicular to the centerline of the road. As the generated map contains all the road centerlines, the centerline of this road is extracted by searching the centerline nearest to the expected position. We denote the intersection of the line l and the road centerline as O. We extract a line segment l_O from l whose midpoint is O with a length of L ₀. Note that L ₀ should be greater than the width of all the roads in the map to guarantee that the l_O includes all the lateral information of the road. To simplify the computation, we discretize l_O as a list of equidistant points $s = {[s_{0},..., s_{M}]}^{T}$ . $P_{A} (s_{j} | h)$ denotes the probability that the point s_j is at the lane line h. h denotes the polyline stored in the lane-level map as the lane boundary. $P_{B} (s_{j})$ denotes the probability that the specific point s_j is on the road r. PLRF is a vector $f = {[f_{0},..., f_{M}]}^{T}$ , whose entry f_j is a value relative to the probability $P_{A} (s_{j} | h)$ and $P_{B} (s_{j})$ .

To compute PLRF, we estimate $P_{A} (s_{j} | h)$ and $P_{B} (s)$ according to the lane-level road map. There exist noises with 1–2 dm accuracy in our generated map, which is inevitable in most mapping methods. If the noise of the lane-level map is almost negligible, then we can set the probability $P_{A} (s_{j} | h)$ as a Dirac delta function. Considering this noise distribution from mapping, we set the probability $P_{A} (s_{j} | h)$ to follow the Gaussian distribution in the proposed system

P_{A} (s_{j} | h) \sim N (μ_{h}, σ_{h})

where $μ_{h}$ and $σ_{h}$ are the mean and standard deviation of the lane line h. The mean $μ_{h}$ and the standard deviation $σ_{h}$ are determined by the generation approach of the lane-level map.

The event B that a point is on the road is defined as follows:

If l intersects no curb, then all elements in s are on the road. It occurs at the crossroads.

If l intersects once the curb of the road, all elements in s at the right side of the left curb or all elements in s at the left side of the right curb are on the road. It occurs at a three-way junction.

If the line l intersects twice the curbs of this road and the road is straight here, then all elements in s between the two intersections are on the road. If the road is bending, we should treat the two curbs as the inner and outer curbs. The elements in s between one inner curb and one outer curb are on the road. If the line l may intersect twice the same curb, then the elements in s between the inner curb are off the road, but between the outer curb are on the road.

As we have defined the event B above, $P_{B} (s)$ is determined as follows: If s_j is on the road, then $P_{B} (s_{j})$ is set to 1. Otherwise, $P_{B} (s_{j})$ is set to zero. The element of the PLRF at the particle i is defined as

f_{j} = \sum_{h} P_{A} (s_{j} | h) + 2 P_{B} (s_{j}) - 1

It is noted from equation (2) that the position close to a lane line tends to have a larger value. If the position is on the road, it has a value not less than 1. Otherwise, the value is −1. Our PLRF mixes all the lateral characteristics in one vector to simplify further computations. The PLRF $f$ is defined as the sub-map $m_{k}$ .

Perceptual model

The perceptual model extracts a local PLRF, $f_{obs}$ , from raw images. The local PLRF (see Figure 5) is regarded as the PLRF that is observed by surround-view cameras of the ego vehicle. The differences between local PLRFs and PLRFs are the midpoint and the length of the line segment. The possible ego position is set as the midpoint of the local line segment $l_{obs}$ . The length of $l_{obs}$ is denoted as L ₁.

The task of estimating the local PLRF from images is formulated as a regression problem. Designing a rule-based algorithm to extract local PLRF from raw images is a challenging task as raw images of fish-eye cameras have large distortions. Since deep neural networks have outperformed traditional machine learning approaches in challenging tasks like classification and regression, we design a deep neural network shown in Figure 6 to estimate the local PLRF $f_{obs}$ .

Figure 6.

The perceptual model takes raw images from surround-view cameras as inputs and outputs a local PLRF using a deep neural network. The net contains four parallel subnetworks. Each subnetwork has the same structure as the convolutional layers of AlexNet to extract the features. Each subnetwork followed by a fully connected layer and then these outputs are concatenated together. The remaining three are fully connected. PLRF: probabilistic lateral road feature.

At time k, four raw images from surround-view cameras are cropped and resized to the color ones with a dimension being 210 (height) × 280 (width). These images are fed into four shared-weight subnetworks. We take layers from off-the-shelf CNNs and load the pretrained weights for better performance. Specifically, we use the first five layers of AlexNet in this article. Each subnetwork followed by a fully connected layer with 512 neurons and then outputs are merged. To output a local PLRF, we apply three fully connected layers with 512 neurons each. The LeakyReLU activation function in this neural network is with the negative slope of 0.1.

Unlike conventional deep-learning-based methods, our model requires no manual labels. Our instrumented vehicle is equipped with real time kinematic-global positioning system (RTK-GPS) to provide the ground truth position, and we generate lane-level map in the test region. Given the ground truth position and a lane-level road map, the true local PLRF for training can be auto-generated. First, the sub-map (PLRF) through the ground truth GT is extracted by equation (2). Then, we get the local PLRF by cropping the sub-map to change the midpoint to GT and shorten the length of the vector. Therefore, the training data set for the perceptual model is generated without much manual effort. We take least square errors as the loss function. Note that the accuracy of ground truth positions and the lane-level map are crucial to the performance of the perceptual model as they determine the accuracy of labels. If the lane-level road map is not accurate or noisy, the label may be influenced and then lead to a risky result. Therefore, the map should be checked before learning. If there is something wrong in the map, we should modify the map generation module or manually correct the map. It is supposed that the lane-level map is reliable and that it can be obtained from other sources such as the map company, so we do not introduce the method of map generation and maintenance in this article.

Matching model

The matching model (see Figure 7) estimates probabilistic lateral locations by computing the correlation between a local PLRF and a sub-map (PLRF). We adopt the idea of the correlation layer³² and define the probabilistic result as $c (m, f_{obs}) = [c_{0},..., c_{N_{m} - N_{obs}}]$ , which is computed by

\{\begin{array}{l} c_{i} = \frac{{c^{'}}_{i}}{\sum_{i} {c^{'}}_{i}} \\ {c^{'}}_{i} = \sum_{j = 0}^{N_{obs} - 1} m (i + j) f_{obs} (j) \end{array}

where N_m is the length of the sub-map m and $N_{obs}$ is the length of the local PLRF. Note that a sub-map is a PLRF which is set longer than a local PLRF, so N_m is greater than $N_{obs}$ . The length of the sub-map N_m should be set large enough to guarantee that the probabilistic result c contains all the possible locations.

Figure 7.

The matching model.

After the correlation, the probability of locations is discretely stored in a vector. To model the continuous probability distribution in the lateral direction, we use Gaussian mixture models (GMM) to fit the vector c to a small number of Gaussian distributions with weights. In this model, the expectation-maximization (EM) algorithm is used to fit GMM, $G = {g_{1}, g_{2},... g_{N_{1}}}$ , where g_j is a Gaussian distribution with the parameters of the mean $μ_{j}$ and the standard deviation $σ_{j}$ . N ₁ is the number of GMM according to the results of the EM algorithm.

Weight update

After the estimation of $x_{k}^{(i)}$ , we calculate the lateral and longitudinal position of the particle. The longitudinal position relative to the road ${plong}_{k}^{(i)}$ determines the sub-map. The lateral position relative to the road ${plat}_{k}^{(i)}$ is used to estimate the weight using the GMM output of the matching model. The weight is updated by

ω_{upd, k}^{(i)} = \frac{{\hat{ω}}_{k}^{(i)} \sum_{h} g_{k, h}^{(i)} ({plat}_{k}^{(i)})}{\sum_{j} {\hat{ω}}_{k}^{(j)} \sum_{h} g_{k, h}^{(j)} ({plat}_{k}^{(j)})}

Resampling and output estimation

Resampling is performed according to the effective number of particles³³ which is computed by

N_{eff} = \frac{1}{\sum_{i}^{N} {(ω_{upd, k}^{(i)})}^{2}}

If the $N_{eff}$ is less than a given threshold $N_{thr}$ , then resampling is performed: draw N new particles with the same weight $ω_{k}^{(i)} = \frac{1}{N}$ and the probability distribution of them are the same as that of old particles. Otherwise, $x_{k}^{(i)} = {\hat{x}}_{k}^{(i)}$ and $ω_{k}^{(i)} = ω_{upd, k}^{(i)}$ .

The output of the localization system is set to the weighted average of the particles.

Experiments

To evaluate our proposed system, we first generated the lane-level road map of the test region (see Figure 8) based on Ladybug3 spherical camera systems.¹⁵ In the whole area, we collected data several times in different days by our instrumented vehicle which is a Chery Tiggo car equipped with a Xsens MTi IMU, two WSS mounted on rear wheels, a low-cost GNSS receiver, and a RTK-GPS receiver. RTK-GPS was used to provide ground truth positions for evaluation. In the data-collection area, the accuracy of RTK-GPS is 2 cm. The Xsens MTi IMU is a nine-axis industrial-grade inertial sensor, and its yaw rate gyroscope is used for DR. We selected three regions to test the localization accuracy, while the data collected at the other regions were used for training the perceptual model of the MML.

Figure 8.

The areas where we generated a lane-level road map. Samples for testing were collected on the roads within the red rectangles. The regions in red rectangles are shown as enlarged views besides satellite images. Samples for training were collected in other regions for the perceptual model.

Given the length of local PLRFs, we set the parameters of the last six layers of the perceptual model. The first convolutional layer after the concatenate layer contains 384 filters of 3 × 3 with a stride of 1. The second convolutional layer contains 384 filters of 3 × 3 with a stride of 2. The last convolutional layer is same as the second one. The first two fully connected layers are with the size of 384, followed by the last one of $N_{obs} = 400$ .

The perceptual model was built upon PyTorch, and we used the pretrained weights of AlexNet convolutional layers to initialize the weights. We chose the Adam optimizer³⁴ with a learning rate of 0.0001 and the weight decay of 10⁻⁵. The batch size is set to 64. After 300,000 iterations, we stopped the training process. For comparison, we also build a CNN-based model using the mono image. The nets and parameters are almost same, except for the number of input subnetworks. The parameters for experiments are presented in Table 1.

Table 1.

Parameters for experiments.

	Unit	Value	Description
L ₀	m	56	Length of l_O
L ₁	m	20	Length of $l_{obs}$
N_m	–	1120	Length of a sub-map
$N_{obs}$	–	400	Length of a local PLRF
N	–	100	Number of particles
$N_{thr}$	–	60	Threshold for resampling
$σ_{h}$	m	0.15	Standard deviation of lane lines
$σ_{x}$	m	0.09	Standard deviation of x-axis
$σ_{y}$	m	0.09	Standard deviation of y-axis
$σ_{θ}$	rad	0.003	Standard deviation of heading
$R_{gnss}$	m	10	Radius of the valid region

PLRF: probabilistic lateral road feature.

The test data were used to evaluate the performance of both the perceptual model and the lane-level localization system. Note that the data for testing and training were collected in different regions (see Figure 8) to evaluate the generalization ability of the perceptual model to extract the features of lane lines, ground, and curbs. The detailed description of test regions is presented in Table 2. The plants distribute periodically and grow gradually on both sides of most roads in the test regions, so it is difficult to locate the vehicle by observing the vegetation. To reduce restrictions on the surroundings, our method only observes the lane lines and curbs as features. There are curves, lane splitting, and multi-lane roads in the test regions. The vehicle was driving on the test roads several times to collect data for different conditions of illumination and traffic. Note that the results in this test cannot be applied directly in new areas with entirely different roads. If we also apply this method in a new area, additional learning should be performed by adding new training data similar to this area. Nevertheless, the results in test regions are typical at this stage, as the data are collected in different conditions and can be used to evaluate the generalization ability on unseen roads. The processing system is a desktop with an Intel i7 CPU and an NVIDIA GTX TITAN X GPU. The proposed method is implemented using Python 2.7. The processing time of the proposed method is about 50 ms for one cycle.

Table 2.

The description of test regions.

	Length	# Lanes	Condition
Region 1	0.6 km	2	Curves
Region 2	0.6 km	4	Multi-lane roads
Region 3	0.3 km	2, 3, 4	Lane-splitting roads

If we take the position with the maximum probability as the output, the results of all the test data in three test region is shown in Figure 9, and the lateral positioning root mean square (RMS) errors are listed in Table 3. It is noted that the MML has better performance to use surround view images than the mono image. The success rate is listed in Table 4, which defines the inlier percentage of localization results. In the first and second regions, the success rate using surround-view images is higher than that using the mono image. Although in the third region, the success rate using the surround-view images is lower than that using the mono image, its results have a smaller RMS error.

Figure 9.

The lateral error distribution of the MML in the test regions. (a) The input is the mono image. (b) The input is surround-view images. MML: map-matching localizer.

Table 3.

Lateral RMS error (m).

	Region 1	Region 2	Region 3
Mono	0.249	0.286	1.651
Surround view	0.134	0.129	0.116

RMS: root mean square.

Table 4.

Success rate (%).

	Region 1	Region 2	Region 3
Mono	96.86	92.74	98.79
Surround view	98.33	94.07	92.81

Figure 10 shows the accurate results in different driving conditions. In this figure, the sub-map at the position of ground truth is extracted so that the matching result evaluates the performance of the perceptual model. It is evident that the matching results show a large probability near ground truth, which demonstrates that the perceptual model provides effective local PLRFs adaptive to different driving conditions. Figure 10(a) shows that the car is changing the lane. The second case in Figure 10(b) is that the white arrows are on the ground. The car is driving into the sun before sunset in Figure 10(c). These cases are challenging to common rule-based lane-detection methods. Without extra effort, our deep-learning-based method overcomes this challenge, both the outputs using mono images and surround-view images are accurate.

Figure 10.

Inputs and results in different driving conditions including the input images of the perceptual models, the local vector map, and the lateral localization errors from the MML. In the local vector map, the rectangle filled in green represents the ground truth of the ego vehicle, blue lines are curbs, and yellow or white lines represent lane lines. Although these conditions are difficult for most existing methods, the proposed method is still accurate. (a) The car is changing the lane on multi-lane roads. (b) Arrows on the ground of multi-lane roads. (c) The car is driving into the sun just before sunset. MML: map-matching localizer.

Figure 11 shows the cases when the method using surround-view images outperforms that using the mono image. Figure 11(a) shows that the car is driving on the curve. Figure 11(b) shows a lane splitting case. In these two cases, the localization result using surround view images is still accurate, however, the result using the mono image does not have a large probability near the ground truth. That is because the mono image cannot capture the lateral features exactly at the left and right side of the vehicle. Using the mono image, the output of the perceptual model is confusing when the road condition is changing.

Figure 11.

Inputs and results in different driving conditions. Although the driving condition is typical and the image quality is good, the method using mono image is not as accurate as that using surround view images. (a) The car is on the curve. (b) Three lanes after lane splitting.

Figure 12 shows typical failure cases near the intersections. PLRF contains no more than one curb in these cases, so the features are mainly determined by the lane lines. As most lane lines on the test roads periodically distributed, the results will have multi-modal noise. If the localization fails near the intersection, the error may be a multiple of the lane width. Figure 13 shows the probabilistic lateral error from a surround-view image sequence in the first region. It is noted that both the maximum probabilistic position and the expectation are far from the ground truth at about 20 s when the car is at the T-shaped intersection (see Figure 12(b)).

Figure 12.

Inputs and results near the intersections. Both methods fail with a low probability at ground truth position. (a) The car is near the crossroads. (b) The car is at the T-shaped intersection.

Figure 13.

The MML outputs from a surround view image sequence in the first test region. The red color indicates the probability of the ego vehicle at that time. At 20 s, the localization result has a large noise distribution. To have robust positioning estimation, we adopt particle filters to fuse these results with the measurements from IMU and WSS. MML: map-matching localizer; IMU: inertial measurement unit; WSS: wheel speed sensors.

To have robust positioning estimation, the proposed system fuses the results of MML with the prediction from IMU and WSS. The features on multi-lane roads are always periodic, so the noise can be under the multimodal distribution. Particle filters are adopted as the noises of MML is non-Gaussian. To compare with the proposed system, we also implement the Kalman filter. In the Kalman filter-based system, we use Gaussian distribution to approximate the observation probability. In the particle filter-based system, the particles update their weights by equation (4).

Figure 14 shows the result of localization systems from a sequence in the first test region. We can see that the system based on particle filters shows the better accuracy than that based on the Kalman filter, which demonstrate that the system based on particle filters is more robust to different driving conditions.

Figure 14.

The localization result of the systems. The system based on particle filters shows the better robustness than that based on the Kalman filter.

The results from numerous test cases verify that the localization method has a submeter accuracy even in different driving conditions. The accuracy is related to noises of the lane-level road map, as the map determines the PLRF and auto-generated labels. Most surround-view camera-based localization systems only provide the precise positioning estimation in the ego lane but cannot determine the lane on multi-lane roads, as they use the bird’s-eye view image from surround-view cameras. Some rule-based methods achieve centimeter-level accuracy in specific driving conditions. However, rule-based methods could meet challenges in diverse driving conditions. Conventional deep-learning-based methods may gain robust and accurate results, but their labels require much manual effort. Our proposed system is quite feasible with submeter accuracy in different driving conditions, which helps lane-level applications on a large scale.

Conclusions

In this work, we proposed a lane-level localization system adaptive to different driving conditions. The proposed system used surround-view cameras, other low-cost sensors and a lane-level road map. We proposed an MML that provided the probabilistic lateral estimation. In the MML, we devised the PLRF as the sub-map. The CNN-based perceptual model extracted a local PLRF from raw images captured by surround-view cameras. The matching model calculates the correlation between a sub-map and a local PLRF and outputs the probabilistic lateral estimation. The proposed localization system integrated the output of MML with the measurements from IMU, WSS, and a low-cost GNSS receiver based on particle filters. The proposed system is more feasible than common rule-based methods, as rule-based methods can hardly find out a perfect rule for diverse driving conditions. Besides, the proposed system requires little manual effort as the perceptual model was trained using auto-generated labels, unlike conventional deep-learning-based method. We evaluated the proposed system using data collected in different driving conditions. Experimental results demonstrated that the performance of MML is reliable in different driving conditions and that the localization method achieved submeter accuracy even in complicated situations.

The localization error of the proposed system is inevitable due to the noises of the lane-level road map. Future work could improve the localization accuracy by fusing with more observations.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China under Grant U1764264/61873165, in part by the International Chair on Automated Driving of Ground Vehicle, and in part by the Shanghai Automotive Industry Science and Technology Development Foundation under Grant 1733/1807.

ORCID iD

Tianyi Li

References

Baldwin

Newman

Road vehicle localization with 2D push-broom LIDAR and 3D priors. In: 2012 IEEE International Conference on Robotics and Automation, Saint Paul, Minnesota, USA, 14 May 2012, pp. 2611–2617. IEEE.

Wolcott

Eustice

. Robust LIDAR localization using multiresolution Gaussian mixture maps for autonomous driving. Int J Robot Res 2017; 36(3): 292–319.

Basaca-Preciado

Sergiyenko

Rodrguez-Quinonez

, et al. Optical 3D laser measurement system for navigation of autonomous mobile robot. Opt Laser Eng 2014; 54: 159–169.

Sergiyenko

Tyrsa

Flores-Fuentes

, et al. Machine vision sensors. J Sensor 2018; 2018. https://doi.org/10.1155/2018/3202761.

Lategahn

Stiller

. Vision-only localization. IEEE Trans Intell Transp Syst 2014; 15(3): 1246–1257.

Brown

Brennan

. Lateral vehicle state and environment estimation using temporally previewed mapped lane features. IEEE Trans Intell Transp Syst 2015; 16(3): 1601–1608.

Kim

Chung

, et al. Lane-level localization using an AVM camera for an automated driving vehicle in urban environments. IEEE/ASME Trans Mech 2017; 22(1): 280–290.

Gopalan

Hong

Shneier

, et al. A learning approach towards detection and tracking of lane markings. IEEE Trans Intell Transp Syst 2012; 13(3): 1088–1098.

Gurghian

Koduri

Bailur

, et al. Deeplanes: end-to-end lane position estimation using deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, 26 June–1 July 2016, pp. 38–45. IEEE.

10.

Haklay

Weber

. Openstreetmap: user-generated street maps. IEEE Pervasive Comput 2008; 7(4): 12–18.

11.

McCormac

Handa

Davison

, et al. Semanticfusion: dense 3D semantic mapping with convolutional neural networks. In: 2017 IEEE International Conference on Robotics and automation (ICRA), IEEE, Singapore, 29 May 2017, pp. 4628–4635. IEEE.

12.

Zhu

Tian

, et al. Visual place recognition in long-term and large-scale environment based on CNN feature. In: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, Changshu, Suzhou, China, 26–30 June 2018 pp. 1679–1685. IEEE.

13.

Joshi

James

. Generation of accurate lane-level maps from coarse prior maps and lidar. IEEE Intell Transp Syst Mag 2015; 7(1): 19–29.

14.

Guo

Kidono

Meguro

, et al. A low-cost solution for automatic lane-level map generation using conventional in-car sensors. IEEE Trans Intell Transp Syst 2016; 17(8): 2355–2366.

15.

Yang

, et al. Panorama-based multilane recognition for advanced navigation map generation. Math Probl Eng 2015; 2015: 1–14.

16.

Aly

Real time detection of lane markers in urban streets. In: 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, the Netherlands, 4–6 June 2008, pp. 7–12. IEEE.

17.

Gackstatter

Heinemann

Thomas

, et al. Stable road lane model based on clothoids. Advanced microsystems for automotive applications. Berlin: Springer, 2010. pp. 133–143.

18.

Loose

Franke

Stiller

Kalman particle filter for lane recognition on rural roads. In: 2009 IEEE Intelligent Vehicles Symposium, Xi’an, Shaanxi, China, 3 June 2009, pp. 60–65.

19.

Lowe

. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004; 60(2): 91–110.

20.

Bay

Tuytelaars

Van Gool

. Surf: speeded up robust features. In: European conference on computer vision. Springer, Berlin, Heidelberg, 7 May 2006, pp. 404–417. Springer.

21.

Lategahn

Beck

Stiller

. DIRD is an illumination robust descriptor. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings, Dearborn, MI, USA, 8–11 June 2014, pp. 756–761. Springer.

22.

Rublee

Rabaud

Konolige

, et al. Orb: an efficient alternative to sift or surf. In: 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011, pp. 2564–2571. Springer.

23.

Zhang

Appia

Pekkucuksen

, et al. A surround view camera solution for embedded systems. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 2014, pp. 662–667. Springer.

24.

Gruyer

Belaroussi

Revilloud

. Accurate lateral positioning from map data and road marking detection. Expert Syst Appl 2016; 43: 1–8.

25.

Julier

Uhlmann

. New extension of the Kalman filter to nonlinear systems. In: Signal processing, sensor fusion, and target recognition VI, volume 3068. International Society for Optics and Photonics, 28 July 1997, pp. 182–194. Society of Photo Optical.

26.

Wan

Van Der Merwe

The unscented Kalman filter for nonlinear estimation. In: Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium, Lake Louise, Alberta, 1–4 October 2000, pp. 153–158. Society of Photo Optical.

27.

Strasdat

Montiel

Davison

. Visual slam: why filter? Image Vision Comput 2012; 30(2): 65–77.

28.

Huang

Muffert

, et al. Precise pose graph localization with sparse point and lane features. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, Vancouver, BC, Canada, 24–28 September 2017, pp. 4077–4082. Society of Photo Optical.

29.

Yang

Zhou

, et al. A robust terrain-based road vehicle localization algorithm. In: 2016 IEEE Intelligent Vehicles Symposium (IV), Piscataway, NJ, 19–June 2016, pp. 915–920.

30.

Bonnifait

Ibanez-Guzman

, et al. Lane-level map-matching with integrity on high-definition maps. In: 2017 IEEE Intelligent Vehicles Symposium (IV), IEEE, Redondo Beach, CA, USA, 11–14 June 2017, pp. 1176–1181. Society of Photo Optical.

31.

Kao

WW.

Integration of GPS and dead-reckoning navigation systems. In: Vehicle Navigation and Information Systems Conference, volume 2, IEEE, 20 October 1991, pp. 635–643.

32.

Dosovitskiy

Fischer

Ilg

, et al. Flownet: learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, 26 April 2015, pp. 2758–2766.

33.

Liu

Tang

. Adaptive resampling in particle filter based on diversity measures. In: 2010 5th International Conference on Computer Science & Education, Hefei, China, IEEE, 24–27 August 2010, pp. 1474–1478.

34.

Kingma

. Adam: a method for stochastic optimization. arXiv preprint arXiv:14126980 2014.