Abstract
The Boreas dataset was collected by driving a repeated route over the course of 1 year, resulting in stark seasonal variations and adverse weather conditions such as rain and falling snow. In total, the Boreas dataset includes over 350 km of driving data featuring a 128-channel Velodyne Alpha-Prime lidar, a 360° Navtech CIR304-H scanning radar, a 5MP FLIR Blackfly S camera, and centimetre-accurate post-processed ground truth poses. Our dataset will support live leaderboards for odometry, metric localization, and 3D object detection. The dataset and development kit are available at boreas.utias.utoronto.ca.
1. Introduction
To date, autonomous vehicle research and development has focused on achieving sufficient reliability in ideal conditions such as the sunny climates observed in San Francisco, California, or Phoenix, Arizona. Adverse weather conditions such as rain and snow remain outside the operational envelope for many of these systems. Additionally, a majority of self-driving vehicles are currently reliant on highly accurate maps for both localization and perception. These maps are costly to maintain and may degrade as a result of seasonal changes. In order for self-driving vehicles to be deployed safely, these shortcomings must be addressed.
To encourage research in this area, we have created the Boreas dataset, a large multi-modal dataset collected by driving a repeated route over the course of 1 year. The dataset features over 350 km of driving data with stark seasonal variations and multiple sequences with adverse weather such as rain and falling snow. Our data-taking platform, Boreas, shown in Figure 1, includes a Velodyne Alpha-Prime (128-beam) lidar, a 5 MP FLIR Blackfly S camera, a 360° Navtech CIR304-H scanning radar, and an Applanix POS LV GNSS-INS. Globally consistent, centimetre-accurate ground truth poses are obtained by post-processing global navigation satellite system (GNSS), inertial measurement unit (IMU), and wheel encoder data together with a secondary correction subscription. Our dataset will support benchmarks for odometry, metric localization, and 3D object detection.
This dataset may be used to study the effects of seasonal variation on long-term localization. Further, this dataset enables comparisons of vision, lidar, and radar-based mapping and localization pipelines. Comparisons may include the robustness of individual sensing modalities to adverse weather or the resistance to map degradation.
The main contributions of this dataset are as follows:
• Data collected on a repeated route over the course of 1 year, including multiple weather conditions.
• A unique, high-quality sensor configuration including a 128-beam lidar and a 360° scanning radar.
• Post-processed GNSS/IMU data providing accurate ground truth pose information.
• A live, open leaderboard for odometry, metric localization, and 3D object detection.
• 3D object labels for data collected in sunny weather.
2. Related work
Many of the published autonomous driving datasets focus on perception, particularly 3D object detection and semantic segmentation of images and lidar pointclouds. However, these datasets tend to lack variation in weather and season. Further, many of these datasets do not provide radar data. Automotive radar sensors are robust to precipitation, dust, and fog thanks to their longer wavelength. For this reason, radar may play a key role in enabling autonomous vehicles to operate in adverse weather. The Boreas dataset addresses these shortcomings by including a 360° scanning radar, and data taken during various weather conditions (sun, cloud, rain, night, and snow) and seasons.
Related datasets. Lead.: has a public leaderboard. Size: for perception datasets, size is given as the number of annotated frames and the number of annotations (3D boxes). GT: ground truth pose source. (A): automotive radar. (N): 360° Navtech radar. RTK (Real-Time Kinematic) uses a global positioning system (GPS) base station and differential measurements to improve GPS accuracy; RTX uses data from a global network of tracking stations to calculate corrections, which can achieve cm-level accuracy without a base station (Applanix, 2022). †Waymo's proprietary Mid-Range and Short-Range 3D lidars. ‡The Oxford RobotCar dataset contains one sequence with snow on the ground, but that sequence has no falling snow.
3. Data collection
The majority of the Boreas dataset was collected by driving a repeated route near the University of Toronto over the course of 1 year. Figure 2 illustrates the seasonal variations observed over this time. Figure 3 compares camera, lidar, and radar measurements in three distinct weather conditions: falling snow, rain, and sun. The primary repeated route, referred to as the Glen Shields route, is depicted in Figure 4. Additional routes were also collected, either as single standalone sequences or as a small number of repeated traversals. The Glen Shields route can be used for research related to long-term localization, while the other routes allow for experiments that test generalization to previously unseen environments. The frequency of different metadata tags is displayed in Figure 5.

Figure 2: One year of seasonal changes in the Boreas dataset. Each image is a camera image taken on a different day. The sequences are sorted chronologically from left to right and top to bottom, starting in November 2020 and finishing in November 2021. Note that the sequences are not evenly spaced in time.

Figure 3: Weather variation in the Boreas dataset. Note that the lidar pointcloud becomes littered with detections associated with snowflakes during falling snow, while the radar data remains relatively unperturbed across the weather conditions.

Figure 4: The Glen Shields route in Toronto, Ontario, Canada. Mapbox satellite data was used to generate this figure.

Figure 5: Frequency of metadata tags in the Boreas dataset. Snow: snow is on the ground; snowing: it is actively snowing; alternate: a route other than Glen Shields.



4. Sensors
Sensor specifications. †Position accuracy changes over time as a function of the number of visible satellites; these numbers represent expected accuracy in nominal conditions. ‡Our Navtech radar's firmware was upgraded partway through the project; older sequences have a range resolution of 0.0596 m and a range of 200 m.

A close-up view of Boreas’ sensor configuration.

Boreas sensor placement. Distances are given in metres. Measurements shown are approximate. Refer to the calibrated extrinsics contained in the dataset for precise measurements.
5. Dataset format
5.1. Data organization
The Boreas dataset is divided into sequences, each of which includes all sensor data and ground truth poses from a single drive. Sequences are identified by the date and time at which they were collected.

Figure 8: Data organization for a single Boreas sequence.
5.2. Timestamps
The name of each file corresponds to its timestamp. These timestamps are given as UNIX epoch times in microseconds. All sensor timestamps were synchronized to the coordinated universal time (UTC) reported by the Applanix POS LV. The Velodyne lidar was synchronized using a standard hardwired connection to the Applanix POS LV carrying a pulse-per-second (PPS) signal and NMEA messages. The camera was configured to emit a square-wave pulse where the rising edge of each pulse corresponds to the start of a new camera exposure event. The Applanix POS LV was then configured to receive and timestamp these event signals. Camera timestamps were then corrected in post-processing using the recorded event times and exposure values: t_camera = t_event + exposure(t_event) / 2.
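The exposure correction above can be sketched as follows; the function and argument names are illustrative, not the devkit's API.

```python
# Hedged sketch: shift the camera trigger (rising-edge) time to the middle
# of the exposure window, per t_camera = t_event + exposure(t_event) / 2.
# All quantities are in microseconds, matching the dataset's timestamps.

def correct_camera_timestamp(t_event_us: int, exposure_us: int) -> int:
    """Return the timestamp at the temporal middle of the exposure."""
    return t_event_us + exposure_us // 2

# Example: a 10 ms exposure triggered at t_event.
t_cam = correct_camera_timestamp(1_600_000_000_000_000, 10_000)
```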
The data-recording computer was synchronized to UTC time in a fashion similar to the Velodyne, using an RS-232 serial cable carrying a PPS signal and NMEA messages. The Navtech radar synchronizes its local clock using network time protocol (NTP). Since the data-recording computer publishing the NTP time is synchronized to UTC time, the radar is thereby also synchronized to UTC time.
For lidar pointclouds, the filename timestamp corresponds to the temporal middle of the scan. Each lidar point also carries its own timestamp, given in seconds relative to the middle of the scan. For radar scans, the filename timestamp likewise corresponds to the middle of the scan, taken as the timestamp of azimuth ⌊M/2⌋ − 1, where M is the number of azimuths. Each scanned radar azimuth is also timestamped in the same format as the filename, a UNIX epoch time. A diagram of our synchronization setup is shown in Figure 9.

Figure 9: Time synchronization of sensors on Boreas.
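Converting a per-point lidar time (seconds, relative to the scan middle) into an absolute UNIX timestamp can be sketched as below; the helper name is hypothetical.

```python
# Hedged sketch: the scan filename gives the scan-middle time in UNIX
# microseconds; each point's time is an offset in seconds from that middle.

def absolute_point_time(scan_stamp_us: int, point_time_s: float) -> int:
    """Return the point's absolute UNIX epoch time in microseconds."""
    return scan_stamp_us + round(point_time_s * 1e6)

# Points captured before the scan middle have negative offsets.
t = absolute_point_time(1_600_000_000_000_000, -0.05)  # 50 ms early
```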
5.3. File formats
Camera images are rectified and anonymized by default. We use Anonymizer to blur license plates and faces (Understand, 2022). Images are stored in the commonly used png format. Lidar pointclouds are stored in a binary format to minimize storage requirements; our devkit provides methods for working with this binary format in both C++ and Python. Each point has six fields, [x, y, z, i, r, t], where (x, y, z) is the position of the point with respect to the lidar, i is the intensity of the reflected infrared signal, r is the ID of the laser that made the measurement, and t is the point timestamp explained in Section 5.2. Raw radar scans are stored as 2D polar images: M azimuths × R range bins. We follow Oxford's convention and embed timestamp and encoder information into the first 11 columns (bytes) of each polar radar scan: the first eight columns encode a 64-bit integer, the UNIX epoch timestamp of the azimuth in microseconds; the next two columns encode a 16-bit unsigned integer, the rotational encoder value; and the final column is unused but preserved for compatibility with the Oxford format (see Barnes et al. (2020) for further details on the Navtech sensor and this file format). The polar radar scans can be readily converted into a top-down Cartesian representation, as shown in Figure 3, using our devkit.
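A minimal sketch of parsing these binary formats, assuming little-endian byte order and 32-bit float lidar fields; the devkit remains the authoritative reference.

```python
import numpy as np

# Hedged sketch of the file formats described above. The exact dtypes and
# byte order (little-endian assumed here) should be confirmed in the devkit.

def load_lidar(path):
    # Assumes each point is six consecutive 32-bit floats: x, y, z, i, r, t.
    return np.fromfile(path, dtype=np.float32).reshape(-1, 6)

def split_radar_scan(polar):
    """polar: (M azimuths, 11 + R range bins) uint8 array in Oxford format."""
    stamps = polar[:, :8].copy().view(np.int64).ravel()       # UNIX us per azimuth
    encoders = polar[:, 8:10].copy().view(np.uint16).ravel()  # rotational encoder
    intensities = polar[:, 11:]                               # R range bins
    return stamps, encoders, intensities
```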
Note that, unlike in some other datasets (e.g., KITTI and CADC), measurements are not synchronous: measurements with the same index do not share the same timestamp. However, given the timestamps and relative pose information, different sensor measurements can still be fused together. Lidar pointclouds are not motion-corrected, but we do provide methods for removing motion distortion in our devkit. Navtech radar scans suffer from both motion distortion and Doppler distortion; Burnett et al. (2021a) and Burnett et al. (2021b) provide methods to compensate for these effects.
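As a rough illustration of what motion-distortion removal involves, here is a translation-only, constant-velocity de-skew; the devkit's implementation and the cited radar methods are more sophisticated, and the function name is hypothetical.

```python
import numpy as np

# Hedged sketch: express every lidar point in the scan-middle frame by
# shifting it according to the motion accrued between its own timestamp and
# the scan middle. Assumes constant linear velocity and ignores rotation.

def deskew(points_xyz, point_times_s, v_body):
    """points_xyz: (N, 3) points; point_times_s: (N,) seconds relative to the
    scan middle; v_body: (3,) sensor-frame linear velocity, assumed constant."""
    return points_xyz + point_times_s[:, None] * v_body[None, :]
```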
6. Ground truth poses
Ground truth poses are obtained by post-processing GNSS, IMU, and wheel encoder measurements along with corrections obtained from an RTX subscription using Applanix's POSPac software suite. Positions and velocities are given with respect to a fixed East-North-Up frame, ENUref. The position of ENUref is aligned with the first pose of the first sequence.
The residual root mean square (RMS) position error reported by Applanix is typically less than 5 cm in nominal conditions but can be as high as 20–40 cm in urban canyons. Figure 10 shows the residual RMS errors resulting from the post-processing conducted by the Applanix POSPac software. The estimated error varies with satellite visibility. Note that these values represent global estimates; relative pose estimates are more accurate over short time horizons.

Figure 10: Post-processed RMS position, velocity, and orientation residual error versus time, as reported by Applanix's POSPac software for a sequence collected on 2021-09-07.
7. Calibration
7.1. Camera intrinsics
Camera intrinsics are calibrated using MATLAB's camera calibrator (Mathworks, 2022) and are provided with the dataset.
7.2. Sensor extrinsics
The extrinsic calibration between the camera and lidar is obtained using MATLAB's camera to lidar calibrator (Mathworks, 2022). The results of this calibration are illustrated in Figure 11. To calibrate the rotation between the lidar and radar, we use correlative scan matching via the Fourier Mellin transform (Checchin et al., 2010). Several lidar–radar pairs were collected while the vehicle was stationary at different locations, and the final rotation estimate is obtained by averaging the results from several measurement pairs (Burnett, 2020). The translation between the lidar and radar is obtained from the computer-aided design (CAD) model of the roof rack. The results of the radar-to-lidar calibration are shown in Figure 12. The extrinsics between the lidar and the Applanix reference frame were obtained using Applanix's in-house calibration tools, which output this relative transform as a by-product of a batch optimization that estimates the most likely vehicle path given a sequence of lidar pointclouds and post-processed GNSS/IMU measurements. All extrinsic calibrations are provided as 4 × 4 homogeneous transformation matrices.

Figure 11: Lidar points projected onto a camera image using the camera–lidar calibration. (a) Lidar points are coloured based on their longitudinal distance from the vehicle. (b) Lidar points are given RGB colour values based on their projected location in the camera image.

Figure 12: Results of the radar-to-lidar calibration. Lidar measurements are drawn in red using a bird's eye view projection with the ground plane removed. Radar targets are first extracted from the raw radar data and then drawn as blue pixels. The two sensors have been aligned using the radar-to-lidar calibration.
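Using the provided 4 × 4 extrinsic matrices together with a 3 × 3 camera intrinsic matrix, projecting lidar points into the image can be sketched as below; the matrix naming convention (T_cam_lidar mapping lidar-frame points into the camera frame) is an assumption.

```python
import numpy as np

# Hedged sketch of lidar-to-camera projection using a 4x4 extrinsic
# T_cam_lidar and a 3x3 pinhole intrinsic matrix K. The devkit tutorials
# are the authoritative reference for the actual conventions.

def project_lidar_to_camera(points_lidar, T_cam_lidar, K):
    """points_lidar: (N, 3). Returns (M, 2) pixel coords of points with z > 0."""
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])   # homogeneous (N, 4)
    pts_cam = (T_cam_lidar @ homo.T).T[:, :3]           # points in camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]                # keep points in front
    uv = (K @ pts_cam.T).T                              # pinhole projection
    return uv[:, :2] / uv[:, 2:3]                       # normalize by depth
```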

8. 3D Annotations
We provide a set of 3D bounding box annotations for a subset of the Boreas dataset, obtained in sunny weather. We refer to this as the Boreas-Objects-V1 dataset. Annotations were obtained using the Scale.ai data annotation service (Scale, 2022). In total, 7111 lidar frames were annotated at 5 Hz, resulting in 326,180 unique 3D box annotations. Since the lidar data was collected at 10 Hz, the annotations may be interpolated between frames to double the number of annotated frames at a slightly lower fidelity. The data is divided into 53 continuous scenes, each 20–70 s in duration. The scenes are then divided into 37 training scenes and 16 test scenes, where the ground truth labels have been withheld for the benchmark. Figure 13 displays two statistics for our annotations.

Figure 13: 3D annotation statistics for Boreas-Objects-V1.
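Interpolating the 5 Hz labels to the 10 Hz lidar rate can be sketched as below; a real pipeline must also associate boxes across frames, and the helper names are hypothetical.

```python
import math

# Hedged sketch: linearly interpolate one matched 3D box between two
# consecutive annotated frames to label the intermediate lidar frame.
# Only the centre and yaw are shown; box extents are typically held fixed.

def interpolate_center(c0, c1, alpha=0.5):
    """c0, c1: (x, y, z) centres at consecutive annotated frames."""
    return tuple(a + alpha * (b - a) for a, b in zip(c0, c1))

def interpolate_yaw(y0, y1, alpha=0.5):
    """Interpolate heading along the shortest angular path (handles wrap)."""
    d = (y1 - y0 + math.pi) % (2 * math.pi) - math.pi
    return y0 + alpha * d
```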
We use the same folder structure as in Figure 8 but with an additional folder for the 3D labels.

Examples of 3D annotations in the Boreas-Objects-V1 dataset.
9. Benchmark metrics
At launch, we plan to support online leaderboards for odometry, metric localization, and 3D object detection. For odometry, we use the same metrics as the KITTI dataset (Geiger et al., 2013). The KITTI odometry metrics average the relative position and orientation errors over every sub-sequence of length (100 m, 200 m, 300 m, ..., 800 m), yielding two numbers: a translational drift, reported as a percentage of path length, and a rotational drift, reported in degrees per metre travelled. For 3D object detection, we also follow the KITTI dataset by reporting the mean average precision (mAP) on a per-class basis. Matching the thresholds used in KITTI, a detection counts as a true positive at 70% overlap for cars and 50% for pedestrians. We do not divide our dataset into difficulty levels.
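The translational part of the KITTI-style odometry metric can be sketched as follows. This simplified version compares relative displacements only, omitting the orientation component and the rotation of errors into the starting frame used by the full metric.

```python
import numpy as np

# Hedged, simplified sketch of KITTI-style translational drift: for every
# sub-sequence of each prescribed path length, compare the estimated relative
# displacement against ground truth and average the error as a percentage.

def translational_drift(gt_xyz, est_xyz,
                        lengths=(100, 200, 300, 400, 500, 600, 700, 800)):
    step = np.linalg.norm(np.diff(gt_xyz, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(step)])    # cumulative path length
    errs = []
    for i in range(len(gt_xyz)):
        for length in lengths:                         # lengths in ascending order
            j = int(np.searchsorted(dist, dist[i] + length))
            if j >= len(gt_xyz):
                break                                  # no longer sub-sequence fits
            gt_rel = gt_xyz[j] - gt_xyz[i]
            est_rel = est_xyz[j] - est_xyz[i]
            errs.append(np.linalg.norm(est_rel - gt_rel) / (dist[j] - dist[i]))
    return 100.0 * float(np.mean(errs)) if errs else 0.0   # percent of path length
```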
The purpose of our metric localization leaderboard is to benchmark mapping and localization pipelines. In this scenario, we envision a situation where one or more repeated traversals of the Glen Shields route are used to construct a map offline. Any and all data from the training sequences may be used to construct a map in any fashion.
Then, during a test sequence, the goal is to perform metric localization between the live sensor data and the pre-built map. Localization approaches may make use of temporal filtering and can leverage the IMU if desired, but GNSS information will not be available. The goal of this benchmark is to simulate localizing a vehicle in real time and as such methods may not use future sensor information in an acausal manner.
Our goal is to support both global and relative map structures. Only one of the training sequences will be specified as the map sequence used by the benchmark. For 3D localization, users must choose either the lidar or camera as the reference sensor. For 2D localization, only the radar frames are used as a reference. For each (camera–lidar–radar) frame s2 in the test sequence, users will specify the ID (timestamp) of the (camera–lidar–radar) frame s1 in the map sequence with respect to which they are providing a relative pose.
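Forming such a relative pose from two global (ENU-frame) poses can be sketched as below; the exact submission convention is defined by the benchmark, and the matrix names here are illustrative.

```python
import numpy as np

# Hedged sketch: given 4x4 homogeneous poses T_enu_s1 (map frame) and
# T_enu_s2 (test frame), the pose of s2 expressed in s1 is their composition.

def relative_pose(T_enu_s1, T_enu_s2):
    """Return T_s1_s2, the 4x4 pose of frame s2 relative to frame s1."""
    return np.linalg.inv(T_enu_s1) @ T_enu_s2
```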
Users will also have the option of providing 6 × 6 covariance matrices for their pose estimates, which will be used to compute a consistency score c. A consistency score close to 1 is ideal: c < 1 means that the method is over-confident, and c > 1 means that the method is conservative. Note that the above metrics will be averaged across the test sequences.
10. Development kit
As part of this dataset, we provide a development kit for new users to get started. The primary purpose of the devkit is to act as a wrapper around the dataset to be used in Python. This allows users to query frames and the associated ground truth for either odometry, localization, or 3D object detection. We also provide convenience methods for removing motion distortion from pointclouds, working with polar radar scans, and converting to and from Lie algebra and Lie group representations. The devkit also provides several ways to visualize sensor data. We also provide introductory tutorials in Jupyter notebooks that include projecting lidar onto a camera frame and visualizing 3D boxes. Evaluation scripts used by our benchmark will be stored in the devkit, allowing users to validate their algorithms before submission to the benchmark. The development kit can be found at boreas.utias.utoronto.ca.
11. Conclusion
In this paper, we presented Boreas, a multi-season autonomous driving dataset that includes over 350 km of driving data collected over the course of 1 year. The dataset provides a unique high-quality sensor suite including a Velodyne Alpha-Prime (128-beam) lidar, a 5MP camera, a 360° Navtech radar, and accurate ground truth poses obtained from an Applanix POS LV with an RTX subscription. We also provide 3D object labels for a subset of the Boreas data obtained in sunny weather. The primary purpose of this dataset is to enable further research into long-term localization across seasons and adverse weather conditions. Our website will provide an online leaderboard for odometry, metric localization, and 3D object detection.
Acknowledgements
We would like to thank Goran Basic for his help in designing and assembling the roof rack for Boreas. We also thank General Motors for their donation of the Buick vehicle. The Amazon Open Data Sponsorship program supports this project by hosting the Boreas dataset.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
