Fast and robust learned single-view depth-aided monocular visual-inertial initialization

Abstract

In monocular visual-inertial navigation, it is desirable to initialize the system as quickly and robustly as possible. State-of-the-art initialization methods typically construct a linear system from image features and inertial measurements to obtain a closed-form solution, and then refine the states with a nonlinear optimization. These methods generally require a few seconds of data, although this can be shortened to less than a second by adding constraints from a robust but only up-to-scale monocular depth network to the nonlinear optimization. To further accelerate this process, in this work we instead leverage the scale-less depth measurements in the linear initialization step, which is performed before the nonlinear one and requires only a single depth image for the first frame. Importantly, we show that the typical independent estimation of all feature states in the closed-form solution can be replaced by estimating only the scale and bias parameters of the learned depth map. As such, our formulation yields a smaller minimal problem than the state of the art, which can be seamlessly integrated into RANSAC for robust estimation. Experiments show that our method achieves state-of-the-art initialization performance in simulation as well as on popular real-world datasets (TUM-VI and EuRoC MAV). On the TUM-VI dataset, both in simulation and on real-world data, we demonstrate superior initialization performance with only a 0.3 s window of data, which is the smallest window reported to date, and validate that our method initializes more often, more robustly, and more accurately in a range of challenging scenarios.
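The idea of replacing per-feature unknowns with two depth-map parameters can be illustrated with a minimal sketch. It is not the formulation of this article, which estimates scale and bias jointly with the inertial states in the linear system. Here we assume, purely for illustration, that metric reference depths `z` are already available for a handful of features and that they relate to the network output `d` through an affine model `z ≈ a*d + b`. The function names, threshold, and iteration count are hypothetical. Because only two parameters are unknown, a RANSAC hypothesis needs just two features.

```python
import numpy as np

def fit_scale_bias(d, z):
    """Least-squares fit of the assumed affine model z ≈ a*d + b."""
    A = np.column_stack([d, np.ones_like(d)])
    (a, b), *_ = np.linalg.lstsq(A, z, rcond=None)
    return a, b

def ransac_scale_bias(d, z, iters=200, thresh=0.05, seed=0):
    """RANSAC over two-point minimal samples (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(d), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(d), size=2, replace=False)
        if np.isclose(d[i], d[j]):
            continue  # degenerate sample: the slope is undefined
        # Minimal solver: two points determine scale a and bias b exactly.
        a = (z[i] - z[j]) / (d[i] - d[j])
        b = z[i] - a * d[i]
        inliers = np.abs(a * d + b - z) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 2:
        raise ValueError("no consensus set found")
    # Refine on the consensus set with ordinary least squares.
    a, b = fit_scale_bias(d[best_inliers], z[best_inliers])
    return a, b, best_inliers
```

With N tracked features, this model has two unknowns instead of N, so each hypothesis needs only two samples and fewer RANSAC iterations are required to draw an outlier-free one.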

Keywords

Visual-inertial odometry, SLAM, initialization, depth learning, minimal problem, robust estimation
