Abstract
Navigating complex real-world environments requires understanding the semantic context and making effective decisions. Existing solutions leave room for improvement: traditional reactive approaches that do not maintain a map often struggle in complex environments, map-dependent methods demand significant mapping effort, and learning-based methods rely on large training datasets and generalize poorly. To address these challenges, we propose a novel visual semantic navigation framework that combines data-driven semantic understanding, Pareto-optimal decision-making, and image-space planning. Our approach uses a local environmental representation called the navigability image, which allows the robot to assess immediate traversability without relying on a priori mapping or navigation data. Building on this, we introduce Pareto-Optimal Visual Navigation (POVNav), a decision-making framework in the image space that identifies appropriate sub-goals, constructs collision-free paths, and generates control commands using visual servoing. The framework also supports selective navigation behaviors, such as avoiding traversable yet slippery grassland to prevent getting stuck, by dynamically adjusting the navigability criteria within the local representation. POVNav is lightweight, operating solely with a monocular camera and without requiring map storage or training-data collection, making it highly versatile across robotic platforms and environments. Extensive year-round real-world experiments validated its efficacy in both structured indoor environments and unstructured outdoor settings, including dense forest trails and snow-covered roads. Field experiments using various image segmentation techniques demonstrated its robustness and adaptability across a wide range of conditions. Additionally, we demonstrate that POVNav successfully guides a robot through narrow pipes in a culvert inspection task.
Overall, we showcase the utility of POVNav in real-world scenarios, highlighting its flexibility and computational efficiency for autonomous robots in complex environments.
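To make the navigability-image idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes a per-pixel semantic segmentation is available and that a set of class IDs (hypothetical names `ROAD`, `GRASS`, `OBSTACLE`) is designated traversable. Selective behavior, such as treating slippery grass as non-navigable, is expressed simply by editing the traversable set; the sub-goal picker is a simplified stand-in for POVNav's Pareto-optimal selection.

```python
import numpy as np

# Hypothetical class IDs; the real labels depend on the segmentation model.
ROAD, GRASS, OBSTACLE = 0, 1, 2

def navigability_image(seg, traversable=frozenset({ROAD})):
    """Binary mask: 1 where the pixel's semantic class is traversable."""
    return np.isin(seg, list(traversable)).astype(np.uint8)

def subgoal_column(nav):
    """Pick the image column with the deepest free corridor measured from
    the bottom row upward -- a toy proxy for sub-goal selection."""
    h, _ = nav.shape
    flipped = np.flipud(nav)
    # Row index of the first non-navigable pixel in each column (0 if none).
    blocked = np.argmax(flipped == 0, axis=0)
    depth = np.where((flipped == 1).all(axis=0), h, blocked)
    return int(np.argmax(depth))
```

Passing `traversable=frozenset({ROAD, GRASS})` restores grass as navigable, illustrating how the same scene yields different behavior under different navigability criteria without any remapping.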
