Abstract
This paper presents the design, implementation, and experimental validation of a compact Ackermann-steered mobile robot platform developed for autonomous navigation and diverse service applications. The platform integrates dual stereo cameras and an inertial measurement unit to ensure robust localization, alongside a pan-tilt RGB-D camera designed for active perception tasks. Specifically, the dual stereo cameras mounted at the front and rear provide bidirectional visual odometry, which is fused with inertial measurements via an extended Kalman filter to achieve accurate, drift-reduced state estimation. Furthermore, the pan-tilt actuator enables dynamic viewpoint control, facilitating robust object tracking and dense three-dimensional (3D) perception. The system is built upon the Robot Operating System 2 framework, providing a modular and extensible architecture that allows for seamless integration with various perception and navigation algorithms. Experimental evaluations—including velocity tracking, dual visual odometry benchmarking, mapless navigation, and real-time 3D mapping—validate the effectiveness and versatility of the proposed platform in outdoor environments.
Introduction
Mobile robots have been widely adopted across domains such as delivery, infrastructure inspection, and environmental monitoring.1–3 As the demand for compact, cost-effective, and adaptable robotic platforms continues to grow, recent research has focused on developing systems capable of robust navigation and enhanced situational awareness, particularly in global positioning system (GPS)-denied or complex outdoor environments.4–6
Among various locomotion schemes, Ackermann steering mechanisms are particularly advantageous in outdoor paved settings due to their low-slip motion, mechanical simplicity, and compatibility with car-like kinematic constraints. However, navigation in such environments often faces challenges when Global Navigation Satellite System (GNSS) signals are unreliable or unavailable. This necessitates advanced onboard perception and localization strategies capable of operating independently of external infrastructure.
To address these limitations, we propose a compact and versatile Ackermann-steered mobile robot platform featuring stereo vision-based visual odometry (VO), inertial sensing, and active perception modules. The system employs front and rear stereo cameras to estimate bidirectional VO, which is fused with inertial measurement unit (IMU) data via an extended Kalman filter (EKF) to achieve drift-compensated localization. This approach enhances robustness in visually degraded or occluded environments by ensuring consistent pose estimation during bidirectional maneuvers. Furthermore, a pan-tilt RGB-D camera provides active three-dimensional (3D) perception, enabling the platform to serve as a testbed for dynamic visual tracking algorithms.
Built upon the Robot Operating System 2 (ROS 2) framework, the platform offers a modular and extensible software architecture for perception, planning, and control. 7 While a complete end-to-end software stack is provided, individual modules are designed for compatibility with standard ROS 2 packages, facilitating the flexible integration and evaluation of various algorithms. This design ensures the platform functions effectively both as a standalone system and as a research testbed for benchmarking diverse perception and navigation strategies.
This paper outlines the hardware and software architecture of the platform and validates its performance through experiments on velocity control, dual VO-based localization, mapless navigation, and 3D mapping. The experimental results confirm the system’s effectiveness and versatility, demonstrating its suitability for research and outdoor service applications, including autonomous driving, inspection, and patrol missions.
Related work
Mobile robotic platforms utilize diverse drivetrain configurations, each tailored to specific mobility requirements and operational environments. This section categorizes prior research based on drivetrain mechanisms—specifically differential drive, skid steering, and omnidirectional systems—and examines their system characteristics. Subsequently, we highlight the limitations of existing platforms in outdoor service scenarios and contrast them with the proposed Ackermann-steered platform, which features high-performance computing, dual VO, and active perception capabilities.
Differential drive and skid steering platforms
Differential drive and skid steering mechanisms are widely adopted in mobile robotic platforms due to their mechanical simplicity, compact form factor, and ease of control. These configurations are particularly advantageous for cost-sensitive applications and are prevalent in service robots, educational kits, and research prototypes.
In practical deployments, skid-steered platforms have demonstrated efficacy in ground-surface exploration, spraying, and agricultural tasks. For instance, the Warthog platform by Clearpath Robotics was customized to support both ground and subsurface imaging through the integration of LiDAR and microwave antennas, validating its field readiness in both real-world and simulated experiments. 8 Similarly, platforms equipped with pan-tilt actuators and magnetic guidance systems have been developed specifically for greenhouse spraying applications. 9
Conversely, differential drive systems have been extensively employed in research and educational settings. Their low mechanical complexity and modular design make them well-suited for algorithm validation and control education. The UiAbot platform exemplifies this trend by integrating LiDAR and an IMU within a ROS 2-based control architecture, enabling simultaneous localization and mapping (SLAM) and autonomous navigation in indoor environments. 10 Another compact differential drive platform has been introduced as a flexible testbed for research-level control and autonomy experiments. 7
Despite their widespread utility, these platforms possess inherent kinematic limitations. Both differential drive and skid steering systems are prone to significant wheel slippage, which degrades odometry accuracy and results in trajectory deviations. Furthermore, generalizing their motion models across diverse terrains remains challenging, limiting their performance in unstructured or inclined environments.
Omnidirectional platforms
Omnidirectional mobile robots, typically employing Mecanum or omni wheels, offer holonomic mobility characterized by independent translation and rotation capabilities. This maneuverability renders them particularly advantageous in constrained indoor environments, such as smart factories and logistics warehouses, where precise movement within tight spaces is essential. Consequently, numerous studies have leveraged this mobility to facilitate autonomous navigation, manipulation, and human–robot interaction (HRI) tasks.
In industrial contexts, omnidirectional platforms have been engineered with modular drive-suspension units to accommodate variable payloads while ensuring agile motion.11,12 To support robust navigation in cluttered spaces, these systems frequently employ multi-sensor fusion, integrating LiDAR, IMU, and VO to achieve accurate localization. 13 Furthermore, lightweight onboard computing units running ROS 2 have been widely adopted to enable real-time SLAM and autonomous control. 14 Beyond autonomous operations, such platforms are also utilized in HRI scenarios; for instance, variable-admittance models have been applied to decouple compliant behavior in intended and unintended motion directions, thereby ensuring intuitive and safe collaboration with human operators. 15
While omnidirectional mobility provides distinct benefits in structured indoor settings, these advantages often come at the cost of mechanical robustness and traction efficiency. Due to the complexity of the wheel mechanisms and reduced stability on uneven surfaces, such platforms are generally less suitable for high-speed or outdoor operations compared to conventional steering configurations.
Swerve-drive and swivel-based platforms
Swerve-drive platforms enable holonomic motion by independently controlling the steering and propulsion of each wheel module, providing superior maneuverability for precise navigation in dense or dynamic environments. Consequently, numerous swerve- and swivel-based architectures have been proposed as effective solutions to achieve agile planar motion.
For instance, the RoMop platform employs a swivel-based mechanism that facilitates multidirectional translation through continuous platform rotation and passive planar linkages, thereby mitigating kinematic singularities. 16 Similarly, the S-bot utilizes a solar-powered, remotely actuated swerve-drive system designed to support agile, long-range transport tasks in outdoor logistics environments. 17
Research on three-wheeled swerve platforms has demonstrated high-speed trajectory tracking using spline-based motion planning, achieving reduced wheel slip and lower position errors compared to omni-wheel systems.18,19 Furthermore, holonomic platforms based on powered casters, such as TidyBot++, have been developed to enhance mobility in household settings, enabling learning-based manipulation through intuitive tele-operation. 20
However, despite their mechanical versatility, swerve-drive systems typically necessitate multiple actuators per wheel and require sophisticated multi-level control architectures. This inherent complexity increases hardware costs, computational loads, and energy consumption. Moreover, ensuring precise synchronization between wheel steering and propulsion poses significant challenges for real-time control, particularly under dynamic conditions or on uneven terrain. These factors often constrain their practicality for low-cost, long-duration, or power-limited applications.
Ackermann-steered platforms
Ackermann steering geometry is extensively utilized in autonomous ground vehicles due to its kinematic efficiency and suitability for high-speed, low-slip motion on structured surfaces. Unlike omnidirectional or skid-steered mechanisms, Ackermann geometry minimizes tire scrub and facilitates smoother turning dynamics, making it well-suited for applications demanding precise maneuvering and energy efficiency.
Significant research has been dedicated to trajectory planning and control strategies tailored to the non-holonomic constraints of Ackermann-steered robots. For power-constrained applications such as planetary exploration and border patrol, energy-aware path planning approaches have been proposed, achieving notable energy savings through motion primitives consistent with Ackermann kinematics. 21 In agricultural contexts, ROS-based navigation stacks combining real-time trajectory optimization with non-smooth pose regulation have demonstrated robust obstacle avoidance. 22 Additionally, for off-road scenarios, trajectory planning frameworks incorporating terrain traversability and 3D pose dynamics have been introduced to ensure feasible path generation on uneven terrain. 23
Parallel efforts have focused on platform development and advanced control methodologies. The Chronos platform, for instance, offers a cost-effective, 1/28-scale testbed for education and research, featuring an open-source software framework for multi-agent coordination. 24 To address the complexities of rough-terrain path tracking, hybrid controllers leveraging deep reinforcement learning have been developed, exhibiting enhanced robustness against dynamic disturbances compared to traditional model-based control. 25 Furthermore, real-time planners utilizing motion primitives have enabled smooth, time-efficient trajectories for operation amidst moving obstacles, 26 while time-optimal trajectory generation has been analyzed to identify fundamental components for efficient navigation under strict dynamic constraints. 27
Despite these advancements, a significant gap remains in standalone autonomy. Most existing Ackermann-based platforms rely heavily on pre-mapped environments or external positioning systems, such as GNSS or motion capture, due to a lack of robust onboard state estimation capabilities. Where VO is employed, it is frequently limited to monocular setups, which are susceptible to scale ambiguity and failure in low-texture environments. Moreover, current platforms seldom incorporate active perception modules, such as pan-tilt RGB-D sensors, limiting their capacity for dynamic object tracking and semantic scene understanding.
Distinctiveness of the proposed platform
The proposed platform integrates an Ackermann steering mechanism with dual stereo vision and visual–inertial fusion to achieve accurate, drift-reduced state estimation and dense 3D perception in real time. The Ackermann configuration ensures efficient, low-slip maneuverability while mimicking the kinematic characteristics of on-road vehicles, offering superior suitability for realistic navigation scenarios compared to differential or skid-steered systems.
For robust localization, dual stereo cameras mounted at the front and rear provide bidirectional VO. These estimates are fused with angular velocity data from an onboard IMU via an EKF, substantially mitigating the drift typically observed in VO-only systems, particularly in environments with occlusions or low-texture regions. Furthermore, the platform incorporates a pan-tilt RGB-D camera to enable active object tracking with dynamic viewpoint control. This capability ensures continuous perception of moving or partially occluded targets in 3D space. Unlike conventional LiDAR-based systems, our approach performs dense 3D mapping using stereo image sensing. This enables the generation of color-textured maps without reliance on heavy or expensive LiDAR sensors, offering a lightweight and cost-effective alternative while preserving map density and spatial fidelity.
The system architecture is fully modular and implemented in ROS 2, supporting real-time SLAM with loop closure, active perception, and autonomous navigation. This design renders the platform not only a practical solution for outdoor service tasks such as patrol, inspection, and delivery but also a flexible and extensible testbed for mobile robotics research.
To provide a clear positioning of the proposed system within the landscape of existing research, Table 1 presents a comparative analysis with other representative Ackermann-steered mobile robot platforms: the miniature educational robot Chronos, 24 the open-source autonomous racing platform F1TENTH, 28 and the mid-sized off-road platform AgileX Hunter SE. 25 While Chronos offers a cost-effective solution for swarm robotics, it relies on external infrastructure for localization. Similarly, the F1TENTH platform, widely used for high-speed racing research, typically employs 2D LiDAR and standard computing units such as the Jetson TX2 or Xavier. However, it generally relies on fixed sensor mounts, limiting its ability to perform active perception tasks. The Hunter SE represents a robust solution for rough terrain but employs a heavy sensor suite including GNSS, making it less suitable for compact service applications in GPS-denied environments. In contrast, our platform is distinguished by integrating high-performance onboard computing (NVIDIA Jetson AGX Orin) and a dual stereo vision system into a compact chassis. Unlike F1TENTH or Hunter SE, our system achieves robust drift-reduced localization via fused dual VO without reliance on LiDAR or GNSS, and uniquely supports active object tracking through its pan-tilt perception module.
Comparison of the proposed platform with existing Ackermann-steered research platforms.
IMU: inertial measurement unit; 2D: two-dimensional; SLAM: simultaneous localization and mapping; GNSS: global navigation satellite system; VO: visual odometry.
Hardware design
Platform overview
Figure 1 presents an overview of the developed robotic platform.

Overview of the physical configuration of the robotic platform.
Hardware architecture
Figure 2 illustrates the hardware architecture and system interfaces. The central onboard computer (NVIDIA Jetson AGX Orin) operates on the ROS 2 framework and orchestrates communication between the sensing and control modules. It executes high-level computational tasks, including object detection and tracking, 3D mapping, sensor-fusion-based localization, path planning, and motion command generation.
The propulsion system is powered by a brushless direct-current (BLDC) motor (Hobbywing XERUN 4268) driven by an electronic speed controller (ESC; Hobbywing XERUN XR8). Steering is actuated by a high-torque servo motor (Savox SB-2290SG) via pulse width modulation (PWM) signals. A 32-bit microcontroller (Arduino Nano ESP32) handles low-level control, receiving velocity and steering commands from the onboard computer via a universal serial bus (USB) serial interface or from a remote controller via Zigbee wireless communication. The microcontroller translates these inputs into PWM signals for the ESC and servo motor, simultaneously executing proportional–derivative (PD) control to regulate the BLDC motor speed.
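As an illustration of this low-level translation, the following Python sketch maps a commanded steering angle to a servo pulse width. It is an illustration only: the actual firmware runs in C++ on the Arduino Nano ESP32, and the neutral pulse width, steering limit, and angle-to-pulse gain shown here are hypothetical placeholders.

```python
import math

SERVO_NEUTRAL_US = 1500              # hypothetical pulse width at zero steering (us)
MAX_STEER_RAD = math.radians(30.0)   # hypothetical mechanical steering limit
US_PER_RAD = 400.0 / MAX_STEER_RAD   # hypothetical linear angle-to-pulse gain

def steering_to_pwm(delta_rad: float) -> int:
    """Map a commanded steering angle (rad) to a servo pulse width (us)."""
    delta = max(-MAX_STEER_RAD, min(MAX_STEER_RAD, delta_rad))
    return int(round(SERVO_NEUTRAL_US + US_PER_RAD * delta))
```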

Block diagram of the hardware architecture. Solid arrows indicate wired connections, and the dotted arrow represents wireless communication.
For perception, the platform is equipped with an RGB-D camera (Intel RealSense D455) dedicated to object tracking and two stereo cameras (StereoLabs ZED X Mini) mounted at the front and rear of the chassis for VO and 3D perception. The stereo cameras interface with the onboard computer via Gigabit Multimedia Serial Link (GMSL) connections to ensure high-bandwidth data transmission, while the RGB-D camera connects via USB.
To facilitate active tracking, the RGB-D camera is mounted on a two-axis pan-tilt actuator (ROBOTIS Dynamixel 2XL430-W250), providing dynamic gaze control. Additionally, an IMU supplies inertial measurements, which the onboard computer uses to estimate the robot’s angular velocity and orientation. Both the pan-tilt actuator and the IMU communicate with the onboard computer via USB.
Software design
Software architecture
Figure 3 presents the software architecture of the proposed robotic platform. The system is built upon the ROS 2 framework, which ensures modularity, distributed communication, and real-time performance. Each block in the diagram represents either an individual ROS 2 node or a logical cluster of nodes collaborating to execute specific robotic functions.

Software architecture of the robotic platform.
At the highest level, the High-Level Application Module orchestrates task-specific behaviors tailored to the operational requirements. This module serves as the command interface for mission-level applications, such as last-mile delivery, autonomous patrolling, and infrastructure inspection. It transmits navigation goals to the Autonomous Navigation Module while receiving real-time feedback regarding the robot’s pose, system status, and task progress.
The Autonomous Navigation Module is implemented using the Nav2 stack, 29 leveraging its comprehensive suite of algorithms for path planning and control. In this platform, the module integrates subcomponents for localization, costmap generation, global and local planning, and recovery behaviors to facilitate robust point-to-point navigation and dynamic obstacle avoidance. Based on sensor inputs and goal constraints, it computes the desired linear velocity and steering angle. These commands are transmitted to the Robot Base Control Module, which translates them into PWM signals to actuate the ESC and steering servo motor.
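For illustration, the conversion from a planner twist to an Ackermann command can be sketched with the standard bicycle model, in which the yaw rate and steering angle are related by ω = v·tan(δ)/L. The wheelbase and steering limit below are assumed values, not the platform's actual parameters.

```python
import math

WHEELBASE_M = 0.5                    # assumed wheelbase L (m)
MAX_STEER_RAD = math.radians(30.0)   # assumed steering limit

def twist_to_ackermann(v: float, omega: float):
    """Convert a commanded twist (v, omega) into (v, steering angle)
    via the bicycle model: omega = v * tan(delta) / L."""
    if abs(v) < 1e-3:
        return 0.0, 0.0              # a car-like base cannot rotate in place
    delta = math.atan(WHEELBASE_M * omega / v)
    return v, max(-MAX_STEER_RAD, min(MAX_STEER_RAD, delta))
```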
To support active perception capabilities, the platform incorporates a vision-based tracking subsystem. The Tracking Cam Driver Module acquires RGB-D images and point clouds from the pan-tilt camera, which are subsequently processed by the Object Detection Module to identify target objects. The Pose Estimation Module calculates the 3D position of the detected target, while the Pan-Tilt Control Module actively regulates the two-degree-of-freedom gimbal. This feedback loop ensures continuous visual alignment with the target, even during robot motion.
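A minimal sketch of such a visual-servoing loop is given below, converting the target's pixel offset from the image center into pan and tilt corrections. The camera intrinsics, gains, and sign conventions are illustrative assumptions rather than the platform's tuned values.

```python
import math

FX, FY = 600.0, 600.0        # assumed pinhole focal lengths (px)
CX, CY = 320.0, 240.0        # assumed principal point (px)
K_PAN, K_TILT = 0.5, 0.5     # assumed proportional gains

def pantilt_increment(u: float, v: float):
    """Angular corrections that re-center a target detected at pixel (u, v)."""
    d_pan = K_PAN * math.atan2(u - CX, FX)    # positive: rotate toward +u
    d_tilt = K_TILT * math.atan2(v - CY, FY)  # positive: rotate toward +v
    return d_pan, d_tilt
```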
For state estimation, the system employs two stereo vision modules mounted at the front and rear of the chassis. The Front and Rear Cam Driver Modules capture synchronized stereo image pairs and depth information. These data are processed by the respective VO Modules to estimate the egomotion of each camera relative to its initial frame. Finally, the EKF Module fuses these VO estimates with angular velocity data from the IMU to produce a drift-reduced, consistent estimate of the robot’s pose and velocity.
The Autonomous Navigation Module utilizes this fused odometry for localization and aggregates point cloud data from both stereo cameras to construct costmaps for path planning. This architecture enables fully autonomous operation, integrating accurate localization, responsive navigation, and intelligent object tracking within a unified and extensible ROS 2 framework.
Odometry estimation
The EKF Module (see Figure 3) estimates the robot’s pose by fusing egomotion data from the Front and Rear VO Modules with angular velocity measurements from the IMU, thereby providing robust position and orientation estimates over time.
Each VO Module independently computes the six-degree-of-freedom motion of its corresponding stereo camera by tracking visual features across consecutive frames. This stereo-based approach facilitates depth-aware 3D motion estimation by exploiting geometric disparities in the observed environment. Crucially, utilizing both front and rear stereo cameras significantly enhances robustness, particularly when one sensor's field of view is compromised by occlusion, poor illumination, or low texture. By fusing the VO outputs from both sensors via the EKF, the system generates a highly accurate and drift-reduced odometry estimate relative to the robot base frame.
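A minimal planar EKF of the kind described here is sketched below: each VO stream contributes a (v, ω) measurement and the IMU contributes a yaw-rate measurement, all fused into a common state. The state layout and noise covariances are illustrative assumptions, not the platform's tuned values.

```python
import numpy as np

# State x = [px, py, yaw, v, w]; covariances below are illustrative only.
Q = np.diag([1e-4, 1e-4, 1e-4, 1e-2, 1e-2])   # process noise
R_VO = np.diag([5e-2, 5e-2])                   # per-camera (v, w) noise
R_IMU = np.array([[1e-3]])                     # gyro yaw-rate noise

def predict(x, P, dt):
    """Unicycle-model prediction step with first-order Jacobian F."""
    px, py, yaw, v, w = x
    x_new = np.array([px + v * np.cos(yaw) * dt,
                      py + v * np.sin(yaw) * dt,
                      yaw + w * dt, v, w])
    F = np.eye(5)
    F[0, 2], F[0, 3] = -v * np.sin(yaw) * dt, np.cos(yaw) * dt
    F[1, 2], F[1, 3] = v * np.cos(yaw) * dt, np.sin(yaw) * dt
    F[2, 4] = dt
    return x_new, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Standard EKF measurement update for a linear observation model."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(5) - K @ H) @ P

H_VO = np.array([[0, 0, 0, 1, 0], [0, 0, 0, 0, 1]], dtype=float)  # observes (v, w)
H_IMU = np.array([[0, 0, 0, 0, 1]], dtype=float)                  # observes w only

# One fusion cycle with both cameras and the gyro reporting:
x, P = np.zeros(5), np.eye(5)
x, P = predict(x, P, dt=0.05)
x, P = update(x, P, np.array([1.00, 0.10]), H_VO, R_VO)   # front VO
x, P = update(x, P, np.array([1.02, 0.12]), H_VO, R_VO)   # rear VO
x, P = update(x, P, np.array([0.11]), H_IMU, R_IMU)       # IMU yaw rate
```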
Figure 4 depicts the coordinate frames assigned to the robot chassis and the stereo cameras.

Coordinate frames of the robot platform and its stereo vision sensors.
The coordinate frames, velocity vectors, and transformation matrices utilized in this work are summarized in Table 2. These notations are consistently applied throughout the derivation of the state estimation and control models in the subsequent sections.
Nomenclature for coordinate frames, velocity vectors, and transformations.
EKF: extended Kalman filter; VO: visual odometry.
VO filtering
VO is inherently susceptible to drift induced by environmental factors such as illumination variations, motion blur, dynamic obstacles, and texture-less surfaces. Directly incorporating these outlier-prone measurements into the EKF update step can severely compromise the integrity of the overall state estimation.
To mitigate this, a filtering mechanism is implemented to reject unreliable VO measurements based on the robot's kinematic constraints and velocity feasibility. First, linear and angular velocity estimates are derived from consecutive VO poses and tested against the platform's feasible velocity range.
Figure 5 illustrates the admissible region in the velocity space defined for outlier rejection.

Admissible region in the velocity space employed for outlier rejection.
To ensure kinematic consistency, a VO measurement is accepted for the EKF update only if it falls within the defined validity bounds.
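A sketch of such a gate is given below, assuming the admissible region is bounded by a maximum speed and by the yaw rate achievable under the Ackermann kinematics, |ω| ≤ |v|·tan(δ_max)/L. The limits are hypothetical placeholders for the platform's actual kinematic parameters.

```python
import math

V_MAX = 3.0                          # assumed maximum admissible speed (m/s)
WHEELBASE_M = 0.5                    # assumed wheelbase L (m)
MAX_STEER_RAD = math.radians(30.0)   # assumed maximum steering angle

def vo_measurement_is_valid(v: float, omega: float) -> bool:
    """Accept a VO velocity sample only if it lies inside the admissible
    region implied by the platform's kinematic constraints."""
    if abs(v) > V_MAX:
        return False
    omega_max = abs(v) * math.tan(MAX_STEER_RAD) / WHEELBASE_M
    return abs(omega) <= omega_max + 1e-6
```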
Coordinate transformation of VO estimates
The Front and Rear VO Modules independently compute the egomotion of their respective stereo cameras relative to their initial frames. To fuse these spatially disjoint estimates, the camera-centric trajectories must be mapped into the common robot base frame. This transformation ensures that all motion updates are consistent with the robot’s kinematic center, enabling accurate sensor fusion within the EKF.
The front camera's egomotion, expressed relative to its own initial frame, is mapped into the robot base frame using the static extrinsic transformation between the front camera and the base. An analogous transformation is applied to the rear VO system using the corresponding rear-camera extrinsic.
By applying these transformations, the trajectories from both the front and rear cameras are unified into the robot base frame. These synchronized and spatially aligned pose estimates serve as redundant measurement inputs to the EKF, thereby enhancing the robustness and accuracy of the final localization.
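In homogeneous coordinates, this mapping reduces to a conjugation of the camera egomotion by the static camera-to-base extrinsic, as the NumPy sketch below illustrates. The variable names are ours, and the extrinsic is assumed to be known from calibration.

```python
import numpy as np

def camera_to_base_motion(T_cam_motion: np.ndarray,
                          T_base_cam: np.ndarray) -> np.ndarray:
    """Express a camera egomotion increment in the robot base frame.

    T_cam_motion: 4x4 pose of the camera's current frame in its initial frame.
    T_base_cam:   4x4 static extrinsic (camera pose in the base frame).
    Returns the equivalent base motion: T_base_cam @ T_cam_motion @ inv(T_base_cam).
    """
    return T_base_cam @ T_cam_motion @ np.linalg.inv(T_base_cam)
```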
Velocity control
The Robot Base Control Module translates high-level motion commands—specifically reference linear velocity and steering angle—into low-level actuator signals. While the steering angle is regulated by a servo motor with an internal closed-loop mechanism, precise linear velocity control necessitates a custom controller to generate appropriate PWM signals for the ESC. To achieve this, we employ a feedforward-feedback control architecture. A polynomial mapping function, derived from empirical data, generates the baseline feedforward PWM command for a desired reference velocity, while a PD controller utilizes real-time velocity feedback to compensate for modeling uncertainties and dynamic disturbances.
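The controller structure can be summarized in a few lines of Python. The polynomial coefficients and PD gains below are hypothetical placeholders for the fitted and tuned values, with all quantities in the normalized units introduced next.

```python
import numpy as np

COEFFS = np.array([0.10, -0.25, 0.40, 0.70, 0.05])  # placeholder 4th-order fit
KP, KD = 30.0, 1.5                                   # placeholder PD gains

def pwm_command(v_ref: float, v_meas: float, prev_err: float, dt: float):
    """Feedforward PWM from the polynomial map plus a PD feedback correction."""
    ff = np.polyval(COEFFS, v_ref)              # baseline feedforward term
    err = v_ref - v_meas
    fb = KP * err + KD * (err - prev_err) / dt  # compensates disturbances
    return ff + fb, err
```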
To characterize the system response and derive the feedforward model, open-loop step tests were conducted. Figure 6 depicts the measured forward velocity profiles for six discrete PWM inputs (1561, 1565, 1572, 1575, 1580, and 1585).

Measured forward velocities of the robot over time for six discrete pulse width modulation (PWM) inputs.
To facilitate robust regression across the full operational range, the collected velocity and PWM data were normalized using the parameters listed in Table 3.
Parameters for pulse width modulation (PWM) and velocity normalization.
Using these normalized datasets, a fourth-order polynomial regression model was derived to approximate the mapping from the normalized reference velocity to the normalized PWM input.

Polynomial regression result between normalized forward velocity and normalized pulse width modulation (PWM) input.
Coefficients of the fourth-order polynomial regression model.
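A fit of this kind can be reproduced with ordinary least squares on min-max-normalized data, as sketched below. The PWM inputs match those listed above, while the paired velocity samples are hypothetical stand-ins for the step-test measurements.

```python
import numpy as np

pwm_data = np.array([1561, 1565, 1572, 1575, 1580, 1585])  # step-test inputs (us)
v_data = np.array([0.3, 0.7, 1.3, 1.6, 2.1, 2.6])          # hypothetical speeds (m/s)

# Min-max normalization to [0, 1] for a numerically stable regression.
v_n = (v_data - v_data.min()) / (v_data.max() - v_data.min())
pwm_n = (pwm_data - pwm_data.min()) / (pwm_data.max() - pwm_data.min())

coeffs = np.polyfit(v_n, pwm_n, deg=4)   # highest-order coefficient first

def feedforward_pwm(v_ref: float) -> float:
    """Reference velocity (m/s) -> denormalized feedforward pulse width (us)."""
    v_ref_n = (v_ref - v_data.min()) / (v_data.max() - v_data.min())
    p_n = np.polyval(coeffs, v_ref_n)
    return p_n * (pwm_data.max() - pwm_data.min()) + pwm_data.min()
```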
Experiment
Velocity tracking
To validate the efficacy of the proposed control strategy, we evaluated the robot’s tracking performance across a range of reference velocities. The control architecture integrates the feedforward model, derived from the fourth-order polynomial regression in (6), with a PD feedback controller to minimize tracking errors in real time.
Step response tests were conducted for reference velocities of 0.5, 1.0, 1.5, 2.0, and 2.5 m/s.

Measured velocities over time for five reference velocity commands.
Statistical analysis of velocity tracking performance (steady-state interval: 4–10 s).
RMSE: root mean square error; SD: standard deviation.
The quantitative results demonstrate robust tracking performance for most reference velocities, maintaining RMSE values generally below 0.085 m/s.
However, a distinct increase in both RMSE and SD was observed at one specific reference velocity, which we attribute to mechanical resonance of the platform and its sensor mounts excited at that speed.
Dual VO-based localization
To validate the contribution of the proposed dual VO system compared to single-camera configurations, we conducted a comparative ablation study in an outdoor environment. The experiment was designed to quantify the localization accuracy of the front-only VO, rear-only VO, and the proposed dual VO (fused via EKF) with respect to a ground truth trajectory obtained via real-time kinematic (RTK) GPS.
As illustrated in Figure 9, the experiment took place in a paved outdoor square. To introduce realistic environmental challenges, the test was conducted under uneven illumination conditions. As shown in the inset of Figure 9, the front camera faced intense backlighting and lens flare due to direct sunlight, whereas the rear camera observed a region cast in shadow. These conditions were intentionally selected to verify the robustness of the sensor fusion algorithm when individual visual sensors are degraded by lighting variations.

Experimental setup for evaluating visual odometry accuracy. An RTK–GPS system (Base Station and Rover) was employed to record the ground truth trajectory. The inset images demonstrate the contrasting illumination conditions: the Front view is affected by strong backlighting, while the Rear view captures shadowed regions. RTK: real-time kinematic; GPS: global positioning system.
The robot traversed a rectangular path, and the trajectories estimated by the front VO, rear VO, and dual VO were recorded simultaneously alongside the RTK–GPS ground truth. Figure 10 presents the comparative trajectory plots. It is evident that the single-camera approaches (front and rear VOs) exhibit noticeable drift, particularly during turning maneuvers and in straight sections affected by the aforementioned adverse lighting. Specifically, the front VO shows significant deviation in the upper-left corner, likely attributed to feature loss caused by the strong backlight. In contrast, the dual VO trajectory (red dashed line) maintains close alignment with the RTK–GPS ground truth (black solid line) throughout the entire path. This demonstrates that the EKF-based fusion effectively compensates for the independent and uncorrelated errors of each camera by robustly filtering out outlier measurements.

Trajectory comparison of front VO, rear VO, and dual VO with the RTK–GPS ground truth. VO: visual odometry; RTK: real-time kinematic; GPS: global positioning system.
Table 6 summarizes the quantitative performance metrics, including RMSE, mean error, maximum error, and SD. Across all metrics, the front-only and rear-only VOs yielded substantially larger errors than the proposed dual VO, confirming the benefit of the EKF-based fusion.
Comparison of trajectory estimation accuracy (reference: RTK–GPS).
RTK: real-time kinematic; GPS: global positioning system; VO: visual odometry; RMSE: root mean square error; Std Dev: standard deviation.
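For reference, the reported metrics can be computed from time-aligned planar trajectories as follows. The sketch assumes both the estimate and the RTK–GPS reference have already been synchronized and expressed in a common frame.

```python
import numpy as np

def trajectory_errors(est_xy: np.ndarray, gt_xy: np.ndarray) -> dict:
    """Error statistics between an N x 2 estimate and an N x 2 reference."""
    e = np.linalg.norm(est_xy - gt_xy, axis=1)   # per-sample position error
    return {"rmse": float(np.sqrt(np.mean(e**2))),
            "mean": float(np.mean(e)),
            "max": float(np.max(e)),
            "std": float(np.std(e))}
```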
Mapless navigation
To validate the autonomous capabilities of the proposed platform, a mapless navigation experiment was conducted in a previously unknown environment. In this scenario, the robot was tasked with navigating toward predefined goal coordinates without reliance on pre-built occupancy grids or external localization infrastructure. Consequently, the system depended exclusively on its onboard perception modules and local motion-planning algorithms to detect obstacles in real time and generate collision-free paths.
As depicted in Figure 11, the testing environment featured a constrained spatial layout with multiple designated obstacle areas.

Experimental environment for mapless navigation.
The navigation framework was implemented using the ROS 2 Nav2 stack. Specifically, the A* algorithm was employed for global path planning over the incrementally built costmaps.
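As a usage illustration, a navigation goal can be dispatched programmatically through Nav2's simple-commander API, as sketched below. The frame name and goal coordinates are hypothetical, and the sketch assumes a mapless configuration in which goals are expressed in the odometry frame.

```python
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator

rclpy.init()
navigator = BasicNavigator()
# Skip waiting for a map-based localizer; mapless setups have none.
navigator.waitUntilNav2Active(localizer="bt_navigator")

goal = PoseStamped()
goal.header.frame_id = "odom"                 # hypothetical mapless goal frame
goal.header.stamp = navigator.get_clock().now().to_msg()
goal.pose.position.x = 10.0                   # hypothetical target (m)
goal.pose.position.y = 2.0
goal.pose.orientation.w = 1.0

navigator.goToPose(goal)
while not navigator.isTaskComplete():         # spins internally while waiting
    pass
rclpy.shutdown()
```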
Figure 12 illustrates the experimental outcome, displaying the global and local costmaps alongside the robot’s executed trajectory. The inflated obstacles in the global costmap correspond to the obstacle areas shown in Figure 11.

Experimental result of mapless navigation: Global/local costmaps and the executed trajectory.
The results confirm that the robot successfully avoided all obstacles and reached the target location smoothly. Despite the restricted maneuvering space, the integrated system demonstrated robust performance, effectively combining real-time terrain perception, costmap inflation, and adaptive Ackermann control. This experiment validates the platform’s utility for real-world service applications—such as patrol, inspection, and delivery—in environments where prior map data is unavailable or unreliable.
SLAM-based 3D map generation
To assess the platform’s capacity for dense environmental reconstruction, a 3D SLAM experiment was conducted in an outdoor setting characterized by various static structures, as depicted in Figure 13. The mapping framework was implemented using real-time appearance-based mapping (RTAB-Map), a graph-based RGB-D SLAM algorithm fully integrated with the ROS 2 ecosystem. 31

Experimental environment for 3D SLAM and map generation. 3D: three-dimensional; SLAM: simultaneous localization and mapping.
In this configuration, the front-facing stereo camera served as the primary sensor, providing synchronized RGB images and aligned depth maps. RTAB-Map processed this incoming RGB-D stream to internally compute VO and extract features for loop closure detection. This demonstrates the platform’s capability to support standalone SLAM packages without reliance on external state estimation modules. The resulting constraints were added to a pose graph, which was continuously optimized online to maintain global map consistency and correct for accumulated drift.
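A configuration of this kind can be expressed as a ROS 2 Python launch file that includes the launch entry point provided by rtabmap_ros. The topic names below are illustrative assumptions and do not reproduce the authors' exact remappings.

```python
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.substitutions import FindPackageShare

def generate_launch_description():
    # Include RTAB-Map's standard RGB-D launch file with remapped topics.
    rtabmap = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(
            [FindPackageShare("rtabmap_launch"), "/launch/rtabmap.launch.py"]),
        launch_arguments={
            "rgb_topic": "/front_camera/rgb/image_rect_color",      # assumed
            "depth_topic": "/front_camera/depth/depth_registered",  # assumed
            "camera_info_topic": "/front_camera/rgb/camera_info",   # assumed
            "frame_id": "base_link",
            "approx_sync": "true",
        }.items())
    return LaunchDescription([rtabmap])
```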
During the experiment, the robot was tele-operated via a Zigbee-based remote controller to traverse the environment along a predetermined path similar to the trajectory shown in Figure 12. As the robot moved, RTAB-Map incrementally reconstructed a dense 3D map of the surroundings in real time. By operating in RGB-D mode, the system directly utilized the high-fidelity depth data generated by the stereo camera, ensuring accurate spatial representation without the computational overhead of redundant disparity calculations.
Figure 14 presents the resulting 3D point cloud map. The reconstruction captures the geometric structure of the environment with high fidelity, clearly distinguishing vertical walls, obstacles, and free space. This outcome demonstrates the system’s capability to perform real-time 3D SLAM and construct globally consistent maps in previously unknown environments.

Experimental result of 3D SLAM: The reconstructed point cloud map. 3D: three-dimensional; SLAM: simultaneous localization and mapping.
The successful execution of this experiment validates that the developed platform is well-suited for tasks requiring comprehensive 3D environmental understanding, such as autonomous inspection, exploration in complex outdoor environments, and digital-twin generation for infrastructure modeling. Furthermore, the results underscore the platform’s ability to seamlessly integrate and operate multiple computationally intensive perception modules within a unified ROS 2 framework. The demonstrated 3D mapping capability thus highlights the system’s value not only as a reliable service robot but also as a flexible testbed for developing and evaluating diverse SLAM and perception algorithms.
Conclusion
This paper presented the design, implementation, and experimental validation of a compact Ackermann-steered mobile robot platform equipped with a dual VO system and active perception capabilities. Built upon the ROS 2 framework, the proposed system ensures modularity and extensibility, allowing researchers to seamlessly integrate and benchmark various algorithms for perception, localization, and navigation.
Experimental evaluations in outdoor environments demonstrated the platform’s robustness and quantitative performance. Velocity tracking tests confirmed stable motion control, with RMSE values remaining below 0.085 m/s for most reference velocities. Dual VO-based localization achieved substantially lower drift than single-camera configurations, while the mapless navigation and 3D mapping experiments verified reliable autonomous operation and dense reconstruction in previously unknown outdoor environments.
Future work will aim to enhance the platform’s autonomy in highly dynamic and crowded environments. Specifically, we plan to leverage the integrated pan-tilt RGB-D perception system to implement active visual tracking algorithms for HRI and surveillance tasks. Additionally, to mitigate the mechanical resonance observed at specific velocities, we intend to adopt model-based control strategies, such as model predictive control (MPC), and reinforce the mechanical damping of the sensor mounts to ensure superior stability.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) under Grant No. RS-2024-00337584.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
