Hybrid Soft Actor-Critic with Curriculum Learning for Sparse-Reward Mobile Robot Navigation

Abstract

This paper presents a unified empirical study of extended Soft Actor-Critic (SAC) methods for sparse-reward TurtleBot3 navigation in Gazebo under dense $360^\circ$ LiDAR observations (360 beams). We introduce SAC-XH, a streamlined SAC extension that augments the sparse task reward with auxiliary shaping signals and integrates a stage-wise curriculum to improve exploration and sample efficiency. Across progressively more complex Gazebo environments, SAC-XH consistently outperforms SAC, TD3, and DDPG in training stability, learning efficiency, and success rate, while remaining fully reproducible through an open-source ROS 2/Gazebo framework. We also evaluate a stage-wise curriculum learning protocol on top of SAC-XH that uses competence-based advancement and controlled replay transfer. Under calibrated thresholds, the curriculum yields stable convergence and high success rates (87–91%) and improves generalization across stages compared to non-curriculum training. These results show that SAC-XH provides a strong deep reinforcement learning (DRL) baseline for autonomous navigation under sparse-reward conditions and a reproducible benchmark for future research.
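To make the curriculum protocol concrete, the following is a minimal Python sketch of competence-based advancement with controlled replay transfer. It is an illustration under stated assumptions, not the SAC-XH implementation: the class name `CompetenceCurriculum`, the rolling-window success criterion, the default threshold of 0.8, and the `keep_fraction` used for replay transfer are all hypothetical choices that the abstract does not specify.

```python
import random
from collections import deque


class CompetenceCurriculum:
    """Hypothetical stage scheduler: advance when the rolling success rate
    over the last `window` episodes reaches `threshold`."""

    def __init__(self, num_stages, threshold=0.8, window=100, keep_fraction=0.25):
        self.num_stages = num_stages
        self.threshold = threshold          # assumed value, not from the paper
        self.window = window                # assumed evaluation window
        self.keep_fraction = keep_fraction  # assumed share of replay carried over
        self.stage = 0
        self.outcomes = deque(maxlen=window)

    def record(self, success):
        """Record one episode outcome; return True if the stage advanced."""
        self.outcomes.append(bool(success))
        window_full = len(self.outcomes) == self.window
        if not window_full or self.stage >= self.num_stages - 1:
            return False
        if sum(self.outcomes) / self.window >= self.threshold:
            self.stage += 1
            self.outcomes.clear()  # competence is re-measured on the new stage
            return True
        return False

    def transfer_replay(self, buffer, rng=random):
        """Return a random subset of old transitions to seed the next stage."""
        keep = int(len(buffer) * self.keep_fraction)
        return rng.sample(list(buffer), keep)
```

In a training loop, `record` would be called once per episode, and `transfer_replay` would be applied to the replay buffer whenever `record` returns `True`, so that only a controlled share of earlier-stage experience carries into the next stage.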

Keywords

deep reinforcement learning; hybrid soft actor-critic; curriculum learning; mobile robots; TurtleBot3
