This paper presents a unified empirical study of extended Soft Actor-Critic (SAC) methods for sparse-reward TurtleBot3 navigation in Gazebo under dense 360-beam LiDAR observations. We introduce SAC-XH, a streamlined SAC extension that augments the sparse task reward with auxiliary shaping signals and pairs it with a stage-wise curriculum to improve exploration and sample efficiency. Across progressively complex Gazebo environments, SAC-XH consistently outperforms SAC, TD3, and DDPG in training stability, learning efficiency, and success rate, while maintaining full reproducibility through an open-source ROS 2/Gazebo framework. The curriculum learning protocol, evaluated on top of SAC-XH, uses competence-based stage advancement and controlled replay transfer. Under calibrated thresholds, it yields stable convergence and high success rates (87–91%), improving generalization across stages compared to non-curriculum training. These results establish SAC-XH as a strong deep reinforcement learning (DRL) baseline for autonomous navigation under sparse-reward conditions and as a reproducible benchmark for future research.