Abstract
The inherent training instability of the Deep Deterministic Policy Gradient (DDPG) algorithm has critically hindered its practical application to complex, safety-critical tasks such as quadrotor attitude control. To address this challenge, this paper proposes an integrated approach named RS-DDPG (Robust and Stabilized DDPG), designed to enhance training stability and controller robustness. While individual components such as delayed policy updates (adapted from Twin Delayed DDPG (TD3)) and exponential reward functions have been explored previously, our contribution lies in the synergistic integration of these elements with a structured curriculum and evaluation framework, and this holistic approach proves particularly effective for this control problem. Extensive simulations and ablation studies, benchmarked against both standard DDPG and TD3, provide strong evidence for the efficacy of our approach. The resulting controller not only surpasses the baselines in convergence speed and tracking performance but also exhibits exceptional robustness against a wide range of random initial states, persistent external disturbances, and significant model uncertainties. This work demonstrates how the careful integration of existing and novel components can yield a reliable, high-performance, data-driven controller, representing a vital step toward bridging the gap between simulation and real-world deployment in aerial robotics.
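The two borrowed mechanisms named in the abstract can be sketched in a few lines; note that the exponential gain, the error norm, and the update cadence below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def exponential_reward(attitude_error, k=5.0):
    """Exponential reward shaping (illustrative): returns a value in (0, 1],
    equal to 1 at zero attitude error and decaying smoothly with the error
    norm. A dense, bounded signal of this kind is commonly used to stabilize
    DDPG-style training compared with sparse or unbounded quadratic costs."""
    return float(np.exp(-k * np.linalg.norm(attitude_error)))

def should_update_actor(step, policy_delay=2):
    """Delayed policy updates (the TD3 idea the abstract refers to):
    the actor and target networks are updated only once every
    `policy_delay` critic updates, reducing policy-gradient variance."""
    return step % policy_delay == 0
```

In a training loop, the critic would be updated every step, while `should_update_actor(step)` gates the actor update; the shaped reward replaces a raw state-error cost in the environment.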
