Abstract
Legged robots are expected to exhibit the natural and adaptive maneuvers that animals display on harsh terrain. Reinforcement learning is a compelling route, but current approaches train gait generation and adaptation to varying environmental disturbances concurrently, which complicates reward shaping and damages learning efficiency. To alleviate this problem, we present the Dc-Gait pipeline, which separates rough-terrain locomotion learning into two sequential stages, gait generation and imitative adaptation, interconnected through state-based gait constraints built on a generated gait dataset. First, gait-generation training is guided by well-designed rewards without external interference, producing a gait-specific dataset composed of state transition pairs. Inspired by adversarial imitation learning, these pairs are then generalized through a discriminator network, whose output provides a state-based imitation reward that constrains gaits during adaptation training. This state-based constraint effectively drives the robot to converge rapidly from a disturbed state back to the original gait, significantly improving training efficiency. Extensive experiments on different tasks across various robots demonstrate that the proposed pipeline enables robots to master adaptive, gait-constrained movements on challenging terrains (see Supplemental video).
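The discriminator-based imitation reward described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes an AMP-style formulation in which a discriminator is trained to distinguish state transition pairs from the gait dataset ("expert") against pairs produced by the adapting policy, and the imitation reward grows with the discriminator's belief that a transition came from the dataset. The logistic discriminator, dimensions, and toy data below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TransitionDiscriminator:
    """Logistic discriminator over concatenated (s, s') transition pairs.

    Hypothetical stand-in for the pipeline's discriminator network,
    in the spirit of adversarial imitation learning."""

    def __init__(self, state_dim, lr=0.1):
        self.w = np.zeros(2 * state_dim)  # weights for [s, s']
        self.b = 0.0
        self.lr = lr

    def logits(self, pairs):
        return pairs @ self.w + self.b

    def train_step(self, expert_pairs, policy_pairs):
        # Expert pairs labeled 1, policy pairs labeled 0;
        # one gradient step on binary cross-entropy.
        x = np.vstack([expert_pairs, policy_pairs])
        y = np.concatenate([np.ones(len(expert_pairs)),
                            np.zeros(len(policy_pairs))])
        p = sigmoid(self.logits(x))
        self.w -= self.lr * (x.T @ (p - y)) / len(x)
        self.b -= self.lr * np.mean(p - y)

    def imitation_reward(self, pairs, eps=1e-6):
        # AMP-style reward: large when D(s, s') is close to 1,
        # i.e. the transition looks like it came from the gait dataset.
        d = sigmoid(self.logits(pairs))
        return -np.log(np.clip(1.0 - d, eps, 1.0))

# Toy data: "gait" transitions cluster near +1, off-gait ones near -1.
state_dim = 4
expert_pairs = rng.normal(1.0, 0.1, size=(256, 2 * state_dim))
policy_pairs = rng.normal(-1.0, 0.1, size=(256, 2 * state_dim))

disc = TransitionDiscriminator(state_dim)
for _ in range(500):
    disc.train_step(expert_pairs, policy_pairs)

r_expert = disc.imitation_reward(expert_pairs).mean()
r_policy = disc.imitation_reward(policy_pairs).mean()
```

After training, transitions resembling the gait dataset earn a higher imitation reward than off-gait transitions, which is the signal that pulls a disturbed robot back toward its learned gait.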
Supplementary Material
Please find the following supplemental material available below.
