Abstract

As robots are required to perform versatile manipulations in unstructured space environments, traditional planning and control strategies may become cumbersome or even infeasible. To overcome this challenge, this paper presents imitation learning with inherent Lyapunov stability (IL2S), a novel framework for jointly learning a dynamical system and its accompanying Lyapunov function from demonstrations. We represent the robot motion policy as a nonlinear autonomous dynamical system that captures the invariant motion patterns from a handful of teaching examples. Neural networks are leveraged to simultaneously learn the motion model and the parametric control function, so that the generated movements closely follow the demonstrations, ultimately converge to the target, and respond instantly to unanticipated changes. Our approach reproduces trajectories that closely resemble the demonstrations on the handwriting dataset and is demonstrated extensively in real-world experiments, where the robot accomplishes two different satellite manipulation tasks, namely static grasping and dynamic docking.

Keywords

Imitation learning, dynamical system, neural network, Lyapunov stability, space manipulation

Introduction

The last decade has witnessed burgeoning progress in space manipulations encompassing docking, repairing, refueling, assembly and orbit cleanup, thanks to their notable economic potential and strategic benefits.^1,2 Accordingly, a variety of pertinent technologies have been advanced rapidly by the aerospace community all over the world.^3,4 The intricate nature of missions in harsh space environments poses critical challenges for robot planning and control that remain to be addressed.

The motion planning of robots is one of the research hotspots in the space manipulation field. In the case of a structured environment, preprogramming strategies are commonly applied. Such methods require experts to design motion trajectories in advance according to predefined tasks; the trajectories are then tested on the ground and stored in the on-board controller. When the space robot executes a task in orbit after deployment, it invokes the trajectory command corresponding to the specific task. Using this method, the Shuttle Remote Manipulator System (SRMS) completed on-orbit maintenance and repair missions, such as the inspection of the space shuttle and the repositioning of the payload.⁵ To improve adaptation to dynamic environments, visual servoing and other sensor-based guidance were used to adjust offline trajectories.⁶ For example, the Orbital Express spacecraft was equipped with different sensory devices to estimate the real-time state in proximity tasks.⁷ Additionally, teleoperation technology plays a significant role in space robot control. It puts the operator directly into the control loop: the operator uses physical movements to command the robot directly, while perceiving the state of the space robot through vision and force feedback. The National Space Development Agency of Japan accomplished a teleoperation experiment based on predictive virtual environment technology on ETS-VII, realized ground-space bilateral operation with a delay of 7 s, and successfully implemented the slope tracking and socket experiments.^8,9 Chen et al.¹⁰ addressed the issue of bilateral teleoperation control for a networked space robotic system subject to multiple physical constraints. Nevertheless, with the growing complexity and versatility of tasks performed by space manipulators, two main issues arise. 
First, existing control schemes are heavily dependent on manual programming, which is difficult to use efficiently without a large amount of expertise and experience. Second, traditional approaches provide only a limited level of autonomy and adaptation, which falls short of the requirements of complex operations.^11–13

Propelled by advances in artificial intelligence and cybernetics, some of the concerns mentioned above could be alleviated from an imitation learning perspective.^14–16 Imitation learning, commonly referred to as learning from demonstrations, is one of the most intuitive paradigms for learning a skill model that captures motion patterns from a collection of examples.^17–19 It has been a thriving theme for decades, and three main strands stand out in this domain, namely, dynamical-system-based methods which learn the underlying motion dynamics,^20–22 probabilistic approaches that aim to harness data variability and model uncertainty,^23–26 and more recently, neural networks which focus on generalization to unseen situations.^27–29 Dynamical-system-based methods have been shown to be particularly promising for modeling robot motion, since they react naturally to disturbances in dynamic settings instead of stubbornly mimicking the teaching examples.^30,31 However, designing a fictitious system is not trivial, given that the estimated dynamics must come with a stabilizing certificate that enables the robot to ultimately reach the desired goal from any initial point. As a result, there is a body of research devoted to learning dynamical systems from demonstrations. The pioneering work was the Dynamic Movement Primitive (DMP), which estimated time-varying dynamics for a wide spectrum of robotic skills, ranging from walking and pouring to grasping.^32–34 The Stable Estimator of Dynamical Systems (SEDS) learned a stable nonlinear autonomous dynamical system from human demonstrations using a Gaussian mixture model and a quadratic Lyapunov function.²⁰ A principal deficiency of SEDS is that the quadratic stability constraint is too restrictive to effectively mimic highly nonlinear trajectories. 
To enhance the precision, a more universal task-oriented energy function was estimated using constrained optimization from a few demonstrations.³⁵ Figueroa and Billard³⁶ presented an alternative approach to derive an asymptotically stable model through a linear parameter varying reformulation. To address the “accuracy versus stability” dilemma, a diffeomorphism, often modeled by a neural network, was introduced to learn the stable dynamical system on a latent space.^37–40 Despite extensive advancements, several challenges remain to be resolved, including encoding and reproduction of the full robot pose,^40,41 adaptation to unexpected changes,⁴² and handling sequential or dynamic tasks,^43,44 among others.

Inspired by the preceding discussion, this paper draws on deep learning and imitation learning to synthesize a motion policy from human demonstrations. Our work follows the line of dynamical-system-based learning from demonstrations. This approach exhibits desirable properties that enable the system to react swiftly to uncertainties in dynamic settings while guaranteeing that the resulting movements converge to the attractor. The core contributions of this study revolve around the following three axes:

(i) A data-driven modular motion planning framework is investigated that encodes the robotic manipulator movements into a nonlinear autonomous dynamical system at the kinematic level, dispensing with tedious hardcoding of robot skills;

(ii) In contrast to previous approaches that enforce stability via imposing predefined control constraints, we provide a more general paradigm to concurrently learn a dynamical system and corresponding Lyapunov function from demonstrations;

(iii) The efficacy of the proposed scheme is evaluated on the handwriting dataset as well as demonstrations collected from a real-world robotic system, demonstrating that our approach can accomplish the satellite manipulation tasks of static grasping and dynamic docking.

The remainder of this article is organized as follows. The second section formulates the dynamical system that produces robot movements. The third section presents the mechanism to jointly learn the dynamical system and Lyapunov function from demonstrations. The fourth section provides the results of simulated and real-world robot experiments. Finally, the paper closes with conclusions.

Problem formulation

Assume that the motion of the robotic manipulator is characterized by a first-order autonomous ordinary differential equation, that is, a time-invariant dynamical system

\overset{\cdot}{ξ} = f (ξ)

(1)

where $ξ \in R^{d}$ is the position and orientation of the manipulator end-effector in Cartesian space, and $f : R^{d} \to R^{d}$ is a continuous and differentiable function with a single equilibrium state $\dot{ξ}^{*} = f (ξ^{*}) = 0$, which is also called the target or attractor. Without loss of generality, the attractor can be designated as the origin of the reference frame, i.e. $ξ^{*} = 0$. Hence, the motion evolves in this frame. The training data are demonstrations $D = \{ξ_{t, n}, \dot{ξ}_{t, n}\}_{t = 0, n = 1}^{T^{n}, N}$ collected by kinesthetic teaching, where $ξ_{t, n}$ is the state of the manipulator at time $t$ of the $n$ th teaching trajectory, $N$ is the number of examples, and $T^{n}$ is the length of the $n$ th demonstration. Formally, the objective of this study is stated as follows.

Given a set of demonstrations $D$, derive an estimate of the dynamics $f$ that produces robot movements that 1) resemble the demonstrated behavior as closely as possible, and 2) ultimately reach the target even when perturbed. The first requirement is fundamental since it gives manipulators a straightforward way to mimic human motions. The second is critical since it endows robots with both a certificate that the task is accomplished whenever the goal is reachable and reactivity to unknown disturbances, a property called “stability” in control theory.

The representative approach to achieving the aforementioned objective has two steps: a regression algorithm is first employed to learn an original dynamical system, followed by the addition of an auxiliary control term based on Lyapunov theory to ensure system stability, as exemplified by methods such as NEUM-GPR²⁶ and CLF-GMR³⁵. However, existing approaches still exhibit multiple limitations in terms of model structure, stability guarantees, and generalization capabilities. Most works implicitly assume that robot trajectories adhere to a Gaussian distribution, which to some extent constrains the reproduction accuracy and generalization ability of the learned model. Although data-driven energy functions are utilized to enhance model precision, the essential properties, namely unique minimum, positive definiteness, and continuous differentiability, are not fully guaranteed. Moreover, previous imitation learning approaches rarely take into consideration the dynamic nature of space manipulation tasks. Therefore, it is important to propose a sound imitation learning algorithm for these tasks that unifies accuracy and flexibility.

In this article, we focus on overcoming the key challenges of leveraging dynamical-system-based imitation learning for space robotic manipulation. This study distinguishes itself from prior research in two key aspects. On the theoretical front, we adopt the neural network with a special structure to simultaneously learn the dynamical system and the Lyapunov function, thereby striking a balance between stability and generalization. On the practical front, we conduct ground-based simulation experiments for satellite static grasping and dynamic docking, substantially expanding the application scope of imitation learning.

Proposed approach

This section presents a new approach to learning a skill model from demonstrations, referred to as imitation learning with inherent Lyapunov stability (IL2S). First, the compositional stable dynamical system is derived by solving a constrained optimization problem. Then, harnessing the fitting capacity of neural networks, we concurrently estimate the movement dynamics and the associated Lyapunov function, such that the dynamics is stable by construction in the sense of the Lyapunov criteria. Finally, the exponential stability of the learned dynamical system is rigorously proven.

Stable dynamical-system-based robot motion

To clarify the narration, we begin the section with a lemma that presents an instructional criterion for analyzing the stability of a nonlinear dynamical system in the sequel.⁴⁵

Lemma 1. The dynamical system (1) is globally exponentially stable at the equilibrium point $ξ^{*}$ if there exists a continuously differentiable Lyapunov candidate function $V (ξ) : R^{d} \to R$ such that

\overset{\cdot}{V} (ξ) \leq - α V (ξ); c_{1} {‖ ξ ‖}^{2} \leq V (ξ) \leq c_{2} {‖ ξ ‖}^{2}

(2)

for all $ξ \in R^{d} \setminus \{ξ^{*}\}$ with three positive constants $α, c_{1}, c_{2}$.

The dynamical system $f (ξ)$ is composed of two terms: original learned dynamics $o (ξ)$ and additional corrected term $u (ξ)$

\overset{\cdot}{ξ} = f (ξ) = o (ξ) + u (ξ)

(3)

We will learn the nominal system $o (ξ)$ from demonstrations using neural networks in the next subsection. Since plain deep learning approaches do not take the stability of the dynamics into account during training, the learned system may diverge or converge to a spurious target. To this end, the corrected command $u (ξ)$ is introduced to enforce that $f (ξ)$ has a component along the negative direction of the Lyapunov function gradient

\nabla V {(ξ)}^{⊤} f (ξ) \leq - α V (ξ)

(4)

for all $ξ \in R^{d} \setminus \{ξ^{*}\}$, where $\nabla V (ξ) = \frac{\partial V (ξ)}{\partial ξ}$.

Moreover, we intend to adjust the nominal estimated dynamics model $o (ξ)$ as little as possible such that it satisfies the stability condition (4), which leads to the constrained optimization problem

\begin{array}{ll} \min_{u (ξ)} & u {(ξ)}^{⊤} u (ξ) \\ \text{s.t.} & \nabla V {(ξ)}^{⊤} (o (ξ) + u (ξ)) \leq - α V (ξ) \end{array}

(5)

Problem (5) is convex, and its analytic solution can be derived as

\begin{aligned} u (ξ) &= \begin{cases} 0, & \text{if } \nabla V {(ξ)}^{⊤} o (ξ) \leq - α V (ξ) \\ - \nabla V (ξ) \frac{\nabla V {(ξ)}^{⊤} o (ξ) + α V (ξ)}{{‖ \nabla V (ξ) ‖}^{2}}, & \text{otherwise} \end{cases} \\ &= - \nabla V (ξ) \frac{\mathrm{ReLU} (\nabla V {(ξ)}^{⊤} o (ξ) + α V (ξ))}{{‖ \nabla V (ξ) ‖}^{2}} \end{aligned}

(6)

The stable dynamical system is eventually obtained as follows

\overset{\cdot}{ξ} = o (ξ) - \nabla V (ξ) \frac{ReLU (\nabla V {(ξ)}^{⊤} o (ξ) + α V (ξ))}{{‖ \nabla V (ξ) ‖}^{2}}

(7)

which implies that, for all $ξ \in R^{d} \setminus \{ξ^{*}\}$, if the original dynamics $o (ξ)$ complies with condition (4), it is returned without modification. Otherwise, we modify it minimally so that the revised model $f (ξ)$ satisfies condition (4) with equality.
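The closed form in (6) and (7) can be illustrated with a small numerical sketch. The linear nominal model `A`, the quadratic `V`, and the guard `eps` below are hypothetical stand-ins chosen for illustration; they are not the learned networks of the paper.

```python
import numpy as np

def stable_velocity(xi, o, V, grad_V, alpha=0.1, eps=1e-12):
    """Return f(xi) = o(xi) + u(xi) using the closed form in (6)-(7)."""
    v_nom = o(xi)
    g = grad_V(xi)
    # Amount by which the nominal velocity violates condition (4); max(., 0) is the ReLU.
    violation = max(float(g @ v_nom) + alpha * V(xi), 0.0)
    # eps is an assumption (not in the paper) that guards the division at xi = 0.
    return v_nom - g * violation / (float(g @ g) + eps)

# Hypothetical toy setup: an unstable linear nominal model and V(xi) = ||xi||^2.
A = np.array([[0.5, 1.0], [-1.0, 0.5]])
o = lambda xi: A @ xi
V = lambda xi: float(xi @ xi)
grad_V = lambda xi: 2.0 * xi

xi = np.array([1.0, -2.0])
f = stable_velocity(xi, o, V, grad_V)
lhs = float(grad_V(xi) @ f)   # left-hand side of condition (4)
rhs = -0.1 * V(xi)            # right-hand side of condition (4)
```

In this toy case the nominal velocity violates (4), so the correction is active and `lhs` matches `rhs` up to rounding. A nominal model that already satisfies (4), such as `lambda xi: -xi`, is returned unchanged.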

The schematic architecture of our proposed learning and control framework is illustrated in Figure 1, comprising two distinct operational modes: online execution (green/blue modules) and offline training (orange modules). In the offline learning phase, we employ kinesthetic demonstrations $\{ξ_{t, n}, \dot{ξ}_{t, n}\}_{t = 0, n = 1}^{T^{n}, N}$ to co-train two neural networks through our IL2S algorithm. The first network approximates the nominal motion dynamics $o (ξ)$, while the second neural network estimates a Lyapunov function $V (ξ)$ that provides formal stability certificates. The training process ensures the physical plausibility of the learned dynamics while enforcing the stability constraint through the Lyapunov-based corrective term, further details of which will be discussed in the upcoming subsection. During online operation, the framework executes the following procedures: At each sampling instant, the learned dynamics model first predicts the nominal system behavior. However, as theoretically established, the dynamics generated by $o (ξ)$ lacks the inherent stability guarantee. To ensure convergence to the target attractor despite perturbations, we introduce a stability correction term derived from the Lyapunov function gradient. This results in the desired stabilized command $\{ξ, \dot{ξ}\}_{des}$ computed through (7). Subsequently, the inverse kinematics module transforms these Cartesian space commands into joint space trajectories $\{q, \dot{q}\}_{des}$. Finally, these reference signals drive the low-level PID controller that generates motor torques $τ$ for precise trajectory tracking in the space robotic system. Here, $q \in R^{n}$ and $\dot{q} \in R^{n}$ respectively denote the actual joint angles and velocities of the robotic manipulator. 
Notably, our framework achieves computational efficiency through: 1) the analytical formulation in (7) that avoids online optimization, and 2) the decoupled architecture that shifts computationally intensive training to the offline phase. This design enables real-time implementation using onboard computing resources, making it particularly suitable for space applications with strict computational constraints.

Figure 1.

The schematic diagram of the designed learning and control scheme.

Joint learning of dynamics and Lyapunov function from demonstrations

This subsection delves into the problem of how to determine the original dynamics $o (ξ)$ and control function $V (ξ)$ from human demonstrations. Choosing the appropriate tools is crucial to effective robot learning. Propelled by deep learning techniques, the work intends to exploit neural networks to learn a dynamical system with well-behaved properties in virtue of their substantial representation abilities. Unlike prior work such as Khansari-Zadeh et al.,²⁰ we do not enforce stability via imposing predefined inflexible constraints, but integrate it directly into the learned model.

The nominal dynamics $o (ξ)$ can be represented by an unconstrained neural network with an arbitrary architecture. Specifically, this work selects the classical fully connected neural network (FCNN)

\begin{aligned} z_{o, i + 1} &= σ_{o, i} (W_{o, i} z_{o, i} + b_{o, i}), \quad i = 0, \dots, m - 1 \\ o (ξ) &= z_{o, m} \end{aligned}

(8)

where $z_{o, i}$ denotes the $i$ th layer input with $z_{o, 0} = ξ$ , $W_{o, i}$ is the network weight mapping from the $i$ th layer activation to the next layer, $b_{o, i}$ is the bias, and $σ_{o, i}$ is the nonlinear activation function, such as ReLU or its variants.
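As a minimal sketch of the forward pass in (8), the snippet below evaluates an FCNN with NumPy. The random weights, the choice of ReLU on hidden layers, and the linear last layer are assumptions made here for illustration (a ReLU on the output would force non-negative velocities); the layer widths follow the 2-256-256-2 network reported in the experiments.

```python
import numpy as np

def fcnn_forward(xi, weights, biases):
    """Evaluate o(xi) layer by layer as in (8); the last layer is kept linear (assumption)."""
    z = xi
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ z + b
        if i < last:
            z = np.maximum(z, 0.0)  # ReLU on hidden layers
    return z

rng = np.random.default_rng(0)
sizes = [2, 256, 256, 2]  # layer widths reported in the experiments
# Untrained, randomly initialized parameters (illustrative only).
weights = [rng.normal(0.0, 0.1, (n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

velocity = fcnn_forward(np.array([0.3, -0.2]), weights, biases)
```

The output has the same dimension as the state, so it can be read as the nominal velocity at the queried position.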

While the previous narration may appear to make the issue of learning a stable dynamical system apparent, the subtlety of this method is rooted in the parameterization of the control function $V$ . The qualification conditions for the Lyapunov function are threefold.

Firstly, as mentioned previously, $V$ should be devoid of non-zero local minima, as its time derivative remains consistently negative for any state except the equilibrium. As such, an input-convex neural network (ICNN)⁴⁶ is introduced to learn $V$ , which enforces the property that the output $g (ξ)$ is convex with respect to the input $ξ$ .

\begin{aligned} z_{g, 1} &= σ_{g, 0} (W_{g, 0}^{(ξ)} ξ + b_{g, 0}) \\ z_{g, i + 1} &= σ_{g, i} (W_{g, i}^{(z)} z_{g, i} + W_{g, i}^{(ξ)} ξ + b_{g, i}), \quad i = 1, \dots, k - 1 \\ g (ξ) &= z_{g, k} \end{aligned}

(9)

where $z_{g, i}$ represents the $i$ th layer activation, $W_{g, i}^{(z)}$ is the non-negative weight mapping the $i$ th layer activation to the subsequent layer, $σ_{g, i}$ is a convex, non-decreasing activation function, and $W_{g, i}^{(ξ)}$ and $b_{g, i}$ are parameters analogous to those in (8).

A remarkable characteristic of the ICNN is the incorporation of “passthrough” layers, which establish a direct connection between the input $ξ$ and the deeper hidden units. These layers are redundant in classical feedforward networks, since preceding hidden units can always be mapped to the next units via an identity mapping. In the ICNN, however, the restriction that the weight $W_{g, i}^{(z)}$ is non-negative limits the ability of hidden layers to mirror the identity mapping, and thus the additional “passthrough” is included explicitly. That is, although the non-negativity constraint on the weight $W_{g, i}^{(z)}$ is somewhat restrictive, the network still retains its universal approximation capability owing to the flexibility of $W_{g, i}^{(ξ)}$ and $b_{g, i}$.

Furthermore, since the Lyapunov function $V$ must be positive definite, we construct it from (9) as follows

V (ξ) = σ_{g, k} (g (ξ) - g (0)) + η ‖ ξ ‖^{2}

(10)

where $g$ is the ICNN defined in (9), $η$ is a small positive constant, and $σ_{g, k}$ is a non-negative convex non-decreasing function with $σ_{g, k} (0) = 0$ . By this means, we have $V (ξ) > 0$ for $ξ \neq 0$ and $V (0) = 0$ , and thus the requirement for positive definiteness of the Lyapunov function is strictly satisfied.
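The construction in (9) and (10) can be sketched as follows. This is an illustrative NumPy version with random, untrained parameters; the softplus reparameterization mirrors the one mentioned in the experiments, while plain ReLU is used here as a simple convex, non-decreasing activation with value zero at zero. The names `icnn`, `lyapunov`, and `Wz_raw` are hypothetical.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)  # log(1 + exp(x)), strictly positive

def relu(x):
    return np.maximum(x, 0.0)  # convex, non-decreasing, relu(0) = 0

def icnn(xi, Wx, Wz_raw, b):
    """g(xi) from (9); Wz_raw goes through softplus so the z-weights stay positive."""
    z = relu(Wx[0] @ xi + b[0])
    for i in range(1, len(Wx)):
        z = relu(softplus(Wz_raw[i - 1]) @ z + Wx[i] @ xi + b[i])
    return float(z[0])

def lyapunov(xi, params, eta=0.1):
    """V(xi) from (10): shifted ICNN output plus a small quadratic term."""
    g0 = icnn(np.zeros_like(xi), *params)
    return float(relu(icnn(xi, *params) - g0)) + eta * float(xi @ xi)

rng = np.random.default_rng(0)
sizes = [2, 128, 128, 1]  # 2-128-128-1 ICNN as in the experiments
Wx = [rng.normal(0.0, 0.1, (n, sizes[0])) for n in sizes[1:]]        # passthrough weights
Wz_raw = [rng.normal(0.0, 0.1, (n_out, n_in))
          for n_in, n_out in zip(sizes[1:-1], sizes[2:])]            # raw z-weights
b = [rng.normal(0.0, 0.1, n) for n in sizes[1:]]
params = (Wx, Wz_raw, b)

a, c = np.array([1.0, -0.5]), np.array([-0.3, 0.8])
v_mid = lyapunov(0.5 * (a + c), params)
v_avg = 0.5 * (lyapunov(a, params) + lyapunov(c, params))
```

Even with random parameters, `V` vanishes at the origin, is positive elsewhere because of the $η ‖ ξ ‖^{2}$ term, and satisfies the midpoint convexity inequality `v_mid <= v_avg`.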

Last but not least, the control Lyapunov function $V$ should be continuously differentiable. To this end, we take the improved ReLU activation smoothed by a quadratic function in the interval $[0, δ]$ as an alternative to the traditional one in (9) and (10).

σ (x) = \begin{cases} 0, & \text{if } x \leq 0 \\ \frac{x^{2}}{2 δ}, & \text{if } 0 < x < δ \\ x - \frac{δ}{2}, & \text{otherwise} \end{cases}

(11)

where $δ$ is a tunable positive value and $x$ is the activation input. Figure 2 presents a parametric analysis of smoothing effects, illustrating how increasing $δ$ enhances the smoothness of the modified activation function at the expense of greater deviation from the canonical ReLU formulation. This design establishes an explicit trade-off between continuous differentiability and nonlinear approximation capability.

Figure 2.

Smoothed ReLU activation.
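The smoothed activation in (11) and its derivative can be written in a few lines. The sketch below assumes NumPy and uses the value $δ = 0.005$ reported in the experiments as the default; the function names are illustrative.

```python
import numpy as np

def smooth_relu(x, delta=0.005):
    """Smoothed ReLU from (11): zero for x <= 0, quadratic on (0, delta), linear beyond."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.0, 0.0,
                    np.where(x < delta, x ** 2 / (2.0 * delta), x - delta / 2.0))

def smooth_relu_grad(x, delta=0.005):
    """Derivative of (11): a ramp from 0 to 1 over [0, delta], hence continuous."""
    return np.clip(np.asarray(x, dtype=float) / delta, 0.0, 1.0)

xs = np.array([-1.0, 0.0, 0.0025, 0.005, 1.0])
values = smooth_relu(xs)
slopes = smooth_relu_grad(xs)
```

At $x = δ$ both branches give $δ / 2$ and the slope reaches 1, so the function and its first derivative are continuous, which is the property required of $V$.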

Figure 3 illustrates the comprehensive structure for joint learning of the dynamical system and the Lyapunov function from demonstrations. This framework addresses two critical aspects of stable imitation learning:

1) Original dynamics learning: the FCNN is employed to approximate the original dynamical system. This network learns the intrinsic characteristics of the demonstration trajectories by minimizing reproduction error through supervised training. Crucially, the network captures the desired motion primitives while preserving the smoothness and feasibility constraints inherent to the task.

2) Stability-guaranteeing Lyapunov learning: parallel to dynamics learning, the ICNN is utilized to synthesize a valid Lyapunov function. Specifically, the ICNN’s enforced convexity with respect to the system state ensures the learned function satisfies the fundamental Lyapunov conditions. This enables the derivation of a corrective control term that actively modulates the nominal dynamics.

The synergistic operation of these components yields a provably stable control policy. While the FCNN generates target-conforming motions, the ICNN-derived correction term guarantees global asymptotic stability by constraining the system evolution to progressively decreasing level sets of the Lyapunov function. This joint learning approach simultaneously achieves performance optimization (handled by the FCNN) and stability assurance (enforced by the ICNN), thereby mitigating limitations inherent in end-to-end learning methods that lack rigorous stability guarantees.

Figure 3.

The schematic that illustrates the joint learning of stable dynamics and Lyapunov function from demonstrations.

Stability analysis

In the following, we analyze the stability of the constructed model rigorously via Lyapunov theory. The inherent stability of the learned dynamics is declared in Theorem 1.

Theorem 1. For the compound dynamical system (7), if the original model $o (ξ)$ in (8) and Lyapunov function $V$ in (10) are learned with bounded weight neural networks, then the dynamics of $f (ξ)$ is exponentially stable at equilibrium.

Proof. Let $ξ (t)$ be a motion generated by the learned dynamical system (7) with initial state $ξ (0) \in R^{d}$. First, taking the time derivative of the control function $V$ along the system (7) gives

\frac{d}{dt} V (ξ (t)) = \nabla V {(ξ (t))}^{⊤} \frac{d}{dt} ξ (t) = \nabla V {(ξ (t))}^{⊤} f (ξ (t))

(12)

Using (4), we obtain

\frac{d}{dt} V (ξ (t)) \leq - α V (ξ (t))

(13)

Integrating this inequality gives the upper bound as

V (ξ (t)) \leq V (ξ (0)) e^{- α t}

(14)
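The integration step can be made explicit with a standard integrating-factor argument, sketched here for completeness. Multiplying (13) by $e^{α t} > 0$ gives

\frac{d}{dt} \left( e^{α t} V (ξ (t)) \right) = e^{α t} \left( \frac{d}{dt} V (ξ (t)) + α V (ξ (t)) \right) \leq 0

so $e^{α t} V (ξ (t))$ is non-increasing in $t$. Comparing its values at time $t$ and at time $0$ yields $e^{α t} V (ξ (t)) \leq V (ξ (0))$, which is (14).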

From the positive definite structure of $V$ in (10), we have

V (ξ (t)) \geq η ‖ ξ (t) ‖^{2}

(15)

Combining (14) and (15) yields

‖ ξ (t) ‖ \leq \sqrt{\frac{V (ξ (t))}{η}} \leq \sqrt{\frac{V (ξ (0))}{η}} e^{- \frac{α}{2} t}

(16)

Then, since the activation function $σ$ defined in (11) is linear for large input values and quadratic in a neighborhood of the origin, the network term of $V$ in (10) grows at most linearly as $‖ ξ (t) ‖ \to \infty$ and at most quadratically near zero, while the remaining term $η ‖ ξ (t) ‖^{2}$ is itself quadratic. This implies that $V (ξ (t))$ can be upper bounded by a quadratic term $ρ ‖ ξ (t) ‖^{2}$

V (ξ (t)) \leq ρ ‖ ξ (t) ‖^{2}

(17)

where $ρ > 0$ is a positive scalar.

Applying (17) at $t = 0$ to the right-hand side of (16) gives

‖ ξ (t) ‖ \leq \sqrt{\frac{ρ}{η}} ‖ ξ (0) ‖ e^{- \frac{α}{2} t}

(18)

Therefore, the dynamical system exponentially converges to the equilibrium. Theorem 1 is proven.

Remark 1. It can be found in (18) that the parameter $α$ affects the convergence speed of the trajectory, while the parameter $η$ affects the convergence accuracy. The larger the value of $α$ , the faster the convergence to the target. The larger the value of $η$ , the higher the reaching accuracy. In addition, hyperparameters, such as neural network structure and learning rate, have a direct impact on the shape consistency between the generated motion and the demonstration. In practical implementation, the parameters should be traded off based on the specific demonstration and task requirement.

Experiments and results

This section demonstrates the capabilities of our approach through a series of experiments. We first validate the performance of the learned dynamical system in terms of stability and reproduction accuracy using the handwriting dataset. Subsequently, we conduct two satellite manipulation tasks: static grasping and dynamic docking.

Simulated experiments

The performance of the presented framework is first gauged upon the LASA handwriting dataset. This dataset contains 30 patterns of 2D human handwriting motions, each with seven similar demonstrations starting from different initial points and ending at the identical goal. For each motion case, the model is constructed with neural networks as described in (8) and (9). Specifically, let $o (ξ)$ be a 2-256-256-2 FCNN, and $V$ be a 2-128-128-1 ICNN with its $W_{g, i}^{(z)}$ weights put through a softplus unit to make them positive. The models are trained using a mini-batch gradient descent optimizer with Mean Square Error (MSE) loss. The learning rate is set as 0.01, the batch size is 64, and the number of epochs is 100. The designed parameters are selected as $α = 0.1$ , $η = 0.1$ and $δ = 0.005$ .

Figure 4(a) outlines the MSE loss evolution throughout the training of Gshape motion. It could be observed that the training loss remains in a downward trend and is almost convergent at about 80 epochs. The plots in Figure 4(b) illustrate the demonstrated Gshape trajectories with red lines, the learned dynamical system represented as a light black vector field, and the reproduced trajectories with dark black lines. These trajectories are generated by simulating the forward integration of the resulting dynamical system. The simulations start from the same points as those in the examples. As we can see, the presented IL2S learning system is capable of accurately reproducing the reference trajectories and encapsulating the inherent motion preferences of Gshape in a broad area. Furthermore, all generated trajectories starting from different initial points ultimately converge to the target, which manifests the stability of the learned dynamical system. Figure 5(a) indicates the reproduced trajectories have a similar velocity profile to the demonstrations, meaning that they follow the analogous dynamics.

Figure 4.

The learned results of Gshape motion: (a) Training loss evolution. (b) Learned dynamical system.

Figure 5.

The learned results of Gshape motion: (a) Velocity profile. (b) Norm of the corrected command. (c) Learned Lyapunov function.
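The reproductions are obtained by forward integration of the learned system. The paper does not state which integrator is used; the sketch below assumes a fixed-step forward Euler scheme, and the linear spiral `A` is a hypothetical stand-in for the learned dynamics (7).

```python
import numpy as np

def rollout(f, xi0, dt=0.01, steps=2000, tol=1e-3):
    """Integrate xi_dot = f(xi) with forward Euler until the state is near the origin."""
    xi = np.asarray(xi0, dtype=float)
    path = [xi.copy()]
    for _ in range(steps):
        xi = xi + dt * f(xi)          # one Euler step
        path.append(xi.copy())
        if np.linalg.norm(xi) < tol:  # attractor (the origin) reached
            break
    return np.array(path)

# Hypothetical stable spiral standing in for the learned system.
A = np.array([[-1.0, 2.0], [-2.0, -1.0]])
path = rollout(lambda xi: A @ xi, [1.0, 1.0])
final_distance = float(np.linalg.norm(path[-1]))
```

Starting the rollout from the first point of each demonstration gives trajectories that can be compared directly with the teaching examples.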

Figure 5(b) shows the norm of the corrected term for the Gshape movement. The corrected command is activated only in a small region to guarantee convergence to the goal, which implies that the learned Lyapunov function is, to a great extent, consistent with the demonstration trajectories. Moreover, activation of the corrected term along a fraction of the trajectories demonstrates that $o (ξ)$ is unstable: without the proposed strategy, these trajectories would have deviated from the goal. It should be noted that the reproduced trajectories and the demonstrations have similar patterns in spite of the additional corrected term. This comes down to the fact that the corrected term is derived from a Lyapunov function learned in accordance with the demonstrations. Figure 5(c) shows the contour map of the Lyapunov function $V$, which illustrates that the vector field produced by the dynamical system always points towards the lower-level sets of the Lyapunov function. Figure 6 displays the results for six representative cases in the handwriting dataset: Pshape, Jshape, Lshape, Nshape, Leaf, and Sine. The trajectory colors follow the same convention as Figure 4(b). The reproductions almost coincide with the demonstrated reference trajectories and ultimately reach the attractor. Overall, the resulting dynamical system is able to balance accuracy and stability.

Figure 6.

The learned dynamical system of six handwriting motion patterns.

To elucidate the critical role of the enhanced ReLU activation (11) in the learning framework, we perform a comparison with the canonical counterpart. As demonstrated in Figure 7, although the standard ReLU enables closer trajectory alignment with expert demonstrations, it induces progressive divergence from the target in the end. This instability fundamentally originates from the standard ReLU’s non-differentiable characteristic at the origin, which violates the smoothness prerequisite for constructing a valid Lyapunov function. In sharp contrast, the improved ReLU formulation exhibits two critical advantages simultaneously: 1) preservation of demonstration-matching precision through adaptive gradient modulation, and 2) provable asymptotic stability via maintaining continuous differentiability across the operational domain. The empirical evidence, coupled with the theoretical stability guarantee, establishes the importance of the ReLU modification for achieving both trajectory-tracking accuracy and closed-loop convergence in the imitation learning framework.

Figure 7.

Learned dynamical system of Gshape using two configurations of our method: (1) Canonical ReLU function. (2) Improved ReLU function.

To rigorously benchmark the superiority of the proposed method, we evaluate IL2S against two approaches: NEUM-GPR²⁶ and CLF-GMR³⁵. CLF-GMR integrates Gaussian mixture regression (GMR) for dynamical system modeling while enforcing stability through online trajectory correction regulated by a parameterized Lyapunov function, specifically constructed as a weighted combination of asymmetric quadratic functions learned from demonstration data. In contrast, NEUM-GPR employs Gaussian process regression (GPR) to initially learn the nominal dynamical system, and then augments the system with an online-optimized corrective term derived through constrained neural control Lyapunov framework to ensure stability. To ensure experimental comparability, we rigorously adhere to the parameter configuration protocols specified in the original implementations to achieve optimal performance. Detailed parameter settings are systematically documented in the respective publications.

Training time and inference time serve as the primary metrics of computational efficiency. As indicated in Table 1, IL2S requires longer offline training (246 s) than NEUM-GPR (127 s) and CLF-GMR (33 s), but its inference time of 0.99 ms is roughly an order of magnitude shorter than the 11.25 ms and 9.93 ms of the two baselines. This improvement stems from the explicit integration of Lyapunov stability constraints into the offline training of IL2S, which eliminates the computationally intensive online optimization inherent to NEUM-GPR and CLF-GMR. These results demonstrate the real-time advantage of IL2S and its suitability for dynamic manipulation tasks.

Table 1.

Training time and inference time of different imitation learning methods.

Methods	Training time (s)	Inference time (ms)
IL2S	$246$	$0.99$
NEUM-GPR	$127$	$11.25$
CLF-GMR	$33$	$9.93$

Two metrics, root mean squared error (RMSE) and dynamic time warping distance (DTWD), are used to quantitatively assess the contour similarity between the reproductions and the demonstrations. The comparative results for six motion patterns are shown in Figure 8. IL2S achieves favorable performance in terms of both RMSE and DTWD, which highlights the accuracy and flexibility of the reproduction.
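For readers who want to reproduce the two metrics, the following is a minimal sketch under stated assumptions. The exact normalization used in the paper is not given in this section, so the pointwise Euclidean RMSE and the classic dynamic-programming DTW with a Euclidean local cost shown here are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def rmse(repro, demo):
    """Root mean squared pointwise error between two equally sampled (N x D) trajectories."""
    repro = np.asarray(repro, dtype=float)
    demo = np.asarray(demo, dtype=float)
    return float(np.sqrt(np.mean(np.sum((repro - demo) ** 2, axis=1))))

def dtw_distance(a, b):
    """Classic O(N*M) dynamic time warping distance with a Euclidean local cost.

    The two trajectories may have different lengths.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)   # accumulated-cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])
```

RMSE penalizes timing differences because it compares samples index by index, while DTW first aligns the two curves in time, so it mainly reflects differences in shape.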

Figure 8.

Comparative results of the proposed IL2S against NEUM-GPR and CLF-GMR under two metrics with one standard deviation: (a) RMSE. (b) DTWD.

To investigate the influence of the parameters on the behavior of the learned system, we conduct an ablation study based on the Pshape demonstration. Following the single-variable principle, we vary one parameter at a time and hold all others at their baseline values.

Figure 9(a) compares the system behaviors under $η = 0.01$ and $η = 0.1$. Both settings converge stably to the target, as evidenced by the vector field, but the reproduced trajectories differ notably in tracking precision. The configuration $η = 0.1$ reproduces the demonstration more accurately than $η = 0.01$, which is consistent with our theoretical prediction regarding the role of the parameter $η$ in reproduction accuracy.

Figure 9.

Ablation results for the Pshape demonstration. (a) Different values of $η$. (b) Different structures of ICNN. (c) Different values of $α$.

Figure 9(b) displays the systems generated with two ICNN depths. The convexity of the learned Lyapunov function is unaffected by the network depth, and the resulting dynamical systems exhibit comparable generalization capability, reproduction accuracy, and stability. These findings indicate that the stability property and the performance metrics are robust to changes in the architectural complexity of the ICNN.

The velocity modulation effect of the parameter $α$ is demonstrated in Figure 9(c), where the contour map represents the speed magnitude. As theoretically anticipated, the setting $α = 1$ induces faster motion, though at the cost of occasional trajectory deviations in high-curvature regions. This observation suggests an inherent trade-off between convergence rate and reproduction accuracy, so $α$ should be tuned to the requirements of the specific application.
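This section does not restate how $α$ enters the dynamics. One common construction, taken from ref. 28, projects a nominal velocity so that the Lyapunov function decays at least at rate $α$; the sketch below follows that form. Whether IL2S uses exactly this stabilizing term is an assumption, and the quadratic `V` and the linear `nominal` dynamics are hypothetical stand-ins for the learned networks.

```python
import numpy as np

def V(x):
    """Stand-in Lyapunov candidate (quadratic); the paper learns V with an ICNN instead."""
    return float(x @ x)

def grad_V(x):
    """Gradient of the stand-in Lyapunov candidate."""
    return 2.0 * x

def nominal(x):
    """Hypothetical nominal dynamics that spiral outward, so they are unstable on their own."""
    A = np.array([[0.2, 1.0], [-1.0, 0.2]])
    return A @ x

def stabilized(x, alpha=1.0):
    """Project the nominal velocity so that dV/dt <= -alpha * V (form used in ref. 28)."""
    g = grad_V(x)
    f_hat = nominal(x)
    # ReLU of the residual of the decrease condition: zero when f_hat already satisfies it.
    violation = max(float(g @ f_hat) + alpha * V(x), 0.0)
    # Remove just enough of the component along grad V; the small constant avoids 0/0 at x = 0.
    return f_hat - g * violation / (float(g @ g) + 1e-12)
```

In this form a larger $α$ demands a faster decrease of $V$ and therefore a stronger correction of the nominal velocity, which is one way to read the trade-off between convergence rate and reproduction accuracy described above.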

Robot experiments

To assess the designed control scheme in real settings, we implement two satellite manipulation tasks, static grasping and dynamic docking, using a 6-DoF UR5e arm fitted with a Robotiq gripper.

As shown in Figure 10, the robot experimental platform consists of six components: a core control computer, a robotic manipulator, a Robotiq gripper, a satellite model, a Vicon motion capture system, and a network switch. The developed learning algorithm runs on the core computer under Ubuntu 22.04. The Vicon system acquires the pose of the target in real time. The network switch establishes the communication link between the control computer and the other devices and keeps the signal timing consistent. For the UR5e robot, the application programming interface can be invoked directly to realize low-level control, which internally uses a PID approach.

Figure 10.

The robotic experimental platform: (a) Hardware composition and connection. (b) Physical devices of satellite manipulation.

The demonstrations are collected by kinesthetic teaching with the robot in gravity compensation mode, at a sampling frequency of $100$ Hz. For our task, we collect ten trajectories of successful demonstrations. The trajectories may vary in length, and no alignment or dynamic time warping across the demonstrations is needed. The raw trajectories contain only position and orientation measurements; the velocity data is computed afterwards via numerical differentiation. The smoother the pose and velocity profiles, the smoother the learned dynamical system will be. Hence, we preprocess the raw demonstrated trajectories with a Savitzky-Golay filter.
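The preprocessing described above can be sketched as follows. The 100 Hz rate comes from the text, but the filter window and polynomial order are hypothetical settings, since the paper does not report them, and the function name `preprocess` is illustrative.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(poses, rate_hz=100.0, window=31, polyorder=3):
    """Smooth an (N x D) pose trajectory and estimate its velocity.

    window and polyorder are hypothetical values; the source does not state them.
    """
    poses = np.asarray(poses, dtype=float)
    dt = 1.0 / rate_hz
    # Savitzky-Golay smoothing along the time axis (one local polynomial fit per sample).
    smoothed = savgol_filter(poses, window_length=window, polyorder=polyorder, axis=0)
    # Numerical differentiation of the smoothed poses (central differences in the interior).
    velocity = np.gradient(smoothed, dt, axis=0)
    return smoothed, velocity
```

Differentiating after smoothing keeps the measurement noise from being amplified in the velocity targets. Each demonstration can be processed independently, so trajectories of different lengths need no alignment.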

The resulting dynamical system, which is estimated by neural networks, represents the trajectories of the end-effector position $ξ_{p} \in R^{3}$ and orientation $ξ_{o} \in R^{3}$, that is, $ξ = [ξ_{p}; ξ_{o}]$. Specifically, the orientation is expressed as a scaled axis-angle vector, where the direction of $ξ_{o}$ denotes the axis of rotation and $‖ξ_{o}‖$ denotes the angle of rotation. In addition, let $o(ξ)$ be a 6-256-256-6 FCNN, and $V$ be a 6-128-128-1 ICNN with its $W_{g, i}^{(z)}$ weights passed through a softplus module. The optimization algorithm and parameters for this experiment are the same as those used for the simulated handwriting motions.
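To make the role of the softplus module concrete, here is a minimal NumPy sketch of a 6-128-128-1 input-convex network in the style of ref. 46. It covers only the convex core: the additional construction that makes $V$ positive definite with a unique minimum is not shown. The random initialization, the smoothed activation with width `d`, and all function names are assumptions made for illustration.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x)); strictly positive for every input."""
    return np.logaddexp(0.0, x)

def smooth_relu(x, d=0.1):
    """Convex, nondecreasing, continuously differentiable activation (d is hypothetical)."""
    return np.where(x <= 0.0, 0.0, np.where(x < d, x ** 2 / (2.0 * d), x - d / 2.0))

def init_icnn(sizes=(6, 128, 128, 1), seed=0):
    """Random parameters for an ICNN; every layer also receives the raw input."""
    rng = np.random.default_rng(seed)
    params = []
    for k in range(1, len(sizes)):
        layer = {"Wy": rng.normal(0.0, 0.1, (sizes[k], sizes[0])),
                 "b": np.zeros(sizes[k])}
        if k > 1:
            # Unconstrained raw weights; softplus makes them positive in the forward pass.
            layer["Wz_raw"] = rng.normal(-3.0, 0.1, (sizes[k], sizes[k - 1]))
        params.append(layer)
    return params

def icnn_forward(params, x):
    """Scalar output that is convex in x because the hidden-to-hidden weights are positive."""
    x = np.asarray(x, dtype=float)
    z = None
    for layer in params:
        pre = layer["Wy"] @ x + layer["b"]
        if z is not None:
            pre = pre + softplus(layer["Wz_raw"]) @ z
        z = smooth_relu(pre)
    return float(z[0])
```

Passing the raw weights through softplus lets the optimizer work with unconstrained parameters while the effective hidden-to-hidden weights stay positive, which is the condition that preserves convexity from layer to layer.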

In the first scenario, the robotic arm is tasked with autonomously reaching and grasping a static satellite. Figure 11 presents the generated end-effector trajectories aimed at eight spatially distributed positions within the workspace. As anticipated, these trajectories preserve the characteristics of the human demonstration while guaranteeing asymptotic convergence to the target. Sequential snapshots in Figure 12 document the complete grasping procedure and show that the end-effector reaches the predefined target position and orientation during the final approach phase.

Figure 11.

Reproduced trajectories starting from different points.

Figure 12.

Snapshots of the static grasping.

To evaluate the real-time adaptability of IL2S, we introduce an unexpected target perturbation during task execution. Specifically, at $t = 1$ s after the start of the movement, the target satellite is physically displaced to a new location. Figure 13 shows the response of the dynamical system: a new trajectory is generated online while still satisfying the physical constraints. This adaptation is handled by the online framework without recomputing the underlying motion primitives. The replanned path reaches the displaced target with high positional and angular accuracy, validating the online adaptation capability of the presented method.
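The reason no replanning step is needed can be seen in a small simulation. The sketch below integrates an autonomous system expressed relative to the target and moves the target at $t = 1$ s. The linear dynamics `f`, the two target locations, and the step size are hypothetical stand-ins; in the paper the velocity would come from the learned networks.

```python
import numpy as np

def f(error):
    """Stand-in stable dynamics on the error x - target (eigenvalues -2 +/- 1j)."""
    A = np.array([[-2.0, 1.0], [-1.0, -2.0]])
    return A @ error

def target_at(t):
    """Hypothetical target that is displaced once, 1 s after the motion starts."""
    return np.array([0.5, 0.2]) if t < 1.0 else np.array([0.8, -0.1])

def rollout(x0, dt=0.01, steps=600):
    """Euler-integrate x_dot = f(x - target(t)), re-reading the target at every step."""
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for k in range(steps):
        x = x + dt * f(x - target_at(k * dt))
        path.append(x.copy())
    return np.array(path)

path = rollout([0.0, 0.0])
final_error = np.linalg.norm(path[-1] - target_at(6.0))
```

Because the velocity depends only on the current state and the current target, displacing the target changes the next velocity command immediately, and the same stability argument applies to the new equilibrium.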

Figure 13.

On-the-fly adaptation to a change in the target for the static grasping task.

Collectively, this scenario demonstrates the efficacy of our method in a high-dimensional manipulation space, where precise kinematic coordination is achieved through 6-DoF pose control. The results indicate stable and robust behavior when operating in an unstructured environment, supporting the capacity of IL2S to handle real-world uncertainties.

In the second task, the robotic arm maneuvers the grasped satellite so that it tracks and docks with a second, dynamically moving satellite. To acquire the real-time position and orientation of this dynamic target, we affix eight optical markers to its surface. These markers are continuously tracked by the Vicon motion capture system, which provides high-fidelity pose estimation at a sufficient update rate. The captured pose data is streamed in real time to the robotic controller, forming the closed feedback loop required for dynamic tracking. Snapshots of the dynamic docking experiment are presented in Figure 14(a), illustrating the relative alignment of both satellites during the maneuver. Figure 14(b) depicts the motion trajectories of the target satellite and the robotic system in the $ξ_{1}$-$ξ_{3}$ plane. The trajectory of the target satellite is pronouncedly nonlinear. With the IL2S framework, the robotic arm tracks the target satellite accurately and thus accomplishes the dynamic docking task. This demonstrates the effectiveness of the proposed learning framework for dynamic manipulation tasks.

Figure 14.

The task of dynamic docking: (a) Snapshots of the dynamic docking. (b) Tracking trajectories in the $ξ_{1}$-$ξ_{3}$ plane. (c) Position tracking. (d) Orientation tracking.

Figure 14(c) and (d) provide quantitative analyses of the docking process, depicting the time histories of the robotic end-effector pose and the dynamic satellite pose, both expressed relative to the robot base frame. These curves reveal a clear trend: under the control of the proposed method, the relative pose error between the two satellites, comprising both positional displacement and angular misalignment, decreases consistently throughout the tracking phase. The error converges and stabilizes at a low magnitude, confirming that docking is achieved and then maintained. These results substantiate the adaptation and reactivity of the proposed IL2S method in challenging dynamic environments with moving targets. The ability of the system to replan trajectories online and execute the docking maneuver precisely demonstrates its effectiveness for complex, time-critical manipulation tasks.
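The two components of the relative pose error can be computed from the 6-D pose vectors as sketched below. The sketch assumes that each pose is stored as three position coordinates followed by a scaled axis-angle (rotation vector) orientation, as in the pose representation used in the robot experiments. The function names are hypothetical, and the exact error definition used for Figure 14 is not stated in the text.

```python
import numpy as np

def rotvec_to_matrix(r):
    """Rodrigues' formula: scaled axis-angle vector -> 3x3 rotation matrix."""
    r = np.asarray(r, dtype=float)
    angle = np.linalg.norm(r)
    if angle < 1e-12:
        return np.eye(3)
    k = r / angle   # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def pose_error(xi_a, xi_b):
    """Return (position error, angular misalignment in rad) between two 6-D poses."""
    xi_a = np.asarray(xi_a, dtype=float)
    xi_b = np.asarray(xi_b, dtype=float)
    pos_err = np.linalg.norm(xi_a[:3] - xi_b[:3])
    # Relative rotation taking frame a to frame b; its rotation angle is the misalignment.
    R_rel = rotvec_to_matrix(xi_a[3:]).T @ rotvec_to_matrix(xi_b[3:])
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(pos_err), float(np.arccos(cos_angle))
```

Measuring the angle of the relative rotation, rather than subtracting the two rotation vectors, keeps the orientation error meaningful when the two rotation axes differ.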

To validate the reliability of the proposed method in practical implementation, we test the system performance under position measurement noise. As shown in Figure 15, despite the noise in the target position, the robotic system achieves successful tracking of the target satellite trajectory within approximately 3 s using the IL2S scheme, whilst maintaining a smooth trajectory. This ensures the successful completion of the dynamic docking task, demonstrating the strong robustness of the IL2S scheme against measurement noise.

Figure 15.

Tracking performance under position measurement noise.

Conclusions

In this paper, an imitation learning framework named IL2S is proposed for motion planning and control of robotic manipulation tasks. First, we model the robot movement as a first-order autonomous dynamical system that consists of the original dynamics and an additional stabilizing term. Based on demonstrations collected by kinesthetic teaching, the dynamical system and the control Lyapunov function are learned using a fully connected neural network and an input-convex neural network, respectively. The neural parameterized Lyapunov function, which has a unique minimum and is positive definite and continuously differentiable, serves as a certificate that the learned dynamical system converges to the target. We demonstrate that the proposed framework is able to learn stable dynamical systems with complex vector fields on handwriting motions. Furthermore, we evaluate the method in two different satellite manipulation scenarios, namely static grasping and dynamic docking. The experimental results show that the proposed strategy completes the tasks with high reproduction accuracy, convergence to the target, and accommodation of unexpected changes. As future work, we will focus on how the learned dynamical system can be combined with force information to allow compliant control when in contact with dynamic objects.

Footnotes

ORCID iD

Yalu Su

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Flores-Abad, Pham, et al. A review of space robotics technologies for on-orbit servicing. Prog Aerosp Sci 2014; 68: 1–26.

2. Moghaddam, Chhabra. On the guidance, navigation and control of in-orbit space robotic missions: A survey and prospective vision. Acta Astronaut 2021; 184: 70–100.

3. Zheng, Tian, et al. Recent advances in automatic modulation classification technology: Methods, results, and prospects. Int J Intell Syst 2025; 2025(1): 4067323.

4. Cheng, Liu, et al. On-orbit service (OOS) of spacecraft: A review of engineering developments. Prog Aerosp Sci 2019; 108: 32–120.

5. Jorgensen, Bains. SRMS history, evolution and lessons learned. In AIAA SPACE 2011 Conference & Exposition. AIAA, 2011, p. 7277.

6. Sabatini, Monti, Gasbarri, et al. Deployable space manipulator commanded by means of visual-based guidance and navigation. Acta Astronaut 2013; 83: 27–43.

7. LeCroy, Hallmark, Scott, et al. Comparison of navigation solutions for autonomous spacecraft from multiple sensor systems. In Sensors and Systems for Space Applications II. SPIE, 2008, pp. 88–99.

8. Imaida, Yokokohji, Doi, et al. Ground-space bilateral teleoperation of ETS-VII robot arm by direct bilateral coupling under 7-s time delay condition. IEEE Trans Robot Automat 2004; 20(3): 499–511.

9. Nishida, Oda, et al. Space telerobot experiment system based on NASDA ETS-VII satellite. In Guidance, Navigation, and Control Conference. American Institute of Aeronautics and Astronautics, 1997, pp. 543–551.

10. Chen, Huang, Liu. Mode switching-based symmetric predictive control mechanism for networked teleoperation space robot system. IEEE/ASME Trans Mechatron 2019; 24(6): 2706–2717.

11. Santos, Rade, da Fonseca IM. A machine learning strategy for optimal path planning of space robotic manipulator in on-orbit servicing. Acta Astronaut 2022; 191: 41–54.

12. Cao, Wang, Zheng, et al. Reinforcement learning with prior policy guidance for motion planning of dual-arm free-floating space robot. Aerosp Sci Technol 2023; 136: 108098.

13. Jiang, et al. Autonomous planning and control strategy for space manipulators with dynamics uncertainty based on learning from demonstrations. Sci China-Technol Sci 2021; 64(12): 2662–2675.

14. Zheng, Tian, et al. Robust automatic modulation classification using asymmetric trilinear attention net with noisy activation function. Eng Appl Artif Intell 2025; 141: 109861.

15. Pérez-Dattari, Kober. Stable motion primitives via imitation and contrastive learning. IEEE Trans Robot 2023; 39(5): 3909–3928.

16. Dawson, Gao, Fan. Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control. IEEE Trans Robot 2023; 39(3): 1749–1767.

17. Abu-Dakka, Chen, et al. Fusion dynamical systems with machine learning in imitation learning: A comprehensive overview. Inf Fusion 2024; 108: 102379.

18. Ravichandar, Polydoros, Chernova, et al. Recent advances in robot learning from demonstration. Annu Rev Contr Robot Autonom Syst 2020; 3: 297–330.

19. Osa, Pajarinen, Neumann, et al. An algorithmic perspective on imitation learning. Found Trends Robot 2018; 7(1–2): 1–179.

20. Khansari-Zadeh, Billard. Learning stable nonlinear dynamical systems with Gaussian mixture models. IEEE Trans Robot 2011; 27(5): 943–957.

21. Zhang, Cheng, et al. Learning accurate and stable point-to-point motions: A dynamic system approach. IEEE Robot Autom Lett 2022; 7(2): 1510–1517.

22. Figueroa, Billard. Locally active globally stable dynamical systems: Theory, learning, and experiments. Int J Robot Res 2022; 41(3): 312–347.

23. Calinon. A tutorial on task-parameterized movement learning and retrieval. Intell Serv Robot 2016; 9: 1–29.

24. Paraschos, Daniel, Peters, et al. Using probabilistic movement primitives in robotics. Auton Robot 2018; 42: 529–551.

25. Huang, Abu-Dakka, Silvério, et al. Toward orientation learning and adaptation in Cartesian space. IEEE Trans Robot 2020; 37(1): 82–98.

26. Jin, Liu, et al. Learning a flexible neural energy function with a unique minimum for globally stable and accurate demonstration learning. IEEE Trans Robot 2023; 39(6): 4520–4538.

27. Zhang, Cheng, Cao, et al. A neural network based framework for variable impedance skills learning from demonstrations. Robot Auton Syst 2023; 160: 104312.

28. Manek, Kolter JZ. Learning stable deep dynamics models. In International Conference on Neural Information Processing Systems. 2019, pp. 11128–11136.

29. Zhi, Lai, Ott, et al. Learning efficient and robust ordinary differential equations via invertible neural networks. In International Conference on Machine Learning. PMLR, 2022, pp. 27060–27074.

30. Gribovskaya, Khansari-Zadeh, Billard. Learning non-linear multivariate dynamics of motion in robotic manipulators. Int J Robot Res 2011; 30(1): 80–117.

31. Billard, Mirrazavi, Figueroa. Learning for Adaptive and Reactive Robot Control: A Dynamical Systems Approach. MIT Press, 2022.

32. Schaal. Dynamic movement primitives - a framework for motor control in humans and humanoid robotics. In: Kimura, Tsuchiya, Ishiguro, et al. (eds) Adaptive Motion of Animals and Machines. Springer, 2006, pp. 261–280.

33. Ude, Gams, Asfour, et al. Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Trans Robot 2010; 26(5): 800–815.

34. Ijspeert, Nakanishi, Hoffmann, et al. Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Comput 2013; 25(2): 328–373.

35. Khansari-Zadeh, Billard. Learning control Lyapunov function to ensure stability of dynamical system-based robot reaching motions. Robot Auton Syst 2014; 62: 752–765.

36. Figueroa, Billard. A physically-consistent Bayesian non-parametric mixture model for dynamical system learning. In Conference on Robot Learning. PMLR, 2018, pp. 927–946.

37. Neumann, Steil JJ. Learning robot motions with stable dynamical systems under diffeomorphic transformations. Robot Auton Syst 2015; 70: 1–15.

38. Perrin, Schlehuber-Caissier. Fast diffeomorphic matching to learn globally asymptotically stable nonlinear dynamical systems. Syst Control Lett 2016; 96: 51–59.

39. Rana, Fox, et al. Euclideanizing flows: Diffeomorphic reduction for learning stable dynamical systems. In Learning for Dynamics and Control. PMLR, 2020, pp. 630–639.

40. Zhang, Mohammadi, Rozo. Learning Riemannian stable dynamical systems via diffeomorphisms. In Conference on Robot Learning. PMLR, 2022, pp. 1211–1221.

41. Urain, Tateo, Peters. Learning stable vector fields on Lie groups. IEEE Robot Autom Lett 2022; 7(4): 12569–12576.

42. Beik-Mohammadi, Hauberg, Arvanitidis, et al. Reactive motion generation on learned Riemannian manifolds. Int J Robot Res 2023; 42(10): 729–754.

43. Salehian SSM, Khoramshahi, Billard. A dynamical system approach for softly catching a flying object: Theory and experiment. IEEE Trans Robot 2016; 32(2): 462–471.

44. Auddy, Hollenstein, Saveriano, et al. Continual learning from demonstration of robotics skills. Robot Auton Syst 2023; 165: 104427.

45. Khalil HK. Nonlinear Systems. 3rd ed. Prentice Hall, 2002.

46. Amos, Kolter JZ. Input convex neural networks. In International Conference on Machine Learning. PMLR, 2017, pp. 146–155.

An imitation learning framework with inherent stability for motion planning and its application to space manipulation tasks
