
Abstract

Real-world object manipulation has been commonly challenged by physical uncertainties and perception limitations. Caging configuration-based manipulation frameworks are an effective strategy and have provided robust solutions, but they are not broadly applicable because of their strict requirements on the availability of multiple robots, widely distributed contacts, or specific geometries of robots or objects. Building upon previous sensorless manipulation ideas and uncertainty handling approaches, this work proposes a novel framework termed Caging in Time to allow caging configurations to be formed even with one robot engaged in a task. This concept leverages the insight that while caging requires constraining the object’s motion, only part of the cage actively contacts the object at any moment. As such, by strategically switching the end-effector configuration and collapsing it in time, we form a cage with its necessary portion active whenever needed. We instantiate our approach on challenging quasi-static and dynamic manipulation tasks, showing that Caging in Time can be achieved in general cage formulations including geometry-based and energy-based cages. With extensive experiments, we show robust and accurate manipulation, in an open-loop manner, without requiring detailed knowledge of the object geometry or physical properties, or real-time accurate feedback on the manipulation states. In addition to being an effective and robust open-loop manipulation solution, Caging in Time can be a supplementary strategy to other manipulation systems affected by uncertain or limited robot perception.

Keywords

Planning under uncertainties, manipulation planning, planning and simulation, path planning for manipulators

1. Introduction

Object manipulation is a fundamental ability for robots to engage themselves in tasks where physical interactions are expected (Billard and Kragic, 2019). While research in this field has seen significant advancements with data-driven frameworks and sensing-enhanced systems (Kaelbling, 2020; Lee et al., 2019; Yuan et al., 2017), we still do not see many robot systems working robustly around us. Among others, two major reasons are typically observed for this challenge: 1) real-world physical uncertainties are significantly more complex than any lab environment, often rendering certain modeling assumptions invalid (Rodriguez, 2021) and 2) perception or sensing is almost never good enough for real-world robot deployments. Gaps between modeling assumptions and reality, domain variations, and random occlusions often worsen the perception performance, or even cause a perception system to completely lose track of target objects (Bohg et al., 2017). As such, approaches fully relying on closed-loop control for robot manipulation have been constantly challenged in real-world applications.

Unlike approaches that focus on contacts, caging configuration-based manipulation has provided a novel paradigm to significantly reduce perception requirements. More importantly, although they do not aim at accurate control, caging configurations can work robustly while fully ignoring the effect of physical uncertainties, as seen in grasping, in-hand manipulation, and multi-robot coordination tasks (Bircher et al., 2021; Rodriguez et al., 2012; Song et al., 2021). Concretely, a caging configuration aims at completely constraining all possible configurations of the target object within a known region (cage). The target object is manipulated as the cage moves or deforms, while the configuration of the target object is guaranteed to follow the cage to complete the task (Wang et al., 2005). However, it has been a challenge to apply this idea to general manipulation tasks, since forming a caging configuration places very strict requirements on the hardware, such as multi-agent coordination, widely distributed contacts, or specific geometries of the robot or the object (Makita and Wan, 2018).

This work proposes a novel concept, termed Caging in Time, to extend caging configuration-based manipulation to more general problems without hardware-specific assumptions. Example applications are shown in Figure 1. The high-level idea of this framework can be explained as follows. In an extreme situation, let us assume we have an object to manipulate and a robot is able to make an infinite number of contacts everywhere on the object. As such, the object is fully caged, and arbitrary manipulation can be achieved by moving all contacts simultaneously. In another extreme situation, let us assume a robot can make one contact with the object at a time, but it is able to switch the contact to other locations infinitely fast so that virtually there are contacts everywhere on the object. Equivalently to the former case, arbitrary manipulation can also be achieved with this virtual cage. Our idea of Caging in Time exploits the possibilities between these two extremes: We assume a robot can make one or a few contacts at a time, and it can switch to other contacts fast enough as needed so that, in time, it makes a cage. This idea is visualized in Figure 2 via a planar pushing task, where an object is pushed by a virtual cage (gray) with multiple bars through a circular trajectory. Note that, physically and in time, only one bar (red) is effectively needed at a time to complete the task. Furthermore, by collapsing the configurations of the effective bars through time, a cage is formed and can be unrolled in time to complete the task as if a complete cage has always been there.

Figure 1.

Example object manipulation tasks via Caging in Time. Top: Object planar pushing to trace “RICE.” Without sensing feedback, unknown objects were randomly replaced during the manipulation process. The recordings were taken at different times and concatenated to show all objects. Bottom: Ball catching on a flat end-effector without any sensing feedback.

Figure 2.

The theory of Caging in Time visualized through an example planar pushing task. I: A virtual cage, formed by line-shaped bars (gray), robustly pushes an object through a circular path. II: Along the time dimension, only one bar (red) is effectively making contacts at a time, while other bars seem to be unnecessary. III: If we collapse all configurations of the effective bars (red) through time, a cage is formed, which can be unrolled into time to achieve the task in I with only one bar at a time.

We instantiated the Caging in Time theory on both quasi-static and dynamic manipulation tasks with a real robot. Notably, the example instantiations showed that Caging in Time can be applied to general cage formulations, including geometry-based and energy-based cages. Without any sensing feedback, Caging in Time showed guaranteed task success on all experimented tasks, even when the manipulation is physically affected by in-task perturbations and unknown object shape variations. In comparison with a baseline closed-loop control approach, we show that our framework is similarly accurate, while being significantly more robust against perception uncertainties, as enabled by zero reliance on precise sensing feedback.

Contributions: The proposed Caging in Time concept makes three contributions: 1) providing a planning paradigm for robust robot manipulation that can significantly mitigate the effect of perception uncertainties; 2) broadening the traditional caging configuration-based manipulation to a more general manipulation framework as enabled by strategic sequential robot motions; and 3) offering an option for manipulation without relying on sensing feedback to support robust manipulation in various real-world tasks, especially in scenarios where perception is not reliable.

Limitations: The work reported in this paper is the first step towards general applications of the Caging in Time concept. With an emphasis on the derivation of the foundations of the theory, this work does not develop algorithms for addressing general manipulation problems via Caging in Time skills. Specifically, we identify the following major limitations of the current scope of this work and state them before the details of the proposed theory: 1) as example instantiations of the proposed theory, the reported manipulation algorithms were designed for implementing Caging in Time on specific tasks only; 2) manipulation physics or dynamics models are analytically derived for the purpose of explicitly verifying the theory, although simulation or learning-based models can be more efficient; and 3) this work mainly handles perception uncertainties while uncertainties in action execution and certain environmental interactions are not considered. Nevertheless, we underscore that the proposed Caging in Time framework is a complete theory for general manipulation problems as discussed in Section 4 and Section 8.

2. Related works

2.1. Perception assumptions in manipulation

Contact properties, geometric and physical properties of objects, and perfect perception are commonly presumed when modeling manipulation systems (Bütepage et al., 2019; Suomalainen et al., 2022). Alternatively, recent learning-based approaches instead assume training data adequately covers all relevant task variation domains, with perception systems matching those in training environments, for example, similar camera poses (Andrychowicz et al., 2020; Kaelbling, 2020; Kroemer et al., 2021; Lee et al., 2019). Meanwhile, interactive perception has significantly reduced the requirements for certain prior knowledge and direct perception (Bohg et al., 2017; Hang et al., 2021), while still assuming reliable perception through some channels for iterative state estimation. Despite demonstrating great performance in the lab, the robustness of many manipulation approaches typically faces significant challenges from unreliable perception in the real world, such as tracking noise, signal latency, and occlusions.

In early works, uncertainty was discussed in object pushing through analytical modeling (Akella and Mason, 1992; Lynch, 1999; Lynch and Mason, 1995, 1996), enabling robust planar pushing despite uncertainties. While these approaches handled some uncertainties in object–surface interactions, they still required precise object geometry and sensor feedback. Meanwhile, sensorless manipulation (Akella et al., 1997; Erdmann and Mason, 1988; Goldberg, 1993) provided inspiration for addressing manipulation without any feedback, with applications in orienting planar parts without sensors (Akella and Mason, 1998; Bohringer et al., 2000). However, these approaches were specifically limited to parts orienting, rather than general manipulation tasks. Later works on motion cones (Chavan-Dafle et al., 2020) and in-hand manipulation (Bhatt et al., 2021; Holladay et al., 2015) extended these concepts beyond pushing.

Notably, a recent work in robust pushing shares a similar idea with our work through variance-constrained optimization based on belief dynamics, generating stable pushing trajectories also in open-loop (Jankowski et al., 2025). While theoretically rigorous, this approach remains confined to quasi-static pushing scenarios, without addressing its potential applications in broader manipulation contexts.

Therefore, a comprehensive framework that can robustly handle uncertainty even without requiring feedback and apply across diverse manipulation tasks is valuable. Different from most existing works, Caging in Time aims to use caging configurations along the time dimension to eliminate the reliance on, hence the negative effect of, unreliable perception. Importantly, our work is not intended to replace existing planning or control methods, but rather to complement them as a supplementary approach to enhance manipulation robustness under unreliable perception.

2.2. Caging configuration-based manipulation

Caging configuration initially emerged as a concept for multi-robot systems in SE(2) to constrain the motion of a target object, enabling robust object transportation (Pereira et al., 2004; Sudsang and Ponce, 2000; Wang et al., 2005), or even herding mobile agents (Song et al., 2021). This concept was subsequently extended to planar grasping, with the objective of restricting the pose of a target object within a confined region (Rimon and Blake, 1999; Rodriguez et al., 2012; Stork et al., 2013a; Varava et al., 2016), sometimes even with partially observable object geometries (Zarubin et al., 2013), as well as hooking and latching techniques developed through topological analysis (Stork et al., 2013b). Recently, in-hand manipulation has been enhanced through caging configurations where objects are manipulated via controlled states within deformable cages (Bircher et al., 2021; Komiyama and Maeda, 2021). Besides algorithmic advances, the caging concept has also inspired specialized end-effector designs that enhance manipulation through physical embodiment (Dong et al., 2025; Xu et al., 2024). However, the most fundamental limitation of traditional caging is that multiple robots (ZhiDong Wang and Kumar, 2002), widely distributed contacts (Rodriguez et al., 2012), or specific geometries of the robots or objects (Varava et al., 2016) are required, making caging a concept not applicable to more general setups.

Beyond complete geometric caging, extended formulations have been proposed including energy-bounded caging to incorporate external forces like gravity (Mahler et al., 2016, 2018), and partial caging to relax full enclosure requirements (Varava et al., 2016, 2019)—both demonstrating that complete caging becomes unnecessary with environmental force assistance, broadening the application range for caging. These advancements necessitated practical metrics for escape probability evaluation (Stork et al., 2013a; Varava et al., 2019), with sampling-based algorithms enabling quality assessment through escape path clearance analysis (Varava et al., 2020a, 2020b). Such quantification methods have evolved beyond traditional caging to evaluate diverse manipulation tasks including pushing, soft object interaction, and multi-object handling (Dong et al., 2023, 2024; Dong and Pokorny, 2024). Despite effectively extending traditional formulations, energy-based partial caging still demands specific robot-object geometry knowledge and maintains complete constraints at each discrete time point, preserving specific end-effector geometry requirements. While robustness prediction metrics have advanced for more general tasks, the fundamental challenges of autonomous caging-based planning and control remain largely unresolved.

Building upon unified geometric and energy-based caging principles, Caging in Time departs from the conventional understanding that a cage must be complete at every moment. Instead, we establish a paradigm where a cage can be considered complete when it achieves completeness across the space-time continuum, thus significantly relaxing hardware requirements and expanding the practical applications of caging-based manipulation. Additionally, traditional methods require very complex algorithms to verify caging configurations (Varava et al., 2021), which can be even harder for partial cages (Makita and Nagata, 2015). As will be seen with Caging in Time, since we can, in theory, virtually have our robots or contacts everywhere as needed, caging verification is generally simpler as we only need to make sure that the object does not penetrate the predefined cage boundaries.

2.3. Theories for uncertainty handling

Our approach draws inspiration from foundational theoretical frameworks that address uncertainties through diverse mathematical formulations across control theory, motion planning, and decision-making. Belief space planning transforms state estimation into probability distributions for decision-making under uncertainty (Kurniawati et al., 2008; Platt et al., 2010). POMDPs formalize partial observability by optimizing over belief states (Kaelbling et al., 1998; Silver and Veness, 2010). Contraction theory provides stability guarantees through analysis of convergence properties (Lohmiller and Slotine, 1998; Manchester and Slotine, 2017), while reachability analysis enables formal safety verification by computing attainable states (Althoff, 2010; Mitchell et al., 2005). LQR trees combine optimal control with sampling-based planning for stabilizing controllers (Majumdar and Tedrake, 2017; Tedrake et al., 2010), and set-based control ensures invariance properties for worst-case scenarios (Rakovic et al., 2005).

While all these approaches can effectively address uncertainties in different ways, they still rely on the assumption that perception feedback or certain prior geometric knowledge of the tasks is always, or at least partly, available. Inspired by these prior works and aiming to address their limitations in contact-rich manipulation tasks, our proposed Caging in Time synthesizes insights from these fundamental theories, including those on state representations and uncertainty-aware planning, into a framework that focuses on large perception uncertainties, in order to enable robust manipulation skills with practical real-world implementations.

3. Preliminaries

In this section, we first introduce notations in Section 3.1 for traditional caging definitions, and then in Section 3.2 extend the notations to more general task spaces to enable the derivation of our proposed Caging in Time framework.

3.1. Traditional caging as object closure

In the scenario of an object caged by multiple robots, we denote the configuration space (C-space) of an object by $C_{obj}$ and the configuration of an object (e.g., position and orientation) by $c_{obj} \in C_{obj}$ . The C-space and the configuration of the ith robot are denoted by $C_{i}$ and $c_{i} \in C_{i}$ , respectively. The object and all robots share a common workspace $W$ , where $A_{obj}, A_{i} \subset W$ are the geometries of the object and of the ith robot in the workspace. As shown in Figure 3, the free C-space of the object $C_{free} \subset C_{obj}$ can be defined as the set of all possible states where the object does not collide with any robot:

C_{free} ≔ \{c_{obj} \in C_{obj} ∣ \forall i : A_{obj} (c_{obj}) \cap A_{i} (c_{i}) = \emptyset\}

(1)

Figure 3.

An illustration of the workspace $W$ (left) and the C-space $C_{obj}$ (right) of a circular object surrounded by seven line robots. Left: The red line is the geometry of the ith robot $A_{i}$ and the solid gray circle is the geometry of the object $A_{obj}$ . Right: The red region is all the object’s configurations at which the object collides with a robot and the green region is the object’s free C-space $C_{free} \subset C_{obj}$ .

The traditional definition of caging, initially termed object closure, was first introduced by ZhiDong Wang and Kumar (2002) as a condition for an object to be trapped by robots. In other words, the caging condition is met when there is no viable path for the object to move from its current configuration to a configuration infinitely far away.

As depicted in Figure 4, given a configuration $c_{\infty} \in C_{obj}$ infinitely far away from the current configuration $c_{obj}$ in the C-space, two distinct sets $C_{free}^{obj}, C_{free}^{\infty} \subset C_{free}$ are defined as follows. $C_{free}^{obj}$ is the largest subset of $C_{free}$ within which every configuration $c$ is reachable from the current configuration $c_{obj}$ through a collision-free path; similarly, $C_{free}^{\infty}$ is the largest subset of $C_{free}$ within which every $c$ can reach $c_{\infty}$:

\begin{aligned} C_{free}^{obj} & = \{c \in C_{free} ∣ connected (c, c_{obj})\}, \\ C_{free}^{\infty} & = \{c \in C_{free} ∣ connected (c, c_{\infty})\} \end{aligned}

(2)

The caging condition is met if and only if:

\begin{cases} C_{free}^{obj} \neq \emptyset \\ C_{free}^{obj} \cap C_{free}^{\infty} = \emptyset \end{cases}

(3)

where the first criterion ensures the existence of a caging configuration and the second condition guarantees that no feasible path connects $C_{free}^{obj}$ and $C_{free}^{\infty}$. This traditional concept of caging serves as the foundation for our proposed Caging in Time framework.
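As a rough illustration of the condition in equation (3), the sketch below tests it on a discretized free C-space. It is only an approximation under stated assumptions: the C-space is a 2D occupancy grid, cells are 4-connected, and any cell on the grid border stands in for $c_{\infty}$. The function name `is_caged` and the grid encoding are hypothetical and are not taken from the cited works.

```python
from collections import deque


def is_caged(free, start):
    """Grid approximation of the object-closure test in equation (3).

    free  : 2D list of bools; True marks a collision-free configuration.
    start : (row, col) index of the current configuration c_obj.
    Assumption: any cell on the grid border acts as a proxy for c_inf.
    """
    rows, cols = len(free), len(free[0])
    r0, c0 = start
    if not free[r0][c0]:
        return False  # first criterion fails: C_free^obj is empty

    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if r in (0, rows - 1) or c in (0, cols - 1):
            return False  # an escape path to the border exists
        # Interior cell, so all four neighbours are inside the grid.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if free[nxt[0]][nxt[1]] and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True  # second criterion holds: no path reaches the border


# A ring of blocked cells around the centre cages the object.
closed = [
    [True, True, True, True, True],
    [True, False, False, False, True],
    [True, False, True, False, True],
    [True, False, False, False, True],
    [True, True, True, True, True],
]
# Removing one blocked cell opens an escape path.
opened = [row[:] for row in closed]
opened[1][2] = True
```

A flood fill of this kind scales poorly with dimension, so it is meant only to make the two criteria concrete, not to suggest a practical verification method.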

Figure 4.

An illustration of the traditional caging condition. Left: The green region and the red region represent $C_{free}^{obj}$ and $C_{free}^{\infty}$ , respectively. The object is caged in this case since $C_{free}^{obj}$ and $C_{free}^{\infty}$ are not connected. Right: A non-caging scenario where $C_{free}^{obj}$ and $C_{free}^{\infty}$ become connected. The object may escape following the trajectory shown by the dashed line.

3.2. Generalized representations

To enable the derivation of the Caging in Time theory, we now generalize the representations and notations from configuration spaces to state spaces. This allows for more comprehensive descriptions of manipulation tasks and accommodates various dynamic scenarios and uncertainties, as opposed to the traditional quasi-static configuration-based caging definitions.

We denote the state space of an object by $S_{obj}$ , which encompasses a wider range of properties than $C_{obj}$ . The state of an object at time t is represented by $q_{t} \in S_{obj}$ . Depending on the specific manipulation task, q_t may include not only configuration (e.g., position and orientation) but also velocity, acceleration, or other relevant properties.

Furthermore, often in real-world systems, the state of the object at time t is not exactly known due to perception uncertainties or modeling simplifications. To account for such uncertainties, we define the Potential State Set (PSS) of an object to be the set consisting of all possible states of the object at time t, denoted by $Q_{t} \subset S_{obj}$ .

4. Caging in Time

In this section, we will formally introduce the definition of our proposed Caging in Time theory. With as few as only a single robot interacting with the object, Caging in Time verifies that the object’s state remains caged while being manipulated, thereby enabling open-loop manipulation and guaranteeing the desired movement of the object. As an essential component of our theory, we need to predict the bounded motions of the object, which involves propagating the PSS over time, as will be introduced in Section 4.1. Then, we will formally define Caging in Time in Section 4.2.

For more intuitive illustrations of the proposed concepts, figures in this section are sketched in 2D spaces. However, it is important to emphasize that q_t can take any form and dimension as required by the specific manipulation task. The 2D visualizations are chosen for visual simplicity and do not limit the generality of the state space concept.

4.1. Propagation of PSS

At time t, for each specific state of the object in the PSS $q_{t} \in Q_{t}$ , we denote the set of all motions allowed to transit the state of the object by $V_{q_{t}}$ . We define the notation $T_{x} M$ to represent the tangent space of a manifold $M$ at a point $x \in M$ in this manifold. Note that $V_{q_{t}} \subset T_{q_{t}} S_{obj}$ , that is, $V_{q_{t}}$ lies in the tangent space of the object’s state space at the state q_t. Based on tangent bundle theory (Jost, 2011), we define the motion bundle at time t in equation (4), which is the set consisting of all possible state-motion pairs of the object:

{Q V}_{t} ≔ \{(q_{t}, v_{t}) ∣ q_{t} \in Q_{t}, v_{t} \in V_{q_{t}}\}

(4)

We need to define a propagation function $π : {Q V}_{t} \mapsto S_{obj}$ to predict the state of the object $q_{t + 1}$ at the next time step given a possible state $q_{t}$ and a motion $v_{t}$, that is, $q_{t + 1} = π (q_{t}, v_{t})$.

However, the above propagation function only predicts a single state for the next time step. In our Caging in Time framework, we need to know all the possible states propagated from the previous step, that is, the PSS $Q_{t + 1}$ . To this end, we extend the propagation function π to another function $Π : 2^{Q V_{t}} \mapsto 2^{S_{obj}}$ whose domain is the power set of the motion bundle ${Q V}_{t}$ . As illustrated in Figure 5, it is used to obtain the entire set of all possible states at time t + 1 by propagating every possible q_t:

\begin{aligned} Q_{t + 1} & = Π ({Q V}_{t}) \\ & = \{q_{t + 1} = π (q_{t}, v_{t}) ∣ (q_{t}, v_{t}) \in {Q V}_{t}\} \end{aligned}

(5)

Figure 5.

An illustration of the PSS propagation. Left: The propagation for a single state $q_{t} \in Q_{t} \subset S_{obj}$ . With a set of possible motions $V_{q_{t}}$ , q_t can be propagated to a set of different states (the orange region). Right: The propagation from the entire PSS $Q_{t}$ to $Q_{t + 1}$ by propagating all points in $Q_{t}$ . The different colors of regions in $Q_{t + 1}$ illustrate the propagation from multiple different $q_{t} \in Q_{t}$ .
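Equation (5) can be made concrete with a finite, sampled PSS. The following sketch assumes that states are hashable tuples, that the allowed motions at each state are given as a finite list, and that $π$ is a plain function. The names `propagate_pss`, `unit_motions`, and `add` are hypothetical, and the integer-grid example is a toy model rather than a physical one.

```python
def propagate_pss(pss, motions_at, step):
    """Sampled version of equation (5).

    pss        : finite set of possible states Q_t
    motions_at : function q -> iterable of allowed motions V_q
    step       : single-state propagation function pi(q, v)
    Returns the set Q_{t+1} of all states reachable in one step.
    """
    return {step(q, v) for q in pss for v in motions_at(q)}


# Toy assumption: 2D integer positions; the object may stay put
# or move one cell along either axis at every step.
def unit_motions(q):
    return [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def add(q, v):
    return (q[0] + v[0], q[1] + v[1])


Q0 = {(0, 0)}
Q1 = propagate_pss(Q0, unit_motions, add)
Q2 = propagate_pss(Q1, unit_motions, add)
```

Without any constraint on the motions, the sampled PSS grows at every step, which is exactly the growth that the robot actions in Section 4.2 are meant to counteract.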

4.2. Caging in Time

Unlike the traditional caging introduced in Section 3.1, our Caging in Time framework only requires one robot to manipulate the object. The cage is formed over time by switching the single robot to a different configuration and interacting with the object differently at each time step. Figure 6 shows an example where a robot pusher can push an object from various possible initial locations relative to the object (marked by the gray bars on the right). At each time step, by predicting the object’s possible motion via the propagation function Π defined in Section 4.1, the robot needs to figure out which candidate push to select (e.g., the green bar in the right figure of Figure 6), to prevent the object from escaping. As such, the object can always be confined to a bounded region as the robot pusher switches to different positions over time and pushes the object in different directions.

Figure 6.

A 2D pushing example showing how the PSS propagation is influenced by a robot action. Left: A scenario where $Q_{t}$ propagates to $Q_{t + 1}$ without robot intervention. $Q_{t + 1}$ will exceed the cage region $S_{cage}^{t + 1}$ . Right: With a well-selected robot push starting from the pose shown by the green bar towards the center of $S_{cage}^{t + 1}$ , $Q_{t + 1}$ will deform to a different shape and stay inside $S_{cage}^{t + 1}$ . In this example, the workspace and the object’s state space (i.e., 2D position of the object) share the same space $R^{2}$ , so we overlay the robot pushes (gray bars) onto the object’s state space for visualization.

Specifically, for every time step t, we want to confine the object’s state to be always inside a region $S_{cage}^{t} \subset S_{obj}$ in its state space, and we term this region the “cage region.” The “cage region” $S_{cage}^{t}$ does not have to be stationary; it can deform and displace depending on the manipulation task it entails. Moreover, a set of candidate robot actions $U$ needs to be predefined based on the context of the task. For example, in the planar pushing manipulation scenario shown in Figure 6, $U$ can be a set of robot-pushing actions defined by different starting positions and pushing directions of the robot pusher. Such robot actions will affect the possible motion of the object. For example, while the robot is interacting with the object through contact, the object will not be able to move in the directions that penetrate the robot. In other words, for a specific object state $q_{t} \in S_{obj}$ , the allowed object motions $V_{q_{t}}$ will be influenced by the robot action $u \in U$ . We use a function U to represent such influence by

V_{q_{t}} = U (q_{t}, u) \subset T_{q_{t}} S_{obj}

(6)

Applying a robot action u may cause the object to move in a direction towards the outside of the cage region $S_{cage}^{t + 1}$ at the next time step. To guarantee that the object’s state is always inside the cage, for each time step t = 0, 1, …, T − 1, there must exist some robot action $u_{t} \in U$ that causes all possible states of the object to be contained in $S_{cage}^{t + 1}$ after the robot action is executed. As such, the definition of Caging in Time is formally given as follows.

Definition 4.1

For a time-varying cage $S_{cage}^{t}$ in an object’s state space where t = 1, …, T, Caging in Time is achieved if and only if: ∀t = 0, 1, …, T − 1, $\exists u_{t} \in U$ such that

\begin{aligned} {Q V}_{t} & = \{(q_{t}, v_{t}) ∣ q_{t} \in Q_{t}, v_{t} \in U (q_{t}, u_{t})\}, \\ Q_{t + 1} & = Π ({Q V}_{t}) \subset S_{cage}^{t + 1} \end{aligned}

(7)

It is worth noting that Caging in Time does not guarantee the existence of such robot actions $\{u_{t}\}_{t = 0}^{T - 1}$ to cage the object. However, if a sequence of robot actions $\{u_{t}\}_{t = 0}^{T - 1}$ is verified to meet the Caging in Time condition, it is guaranteed that the object’s state always lies inside the time-varying cage $S_{cage}^{t}$ .

The process of determining whether Caging in Time is achieved is detailed in Algorithm 1. When the Caging in Time condition is met, the object is guaranteed to be caged inside a cage region $S_{cage}^{t}$ through a certain sequence of robot actions in an open-loop manner. If we apply this action sequence, the object will never escape even without any sensing feedback. As aforementioned, the cage region $S_{cage}^{t}$ does not have to be static; it is allowed to change over time. For example, if $S_{cage}^{t}$ is moved to follow a certain trajectory in the state space of the object over time, the object being caged in time will be able to follow the same trajectory of $S_{cage}^{t}$ .
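To make the condition in equation (7) concrete, the sketch below checks a given action sequence against a sampled PSS. It is not a reproduction of Algorithm 1: it assumes finite sets of states and motions, represents each cage $S_{cage}^{t}$ as a membership predicate, and omits the action feasibility check. All names, and the 1D toy model that follows, are hypothetical.

```python
def verify_caging_in_time(pss0, actions, cages, allowed_motions, step):
    """Check the condition of Definition 4.1 for a given action sequence.

    pss0            : sampled initial PSS Q_0
    actions         : [u_0, ..., u_{T-1}]
    cages           : cages[t](q) is True iff q lies in S_cage^t, t = 0..T
    allowed_motions : function (q, u) -> finite sample of U(q, u)
    step            : single-state propagation function pi(q, v)
    Returns (True, None) on success, or (False, t) for the first
    time step t at which some possible state leaves the cage.
    """
    pss = set(pss0)
    for t, u in enumerate(actions):
        pss = {step(q, v) for q in pss for v in allowed_motions(q, u)}
        if not all(cages[t + 1](q) for q in pss):
            return False, t + 1
    return True, None


# Toy 1D model (an assumption for illustration only): a "right" push
# sweeps every state left of 0 to 0, while states on the far side may
# drift one cell further away; "left" is the mirror image.
def toy_motions(q, u):
    if u == "right":
        return [-q] if q < 0 else [0, 1]
    return [-q] if q > 0 else [0, -1]


def toy_step(q, v):
    return q + v


static_cage = [lambda q: abs(q) <= 2] * 5
alternating = verify_caging_in_time(
    {-1, 0, 1}, ["right", "left", "right", "left"],
    static_cage, toy_motions, toy_step)
one_sided = verify_caging_in_time(
    {-1, 0, 1}, ["right", "right", "right"],
    static_cage, toy_motions, toy_step)
```

In this toy model, alternating the push side keeps every possible state within the cage, whereas pushing from one side only lets the unconstrained side drift out after two steps.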

Verification of action feasibility: In practice, the robot action u_t may not be realizable due to the speed limit or reachability of the robot, which varies across robot platforms. For example, between the executions of consecutive actions u_t−1 and u_t, the object may keep moving unless certain assumptions, such as quasi-static motion, hold. This requires the robot to switch to the next action execution fast enough, which may exceed the hardware’s speed limit. To this end, at Line 3 of Algorithm 1, to achieve Caging in Time under a real-world setting, it is necessary to verify the feasibility of the action u_t by taking into account the capability of the actual hardware.

5. Quasi-static tasks

In this section, we instantiate the Caging in Time theory on a quasi-static planar pushing problem. To facilitate the instantiation, we also develop relevant tools for propagating the PSS of the object in a 2D space and generating open-loop robot pushing actions to cage the object. With the Caging in Time framework applied to the planar pushing problem, we can enable a single robot to push an object of unknown shape to follow certain trajectories in an open-loop manner, without requiring sensory feedback, exact geometric information, or any other physical properties of the object. This is an instantiation of Caging in Time with a geometry-based cage.

5.1. Problem statement

The robot is tasked to push an object on a 2D plane. To generalize over different shapes of the object without requiring exact geometric modeling, we represent the object’s geometry by a simplified bounding circle with radius r, that is, $A_{obj} (q) = \{p \in R^{2} ∣ ‖ p - q ‖ \leq r\}$ where q denotes the position of the object. The radius r is chosen based on the size of the object such that the bounding circle covers the actual shape of the object. The robot is equipped with a fixed-length line pusher, represented by a line segment $L$ in the 2D space. The object is considered to be in collision with the line pusher when $L$ intersects the bounding circle of the object, that is, the distance from q to the line segment $L$ is not greater than the radius r.
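The collision criterion above reduces to a point-to-segment distance test. The following minimal sketch assumes that the pusher $L$ is given by its two endpoints; the function names and argument layout are hypothetical.

```python
import math


def point_segment_distance(q, a, b):
    """Euclidean distance from point q to the segment with endpoints a, b."""
    ax, ay = a
    bx, by = b
    qx, qy = q
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(qx - ax, qy - ay)
    # Parameter of the orthogonal projection, clamped onto the segment.
    s = ((qx - ax) * dx + (qy - ay) * dy) / seg_len_sq
    s = max(0.0, min(1.0, s))
    return math.hypot(qx - (ax + s * dx), qy - (ay + s * dy))


def in_collision(q, r, a, b):
    """True if the bounding circle (centre q, radius r) touches the pusher."""
    return point_segment_distance(q, a, b) <= r
```

Clamping the projection parameter to [0, 1] handles the case where the closest point on the pusher is one of its endpoints.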

While the configuration space of a quasi-static 2D object is SE(2), requiring both the position and orientation of the object, our use of a bounding circle allows us to simplify the object’s state space to its position only: $S_{obj} \subset R^{2}$ . As such, the state of the object becomes $q_{t} = (x_{t}, y_{t}) \in S_{obj}$ . Therefore, both the PSS $Q_{t}$ and the cage $S_{cage}^{t}$ can be conceptualized as regions within this 2D plane. In this specific planar pushing task, we further define the Potential Object Area (POA), $P_{t} \subset R^{2}$ , to be the set consisting of all the points in the workspace that are possibly occupied by the object’s geometric shape:

P_{t} ≔ ⋃_{q_{t} \in Q_{t}} A_{obj} (q_{t})

(8)

The “cage region” $S_{cage}^{t} \subset S_{obj}$ is then defined so as to cage the object’s geometric coverage $P_{t}$. Specifically, at each time t, we require the object’s state to be always caged such that the object’s geometric shape (i.e., the bounding circle), when at this state, is inside a manually set circle $S_{A}^{t}$. The circle $S_{A}^{t}$ is centered at $(x_{t}^{c}, y_{t}^{c})$ and has a constant radius R. As such, the “cage region” $S_{cage}^{t}$ in the object’s state space is also a circle that shares the same center with $S_{A}^{t}$ and has a radius of R − r, that is, $S_{cage}^{t} = \{(x, y) \in R^{2} ∣ {(x - x_{t}^{c})}^{2} + {(y - y_{t}^{c})}^{2} \leq {(R - r)}^{2}\}$. The radius of $S_{cage}^{t}$, which is R − r, is called the size of the cage.

The robot action is characterized by a scalar angle θ, as shown on the left of Figure 7. The robot will first place the center of the line pusher in a starting position determined by θ, which is always at the boundary of the circle $S_{A}^{t}$ and can be calculated by $(x_{t}^{c} + R \cos θ, y_{t}^{c} + R \sin θ)$. The line pusher will be oriented to be tangent to the boundary circle of $S_{A}^{t}$. The robot will then move the pusher in the direction towards the center of $S_{A}^{t}$ with a fixed distance d_push to push the object. We assume that there are K candidate robot actions, that is, $U = \{θ_{k}\}_{k = 1}^{K}$. The value of each action is obtained through θ_k = 2kπ/K such that the starting positions of these K candidate actions will evenly surround $S_{A}^{t}$.

Figure 7.

Representation of robot actions and heuristic evaluation for planar pushing. Left: The green dashed circle is $S_{A}^{t}$ . Each of the K = 12 candidate actions (gray bars) is characterized by an angle θ_k. The robot places the line pusher $L$ (red) at θ_k and pushes towards the circle center with a distance d_push. Right: Heuristic evaluation, where S_out (pink area) is the part of $P_{t}$ (the POA) behind the pusher, and d_out (black arrow) is the maximum distance to the boundary of $P_{t}$ .
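The action parameterization and the cage test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `candidate_actions`, `pusher_start`, and `in_cage` are hypothetical, and the pusher is reduced to its center point and inward push direction.

```python
import math


def candidate_actions(K):
    """Angles theta_k = 2*k*pi/K for k = 1..K (the candidate set U)."""
    return [2.0 * k * math.pi / K for k in range(1, K + 1)]


def pusher_start(center, R, theta):
    """Start position of the pusher center on the boundary of S_A^t,
    plus the unit direction pointing towards the circle center."""
    cx, cy = center
    start = (cx + R * math.cos(theta), cy + R * math.sin(theta))
    direction = (-math.cos(theta), -math.sin(theta))
    return start, direction


def in_cage(q, center, R, r):
    """True if object position q lies in S_cage^t, the disk of radius R - r."""
    return math.hypot(q[0] - center[0], q[1] - center[1]) <= R - r
```

Checking `in_cage` for every state in the PSS corresponds to the condition that the PSS is contained in the cage region.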

As defined in Section 4, we require the object’s state to be always confined inside the cage region $S_{cage}^{t}$, that is, the PSS of the object satisfies $Q_{t} \subset S_{cage}^{t}$. Then, for the object to follow a certain trajectory $T = \{(x_{1}^{d}, y_{1}^{d}), \dots, (x_{t}^{d}, y_{t}^{d}), \dots, (x_{T}^{d}, y_{T}^{d})\}$ represented by a sequence of waypoints (i.e., desired positions) in the object’s state space, we just need to move the center of $S_{cage}^{t}$, which is $(x_{t}^{c}, y_{t}^{c})$, along the same trajectory. As the object is always confined inside the moving $S_{cage}^{t}$ for every time step t = 1, …, T, the object’s position is guaranteed to follow the trajectory $T$ with a positional error no greater than the cage size R − r.

Next, we develop an algorithm for solving the planar pushing problem based on the proposed Caging in Time theory. A high-level description of the developed algorithm is detailed in Algorithm 2. At each time step t, an open-loop robot action $u_{t} \in U$ to cage the object in time will be found, as will be detailed in Section 5.3; and then the PSS of the object will be propagated according to the robot action u_t, which will be discussed in Section 5.2. If the propagated PSS is always contained in the cage region $S_{cage}^{t}$ , the object being pushed will follow the desired trajectory along with the moving cage; otherwise, it is possible that the object is outside the cage, indicating a failure of the task.

5.2. Unknown object shape and its motion

For planar pushing, we represent the motion of the object by the displacement of the position of the object between adjacent time steps, denoted as v_t = Δq_t = (Δx_t, Δy_t). Then the propagation function π, initially defined in Section 4.1, for a single object’s configuration and motion as inputs becomes π(q_t, v_t) = π(q_t, Δq_t) = q_t + Δq_t, which is used to propagate the object’s configuration q_t+1 at the next step.

For a specific object’s configuration q_t, the set of all possible motions $V_{q_{t}}$ is modeled by the interaction between the object and the robot’s line pusher $L$ while the action u_t is being executed, as represented by the function U defined in equation (6). The pusher $L$ approaches the object with a straight-line distance d_push while u_t is being executed. If the distance between q_t and the line pusher $d (q_{t}, L)$ is greater than r + d_push, the object’s bounding circle will not collide with $L$ . The object will not move at all, and thus $V_{q_{t}} = \{(0,0)\}$ . However, if the object’s bounding circle collides with $L$ , indicating the object is likely displaced by $L$ , we denote d_con ≤ d_push as the moving distance of $L$ after contacting the bounding circle. As shown in Figure 8, we can derive a bound for this displacement using the friction theory and Peshkin Distance (Peshkin and Sanderson, 1988) as follows:

V_{q_{t}} = U (q_{t}, u_{t}) = \begin{cases} \{(0,0)\} & \text{if } d (q_{t}, L) > r + d_{push} \text{ or } u_{t} \text{ is None} \\ \{v_{t} = R (u_{t}) \begin{pmatrix} Δ x \\ Δ y \end{pmatrix} ∣ \frac{{(Δ x)}^{2}}{d_{con}^{2}} + \frac{{(Δ y)}^{2}}{{(\frac{d_{con}}{2})}^{2}} \leq 1, Δ x \leq 0\} & \text{otherwise} \end{cases}

(9)

where R(u_t) ∈ SO(2) is the rotation matrix of the angle u_t. The details of the derivation are given in Appendix 9.1.
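To make equation (9) concrete, the following sketch samples the semi-ellipse on a regular grid and rotates each sample by the action angle. It is an illustration only: the name `motion_set` and the grid spacing `step` are assumptions, and the contact distance d_con is taken as a given input rather than computed from the pusher geometry.

```python
import math


def motion_set(theta, d_con, step):
    """Grid samples of the motion set in equation (9).

    Local frame: semi-axis d_con along x, d_con/2 along y, and dx <= 0.
    Each local sample is rotated by the action angle theta.
    """
    if d_con <= 0:
        return [(0.0, 0.0)]  # no contact: the object does not move
    c, s = math.cos(theta), math.sin(theta)
    n = int(math.ceil(d_con / step))
    motions = []
    for i in range(-n, 1):  # enforce dx <= 0
        for j in range(-n, n + 1):
            dx, dy = i * step, j * step
            inside = (dx / d_con) ** 2 + (dy / (d_con / 2.0)) ** 2 <= 1.0 + 1e-9
            if inside:
                motions.append((c * dx - s * dy, s * dx + c * dy))
    return motions
```

Because the ellipse lies inside the circle of radius d_con, every sampled displacement has a norm of at most d_con, consistent with the bound illustrated in Figure 8.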

Figure 8.

The illustration of the bounds for the object’s displacement. The maximum displacement of the object equals d_con. Therefore, $Q_{t}$ containing only the central gray point can propagate into a semi-circle $Q_{t + 1}$ (the semi-circle region in dark green) with a radius equal to d_con (i.e., ‖Δq_t‖ = d_con). Correspondingly, the POA $P_{t}$ (the gray bounding circle) with radius r will evolve into the larger translucent green area $P_{t + 1}$ . With analysis based on Peshkin Distance, the possible displacement of the object will be further constrained such that Q_t+1 becomes a semi-ellipse, as shown on the right side.

The PSS $Q_{t}$ and the set of potential motions $V_{q_{t}}$ can both include an infinite number of points in the continuous space, making the propagation of PSS intractable. To this end, we implement the propagation of PSS in a discretized manner, similar to processing a binary image. Specifically, we construct a binary image $I_{t} \in R^{H \times W}$ at each time step t with height H and width W, centered at $(x_{t}^{c}, y_{t}^{c})$, that is, the center of the cage. We use $I_{t} (x, y)$ to denote the intensity value of the pixel corresponding to the point (x, y). An intensity value of 1 signifies the presence of the point (x, y) in the PSS of the object, and 0 indicates the absence, that is, $I_{t} (x, y) = 1 \leftrightarrow (x, y) \in Q_{t}$ and $I_{t} (x, y) = 0 \leftrightarrow (x, y) \notin Q_{t}$. Figure 9 shows several example images of $I_{t}$ enhanced with other features including visuals of the robot actions, the cage, the object’s POA, etc. In the figure, the white pixels (surrounded by the green pixels which are the POA of the object) correspond to the object’s PSS, and the pixels in other colors indicate the absence of PSS.

Figure 9.

An example showing how the PSS (white region) and the POA (green region) evolve in the discretized space relative to a moving cage. The moving direction of the cage is shown by the yellow arrows. Left: Without robot intervention, the PSS and the POA remain unchanged. However, they will translate to the left since the cage is moving towards the right. Right: If the POA is predicted to be moving outside of $S_{A}^{t}$ (the dark green circle), a selected line pusher (red) will interact with the POA from the left bottom to deform it to a different shape.

With this discretized representation, the PSS of the object is implemented by a finite set of pixels on $I_{t}$. We also apply a similar discretization to the possible motion set $V_{q_{t}}$. As such, to propagate the PSS in the discretized space, we simply enumerate every pixel in the PSS and propagate it with every motion in the discretized finite set $V_{q_{t}}$. As mentioned, $I_{t}$ is always centered on the moving cage, so we need to account for an offset of $(x_{t}^{c}, y_{t}^{c})$ whenever we convert between a pixel location on $I_{t}$ and its actual coordinates in the original continuous space. Algorithm 3 outlines the discretized implementation of the PSS propagation.
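The pixel-wise enumeration described above can be sketched with NumPy as follows. This is a simplified illustration and not a reproduction of Algorithm 3: the function name `propagate_pss` is hypothetical, the `contact` mask (pixels whose bounding circle would be hit by the pusher) is assumed to be precomputed, one shared set of integer pixel offsets is used for all contacted pixels even though d_con generally differs per pixel, and the shift of the image with the moving cage is omitted.

```python
import numpy as np


def propagate_pss(pss, contact, offsets):
    """Propagate a binary PSS image by one push.

    pss     : HxW boolean image, True where the state is in the PSS
    contact : HxW boolean mask of states that the pusher would displace
    offsets : iterable of integer (d_row, d_col) pixel motions
    Returns the next PSS image and a flag telling whether any propagated
    state left the image (which should be treated as a caging failure).
    """
    pss = np.asarray(pss, dtype=bool)
    contact = np.asarray(contact, dtype=bool)
    H, W = pss.shape
    nxt = pss & ~contact  # untouched states stay where they are
    rows, cols = np.nonzero(pss & contact)
    escaped = False
    for dr, dc in offsets:
        r2, c2 = rows + dr, cols + dc
        ok = (r2 >= 0) & (r2 < H) & (c2 >= 0) & (c2 < W)
        escaped = escaped or bool((~ok).any())
        nxt[r2[ok], c2[ok]] = True
    return nxt, escaped
```

The offset `(0, 0)` should be included in `offsets` whenever the motion set contains the zero displacement, as it does in equation (9).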

5.3. Cage the push in time

At each time step t, we need to select one robot pushing action u_t from all the candidates $U$. A naive approach is to simulate every action in $U$, propagate the PSS for every action, and then select one of the actions that keep the PSS inside the cage $S_{cage}^{t + 1}$. However, propagating the PSS can be computationally expensive, especially when the number of candidate actions K is large. Therefore, we propose a heuristic-based strategy that does not require propagating the PSS for every action being considered, as detailed in Algorithm 4. It is worth mentioning that the heuristic-based strategy is developed to make action selection more efficient, alleviating the burden of brute-force search over all possible actions. It does not conflict with the completeness of the definition of Caging in Time. If the heuristically selected pushing actions are verified to meet the Caging in Time condition, the object’s state is still guaranteed to be always caged. However, this work does not focus on developing an optimal strategy for action selection, and we admit that a better strategy might be developed in the future.

As the cage moves from $S_{cage}^{t}$ in the previous time step to $S_{cage}^{t + 1}$ along with the desired trajectory, if the PSS is still inside the moving cage, that is, $Q_{t} \subset S_{cage}^{t + 1}$ , no robot action will be needed and Algorithm 4 will return a None. However, if $Q_{t} ⊄ S_{cage}^{t + 1}$ , it indicates that the PSS of the object will go outside the cage if no robot action is taken. In this case, we have to select one robot action to move and deform the PSS to keep the object caged.

Recall that in Section 5.1, each candidate action in $U$ is represented by an angle θ_k, which determines the starting pose of the line pusher. For each candidate action in $U$ with an index k = 1, …, K, we will evaluate a heuristic score by $h^{k} = λ_{1} S_{out}^{k} + λ_{2} {(d_{out}^{k})}^{2}$ where λ₁ and λ₂ are weighting factors. As shown on the right of Figure 7, $S_{out}^{k}$ is the POA area behind the line pusher when we place the pusher in a starting pose determined by the angle θ_k, and $d_{out}^{k}$ is the farthest distance from behind the pusher to the POA boundary. It is worth noting that calculating the heuristics h^k for each action does not require the propagation of PSS, that is, the PSS remains $Q_{t}$ . Intuitively, a large h^k indicates that a large portion of the PSS will go outside the cage if the kth action is not taken.

The top five candidate actions with the highest heuristic scores, which compose a set $U^{'}$ , will be further considered for selecting the best one. From these five candidates in $U^{'}$ , the one closest to the previous action u_t−1 (i.e., the angle of the last push) will be chosen to propagate the PSS to the next step. This mechanism reduces the reorientation of the line pusher at adjacent time steps, which can largely facilitate motion efficiency when implemented on a real robot manipulator. Under the quasi-static setting, we know that the object does not move when not interacting with the robot pusher. This assumption ensures that the physical robot can switch to the subsequent pushing action in time, without requiring a sufficiently fast speed of operation. Therefore, the verification of action feasibility (Line 3 of Algorithm 1) is always satisfied. Figure 10 demonstrates an overall process of how the PSS and POA evolve as the cage moves, with strategically selected pushing actions ensuring the object remains caged throughout the task.
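The two-stage selection described above (score every candidate, keep the top five, then pick the one closest to the previous push angle) can be sketched as below. The function names are hypothetical, and the per-action quantities `s_out` and `d_out` are assumed to be computed elsewhere from the POA image; the default weights are placeholders, not values from the paper.

```python
import math


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def select_action(thetas, s_out, d_out, prev_theta, lam1=1.0, lam2=1.0, top=5):
    """Pick a push angle using h^k = lam1 * S_out^k + lam2 * (d_out^k)^2."""
    scores = [lam1 * s + lam2 * d * d for s, d in zip(s_out, d_out)]
    # Indices of the `top` highest-scoring candidates (the set U').
    ranked = sorted(range(len(thetas)), key=lambda k: scores[k], reverse=True)[:top]
    if prev_theta is None:
        return thetas[ranked[0]]  # no previous push: take the best score
    return min((thetas[k] for k in ranked),
               key=lambda th: angular_distance(th, prev_theta))
```

The selected action still has to be verified by propagating the PSS and checking that it stays inside the cage, since the heuristic alone does not provide the caging guarantee.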

Figure 10.

The propagation of PSS (white region) and POA (green region) when the cage is following a circular path. Each image is centered at the center of the cage. The yellow arrow in each image represents the motion direction of the cage. In this example, K = 32 candidate actions are used, represented by the gray bars. With a different pushing action selected (red bar) for each step, the object is always caged in time, and the POA always stays inside $S_{A}^{t}$ (the dark green circle).

6. Dynamic tasks

In this section, we extend our Caging in Time theory from quasi-static tasks to dynamic scenarios where we instantiate the same framework on a challenging dynamic ball balancing problem. Tools are developed for propagating the ball’s PSS of its position and velocity, as well as for generating open-loop robot actions to keep the ball balanced while the robot end-effector follows different trajectories without requiring any sensory feedback. This is an instantiation of Caging in Time with an energy-based cage.

6.1. Problem statement

The robot is tasked with balancing a rolling object on a tilting plate (end-effector) while the plate follows different trajectories in the workspace. The plate is an n-dimensional flat surface in a workspace of dimension n + 1, where n = 1 or 2. Each dimension of the plate has a size ranging from −l to l. The object has mass m, radius r_b, an estimated rolling friction coefficient μ_r, and an estimated moment of inertia I_b. We assume that the object only rolls on the plate without slipping. Figure 11 illustrates example setups where n = 1 and 2. The position of the plate is denoted as $x_{p} \in R^{n + 1}$ and its translational acceleration as ${\ddot{x}}_{p} \in R^{n + 1}$ . Its orientation is characterized by a tilt angle vector $θ \in R^{n}$ . While only θ and ${\ddot{x}}_{p}$ are initially defined in the world frame (inertial), all other definitions and models are made in the plate frame (non-inertial). In this case, we introduce the ball’s non-inertial acceleration $a_{p} \in R^{n + 1}$ and gravity $g_{θ} \in R^{n + 1}$ induced by ${\ddot{x}}_{p}$ and θ so that all the calculations can be done in the plate frame, as shown in Figure 11.

Figure 11.

Illustration of the dynamic ball balancing system for plate dimension n = 1 and n = 2. Left: Physical setup when n = 1. x_t and ${\dot{x}}_{t}$ represent the ball’s position and velocity, respectively. The plate’s tilt angle is denoted by θ. Accelerations acting on the ball include gravity g_θ, a_p induced by the plate’s acceleration ${\ddot{x}}_{p}$, and rolling friction a_μ. Right: Physical setup when n = 2, where the notation is the same as in the n = 1 case. Note that θ here is the tilt angle vector that contains two angles.

Although our framework employs a pure rolling assumption, this choice leverages a physical insight: rolling friction coefficients are typically smaller than sliding friction coefficients, so pure rolling introduces higher dynamic uncertainty. Caging in Time, which is designed around the PSS, capitalizes on this property: by handling the more uncertain rolling dynamics, it inherently encompasses all possible motion states, including sliding behaviors and transitional contact modes, without requiring explicit modeling of each interaction type.

The state of the object at time t is defined as $q_{t} = (x_{t}, {\dot{x}}_{t}) \in R^{2 n}$ , where $x_{t} \in R^{n}$ is the position vector of the object relative to the center of the plate, and ${\dot{x}}_{t} \in R^{n}$ is its velocity vector. In this dynamic system, the state space is denoted as $S_{obj} \subset R^{2 n}$ , which encompasses all the possible states that the object can have in the plate frame.

To represent the distribution of states in a continuous way, we define the probability density of the object in state q_t as $p (q_{t}) = p (x_{t}, {\dot{x}}_{t})$ . The PSS $Q_{t}$ is represented as the set of all possible states with non-zero probability density:

Q_{t} = \{q_{t} \in S_{obj} ∣ p (q_{t}) > 0\}

(10)

After defining PSS, a “cage region” is needed to constrain these states. Unlike quasi-static tasks where the cage is defined geometrically (Section 5.1), dynamic systems require a different approach because their state space encompasses both configuration and velocity. For such systems, energy provides a powerful and intuitive framework for representing and constraining motion-inclusive states. Here, we define the cage region in the state space based on system energy:

S_{cage}^{t} = \{q_{t} \in R^{2 n} ∣ E (q_{t}, g_{θ}, a_{p}) < E_{\max} (g_{θ}, a_{p})\}

(11)

The total energy of the system E(q_t, g_θ, a_p), for a given state $q_{t} = (x_{t}, {\dot{x}}_{t})$ is expressed as:

E (q_{t}, g_{θ}, a_{p}) = E_{k} ({\dot{x}}_{t}) + E_{v e} (x_{t}) + E_{p} (x_{t}, g_{θ}, a_{p})

(12)

where E_k, E_ve, and E_p represent the kinetic energy, virtual elastic energy, and potential energy in the plate frame.

To establish an upper limit on the allowable energy that ensures that the object cannot “escape” from the plate, we consider an extreme scenario where the object is statically positioned at the edge of the plate as viewed from the plate frame. This maximum allowable energy, E_max, is defined as:

E_{\max} (g_{θ}, a_{p}) = E_{v e} (l) + E_{p} (l, g_{θ}, a_{p})

(13)

If the total energy of the object exceeds E_max at any time step, it would have enough energy to move beyond the plate’s boundaries, even if its current position is within the plate. By maintaining the energy of the object under the condition E(q_t, g_θ, a_p) < E_max(g_θ, a_p), we ensure that it remains within the plate for all possibilities throughout the task, effectively creating an energy-based “cage.”

This energy-based definition of the cage region can capture both position and velocity constraints and establish boundaries that prevent the object from leaving the plate to effectively cage it in the state space and enable robust dynamic object manipulation.
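As a concrete reading of equations (11) to (13) for n = 1, the sketch below tests whether a single state lies inside the energy-based cage. The individual energy terms are assumptions made only for illustration, since their exact forms are not given in this section: a rolling kinetic energy with I_b = `inertia_factor` · m · r_b², a quadratic virtual spring with stiffness `k_ve`, and a potential −m · `a_plane` · x for a uniform net in-plane acceleration `a_plane` (standing in for the combined effect of g_θ and a_p). All names and default values are hypothetical.

```python
def total_energy(x, v, a_plane, m=0.057, k_ve=1.0, inertia_factor=2.0 / 3.0):
    """E = E_k + E_ve + E_p for one state (x, v) in the plate frame, n = 1."""
    # Translation plus rolling without slipping (assumed I_b = factor * m * r_b^2).
    e_k = 0.5 * m * (1.0 + inertia_factor) * v ** 2
    # Assumed quadratic virtual elastic energy centered on the plate.
    e_ve = 0.5 * k_ve * x ** 2
    # Assumed potential of a uniform in-plane acceleration.
    e_p = -m * a_plane * x
    return e_k + e_ve + e_p


def max_energy(l, a_plane, **params):
    """E_max: the ball at rest at the plate edge x = l, as in equation (13)."""
    return total_energy(l, 0.0, a_plane, **params)


def in_energy_cage(x, v, l, a_plane, **params):
    """True if the state satisfies E < E_max, the condition in equation (11)."""
    return total_energy(x, v, a_plane, **params) < max_energy(l, a_plane, **params)
```

With these placeholder terms, a ball at rest at the plate center is caged, while a ball at the center with a large enough velocity is not, which matches the intuition that the cage constrains position and velocity jointly.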

The control action of the robot is defined by adjusting the tilt angle vector θ of the plate by a continuous variable $u_{t} = d θ (t) \in R^{n}$, representing the change rate of the tilt angles. The translational motion of the plate is guided by a predetermined trajectory $T = \{x_{p} (t)\}_{t = 0}^{T}$ in the world frame, where $x_{p} (t) \in R^{n + 1}$ represents the position of the plate’s center in the workspace. The plate’s acceleration trajectory $T_{a} = \{{\ddot{x}}_{p} (t)\}_{t = 0}^{T}$ is then calculated using numerical differentiation of the velocity, which in turn is derived from the position trajectory $T$. According to the theory defined in Section 4, our objective is to maintain the state of the object within the energy-constrained cage $S_{cage}^{t}$ while the plate follows its prescribed trajectory $T$, ensuring that the PSS of the object satisfies $Q_{t} \subset S_{cage}^{t}$ at all times.
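The acceleration trajectory can be obtained from sampled plate positions by differentiating twice. The sketch below uses `numpy.gradient` with a uniform time step; the function name `plate_acceleration` and the uniform sampling are assumptions, and the paper does not state which finite-difference scheme was used.

```python
import numpy as np


def plate_acceleration(positions, dt):
    """Finite-difference acceleration of a sampled plate trajectory.

    positions : array of shape (T, n + 1), plate center at each time step
    dt        : uniform sampling period in seconds
    """
    positions = np.asarray(positions, dtype=float)
    velocity = np.gradient(positions, dt, axis=0)  # first derivative
    return np.gradient(velocity, dt, axis=0)       # second derivative
```

Samples near the two ends of the trajectory rely on one-sided differences and are therefore less accurate than interior samples.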

Next, we develop an algorithm for solving the dynamic ball balancing problem based on the proposed Caging in Time theory. At each time step t, we compute the maximum allowable energy, calculate the weighted average energy of the current PSS, and solve an optimization problem to find the optimal control input u_t = d θ (t) that keeps the ball caged in time. We then propagate the PSS according to the system dynamics and check if the energy constraint and position constraint are satisfied. If either constraint is violated at any time step, the task is considered to have failed.

6.2. Ball dynamics and PSS propagation

As mentioned above, the system state at time t is denoted as $q_{t} = (x_{t}, {\dot{x}}_{t})$ with its corresponding probability density p(q_t). We represent the motion of the system between adjacent time steps as $v_{t} = Δ q_{t} = ({\dot{x}}_{t} Δ t, {\ddot{x}}_{t} Δ t)$, with associated probability p(v_t). As defined in equation (6), the set of potential motions $V_{q_{t}}$ at a certain q_t can be given as:

V_{q_{t}} = U (q_{t}, u_{t}) = \{v_{t} ∣ v_{t} = ({\dot{x}}_{t} Δ t, {\ddot{x}}_{t} Δ t)\}

(14)

The propagation function π, which propagates a single state-motion pair (q_t, v_t) with joint probability p(q_t, v_t) from q_t to q_t+1, is given by:

π (q_{t}, v_{t}) = (x_{t} + {\dot{x}}_{t} Δ t, {\dot{x}}_{t} + {\ddot{x}}_{t} Δ t)

(15)

Note that π is a deterministic function for state transition. Unlike Section 5.2, here we propagate the PSS by updating its probability distribution. Since the joint probability of the state-motion pair can be expressed as p(q_t, v_t) = p(q_t) ⋅ p(v_t|q_t), the probability p(q_t+1) can then be derived as:

\begin{aligned} p (q_{t + 1}) & = \iint_{q_{t} \in Q_{t}, v_{t} \in V_{q_{t}}} p (q_{t}, v_{t}) \mathbf{1} (π (q_{t}, v_{t}) = q_{t + 1}) d v_{t} d q_{t} \\ & = \int_{q_{t} \in Q_{t}} p (q_{t}) \int_{v_{t} \in V_{q_{t}}} p (v_{t} | q_{t}) \mathbf{1} (π (q_{t}, v_{t}) = q_{t + 1}) d v_{t} d q_{t} \end{aligned}

(16)

where the decomposition of p(v_t|q_t) can be given as:

\begin{aligned} p (v_{t} | q_{t}) & = p ({\dot{x}}_{t} | q_{t}) \cdot p ({\ddot{x}}_{t} | q_{t}) \\ & = p ({\ddot{x}}_{t} | q_{t}) \\ & = p ({\ddot{x}}_{t}) \end{aligned}

(17)

where $p ({\dot{x}}_{t} | q_{t}) = 1$ since ${\dot{x}}_{t}$ is fully determined by q_t. To calculate $p ({\ddot{x}}_{t})$, we need to consider the system dynamics. Based on the physical properties of the ball mentioned in Section 6.1, we can define the set of potential accelerations $A_{q_{t}}$ for a given state q_t and control input u_t as:

\begin{gathered} A_{q_{t}} = \{{\ddot{x}}_{t} ∣ {\ddot{x}}_{t} = f (q_{t}, u_{t}, η_{m}, η_{p}, η_{μ}), \\ η_{m} \sim N (0, σ_{m}^{2}), \\ η_{p} \sim N (0, Σ_{p}), \\ η_{μ} \sim N (0, σ_{μ}^{2})\} \end{gathered}

(18)

where f(⋅) represents the system dynamics equation, detailed in Appendix 9.2.1. The terms η_m, η_p, and η_μ represent uncertainties in mass, plate acceleration, and friction coefficient, respectively.

Leveraging the closure property of Gaussian distributions under linear operations, we can conclude that the acceleration ${\ddot{x}}_{t}$ , being a linear combination of Gaussian-distributed uncertainties, also follows a Gaussian distribution:

{\ddot{x}}_{t} \sim N (μ_{\ddot{x}}, Σ_{\ddot{x}})

(19)

where $μ_{\ddot{x}}$ and $Σ_{\ddot{x}}$ are derived from the system dynamics equation and the distributions of the uncertainty terms.

Given this Gaussian distribution of ${\ddot{x}}_{t}$ , we can now calculate $p ({\ddot{x}}_{t})$ for any ${\ddot{x}}_{t} \in A_{q_{t}}$ , which allows us to further calculate p(q_t+1) based on equations (16) and (17).

As the probability p(q_t+1) is propagated from all the states in $Q_{t}$, we can obtain the PSS at the next time step in the same way as equation (10):

Q_{t + 1} = \{q_{t + 1} \in S_{obj} ∣ p (q_{t + 1}) > 0\}

(20)

To numerically implement the PSS propagation described by equations (16) and (20), we employ a discretized representation of the state space, similar to the approach used in the quasi-static case (Section 5.2). However, instead of binary values, we use continuous values to represent the probability distribution of the PSS in the state space.

We construct a 2n-dimensional array $I_{t} \in R^{N^{2 n}}$ at each time step t, where N is the grid size for each dimension of the state. The array $I_{t}$ is centered at the origin in the state space, with x ∈ [−x_max, x_max] and $\dot{x} \in [- {\dot{x}}_{\max}, {\dot{x}}_{\max}]$ . The state space is discretized uniformly in each dimension, with appropriate grid spacings for position and velocity components. Each element $I_{t} (q_{t})$ represents the probability of the occurrence of the discretized version of state q_t, derived from p(q_t). Figure 12 shows a visual illustration under the above numerical representation when n = 1.

Figure 12.

Discretized representation of the state space $S_{obj}$ with energy map. Each pixel of the discretized map represents a state $(x_{t}, {\dot{x}}_{t})$ of the ball. The cage region $S_{cage}^{t}$ is bounded by the maximum energy E_max. The gradient gray area shows the energy value $E (x_{t}, {\dot{x}}_{t})$ at different states, and the gradient green area shows the PSS $Q_{t}$ with the ball’s probability density $p (x_{t}, {\dot{x}}_{t})$ at different states.

The PSS propagation is implemented in Algorithm 5, which discretizes the continuous integrals in equation (16). For each q_t in the current PSS $Q_{t}$ , we compute the set of potential motions $V_{q_{t}}$ using equation (14). We then iterate over discretized versions of these motions, updating the probability of the resulting states q_t+1 according to equation (17):

I_{t + 1} (q_{t + 1}) = I_{t + 1} (q_{t + 1}) + I_{t} (q_{t}) \cdot p (v_{t} | q_{t})

(21)

where p(v_t|q_t) is computed as in equation (17). After updating all probabilities, we normalize them to ensure that the grid values sum up to 1 as a valid distribution. For computational efficiency, we apply a threshold to $I_{t + 1}$, setting the probabilities below a certain value (e.g., 10⁻³) to zero.

This pixel-based approach provides a computationally efficient representation of the PSS in state space, avoiding the exponential increase in computational cost associated with the sampling methods used in previous caging work in long-horizon tasks (Makapunyo et al., 2012; Welle et al., 2021).
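A minimal sketch of the grid update in equation (21) for n = 1 is given below. It is not a reproduction of Algorithm 5: the function name `propagate_grid` is hypothetical, the callable `mean_acc` is a stand-in for the mean of the dynamics f detailed in Appendix 9.2.1, the acceleration noise is a scalar Gaussian sampled at a few points within three standard deviations, and the final renormalization after thresholding is an added assumption.

```python
import numpy as np


def propagate_grid(I_t, xs, vs, mean_acc, sigma_acc, dt, n_acc=7, thresh=1e-3):
    """One step of probability propagation on an (x, v) grid.

    I_t       : 2D array, I_t[i, j] is the probability of state (xs[i], vs[j])
    mean_acc  : callable (x, v) -> mean acceleration (hypothetical stand-in)
    sigma_acc : standard deviation of the acceleration, must be > 0
    """
    I_next = np.zeros_like(I_t, dtype=float)
    dx, dv = xs[1] - xs[0], vs[1] - vs[0]
    # Discretized acceleration noise and its normalized Gaussian weights.
    eps = np.linspace(-3.0 * sigma_acc, 3.0 * sigma_acc, n_acc)
    w = np.exp(-0.5 * (eps / sigma_acc) ** 2)
    w /= w.sum()
    for i, j in zip(*np.nonzero(I_t)):
        x, v = xs[i], vs[j]
        x_new = x + v * dt                       # position part of equation (15)
        i2 = int(round((x_new - xs[0]) / dx))
        for e, p in zip(eps, w):
            v_new = v + (mean_acc(x, v) + e) * dt  # velocity part of equation (15)
            j2 = int(round((v_new - vs[0]) / dv))
            if 0 <= i2 < len(xs) and 0 <= j2 < len(vs):
                I_next[i2, j2] += I_t[i, j] * p  # accumulation as in equation (21)
    total = I_next.sum()
    if total > 0:
        I_next /= total                          # normalize to a valid distribution
    I_next[I_next < thresh] = 0.0                # drop negligible probabilities
    total = I_next.sum()
    return I_next / total if total > 0 else I_next
```

Probability mass that maps outside the grid is silently discarded here; in a caging check such states would already violate the position or energy constraint.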

Figure 13 illustrates the propagation and caging of the PSS in the state space during the tracking of an ∞-shaped trajectory for a one-dimensional plate (n = 1). The PSS initially manifests as a compact Gaussian distribution, subsequently expanding before converging to a stable size caged by the energy boundary.

Figure 13.

Sequence of the ball’s PSS evolution during dynamic balancing where the plate dimension n = 1. Top rows: State space representation of the PSS $Q_{t}$ (green) within the cage region $S_{cage}^{t}$ (dark background), as detailed in Figure 12. Bottom rows: Corresponding configurations of the system in the world frame. The dark gray arrow shows the direction of the plate’s translational motion.

6.3. Cage the ball in time

To achieve Caging in Time for the dynamic ball balancing, we employ a control optimization strategy that combines Control Barrier Function (Ames et al., 2019) and Control Lyapunov Function (Anand et al., 2021) to keep the ball on the plate while following the desired trajectory.

In our case, a Control Barrier Function (CBF) is used to implement the concept of “caging” in a dynamic setting. Like a physical cage constraining an object within a defined space, the CBF establishes an energy-based cage characterized by the margin between the preset energy boundary and the maximum energy attainable within the PSS:

h (Q_{t}) = E_{\max} (g_{θ}, a_{p}) - \max_{q_{t} \in Q_{t}} E (q_{t}, g_{θ}, a_{p})

(22)

This creates an energy barrier and acts as the mathematical representation of our cage. By ensuring that $h (Q_{t}) > 0$ at all times, we guarantee that the ball’s energy, E(q_t, g_θ, a_p) at state q_t as calculated in equation (12), never exceeds the maximum allowable energy E_max defined in equation (13).

To further ensure that our control actions maintain this energy barrier, we enforce the following CBF condition. Rather than simply enforcing $h (Q_{t}) > 0$, it provides a forward-looking constraint that considers not only the current state but also the future behavior of the system:

L_{f} h (Q_{t}) + L_{g} h (Q_{t}) d θ + α (h (Q_{t})) \geq 0

(23)

In this condition, $L_{f} h (Q_{t})$ and $L_{g} h (Q_{t})$ are the Lie derivatives of the CBF $h (Q_{t})$ along the vector fields f and g, respectively, where f and g are defined in the system dynamics equation ${\ddot{x}}_{t} = f + g u$, detailed in Appendix 9.2.2. $L_{f} h (Q_{t})$ represents the change in the barrier function due to the natural dynamics of the system, while $L_{g} h (Q_{t}) d θ$ represents the change due to the control input. The term α(⋅) is a class $\mathcal{K}$ function that shapes the convergence behavior of the barrier function, ensuring that it increases more rapidly as the system approaches the boundary of the safe set.

Control Lyapunov Functions (CLFs), on the other hand, are used to optimize the control performance by driving the system towards a desired state. In our context, we use a CLF to encourage the ball to stay near the center of the plate and maintain a well-distributed probability state, with both the system’s energy and entropy considered:

V (Q_{t}) = \sum_{I_{t} (q_{t}) > 0} I_{t} (q_{t}) E (q_{t}, g_{θ}, a_{p}) - k_{S} S (Q_{t})

(24)

where $I_{t} (q_{t})$ is the probability value at state q_t in the discretized probability distribution, E(q_t, g_θ, a_p) is the system energy at the state q_t, k_S is a weighting factor, and $S (Q_{t})$ is the system entropy.

This CLF balances two objectives: minimizing the expected energy of the system, which tends to keep the ball near the center of the plate, and maximizing the entropy of the probability distribution, which maintains a well-distributed probability state, ensuring both stability and robustness in the face of uncertainties. By minimizing this CLF, we drive the system towards a state where the ball is likely to be near the center of the plate, but with some uncertainty to handle unexpected disturbances.
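On the discretized grid, the barrier margin of equation (22) and the CLF value of equation (24) reduce to simple array operations. The sketch below assumes that the energy has been precomputed for every grid cell as an array `E` with the same shape as `I_t`, and that the entropy S(Q_t) is the Shannon entropy of the grid probabilities; the latter is an assumption, as the section does not spell out the entropy definition. The function names are hypothetical.

```python
import numpy as np


def barrier_margin(I_t, E, E_max):
    """h(Q_t): E_max minus the largest energy over states with nonzero probability."""
    support = I_t > 0
    return float(E_max - E[support].max())


def clf_value(I_t, E, k_S):
    """V(Q_t): expected energy minus k_S times the (assumed Shannon) entropy."""
    support = I_t > 0
    p = I_t[support]
    expected_energy = np.sum(p * E[support])
    entropy = -np.sum(p * np.log(p))
    return float(expected_energy - k_S * entropy)
```

A positive `barrier_margin` means that every state in the PSS lies strictly inside the energy-based cage.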

Similar to the CBF condition, we enforce a CLF condition to ensure that the CLF decreases over time:

L_{f} V (Q_{t}) + L_{g} V (Q_{t}) d θ + c V (Q_{t}) \leq δ

(25)

where $L_{f} V (Q_{t})$ and $L_{g} V (Q_{t})$ are Lie derivatives of the CLF along the system dynamics and control input, respectively, c is the convergence rate, and δ is a relaxation variable. This condition ensures that our control input dθ drives the system towards the desired behavior at a sufficient rate, while the relaxation variable δ allows for temporary violations of the decrease condition when necessary to satisfy the CBF safety constraint.

By combining the CBF and CLF conditions, we formulate a Quadratic Program (QP) that not only keeps the ball on the plate but also tries to keep it centered and stable.

\begin{aligned} \min_{d θ, δ} \quad & {(d θ)}^{2} + λ δ^{2} \\ \text{s.t.} \quad & L_{f} h (Q_{t}) + L_{g} h (Q_{t}) d θ + α (h (Q_{t})) \geq 0 \\ & L_{f} V (Q_{t}) + L_{g} V (Q_{t}) d θ + c V (Q_{t}) \leq δ \\ & d θ_{\min} \leq d θ \leq d θ_{\max} \end{aligned}

(26)

Here, λ > 0 is a weighting factor that balances the trade-off between minimizing the control effort (dθ)² and the CLF constraint violation δ². The values of dθ_min and dθ_max are derived from the hardware torque limits of the robot arm’s motors. We utilize the OSQP (Operator Splitting Quadratic Program) solver to efficiently solve this optimization problem at each time step.

By solving this QP at each time step, we obtain the optimal control input that maintains the ball within the energy-based cage while following the desired trajectory. The detailed implementation is presented in the Appendix 9.2.2.

This approach allows us to cage the ball in time under dynamic conditions on a tilting plate, accounting for the ball’s dynamics, uncertainties, and energy constraints. By solving the QP at each time step, we generate a sequence of actions that keeps the ball within the energy-based cage region with the plate following the desired trajectory.
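For the scalar case n = 1, the QP in equation (26) has only two variables and can be solved in closed form, which makes its structure easy to inspect. The sketch below is such a closed-form solution and is not the OSQP-based implementation used in the paper. It assumes a linear class-K function α(h) = `alpha` · h; the function name, argument names, and default values are hypothetical. The idea is that the optimal relaxation is δ = max(0, a + b·dθ) with a = L_f V + cV and b = L_g V, so the objective is a convex function of dθ alone, and its minimizer over the feasible interval is the clamped unconstrained minimizer.

```python
def solve_cbf_clf_qp(Lf_h, Lg_h, h, Lf_V, Lg_V, V,
                     alpha=1.0, c=1.0, lam=10.0,
                     dtheta_min=-1.0, dtheta_max=1.0):
    """Closed-form solution of the scalar CBF-CLF QP; returns (dtheta, delta) or None."""
    lo, hi = dtheta_min, dtheta_max
    # CBF constraint: Lf_h + Lg_h * dtheta + alpha * h >= 0.
    rhs = -(Lf_h + alpha * h)
    if Lg_h > 0:
        lo = max(lo, rhs / Lg_h)
    elif Lg_h < 0:
        hi = min(hi, rhs / Lg_h)
    elif rhs > 0:
        return None  # control has no effect on h and the barrier is violated
    if lo > hi:
        return None  # CBF constraint incompatible with the input bounds
    # Relaxed CLF residual a + b * dtheta, with delta = max(0, residual).
    a, b = Lf_V + c * V, Lg_V
    dtheta = 0.0 if a <= 0 else -lam * a * b / (1.0 + lam * b * b)
    dtheta = min(max(dtheta, lo), hi)  # project onto the feasible interval
    delta = max(0.0, a + b * dtheta)
    return dtheta, delta
```

Returning `None` corresponds to the case where no admissible tilt rate can keep the PSS inside the energy-based cage, which the framework treats as a task failure.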

7. Experiments

We conducted extensive experiments to evaluate our Caging in Time framework using a Franka Emika Panda robot arm for both quasi-static pushing tasks and dynamic ball balancing tasks with different end-effectors shown in Figure 14. All algorithms were implemented in Python and executed on a single thread of a 3.4 GHz AMD Ryzen 9 5950X CPU.

Figure 14.

Experiment setup. Left: Robot and camera setup. The robot performs the experiment on a transparent surface (blue) with a pre-installed line pusher (green). We used cameras 2 and 3 below the surface to record (not to track) the trajectories of the objects using AprilTags. Camera 1 was used to record the experiment from an upper view. Top Right: The 3D-printed objects used in the experiments. 1. Square with two trimmed tips, 2. Pentagon, 3. Triangle with trimmed tips, 4. Octagon, 5. Square, 6. Heart, 7. Hexagon, 8. Ellipse, 9. Circle, and letter-shaped objects: R, I, C, E. Bottom Right: 3D-printed plates, tennis ball, and plastic fruits for dynamic tasks. The square plate is for ball balancing when dimension n = 2, and the thinner one is for ball balancing and catching when dimension n = 1, which has a guide rail to constrain the motion of the ball to the required dimension. The plastic fruits are strawberry (YCB #12), plum (YCB #18), pear (YCB #16), and lemon (YCB #14).

Our experimental setup utilized multiple cameras for recording: the cameras shown on the left of Figure 14 for quasi-static tasks, with an additional camera for dynamic tasks. For quasi-static experiments, object tracking was achieved using AprilTags (Olson, 2011), while in dynamic scenarios OpenCV was employed to track the ball’s motion. It is important to note that object tracking served only for trajectory visualization and accuracy evaluation, and all experiments were conducted in an open-loop manner.

To quantify manipulation precision, we adopted the Mean Absolute Error (MAE) as our primary metric, calculated by averaging the absolute distances between the actual and reference trajectories throughout the manipulation process.
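As an illustration of this metric, the sketch below computes an MAE for planar trajectories. How recorded points are matched to the reference is not specified here, so the nearest-waypoint matching in `mean_absolute_error` is an assumption of this sketch, and the function name is hypothetical.

```python
import math


def mean_absolute_error(actual, reference):
    """Average distance from each recorded point to the reference trajectory.

    actual, reference: sequences of (x, y) points in the same unit (e.g. mm).
    Assumption: each recorded point is matched to its nearest reference
    waypoint, so a sparse reference inflates the result.
    """
    if not actual or not reference:
        raise ValueError("both trajectories must be non-empty")
    total = 0.0
    for ax, ay in actual:
        total += min(math.hypot(ax - rx, ay - ry) for rx, ry in reference)
    return total / len(actual)


if __name__ == "__main__":
    reference = [(float(x), 0.0) for x in range(0, 101, 10)]
    recorded = [(0.0, 3.0), (10.0, -3.0), (20.0, 3.0)]
    print(mean_absolute_error(recorded, reference))
```

With a densely sampled reference, the nearest-waypoint distance approaches the true point-to-curve distance.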

In this initial exposition of our framework, we deliberately selected two representative manipulation paradigms, one quasi-static and one dynamic, to evaluate fundamental capabilities while facilitating clear visualization of PSS propagation. Through these experiments, we aimed to thoroughly assess the performance, robustness, and versatility of our Caging in Time framework, establishing a foundation for future extensions to more scenarios.

7.1. Quasi-static tasks

Through quasi-static experiments, we aimed to investigate the performance of our framework from three aspects: 1) Without any sensing feedback besides the size and initial position of the bounding circle, can robust planar manipulation be achieved by the proposed Caging in Time? 2) If so, what precision can be achieved, and how is the robustness affected by different settings of the framework and external disturbances? 3) Under imperfect perception, such as positional noise and network lag, can Caging in Time surpass closed-loop methods?

As shown in Figure 14, the robot in this task used a 100 mm line pusher, and objects of various shapes were 3D printed in a similar size, fitting within a bounding circle of radius r = 25 mm. The pushing distance for each action was set as d_push = 20 mm. The computation time for each step in action selection and PSS propagation was 26.2 ± 8.5 ms.

7.1.1. Evaluation of cage settings

The task of this experiment is to push the object through a predefined circular trajectory. This experiment focused on evaluating the precision of our framework by varying two key parameters: R − r, the cage size, and K, the number of candidate actions. We selected values for R − r and K from the sets R − r = {10, 20, 30, 40} mm and K = {16, 32, 64, 128}. For each combination of R − r and K, five trials were conducted using five different objects (Object 1–5; see Figure 14). We generated only one robot action sequence per (R − r, K) setting, following Algorithm 2 and applied it to different objects in an open-loop manner. The MAE across these trials was averaged and presented in Figure 15, which also includes the actual trajectories recorded from all experiments with different settings and different objects.

Figure 15.

Evaluation results in terms of the cage size (R − r) and number of candidate actions. Top: All trajectories (nine trajectories of different objects are overlaid in the plot per circle). Bottom: Mean Absolute Error of the shown trajectories under different cage settings. “CA” stands for candidate action.

From the results, we can see that the performance follows two major trends. First, a larger cage radius tends to increase the average error. Second, an increase in the number of candidate actions generally leads to a decrease in error. As shown in Figure 16, with larger cage sizes, the POA grows larger, thereby increasing the range over which the object could deviate from the reference trajectory. In addition, more candidate actions increase the chance of finding the optimal action in Algorithm 4.

Figure 16.

Visualization of POAs (the white boundaries) in real pushing experiment recordings under various cage sizes.

Examining the trajectories in Figure 15, we observed that all the paths remained within their respective cage regions. In particular, trajectories generated by the same action sequence showed similar patterns across different objects. For example, trajectories with 16 candidate actions and a cage size of 40 mm (R − r) mostly moved to the left of the reference path, indicating that the action sequence plays a more dominant role in the performance than the shape of the object.

7.1.2. Why caging in time

Although the line pusher might be perceived as a stable method for pushing an object in an open-loop way, this stability is often compromised by the unpredictability of the object’s physical properties and shape, which necessitates the implementation of Caging in Time.

To validate the necessity and robustness of our approach, an object was tasked to navigate an ∞-shaped trajectory repeatedly for 10 cycles. A comparative analysis was performed against a naïve pushing strategy, where the pusher simply followed the target trajectory with its orientation parallel to the direction of the trajectory. As shown in Figure 17, in contrast to the naïve method, where the pusher lost control of the object after a few steps, our approach successfully maintained control over the object after 10 rounds with an average error of 10.09 mm.

Figure 17.

Robustness validation with an ∞-shaped trajectory. This experiment was conducted with the heart-shaped object (9), recorded by camera 2. Top: Ten loops conducted by Caging in Time. Bottom: A failed attempt by a naïve pushing strategy.

7.1.3. Comparison with a closed-loop method

To further validate the effectiveness of our framework, we conducted a comparative study against a closed-loop pushing method: a proportional controller (P-controller) implemented under the quasi-static assumption. The P-controller works by calculating an error vector from the object’s current position to the nearest waypoint on the reference trajectory. This error vector is then scaled by the P-gain (set to 0.5) to determine the direction and magnitude of the pushing action. Specifically, the pushing direction is aligned with the scaled error vector, while the pushing distance is capped at 20 mm for each action, consistent with the d_push used in our Caging in Time experiments.
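The baseline described above can be sketched as follows. This is a minimal illustration under the stated description only (nearest waypoint, P-gain of 0.5, 20 mm cap); the function name `p_controller_push` and its return convention are hypothetical, and any waypoint-advancing logic a full controller would need is omitted.

```python
import math


def p_controller_push(obj_xy, waypoints, gain=0.5, max_push=20.0):
    """Return (unit push direction, push distance in mm) for one action.

    obj_xy: current object position (x, y) in mm.
    waypoints: reference trajectory as a sequence of (x, y) points in mm.
    """
    if not waypoints:
        raise ValueError("reference trajectory is empty")
    # Error vector from the object to the nearest reference waypoint.
    nearest = min(
        waypoints,
        key=lambda w: math.hypot(w[0] - obj_xy[0], w[1] - obj_xy[1]),
    )
    ex, ey = nearest[0] - obj_xy[0], nearest[1] - obj_xy[1]
    # Scale the error by the P-gain.
    sx, sy = gain * ex, gain * ey
    norm = math.hypot(sx, sy)
    if norm == 0.0:
        return (0.0, 0.0), 0.0  # already on a waypoint: no push
    direction = (sx / norm, sy / norm)
    # Push along the scaled error, capped at max_push per action.
    return direction, min(norm, max_push)


if __name__ == "__main__":
    print(p_controller_push((0.0, 0.0), [(100.0, 0.0), (30.0, 40.0)]))
```

Because every action depends on the measured object position, any noise or delay in that measurement enters the push direction directly.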

We introduced two types of disturbances to simulate real-world scenarios with imperfect perception and communication delays. First, positional noise was added as Gaussian noise in both the x and y directions, with a standard deviation equal to the noise level, simulating imperfect perception such as that of an RGBD camera under dim lighting conditions. Second, to emulate potential packet loss during data transmission or sudden occlusion of the object from the camera, we introduced a network lag: at each execution step, there was a 50% probability that the controller would receive the object’s position from 0.5 or 1 second ago.
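A minimal sketch of how such disturbances could be injected into the position fed to a closed-loop controller is given below. The function `perceived_position`, the timestamped `history` buffer, and the zero-mean noise are assumptions made for illustration and are not taken from the experimental code.

```python
import random


def perceived_position(history, t_now, noise_std, lag_s, lag_prob=0.5, rng=random):
    """Return the (x, y) position the controller would receive at time t_now.

    history: list of (t, x, y) samples sorted by time (seconds, mm).
    noise_std: standard deviation of the positional noise (the noise level).
    lag_s: network lag in seconds (e.g. 0.5 or 1.0); 0 disables the lag.
    lag_prob: probability that a stale position is delivered at this step.
    """
    if not history:
        raise ValueError("history is empty")
    t_query = t_now
    if lag_s > 0.0 and rng.random() < lag_prob:
        t_query = t_now - lag_s  # deliver a stale measurement
    # Latest sample recorded no later than t_query (oldest sample as fallback).
    past = [s for s in history if s[0] <= t_query]
    _, x, y = past[-1] if past else history[0]
    # Independent zero-mean Gaussian noise on x and y (assumed zero mean).
    return x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std)


if __name__ == "__main__":
    rng = random.Random(0)
    history = [(0.0, 0.0, 0.0), (0.5, 10.0, 0.0), (1.0, 20.0, 0.0)]
    print(perceived_position(history, 1.0, noise_std=2.0, lag_s=0.5, rng=rng))
```

Passing a seeded `random.Random` instance keeps a simulated comparison repeatable across noise levels and lags.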

The experimental results, as shown in Figures 18 and 19, demonstrate that our Caging in Time approach achieves accuracy comparable to that of the P-controller with perfect perception. Moreover, the accuracy of the P-controller deteriorates significantly under the influence of perception noise or network lag. As shown in Figure 18, when the Gaussian noise is large, the resultant trajectory tends to exhibit random motion patterns, whereas network lag induces classic sinusoidal fluctuations in the trajectory.

Figure 18.

Trajectory comparison between Caging in Time with 20 mm cage size and 128 candidate actions and the P-controller under varying positional noise levels and network lags. Odd rows: Representative recordings of a single object. Even rows: Overall trajectories from all nine test objects.

Figure 19.

Evaluation results of the P-controller in terms of the noise level (blue) and network lag (red) compared to the Caging in Time framework (yellow) under 20 mm cage size and 128 candidate actions.

These findings underscore the robustness of our Caging in Time framework, particularly in real-world scenarios where perception and communication are often imperfect. The open-loop nature of our approach eliminates the need for continuous feedback, making it inherently resilient to perception noise and network delays that can significantly impair the performance of closed-loop methods.

7.1.4. In-task perturbations

The robustness of Caging in Time was further tested by maneuvering objects along specially designed trajectories shaped by the letters “RICE,” as shown in Figure 20.

Figure 20.

Trajectory following with in-task perturbation using six different objects (4, 7, 9, R, C, E) for tracing “RICE.” The manipulated objects are shown at the start point of each segment after the switch, with the color of the objects matched with their trajectories. The real recording is shown in Figure 1.

Notably, during this task, a human operator intermittently replaced the objects at random to test the system’s ability to adapt to new shapes and position perturbations while maintaining trajectory precision within a single task. As long as the new object was positioned within the current PSS $Q_{t}$, the robot was able to consistently cage and manipulate it along the reference trajectory, demonstrating the robustness and adaptability of the proposed Caging in Time framework.

7.2. Dynamic tasks

To evaluate the capabilities and limitations of Caging in Time in handling various dynamic tasks, we conducted a series of experiments using a tennis ball of diameter 6.6 cm with end-effector plates in different sizes, as shown in Figure 14.

Like quasi-static tasks, dynamic experiments also focused on three key aspects: 1) Can Caging in Time achieve robust ball balancing on a moving plate along different trajectories without real-time sensing feedback? 2) How well does the framework perform in more extreme dynamic scenarios, such as catching a ball in an open-loop manner? 3) How robust is Caging in Time when handling uncertainties and higher-dimensional state spaces?

In the following experiments, the plate followed a predefined translational trajectory $\mathcal{T} = \{x_{p}(t)\}_{t = 0}^{T}$. While the trajectory of the plate was preset, the tilt angle of the plate at each time step was computed by our framework. Cartesian-space motion control was then used to drive the robot’s end-effector along this trajectory in a purely open-loop manner. For all dynamic experiments, we also used MAE to quantify the performance. Since the plate accurately tracks the target trajectory, the MAE is calculated as the average distance between the ball and the center of the plate over all time steps and all trials. The computation time for each step in action selection and PSS propagation was 67.3 ± 12.6 ms.

7.2.1. Dynamic sensitivity analysis

Before proceeding with experiments, we conducted a dynamic sensitivity analysis to examine how Caging in Time performance in the ball catching tasks responds to initial state distribution and hardware limitations, as previously mentioned in Algorithm 1 and equation (26). Figure 21 illustrates the success rates for finding feasible action sequences in Algorithm 1 for a ball catching task across varying initial velocity uncertainties $Δ {\dot{x}}_{0}$ , average initial velocities ${\bar{\dot{x}}}_{0}$ , and end-effector angular acceleration limits β_max. The left plot fixes ${\bar{\dot{x}}}_{0}$ at 0.8 m/s, demonstrating how acceleration capabilities counteract uncertainty effects, while the right plot fixes $Δ {\dot{x}}_{0}$ at 0.05 m/s, revealing velocity-hardware requirement relationships. We sampled each parameter using 10 evenly spaced values within their ranges, with 100 trials per configuration performed in simulated PSS propagation in Algorithm 1.

Figure 21.

Ball catching success rate evaluation of finding feasible action sequences in PSS propagation over initial velocity uncertainty range $Δ {\dot{x}}_{0}$ , average initial ball velocity ${\bar{\dot{x}}}_{0}$ , and hardware limitation represented by the end-effector angular acceleration limit β_max.

As shown in Figure 21, with a fixed ${\bar{\dot{x}}}_{0}$ of 0.8 m/s, the tolerance of $Δ {\dot{x}}_{0}$ improved to 0.2 m/s as β_max increased beyond 20 rad/s², indicating that higher acceleration capabilities allow the system to accommodate larger PSS. Similarly, with $Δ {\dot{x}}_{0}$ fixed at 0.05 m/s, higher ${\bar{\dot{x}}}_{0}$ required proportionally higher β_max to maintain 100% success rates. In the robot configuration used for this task, our Franka robot’s maximum end-effector angular acceleration is approximately 25 rad/s². Based on this limit, we determined that our setup could reliably handle initial velocities up to 0.9 m/s with uncertainty ranges below 0.15 m/s, which guided our subsequent experimental design for dynamic tasks.

7.2.2. Ball balancing

We first tested ball balancing with dimension n = 1 on two different trajectories, as shown in Figure 22. The plate used in this task is 16 cm long and has a guide rail to constrain the ball’s rolling to the x-axis only, as shown in the bottom right of Figure 14. The “RICE” trajectory, featuring sharp turns, resulted in an average error of 20.12 ± 5.66 mm. The ∞-shaped trajectory, repeated four times to test smooth recurring curves, yielded an average error of 26.59 ± 7.54 mm. Both experiments were repeated 20 times, with the ball consistently remaining on the plate, demonstrating the robustness and stability of Caging in Time for long-horizon dynamic tasks.

Figure 22.

Experimental validation of Caging in Time for dynamic manipulation tasks using a tennis ball and a support surface where dimension n = 1 (see Figure 14). Top row: Representative frames from video recordings, with overlaid transparent balls showing the ball’s position at different time points. Bottom row: Trajectory plots of 10 repeated trials for each task. Left: Ball balancing while tracing “RICE.” Middle: Ball balancing along an ∞-shaped trajectory. Right: Ball catching with the balls rolling down from the same height on a slope to achieve consistent initial tossing velocity.

We observed that sharp turns in the trajectory of the plate caused sudden changes in ${\ddot{x}}_{p}$ , increasing the control difficulty and causing larger errors in the trajectory of the ball. Additionally, when tracking the ∞-shaped trajectory, ${\ddot{x}}_{p}$ was non-zero most of the time, resulting in a persistent non-zero tracking error. This led to a higher average error for the ∞-shaped trajectory compared to the “RICE” trajectory, even though the maximum positional error of the ball’s trajectory was relatively smaller.

7.2.3. Ball catching

To further challenge the capability of Caging in Time to handle complex dynamic tasks, we implemented an open-loop ball catching experiment using the same plate with a 1D track constraint where dimension n = 1. In actual implementation, to ensure appropriate timing for initiating the action sequence, we utilized OpenCV to detect when the ball entered the camera frame. The open-loop action was activated upon detection with a manually tuned fixed time offset. Within the Caging in Time framework, we considered the task to start when the ball made contact with the plate, ignoring bouncing effects. In addition, we incorporated a hard-coded translational retreat motion to minimize ball rebounds on the plate.

We conducted two sets of experiments. In the first experiment, balls were released from a fixed height on an inclined slope, ensuring a consistent initial position and approximately the same velocity of the ball upon leaving the slope. Out of 20 trials, the system successfully caught the ball in all 20 attempts, as shown in Figure 22 (right). This demonstrates the reliability of Caging in Time under well-defined initial conditions.

In the second experiment, we introduced greater uncertainty by having a human manually toss the ball onto the plate, as illustrated in Figures 1 and 23. Due to the inherent difficulty in controlling the landing position of hand-tossed balls, we only recorded trials where the ball successfully landed on the plate. Out of 20 such successful tosses, the system was able to catch and stabilize the ball in 13 trials. This partial success rate demonstrates the framework’s robustness to velocity uncertainties within a certain range, as well as the limitations of the Caging in Time framework’s pre-configured error tolerance. This tolerance is constrained to ensure feasible actions within the robot’s hardware torque limits, and some human tosses evidently introduced velocity variations beyond this tolerance range.

Figure 23.

Demonstration of ball catching with human tossing using representative frames of nine overlaid trials with time labels. Results along with Figure 1 show that Caging in Time is able to catch the ball tossed inaccurately by a human using the same open-loop action sequence in Figure 22.

To quantitatively evaluate task completion rates under varying uncertainty conditions, we analyzed success rates in ball catching across different initial velocity distributions. Our data shows that slope dropping with velocities of 0.84 ± 0.12 m/s achieved a 100% success rate, while human tossing with greater variability beyond the tolerance range (0.93 ± 0.28 m/s) yielded a 65% success rate. This real-world evaluation further validates our dynamic sensitivity analysis in Figure 21 and demonstrates the framework’s performance boundaries under different uncertainty magnitudes.

7.2.4. Caging with uncertainty

To test the robustness of the framework to shape uncertainty, we balanced various fruits from the YCB dataset shown in Figure 14 along the “RICE” trajectory with dimension n = 1, as shown in Figure 24. Although the shape uncertainty introduced higher uncertainty into the system dynamics, the framework could still ensure that the object motion stayed in the caged PSS. In 10 trials, all objects remained on the plate, demonstrating the adaptability of Caging in Time to shape variations.

Figure 24.

Demonstration of object-agnostic robustness in Caging in Time. Balancing of diversely shaped YCB fruits from Figure 14 where dimension n = 1 using the same open-loop action sequence for the tennis ball. Each figure shows multiple instances of the same fruit, representing its position at different time points during the balancing task.

Note that the success of these tasks can be partially attributed to the nature of irregular shapes, which are less likely to roll freely because of the energy traps created by their non-uniform geometries. This characteristic actually aids in maintaining stability, as the objects tend to settle into local energy minima, complementing the caging strategy of our framework.

7.2.5. Caging in higher dimensions

Lastly, we extended our experiments to higher-dimensional spaces using a 16 cm×16 cm plate, where the absence of the guide rail allows the ball to move freely in both the X and Y directions, significantly increasing the complexity of the balancing task. As shown in Figure 25, we performed ball balancing with dimension n = 2, where the ball’s state is represented in a 4D state space $(x_{t}, y_{t}, {\dot{x}}_{t}, {\dot{y}}_{t})$ , two dimensions higher than all previous tasks.

Figure 25.

Ball balancing using Caging in Time with the dimension n = 2 while tracing “RICE.” Left: Representative frames from recordings. The overlaid transparent balls show the ball’s position at different time points during the task. Right: Qualitative trajectory plots of 10 repeated trials for each task.

In 10 trials, the ball consistently remained on the plate, showcasing Caging in Time’s applicability to higher-dimensional scenarios. Because of the non-orthogonal camera placement, the trajectories shown in Figure 25 are for illustrative purposes only and may not represent exact quantitative performance. Even so, they visibly exhibit larger deviations than those in the previous balancing experiments. This increased error is attributed to the unrestricted rolling direction in this setup, which introduces greater uncertainties and control challenges.

8. Applications in practice

Applications for Caging in Time could potentially extend beyond previous experiments to broader manipulation domains. This section explores the possibilities and practical considerations for implementing Caging in Time in various scenarios.

8.1. Example applications

In Caging in Time, according to Section 3.2, object states are represented through PSS $Q_{t}$ within task-specific state spaces, with actions u_t that ensure $Q_{t + 1} = Π ({Q V}_{t}) \subset S_{cage}^{t + 1}$ , where ${Q V}_{t}$ is the motion bundle in equation (4) and Π is the propagation function in equation (5). Below are tasks that we believe can be effectively represented and enhanced through this framework.

8.1.1. In-hand manipulation

For in-hand manipulation, our framework enables instantaneously incomplete cages to dynamically form complete cages over time, guiding objects toward target states with containment guarantees, significantly relaxing the requirement for in-hand caging (Bircher et al., 2021) in both quasi-static and dynamic cases.

In quasi-static cases, the state space is $S_{obj} \subset SE(3)$, encoding object pose. The cage can form in two ways: through geometric constraints, where spatial finger contact arrangements prevent the object from escaping, or through energy-based constraints using potential energy functions where the energy boundary forms the cage as defined in equation (11): $S_{cage}^{t} = \{q_{t} \in SE(3) \mid E(q_{t}) < E_{\max}\}$, where E(q_t) represents the combination of gravitational potential energy and virtual potential fields that model contact constraints. These two approaches can also work in tandem: potential fields control object motions while geometric constraints provide explicit caging boundaries, as demonstrated in Bircher et al. (2021).

In dynamic cases, the state q_t becomes $(x_{t}, {\dot{x}}_{t})$ and the state space expands to $S_{obj} \subset SE(3) \times se(3)$ to incorporate both the pose and the twist, where the cage emerges as energy barriers. Similar to equation (12), the energy-based cage can be explicitly defined as: $S_{cage}^{t} = \{(x_{t}, {\dot{x}}_{t}) \in SE(3) \times se(3) \mid E_{p}(x_{t}) + E_{k}({\dot{x}}_{t}) < E_{\max}\}$, where E_p(x_t) represents the potential energy from gravity and contact forces, and $E_{k}({\dot{x}}_{t}) = \frac{1}{2} {\dot{x}}_{t}^{T} M {\dot{x}}_{t}$ is the kinetic energy determined by the object’s velocity and inertia matrix M.
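The membership test implied by this energy-based cage can be sketched as follows. For simplicity, the sketch treats the pose and twist as plain coordinate vectors rather than elements of SE(3) × se(3), and the `potential` callable, the point-mass example, and its numerical values are all assumptions chosen for illustration.

```python
def kinetic_energy(v, M):
    """E_k = 1/2 * v^T M v for a velocity vector v and inertia matrix M."""
    n = len(v)
    return 0.5 * sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def in_energy_cage(x, v, M, potential, e_max):
    """Check E_p(x) + E_k(v) < E_max for one state (x, v).

    potential: callable returning the potential energy E_p(x); its form is
    task-specific and is an assumption of this sketch.
    """
    return potential(x) + kinetic_energy(v, M) < e_max


if __name__ == "__main__":
    # Hypothetical point mass: 0.5 kg, gravity potential m * g * height.
    mass, g = 0.5, 9.81
    M = [[mass, 0.0, 0.0], [0.0, mass, 0.0], [0.0, 0.0, mass]]
    gravity = lambda x: mass * g * x[2]
    print(in_energy_cage([0.0, 0.0, 0.01], [0.2, 0.0, 0.0], M, gravity, e_max=0.1))
```

A state leaves the cage when either its height or its speed raises the total energy to the bound, which is how a single scalar threshold can stand in for a geometric boundary.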

Maintaining this time-varying cage requires coordinated finger actions including sliding for continuous contact adjustment, rolling for smooth surface transitions, and gaiting for discrete contact reestablishment. These motion primitives and underlying finger reconfigurations strategically evolve both geometric constraints and energy barriers over time, which is the essence of Caging in Time.

8.1.2. Extrinsic dexterity

Extrinsic dexterity leverages environmental features as complementary cage elements, extending manipulation capabilities beyond what is possible with end-effectors alone.

As mentioned above, in the quasi-static setup, the state space $S_{obj} \subset SE(3)$ represents the object pose, while the cage is defined geometrically through the joint arrangement of the end-effector and environmental features such as surfaces, corners, and obstacles. Similar to equation (3), these elements collectively form spatial barriers around $C_{free}^{obj}$, preventing uncontrolled object motion while considering the potential energy from gravity and interactions (Mahler et al., 2016).

In dynamic scenarios, the state space also expands to $S_{obj} \subset SE(3) \times se(3)$. Similarly to the energy-based cage for in-hand manipulation, the cage can be defined as $S_{cage}^{t} = \{(q_{t}, {\dot{q}}_{t}) \in SE(3) \times se(3) \mid E_{p}(q_{t}) + E_{k}({\dot{q}}_{t}) < E_{\max}\}$, where the kinetic energy of the controlled object momentum, the potential energy of the gravitational effects and the contact forces together create energy barriers that guide the motion of the object within admissible regions, allowing manipulation with fewer contacts.

The cage is maintained through strategic management of object-environment and object-robot contacts relative to environmental features, and multi-modal motion primitives like pushing, flipping, and grasping. Unlike traditional extrinsic dexterity approaches (Yang et al., 2023), Caging in Time explicitly plans transitions between contact states without requiring continuous feedback, extensive training, or specific object geometry limits, ensuring continuous caging in time despite uncertainties.

8.1.3. Deformable object manipulation

For deformable objects, each state $q_{t} \subset \mathbb{R}^{3}$ directly represents the object’s continuous shape in $\mathbb{R}^{3}$ within the state space $S_{obj} \subset 2^{\mathbb{R}^{3}}$, rather than a discrete configuration, naturally aligned with the PSS representation. These shapes can be encoded through parametric approaches with predefined models (Pokorny et al., 2013) or nonparametric methods using point distributions (Shi et al., 2023). In cases where objects are considered non-elastic, they maintain their deformed shape after contact is removed, allowing representation within $\mathbb{R}^{3}$ without modeling time-dependent recovery dynamics.

The cage is formed through strategically distributed contacts that constrain the PSS while transforming the object shape. Using an estimated deformation model that can cover general PSS motion similarly to equation (9) in planar pushing, we can create spatial barriers that limit possible deformations from exceeding the cage, enabling shape control without requiring precise physical models of complex deformation or large amounts of data.

Unlike object pushing, the action space for deformable object manipulation encompasses material-specific motion primitives such as pushing, pinching, folding, and rolling. By sequencing these primitives strategically, Caging in Time creates virtual cages whose deformation directly controls object transformation, maintaining reliable manipulation even under occlusion or complex deformation scenarios.

8.2. Implementation guidelines

Adapting Caging in Time to new manipulation applications requires systematic consideration of several key elements:

State space formulation: Define $S_{obj}$ to capture task-relevant properties as mentioned above, along with the state propagation function discussed in Section 4.2:

$$\begin{aligned} {Q V}_{t} & = \{(q_{t}, v_{t}) \mid q_{t} \in Q_{t}, v_{t} \in U (q_{t}, u_{t})\}, \\ Q_{t + 1} & = Π ({Q V}_{t}) \end{aligned} \tag{27}$$

Note that the propagation function Π can be implemented through analytical models as presented in this work or data-driven predictive frameworks such as neural networks and diffusion-based models.

Action space design: Define the action space $U$ that satisfies $\forall t, \exists u_{t} \in U : Π ({Q V}_{t}) \subset S_{cage}^{t}$ . The action space can be discrete as in our pushing tasks or continuous as in our ball balancing task with tilting angles, depending on task requirements and the control strategy.

Cage region: Design the cage in time $S_{cage}^{t}, t = 0,1, \dots, T$ to balance precision and robustness while ensuring the caging condition $Q_{t + 1} = Π ({Q V}_{t}) \subset S_{cage}^{t}$ holds. While our work mainly employed shape invariant cages, Caging in Time naturally supports the time-varying cage morphology where $S_{cage}^{t}$ can reshape to accommodate state-dependent constraints, enabling wider applications such as in-hand manipulation or deformable object manipulation.

Action selection: Determining optimal actions $u_{t} \in U$ requires a strategy that satisfies $Π ({Q V}_{t}) \subset S_{cage}^{t}$ . Our approach employed optimization-based methods like exhaustive search for planar pushing or quadratic programming for ball manipulation. Alternative paradigms such as reinforcement learning could extend the applicability to more scenarios. Importantly, as discussed in Algorithm 1 and Figure 21, hardware constraints must be incorporated to ensure that the actions can be deployed in real-world physical systems.

Computational efficiency: Current action selection and PSS propagation require 26.2 ± 8.5 ms (pushing) and 67.3 ± 12.6 ms (ball balancing) on a single CPU thread, with potential for optimization through GPU acceleration. Though currently offline-computed, these timings show feasibility for integrating Caging in Time into the online planning for more diverse and dynamic scenarios.
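The guidelines above can be illustrated with a deliberately small toy problem. The sketch below is not the planner used in this work: it assumes a one-dimensional state space discretized into integer cells, a hypothetical motion set `motion_set` with one cell of uncertainty, and an exhaustive search over a discrete action set, in the spirit of equation (27) and the caging condition.

```python
def propagate_pss(pss, action, motion_set, step):
    """Q_{t+1} = Pi(QV_t), with QV_t = {(q, v) | q in Q_t, v in U(q, u_t)}."""
    motion_bundle = {(q, v) for q in pss for v in motion_set(q, action)}
    return {step(q, v) for q, v in motion_bundle}


def select_action(pss, actions, motion_set, step, in_cage, cost):
    """Exhaustive search: keep actions whose propagated PSS stays caged,
    then return the cheapest one as (action, next_pss), or None."""
    best = None
    for u in actions:
        nxt = propagate_pss(pss, u, motion_set, step)
        if all(in_cage(q) for q in nxt):
            c = cost(nxt)
            if best is None or c < best[0]:
                best = (c, u, nxt)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    # Toy model (all assumptions): commanded displacement u is realized
    # as u - 1, u, or u + 1 cells, and the state update is additive.
    motion_set = lambda q, u: {u - 1, u, u + 1}
    step = lambda q, v: q + v
    center, half_width = 3, 2  # cage region for the next time step
    in_cage = lambda q: abs(q - center) <= half_width
    cost = lambda nxt: max(abs(q - center) for q in nxt)

    result = select_action({0, 1, 2}, range(-3, 4), motion_set, step, in_cage, cost)
    print(result)
```

In this additive toy model the PSS only grows under propagation, so a fixed-size cage would eventually become infeasible; that is why actions which contract the PSS, and hardware limits on which actions are available, matter in practice.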

9. Conclusion

In this work, we proposed and evaluated Caging in Time, a novel theory for robust object manipulation. Our framework demonstrated robust performance in both quasi-static and dynamic tasks without requiring detailed object information or real-time feedback. Rigorous evaluations highlighted the framework’s resilience and adaptability to various objects and dynamic scenarios. The Caging in Time approach proved effective in handling new objects, positional perturbations, and challenging dynamic tasks, showcasing its potential for reliable manipulation in uncertain environments.

While the current Caging in Time framework shows promising results, it is important to acknowledge its limitations. Currently, the framework requires manual definition of task-specific parameters and the PSS propagation function. This reliance on human expertise may limit its generalizability to a wider range of manipulation tasks. Additionally, the current approach may not fully capture the complexity of certain real-world scenarios where object interactions and environmental factors are highly unpredictable.

Looking forward, we aim to address these limitations and expand the horizons of Caging in Time. A key direction for future research is the integration of learning techniques, LLMs, and diffusion models to autonomously construct and learn Caging in Time tasks. This approach could enable the framework to automatically derive appropriate propagation functions and strategies, reducing the need for manual parameter tuning. We also plan to explore its applications as mentioned in Section 8, while potentially leveraging reinforcement learning to handle increased task complexity and environmental variability.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation under grant FRR-2240040.

ORCID iDs

Gaotian Wang

Kejia Ren

Kaiyu Hang

Supplemental Material

Supplemental material for this article is available online.

Appendix

Caging in time: A framework for robust object manipulation under uncertainties and limited robot perception
