Abstract
Robotic manipulation of volumetric elastoplastic deformable materials, from foods such as dough to construction materials like clay, is in its infancy, largely due to the difficulty of modelling and perception in a high-dimensional space. Simulating the dynamics of such materials is computationally expensive. It tends to suffer from inaccurately estimated physics parameters of the materials and the environment, impeding high-precision manipulation. Estimating such parameters from raw point clouds captured by optical cameras suffers further from heavy occlusions. To address this challenge, this work introduces a novel Differentiable Physics-based System Identification (DPSI) framework1 that enables a robot arm to infer the physics parameters of elastoplastic materials and the environment using simple manipulation motions and incomplete 3D point clouds, aligning the simulation with the real world. Extensive experiments show that with only a single real-world interaction, the estimated parameters, Young’s modulus, Poisson’s ratio, yield stress and friction coefficients, can accurately simulate visually and physically realistic deformation behaviours induced by unseen and long-horizon manipulation motions. Additionally, the DPSI framework inherently provides physically intuitive interpretations for the parameters in contrast to black-box approaches such as deep neural networks.
Keywords
1. Introduction
Despite the recognised importance of robotic manipulation of deformable materials, this topic remains underexplored, particularly when it comes to high-precision manipulation of volumetric elastoplastic materials. A primary challenge in this area arises from the materials’ infinite degrees of freedom (DoFs), leading to highly unpredictable deformation dynamics.
The intrinsic complexity of these dynamics inhibits the direct application of conventional robotic motion planning methods, which typically require explicit physics models for all concerned objects (Latombe, 2012). Learning approaches, such as reinforcement learning (RL), often involve training an agent to learn to interpret its perception and take actions through inefficient trial & error in a realistic physics-based simulation (Collins et al., 2021; Kroemer et al., 2021), which is both challenging and largely unavailable when it comes to deformable materials.
In contrast to the well-studied rigid body dynamics in robotics, where motions can be predicted and controlled using well-defined equations of motion and deterministic models (Featherstone, 2014), deformable materials do not follow such straightforward patterns. Directly applying these methods is problematic because it is extremely difficult and often infeasible to accurately model and perceive real-world elastoplastic materials and measure the underlying physics parameters that govern their motions and deformations (Arriola-Rios et al., 2020; Yin et al., 2021). Additional challenges such as time-varying material properties (saturation, dryness, etc.) make the deformation dynamics even more unpredictable. As a result, achieving high-precision manipulation for such materials with motion planning or data-driven techniques is challenging due to the high computational cost and the lack of techniques to capture the dynamics accurately and efficiently.
To close this gap, this research proposes a Differentiable Physics-based System Identification (DPSI) framework for the robotic manipulation of volumetric elastoplastic materials. Our framework can efficiently estimate key physics parameters governing material deformation dynamics using minimal and simple manipulation motions (in less than 5 min with a good initial guess). The estimated physics parameters enable accurate material simulation for long-horizon predictions of real-world elastoplastic deformation behaviours.
The workflow of the proposed DPSI framework can be summarised as follows. As shown in Figure 1, a robot equipped with an in-hand 3D camera (Zivid) and three end-effectors is deployed to manipulate the elastoplastic object (e.g., play dough). Before manipulation, the robot takes multi-view point clouds of the object to minimise occlusions. These point clouds are used to create the initial particle system for the material point methods (MPM)-based simulation. The robot then performs a manipulation motion on the object and captures point clouds of the deformed state. Physics simulations of the same manipulation are run with the same initial state and motion, whose resultant particle states are compared to the real-world deformed state through variants of the Chamfer distance (CD) and the earth mover’s distance (EMD) loss. The parameters of the simulated physics models are updated after every simulation to minimise the loss. To facilitate fast optimisation, we use a differentiable simulator built on the TaiChi auto-differentiation mechanism, which allows automatic gradient computation from the losses and gradient-based optimisation for the physics parameters (Hu et al., 2019, 2020). The proposed system identification framework enables a robot to interact with elastoplastic material via simple manipulation motions (orange box) and then identify the physics parameters of the real-world manipulation dynamics. The parameters are found using gradients computed, through differentiable simulation, from a differentiable point-cloud-based similarity function between the real and simulated observations of the manipulated material (blue and red boxes). These parameters then enable accurate simulations that allow the real-world grounding of motion planning, trajectory optimisation or policy learning techniques (cyan box).
Our approach achieves unprecedented simulation-to-real alignment accuracy, characterised by the integration of the following novel features.
1.1. High-fidelity physics
Unlike previous methods that either employ non-physics-based models (e.g., neural networks) or highly simplified material geometry representations (e.g., sparse keypoints), we use high-fidelity physics-based simulation powered by the MPM (Jiang et al., 2016), which simulates materials as Lagrangian particles and keeps track of their positions and velocities. It achieves faster simulation by computing the motions, deformation gradients, and frictional contacts on a background Eulerian grid (Gao et al., 2017; Hu et al., 2018; Jiang et al., 2016; Stomakhin et al., 2013). MPM-based simulations provide highly efficient and realistic simulation with high physical plausibility by closely following real-world physics laws such as Newton’s laws and elastic and plastic energy conservation models.
1.2. Incomplete and noisy observations
Unlike existing works that rely on synthetic videos with complete sequences of perfect observations (Kaneko, 2024; Li et al., 2023; Murthy et al., 2020), our framework uses 3D point clouds to observe real-world object geometries. Capturing the full depth of an object during manipulation is impractical due to occlusions caused by the end-effector or environment. This means that only the point clouds before and after a manipulation motion are practical to obtain and informative enough to observe the full geometry of the deformed object in real-world experiments. In addition, real-world point clouds tend to suffer from inaccurately estimated camera matrices and sensory noises.
1.3. Small data, short and simple motions
Using extensive and diverse manipulation motions to collect real-world deformation data is time-consuming and costly. Existing studies demand a significant number of real-world interactions or complete sequences of simulation videos to identify object deformations under various motions, yet still resulting in simulations with insufficient accuracy for real-world applications (Kaneko, 2024; Li et al., 2023; Lin et al., 2022; Shi et al., 2023). Our goal is to recover physics parameters that enable accurate predictions of long-horizon, unseen and complicated elastoplastic material manipulation dynamics, using minimal simple and short real-world interactions.
1.4. Joint parameter estimation
We aim to jointly estimate the physics parameters provided by physics models. Besides Newton’s laws, we employ the fixed corotated elastic energy model (Stomakhin et al., 2012), the von Mises plasticity model (Jones, 2009) and the dynamic friction model in our simulation. These lead to six key parameters: Young’s modulus E, Poisson’s ratio ν, yield stress
1.5. Differentiable physics
Identifying the physics parameters in their discretised spaces via search or evolutionary algorithms is computationally slow due to the exponentially growing number of possible combinations as the discretisation becomes finer. While gradient-based optimisation methods offer faster convergence toward the minimum, it is infeasible with most physics simulations because many computation steps are not differentiable and these simulators do not support derivative computations. In this work, we explore the feasibility of optimising system parameters using gradients computed by differentiating loss functions through a physics simulator written by DiffTaiChi, a programming language tailored for GPU-accelerated parallel computation and automatic differentiation (Hu et al., 2019, 2020). DiffTaichi generates derivative functions for simulation steps via source code transformation that retains arithmetic intensity and parallelism. It uses a memory-efficient tape to record the order of computation kernels for forward simulation and traverses their derivative functions in the backward order to generate gradients through the computation graph. We build DPSI upon DiffTaiChi and explore the feasibility of directly optimising several physics parameters jointly with gradients computed by differentiating point-cloud-based loss functions through the high-fidelity physics simulator.
Substantial experiments demonstrate that our main contribution, DPSI, can achieve highly accurate simulation-to-reality alignment for elastoplastic materials manipulated by unseen, long horizon and complex motions using minimal simple and short interactions, and noisy and incomplete observations. Results show that when multiple solutions and parameter uncertainty exist, DPSI can provide physically intuitive parameter interpretations that can guide further system identification, model improvement, and motion adaptation. Statistics on the computation costs also indicate promising practical deployment of the DPSI framework.
The rest of the article reviews related literature, presents formally our method and experiment results, and discusses limitations and future directions.
2. Related work
2.1. Deformable object manipulation
Both model-free and model-based approaches have been taken for manipulating deformable materials. Existing model-free methods often lack manipulation precision due to the absence of physics laws that describe the motion and deformation under complex contacts (Cherubini et al., 2020; McConachie et al., 2020; Shen et al., 2024; Shi et al., 2023, 2024). Physics-model-based methods, while more accurate, struggle with aligning the simulated dynamics with the real world and often rely on simplified geometric representations for higher computational efficiency, sacrificing manipulation precision (Navarro-Alarcon et al., 2016; Shetab-Bushehri et al., 2023; Yang et al., 2023). With or without a model, many of them use simulations that offer a cost-effective way to test manipulation methods and allow the collection of massive data for learning-based approaches (Arriola-Rios et al., 2020; Collins et al., 2021; Yin et al., 2021). However, inaccurate dynamics predictions and control precision are unacceptable for many tasks such as surgery, assembly and disassembly. Therefore, akin to the way humans efficiently learn about object and environment physics properties, this work proposes to actively identify system parameters that enable high-precision simulation of real-world manipulation dynamics for volumetric elastoplastic objects.
2.2. Deformable object modelling
Simulators are essential for advancing robotic manipulation, providing a fast, low-cost alternative to real-world testing (Collins et al., 2021; Featherstone, 2014). In recent years, arguably the most efficient and accurate simulation for 3D deformable objects is achieved by the material point method (MPM) (Jiang et al., 2016). Like most well-known methods for simulating continuum matter, such as position-based dynamics (PBD) and smoothed particle hydrodynamics (SPH) (Yin et al., 2021), MPM represents the object as Lagrangian particles and keeps track of their positions and velocities. Unlike pure Lagrangian methods, MPM achieves faster simulation by computing the motions, deformation gradients, and frictional contacts on a background Eulerian grid (akin to the finite element method), governed by elastic and plastic energy functions and Newton’s laws in the form of partial differential equations (Gao et al., 2017; Hu et al., 2018; Jiang et al., 2016; Stomakhin et al., 2013). MPM has been proven superior to other methods in terms of efficiency and visual effects for objects that undergo large deformations, fractures, and self-collision (Hu et al., 2018). Despite its advantages, accurately identifying physics parameters for real-world robotic manipulation of elastoplastic objects remains an open challenge.
2.3. Deformable object system identification
Unlike rigid bodies (Chen et al., 2022; Heiden et al., 2022; Jaques et al., 2022), identifying the physics properties of deformable objects is more complex than for rigid bodies, primarily due to the difficulty in measuring key parameters like material properties and friction coefficients (Arriola-Rios et al., 2020; Yin et al., 2021). Previous research has mostly focused on linear and planar deformable objects, such as ropes and cloths (Caporali et al., 2024; Sundaresan et al., 2022; Yang et al., 2022).
Early works that sought to identify 3D volumetric material properties have much simpler assumptions and scenarios. For instance, one of the earlier works uses gradient-based optimisation to retrieve the stiffness of a spring system representing elastic deformable objects (Lloyd et al., 2007), while another work uses an exhaustive search method to find the value of Poisson’s ratio for an elastic form object (Güler et al., 2017). They focus on single parameter identification for volumetric elastic deformable objects with reduced Dofs that under-represent the geometries and deformation behaviours of real-life objects.
Recent efforts like GradSim (Murthy et al., 2020) have used differentiable physics and rendering to identify five parameters for elastic objects with much higher DoFs from single-view simulation videos, demonstrating the feasibility of differentiable system identification with synthetic videos. Two following works, PAC-NeRF (Li et al., 2023) and LPO (Kaneko, 2024) propose to jointly reconstruct object geometries (shapes, positions and colours) and physical properties by using a voxel neural radiance field (Sun et al., 2022) that performs differentiable rendering and allows gradients to be back-propagated from the image space to the Eulerian grid.
Compared to these works that focused on simulations, our study tackles the more challenging task of system identification for real-world objects through robot interactions. Similarly, we also employ the material point method (Hu et al., 2018) and DiffTaiChi (Hu et al., 2020) for differentiable physics simulation and study the system identification task without known object geometries. However, we aim to align simulations with real-world dynamics with minimal robot interactions, using only incomplete, occluded, and noisy point cloud data.
3. Method
3.1. Overview
The proposed differentiable physics-based system identification (DPSI) framework, as shown in Figure 2, can be divided into the following modules: differentiable dynamics modelling, real-to-sim object and trajectory reconstruction and the loss functions. This section starts with a formal problem description and elaborates on each of these modules. The overall workflow of the proposed differential physics-based system identification (DPSI) framework. Modules in each colour are elaborated in individual subsections of the Method section. Green: differentiable dynamics modelling and the physics parameters. Orange: real-to-sim object reconstruction. Blue: real-to-sim trajectory reconstruction. Pink: optimisation and evaluation loss functions.
3.2. Problem statement
We define the physics parameter identification problem for real-world deformable materials as follows. Denote
With these notations, the optimisation problem could be formulated as the minimisation of some distance function between the real-world and simulated observations after manipulation. As the parameters are optimised with a dataset of interaction experiences with the real objects,
The rest of this section will discuss the differentiable simulation dynamics f
sim
, the reconstructed object state,
3.3. Differentiable dynamics modelling
3.3.1. Material point method (MPM)
This work employs the mean-least-square material point method (MLS-MPM) to simulate the deformable object manipulation dynamics. MPM is a meshless, hybrid computation scheme that enables efficient computation and preserves high physical fidelity for various materials, especially for elastoplastic materials that undergo large deformation, while the MLS-MPM arises from a novel weak form discretisation of the conservation equations and replaces the shape functions in the force computation with MLS approximators, leading to faster and more realistic simulation of sharp separation of particles and two-way coupling with rigid objects that traditional MPM cannot simulate.
We modified the standard procedures in one simulation step of the MLS-MPM as shown below to incorporate end-effector control following Xian et al. (2023). The readers are referred to the original paper for more details about MPM (Stomakhin et al., 2013) and MLS-MPM (Hu et al., 2018). 1. Compute particle deformation gradient using the MLS approximation equation. 2. Applied plasticity to recompute deformation gradients and particle stress using the elastic energy model 3. Particle to grid. Use the affine particle-in-cell transform (Jiang et al., 2015) to transfer the velocities and masses of the Lagrangian particles to the background Eulerian grid nodes. In our implementation, we assume equal volume and mass for the particles. 4. Update end-effector positions given control inputs. 5. Compute grid node momenta and velocities with gravity applied. 6. Signed-distance field (SDF)-based collision detection with rigid objects (end-effector and boundaries) and frictional contact computation. 7. Grid to particle. Use the affine particle-in-cell transform to transfer the velocities and affine coefficients from the grid nodes to the particles. Perform SDF-based collision detection and frictional contact computation again to minimise particle penetration. 8. Update particle positions with the new velocities.
3.3.2. Elastoplasticity
We assume strain elastoplasticity for the studied objects, meaning that the strain-stress relationship of the material is described using the deformation gradient that can be decomposed into elastic and plastic parts,
For the n-th simulation step, we compute a trial elastic deformation gradient,
In this work, we follow Gao et al. (2017) to use the von Mises model (Jones, 2009) to compute the return mapping, which takes on the associative plastic flow assumption. The projection process of the trial stress outside of the yield region can be described succinctly as
Note that the Hencky strain formation is only used in plastic response computation. At each simulation step, we compute the trial deformation gradient using the MLS approximation equation, apply plastic response, and then compute the new stress with equation (5).
3.3.3. Frictional contacts
Collision detection is done by checking the distance of the particles to the surface of the rigid objects using pre-computed signed distance fields (SDFs). For each rigid object (table, and three end-effectors in our case), we employ the procedure described in Park et al. (2019) to generate SDFs for watertight meshes. We assume that the frictional contacts only happen in two cases: between the particles and the table, or between the particles and the end-effector. We also assume uniform friction coefficient distribution over the contact surface of the table and the end-effectors. When a particle or grid node and an object are in contact, we follow Stomakhin et al. (2013) to determine the velocity of the particle and grid nodes using dynamic friction with sticky impulse.
Specifically, for each particle, we calculate the local normal
3.3.4. Differentiable programming
Several programming tools are available for creating differentiable simulations, such as PyTorch (Arnavaz et al., 2023; Sundaresan et al., 2022), DiffTaiChi (Hu et al., 2020), Jax (Schoenholz and Cubuk, 2020) and neural-network-based simulators (Heiden et al., 2021). We build our simulator based on DiffTaiChi due to its automated differentiation mechanism, GPU-accelerated parallel computation, fast computation kernel evaluation, intuitive Python APIs, rich community support and various promising applications (Huang et al., 2021; Lin et al., 2022; Xian et al., 2023).
In particular, for each numerical computation step, DiffTaiChi flattens its computation branches (e.g., boundary and collisions) and replaces mutable local variables with extra local storage variables, producing straight-line codes without mutable variables. It then uses standard source code transform to generate the derivative function based on the adjoint method. To compute gradients with a loss function, DiffTaiChi records the order of computation kernels and the scalar variables in the forward simulation direction, and then it computes the gradients for the concerned variables by evaluating the derivative functions in the reversed simulation direction (Hu et al., 2020). We build a simulator based on this programming language to allow automatic gradient computation for the physics parameters.
In summary, the elastic and plastic models describe the deformation behaviours of the object using Young’s modulus E, Poisson’s ratio ν and yield stress σ y , the computation of the particle and grid velocities will involve another parameter, the object density ρ, and the frictional contact processes are controlled by respective friction coefficients: table friction coefficient μ t and manipulator friction coefficient μ m . We optimise these six parameters to align the simulation to the real-world dynamics.
3.4. Real-to-sim object reconstruction
3.4.1. Real-world platform
To collect real-world data, we set up a deformable object manipulation system where a Kuka IIWA LBR 14 industrial arm (Kuka, 2024) is equipped with a Zivid One + medium camera (Zivid, 2024) for perceiving the object, and one of the three end-effectors, namely a rectangular cuboid, a cylinder roller and a bullet-shaped object for collecting interaction experiences with different contact geometries, as shown at the bottom-left of Figure 1. We use plasticine (non-hardening modelling clay) as the main material for our experiments and cloud slime and soil for generalisation experiments. The clay and cloud slime are purchased from Amazon.2
3.4.2. Real-world perception
We create a multi-view point-cloud capture and fusion process to obtain the real-world observation of the deformable object
These observations are noisy and incomplete in three senses. Firstly, due to camera calibration error, the fusion result always exhibits a ∼±3 mm discrepancy. Secondly, due to joint limits, the robot arm cannot reach poses that allow the camera to capture the bottom part of the object. To simplify the problem, we assume that the angle between the object boundary and the table surface is equal to or greater than 90°, which allows the camera to capture as much as possible the bottom part of the object. We then project all points to its bottom to form a closed surface. Note that the initial configuration of the object can be manually shaped to satisfy this assumption of contact angle but the end configuration of the object after being manipulated is out of manual control, which tends to have more occlusions. Lastly, we only take the observations before and after applying the manipulation motions, without providing the intermediate observations during the manipulation, because it is impractical to do so when too much occlusion occurs during manipulation.
3.4.3. Reconstruction pipeline
To simulate the object as a set of particles, we design a pipeline to reconstruct the object particle system
For the end-effectors, we assume they are rigid bodies. In simulation, we keep track of the coordinate of its frame and a pre-computed SDF for collision detection.
3.5. Real-to-sim trajectory reconstruction
3.5.1. Manipulation motions
Waypoint designs and statistics of the interaction motions. For collecting optimisation data, there are two motions for each of the two levels of contact complexity. For out-of-distribution validation, one motion per end-effector is designed. Directions of the waypoints are relative to the robot base frame, as shown in Figure 1. Each waypoint of the real trajectories takes an uneven time interval, while each waypoint in simulation takes exactly 0.01 s.
The first contact level focuses on identifying the parameters that primarily govern the deformation of the object (Young’s modulus, Poisson’s ratio, yield stress and material density) using two poking motions, which press the object down by a certain distance. The second level further includes the friction coefficients, using two poking-shifting motions, which press down the object and make horizontal shifting movements. For out-of-distribution validation, three longer motions with more drastic contact processes are created. Each motion uses a different end-effector. The triple-poking motion is designed for the round end-effector to validate the long-term deformation prediction with small fraction influences. The flattening motion is designed to validate the long-term deformation and frictional contact prediction under large linear movements, using the cylinder end-effector. The pressing-180-rotating motion is designed to validate the long-term deformation and frictional contact prediction under large rotational movements using the rectangle end-effector.
All motions start from a configuration where the end-effector tip is positioned at the top centre of the object. ROS (Quigley et al., 2009) and the MoveIt!-based (Görner et al., 2019) OMPL planner (Sucan et al., 2012) are used to plan real-world motion trajectories τ real . For calculating each motion plan, we pass a series of waypoints to the MoveIt! planner by discretising each segment of the motion with an interval of 0.002 m or 5°.
A challenging phenomenon that occurs during our data collection process is that the object tends to stick to the end-effectors after contact is made. If we allow this to happen and assume that the object always drops down eventually, the optimisation process will be extremely difficult due to a large uncertain dynamics process of the object dropping from the air. For example, the object may bounce out of the workspace or even off the table. This further exacerbates the difficulty of optimising with only the start and end observations of the object. However, it is beyond the capability of current perception hardware and, thus, a future research direction. To simplify the problem, we take a simple workaround by covering the end-effector with a thin layer of flour before each manipulation motion is executed. We sink the effector into the flour and flick it gently to ensure that it is only covered by a thin layer of flour. This greatly prevents the sticking phenomenon from happening. Note that the end-effector friction coefficient we are optimising for is then the one covered by flour instead of the original value.
3.5.2. Time duration constrained reconstruction
To simulate end-effector motions, the real-world trajectory generated by MoveIt! is inconvenient as it has uneven time differences between consecutive waypoints and the simulation can only handle a constant step size dt. Thus, for a motion segment between a pair of waypoints, we reconstruct its simulation counterpart to have constant velocity by dividing the travelled distance by the real-world time duration, which is provided by the motion planner, and then we discretise the segment with a constant dt. In this study, we set dt = 0.01 second for better simulation stability. A larger dt will result in a too-high compounding error during simulation stepping, while a too-small dt will demand too much computation. The statistics of the real and reconstructed motions are summarised in Table 1. It can be seen that the validation motions are much longer than the optimisation ones. The MoveIt! trajectories and the reconstructed ones are saved as .npy files and will be open to the public upon acceptance.
3.6. Loss functions
We use four loss functions to calculate the difference between the simulated and real object states. As we are dealing with points and particles, it is natural to select point-based metrics. Therefore, we employ the two most common distance metrics for point sets, namely, the Chamfer distance (CD) and the earth mover’s distance (EMD). Given two point sets
To calculate the loss between the real and simulated end configurations, one can use the original fusion point cloud
Finally, although the CD and EMD losses are very common in calculating point set distance, they are not intuitive to visualise and understand. Also, as revealed by the experiment results shown in Section 4.3, the CD and EMD losses focus on different spatial aspects of two point sets, which make them biased for result analysis across all loss functions. To make results easier to analyse and compare in a more unbiased way, we further calculate the heightmaps of the fusion point cloud and the simulated particles, denoted as
In practice, we use the fusion point cloud without downsampling to compute
4. Results
This section presents the design of the experiment and the result analyses including the optimisation and generalisation performances of the DPSI framework in identifying the physical characteristics of elastoplastic matter through simple robot manipulation.
4.1. Experiment design
The performance of the DPSI framework is examined in three steps: loss landscape analysis, in-distribution performance analysis and out-of-distribution generalisation analysis. The first step visualises the four loss functions (PCD-CD, PRT-CD, PCD-EMD and PRT-EMD) to help understand how the physics parameters are related to the loss value distributions. The second step investigates the in-distribution performance of the proposed DPSI framework, that is, whether it can produce realistic simulations for unseen object configurations with the same motions used in parameter identification. The loss landscape and in-distribution performance analyses are carried out at two levels of contact complexity. At each level, the performances are compared across four loss functions and five optimisation datasets (see data collection description below). The last step explores the out-of-distribution generalisability of the DPSI by inspecting the simulation accuracy of three unseen longer motions using identified parameters from the contact complexity level 2 experiment.
4.1.1. Data collection
For each optimisation motion (i.e., Poking-1, Poking-2, Poking-shifting-1, Poking-shifting-2), we collect datapoints by executing the motions on the objects with different end-effectors. Each datapoint contains the multi-viewpoint point clouds of the object before and after a manipulation motion. For each of the two complexity levels, five optimisation datasets are created to examine the data hungriness of the proposed DPSI framework.
With level (Lv.) 1 as an example, the first dataset is created with both motions (Poking-1 and Poking-2). Each motion is performed twice using three different end-effectors. Therefore, 6 datapoints are created for each motion, and a total of 12 datapoints are created for both motions. The second dataset for Lv. 1 is similar, except that only one datapoint is created from each motion-effector pair, resulting in three datapoints for each motion, and a total of six datapoints for both motions. The other three datasets for Lv. 1 are more straightforward, with only one datapoint collected for each dataset. The three datapoints are created using the three end-effectors individually, performing the second motion from Lv. 1 (i.e., Poking-2). We name these datasets as 12-mix, 6-mix, 1-rectangle, 1-round and 1-cylinder, respectively. Data collection for Lv. 2 is similar, except that the two motions of Poking-1 and Poking-2 are replaced by Poking-shifting-1 and Poking-shifting-2, respectively. The datasets will be open-sourced upon the acceptance of the article.
For in-distribution validation, with each contact complexity level, we collect two extra datapoints from the second motion (i.e., Poking-2 and Poking-shifting-2) with all three end-effectors, resulting in 12 validation datapoints. For out-of-distribution validation, where we collect two datapoints with each of the three long-horizon motions, resulting in six datapoints.
For each datapoint, the real-world object is initially roughly shaped into a convex shape. We then acquire its point cloud by fusing the captured multi-view data. Next, we calculate the coordinates of the top centre of the object, where the end-effector tip will be moved to, and the motion will always be executed from there. We perform the same point cloud capturing and fusion process after a motion is executed. Each datapoint takes no more than 2 min to collect, with most of the time spent on capturing point clouds.
4.1.2. Optimisation and validation
At each contact level, for each pair of loss function and dataset, gradient descent is carried out with the Adam algorithm (Kingma and Ba, 2015) with three random seeds for 100 iterations (gradient updates). For each datapoint, the simulation loads the initial particle configuration, simulates the motion and produces the resultant particle state, which is used to compute losses and gradients. Each optimisation iteration goes over all datapoints within a dataset and takes the average gradients to update the parameters. In-distribution validation is done after every gradient update with the validation dataset, simulating all datapoints and calculating the losses.
Step sizes and the ranges of values for the parameters: Young’s modulus E, Poisson’s ratio ν, yield stress
4.2. Loss landscape analysis
The loss landscapes, including point-cloud and particle chamfer distance (PCD CD, PRT CD), point-cloud and particle earth mover’s distance (PCD EMD, PRT EMD), are computed over three pairs of physics parameters: (E, ν), ( Loss landscapes (centralised to have zero means) at level-1 (left) and level-2 (right) contact complexity over pairs of physics parameters. Darker colours represent higher loss values. Parameters: Young’s modulus (E) against Poisson ratio (ν), yield stress (
Firstly, we compare the landscapes in Figure 3 vertically to examine the sensitivity of the loss functions against different parameters. At contact complexity level 1, it shows that the CD and EMD losses exhibit quite similar value changing directions along the E, ν and ρ axes, while quite the opposite directions along the
These observations indicate that, with the adopted von Mises plasticity model, whose plastic deformations are governed by the yield stress (
Secondly, by comparing the landscapes horizontally, Figure 3 shows the distributional patterns of most (not all) loss landscapes are quite similar across the five datasets. This means that the loss values, and thus the optimisation processes, are not sensitive to the number of datapoints used in computation. This observation allows us to expect the recovery of physics parameters with small data, even with a single interaction experience.
Thirdly, many of the loss distributions display large areas of flat regions, where the loss values are very similar. This indicates many optimisation saddle points and that multiple parameter value combinations may serve as plausible solutions, describing the same physical characteristics. Whether this is an issue from an inaccurate physics model or a general fact of the real-world dynamics remains to be determined.
4.3. In-distribution performances
This subsection investigates the in-distribution performance of the parameter identification task through gradient descent at two levels of contact complexity. For each level, experiments are conducted with the four loss functions and five datasets. To evaluate the performances thoroughly, the four loss functions and a heightmap-based distance function are used to evaluate the differences between the real and simulated manipulation results using the in-distribution validation dataset. This enables the observation of the influences on other distance functions from minimising each objective. In particular, this subsection investigates the following questions. • Can the loss functions be minimised (does optimisation converge)? • Do the loss functions agree with each other in such system identification tasks? • If local minima appear and multiple solutions exist, do they produce visually distinct manipulation results? • How does the number of datapoints affect optimisation? • Does DPSI produce parameter values that are physically realistic and interpretable? Are they close to the parameter values reported in relevant literature and studies?
4.3.1. Quantitative results
We start by analysing the quantitative results of the parameter identification task. The changes of the validation losses and parameter values over the course of optimisation at both contact complexity levels are shown in Figures 4 and 5. The best parameter values corresponding to the lowest validation heightmap loss among the three random seeds are summarised in Table 3. Validation losses (top five rows) and parameter values (last four rows) over 100 gradient updates at level-1 contact complexity. Each column presents the results of optimising with a different dataset. Each row shows the changes of an evaluation metric or a parameter, denoted on the left. In each figure, different colours indicate the results of minimising a different loss function, as labelled by the legend on the top. For the top five rows, each colour has three dotted lines corresponding to the results of three random seeds and a solid line corresponding to their means. For the last four rows, each colour has three solid lines corresponding to the results of three random seeds. Validation losses (top five rows) and parameter values (last six rows) over 100 gradient updates at level-2 contact complexity. Each column presents the results of optimising with a different dataset. Each row shows the changes of an evaluation metric or a parameter, denoted on the left. In each figure, different colours indicate the results of minimising a different loss function, as labelled by the legend on the top. For the top five rows, each colour has three dotted lines corresponding to the results of three random seeds and a solid line corresponding to their means. For the last four rows, each colour has three solid lines corresponding to the results of three random seeds. The parameter values corresponding to the lowest validation heightmap distance in each optimisation case at both contact complexity levels. Bolded texts denote the three lowest losses at each level. It can be seen that the PRT EMD loss tends to produce the best results and DPSI can find parameters that outperform hand-picked parameters (from the geotechnical literature) with even just one interaction datapoint.

First of all, we start by observing the tendency of convergence. The top five rows in Figures 4 and 5 show that, at both contact complexity levels, most of the individual validation loss curves (dotted lines) tend to stabilise and converge, which indicates that DPSI can effectively converge to local minima. The last four rows in Figure 4 and the last six rows in Figure 5 reveal that the parameters converge to different solutions. Table 3 also shows that multiple parameter solutions exist for a low validation heightmap distance at both levels of experiments. These observations mean the parameter identification task at both contact complexity levels does converge but has multiple local minima and solutions, aligned with the large flat regions observed from the loss landscapes.
However, the parameter distributions in Figure 5 clearly show that some parameters converge to much smaller and distinct value regions than those found in the level-1 experiments. More specifically, the values of Young’s modulus (E), Poisson’s ratio (ν) and yield stress (
Also, some parameter values are closer to empirical studies of soft/hard/saturated clay in the level-2 experiments. For example, Young’s modulus values are closer to the reported range of 5000 to 54000 kPa, the material density seems to be closer to 1400 kg/m3, and Poisson’s ratio is closer to the reported range of 0.4 to 0.5 (Waheed and Asmael, 2023; StructX, 2014–2024). However, the recovered yield stress values are higher than the reported values for clays (210 to 600 kPa) (Rehman et al., 2018) because the reported values were found to make the material collapse in simulation. Based on the empirical values from Waheed and Asmael (2023); StructX (2014-2024) and the loss landscapes, we hand-pick a set of parameter values and calculate the validation losses at both levels. As denoted in Table 3, the resultant performances are reasonable but not as good as the performances of the DPSI method.
Secondly, the top five rows in Figure 4 reveal that, in most cases, the CD and EMD losses have a negative correlation. For example, the red curves of the first column show that minimising the PRT EMD loss reduces both EMD losses and the heightmap distance, but increases both CD losses. Also, Table 3 shows that optimising the CD losses tends to produce large yield stress values (
Thirdly, the results show that it is possible to obtain comparable performances with just one datapoint at both contact complexity levels. This can be concluded by comparing the results horizontally: (1) most validation curves of the top five rows in Figures 4 and 5 with 1-datapoint show highly similar tendencies with the 6-mix and 12-mix datasets, and (2) the last four rows in Figure 4 and last six rows in Figure 5 also show that the parameters found with different datasets mostly converge to similar value regions. Moreover, Table 3 shows that the lowest validation heightmap distances at level-1 contact complexity and the third lowest at level-2 are achieved by optimising with the 1-rec. and 1-round datasets. These results are aligned with the analysis of the loss landscapes, where similar loss distributions are observed among different datasets.
4.3.2. Qualitative results
We now examine the manipulation processes simulated using the best physics parameters corresponding to the lowest validation heightmap losses presented in Table 3. Figure 6 shows the resultant particles and heightmaps after the manipulation of objects at two contact complexity levels with three end-effectors (two object configurations are included for each contact level and end-effector). The particles and heightmaps of the objects after applying the second motions at contact level 1 (a and b) and 2 (c and d), simulated with the best set of physics parameters. Darker colours of the heightmaps indicate greater heights of the object. For each combination of the loss function and dataset at each level, three trajectories on two object configurations are simulated (3 effectors × 2 datapoints). There are three groups in each subfigure, with the ground truth point clouds and heightmaps placed in the leftmost column. In each group, a row shows the results corresponding to a loss function and a column results corresponding to a dataset. (a) Level-1 contact, object configuration 1; (b) Level-1 contact, object configuration 2; (c) Level-2 contact, object configuration 1; and (d) Level-2 contact, object configuration 2.
Firstly, all the simulated particles and heightmaps post-manipulation are highly similar to the ground truths across the loss functions and datasets. This shows that minimising any of the four loss functions with any of the five datasets can indeed reproduce visually plausible manipulation results close to the real-world system. It indicates that DPSI is not data-hungry and it is robust to the choice of common point-based loss objectives. These characteristics are highly preferred in robotic applications, as they lead to smaller data collection costs, simpler observation preprocessing, and simpler loss function engineering.
Secondly, by examining the details of the heightmaps more carefully, it shows that optimising the CD losses tends to produce heightmaps with darker colours (which represent greater heights of the object) while optimising the EMD losses tends to produce heightmaps where the objects look slightly bigger and wider. This is especially true at level-1 contact complexity, where the heightmaps in the first two rows for each object configuration in Subfigures 6(a) and 6(b) are generally darker than those in the last two rows (see Figure 7 for enlarged examples). At level 2, the PRT CD loss sometimes changes its focus on the x and y directions, producing lighter-colour heightmaps (see the second rows of the heightmaps associated with the rectangle and round end-effectors in Subfigures 6(c) and 6(d)). This observation indicates that optimising the CD losses tends to produce particles that match the height of the ground truth shapes, that is, in the z direction, while optimising the EMD losses tends to match the x and y directions. Selected examples from Figure 6(b) to show that the CD losses produce darker-coloured heightmaps while the EMD losses produce lighter-coloured and wider heightmaps, which can be intuitively explained by the optimised yield stress values and their potential difference in spatial focus.
This may be caused by the fact that these losses distribute differently along the yield stress axis (the CD and EMD losses at level 1, the PRT CD loss at level 2). According to physics intuitions, a smaller yield stress will cause the object to yield more easily and respond more drastically to the poking forces, hence the more spreading in the x and y directions and more compressing in the z direction, and vice versa. One can see that a greater yield stress value in Table 3 corresponds to a darker heightmap in Figure 6, and vice versa.
Thirdly, though the particles and heightmaps are very similar, they are produced by quite different parameter combinations. From the results at contact level 2 in Subfigures 6(c) and 6(d), one can observe a correlation between the friction coefficients and the material density—the three key parameters that determine how much the object will be moved in the shifting motion direction. The parameters in Table 3 and their corresponding visualisations exhibit the following physically plausible relationships (see Figure 8 for a few examples): 1. With similar weights (ρ), the object is moved at a longer distance when the manipulator frictions (μ
m
) is greater (compare the parameters and visualisations between the results from the PCD EMD loss with the 12-mix and 1-rec. datasets) 2. With similar manipulator frictions (μ
m
), the object is moved further when the table friction (μ
t
) is low (compare the parameters and visualisations between the PCD CD loss with the 1-round dataset and the PRT EMD loss with the 12-mix dataset) 3. With similar friction values, the object is moved further when it is lighter (compare the results from the PCD EMD loss with the 12-mix and the 1-cyl. datasets) Selected examples from Figure 6(d) to show the physically realistic associations between object movements and three physics parameters, demonstrating the physical plausibility of different physics parameter values.

These relationships between parameters and visualisations align with our understanding of real-world physics, demonstrating that DPSI can produce physically plausible and interpretable parameter values. Moreover, the variations in parameter values suggest that, aside from the mismatch of the models and the materials, there is hardly any fixed set of parameter values that can simulate a material in every condition. These parameters will always have some differences in different conditions (e.g., heat, light, and humidity) and they change over time (e.g., drying). Therefore, the fact that DPSI can estimate the parameters with only one interaction makes it highly promising for fast parameter re-identification that can be performed online to recalibrate simulation accuracy efficiently, eliminating the need to keep track of previously identified parameter values.
Fourthly, the visualisations in Figure 6 also reveal limitations and potential improvement directions of the physics model. By comparing with the ground truths, the simulated contact areas of the objects always deform more sharply with insufficient elastic returning. The real objects, on the other hand, tend to respond more elastically after being plastically deformed, hence the higher and smoother surfaces shown in the ground truths. This is more obvious for contact areas with sharp edges, such as the four edges of the rectangle end-effector or the sharp sides of the cylinder end-effector. In addition, at level-2 cases, with the effectors shifting horizontally, some simulated particles are unrealistically displaced and stay floating. In the real world, the displaced parts would fall because of gravity. What’s more, the shifting motion in the real world causes the whole object to tilt in the moving direction, while in the simulation the contact impact tends to remain in a much smaller region around the contact area. These inconsistencies can be caused not only by inaccurate modelling, but also by various computation approximations, such as time integration method, simulation step size, and contact handling.
4.3.3. Summary
In short, this subsection presents a detailed examination of the in-distribution performances of DPSI with motions at two levels of contact complexity. By analysing the validation loss curves, the parameter values and their manipulation visualisations, we can answer the proposed questions: • The loss functions can be minimised, demonstrating the feasibility of DPSI even in the presence of noisy and incomplete point cloud observations. • The loss functions produce similar visualisations, but do not always agree with each other on the found parameter values. The CD and EMD losses seem to focus on quite different spatial aspects of the point sets. They distribute differently along the yield stress ( • There are many local minima and possible parameter solutions, but they produce visually and physically similar manipulation results with physically intuitive interpretability. • The number of datapoints has a minor influence on the optimisation performances both quantitatively and qualitatively, indicating that DPSI is not data-hungry even in the presence of real-world perception challenges. • Discussions on the visualisations and parameter values show that DPSI can produce physics parameter values with physically realistic and interpretations.
4.4. Generalisation
This subsection will look at the out-of-distribution performance of DPSI by visualising the manipulation processes of three unseen and much longer motions that induce more complex contact dynamics with the best parameters (Table 3) found at the level-2 contact complexity identification task.
The lowest pixel-wise heightmap distances (in mm) achieved by the three unseen motions (each with two object configurations), simulated by the best parameters found at the level 2 contact complexity identification task.
Bolded texts denote the top three lowest losses. Underlined texts denote the losses achieved by the cases which have achieved the top-three lowest in-distribution validation performances.

The long-horizon manipulation motions. Every subfigure visualises from top to bottom the real manipulation process and its simulations with the top two best DPSI-recovered parameters and hand-picked parameters. (a) Triple-poking, object configuration 1; (b) triple-poking, object configuration 2; (c) flattening, object configuration 1; (d) flattening, object configuration 2; (e) poking-180-rotating, object configuration 1; and (f) poking-180-rotating, object configuration 2.

The particles and heightmaps of the objects after applying the long-horizon unseen motions, simulated with the best set of physics parameters found at the level 2 contact complexity experiments. Darker colours of the heightmaps indicate greater heights of the object. For each motion, two object configurations are manipulated with three end-effectors. The results for each trajectory are grouped with the ground truth placed on the left. In each group, a row shows the results corresponding to a loss function and a column results corresponding to a dataset. (a) Object configure 1 and (b) object configure 2.
Firstly, in Figure 9, (as well as the animation video), it can be seen that, for all motion and object configuration pairs, the real and simulated trajectories are highly similar. Figure 10 also shows that, although discrepancies exist, the simulated particles and their heightmaps of the manipulation results are also highly similar across the loss functions and datasets. This visual similarity proves that DPSI can recover physics parameters from simple interactions that allow the accurate simulation of unseen, longer, and more complex manipulation motions, exhibiting robustness against perception and data challenges. Moreover, we note that it is the single-datapoint cases that achieve 11 out of 18 top-three lowest heightmap losses across the three unseen motions as shown in Table 4, where 2 achieved by the hand-picked parameters and 5 by the 12 mix and 6 mix datasets.
Secondly, the found physics parameters retain their physical meanings and interpretability for the unseen and more complex motions. Evidence supporting this can be found in Figure 10 produced by different parameters (see Figure 11 for selected examples). • The first example is that when the yield stress ( • Another example is that, when moving the cylinder end-effector along the x axis after pressing down the object, a higher manipulator friction coefficient (η
m
) causes larger whole-body displacement of the object, while a smaller value causes the effector to slip away and compress down the particles along its moving direction. This is shown more clearly in the bottom-right case (cylinder end-effector) of Figure 10, where the resultant object bodies that are more concentrated to the heightmap centres correspond to larger manipulator friction coefficients, while the objects whose upper part of the heightmaps are flattened correspond to smaller one. • The last example is that, for the poking-180-rotating motion results shown at the top of Figure 10, when the objects are more rotated by the end-effector, at least two of the following three conditions can be verified true from Table 3 at the same time: a relatively small material density, a relatively small table friction coefficient, and a relatively large manipulator friction coefficient. This is physically intuitive as any two of the three conditions would make the object easier to move. Selected examples from Figure 10 to show the physically realistic associations between the manipulation results and the found parameter values. Comparisons to the ground truths reveal the modelling limitations (insufficient elastic returning).

Thirdly, the found solutions are not globally optimal. Despite the visual similarity, Table 4 shows that the best generalisation results are not always achieved by the cases with the best in-distribution validation performances, as the underlined and bolded texts rarely coincide. In addition, though the hand-picked parameters do achieve the best results in two out of six object simulations, they are not the best fit for every scenario. This again suggests the importance of re-fitting the parameters on the fly for every new scenario as there are hardly two objects that are identical especially for elastoplastic materials. Qualitatively, from the heightmaps and trajectories in Figures 9 and 10, one can also recognise certain visual discrepancies between the real objects and their simulation counterparts, indicating a certain degree of mismatch between the found parameters and the actual solutions. This could be improved by further diversifying the data (and the motions that produce them) used for optimisation. Other directions for improvement may include better observational noise removal or better physics model selection/approximation.
Finally, the modelling limitations observed in the in-distribution visualisation still exist in the unseen motion visualisations (Figures 9 and 10). The insufficient elastic returning at contact areas, especially with sharp manipulator edges, can be seen from all visualisations where the primary contact areas of the object rarely return to heights close to the real object (compare also the enlarged heightmaps in Figure 11). Floating displaced particles can be observed from the poking-180-rotating motion visualisations, where many particles are carried away during rotations. The insufficient spreading of the influence of contacts can be recognised from the first object configuration of the flattening motion visualisations where the object is not tilted enough in the second half of the trajectory. See Figure 12 for enlarged examples. Selected frames from two trajectories in Figure 9 to show modelling limitations (floating particles and insufficient spreading of contacts). The areas around the green cycles show that the second part of the flattening motion does not tilt the object up, and there are floating particles in the poking-180-rotating motion.
In summary, the results and discussions above strongly support the argument that DPSI can recover physics parameters that produce the accurate simulation of unseen, longer, and more complex deformable object manipulation dynamics by optimising with very few real-world noisy and incomplete point-cloud data collected by simple and short interacting motions.
4.5. Cloud slime and soil materials
This subsection reports the DPSI performances with cloud slime and soil materials. Note that we use the same elastoplastic physics models in simulation. To do so, we collect a single interaction data point for each material using the cylinder end-effector and the poking-shifting-2 motion, optimise the parameters with the PRT EMD loss from a randomised initialisation for 100 iterations with three random seeds, and validate the results on another data collected with the flattening motion and the cylinder end-effector. The real manipulation trajectory of the flattening motion and its simulations based on the found parameters are shown in Figure 13. The recovered parameters are presented in Table 5. Comparison of the real manipulation trajectories and the resultant height maps for the cloud slime and soil materials and their simulation counterparts based on the recovered physics parameters. In each subfigure, the second to last rows correspond to the results with three random seeds. (a) Cloud slime and (b) soil. Physics parameters recovered by the DPSI method for cloud slime and soil materials with three random seeds.
Figure 13 shows that DPSI manages to recover parameters that simulate cloud slime and soil as closely as possible despite the mismatch of the physics models and the target materials. Especially in the case of soil, Table 5 shows that DPSI finds large table friction coefficients and small end-effector friction coefficients to simulate the effect of the soil being flattened, although these are likely not close to the actual values. This suggests that DPSI will seek to find parameters that make the deformations of the simulated object as realistic as possible, regardless of whether the physics models are a good match for the real materials. On the other hand, this also suggests that the manual selection of the physics models is important for DPSI to produce the best deformation simulation.
4.6. Cause and treatment to local minima
As shown by the loss landscapes and the optimisation results, although DPSI can find good solutions, it could get stuck in local minima. Therefore, we performed extra experiments to optimise the PRT EMD loss with the three 1-datapoint datasets at contact level 1 with eight more random seeds to examine the random seed sensitivity of DPSI. The results are shown by the black dotted lines in Figure 14. Validation losses (top five rows) and parameter values (last four rows) over 100 gradient updates at level-1 contact complexity from optimising the loss functions with an extra set of 8 random seeds. Solid lines (highly overlapped) are results with the same initial solution, while dotted lines show results of optimising the PRT EMD loss with randomised initial solutions. Each column presents the results of optimising with a different dataset. Each row shows the changes of an evaluation metric or a parameter, denoted on the left. In each figure, different colours indicate the results of optimising a different loss, as labelled by the legend on the top.
First of all, these dotted lines show consistent results with the previous conclusions: they converge to stable (though different) loss and parameter values, still reveal the negative relationship between the CD and EMD losses, and can achieve comparable performances with results from larger datasets. On the other hand, it can be seen that the main cause of converging to different solutions is the different initial parameter values, which were determined by the pseudo-random process controlled by different random seed values. This is not surprising as gradient-descent algorithms with complex physics problems are known to be sensitive to the choice of the initial solution (Antonova et al., 2023; Hu et al., 2020).
Hand-picked physics parameter values based on the loss landscapes, published literature and open-source website in geo-technical engineering (Rehman et al., 2018; Waheed and Asmael, 2023; StructX, 2014–2024). The yield stress value is slightly higher than the reported values because the reported values were found to cause the material collapse in simulation.
4.7. Remark on CD and EMD losses
As mentioned previously, the CD and EMD losses seem to focus on different spatial aspects of the object’s geometry, leading to deformations that match either the horizontal or the vertical directions. With the results of optimising the four losses with the same initial solutions shown in Figure 14, we can analyse the differences between CD and EMD losses with hollow or filled point sets.
4.7.1. Hollow or filled point sets
For the CD losses, it can be seen by comparing the blue and orange solid lines in Figure 14. The blue solids in the first two rows indicate that optimising the CD loss with a hollow point cloud target can achieve smaller losses in both CD loss metrics. However, the third to fifth rows in Figure 14 show that optimising CD with a filled particle system target (PRT CD loss) outperforms PCD CD in terms of both EMD loss metrics and the height map distance. On the other hand, optimising the EMD loss with a filled particle system target (red lines) exhibits the same pattern, consistently showing inferior performances in terms of the CD loss metrics while outperforming the PCD EMD losses (green lines) in terms of both EMD loss metrics and the height map distance. Therefore, the results indicate that optimising with a filled particle system target generated from a reconstructed watertight mesh is better than optimising with a hollow point cloud target.
The cause behind it is the inaccurate point-to-point correspondences generated with a hollow point set. As shown by Figure 15, a hollow point set either averages the spatial information of an area of particles over a thin layer of target points in computing the CD loss or ignores the information of a large body of particles within the particle body in computing the EMD loss. On the contrary, with a non-hollow target, the point correspondences connect spatially closer points and do not disregard information within the object body. Correspondence examples for the CD and EMD computations with hollow and filled point sets as targets. Colours indicate corresponding points. Note the CD loss is directional.
4.7.2. CD or EMD
Figure 14 shows that either with a hollow or non-hollow target, the EMD losses (green and red lines) consistently outperform the CD losses (blue and orange lines) in terms of the EMD and height map distance metrics while being inferior in terms of the CD metrics. Again, we believe that this is another consequence of the different ways CD and EMD compute the point correspondences.
Figure 15 shows that, with a hollow target, CD distributes the spatial information of the points within over the surface points, which is problematic. Take the shape in Figure 15 as an example, the particles above the target surface will be forced down, while the particles below the target surface will be pushed up, which is not supposed to happen. On the other hand, the EMD loss does not suffer from this effect because it corresponds the particles to the closest points from the target surface only. With a non-hollow target point set, the CD loss produces different correspondences in the two computation directions, which results in a mixture of correct and wrong spatial relations, while the EMD loss is more likely to find the best match spatially due to the one-to-one correspondence generated by a linear assignment algorithm.
In short, this analysis leads to the conclusion that the PRT EMD loss, which computes point correspondences between the filled simulation particle system and a non-hollow target point cloud, is the most ideal choice for capturing the correct spatial differences between point sets.
4.8. Remark on computation cost
Computational costs (time and GPU memory consumption) of the TaiChi-based differentiable simulator used in the DPSI framework with an Nvidia GeForce RX 4090 GPU. The costs of simulations with three motions and particle densities are reported. cp.: compile-time costs. The first time a program is run takes extra time to compile TaiChi kernels. 100 FB-iterations (min): the runtime of 100 forward and backward computations in minutes.
It can be seen that the computation time and memory increase as the number of simulation particles and the length of motion increase. Because of the pre-compilation of kernels when they are first called in a TaiChi programme, the first forward and backward passes always take much longer and the later repetitive calls are much faster, as denoted by ‘CP. Foward time (s)’.
With particle numbers around 2000 to 4000, a DPSI optimisation run that consists of 100 iterations of forward-backward computations using 1 datapoint can be done in about 10 min. As shown by Figures 4 and 5, most runs converge within 50 iterations, which means that a plausible solution can be found by DPSI in about 5 min. With a good initial guess of the parameters, as shown by the dotted lines in Figure 14, this runtime can be further reduced to around 2 min. In addition, the memory consumption for these simulations is within 3 GB, which permits implementations on portable and small GPUs. With no more than 2 min for the 3D reconstruction pipeline, a parameter identification process could take less than 5 min to complete. These statistics confirm that DPSI meets the criteria in computational costs for practical deployment. It also suggests again that it is practical to perform parameter identification with DPSI at any time without the need to rely on previously found parameters or slow manual measuring techniques.
5. Conclusion
This work addresses the important problem of closing the gap between simulated and real-world manipulations of elastoplastic objects. In particular, we propose a differentiable physics-based system identification framework (DPSI) that can identify physics parameters through gradient descent algorithms. Substantial experiments demonstrate that the proposed framework can identify parameters that reproduce quantitatively and qualitatively realistic elastoplastic object manipulation dynamics in the presence of real-world perceptual and data collection challenges. These challenges include (1) simple and short motions, (2) incomplete trajectories, (3) object occlusions, (4) point cloud noises and (5) small data. The proposed framework is the first example of system identification with differentiable physics-based, particle-based simulation for robotic volumetric elastoplastic object manipulation. It serves as the foundation for faster and more accurate real-world deployments of deformable object manipulation.
In addition, with the use of physics-based dynamics models, the identified parameters are physically meaningful. The experiments reveal that different parameter values found through optimisation can be interpreted in a way that is aligned with our understanding and intuitions about real-world physics. Therefore, the DPSI framework not only gives users confidence in the simulation controlled by the physics models and these parameters but also provides users with intuitive angles to identify the limitations of the reconstructed manipulation dynamics.
5.1. Limitations and future research
Several limitations can be observed from the experiments conducted in this work. The most obvious would be the under-representativeness of the physics models that describe the underlying manipulation dynamics. As discussed in the Results section, there is a lack of elastic returning effect at the areas of contact with sharp edges of objects, the insufficient spread of force impact that causes the deformation to stay near the local contact area, and the artefacts of floating displaced particles. Though these inaccuracies may be negligible for coarse manipulations that appear more often in our daily lives, they would lead to unacceptable solutions for high-precision manipulation tasks such as surgery and soft object assembly. In addition, various approximations in the MLS-MPM and programming implementations also contribute to the simulation inaccuracy. As such, future research is needed to develop more accurate physics models, numerical approximations, and coding implementations to simulate object deformations.
Secondly, an important assumption of this work is that the deformation behaviour of the target material in need of parameter identification can be simulated by the selected elastoplasticity model. Thus, it should be noted that when the target material’s behaviour is largely underrepresented by the selected physics model, DPSI would not be able to produce very realistic simulations. In other words, the manual selection of the elastoplasticity model is of high importance for the proposed DPSI framework. For example, our results with soil reveal that DPSI found parameters to make simulations appear as close to the real materials as possible, but the use of the von Mises plasticity model will make it difficult to recover parameter values that accurately reflect realistic soil or foam behaviours, which are better captured by the DruckerâPrager model.
Thirdly, an important lesson learnt from the results is that increased complexity of motion and contact mode leads to fewer local minima and better system identification accuracy. The number of local minima indicates that the collected data is insufficient to fully induce the correct values of the concerned parameters. Thus, in the future, we envision a better framework that incorporates learning-based approaches to allow the automatic selection of diverse tools and interaction motions to achieve more efficient and accurate parameter identification.
Another limitation comes from the means of capturing real-world observations. In this work, we employ the multi-view fusion point clouds as the observation space, which suffers from noise and the inaccurate estimate of the camera extrinsic matrix. In the future, new methods should be developed to reduce point cloud noises and improve the precision of camera calibration. Lastly, the simulated scene in this work is limited to an end-effector, a table surface, and a target elastoplastic object. Efforts are needed to extend the framework to support more realistic and complex contact dynamics. More experiments are also needed to examine the feasibility of large-scale simulation in terms of efficiency and accuracy.
Supplemental Material
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Engineering and Physical Sciences Research Council (grant No. EP/X018962/1). The authors would like to thank Samuel Moeller and Dr Prasad Rayamane for their help with 3D printing.
Supplemental Material
Supplemental material for this article is available online.
Notes
Appendix
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
