Abstract
This article investigates the task of folding a fabric using a human–robot collaboration (HRC) solution. HRC is a widely adopted concept in the industrial realm, as physical interaction between operators and robots enables the former to focus on high-value tasks and the latter to complete repetitive, physically strenuous, and risky tasks. This article proposes a robust framework, based on Kalman filter pose estimation, for human–robot comanipulation in the folding of fabrics. The framework utilizes the Kinect 2 tracking algorithm to locate the operator and then applies Kalman filtering to eliminate noise and improve the tracking of the operator’s position. The robot reacts to the operator’s movement by moving in the same direction and over the same distance as the hand on each iteration, to assist in the folding of a rectangular fabric. The proposed system was tested in a real use-case scenario, where the collaborative task is the folding of a rectangular fabric, and the presented results show that the system is capable of tracking the human’s motion and assisting the operator.
Introduction
The textile industry has historically relied on manual labor, particularly in developing countries where it is more economical. Advancements in technology and robotics have sparked renewed interest in automating the textile industry, especially in manipulating nonrigid objects. This automation has broad applications in various sectors, including the domestic, agricultural, automotive, and robotic space industries. Fabric handling tasks, upholstery installation, and satellite blanket manipulation could see significant productivity gains and risk reduction through automation.
Automation of hand layup in creating composite structures from multiple layers of materials presents another compelling application of nonrigid object manipulation. 1 In domestic settings, automated fabric manipulation can aid individuals with special needs or disabilities, facilitating daily tasks like bed making and dressing. Automating textile production poses challenges due to the unpredictable nature of nonrigid objects. Their low bending resistance and complex dynamics make them prone to deformations, unlike rigid objects. 2 The increasing fabric variety complicates matters, requiring sophisticated control methods and highly specialized approaches to overcome obstacles in automation.
The robotic systems for manipulating fabrics are based on single robots9–12,30 or multiple robots.3–8,27 Most of these applications use a vision-based system that can identify the fabric’s position and state (folded/unfolded/wrinkled/unwrinkled) by extracting useful geometric features3,4,6–8,11,29 or machine learning models that are trained on human motion5,9,24,29 or by generating a folding trajectory based on human motion planning.10,12,28,30 These approaches require complex algorithmic models and are not applicable to large fabrics due to the limited workspace of the robots.
Most recent applications make use of the adaptability and customization of collaborative robots and the intelligence of humans to create a human–robot collaboration framework for fabric manipulation.13–15,23 In Koustoumpardis et al.’s 13 study, a hybrid vision–force control system that tracks human motion was used to enable the robot to assist the human guide. In Kruse et al.’s 14 study, a hybrid vision–force control system was used to identify the fabric’s state (wrinkles/deformations) and calculate the appropriate direction to follow the operator’s movement. Alternatively, in Koustoumpardis and Aspragathos’s 15 study, instead of a vision system, a neural network force controller is incorporated to regulate the fabric’s tensional forces and comply with the human’s motion. In Sidiropoulos et al.’s 23 study, a combination of a hybrid dynamic movement primitives (DMP)-based and force control input with an external Kalman filter (KF) was used, enabling the robot to assist the human by minimizing his/her effort in transferring an object.
Predicting future values in unstable systems with unpredictable behavior is crucial. Besides the KF, methods like the linear unknown input observer (LUIO) and the nonlinear unknown input observer (NUIO) are utilized. The LUIO, employing linear algebra, is effective in linear systems with unknown inputs, while the NUIO handles nonlinear systems with noisy or unknown inputs.33–35 To ensure stability, observer gains are estimated using techniques such as solving linear matrix inequalities (LMIs) or linear matrix equations (LMEs). 36 NUIOs find applications in fault detection, robust control, and state estimation. An advanced method, the extended state observer (ESO), estimates both the state and the unknown input simultaneously in nonlinear systems.37,38 However, it requires more tuning parameters, is sensitive to noise, and faces stability analysis challenges. These systems, incorporating observers for predicting system behavior, find applications in human–robot collaborative scenarios, especially in tasks involving heavy fabrics where the human teleoperates the robot.25,26
This article proposes an enhanced approach for collaborative human–robot comanipulation and folding of rectangular cloth pieces. It aims to bridge the automation gap by harnessing the benefits and flexibility of human–robot collaboration, facilitated by Kalman filtering. The design integrates human decision-making, perception, and cognition with the robot’s precision and repeatability, forming an efficient system for complex fabric handling; human guidance combined with robot accuracy maximizes performance. The system’s testing involves two serial fabric folds. The Proposed Framework section introduces the problem statement and proposed framework, followed by the simulation of the proposed system in the Simulation section. Experimental results are presented in the Results for the Experimental Case Study of Folding section, and conclusions and future work are discussed in the Conclusion section.
Proposed Framework
The proposed approach utilizes a single Kinect 2 sensor, instead of the Asus Xtion sensor, 13 to detect the fabric position, its corners, and the operator’s hand position, offering a lower error rate and requiring no training or prior knowledge, relying solely on recognizing human intentions and movements. Comanipulation begins with identifying the human state using a decision-making model that considers hand position and orientation to determine the folding direction and the starting/stopping positions for both human and robot. While previous work applied a general moving-average filter for sensor data smoothing, this approach utilizes the KF, known to improve tracking accuracy with Kinect data. The KF also smooths the spatial coordinates of a movement and, with fine tuning of the Kalman gain, can suppress rapid movements that could damage the robot or injure the operator.20–22
This approach integrates a force sensor with vision feedback for collaborative motion, ensuring fabric tension monitoring and robot position correction when the vision system fails. The proposed method employs a more intelligent grasping technique inspired by Koustoumpardis et al. 16 An illustration of the proposed system is provided in Figure 1.

Figure 1. Proposed system and state transition diagram.
The developed system aims to enable robot assistance to the operator based solely on identifying intentions and motion, without relying on prior knowledge of fabric properties. The state transition diagram in Figure 1, adapted from Koustoumpardis et al. 13 with an additional decision model state, outlines the hierarchical states during collaboration. As in Koustoumpardis et al.’s 13 study, the Setup state involves the operator placing the fabric, transitioning to the Human Motion state upon fabric position identification. The Decision Model determines the folding direction based on hand orientation, initiating the Robot Motion state for fabric manipulation. Collaboration concludes when both reach their stopping positions. The differences between the current method and Koustoumpardis et al.’s 13 study are summarized in Table 1.
Table 1. Differences Between This Article and the Previous Work 13
Human tracking
The RGB-D sensor is used to monitor the operator’s right-hand position, orientation, and state (open/closed), to identify the fabric’s grasping points, and to locate the operator by computing a 25-keypoint skeleton inside the collaborative space.
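The article does not name the client library used to read the skeleton; one possibility, shown here purely as an illustration, is the pykinect2 package, which exposes the 25 Kinect 2 body joints:

from pykinect2 import PyKinectV2, PyKinectRuntime

# Subscribe to body (skeleton) frames from the Kinect 2 runtime.
kinect = PyKinectRuntime.PyKinectRuntime(PyKinectV2.FrameSourceTypes_Body)

def right_hand_joints():
    """Return (wrist, hand, hand tip) camera-space positions, or None."""
    if not kinect.has_new_body_frame():
        return None
    for body in kinect.get_last_body_frame().bodies:
        if not body.is_tracked:
            continue
        joints = body.joints
        pos = lambda jt: (joints[jt].Position.x,
                          joints[jt].Position.y,
                          joints[jt].Position.z)
        return (pos(PyKinectV2.JointType_WristRight),
                pos(PyKinectV2.JointType_HandRight),
                pos(PyKinectV2.JointType_HandTipRight))
    return None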
Infrared sensors are sensitive to sunlight, humidity, temperature, reflective surfaces, and rapid movements, and thus generate considerable noise due to self-occlusion,20,22 degrading skeleton tracking accuracy when objects move fast in complex environments. Therefore, a KF was considered20,21 to predict in real time the state vector of a moving object as a discrete-time stochastic process, estimating a six-dimensional position–velocity state for the tracked joint points. The KF dynamics were applied to the operator’s hand, wrist, and hand-tip joints.
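The corresponding equation is not reproduced in this text; under the constant-acceleration model detailed in the System architecture section, the implied per-axis state model takes the standard form (a reconstruction, not the authors’ exact matrices):

\[
Q_{t+1} = A\,Q_t + B\,\alpha, \qquad
A = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}, \qquad
B = \begin{bmatrix} t^2/2 \\ t \end{bmatrix},
\]

where Q_t = [p_t, v_t]^T stacks the position and velocity of a joint along one axis (three such filters give the six-dimensional position–velocity estimate), t is the sampling period, and α the acceleration input.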
High Kalman Gain values prioritize recent measurements, making the filter more responsive to them. In our application, the KF is engineered not to react to rapid changes in Kinect’s measurements, as such movements are not anticipated during collaborative tasks. This is achieved by leveraging the Kalman Gain, which ensures that short-time deviations do not influence the estimated state. In addition, the KF is applied to the force values generated from the force sensor.
The KF predictions of the operator’s right wrist and hand are used to estimate the angles between the hand and sensor coordinate systems. The hand line vector is computed using equation 3, and the corresponding angles are derived from equations 4–6.
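Equations 3–6 are not reproduced here; a plausible reconstruction from the description is a hand line vector pointing from the wrist to the hand tip, with its angles taken against the sensor axes:

\[
\vec{h} = P_{\text{hand tip}} - P_{\text{wrist}}, \qquad
\theta_i = \arccos\!\left(\frac{\vec{h} \cdot \hat{e}_i}{\lVert \vec{h} \rVert}\right), \quad i \in \{x, y, z\},
\]

where \hat{e}_i denotes the unit vector of each sensor axis.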
The sensor’s tracking algorithm provides additional information on the tracked hand gesture state, recognizing whether the hands are open or closed by processing the open fingers’ area and the area ratio that the hand covers. In cases of partial occlusion (the operator holds the fabric and part of the hand is hidden beneath it), this algorithm fails to identify the operator’s hand state.
Initially, a color-based skin detection algorithm is applied by cropping a region of interest around the operator’s hand (Fig. 2a) and transforming the image from the RGB to the HSV color space (Fig. 2b). After detecting the hand area (Fig. 2c), the hand contour and its convex hull (Fig. 2d) are estimated. Based on this information, the convexity defects are derived as the difference between the hand’s contour and its convex hull, shown as the yellow triangles in Figure 2e. Consequently, the farthest point from the convex hull line that belongs to the same convexity defect area (the blue dot between fingers in Fig. 2f), its distance d from that convex hull line (the blue perpendicular line in Fig. 2f), and the angle φ between the two triangle sides meeting at the farthest point (Fig. 2f) are estimated.

Figure 2. Hand state classification algorithm.
By thresholding the distance d and the angle φ, the algorithm differentiates between the convexity defects of the hand and those of the fingers, as shown in Figure 2f. The convexity defects on the right side of the little finger and on the left of the thumb have larger angle values or smaller distances. The proposed method counts how many fingers are open and classifies the hand as open if more than two fingers are counted, giving accurate results when the hand faces the camera in a clear view and at close distances.
To improve the method’s robustness and overcome these limitations, the ratio of the convex hull area to the contour area is estimated. A closed hand (fewer than two convexity defects) yields small ratios, as the convex hull tends to coincide with the hand contour. In addition, the ratio value is dynamically scaled using the distance of the hand from the Kinect. The state of the hand is identified using the number of fingers and the computed ratio, as shown in Equation 7.
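A rough OpenCV sketch of the whole classification chain is given below; the skin thresholds, the defect-depth cutoff, and the final decision rule stand in for the article’s unreproduced Equation 7 and would need calibration:

import cv2
import numpy as np

def classify_hand_state(roi_bgr):
    """Classify a cropped hand region as 'open' or 'closed' (illustrative)."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))   # rough skin range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return "unknown"
    hand = max(contours, key=cv2.contourArea)               # largest skin blob
    hull_idx = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull_idx)
    finger_gaps = 0
    if defects is not None:
        for s, e, f, d in defects[:, 0]:
            start, end, far = hand[s][0], hand[e][0], hand[f][0]
            a = np.linalg.norm(end - start)
            b = np.linalg.norm(far - start)
            c = np.linalg.norm(far - end)
            phi = np.arccos(np.clip((b**2 + c**2 - a**2) / (2 * b * c), -1, 1))
            # Deep, narrow-angled defects lie between extended fingers;
            # wide or shallow ones belong to the hand silhouette itself.
            if phi < np.pi / 2 and d / 256.0 > 10:          # d is fixed-point
                finger_gaps += 1
    if finger_gaps >= 2:          # two gaps imply at least three open fingers
        return "open"
    # Fallback: hull-to-contour area ratio; the article dynamically scales
    # this with hand distance, a fixed threshold is used here for brevity.
    ratio = cv2.contourArea(cv2.convexHull(hand)) / max(cv2.contourArea(hand), 1)
    return "closed" if ratio < 1.1 else "open"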
The operator’s hand position, orientation, and state are used in the decision-making model to identify the operator’s intentions and to estimate the appropriate starting position for the robot.
Decision-making model
The decision-making model is designed to identify the operator’s desired folding direction, as shown in Figure 3. Our method is restricted to rectangular fabrics but can easily be modified to accommodate more shapes.

Figure 3. Decision-making model example.
The model proceeds in the following steps (a code sketch of this logic is given after the list):
1. Label each fabric corner: starting from the corner closest to the RGB-D sensor (placed at the bottom right of Fig. 3), the corner with the lowest x value is labeled Bottom Right (BR), and labeling continues clockwise, from the sensor’s view, with Bottom Left (BL), Top Left (TL), and Top Right (TR).
2. Identify the corner that the operator grabbed (BL in Fig. 3) and compute the appropriate robot starting corner. The operator’s and the robot’s starting corners are always adjacent, never diagonal, as the fabric cannot physically be folded that way; therefore, only two options for the robot to grab are available (green circles in Fig. 3).
3. Use the hand’s orientation relative to the fabric to identify the folding direction. Given the operator’s grabbed corner, the model constructs two vectors representing the fabric lines that include the grabbed corner (purple vectors in Fig. 3).
4. Compute the hand line vector as described in the Human tracking section.
5. Estimate the angles between the fabric line vectors and the hand line vector using equations (3–5), and select the line with the lowest angle, because the hand line must be parallel to that fabric line.
6. Use the second corner of the selected line to define the robot’s starting direction. In Figure 3, the hand angle is closer to the BR line; thus the model identifies the path depicted by the blue arrow pointing from the bottom line to the top line as the folding direction, the BR corner as the robot’s starting position, and the TR corner as the robot’s ending position.
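A compact sketch of this decision logic, assuming the corners are available in sensor coordinates under the labeling above (the helper names are illustrative):

import numpy as np

ORDER = ["BR", "BL", "TL", "TR"]   # clockwise from the sensor's view

def decide_folding(corners, grabbed, hand_vec):
    """corners: dict label -> 2D table-plane point; grabbed: operator corner.
    Returns the robot's (starting corner, ending corner)."""
    i = ORDER.index(grabbed)
    # The robot's starting corner is always adjacent to the operator's.
    candidates = [ORDER[(i - 1) % 4], ORDER[(i + 1) % 4]]

    def angle_to(label):
        v = corners[label] - corners[grabbed]
        cosang = np.dot(v, hand_vec) / (np.linalg.norm(v) * np.linalg.norm(hand_vec))
        return np.arccos(np.clip(abs(cosang), 0.0, 1.0))  # parallel either way

    start = min(candidates, key=angle_to)   # fabric line most parallel to hand
    end = ORDER[(i + 2) % 4]                # fold ends at the diagonal corner
    return start, end

# Example matching Figure 3: operator grabs BL, hand parallel to the BR line.
corners = {"BR": np.array([0.0, 0.0]), "BL": np.array([1.0, 0.0]),
           "TL": np.array([1.0, 1.0]), "TR": np.array([0.0, 1.0])}
print(decide_folding(corners, "BL", np.array([-1.0, 0.05])))  # ('BR', 'TR')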
Once the robot determines its starting position, it proceeds to execute the grasping, lifting the fabric off the table. Our decision-making model currently caters only to rectangular fabrics. However, a more versatile model could be developed to accommodate various shapes and determine starting points and folding directions accordingly. Moreover, the contrast between the fabric and the surface color influences the accuracy of the robot in grasping the fabric.
Fabric grasping technique
The fabric grasping technique (Fig. 4), based on Koustoumpardis et al., 16 mimics human fabric handling. Featuring three fingers, two of which can rotate and are longer than the third, the gripper offers adaptability to various sizes, shapes, and material types, along with flexibility and maneuverability during approach, grasping, and manipulation, owing to the multiple degrees of freedom provided by its fingers. The gripper’s fingers offer low bending resistance, enabling them to slide on various surfaces like a human finger lifting fabric. It is designed to approach fabric corners at a 45° angle between the smaller, nonrotating finger and the fabric center, ensuring the nonrotating finger reaches the corner effectively, as depicted in Figure 4’s top view.

Figure 4. Fabric grasping technique.
After approaching the fabric, the gripper uses the following four steps to grasp and lift the fabric as shown in Figure 4:
1. The two rotating fingers align parallel to the 45° purple line and gradually descend toward the fabric until contact is sensed through pressure on the force sensor. This allows the fingers to exert force on the fabric, immobilizing it on the table.
2. The third, nonrotating finger initiates its motion toward the fabric’s end to lift it off the table, as seen in step 2 in Figure 4.
3. The two rotating fingers adjust to center the nonrotating finger.
4. The two rotating fingers pinch the fabric corner to secure it tightly, ensuring it remains securely held by the gripper throughout the human–robot collaboration.
These four steps offer an effective method for grasping and lifting the fabric, ensuring a secure hold as the fingers apply pressure and squeeze the fabric without deformation, so that the fabric can be folded symmetrically with minimal wrinkles. After grasping the fabric, the robot is ready to collaborate with the operator to manipulate the fabric, as described in the next section.
System architecture
For proper fabric folding, synchronization between human and robot motion is essential. Like the prior collaboration attempt, 13 the hybrid RGB-D–force feedback control system is employed to track the human’s hand position, orientation, and state, along with the applied force on the fabric, as illustrated in Figure 5.

Figure 5. RGB-D–force feedback control.
The RGB-D sensor tracks the human hand’s position, while a KF enhances tracking accuracy by mitigating the position noise and spikes from the Kinect’s algorithm. The filter operates in two steps. In the prediction step, a mathematical model updates the human hand state Q_t, predicting the future state Q_{t+1} using the matrices A and B to model the system’s physical behavior and α for the constant acceleration.
After the state prediction, the KF propagates the error covariance of the state, taking the process noise into account. In the correction step, the KF compares the measured and predicted values through the Kalman gain. The gain can be adjusted using the Ez and Ex matrices: when minimal noise is expected in the measurements (Ez approaching zero), the Kalman gain remains high, so the filter closely tracks the measurements with high responsiveness; conversely, a low Kalman gain (P approaching zero) leads to a slower response of the predicted state to measurement changes. After calculating the Kalman gain, the predicted state is corrected with the measurement residual, and a new error covariance matrix is computed.
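Since the article’s numbered equations are not reproduced here, the two steps in standard (textbook) form, using the notation above, are:

Prediction:
\[
Q_{t+1|t} = A\,Q_t + B\,\alpha, \qquad
P_{t+1|t} = A\,P_t\,A^{\mathsf{T}} + E_x.
\]

Correction:
\[
K = P_{t+1|t}\,C^{\mathsf{T}}\!\left(C\,P_{t+1|t}\,C^{\mathsf{T}} + E_z\right)^{-1}, \qquad
Q_{t+1} = Q_{t+1|t} + K\!\left(z_{t+1} - C\,Q_{t+1|t}\right), \qquad
P_{t+1} = (I - K\,C)\,P_{t+1|t}.
\]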
The dynamics used to describe the movement of the human hand along one axis follow a constant-acceleration model. The matrix values for the position noise are derived from experimental measurements of collected Kinect data.31,32 The process covariance matrix Ex is derived from the standard deviations σp and σv of the position and velocity, and the constant output matrix C selects the measured position components from the state vector. The parameters t, noise_magnitude, and noise_x/y/z are the adjustable KF parameters used to fine-tune the prediction accuracy. The value of t is set to 4/30, because the Kinect’s 1/30 s frame period is stretched to 4/30 s by the computational load, and the noise_magnitude is fixed at 4 as specified in equation 16.
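A minimal per-axis sketch of the filter described above, assuming the constant-acceleration model with the stated parameters (t = 4/30 s, noise magnitude 4); the measurement-noise value and the exact Ex entries are illustrative, not the authors’ tuning:

import numpy as np

class AxisKalman:
    """1D constant-acceleration KF for one coordinate of a tracked joint."""
    def __init__(self, t=4/30, noise_mag=4.0, meas_std=0.05):
        self.A = np.array([[1.0, t], [0.0, 1.0]])   # state transition
        self.B = np.array([[0.5 * t**2], [t]])      # acceleration input
        self.C = np.array([[1.0, 0.0]])             # only position is measured
        # Process covariance Ex from sigma_p = t^2/2 and sigma_v = t,
        # scaled by the noise magnitude (an assumed construction).
        G = np.array([[0.5 * t**2], [t]])
        self.Ex = (noise_mag**2) * (G @ G.T)
        self.Ez = np.array([[meas_std**2]])         # measurement covariance
        self.Q = np.zeros((2, 1))                   # state [position; velocity]
        self.P = self.Ex.copy()                     # error covariance

    def step(self, z, alpha=0.0):
        # Prediction step.
        self.Q = self.A @ self.Q + self.B * alpha
        self.P = self.A @ self.P @ self.A.T + self.Ex
        # Correction step via the Kalman gain.
        K = self.P @ self.C.T @ np.linalg.inv(self.C @ self.P @ self.C.T + self.Ez)
        self.Q = self.Q + K @ (np.array([[z]]) - self.C @ self.Q)
        self.P = (np.eye(2) - K @ self.C) @ self.P
        return float(self.Q[0, 0])                  # filtered position

One such filter would run per axis (x, y, z) of each tracked joint, with a scalar variant filtering the force readings.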
The Kalman estimation calculates the distance traveled by the operator’s hand on each axis by comparing the current and previous positions. This difference is then added to the end-effector’s Cartesian position so that the robot mirrors the hand’s movement. The force sensor measurements are also filtered with the KF, and the estimated value is used by the proportional controller (proportional gain kp = 1.35) to compute the position error.
The KF noise magnitude for the force sensor was set at 5 N. Force feedback aids collaboration only in the x± and y± directions (Fig. 5). In these directions, the vision sensor alone fails to detect hand motion, since the operator can only apply force and stretch the fabric but cannot physically pull the robotic arm toward them while keeping their hand motionless. The force feedback error serves two main purposes: allowing the human to pull the robotic arm toward them as needed and preventing the robot’s end-effector from inadvertently pulling the human during incorrect hand motions.
The force sensor state is updated with the same KF recursion, and the robot’s position output is then derived from the filtered force error through the proportional controller.
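A hedged sketch of the resulting per-iteration position command (the function layout and the force-to-displacement mapping are assumptions; kp = 1.35 is from the article):

import numpy as np

def position_command(hand_kf, prev_hand_kf, ee_pos, force_err_xy, kp=1.35):
    """Blend the vision and force terms into the next Cartesian target."""
    cmd = np.asarray(ee_pos, dtype=float)
    # Vision term: displacement of the Kalman-filtered hand estimate,
    # added to the end-effector's current Cartesian position.
    cmd += np.asarray(hand_kf) - np.asarray(prev_hand_kf)
    # Force term: proportional correction along x and y only, where the
    # operator stretches the fabric without visibly moving the hand.
    cmd[:2] += kp * np.asarray(force_err_xy)
    return cmd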
The robot’s joint values are determined through inverse kinematics to facilitate linear movement. This integrated system is developed and assessed in both simulation and real-world scenarios. In practice, the system aids a human in folding a rectangular fabric, with results detailed in the following section.
Simulation
The framework simulation and robot communication utilize the RoboDK software 17 and the open-source Kukavarproxy server 18 via Ethernet. Initially, the collaborative cell’s 3D model is created in the CATIA software 19 and then imported into RoboDK (Fig. 6). The robot model, motion, and inverse kinematics are established using DH table conventions, with the RoboDK library managing frame conversions during collaboration. A 3D hand model, updated by the Kinect’s hand values, represents the human’s hand position, with its color indicating the hand state. The fabric’s position is represented by a 3D rectangle using its actual width and height.
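Driving the simulated robot from the filtered hand deltas can be done through the RoboDK Python API; a sketch under the classic API layout follows, with the item name being illustrative:

from robolink import Robolink, ITEM_TYPE_ROBOT   # RoboDK API
from robodk import transl                        # pose helpers

RDK = Robolink()                                 # connect to running RoboDK
robot = RDK.Item('KUKA LWR IV+', ITEM_TYPE_ROBOT)

def follow_hand(delta_xyz_mm):
    """Offset the end-effector by the filtered hand displacement (mm)."""
    target = transl(*delta_xyz_mm) * robot.Pose()  # offset in reference frame
    robot.MoveL(target)                            # linear Cartesian motion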

Figure 6. 3D designed workspace.
Before real-world testing, the robot’s motion was assessed in the RoboDK simulation to determine a safe folding strategy and prevent potential harm to the operator or the robot. The collaborative system was tested with a rectangular fabric, with the human leading the folding attempt and the robot model following suit, as depicted in Figure 6a–c. The simulation confirmed the system’s ability to follow the human’s lead and accomplish the task.
Results for the Experimental Case Study of Folding
The collaborative framework (Fig. 1) utilizes a KUKA LWR IV+ robot, equipped with an ATI Gamma force–torque sensor to measure the fabric force and a Reflex One gripper from RightHand Robotics for fabric handling. The Kinect 2 for Windows serves as the RGB-D sensor, tracking the fabric placement and the operator’s hand pose/state via Microsoft’s algorithm and a KF. The system architecture is depicted in Figure 7. The operator hand monitoring, decision-making, and collaboration algorithms are coded in Python on the main laptop running Windows. The gripper functions are implemented in ROS using Python on Ubuntu, with an API interface facilitating communication with the Windows laptop. Robot communication employs the RoboDK software. Force and vision control share a computer, which communicates with the force controller via a serial RS-232 to USB connection.

Figure 7. System architecture.
The system is tested by performing two sequential folds using a rectangular fabric, as shown in Figure 8. Initially, the operator places the unfolded fabric on the table and is responsible for ensuring its state (unfolded/unwrinkled). Using the Kinect sensor and the OpenCV library, the framework extracts the fabric’s corner positions (depicted as four red stars in Fig. 8.1a–8.2a), which are crucial for the decision-making model. Collaboration commences when the operator grabs a fabric corner, identified by proximity to a corner and a closed hand gesture. The decision-making model then determines the robot’s starting and ending positions, directing it to approach and grasp the fabric corner using the proposed technique. Subsequently, the system assists the operator in folding the fabric, employing vision–force control, as observed in Figure 8.1b–1f and 8.2b–2f.
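A rough sketch of this corner-extraction step, assuming the fabric contrasts with the tabletop (the thresholding choices are illustrative):

import cv2
import numpy as np

def fabric_corners(frame_bgr):
    """Return the four fabric corners as pixel coordinates, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    fabric = max(contours, key=cv2.contourArea)        # largest blob = fabric
    peri = cv2.arcLength(fabric, True)
    quad = cv2.approxPolyDP(fabric, 0.02 * peri, True) # simplify the outline
    return quad.reshape(-1, 2) if len(quad) == 4 else None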

Figure 8. First fold and second fold.
The operator’s hand position path, using the Kinect and Kalman estimations, as well as the robot’s end-effector path, in both folds, is presented in Figure 9.

Figure 9. Human and robot path during collaboration for both folds.
As expected, the human leader maintains stability in the perpendicular folding axis, exhibits high dexterity in the parallel axis, and moves upward in the Z-axis to reduce fabric wrinkles. Position error over time is depicted in Figure 10 for the first and second fold, respectively.

Figure 10. Position error between the robot’s end effector and the operator’s hand Kalman prediction for both folds.
The first fold exhibits a position error of 3.28 cm in the folding direction, while the second fold shows an error of 3.27 cm. These errors arise because the fabric is not perfectly aligned perpendicular to the Kinect plane, affecting the initial human and robot positions. However, in fabric folding tasks, a 3 cm position error is generally inconsequential. While the system effectively follows the human’s path, limitations in hardware and software lead to a communication frequency of 1.125 Hz and a total execution time of 20 s. This poses challenges for the operator, who must adapt to the robot’s movements rather than seamlessly guide it. Despite modest hardware, a third-generation Intel Core i3 CPU with 2 cores and 8 GB of RAM, the system adequately assists the operator. Performance improvements are expected with higher-spec computing systems.
Conclusions
This article introduces an HRC framework for fabric folding. The developed framework presents a versatile, high-level architecture, demonstrated to effectively enable collaborative fabric manipulation. The KF significantly improves the accuracy of the operator’s hand position by smoothing the Kinect’s values and eliminating tracking algorithm errors. The force–torque sensor efficiently corrects the robot’s trajectory when vision feedback is lacking. The robotic gripper successfully grasps fabric with minimal deformations. Despite the errors expected from equipment limitations, they are insignificant for fabric folding tasks. The system effectively guides the robot in supporting human tasks, and mounting the RGB-D sensor on a mobile robot would expand its application to larger workspaces. By leveraging human intelligence, the proposed method bypasses complex computational strategies, making it adaptable to various fabric shapes and applications.
The core limitation of our proposed system is the fabric’s color: the white tabletop makes white fabrics invisible to the RGB-D camera. Another critical limitation is the robotic arm’s joint angle limits, which reduce the end-effector’s reach and restrict the system to small fabrics; this can be addressed in future work by using a robotic arm attached to a mobile robot. Other minor, potentially treatable limitations include the computational limits of our software/hardware, the vision/force sensor communication constraints, and the use of a single RGB-D camera for computing the fabric’s state while tracking the operator’s motion. The robotic end-effector can easily block the camera’s view of the operator during manipulation, causing the tracking algorithm to lose focus.
Future work should prioritize incorporating the pitch/yaw/roll of the operator’s hand to enhance the robot’s dexterity and accurately mirror the operator’s hand pose. Additionally, adapting the framework for use with a mobile robot would enable flexible fabric manipulation in domestic settings and larger workspaces with bigger fabrics. Integration of multiple RGB-D sensors to monitor the operator’s position and maintain clear workspace visibility during collaboration can enhance data accuracy.
Authors’ Contributions
Conceptualization: P.K. and E.D.; Methodology: P.K. and E.D.; Investigation: K.A.; Supervision: P.K. and E.D.; Experiments: K.A.; Validation: P.K. and E.D.; Writing—original draft: K.A.; Writing—review & editing: P.K. and K.A. All authors have read and agreed to the published version of the article.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
