Abstract
Worker motion simulation synthesizes human movements in specific work scenarios to analyze behavior and assess performance, offering a cost-effective way to evaluate safety and productivity. However, existing studies struggle to achieve high fidelity and precision. Recent advances in Generative AI offer promising capabilities for motion simulation. This study integrates the ChatGPT and MotionGPT AI models to generate high-fidelity motions for tasks such as lifting objects with the non-dominant hand or walking five steps to the right. To improve accuracy, ChatGPT-generated guidance was aligned with MotionGPT’s training vocabulary. By analyzing the HumanML3D dataset, a JSON file of word frequencies was created and used to adjust input prompts to match the language patterns of the training data. This strategy mitigates out-of-distribution issues, refining MotionGPT’s accuracy. Simulated motions were validated against real human motions using computer vision-based video analysis; by comparing body landmarks, we quantitatively assessed the improvement in fidelity. This study advances AI-aided worker motion simulation and provides a new method for evaluating AI performance in industrial settings.
Worker motion simulation involves synthesizing human movements within specific work scenarios to analyze behavior or assess performance. It has been explored as a cost-effective means of virtually evaluating worker safety and productivity. Nonetheless, existing studies on worker motion simulation confront bottlenecks in accuracy and precision: the fidelity of the simulated motions remains relatively low, and high-fidelity worker motion simulation has therefore been challenging. Fortunately, recent breakthroughs in Generative AI provide promising capabilities for enhancing worker motion simulation. This study integrates text-to-text (i.e., ChatGPT) and text-to-motion (i.e., MotionGPT; Jiang et al., 2024) AI models for the high-fidelity generation of designated working tasks. For instance, tasks such as lifting an object with the non-dominant hand, or walking five steps to the right and then picking up an object, are synthesized with increased fidelity. Achieving high-fidelity motion simulation poses a major challenge, particularly in conveying the desired task to the motion generation AI model.
To address this challenge, we leverage ChatGPT to produce “guidance” aligned with MotionGPT’s training vocabulary. First, we analyze the underlying HumanML3D dataset (Guo et al., 2022) on which the MotionGPT model was trained. This process involves extracting all unique words and their corresponding frequencies from the training dataset. The resulting structured JSON file encapsulates a detailed representation of the word frequency distribution within the dataset.
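For concreteness, the sketch below illustrates this vocabulary analysis in Python. It assumes the public HumanML3D release layout, in which captions live as one text file per clip under a "texts/" directory, with each line carrying a caption followed by '#'-delimited annotations; the directory path and output file name are illustrative.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed location of the HumanML3D caption files (one .txt per motion clip).
TEXTS_DIR = Path("HumanML3D/texts")

counter = Counter()
for txt_file in TEXTS_DIR.glob("*.txt"):
    for line in txt_file.read_text(encoding="utf-8").splitlines():
        # Each line is "caption#tokenized caption#start#end";
        # keep only the natural-language caption itself.
        caption = line.split("#")[0].lower()
        # isalpha() drops tokens with attached punctuation or digits,
        # which is sufficient for a frequency sketch.
        counter.update(tok for tok in caption.split() if tok.isalpha())

# Persist the frequency table, sorted from most to least frequent,
# as the structured JSON file used downstream.
with open("humanml3d_word_frequencies.json", "w", encoding="utf-8") as f:
    json.dump(dict(counter.most_common()), f, indent=2)
```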
Recognizing the significance of aligning the generated prompts with the language patterns prevalent in MotionGPT’s training data, we subsequently devised a targeted approach. Leveraging the capabilities of OpenAI’s GPT models, we employed the generated word frequency JSON file to selectively replace words in the input prompts with ones that exhibit higher frequencies in the dataset.
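A minimal sketch of this rewriting step is given below, using the official OpenAI Python client. The model name, the system prompt wording, and the truncation to the 300 most frequent words are our illustrative assumptions, not the exact configuration used in this study.

```python
import json
from openai import OpenAI  # official OpenAI Python client (openai >= 1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load the frequency table produced earlier; its keys were saved
# in descending frequency order, so slicing yields the most common words.
with open("humanml3d_word_frequencies.json", encoding="utf-8") as f:
    freqs = json.load(f)
common_words = list(freqs)[:300]  # assumed vocabulary-size cutoff

def rewrite_prompt(task_description: str) -> str:
    """Ask a GPT model to restate a task using MotionGPT's training vocabulary."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": ("Rewrite the user's motion description so that it "
                         "uses, wherever possible, only words from this "
                         f"vocabulary: {', '.join(common_words)}. Preserve "
                         "the meaning and output a single sentence.")},
            {"role": "user", "content": task_description},
        ],
    )
    return response.choices[0].message.content

# Example: produce in-distribution "guidance" for a designated working task.
guidance = rewrite_prompt("lift an object with the non-dominant hand")
```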
This strategic adjustment alleviates the out-of-distribution problem by tailoring the prompts to better match the language dynamics ingrained in the model’s training corpus, reducing the likelihood of unexpected outputs and enhancing the model’s ability to generate motions consistent with the learned distribution of the training data. The generated guidance replaces free-form human language descriptions as the input to MotionGPT, and through this AI-generated “guidance,” MotionGPT refines its motion simulation accuracy.

To validate the fidelity of the simulated worker motions against real human motions, we propose a computer vision-based video analysis method. This process captures in situ videos of human subjects performing the guided working tasks in a controlled lab environment. Computer vision algorithms extract human body landmarks representing the joints that define the motions. By comparing the body landmarks from real humans and simulated workers, we quantitatively evaluate the fidelity of the motion simulation and measure the improvement facilitated by ChatGPT’s assistance. The fruition of this study is a new method for AI-aided worker motion simulation, which can be applied in worker training, low-cost worker supervision, and other practices involving digital twins within industrial settings.
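The sketch below illustrates one way to implement this comparison, using MediaPipe Pose as the landmark extractor (our choice of pose estimator; the study’s exact computer vision pipeline may differ) and a mean per-joint Euclidean error as the fidelity metric. It assumes the real and simulated landmark sequences have already been mapped to a shared coordinate frame and joint ordering.

```python
import numpy as np
import cv2
import mediapipe as mp

def extract_landmarks(video_path: str) -> np.ndarray:
    """Extract pose landmarks from an in situ video.

    Returns an array of shape (frames, 33, 3): MediaPipe's 33 body
    landmarks with normalized (x, y, z) coordinates per frame.
    """
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks:
                frames.append([[lm.x, lm.y, lm.z]
                               for lm in result.pose_landmarks.landmark])
    cap.release()
    return np.asarray(frames)

def mean_per_joint_error(real: np.ndarray, simulated: np.ndarray) -> float:
    """Average Euclidean distance between corresponding joints over the
    frames both sequences share; assumes a common coordinate frame."""
    n = min(len(real), len(simulated))
    return float(np.linalg.norm(real[:n] - simulated[:n], axis=-1).mean())
```

A lower mean per-joint error for guidance-driven simulations than for raw-prompt simulations would quantify the fidelity improvement attributable to ChatGPT’s assistance.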
The contributions of this study are multi-faceted. First, human workers play an indispensable role in production and manufacturing systems, yet studies of worker performance and health remain limited. Many published works on smart manufacturing focus mainly on manufacturing processes and platforms while omitting human workers. This study focuses on human workers’ welfare in a smart manufacturing context and proposes an AI solution to deepen the understanding of human motions at work. Second, this study is a novel attempt to exploit and integrate multiple Generative AI models for smart manufacturing applications. The recent boom in commercial and open-access AI has raised questions about how AI should be utilized in manufacturing and engineering fields; this work serves as a paradigm for effectively using Generative AI tools to simulate human motions in manufacturing tasks. Third, this study validates AI-simulated outcomes against real-human data, providing a new means of AI performance evaluation. It will encourage more scientific AI validation approaches, thereby benefiting the further advancement of commercial AI. Finally, the proposed AI-based worker motion simulation methodology is generalizable to various factories and manufacturing environments. The key enablers of this study are AI tools and in situ human videos: the AI tools are open-access and easily obtained, and in situ human videos can be conveniently collected as validation data by properly placing cameras on factory floors, enabling the extension of our lab-based study to industrial applications.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: N.M. acknowledges the Master’s Opportunity for Research in Engineering (MORE) program at ASU.
