Abstract
This study proposes a computer vision framework to monitor worker joint motion from task videos. The framework focuses on landmark features, particularly those associated with participants' upper and lower limbs, to extract spatial joint movements. Multivariate joint-motion distributions were monitored using the Hotelling T-squared statistic. The control chart technique was applied in two phases: Phase I (offline) and Phase II (online) monitoring. For implementation, task videos were partitioned into two segments: the first was designated for offline training, establishing baseline patterns, and the second for online monitoring, enabling real-time evaluation of worker demand levels during operational activities. The measured amount of motion correlated with participants' ratings from the perception survey, supporting the framework's effectiveness. Understanding how workers interact with products and equipment allows designers to create tools that are easier and more comfortable to use.
Evaluating and measuring worker movement is challenging because of difficulties in sensing, tracking, and quantifying it accurately, yet its influence on performance is significant. Recent developments have seen in situ videos used as valuable tools for real-time monitoring of worker actions, facilitating data-driven assessment of movement levels. Despite these advancements, substantial hurdles remain in the effective use and analysis of such videos for worker movement monitoring. Existing techniques for processing sensor data on human movement remain largely inefficient, and the assessment of physical workers, particularly the quantification of joint movements, remains complex. Studies, as noted by Moran and Wallace (2007), have shown varied outcomes in measuring joint movement across activities such as jumping, likely influenced by the variations in joint motion permitted by different eccentric loading conditions. Although in situ videos aid real-time observation of worker behaviors, processing these videos to extract posture data at scale remains a formidable challenge.
To overcome these difficulties, we propose an automated framework employing computer vision (CV) techniques (Bazarevsky et al., 2020; Lugaresi et al., 2019) to extract key landmarks from task-related videos for posture analysis. The resulting posture insight is crucial for ergonomists to recommend work environments and tools that reduce physical strain (Palikhe et al., 2020). Analyzing how workers use tools and equipment informs the design of user-friendly, ergonomically optimized products. Furthermore, motion analysis of experienced workers can inform the development of comprehensive training programs, enhancing new employees' efficiency and safety, and can highlight training needs to raise worker skill and performance levels.
This research introduces the use of the Hotelling T-squared (Prokhorov & Hazewinkel, 2001) control chart for systematic tracking of worker demand statistics, recommending its application through Phase I (offline) and Phase II (online) monitoring. Our methodology divides lengthy videos into two segments: one for establishing baseline patterns through offline training and the other for real-time operational monitoring. This dual-phase approach aims to proactively manage and mitigate worker demand, enhancing safety and productivity in the workplace.
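To make the dual-phase scheme concrete, the following is a minimal sketch in Python: Phase I estimates the baseline mean and covariance from the offline segment, and Phase II flags frames whose T-squared statistic exceeds an F-distribution-based control limit for future individual observations. The function names, feature dimension, and simulated data are illustrative, not part of the study's implementation.

```python
import numpy as np
from scipy import stats

def phase1_baseline(X):
    """Phase I (offline): estimate baseline mean and covariance.
    X: (m, p) array of m in-control feature vectors."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def t2_statistic(x, xbar, S_inv):
    """Hotelling T-squared: (x - xbar)' S^{-1} (x - xbar)."""
    d = x - xbar
    return float(d @ S_inv @ d)

def phase2_ucl(m, p, alpha=0.0027):
    """Control limit for monitoring future individual observations:
    UCL = p(m+1)(m-1) / (m(m-p)) * F_{alpha; p, m-p}."""
    return p * (m + 1) * (m - 1) / (m * (m - p)) * stats.f.ppf(1 - alpha, p, m - p)

# Phase II (online): flag frames whose features drift from baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(200, 4))        # offline training segment (simulated)
xbar, S = phase1_baseline(baseline)
S_inv = np.linalg.inv(S)
ucl = phase2_ucl(m=200, p=4)
for x in rng.normal(loc=0.8, size=(5, 4)):  # online monitoring stream (simulated)
    t2 = t2_statistic(x, xbar, S_inv)
    if t2 > ucl:
        print(f"alert: T2 = {t2:.2f} exceeds UCL = {ucl:.2f}")
```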
Our study proposes a CV-based system to monitor and quantify movements of workers’ upper and lower limbs, issuing alerts when movements reach critical thresholds. This system uses joint data from posture estimation to assess movement, integrating CV tools into joint motion analysis to address challenges in worker training and promote further research.
In our analysis, limb landmarks were pivotal for assessing motion, computed as positional differences between frames at several lags (0, 2, 4). While selecting specific landmarks can restrict data variety, using all available landmarks was found to increase sensitivity. Data were treated as a continuous multivariate stream from uniform distributions, appropriate for specific tasks. MediaPipe (Lugaresi et al., 2019) identified 33 landmarks per frame, with a focus on the arms and hands. The Hotelling T-squared statistic monitored the multivariate feature distribution across frames, alerting to changes in joint motion.
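A minimal sketch of this feature-extraction step appears below, assuming MediaPipe's Pose solution API (mediapipe.solutions.pose) and OpenCV for frame capture. The helper names and the choice of lags (2 and 4; a lag of 0 reduces to the raw landmark positions) are illustrative.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose

def landmark_series(video_path):
    """Extract the 33 MediaPipe pose landmarks (x, y) for each frame."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks:               # skip frames with no detection
                frames.append([(lm.x, lm.y) for lm in result.pose_landmarks.landmark])
    cap.release()
    return np.asarray(frames)                       # (n_frames, 33, 2)

def lagged_motion(coords, lags=(2, 4)):
    """Per-frame motion features: landmark displacement magnitude at each lag."""
    n = len(coords)
    feats = []
    for lag in lags:
        diff = coords[lag:] - coords[:-lag]         # positional differences
        mag = np.linalg.norm(diff, axis=2)          # (n - lag, 33)
        feats.append(mag[: n - max(lags)])          # align series lengths
    return np.concatenate(feats, axis=1)            # (n - max_lag, 33 * len(lags))
```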
The root mean square deviation (RMSD) used to measure joint motion was significantly higher in “S” tasks than in “L” tasks, possibly because inadequate camera visibility degraded posture detection. The Pearson correlation between the amount of motion and the T-squared statistic was 35% higher for “S” tasks than for “L” tasks. Task assessments revealed higher perceived motion demands for “L” tasks (average score of 69.9) than for “S” tasks (average score of 21.9), aligning with the observed motion-data trajectories. While the CV-based assessment of joint motion aligned well with perceived task demands, indicating the method’s effectiveness, limitations exist. The retrospective application of the CV method suggests potential for real-time deployment in field studies, although it is currently constrained by the fixed camera angles used. Future studies can explore integrating multiple camera angles to make the motion data more comprehensive, and we aim to design more complex experimental tasks to refine and validate the CV approach, highlighting the technique’s potential for accurate and reliable motion analysis.
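For reference, the two agreement metrics reported above can be computed as follows; the motion and T-squared series shown are hypothetical placeholders, not study data.

```python
import numpy as np
from scipy import stats

def rmsd(a, b):
    """Root mean square deviation between two series."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

motion = np.array([0.8, 1.1, 0.9, 1.6, 2.0, 1.8])    # hypothetical motion amounts
t2_vals = np.array([3.1, 3.8, 3.5, 6.2, 8.0, 7.1])   # hypothetical T-squared values
r, p_value = stats.pearsonr(motion, t2_vals)
print(f"RMSD = {rmsd(motion, t2_vals):.2f}, Pearson r = {r:.2f} (p = {p_value:.3f})")
```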
Acknowledgements
The authors thank Boyang Xu for participating in the project’s initialization. We also express our gratitude to the recruited participants who performed the experiments. N.M. acknowledges the Master’s Opportunity for Research in Engineering (MORE) program at ASU.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
