Abstract
Frequent or prolonged manual material handling (MMH) is a major risk factor for work-related musculoskeletal disorders, which cause considerable health and economic burdens. Assessing physical exposures is essential for identifying high-risk tasks and implementing targeted ergonomic interventions. However, variability in MMH task performance across individuals and work settings complicates physical exposure assessments. Further, conventional tools often suffer from limitations such as bias, discomfort, behavioral interference, and high costs. Non-contact (ambient) methods and automated data collection and analysis present promising alternatives for assessing physical exposure. We investigated the use of vision transformers and recurrent neural networks for non-contact MMH task classification from RGB video of eight simulated MMH tasks. Spatial features were extracted using the Contrastive Language-Image Pre-training (CLIP) vision transformer, then classified by a Bidirectional Long Short-Term Memory (BiLSTM) model to capture temporal dependencies between video frames. Our model achieved a mean accuracy of 88% in classifying MMH tasks, demonstrating performance comparable to methods using depth cameras or wearable sensors, while potentially offering better scalability and feasibility in real work environments. Future work includes improving temporal modeling, integrating task-adapted feature extraction, and validating across more diverse workers and occupational environments.
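The two-stage architecture described above (per-frame spatial features fed to a bidirectional recurrent classifier) can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the feature dimension (512), hidden size, and sequence length are assumptions, and a random tensor stands in for the CLIP per-frame embeddings that the paper extracts from RGB video.

```python
import torch
import torch.nn as nn

class BiLSTMTaskClassifier(nn.Module):
    """Bidirectional LSTM over per-frame embeddings -> MMH task logits."""

    def __init__(self, feat_dim: int = 512, hidden: int = 256, n_tasks: int = 8):
        super().__init__()
        # Bidirectional LSTM captures temporal dependencies across frames
        # in both directions; output at each step is 2 * hidden wide.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_tasks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim) -- one embedding per video frame
        out, _ = self.lstm(x)           # (batch, frames, 2 * hidden)
        return self.head(out[:, -1])    # classify from the final time step

# Stand-in for CLIP features: 4 clips x 30 frames x 512-dim embeddings.
frames = torch.randn(4, 30, 512)
logits = BiLSTMTaskClassifier()(frames)
print(logits.shape)  # one score per task for each clip: (4, 8)
```

In practice the 512-dimensional inputs would come from a frozen CLIP vision transformer applied frame by frame, and the logits would be trained with cross-entropy against the eight task labels.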
