Abstract

Electromyography (EMG) signal analysis is critical in understanding and treating work-related musculoskeletal disorders (WMSDs). Despite the increasing use of EMG signals combined with machine learning to assess biomechanical risks in various occupational settings, a significant shortage of extensive EMG datasets hinders progress. This shortage is primarily due to stringent data management plans and the limited availability of EMG datasets representing occupational tasks. To address this, our research leverages diffusion models to synthesize EMG signals tailored to manual material handling (MMH) tasks, aiming to enrich occupational EMG data repositories while maintaining data privacy. Using a conditional diffusion model with a residual U-Net architecture, we synthesized EMG signals for MMH activities such as pulling, pushing, and lifting. The synthesized data, evaluated across time and frequency domains, demonstrated fidelity to the original EMG signals, capturing distinct patterns and amplitudes characteristic of different tasks. Our findings highlight the potential of diffusion models in generating high-fidelity EMG data, providing a novel solution to the data scarcity challenge in occupational health research.

Keywords

generative model, AI, machine learning, time-series data, bio-signals synthesis, ergonomics, data synthesis

Introduction

Electromyography (EMG) signal analysis is essential in the identification and treatment of work-related musculoskeletal disorders (WMSDs) as it provides insights into muscle function and fatigue (Campanini et al., 2022; Gazzoni et al., 2016). In recent years, there has been a rise in studies leveraging EMG signals, combined with machine learning models, to evaluate biomechanical risk exposures associated with WMSDs in various occupational settings (Donisi et al., 2023; Mudiyanselage et al., 2021). However, the success of these techniques relies heavily on access to extensive EMG datasets of occupational tasks, which are currently in short supply (Phinyomark & Scheme, 2018). This shortage can result in models that are overly tailored to the limited data, leading to overfitting issues that compromise the generalizability and effectiveness of ergonomic assessments and interventions (Alzubaidi et al., 2023; Bansal et al., 2022).

The reason for the data shortage is twofold. First, data management plans aimed at protecting patient confidentiality usually limit the sharing and usage of data gathered from human subjects (Pereira, 2020). These plans, while necessary for protecting personal information, pose significant challenges to researchers who require access to a broad range of EMG data for their studies. Second, while the majority of EMG datasets are dedicated to fields like sports science (Taborri et al., 2020), prosthetics and robotics (Atzori et al., 2014), and human-computer interaction (Sun et al., 2020), there is a noticeable shortfall in datasets specifically aimed at occupational tasks (Bassani et al., 2021). Therefore, there is a pressing need for innovative methods to enrich the repository of EMG datasets associated with workplace ergonomics, while strictly maintaining user data privacy.

Data synthesis, which maintains the statistical characteristics of the original data without collecting new data from real-world events or individuals, provides an effective solution to the challenges posed by data shortage (Park et al., 2018). Generative networks such as Generative Adversarial Networks (GANs; Goodfellow et al., 2014) and Variational Autoencoders (VAEs; Kingma & Welling, 2013) have been widely adopted for synthesizing new data (Barsoum et al., 2018; Pu et al., 2016; Qing et al., 2023). However, these generative networks often face challenges in generating time-series signals, such as EMG signals, due to issues like temporal coherence and the complexity of capturing dynamic physiological variations (Lin et al., 2020). In contrast, diffusion models, with their distinctive method of incrementally refining signals from noise, provide an innovative approach for generating time-dependent data (Ho et al., 2020). Their capability to capture the intricate patterns and dynamics inherent in time-series data makes them potentially suitable for generating EMG signals.
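For readers unfamiliar with the "refining signals from noise" idea, the standard denoising diffusion formulation of Ho et al. (2020) can be summarized as follows. This is the generic formulation from that reference, not a statement of the exact variant used in this study; the symbols (the variance schedule β_t, the number of steps T, and the learned mean and covariance μ_θ and Σ_θ) are the usual textbook notation and do not appear elsewhere in this paper.

```latex
% Forward (noising) process with variance schedule \beta_1, \dots, \beta_T
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)

% Closed form at step t, with \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)

% Learned reverse (denoising) process
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```

Sampling starts from Gaussian noise x_T and applies the learned reverse step repeatedly until x_0 is reached. In a conditional model, the network would additionally receive a class label as input.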

This research aims to leverage the reverse (denoising) process of diffusion models to synthesize EMG signals tailored to occupational settings. Specifically, we focus on a widespread work scenario, manual material handling (MMH), characterized by frequent tasks such as pushing and lifting. The goal is to generate EMG signals conditioned on the input task type. This effort seeks to overcome the challenges associated with data privacy concerns and to enrich the pool of available EMG data for occupational studies.

Method

The training dataset used to construct the proposed diffusion model originated from a prior study (Chen, 2022). This dataset includes raw EMG data obtained from participants performing four typical MMH movements: pulling, pushing, squat lifting, and stoop lifting. The pulling and pushing tasks involved maneuvering a loaded handcart over a set distance, while the lifting tasks required adopting a squat or stoop lifting technique to raise a load slightly above the floor and maintaining the posture for a period. EMG signals were collected from the lumbar erector spinae. The raw data were first filtered with a fourth-order Butterworth band-pass filter with cutoff frequencies of 10 and 500 Hz, and the continuous EMG recordings were then segmented into 3-second intervals using a sliding-window technique. These segments were transformed into 2D spectral images via a short-time Fourier transform.
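The segmentation and spectral-image steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline used in the study. The sampling rate, window stride, frame length, and hop size are assumptions (none are reported here), and the function names `sliding_windows` and `stft_magnitude` are hypothetical. The band-pass filtering step is omitted; in practice it would typically be handled by a signal-processing routine such as `scipy.signal.butter`.

```python
import cmath
import math


def sliding_windows(signal, fs, win_sec=3.0, stride_sec=1.0):
    """Split a 1-D signal into fixed-length, possibly overlapping segments.

    fs is the sampling rate in Hz; stride_sec is an assumed value.
    """
    win = int(round(win_sec * fs))
    stride = int(round(stride_sec * fs))
    if win <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, stride)]


def stft_magnitude(segment, frame_len=64, hop=32):
    """Return a 2-D list [frame][frequency bin] of Hann-windowed DFT magnitudes."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
            for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    image = []
    for start in range(0, len(segment) - frame_len + 1, hop):
        frame = [segment[start + n] * hann[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of one frame; slow but dependency-free.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        image.append(row)
    return image
```

Each 3-second segment yields one 2D array of magnitudes (time frames by frequency bins), which plays the role of the spectral image. A production pipeline would use an FFT-based routine rather than the naive transform shown here.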

We then trained a conditional diffusion model with a residual U-Net architecture to synthesize EMG signals of different MMH tasks. This network architecture can efficiently capture long-term dependencies and temporal dynamics within the data due to its deep residual blocks. The diffusion model utilized 2D spectral images as both inputs and outputs, taking advantage of U-Net’s proficiency in image processing. The model’s loss function combined spectral and temporal components: the spectral loss measured the differences between input and reconstructed spectral images, and the temporal loss aimed to reduce the discrepancies between the original time-series data and the time series recovered from the reconstructed spectral images. After training, the output spectral images were converted back to 1D signals, yielding time-domain signals that mimic EMG recordings of the different activities.
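The two-part loss described above can be illustrated with a short sketch. The exact form of each term and their relative weighting are not reported here, so the code below assumes a mean squared error for both terms and a weighted sum with hypothetical weights `w_spec` and `w_temp`. The argument `sig_pred` stands for the time series recovered from the reconstructed spectral image by an inverse short-time Fourier transform, which is not implemented in this sketch.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flatten(image):
    """Flatten a 2-D list (frames x frequency bins) into a 1-D list."""
    return [value for row in image for value in row]


def combined_loss(spec_true, spec_pred, sig_true, sig_pred,
                  w_spec=1.0, w_temp=1.0):
    """Weighted sum of a spectral term and a temporal term.

    Both terms are MSE here, which is an assumption for illustration.
    """
    spectral = mse(flatten(spec_true), flatten(spec_pred))
    temporal = mse(sig_true, sig_pred)
    return w_spec * spectral + w_temp * temporal
```

Adding the temporal term penalizes spectral images that look plausible on their own but do not convert back to a time series close to the original.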

Results

We developed an evaluation framework to analyze our synthesized data across both time and frequency domains. Recognizing that different MMH tasks produce distinct signal patterns and amplitudes, we conducted a class-specific comparison between our synthesized outputs and the original data. In the frequency domain (Figure 1), we observed that for both original and synthesized data, “pull” and “push” exhibit periodic patterns, reflecting the fluctuating muscle efforts required around the lumbar spine. The “squat lifting” and “stoop lifting” tasks are characterized by a broad, uniform frequency distribution spanning from 10 to 200 Hz, mirroring the postural and muscle effort maintenance required during these tasks. The frequency intensity is approximately 10⁻³ for the “squat lifting” task and around 10⁻⁴ for the “stoop lifting” task.

Figure 1. Spectral comparison among four tasks.

The spectra were then converted to the time domain for further evaluation (Figure 2). For both original and synthesized data, the “pull” and “push” signals demonstrate periodic fluctuations around a baseline, with the “pull” task marked by an amplitude of approximately 4 × 10⁻³ and the “push” task by 2 × 10⁻³. Meanwhile, “squat lifting” and “stoop lifting” exhibit a uniform distribution around the zero point, with the “squat lifting” task reaching an amplitude of 10⁻², an order of magnitude higher than “stoop lifting” at 10⁻³.

Figure 2. Time domain comparison among four tasks.

Visual evaluation indicates that the patterns and amplitudes of the synthesized outputs align well with those of the original data, supporting the fidelity of the proposed synthesis process in replicating EMG signal variations in both the frequency and time domains.
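The amplitude and frequency comparisons described above could also be expressed numerically. The sketch below is not part of the reported evaluation, which was visual. It shows one possible way to compute two common EMG descriptors, root-mean-square amplitude and power-weighted mean frequency, together with a relative difference between an original and a synthesized value. The function names and the choice of descriptors are assumptions made for illustration.

```python
import cmath
import math


def rms(signal):
    """Root-mean-square amplitude of a 1-D signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mean_frequency(signal, fs):
    """Power-weighted mean frequency (Hz) from a naive DFT of the signal."""
    n = len(signal)
    total_power = 0.0
    weighted = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(signal[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))
        power = abs(coeff) ** 2
        total_power += power
        weighted += power * (k * fs / n)
    return weighted / total_power if total_power > 0 else 0.0


def relative_difference(original, synthesized):
    """Absolute difference relative to the (non-zero) original value."""
    return abs(synthesized - original) / abs(original)
```

Computing these descriptors per task class for both original and synthesized segments would give a simple numeric counterpart to the class-specific visual comparison.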

Conclusion

This study assessed the effectiveness of conditional diffusion models in generating high-fidelity EMG signals associated with MMH tasks. The results present a novel approach to address the significant challenge of data shortage in occupational health research by generating non-identified EMG signals. Aggregating these non-identified EMG signals could enhance the creation of a more comprehensive dataset covering various occupational tasks. It is important to note that in the current study, the evaluation of the synthesized data relied solely on visual inspection and comparison. Future research should also incorporate analytical scrutiny of the synthesized data using statistical metrics. Additionally, the potential of combining synthesized data with machine learning techniques for creating a robust biomechanical risk exposure assessment tool warrants further investigation.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This manuscript is based on work supported by the National Science Foundation under Grant #2024688.

ORCID iDs

Liwei Qing

Sehee Jung

Bingyi Su

References

Alzubaidi, Bai, Al-Sabaawi, Santamaría, Albahri A. S., Al-dabbagh B. S. N., Fadhel M. A., Manoufali, Zhang, Al-Timemy A. H. (2023). A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications. Journal of Big Data, 10(1), 46.

Atzori, Gijsberts, Castellini, Caputo, Hager A.-G. M., Elsig, Giatsidis, Bassetto, Müller (2014). Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Scientific Data, 1(1), 1–13.

Bansal M. A., Sharma D. R., Kathuria D. M. (2022). A systematic review on data scarcity problem in deep learning: Solution and applications. ACM Computing Surveys (CSUR), 54(10s), 1–29.

Barsoum, Kender, Liu (2018). HP-GAN: Probabilistic 3D human motion prediction via GAN [Conference session]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1418–1427). IEEE.

Bassani, Filippeschi, Avizzano C. A. (2021). A dataset of human motion and muscular activities in manual material handling tasks for biomechanical and ergonomic analyses. IEEE Sensors Journal, 21(21), 24731–24739.

Campanini, Merlo, Disselhorst-Klug, Mesin, Muceli, Merletti (2022). Fundamental concepts of bipolar and high-density surface EMG understanding and teaching for clinical, occupational, and sport applications: Origin, detection, and main errors. Sensors, 22(11), 4150.

Chen (2022). Studies of enhanced instructional modalities and biosignals on user performance in virtual reality systems. Dissertations & Theses. North Carolina State University, Raleigh (2723856081).

Donisi, Jacob, Guerrini, Prisco, Esposito, Cesarelli, Amato, Gargiulo (2023). sEMG spectral analysis and machine learning algorithms are able to discriminate biomechanical risk classes associated with manual material liftings. Bioengineering, 10(9), 1103.

Gazzoni, Afsharipour, Merletti (2016). Surface EMG in ergonomics and occupational medicine. Surface Electromyography: Physiology, Engineering, and Applications (pp. 361–391).

Goodfellow, Pouget-Abadie, Mirza, Warde-Farley, Ozair, Courville, Bengio (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.

Ho, Jain, Abbeel (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.

Kingma D. P., Welling (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

Lin, Jain, Wang, Fanti, Sekar (2020). Using GANs for sharing networked time series data: Challenges, initial promise, and open questions [Conference session]. Proceedings of the ACM Internet Measurement Conference (pp. 464–483). ACM.

Mudiyanselage S. E., Nguyen P. H. D., Rajabi M. S., Akhavian (2021). Automated workers’ ergonomic risk assessment in manual material handling using sEMG wearable sensors and machine learning. Electronics, 10(20), 2558.

Park, Mohammadi, Gorde, Jajodia, Park, Kim (2018). Data synthesis based on generative adversarial networks. arXiv preprint arXiv:1806.03384.

Pereira (2020). Ethical challenges in collecting and analysing biometric data. Ethical Challenges in Collecting and Analysing Biometric Data (pp. 108–114).

Phinyomark, Scheme (2018). EMG pattern recognition in the era of big data and deep learning. Big Data and Cognitive Computing, 2(3), 21.

Pu, Gan, Henao, Yuan, Stevens, Carin (2016). Variational autoencoder for deep learning of images, labels and captions. Advances in Neural Information Processing Systems, 29.

Qing, Xie, Jung, Wang, Fitts E. P. (2023). A conditional variational auto-encoder model for reducing musculoskeletal disorder risk during a human-robot collaboration task. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 67(1), 425–431.

Sun, Kong, Jiang, Tao, Chen (2020). Intelligent human computer interaction based on non redundant EMG signal. Alexandria Engineering Journal, 59(3), 1149–1157.

Taborri, Keogh, Kos, Santuz, Umek, Urbanczyk, van der Kruk, Rossi (2020). Sport biomechanics applications using inertial, force, and EMG sensors: A literature overview. Applied Bionics and Biomechanics, 2020(1), 2041549.