Abstract
Motion estimation from surface electromyogram (sEMG) signals has been studied extensively over the past decades. Nevertheless, it remains challenging to apply a trained estimation model to novel subjects, since sEMG signals inherently contain user-dependent features that interfere with the model and reduce the estimation accuracy. To achieve accurate motion estimation, a correlated components analysis-based random forest regressor (CorrCA-RFG) strategy is proposed. The CorrCA-RFG first uses CorrCA to extract user-independent, motion-related features shared among multiple subjects and to obtain the projection vectors from the sEMG data to the motion-dependent feature space. Then, the RFG is trained on the user-independent sEMG features to establish the estimation model. To validate its effectiveness, the strategy was tested on a public dataset and in an experimental study, and compared with three methods: a random forest regressor (RFG), a canonical correlation analysis-based random forest regressor (CCA-RFG), and a convolutional neural network (CNN). In both cases, the CorrCA-RFG outperformed the other three methods. These results demonstrate that the proposed CorrCA-RFG enables robust motion estimation by extracting user-independent sEMG features.
Keywords
Introduction
Surface electromyogram (sEMG) signals detected by surface electrodes represent the electric potential field generated by muscle fiber contraction.1 They are measured non-invasively and contain essential information on human motion.2,3 Continuous motion estimation from sEMG helps ensure accuracy and safety in many emerging applications such as human-robot interaction (HRI) and robotic systems for rehabilitation, assistance, and performance enhancement.4–6
However, sEMG signals are user-dependent, mainly because of differences in subcutaneous tissue and in the physiological cross-sectional area of the muscle,7,8 so the measured signals vary between users even when they perform the same motion. This user-dependent nature leads to user-dependent estimation models,9–11 which generalize poorly and must be retrained for each new user. The retraining process is laborious and time-consuming, which hinders the application of sEMG-based HRI in commercial robotic devices.12–15
To address these issues, many efforts have been made to improve the generalization of estimation models across users. Firstly, some studies increased the size and diversity of the training data (more training users) to enhance the generalization ability of machine learning models such as random forest regression (RFG) and support vector regression (SVR),16 which is costly. In these methods, the generalization performance depends heavily on the training dataset. Secondly, deep learning and transfer learning methods have been used to estimate human motion. Deep learning methods can automatically learn features that share similar distributions across subjects.17,18 Yang et al.19 proposed a novel convolutional neural network (CNN) structure with generalization capacity to estimate wrist movement, but the CNN contains a large number of parameters. Bao et al.20 designed a two-stream CNN with a complex structure for supervised domain adaptation to reduce the domain shift effect. Transfer learning aims to extract knowledge from a source domain and apply it to a target domain.21 To estimate elbow torque from sEMG signals, Jiang et al.22 proposed a correlation-based data weighting scheme based on transfer learning, which is unsupervised in the modeling stage. However, the above methods require a large dataset or incur a high computational cost.
To further improve the generalization of estimation models, some researchers have begun to build them on user-independent sEMG features.23,24 User-independent sEMG features are related to the motion and strongly correlated among multiple users. In particular, they may share similar distributions across different subjects who perform the same motion, and are therefore beneficial for improving generalization. Some studies applied canonical correlation analysis (CCA) to extract user-independent features and built generalized models on these features to estimate cross-user motion. Khushaba25 applied CCA to project different users’ data onto a unified-style space to overcome individual differences. Xue et al.26 used CCA to extract the inherent user-independent properties of sEMG signals and applied optimal transport to further reduce the discrepancies between the transformed features of the training and testing sets. In these studies, CCA requires the canonical projection vectors to be orthogonal, which is not a reasonable assumption for sEMG analysis. In addition, the estimation performance of CCA-based methods depends on the chosen expert set, for which the selection criterion has not been well defined and which is usually chosen based on subjective experience.
In this study, we focused on extracting user-independent features without any expert set and on building a new estimation strategy to improve the performance of continuous motion estimation from sEMG signals. Since correlated components analysis (CorrCA) extracts components that are maximally correlated among subjects, we first used CorrCA to extract the user-independent sEMG features. In addition, RFG can achieve excellent prediction performance with a smaller dataset than other machine learning methods.27 Based on these advantages of CorrCA and RFG, we propose a new estimation strategy: CorrCA-RFG. Firstly, a single set of linear projections is obtained from multiple subjects’ sEMG signals using CorrCA. The projection vectors transform all subjects’ datasets into the same space to obtain user-independent features. Then, these extracted features are used to construct an RFG model to estimate human motion. To validate the effectiveness of the proposed method, we tested the CorrCA-RFG for estimating knee angles on both a public dataset28 and our own experimental study. The main contribution of this study is a new strategy, namely CorrCA-RFG, that extracts user-independent sEMG features to generalize sEMG-based continuous motion estimation across multiple subjects.
The remainder of this paper is organized as follows. Section 2 presents the proposed CorrCA-RFG strategy. Section 3 describes the experimental study and the public dataset and presents the results on both. Sections 4 and 5 give the discussion and conclusion of this study, respectively.
Methods
In this section, the CorrCA-RFG strategy is first described in detail, as shown in Figure 1. Then, the processing of the sEMG signals, the extraction of sEMG features, and the performance measures are introduced.

The designed CorrCA-RFG strategy for user-independent sEMG-based motion estimation.
Correlated components analysis
The CorrCA tries to identify a single set of linear projections which can maximize the correlation among the
The goal of CorrCA is to seek a projection vector
where
To find the projection vector, the inter-subject correlation (ISC) is maximized. The ISC is defined as,
where
Inserting (1) into (2) gives the following equation,
where
where
To maximize the ISC, differentiation of (3) with respect to the projection vector
Supposing
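For orientation, the derivation outlined above follows the general pattern of the standard CorrCA formulation found in the literature, which can be sketched as below. The notation here ($N$ subjects, $T$ samples, feature vectors $\mathbf{x}_l^i$, projection vector $\mathbf{v}$, and the matrices $\mathbf{R}_B$ and $\mathbf{R}_W$) is assumed for illustration and may differ from the symbols and numbering used in this paper.

```latex
% Assumed notation (not taken from this paper):
%   x_l^i : feature vector of subject l (l = 1..N) at sample i (i = 1..T)
%   v     : projection vector shared by all subjects
\begin{align}
y_l^i &= \mathbf{v}^{\top}\mathbf{x}_l^i, \\
\rho &= \frac{1}{N-1}\,
        \frac{\mathbf{v}^{\top}\mathbf{R}_B\,\mathbf{v}}
             {\mathbf{v}^{\top}\mathbf{R}_W\,\mathbf{v}}, \\
\mathbf{R}_B &= \sum_{l=1}^{N}\sum_{k \neq l}\mathbf{R}_{lk}, \qquad
\mathbf{R}_W = \sum_{l=1}^{N}\mathbf{R}_{ll}, \qquad
\mathbf{R}_{lk} = \sum_{i=1}^{T}
   (\mathbf{x}_l^i-\bar{\mathbf{x}}_l)(\mathbf{x}_k^i-\bar{\mathbf{x}}_k)^{\top}, \\
\frac{\partial \rho}{\partial \mathbf{v}} = \mathbf{0}
 \;&\Rightarrow\;
 \mathbf{R}_B\,\mathbf{v} = \lambda\,\mathbf{R}_W\,\mathbf{v},
 \qquad \lambda = (N-1)\,\rho, \\
\mathbf{R}_W^{-1}\mathbf{R}_B\,\mathbf{v} &= \lambda\,\mathbf{v}
 \qquad (\text{if } \mathbf{R}_W \text{ is invertible}).
\end{align}
```

In this form, the projection vectors are the eigenvectors of $\mathbf{R}_W^{-1}\mathbf{R}_B$, and the eigenvector with the largest eigenvalue gives the component with the highest ISC.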
CorrCA-RFG strategy
Taking advantage of CorrCA for multi-subject analysis, we propose the CorrCA-RFG strategy for cross-subject sEMG-based motion estimation, shown in Figure 1. Specifically, to reject noise and unimportant sEMG signals for each subject,30 sEMG features were first extracted from each channel according to the gait cycle. Secondly, the features from different subjects in the same channel were concatenated into matrices,
Signal processing
A Butterworth band-pass filter (10–500 Hz) was applied to pre-process the sEMG signals before feature extraction.31,32 Next, the pre-processed sEMG signals were normalized by their maximum amplitude. Then, overlapping sliding time windows were applied to extract features from the normalized signals. To ensure the same number of feature samples over a gait cycle for all subjects, the windows were defined by segmenting each gait cycle. Assuming
where
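As a rough illustration of the normalization and gait-cycle-based windowing described in this subsection, the following Python sketch scales a signal by its maximum absolute amplitude and splits one gait cycle into a fixed number of overlapping windows. The function names, the number of windows, and the 50% overlap are assumptions made for this example, and the window-length rule is one plausible choice rather than the formula used in this study.

```python
def normalize_by_max(signal):
    """Scale a signal by its maximum absolute amplitude."""
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def gait_cycle_windows(cycle, n_windows, overlap=0.5):
    """Split one gait cycle into n_windows overlapping windows of equal length.

    The window length scales with the cycle length, so every cycle yields
    the same number of windows regardless of its duration (assumed rule).
    """
    n = len(cycle)
    # n_windows windows with the given overlap span about
    # length * (1 + (n_windows - 1) * (1 - overlap)) samples.
    length = int(n / (1 + (n_windows - 1) * (1 - overlap)))
    if length < 1:
        raise ValueError("gait cycle too short for the requested windows")
    step = (n - length) / (n_windows - 1) if n_windows > 1 else 0
    windows = []
    for k in range(n_windows):
        start = int(round(k * step))
        windows.append(cycle[start:start + length])
    return windows


# Two hypothetical gait cycles of different duration (in samples).
short_cycle = normalize_by_max([float(i % 7 - 3) for i in range(100)])
long_cycle = normalize_by_max([float(i % 7 - 3) for i in range(130)])
short_windows = gait_cycle_windows(short_cycle, n_windows=9)
long_windows = gait_cycle_windows(long_cycle, n_windows=9)
print(len(short_windows), len(long_windows))
```

Both cycles produce nine windows, so each subject contributes the same number of feature samples per gait cycle even when cycle durations differ.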
Feature extraction
In each time window, 12 time-domain features and 2 frequency-domain features were extracted from each of the six sEMG channels. Specifically, the time-domain features consist of integrated EMG, mean absolute value, mean, root mean square, variance, kurtosis, skewness, zero crossings, slope sign changes, and auto-regressive model coefficients. The frequency-domain features are the median frequency and the mean frequency.33
For each channel, the extracted features were concatenated into a single vector as one sample,
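To make the time-domain features more concrete, the sketch below computes a subset of them (integrated EMG, mean absolute value, mean, root mean square, variance, zero crossings, and slope sign changes) for a single window using their common textbook definitions. Kurtosis, skewness, the auto-regressive coefficients, and the frequency-domain features are omitted for brevity. The function name and the zero-crossing threshold parameter are assumptions for illustration, and the exact definitions used in this study may differ.

```python
import math


def time_domain_features(window, zc_threshold=0.0):
    """Compute a subset of time-domain sEMG features for one window."""
    n = len(window)
    iemg = sum(abs(x) for x in window)                 # integrated EMG
    mav = iemg / n                                     # mean absolute value
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)    # root mean square
    var = sum((x - mean) ** 2 for x in window) / (n - 1)  # sample variance
    # Zero crossings: consecutive samples with opposite signs whose
    # difference exceeds a (hypothetical) noise threshold.
    zc = sum(
        1
        for a, b in zip(window, window[1:])
        if a * b < 0 and abs(a - b) > zc_threshold
    )
    # Slope sign changes: the middle sample is a local maximum or minimum.
    ssc = sum(
        1
        for a, b, c in zip(window, window[1:], window[2:])
        if (b - a) * (b - c) > 0
    )
    return {"iemg": iemg, "mav": mav, "mean": mean, "rms": rms,
            "var": var, "zc": zc, "ssc": ssc}


features = time_domain_features([1.0, -1.0, 2.0, -2.0])
# Concatenate the values in a fixed order to form one feature vector.
feature_vector = [features[k] for k in sorted(features)]
print(feature_vector)
```

Applying such a function to every window of every channel and concatenating the results gives one feature vector per window, in line with the concatenation step described above.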
Performance measurement
The normalized root-mean-square error (NRMSE) and the correlation coefficient (CC) were used to assess the estimation performance. The NRMSE quantifies the error between the measurement and the estimation, and the CC quantifies how well their profiles match. An NRMSE close to 0 and a CC close to 1 indicate good estimation performance. The NRMSE is defined as:
where
The CC is defined as:
where
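The two criteria can be sketched in Python as follows. This is a minimal illustration that assumes the RMSE is normalized by the range of the measured signal (other conventions, such as normalizing by the mean, also exist) and that the CC is the Pearson correlation coefficient; the function names are chosen for this example only.

```python
import math


def nrmse(measured, estimated):
    """RMSE divided by the range of the measured signal (assumed convention)."""
    n = len(measured)
    rmse = math.sqrt(
        sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n
    )
    return rmse / (max(measured) - min(measured))


def correlation_coefficient(measured, estimated):
    """Pearson correlation coefficient between measurement and estimation."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_m * var_e)


# Hypothetical knee angles in degrees, for illustration only.
reference = [0.0, 10.0, 20.0, 30.0]
estimate = [1.0, 9.0, 21.0, 29.0]
print(nrmse(reference, estimate), correlation_coefficient(reference, estimate))
```

With these definitions, a perfect estimate gives an NRMSE of 0 and a CC of 1.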
Experiments and results
This section compares the proposed CorrCA-RFG with the conventional RFG, CCA-RFG, and CNN for knee angle estimation on both a public dataset and an experimental study. In the conventional RFG, the original sEMG features were fed directly into the RFG to train the model. In the CCA-RFG, for each subject, CCA was used to extract from the original sEMG features the training features correlated with the expert features.25 These extracted training features were then fed into the RFG. Leave-one-subject-out cross-validation was applied: each time, one subject was held out as a novel subject for testing, and the data from all remaining subjects were used for training.
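The leave-one-subject-out procedure described above can be sketched as a simple generator of training and test splits. The subject labels and the function name are hypothetical, and the model training and evaluation inside each fold are left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training_subjects, test_subject) pairs, one per subject."""
    for held_out in subject_ids:
        training = [s for s in subject_ids if s != held_out]
        yield training, held_out


# Hypothetical labels for the 12 subjects of the experimental study.
subjects = [f"S{i}" for i in range(1, 13)]
folds = list(leave_one_subject_out(subjects))
# Each fold trains on 11 subjects and tests on the held-out one.
for training, held_out in folds[:2]:
    print(held_out, len(training))
```

Because the held-out subject never contributes to training, the per-subject NRMSE and CC obtained this way reflect performance on a genuinely novel user.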
Data acquisition
In this paper, a public dataset and an experimental study were used to validate and analyze the proposed CorrCA-RFG strategy.
Public subject dataset
The public dataset contains gait data of 10 healthy subjects (all male) running on a treadmill at a speed of 2 m/s, including marker trajectories, ground reaction forces and moments, and sEMG signals. In total, 54 reflective markers were placed on each subject, and their positions were recorded at 100 Hz using 8 Vicon MX40+ cameras. A Delsys Bagnoli System was used to measure the sEMG signals from 10 muscles: soleus, lateral gastrocnemius, medial gastrocnemius, tibialis anterior, biceps femoris long head, vastus medialis, vastus lateralis, rectus femoris, gluteus maximus, and gluteus medius. More details about the public dataset are provided in Hamner and Delp.28 In this study, to be consistent with our experimental study, only six sEMG signals mainly related to knee flexion/extension were used: biceps femoris (BF), lateral gastrocnemius (LG), medial gastrocnemius (MG), semitendinosus (ST), vastus lateralis (VL), and vastus medialis (VM).
Experimental study
In the experimental study, 12 healthy subjects (8 males and 4 females; Age: 23.83 ± 3.24; Weight: 66.83 ± 11.28 kg; Height: 172.67 ± 7.36 cm) were recruited to perform walking experiments on a treadmill. All of them were informed of the procedure and signed informed consent before the experiments. The experiments were approved by the local ethics committee of Nankai University.
Each subject was asked to walk for 1 min at a speed of 1.25 m/s per trial. In total, 11 trials were performed by each subject, with a 3 min rest between trials to avoid muscle fatigue. During the experiments, knee angle signals and sEMG signals were recorded by the device shown in Figure 2. Knee angle signals were recorded with 2 inertial measurement units (IMUs) at a data acquisition frequency of 100 Hz. Simultaneously, sEMG signals from the 6 muscles mentioned above were collected by the Delsys Bagnoli system at a sampling rate of 5 kHz.

Experimental setup: (a) schematic diagram of the experimental setup and (b) photograph of the experiment.
Estimation results on the public dataset
We first tested the proposed CorrCA-RFG on the public dataset and compared its results with those of the other three methods. The mean values of NRMSE and CC are presented in Table 1, which shows that:
The NRMSE values of CorrCA-RFG were lower than those of the RFG on 7 out of the 10 subjects, and the CC values of CorrCA-RFG were higher than those of the RFG on 6 out of the 10 subjects, suggesting that the proposed CorrCA was effective.
The CorrCA-RFG outperformed CCA-RFG (lower NRMSE values and higher CC values) on 9 out of the 10 subjects, suggesting that CorrCA could be more effective than CCA.
The NRMSE values of CorrCA-RFG were lower than those of the CNN on 7 out of the 10 subjects, and the CC values of CorrCA-RFG were higher than those of the CNN on 8 out of the 10 subjects, suggesting that the CorrCA-RFG could be more effective than the CNN.
Evaluation criteria of the accuracy of knee angles estimated by CorrCA-RFG, RFG, CCA-RFG, and CNN for each subject on the public dataset.
The case with the highest estimation performance for each subject is shown in bold.
Figure 3 showed the distribution of the evaluation criteria of the knee angles estimation across all subjects (

Distribution of the evaluation criteria of the knee angles estimation across all subjects (
Estimation results on experimental study
We also tested the CorrCA-RFG on the experimental study and compared its performance with that of the other three methods. The knee angles estimated by CorrCA-RFG, RFG, CCA-RFG, and CNN are illustrated in Figure 4. The profile estimated by CorrCA-RFG is the closest to the reference angles calculated from the IMUs. The CCA-RFG yields the least smooth angle profile, with sudden deviations and spikes.

Normalized knee angles of subject 11 in the experimental study. Dashed and solid lines are the reference and estimated angles, respectively, averaged across 183 strides of the subject's walking.
For each subject, the average NRMSE values and CC values were presented in Table 2, which showed that:
The NRMSE values of the CorrCA-RFG were lower than those of the RFG on all subjects, and the CC values of the CorrCA-RFG were higher than those of the RFG on 11 out of the 12 subjects, again suggesting that the proposed CorrCA was effective.
The CorrCA-RFG outperformed CCA-RFG (lower NRMSE values and higher CC values) on all subjects, suggesting that the CorrCA could be more effective than CCA.
The CorrCA-RFG outperformed CNN (lower NRMSE values and higher CC values) on all subjects, indicating that the CorrCA-RFG could be more effective than CNN.
Evaluation criteria of the accuracy of knee angles estimated by CorrCA-RFG, RFG, CCA-RFG, and CNN for each subject on the experimental study.
The case with the highest estimation performance for each subject is shown in bold.
Figure 5 displays the distribution of the evaluation criteria of the knee angle estimation across all subjects of the experimental study for the four methods. The NRMSE and CC values of CorrCA-RFG were concentrated around 17% and 80%, respectively, suggesting that the CorrCA-RFG estimation was more robust across subjects. The NRMSE and CC values of CCA-RFG were widely spread, indicating that the CCA-RFG was unstable. The outliers in the NRMSE values (RFG and CNN) and in the CC values (CNN) were associated with the instability of these methods. The Kruskal-Wallis test was also conducted to identify differences in NRMSE and CC between the proposed CorrCA-RFG and the other three methods in the experimental study. The NRMSE values of CorrCA-RFG had statistically significant differences with respect to RFG (

Distribution of the evaluation criteria of the knee angles estimation across all subjects (
It is worth noting that for a small number of subjects (both in the public dataset and the experimental study), the CorrCA did not improve the estimation performance. Some possible reasons are discussed in the next Section.
Discussion
In this paper, we have provided a new strategy, namely CorrCA-RFG, that extracts user-independent sEMG features to generalize sEMG-based continuous motion estimation across subjects. Although CorrCA has been used to analyze electroencephalogram (EEG) signals recorded from multiple subjects or multiple channels,29,34,35 there have been no studies of the effectiveness of CorrCA on sEMG signals. CorrCA can transfer the features from multiple subjects into the same space without any expert sets. To validate the effectiveness of the proposed CorrCA-RFG, it was tested on both a public dataset and an experimental study. We evaluated the CorrCA-RFG in terms of two criteria, namely NRMSE and CC, and compared it with three other methods: RFG, CCA-RFG, and CNN. The estimation results presented in Figures 3 to 5 and Tables 1 and 2 indicate that the proposed CorrCA-RFG could improve the estimation performance compared to RFG, CCA-RFG, or CNN. In addition, the distributions of the NRMSE and CC values of the CorrCA-RFG were more concentrated than those of the other three methods, suggesting that the results of the CorrCA-RFG were more robust.
To qualitatively compare the sEMG features before and after CorrCA, a nonlinear dimensionality reduction technique called

The NRMSE and CC values of CCA-RFG were more widely spread than those of the CorrCA-RFG and the RFG (shown in Figures 3 and 5), indicating that the estimation performance of CCA-RFG was worse than that of the CorrCA-RFG and even that of the RFG. The main reason may be that CorrCA yields a single set of linear projections from multiple subjects to transfer their sEMG features into the same space, whereas CCA computes a different set of projection vectors for each subject, which transfers the sEMG features into different spaces. Features from different spaces may have different distributions across subjects. In addition, because CCA can only find projection vectors between two data matrices, an expert set must be selected to yield the projection vectors for multiple subjects. However, how to choose the expert set has not been well defined, and the choice is usually based on experience.
The comparison between the CorrCA-RFG and the CNN showed that the estimation performance of the CNN was unstable. High uncertainty may exist in the training process of the CNN, so the learned features do not always share a similar distribution.
This paper has proposed the CorrCA-RFG for generalizing sEMG-based motion estimation across subjects. It has two desirable properties: (1) it extracts user-independent sEMG features among multiple subjects directly, without any expert sets; and (2) it does not need any information from the new subjects. Knee angle estimation on the public dataset and on our own experimental study verified the effectiveness of the CorrCA-RFG.
However, the CorrCA-RFG still has some limitations. It degraded the estimation accuracy for a small number of subjects in the public dataset and in the experimental study. It is possible that the motion-dependent components in the sEMG signals of some specific subjects are more complex and cannot be adequately extracted by CorrCA. For these subjects, the projection vectors obtained from other subjects may even cause a concept shift among the transformed features. This could be one of the reasons why the proposed CorrCA-RFG improved the estimation performance on most but not all subjects in the public dataset and the experimental study. Another possible reason is that the data of these subjects contained outliers,37 which the CorrCA-RFG could not handle.
Conclusion
This study focused on improving the accuracy of human joint angle estimation, making it more robust to user-dependent influences in practical applications. For this purpose, it provides a new strategy, namely CorrCA-RFG, that extracts user-independent sEMG features without any expert sets. The proposed strategy was evaluated on a public dataset (containing 10 healthy subjects) and on an experimental study (containing 12 healthy subjects). The results on both show that the proposed CorrCA-RFG outperforms the other three compared methods: RFG, CCA-RFG, and CNN. These results indicate that the proposed estimation strategy has significant potential benefits over current estimation models for joint angle estimation with robotic devices in practice.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (U1913208, 61873135) and the Chinese fundamental research funds for the central universities.
