Feature subset evaluation method for upper limb rehabilitation training based on joint feature discernibility

Abstract

A hybrid method for evaluating the discernibility of feature subsets, based on a Fisher score of joint features and a support vector machine, is proposed for selecting features of the upper limb rehabilitation training motions of patients in Brunnstrom stages 4–5. In this method, the joint feature is introduced so that the evaluation of the discernibility between classes reflects the joint effect of both the candidate and the selected features. A feature subset search strategy generates a set of candidate feature subsets. The Fisher score based on joint feature evaluates these candidates, and the best one becomes the new selected feature subset. Among the subsets selected in this way, the one with the best support vector machine classification performance is finally chosen as the optimal feature subset. Experiments were carried out on routine upper limb rehabilitation training samples from Brunnstrom stages 4–5. Compared with the F-score and the discernibility of feature subset methods, the proposed method obtained feature subsets with higher accuracy and lower dimension, which demonstrates its effectiveness and feasibility.

Keywords

Rehabilitation training motion recognition; hybrid feature subset evaluation; joint feature discernibility; Fisher score based on joint feature; support vector machine

Introduction

As the world's population ages, the number of stroke patients is increasing; 85% of stroke patients have upper limb dysfunction in the early stage of the disease,¹ which seriously affects their quality of life.² Traditional rehabilitation training is mostly assisted by physiotherapists and large training equipment.³ This approach completes the training effectively, but it is monotonous, and the recovery effect is difficult to evaluate in real time.⁴

With the continuous development of human motion sensing technology, a growing number of researchers have applied it to the rehabilitation training of stroke patients.⁵ However, this approach is not suitable for every rehabilitation stage. According to the Brunnstrom stage theory, patients in stages 4–5 begin to break away from the common (synergy) movement patterns.⁶ At this point, their range of movement increases and some training can be completed independently, so motion sensing technology can provide valuable information for evaluating their training motions.

The key to evaluating rehabilitation training is using human motion sensing technology to recognize the patient's motions: a motion recognition algorithm identifies the rehabilitation training motions performed by the patient and determines whether they meet the training standard. Motions that meet the requirements are judged qualified; otherwise, they are judged unqualified.

Support vector machine (SVM) is popular among classification methods because of its excellent characteristics and generalization ability.⁷ Despite these advantages, its classification and generalization performance is often degraded by high-dimensional sample feature vectors.

Feature selection is an important tool for improving SVM performance. It eliminates redundant features, reduces model complexity, helps avoid over-fitting, and improves the classification and generalization performance of SVM. It also reduces computation, which improves the real-time performance of remote health treatment systems.

Among feature selection methods, the Filter method is popular with researchers, but most Filter-based studies of feature subsets consider only the effect of each individual candidate feature on the discernibility between classes and ignore the joint effect of the candidate features and the already selected features.

To address this problem, this article introduces the distance between joint features into the F-score framework and proposes a hybrid method for evaluating feature subset discernibility that combines the Fisher score based on joint feature with a support vector machine (FSJF-SVM). In this method, FSJF serves as the evaluation criterion of the Filter part, and SVM serves as the learning algorithm used for evaluation in the Wrapper part.

This article first briefly introduces the basic framework of the F-score feature subset evaluation method, the multi-class F-score evaluation method, and the discernibility of feature subset (DFS) method. The “Problem formulation” section describes the problems of the DFS method, and the “FSJF evaluation method” section describes the proposed method. The “Experimental design” section presents the rehabilitation training data collection and the experimental design, the “Experimental results and analyses” section analyzes the experimental results, and conclusions are given in the “Conclusion” section.

Related work

Feature selection involves two main components: a feature subset search strategy and a feature subset evaluation criterion.⁸ Commonly used search strategies include sequential forward search (SFS) and sequential backward search (SBS).⁹ Evaluation criteria can be further divided into three categories: Embedded, Filter, and Wrapper methods.¹⁰ The Embedded method builds feature selection into the construction of the learning model; for example, the decision tree algorithm Induction of Decision Tree 3 (ID3) is itself an embedded feature selection algorithm.¹¹ The Embedded method avoids retraining the learning machine to evaluate every feature, which makes it more efficient, but it is difficult to construct a suitable objective function to optimize the model.¹²

The Filter method rates the importance of each feature by its effect on the discernibility between classes and selects the important features to form a feature subset. Dash and Liu¹³ proposed an inconsistency measure that is computed independently of the feature search process and improves the efficiency of feature selection. Battiti¹⁴ first applied information theory to feature selection and proposed Mutual Information Feature Selection (MIFS), a feature selection algorithm for neural networks. Kwak and Choi¹⁵ proposed a feature selection algorithm based on mutual information estimated with Parzen windows. The Filter method is independent of the learning process and is time efficient, but it does not interact with the classifier.

The Wrapper method is closely tied to the learning algorithm: it uses the performance of the learning algorithm as the evaluation criterion of a feature subset. The Wrapper method was first proposed by Kohavi and John;¹⁶ Yang et al.¹⁷ later proposed a Wrapper feature selection algorithm based on random output perturbation, which ranks the features by a sensitivity-based importance measure obtained from the random perturbation and then selects a feature subset of the required dimension. The Wrapper method interacts with a specific classifier and achieves high accuracy, but it must retrain the classifier for every candidate feature subset, so its computational complexity is high and its efficiency is low.

Both the Filter and the Wrapper methods have advantages and disadvantages. A hybrid evaluation method that combines them integrates the efficiency of the Filter method with the high accuracy of the Wrapper method and can therefore obtain a better feature subset.

Problem formulation

The Filter evaluation method in a hybrid method strongly influences the final feature selection result. The classic Fisher feature subset evaluation criterion, hereinafter referred to as the F-score method, established a framework for Filter-type evaluation methods. This section reviews related research on the F-score method and its problems.

F-score method

The F-score measures how well a single feature separates two sets of samples.¹⁸ Consider a binary classification problem in the m-dimensional space $R^{m}$, and let $n_{+}$ and $n_{-}$ be the numbers of positive and negative instances, respectively. The F-score of the ith feature is given by equation (1)

F_{i} = \frac{{({\bar{x}}_{i}^{(+)} - {\bar{x}}_{i})}^{2} + {({\bar{x}}_{i}^{(-)} - {\bar{x}}_{i})}^{2}}{\frac{1}{n_{+} - 1} \sum_{k = 1}^{n_{+}} {(x_{ki}^{(+)} - {\bar{x}}_{i}^{(+)})}^{2} + \frac{1}{n_{-} - 1} \sum_{k = 1}^{n_{-}} {(x_{ki}^{(-)} - {\bar{x}}_{i}^{(-)})}^{2}}

(1)

where ${\bar{x}}_{i}$, ${\bar{x}}_{i}^{(+)}$, and ${\bar{x}}_{i}^{(-)}$ are the means of the ith feature on the whole, positive, and negative sample datasets, respectively, and $x_{ki}^{(+)}$ and $x_{ki}^{(-)}$ are the ith feature of the kth positive and the kth negative instance, respectively.
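As a minimal sketch of equation (1), assuming the values of one feature are stored as plain Python lists per class, the F-score can be computed as follows. The function name `f_score` is our own, not from the original work.

```python
from statistics import mean


def f_score(pos: list[float], neg: list[float]) -> float:
    """F-score of a single feature, following equation (1).

    pos and neg hold the feature values of the positive and negative
    instances; each class needs at least two instances.
    """
    x_bar = mean(pos + neg)              # mean on the whole dataset
    x_pos, x_neg = mean(pos), mean(neg)  # class means
    numerator = (x_pos - x_bar) ** 2 + (x_neg - x_bar) ** 2
    # Within-class variances with the n - 1 divisor used in equation (1)
    var_pos = sum((x - x_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - x_neg) ** 2 for x in neg) / (len(neg) - 1)
    return numerator / (var_pos + var_neg)


print(f_score([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]))  # 4.0
```

Multiplying every value by 10 leaves the score unchanged, because the numerator and the denominator both scale by 100.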

The F-score method cannot be applied directly to multi-class classification problems (MCPs), so a multi-class F-score method was proposed,¹⁹ as follows. Given an MCP in the m-dimensional space $R^{m}$, where the jth class has $n_{j}$ instances, $j = 1, \dots, l$, the multi-class F-score of the ith feature is defined as

{F_{i}}^{*} = \frac{\sum_{j = 1}^{l} {({\bar{x}}_{i}^{(j)} - {\bar{x}}_{i})}^{2}}{\sum_{j = 1}^{l} \frac{1}{n_{j} - 1} \sum_{k = 1}^{n_{j}} {(x_{k, i}^{(j)} - {\bar{x}}_{i}^{(j)})}^{2}}

(2)

where ${\bar{x}}_{i}$ and ${\bar{x}}_{i}^{(j)}$ are the means of the ith feature on the whole dataset and on the samples of the jth class, respectively, and $x_{k, i}^{(j)}$ is the ith feature of the kth instance of the jth class.
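A corresponding sketch of equation (2), under the same assumption that each class is a list of values of one feature (the name `multiclass_f_score` is hypothetical):

```python
from statistics import mean


def multiclass_f_score(classes: list[list[float]]) -> float:
    """Multi-class F-score of a single feature, following equation (2).

    classes[j] holds the feature values of the instances of class j + 1;
    every class needs at least two instances.
    """
    x_bar = mean([v for c in classes for v in c])  # mean on the whole dataset
    class_means = [mean(c) for c in classes]
    numerator = sum((m - x_bar) ** 2 for m in class_means)
    denominator = sum(
        sum((v - m) ** 2 for v in c) / (len(c) - 1)
        for c, m in zip(classes, class_means)
    )
    return numerator / denominator


# With two classes this reduces to the binary F-score of equation (1).
print(multiclass_f_score([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]))  # 4.0
```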

The F-score method successfully established a framework for the Filter method, but, like most Filter-based feature selection studies, it considers only the effect of each individual candidate feature on the discernibility between classes and ignores the joint effect of the candidate and selected features. Xie and Xie²⁰ proposed the DFS method, which attempts to introduce this joint effect into feature subset evaluation.

Consider an l-class classification problem in the m-dimensional space $R^{m}$ with n training samples, where the jth class has $n_{j}$ instances, $j = 1, \dots, l$. The DFS score of a feature subset with I $(I = 1, \dots, m)$ features is defined as

{DFS}_{I}^{(l)} = \frac{\sum_{j = 1}^{l} \sum_{i = 1}^{I} ({({\bar{x}}_{i}^{(j)} - {\bar{x}}_{i})}^{2})}{\sum_{j = 1}^{l} \frac{1}{n_{j} - 1} \sum_{k = 1}^{n_{j}} (\sum_{i = 1}^{I} {(x_{k, i}^{(j)} - {\bar{x}}_{i}^{(j)})}^{2})}

(3)

where each variable is the same as in equations (1) and (2).
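The sketch below evaluates equation (3) for a feature subset, assuming each class is given as a list of feature vectors already restricted to the I features under evaluation (the name `dfs_score` is hypothetical). For I = 1 it returns the multi-class F-score of equation (2).

```python
from statistics import mean


def dfs_score(classes: list[list[list[float]]]) -> float:
    """DFS score of a feature subset, following equation (3).

    classes[j][k] is the feature vector of the kth instance of class j + 1,
    restricted to the I features of the subset being evaluated.
    """
    n_feat = len(classes[0][0])
    rows = [r for c in classes for r in c]
    overall = [mean([r[i] for r in rows]) for i in range(n_feat)]
    numerator = denominator = 0.0
    for c in classes:
        centre = [mean([r[i] for r in c]) for i in range(n_feat)]
        # Squared distances of the class means from the overall means, summed over features
        numerator += sum((centre[i] - overall[i]) ** 2 for i in range(n_feat))
        # Within-class scatter summed over the features, divided by n_j - 1
        scatter = sum((r[i] - centre[i]) ** 2 for r in c for i in range(n_feat))
        denominator += scatter / (len(c) - 1)
    return numerator / denominator


print(dfs_score([[[1.0], [2.0], [3.0]], [[5.0], [6.0], [7.0]]]))  # 4.0, same as equation (2)
```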

Problem description

The DFS method proposes a way to introduce the joint effect of the candidate and selected features into the evaluation method, but there are still some defects.

Exchanging the order of summation, the DFS score of equation (3) can be rewritten as equation (4)

{DFS}_{I}^{(l)} = \frac{\sum_{i = 1}^{I} \sum_{j = 1}^{l} {({\bar{x}}_{i}^{(j)} - {\bar{x}}_{i})}^{2}}{\sum_{i = 1}^{I} \sum_{j = 1}^{l} \frac{1}{n_{j} - 1} \sum_{k = 1}^{n_{j}} {(x_{k, i}^{(j)} - {\bar{x}}_{i}^{(j)})}^{2}}

(4)

Comparing equation (4) with equation (2) shows that the numerator and the denominator of DFS are simply the sums, over the I features, of the numerators and the denominators of the multi-class F-scores of the individual features. In particular, when I = 1, the DFS method is equivalent to the multi-class F-score method.

Suppose the forward search strategy is combined with the DFS method. For I = 1, let $x_{i}$, $y_{i}$, and $k_{i}$ denote the numerator, the denominator, and the DFS score of the ith feature, respectively, and sort the features in descending order of their DFS scores

k_{1} = \frac{x_{1}}{y_{1}}, k_{2} = \frac{x_{2}}{y_{2}}, \dots, k_{i} = \frac{x_{i}}{y_{i}}, \dots, k_{m} = \frac{x_{m}}{y_{m}}

(5)

Define

l_{i} = \frac{y_{i}}{y_{1}}

(6)

Then, for I = 2, the DFS score of the subset formed by the top-ranked feature and the ith feature is

\frac{x_{1} + x_{i}}{y_{1} + y_{i}} = k_{1} + \frac{1}{\frac{1}{l_{i}} + 1} (k_{i} - k_{1})

(7)

where $k_{i} \leq k_{1}$. Because $1/(1/l_{i} + 1)$ grows with $l_{i}$, when $k_{i} < k_{1}$ the value of equation (7) decreases as $l_{i}$ increases. The DFS score is therefore determined jointly by $l_{i}$ and $k_{i}$. From equation (6), $l_{i}$ reflects the magnitude (scale) of the ith feature relative to the top-ranked feature, yet the magnitude of a feature has nothing to do with the discernibility between classes. On the one hand, expressing a feature in a unit of a different scale changes the DFS evaluation result. On the other hand, when all features have the same value of $l_{i}$, DFS ranks the candidate features exactly as the multi-class F-score does.

Therefore, the DFS method does not introduce the joint effect of features into feature subset evaluation; it introduces the magnitude information of the features instead. In addition, for problems with more than two classes, DFS can encounter the situation shown in Figure 1. Because the numerator of equation (3) measures only how far each class mean lies from the overall mean, classes whose means are far from the overall mean but close to one another still produce a large numerator. In such a case, the DFS score is reasonably high even though the feature subset separates the four classes poorly.

Figure 1.

Multi-class overlapping problem.
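The scale effect described above can be checked numerically. The sketch below verifies the identity of equation (7) and shows that rescaling one feature can reverse which candidate DFS prefers. The per-feature numerators and denominators are made-up illustrative numbers, not data from the experiments, and the function names are hypothetical.

```python
def dfs_two(x1: float, y1: float, xi: float, yi: float) -> float:
    """Left-hand side of equation (7): DFS of the subset {feature 1, feature i}."""
    return (x1 + xi) / (y1 + yi)


def dfs_two_closed_form(k1: float, ki: float, li: float) -> float:
    """Right-hand side of equation (7), with l_i = y_i / y_1 from equation (6)."""
    return k1 + (ki - k1) / (1.0 / li + 1.0)


def rescale(x: float, y: float, c: float) -> tuple[float, float]:
    """Expressing a feature in units c times larger multiplies both its F-score
    numerator and denominator by c**2, leaving k = x / y unchanged."""
    return x * c * c, y * c * c


# Hypothetical per-feature numerators x and denominators y
x1, y1 = 8.0, 2.0   # top-ranked single feature, k1 = 4.0
xa, ya = 6.0, 2.0   # candidate A, k = 3.0
xb, yb = 1.0, 0.5   # candidate B, k = 2.0

print(dfs_two(x1, y1, xa, ya))  # 3.5
print(dfs_two(x1, y1, xb, yb))  # 3.6 -> B is preferred despite its lower F-score
xb10, yb10 = rescale(xb, yb, 10.0)
print(dfs_two(x1, y1, xb10, yb10))  # about 2.08 -> after rescaling, A is preferred
```

Candidate B keeps its own F-score of 2.0 after rescaling, so the change in preference comes only from the feature's magnitude.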

FSJF evaluation method

The discussion above shows that the DFS method fails to introduce the joint effect of the candidate and selected features on the discernibility between classes into feature subset evaluation. To solve this problem, we propose the FSJF evaluation method: we first introduce the concept of joint feature and then use the distance between joint features to measure how well the joint features discriminate between classes.

Let $x^{*}$ be a candidate feature and C the selected feature subset containing m selected features $x_{1}, x_{2}, \dots, x_{m}$. To capture the joint contribution of these features to the separation between classes, the candidate feature $x^{*}$ and the m features in C are combined into an (m + 1)-dimensional vector X, called the joint feature vector.

The distance between joint feature vectors is measured with the Euclidean distance. For two m-dimensional joint feature vectors $X_{a} = (x_{1}, x_{2}, \dots, x_{m})$ and $Y_{a} = (y_{1}, y_{2}, \dots, y_{m})$, the distance is given by equation (8)

D (X_{a}, Y_{a}) = \sqrt{\sum_{i = 1}^{m} {(x_{i} - y_{i})}^{2}}

(8)

We introduce the joint feature and the distance between joint feature vectors into the F-score framework to form our FSJF method, in which the joint feature distance measures both the separation between classes and the compactness within classes. In this way, the joint contribution of the candidate and selected features is introduced into the evaluation of the feature subset.

In addition, decomposition and ensemble methods (DEMs) are often used to solve MCPs: an MCP is decomposed into a set of binary classification problems, and the results of the binary classifiers are then combined to solve the MCP. Feature selection for an MCP can therefore be carried out separately for each binary classifier, which increases the computation but can improve the performance of the multi-class classifier. Thus, although FSJF is defined as a feature subset evaluation method for binary classification problems, it can also be used for MCPs. Because each FSJF evaluation involves only a positive and a negative class, the multi-class overlapping problem of the DFS method illustrated in Figure 1 does not arise. For a feature subset containing I features, the FSJF method is defined as follows

{FSJF}'_{I} = \frac{\sqrt{\sum_{i = 1}^{I} {({\bar{x}}_{i}^{(+)} - {\bar{x}}_{i})}^{2}} + \sqrt{\sum_{i = 1}^{I} {({\bar{x}}_{i}^{(-)} - {\bar{x}}_{i})}^{2}}}{\frac{1}{n_{+} - 1} \sum_{k = 1}^{n_{+}} (\sqrt{\sum_{i = 1}^{I} {(x_{ki}^{(+)} - {\bar{x}}_{i}^{(+)})}^{2}}) + \frac{1}{n_{-} - 1} \sum_{k = 1}^{n_{-}} (\sqrt{\sum_{i = 1}^{I} {(x_{ki}^{(-)} - {\bar{x}}_{i}^{(-)})}^{2}})}

(9)

In equation (9), the numerator is the sum of two distances: the distance between the joint feature mean vector of the whole dataset and that of the positive samples, and the distance between the mean vector of the whole dataset and that of the negative samples. It measures the separation between the classes on the joint feature subset. The denominator is the sum, over the positive and negative classes, of the average distance between each joint feature vector and the joint feature mean vector of its class, and it measures the intra-class compactness on the joint feature subset. The larger the numerator, the farther apart the two classes are on the joint feature subset; the smaller the denominator, the more compact each class is. Therefore, the larger the value of equation (9), the better the candidate feature subset distinguishes between the classes. Figure 1, shown earlier, plots a schematic diagram of equation (9). For a two-class problem, the overall joint feature mean vector is the sample-size-weighted average of the two class mean vectors, so it lies on the line segment joining them, and the two distances in the numerator add up to exactly the distance between the two class mean vectors. Equation (9) can therefore be simplified by dropping the overall mean vector and using the Euclidean distance between the two class mean vectors directly, as shown in equation (10), where each variable has the same meaning as in equation (9)

{FSJF}_{I} = \frac{\sqrt{\sum_{i = 1}^{I} {({\bar{x}}_{i}^{(+)} - {\bar{x}}_{i}^{(-)})}^{2}}}{\frac{1}{n_{+} - 1} \sum_{k = 1}^{n_{+}} (\sqrt{\sum_{i = 1}^{I} {(x_{ki}^{(+)} - {\bar{x}}_{i}^{(+)})}^{2}}) + \frac{1}{n_{-} - 1} \sum_{k = 1}^{n_{-}} (\sqrt{\sum_{i = 1}^{I} {(x_{ki}^{(-)} - {\bar{x}}_{i}^{(-)})}^{2}})}

(10)
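The sketch below implements equations (9) and (10) for a binary problem, assuming each class is a list of joint feature vectors. It confirms on toy data with unequal class sizes that the two forms give the same value. The helper names `centroid`, `spread`, `fsjf_prime`, and `fsjf` are hypothetical, and the data are illustrative only.

```python
from math import dist
from statistics import mean


def centroid(rows: list[list[float]]) -> list[float]:
    """Joint feature mean vector of a set of samples."""
    return [mean(col) for col in zip(*rows)]


def spread(rows: list[list[float]], centre: list[float]) -> float:
    """Sum of Euclidean distances to the class mean vector, divided by n - 1."""
    return sum(dist(r, centre) for r in rows) / (len(rows) - 1)


def fsjf_prime(pos: list[list[float]], neg: list[list[float]]) -> float:
    """Equation (9): class mean vectors compared with the overall mean vector."""
    overall = centroid(pos + neg)
    c_pos, c_neg = centroid(pos), centroid(neg)
    numerator = dist(c_pos, overall) + dist(c_neg, overall)
    return numerator / (spread(pos, c_pos) + spread(neg, c_neg))


def fsjf(pos: list[list[float]], neg: list[list[float]]) -> float:
    """Equation (10): distance between the two class mean vectors."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return dist(c_pos, c_neg) / (spread(pos, c_pos) + spread(neg, c_neg))


# Toy two-feature data with unequal class sizes (illustrative only)
pos = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
neg = [[5.0, 4.0], [6.0, 5.0], [7.0, 4.0], [6.0, 6.0]]
print(fsjf_prime(pos, neg), fsjf(pos, neg))  # equal up to rounding
```

The equality relies on the overall mean being computed over all samples, which makes it the weighted average of the two class means.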

The proposed FSJF-SVM is a hybrid method for evaluating the discernibility of feature subsets that combines the joint-feature Fisher criterion with SVM. Combined with a feature subset search method, FSJF serves as the evaluation criterion of the Filter part and SVM as the learning algorithm of the Wrapper part; together they form the hybrid evaluation method given in Algorithm 1.

Algorithm 1: FSJF-SVM hybrid feature subset evaluation method
Input: Training set and test set.
Output: Feature subset C.
1. Let $S = \{f_{i} \mid i = 1, \dots, m\}$ be the set of all features and C the selected feature subset; initialize $C = \emptyset$.
2. Use the search strategy to obtain m candidate features and add each of them to C separately to obtain the subsets $C_{i}$, $i = 1, 2, \dots, m$. Use equation (10) to evaluate the discernibility of the m new feature subsets, select the best subset $C_{i}$, and let $C = C_{i}$.
3. Train an SVM with the features in C, use it to classify the test set, and record the classification accuracy.
4. If the termination condition of the search strategy is met, select the feature subset with the highest SVM accuracy; otherwise, return to step 2.

Experimental design

Our FSJF-SVM hybrid feature subset evaluation method targets feature selection for upper limb rehabilitation training motions in Brunnstrom stages 4–5. We used a human upper limb motion acquisition suit of our own design to collect samples of six kinds of training motions. F-score-SVM, DFS-SVM, and the proposed method are each combined with the SFS and SBS search strategies to form different feature selection methods, and comparison experiments are carried out to verify the effectiveness of our method.

Motion data acquisition

The rehabilitation training motion data are obtained with a wearable human motion acquisition system, developed by us, that records the motion of each joint of the human body.²¹ Figure 2 shows how the acquisition system is worn. It consists of eight inertial measurement modules: one on the head, one on the waist (or back), two on the forearms, two on the upper arms, and two on the hands. Two data gloves can capture the motions of all 10 fingers, but they are not used in this article. Each acquisition module, shown in Figure 2(c), is built mainly around a nine-axis inertial measurement unit (IMU) consisting of a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer, which measure angular velocity, acceleration, and magnetic field strength, respectively; the orientation angles are then calculated from these measurements. For real-time computation, the human motion data are represented by the quaternions output directly by the IMUs, which are converted into joint angles: the pitch, yaw, and roll angles of the neck, waist, and shoulder joints, and the pitch and roll angles of the elbow joints.

Figure 2.

The motion acquisition system: (a) and (b) front and back wearing conditions, and (c) one acquisition module of the system.

For Brunnstrom stages 4–5, six commonly used seated training exercises for the shoulder, elbow, and waist joints are selected to build the motion dataset. Figure 3 shows, at the far left, the starting posture shared by all the movements, followed from left to right by movements m1, m2, m3, m4, m5, and m6. The clasped-hands grip used in all six movements, known as the Bobath handshake, is a common neurophysiological therapy technique that is widely used in clinical stroke rehabilitation and suits most upper limb rehabilitation training.

Figure 3.

Start action and six groups of training actions.

We invited 20 subjects (13 men and 7 women; average height 170 cm, average weight 71 kg) to take part in the experiment and built a motion sample dataset of upper limb rehabilitation training. Each subject performed each motion 10 times, giving 200 samples per motion and 1200 motion samples in total.

Feature extraction

Extracting effective features from the collected joint angles lays the foundation for feature selection and classifier construction. From each of the 13 upper limb joint angles, we extract four time-domain features: the mean value (mean), the standard deviation (std), the peak-to-peak value (ppv), and the number of points above the mean (aboveMean). The mean represents the average position of the joint angle, std its degree of deviation from the average, ppv its range of variation, and aboveMean how many samples of the sequence exceed the mean. The four features are defined as follows

\mathrm{mean} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}

(11)

\mathrm{std} = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}}

(12)

\mathrm{ppv} = \max_{i} x_{i} - \min_{i} x_{i}

(13)

\mathrm{aboveMean} = \sum_{i = 1}^{n} I (x_{i} > \bar{x})

(14)

where $x_{i}$ is the ith of the n samples in a joint-angle sequence, $\bar{x}$ is the sequence mean, and $I (\cdot)$ is the indicator function, which equals 1 when the condition in parentheses holds and 0 otherwise.
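A minimal sketch of equations (11)–(14) for one joint-angle sequence, assuming the sequence is a plain list of angle samples (the function name `time_domain_features` is hypothetical):

```python
from statistics import mean, stdev


def time_domain_features(seq: list[float]) -> list[float]:
    """mean, std, ppv and aboveMean of one joint-angle sequence, equations (11)-(14)."""
    m = mean(seq)
    return [
        m,                                    # equation (11)
        stdev(seq, m),                        # equation (12), divisor n - 1
        max(seq) - min(seq),                  # equation (13), peak-to-peak value
        float(sum(1 for x in seq if x > m)),  # equation (14), points above the mean
    ]


print(time_domain_features([1.0, 2.0, 3.0, 4.0]))
```

`statistics.stdev` uses the n − 1 divisor, which matches equation (12).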

The joint angles used are the pitch, yaw, and roll angles of the left shoulder; the pitch and roll angles of the left elbow; the pitch, yaw, and roll angles of the right shoulder; the pitch and roll angles of the right elbow; and the pitch, yaw, and roll angles of the waist. Numbering them from 1 to 13 in this order, we construct the 52-dimensional feature vector as follows

x_{i} = [\mathrm{mean}_{1}, \mathrm{std}_{1}, \mathrm{ppv}_{1}, \mathrm{aboveMean}_{1}, \dots, \mathrm{mean}_{13}, \mathrm{std}_{13}, \mathrm{ppv}_{13}, \mathrm{aboveMean}_{13}]

(15)
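The sketch below assembles the vector of equation (15) and maps a feature index back to its statistic and joint angle. It assumes that the 1-based feature numbers used later (for example in Table 1) follow exactly the order of equation (15); that mapping is our reading, and the names `feature_vector` and `describe` are hypothetical.

```python
ANGLES = [  # joint angles 1-13 in the order listed above
    "left shoulder pitch", "left shoulder yaw", "left shoulder roll",
    "left elbow pitch", "left elbow roll",
    "right shoulder pitch", "right shoulder yaw", "right shoulder roll",
    "right elbow pitch", "right elbow roll",
    "waist pitch", "waist yaw", "waist roll",
]
STATS = ["mean", "std", "ppv", "aboveMean"]  # per-angle order in equation (15)


def feature_vector(per_angle: list[list[float]]) -> list[float]:
    """Concatenate the 4 features of each of the 13 angles into 52 values."""
    if len(per_angle) != len(ANGLES) or any(len(f) != len(STATS) for f in per_angle):
        raise ValueError("expected 13 blocks of 4 features")
    return [v for block in per_angle for v in block]


def describe(index: int) -> str:
    """Map a 1-based feature index to its statistic and joint angle."""
    angle, stat = divmod(index - 1, len(STATS))
    return f"{STATS[stat]} of {ANGLES[angle]}"


print(describe(50))  # std of waist roll
```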

Experimental schemes

To verify the effectiveness of the FSJF-SVM hybrid method for evaluating feature subset discernibility, we first combine F-score and DFS with the SVM algorithm to form the F-score-SVM and DFS-SVM hybrid evaluation methods. We then combine F-score-SVM, DFS-SVM, and FSJF-SVM with the SFS and SBS search strategies, respectively, to form two groups of feature selection methods, and carry out two groups of comparative experiments.

The first group uses the SFS search strategy for feature selection, as shown in Scheme 1. For the comparison methods in this group, the FSJF criterion in Scheme 1 is replaced by the corresponding evaluation criterion (F-score or DFS).

Scheme 1: SFS-FSJF-SVM feature selection
Input: Training set and test set.
Output: Feature subset C.
1. Let $S = \{f_{i} \mid i = 1, \dots, m\}$ be the set of all features and C the selected feature subset; initialize $C = \emptyset$.
2. Use equation (10) to evaluate the discernibility of each feature on the training set, add the best feature i to C, remove feature i from S, and let $m = m - 1$.
3. Train an SVM with the features in C, use it to classify the test set, and record the classification accuracy.
4. Add each remaining feature of S to C separately to form the feature subsets $C_{i}$, $i = 1, 2, \dots, m$. Use equation (10) to evaluate the discernibility of the m new subsets, select the best subset $C_{i}$, let $C = C_{i}$, remove the added feature i from S, and let $m = m - 1$.
5. Train an SVM with the features in C, use it to classify the test set, and record the classification accuracy.
6. If $S = \emptyset$, select the feature subset with the highest SVM accuracy as the selected feature subset C; otherwise, return to step 4.
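A minimal sketch of Scheme 1, assuming the filter criterion (for example FSJF on the training set) and the wrapper check (for example SVM accuracy on the test set) are passed in as callables. The names `sfs_hybrid`, `filter_score`, and `wrapper_accuracy` are hypothetical, the toy scores are illustrative, and keeping the smaller subset on accuracy ties is our own choice, since the scheme does not specify a tie rule.

```python
from typing import Callable

Subset = tuple[int, ...]


def sfs_hybrid(
    n_features: int,
    filter_score: Callable[[Subset], float],
    wrapper_accuracy: Callable[[Subset], float],
) -> tuple[Subset, float]:
    """Sequential forward search with a filter criterion and a wrapper check.

    filter_score(subset)     -> e.g. FSJF of equation (10) on the training set
    wrapper_accuracy(subset) -> e.g. SVM accuracy on the test set
    """
    remaining = set(range(n_features))        # S
    selected: Subset = ()                     # C
    best_subset, best_acc = selected, float("-inf")
    while remaining:                          # stop once S is empty (step 6)
        # Steps 2 and 4: add the candidate giving the best filter score
        f = max(sorted(remaining), key=lambda c: filter_score(selected + (c,)))
        selected += (f,)
        remaining.remove(f)
        # Steps 3 and 5: wrapper evaluation of the enlarged subset
        acc = wrapper_accuracy(selected)
        if acc > best_acc:                    # strict '>' keeps smaller subsets on ties
            best_subset, best_acc = selected, acc
    return best_subset, best_acc


weights = [0.1, 0.9, 0.5, 0.3]                    # toy per-feature filter scores
acc_by_size = {1: 0.70, 2: 0.80, 3: 0.90, 4: 0.85}  # toy wrapper accuracies
print(sfs_hybrid(4, lambda s: sum(weights[i] for i in s), lambda s: acc_by_size[len(s)]))
# ((1, 2, 3), 0.9)
```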

As shown in Scheme 2, the second group uses the SBS search strategy for feature selection. For the comparison methods in this group, the FSJF criterion in Scheme 2 is replaced by the corresponding evaluation criterion (F-score or DFS).

Scheme 2: SBS-FSJF-SVM feature selection
Input: Training set and test set.
Output: Feature subset C.
1. Let $S = \{f_{i} \mid i = 1, \dots, m\}$ be the set of all features and C the selected feature subset; initialize $C = S$.
2. Train an SVM with the features in C, use it to classify the test set, and record the classification accuracy.
3. Remove each feature of C in turn to form m new subsets $c_{1}, c_{2}, \dots, c_{m}$. Use equation (10) to evaluate the inter-class discernibility of the m new subsets, select the best subset $c_{i}$, and let $C = c_{i}$ and $m = m - 1$.
4. Train an SVM with the features in C, use it to classify the test set, and record the classification accuracy.
5. If C contains more than one feature, return to step 3; otherwise, select the feature subset with the highest SVM classification accuracy as the selected feature subset C.
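A matching sketch of Scheme 2, under the same assumptions: the filter and wrapper evaluations are supplied as hypothetical callables, the toy scores are illustrative, and preferring the smaller subset on accuracy ties is our own choice.

```python
from typing import Callable

Subset = tuple[int, ...]


def sbs_hybrid(
    n_features: int,
    filter_score: Callable[[Subset], float],
    wrapper_accuracy: Callable[[Subset], float],
) -> tuple[Subset, float]:
    """Sequential backward search with a filter criterion and a wrapper check."""
    selected: Subset = tuple(range(n_features))                   # step 1: C = S
    best_subset, best_acc = selected, wrapper_accuracy(selected)  # step 2
    while len(selected) > 1:                                      # step 5 loop condition
        # Step 3: drop each feature in turn and keep the best-scoring subset
        candidates = [selected[:i] + selected[i + 1:] for i in range(len(selected))]
        selected = max(candidates, key=filter_score)
        acc = wrapper_accuracy(selected)                          # step 4
        if acc >= best_acc:                                       # ties favour the smaller subset
            best_subset, best_acc = selected, acc
    return best_subset, best_acc


weights = [0.1, 0.9, 0.5, 0.3]                    # toy per-feature filter scores
acc_by_size = {1: 0.70, 2: 0.80, 3: 0.90, 4: 0.85}  # toy wrapper accuracies
print(sbs_hybrid(4, lambda s: sum(weights[i] for i in s), lambda s: acc_by_size[len(s)]))
# ((1, 2, 3), 0.9)
```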

Experimental results and analyses

As described above, the rehabilitation training motion dataset contains six kinds of motions, m1, m2, …, m6, labeled class 1, class 2, …, class 6, respectively. We use the one-vs-one (OVO) strategy to decompose the six-class classification problem into 15 binary classification problems: {1v2, 1v3, 1v4, 1v5, 1v6, 2v3, 2v4, …, 5v6}.
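The OVO decomposition can be sketched as below. The text does not say how the 15 binary decisions are combined, so the majority vote in `ovo_vote` (ties going to the smaller label) is an assumption, shown only as a common choice. Both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(n_classes: int) -> list[tuple[int, int]]:
    """All one-vs-one class pairs (a, b) with a < b, labels starting at 1."""
    return list(combinations(range(1, n_classes + 1), 2))


def ovo_vote(decisions: dict[tuple[int, int], int]) -> int:
    """Combine the binary decisions by majority vote; ties go to the smaller label."""
    votes = Counter(decisions.values())
    return min(votes, key=lambda label: (-votes[label], label))


pairs = ovo_pairs(6)
print(len(pairs), pairs[:3], pairs[-1])  # 15 [(1, 2), (1, 3), (1, 4)] (5, 6)
```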

Using the implementation schemes described in the “Experimental schemes” section, we carry out the feature selection experiments on each of the 15 binary classification problems separately. To compare the feature subset evaluation methods on an equal footing, every SVM classifier in the experiments uses the radial basis function (RBF) kernel, with the penalty parameter C set to 1 and the RBF kernel parameter g set to 0.5.

To randomize the experimental data, the 1200 samples are first shuffled. The samples of each class are then added one by one to five initially empty sample sets, so that the data are divided randomly and evenly into five parts. In each experiment, four parts are used as the training set and the remaining part as the test set; starting from the first part, each part serves once as the test set, giving five experiments. The accuracies of the five experiments are averaged to obtain the fivefold cross-validation result.
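The fold construction above can be sketched as follows. We read "added one by one to five empty sample sets" as round-robin dealing of each class's shuffled samples; that reading, the fixed seed, and the names `stratified_folds`, `cross_validate`, and `accuracy_on_split` are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Callable


def stratified_folds(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds with equal class proportions.

    The samples are shuffled first; the samples of each class are then dealt
    one by one into k initially empty folds.
    """
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)       # fixed seed makes the split reproducible
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx in order:
        by_class[labels[idx]].append(idx)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(by_class):
        for pos, idx in enumerate(by_class[cls]):
            folds[pos % k].append(idx)       # round-robin dealing
    return folds


def cross_validate(
    labels: list[int],
    accuracy_on_split: Callable[[list[int], list[int]], float],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Average accuracy over k experiments, each fold serving once as the test set."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for t in range(k):
        train = [i for j, fold in enumerate(folds) if j != t for i in fold]
        scores.append(accuracy_on_split(train, folds[t]))
    return sum(scores) / k


labels = [c for c in range(1, 7) for _ in range(200)]  # 6 motions x 200 samples
folds = stratified_folds(labels)
print([len(f) for f in folds])  # [240, 240, 240, 240, 240]
```

With 200 samples per motion, each fold receives exactly 40 samples of every motion.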

Characteristics of FSJF method

The FSJF feature selection process for class pair 1v2 in the first comparison experiment illustrates the characteristics of the FSJF method. In this process, the best feature subsets of every dimension from 1 to 52 are selected and their FSJF scores are calculated, as shown in Table 1. For example, the best subset of size 7 is {50, 5, 10, 18, 23, 31, 36}, with an FSJF score of 3.638946577. Table 1 shows that the subsets are nested: to select the best (n + 1)-dimensional subset, the best n-dimensional subset is kept as the selected feature subset, and the candidate feature that forms the best (n + 1)-dimensional subset with it is added. Finally, the subset with the highest SVM classification accuracy on the test set among the best subsets of all dimensions is selected as the final optimal feature subset.

Table 1.

The best feature subsets and scores for FSJF of class pair 1v2 of the first experiment.

Feature subset capacity	Selected features	FSJF score
1	50	3.712567834
2	50, 5	3.752037253
3	50, 5, 10	3.751003778
4	50, 5, 10, 18	3.750367291
5	50, 5, 10, 18, 23	3.742037253
6	50, 5, 10, 18, 23, 31	3.722677836
7	50, 5, 10, 18, 23, 31, 36	3.638946577
8	50, 5, 10, 18, 23, 31, 36, 44	3.657874987
9	50, 5, 10, 18, 23, 31, 36, 44, 49	3.687935643
10	50, 5, 10, 18, 23, 31, 36, 44, 49, 24	2.965473821
11	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21	3.751239516
12	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16	3.748302713
13	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37	3.745247198
14	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8	3.74084232
15	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34	3.735847315
16	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22	3.730182461
17	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29	3.700137576
18	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19	3.667816867
19	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13	3.615701485
20	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26	3.564297946
21	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11	3.467158135
22	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25	3.289870025
23	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35	3.128915746
24	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47	2.977353929
25	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32	2.905383898
26	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3	2.795593531
27	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20	2.689588853
28	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9	2.581350788
29	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38	2.474726598
30	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15	2.366861615
31	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14	2.229649609
32	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1	2.111960846
33	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46	1.993466078
34	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2	1.89651043
35	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12	1.813685548
36	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39	1.735346167
37	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17	1.660232239
38	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33	1.550298749
39	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45	1.458040867
40	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6	1.391354755
41	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27	1.331754386
42	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28	1.26293142
43	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4	1.203029492
44	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7	1.144622979
45	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30	1.07967134
46	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30, 42	1.020831737
47	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30, 42, 52	0.962700139
48	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30, 42, 52, 51	0.916104817
49	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30, 42, 52, 51, 41	0.874547373
50	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30, 42, 52, 51, 41, 40	0.821677153
51	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30, 42, 52, 51, 41, 40, 48	0.770811292
52	50, 5, 10, 18, 23, 31, 36, 44, 49, 24, 21, 16, 37, 8, 34, 22, 29, 19, 13, 26, 11, 25, 35, 47, 32, 3, 20, 9, 38, 15, 14, 1, 46, 2, 12, 39, 17, 33, 45, 6, 27, 28, 4, 7, 30, 42, 52, 51, 41, 40, 48, 43	0.800263748

FSJF: Fisher score based on joint feature.

Comparison experiment with SFS method

Figure 4 takes four of the class pairs in the experiment, 1v3, 1v5, 1v6, and 3v4, as examples and shows how the classification accuracy of the feature subsets selected by the F-score-SVM, DFS-SVM, and FSJF-SVM methods varies with the number of features in the selected subset. Each accuracy curve is obtained by fivefold cross-validation. The classification accuracy and the number of features of the optimal feature subsets selected by the FSJF-SVM, DFS-SVM, and F-score-SVM methods are given in Tables 2 and 3, respectively.
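The SFS-based hybrid evaluation behind these curves can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: `filter_score` and `cv_accuracy` are hypothetical callables that stand in for the subset criterion (FSJF, DFS, or F-score) and for the fivefold cross-validated SVM accuracy, and preferring the smaller subset when two subsets tie on accuracy is also an assumption. Each step adds the unselected feature whose candidate subset scores best under the filter criterion; the wrapper then picks the best of the nested subsets found along the search path.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[int, ...]


def sfs_hybrid(
    features: Sequence[int],
    filter_score: Callable[[Subset], float],
    cv_accuracy: Callable[[Subset], float],
) -> Tuple[Subset, float, List[Subset]]:
    """Filter-guided sequential forward search with a wrapper-based final choice.

    Returns the chosen subset, its accuracy, and the nested search path.
    """
    selected: List[int] = []
    remaining = list(features)
    path: List[Subset] = []
    while remaining:
        # Filter step: score every candidate "selected + one new feature"
        # and keep the best-scoring one as the new selected subset.
        best = max(remaining, key=lambda f: filter_score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
        path.append(tuple(selected))
    # Wrapper step: most accurate subset on the path, fewer features on ties.
    chosen = max(path, key=lambda s: (cv_accuracy(s), -len(s)))
    return chosen, cv_accuracy(chosen), path


if __name__ == "__main__":
    weights = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.3, 4: 0.7}  # made-up toy weights

    def score(subset: Subset) -> float:
        # Toy filter score: mean weight of the subset.
        return sum(weights[f] for f in subset) / len(subset)

    acc_by_size = {1: 0.80, 2: 0.90, 3: 0.95, 4: 0.95, 5: 0.85}  # made-up

    def accuracy(subset: Subset) -> float:
        # Toy wrapper accuracy that depends only on subset size.
        return acc_by_size[len(subset)]

    chosen, acc, path = sfs_hybrid(range(5), score, accuracy)
    print(path)         # [(1,), (1, 4), (1, 4, 2), (1, 4, 2, 3), (1, 4, 2, 3, 0)]
    print(chosen, acc)  # (1, 4, 2) 0.95
```

In a real run, `cv_accuracy` could wrap, for example, scikit-learn's `cross_val_score(SVC(), X[:, list(subset)], y, cv=5).mean()`; the toy weights and accuracies in the demo are invented solely to exercise the search.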

Figure 4.

Feature selection results of the three methods in the first set of experiments.

Table 2.

Classification accuracy of the best feature subsets selected by the SFS-FSJF-SVM, SFS-DFS-SVM, and SFS-F-score-SVM methods (%).

Class pair	1v2	1v3	1v4	1v5	1v6	2v3	2v4	2v5	2v6	3v4	3v5	3v6	4v5	4v6	5v6
F-score	96.67	93.10	90	90	90	100	100	100	100	93.10	96.55	86.21	96.67	96.67	100
DFS	96.67	100	100	96.67	96.67	100	100	100	100	100	100	93.10	96.67	100	100
FSJF	100	100	100	100	98.87	100	100	100	100	100	100	98.65	100	100	100

SFS: sequential forward search; FSJF-SVM: Fisher score based on joint feature and support vector machine; DFS: discernibility of feature subset.

Table 3.

Number of features in the best feature subsets selected by the SFS-FSJF-SVM, SFS-DFS-SVM, and SFS-F-score-SVM methods.

Class pair	1v2	1v3	1v4	1v5	1v6	2v3	2v4	2v5	2v6	3v4	3v5	3v6	4v5	4v6	5v6
F-score	1	1	3	2	1	2	3	5	1	1	2	4	8	1	4
DFS	1	16	14	8	13	2	1	2	1	20	21	16	3	5	3
FSJF	1	13	10	7	8	2	1	1	1	16	20	17	3	3	3

SFS: sequential forward search; FSJF-SVM: Fisher score based on joint feature and support vector machine; DFS: discernibility of feature subset.

Figure 4 shows that, as the number of features grows, the accuracy of the FSJF-SVM method rises faster and then declines more slowly than that of the other two methods; the effect is most pronounced for the 1v3 curve.

For the 1v5, 1v6, 3v6, and 4v5 class pairs (Table 2), the classification accuracy of the optimal feature subset selected by FSJF-SVM is 10.00, 8.87, 12.44, and 3.33 percentage points higher than that of F-score-SVM, and 3.33, 2.20, 5.55, and 3.33 percentage points higher than that of DFS-SVM, making it the best of the three methods. For the 1v3, 1v4, 3v4, and 3v5 class pairs, FSJF-SVM and DFS-SVM reach the same accuracy, which is 6.90, 10.00, 6.90, and 3.45 percentage points higher than that of F-score-SVM.
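The accuracy gaps quoted above are plain differences between Table 2 entries, that is, percentage points rather than relative percentages. The short script below recomputes them; the data are copied from Table 2, and the helper name `gap` is illustrative only.

```python
# Best-subset accuracies (%) from Table 2 (SFS experiment), in column order.
PAIRS = "1v2 1v3 1v4 1v5 1v6 2v3 2v4 2v5 2v6 3v4 3v5 3v6 4v5 4v6 5v6".split()
ACCURACY = {
    "F-score": [96.67, 93.10, 90, 90, 90, 100, 100, 100, 100,
                93.10, 96.55, 86.21, 96.67, 96.67, 100],
    "DFS": [96.67, 100, 100, 96.67, 96.67, 100, 100, 100, 100,
            100, 100, 93.10, 96.67, 100, 100],
    "FSJF": [100, 100, 100, 100, 98.87, 100, 100, 100, 100,
             100, 100, 98.65, 100, 100, 100],
}


def gap(method: str, baseline: str, pair: str) -> float:
    """Accuracy of `method` minus that of `baseline` for one class pair,
    in percentage points, rounded to two decimals as in the text."""
    i = PAIRS.index(pair)
    return round(ACCURACY[method][i] - ACCURACY[baseline][i], 2)


if __name__ == "__main__":
    for pair in ["1v5", "1v6", "3v6", "4v5", "1v3", "1v4", "3v4", "3v5"]:
        # Columns: pair, gap over F-score-SVM, gap over DFS-SVM.
        print(pair, gap("FSJF", "F-score", pair), gap("FSJF", "DFS", pair))
```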

As Table 3 shows, the optimal subset selected by FSJF-SVM contains fewer features than that selected by DFS-SVM for eight of the 15 class pairs and the same number for six others; only for 3v6 does it contain more. For the 1v2, 3v6, and 4v5 class pairs, where FSJF-SVM does not select fewer features than DFS-SVM, the classification accuracy of its optimal feature subset is 3.33, 5.55, and 3.33 percentage points higher than that of DFS-SVM.

Comparison experiment with SBS method

Figure 5 takes the 1v3, 1v5, 1v6, and 4v5 class pairs as examples and shows how the classification accuracy of the feature subsets selected by the F-score-SVM, DFS-SVM, and FSJF-SVM methods varies with the number of features in the selected subset. Each accuracy value is obtained by fivefold cross-validation. The classification accuracy and the number of features of the optimal feature subsets selected by the FSJF-SVM, DFS-SVM, and F-score-SVM methods are given in Tables 4 and 5, respectively.
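The SBS-based hybrid evaluation can be sketched as a backward elimination, assuming the usual form of SBS; this is an illustration, not the authors' code. Starting from the full feature set, each step removes the feature whose removal leaves the best-scoring subset under the filter criterion, and the wrapper then picks the most accurate subset on the search path, preferring fewer features on ties (an assumed tie rule). `filter_score` and `cv_accuracy` are hypothetical callables standing in for the subset criterion (FSJF, DFS, or F-score) and the fivefold cross-validated SVM accuracy.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[int, ...]


def sbs_hybrid(
    features: Sequence[int],
    filter_score: Callable[[Subset], float],
    cv_accuracy: Callable[[Subset], float],
) -> Tuple[Subset, float, List[Subset]]:
    """Filter-guided sequential backward search with a wrapper-based final choice."""
    selected = list(features)
    path: List[Subset] = [tuple(selected)]
    while len(selected) > 1:
        # Filter step: try removing each feature and keep the removal that
        # leaves the best-scoring subset.
        drop = max(
            selected,
            key=lambda f: filter_score(tuple(x for x in selected if x != f)),
        )
        selected.remove(drop)
        path.append(tuple(selected))
    # Wrapper step: most accurate subset on the path, fewer features on ties.
    chosen = max(path, key=lambda s: (cv_accuracy(s), -len(s)))
    return chosen, cv_accuracy(chosen), path


if __name__ == "__main__":
    weights = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.3, 4: 0.7}  # made-up toy weights

    def score(subset: Subset) -> float:
        # Toy filter score: mean weight of the subset.
        return sum(weights[f] for f in subset) / len(subset)

    acc_by_size = {5: 0.85, 4: 0.95, 3: 0.95, 2: 0.90, 1: 0.80}  # made-up

    def accuracy(subset: Subset) -> float:
        # Toy wrapper accuracy that depends only on subset size.
        return acc_by_size[len(subset)]

    chosen, acc, path = sbs_hybrid(range(5), score, accuracy)
    print(path)         # [(0, 1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 4), (1, 4), (1,)]
    print(chosen, acc)  # (1, 2, 4) 0.95
```

The toy weights and accuracies are invented; they only show that the search walks from five features down to one and that the wrapper keeps the three-feature subset.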

Figure 5.

Feature selection results of the three methods in the second set of experiments.

Table 4.

Classification accuracy of the best feature subsets selected by the SBS-FSJF-SVM, SBS-DFS-SVM, and SBS-F-score-SVM methods (%).

Class pair	1v2	1v3	1v4	1v5	1v6	2v3	2v4	2v5	2v6	3v4	3v5	3v6	4v5	4v6	5v6
F-score	96.67	93.10	90	90	90	100	100	100	100	93.10	96.55	86.21	96.67	96.67	100
DFS	100	100	100	93.33	96.67	100	100	100	100	96.55	100	93.10	100	100	100
FSJF	100	100	100	96.67	100	100	100	100	100	100	100	96.55	100	100	100

SBS: sequential backward search; FSJF-SVM: Fisher score based on joint feature and support vector machine; DFS: discernibility of feature subset.

Table 5.

Number of features in the best feature subsets selected by the SBS-FSJF-SVM, SBS-DFS-SVM, and SBS-F-score-SVM methods.

Class pair	1v2	1v3	1v4	1v5	1v6	2v3	2v4	2v5	2v6	3v4	3v5	3v6	4v5	4v6	5v6
F-score	1	1	1	1	1	14	5	6	14	1	2	1	8	7	1
DFS	10	16	14	16	13	13	6	9	12	8	20	18	14	12	15
FSJF	8	1	14	12	14	8	3	7	10	18	20	4	1	5	1

SBS: sequential backward search; FSJF-SVM: Fisher score based on joint feature and support vector machine; DFS: discernibility of feature subset.

Figure 5 shows that, as the number of features changes, the accuracy of the FSJF-SVM method rises faster and then declines more slowly than that of the other two methods; the effect is most pronounced for the 1v5 curve.

For the 1v5, 1v6, 3v4, and 3v6 class pairs (Table 4), the classification accuracy of the optimal feature subset selected by FSJF-SVM is 6.67, 10.00, 6.90, and 10.34 percentage points higher than that of F-score-SVM and 3.34, 3.33, 3.45, and 3.45 percentage points higher than that of DFS-SVM, making it the best of the three methods.

As Table 5 shows, the optimal subset selected by FSJF-SVM contains fewer features than that selected by DFS-SVM for every class pair except 1v4, 1v6, 3v4, and 3v5; for 1v4 and 3v5 the two methods select the same number of features. For the 1v6 and 3v4 class pairs, although FSJF-SVM selects 1 and 10 more features than DFS-SVM, respectively, the classification accuracy of its optimal feature subset is 3.33 and 3.45 percentage points higher than that of DFS-SVM.
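The dimension comparison above can be checked directly against Tables 4 and 5. In the short script below, the values are copied from those tables and the helper names `compare_dimensions` and `extra_features_and_gain` are illustrative only; the script groups the class pairs by whether FSJF-SVM selects fewer, the same number of, or more features than DFS-SVM, and reports the accuracy gain for the pairs where it selects more.

```python
# Values copied from Table 5 (number of features) and Table 4 (accuracy, %)
# for the SBS experiment, in column order.
PAIRS = "1v2 1v3 1v4 1v5 1v6 2v3 2v4 2v5 2v6 3v4 3v5 3v6 4v5 4v6 5v6".split()
DIMS = {
    "DFS": [10, 16, 14, 16, 13, 13, 6, 9, 12, 8, 20, 18, 14, 12, 15],
    "FSJF": [8, 1, 14, 12, 14, 8, 3, 7, 10, 18, 20, 4, 1, 5, 1],
}
ACCURACY = {
    "DFS": [100, 100, 100, 93.33, 96.67, 100, 100, 100, 100,
            96.55, 100, 93.10, 100, 100, 100],
    "FSJF": [100, 100, 100, 96.67, 100, 100, 100, 100, 100,
             100, 100, 96.55, 100, 100, 100],
}


def compare_dimensions():
    """Group class pairs by how FSJF-SVM's subset size compares with DFS-SVM's."""
    groups = {"fewer": [], "same": [], "more": []}
    for i, pair in enumerate(PAIRS):
        diff = DIMS["FSJF"][i] - DIMS["DFS"][i]
        key = "fewer" if diff < 0 else "same" if diff == 0 else "more"
        groups[key].append(pair)
    return groups


def extra_features_and_gain(pair):
    """Extra features and accuracy gain (percentage points) of FSJF-SVM over DFS-SVM."""
    i = PAIRS.index(pair)
    return (DIMS["FSJF"][i] - DIMS["DFS"][i],
            round(ACCURACY["FSJF"][i] - ACCURACY["DFS"][i], 2))


if __name__ == "__main__":
    groups = compare_dimensions()
    print(groups)
    for pair in groups["more"]:
        print(pair, extra_features_and_gain(pair))
```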

The two sets of experiments above show that the proposed FSJF-SVM hybrid feature subset evaluation method can obtain feature subsets with higher accuracy and smaller feature dimension, which demonstrates its effectiveness and feasibility. Comparing the two sets of experiments, we also find that the method performs better when combined with the SBS search strategy.

Conclusion

In this article, FSJF-SVM, a feature subset discernibility hybrid evaluation method for Brunnstrom 4–5 stage upper limb rehabilitation training, is presented. FSJF serves as the evaluation criterion of the filter stage and SVM as the learning algorithm of the wrapper stage, and together they form the FSJF-SVM hybrid feature subset evaluation method. The FSJF-SVM, DFS-SVM, and F-score-SVM methods were each combined with the SFS and SBS search strategies, and comparison experiments were carried out on datasets covering six kinds of upper limb rehabilitation training motion in the Brunnstrom 4–5 stage. The results show that the proposed hybrid feature subset evaluation method can improve the classification accuracy of the classifier and reduce the dimension of the selected optimal feature vector, and that it performs better when combined with the SBS search strategy.

Footnotes

Handling Editor: Antonio Lazaro

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Natural Science Foundation of China (61873008) and the Beijing Natural Science Foundation (4182008).

ORCID iD

Guoyu Zuo

References

1. Alamri, Shoyaib, Biggers, et al. Applicability of the grip strength and automated von Frey tactile sensitivity tests in the mouse photothrombotic model of stroke. Behav Brain Res 2018; 336: 250–255.

2. Kwakkel, Kollen, van der Grond, et al. Probability of regaining dexterity in the flaccid upper limb: impact of severity of paresis and time since onset in acute stroke. Stroke 2003; 34(9): 2181–2186.

3. Mancisidor, Zubizarreta, Cabanes, et al. Kinematical and dynamical modeling of a multipurpose upper limbs rehabilitation robot. Robot Cim-Int Manuf 2018; 49: 374–387.

4. Trombetta, Bazzanello, Brum, et al. Motion Rehab AVE 3D: a VR-based exergame for post-stroke rehabilitation. Comput Meth Prog Bio 2017; 151: 15–20.

5. Burdea, Cioi, Martin, et al. The Rutgers Arm II rehabilitation system—a feasibility study. IEEE T Neur Sys Reh 2010; 18(5): 505–514.

6. Suo, Yang. Clinical observation of Brunnstrom technology in treatment of cerebral infarction hemiplegia patients. Chin J Integr Med Cardio Cerebrovasc Dis 2017; 15(11): 1395–1398 (in Chinese).

7. Ghaddar, Naoum-Sawaya. High dimensional data classification and feature selection using support vector machines. Eur J Oper Res 2017; 265(3): 993–1004.

8. Aminanto, Choi, Tanuwidjaja, et al. Deep abstraction and weighted feature selection for Wi-Fi impersonation detection. IEEE T Inf Foren Sec 2018; 13(3): 621–636.

9. Liu. Toward integrating feature selection algorithms for classification and clustering. IEEE T Knowl Data En 2005; 17(4): 491–502.

10. Gao, Zhao, et al. Feature selection considering two types of feature relevancy and feature interdependency. Expert Syst Appl 2018; 93: 423–434.

11. Quinlan JR. Induction of decision trees. Mach Learn 1986; 1(1): 81–106.

12. Gunal, Gerek, Ece, et al. The search for optimal feature set in power quality event classification. Expert Syst Appl 2009; 36(7): 10266–10273.

13. Dash, Liu. Consistency-based search in feature selection. Artif Intell 2003; 151(1–2): 155–176.

14. Battiti. Using mutual information for selecting features in supervised neural net learning. IEEE T Neural Networ 1994; 5(4): 537–550.

15. Kwak, Choi CH. Input feature selection by mutual information based on Parzen window. IEEE T Pattern Anal 2002; 24(12): 1667–1671.

16. Kohavi, John GH. Wrappers for feature subset selection. Artif Intell 1997; 97(1–2): 273–324.

17. Yang, Shen, Ong, et al. Feature selection for MLP neural network: the use of random permutation of probabilistic outputs. IEEE T Neural Networ 2009; 20(12): 1911–1922.

18. Akay MF. Support vector machines combined with feature selection for breast cancer diagnosis. Expert Syst Appl 2009; 36(2): 3240–3247.

19. Xie, Wang. Using support vector machines with a novel hybrid feature selection method for diagnosis of erythemato-squamous diseases. Expert Syst Appl 2011; 38(5): 5809–5818.

20. Xie. Several feature selection algorithms based on the discernibility of a feature subset and support vector machines. Chin J Comput 2014; 37(8): 1704–1718 (in Chinese).

21. Zuo, Gong. Operator attitude algorithm for telerobotic nursing system. Acta Automat Sin 2016; 42(12): 1839–1848 (in Chinese).