Abstract
Background:
Many neurodegenerative diseases affect human gait. Gait analysis is a non-invasive way to diagnose these diseases. Nevertheless, gait analysis is difficult because patients with different neurodegenerative diseases may have similar gaits. Machine learning algorithms may improve the correct identification of these pathologies. However, many classification algorithms lack transparency and interpretability for the final user.
Methods:
In this study, we implemented the PS-Merge operator for the classification, employing gait biomarkers of a public dataset.
Results:
The highest classification percentage was 83.77%, which means an acceptable degree of reliability.
Conclusions:
Our results show that PS-Merge can explain how the algorithm chooses an option, i.e., the operator can be seen as a first step toward eXplainable Artificial Intelligence (XAI).
Introduction
Neurodegenerative diseases are a global problem related to impairment of the nervous system, producing gait disorders, memory decline, psychopathology, depressive symptoms, sleep disorders, loss of quality of life, and labor incapacity, among other afflictions [5, 54]. Several neurodegenerative diseases affect gait in similar ways, including Parkinson’s Disease (PD), Huntington’s Disease (HD), and Amyotrophic Lateral Sclerosis (ALS), which is why subtle aspects such as slow gait velocity and short step and stride lengths, among others, should be considered [24, 40]. However, classifying these diseases by simple observation is difficult. Therefore, computational approaches such as those in [7, 52] have arisen to identify these diseases correctly with less invasive methods [23, 61], and they have proved useful for prescribing medical treatment and therapies. In addition, publicly available datasets of gait biomarkers of patients suffering from neurodegenerative diseases have emerged [20, 64]. In this sense, we assume that the PS-Merge operator is a promising option for classifying PD, HD, and ALS, above all because this algorithm has the added benefit that it can explain to the medical specialist how the disease classification was performed.
Belief merging employs propositional language for combining symbolic information from different sources. As reviewed previously in [29], a merging operator Δ takes as input a profile of belief bases E and outputs a new consistent merged belief base Δ(E). A “belief base”, i.e., a set of propositional formulae, is employed to represent each source; even if the bases are inconsistent, PS-Merge can transform them into a consistent belief base that represents the group [26]. In the literature, PS-Merge has been used to solve different decision problems [8], it has been adjusted [43], it has been applied to two complex consensus-seeking examples [44], and it has been adapted for the diagnosis of oral cancer [26]. PS-Merge can also offer an explainable solution to a medical specialist, since the authors consider that PS-Merge partially complies with the four principles of XAI defined by Phillips et al.: a) explanation, b) meaningful, c) explanation accuracy, and d) knowledge limits [41].
According to Gunning & Aha, XAI is a novel approach that introduces techniques offering a high degree of pattern recognition while remaining human-understandable [21]. However, a major issue is the lack of a standard legislative and ethical framework for AI technology implementation [2, 13], even though some governance frameworks exist [66]. The first step toward guaranteeing such a complex framework is to ensure XAI, whose interpretability supports impartiality in decision-making [3]. This research focuses on XAI implementation employing the PS-Merge operator for binary classification. The purpose of a binary classification problem is to make a prediction in which the value to be predicted can take only one of two possible values: 1 or 0, true or false, or, in this study, healthy or diseased. Many approaches for transforming multiclass problems into binary classification problems are available in the literature; one-against-one or one-versus-one (OVO) and one-against-all or one-versus-all (OVA) are the most commonly used strategies [17, 57].
The objective of this study is to show that PS-Merge may be employed to identify the neurodegenerative disease given a dataset of gait biomarkers of patients who suffer from PD, HD, and ALS. Every record in the database is considered a belief base to be merged using PS-Merge. This research extends the work published in [51]: while the same database is employed, the data format in this research is binary, with the purpose of comparing the PS-Merge operator with the multilayer perceptron (MLP) algorithm. The binary transformation uses the mean as threshold, since it is the most popular and well-known method [12, 62]. In addition, PS-Merge is described as an XAI approach.
The rest of the paper is structured as follows: Section 2 reviews related work on the classification of neurodegenerative diseases. Section 3 describes the materials and methods. Section 4 presents the experiments and results. Finally, Section 5 gives conclusions and future directions.
Related work
This section has two parts: the first subsection describes related works employing a public dataset named Gait in Neurodegenerative Disease Database (GaitND-DB) [20], which contains data from patients with PD, HD, and ALS; the second subsection presents works employing the PS-Merge operator in classification tasks.
Experiments with GaitND-DB
Xia et al. performed experiments with a support vector machine for the classification of ALS vs. healthy controls (Ctrl), achieving 92.86% accuracy [58].
Vipani et al. employed a Logistic Regression Classifier (LRC), achieving an overall accuracy of 86.05% for multiclass classification, 87.79% for the Ctrl vs. pathological subjects classification, and 85.22% for HD vs. PD [56].
Ye et al. used an Adaptive Neuro-Fuzzy Inference System (ANFIS), obtaining 93.10% for ALS vs. Ctrl, 90.32% for PD vs. Ctrl, 94.44% for HD vs. Ctrl, and 90.63% for {ALS,HD,PD} vs. Ctrl [60].
Gupta et al. implemented a decision tree trained with features of gait time series to identify the neurodegenerative disease in HD vs. Ctrl, PD vs. Ctrl, and ALS vs. Ctrl, obtaining results of 88.5%, 92.3%, and 96.2%, respectively [23].
Zhao et al. implemented a Long Short-Term Memory (LSTM) neural network. The best accuracies they achieved in ALS vs. Ctrl, PD vs. Ctrl, HD vs. Ctrl, and pathological subjects were 99.57%, 100%, 100%, and 97.88%, respectively [63].
Sreeja & Sujatha implemented Positive, Negative Peak Histogram Analysis (PNPHA), obtaining the best classification accuracies for Ctrl vs. ALS 90.95%, for Ctrl vs. PD 98.25%, and Ctrl vs. HD 91.25% [38].
Beyrami & Ghaderyan proposed a method based on non-negative least squares to classify ALS, PD, and HD, obtaining the best percentages of 100%, 99.78%, and 99.90%, respectively [6].
Nam Nguyen et al. used Multiscale Sample Entropy (MSE) in the data pre-processing with a 10-second time window; the best results were obtained with the k-nearest neighbors classifier. They obtained the best classification accuracies of 99.90%, 99.80%, 100%, 99.75%, 99.90%, 99.55%, and 99.68% for the Ctrl vs. PD, Ctrl vs. HD, Ctrl vs. ALS, PD vs. HD, PD vs. ALS, HD vs. ALS and {Ctrl vs. PD vs. HD vs. ALS}, respectively [39].
Ghaderyan & Beyrami employed sparse non-negative least squares (NNLS) to classify Ctrl vs. PD, Ctrl vs. ALS, and Ctrl vs. HD, obtaining accuracies of 98% ± 7.91, 97% ± 10.54, and 95% ± 10.54, respectively [18].
Yan et al. compared different algorithms, of which the best performance was achieved using a decision tree, obtaining accuracies of 82.76%, 91.67%, and 90.32% for Ctrl vs. ALS, Ctrl vs. HD, and Ctrl vs. PD, respectively [59].
Fraiwan & Hassanin compared different gait features and different algorithms; the best performance was achieved using adaptive boosting (AdaBoost), obtaining accuracies of 98.8%, 98.8%, 98.8%, and 99.4% for the Ctrl, ALS, PD, and HD classes, respectively [15].
Erdaş et al. implemented a convolutional LSTM (ConvLSTM) to perform the multiclass classification, obtaining an accuracy of 89.44%. For binary classification, they obtained accuracies of 96.33%, 97.68%, 94.69%, and 95.05% for pathological subjects vs. Ctrl, ALS vs. Ctrl, HD vs. Ctrl, and PD vs. Ctrl, respectively [14].
Mengarelli et al. conducted experiments employing time-domain and time-dependent spectral features. They implemented a k-nearest neighbor classifier, achieving 100% accuracy for binary classification and 94.84% for a 4-class classification [37].
In the previous works, traditional algorithms have been implemented for binary and multiclass classification. In addition, Table 1 shows that acceptable results have been obtained; however, the algorithms do not explain how they obtained the results.
Results in the state-of-the-art of classification using the GaitND-DB
Borja-Macías & Pozos-Parra defined and proposed the PS-Merge operator as an approach that employs the notion of Partial Satisfiability [35]. It also considers inconsistent bases and the frequency of each explicit item of belief, and it was implemented computationally [8].
Chávez-Bosquez et al. extended PS-Merge to implement a Belief Revision process and compared it against two belief revision operators. Results showed a good performance of PS-Merge in solving four real-life examples [10].
Kareem et al. made the first implementation of a belief merging operator with real-world data, in oral cancer diagnosis. Results showed that PS-Merge can be used for classification in the machine learning area, though the percentage of correct classification may be improved with extensions to the operator [26].
Pozos-Parra et al. proposed an open-source software called “Merginator”, which implements three belief merging operators (including PS-Merge). The results of this logic-based tool showed the viability of belief merging to make decisions in various study cases [44].
Velasco-Cétera et al. addressed the medical problem of identifying diabetic neuropathy in gait biomarkers. They used a dataset of 16 study subjects and implemented the PS-Merge operator for binary classification {sick, healthy}. The results showed a percentage of instances higher than 80% when using: belief merging, belief revision, and the notion of partial satisfaction [55].
Material and methods
Dataset
The dataset GaitND-DB [25] employed in this study contains data from subjects diagnosed with PD, HD, or ALS, and from healthy control subjects (Ctrl). The subjects’ age and gender distribution is shown in Table 2.
Subjects’ age and gender distribution in GaitND-DB
For the construction of the GaitND-DB, Hausdorff et al. captured data from subjects walking as usual along a 77 m long hallway for 5 min. Force-sensitive sensors placed inside the subject’s shoes were used to capture gait data, employing a 300-Hz sampling rate with a 12-bit resolution per sample. In addition, 20 seconds of each record were excluded to minimize start-up effects. The dataset contains the following 13 attributes [20]: Elapsed time, Left stride interval (seconds), Right stride interval (seconds), Left swing interval (seconds), Right swing interval (seconds), Left swing interval (% of stride), Right swing interval (% of stride), Left stance interval (seconds), Right stance interval (seconds), Left stance interval (% of stride), Right stance interval (% of stride), Double support interval (seconds), and Double support interval (% of stride).
The dataset for the experiments was created as follows: raw data were downloaded from [20] and then converted to .csv format. Subsequently, the real numbers were converted into binary values, adapting them for the PS-Merge algorithm.
The experimental design employed OVO and OVA, as in [51]. OVO: this approach trains a classifier for each pair of classes. It transforms a problem of c classes into c(c − 1)/2 binary problems <i, j>, one for each pair of classes {i, j}, where i, j ∈ {1, …, c} and i < j. The binary classifier for an <i, j> case is trained with the samples of classes i and j, while the samples of any other class k (k ≠ i, k ≠ j) are ignored. OVA: this is the most popular approach, and it consists of taking one class and learning to discriminate that class from the rest. It transforms a problem of c classes into c binary problems, so that l = c (l represents the number of binary training subsets). These binary problems are constructed using the records of class i as positive examples and the rest as negative examples.
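The two decompositions can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function names `ovo_problems` and `ova_problems`, the toy records, and the convention of labeling the second class of each OVO pair as 1 are assumptions made only for illustration.

```python
from itertools import combinations


def ovo_problems(records, classes):
    """One binary dataset per pair of classes; records of other classes are ignored."""
    problems = {}
    for i, j in combinations(classes, 2):
        # Assumed labeling convention: class j -> 1, class i -> 0.
        problems[(i, j)] = [(x, 1 if y == j else 0) for x, y in records if y in (i, j)]
    return problems


def ova_problems(records, classes):
    """One binary dataset per class: that class is positive, all others negative."""
    return {c: [(x, 1 if y == c else 0) for x, y in records] for c in classes}


classes = ["Ctrl", "ALS", "HD", "PD"]
# Hypothetical records: (binary attribute vector, class label).
records = [((0, 1), "Ctrl"), ((1, 1), "ALS"), ((1, 0), "HD"), ((0, 0), "PD")]

ovo = ovo_problems(records, classes)
ova = ova_problems(records, classes)
print(len(ovo), len(ova))  # c(c - 1)/2 = 6 OVO problems and c = 4 OVA problems
```

With c = 4 classes, this yields the six OVO datasets and four OVA datasets used in the experiments.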
Attribute selection procedure and dataset subsets
Similarly to [51], the GaitND-DB with all 13 attributes was employed; then, 5 attributes were selected based on the weights of discrete attributes given by the Chi-square method: Double support interval, Right stance interval, Left stride interval, Right stride interval, and Left stance interval. Chi-square is one of the methods that offer better performance in attribute selection [32, 65]. For this study, the data were converted to binary format, taking as threshold the arithmetic mean of each column of the dataset GaitND-DB.
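The mean-threshold binarization can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `binarize_by_column_mean` and the sample values are hypothetical, and mapping values strictly above the mean to 1 (rather than values greater than or equal to it) is an assumption, since the text does not state how ties are handled.

```python
from statistics import mean


def binarize_by_column_mean(rows):
    """Replace each value by 1 if it exceeds its column mean, else 0."""
    columns = list(zip(*rows))
    thresholds = [mean(col) for col in columns]
    # Assumption: strictly greater than the mean -> 1; otherwise -> 0.
    binary = [tuple(1 if v > t else 0 for v, t in zip(row, thresholds)) for row in rows]
    return binary, thresholds


# Hypothetical values: (stride interval in seconds, double support interval in seconds).
rows = [(1.0, 0.75), (1.5, 0.5), (2.0, 0.25)]
binary_rows, thresholds = binarize_by_column_mean(rows)
print(thresholds)
print(binary_rows)
```

Each real-valued record becomes a tuple of 0/1 values, which is the input format the PS-Merge operator requires.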
OVO and OVA approaches were implemented to obtain the datasets described below. Datasets employing OVO: Ctrl vs. ALS, Ctrl vs. HD, Ctrl vs. PD, ALS vs. HD, ALS vs. PD and PD vs. HD; Datasets employing OVA: Ctrl vs. {ALS,HD,PD}, ALS vs. {Ctrl,HD,PD}, HD vs. {Ctrl,ALS,PD}, and PD vs. {Ctrl,ALS,HD}.
After building the datasets, the random criterion 2/3-1/3 was used to construct the training and testing sets, given that this criterion is widely used in the literature [28, 53].
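A random 2/3-1/3 partition can be sketched as below. The function name `split_two_thirds`, the fixed seed, and rounding the training size down are assumptions for illustration; the text does not specify these details.

```python
import random


def split_two_thirds(records, seed=0):
    """Shuffle the records and return (training set, testing set) in a 2/3-1/3 ratio."""
    rng = random.Random(seed)  # assumed seed, used only for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3  # assumption: training size rounded down
    return shuffled[:cut], shuffled[cut:]


train, test = split_two_thirds(range(9))
print(len(train), len(test))
```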
Algorithms
Multilayer perceptron
MLP is a feed-forward artificial neural network that contains multiple layers [45]. The MLP pseudocode is shown in Algorithm 1. The implementation of MLP was done by using the framework Waikato Environment for Knowledge Analysis (WEKA).
Input: Training set $X^{(0)} = (x_0, x_1, x_2, \dots, x_{n_0})$
Initialize weights W and thresholds with the Xavier method
For every layer, perform a weighted linear summation
Apply the Sigmoid activation function
Output: Activation of neurons in the output layer, $Y = X^{(out)} = (y_0, y_1, y_2, \dots, y_m)$
Weights (for $(x_0, x_1, \dots, x_n)$) are initialized using the Xavier method [19].
In this study, we implement the MLP algorithm because it is a basic algorithm among the great variety that exists for classification; i.e., we use it only for comparison with the PS-Merge algorithm, since we consider that PS-Merge is in its initial stage of exploration with various datasets.
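The steps of the MLP pseudocode (initialization, weighted summation per layer, sigmoid activation) can be sketched in plain Python. This is only a forward pass under stated assumptions, not the WEKA implementation used in the study: the uniform variant of Xavier initialization, the zero-initialized thresholds, the layer sizes, and all function names are choices made for illustration, and no training is performed.

```python
import math
import random


def xavier_uniform(n_in, n_out, rng):
    """Assumed Xavier (Glorot) uniform variant: U(-limit, limit), limit = sqrt(6 / (n_in + n_out))."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Propagate x through every layer: weighted linear summation, then sigmoid."""
    activation = list(x)
    for weights, thresholds in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, thresholds)
        ]
    return activation


rng = random.Random(0)
sizes = [5, 3, 1]  # hypothetical architecture: 5 binary inputs, 3 hidden units, 1 output
# Assumption: thresholds (biases) start at zero.
layers = [
    (xavier_uniform(n_in, n_out, rng), [0.0] * n_out)
    for n_in, n_out in zip(sizes, sizes[1:])
]
y = forward([0, 0, 1, 0, 0], layers)
print(y)
```

The single output lies in (0, 1) and would be thresholded to obtain the healthy or diseased label.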
PS-Merge
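According to Borja-Macías & Pozos-Parra [8] and Kareem et al. [44], a language $\mathcal{L}$ of propositional logic is formed from a finite set of atoms $P$.
A belief base $K$ is a finite set of propositional formulae of $\mathcal{L}$.
A belief profile $E = \{K_1, \dots, K_m\}$ is a multiset (bag) of $m$ belief bases.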
The classification method implements as usual the training and testing phases; the training (merging) phase is based on the following operator.
if $K \in P$, then $w_{ps}(K) = w(K)$; if $K = \neg p$, then $w_{ps}(K) = 1 - w_{ps}(p)$; if $K = l_1 \vee \dots \vee l_n$, then $w_{ps}(K) = \max\{w_{ps}(l_1), \dots, w_{ps}(l_n)\}$; and if $K = C_1 \wedge \dots \wedge C_m$, then
Once the merged base PS-Merge(E) has been obtained in the training phase, the testing phase is performed. This second phase is, in turn, divided into two steps. The first step uses the merged base to provide a diagnosis for every test case; however, given that a proportion of test cases receive unknown or ambiguous diagnoses, a second testing step was introduced to obtain a diagnosis and reduce the number of ambiguous diagnoses. The second step requires an extended version of the operator, PS-Merge$_\mu$, which considers a set of belief constraints μ that the result must satisfy.
The method considers every case (belief base) in the dataset as a source of information; thus, every case K is of the form l1 ∧ … ∧ l13 → l14 or l1 ∧ … ∧ l5 → l6. The first formula considers the 13 attributes and the class; the second considers only 5 attributes and the class. It is in this second step, that of the Normal Partial Satisfiability, where the authors consider that the third of the four principles of XAI is fulfilled: a) explanation, b) meaningful, c) explanation accuracy, and d) knowledge limits.
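The partial satisfiability measure and the ranking of logical models described in this section can be illustrated with a small Python sketch. It rests on two assumptions that are not stated in the text above: the value of a conjunction of clauses is taken to be the mean of the clause values, and a profile is merged by summing the values of all its bases for each state and keeping the states with the highest sum. The encoding of formulae and all function names are hypothetical.

```python
from itertools import product

# Encoding (hypothetical): a literal is (atom_index, is_positive), a clause is a
# list of literals (disjunction), and a belief base is a list of clauses (conjunction).


def literal_value(state, literal):
    """Atom: w(p); negated atom: 1 - w(p)."""
    index, is_positive = literal
    return state[index] if is_positive else 1 - state[index]


def clause_value(state, clause):
    """Disjunction: maximum over its literals."""
    return max(literal_value(state, lit) for lit in clause)


def w_ps(state, base):
    """ASSUMPTION: a conjunction of clauses is scored by the mean of its clause values."""
    return sum(clause_value(state, clause) for clause in base) / len(base)


def merge(profile, n_atoms):
    """ASSUMPTION: rank every state by its summed w_ps over the profile; keep the top ones."""
    scores = {
        state: sum(w_ps(state, base) for base in profile)
        for state in product((0, 1), repeat=n_atoms)
    }
    best = max(scores.values())
    return [state for state, score in scores.items() if score == best], scores


# Three sources over atoms (p1, p2): p1 AND p2, p1 AND NOT p2, and p1.
profile = [
    [[(0, True)], [(1, True)]],
    [[(0, True)], [(1, False)]],
    [[(0, True)]],
]
top_states, scores = merge(profile, n_atoms=2)
print(top_states)
print(scores)
```

In this toy profile the first two sources contradict each other on p2, yet the highest-ranking states all satisfy p1, which illustrates how a consistent result can be obtained from jointly inconsistent bases. Enumerating all states is exponential in the number of atoms, so the sketch is only practical for a small number of attributes.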
In this preliminary approximation to XAI, we compare the PS-Merge operator with the MLP algorithm; accordingly, we use only the confusion matrix and the classification rate or accuracy (CR), since these are the metrics applicable to the logic-based PS-Merge operator.
Confusion matrix
It is a metric within supervised learning to observe the behavior of the classification model. The test data allows us to see where there is confusion in the model to classify the classes correctly. In Table 3, the confusion matrix is shown, where the columns indicate classes previously labeled in the dataset and the rows the values that the model predicts. Based on the information shown in the matrix, the following can be indicated:
Confusion matrix
CR is the accuracy rate in detecting abnormal or normal behavior. It is normally used when the dataset is balanced to avoid a false sense of good model performance.
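The two metrics can be computed as in the following sketch, which uses the standard definition of accuracy, CR = (TP + TN) / (TP + TN + FP + FN). Treating "diseased" (1) as the positive class, as well as the function names and the toy labels, are assumptions made for illustration.

```python
def confusion_matrix(actual, predicted):
    """Count TP, FN, FP, TN, treating 1 (diseased) as the positive class (assumption)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 1 and p == 0:
            counts["FN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


def classification_rate(counts):
    """Accuracy: correctly classified cases divided by all cases."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical labels: 1 = diseased, 0 = healthy.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
cm = confusion_matrix(actual, predicted)
cr = classification_rate(cm)
print(cm, cr)
```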
Experiments and results
Performance for MLP algorithm
OVO approach
The highest accuracy using MLP and the OVO approach for 5 attributes was obtained for the case of Ctrl vs. HD, as can be seen in Table 4. The confusion matrices for the MLP and the OVO approach using 5 and 13 attributes are given in Tables 5 and 6, respectively.
Results of OVO approach using MLP algorithm
Confusion matrices for 5 attributes in OVO approach using MLP algorithm
Confusion matrices for 13 attributes in OVO approach for MLP algorithm
Table 7 shows that the highest accuracy of the MLP with the OVA approach based on 5 and 13 attributes is obtained for the cases ALS vs. {Ctrl,HD,PD} and HD vs. {Ctrl,ALS,PD}, respectively. The confusion matrices for the MLP and OVA approach using 5 and 13 attributes are provided in Tables 8 and 9, respectively.
Results of OVA approach using MLP algorithm
Confusion matrices for 5 attributes in OVA approach for MLP algorithm
Confusion matrices for 13 attributes in OVA approach for MLP algorithm
OVO approach
Table 10 shows the results of the PS-Merge approach for 5 and 13 attributes; the highest percentage was for the case ALS vs. PD, with 76.22%. The confusion matrices for the PS-Merge and the OVO approach using 5 and 13 attributes are shown in Tables 11 and 12, respectively.
Results of OVO approach for PS-Merge algorithm
Confusion matrices for 5 attributes in OVO approach for PS-Merge algorithm
Confusion matrices for 13 attributes in OVO approach for PS-Merge algorithm
In the OVA approach, as we can see in Table 13, the highest percentages were for the case ALS vs. {Ctrl,HD,PD} with both 5 and 13 attributes. The confusion matrices for the PS-Merge and the OVA approach using 5 and 13 attributes are given in Tables 14 and 15, respectively.
Results of OVA approach for PS-Merge algorithm
Confusion matrices for 5 attributes in OVA approach for PS-Merge algorithm
Confusion matrices for 13 attributes in OVA approach for PS-Merge algorithm
MLP algorithm
In the case of the OVO approach, the highest classification percentage was obtained with the dataset Ctrl vs. HD: 99.86% for 5 attributes. Other competitive results were obtained for 13 attributes employing the dataset ALS vs. PD: 78.74%. In the OVA approach, we note significant results for datasets ALS vs. {Ctrl,HD,PD} with 5 attributes and HD vs. {Ctrl,ALS,PD} with 13 attributes, 83.77%, and 82.21%, respectively.
PS-Merge operator
Even though the results of PS-Merge do not reach the values obtained by other approaches, the method can explain why the algorithm learns or does not learn. For example, for the dataset Ctrl vs. HD with 5 attributes, PS-Merge reaches only 61.22%, compared with 99.86% for MLP. However, it can be observed that the MLP generates opposed diagnoses; e.g., for the 5 attributes (Double support interval, Right stance interval, Left stride interval, Right stride interval, and Left stance interval), the test records 3342 and 1754 both meet only Left stride interval, yet the result for record 3342 is healthy and for record 1754 is diseased (see Table 16). PS-Merge provides a diseased diagnosis for both test records because, after merging the attribute values given in the training dataset, the result for this combination of attribute values is one, i.e., 0 ∧ 0 ∧ 1 ∧ 0 ∧ 0 → 1. A human expert behaves similarly: when analyzing two cases with the same symptoms, the expert provides the same diagnosis. That is, PS-Merge meets the XAI principle of explaining its rationale to a human user (in this case, a medical one) [1] in a more transparent way [22].
The explanation for this case is given by the highest logical model, i.e. the domain explanation is not Double support interval, not Right stance interval, Left stride interval, not Right stride interval, and not Left stance interval.
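Records that share the same binary attribute pattern but carry different diagnoses, such as test records 3342 and 1754, can be located with a simple grouping step. The sketch below is illustrative only: the function name `conflicting_patterns` and the third record are hypothetical, and the attribute order is assumed to be Double support interval, Right stance interval, Left stride interval, Right stride interval, Left stance interval.

```python
from collections import defaultdict


def conflicting_patterns(records):
    """Group records by attribute pattern; keep patterns with more than one diagnosis."""
    by_pattern = defaultdict(dict)
    for record_id, attributes, diagnosis in records:
        by_pattern[tuple(attributes)][record_id] = diagnosis
    return {
        pattern: diagnoses
        for pattern, diagnoses in by_pattern.items()
        if len(set(diagnoses.values())) > 1
    }


records = [
    (3342, (0, 0, 1, 0, 0), "healthy"),
    (1754, (0, 0, 1, 0, 0), "diseased"),
    (1, (1, 1, 0, 1, 1), "diseased"),  # hypothetical extra record
]
conflicts = conflicting_patterns(records)
print(conflicts)
```

Only the pattern 0, 0, 1, 0, 0 is reported, since it is the only one associated with both a healthy and a diseased diagnosis.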
Example of contradictory test records for MLP
Most XAI approaches are based on machine learning or deep learning and add a post hoc method for interpretation. Two representative post hoc methods are Local Interpretable Model Agnostic Explanations (LIME) [46] and SHApley Additive Explanations (SHAP) [34]. LIME has the disadvantage of providing explanations that can be unstable: it introduces randomness into the process of producing explanations, given that it samples data points from a Gaussian distribution, so repeating the sampling process can produce different explanations for a single prediction instance. SHAP, in turn, is computationally expensive, as it computes Shapley values for several features in each prediction instance. This also makes SHAP slow and impractical for calculating global explanations if there are many prediction cases [27]. On the other hand, PS-Merge provides an explanation by construction; in the logic-based testing phase, PS-Merge creates an order over the logical models and then provides the highest-ranking logical models as the prediction, so explanation is an internal process. The user can know the cases (the highest-ranking logical models) that support the prediction, which fulfills the second of the four principles of XAI: a) explanation, b) meaningful, c) explanation accuracy, and d) knowledge limits.
As a novel approach to supervised learning in the machine learning field, PS-Merge is still in its early stages of exploration. A limitation of PS-Merge is its computational cost, since it orders all the logical models; however, PS-Merge can be applied to multiple domains with a small number of features, or it can consider hundreds of features using a supercomputer. To advance our understanding of PS-Merge and enhance its usability, further studies are required, e.g., on computational complexity and resource consumption. Improvements are also needed to make it more user-friendly, i.e., so that the results are presented in simpler language that answers how and why they were obtained. This can be achieved with front-end software development.
Conclusions and future work
Artificial intelligence has shown great progress with applications in medical diagnosis. However, the need for transparency and comprehensibility in the decision-making process remains paramount. In the literature, many of the results are not human-understandable, i.e., they come from ‘black box’ mechanisms: the final user or specialist does not understand how the algorithm chooses an option, which could lead to wrong decisions that can be harmful [50].
In this research, we have obtained significant results in gait biomarker classification using the PS-Merge operator-based approach. It is important to highlight that we obtained 99.86% in the classification of the case Ctrl vs. HD using only 5 attributes, which is a competitive result with respect to those reported in the state-of-the-art. Our PS-Merge operator-based approach provides not only high accuracy, but also a unique advantage: interpretability. It allows us to trace the logical paths leading to each classification decision, a feature that is often missing in traditional "black box" models. However, it is important to recognize that our approach did not consistently outperform other methods in terms of precision, which highlights the need for further refinement. The requirement of binary input in PS-Merge is a major limitation because transforming real numbers into binary ones reduces the search space, so the solution is not as fine-grained as in the original domain. If we analyze the binary dataset, we can observe that using the arithmetic mean for the binarization introduces high imprecision in the new dataset; consequently, the low accuracy could be mitigated by using a better binarization criterion. However, we believe that a mixed approach, which uses the advantages of the two main approaches of AI (connectionist and logic-based), will allow us to create a robust XAI. We must remember that logic-based approaches made it possible to create expert systems in which the premises could be traced to explain a given conclusion. We believe that PS-Merge combined with machine learning or deep learning methods will allow us to obtain a mixed approach that provides intuitive and robust explanations.
This first approximation to the XAI based on the PS-Merge operator in the classification of gait biomarkers is important, because as Samek et al. say: “the ability to explain the rationale behind one’s decisions to other people is an important aspect of human intelligence”, and in some cases, it is a prerequisite, e.g., when a medical doctor explains the therapy decision to the patients [50].
In future work, we propose to study classification as follows: to transform ordinal values to binary using a one-hot strategy; to implement the PS-Merge operator with real datasets from other areas; and to implement other merging operators and compare the results.
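As an illustration of the one-hot strategy mentioned for future work, the following minimal sketch turns an ordinal value into a binary vector with one position per level. The level names and the function name `one_hot` are hypothetical; the study does not define them.

```python
def one_hot(value, levels):
    """Return a binary tuple with a single 1 at the position of `value` in `levels`."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(1 if value == level else 0 for level in levels)


levels = ["short", "medium", "long"]  # hypothetical ordinal levels of a gait attribute
encoded = [one_hot(v, levels) for v in ["short", "long", "medium"]]
print(encoded)
```

Each ordinal attribute would then contribute one binary atom per level instead of a single atom, at the cost of a larger set of logical models to rank.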
