
Abstract

Handwriting is a widely used indicator for detecting Alzheimer’s disease because it carries diagnostic information about the writer. The aim of this study is to evaluate handwriting for the early detection and diagnosis of Alzheimer’s disease with the highest possible prediction rates. To this end, 9 machine learning algorithms were used. Seven feature selection methods were applied to identify the most effective features for Alzheimer’s disease prediction, eliminate unnecessary ones, and increase model prediction performance. The models were trained and tested on the DARWIN dataset with both train - test split and cross-validation methods. The results show that the highest performance values are generally achieved when SHAP is used as the feature selection method. The best-performing model was the hybrid SHAP-Support Vector Machine model, with 0.9623 accuracy, 0.9643 precision, 0.9630 recall and 0.9636 F1-Score.

Keywords

Alzheimer’s disease diagnosis, feature selection methods, machine learning, handwriting analysis

Introduction

Diseases that originate from factors such as lifestyle, head trauma, and aging,¹ and that often result in cognitive problems, have been frequently encountered in recent years. Because life expectancy is increasing, these diseases are expected to become more common in the future.² Alzheimer’s disease is the most common form of dementia, an important cognitive problem, and accounts for 60% of dementia cases. The main risk factor for Alzheimer’s disease, which starts quite insidiously and progresses slowly, is old age.³ Alzheimer’s is an extremely challenging neurological disorder that can progress until patients are no longer able to meet their daily needs.⁴ The preclinical period of this chronic disease, in which neurons undergo structural deterioration, is approximately 20 years, and the clinical period is 8-10 years.⁵ It is also considered a community disease, as 1 person is diagnosed with the disease every 3 seconds.⁶ Although there is no definitive treatment for this disease, which has a high incidence, early diagnosis can slow its progression.⁴ Early diagnosis has become very important, especially because of the complexity of the treatment processes for neurodegenerative diseases. It is known to reduce the severity of the disease and improve the quality of life of patients who have problems in the pre-symptomatic phase.⁷ For this reason, developing strategies for early diagnosis has become popular and interest in this field has increased.⁸

For Alzheimer’s disease, diagnostic methods such as identification of blood-based protein biomarkers,⁹ odor identification screening¹⁰ and analysis of MR images^11-14 can be used. In addition, handwriting, which requires fine motor skills, is considered an important indicator of Alzheimer’s disease because it is sensitive to age-related impairments in cognitive functioning. The handwriting of individuals with Alzheimer’s disease differs from that of healthy individuals. Examining handwriting is of interest to researchers in different fields, such as physicians, neuroscientists, psychologists and computer scientists, and is an important subject of interdisciplinary study. In addition, thanks to developing technologies, data collection processes have changed, and both static and dynamic data about handwriting can now be collected. In this way, real-time diagnosis and monitoring of the disease is possible.¹⁵ Therefore, analysis of handwriting data is thought to facilitate the diagnosis, detection and monitoring of Alzheimer’s disease.¹⁶

Machine learning algorithms have been successfully applied in medicine and healthcare^17-20 to extract useful and understandable information and develop automated solutions. Thanks to its flexibility and scalability, machine learning is useful for many purposes in healthcare, such as risk classification, analysis of various types of data, diagnosis and classification, and survival prediction.²¹ In particular, by automating the analysis process, it contributes to early diagnosis, allowing timely medical interventions and providing a more in-depth understanding of Alzheimer’s disease.²² Its success in classification applications has been shown to be quite high.²³ However, the development of technology has enabled the production of large volumes of data, which has caused data size and complexity to gradually increase.²⁴ The most important step in a machine learning modeling process is to use clean data, because the performance of machine learning models depends significantly on the quality of the data obtained.²⁵ Feature selection is 1 of the critical data preprocessing tasks used to improve data quality.²⁶ It is the process of creating a subset of features by identifying those features in the dataset that affect the result.²⁷ Feature selection methods are basically divided into 2 categories: filters and wrappers. Wrapper-based methods, which provide better results than filters by considering the relationships between features, have recently achieved significant success in feature selection tasks.²⁸

The aim of this study is to evaluate handwriting for the early detection and diagnosis of Alzheimer’s disease with the highest possible prediction rates. The DARWIN dataset²⁹ was used as the source of handwriting data for accurately distinguishing individuals with and without Alzheimer’s disease. The main contributions of this study can be summarized as follows:

1. Seven different optimization methods adapted to the feature selection task were used to remove irrelevant and unnecessary features from the process in order to improve classification results and increase the performance of the models by reducing model complexity.

2. Nine different supervised machine learning algorithms, 7 single (Bayes Point Machine, Averaged Perceptron, Logistic Regression, Support Vector Machine, Neural Network, nu-Support Vector Machine and Decision Tree) and 2 ensemble (Boosted Decision Tree and Random Forest), were used to obtain classification prediction results.

3. A total of 72 (9 machine learning models × (7 feature selection algorithms + all features)) classifier models were trained and their performances were evaluated to determine the optimal feature selection method-machine learning algorithm combination and to examine the effects of feature selection methods.

After presenting the introduction, similar studies in the literature are described in the Related Work section. The Methodology and Methods section explains the study design and methods. Experimental Results are then presented, followed by the Explainability Metrics and Interpretability of the Model section. Finally, the paper concludes with the Results and Discussion and Conclusion sections.

Related Work

Handwriting is a characteristic trait from which certain inferences about a person can be drawn. It is a very complex function that varies with age and health status, so analyzing it can guide researchers. This subject, which appeals to a very wide field of study, has become an important research topic for data analysts and computer scientists, especially in health sciences. Alzheimer’s disease, a very important neurological disease, can be detected and diagnosed early by examining the handwriting of individuals, especially those in the older age group. For this reason, researchers are interested in diagnosing Alzheimer’s disease through handwriting.

Dhanusha et al⁷ stated that labeled data is not always available when using clinical data, and they used an unsupervised deep learning model for the early diagnosis of Alzheimer’s disease. They reported that the proposed optimization-based clustering model achieves more successful clustering results than other standard models. Another study noted the difficulties of data collection³⁰ and used a data augmentation method to increase the data size. The classification model developed with a convolutional neural network on this data set was reported to be successful, with 87.04% accuracy, 85.19% precision and 88.89% recall. In the machine learning study conducted by Önder et al³¹ to diagnose Alzheimer’s disease, 4 classification methods, XGBoost, GradientBoost, AdaBoost, and voting, were used on the dataset obtained from the University of California. A prediction accuracy of 85% was achieved by both the XGBoost and GradientBoost algorithms.

Our literature review shows that, because researchers could not obtain a quality data set, they turned to solutions such as unsupervised learning and data augmentation. However, with the development of methods in recent years, it is possible to produce convenient and efficient online handwriting data and to apply feature extraction methods to understand the features that contribute most to disease prediction. The DARWIN data set is 1 of the well-known data sets used in predicting Alzheimer’s disease. To obtain this data set, an experimental protocol was first presented in the study by Cilia et al,² which aims to create a database of handwriting dynamics of subjects with cognitive impairment. The DARWIN data set²⁹ includes data for 174 individuals, comprising individuals with Alzheimer’s disease and a control group. Former studies that performed the classification task using this dataset are presented in Table 1.

Table 1.

Studies Carried Out on the DARWIN Dataset

Related work	Method/Algorithm	Feature selection	Performance criteria
De Gregorio et al⁸ (2021)	Combination of results by using a majority vote (random forest, logistic regression, K-nearest neighbor, linear discriminant analysis, Gaussian naive Bayes, and support vector machine)	-	Accuracy: 0.91 Precision: 0.83 Recall: 1.00
Borra et al²² (2024)	K-nearest neighbors classifier, decision tree classifier, and extra trees classifier	-	Accuracy: 0.849
Subha et al²³ (2022)	Logistic regression, K-nearest neighbors, support vector machine, decision tree, random forest and adaptive boosting (AdaBoost)	Particle swarm optimization	Accuracy: 0.9057
Ngnamsie et al³² (2023)	K-nearest neighbors, random forest, decision tree, logistic regression, Gaussian naive Bayes, and support vector machine	Forward and backward feature selection	Accuracy: 0.88 Precision: 0.90 Recall: 0.87 F1-Score: 0.87
Gattulli and Semeraro⁶ (2023)	Random forest, logistic regression, K-nearest neighbor, linear discriminant analysis, support vector machines, Bayesian networks, Gaussian naive Bayes, multilayer perceptron, and learning vector quantization	A feature selection approach was developed based on ANOVA analysis and classification results	Accuracy: 0.8357 Precision: 0.9601 Recall: 0.8217
Vimaladevi et al³³ (2024)	Linear support vector classifier, random forest classifier, and XGBoost	PCA	Accuracy: 0.9429
Rani, Goel and Singh³⁴ (2024)	Support vector machine, K-nearest neighbour, and random forest	-	Accuracy: 0.9264
Caraveo et al³⁵ (2025)	KNN, Naive Bayes and ANN	-	F1-Score: 0.96
Gonzalez and Gogovi³⁶ (2025)	Random forest, bootstrap aggregating, extreme gradient boosting, light gradient boosting machine, adaptive boosting and gradient boosting	SHAP	Accuracy: 0.84

De Gregorio et al⁸ carried out an application with a data set containing 175 participants performing 25 different handwriting and drawing tasks to distinguish between healthy individuals and individuals with Alzheimer’s disease. In this study, 6 machine learning algorithms were used and, by combining the best classifier results for the tasks, an overall accuracy of 91% was achieved. In the study conducted by Borra et al,²² 3 different machine learning-based prediction models were developed and their performances were compared. The Extra Trees Classifier model, 1 of the models tested on the DARWIN dataset, showed an accuracy of 0.849 in Alzheimer’s diagnosis. Additionally, the results are thought to guide measures to prevent the disease. However, because the DARWIN dataset has a very high number of attributes (450), there are also studies that focus on feature selection during data pre-processing in order to increase the performance of the models developed on this dataset. In the study conducted by Subha et al,²³ the importance of feature selection was emphasized, and a hybrid machine learning model based on Particle Swarm Optimization feature selection was developed for the diagnosis and prediction of Alzheimer’s disease. The results on the DARWIN dataset show that the best performance was achieved with the Random Forest algorithm, with an accuracy of 0.9057, using 20 features. In the study conducted by Gattulli and Semeraro⁶ using the DARWIN data set, 9 different classification models were developed, and it was stated that the proposed model revealed cases that were incorrectly predicted by other classification models. In the study conducted by Ngnamsie et al,³² the researchers addressed the problem known as the curse of dimensionality, caused by the large number of features in the handwriting data set. The study showed that the proposed method was successful in increasing performance in the early detection of Alzheimer’s disease. While the dataset remains popular for the detection of Alzheimer’s disease, it has also been used in recent studies^33,35,36 with different single and ensemble machine learning algorithms and feature selection methods.

Even though former research shows pathways on feature selection methods used, including forward and backward feature selection, and Particle Swarm Optimization, they are still limited. In order to fill this gap in the literature, this study aims to examine the effects of 6 different nature-inspired wrapper feature selection optimization algorithms (Particle Swarm Optimization, Gray Wolf Optimization, Dragonfly Optimization, Harris Hawks Optimization, Genetic Optimization and Gravitational Optimization) and game theory-based SHAP algorithm on machine learning models.

Methodology and Methods

The methodology proposed in the study for Alzheimer’s diagnosis and detection is presented in Figure 1. The developed methodology consists of 5 main steps. First, the DARWIN dataset, which was developed for the early prediction of Alzheimer’s disease, was obtained. Second, data transformation was applied in the data pre-processing step. Third, 7 feature selection methods were applied separately to the prepared dataset, each yielding a new dataset stripped of unnecessary features. Fourth, in the model development step, 9 different two-class machine learning models for Alzheimer’s disease prediction were built using the dataset containing all features and the datasets refined by the feature selection methods. Modeling was carried out with 2 different approaches: first, the dataset was randomly split using the training-test split method, with 70% allocated for training and 30% for testing; second, the models were trained with k-fold cross-validation for bias and robustness assessment. Fifth, the performances of the developed models were evaluated with the accuracy, precision, recall and F1-Score criteria, and the best-performing feature selection-machine learning algorithm combination was selected. Finally, the models were evaluated and interpreted: cross-validation and confidence intervals were used for model evaluation, and SHAP analysis was used for interpretability.

Figure 1.

Methodology of the Study

Data Description

The dataset used in the study was obtained from the UCI machine learning data repository. The DARWIN dataset was created to offer rich data, including biological, genetic and clinical data,²⁹ for both individuals with Alzheimer’s disease and the control group. This comprehensive data set makes it possible to observe the variation in individuals’ handwriting information and to classify individuals with machine learning algorithms. The data set contains data from a total of 174 participants: (i) 89 individuals with Alzheimer’s disease and (ii) 85 healthy individuals. During data collection, individuals completed 25 tasks belonging to the categories of graphic tasks, copy tasks, memory tasks, and dictation tasks. In this study, the DARWIN dataset and its associated tasks were selected because they are frequently used in clinical settings to evaluate early signs of cognitive and motor decline, particularly in individuals with neurodegenerative diseases. Their inclusion enhances the clinical relevance of the dataset. Additionally, feature selection and interpretable modeling techniques were employed to increase model transparency and interpretability, thereby supporting the reliability and reproducibility of the study, especially considering the limited number of available samples.

In the study by Cilia et al (2022),²⁹ 18 features with continuous values were derived from the data for each task, as shown in Figure 2. In this way, a total of 450 input features (18 features × 25 tasks) were obtained. The data set also contains a two-class output attribute called ‘class’, which indicates whether each sample is a healthy (H) or patient (P) individual.

Figure 2.

Features of the DARWIN Dataset

The features in the DARWIN dataset are also grouped into 3 main feature sets: time-based features, movement-focused features, and pressure-related features. Time-based features include the time it takes for an individual to complete a task and the time it takes to hold the pen in the air, providing clues about cognitive processing speed. These times can be observed to be longer in Alzheimer’s patients. Movement-related features are criteria that evaluate motor functions such as average writing speed, acceleration, and tremors during writing. Slowing or irregularities in such parameters may indicate cognitive or motor impairments. Pressure-based features analyze the regularity and variation of the force applied by the pen to the surface, which may reveal deteriorations in motor skills. The grouping of features by themes is as follows.³⁶

• Time-related: total_time, air_time, paper_time

• Movement-related: mean_speed_on_paper, mean_speed_in_air, mean_acc_on_paper, mean_acc_in_air, mean_jerk_in_air, gmrt_on_paper, gmrt_in_air, mean_gmrt, num_of_pendown, max_x_extension, max_y_extension, disp_index

• Pressure-related: pressure_mean, pressure_var

Data Preprocessing

Data preprocessing is the cleaning, coding, and transformation processes performed on the data to increase the consistency and success of the model’s result value.³⁷ In this step, data transformation was applied as data pre-processing. The values of the input features in the dataset vary in quite different ranges. For this reason, the MinMax normalization method, 1 of the data transformation methods, was applied to all input attributes using equation (1), and the input features were transformed to take values in the [0–1] range. Here, X denotes the attribute value to be normalized, X_n denotes the new normalized value, X_min denotes the minimum value of the attribute, and X_max denotes the maximum value of the attribute.

X_{n} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}

(1)
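
As a minimal sketch of equation (1), the function below rescales one feature column to the [0, 1] range. The function name, the sample values, and the handling of a constant column are illustrative assumptions and are not taken from the study.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the [0, 1] range using equation (1)."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Constant feature: equation (1) is undefined here.
        # Mapping every value to 0.0 is an assumed convention.
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


# Hypothetical values for a single input feature
total_time = [1200.0, 3400.0, 5600.0, 2300.0]
normalized = min_max_normalize(total_time)
```

In practice the minimum and maximum would normally be computed on the training portion only and reused for the test portion, so that no information from the test data influences the transformation.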

Feature Selection

The feature selection process aims to increase accuracy and shorten calculation time by removing irrelevant variables from the dataset.³⁸ Applying feature selection methods can mitigate the overfitting problem, reduce the cost of obtaining attributes, and increase model interpretability. After the data was ready for analysis, 6 wrapper optimization algorithms and a SHAP algorithm based on cooperative game theory were used to determine the most effective features for predicting Alzheimer’s disease. The 6 nature-inspired wrapper optimization algorithms are Particle Swarm Optimization (PSO), Gray Wolf Optimization (GWO), Dragonfly Optimization (DFO), Harris Hawks Optimization (HHO), Genetic Optimization (GO) and Gravitational Optimization (GVO). The method based on game theory is SHAP. Each feature selection method was applied independently, and the results were compared to assess the individual impact of each method on classification performance.

PSO

It is a meta-heuristic method based on swarm movements and has a very simple structure and the number of parameters it requires is small. It is stable because it uses local and global search capabilities together.³⁹ As a working principle, after a certain number of particles are placed in the search area, each particle evaluates the objective function according to its position. Particles determine their movements according to the particles in the swarm. The process ends when each particle completes its movement and a new iteration begins.⁴⁰
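
The working principle described above can be sketched with the standard PSO update on a continuous toy objective. This is an illustration only, not the feature selection implementation used in the study: the function names, the inertia and acceleration coefficients (`w`, `c1`, `c2`), the search range, and the sphere objective are all assumptions.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic particle swarm (illustrative sketch)."""
    rng = random.Random(seed)
    # Place the particles at random positions in the search area.
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position (local search) ...
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # ... and the swarm shares the best position found so far (global search).
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        # One iteration ends once every particle has moved.
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_pos, best_val, history = pso_minimize(sphere, dim=3)
```

For feature selection, a wrapper would typically replace the toy objective with the validation error of a classifier trained on the feature subset that a particle encodes.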

GWO

This optimization method is a type of swarm intelligence algorithm. It is based on the hunting processes of wolf packs. These processes consist of pursuit, containment and attack stages.⁴¹ The working principle is based on solving each stage: alpha, beta and delta.⁴² The algorithm is highly preferred because it is parameter-free, user-friendly, and is designed with flexible and adaptable features.⁴¹

DFO

This method, which is based on the static and dynamic swarm movements and swarming behaviors of dragonflies in nature, is stated to be more effective and efficient than other metaheuristic algorithms.⁴³ While their hunting behavior refers to static movements, their one-way flying behavior during migration refers to dynamic movements.⁴⁴ The algorithm determines movements by searching for solutions based on 5 basic behaviors: separation, alignment, adaptation, attraction to food sources, and distraction towards the enemy.⁴⁵

HHO

This algorithm was developed based on the hunting behavior of Harris hawks, 1 of the most intelligent birds.⁴⁶ The method uses the tracking styles that emerge while the hawks pursue their prey: the hawks represent candidate solutions, and the location of the prey represents the best candidate solution.⁴⁷ It is frequently preferred by researchers because of the success it has achieved across a wide range of problems.⁴⁸

GO

Genetic optimization, a computational method based on the principles and processes in natural genetics, begins its search activities with a random solution set. Then, assignments are made to the objective function and a new population is obtained by applying generation, crossover and mutation operations to the solution population. Due to its high adaptability, it can be easily applied to real-life problems and offers a global perspective.⁴⁹

GVO

This algorithm is based on the theory of gravity. It is a heuristic optimization algorithm that is successful in producing solutions to multimodal problems.⁵⁰ It is subject to the laws of gravity and motion.⁵¹

SHAP

SHAP, a concept derived from game theory, is used here as a feature selection method that determines how much each feature affects the prediction result in machine learning models.^52,53 After the original dataset is entered into the model, the method calculates a contribution value that measures the importance of each feature, then ranks the features in order of importance.⁵⁴ This method, which offers a more consistent and comprehensive approach to evaluating feature importance than classical metrics, aims to identify a small number of meaningful features that most affect the model’s decisions.^52,55
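
The game-theoretic idea behind SHAP can be illustrated by computing exact Shapley values for a very small cooperative game. The sketch below does not use the SHAP library and does not reproduce the study's pipeline; the weights, the interaction bonus, and the value function are hypothetical, and only the feature names are borrowed from the DARWIN dataset.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    contrib = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            contrib[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in contrib.items()}


# Hypothetical per-feature contributions to a toy model score
weights = {"air_time": 0.30, "pressure_mean": 0.10, "mean_speed_in_air": 0.05}


def toy_value(coalition):
    """Toy value function: additive weights plus one pairwise interaction."""
    score = sum(weights[f] for f in coalition)
    if "air_time" in coalition and "pressure_mean" in coalition:
        score += 0.10  # assumed interaction bonus, shared equally by both features
    return score


phi = shapley_values(list(weights), toy_value)
# Rank features by the magnitude of their contribution, as in SHAP-based selection.
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
```

Exact enumeration grows factorially with the number of features, which is why practical SHAP implementations rely on approximations or model-specific algorithms; selecting the top-ranked features then corresponds to truncating `ranking`.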

The feature selection steps were implemented in Python 3.6.13 using the Spyder 3.1 development environment. At this stage, the open source ‘zoofs’ library was used. When applying the optimization algorithms, the LightGBM machine learning algorithm was used through the ‘LightGBM’ Python library.⁵⁶ For the parameters of the LightGBM model, the number of leaves was set to 31, the learning rate to 0.1, and the number of estimators to 100. As a result of the experiments, the number of input attributes in the dataset was reduced to 202 with PSO, 24 with GWO, 206 with DFO, 82 with HHO, 128 with GO, 216 with GVO, and 20 with SHAP.

Model Development

For this research, a total of 72 machine learning models were developed by applying 9 different supervised machine learning algorithms separately to the features determined by the feature selection methods and to the full feature set, in order to obtain the classifier model that will enable the prediction of Alzheimer’s disease from handwriting information. The two-class classification algorithms used are Bayes Point Machine (BPM), Averaged Perceptron (AP), Logistic Regression (LR), Gradient Boosted Decision Tree (GBDT), Support Vector Machine (SVM), Neural Network (NN), Random Forest (RF), nu-Support Vector Machine (nu-SVM) and Decision Tree (DT).

SVM

This kernel-based algorithm, which emerged in 1992, is a supervised learning model and is based on statistical learning theory. In addition, a strong aspect of this algorithm is that it can be used in the development of both classification and regression models. The algorithm that performs analysis on input data recognizes relationships in the hyperplane space. As a working principle, it focuses on speed rather than accuracy.^57,58

BPM

Based on the Bayes principle, this algorithm applies a Bayesian approach using the Support Vector Machine algorithm for linear classification. Therefore, it turns into a non-linear classifier that does not adapt to the training data.⁵⁹

NN

The neural network, which mimics the brain structure, has a network structure consisting of interconnected layers. The neuron, which forms the layers and is the basic unit of the algorithm, takes inputs and creates the output through a function. If the model run in the network structure is repeated with sufficient time, very successful results can be obtained.⁶⁰

AP

This algorithm is the simplest form of a neural network and generates output in response to inputs based on a linear function. The output is created by combining various weights obtained from the feature vector. Despite its simpler structure compared to neural networks, this algorithm has the advantage of producing faster solutions.⁶¹
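
A minimal sketch of an averaged perceptron is given below to make the linear decision function concrete. The toy data, the number of epochs, and the function names are assumptions, and the code is not the implementation used in the study.

```python
def train_averaged_perceptron(X, y, n_epochs=5):
    """Train a perceptron and return the average of its weights over all steps.

    X is a list of feature vectors and y holds labels in {-1, +1}.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    w_sum, b_sum, steps = [0.0] * dim, 0.0, 0
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:
                # Misclassified (or on the boundary): move the weights toward the sample.
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
            # Accumulate the current weights after every sample for averaging.
            w_sum = [s + wj for s, wj in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


def predict(w, b, xi):
    """Linear decision function: sign of the weighted sum of the inputs."""
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1


# Hypothetical, linearly separable toy data
X = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
y = [1, 1, -1, -1]
w_avg, b_avg = train_averaged_perceptron(X, y)
predictions = [predict(w_avg, b_avg, xi) for xi in X]
```

Averaging the weights over all update steps, rather than keeping only the final weights, is what distinguishes this variant from the plain perceptron and tends to make the result less sensitive to the order of the samples.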

GBDT

Decision trees are a frequently preferred approach with their easily interpretable structures, used to produce results from existing data.⁶² Boosted Decision Tree is an ensemble model created to reduce the errors of decision trees. The possibility of visually presenting the results obtained from this model increases the understandability of the algorithm, and therefore it is frequently preferred.^59,60

RF

In this method used for classification and prediction, a series of decision trees are combined, and the decision is made according to the voting principle, and the result of the decision tree that receives the most votes is valid. In this process, tests are performed separately on all tree classes, and the process continues by increasing the tree structure level until the result is produced at the leaf node.⁵⁷

DT

Decision trees, which are a very practical method that recursively divides the sample space, consist of root, intermediate and leaf nodes.^63,64 The operation of the algorithm starts from the root node and progresses through intermediate nodes to leaf nodes. Leaves represent classes, and there is only 1 path to each leaf.⁶⁵ Instances are classified from the root to the leaf as a result of tests on this path; the class prediction of the resulting leaf node can be expressed in the form of a rule.⁶⁶ Decision trees can also be reconstructed with rule sets in the IF-THEN format and can handle both nominal and numerical features.^64,66

nu-SVM

This method is an SVM variant proposed by Schölkopf et al (2000)⁶⁷ as an alternative to C-SVM (C-Support Vector Machine). nu-SVM introduces a parameter called nu (ν), which acts as an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. This approach increases the flexibility of the model in classification and regression problems, while also facilitating hyperparameter tuning. The ν parameter helps prevent overfitting by adjusting the balance between the margin (boundary width) and the slack variables (elasticity margin).⁶⁸ It has shown better generalization performance, especially in imbalanced datasets and noisy environments.⁶⁹

LR

This statistical algorithm estimates the probability of an event occurring. It provides probabilistic analysis for categorical outcomes and performs the prediction task using the independent variables that directly affect the dependent variable. It also examines the effects of the variables on the result by explaining their relationships and interactions through the estimated parameters.⁷⁰

The modeling and training procedures were carried out using Python, which is widely used for machine learning applications, along with libraries such as scikit-learn, LightGBM, and zoofs. The models were trained using train - test split and cross-validation. Moreover, cross-validation was applied to assess data bias and improve model robustness. The performance results obtained from the train - test split and cross-validation approaches were compared to evaluate the consistency and generalizability of the models.

Train - Test Split

The models were trained using a 70% training and 30% testing split. The training and testing processes of the models, as well as the determination of the optimal hyperparameters for the algorithms, were carried out within the Python machine learning ecosystem. Hyperparameter values of the developed models are given in Table 2.

Table 2.

Hyperparameter Values Determined for the Train - Test Split Models

Model	Hyperparameter	All features	Particle swarm	Gray wolf	Dragonfly	Harris hawks	Genetic	Gravitational	SHAP
BPM	var_smoothing	1e-9	1e-9	1e-9	1e-9	1e-9	1e-9	1e-9	1e-9
AP	Learning rate	0.065	0.567	0.009	0.009	0.0678	0.001	0.009	0.856
AP	max_iter (rounds)	12	50	60	90	7	12	7	778
LR	Solver	L_BFGS	auto	Coordinate_descent_naive	auto	Coordinate_descent_naive	Coordinate_descent_naive	Coordinate_descent_naive	auto
	max_iter	9	55	2	2	7	28	9	28
	Alpha	0.679	0.0256	0.0234	0.0234	0.0167	0.0234	0.0159	0.0256
	Lambda	0.005	0.8274	0.125	0.125	0.569	0.125	0.458	0.9254
GBDT	n_estimators	125	126	12	20	25	120	48	125
	max_depth	4	4	6	10	2	25	6	4
	Learning rate	0.01	0.01	0.01	0.01	0.01	0.009	0.009	0.01
SVM	Kernel type	Anova	dot	Anova	Anova	Anova	Anova	Anova	Anova
SVM	max_iter	85	17	10	8	10	8	10	85
NN	Learning rate	0.01	0.089	0.009	0.009	0.049	0.009	0.009	0.01
	Loss function	SquaredError	CrossEntropy	SquaredError	SquaredError	SquaredError	SquaredError	CrossEntropy	CrossEntropy
	Training cycle	200	195	55	30	2	55	25	200
RF	n_estimators	50	50	70	70	50	52	50	50
	max_depth	18	8	3	3	6	4	6	18
	Criterion	gain_ratio	gini_index	gini_index	gain_ratio	accuracy	gain_ratio	accuracy	gini_index
nu-SVM	Kernel type	Linear	rbf	Linear	rbf	Sigmoid	rbf	Sigmoid	Linear
	nu	0.5	0.5	0.5	0.5	0.5	0.5	0.2	0.5
	Epsilon	0.001	0.001	0.01	0.01	0.01	0.01	0.01	0.001
DT	Criterion	accuracy	gain_ratio	gain_ratio	gini_index	gini_index	gain_ratio	gini_index	gain_ratio
	max_depth	124	49	5	5	8	50	50	89
	min_samples_split	2	4	4	2	4	4	4	2
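
The 70%/30% random split described in this section can be sketched with index lists alone. The sketch assumes a plain (non-stratified) shuffle, a fixed seed, and that the test size is rounded up; none of these details are stated in the study.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.30, seed=42):
    """Return (train_indices, test_indices) for a random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Assumption: round the test size up, so 30% of 174 samples gives 53.
    n_test = math.ceil(test_fraction * n_samples)
    return indices[n_test:], indices[:n_test]


# 174 is the number of participants in the DARWIN dataset
train_idx, test_idx = train_test_split_indices(174)
```

With 174 participants this yields 121 training and 53 test samples; fixing the seed keeps the split reproducible across the different feature subsets being compared.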

Cross Validation

Cross-validation, which is based on data resampling, is widely used in the literature for model selection, performance evaluation, and increasing the generalizability of results.⁷¹ The dataset is divided into a certain number of subgroups; each subgroup is used in turn as test (validation) data, while the remaining groups are used for training the model. This process is repeated until every subgroup has served as the test set once. Thus, the model’s performance is measured across various sections of the data, yielding a more reliable estimate of its overall performance.^72,73 One of the most frequently preferred approaches among cross-validation methods is k-fold cross-validation,⁷⁴ which is particularly suitable for evaluating model performance on small data sets.^75,76 For this reason, k-fold cross-validation was used in the study. Because it is frequently preferred for small data sets and provides balanced error-bias performance,^75,77 5-fold cross-validation (k = 5) was used. Shuffled data sampling was used because it randomly mixes the dataset and thereby provides a more unbiased and generalizable performance estimate by reducing ranking bias.⁷⁸ Hyperparameter values of the developed models are given in Table 3.

Table 3.

Hyperparameter Values Determined for the Cross Validation Models

Model	Hyperparameter	All features	PSO	GWO	DFO	HHO	GO	GVO	SHAP
BPM	var_smoothing	1e-9	1e-9	1e-9	1e-9	1e-9	1e-9	1e-9	1e-9
AP	Learning rate	0.05	0.567	0.005	0.03	0.005	0.005	0.05	0.05
AP	max_iter (rounds)	550	80	550	555	500	550	5	30
LR	Solver	L_BFGS	L_BFGS	L_BFGS	L_BFGS	L_BFGS	L_BFGS	L_BFGS	auto
LR	max_iter	20	20	20	50	5	20	5	28
GBDT	n_estimators	80	80	80	90	120	90	120	125
	max_depth	4	4	4	4	60	4	6	6
	Learning rate	0.005	0.009	0.01	0.01	0.005	0.01	0.009	0.01
SVM	Kernel type	Anova	Anova	Anova	Anova	Anova	Anova	Anova	Anova
SVM	max_iter	10	80	10	15	10	8	30	30
NN	Learning rate	0.05	0.01	0.01	0.001	0.001	0.009	0.001	0.01
	Loss function	SquaredError	CrossEntropy	SquaredError	SquaredError	SquaredError	SquaredError	CrossEntropy	CrossEntropy
	Training cycle	80	20	50	20	60	50	200	200
RF	n_estimators	50	50	50	98	80	50	50	200
	max_depth	20	20	20	48	40	4	20	70
	Criterion	gain_ratio	gini_index	accuracy	gini_index	gini_index	information_gain	accuracy	information_gain
nu-SVM	Kernel type	Linear	rbf	Sigmoid	Linear	Linear	rbf	rbf	Linear
	nu	0.5	0.2	0.5	0.2	0.5	0.2	0.2	0.2
	Epsilon	0.05	0.001	0.005	0.001	0.001	0.001	0.001	0.001
DT	Criterion	accuracy	gini_index	information_gain	gini_index	gain_ratio	gain_ratio	gain_ratio	gini_index
	max_depth	2	20	2	5	2	5	20	20
	min_samples_split	2	4	2	2	2	2	4	2
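The shuffled 5-fold procedure described in this section can be sketched in a few lines of standard-library Python. This is only an illustration of the splitting logic, not the tooling used in the study: the function name `shuffled_k_fold_indices`, the seed, and the sample count of 23 are assumptions introduced here.

```python
import random


def shuffled_k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the article.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once, up front, so fold membership does not depend on row order.
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    # 23 is an arbitrary sample count chosen to show uneven fold sizes.
    for fold, (train_idx, test_idx) in enumerate(shuffled_k_fold_indices(23), 1):
        print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```

Each sample lands in exactly one test fold, so every observation contributes once to the evaluation and k - 1 times to training.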

Performance Evaluation

The performance criteria to be used to evaluate and compare the performances of the different models were determined as accuracy, precision, recall and F1-Score, which are widely used for classification models in the literature. Performance measures were calculated using equations (2)-(5).⁷⁹

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(2)

\text{Precision} = \frac{TP}{TP + FP}

(3)

\text{Recall} = \frac{TP}{TP + FN}

(4)

\text{F1-Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

(5)

Here, TP is the number of diseased individuals correctly predicted as patients, and TN is the number of healthy individuals correctly predicted as healthy. FP is the number of healthy individuals predicted as patients, and FN is the number of diseased individuals predicted as healthy.
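As a worked illustration of equations (2)-(5), the following minimal sketch computes the four measures from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not results from the study; returning 0.0 for an undefined ratio is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1-Score from confusion counts."""

    def safe_div(numerator, denominator):
        # Assumed convention: an undefined ratio (zero denominator) is 0.0.
        return numerator / denominator if denominator else 0.0

    accuracy = safe_div(tp + tn, tp + tn + fp + fn)                   # equation (2)
    precision = safe_div(tp, tp + fp)                                 # equation (3)
    recall = safe_div(tp, tp + fn)                                    # equation (4)
    f1 = safe_div(2 * precision * recall, precision + recall)         # equation (5)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts: 40 patients detected, 10 missed,
    # 45 healthy individuals cleared, 5 flagged by mistake.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 85/100, precision is 40/45 and recall is 40/50, which shows how a model can be precise while still missing a fifth of the true patients.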

Experimental Results

Train and Test Split Results

Figure 3 presents the accuracy values obtained from 9 different machine learning algorithms trained using a train-test split on both the full feature set and the features selected by the feature selection methods. The SVM model trained on the features selected with the SHAP algorithm provides the highest accuracy, 0.9623. The lowest accuracy, 0.4528, was obtained with the AP model trained on all features.

Figure 3.

Accuracy Values of the Train/Test Models

Precision values of the 72 classifier models developed are presented in Figure 4. The highest precision value, 0.9643, was obtained by combining the SHAP feature selection method with the SVM classification algorithm. The second-highest precision value, 0.9443, was reached when the PSO and DFO algorithms were used for feature selection with the RF algorithm as the classifier. The lowest precision value, 0.2353, was obtained with the AP algorithm trained on all features.

Figure 4.

Precision Values of the Train/Test Models

Figure 5 shows the recall values obtained for all prediction models trained in the study. The highest recall value, 0.9630, was obtained with the SVM trained on the features selected with the SHAP method. The second-highest value, 0.9443, was reached when the PSO and DFO algorithms were used for feature selection with the RF algorithm as the classifier. The lowest value, 0.4615, was obtained with the AP algorithm trained on all features.

Figure 5.

Recall Values of the Train/Test Models

Finally, Figure 6 presents the F1-Score values for the prediction results of the models. The highest F1-Score, 0.9636, was achieved by combining the SHAP feature selection method with the SVM algorithm. The second-best F1-Score, 0.9436, was achieved by combining the PSO feature selection method with the RF prediction algorithm.

Figure 6.

F1-Score Values of the Train/Test Models

Cross Validation Results

Figure 7 presents the mean accuracy values obtained under cross-validation for the 9 machine learning algorithms, trained on all features and on the features selected by the feature selection methods. The SVM model trained on features selected using the SHAP algorithm yields the highest mean accuracy, 0.9313. The lowest mean accuracy, 0.4773, was obtained with the AP model trained on the features selected with the PSO algorithm.

Figure 7.

Mean Accuracy Values of the Cross-Validation Models

Table 4 presents the confidence intervals of the accuracy values reported above. In general, SHAP and GO feature selection resulted in higher lower and upper bounds. In particular, the SVM model used with SHAP stands out with a confidence interval of [0.8911, 0.9713], which indicates both high accuracy and low variance. In contrast, models such as AP and BPM produced lower confidence intervals in most scenarios, which shows that these models perform worse and with more uncertainty on this dataset.

Table 4.

Confidence Intervals of Accuracy Values

Model	All features	PSO	GWO	DFO	HHO	GO	GVO	SHAP
BPM	[0.7025, 0.9055]	[0.6177, 0.8538]	[0.7385, 0.9158]	[0.7377, 0.8706]	[0.6541, 0.9186]	[0.5100, 0.8106]	[0.7385, 0.9158]	[0.7569, 0.9335]
AP	[0.4000, 0.5435]	[0.4243, 0.5302]	[0.4230, 0.7056]	[0.4025, 0.7496]	[0.4595, 0.5293]	[0.3997, 0.6006]	[0.4230, 0.7056]	[0.6294, 0.7386]
LR	[0.7094, 0.8542]	[0.6473, 0.8341]	[0.7458, 0.8982]	[0.7378, 0.8719]	[0.6709, 0.8806]	[0.7610, 0.8363]	[0.7456, 0.8982]	[0.7779, 0.8997]
GBDT	[0.7119, 0.8857]	[0.7670, 0.8531]	[0.7981, 0.9030]	[0.7110, 0.9551]	[0.7723, 0.8706]	[0.7867, 0.8798]	[0.7981, 0.9030]	[0.7824, 0.9519]
SVM	[0.8793, 0.9371]	[0.8306, 0.9159]	[0.8333, 0.9018]	[0.8299, 0.9407]	[0.8063, 0.9170]	[0.8581, 0.9697]	[0.8333, 0.9018]	[0.8911, 0.9713]
NN	[0.7470, 0.8391]	[0.7503, 0.8701]	[0.7461, 0.8857]	[0.8128, 0.8998]	[0.7140, 0.8140]	[0.7418, 0.8901]	[0.7461, 0.8857]	[0.8423, 0.9280]
RF	[0.8151, 0.9324]	[0.7854, 0.9624]	[0.8189, 0.9615]	[0.8599, 0.9101]	[0.8612, 0.9092]	[0.8047, 0.9074]	[0.8189, 0.9615]	[0.8576, 0.9699]
nu-SVM	[0.8044, 0.8509]	[0.6948, 0.8335]	[0.8353, 0.9000]	[0.7538, 0.9137]	[0.7106, 0.8970]	[0.7508, 0.8475]	[0.8353, 0.9000]	[0.8210, 0.8800]
DT	[0.7960, 0.8815]	[0.6586, 0.8370]	[0.7960, 0.8815]	[0.6320, 0.7242]	[0.7960, 0.8815]	[0.7597, 0.8718]	[0.7960, 0.8815]	[0.7609, 0.9170]

Mean precision values of the classifier models developed are presented in Figure 8. The highest mean precision value, 0.9347, was obtained by combining the SHAP feature selection method with the SVM classification algorithm. The second-highest mean precision value is reported as 0.9443, obtained when the GO algorithm was used as the feature selection method and the SVM algorithm as the classifier.

Figure 8.

Mean Precision Values of the Cross-Validation Models

Table 5 shows the confidence intervals of the precision values of the machine learning models under the different feature selection methods. Overall, the SHAP method, especially with the SVM [0.8927, 0.9765] and RF [0.8644, 0.9687] models, provided the highest and most consistent precision results. This shows that SHAP makes a strong contribution to correctly predicting the positive class. In contrast, the AP model produced low precision intervals with all feature selection methods, and the robustness of its positive predictions was found to be low.

Table 5.

Confidence Intervals of Precision Values

Model	All features	PSO	GWO	DFO	HHO	GO	GVO	SHAP
BPM	[0.7161, 0.9271]	[0.7161, 0.9271]	[0.7402, 0.9196]	[0.7423, 0.8749]	[0.6889, 0.9056]	[0.6986, 0.8743]	[0.6986, 0.8743]	[0.7594, 0.9415]
AP	[0.2595, 0.6165]	[0.0567, 0.6256]	[0.0793, 0.6404]	[0.1082, 0.7347]	[0.1612, 0.5336]	[0.0694, 0.5986]	[0.0793, 0.6404]	[0.7546, 0.8175]
LR	[0.7121, 0.8554]	[0.6883, 0.8233]	[0.7487, 0.8991]	[0.7400, 0.8722]	[0.6802, 0.8827]	[0.7664, 0.8497]	[0.7487, 0.8991]	[0.7953, 0.9072]
GBDT	[0.7112, 0.8869]	[0.7889, 0.8513]	[0.7994, 0.9059]	[0.7189, 0.9588]	[0.7963, 0.8701]	[0.7975, 0.8816]	[0.7994, 0.9059]	[0.8234, 0.9393]
SVM	[0.8804, 0.9389]	[0.8550, 0.9123]	[0.8371, 0.9048]	[0.8317, 0.9429]	[0.8106, 0.9193]	[0.8582, 0.9698]	[0.8371, 0.9048]	[0.8927, 0.9765]
NN	[0.7471, 0.8397]	[0.7862, 0.8703]	[0.7619, 0.8904]	[0.8170, 0.9054]	[0.7194, 0.8168]	[0.7680, 0.9050]	[0.7619, 0.8904]	[0.8541, 0.9307]
RF	[0.8297, 0.9276]	[0.7863, 0.9630]	[0.8255, 0.9633]	[0.8600, 0.9113]	[0.8619, 0.9173]	[0.8193, 0.9111]	[0.8255, 0.9633]	[0.8644, 0.9687]
nu-SVM	[0.8093, 0.8734]	[0.7035, 0.8331]	[0.8368, 0.9058]	[0.7569, 0.9218]	[0.7237, 0.8958]	[0.7670, 0.8545]	[0.8368, 0.9058]	[0.8251, 0.8903]
DT	[0.8007, 0.8895]	[0.7130, 0.8330]	[0.8007, 0.8895]	[0.6335, 0.7498]	[0.8007, 0.8895]	[0.7654, 0.8767]	[0.8007, 0.8895]	[0.7682, 0.9266]

Mean recall values of the classifier models developed are presented in Figure 9. The highest mean recall value, 0.9318, was obtained with the SVM trained on the features selected with the SHAP method. The Averaged Perceptron algorithm with the DFO method showed the lowest performance.

Figure 9.

Mean Recall Values of Cross-Validation Models

Table 6 shows the recall confidence intervals obtained with the different feature selection methods. The highest and most consistent recall (sensitivity) values were generally obtained with the SHAP, GO and GVO methods. In particular, the SVM [0.8943, 0.9693] and RF [0.8552, 0.9704] models stand out with the SHAP method. AP, in contrast, showed low sensitivity with all methods.

Table 6.

Confidence Intervals of Recall Values

Model	All features	PSO	GWO	DFO	HHO	GO	GVO	SHAP
BPM	[0.7084, 0.8970]	[0.6055, 0.8544]	[0.7396, 0.9165]	[0.7384, 0.8739]	[0.6641, 0.9137]	[0.5055, 0.8025]	[0.7396, 0.9165]	[0.7581, 0.9352]
AP	[0.4343, 0.5251]	[0.4468, 0.5289]	[0.3961, 0.6983]	[0.3871, 0.7413]	[0.4991, 0.5029]	[0.4162, 0.6033]	[0.3961, 0.6983]	[0.6467, 0.7345]
LR	[0.7112, 0.8532]	[0.6819, 0.8225]	[0.7480, 0.8996]	[0.7393, 0.8717]	[0.6669, 0.8818]	[0.7671, 0.8332]	[0.7480, 0.8996]	[0.7812, 0.9039]
GBDT	[0.7119, 0.8863]	[0.7695, 0.8507]	[0.7962, 0.9051]	[0.7111, 0.9560]	[0.7780, 0.8678]	[0.7790, 0.8829]	[0.7962, 0.9051]	[0.7905, 0.9488]
SVM	[0.8795, 0.9398]	[0.8352, 0.9140]	[0.8372, 0.9027]	[0.8303, 0.9412]	[0.8080, 0.9197]	[0.8587, 0.9703]	[0.8372, 0.9027]	[0.8943, 0.9693]
NN	[0.7461, 0.8414]	[0.7503, 0.8725]	[0.7463, 0.8876]	[0.8125, 0.9008]	[0.7115, 0.8135]	[0.7498, 0.8893]	[0.7463, 0.8876]	[0.8491, 0.9247]
RF	[0.8220, 0.9296]	[0.7874, 0.9615]	[0.8214, 0.9598]	[0.8599, 0.9083]	[0.8570, 0.9124]	[0.8078, 0.9067]	[0.8552, 0.9704]	[0.8552, 0.9704]
nu-SVM	[0.8040, 0.8568]	[0.6940, 0.8353]	[0.8347, 0.9054]	[0.7529, 0.9165]	[0.7162, 0.8974]	[0.7556, 0.8502]	[0.8368, 0.9058]	[0.8165, 0.8861]
DT	[0.7980, 0.8764]	[0.6960, 0.8259]	[0.7980, 0.8764]	[0.6310, 0.7287]	[0.7980, 0.8764]	[0.7634, 0.8680]	[0.7980, 0.8764]	[0.7633, 0.9164]

Mean F1-Score values of the classifier models developed are presented in Figure 10. The highest mean F1-Score, 0.9332, was achieved by combining the SHAP feature selection method with the SVM algorithm. The second-best mean F1-Score, 0.9142, was achieved by combining the HHO and GO feature selection methods with the SVM algorithm.

Figure 10.

Mean F1-Score Values of Cross Validation Models

Table 7 shows the F1 score confidence intervals obtained with different feature selection methods. In particular, the combinations of SHAP with SVM [0.8935, 0.9729] and SHAP with RF [0.8598, 0.9695] demonstrated the most robust and consistent performance, as indicated by their high scores and narrow confidence intervals. This shows that SHAP and GO methods are particularly effective in improving model performance and stability.

Table 7.

Confidence Intervals of F1-Score Values

Model	All features	PSO	GWO	DFO	HHO	GO	GVO	SHAP
BPM	[0.7124, 0.9115]	[0.6786, 0.8436]	[0.7399, 0.9180]	[0.7404, 0.8743]	[0.6763, 0.9096]	[0.5924, 0.8325]	[0.7399, 0.9180]	[0.7588, 0.9382]
AP	[0.3218, 0.5790]	[0.2100, 0.5471]	[0.1885, 0.6602]	[0.2075, 0.7312]	[0.2706, 0.5219]	[0.1911, 0.5877]	[0.1885, 0.6602]	[0.7064, 0.7630]
LR	[0.7117, 0.8543]	[0.6852, 0.8228]	[0.7484, 0.8994]	[0.7396, 0.8719]	[0.6735, 0.8822]	[0.7669, 0.8412]	[0.7484, 0.8994]	[0.7883, 0.9054]
GBDT	[0.7115, 0.8866]	[0.7794, 0.8506]	[0.7978, 0.9055]	[0.7151, 0.9573]	[0.7872, 0.8688]	[0.7884, 0.8820]	[0.7978, 0.9055]	[0.8073, 0.9433]
SVM	[0.8800, 0.9393]	[0.8454, 0.9128]	[0.8371, 0.9038]	[0.8310, 0.9420]	[0.8093, 0.9195]	[0.8585, 0.9700]	[0.8371, 0.9038]	[0.8935, 0.9729]
NN	[0.7466, 0.8405]	[0.7686, 0.8707]	[0.7542, 0.8888]	[0.8149, 0.9029]	[0.7159, 0.8146]	[0.7595, 0.8962]	[0.7542, 0.8888]	[0.8518, 0.9275]
RF	[0.8259, 0.9286]	[0.7869, 0.9622]	[0.8235, 0.9615]	[0.8600, 0.9097]	[0.8596, 0.9148]	[0.8141, 0.9082]	[0.8235, 0.9615]	[0.8598, 0.9695]
nu-SVM	[0.8072, 0.8644]	[0.6988, 0.8341]	[0.8358, 0.9056]	[0.7549, 0.9191]	[0.7200, 0.8966]	[0.7618, 0.8517]	[0.8358, 0.9056]	[0.8213, 0.8877]
DT	[0.7996, 0.8827]	[0.7047, 0.8291]	[0.7996, 0.8827]	[0.6326, 0.7387]	[0.7996, 0.8827]	[0.7645, 0.8722]	[0.7996, 0.8827]	[0.7660, 0.9212]

Explainability Metrics and Interpretability of Model

SHAP analysis was used to quantify the impact of features on the outcome. In addition, the statistical reliability of the model’s performance was evaluated by calculating 95% confidence intervals for accuracy, precision, recall, and F1 score. Thus, the model was examined in terms of both explainability and reliability.

Explainability Metrics

In this study, the SHAP algorithm was used to increase the interpretability of the classification model. SHAP is an effective explanation method that quantitatively measures the contribution of each individual feature to the model’s output and thus makes the decision-making process more transparent.⁸⁰ This method was also used for feature selection in the study.

Motivated by the study of Ahmed et al (2025),⁸¹ 20 features were selected for model training. Among those features, 2 are pressure-related, 5 are movement-related, and the rest are time-related. The themes and details of these features are presented in Table 8.

Table 8.

Selected Features and Descriptions

Features	Description
Pressure-related
pressure_var19	Variance of the pressure levels exerted by the pen tip for task 19
pressure_mean4	Average of the pressure levels exerted by the pen tip for task 4
Movement-related
max_y_extension2	Maximum distance covered along the y-axis during writing for task 2
mean_jerk_on_paper15	Average jerk of on-paper movements for task 15
max_x_extension23	Maximum distance covered along the x-axis during writing for task 23
num_of_pendown19	Total number of times the pen touched the paper for task 19
mean_jerk_in_air3	Average jerk of in-air movements for task 3
Time-related
air_time5, air_time6, air_time23	Time spent to perform in-air movements for task 5, 6 and 23
paper_time8, paper_time12, paper_time19	Time spent to perform on-paper movements for task 8, 12 and 19
total_time6, total_time9, total_time15, total_time17, total_time22, total_time23	Total time spent to perform the entire task for task 6, 9, 15, 17, 22 and 23

Next, the importance of these attributes is evaluated and presented in Table 9. ‘total_time23’ has the highest importance, 0.9304, followed by ‘num_of_pendown19’ (0.8092), ‘total_time15’ (0.7408), ‘air_time6’ (0.3655), and ‘pressure_mean4’ (0.3226). Among the 20 attributes, the one with the least effect on the result is ‘pressure_var19’, with a value of 0.1027.

Table 9.

Importance Values for Selected Attributes With SHAP Analysis

Feature	Importance
‘total_time23’	0.9304
‘num_of_pendown19’	0.8092
‘total_time15’	0.7408
‘air_time6’	0.3655
‘pressure_mean4’	0.3226
‘total_time17’	0.2931
‘mean_jerk_in_air3’	0.2737
‘paper_time19’	0.2366
‘air_time5’	0.2060
‘paper_time12’	0.1966
‘total_time9’	0.1754
‘total_time6’	0.1415
‘paper_time8’	0.1290
‘total_time22’	0.1287
‘max_x_extension23’	0.1229
‘air_time23’	0.1190
‘mean_jerk_on_paper15’	0.1147
‘max_y_extension2’	0.1132
‘pressure_var19’	0.1027
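The article does not spell out how the importance values in Table 9 were aggregated. A common convention, assumed here purely for illustration, is to rank each feature by the mean absolute SHAP value across samples and keep the top-ranked ones. The sketch below uses made-up per-sample values and a hypothetical helper name, and it does not call the SHAP library itself.

```python
def rank_features_by_mean_abs_shap(shap_values, top_n=None):
    """Rank features by mean absolute SHAP value, highest first.

    shap_values maps a feature name to its per-sample SHAP values.
    Hypothetical helper: the aggregation rule is an assumption, not the
    article's documented procedure.
    """
    importance = {}
    for name, values in shap_values.items():
        if not values:
            raise ValueError(f"no SHAP values supplied for feature {name!r}")
        importance[name] = sum(abs(v) for v in values) / len(values)
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    # Made-up SHAP values for three of the feature names listed in Table 8.
    hypothetical_shap = {
        "pressure_var19": [0.1, -0.1, 0.1],
        "total_time23": [0.9, -1.0, 0.8],
        "num_of_pendown19": [-0.7, 0.9, 0.8],
    }
    for name, score in rank_features_by_mean_abs_shap(hypothetical_shap, top_n=2):
        print(f"{name}: {score:.4f}")
```

Taking absolute values before averaging matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero and be discarded.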

The attribute ‘num_of_pendown19’, the highest-ranked movement-related attribute and the second most important attribute overall (Table 9), belongs to a category that signals cognitive and motor skill disorders in the patient.³⁶ Detection of motor skill disorders, such as tremors in Alzheimer’s patients, is extremely important for early diagnosis and intervention. Detection of this attribute paves the way for holistic approaches that can slow down the progression of the disease.⁸² Moderate physical activities can support cognitive and motor functions. Therefore, programs that combine physical and cognitive exercises can be offered to individuals affected by the identified attribute, contributing to the preservation and development of motor skills.^83,84

Time-related features are critical as they reflect cognitive slowdown, attention deficit, and executive function impairments in the early stages of Alzheimer’s disease. These features are not only a diagnostic tool, but can also guide the construction of individualized digital intervention systems.⁸⁵ The time spent in the air (air_time) reflects the cognitive load of the individual in the process of planning and initiating the next movement. Prolongation of this time has been associated with a slowdown in executive functions and distraction.⁸⁶ An increase in time on paper (paper_time) indicates psychomotor slowdown and regression in motor skills. Significant prolongation of task completion times (total_time) is parallel to impairments in decision-making, sequence tracking, and attention maintenance processes in Alzheimer’s patients.⁸⁷ Such time-based data provides diagnostic support in the early stages by capturing cognitive dysfunctions in an objective and task-based manner.

Pressure-related features reveal subtle changes in motor skills in Alzheimer’s disease. Increased pressure variance (pressure_var) in Alzheimer’s patients indicates loss of control in the hand muscles,⁸⁴ which makes handwriting more irregular and weaker. Pen pressure levels (pressure_mean) can clearly show how mental decline is reflected in movements.⁸⁵ By monitoring these variables, Alzheimer’s disease can be diagnosed early, and personalized prevention approaches can be offered.

Interpretability of Model

To minimize the bias caused by a small data set and to enhance the robustness of the models, cross-validation is recommended.⁸⁸ Cross-validation also reduces the risk of over-fitting.⁸⁹ Since the DARWIN data set considered in the study is relatively small, cross-validation is used in addition to the train-test split method. The results obtained with the initial 70% train - 30% test split showed that the models performed at a reasonable level, and cross-validation did not provide a significant increase in accuracy. Cross-validation may yield lower performance measurements, but even in this case it provides important insight into the generalization abilities of the models.⁹⁰ In this study, the confidence intervals of the performance metrics are calculated through 5-fold cross-validation. The SHAP-SVM model, run with the 70% train - 30% test split, achieved superior performance, with 96.23% accuracy, 96.43% precision, 96.30% recall, and a 96.36% F1-Score. The 95% confidence intervals for its performance metrics with 5-fold cross-validation are as follows: accuracy [0.891, 0.971], precision [0.8928, 0.9766], recall [0.8943, 0.9694], and F1-Score [0.8936, 0.9729].
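The article does not state how the 95% confidence intervals were derived from the five fold scores. One common choice, assumed here purely for illustration, is a Student's t interval around the mean of the fold scores. In the sketch below, the function name, the small table of critical values, and the fold scores are all hypothetical and are not taken from the study.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_confidence_interval(fold_scores):
    """Return (mean, lower, upper) of a 95% t interval over per-fold scores.

    Assumption: folds are treated as independent samples, which is a
    simplification because cross-validation folds share training data.
    """
    k = len(fold_scores)
    degrees_of_freedom = k - 1
    if degrees_of_freedom not in T_CRIT_95:
        raise ValueError("this sketch supports between 2 and 10 folds")
    mean = statistics.mean(fold_scores)
    standard_error = statistics.stdev(fold_scores) / math.sqrt(k)
    half_width = T_CRIT_95[degrees_of_freedom] * standard_error
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Made-up accuracies for five folds.
    mean, lower, upper = mean_confidence_interval([0.90, 0.94, 0.92, 0.96, 0.93])
    print(f"mean={mean:.4f}, 95% CI=[{lower:.4f}, {upper:.4f}]")
```

With only five folds the critical value is large (2.776 rather than the familiar 1.96), so even modest fold-to-fold variation produces a wide interval; this is one reason intervals on small data sets should be read cautiously.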

Conclusion

A definitive treatment has not yet been developed for Alzheimer’s disease, which is generally known as a disease of the elderly and can cause dementia by damaging brain cells. However, it is known that accurate monitoring of symptoms can slow the progression of the disease. This makes early diagnosis of Alzheimer’s disease important. Handwriting is an ability managed by the brain and is therefore affected by neurodegenerative diseases: conditions such as Alzheimer’s disease, schizophrenia, Parkinson’s disease, and cognitive disorders affect kinetic movement and result in changes in an individual’s handwriting. For these reasons, handwriting, which contains individual characteristics, is used as an important determinant for the diagnosis of Alzheimer’s disease. This research was carried out using the DARWIN data set, which was developed for Alzheimer’s disease prediction and collected during the performance of handwriting tasks. The main objective of the study was to determine the most appropriate combination of feature selection method and machine learning algorithm in order to achieve the highest performance for Alzheimer’s disease prediction. To this end, 7 different optimization-based algorithms were used to determine the most suitable features, and 9 different classification algorithms were used to train the models.

The performance of the models was compared using both train-test split and cross-validation methods. The evaluation was carried out according to the performance criteria of accuracy, precision, recall and F1-Score. In addition, SHAP was used to determine the importance levels of the features on the prediction results. Confidence intervals were used to evaluate the statistical validity, consistency and reliability of the model. When the resulting performance criteria were examined (Figures 3-10), it was seen that the feature selection methods generally increased the performance of the models. The most successful results across all performance metrics were obtained with the SHAP-SVM model using the train/test split method. The performance metrics of this model were as follows:

• Accuracy is a performance metric that is quite important when the classes are balanced and measures the overall accuracy rate. For the model, the accuracy was 0.9623.

• The precision metric, which expresses the accuracy of positive predictions (in this study, the proportion of individuals predicted to have Alzheimer’s disease who actually have it), is 0.9643.

• The recall metric value, which shows the detection rate of true positives, the correct detection of Alzheimer’s patients, is 0.9630.

• The F1-Score, the harmonic mean of precision and recall, which balances the two, is 0.9636.

The findings are consistent with recent studies in the field.^6,23,32 Similar to the approach adopted by Ahmed et al (2025),⁸¹ 20 key features were selected in this study. Furthermore, consistent with other research,^84-87 the results emphasize that handwriting-based indicators—such as temporal delays, reduced pressure variability, and spatial irregularities—can effectively distinguish healthy individuals from those with early-stage Alzheimer’s disease. The high predictive accuracy and interpretability obtained from the proposed model suggest that handwriting-based machine learning systems may serve as low-cost, non-invasive screening tools suitable for use in primary care or remote settings. Additionally, the integration of SHAP not only enhances model transparency but also supports the development of personalized monitoring and intervention systems by identifying clinically relevant features.

The study contributes to the literature in multiple ways. First, the study develops a highly accurate machine learning model for Alzheimer’s disease prediction, which contributes to early diagnosis. Second, the SHAP feature selection technique is integrated into the machine learning prediction model to increase the model’s explainability with fewer features. Reducing the number of features reduces the time and cost of prediction. Third, bias and robustness effects, which are often overlooked in prediction models, have been taken into account through cross-validation.

These contributions may help accelerate early diagnosis in clinical applications and support correct decision-making. However, for the results to be used more efficiently in clinical applications, future work aims to collect and analyze real-time data as suggested by Vessio,¹⁵ to test the effectiveness of the model in clinical applications, and to provide personal support by developing individual clinical support applications.

Footnotes

ORCID iD

Deniz Demircioglu Diren

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The dataset used in this study, DARWIN, is publicly available at the UCI Machine Learning Repository.

References

1. Javeed, Dallora, Berglund, Ali, Anderberg. Machine learning for dementia prediction: a systematic review and future research directions. J Med Syst. 2023;47(1):17. doi:10.1007/s10916-023-01906-7

2. Cilia, Fontanella, Di Freca. An experimental protocol to support cognitive impairment diagnosis by using handwriting analysis. Procedia Comput Sci. 2018;141:466-471. doi:10.1016/j.procs.2018.10.141

3. Blennow, de Leon, Zetterberg. Alzheimer’s disease. Lancet. 2006;368(9533):387-403. doi:10.1016/S0140-6736(06)69113-7

4. Erdogmus, Kabakus. The promise of convolutional neural networks for the early diagnosis of the Alzheimer’s disease. Eng Appl Artif Intell. 2023;123:106254. doi:10.1016/j.engappai.2023.106254

5. Masters, Bateman, Blennow, Rowe, Sperling, Cummings. Alzheimer’s disease. Nat Rev Dis Primers. 2015;1(1):1-18. doi:10.1038/nrdp.2015.59

6. Gattulli, Impedovo, Pirlo, Semeraro. Handwriting task-selection based on the analysis of patterns in classification results on Alzheimer dataset. In: Proceedings of the IEEE International Conference on Sustainable Data Science (SDS 2023), Workshop on Data Science Techniques on Data for Neurodegenerative Diseases and Mental Disorders. October 2023:18-29. IEEE.

7. Dhanusha, Kumar, Musirin, Abdullah HMA. Chaotic chicken swarm optimization-based deep adaptive clustering for Alzheimer disease detection. Pervasive Computing and Social Networking: Proceedings of ICPCSN. 2022;2021:709-719. doi:10.1007/978-981-16-5640-8_53

8. De Gregorio, Desiato, Marcelli, Polese. A multi classifier approach for supporting Alzheimer’s diagnosis based on handwriting analysis. In: Loog, Yang, Shan, et al, eds. Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021; Proceedings, Part I. Lecture Notes in Computer Science. Vol 12661. Cham, Switzerland: Springer International Publishing; 2021:559-574. doi:10.1007/978-3-030-68763-2_43

9. Doecke, Laws, Faux, et al. Blood-based protein biomarkers for diagnosis of Alzheimer disease. Arch Neurol. 2012;69(10):1318-1325. doi:10.1001/archneurol.2012.1282

10. Quarmley, Moberg, Mechanic-Hamilton, et al. Odor identification screening improves diagnostic classification in incipient Alzheimer’s disease. J Alzheimers Dis. 2017;55(4):1497-1507. doi:10.3233/jad-160842

11. Agarwal, Dutta, Agrawal, Mehra, Mehta. Hybrid nature-inspired algorithm for feature selection in Alzheimer detection using brain MRI images. Int J Comput Intell Appl. 2022;21(03):2250016. doi:10.1142/S146902682250016X

12. Deshmane, Yadav, Bendre. Intelligent system for brain disease diagnosis using rotation invariant features and fuzzy neural network. In: 2022 6th International Conference on Computing, Communication, Control and Automation ICCUBEA; 26-27 August 2022; Pune, India:1-6. doi:10.1109/ICCUBEA54992.2022.10010752

13. Topannavar, Yadav, Bendre. Automated Alzheimer’s disease detection with optimized fuzzy neural network. In: Shinde, Bendre, Hemanth, Balafar, eds. Applied Artificial Intelligence: A Biomedical Perspective. Boca Raton, FL: CRC Press; 2023:165-178.

14. Kaya, Çetin-Kaya. A novel deep learning architecture optimization for multiclass classification of Alzheimer’s disease level. IEEE Access. 2024;12:46562-465681.

15. Vessio. Dynamic handwriting analysis for neurodegenerative disease assessment: a literary review. Appl Sci. 2019;9(21):4666. doi:10.3390/app9214666

16. Bensalah, Parziale, De Gregorio, et al. I can’t believe it’s not better: in-air movement for Alzheimer handwriting synthetic generation. In: Díaz, Fairhurst, Plamondon, eds. International Graphonomics Conference. Cham, Switzerland: Springer Nature Switzerland; 2023:136-148. doi:10.48550/arXiv.2312.05086

17. Gumussoy, Haylaz, Duman, et al. Automatic segmentation of the infraorbital canal in CBCT images: anatomical structure recognition using artificial intelligence. Diagnostics. 2025;15(13):1713. doi:10.3390/diagnostics15131713

18. Moran, Altilar, Ucar, Bilgin, Bozkurt. Deep transfer learning for chronic obstructive pulmonary disease detection utilizing electrocardiogram signals. IEEE Access. 2023;11:40629-40644. doi:10.1109/ACCESS.2023.3269397

19. Oğur, Kotan, Balta, et al. Detection of depression and anxiety in the perinatal period using Marine Predators Algorithm and kNN. Comput Biol Med. 2023;161:107003. doi:10.1016/j.compbiomed.2023.107003

20. Mutlu, Çetinel, Gül. A fully-automated computer-aided breast lesion detection and classification system. Biomedical Signal Processing and Control. 2020;62:102157. doi:10.1016/j.bspc.2020.102157

21. Ngiam, Khor. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019;20(5):e262-e273. doi:10.1016/s1470-2045(19)30149-4

22. Borra, Vahini, Reddy, et al. Machine learning approaches for Alzheimer’s disease prediction from Darwin dataset. J Interdiscip Cycle Res. 2024;26(1):661-673. doi:10.1049/icp.2025.0839

23. Subha, Nayana, Selvadass. Hybrid machine learning model using particle swarm optimization for effectual diagnosis of Alzheimer’s disease from handwriting. In: 2022 4th International Conference on Circuits, Control, Communication and Computing (I4C); 21-23 December 2022; Bangalore, India:491-495. doi:10.1109/I4C57141.2022.10057948

24. Zebari, Abdulazeez, Zeebaree, Zebari, Saeed. A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. J Appl Sci Technol Trends. 2020;1(1):56-70. doi:10.38094/jastt1224

25. Nguyen, et al. Influence of data splitting on performance of machine learning models in prediction of shear strength of soil. Math Probl Eng. 2021;2021:1-15. doi:10.1155/2021/4832864

26. Agrawal, Abutarboush, Ganesh, Mohamed. Metaheuristic algorithms on feature selection: a survey of one decade of research (2009-2019). IEEE Access. 2021;9:26766-26791. doi:10.1109/ACCESS.2021.3056407

27. Pan, Chu. Improved binary grey wolf optimizer and its application for feature selection. Knowl Base Syst. 2020;195:105746. doi:10.1016/j.knosys.2020.105746

28. Zhang, Liu, Wang, Chen. Boosted binary Harris hawks optimizer and feature selection. Eng Comput. 2021;37:3741-3770. doi:10.1007/s00366-020-01028-5

29. Cilia, De Gregorio, De Stefano, Fontanella, Marcelli, Parziale. Diagnosing Alzheimer’s disease from on-line handwriting: a novel dataset and performance benchmarking. Eng Appl Artif Intell. 2022;111:104822. doi:10.1016/j.engappai.2022.104822

30. Dao, El-Yacoubi, Rigaud. Detection of Alzheimer disease on online handwriting using 1d convolutional neural network. IEEE Access. 2022;11:2148-2155. doi:10.1109/ACCESS.2022.3232396

31. Önder, Şentürk, Polat, et al. Diagnosis of Alzheimer’s disease using boosting classification algorithms. In: 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE); 01-02 November 2023; Chennai, India. doi:10.1109/RMKMATE59243.2023.10369418

32. Ngnamsie Njimbouom, Aly Abdelkader, Zonyfar, et al. RD-classifier: reduced dimensionality classifier for Alzheimer’s diagnosis support system. In: Wagner, Decker, eds. International Conference on Database and Expert Systems Applications. Cham, Switzerland: Springer Nature Switzerland; 2023:3-17. doi:10.1007/978-3-031-39821-6_1

33. Vimaladevi, Thangamani. Prediction of Alzheimer’s disease by analyzing handwriting dynamics using machine learning algorithms. In: 2024 5th International Conference on Electronics and Sustainable Communication Systems (ICESC). New York, NY: IEEE; August 2024:1298-1304. doi:10.1109/ICESC60852.2024.10690124

34. Rani, Goel, Singh. Enhancing Alzheimer’s disease prediction using random forest along with grid search optimization on Darwin dataset. IET Conf Proc. 2024;2024:167-173. doi:10.1049/icp.2025.0839

35. Caraveo, Álvarez Cruz, Quintana, Romero Ramos, Flores CMQ, Figueroa CEC. ML design in handwriting analysis for classification of Alzheimer’s disease. In: Reyes-Cruz, Ramírez-Mendoza, eds. Congreso Nacional de Ingeniería Biomédica. Cham, Switzerland: Springer Nature Switzerland; 2024:3-13.

36. NTN, Gonzalez, Gogovi. Writing the signs: an explainable machine learning approach for Alzheimer’s disease classification from handwriting. Healthc Technol Lett. 2025;12(1):e70006. doi:10.1038/s41598-024-51985-w

37.

Roy

Taguchi

, eds. Handbook of Machine Learning Applications for Genomics. Cham: Springer; 2022.

38.

Venkatesan

. Design an intrusion detection system based on feature selection using ML algorithms. Mathematical Statistician and Engineering Applications. 2023;72(1):702-710. doi:10.17762/msea.v72i1.2000

39.

Nayak

Swapnarekha

Naik

Dhiman

Vimal

. 25 years of particle swarm optimization: flourishing voyage of two decades. Arch Comput Methods Eng. 2023;30(3):1663-1725. doi:10.1007/s11831-022-09849-

40.

Poli

Kennedy

Blackwell

. Particle swarm optimization: an overview. Swarm Intell. 2007;1:33-57. doi:10.1007/s11721-007-0002-0

41.

Makhadmeh

Al-Betar

Doush

, et al. Recent advances in Grey Wolf optimizer, its versions and applications. IEEE Access. 2024;12:1. doi:10.1109/ACCESS.2023.3304889

42.

Zeng

Chen

Zhao

, et al. An optimized Grey Wolf algorithm. In: 2022 IEEE International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC); 05-07 August 2022; Chongqing, China:200-205. doi:10.1109/SDPC55702.2022.9915809

43.

Meraihi

Ramdane-Cherif

Acheli

Mahseur

. Dragonfly algorithm: a comprehensive review and applications. Neural Comput Appl. 2020;32(21):16625-16646. doi:10.1007/s00521-020-04866-y

44.

Sree Ranjini

Murugan

. Memory based hybrid dragonfly algorithm for numerical optimization problems. Expert Syst Appl. 2017;83:63-78. doi:10.1016/j.eswa.2017.04.033

45.

Demirci

Yurtay

Zaimoğlu

. Electrical search algorithm: a new metaheuristic algorithm for clustering problem. Arabian J Sci Eng. 2023;48(8):10153-10172. doi:10.1007/s13369-022-07545-3

46.

Heidari

Mirjalili

Faris

Aljarah

Mafarja

Chen

. Harris hawks optimization: algorithm and applications. Future Gener Comput Syst. 2019;97:849-872. doi:10.1016/j.future.2019.02.028

47.

Thaher

Arman

. Efficient multi-swarm binary Harris hawks optimization as a feature selection approach for software fault prediction. In: 2020 11th International Conference on Information and Communication Systems (ICICS); 07-09 April 2020; Irbid, Jordan:249-254. doi:10.1109/ICICS49469.2020.239557

48.

Alabool

Alarabiat

Abualigah

Heidari

. Harris hawks optimization: a comprehensive review of recent variants and applications. Neural Comput Appl. 2021;33:8939-8980. doi:10.1007/s00521-021-05720-5

49.

Deb

. Genetic algorithm in search and optimization: the technique and applications. In: Proceedings of International Workshop on Soft Computing and Intelligent Systems. Calcutta, India: ISI; 1998:58-87.

50.

Yazdani

Nezamabadi-Pour

Kamyab

. A gravitational search algorithm for multimodal optimization. Swarm Evol Comput. 2014;14:1-14. doi:10.1016/j.swevo.2013.08.001

51.

Yadav

Deep

. Constrained optimization using gravitational search algorithm. Natl Acad Sci Lett. 2013;36:527-534. doi:10.1007/s40009-013-0165-8

52.

Wang

Liang

Hancock

Khoshgoftaar

. Feature selection strategies: a comparative analysis of SHAP-value and importance-based methods. J Big Data. 2024;11:44. doi:10.1186/s40537-024-00905-w

53.

Marcílio

Eler

. From explanations to feature selection: assessing SHAP values as feature selection mechanism. In: Proceedings of the 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). Brazil. Piscataway, NJ: IEEE; 2020:340-347. doi:10.1109/SIBGRAPI51738.2020.00053

54.

Liu

Luo

Zhao

. Diagnosis of Parkinson’s disease based on SHAP value feature selection. Biocybern Biomed Eng. 2022;42(3):856-869. doi:10.1016/j.bbe.2022.06.007

55.

Gramegna

Giudici

. Shapley feature selection. FinTech. 2022;1(1):72-80. doi:10.3390/fintech1010006

56.

LightGBM . LightGBM classifier. 2023. https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html. Accessed April 24, 2024.

57.

Dewi

Chen

. Random forest and support vector machine on features selection for regression analysis. Int J Innov Comput Inf Control. 2019;15(6):2027-2037. doi:10.24507/ijicic.15.06.2027

58.

Ramkissoon

Mohammed

. An experimental evaluation of data classification models for credibility based fake news detection. In: 2020 International Conference on Data Mining Workshops (ICDMW); November 17-20, 2020. Sorrento, Italy. Piscataway, NJ: IEEE; 2020:67-74. doi:10.1109/ICDMW51313.2020.00022

59.

Shivanna

Agrawal

. Prediction of defaulters using machine learning on azure ML. In: Proceedings of the 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON); November 4-7, 2020. Vancouver, BC, Canada. Piscataway, NJ: IEEE; 2020:320-325. doi:10.1109/IEMCON51383.2020.9284884

60.

Chidroop

Moharir

. Predicting the propensity of order cancellation in the ecommerce domain. Int J Res Eng Sci Manag. 2020;3(6):658-664.

61.

Huang

Zhao

. Stock market prediction by daily news via natural language processing and machine learning. In: 2021 International Conference on Computer, Blockchain and Financial Development (CBFD); April 23-25, 2021. Nanjing, China: IEEE; 2021:190-196. doi:10.1109/CBFD52659.2021.00044

62.

Jia

Zeng

Liao

, et al. Mixture survival trees for cancer risk classification. Lifetime Data Anal. 2022;28(3):356-379. doi:10.1007/s10985-022-09552-w

63.

Agrawal

Imielinski

Swami

. Database mining: a performance perspective. IEEE Trans Knowl Data Eng. 1993;5(6):914-925.

64.

Mitchell

. Machine learning, M. Mcgraw-hill science. Engineering/Math. 2014;1:27. Mohammed, M., Khan, M. B., & Bashier, E. B. M. (2016). Machine learning: algorithms and applications. Crc Press.

65.

Utgoff

Berkman

Clouse

. Decision tree induction based on efficient tree restructuring. Mach Learn. 1997;29:5-44. doi:10.1023/A:1007413323501

66.

Maimon

Rokach

, eds. Data Mining and Knowledge Discovery Handbook, 2. New York, NY: Springer; 2005.

67.

Schölkopf

Smola

Williamson

Bartlett

. New support vector algorithms. Neural Comput. 2000;12(5):1207-1245. doi:10.1162/089976600300015565

68.

Cortes

Vapnik

. Support-vector networks. Mach Learn. 1995;20:273-297.

69.

Hou

Chen

. A new imbalanced data oversampling method based on Bootstrap method and Wasserstein Generative Adversarial Network. Math Biosci Eng. 2024;21(3):4309-4327.

70.

Yeo

Kang

. A study on a car insurance purchase prediction using two-class logistic regression and two-class boosted decision tree. Korea J Artifi Intel. 2021;9(1):9-14. doi:10.24225/kjai.2021.9.1.9

71.

Arlot

Celisse

. A survey of cross-validation procedures for model selection. Stat Surv. 2010;4:40-79. doi:10.1214/09-SS054

72.

Browne

. Cross-validation methods. J Math Psychol. 2000;44(1):108-132. doi:10.1006/jmps.1999.1279

73.

Picard

Cook

. Cross-validation of regression models. J Am Stat Assoc. 1984;79(387):575-583. doi:10.1080/01621459.1984.10478083

74.

Refaeilzadeh

Tang

Liu

. C Cross-Validation. Boston: Springer; 2009:1-3.

75.

Kohavi

. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Paper Presented at the Proceedings of the 14th International Joint Conference on Artificial Intelligence-Volume 2, Montreal, Quebec, Canada, 1995.

76.

Hastie

Tibshirani

Friedman

. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Berlin: Springer Science & Business Media; 2009.

77.

Kuhn

Johnson

. Applied Predictive Modeling, 26. New York, NY: Springer; 2013:13.

78.

Meng

Chen

Wang

Z-M

Liu

. Tie-Yan Liu, convergence analysis of distributed stochastic gradient descent with shuffling. Neurocomputing. 2019;337:46-57. doi:10.1016/j.neucom.2019.01.037

79.

Lamba

Gulati

Jain

Rani

. A speech-based hybrid decision support system for early detection of Parkinson’s disease. Arabian J Sci Eng. 2023;48(2):2247-2260. doi:10.1007/s13369-022-07249-8

80.

Lundberg

Lee

. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4766-4777.

81.

Ahmed

Hao

Jin

. Enhancing Alzheimer’s detection: VAE-augmented handwriting analysis. CCF Trans Pervasive Comp Interact. 2025;7:1-17. doi:10.1007/s42486-024-00170-z

82.

Werner

Rosenblum

Bar-On

Heinik

Korczyn

. Handwriting process variables discriminating mild Alzheimer’s disease and mild cognitive impairment. J Gerontol B Psychol Sci Soc Sci. 2006;61(4):P228-P236. doi:10.1093/geronb/61.4.P228

83.

Park

, et al. Combined intervention of physical activity, aerobic exercise, and cognitive exercise intervention to prevent cognitive decline for patients with mild cognitive impairment: a randomized controlled clinical study. J Clin Med. 2019;8(7):940. doi:10.3390/jcm8070940

84.

Nardone

De Stefano

Cilia

Fontanella

. Handwriting strokes as biomarkers for Alzheimer’s disease prediction: a novel machine learning approach. Comput Biol Med. 2025;190:110039. doi:10.1016/j.compbiomed.2025.110039

85.

Thebaud

Favaro

Chen

Chavez

. Explainable metrics for the assessment of neurodegenerative diseases through handwriting analysis. arXiv preprint arXiv:2409.08303. 2024. https://arxiv.org/abs/2409.08

86.

Bensalah

Parziale

De Gregorio

Marcelli

Fornés

Lladós

. I can’t believe it’s not better: in-air movement for Alzheimer handwriting synthetic generation. arXiv preprint arXiv:2312.05086. 2023. https://arxiv.org/abs/2312.05086

87.

Laouedj

Wang

Villalba

Thebaud

Moro-Velazquez

Dehak

. Detecting neurodegenerative diseases using frame-level handwriting embeddings In: ICASSP 2025 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); April 6-11, 2025. Hyderabad, India. Piscataway, NJ: IEEE; 2025:1-5. doi:10.1109/ICASSP49660.2025.10887880

88.

Carroll

Wang

. Estimating misclassification error with small samples via bootstrap cross-validation. Bioinformatics. 2005;21(9):1979-1986. doi:10.1093/bioinformatics/bti294

89.

Moore

. Cross-Validation for Detecting and Preventing Overfitting. Pittsburgh, PA: School of Computer Science Carnegie Mellon University; 2001:133.

90.

Cardoso

Barros

Gonçalves

Premebida

Nunes

. Multispectral image segmentation in agriculture: evaluating deep learning models with train-test split and cross-validation strategies. In: 2024 7th Iberian Robotics Conference (ROBOT); November 6–8, 2024. Madrid, Spain. Piscataway, NJ: IEEE; 2024:1-6. doi:10.1109/ROBOT61475.2024.10797395
