
Abstract

Objective

This study aims to create a robust and interpretable method for predicting dementia in Parkinson's disease (PD), especially in resource-limited settings. The model aims to be accurate even with small datasets and missing values, ultimately promoting its use in clinical practice to benefit patients and medical professionals.

Methods

Our study introduces LightGBM–TabPFN, a novel hybrid model for predicting dementia conversion in PD. Combining LightGBM's strength in handling missing values with TabPFN's ability to exploit small datasets, LightGBM–TabPFN outperforms seven existing methods, achieving outstanding accuracy and interpretability thanks to SHAP analysis. This analysis leverages data from 242 PD patients across 17 variables.

Results

Our LightGBM–TabPFN model outperformed seven existing methods, achieving an accuracy of 0.9592 and an area under the ROC curve of 0.9737.

Conclusions

The interpretable LightGBM–TabPFN with SHAP signifies a significant advancement in predictive modeling for neurodegenerative diseases. This study not only improves dementia prediction in PD but also provides clinical professionals with insights into model predictions, offering opportunities for application in clinical settings.

Keywords

Dementia, Parkinson, Light-GBM, SHAP, TabPFN

Introduction

Parkinson's disease (PD) stands as the second most prevalent neurodegenerative ailment among the elderly, impacting nearly 2% of individuals aged 65 and above.¹ The emergence of dementia in PD is a common and belated occurrence, with estimates suggesting an annual 10% incidence of dementia development in PD patients.² The global dementia burden, exceeding 55 million individuals, is noteworthy, with over 60% of cases concentrated in low- and middle-income countries (LMICs).³ LMICs grapple with a pronounced deficit in mental health professionals such as psychologists, nurses and social workers, a circumstance exacerbated in comparison to high-income nations.⁴ For instance, in 2015, China reported a psychiatrist availability rate of 2.2 per 100,000 in the mental health sector, contrasting with the United States’ rate of 10.54 per 100,000. African countries like Zimbabwe, Mozambique and Angola displayed even lower rates at 0.095, 0.046 and 0.057, respectively, accentuating the dearth of accessible treatment and care in rural LMIC areas. The imperative for early and precise dementia prediction in PD patients is underscored by the necessity to optimize treatment approaches and enhance patient outcomes, thereby delaying disease progression.⁵ Nonetheless, the development of dependable prediction methodologies, particularly in scenarios characterized by limited data, remains a formidable challenge.

The conventional approach to predicting complex outcomes like dementia in PD involves using statistical models⁶,⁷ or machine learning (ML) algorithms. While these methods achieve acceptable accuracy, they often lack transparency, making it challenging to discern the key factors influencing dementia development. This trade-off between accuracy and interpretability is particularly evident in deep learning (DL) models, which rely on intricate neural networks with millions of parameters.⁸ Adding to the difficulty is the limited availability of data in this research domain, which presents a challenge for applying conventional methods to small tabular datasets of PD patients.

To overcome these challenges, our investigation introduces a novel hybrid model, LightGBM–TabPFN, and demonstrates its potential for robust diagnosis of dementia in PD in real-world clinical settings. Our approach performs well even in scenarios with limited data and missing values, making it a promising solution for real-world data. Additionally, we integrate SHapley Additive exPlanations (SHAP) to elucidate the model's predictions, thereby enhancing its transparency and interpretability for medical professionals. Successfully navigating these challenges has the potential to promote broader acceptance of our proposed methodology in practical clinical settings. This, in turn, could contribute to enhanced dementia prediction for PD patients under resource constraints, offering advantages to both patients and clinicians.

Literature review

Research on dementia classification has grown in recent years, as indicated by the literature.⁹ Notably, current research leans heavily towards MRI data over clinical data. This inclination is discernible through a substantial discrepancy in the number of studies conducted for each data type.

Specifically, in the context of distinguishing patients with dementia from healthy controls, the accuracy of the reported models ranges from 0.770 to 0.968. Notably, Qiu et al. achieved the highest accuracy of 0.968, complemented by an area under the ROC curve (AUC) of 0.996. Their approach involved employing a fully convolutional network (FCN) with a traditional multilayer perceptron (MLP) for generating visualizations indicative of Alzheimer's disease (AD) dementia risk. It is worth highlighting that this research also utilized non-imaging features, incorporating gender, age and the Mini-Mental State Examination (MMSE) score.

In a recent study by Venugopalan et al.,¹⁰ a novel method of addressing gaps in disease understanding was proposed—data integration across modalities. The authors found that optimal fusion setups involved combining electronic health records¹¹ (EHR), imaging data, and single-nucleotide polymorphisms. A noteworthy observation from their work was that small sample sizes present a challenge for DL models.

A previous study¹² used a Bayesian network classifier on MRI data from 45 PD patients to predict dementia. A filtered Naïve Bayes model performed best, showing high sensitivity of 0.9233, specificity of 1, and accuracy of 0.9655. It identified hippocampi, lateral ventricles, and cerebral white matter as key dementia-related structures.

Materials and methods

Data source

This study leveraged epidemiological data pertaining to patients with PD's dementia (EPD) sourced from the National Biobank of the Korea Disease Control and Prevention Agency. The data's provenance and specifics are elucidated in the work by Byeon.¹³ In a concise overview, the dataset was amassed during the period spanning January to December 2015, drawing from 14 tertiary medical institutions nationwide. The data collection process was conducted under the auspices of the Korea Centers for Disease Control and Prevention (CDC) and employed computer-assisted personal interviews (CAPI) to execute a comprehensive health survey. Prior to the receipt and analysis of the data, requisite approvals were secured from both the Korea Disease Control and Prevention Agency's Research Ethics Review Committee (Approval No. KBN-2019-005) and the Lotting-out Committee of the National Biobank Korea (Approval No. KBN-2019-1327).

Data preprocessing

Prior to model training, the PD patient dataset underwent a thorough cleaning process. Non-informative attributes such as IDs and dates were systematically excluded, and data pertaining to Alzheimer's patients and their associated variables were specifically omitted to maintain a focused analysis. Rows with missing values, in particular in the “DEM_DEMENTIA” target feature, were eliminated to mitigate bias and ensure accuracy. These preprocessing steps yielded a curated dataset suited to dementia prediction within the PD domain. The “DEM_DEMENTIA” feature, designating dementia status, was selected as the target in the context of this study.

To ensure all features contributed equally to model performance and improve generalizability, we addressed the different scales of both numerical and categorical variables in our dataset. For numerical features, we applied scaling to standardize their values within a common range (e.g. 0–1). This prevents features with larger scales from dominating the model and ensures all features exert proportional influence. In the case of categorical features, like “DEM_EDU” representing years of education, we encoded them into categories relevant to the Korean education system (e.g. elementary, middle school, high school, college). This transformation not only enhances interpretability but also increases the model's sensitivity to meaningful variations within the feature's domain, potentially boosting its accuracy and applicability.
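The scaling and encoding described above can be sketched as follows. This is a minimal illustration on made-up values, not the study's code: the cut points used to map years of education to school levels (6, 9, 12) are hypothetical, and the use of scikit-learn's `MinMaxScaler` is an assumption about how the 0–1 scaling might be done.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy rows using two variable names from Table 1; the values are made up.
df = pd.DataFrame({"DEM_AGE": [58, 66, 71, 80], "DEM_EDU": [6, 9, 12, 16]})

# Min-max scaling maps the smallest observed value to 0 and the largest to 1.
scaled_age = MinMaxScaler().fit_transform(df[["DEM_AGE"]]).ravel()

# Hypothetical cut points (in years) for the education levels.
edu_level = pd.cut(
    df["DEM_EDU"],
    bins=[-np.inf, 6, 9, 12, np.inf],
    labels=["elementary", "middle school", "high school", "college"],
)

# Ordinal codes preserve the natural order of the levels (0 = lowest).
edu_code = edu_level.cat.codes

print(scaled_age)
print(list(edu_level), list(edu_code))
```

In practice the scaler would be fitted on the training split only and then applied to the held-out data, so that no information leaks from the validation set.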

Following data cleaning, the dataset shrank to 242 patients: 166 without dementia and 76 with it, reflecting a highly imbalanced class distribution. We refined the dataset to 36 informative variables and the target feature, applying ordinal encoding to categorical variables. Table 1 details these 36 variables and the target feature, including their data types and encoding details for categorical ones.

Table 1.

Description of variables used in this research.

Variable	Description	Type	Values and description
DEM_SEX	Gender	Categorical	Male (1), Female (2)
DEM_AGE	Age	Continuous	() years old
DEM_EDU	Education period	Continuous	() years
DEM_HAND	Dominant hand	Categorical	Right (1), Left (2), Both (3)
DEM_SMOKE	Smoking experience	Categorical	No (1), Smoking in the past (2), Smoking in the current (3)
DEM_COFFEE	Whether or not drink coffee	Categorical	No (1), Drinking in the past (2), Drinking in the current (3)
DEM_AGRICULCHEM	Pesticide exposure	Categorical	No (1), Exposure in the past (2), Exposure in the current (3)
DEM_COINTOXI	Carbon monoxide poisoning	Categorical	No (1), Yes (2)
DEM_MN	Manganese poisoning	Categorical	No (1), Yes (2)
DEM_ENCEPHAL	Encephalitis history	Categorical	No (1), Yes (2)
DEM_HEADINJ	Head injury	Categorical	No (1), Yes (2)
DEM_CVA	Stroke	Categorical	No (1), Yes (2)
DEM_ALCOHOL	History of alcoholism	Categorical	No (1), Yes (2)
DEM_DM	Diabetes	Categorical	No (1), Yes (2)
DEM_HT	Hypertension	Categorical	No (1), Yes (2)
DEM_LP	Hyperlipidemia	Categorical	No (1), Yes (2)
DEM_AF	Atrial fibrillation	Categorical	No (1), Yes (2)
DEM_DISEASEACC	Comorbidities	Categorical	No (1), Yes (2)
DEM_PDFAM	Family history of PD	Categorical	No (1), Yes (2)
DEM_ADDEMFAM	Family history of Dementia	Categorical	No (1), Yes (2)
DEM_TREMOR	Tremor	Categorical	No (1), Yes (2)
DEM_RIGIDITY	Rigidity	Categorical	No (1), Yes (2)
DEM_AKBK	Bradykinesia/Akinesia	Categorical	No (1), Yes (2)
DEM_PI	Postural instability (PI)	Categorical	No (1), Yes (2)
DEM_LMC	Late motor complications	Categorical	No (1), Yes (2)
DEM_RBD	Rapid eye movement (REM) sleep behavior disorders	Categorical	No (1), Yes (2)
DEM_RBD_EVD	Reasons for judging REM sleep behavior disorder	Categorical	He talks deeply in his sleep (1), He acts in his dreams (2), He does both “Heavy sleep talking” and “He acts in his dreams” (3), Prosecutor (4)
DEM_KMMSE_SCR	Korean MMSE	Continuous	() points/30 points
DEM_KMOCA_SCR	Korean Montreal Cognitive Assessment (KoCA) Score	Continuous	() points/30 points
DEM_CDR_GSCR	Global Clinical dementia rating (CDR) score	Continuous	() points/5 points
DEM_CDR_SSCR	Clinical dementia rating score (sum of boxes)	Continuous	() points/5 points
DEM_DEMENTIA	Dementia based on DSM-IV (Diagnostic and statistical manual of mental disorders IV)	Categorical	No (1), Dementia (2)
DEM_KIADL_SCR	Korean instrumental activities of daily living score	Continuous	() points/5 points
DEM_UPDRS_TSCR	Total Unified Parkinson's Disease Rating Scale (UPDRS) score	Continuous	() points/199 points
DEM_DEPRESSION	Depression determined by BDI (Beck's Depression Inventory) or GDS (Geriatric Depression Score)	Categorical	Yes (0), No (1)
DEM_SEADL_PCT	Schwab & England ADL (Schwab & England activities of daily living scale)	Continuous	()% / 100%

Data split for training, validation and testing

Following the data preparation steps, our dataset included information from 242 patients, encompassing 36 different factors along with the target variable. To build and select features for our model, we randomly picked 80% of this data, constituting 193 patients, to serve as the training set. The remaining 20%, comprising 49 patients, was reserved as the validation set.
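The 80/20 split can be reproduced in outline with scikit-learn. The sketch below uses placeholder features and only the class counts reported above (166 without dementia, 76 with). The `stratify` argument and the random seed are assumptions added here to keep the class ratio similar in both parts. The text only states that the split was random.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the reported class counts: 166 "No" (1), 76 "Dementia" (2).
y = np.array([1] * 166 + [2] * 76)
X = np.arange(len(y)).reshape(-1, 1)  # stand-in for the feature matrix

# test_size=0.2 reserves 20% of the rows; stratify=y is an assumption.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(len(X_train), len(X_val))
```

With 242 rows, a 20% hold-out rounds to 49 patients, leaving 193 for training, which matches the counts given in the text.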

Feature selection with RFECV

To optimize model performance and enhance interpretability, we employed Recursive Feature Elimination with Cross-Validation (RFECV)¹⁴ with a Random Forest estimator for feature selection. This approach iteratively removes the least informative feature while ensuring generalizability through k-fold cross-validation.¹⁵ This offers several advantages:

Efficient exploration: RFECV systematically eliminates features, progressively evaluating various combinations and avoiding exhaustive testing.

Generalizability: k-fold cross-validation mitigates overfitting by training the model on multiple subsets of the data, leading to more reliable estimates of feature importance.

Interpretability: By identifying the most relevant features through RFECV, we gain valuable insights into the key drivers of the model's predictions, facilitating effective interpretation.

RFECV procedure

1. Initialization: All features are included in the initial model.

2. Fold-wise training and evaluation: The data is divided into k folds.

For each fold, the model is trained on the remaining k–1 folds, excluding the current fold.

The Random Forest model's performance is then evaluated on the held-out fold, and feature importance is measured from the fitted model.¹⁶

3. Feature elimination: The feature with the lowest importance score is removed from the model across all cross-validation folds.

4. Iteration: Steps 2 and 3 are repeated until a desired number of features or stopping criterion is reached (e.g. minimal performance improvement on the validation set).

RFECV implementation

We implemented RFECV using the scikit-learn package in Python. The number of folds k used in cross-validation affects the robustness of feature importance estimates. While higher k values offer greater robustness, they also increase computational cost. For most datasets, k values between 5 and 10 are preferred.¹⁷ In this dataset, because of the class imbalance noted above, we chose k = 10.
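A minimal sketch of this setup with scikit-learn's `RFECV` is shown below. The data are synthetic (generated with `make_classification`), the feature count is reduced to 12 to keep the run short, and the forest size, scoring metric and seeds are illustrative assumptions rather than the study's settings. Only the Random Forest estimator and k = 10 come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced stand-in for the training set (193 rows).
X, y = make_classification(
    n_samples=193, n_features=12, n_informative=4,
    weights=[0.69, 0.31], random_state=0,
)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=20, random_state=0),
    step=1,  # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),  # k = 10
    scoring="accuracy",  # assumed metric
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("mask of kept features:", selector.support_)
```

Stratified folds are used here so that each fold keeps roughly the same class ratio, which is one common way to handle imbalance in cross-validation.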

Tabular Prior-Data Fit Network (TabPFN)

TabPFN¹⁸ emerges as a cutting-edge approach to supervised classification tasks on tabular data, harnessing the power of Prior-Data Fitted Networks (PFNs). PFNs challenge traditional model training by embracing a vast knowledge base derived from simulated datasets, enabling rapid and insightful inference on new data. Key concepts of PFNs include:

Prior-Data Fitted Networks (PFNs)¹⁹ challenge the traditional model training paradigm with a Bayesian-inspired approach. Instead of relying on a single training dataset, PFNs leverage knowledge from a vast collection of simulated datasets. This pre-training, combined with their unique single-pass inference capability through Bayesian inference, makes them a promising alternative for analyzing small tabular data. Table 2 explains how the PFN model was trained by Fitting Prior-Data.

Pre-training on diverse simulated data: A PFN is pre-trained on a collection of diverse simulated datasets, each with its prior distribution. This allows the network to learn a wide range of potential relationships between features and labels. Figure 1 illustrates the key components of PFNs.

Using Transformer: PFNs typically utilize Transformer architectures,²⁰ known for their effectiveness in language processing and their ability to capture complex relationships. PFNs leverage a slightly modified version that's permutation invariant. This means the model can handle input sequences in any order, eliminating the need for positional encoding. These adapted Transformers play a crucial role in learning to represent the posterior distribution of model parameters for each simulated task during pre-training.

In-context inference with new data: When presented with a new dataset, the PFN takes the data and some test features as input and executes a single forward pass through the network. This single pass estimates the posterior distribution for the new data point, effectively capturing the uncertainty alongside the prediction.

Figure 1.

Concept of prior-data fitted networks (PFNs).¹⁹

Table 2.

Explained PFN algorithm.

Input: A prior distribution over datasets $p(D)$, from which samples can be drawn, and the number of samples $K$ to draw

Output: A model $q_{\theta}$ that will approximate the PPD

for $j \leftarrow 1$ to $K$ do

    Sample $D \cup \{(x_{i}, y_{i})\}_{i=1}^{m} \sim p(D)$;

    Calculate an approximation of the stochastic loss $\bar{l}_{\theta} = \sum_{i=1}^{m} \left(-\log q_{\theta}(y_{i} \mid x_{i}, D)\right)$;

    Update parameters $\theta$ with stochastic gradient descent on $\nabla_{\theta} \bar{l}_{\theta}$;

end for

TabPFN builds upon the foundation of PFNs, specifically designed to excel in the realm of tabular data. It distinguishes itself with two key modifications:

Optimized attention: To address the computational bottleneck of PFNs during inference, TabPFN employs meticulously crafted attention masks. These masks strategically allocate attentional resources, resulting in significantly faster predictions compared to traditional PFNs.

Zero-Padding: Furthermore, TabPFN embraces the diverse nature of tabular data by incorporating zero-padding techniques. This allows it to seamlessly handle datasets with varying numbers of features, offering unmatched flexibility and adaptability.

Hybrid LightGBM–TabPFN

In modern datasets, the prevalent issue of missing data poses a pervasive challenge. This common occurrence emerges when specific observations or variables lack recorded information, creating a substantial obstacle as part of the process of extracting insights and meaning from the data. The origins of missing data are diverse, encompassing human error during data collection and the inherent characteristics of certain data sources. Missing values can introduce bias, potentially skewing the quality of learned patterns and/or the performance of classification tasks.²¹

Despite initial preprocessing and the removal of features with over 50% missing values, several features in the dataset continued to exhibit high null percentages. Notably, the “DEM_UPDRS_TSCR” feature contained 48.76% missing values. To address this issue, we employed ML-based techniques for missing value imputation.²²

To evaluate how well ML algorithms predict missing values, we compared five state-of-the-art algorithms on a regression task using the “DEM_UPDRS_TSCR” feature. The evaluated algorithms include Random Forest, LightGBM, Gradient Boosting, XGBoost, and Extra Trees. Notably, both Random Forest and LightGBM achieved an identical R-squared value of 0.58. However, LightGBM was considerably faster, completing the task in 0.08 seconds compared with 0.27 seconds for Random Forest, while its Root Mean Square Error (RMSE) was nearly identical (11.39 versus 11.37). Consequently, LightGBM was selected as the ML model for predicting missing values due to its efficient performance. Further details and comprehensive evaluation results are outlined in Table 3.

Table 3.

Evaluation results on predicting the “DEM_UPDRS_TSCR” feature.

Model	R-squared	RMSE	Time taken (s)
RandomForestRegressor	0.58	11.37	0.27
LGBMRegressor	0.58	11.39	0.08
GradientBoostingRegressor	0.57	11.54	0.16
XGBRegressor	0.43	13.21	0.19
ExtraTreesRegressor	0.46	12.87	0.20

Bold values indicate the highest R-squared, the lowest RMSE, and the lowest time taken.

After selecting LightGBM as the model for missing value prediction, we proceed as follows:

Data preparation:

Assume the dataset is a $n \times p$ -dimensional data matrix.

Sort the features ( $s = 1, \dots, p$ ) based on the ascending number of missing values, prioritizing those with fewer missing values.

Iterative imputation:

For each feature with missing values, we separate the data into two parts:

$X_{\mathrm{missing}}$: Contains instances with missing values for the current feature.

$X_{\mathrm{training}}$: Contains instances with observed values for the current feature, used for training.

Further split $X_{\mathrm{training}}$ into predictor and target variables for the model:

$x_{\mathrm{train}}^{(s)}$: Predictor variables for the current feature.

$y_{\mathrm{train}}^{(s)}$: Target variable (the current feature with observed values).

LightGBM is trained on $x_{\mathrm{train}}^{(s)}$ and $y_{\mathrm{train}}^{(s)}$. The trained model is then used to predict the missing values in $X_{\mathrm{missing}}$, generating imputed values $X_{\mathrm{imputation}}$. Combine $X_{\mathrm{imputation}}$ with the observed values in $X_{\mathrm{training}}$ to form a new, more complete dataset. Repeat the process for the next feature with missing values until all missing values have been imputed. The next step involves applying TabPFN to the processed dataset. The pseudocode of the algorithm is illustrated in Table 4.

Table 4.

Pseudocode for using LightGBM to predict missing values.

Require: Sorted $X$, an $n \times p$ matrix, with $s = 1, \dots, p$

for $s$ from 1 to $p$

    Split the data into training data ($X_{\mathrm{training}}$) and missing data ($X_{\mathrm{missing}}$);

    Separate $X_{\mathrm{training}}$ into the features ($x_{\mathrm{train}}^{(s)}$) and target variable ($y_{\mathrm{train}}^{(s)}$) for the training data with k-fold cross-validation;

    if this column is categorical type: use the LightGBM classification model

    else this column is continuous type: use the LightGBM regression model

    Encode categorical labels in $y_{\mathrm{train}}^{(s)}$;

    Fit LightGBM: $y_{\mathrm{train}}^{(s)} \sim x_{\mathrm{train}}^{(s)}$;

    Predict $y_{\mathrm{missing}}^{(s)}$ using $x_{\mathrm{missing}}^{(s)}$;

    $X_{\mathrm{imputation}} \leftarrow$ update imputed matrix, using the predicted $y_{\mathrm{missing}}^{(s)}$;

    Combine $X_{\mathrm{training}}$ and $X_{\mathrm{missing}}$ with $X_{\mathrm{imputation}}$;

    return the column with imputed values

end for

return dataset with missing values filled by the LightGBM algorithm

Traditional imputation techniques

To evaluate the impact of our proposed missing value imputation method on model performance for dementia prediction in PD patients, we compared its efficacy with well-known techniques. This analysis allows us to identify the most suitable method by evaluating each approach's relative strengths and weaknesses.

K-Nearest neighbors imputation for missing value handling

The K-Nearest Neighbors (KNN) imputation method utilizes the KNN algorithm to estimate and substitute missing data values.²³ This method identifies the k nearest neighbors of a data point with a missing value. These neighbors are the data points within the dataset that exhibit the greatest similarity to the point in question, based on the available features. This similarity is often quantified using a distance metric, such as the Euclidean distance captured in Equation (1). By leveraging the information contained within these k nearest neighbors, the KNN imputation method can predict a suitable value to replace the missing data point.

d_{(p, q)} = \sqrt{\sum_{j=1}^{s} (p_{j} - q_{j})^{2}}

(1)

where $d_{(p, q)}$ is the Euclidean distance, $j$ is the data attribute with $j = 1, 2, 3, \dots, s$, $s$ is the data dimension, $p_{j}$ is the value of the $j$-th attribute of the instance containing missing data, and $q_{j}$ is the value of the $j$-th attribute of another instance containing complete data.

The KNN imputation method, while valuable for handling missing data, presents two noteworthy challenges. The first challenge lies in determining the optimal value for the parameter k, which represents the number of nearest neighbors used for imputation. Selecting an excessively small k-value may limit the information available for accurate prediction, while a large k-value could introduce noise from dissimilar neighbors. The second challenge concerns the dataset-specific nature of KNN imputation. The selection of the most suitable k neighbors may vary depending on the characteristics of the specific dataset under analysis.²⁴ To address these challenges, we explored a range of k values. While research²⁵ by Lall et al. suggests $k = \sqrt{n}$ for $n > 100$ as a potential starting point, we focused on a narrower range from 1 to 17, specifically odd numbers. To facilitate the exploration of different k values, we employed the “KNNImputer” class from the “sklearn.impute” module in Python. This class offers a convenient and efficient way to implement KNN imputation while allowing us to experiment with various k options.
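The search over odd k values can be sketched as below. The data are random placeholders with about 10% of entries removed, and the sketch only produces one completed matrix per k. How the candidates were then scored is not specified here, so no selection step is shown.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Random placeholder data with roughly 10% of the entries set to missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
X[rng.random(X.shape) < 0.1] = np.nan

# One completed matrix per odd k in 1, 3, ..., 17.
imputed = {}
for k in range(1, 18, 2):
    imputed[k] = KNNImputer(n_neighbors=k).fit_transform(X)

print(sorted(imputed))
```

For reference, the $\sqrt{n}$ heuristic would give roughly 15.6 for n = 242, which falls inside the explored range.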

Multiple imputation by chained equations

Multiple Imputation by Chained Equations (MICE) offers a robust approach to managing missing data within datasets. As outlined in research by Azur et al.,²⁶ MICE iteratively constructs multiple complete versions of the data to address this challenge. Initial imputation fills missing entries with placeholders (means for continuous, most frequent category for categorical data).

MICE then employs an iterative loop focused on one variable with missing data at a time. Within each iteration, a statistical model is built for the target variable leveraging relationships with other variables. The model type (e.g. linear regression) depends on the target variable's nature. Once constructed, the model predicts missing values, effectively updating the data. The loop iterates through all variables, repeating model building, prediction and imputation. Crucially, each iteration incorporates previous imputed values, creating a chain of dependencies. Convergence signifies consistent imputed values and stable parameter estimates. The final imputed values replace the original missing data, resulting in a complete dataset for analysis. MICE assumes Missing at Random (MAR) data, where missingness depends only on observed data.
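One way to approximate this procedure in Python is scikit-learn's `IterativeImputer`, an experimental, MICE-inspired estimator. The sketch below is an assumption about tooling, not a statement of what was used in this study. It draws five completed datasets from random placeholder data by sampling from the posterior with different seeds.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Random placeholder data with two correlated columns and ~15% missing entries.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] += 0.8 * X[:, 0]
mask = rng.random(X.shape) < 0.15
X[mask] = np.nan

# sample_posterior=True makes each run a random draw, so different seeds
# yield different completed datasets, as in multiple imputation.
completed = [
    IterativeImputer(max_iter=10, sample_posterior=True, random_state=seed)
    .fit_transform(X)
    for seed in range(5)
]

print(len(completed), completed[0].shape)
```

Each completed dataset keeps the observed entries untouched and differs only in the imputed cells; downstream estimates would normally be pooled across the copies.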

Simple imputation

Simple imputation offers a straightforward approach to handling missing data in datasets. It replaces missing entries with a single value derived from the observed data, typically a measure of central tendency of the variable with missing values.

For continuous variables (numerical data), mean imputation fills in missing entries with the average of the observed values. This assumes the missing values are randomly distributed around the mean. For categorical variables, mode imputation replaces missing entries with the most frequent category. This assumes missing values are most likely to belong to the most common category.

This research will specifically implement mean imputation for numerical data features, where the mean of the observed values in each feature will replace missing entries. Similarly, mode imputation will be applied to categorical data features, where the most frequent category within each feature will replace missing entries.
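A minimal sketch of mean and mode imputation with scikit-learn's `SimpleImputer` follows. The two toy columns are made up for illustration, and the choice of `SimpleImputer` is an assumption.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy columns: one numerical, one categorical code, each with one gap.
numeric = np.array([[1.0], [3.0], [np.nan], [5.0]])
categorical = np.array([[1.0], [2.0], [2.0], [np.nan]])

# Mean imputation for the numerical column: mean of 1, 3, 5.
numeric_filled = SimpleImputer(strategy="mean").fit_transform(numeric)

# Mode imputation for the categorical column: most frequent of 1, 2, 2.
categorical_filled = SimpleImputer(strategy="most_frequent").fit_transform(categorical)

print(numeric_filled.ravel())      # the gap becomes 3.0
print(categorical_filled.ravel())  # the gap becomes 2.0
```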

Shapley Additive exPlanations (SHAP)

SHAP,²⁷ which is based on game theory,²⁸ is a popular local explanation²⁹ and model-agnostic approach. It is designed to elucidate predictions made by any “black-box” classifier.³⁰ It provides interpretable and faithful explanations for individual predictions by locally learning an interpretable model (e.g. a linear model) around each prediction. More precisely, SHAP estimates feature attributions on individual instances, effectively capturing the contribution of each feature to the “black-box” prediction.

The present study employed SHAP to elucidate the feature-level contributions within a hybrid model comprising n features grouped as N. This approach assigns Shapley values, quantifying the marginal impact of each feature on the model's final prediction. The underlying methodology adheres to principles of fairness, ensuring equitable attribution of credit to individual features. This safeguards against biases arising from feature interactions or model complexity, thus fostering a nuanced understanding of the model's inner workings, as shown in Equation (2).

\phi_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]

(2)

where $v(S)$ is the value (model output) attributed to the feature subset $S$. Equation (3) defines the explanation model $g$, a linear function of binary variables that isolates the contribution $\phi_{i}$ of each feature, allowing us to determine its specific impact within the overall prediction process.

g(z') = \phi_{0} + \sum_{i=1}^{M} \phi_{i} z'_{i}

(3)

Here $z' \in \{0, 1\}^{M}$, where $z'_{i}$ takes the value of 1 when feature $i$ is observed and 0 otherwise, and $M$ represents the total number of input features.

SHAP values are often visualized using force plots or bee-swarm plots. In a force plot, each feature receives an arrow (a positive Shapley value pushes the prediction up, a negative one pulls it down). Starting from the base value, the arrows sum to the final prediction, resembling a balanced scale. These visualizations reveal patterns, outliers and potential biases in the model's decision-making process.
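For a handful of features, the Shapley formula in Equation (2) can be evaluated exactly by enumerating all subsets. The sketch below does this for a made-up value function `v`. The feature names, their weights and the interaction term are hypothetical and chosen only to make the arithmetic easy to follow. Practical SHAP implementations rely on approximations or model-specific algorithms instead of brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values by enumerating every subset S of N minus {i}."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):  # |S| runs from 0 to n - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def v(subset):
    """Hypothetical value function: additive weights plus one interaction."""
    weights = {"age": 2.0, "mmse": 5.0, "updrs": 1.0}
    value = sum(weights[f] for f in subset)
    if {"age", "mmse"} <= subset:
        value += 1.0  # bonus when both features are present together
    return value


features = ["age", "mmse", "updrs"]
phi = shapley_values(features, v)
print(phi)
print(sum(phi.values()), v(frozenset(features)) - v(frozenset()))
```

The two numbers on the last line agree: the attributions add up to the difference between the full prediction and the empty-set baseline, and the interaction bonus is shared equally between the two features that create it.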

Benchmarking the hybrid LightGBM–TabPFN architecture

To thoroughly assess the effectiveness of the hybrid LightGBM–TabPFN architecture we propose, we conducted a benchmarking analysis against various prominent baseline models in ML and DL. This comparison allowed us to establish performance benchmarks. The ML and DL baseline models utilized in this investigation include:

XGBoost,³¹ or eXtreme Gradient Boosting, extends gradient boosting by employing a unique regularization term (e.g. L1/L2) and parallel computing to achieve superior accuracy across a diverse range of tasks, including regression, classification and ranking.

LightGBM³² builds upon the Gradient Boosting Decision Tree (GBDT) with innovative techniques like Gradient-based One-Side Sampling and the Histogram-based Algorithm. These methods accelerate training time, reduce memory usage and ultimately enhance the precision of its GBDT model.

Random Forest¹⁶ is an ensemble method that combines numerous decision trees. Known for its robustness and accuracy, this technique aggregates the predictions of diverse trees to deliver reliable results, demonstrating resilience against noise and overfitting.

Bagging Classifier,³³ an ensemble learning method rooted in Bootstrap Aggregating, is adept at improving the robustness and accuracy of classification tasks. By constructing multiple classifiers through bootstrap sampling, where the training process is conducted on each respective model, Bagging mitigates overfitting and enhances predictive performance. The amalgamation of diverse predictions from these classifiers yields a more resilient and accurate overall classifier.

AdaBoost³⁴ stands for “Adaptive Boosting.” Unlike typical approaches, AdaBoost enhances the capabilities of its decision tree learners by emphasizing difficult-to-classify examples. In an iterative fashion, each subsequent learner addresses the errors of its forerunners, culminating in the creation of a resilient and precise powerhouse tailored for intricate classification tasks.

ExtraTree,³⁵ short for Extremely Randomized Trees, is an ensemble learning method within the decision tree algorithm class. Similar to Random Forests, it builds an ensemble of decision trees, although in its original formulation each tree is grown on the whole training sample rather than a bootstrap sample. What distinguishes ExtraTree is its approach to constructing individual trees. Unlike its counterparts, ExtraTree selects split points for nodes by choosing them randomly instead of exhaustively searching for optimal splits among selected features. This heightened randomness enhances model robustness, encouraging more variety among the individual trees in the ensemble.

HyperTab³⁶ is a DL framework designed to address challenges posed by limited sample problems on tabular datasets. It leverages the power of hypernetworks,³⁷ where a neural network learns to generate the weights for another network. Unlike traditional DL models prone to overfitting with scarce data, HyperTab's hypernetwork architecture facilitates data-efficient learning on tabular datasets. This allows for accurate model construction and reliable predictions, even with limited samples.
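A benchmarking loop of this kind can be sketched with scikit-learn alone. The example below covers only the four baselines that ship with scikit-learn (Random Forest, Bagging, AdaBoost and Extra Trees). XGBoost, LightGBM, HyperTab and TabPFN require their own packages and are left out. The data are synthetic, and the 5-fold cross-validation, AUC scoring and default hyperparameters are illustrative assumptions, not the study's protocol.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 242 rows, 17 features, roughly 69/31 class balance.
X, y = make_classification(
    n_samples=242, n_features=17, n_informative=6,
    weights=[0.69, 0.31], random_state=0,
)

baselines = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
}

# Same folds for every model so the comparison is like for like.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in baselines.items()
}

for name, auc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean AUC = {auc:.3f}")
```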

Model evaluation metrics

To assess the effectiveness of these models, we used standard classification metrics: accuracy, precision, recall, F1-score, and AUC, as detailed in Equations (4)–(8). True positives (TP) are dementia cases correctly predicted as dementia, while false positives (FP) are non-dementia cases incorrectly predicted as dementia. Conversely, true negatives (TN) are non-dementia cases correctly predicted as non-dementia, whereas false negatives (FN) are dementia cases that the model missed, an error with potentially significant consequences. These four counts form the foundation for the performance measures below, enabling a comprehensive evaluation of the model's ability to separate the classes.

Accuracy: This pivotal metric gauges the proportion of correctly classified samples. It offers a general measure of a model's effectiveness in distinguishing between positive (dementia) and negative (non-dementia) outcomes, providing a straightforward assessment of overall predictive correctness.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(4)

Precision: This metric measures the proportion of a model's positive predictions that are correct, emphasizing the reliability of affirmative classifications.

\mathrm{Precision} = \frac{TP}{TP + FP}

(5)

Recall: A critical metric in classification tasks, recall assesses the model's capacity to correctly identify positive cases. Maximizing recall means minimizing FNs, that is, instances where actual dementia cases remain undetected. Recall is particularly pivotal in settings where FNs carry significant consequences.

\mathrm{Recall} = \frac{TP}{TP + FN}

(6)

F1-Score: For tasks that must balance precision and recall, the F1-score is a preferred metric. As the harmonic mean of the two, it accounts for both kinds of misclassification: mistakes of omission (FN) and mistakes of commission (FP).

\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(7)

AUC: The Area Under the Curve measures the degree of separability between classes and summarizes the Receiver Operating Characteristic (ROC) curve, a graph depicting classification performance across decision thresholds. A higher AUC value indicates better discrimination between the classes. Here TPR(t_i) and FPR(t_i) denote the true positive rate and false positive rate at decision threshold t_i, and the integration limits correspond to lowering the threshold from 1 to 0:

\mathrm{ROC\ AUC} = \int_{1}^{0} \mathrm{TPR}(t_{i}) \, d\left(\mathrm{FPR}(t_{i})\right)

(8)
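As an illustration of Equation (8), the integral can be approximated with the trapezoidal rule by sweeping the decision threshold from the highest predicted score to the lowest. The sketch below uses only the Python standard library; the function name `roc_auc` and the toy labels and scores are assumptions made for the example and are not taken from the study data.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule (binary labels 0/1)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Sweep the decision threshold from the highest score to the lowest.
    thresholds = sorted(set(scores), reverse=True)
    points = [(0.0, 0.0)]  # (FPR, TPR) when nothing is predicted positive
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    # Sum the trapezoids between consecutive ROC points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 4))
```

In this toy example, 8 of the 9 positive-negative pairs are ranked correctly, so the area equals 8/9.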

Results

Results of feature selection

In this study, RFECV selected features using 10-fold cross-validation with accuracy as the scoring metric. The highest accuracy was obtained with 17 features chosen from a total of 36 variables. The selected features were DEM_AGE, DEM_EDU, DEM_COFFEE, DEM_DM, DEM_PDFAM, DEM_ADDEMFAM, DEM_PI, DEM_LMC, DEM_RBD, DEM_KMMSE_SCR, DEM_KMOCA_SCR, DEM_DEPRESSION, DEM_CDR_GSCR, DEM_CDR_SSCR, DEM_KIADL_SCR, DEM_UPDRS_TSCR and DEM_SEADL_PCT.

Evaluation on validation set

Hyperparameter optimization (HPO) plays a pivotal role in building robust ML models. Hyperparameters are settings chosen before training that profoundly shape model behavior and generalization. HPO is a systematic search for the configuration of these parameters that gives the best model performance on a specific dataset. While numerous HPO strategies exist, contemporary techniques such as Bayesian optimization have gained prominence for their efficiency and speed. In this study, we used Optuna [23] (version 3.5.0), an open-source Python library. Optuna provides both Bayesian optimization and random search algorithms for fine-tuning hyperparameters, and it integrates with prominent ML frameworks such as TensorFlow and PyTorch, which simplifies optimization within the model development workflow. Table 5 lists the optimized hyperparameters for each model.
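To make the search loop concrete, the sketch below implements plain random search, one of the strategies mentioned above, using only the Python standard library rather than Optuna itself. It is not the authors' tuning code: `validation_score` is a hypothetical stand-in for a cross-validated accuracy, and the search ranges and the location of its peak are assumptions chosen purely for illustration.

```python
import math
import random

def validation_score(params):
    """Hypothetical stand-in for a cross-validated accuracy.

    Peaks at learning_rate = 0.02 and max_depth = 6 (illustrative values).
    """
    lr_term = (math.log10(params["learning_rate"]) - math.log10(0.02)) ** 2
    depth_term = ((params["max_depth"] - 6) / 10.0) ** 2
    return 1.0 - lr_term - depth_term

def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Sample the learning rate log-uniformly in [1e-3, 1e-1].
            "learning_rate": 10 ** rng.uniform(-3, -1),
            "max_depth": rng.randint(2, 12),
        }
        score = validation_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(n_trials=200)
print(best_params, round(best_score, 4))
```

A Bayesian optimizer differs from this loop in that it uses the scores of earlier trials to decide where to sample next, which usually reaches a good configuration in fewer trials.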

Table 5.

Optimized hyperparameters of each model.

Model	Hyperparameters
ExtraTree	n_estimators: 102, max_depth: 13, max_leaf_nodes: 19, criterion: entropy
LightGBM	num_leaves: 5, max_depth: 6, learning_rate: 0.02, n_estimators: 1002, class_weight: balanced, min_child_samples: 33, subsample: 0.9976, colsample_bytree: 0.83, reg_alpha: 0.01, reg_lambda: 0.05
BaggingClassifier	n_estimators: 100, max_samples: 0.1, max_features: 0.1
Random Forest	n_estimators: 200, max_features: sqrt, max_depth: 50, min_samples_split: 8, min_samples_leaf: 4
AdaBoost	n_estimators: 350, learning_rate: 0.018
XGBoost	lambda: 0.0062, alpha: 0.0737, colsample_bytree: 0.4, subsample: 0.8, learning_rate: 0.02, max_depth: 5, random_state: 2020, min_child_weight: 2
HyperTab	test_nodes: 250, epochs: 10, hidden_dims: 5

Random Forest: Random Forest Classification model; Extra Trees: Extra Tree Classification model; AdaBoost: AdaBoost Classification model; LightGBM: Light Gradient Boosting Machine Classification model; XGBoost: Extreme Gradient Boosting Classification model.

Following optimization, a thorough evaluation of each model's performance on the validation set was conducted. To prevent overfitting and ensure robust performance on imbalanced data, we implemented a 10-fold cross-validation strategy.
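The splitting logic behind 10-fold cross-validation can be sketched as follows. This is a minimal, unstratified version written with the standard library only; the function name `k_fold_indices` is an assumption, and the sample count of 193 (242 patients minus the 49 in the hold-out test set) is used only as an illustrative size. For imbalanced data, a stratified variant that preserves the class ratio in every fold would normally be preferred.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute samples so that fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        # Every fold except the i-th one forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=193, k=10))
print([len(val) for _, val in splits])
```

Each sample appears in exactly one validation fold, so averaging a metric over the ten folds uses every training sample once for validation.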

Our hybrid LightGBM–TabPFN model is the standout performer on the validation set, with the highest accuracy (0.9378), AUC (0.9381) and F1 score (0.9393). These results highlight the hybrid model's ability to classify instances accurately, distinguish between classes, and balance precision and recall. XGBoost follows with an accuracy of 0.9234, an F1 score of 0.8764 and the highest precision (0.9332). HyperTab attains perfect recall (1.000), ahead of the hybrid TabPFN (0.9615), but at the cost of the lowest precision (0.6471) and accuracy (0.8776). Extra Trees, Random Forest, Bagging, AdaBoost and LightGBM demonstrate moderately good performance across the evaluated metrics. Table 6 summarizes the detailed results.

Table 6.

Performance on the validation set.

Model	Accuracy	AUC	F1 Score	Precision	Recall
Extra Trees	0.9076	0.8861	0.8509	0.9100	0.8190
Random Forest	0.9181	0.9060	0.8759	0.9000	0.8667
Bagging	0.9079	0.8906	0.8594	0.9053	0.8357
AdaBoost	0.9021	0.8934	0.8593	0.8672	0.8642
LightGBM	0.9126	0.9010	0.8740	0.9024	0.8643
XGBoost	0.9234	0.9033	0.8764	0.9332	0.8380
Hybrid TabPFN	0.9378	0.9381	0.9393	0.9237	0.9615
HyperTab	0.8776	0.9211	0.7857	0.6471	1.000

Boldface indicates the highest values for accuracy, AUC, precision, recall, and F1 score, respectively, among the selected models.

Random Forest: Random Forest Classification model; Extra Trees: Extra Tree Classification model; AdaBoost: AdaBoost Classification model; LightGBM: Light Gradient Boosting Machine Classification model; XGBoost: Extreme Gradient Boosting Classification model.

Evaluation performance on testing set

After fine-tuning and training, we evaluated model performance on an unseen hold-out test set. On the validation set, our hybrid LightGBM–TabPFN model outperformed the baseline models on accuracy, AUC and F1 score. Its effectiveness was further corroborated during testing, as shown by the confusion matrix of the Hybrid TabPFN model on the hold-out test set (Figure 2). This matrix visualizes the model's performance in classifying dementia and non-dementia cases: each cell shows the number of patients that fall into a specific combination of predicted and actual labels. Two patients were incorrectly classified as dementia although they belonged to the non-dementia class (false positives). Overall, the model correctly classified 47 of 49 patients: it correctly predicted 36 of 38 non-dementia patients and identified all 11 dementia cases.

Figure 2.

Confusion matrix of the proposed hybrid LightGBM–TabPFN model on the hold-out test set.
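The counts reported for Figure 2 can be plugged into Equations (4)–(7) as a worked check. The sketch below is a plain-Python illustration; the function name `classification_metrics` is an assumption, while the counts (11 true positives, 2 false positives, 36 true negatives, 0 false negatives) follow from the description of the confusion matrix in the text.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Counts read from the description of Figure 2:
# all 11 dementia cases detected, 36 of 38 non-dementia cases correct.
metrics = classification_metrics(tp=11, fp=2, tn=36, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the results are 0.9592, 0.8462, 1.0000 and 0.9167, which agree with the accuracy, precision, recall and F1 score reported for the Hybrid TabPFN model on the test set.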

Based on Table 7, the Hybrid TabPFN model shows strong generalization, with an accuracy of 0.9592, an AUC of 0.9737 and an F1 score of 0.9167, indicating its potential for real-world applications. Bagging and Extra Trees also perform well, with accuracies of 0.9388 and 0.9184 and AUCs of 0.9282 and 0.9151, respectively. In contrast, renowned models such as LightGBM and XGBoost fall short, underscoring the importance of evaluating models on domain-specific datasets. HyperTab, though slightly below the top performers, achieves a balanced F1 score (0.8552), which is valuable in scenarios where both false positives and false negatives matter. The hybrid TabPFN's perfect recall (1.0000), identifying all positive cases, is critical, but its precision (0.8462) warrants further investigation. Finally, the high AUC achieved by the Hybrid TabPFN model suggests a strong ability to discriminate between classes across different decision thresholds. This robustness strengthens its candidacy for real-world deployment, equipping medical professionals with a reliable tool for early prediction and timely intervention.

Table 7.

Performance on test set.

Model	Accuracy	AUC	F1 Score	Precision	Recall
Extra Trees	0.9184	0.9151	0.8333	0.7692	0.9091
Random Forest	0.8980	0.8696	0.7826	0.7500	0.8182
Bagging	0.9388	0.9282	0.8696	0.8333	0.9091
AdaBoost	0.8980	0.9019	0.8000	0.7143	0.9091
LightGBM	0.8571	0.8433	0.7200	0.6429	0.8182
XGBoost	0.8776	0.8565	0.7500	0.6923	0.8182
Hybrid TabPFN	0.9592	0.9737	0.9167	0.8462	1.0000
HyperTab	0.8635	0.8631	0.8552	0.8862	0.8352

Boldface indicates the highest values for accuracy, AUC, precision, recall, and F1 score, respectively, among the selected models.

Comparison to traditional imputation techniques

We evaluated the performance of KNN imputation on the dataset using different k values. The TabPFN model was then applied to the imputed data. Results for the validation set and hold-out test set are presented in Tables 8 and 9, respectively.
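For readers unfamiliar with the technique, KNN imputation can be sketched in a few lines. The version below is a simplified illustration written with the standard library only and is not the implementation used in this study: the function name `knn_impute`, the distance definition (root mean squared difference over the features observed in both rows) and the toy rows are assumptions. It also assumes that features are on comparable scales and that every feature is observed in at least one other row.

```python
import math

def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows."""
    def distance(a, b):
        # Root mean squared difference over features observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows) if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],   # missing third feature
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(rows, k=2)
print(filled[1])
```

With k = 2, the missing value is filled from the two closest rows and the distant fourth row is ignored; larger k values average over more, and potentially less similar, neighbors, which is why the choice of k affects downstream performance.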

Table 8.

TabPFN model performance with different k values on the validation dataset.

k value	Accuracy	AUC	F1 Score	Precision	Recall
1	0.9145	0.9151	0.9185	0.9005	0.9462
3	0.9342	0.9343	0.9375	0.9141	0.9692
5	0.9145	0.9151	0.9170	0.9133	0.9308
7	0.9185	0.9228	0.9250	0.9190	0.9385
9	0.9263	0.9266	0.9296	0.9205	0.9462
11	0.9263	0.9266	0.9296	0.9205	0.9462
13	0.9263	0.9266	0.9294	0.9099	0.9538
15	0.9200	0.9189	0.9211	0.9099	0.9385
17	0.9418	0.9420	0.9450	0.9200	0.9769

Boldface indicates the highest values for accuracy, AUC, precision, recall, and F1 score, respectively, among the selected models.

Table 9.

TabPFN model performance with different k values on the hold-out dataset.

k value	Accuracy	AUC	F1 Score	Precision	Recall
1	0.8980	0.9019	0.8000	0.7143	0.9091
3	0.8980	0.8696	0.7826	0.7500	0.8182
5	0.8776	0.8565	0.7500	0.6923	0.8182
7	0.8980	0.9019	0.8000	0.7143	0.9091
9	0.8776	0.8565	0.7500	0.6923	0.8182
11	0.8980	0.8696	0.7826	0.7500	0.8182
13	0.8980	0.8696	0.7826	0.7500	0.8182
15	0.9184	0.8828	0.8182	0.8182	0.8182
17	0.8980	0.8696	0.7826	0.7500	0.8182

Boldface indicates the highest values for accuracy, AUC, precision, recall, and F1 score, respectively, among the selected models.

Our experiments on the validation dataset revealed that the TabPFN classification model achieved the highest accuracy (0.9418) for predicting dementia in PD patients when k, the number of nearest neighbors, was set to 17. With this setting, the model also achieved the highest AUC (0.9420), F1-score (0.9450) and recall (0.9769). However, precision was highest (0.9205) when k was set to 9 or 11.

For the hold-out test set, the model performed best in accuracy (0.9184), F1-score (0.8182) and precision (0.8182) with k = 15, although AUC (0.9019) and recall (0.9091) were highest at k = 1 and k = 7. To ensure a fair comparison with our proposed hybrid method and other imputation methods, we adopt k = 15 for further analysis. Table 10 summarizes the performance comparison between the previously discussed traditional imputation methods and our proposed hybrid method on the hold-out test set.

Table 10.

Comparison of traditional imputation methods and the proposed hybrid method on the hold-out test set.

Model	Accuracy	AUC	F1 Score	Precision	Recall
k-NN (k = 15)	0.9184	0.8828	0.8182	0.8182	0.8182
MICE	0.8776	0.8242	0.7273	0.7273	0.7273
Simple Imputation	0.8980	0.8696	0.7826	0.7500	0.8182
Hybrid TabPFN	0.9592	0.9737	0.9167	0.8462	1.0000

k-NN: k Nearest Neighbors imputation with TabPFN Classifier; MICE: multiple imputation by chained equations with TabPFN; simple imputation: simple imputation with TabPFN.

Based on Table 10, the Hybrid TabPFN method outperforms all traditional imputation methods across all metrics. It achieves the highest accuracy (0.9592), AUC (0.9737) and F1 score (0.9167). It also attains the highest precision (0.8462) and perfect recall (1.0000), suggesting that it identifies all positive cases while keeping false positives low.

Evaluation of the interpretation of the proposed hybrid model with SHAP

To illustrate both the predictive behavior and the interpretability of our model, we present a case study in Figure 3. This case involves a patient who was predicted by the model to have dementia but who did not have dementia (a false positive). In the color-coded visualization, features that push the prediction toward Class 1 (Dementia) are shown in red. One of the strongest risk factors in this case is having a family member with dementia, which aligns with research on the genetics of dementia. According to Loy et al.,³⁸ 25% of all people aged 55 years and older have a family history of dementia, which can affect model prediction. Other features such as education level,³⁹ depression history,⁴⁰ and age⁴¹ have also been linked to dementia risk in prior research. Using the variable definitions in Table 1, the feature values for this patient are as follows:

KMMSE score: 13/30

Family history of dementia: 1 (Yes)

Clinical Dementia Rating Scale (DEM_CDR_SSCR): 4.5/5

Education: 0 (Elementary)

Korean Montreal Cognitive Assessment: 7.454/30

Age: 79

Depression: 1 (Yes)

Unified Parkinson's Disease Rating Scale (DEM_UPDRS_TSCR): 15/199.

Figure 3.

Feature importance waterfall plot for the false positive dementia case.
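A waterfall plot of this kind relies on the additive property of Shapley values: the per-feature contributions sum to the difference between the prediction for the patient and a baseline prediction. The sketch below computes exact Shapley values for a three-feature toy model by averaging marginal contributions over all feature orderings. It is an illustration only: `toy_risk` is a hypothetical scoring function and not the LightGBM–TabPFN model, the baseline values are assumptions, and practical SHAP implementations rely on approximations rather than full enumeration.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, with absent features set to `baseline`."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for j in order:
            # Reveal feature j and record its marginal contribution.
            current[j] = x[j]
            value = model(current)
            phi[j] += value - prev
            prev = value
    return [p / len(orderings) for p in phi]

def toy_risk(features):
    """Hypothetical risk score with one interaction term; not the study's model."""
    age, family_history, depression = features
    return 0.01 * age + 0.2 * family_history + 0.1 * depression * family_history

x = [79, 1, 1]          # illustrative patient: age 79, family history, depression
baseline = [65, 0, 0]   # illustrative reference values
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 3) for p in phi])
```

In this toy model the interaction between family history and depression is split between the two features, and the three contributions add up exactly to the gap between the patient's score and the baseline score, which is what the bars of a waterfall plot depict.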

Figure 4 illustrates the bee-swarm plot depicting the model's SHAP values and their impact on the model output. The horizontal axis signifies the predictive influence of each feature, either positively (indicating dementia) or negatively (indicating non-dementia) for an individual. Furthermore, the bee-swarm chart employs color coding for each dot, highlighting how the feature value for an individual compares to the average for the entire population.

Figure 4.

Bee-swarm plot of the proposed hybrid LightGBM–TabPFN model on the hold-out test set.

The figure reveals the contributing factors, in order of importance, for predicting dementia in PD patients. Details about these factors can be found in Table 1. Consistent with these findings, research³⁹ by Sharp et al. has shown a link between lower education (DEM_EDU) and increased risk of dementia. Furthermore, the presence of depression (DEM_DEPRESSION) aligns with established knowledge of the connection between depression and dementia, especially in later life.⁴⁰ By considering these key features, both doctors and AI experts can gain valuable insights to address and potentially mitigate model bias.

The two figures not only showcase the predictive capability of the hybrid LightGBM–TabPFN model for individual patients but also reveal the collective influence of each feature on the model's output. This assists healthcare professionals in grasping the reasoning behind the model's predictions without requiring specialized knowledge in AI. Improving this interpretability is pivotal for instilling trust in AI and expanding its practicality for deployment in clinical settings.

Discussion

Our hybrid LightGBM–TabPFN model, complemented by SHAP analysis, has demonstrated the ability to visualize the impact of clinical features on predictions and accurately assess the probability of dementia conversion. Comparatively, our classifier exhibits superior performance when contrasted with seven other models, showcasing a noteworthy AUC score that fortifies its predictive strength for PD dementia.

Moreover, the adaptability of our approach suggests ease of deployment in clinical settings. The methods employed for handling missing values in our study hold potential applicability across the broader medical research domain, contributing robustness to models dealing with datasets featuring missing values. Although initially tested in a binary classification context, we anticipate extending this technique to address multi-class problems.

Successful integration of the hybrid LightGBM–TabPFN model into existing clinical workflows is paramount for widespread adoption by physicians. This is particularly important given the phenomenon of electronic health record (EHR) burnout.⁴² To achieve this integration, future research will explore potential avenues for incorporating the model's functionality. This may involve embedding the model directly within the EHR system or developing a user-friendly mobile application interface specifically tailored for healthcare professionals. By readily presenting the model's predictions during patient consultations, such integration can significantly enhance clinical efficiency.

It is crucial to emphasize that the model's primary function is to augment, not supplant, clinical judgment. The model's output should be viewed as a valuable tool to inform and support physician decision-making processes. Envisioned scenarios for model utilization include flagging high-risk patients for further evaluation or prioritizing specific diagnostic tests based on the model's predictions. Ultimately, the responsibility for diagnosis and treatment planning would remain firmly within the domain of the healthcare professional.

It's important to acknowledge a limitation in our research—our model was exclusively evaluated on a single dataset. While internal validation through data splits demonstrated promising accuracy, broader acceptance by clinicians requires robust external validation on geographically and demographically diverse patient populations beyond PD. This external validation should ideally be conducted in real-world clinical settings, incorporating the model into existing workflows and evaluating its impact on physician decision-making and patient outcomes.

Although SHAP aids understanding, the complexity of the LightGBM–TabPFN model can hinder explaining its decision-making process for clinical use. This aligns with the concept of causability in interpretability research, as highlighted by Holzinger et al.⁴³ Causability emphasizes achieving a specific level of causal understanding for human experts, considering effectiveness, efficiency and satisfaction in a given context. To address this, future research will explore methods that incorporate not only SHAP but also expert explanations. This combined approach could provide a more comprehensive and causally grounded understanding of the model's reasoning for clinicians.

While RFECV demonstrated effectiveness in our feature selection process, it is important to acknowledge limitations inherent to this approach, particularly regarding interpretability of the selected features for complex models. Given the high dimensionality of the dataset, future research should involve a comprehensive evaluation of various feature selection methods. This could potentially lead to the identification of an even more effective feature subset that optimizes the performance of dementia prediction models in PD patients. We further acknowledge that achieving the optimal balance between a model's predictive power and the clinical interpretability of its features remains an ongoing area of exploration.

In future endeavors, our objective is to combine insights derived from both clinical data and neuroimaging studies, establishing a more comprehensive prediction framework. This multimodal approach holds the potential to augment accuracy and reliability, ultimately advancing patient diagnosis and facilitating the development of personalized treatment strategies.

Conclusion

In conclusion, our study introduces an innovative hybrid model, the LightGBM–TabPFN, enriched by SHAP analysis. This novel approach significantly contributes to the field, offering advanced capabilities for predictive modeling. The inclusion of SHAP analysis not only improves the model's interpretability but also yields valuable insights into how clinical features impact predictions. Demonstrating superior performance compared to seven state-of-the-art alternative models, our hybrid framework achieved an accuracy of 0.9592 and an AUC of 0.9737 on the test set. This success holds promise for accurately predicting dementia conversion in PD patients. The adaptability of our approach, incorporating robust techniques to address missing data and effectively utilize limited dataset size, suggests its potential usefulness in various clinical settings. Our research lays the groundwork for future investigations, encompassing the extension of this technique to multi-class problems and the integration of clinical-feature-dependent models with other dementia imaging classifiers. Overall, our study makes a significant contribution to the dynamic field of predictive modeling in medical research, particularly within the context of neurodegenerative diseases.

Footnotes

Acknowledgements

The authors thank the Ministry of Health and Welfare for providing the raw data.

Contributorship

Conceptualization: V.Q.T. and B.H.; software: V.Q.T.; methodology: V.Q.T. and B.H.; validation: V.Q.T. and B.H.; investigation: V.Q.T. and B.H.; writing—original draft preparation: V.Q.T.; formal analysis: B.H.; writing—review and editing: B.H.; visualization: H.V.N.; supervision: B.H.; project administration: B.H.; funding acquisition: B.H. All authors have read and agreed to the published version of the manuscript.

Data availability

The data presented in this study are available from the corresponding author on request. The data are not publicly available because researchers need to obtain permission from the Korea Centers for Disease Control and Prevention.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Ethical approval

This work is not a clinical study, thus ethical approval is not required.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07041091, NRF-2021S1A5A8062526, RS-2023-00237287) and the National R&D Program for Cancer Control through the National Cancer Center (HA23C02410061582062860001) and local government-university cooperation-based regional innovation projects (2021RIS-003).

Guarantor

Haewon Byeon.

Informed consent

Informed consent was obtained from all subjects involved in the study.

Institutional review board statement

The study was carried out in accordance with the Helsinki Declaration and was approved by the Korea Workers’ Compensation and Welfare Service's Institutional Review Board (or Ethics Committee) (protocol code 0439001, date of approval 31 January 2018).

ORCID iD

Haewon Byeon

References

1. Parkinson’s Disease. Alzheimer’s disease and dementia, https://alz.org/alzheimers-dementia/what-is-dementia/types-of-dementia/parkinson-s-disease-dementia (accessed 18 December 2023).

2. Meireles, Massano. Cognitive impairment and dementia in Parkinson’s disease: clinical features, diagnosis, and management. Front Neurol 2012; 3: 88.

3. Dementia, https://www.who.int/news-room/fact-sheets/detail/dementia (accessed 18 December 2023).

4. Rathod, Pinninti, Irfan, et al. Mental health service provision in low- and middle-income countries. Health Serv Insights 2017; 10: 117863291769435.

5. Rasmussen, Langerman. Alzheimer’s disease – why we need early diagnosis. Degener Neurol Neuromuscul Dis 2019; 9: 123–130.

6. Neal, Wright. Validation therapy for dementia. Cochrane Database Syst Rev. Epub ahead of print 2003. DOI: 10.1002/14651858.CD001394.

7. Eriksson, Lundquist, Gustafson, et al. Comparison of three statistical methods for analysis of fall predictors in people with dementia: negative binomial regression (NBR), regression tree (RT), and partial least squares regression (PLSR). Arch Gerontol Geriatr 2009; 49: 383–389.

8. Morocho-Cayamcela, Lee, Lim. Machine learning for 5G/B5G Mobile and wireless communications: potential, limitations, and future directions. IEEE Access 2019; 7: 137184–137206.

9. Martin, Townend, Barkhof, et al. Interpretable machine learning for dementia: A systematic review. Alzheimer’s Dementia 2023; 19: 2135–2149.

10. Venugopalan, Tong, Hassanzadeh, et al. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci Rep 2021; 11: 3254.

11. Häyrinen, Saranto, Nykänen. Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inf 2008; 77: 291–304.

12. Morales, Vives-Gilabert, Gómez-Ansón, et al. Predicting dementia development in Parkinson’s disease using Bayesian network classifiers. Psychiatry Res Neuroimaging 2013; 213: 92–98.

13. Byeon. Can the prediction model using regression with optimal scale improve the power to predict the Parkinson’s dementia? World J Psychiatry 2022; 12: 1031–1043.

14. Guyon, Gunn, Nikravesh, et al. Feature extraction: Foundations and applications. Springer Science & Business Media, 2006.

15. Fushiki. Estimation of prediction error by using K-fold cross-validation. Stat Comput 2011; 21: 137–146.

16. Breiman. Random forests. Mach Learn 2001; 45: 5–32.

17. James, Witten, Hastie, et al. An introduction to statistical learning: with applications in R. New York, NY: Springer US, 2021. DOI: 10.1007/978-1-0716-1418-1.

18. Hollmann, Müller, Eggensperger, et al. TabPFN: A transformer that solves small tabular classification problems in a second, http://arxiv.org/abs/2207.01848 (2023, accessed 20 December 2023).

19. Müller, Hollmann, Arango, et al. Transformers can do Bayesian inference. Epub ahead of print 8 February 2023. DOI: 10.48550/arXiv.2112.10510.

20. Vaswani, Shazeer, Parmar, et al. Attention is all you need, http://arxiv.org/abs/1706.03762 (2023, accessed 20 December 2023).

21. Zhu, Zhang, Jin, et al. Missing value estimation for mixed-attribute data sets. IEEE Trans Knowl Data Eng 2011; 23: 110–121.

22. Lin W-C, Tsai C-F. Missing value imputation: a review and analysis of the literature (2006–2017). Artif Intell Rev 2020; 53: 1487–1509.

23. Batista, Monard M-C. A Study of K-Nearest Neighbour as an Imputation Method. 2002, pp. 251–260.

24. Zhang. Nearest neighbor selection for iteratively kNN imputation. J Syst Softw 2012; 85: 2541–2552.

25. Lall, Sharma. A nearest neighbor bootstrap for resampling hydrologic time series. Water Resour Res 1996; 32: 679–693.

26. Azur, Stuart, Frangakis, et al. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res 2011; 20: 40–49.

27. Lundberg, Lee S-I, et al. A unified approach to interpreting model predictions. In: Guyon, Luxburg, Bengio (eds) Advances in neural information processing systems. Curran Associates, Inc, 2017. https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf.

28. Kuhn, Tucker. Contributions to the theory of games (AM-28), volume II. Princeton University Press, 2016.

29. Ribeiro, Singh, Guestrin. ‘Why should I trust you?’: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. San Francisco California USA: ACM, 2016, pp.1135–1144.

30. Castelvecchi. Can we open the black box of AI? Nature News 2016; 538: 20–23.

31. Chen, Guestrin. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. New York, NY: Association for Computing Machinery, 2016, pp.785–794.

32. Meng, Finley, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st international conference on neural information processing systems. Red Hook, NY, USA: Curran Associates Inc., 2017, pp.3149–3157.

33. Breiman. Bagging predictors. Mach Learn 1996; 24: 123–140.

34. Freund, Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 1997; 55: 119–139.

35. Geurts, Ernst, Wehenkel. Extremely randomized trees. Mach Learn 2006; 63: 3–42.

36. Wydmański, Bulenok, Śmieja. HyperTab: Hypernetwork approach for deep learning on small tabular datasets. Epub ahead of print 24 August 2023. DOI: 10.48550/arXiv.2304.03543.

37. Dai, QV. HyperNetworks. Epub ahead of print 1 December 2016. DOI: 10.48550/arXiv.1609.09106.

38. Loy, Schofield, Turner, et al. Genetics of dementia. Lancet 2014; 383: 828–840.

39. Sharp, Gatz. Relationship between education and dementia: An updated systematic review. Alzheimer Dis Associated Disord 2011; 25: 289–304.

40. Byers, Yaffe. Depression and risk of developing dementia. Nat Rev Neurol 2011; 7: 323–331.

41. Savva George, Wharton Stephen, Ince Paul, et al. Age, neuropathology, and dementia. N Engl J Med 2009; 360: 2302–2309.

42. Budd. Burnout related to electronic health record use in primary care. J Prim Care Community Health 2023; 14: 21501319231166921.

43. Holzinger, Langs, Denk, et al. Causability and explainability of artificial intelligence in medicine. WIRES Data Min Knowl Discovery 2019; 9: e1312.

Predicting dementia in Parkinson's disease on a small tabular dataset using hybrid LightGBM–TabPFN and SHAP
