Translational computerized clinical decision support systems for Alzheimer's disease: A systematic review

Abstract

Background

Alzheimer's disease (AD), marked by progressive memory loss and cognitive decline, poses diagnostic challenges due to its multifactorial nature. Therefore, researchers are increasingly leveraging artificial intelligence and data-driven approaches to develop computerized clinical decision support systems (CCDSS), aiming to enhance early detection, improve treatment, and slow disease progression.

Objective

This study seeks to conduct a systematic review of the most recently developed AD-CCDSS, delving into their progress and the challenges to guide future development and implementation of CCDSS for AD-related decision-making and intervention strategies.

Methods

We follow the PRISMA 2020 guideline to search for articles published within the past seven years across PubMed, ScienceDirect, IEEE Xplore Digital Library, Web of Science, and Scopus, with Google Scholar as a supplementary source. Key components are then extracted from the selected studies for qualitative analysis, including data modalities, computational modeling approaches, system explainability and interpretability, research priorities, and graphical user interfaces designed for non-technical stakeholders.

Results

After searching and removing duplicates, we meticulously selected 55 studies. After reviewing key components of CCDSS, we highlight advancements and potential clinical applications, demonstrating their promise in enhancing decision support. However, despite growing attention to explainability in AD-CCDSS, its clinical applicability remains limited. Moreover, challenges such as multi-center system interoperability and data security remain underexplored, hindering real-world implementation.

Conclusions

This study analyzes recent translational AD-CCDSS, identifying key challenges in advancing CCDSS for clinical applications. It offers insights for researchers to enhance CCDSS development and facilitate their integration into clinical practice.

Keywords

Alzheimer's disease, artificial intelligence, clinical decision support system, clinical translation, dementia, mild cognitive impairment

Introduction

The World Alzheimer Report 2023 projects that the number of people with Alzheimer's disease (AD) and other types of dementia will reach 139 million in 2050, which is 2.5 times higher than the figure in 2019.¹ Currently, over 55 million individuals worldwide live with dementia, with more than 60% of them residing in low- and middle-income countries.² Annually, approximately 10 million new cases emerge, highlighting the urgent need for comprehensive understanding and effective interventions to address this escalating public health concern.²

AD is a neurodegenerative disorder and is widely recognized as one of the most common age-related diseases, accounting for 63% to 70% of dementia cases.³ AD causes brain changes, including the accumulation of abnormal amyloid and tau proteins and neuronal degeneration.⁴ Early symptoms mainly include memory loss, language difficulties, and impaired thinking.⁴ As the disease progresses, brain damage increases, resulting in memory loss and cognitive impairment that negatively affect the patient's daily life and activities, place a heavy economic and mental burden on family members and caregivers, and may even lead to death.⁵

AD has a long incubation period, which makes it difficult to detect at an early stage and therefore difficult to carry out timely drug intervention. So far, no cure for the disease has been found, and its symptoms can only be alleviated by drugs. This may be because the occurrence and progression of this complex disease are affected by many potential factors.⁶ Researchers in the area also recognize that early detection and intervention are essential for effective treatment. Mild cognitive impairment (MCI), an early stage of AD, is defined as a decline in cognitive ability with ageing. It is characterized by mild brain injury that affects memory and executive function but has limited impact on daily life. Even when symptoms are more severe, a patient may not necessarily meet the diagnostic criteria for AD.⁷ MCI can be classified into stable (sMCI) and progressive (pMCI) categories: sMCI remains stable over the next few years, while pMCI eventually evolves into AD.⁸ The Alzheimer's Disease Neuroimaging Initiative (ADNI) divides MCI into early and late stages (i.e., EMCI and LMCI),⁹ and some studies focus on distinguishing EMCI from LMCI for early detection of the disease.^10,11 In addition, Csukly et al.¹² classified MCI into amnestic and non-amnestic types (i.e., aMCI and naMCI). About 10%-15% of MCI patients worldwide convert to AD every year,^13,14 while the conversion rates from EMCI and LMCI to AD are 4.33% and 18.6%, respectively.¹⁵ Hence, early detection of MCI can support timely drug intervention to effectively slow its progression into AD.

Diagnosing MCI in clinics is challenging: differences in magnetic resonance imaging (MRI) images between MCI patients and healthy controls (HC) are subtle and hard to distinguish with the naked eye, and the existence of numerous MCI subtypes makes it difficult for clinicians to identify a specific subtype through a single test.¹⁶ As a result, there is an increasing trend towards using computer tools to assist doctors in detecting MCI. Significant research effort has been dedicated to constructing computerized clinical decision support systems (CCDSS), which can provide clinical diagnosis, disease prognosis, drug advice and warnings, and clinical guidelines to reduce misdiagnosis by fatigued clinicians and to reduce the workload of medical staff.^17,18 CCDSS can also directly assist clinicians in developing personalized treatment plans that improve patients’ quality of life and slow the progression of the disease.¹⁹ Meanwhile, since AD is a heterogeneous disease, the uncertainty, missingness, insufficiency, and multi-modality of the data often make decision-making difficult.²⁰ Therefore, developing CCDSS will benefit translational medicine, which in turn impacts AD-related decision-making and intervention strategies.

Specifically, the knowledge-based CCDSS, which was conventional in the past, strictly follows the knowledge contained in a pre-built knowledge base. By contrast, the artificial intelligence (AI)-based CCDSS, which is the current research focus, needs a sufficient amount of data to learn rules and then makes suggestions by using machine learning (ML) and statistical pattern recognition techniques.^19,21 However, it is worth noting that the application of deep learning (DL)-based CCDSS is limited by the black-box nature of DL. Results with limited explainability cannot be fully evaluated and would not be accepted by the clinical community. Therefore, a CCDSS needs not only to provide highly accurate and reliable decision-making support but also to make the process reproducible, so that doctors can understand the results produced by the system and can further provide an explainable decision when diagnosing the disease.²⁰ Interpretability is of utmost importance during the diagnostic phase, especially for CCDSS in the domain of AD, given its multi-factorial etiology and intricate nature.

Therefore, this comprehensive and systematic review addresses the following key research questions: What CCDSS have been developed by researchers for AD, and what methods have been employed for their modeling? Furthermore, how are these CCDSS used in the clinical management of AD, and what decision support can be provided to make them more trustworthy and convenient for clinical staff to use, so as to enhance the translational potential of CCDSS?

The key contributions of this work are as follows:

Identification of research gap: The review identifies a gap in the existing literature by noting the limited coverage of up-to-date, comprehensive review articles specifically focusing on CCDSS in the AD domain.

Comprehensive summary of advancements: The review provides a comprehensive summary of recent advancements in CCDSS tailored for AD, offering a consolidated overview of the state-of-the-art technologies and methodologies.

Comparison of functional models: A significant contribution lies in the comparison of performance, applicable scenarios, and clinical interpretability among different functional models used in CCDSS developed over the last seven years. This comparative analysis helps in understanding the strengths and limitations of various approaches applied.

Focus on clinical interface design: An interactive graphical user interface (GUI) introduces a transformative pathway, equipping stakeholders with the means to initiate the evolution of CCDSS. Hence, a user-friendly GUI-based CCDSS should be accessible not only to IT but also to non-IT stakeholders who can comprehend and utilize it effectively.

The rest of the paper is structured as follows. The Methods section presents the method of screening the relevant research articles. Then, the Results section details the screening results and reviews the identified CCDSS from the perspectives of data modalities, feature extraction and feature selection, data modelling approaches and their performance comparison, as well as system explainability and model interpretability, followed by the GUI design. Finally, the Discussion section covers the challenges of implementing CCDSS in clinical settings and the potential solutions, as well as the limitations of this study and potential extensions.

Methods

Search strategy

The purpose of this review is to provide an overview of studies that meet the following criteria: (1) focused on AD dementia; (2) used AI technology; (3) used AI technology to model CCDSS; (4) focused on AD diagnosis and prognosis. Hence, we create the following four keyword groups (KGs), each related to different aspects of the review scope.

KG1: Keywords related to disease: Alzheimer's disease, Alzheimer, dementia, mild cognitive impairment, cognitive disorder, cognitive decline.

KG2: Keywords related to AI methodology: artificial intelligence, machine learning, deep learning, computer reasoning.

KG3: Keywords related to application: clinical decision support system, CDSS, computer-assisted, computer-aided, expert system.

KG4: Keywords related to task of applying AI: prognosis, diagnosis, classification, prediction.

To cover the scope of this study, we search the following prominent scholarly databases: (1) PubMed; (2) IEEE Xplore Digital Library; (3) Web of Science; (4) ScienceDirect; and (5) Scopus. To search the relevant literature comprehensively, Google Scholar is also used as a supplementary search tool. In each database, we search the title, abstract, and keyword sections of articles. The initial search is conducted in December 2022, encompassing studies published within the preceding five years. Because of the extended review process, articles from 2023 and 2024 are subsequently added. To limit the search to the review scope, we combine the four KGs in each online database. Table 1 lists the search query used in the PubMed database as an example.

Table 1.

The example search query from PubMed.

Search query	#1 AND #2 AND #3 AND #4 (format for PubMed)
#1	"alzheimer disease"[MeSH Terms] OR "alzheimer* disease"[Title/Abstract] OR "Dementia"[Title/Abstract] OR "cognitive dysfunction"[MeSH Terms] OR "cognitive disorder"[Title/Abstract] OR "cognitive impairment"[Title/Abstract] OR "mild cognitive impairment"[Title/Abstract] OR "cognitive decline*"[Title/Abstract]
#2	"artificial intelligence"[MeSH Terms] OR "Computer Reasoning"[Title/Abstract] OR "Machine learning"[Title/Abstract] OR "Deep Learning"[Title/Abstract]
#3	"decision support systems, clinical"[MeSH Terms] OR "decision support"[Title/Abstract] OR "CDSS"[Title/Abstract] OR "CDS"[Title/Abstract] OR "comput* assist*"[Title/Abstract] OR "comput* aid*"[Title/Abstract]
#4	"prognosis"[MeSH Terms] OR "diagnosis"[MeSH Terms] OR "prognos*"[Title/Abstract] OR "diagnos*"[Title/Abstract] OR "classification"[Title/Abstract] OR "prediction"[Title/Abstract] OR "classif*"[Title/Abstract] OR "predict*"[Title/Abstract]

The search query was applied to all databases of PubMed, IEEE, Science Direct, Web of Science, and Scopus.
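As an illustration of how the four keyword groups can be combined, the following minimal Python sketch joins the terms within each group with OR and joins the groups with AND. The dictionary keys and the function name `build_query` are illustrative assumptions, and the output is a generic Boolean string; it does not reproduce the database-specific field tags or truncation symbols of the PubMed query in Table 1.

```python
# Keyword groups transcribed from KG1-KG4 in the text.
# The dictionary keys are illustrative labels, not names used by the authors.
KEYWORD_GROUPS = {
    "disease": [
        "Alzheimer's disease", "Alzheimer", "dementia",
        "mild cognitive impairment", "cognitive disorder", "cognitive decline",
    ],
    "ai_method": [
        "artificial intelligence", "machine learning",
        "deep learning", "computer reasoning",
    ],
    "application": [
        "clinical decision support system", "CDSS",
        "computer-assisted", "computer-aided", "expert system",
    ],
    "task": ["prognosis", "diagnosis", "classification", "prediction"],
}


def build_query(groups):
    """Join terms with OR inside each group, then join the groups with AND."""
    clauses = []
    for terms in groups.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(KEYWORD_GROUPS)
print(query)
```

In practice, each database requires its own field tags (e.g., title/abstract restrictions), so a string like this would only serve as a starting point for adaptation.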

Eligibility criteria

Inclusion criteria:

Study Type: Articles with a target population of patients diagnosed with AD or MCI are considered.

Language: Articles written in English are considered.

Intervention & Exposure: Articles that apply AI algorithms to build a CCDSS are included.

Participant: Journal papers and scientific reports are included.

Outcome: Studies that report detailed outcome measures and aim to implement CCDSS to improve AD management.

Exclusion criteria:

Study Type: Any articles on dementia that do not use AI technology or do not provide specific software.

Participant: Survey papers, book chapters, and manuscripts are not included in the study.

Language: Any article written in a language other than English.

Intervention & Exposure: Articles that do not apply AI algorithms to design a CCDSS as the main intervention for AD management are excluded.

Outcome: Studies that fail to clearly report performance metrics of CCDSS.

Duplicate: Exclude duplicate publications or multiple reports of the same study to avoid double-counting of data.

By strictly applying these inclusion and exclusion criteria at every stage of the review process, this study ensures comprehensiveness, transparency, and reproducibility, thereby enhancing the reliability and validity of the research findings.

Data collection

We collect the following items from the final set of included articles: (1) publication year; (2) research focus; (3) modality and accessibility of the dataset; (4) feature extraction and selection process; (5) AI model used for the modeling problem; (6) performance indicators (e.g., area under the receiver operating characteristic curve (AUC), overall accuracy (ACC), sensitivity, and specificity); (7) model explainability and interpretability; and (8) the GUI design of the CCDSS.
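For readers less familiar with the performance indicators listed in item (6), the following minimal Python sketch shows how ACC, sensitivity, specificity, and AUC are commonly computed for a binary classifier. The labels, scores, and the 0.5 threshold are hypothetical values chosen for illustration and are not taken from any included study; the AUC is computed with the pairwise-ranking (Mann-Whitney) formulation.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = patient, 0 = control) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
auc_value = auc(y_true, scores)

print(acc, sensitivity, specificity, auc_value)
```

Unlike ACC, sensitivity, and specificity, the AUC does not depend on the chosen threshold, which is one reason studies often report it alongside the threshold-based indicators.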

With these data, we can evaluate the performance of different CCDSS in practical applications more deeply and precisely, as well as the differences among heterogeneous modalities in constructing CCDSS. These differences are reflected not only in technical performance but also in adaptability and stability across different medical scenarios. Moreover, the interpretability of CCDSS can reveal different pathological features, covering aspects such as the disease's occurrence mechanism, development trend, and possible complications. This gives doctors additional information to understand the patient's condition, judge it more accurately, and formulate better treatment plans, thereby improving the quality and efficiency of medical services and the outcomes for patients.

Study assessment process

The search results are imported into EndNote X9 to remove duplicate items. One reviewer (R_A) independently screens the title and abstract of each article to determine whether it meets the inclusion and exclusion criteria. If there is uncertainty about whether to include a particular article, a second reviewer (R_B) makes the determination. Similarly, R_A independently reviews the full text of each included article to determine whether it meets the inclusion criteria. If there is uncertainty about whether to include a particular article, R_B reviews the full text to decide whether to include it in the review. Then, R_A independently uses the data collection form developed by the first author to extract all relevant data from the included articles. If any questions arise, R_A and R_B discuss them until they are resolved.
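To make the duplicate-removal step concrete, the following minimal Python sketch removes duplicates by comparing a normalized title together with the publication year. This is a simplified stand-in and not the matching procedure used by EndNote X9; the records, field names, and the functions `normalize_title` and `deduplicate` are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize_title(record["title"]), record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records exported from two databases.
records = [
    {"title": "A Decision Support Tool for MCI Staging", "year": 2021, "source": "PubMed"},
    {"title": "A decision-support tool for MCI staging.", "year": 2021, "source": "Scopus"},
    {"title": "Explainable Models for Dementia Prognosis", "year": 2022, "source": "Scopus"},
]

unique = deduplicate(records)
print(len(records) - len(unique), "duplicate(s) removed")
```

Automated matching of this kind can miss duplicates with differing titles (e.g., preprint versus published version), which is one reason a manual check during screening remains useful.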

Results

Searching result

Using a structured workflow designed in line with PRISMA 2020 (Figure 1), 1698 records are identified across five databases based on pre-defined search criteria, with an additional 49 records retrieved from Google Scholar. EndNote X9 identifies and removes 282 duplicate entries. After the titles and abstracts are reviewed, 477 articles are shortlisted, of which 13 are unavailable. Each remaining paper is thoroughly evaluated against the inclusion and exclusion criteria. Ultimately, 55 studies are selected for the final analysis, considering factors such as study type, intervention, exposure characteristics, and quality assessment.

Figure 1.

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 flow diagram for systematic review outlining the number of studies identified and excluded at each stage. Note: Google Scholar is a supplement search tool.
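The stage-by-stage counts reported above can be traced with simple arithmetic, as in the following minimal Python sketch. It assumes that the Google Scholar records are pooled with the database records before duplicate removal and that each stage follows linearly from the previous one; the derived intermediate figures are therefore illustrative and may differ from those shown in Figure 1.

```python
# Counts reported in the text.
identified_databases = 1698
identified_google_scholar = 49
duplicates_removed = 282
shortlisted = 477            # after title and abstract review
full_text_unavailable = 13
included = 55

# Derived counts (assumption: a linear flow with pooled sources).
total_identified = identified_databases + identified_google_scholar
screened = total_identified - duplicates_removed
excluded_at_screening = screened - shortlisted
assessed_full_text = shortlisted - full_text_unavailable
excluded_at_full_text = assessed_full_text - included

print("Identified:", total_identified)
print("Screened after de-duplication:", screened)
print("Excluded on title/abstract:", excluded_at_screening)
print("Full texts assessed:", assessed_full_text)
print("Excluded on full text:", excluded_at_full_text)
```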

Furthermore, the trend analysis shown in Figure 2 summarizes the number of selected research articles published each year between 2018 and 2024, revealing a gradually increasing focus on AD-related CCDSS studies.

Figure 2.

Number of articles included per year (2018–2024).

Data sources and data modalities

Because AD is a multifactorial disease, accurate detection should take into account various factors, such as cognitive assessment scores, neurophysiological indicators, symptoms, demographic statistics, and medical records.²² In the clinical environment, cognitive assessments such as the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) are commonly used to evaluate the probability of a patient having cognitive dysfunction.²³ For patients with uncertain cognitive impairment staging, further examinations are required, such as MRI scans to identify early-stage changes in the hippocampus and entorhinal cortex associated with AD, or positron emission tomography (PET) for the identification of liquid biomarkers.^24,25 The comprehensive use of multiple modalities helps to find small variations in each modality from the earliest stage and to obtain reliable diagnostic results. However, the accurate and timely diagnosis of AD continues to pose challenges for clinical staff for two main reasons: first, they are not trained to handle large volumes of complex numerical data and many types of biomarkers; second, they can hardly detect small behavioral changes before the symptoms of AD become evident. Therefore, there is an urgent need to build CCDSS that use AI techniques for advanced AD detection and prediction, to help clinicians diagnose and/or prognose the disease earlier and more accurately.

Currently, there are many publicly available datasets that integrate multimodal data in the field of dementia research, which provides valuable resources for researchers to deepen their understanding of the pathogenesis, diagnosis, and treatment strategies of dementia. Table 2 shows the description of certain publicly available databases for dementia research.

Table 2.

The brief description of commonly used publicly available databases for dementia research.

Database	Brief description
ADNI²⁶	A longitudinal, multi-center, observational study advancing AD research through shared data on neuroimaging, biomarkers, genetics, and clinical assessments.
Australian Imaging, Biomarkers & Lifestyle Flagship Study of Ageing (AIBL)²⁷	Data gathered every 18 months on AD-related biomarkers, neuroimaging, cognition, mood, health, and lifestyle factors, providing insights into the disease's development and progression.
National Alzheimer's Coordinating Center (NACC)²⁸	Data built through multi-center collaboration. It provides standardized clinical data on Alzheimer's and related dementias, including cognitive tests, neuroimaging, biological samples, and medical history, ensuring consistency and reliability.
Open Access Series of Imaging Studies (OASIS)²⁹	Freely available neuroimaging datasets to advance research in basic and clinical neuroscience.

As illustrated in Figure 3(A), the various ADNI protocols (ADNI-1, ADNI-2, ADNI-GO, and ADNI-3) represent the most frequently utilized public dataset among the included studies (33 studies), followed by AIBL with 9 studies. The majority of studies rely on datasets obtained from hospitals, institutions, and companies, while a subset recruits participants independently. Notably, only 29.1% of the studies utilize multiple datasets for their experiments, as shown in Figure 3(B). Relying on a single dataset can introduce bias into the evaluation process, potentially compromising the accuracy and reliability of study results. This limitation arises because a single dataset often lacks diversity, representativeness, and comprehensive sample coverage, making it insufficient to fully and objectively reflect the real-world characteristics of the research subject.

Figure 3.

Statistics of datasets in the 55 studies. (A) Distribution of included studies using the diverse datasets. (B) Proportion of studies utilizing single versus multiple datasets. H & I & C: Hospital & Institute & Company.

Exploring what kind of data is adopted for modeling in the research is of great significance for in-depth understanding of the study. The data employed to build CCDSS typically consists of multiple modalities, including both imaging and non-imaging data, because the diagnosis of such heterogeneous diseases normally depends on more than one single marker. This is illustrated in the donut chart in Figure 4(A), where 65.45% of the articles utilize multimodal data, with 25.45% incorporating both imaging and non-imaging data. Studies using single-modal non-imaging data and multimodal neuroimaging data account for the lowest proportions, at 12.73% and 10.91%, respectively. Notably, studies based solely on single-modal neuroimaging data account for 21.82%, ranking third among the included papers. This indicates that a significant portion of research still prioritizes single-modality imaging, likely because AI models (especially those using DL techniques) perform well with image data, leading to promising results. Similarly, Figure 4(B) illustrates the distribution of data modalities used in the included studies, with neuroimaging (e.g., MRI, PET, and electroencephalography (EEG)) being the most commonly used, followed by cognitive function assessments (CFA) and demographic data. This highlights that neuroimaging, CFA, and demographics provide valuable information to support doctors in AD diagnosis and treatment. Detailed description of the data used in 55 studies can be found in Table 3.

Figure 4.

Statistics of data modality. (A) The proportions of the research articles using multi- or single-modal data. (B) The distribution of data modality usage. NeuroBatt: neuropsychological battery; CFA: cognitive function assessments; NI: neuroimaging; MH: medical history; Freehand figure: e.g., Archimedes spiral figure, Rey Osterrieth complex figure and other figure drawing tests; Biomarkers: blood biomarkers and neuroimaging-extracted biomarkers; other record: additional data types used.
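The percentages reported for Figure 4(A) can be converted back to approximate study counts, as in the following minimal Python sketch. It assumes that every percentage is a fraction of the 55 included studies and that the multimodal share together with the two single-modal shares covers all studies; the category labels are illustrative, and the counts are derived here rather than reported in the text.

```python
TOTAL_STUDIES = 55

# Percentages reported in the text for Figure 4(A).
reported_percentages = {
    "multimodal (all types)": 65.45,
    "imaging + non-imaging": 25.45,
    "single-modal non-imaging": 12.73,
    "multimodal neuroimaging": 10.91,
    "single-modal neuroimaging": 21.82,
}

# Convert each percentage to the nearest whole number of studies.
counts = {
    label: round(pct * TOTAL_STUDIES / 100)
    for label, pct in reported_percentages.items()
}

for label, n in counts.items():
    print(f"{label}: {n} studies ({100 * n / TOTAL_STUDIES:.2f}%)")

# Consistency check under the stated assumption: multimodal plus the two
# single-modal categories should account for every included study.
covered = (
    counts["multimodal (all types)"]
    + counts["single-modal non-imaging"]
    + counts["single-modal neuroimaging"]
)
print("Studies covered:", covered)
```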

Table 3.

Detailed description of the data used in the 55 CCDSS development.

Ref.	Dataset(s)	No. of data samples	Data modalities
Bhagwat et al. (2018)³⁰	ADNI-1, 2, GO & AIBL	1302	Multi-modal non-imaging data, including demographics, CFA (MMSE, Alzheimer's disease assessment scale (ADAS)13), APOE4 gene type, and neuroimaging-extracted biomarkers (cortical thickness)
Rhodius-Meester et al. (2018)³¹	ADC^a, DCN^a & Barcelona	674	Multi-modal non-imaging data, including demographics, MMSE, neuropsychological battery (logical memory immediate recall, logical memory delayed recall, trail making test (TMT)-A, TMT-B, category fluency), APOE4 gene type, and neuroimaging-extracted biomarkers (hippocampal volume, a computed medial temporal lobe atrophy (cMTA) score, a computed global cortical atrophy (cGCA) score, region-of-interest (ROI)-based grading) as well as cerebrospinal fluid (CSF)
Angelillo et al. (2019)³²	Recruitment	65	Single modal non-imaging data, i.e., Digital Attentional Matrices test
Bucholc et al. (2019)³³	ADNI	488	Multi-modal non-imaging data, including CFA (MMSE, FAQ, ADAS13, MoCA) and neuropsychological battery (Rey auditory verbal learning test (RAVLT), logical memory recall)
Lazli et al. (2019)³⁴	ADNI, OASIS & GMPH^a	362	Multi-modal neuroimaging data, including MRI and FDG-PET images
Buegler et al. (2020)³⁵	Recruitment & ClinicalTrials.gov (NCT02843529)	711	Multi-modal non-imaging data, including demographics and smartphone/tablet neuro-motor recordings from Altoida NMI
Barnes et al. (2020)³⁶	KPW^a	4330	Multi-modal non-imaging data, including demographics, vital signs, diagnoses (comorbid medical conditions), healthcare utilization (outpatient, ED and urgent care visits, and hospitalizations) and medication-related predictors (medications that could be related to dementia symptoms)
Carvalho et al. (2020)³⁷	CAD^a & CRASI^a	443	Multi-modal non-imaging data, including CFA (CDR, MMSE, Pfeffer questionnaire, Katz ADL), and neuropsychological battery (verbal fluency test (VFT), clock drawing test (CDT), Depression, Lawton, TMT-A, Stroop color word test, Berg balance scale, IQCODE questionnaire, teste computadorizado de atenção visual (TCA), CERAD word list memory)
Müller and Lio (2020)³⁸	ADNI	Non-imaging: 3445 Imaging: 52	Multi-modal non-imaging data, including demographics, and CFA; Single modal neuroimaging data, i.e., MRI images
Rhodius-Meester et al. (2020)³⁹	ADC^a	535	Multi-modal non-imaging data, including APOE4 gene type, MMSE, CSF, neuropsychological battery (RAVLT, TMT-A, TMT-B, animals, neuropsychiatric inventory (NPI)); Single-modal neuroimaging data, i.e., MRI images
Saribudak et al. (2020)⁴⁰	ADNI	234	Multi-modal non-imaging data, including neuroimaging-extracted biomarkers (hippocampal volume), gene expressions (obtained from blood-based biomarkers) and MMSE
Cai et al. (2021)⁴¹	ADNI-2	66	Single modal neuroimaging data, i.e., rs-fMRI images
Canavan et al. (2021)⁴²	NACC	762	Multi-modal non-imaging data, including demographics, CFA (FAQ, MMSE) and neuropsychological battery (logical memory II A delayed, digit span forward, digit span backward, Wechsler adult intelligence scale - digit symbol, Boston naming test (BNT))
Dyrba et al. (2021)⁴³	ADNI-2, 3, GO, AIBL & DELCODE^a	ADNI-2/GO: 663 ADNI-3: 575 AIBL: 606 DELCODE: 474	Multi-modal non-imaging data, including demographics, neuroimaging-extracted biomarkers (total intracranial volume (TIV)); Multi-modal neuroimaging data, including MRI, PET images
Shoaip et al. (2020)⁴⁴	ADNI	30	Multi-modal non-imaging data, including demographics, symptoms, medical history, physical examinations, CFA (CDR, FAQ, MMSE, ADAS11, ADAS13), neuropsychological battery (RAVLT), APOE4 gene type, and neuroimaging-extracted biomarkers as well as CSF
El-Sappagh et al. (2021)²²	ADNI	1048	Multi-modal non-imaging data, including demographics, medical history, symptoms, physical examinations, neurological exams, CFA (ADAS11, ADAS13, CDGLOBAL, CDRSB, FAQ, geriatric depression scale (GDS), MMSE, MoCA, NPI), neuropsychological battery (RAVLT, CDT, clock copying), neuropathology vital signs, and CSF, as well as APOE4 gene type; Multi-modal neuroimaging data, including FDG-, HCI-, and SROI- PET and MRI images
Kachouri et al. (2021)⁴⁵	BH^a	75	Single modal non-imaging data, i.e., Archimedes spiral figure (including pressure, pointwise altitude values, pointwise velocity values)
Oltu et al. (2021)⁴⁶	BUH^a	35	Single modal neuroimaging data, i.e., EEG images
Suárez-Araujo et al. (2021)⁴⁷	ADNI	331	Multi-modal non-imaging data, including demographics and CFA (MMSE, FAQ, GDS)
Venugopalan et al. (2021)⁴⁸	ADNI	2004	Multi-modal non-imaging data, including demographics, neurological exams, cognitive assessments, biomarkers (e.g., alanine, choline), medication (e.g., levodopa) and neuroimaging-extracted biomarkers (e.g., brain area volumes) as well as gene expressions; Single modal neuroimaging data, i.e., cross-sectional MRI images
Araújo et al. (2022)⁴⁹	ADNI	379	Single modal non-imaging data, i.e., 12 plasma proteins (ApoB, Calcitonin, C-peptide, CRP, IGFBP-2, Interleukin-3, Interleukin-8, PARC, Serotransferrin, THP, TLSP 1-309, and TN-C)
Chen et al. (2022)⁵⁰	OASIS	373	Multi-modal non-imaging data, including demographics, socio-economic status, neuroimaging-extracted biomarkers (eTIV (estimated TIV), nWBV (normalized whole brain volume), ASF (atlas scaling factor)) and MMSE
Chun et al. (2022)⁵¹	SMC^a	705	Multi-modal non-imaging data, including demographics, APOE4 gene type, and the neuropsychological battery of Seoul (K-BNT, ideomotor praxis, calculation total score, SVLT, RCFT, contrasting program, go/no-go test, COWAT, K-MMSE, CDR-SOB)
El-Sappagh et al. (2022)⁵²	ADNI	1371	Multi-modal non-imaging data, including demographics, medical history, symptoms, family history, CFA (ADAS11, ADAS13, CDR, FAQ, GDS, MMSE, MoCA, NPI), neuropsychological battery (RAVLT, daily cognition report), CSF, and APOE4 gene type; Single modal imaging data, i.e., MRI images
Ilias and Askounis (2022)⁵³	ADReSS	156	Multi-modal non-imaging data, including speech recordings along with their associative transcripts and MMSE
Salami et al. (2022)⁵⁴	OASIS-3	1098	Multi-modal non-imaging data, including MMSE and APOE4 gene type; Single modal neuroimaging data, i.e., 3D MRI images
Reinke et al. (2022)⁵⁵	Allgemeine Ortskrankenkasse	117895	Single modal non-imaging data, i.e., German claims data
Ullah and Jamjoom (2022)⁵⁶	Kaggle	6400	Single modal neuroimaging data, i.e., MRI images
Almohimeed et al. (2023)⁵⁷	ADNI	1363	Multi-modal non-imaging data, including demographics, CFA (CDR, FQR, ADAS), APOE4 gene type, neuroimaging-extracted biomarkers (Fluoro-Deoxy-Glucose (FDG, average FDG-PET of angular, temporal, and posterior cingulate), ABETA, tubulin associated unit (TAU), phosphorylated TAU)
Bhattarai et al. (2023)⁵⁸	ADNI & AIBL	1969	Multi-modal non-imaging data, including demographics, CFA (ADAS13, CDRSB, MoCA, MMSE), neuropsychological battery (RAVLT), and neuroimaging-extracted biomarkers (FDG)
Chai et al. (2023)⁵⁹	Recruitment from community	79	Multi-modal non-imaging data, including handwriting figures (repeating capital ‘T’ letters, drawing spiral circles, drawing a meander, drawing a pentagram); Single modal neuroimaging data, i.e., EEG images
Chen et al. (2023)⁶⁰	OpenNeuro	88	Single modal non-imaging data, i.e., MMSE; Single modal neuroimaging data, i.e., EEG images
Di Febbo et al. (2023)⁶¹	IP^a	250	Single modal non-imaging data, i.e., the Rey Osterrieth complex figure
Emmanuel and Jabez (2023)⁶²	ADNI	210	Single modal neuroimaging data, i.e., MRI images
Moguilner et al. (2023)⁶³	ADNI, UNITED Consortium & LAC^a	2000	Single modal non-imaging data, i.e., demographics; Single modal neuroimaging data, i.e., 3D T1-weighted 3T/1.5T MRI images
Rahim et al. (2023)⁶⁴	ADNI	564	Multi-modal non-imaging data, including demographics, CFA (ADAS13, FAQ, MMSE, MoCA, CDRSB), neuropsychological battery (RAVLT), neuroimaging-extracted biomarkers (FDG, TAU, pTAU, hippocampal volume) and APOE4 gene type; Single modal neuroimaging data, i.e., 3D MRI images
Park et al. (2023)⁶⁵	AMC, ADNI & AIBL	AMC: 971 ADNI & AIBL: 2008	Single modal non-imaging data, i.e., neuroimaging-extracted biomarkers (volume and radiomics data from 3D sMRI images)
Tomassini et al. (2023)⁶⁶	ADNI-1	438	Single modal neuroimaging data, i.e., 3D MRI images
Yi et al. (2023)⁶⁷	ADNI & AIBL	1603	Multi-modal non-imaging data, including demographics, CFA (MMSE, ADAS, CDRSB, FAQ, GDS, preclinical Alzheimer's cognitive composite scores), APOE4 gene type, neuroimaging-extracted biomarkers (ventricles volume, hippocampus volume, WBV, entorhinal volume, fusiform volume, middle temporal gyrus volume, TIV); Single modal neuroimaging data, i.e., sMRI images
Ayus and Gupta (2024)⁶⁸	Kaggle	6400	Single modal neuroimaging data, i.e., sMRI images
He et al. (2024)⁶⁹	ADNI-1, 2 & ADNI-3	ADNI-1, 2: 606 ADNI-3: 375	Single modal neuroimaging data, i.e., rs-fMRI images
Khatri and Kwon (2024)⁷⁰	ADNI	1075	Single modal neuroimaging data, i.e., sMRI images
Lei et al. (2024)⁷¹	ADNI-1, 2, 3, AIBL & MCAD^a	2644	Single modal neuroimaging data, i.e., sMRI images
Liu et al. (2024)⁷²	ADNI	720	Multi-modal neuroimaging data, including sMRI, PET images
Lu et al. (2024)⁷³	ADNI	577	Multi-modal non-imaging data, including demographics, CFA (CDR, ADAS11, ADAS13, MMSE, and FAQ), neuropsychological battery (RAVLT, Logical Memory-Delayed, and TMT-B) and single-nucleotide polymorphisms (SNP); Single modal neuroimaging data, i.e., sMRI images
Ma et al. (2024)⁷⁴	ADNI-2	442	Single modal neuroimaging data, i.e., rs-fMRI images
Qiu et al. (2024)⁷⁵	ADNI-1, 2, 3 & AIBL	ADNI: 1114 AIBL: 158	Multi-modal neuroimaging data, including sMRI, PET images
Rahim et al. (2024)⁷⁶	ADNI & NACC	ADNI: 564 NACC: 87	Multi-modal non-imaging data, including demographics, CFA (ADAS13, CDRSB, MMSE, FAQ, MoCA), neuropsychological battery (RAVLT), APOE4 gene type, neuroimaging-extracted biomarkers (FDG, TAU, PTAU, Hippocampus); Single modal neuroimaging data, i.e., sMRI images
Sun et al. (2024)⁷⁷	Recruitment	1068	Single modal non-imaging data, i.e., the care problems evaluation sheet (daily living care problems, behavioral and psychological symptoms, safety risks)
Tang et al. (2024)⁷⁸	Tongji Hospital Affiliated To Tongji University	297	Multi-modal non-imaging data, including demographics, paper-based and electronic TMTs
Tian et al. (2024)⁷⁹	ADNI	682	Multi-modal neuroimaging data, including sMRI, PET images
Wang et al. (2024)⁸⁰	ADNI, AIBL & NACC	ADNI: 564 AIBL: 133 NACC: 92	Multi-modal neuroimaging data, including sMRI, PET images
Xu et al. (2024)⁸¹	ADNI-1, 2 & AIBL	ADNI: 1335 AIBL: 531	Single modal neuroimaging data, i.e., sMRI images
Young et al. (2024)⁸²	Recruitment	300	Single modal non-imaging data, i.e., myCog test (assessing memory, executive function and cognitive flexibility, episodic memory, working memory)
Zuo et al. (2024)⁸³	ADNI	300	Multi-modal neuroimaging data, including fMRI, DTI, sMRI images

Note: GMPH is the dataset collected from the Gabriel-Montpied Hospital; BUH is the dataset collected from the Neurology Department of Baskent University Hospital; CAD is the data collected from the Center for Alzheimer's Disease at the Institute of Psychiatry of UFRJ (Federal University of Rio de Janeiro); CRASI is the data collected from the Center of Reference in Attention to Health of the Elderly at Antonio Pedro Hospital of UFF (Fluminense Federal University); ADC: Amsterdam Dementia Cohort, Danish Dementia Research Centre, the Neurocenter at Kuopio University Hospital; SMC: data collected from the Samsung Medical Center; IP: data collected from the Istituto Palazzolo, Fondazione Don Carlo Gnocchi in Milan (Italy); DELCODE: German Center for Neurodegenerative Diseases (DZNE) multicenter observational study on Longitudinal Cognitive Impairment and Dementia; DCN: Dementia Competence Network; KPW: Kaiser Permanente Washington; BH is the dataset collected at Broca Hospital in Paris; LAC: Latin American Countries dataset; AMC: Asan Medical Center; MCAD: Multi-center Alzheimer Disease Imaging dataset.

The goal of the CCDSS development

Most CCDSS for AD are developed for diagnosis assistance, particularly the binary classification of AD versus HC, where accuracy reaches around 95%.^84–86 A 2015 study reported that the highest accuracy for the multi-class classification of AD versus MCI versus HC was 63%, with an AUC of 78.8%.⁸⁷ More recently, a DL-based system achieved 97.5% accuracy in the multi-class classification of HC versus MCI versus AD.⁸⁸ Another common task of CCDSS is prognosis: predicting the rate of conversion from MCI to AD, or the year or month in which the conversion occurs, over a given period.

Therefore, we classify the purposes of CCDSS into three categories: diagnosis, prognosis (e.g., disease progression or risk prediction), and recommendations (e.g., care or medication). As shown in Figure 5, diagnosis is the primary focus of most studies, accounting for 77.2%, followed by prognosis (15.8%) and recommendations (7%). Among these, two studies address both diagnosis and prognosis to provide additional support, helping doctors develop more comprehensive treatment and care plans.^22,52 Of the 44 diagnostic studies, 75.6% employ binary classification (represented by the green ring filled with downward pattern in Figure 5), while 24.4% involve multi-class classification (represented by the green ring filled with upward pattern in Figure 5). This suggests that binary classification remains the dominant approach in the field, possibly because its relatively strong performance in dementia diagnosis makes it a preferred choice for researchers.

Figure 5.

Distribution of CCDSS purposes in the included studies. The inner ring represents the proportion of the three categories (diagnosis in green, prognosis in blue with vertical stripes, and recommendations in pink with horizontal stripes), while the outer ring shows the distribution between binary (filled with downward pattern) and multi-class (filled with upward pattern) classification within the 44 studies on diagnosis.

More importantly, because clinical situations are complicated and vary from patient to patient, clinicians need not only to make an accurate diagnosis but also to decide which drugs to prescribe and which tests to carry out. Therefore, CCDSS can be designed with functions beyond diagnosis and prognosis, such as examination,³⁹ care,⁷⁷ and drug recommendations.^40,58 Table 4 reports the detailed goals of the 55 CCDSS reviewed.

Table 4.

The development goals of the 55 CCDSS studies.

Ref.	Development goals
Bhagwat et al. (2018)³⁰	Diagnosis of HC versus MCI using MMSE label
	Diagnosis of HC versus EMCI versus LMCI using ADAS label
Rhodius-Meester et al. (2018)³¹	Diagnosis of stable subjective cognitive decline (sSCD) versus progressive SCD (pSCD)
Angelillo et al. (2019)³²	Diagnosis of HC versus Dementia
Bucholc et al. (2019)³³	Detecting the severity of the AD
Lazli et al. (2019)³⁴	Diagnosis of AD versus HC
Buegler et al. (2020)³⁵	Predicting the progression from MCI to AD
Barnes et al. (2020)³⁶	Diagnosis of Dementia versus HC
Carvalho et al. (2020)³⁷	1st stage for diagnosis of Dementia versus Non-Dementia
	2nd stage for diagnosis of AD versus Non-AD
	3rd stage for diagnosis of MCI versus Non-MCI
Müller and Lio (2020)³⁸	Diagnosis of AD versus HC
Rhodius-Meester et al. (2020)³⁹	Suggestion for conducting CSF examination
Saribudak et al. (2020)⁴⁰	Medication recommendations for AD patients
Cai et al. (2021)⁴¹	Diagnosis of aMCI versus HC
Canavan et al. (2021)⁴²	Predicting the progression from MCI to AD
Dyrba et al. (2021)⁴³	Diagnosis of HC versus MCI, HC versus AD
Shoaip et al. (2021)⁴⁴	Diagnosis of AD versus HC
El-Sappagh et al. (2021)²²	1st stage for diagnosis of AD versus MCI versus HC
El-Sappagh et al. (2021)²²	2nd stage for predicting the conversion from MCI to AD
Kachouri et al. (2021)⁴⁵	Diagnosis of HC versus AD
Oltu et al. (2021)⁴⁶	Diagnosis of HC versus MCI versus AD
Suárez-Araujo et al. (2021)⁴⁷	Diagnosis of MCI versus HC
Venugopalan et al. (2021)⁴⁸	Diagnosis of HC versus MCI versus AD
Araújo et al. (2022)⁴⁹	Predicting the progression from MCI to AD
Chen et al. (2022)⁵⁰	Diagnosis of HC versus Dementia
Chun et al. (2022)⁵¹	Predicting the conversion from aMCI to AD in 3 years
El-Sappagh et al. (2022)⁵²	1st stage for diagnosis of AD versus MCI versus HC
El-Sappagh et al. (2022)⁵²	2nd stage for predicting the specific month of the conversion from MCI to AD
Ilias and Askounis (2022)⁵³	1st stage for diagnosis of AD versus HC
Ilias and Askounis (2022)⁵³	2nd stage for detecting severity of AD
Salami et al. (2022)⁵⁴	Diagnosis of HC versus AD
Reinke et al. (2022)⁵⁵	Predicting the risk of dementia
Ullah and Jamjoom (2022)⁵⁶	Diagnosis of non-Dementia versus very mild Dementia versus mild Dementia versus moderate Dementia
Almohimeed et al. (2023)⁵⁷	Diagnosis of HC versus AD, HC versus sMCI versus AD
Bhattarai et al. (2023)⁵⁸	Medication recommendations for AD patients
Chai et al. (2023)⁵⁹	Diagnosis of HC versus MCI
Chen et al. (2023)⁶⁰	Diagnosis of HC versus AD
Di Febbo et al. (2023)⁶¹	Diagnosis of HC versus MCI, HC versus Dementia, and MCI versus Dementia
Emmanuel and Jabez (2023)⁶²	Diagnosis of HC versus MCI versus AD
Moguilner et al. (2023)⁶³	Diagnosis of HC versus AD
Rahim et al. (2023)⁶⁴	Predicting the progression from CN to AD in four years
Park et al. (2023)⁶⁵	Diagnosis of AD versus HC, MCI versus AD, and HC versus MCI
Tomassini et al. (2023)⁶⁶	Diagnosis of AD versus HC, MCI versus HC, and HC versus MCI versus AD
Yi et al. (2023)⁶⁷	Predicting the progression from CN to MCI and MCI to AD in 1, 3, 5, and 10 years
Ayus and Gupta (2024)⁶⁸	Diagnosis of non-Dementia versus moderate Dementia, non-Dementia versus very mild Dementia, non-Dementia versus mild Dementia, very mild Dementia versus mild Dementia, very mild Dementia versus moderate Dementia, and mild Dementia versus moderate Dementia
He et al. (2024)⁶⁹	Diagnosis of AD versus HC and EMCI versus LMCI
Khatri and Kwon (2024)⁷⁰	Diagnosis of AD versus MCI versus HC, AD versus HC and MCI versus HC
Lei et al. (2024)⁷¹	Diagnosis of AD versus HC, MCI versus HC and AD versus MCI
Liu et al. (2024)⁷²	Diagnosis of AD versus HC
Lu et al. (2024)⁷³	Diagnosis of sMCI versus pMCI
Ma et al. (2024)⁷⁴	Diagnosis of HC versus EMCI, EMCI versus LMCI and HC versus EMCI versus LMCI
Qiu et al. (2024)⁷⁵	Diagnosis of AD versus HC, MCI versus HC and AD versus MCI
Rahim et al. (2024)⁷⁶	Predicting the progression from MCI to AD in 3 years
Sun et al. (2024)⁷⁷	Care recommendations
Tang et al. (2024)⁷⁸	Diagnosis of HC versus MCI versus AD
Tian et al. (2024)⁷⁹	Diagnosis of EMCI versus LMCI
Wang et al. (2024)⁸⁰	Diagnosis of AD versus HC
Xu et al. (2024)⁸¹	Diagnosis of AD versus HC, AD versus MCI, MCI versus HC and AD versus MCI versus HC
Young et al. (2024)⁸²	Diagnosis of HC versus Cognitive impairment
Zuo et al. (2024)⁸³	Diagnosis of EMCI versus HC, LMCI versus HC and AD versus HC

Feature extraction and feature selection

Feature extraction and feature selection are prerequisites for building a model. The former extracts representative features from the original data that reflect the essence of the problem,⁸⁹ while the latter selects the most important or relevant subset of the extracted features for further modeling. In real-world settings, many features may be redundant or irrelevant, and without any screening the model becomes overly complex and prone to overfitting. Feature selection therefore identifies the features that are truly valuable and associated with the target variable for the subsequent modeling task.⁹⁰

The filter method is one of the most commonly used feature selection methods. It separates feature selection from classification by relying solely on statistical measures of the relationship between features and the target variable, e.g., variance, chi-square, information gain, Relief, and minimum redundancy maximum relevance.⁹¹ For instance, Chun et al.⁵¹ employed correlation coefficient-based methods to select features for predicting the conversion from aMCI to AD. Another is the wrapper method, which evaluates the impact of feature combinations on model performance by selecting the best subset in each iteration of the ML process.⁹² This method is more time-consuming and computationally complex, but it usually yields more accurate and reliable results. Recursive feature elimination (RFE) is a wrapper-style feature selection algorithm that can be combined with a variety of ML algorithms. For instance, it has been combined with random forest (RF) for selecting features.^22,33 El-Sappagh et al.²² employed RFE in combination with RF, support vector machine (SVM), and gradient boosting for feature selection in the diagnosis and prognosis of AD. Meanwhile, Bucholc et al.³³ used RF-RFE to select features for multi-class classification of AD versus MCI versus HC.
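As a minimal sketch of the filter approach described above, the snippet below ranks features by the absolute Pearson correlation with the target and keeps the top k. The function names, feature names, and toy values are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Return the k feature names with the largest |r| and all scores."""
    scores = {name: abs(pearson_r(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: four subjects, label 1 = converter, 0 = stable.
target = [0, 0, 1, 1]
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],    # strongly related to the label
    "feature_b": [1.0, 1.0, 1.0, 2.0],    # weakly related
    "feature_c": [1.0, -1.0, 1.0, -1.0],  # unrelated
}
selected, scores = filter_select(features, target, k=2)
```

Because the scoring never consults a classifier, the ranking is cheap to compute, which is the property that distinguishes filter methods from wrapper methods such as RFE.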
Another study applied information gain followed by RFE with cross-validation to select the most significant feature subset for multi-class classification of AD versus MCI versus HC.⁵² In addition, there are embedded methods, in which feature selection is built into model fitting so that adjusting the model parameters directly determines which features are retained, such as the least absolute shrinkage and selection operator (LASSO).⁹³ LASSO applies L1 regularization to shrink the model coefficients, driving the coefficients of unimportant features to exactly zero and thereby yielding a sparse model.⁹³ For example, Barnes et al.³⁶ used LASSO to select features from electronic health records to obtain key features of patients not diagnosed with dementia.
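The sparsity effect of L1 regularization can be illustrated with a small coordinate-descent sketch. It assumes the objective (1/2n)·||y − Xw||² + λ·||w||₁ with no intercept; the data, the value of λ, and the function names are hypothetical, and this is not the implementation used in the cited study.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction using every feature except j.
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Hypothetical data: y = 3*x0 + 0.2*x1, so x1 is only weakly informative.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.2, -2.8, 2.8, -3.2]
weights = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy setting the weak feature's coefficient is set to exactly zero while the strong feature's coefficient is shrunk but retained, which is how LASSO performs selection and fitting in one step.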

In addition, artificial neural networks (ANN) have also been employed for feature selection; e.g., one study⁴⁷ used an ANN as the fitness function and then applied backward search to obtain the best-performing feature combination for binary classification of MCI versus HC. Furthermore, DL can automatically learn highly abstract and expressive features from raw data, especially in the field of images. For example, convolutional neural networks (CNNs) do not require complex feature engineering and can self-learn useful features from input data without affecting informational characteristics.⁹⁴ For instance, Salami et al.⁵⁴ built a ResNet18 model that automatically extracts useful abstract features from MRI data for binary classification of AD versus HC, yielding the best-performing model. Kachouri et al.⁴⁵ utilized AlexNet to extract features from hand-drawn images (i.e., Archimedes spiral figure) for AD diagnosis. Chen et al.⁶⁰ combined vision transformer (ViT) technology with CNN to extract signal features from EEG data for diagnosing MCI versus HC. Other feature extraction methods have also been applied to different imaging data. The discrete wavelet transform can be used to extract the subbands of EEG images from AD patients, followed by Burg's method for signal processing and spectrum analysis.⁴⁶ Chai et al.⁵⁹ utilized independent component analysis to process EEG images, extracting the corresponding signal frequency bands. Subsequently, a band selection algorithm was employed to choose the optimal frequency bands as input features of the CCDSS for diagnosing MCI versus HC. Cai et al.⁴¹ employed an adaptive structure feature generation strategy (ASFGS) to extract the structural characteristics of the brain functional network. Then, a multi-scale local feature detection strategy was used to extract local features. Finally, the extracted information was integrated into multi-scale features for classifying MCI versus HC.
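To make the subband extraction step concrete, the sketch below decomposes a one-dimensional signal with a discrete wavelet transform and computes one energy value per subband. The Haar wavelet, the three decomposition levels, the energy feature, and the sample values are assumptions chosen for brevity; the cited study's wavelet family and the subsequent Burg spectral analysis are not reproduced here.

```python
import math


def haar_dwt_level(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_subbands(signal, levels):
    """Decompose a signal into [detail_1, ..., detail_L, approx_L]."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt_level(current)
        bands.append(detail)
    bands.append(current)
    return bands


def band_energy(band):
    """Sum of squared coefficients in one subband."""
    return sum(v * v for v in band)


# Hypothetical 8-sample segment standing in for one EEG channel.
segment = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
bands = haar_subbands(segment, levels=3)
energies = [band_energy(b) for b in bands]  # candidate per-subband features
```

Each detail band covers a progressively lower frequency range, so the per-band energies give a compact description of how signal power is distributed across frequencies.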

In general, for high-dimensional data, feature extraction and feature selection can not only effectively reduce computing costs, eliminate redundant information, and speed up model training, but can also improve the generalizability and interpretability of the model.

Data modeling methods

Given the disparities in data modality and dimensionality, different data modeling methods are employed to address specific objectives. Of the 55 studies, 23 use traditional ML algorithms, while 30 are based on DL. The remaining 2 studies employ reinforcement learning (RL) and ontology, respectively. Table 5 summarizes the commonly used models along with their pros and cons.

Table 5.

Pros and cons of commonly used modeling methods in AD domain.

Modeling Methods	Pros	Cons
DT	Simple, interpretable, and easy to use for small datasets of AD patients; good for clinical decision-making and feature selection	Prone to overfitting, especially with noisy or imbalanced AD datasets; sensitive to data variations, potentially leading to poor generalization
RF	Robust to noise and missing data; no need for feature scaling; reduces overfitting; easy to implement; supports parallel computation	High computational cost, slow training; poor model interpretability; high memory usage
Logistic Regression (LR)	Simple and interpretable, useful for clinical decision support; computationally efficient, useful for smaller AD datasets	Assumes linearity, which may not capture complex patterns in AD data; may underperform if relationships are non-linear or more complex
SVM	Good generalization ability, useful for small datasets (e.g., neuroimaging or clinical data); handles high-dimensional data, like MRI features	High computational cost for large AD datasets; sensitive to hyperparameter tuning, requires careful kernel selection
XGBoost	Efficient and works well with large datasets, suitable for combining various feature types (clinical, imaging, genetic data); prevents overfitting, useful when working with noisy AD data	Complex hyperparameter tuning, time-consuming for AD studies
LightGBM	Very fast and memory efficient, ideal for large AD datasets; can handle categorical features, which may be present in clinical data	Requires careful hyperparameter tuning
ANN	Can learn complex, non-linear relationships in AD data, especially useful for combining clinical and imaging data; strong scalability for large datasets	Requires a large amount of labeled data, which may not be readily available in AD research; high computational demand and poor interpretability
AlexNet	Effective for image-based data, particularly for large-scale neuroimaging datasets (e.g., MRI scans); deep CNN architecture with multiple layers for feature extraction; well-established in computer vision tasks, easily transferable to AD diagnosis	High computational cost, requires powerful GPUs for training; may not generalize well to small or limited datasets; lacks interpretability, which is a challenge for clinical use
ResNet	Can extract hierarchical features, which can be useful in neuroimaging (e.g., MRI) data; solves vanishing gradient issues, enabling deeper networks for complex AD patterns	High computational cost, may be impractical for smaller datasets; poor interpretability, making it difficult to explain decisions in clinical settings
DenseNet	Efficient feature reuse; fewer parameters; improves gradient flow	High computational cost; complex architecture; not suitable for small datasets
LSTM	Effective for sequential data (e.g., clinical progression over time), useful for predicting disease progression in AD; can capture temporal dependencies, essential for AD diagnosis and monitoring	High computational demands; requires long time-series data, which may be challenging to collect for AD patients
ViT	Excellent for image-based data (e.g., MRI scans); can capture long-range dependencies in images; scalable for large datasets, especially when using pre-trained models	High computational requirements, especially for training on large datasets; poor interpretability compared to traditional models like DT or LR
Generative Adversarial Networks	Useful for generating synthetic images, particularly when labeled data is scarce; can augment AD image datasets (e.g., MRI scans), providing more data for training models; can help simulate rare cases (e.g., early-stage AD) that are underrepresented in real datasets	Difficult to train and requires a lot of computational resources; generated images may not always match the real data distribution, leading to poor model generalization; can produce unrealistic or low-quality images if not carefully tuned
Graph Convolutional Network (GCN)	Can be used to model relationships between different brain regions or neuroimaging data, capturing complex structural patterns in AD; suitable for analyzing brain network connectivity and other graph-structured data	Computationally expensive, difficult to scale for large AD datasets; requires graph-structured data, so the brain must be segmented first

The commonly used traditional ML algorithms are SVM and RF. Of the 8 SVM-based studies,^{33,34,37,41,45,57,59,61} 7 address binary classification tasks.^{34,37,41,45,57,59,61} Cai et al.⁴¹ classified aMCI versus HC by building an SVM classifier with a radial basis function (RBF) kernel. The nonlinear RBF-SVM classifier can effectively separate different classes of samples by mapping data that are not linearly separable in the original space into a high-dimensional feature space and then constructing a separating hyperplane in that space.⁹⁵ This mechanism turns a linearly inseparable problem into a separable one in the feature space.
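The effect of the RBF kernel can be sketched on the classic XOR layout, which no straight line separates in the input space. The snippet evaluates a kernel decision function with hand-set, equal coefficients; it is not a trained SVM, and the points, γ, and coefficients are illustrative assumptions.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_points, labels, coeffs, gamma=1.0, bias=0.0):
    """Kernel decision function f(x) = sum_i coeff_i * y_i * K(x_i, x) + bias."""
    return bias + sum(
        c * y * rbf_kernel(p, x, gamma)
        for p, y, c in zip(support_points, labels, coeffs)
    )


# XOR layout: opposite corners of the unit square share a class.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]
coeffs = [1.0, 1.0, 1.0, 1.0]  # hand-set for illustration, not fitted by an SVM solver

predictions = [
    1 if decision_value(p, points, labels, coeffs) > 0 else -1 for p in points
]
```

With these settings the sign of the decision value matches the label at all four points, showing that a function that is linear in the kernel-induced feature space can separate classes that are not linearly separable in the original space. A real SVM would additionally learn the coefficients and bias by maximizing the margin.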

Among the 10 studies that utilize ensemble learning based on decision trees (DT), 5 of them employ the RF model,^{22,32,37,38,55} two use the extreme gradient boosting (XGBoost),^35,51 one utilizes bagged trees,⁴⁶ and the remaining two studies use light gradient boosting machine (LightGBM).^49,78 Since DT offer better interpretability and ease of understanding,⁹⁶ they are preferred by approximately 50% of studies for constructing ensemble models. This choice is driven by the need to enhance system performance through ensemble learning,⁹⁷ while ensuring non-IT users can comprehend the system results effectively. For example, Müller and Lio³⁸ constructed an RF model to classify AD versus HC, and then used fuzzy methods to extract subsets of rules from the RF model, thereby enhancing the interpretability of the model.

In recent years, DL has made remarkable achievements in image processing, offering advantages such as learning from raw data without expert knowledge and generalizing well to unseen data. However, due to its black-box nature, it should be used with caution, especially in clinical applications. Over the past decade, CNN has been the preferred DL method for building diagnostic decision support systems for AD and has achieved high performance in classification tasks.⁹⁸ However, comparative analyses across tasks do not show which of the classic CNN models, such as AlexNet, residual network (ResNet), and densely connected network (DenseNet), performs best, because each involves trade-offs. Specifically, AlexNet⁹⁹ is a relatively shallow neural network with only 8 layers, while ResNet¹⁰⁰ introduces residual blocks to facilitate the training of extremely deep networks (such as those with 50 or even 152 layers) and to address vanishing and exploding gradients in deep models. DL-based studies in AD include the following. Kachouri et al.⁴⁵ used AlexNet to extract features from simple images and then employed SVM for classifying AD versus HC. Dyrba et al.⁴³ constructed a CNN model for binary classification of AD versus HC, which was inspired by AlexNet and VGG. Ullah and Jamjoom⁵⁶ constructed a CNN model comprising 3 convolutional layers, 3 pooling layers, 2 fully connected layers, and 1 output layer. This model successfully classified four different stages of dementia. Di Febbo et al.⁶¹ utilized ResNet50 for the analysis and processing of the Rey Osterrieth complex figure (ROCF) test, generating specific pattern datasets, and subsequently employed SVM to classify HC versus MCI versus dementia. Salami et al.⁵⁴ constructed an ensemble model of ResNet18 using MRI images and clinical data.
Since neurodegenerative disease evolves over time, long short-term memory (LSTM),¹⁰¹ a type of recurrent neural network, has been applied to predict the year and month of conversion from MCI to AD.⁵² LSTM can also be used for classification tasks; e.g., Tomassini et al.⁶⁶ built a convolutional LSTM (ConvLSTM) on MRI scans for the multi-class classification of AD versus MCI versus HC, and Rahim et al.⁶⁴ integrated CNN with LSTM to detect the progression of AD. In addition, deep models can be combined with other techniques. In the 1990s, Jang¹⁰² proposed the adaptive-network-based fuzzy inference system (ANFIS), which is based on IF-THEN rules and is therefore explainable. Building upon ANFIS, Emmanuel and Jabez⁶² developed an Advanced ANFIS (AANFIS) for multi-class classification of AD versus MCI versus HC. Furthermore, because clinical imaging data of different modalities are scarce, researchers tend to use generative adversarial approaches to augment AD image datasets. Ma et al.⁷⁴ integrated multimodal medical image data by estimating prior distributions, employing bidirectional adversarial mechanisms, and utilizing hypergraph perception networks with graph convolution operations to predict abnormal brain connectivity in AD.

Notably, only one of the 55 studies uses an ontology to build a CCDSS.⁴⁴ In 1995, Gruber¹⁰³ proposed a standardized concept for explicitly constructing ontologies. Building an ontology from scratch requires defining a series of terms, which involves a significant amount of work. Current ontology-based CCDSS therefore reuse existing ontologies and extend them to meet their research objectives. In one study,⁴⁴ for example, researchers constructed an AD Diagnosis Ontology (ADDO) by reusing two ontologies, i.e., Basic Formal Ontology and Ontology for General Medical Sciences.

Also, only one of the 55 studies adopts RL, which is commonly applied to competitive games and optimizes a policy through rewards and feedback on the actions taken.⁵⁸ The study used RL to improve the MMSE score by analyzing the drug use of AD patients and ultimately identified the best treatment plan. A regression model and a DT were established to generate multiple states with different values of the Alzheimer's disease assessment scale (ADAS), Rey auditory verbal learning test (RAVLT) immediate, RAVLT learning, age, the clinical dementia rating scale sum of boxes (CDRSB), MoCA, and Fluoro-Deoxy-Glucose (FDG). Furthermore, 6 drug combinations were used as the available actions, with the MMSE score acting as the reward signal.
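The state, action, and reward structure described above can be sketched with a toy tabular Q-learning example. Only the count of six actions follows the text; the three states, the deterministic transition rule, the reward definition, and all names are hypothetical stand-ins for the study's regression and DT models. To keep the result reproducible, the update is applied in exhaustive sweeps over all state-action pairs rather than in sampled episodes.

```python
N_STATES = 3    # hypothetical cognitive bands: 0 = low, 1 = mid, 2 = high
N_ACTIONS = 6   # six drug combinations, matching the count in the text
# Hypothetical: the single combination that helps in each band.
HELPFUL_ACTION = {0: 2, 1: 4, 2: 1}


def step(state, action):
    """Toy deterministic model: return (next_state, reward)."""
    if action == HELPFUL_ACTION[state]:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    reward = float(next_state - state)  # stand-in for the change in MMSE
    return next_state, reward


def q_learning(alpha=0.5, gamma=0.9, sweeps=500):
    """Apply the Q-learning update to every (state, action) pair repeatedly."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                s_next, r = step(s, a)
                target = r + gamma * max(q[s_next])
                q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]


q_table = q_learning()
policy = greedy_policy(q_table)
```

The learned policy selects, for each state, the action with the highest long-term value rather than the highest immediate reward, which is the property that makes RL suited to sequential treatment decisions.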

The analysis of the included studies highlights the diversity and complexity of techniques employed in CCDSS. Different data types (e.g., structured and unstructured data) and modalities (e.g., images, text, etc.) have unique characteristics and requirements. Constructing CCDSS with carefully selected methods and models tailored to these features can significantly enhance performance and accuracy, ultimately leading to improved outcomes.

Performance comparison

Regarding the performance of different input data modalities, the overall accuracy (ACC) for the binary classification of AD versus HC based solely on neuroimaging data ranges from 85.09%³⁴ to 99.38%,³⁸ while for the multi-class classification of AD versus MCI versus HC it ranges from 61.42%⁸¹ to 96.5%.⁴⁶

For single-modal non-imaging data, the ACC for the binary classification of Dementia versus HC ranges from 81.5%⁴⁵ to 91%.⁶¹ For multi-modal non-imaging data, the ACC for the binary classification of AD versus HC ranges from 86.25%⁵³ to 97%,³⁸ and the ACC for the multi-class classification of AD versus MCI versus HC ranges from 83%³³ to 90.03%.⁵⁷

For multi-modal data including neuroimaging and non-imaging, the ACC for the binary classification of AD versus HC ranges from 84.4%⁴³ to 94%,⁶³ and the ACC for the multi-class classification of AD versus MCI versus HC ranges from 78%⁴⁸ to 93.33%.²² Meanwhile, models built on small imaging datasets have achieved high accuracy, e.g., an ACC of 96.5% and AUC of 0.99 on samples from 35 patients⁴⁶; an ACC of 86.57% and AUC of 0.8636 on samples from 66 patients⁴¹; an ACC of 93.65% and AUC of 0.973 on samples from 159 patients³⁴; and an ACC of 99.38% on samples from 52 patients.³⁸ This may be due to relatively high-quality samples or to the representative features extracted and selected, which allow the models to learn the underlying patterns in the data effectively.

Currently, CCDSS for AD classification employ either single-modal or multi-modal data, and both demonstrate commendable performance. Notably, studies using neuroimaging data from a single modality exhibit superior classification outcomes, potentially owing to the efficacy of deep neural networks in extracting highly representative image features. However, given the multifactorial and heterogeneous nature of AD, clinical diagnostics tend to rely on multi-modal data for enhanced diagnostic accuracy. The classification results of the CCDSS developed on neuroimaging data, non-imaging data, and multi-modal data including both imaging and non-imaging, along with the data modeling approaches used, are listed in Tables 6, 7 and 8, respectively.
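For reference, the metrics reported in these tables can all be derived from the four counts of a binary confusion matrix, as sketched below. The counts are hypothetical and the function name is an assumption. Sensitivity and recall are the same quantity for the positive class, even though the tables list them in separate columns following each study's own reporting.

```python
def binary_metrics(tp, fp, tn, fn):
    """Common classification metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)      # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 50 AD cases and 50 healthy controls.
metrics = binary_metrics(tp=45, fp=10, tn=40, fn=5)
```

The sketch assumes every denominator is non-zero; a test set with no positive or no negative cases would need explicit handling. AUC is not included because it requires the ranked prediction scores rather than a single set of counts.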

Table 6.

The detailed classification performance of the studies using neuroimaging data.

Ref.	Data modalities	Data modelling approaches	Classification objectives	Dataset	AUC	ACC (%)	Sensitivity (%)	Specificity (%)	Precision (%)	Recall (%)	F1-score (%)
Lazli et al. (2019)³⁴	MRI+PET	SVDD	AD versus HC in	ADNI	0.973	93.65	90.08	92.75	—	—	—
				OASIS	0.967	91.46	92.00	91.78	—	—	—
				GMPH	0.946	85.09	92.00	84.92	—	—	—
Müller and Lio (2020)³⁸	sMRI	RF+HC_CMPR	AD versus HC		—	99.38	99	100	—	—	—
Cai et al. (2021)⁴¹	rs-fMRI	RBF-SVM	aMCI versus HC		0.8636	86.57	—	—	—	—	85.71
Oltu et al. (2021)⁴⁶	EEG	Bagged trees	AD versus MCI versus HC		0.99	96.50	96.21	97.96	—	—	—
Ullah and Jamjoom (2022)⁵⁶	sMRI	CNN	Non-Dementia versus very mild Dementia versus mild Dementia versus moderate Dementia		—	99.38	—	—	100 versus 99 versus 100 versus 100	99 versus 100 versus 98 versus 100	100 versus 99 versus 99 versus 100
Emmanuel and Jabez (2023)⁶²	sMRI	AANFIS	AD versus MCI versus HC		—	94.5 versus 92.5 versus 97.5	91.5 versus 89.5 versus 94.5	—	—	—	—
Tomassini et al. (2023)⁶⁶	3D MRI	ConvLSTM	AD versus HC		0.9981	—	—	—	—	—	—
			MCI versus HC		0.9322	—	—	—	—	—	—
			AD versus MCI versus HC		0.9391 versus 0.9012 versus 0.9527	—	—	—	—	—	—
Ayus and Gupta (2024)⁶⁸	sMRI	CNN-Conv1D-LSTM	non-Dementia versus moderate Dementia		0.9868	99.95	—	—	99.97	98.72	99.32
			non-Dementia versus very mild Dementia		0.999	99.88	—	—	99.85	99.9	99.87
			non-Dementia versus mild Dementia		0.9997	99.96	—	—	99.91	99.97	99.94
			very mild Dementia versus mild Dementia		0.9981	99.89	—	—	99.93	99.81	99.87
			very mild Dementia versus moderate Dementia		0.9868	99.93	—	—	99.96	98.68	99.32
			mild Dementia versus moderate Dementia		0.9991	99.83	—	—	98.72	99.91	99.30
He et al. (2024)⁶⁹	rs-fMRI	Spatiotemporal graph transformer network	AD versus HC		0.9845	92.3	95.81	94.00	—	—	—
He et al. (2024)⁶⁹	rs-fMRI	Spatiotemporal graph transformer network	EMCI versus LMCI		0.9288	84.78	87.17	84.55	—	—	—
Khatri and Kwon (2024)⁷⁰	sMRI	Optimized convolution ViT (CViT)	AD versus MCI versus HC		—	94.31	97.14	94.11	—	—	—
			AD versus HC		—	95.37	91.09	100	—	—	—
			MCI versus HC		—	92.15	89.92	94.56	—	—	—
Lei et al. (2024)⁷¹	sMRI	Brain-Region Attention Network	AD versus HC		0.9623	90.2	89.86	92.45	—	—	—
			MCI versus HC		0.7593	66.55	66.67	70.64	—	—	—
			AD versus MCI		0.8158	72.13	70.56	73.50	—	—	—
Liu et al. (2024)⁷²	sMRI+PET	Hierarchical Attention-based Multi-modal Fusion model (HAMMF)	AD versus HC		0.9315 ± 0.0208	93.15 ± 2.01	—	—	93.57 ± 2.00	93.15 ± 1.92	93.14 ± 1.96
Ma et al. (2024)⁷⁴	rs-fMRI	GCN	HC versus EMCI		0.93 ± 0.04	92.2 ± 1.1	92.2 ± 1.3	92.3 ± 1.5	—	—	—
			EMCI versus LMCI		0.92 ± 0.04	91.3 ± 1.3	91.7 ± 1.6	90.8 ± 1.1	—	—	—
			HC versus EMCI versus LMCI		0.89 ± 0.03	85.5 ± 1.0	91.7 ± 1.5	79.2 ± 0.7	—	—	—
Qiu et al. (2024)⁷⁵	sMRI+PET	Multimodal diagnosis network based on multi-fusion and disease-induced learning (MDL-Net)	AD versus HC		0.9848 ± 0.0144	96.37 ± 2.52	97.40 ± 3.50	95.38 ± 3.66	—	—	—
			MCI versus HC		0.7608 ± 0.0336	73.61 ± 3.58	73.01 ± 10.97	73.02 ± 11.06	—	—	—
			AD versus MCI		0.8816 ± 0.0551	85.29 ± 4.83	93.09 ± 4.81	72.00 ± 11.13	—	—	—
Tian et al. (2024)⁷⁹	sMRI+PET	Multi-scale fully separable CNN with large kernels	EMCI versus LMCI		0.9393	95.87	93.31	—	—	—	—
Wang et al. (2024)⁸⁰	sMRI+PET	Unsupervised cross-modal synthesis network + Interpretable diagnosis network based on fully 2D convolutions	AD versus HC in	ADNI	0.933	87.5	87.5	87.4	—	—	87.4
				AIBL	0.947	87.9	86.7	90.7	—	—	82.7
				NACC	0.919	85.9	86.3	86.5	—	—	85.6
Xu et al. (2024)⁸¹	sMRI	Logits-constraint attention and graph-based multi-scale fusion model	AD versus HC		0.9493 ± 0.0174	93.02 ± 1.57	—	—	—	—	91.00 ± 2.11
			AD versus MCI		0.7512 ± 0.0338	73.75 ± 1.89	—	—	—	—	60.76 ± 3.71
			MCI versus HC		0.7524 ± 0.0259	71.02 ± 2.27	—	—	—	—	70.15 ± 2.27
			AD versus MCI versus HC		0.7660 ± 0.0186	61.42 ± 2.39	—	—	—	—	61.35 ± 2.53
Zuo et al. (2024)⁸³	fMRI+DTI+sMRI	Prior-guided adversarial learning with hypergraph (PALH) model	EMCI versus HC		0.9959	96.47	98.43	94.87	—	—	—
			LMCI versus HC		0.9483	92.20	96.05	88.46	—	—	—
			AD versus HC		0.9272	87.5	92.68	82.05	—	—	—

SVDD: support vector data description; fMRI: functional MRI; sMRI: structural MRI; rs-fMRI: resting-state fMRI; DTI: Diffusion tensor imaging.

Table 7.

The detailed classification performance of the studies using non-imaging data.

Ref.	Data modelling approaches	Objectives		Performance								Data types
Ref.	Data modelling approaches	Objectives		AUC	ACC (%)	Sensitivity (%)	Specificity (%)	Precision (%)	Recall (%)	F1-score (%)	Error	Data types
Bhagwat et al. (2018)³⁰	Longitudinal Siamese ANN	MCI versus HC		0.99	94	—	—	—	—	—	—	Multi-modal
Bhagwat et al. (2018)³⁰	Longitudinal Siamese ANN	HC versus EMCI versus LMCI		0.883	72.4	—	—	—	—	—	—	Multi-modal
Rhodius-Meester et al. (2018)³¹	DSI	sSCD versus pSCD	DSI < 0.2 or DSI > 0.8	3-CV: 0.83 ± 0.11; test: 0.81	3-CV: 84.1 ± 9.6; test: 78.5	3-CV: 85.4 ± 17.6; test: 83.3	3-CV: 82.8 ± 7.1; test: 73.7	—	—	—	—
Rhodius-Meester et al. (2018)³¹	DSI	sSCD versus pSCD	DSI < 0.3 or DSI > 0.7	3-CV: 0.84 ± 0.09; test: 0.79	3-CV: 84.1 ± 7.3; test: 74.2	3-CV: 84.9 ± 14.2; test: 76.0	3-CV: 83.3 ± 5.5; test: 72.3	—	—	—	—
Bucholc et al. (2019)³³	KRR, SVM	AD versus QCI versus HC		0.949	83	100 versus 69.2 versus 100	97.7 versus 100 versus 76.7	—	—	—	—
Buegler et al. (2020)³⁵	XGBoost	MCI-to-AD		0.92 ± 0.03	86 ± 2	84 ± 4	88 ± 4	83 ± 4	—	83 ± 3	—
Barnes et al. (2020)³⁶	LR	Dementia versus Non-Dementia		—	—	47.1	87.2	PPV: 10	NPV: 98.2	—	—
Carvalho et al. (2020)³⁷	Rank the following classifiers and select the highest-ranking classifier: BN, NB, A1DE, C4.5 DT, RF, KNN and SVM	Dementia versus Non-Dementia		0.95	92	—	—	—	—	—	—
Carvalho et al. (2020)³⁷		AD versus Non-AD		0.89	85	—	—	—	—	—	—
Müller and Lio (2020)³⁸	RF + HC_CMPR	AD versus HC		—	97	96	97.4	—	—	—	—
Saribudak et al. (2020)⁴⁰	PReP-AD Differential Evolution	Medication suggestions for AD patients	12 months HVL	—	—	—	—	—	—	—	NT: 0.0163; MONO: 0.0266; POLY: 0.0283
Saribudak et al. (2020)⁴⁰	PReP-AD Differential Evolution	Medication suggestions for AD patients	60 months MMSE	—	—	—	—	—	—	—	NT: 0.048; MONO: 0.0624; POLY: 0.0775
Canavan et al. (2021)⁴²	Gaussian Hidden Markov Model	MCI-to-AD	5 visit group	—	82	—	—	75	93	—	—
			4 visit group	—	73	—	—	57	100	—	—
			3 visit group	—	71	—	—	52	93	—	—
Shoaip et al. (2020)⁴⁴	ADDO	AD versus HC		—	—	—	—	—	—	—	—
Suárez-Araujo et al. (2021)⁴⁷	CPN	MCI versus HC		0.8684	95.11	90	84.78	—	—	—	—
Chen et al. (2022)⁵⁰	DS-ANFIS	Dementia versus HC		—	81.69	—	—	—	—	—	—
Chun et al. (2022)⁵¹	XGBoost	aMCI-to-AD in 3 years		0.852	80.70	—	—	—	—	—	—
Ilias and Askounis (2022)⁵³	MTL-BERT	AD versus HC		—	10-CV:86.25 ± 2.1	—	10-CV:89.16 ± 3.33	10-CV:88.59 ± 3.05	10-CV:83.33 ± 2.64	10-CV:85.84 ± 2.12	—
Almohimeed et al. (2023)⁵⁷	Multi-level stacking model	AD versus HC		—	92.08	—	—	92.07	92.08	90.03	—
Almohimeed et al. (2023)⁵⁷	Multi-level stacking model	AD versus sMCI versus HC		—	90.03	—	—	90.19	90.03	90.05	—
Bhattarai et al. (2023)⁵⁸	RL	Medication suggestions for AD patients		—	—	—	—	—	—	—	—
Kachouri et al. (2021)⁴⁵	SVM	AD versus HC		—	81.5 ± 5.5	79 ± 9.4	84 ± 6.6	—	—	—	—	Single modal
Araújo et al. (2022) ⁴⁹	LightGBM	MCI-to-AD		0.91 ± 0.01	91	84	98	—	—	—	—
Reinke et al. (2022)⁵⁵	LR	Dementia-risk prediction		0.714	—	—	—	—	—	—	—
	GBM			0.707	—	—	—	—	—	—	—
	RF			0.636	—	—	—	—	—	—	—
Di Febbo et al. (2023)⁶¹	ResNet50 + SVM	MCI versus HC		—	85	—	—	—	—	—	—
		Dementia versus HC		—	91	—	—	—	—	—	—
		MCI versus Dementia		—	83	—	—	—	—	—	—
Park et al. (2023)⁶⁵	High-Performance Interpretable Network (TabNet)	AD versus HC		0.823	—	79	75	—	—	—	—
		AD versus MCI		0.623	—	64	57	—	—	—	—
		MCI versus HC		0.723	—	66	70	—	—	—	—
Sun et al. (2024)⁷⁷	Knowledge Graph	Care recommendation		—	—	98.92	—	—	—	—	—
Tang et al. (2024)⁷⁸	LightGBM	HC versus MCI versus AD		0.8692	73.56	—	—	—	—	73.57	—
Young et al. (2024)⁸²	myCog	HC versus Cognitive impairment		0.67	—	—	—	—	—	—	—

HVL: hippocampal volume loss

Table 8.

The detailed classification performance of the studies using multi-modal data with both imaging and non-imaging.

Ref.	Data modelling approaches	Objectives		Performance
Ref.	Data modelling approaches	Objectives		AUC	ACC (%)	Sensitivity (%)	Specificity (%)	Precision (%)	Recall (%)	F1-score (%)
Rhodius-Meester et al. (2020)³⁹	DSI	Suggestion for conducting CSF examination		—	71	—	—	—	—	—
Dyrba et al. (2021)⁴³	CNN	MCI versus HC in	ADNI-3	0.684	63.1	—	—	—	—	—
			DELCODE	0.775	71	—	—	—	—	—
			AIBL	0.763	84.4	—	—	—	—	—
		AD versus HC in	ADNI-3	0.913	84.4	—	—	—	—	—
			DELCODE	0.953	85.5	—	—	—	—	—
			AIBL	0.95	85	—	—	—	—	—
El-Sappagh et al. (2021)²²	RF	AD versus MCI versus HC		—	93.33	—	—	100 versus 86.79 versus 100	89.66 versus 100 versus 86.67	—
Venugopalan et al. (2021)⁴⁸	Ensemble DL model	AD versus MCI versus HC		—	78	—	—	77	78	78
El-Sappagh et al. (2022)⁵²	LSTM	AD versus MCI versus HC		—	91.22	—	—	91.28	91.84	91.22
Salami et al. (2022)⁵⁴	Ensemble DL model	AD versus HC		—	87.75	—	—	80.67 versus 91.35	76.83 versus 93.25	78.86 versus 91.6
Chai et al. (2023)⁵⁹	Ensemble model for multi-modal data fusion + SVM/RF/XGBoost	MCI versus HC		—	SVM: 96.73; RF: 92.54; XGBoost: 93	—	—	—	—	—
Chen et al. (2023)⁶⁰	ViT+CNN	AD versus HC		0.8819	87.33	84.56	85.15	—	—	—
Moguilner et al. (2023)⁶³	3D DenseNet	AD versus HC using	3T MRI in ADNI & UNITED	0.95 ± 0.02	94 ± 1	93 ± 3	95 ± 2	93 ± 2	94 ± 3	94 ± 1
			3T MRI in LAC	0.89 ± 0.03	90 ± 1	90 ± 2	89 ± 2	93 ± 2	90 ± 1	89 ± 2
			1.5T MRI in LAC	0.89 ± 0.03	80 ± 2	82 ± 1	79 ± 2	78 ± 1	83 ± 1	81 ± 2
Rahim et al. (2023)⁶⁴	CNN-Bi-LSTM	HC-to-MCI-to-AD		0.96 ± 0.01	95 ± 1	—	—	97 ± 2	96 ± 1	96 ± 2
Yi et al. (2023)⁶⁷	DL-based survival clustering model	HC-to-MCI	in 1 year	0.708	—	—	—	—	—	—
			in 3 years	0.802	—	—	—	—	—	—
			in 5 years	0.876	—	—	—	—	—	—
			in 10 years	0.886	—	—	—	—	—	—
		MCI-to-AD	in 1 year	0.810	—	—	—	—	—	—
			in 3 years	0.914	—	—	—	—	—	—
			in 5 years	0.957	—	—	—	—	—	—
			in 10 years	0.979	—	—	—	—	—	—
Lu et al. (2024)⁷³	Hierarchical Attention-Based Multimodal Fusion Model	sMCI versus pMCI		0.913	87.2	88.8	85.4	—	—	88.4
Rahim et al. (2024)⁷⁶	Ensemble of CNN + LSTM + GRU	HC-to-AD	in ADNI	0.9704 ± 0.0212	95.83 ± 2.32	—	—	96.11 ± 1.14	96.24 ± 1.38	96.22 ± 1.30
Rahim et al. (2024)⁷⁶	Ensemble of CNN + LSTM + GRU	HC-to-AD	in NACC	0.8844 ± 0.0334	88.85 ± 2.15	—	—	90.14 ± 3.04	91.24 ± 3.53	89.97 ± 2.43

For the performance comparison with respect to the goal of the CCDSS, 34 of the 55 studies focus on binary classification, 11 on multi-class classification, and 9 on conversion prediction (among which 3 studies address both diagnosis and prognosis tasks simultaneously). The specific performance indicators for CCDSS focusing on binary classification, multi-class classification, and AD progression are summarized in Tables 9, 10, and 11, respectively.

Table 9.

Specific performance indicators for the studies focusing on binary classification.

Ref.	Data modelling approaches	Objectives		Performance
Ref.	Data modelling approaches	Objectives		AUC	ACC (%)	Sensitivity (%)	Specificity (%)	Precision (%)	Recall (%)	F1-score (%)
Bhagwat et al. (2018)³⁰	Longitudinal Siamese ANN	HC versus MCI		0.99	94	—	—	—	—	—
Rhodius-Meester et al. (2018)³¹	DSI	sSCD versus pSCD	DSI < 0.2 or DSI > 0.8	3-CV: 0.83 ± 0.11; test: 0.81	3-CV: 84.1 ± 9.6; test: 78.5	3-CV: 85.4 ± 17.6; test: 83.3	3-CV: 82.8 ± 7.1; test: 73.7	—	—	—
Rhodius-Meester et al. (2018)³¹	DSI	sSCD versus pSCD	DSI < 0.3 or DSI > 0.7	3-CV: 0.84 ± 0.09; test: 0.79	3-CV: 84.1 ± 7.3; test: 74.2	3-CV: 84.9 ± 14.2; test: 76.0	3-CV: 83.3 ± 5.5; test: 72.3	—	—	—
Angelillo et al. (2019)³²	Ensemble of LR and RF	HC versus Dementia		0.873	84.1	86.11	82.76	—	—	—
Lazli et al. (2019) ³⁴	SVDD	HC versus AD in	ADNI	0.973	93.65	90.08	92.75	—	—	—
			OASIS	0.967	91.46	92	91.78	—	—	—
			GMPH	0.946	85.09	92	84.92	—	—	—
Barnes et al. (2020)³⁶	LR	Dementia versus Non-Dementia		—	—	47.1	87.2	PPV: 10	NPV: 98.2	—
Carvalho et al. (2020)³⁷	Rank the following classifiers and select the highest-ranking classifier: BN, NB, A1DE, C4.5 DT, RF, KNN and SVM	Dementia versus Non-Dementia		0.95	92	—	—	—	—	—
		AD versus Non-AD		0.89	85	—	—	—	—	—
		MCI versus Non-MCI		0.97	94	—	—	—	—	—
Müller and Lio (2020)³⁸	RF+HC_CMPR	HC versus AD for	Non-imaging	—	97	96	97.4	—	—	—
Müller and Lio (2020)³⁸	RF+HC_CMPR	HC versus AD for	Imaging	—	99.38	99	100	—	—	—
Cai et al. (2021) ⁴¹	RBF-SVM	HC versus aMCI		0.8636	86.57	—	—	—	—	85.71
Dyrba et al. (2021)⁴³	CNN	MCI versus HC in	ADNI-3	0.684	63.1	—	—	—	—	—
			DELCODE	0.775	71	—	—	—	—	—
			AIBL	0.763	84.4	—	—	—	—	—
		AD versus HC in	ADNI-3	0.913	84.4	—	—	—	—	—
			DELCODE	0.953	85.5	—	—	—	—	—
			AIBL	0.95	85	—	—	—	—	—
Kachouri et al. (2021)⁴⁵	SVM	HC versus AD		—	81.5 ± 5.5	79 ± 9.4	84 ± 6.6	—	—	—
Suárez-Araujo et al. (2021)⁴⁷	CPN	HC versus MCI		0.8684	95.11	90	84.78	—	—	—
Chen et al. (2022)⁵⁰	DS-ANFIS	HC versus Dementia		—	81.69	—	—	—	—	—
Ilias and Askounis (2022) ⁵³	MTL-BERT	HC versus AD		—	10-CV:86.25 ± 2.13	—	10-CV:89.16 ± 3.33	10-CV:88.59 ± 3.05	10-CV:83.33 ± 2.64	10-CV:85.84 ± 2.12
Salami et al. (2022)⁵⁴	Ensemble DL model	HC versus AD		—	87.75	—	—	91.35 versus 80.67	93.25 versus 76.83	91.6 versus 78.86
Almohimeed et al. (2023)⁵⁷	multi-level stacking model	HC versus AD		—	92.08	—	—	92.07	92.08	92.01
Chai et al. (2023)⁵⁹	Ensemble model for multi-modal data fusion + SVM/RF/XGBoost	HC versus MCI		—	SVM: 96.73; RF: 92.54; XGBoost: 93	—	—	—	—	—
Chen et al. (2023)⁶⁰	ViT+CNN	HC versus AD		0.8819	87.33	84.56	85.15	—	—	—
Di Febbo et al. (2023)⁶¹	ResNet50 + SVM	HC versus MCI		—	85	—	—	—	—	—
		HC versus Dementia		—	91	—	—	—	—	—
		MCI versus Dementia		—	83	—	—	—	—	—
Moguilner et al. (2023)⁶³	3D DenseNet	AD versus HC using	3T MRI in ADNI & UNITED	0.95 ± 0.02	94 ± 1	93 ± 3	95 ± 2	93 ± 2	94 ± 3	94 ± 1
			3T MRI in LAC	0.89 ± 0.03	90 ± 1	90 ± 2	89 ± 2	93 ± 2	90 ± 1	89 ± 2
			1.5T MRI in LAC	0.89 ± 0.03	80 ± 2	82 ± 1	79 ± 2	78 ± 1	83 ± 1	81 ± 2
Park et al. (2023) ⁶⁵	High-Performance Interpretable Network (TabNet)	AD versus HC		0.823	—	79	75	—	—	—
		AD versus MCI		0.623	—	64	57	—	—	—
		MCI versus HC		0.723	—	66	70	—	—	—
Tomassini et al. (2023)⁶⁶	ConvLSTM	HC versus AD		0.9981	—	—	—	—	—	—
Tomassini et al. (2023)⁶⁶	ConvLSTM	HC versus MCI		0.9322	—	—	—	—	—	—
Ayus and Gupta (2024)⁶⁸	CNN-Conv1D-LSTM	non-Dementia versus moderate Dementia		0.9868	99.95	—	—	99.97	98.72	99.32
		non-Dementia versus very mild Dementia		0.999	99.88	—	—	99.85	99.9	99.87
		non-Dementia versus mild Dementia		0.9997	99.96	—	—	99.91	99.97	99.94
		very mild Dementia versus mild Dementia		0.9981	99.89	—	—	99.93	99.81	99.87
		very mild Dementia versus moderate Dementia		0.9868	99.93	—	—	99.96	98.68	99.32
		mild Dementia versus moderate Dementia		0.9991	99.83	—	—	98.72	99.91	99.30
He et al. (2024)⁶⁹	Spatiotemporal graph transformer network	AD versus HC		0.9845	92.3	95.81	94.00	—	—	—
He et al. (2024)⁶⁹	Spatiotemporal graph transformer network	EMCI versus LMCI		0.9288	84.78	87.17	84.55	—	—	—
Khatri and Kwon (2024)⁷⁰	Optimized convolution ViT (CViT)	AD versus HC		—	95.37	91.09	100	—	—	—
Khatri and Kwon (2024)⁷⁰	Optimized convolution ViT (CViT)	MCI versus HC		—	92.15	89.92	94.56	—	—	—
Lei et al. (2024)⁷¹	Brain-region Attention Network	AD versus HC		0.9623	90.2	89.86	92.45	—	—	—
		MCI versus HC		0.7593	66.55	66.67	70.64	—	—	—
		AD versus MCI		0.8158	72.13	70.56	73.50	—	—	—
Liu et al. (2024)⁷²	Hierarchical Attention-based Multi-modal Fusion model (HAMMF)	AD versus HC		0.9315 ± 0.0208	93.15 ± 2.01	—	—	93.57 ± 2.00	93.15 ± 1.92	93.14 ± 1.96
Lu et al. (2024)⁷³	Hierarchical Attention-Based Multimodal Fusion Model	sMCI versus pMCI		0.913	87.2	88.8	85.4	—	—	88.4
Ma et al. (2024) ⁷⁴	GCN	HC versus EMCI		0.93 ± 0.04	92.2 ± 1.1	92.2 ± 1.3	92.3 ± 1.5	—	—	—
Ma et al. (2024) ⁷⁴	GCN	EMCI versus LMCI		0.92 ± 0.04	91.3 ± 1.3	91.7 ± 1.6	90.8 ± 1.1	—	—	—
Qiu et al. (2024) ⁷⁵	Multimodal diagnosis network based on multi-fusion and disease-induced learning (MDL-Net)	AD versus HC		0.9848 ± 0.0144	96.37 ± 2.52	97.40 ± 3.50	95.38 ± 3.66	—	—	—
		MCI versus HC		0.7608 ± 0.0336	73.61 ± 3.58	73.01 ± 10.97	73.02 ± 11.06	—	—	—
		AD versus MCI		0.8816 ± 0.0551	85.29 ± 4.83	93.09 ± 4.81	72.00 ± 11.13	—	—	—
Tian et al. (2024) ⁷⁹	Multi-scale fully separable CNN with large kernels	EMCI versus LMCI		0.9393	95.87	93.31	—	—	—	—
Wang et al. (2024)⁸⁰	Unsupervised cross-modal synthesis network + Interpretable diagnosis network based on fully 2D convolutions	AD versus HC in	ADNI	0.933	87.5	87.5	87.4	—	—	87.4
			AIBL	0.947	87.9	86.7	90.7	—	—	82.7
			NACC	0.919	85.9	86.3	86.5	—	—	85.6
Xu et al. (2024)⁸¹	Logits-constraint attention and graph-based multi-scale fusion model	AD versus HC		0.9493 ± 0.0174	93.02 ± 1.57	—	—	—	—	91.00 ± 2.11
		AD versus MCI		0.7512 ± 0.0338	73.75 ± 1.89	—	—	—	—	60.76 ± 3.71
		MCI versus HC		0.7524 ± 0.0259	71.02 ± 2.27	—	—	—	—	70.15 ± 2.27
Young et al. (2024)⁸²	myCog	HC versus Cognitive impairment		0.67	—	—	—	—	—	—
Zuo et al. (2024) ⁸³	Prior-guided adversarial learning with hypergraph (PALH) model	EMCI versus HC		0.9959	96.47	98.43	94.87	—	—	—
		LMCI versus HC		0.9483	92.20	96.05	88.46	—	—	—
		AD versus HC		0.9272	87.5	92.68	82.05	—	—	—

Table 10.

Specific performance indicators for the studies focusing on multi-class classification.

Ref.	Data modelling approaches	Objectives	Performance
Ref.	Data modelling approaches	Objectives	AUC	ACC (%)	Sensitivity (%)	Specificity (%)	Precision (%)	Recall (%)	F1-score (%)
Bhagwat et al. (2018)³⁰	Longitudinal Siamese ANN	HC versus EMCI versus LMCI	0.883	72.4	—	—	—	—	—
Bucholc et al. (2019)³³	SVM	HC versus QCI versus AD	0.949	83	100 versus 69.2 versus 100	76.7 versus 100 versus 97.7	—	—	—
El-Sappagh et al. (2021)²²	RF	HC versus MCI versus AD	—	93.33	—	—	100 versus 86.79 versus 100	86.67 versus 100 versus 89.66	—
Oltu et al. (2021)⁴⁶	Bagged trees	HC versus MCI versus AD	0.99	96.50	96.21	97.96	—	—	—
Venugopalan et al. (2021)⁴⁸	Ensemble DL model	HC versus MCI versus AD	—	78	—	—	77	78	78
El-Sappagh et al. (2022)⁵²	LSTM	HC versus MCI versus AD	—	91.22	—	—	91.28	91.84	91.22
Ullah and Jamjoom (2022)⁵⁶	CNN	Non-Dementia versus very mild Dementia versus mild Dementia versus moderate Dementia	—	99.38	—	—	100 versus 99 versus 100 versus 100	99 versus 100 versus 98 versus 100	100 versus 99 versus 99 versus 100
Almohimeed et al. (2023)⁵⁷	multi-level stacking model	HC versus sMCI versus AD	—	90.03	—	—	90.19	90.03	90.05
Emmanuel and Jabez (2023)⁶²	AANFIS	HC versus MCI versus AD	—	97.5 versus 92.5 versus 94.5	94.5 versus 89.5 versus 91.5	—	—	—	—
Tomassini et al. (2023)⁶⁶	ConvLSTM	HC versus MCI versus AD	0.9527 versus 0.9012 versus 0.9391	—	—	—	—	—	—
Khatri and Kwon (2024)⁷⁰	Optimized convolution ViT (CViT)	HC versus MCI versus AD	—	94.31	97.14	94.11	—	—	—
Ma et al. (2024)⁷⁴	GCN	HC versus EMCI versus LMCI	0.89 ± 0.03	85.5 ± 1.0	91.7 ± 1.5	79.2 ± 0.7	—	—	—
Tang et al. (2024)⁷⁸	LightGBM	HC versus MCI versus AD	0.8692	73.56	—	—	—	—	73.57
Xu et al. (2024)⁸¹	Logits-constraint attention and graph-based multi-scale fusion model	HC versus MCI versus AD	0.7660 ± 0.0186	61.42 ± 2.39	—	—	—	—	61.35 ± 2.53

Table 11.

Specific performance indicators for the studies focusing on the prediction of MCI conversions.

Ref.	Data modelling approaches	Predicting objectives		Performance
Ref.	Data modelling approaches	Predicting objectives		AUC	ACC (%)	Precision (%)	Recall (%)	F1-score (%)	Error
Canavan et al. (2021)⁴²	Gaussian Hidden Markov Model	MCI-to-AD	5 visit group	—	82	75	93	—	—
			4 visit group	—	73	57	100	—	—
			3 visit group	—	71	52	93	—	—
El-Sappagh et al. (2021) ²²	RF	aMCI-to-AD in 2.5 years		—	91.86	91.7	91.7	91.84	—
Buegler et al. (2020)³⁵	XGBoost	MCI-to-AD		0.92 ± 0.03	86 ± 2	83 ± 4	—	83 ± 3	—
Araújo et al. (2022)⁴⁹	LightGBM	MCI-to-AD		0.91 ± 0.01	91	84	98	—	—
Chun et al. (2022)⁵¹	XGBoost	aMCI-to-AD in 3 years		0.852	80.70	—	—	—	—
El-Sappagh et al. (2022)⁵²	LSTM	The specific month of MCI-to-AD		—	—	—	—	—	MAE: 0.1375; MSE: 0.0538; RMSE: 0.2318
Rahim et al. (2023)⁶⁴	CNN-Bi-LSTM	HC-to-MCI-to-AD		0.96 ± 0.01	95 ± 1	—	—	—	—
Yi et al. (2023)⁶⁷	DL-based survival clustering model	HC-to-MCI	in 1 year	0.708	—	—	—	—	—
			in 3 years	0.802	—	—	—	—	—
			in 5 years	0.876	—	—	—	—	—
			in 10 years	0.886	—	—	—	—	—
		MCI-to-AD	in 1 year	0.810	—	—	—	—	—
			in 3 years	0.914	—	—	—	—	—
			in 5 years	0.957	—	—	—	—	—
			in 10 years	0.979	—	—	—	—	—
Rahim et al. (2024)⁷⁶	Ensemble of CNN + LSTM + GRU	HC-to-AD	in ADNI	0.9704 ± 0.0212	95.83 ± 2.32	96.11 ± 1.14	96.24 ± 1.38	96.22 ± 1.30	—
Rahim et al. (2024)⁷⁶	Ensemble of CNN + LSTM + GRU	HC-to-AD	in NACC	0.8844 ± 0.0334	88.85 ± 2.15	90.14 ± 3.04	91.24 ± 3.53	89.97 ± 2.43	—

In binary classification studies, the highest ACC for AD versus HC is 99.38%³⁸ and the highest AUC is 0.9981,⁶⁶ while the average ACC is 89.19% over the 19 studies.^{32,34,37,38,43,45,50,53,54,57,60,61,63,65,72,75,80,81,83} For MCI versus HC, the highest ACC is 96.47%⁸³ and the average ACC is 84.59% over the 12 studies,^{30,41,43,47,59,61,65,66,70,71,74,83} both of which are lower than the corresponding values for AD versus HC. This may be because MCI is harder to distinguish from HC than AD is. In the multi-class classification studies, the highest ACC for HC versus MCI versus AD is 96.50%,⁴⁶ and the average ACC is 87.16% over the 10 studies.^{22,33,46,52,57,61,62,70,78,81}

Furthermore, the 9 prediction studies address the conversions from HC to MCI and from MCI to AD. Seven of them predict the conversion in patients (AUC of 0.963 and ACC of 91.86% for MCI to AD²²; ACC of 82% for MCI to AD⁴²; AUC of 0.852 and ACC of 80.7% for MCI to AD⁵¹; AUC of 0.96 and ACC of 95% for HC to AD⁶⁴; AUC of 0.92 and ACC of 86% for MCI to AD³⁵; AUC of 0.91 and ACC of 91% for MCI to AD⁴⁹; AUC of 0.9704 and ACC of 95.83% for MCI to AD⁷⁶). One aims to predict the specific time of the conversion (mean absolute error (MAE) of 0.1375, mean squared error (MSE) of 0.0538, and root mean squared error (RMSE) of 0.2318).⁵² The remaining one simultaneously addresses the conversions of HC-to-MCI and MCI-to-AD (AUC of 0.708, 0.802, 0.876 and 0.886 for predicting HC-to-MCI, and AUC of 0.810, 0.914, 0.957 and 0.979 for predicting MCI-to-AD in 1, 3, 5 and 10 years, respectively).⁶⁷
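Because the tables above mix several indicators, it may help to recall how they relate to one another. The following is a minimal sketch, not taken from any of the reviewed studies, that derives the classification indicators from a 2×2 confusion matrix and the error indicators from paired predictions. The function names and the example counts are assumptions chosen for illustration. Note that recall is the same quantity as sensitivity, and PPV is the same quantity as precision.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Derive common indicators from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)              # identical to PPV
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def regression_errors(y_true, y_pred):
    """MAE, MSE and RMSE for paired true and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Since RMSE is the square root of MSE, the MSE of 0.0538 and RMSE of 0.2318 quoted above for the time-of-conversion study agree with each other up to rounding.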

System explainability and model interpretability

With the continual development of ML technologies, some well-trained models can approach 100% classification accuracy in theory.¹⁰⁴ However, model interpretability is arguably more crucial, because clinicians need to understand the diagnostic results from a pathological point of view and explain the cause of the disease to the patient.¹⁰⁵ Moreover, the black-box mechanism of some ML methods may conceal bias and unfairness even when the decision itself is correct.¹⁰⁶ Only an interpretable model can help uncover the pathogenesis and thus validate pathological knowledge. Therefore, many current CCDSS are committed to improving the interpretability of the embedded model.

To achieve this, various explainable AI (XAI) techniques and methods have been developed.¹⁰⁷ For instance, the Shapley additive explanations (SHAP) method uses additive attribution to convert SHAP values from the machine learning feature space to the clinical variable space.¹⁰⁸ This transformation improves the interpretability of previously difficult-to-explain algorithms or models, and it can explain diagnostic results by visualizing the contributions of different features.^{109,110} Furthermore, the local interpretable model-agnostic explanations (LIME) method, which can be applied to any ML model, analyzes and explains the prediction outcomes of individual samples.¹¹¹ In addition, DL frameworks are supported by XAI libraries, such as Captum for PyTorch and tf-explain for TensorFlow, for interpreting DL models. El-Sappagh et al.¹¹² referred to the methods used for explaining model results as post-hoc XAI techniques, which imitate or simulate the behavior of the model in order to provide explanations. Post-hoc XAI offers both global and local explanations.
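The additive attribution idea behind SHAP can be illustrated with a brute-force computation of exact Shapley values for a single sample. This is a minimal sketch under stated assumptions: it does not use the SHAP package (which relies on faster approximations), the function `shapley_values` is written here for illustration, features outside a coalition are simply replaced by a baseline value, and the toy linear risk score and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch; other choices are possible).
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical linear risk score over three made-up features.
def toy_score(z):
    return 0.6 * z[0] - 0.1 * z[1] + 0.05 * z[2]


sample = [4.0, 22.0, 10.0]
reference = [0.5, 28.0, 1.0]
contributions = shapley_values(toy_score, sample, reference)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that makes per-feature bar or waterfall visualizations meaningful. The exhaustive enumeration grows exponentially with the number of features, so it is only practical for very small inputs.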

In practice, explainability and interpretability are two concepts that differ in level of detail. The former focuses on explaining the decisions made by the CCDSS, while the latter focuses on understanding the inner workings of the embedded models. Notably, there is no standard definition of interpretability, because models can be interpreted at different levels of detail and the criteria used to evaluate model performance vary across models. Tjoa and Guan¹¹³ reported that, if the interpretations provided can achieve better task performance, then the model is considered highly interpretable, regardless of whether it is a traditional ML or DL model. For example, some medical segmentation works constitute a visual interpretation for further diagnosis or prognosis of the disease. They asserted that the proposed opaque models are deemed acceptable because medical knowledge is uncertain and incomplete. This implies that achieving high diagnostic accuracy is prioritized over interpretability. However, in the medical field, where clinical decisions carry significant accountability, clinicians are required to meticulously consider additional details. In some critical situations, a decision made by the CCDSS without interpretations and explanations may not be fully relied upon. Until the black box is opened, decision-making always carries certain risks. Hence, interpretability research remains essential and is becoming even more crucial.

The next two subsections review system explainability and model interpretability in more detail. The XAI categorization is shown in Figure 6. Table 12 identifies which of the 55 CCDSS studies are developed based on XAI techniques and/or provide explainable results using an interpretable model. Of the 55 studies included, 44 offer diverse explanations from multiple perspectives and levels. These studies aim to provide doctors and users with valuable, practical insights, enabling them to make more accurate and informed judgments and decisions in AD management.

Figure 6.

Classification diagram of XAI technology.

Table 12.

Post-hoc explainability and model interpretability in terms of the 55 CCDSS.

Ref.	Post-hoc explainability	Model interpretability	Total
Bhagwat et al. (2018)³⁰	×	×	×
Rhodius-Meester et al. (2018)³¹	✔	×	✔
Angelillo et al. (2019)³²	×	×	×
Bucholc et al. (2019)³³	✔	✔	✔
Lazli et al. (2019)³⁴	×	×	×
Buegler et al. (2020)³⁵	×	✔	✔
Barnes et al. (2020)³⁶	×	✔	✔
Carvalho et al. (2020)³⁷	×	✔	✔
Müller and Lio (2020)³⁸	×	✔	✔
Rhodius-Meester et al. (2020)³⁹	✔	×	✔
Saribudak et al. (2020)⁴⁰	×	×	×
Cai et al. (2021)⁴¹	×	×	×
Canavan et al. (2021)⁴²	×	×	×
Dyrba et al. (2021)⁴³	✔	×	✔
Shoaip et al. (2021)⁴⁴	×	✔	✔
El-Sappagh et al. (2021)²²	✔	✔	✔
Kachouri et al. (2021)⁴⁵	✔	×	✔
Oltu et al. (2021)⁴⁶	×	✔	✔
Suárez-Araujo et al. (2021)⁴⁷	×	×	×
Venugopalan et al. (2021)⁴⁸	✔	×	✔
Araújo et al. (2022)⁴⁹	✔	×	✔
Chen et al. (2022)⁵⁰	×	✔	✔
Chun et al. (2022)⁵¹	✔	✔	✔
El-Sappagh et al. (2022)⁵²	×	×	×
Ilias and Askounis (2022)⁵³	✔	×	✔
Salami et al. (2022)⁵⁴	×	×	×
Reinke et al. (2022)⁵⁵	×	✔	✔
Ullah and Jamjoom (2022)⁵⁶	×	×	×
Almohimeed et al. (2023)⁵⁷	✔	×	✔
Bhattarai et al. (2023)⁵⁸	×	×	×
Chai et al. (2023)⁵⁹	✔	×	✔
Chen et al. (2023)⁶⁰	✔	×	✔
Di Febbo et al. (2023)⁶¹	✔	×	✔
Emmanuel and Jabez (2023)⁶²	×	✔	✔
Moguilner et al. (2023)⁶³	✔	×	✔
Rahim et al. (2023)⁶⁴	✔	×	✔
Park et al. (2023)⁶⁵	✔	×	✔
Tomassini et al. (2023)⁶⁶	×	×	×
Yi et al. (2023)⁶⁷	✔	×	✔
Ayus and Gupta (2024)⁶⁸	×	×	×
He et al. (2024)⁶⁹	✔	×	✔
Khatri and Kwon (2024)⁷⁰	✔	×	✔
Lei et al. (2024)⁷¹	✔	×	✔
Liu et al. (2024)⁷²	✔	×	✔
Lu et al. (2024)⁷³	✔	×	✔
Ma et al. (2024)⁷⁴	✔	×	✔
Qiu et al. (2024)⁷⁵	✔	×	✔
Rahim et al. (2024)⁷⁶	✔	×	✔
Sun et al. (2024)⁷⁷	×	×	×
Tang et al. (2024)⁷⁸	✔	×	✔
Tian et al. (2024)⁷⁹	✔	×	✔
Wang et al. (2024)⁸⁰	✔	×	✔
Xu et al. (2024)⁸¹	✔	×	✔
Young et al. (2024)⁸²	×	×	×
Zuo et al. (2024)⁸³	✔	×	✔

Explainability of the CCDSS

Many research efforts have been made to design explainable CCDSS for AD. Some studies use XAI to conduct explanatory analysis of the models' results. Chun et al.⁵¹ implemented a graphical interface that presents both global and local explanations of the model. The global explanations are obtained through feature importance and the partial dependence plot (PDP): feature importance is determined by observing the decrease in performance caused by randomly shuffling the values of a specific feature, and PDPs illustrate how the average predicted value changes when a specific feature varies across its marginal distribution.¹¹⁴ Local explanations are obtained through individual conditional expectation (ICE) plots, break-down charts, and SHAP. The ICE plots show the local behavior of the model by holding the other features constant at specific values.¹¹⁵ These plots generate multiple curves, each representing the conditional expectation of a feature for an individual sample. The break-down chart illustrates the contribution of each feature to the predictive result, while SHAP explains a single sample by calculating the contribution of each feature to the result. Judging from the global and local explanations, the CDRSB score is the most important feature because it is more objective than the clinical diagnosis. The two-stage model of El-Sappagh et al.²² implemented the SHAP method for both global and local explanations, with global feature importance calculated from the SHAP values. In the first stage, the most influential feature is CDRSB, followed by MMSE; in the second stage, FAQ plays the major role and ADNI_MEM (a composite logical memory score for the 8 longitudinal changes in memory) is the runner-up feature. In addition, supplementary explanations of the RF model results were generated based on DT and a fuzzy unordered rule induction algorithm, which enhance the interpretability of and confidence in the model.

Di Febbo et al.⁶¹ used the SHAP method to analyze the importance of partial features in ROCF, making their model easier to understand. Almohimeed et al.⁵⁷ utilized the SHAP method to conduct feature contribution analysis for decision-making, revealing the decisive features in the diagnosis of AD versus HC, as well as AD versus MCI versus HC. Yi et al.⁶⁷ attempted to unveil patterns in conversion predictions using SHAP and employed it as an indicator of feature importance in pattern assignment. Ilias and Askounis⁵³ constructed a diagnostic model using text data transcribed from audio recordings of AD patients. They utilized LIME to explain the diagnostic results of the best-performing model and to provide further information on language differences between AD and non-AD individuals.
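The global and local techniques described above (permutation-based feature importance, PDP, and ICE) can be sketched in a few lines. This is an illustrative sketch only, not the implementation of any reviewed study: the helper names, the toy scoring function, and the four-row dataset are assumptions, and a real analysis would average the permutation importance over several shuffles.

```python
import random


def permutation_importance(predict, X, y, feature, seed=0):
    """Drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    column = [r[feature] for r in X]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(X, column)]
    return accuracy(X) - accuracy(shuffled)


def ice_curves(score, X, feature, grid):
    """One curve per sample: vary one feature, hold the others fixed."""
    return [
        [score(r[:feature] + [g] + r[feature + 1:]) for g in grid]
        for r in X
    ]


def partial_dependence(score, X, feature, grid):
    """PDP is the pointwise average of the ICE curves."""
    curves = ice_curves(score, X, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


# Hypothetical toy model: the score depends on feature 0 only.
def toy_score(row):
    return 2 * row[0]


def toy_predict(row):
    return int(toy_score(row) > 2)


X_toy = [[0.0, 5.0], [1.0, 3.0], [2.0, 9.0], [3.0, 1.0]]
y_toy = [0, 0, 1, 1]
```

In this toy setting, shuffling the unused feature 1 leaves accuracy unchanged, so its importance is zero, and its ICE curves are flat. The PDP for feature 0 follows the grid values scaled by the model coefficient.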

Furthermore, some studies conduct explainability analysis of the results through custom visual interfaces. Dyrba et al.⁴³ designed interactive visualization software that explains the results of the 3D CNN model by generating a set of 2D relevance maps for each result and then capturing the brain regions related to AD. The visualization of the results shows that hippocampal atrophy carries important information indicating AD and that the atrophy of other cortical and sub-cortical regions also contributes. Moguilner et al.⁶³ applied occlusion sensitivity to analyze sMRI images, identifying and visualizing the most relevant brain regions for the classification of AD and HC. Chen et al.⁶⁰ and Chai et al.⁵⁹ visualized brain activity by utilizing the power spectral density of different EEG frequency bands as features. Rahim et al.⁶⁴ utilized guided gradient-weighted class activation mapping (Grad-CAM), a technique used in computer vision and DL to visualize the image regions associated with specific class predictions, to provide information about the exact voxels involved in making accurate decisions within the CNN-Bi-LSTM model. Additionally, other views highlighting the same brain regions of interest are presented from different angles, offering a comprehensive explanation of the DL model. In 2018, Rhodius-Meester et al.³¹ provided a disease state index (DSI) fingerprint of the model results for each patient, which combines all available data from that patient. The DSI value is presented both numerically and visually in color. The DSI fingerprint displays the correlation between features and disease through boxes of different sizes, with larger boxes indicating a greater contribution of that feature to classifying sSCD and pSCD. Rhodius-Meester et al.³⁹ designed a visual interface to explain the results of simulated CSF in order to determine whether the actual CSF examination is necessary for further diagnosing AD. They set up a probability of correct class value and analyzed its impact on the diagnosis of AD. Another visual interface can depict the severity of AD along with the corresponding cognitive and functional assessment scores.³³
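Occlusion sensitivity, mentioned above, is conceptually simple: mask one region of the input at a time and record how much the model output drops. The sketch below illustrates this on a small 2D grid with a hypothetical scoring function. It is an assumption-laden toy, whereas the reviewed study works with sMRI volumes and a trained network, and the function name `occlusion_map` is introduced here for illustration.

```python
def occlusion_map(score, image, patch, fill=0.0):
    """Heat map of score drops when each patch is replaced by `fill`."""
    height, width = len(image), len(image[0])
    base = score(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy before masking
            for i in rows:
                for j in cols:
                    occluded[i][j] = fill
            drop = base - score(occluded)
            for i in rows:
                for j in cols:
                    heat[i][j] = drop
    return heat


# Hypothetical model that only "looks at" the top-left 2x2 region.
def toy_region_score(img):
    return sum(img[i][j] for i in range(2) for j in range(2))


toy_image = [[1.0] * 4 for _ in range(4)]
toy_heat = occlusion_map(toy_region_score, toy_image, patch=2)
```

Regions whose occlusion causes a large drop are the ones the model relies on. In the toy case only the top-left patch has a non-zero value in the heat map.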

In addition, Venugopalan et al.⁴⁸ ranked features by the decrease in model accuracy to determine their importance and gain insights into the model's decision-making process for the purpose of explainability. Kachouri et al.,⁴⁵ on the other hand, employed principal component analysis to pinpoint the more representative aspects of the features, thereby gauging how effectively the model has learned from them. Carvalho et al.,³⁷ meanwhile, assessed feature importance by computing the certainty factor, revealing that the Clinical Dementia Rating (CDR) scale score is particularly significant for diagnosing AD, with a certainty factor reaching 0.99.

Interpretability of the model

Linear models (e.g., linear regression, logistic regression (LR), generalized additive models, and (semantic) fuzzy models), discretization models (e.g., rule-based models, DT, or Bayesian network (BN)), and example-based models (including k-nearest neighbors (KNN) or case-based reasoning models) are generally considered interpretable.¹¹² More than half of the 55 studies chose traditional ML algorithms because they are more interpretable than DL. It is particularly noteworthy that ensemble algorithms based on the DT were widely used in these studies, as the DT produces rules and results that are, to a certain extent, easy to understand, and feature importance can be calculated from the DT structure. Methods such as bagged trees for the classification of HC versus MCI versus AD,⁴⁶ XGBoost for predicting aMCI to AD,⁵¹ and RF for the classification of AD versus HC³⁸ are employed to calculate feature importance in order to explain the diagnostic results. The SVM method, although less intuitively understandable than RF, is still used in CCDSS for AD,^{33,41,59} because an SVM-based model is easier to interpret than a DL-based one, for example by understanding feature weights through feature importance or understanding decision boundaries through visualizing the support vectors.

Some studies achieve interpretability through rule-based models. Mendel and Bonissone¹¹⁶ proposed a rule-based system that is considered an interpretable model. Müller and Lio³⁸ obtained the rule library by extracting the construction rules of RF, in which the rules usually take the form of “IF”, “THEN” and “ELSE”. By visualizing the rules behind the predictive results, the model becomes interpretable and the output of the system becomes explainable. Chen et al.⁵⁰ provided a fuzzy rule base and inference process, which illustrates each rule using Gaussian membership functions and linguistic terms represented by fuzzy numbers. The rule visualization provided by Emmanuel and Jabez⁶² is similar to that of Chen et al.⁵⁰
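To make the idea of extracting IF-THEN rules from tree-based models concrete, the sketch below walks a single decision tree and emits one rule per leaf. It is an illustration under stated assumptions and not the procedure of any reviewed study: the nested-dictionary tree format, the feature names, and the thresholds are hypothetical and carry no clinical meaning. For an RF, the same walk would be repeated for every tree in the ensemble.

```python
def extract_rules(node, conditions=()):
    """Return one IF-THEN rule per leaf of a binary decision tree."""
    if "label" in node:  # leaf node
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    test = f"{node['feature']} <= {node['threshold']}"
    negated = f"{node['feature']} > {node['threshold']}"
    return (
        extract_rules(node["left"], conditions + (test,))
        + extract_rules(node["right"], conditions + (negated,))
    )


# Hypothetical tree; thresholds are made up and not clinical cut-offs.
toy_tree = {
    "feature": "cognitive_score",
    "threshold": 24,
    "left": {
        "feature": "functional_score",
        "threshold": 2.5,
        "left": {"label": "MCI"},
        "right": {"label": "AD"},
    },
    "right": {"label": "HC"},
}

rules = extract_rules(toy_tree)
for rule in rules:
    print(rule)
```

Each path from the root to a leaf becomes a conjunction of threshold tests, which is what allows a clinician to read the decision logic directly.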

However, deep models, such as the ensemble model consisting of ResNet18⁵⁴ and the counter propagation network (CPN),⁴⁷ lack inherent interpretability; they are more complex and harder to understand than traditional ML models. This lack of model transparency remains a significant obstacle to implementing CCDSS in clinical practice.¹¹⁷ When clinicians use non-interpretable models in clinical practice, concerns may still arise regarding the decision outcomes, despite the models' high accuracy.

GUI design review

The factors to consider when evaluating a CCDSS include its usability, specifically how easy it is to use and how much training it requires.¹¹⁸ One of the most straightforward aspects to assess is the GUI design tailored for non-technical stakeholders, because the GUI gives stakeholders the means to engage with, and help shape the evolution of, CCDSS that have the potential to impact AD-related decision-making and intervention strategies.

When evaluating the GUI associated with a CCDSS, we focus on several key aspects. First, we examine the overall design of the interface, e.g., whether it includes patient demographic information. Second, we look at how decision support is presented and whether a clear decision is provided. Third, we assess whether the interface provides explainability analysis for the various clinical tasks and whether the explanations are simple and easy to understand. After all, a CCDSS is an encapsulated design that can appear as a black box, especially to healthcare professionals, who need to understand the decision-making process in order to have greater confidence in it and to plan precisely personalized treatment. Lastly, we consider the usability of the interface by examining whether it is intuitive and friendly enough for new users to get started quickly. Taking all these aspects into consideration, our findings on the GUI designs available in 11 CCDSS are summarized in Table 13.

Table 13.

Evaluation results of GUI design available in 11 CCDSS.

Ref.	Patient information display	Clear diagnosis result	Interface for explainability of results	Is the explanation easy to understand?	Friendly usability
Rhodius-Meester et al. (2018)³¹	×	×	✔	✔	×
Bucholc et al. (2019)³³	✔	✔	✔	✔	✔
Carvalho et al. (2020)³⁷	×	✔	✔	×	✔
Müller and Lio (2020)³⁸	×	✔	✔	×	✔
Rhodius-Meester et al. (2020)³⁹	×	×	✔	×	×
Dyrba et al. (2021)⁴³	×	✔	✔	✔	✔
El-Sappagh et al. (2021)²²	×	✔	✔	✔	✔
Chun et al. (2022)⁵¹	×	✔	✔	✔	✔
Salami et al. (2022)⁵⁴	×	✔	×	×	✔
Ayus and Gupta (2024)⁶⁸	×	✔	×	×	✔
Young et al. (2024)⁸²	×	×	×	×	✔

Results summary

This study evaluates 55 CCDSS based on data modalities, computational modeling, explainability and interpretability, research priorities, and GUI. These systems serve various functions, including diagnosis, disease management, and prescription support, among others.¹¹⁹ Among them, 44 focus on classifying AD, MCI, and HC, while 9 predict the progression of MCI to AD. Current CCDSS research prioritizes diagnosis and prognosis, with limited focus on other decision-support areas: only two studies recommend drugs, one suggests examinations, and another offers care recommendations. Notably, some systems integrate both diagnosis and prognosis, enhancing clinical decision-making.^22,53

Using data from multiple datasets not only helps avoid data bias but also, by increasing the data volume, helps avoid overfitting of the modeling methods.¹²⁰ However, fewer than a third of the CCDSS use multiple datasets. Most use public ADNI data, while some rely on hospital or memory clinic datasets.

In terms of explainability and interpretability, traditional ML, favored for its interpretability over DL, remains widely used. Early CCDSS were expert systems requiring extensive domain knowledge, but modern approaches favor data-driven models, which necessitates careful consideration of explainability and interpretability. This review examines interpretability methods, including Grad-CAM for image visualization and SHAP/LIME for feature attribution, which aid clinicians in understanding model predictions.
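SHAP builds on Shapley values, which divide the difference between a prediction and a reference prediction among the input features. The sketch below computes exact Shapley values by enumerating feature coalitions for a toy model; it does not use the SHAP library, and the linear risk score, its weights, and the patient values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature.

    Features outside a coalition are set to their baseline value. The cost is
    exponential in the number of features, so this only suits toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical linear risk score over three made-up features.
WEIGHTS = [0.5, -0.25, 1.0]

def risk_score(features):
    return sum(w * v for w, v in zip(WEIGHTS, features))

patient = [2.0, 4.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(risk_score, patient, reference)
```

For a linear model each attribution reduces to the weight times the deviation from the reference, and the attributions sum to the difference between the patient's score and the reference score, which is the property that makes such values convenient to display to clinicians.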

In addition, interoperability and data security remain challenges for CCDSS across different medical systems, with only two studies taking them into account. Prior research has proposed incremental learning, which can incorporate new inputs and outputs without relearning the entire dataset.¹²¹ Carvalho et al. (2020)³⁷ proposed a dynamic decision-making model that adapts to different medical centers by adding assessment data, which also improves its diagnostic capability. Furthermore, Lei et al. (2024)⁷¹ developed a federated learning framework to enhance data security without direct data sharing. Future CCDSS research should expand beyond diagnostic accuracy to improve explainability, interoperability, data security, and practical deployment.
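The principle behind federated learning, in which centers share model parameters rather than patient records, can be sketched with a sample-size-weighted parameter average in the style of federated averaging. This is a generic illustration and not the hybrid framework of Lei et al.; the parameter vectors and patient counts are hypothetical.

```python
def federated_average(site_weights, site_sizes):
    """Sample-size-weighted average of per-site parameter vectors."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[k] * size for w, size in zip(site_weights, site_sizes)) / total
        for k in range(n_params)
    ]

# Hypothetical parameters trained locally at three centers; only these vectors,
# never the patient records, are sent to the coordinating server.
local_models = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]
n_patients = [100, 100, 200]
global_model = federated_average(local_models, n_patients)
```

In a full system this averaging step would be repeated over many rounds, with each center continuing training from the aggregated parameters.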

Therefore, based on model construction requirements, result analysis, and clinical application levels, we classify these 55 CCDSS into grades A, B, C, and X, evaluating them across five dimensions: dataset scale and diversity, reliability of reported performance, interpretability, user-friendly GUI, and interoperability and data security of multi-center systems. The grades are explained in the top panel of Table 14. Notably, only Carvalho et al. (2020)³⁷ and Lei et al. (2024)⁷¹ have taken into account interoperability and data security in multi-center systems. However, Lei et al. (2024)⁷¹ do not report a GUI. Overall, the study by Carvalho et al. (2020)³⁷ demonstrates excellent clinical translational potential, as evidenced by its grading of ABBBC in Table 14.

Table 14.

Summary grades of 55 CCDSS. N.A.: not applicable.

	Datasets Scale & Diversity	Result Reliability	Explainability & Interpretability	GUI	Interoperability & Data Security
A	Multiple data sets / Data size exceeds 1000	Report four or more metrics / CV / extra test set	Both explainability & interpretability	Fulfill all metrics in Table 13	N.A.
B	Data size exceeds 300	Report 2–3 metrics	Either one	≥3 metrics	N.A.
C	Data size less than 300	Reported	N.A.	1–2 metrics	Reported
X	Not reported
Bhagwat et al. (2018)³⁰	A	B	X	X	X
Rhodius-Meester et al. (2018)³¹	A	A	B	C	X
Angelillo et al. (2019)³²	C	B	X	X	X
Bucholc et al. (2019)³³	B	B	A	A	X
Lazli et al. (2019)³⁴	A	A	X	X	X
Buegler et al. (2020)³⁵	B	A	B	X	X
Barnes et al. (2020)³⁶	B	B	B	X	X
Carvalho et al. (2020)³⁷	A	B	B	B	C
Müller and Lio (2020)³⁸	C	B	B	B	X
Rhodius-Meester et al. (2020)³⁹	B	C	B	C	X
Saribudak et al. (2020)⁴⁰	C	C	X	X	X
Cai et al. (2021)⁴¹	C	B	X	X	X
Canavan et al. (2021)⁴²	B	B	X	X	X
Dyrba et al. (2021)⁴³	A	A	B	B	X
Shoaip et al. (2021)⁴⁴	C	C	B	X	X
El-Sappagh et al. (2021)²²	B	B	A	B	X
Kachouri et al. (2021)⁴⁵	C	A	B	X	X
Oltu et al. (2021)⁴⁶	C	B	B	X	X
Suárez-Araujo et al. (2021)⁴⁷	B	B	X	X	X
Venugopalan et al. (2021)⁴⁸	A	B	B	X	X
Araújo et al. (2022)⁴⁹	B	A	B	X	X
Chen et al. (2022)⁵⁰	B	C	B	X	X
Chun et al. (2022)⁵¹	B	B	A	B	X
El-Sappagh et al. (2022)⁵²	A	B	X	X	X
Ilias and Askounis (2022)⁵³	C	A	B	X	X
Salami et al. (2022)⁵⁴	A	B	X	C	X
Reinke et al. (2022)⁵⁵	A	B	B	X	X
Ullah and Jamjoom (2022)⁵⁶	A	B	X	X	X
Almohimeed et al. (2023)⁵⁷	A	B	B	X	X
Bhattarai et al. (2023)⁵⁸	A	C	X	X	X
Chai et al. (2023)⁵⁹	C	C	B	X	X
Chen et al. (2023)⁶⁰	C	B	B	X	X
Di Febbo et al. (2023)⁶¹	C	C	B	X	X
Emmanuel and Jabez (2023)⁶²	C	B	B	X	X
Moguilner et al. (2023)⁶³	A	A	B	X	X
Rahim et al. (2023)⁶⁴	B	A	B	X	X
Park et al. (2023)⁶⁵	A	B	B	X	X
Tomassini et al. (2023)⁶⁶	B	C	X	X	X
Yi et al. (2023)⁶⁷	A	C	B	X	X
Ayus and Gupta (2024)⁶⁸	A	B	X	C	X
He et al. (2024)⁶⁹	A	B	B	X	X
Khatri and Kwon (2024)⁷⁰	A	B	B	X	X
Lei et al. (2024)⁷¹	A	B	B	X	C
Liu et al. (2024)⁷²	B	A	B	X	X
Lu et al. (2024)⁷³	B	B	B	X	X
Ma et al. (2024)⁷⁴	B	A	B	X	X
Qiu et al. (2024)⁷⁵	A	A	B	X	X
Rahim et al. (2024)⁷⁶	A	A	B	X	X
Sun et al. (2024)⁷⁷	A	C	X	X	X
Tang et al. (2024)⁷⁸	C	B	B	X	X
Tian et al. (2024)⁷⁹	B	B	B	X	X
Wang et al. (2024)⁸⁰	A	A	B	X	X
Xu et al. (2024)⁸¹	A	A	B	X	X
Young et al. (2024)⁸²	B	C	X	C	X
Zuo et al. (2024)⁸³	B	B	B	X	X
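The GUI column of Table 14 follows from the number of Table 13 criteria that each system meets. A minimal sketch of that mapping is shown below; the function name is hypothetical, and treating a system that meets no criterion as grade X is an assumption, since Table 14 defines X only as "not reported".

```python
def gui_grade(criteria_met):
    """Map the number of Table 13 criteria met (0 to 5) to a Table 14 GUI grade."""
    if criteria_met == 5:
        return "A"
    if criteria_met >= 3:
        return "B"
    if criteria_met >= 1:
        return "C"
    return "X"  # assumption: no criterion met is treated like "not reported"

# Check marks per study, copied from Table 13 (True = criterion met).
table13 = {
    "Bucholc 2019": [True, True, True, True, True],
    "Carvalho 2020": [False, True, True, False, True],
    "Rhodius-Meester 2020": [False, False, True, False, False],
}
grades = {study: gui_grade(sum(marks)) for study, marks in table13.items()}
```

For these three studies the computed grades agree with the GUI column of Table 14 (A, B, and C, respectively).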

Discussion

This work systematically extracts and reviews 55 studies on CCDSS for AD developed over the last seven years. These studies apply different data modeling approaches to different data modalities to assist clinicians in making decisions on tasks such as the diagnosis and prognosis of AD. Clinically, AD diagnosis often relies on a variety of information, such as cognitive and neuropsychological assessment scales, neuroimaging, and genetic test results. However, owing to the complexity and heterogeneity of AD pathology, AD is prone to clinical misdiagnosis. CCDSS built with AI technology can distinguish AD patients from healthy individuals relatively accurately and predict the progression of MCI to AD. This not only enhances clinicians’ confidence in diagnosis but also enables the design of personalized treatment plans through various CCDSS configurations, thereby achieving more precise prognostic outcomes and accelerating the care pathway.¹²²

The challenges of this review arise from the variety of data modalities and modeling methods, as well as the need to compare a wide range of studies with differing objectives. Several systematic reviews have focused on using ML to predict the progression of AD.^7,123,124 Similarly, reviews of ML and DL techniques for diagnosis explore how these technologies can improve diagnostic accuracy and efficiency.^125,126 These reviews typically focus on a single modality or goal. However, to the best of our knowledge, few systematic reviews address computerized decision support systems specifically designed for clinical practice in the AD domain. Our work helps fill this gap. By analyzing the screened articles from different angles, we have comprehensively considered the functionality, performance, and explainability of the CCDSS reported to date, as well as their clinical practicability and GUI availability. The review thus provides a reference and a direction for subsequent research and for improving the design of CCDSS.

However, challenges remain in clinical settings and local deployment. The challenges of implementing CCDSS in clinical settings, together with corresponding recommendations, are as follows:

Model transparency: When it comes to clinical translation, building a trustworthy AI-based CCDSS is essential, because doctors must feel confident in the decisions made by a CCDSS that directly affect people's lives. Currently, numerous XAI techniques are available for explaining decision outcomes. However, the acceptance of these explanations requires evaluation by clinical professionals, which relies on effective communication between technical experts and clinical practitioners.

The interoperability across different medical systems: Few studies have addressed interoperability in healthcare, but it must be resolved if CCDSS are to be adopted clinically. First, an authority or the relevant government departments can develop a unified standard so that the data structures of all hospitals are consistent. Second, an incremental model can be used to provide different personalized systems.

Being regularly refinable over time: The model needs to be refined regularly so that it can learn from new data and the latest trends in medicine, for example by retraining on clinical samples collected over a period of time.

Data security: If the system is deployed across multiple locations on a cloud server, data security must be ensured to prevent any leakage of patient information.

Although the analysis of the selected articles is as detailed as possible, this review has the following limitations: 1) non-English literature and some unobtainable literature are not included; 2) publications that focus only on the performance of algorithms are excluded. To address these limitations, our future work will consider incorporating studies that specifically focus on algorithm performance. If these studies demonstrate high-performance models and solutions with clinical usability, they will contribute significantly to the practical implementation of translational CCDSS in the medical domain.

To conclude, with the rise of precision medicine, CCDSS for AD have become an area of growing interest. This review highlights the potential of data-driven models to transform existing CCDSS. Challenges remain in integrating these CCDSS into clinical practice, particularly regarding interoperability, data security, and decision explainability. Overcoming them requires improving model generalization with diverse data and balancing the high accuracy of deep learning models against the need for clinical interpretability. Additionally, user-friendly graphical interfaces are crucial for practical application and must align with clinical needs. Despite these challenges, CCDSS for AD have great potential to improve diagnostic accuracy, efficiency, and patient outcomes. By addressing model transparency, interoperability, regular refinement, and data security, and by involving stakeholders in the design and evaluation process, future research can move closer to realizing the full potential of CCDSS in the medical domain.

Footnotes

Acknowledgments

The authors have no acknowledgments to report.

ORCID iDs

Fan Lin

Yuhua Wang

Xuemei Ding

Author contributions

Pinya Lu: Investigation; Methodology; Writing - original draft; Writing - review & editing.

Mingfeng Chen: Conceptualization; Investigation; Writing - review & editing.

Lili Chen: Investigation; Validation; Writing - review & editing.

Fan Lin: Investigation; Writing - review & editing.

Hongqin Yang: Investigation; Project administration; Supervision; Writing - review & editing.

Yuhua Wang: Formal analysis; Investigation; Writing - review & editing.

Xuemei Ding: Conceptualization; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Validation; Visualization; Writing - review & editing.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was supported by the Research Development Projects of Fujian Normal University, China (grant numbers DH-1736, DH-1711); Joint Funds for the Innovation of Science and Technology, Fujian province, China (grant number 2023Y9283); ARUK NI networking grant (grant number 71573R).

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

All data generated or analyzed during this study are included in this published article.

References

1. Long, Benoist, Weidner. World Alzheimer Report 2023: Reducing dementia risk: never too early, never too late. Report, Alzheimer’s Disease International, UK, September 2023.

2. World Health Organization. Dementia, https://www.who.int/zh/news-room/fact-sheets/detail/dementia (2023, accessed 23 October 2023).

3. Slot, Sikkes SAM, Berkhof, et al. Subjective cognitive decline and rates of incident Alzheimer’s disease and non-Alzheimer’s disease dementia. Alzheimers Dement 2019; 5: 465–476.

4. Alzheimer’s Association. 2024 Alzheimer’s disease facts and figures. Alzheimers Dement 2024; 20: 3708–3821.

5. Jack, Knopman, Jagust, et al. Hypothetical model of dynamic biomarkers of the Alzheimer's pathological cascade. Lancet Neurol 2010; 9: 119–128.

6. Keine, Walker, Kennedy, et al. Development, application, and results from a precision-medicine platform that personalizes multi-modal treatment plans for mild Alzheimer’s disease and at-risk individuals. Curr Aging Sci 2018; 11: 173–181.

7. Pellegrini, Ballerini, Hernandez MDCV, et al. Machine learning of neuroimaging for assisted diagnosis of cognitive impairment and dementia: a systematic review. Alzheimers Dement (Amst) 2018; 11: 519–535.

8. Busse, Hensel, Guhne, et al. Mild cognitive impairment: long-term course of four clinical subtypes. Neurology 2006; 67: 2176–2185.

9. Aisen, Petersen, Donohue, et al. Clinical core of the Alzheimer’s disease neuroimaging initiative: progress and plans. Alzheimers Dement 2010; 6: 239–246.

10. Edmonds, McDonald, Marshall, et al. Early versus late MCI: improved MCI staging using a neuropsychological approach. Alzheimers Dement 2019; 15: 699–708.

11. Jitsuishi, Yamaguchi. Searching for optimal machine learning model to classify mild cognitive impairment (MCI) subtypes using multimodal MRI data. Sci Rep 2022; 12: 4284.

12. Csukly, Sirály, Fodor, et al. The differentiation of amnestic type MCI from the non-amnestic types by structural MRI. Front Aging Neurosci 2016; 8: 52.

13. Grundman, Petersen, Ferris, et al. Mild cognitive impairment can be distinguished from Alzheimer disease and normal aging for clinical trials. Arch Neurol 2004; 61: 59–66.

14. Peterson, Roberts, Knopman, et al. Mild cognitive impairment: ten years later. Arch Neurol 2009; 66: 1447–1455.

15. Lin, et al. The clinical course of early and late mild cognitive impairment. Front Neurol 2022; 13: 685636.

16. Díaz-Mardomingo MDC, García-Herranz, Rodríguez-Fernández, et al. Problems in classifying mild cognitive impairment (MCI): One or multiple syndromes? Brain Sci 2017; 7: 111.

17. Castaneda, Nalley, Mannion, et al. Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J Clin Bioinforma 2015; 5: 4.

18. Wright, Hickman TTT, McEvoy, et al. Analysis of clinical decision support system malfunctions: a case series and survey. J Am Med Inform Assoc 2016; 23: 1068–1076.

19. Peiffer-Smadja, Rawson, Ahmad, et al. Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clin Microbiol Infect 2020; 26: 584–595.

20. Holzinger, Langs, Denk, et al. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip Rev Data Min Knowl Discov 2019; 9: e1312.

21. Bates, Levine, Syrowatka, et al. The potential of artificial intelligence to improve patient safety: a scoping review. NPJ Digital Med 2021; 4: 54.

22. El-Sappagh, Alonso, Islam, et al. A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer’s disease. Sci Rep 2021; 11: 2660.

23. McDougall. A review of screening instruments for assessing cognition and mental status in older adults. Nurse Pract 1990; 15: 18–28.

24. Coyle, Maguire, et al. Gray matter concentration and effective connectivity changes in Alzheimer’s disease: a longitudinal structural MRI study. Neuroradiology 2011; 53: 733–748.

25. Porsteinsson, Isaacson, Knox, et al. Diagnosis of early Alzheimer’s disease: clinical practice in 2021. J Prev Alzheimers Dis 2021; 8: 371–386.

26. Mueller, Weiner, Thal, et al. The Alzheimer’s disease neuroimaging initiative. Neuroimaging Clin N Am 2005; 15: 869–877.

27. Ellis, Bush, Darby, et al. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer's disease. Int Psychogeriatr 2009; 21: 672–687.

28. Beekly, Ramos, Lee, et al. The National Alzheimer’s Coordinating Center (NACC) database: the uniform data set. Alzheimer Dis Assoc Disord 2007; 21: 249–258.

29. Marsden, Alferink, Andringa, et al. Open Accessible Summaries in Language Studies (OASIS). 2018. Available at: https://www.oasis-database.org.

30. Bhagwat, Viviano, Voineskos, et al. Modeling and prediction of clinical symptom trajectories in Alzheimer's disease using longitudinal data. PLoS Comput Biol 2018; 14: e1006376.

31. Rhodius-Meester HFM, Liedes, Koikkalainen, et al. Computer-assisted prediction of clinical progression in the earliest stages of AD. Alzheimers Dement (Amst) 2018; 10: 726–736.

32. Angelillo, Balducci, Impedovo, et al. Attentional pattern classification for automatic dementia detection. IEEE Access 2019; 7: 57706–57716.

33. Bucholc, Ding, Wang, et al. A practical computerized decision support system for predicting the severity of Alzheimer's disease of an individual. Expert Syst Appl 2019; 130: 157–171.

34. Lazli, Boukadoum, Ait Mohamed. Computer-aided diagnosis system of Alzheimer’s disease based on multimodal fusion: tissue quantification based on the hybrid fuzzy-genetic-possibilistic model and discriminative classification based on the SVDD model. Brain Sci 2019; 9: 289.

35. Buegler, Harms, Balasa, et al. Digital biomarker-based individualized prognosis for people at risk of dementia. Alzheimers Dement (Amst) 2020; 12: e12073.

36. Barnes, Zhou, Walker, et al. Development and validation of eRADAR: a tool using EHR data to detect unrecognized dementia. J Am Geriatr Soc 2020; 68: 103–111.

37. Carvalho, Seixas, Conci, et al. A dynamic decision model for diagnosis of dementia, Alzheimer's disease and mild cognitive impairment. Comput Biol Med 2020; 126: 104010.

38. Müller, Lio. Neuro: a personalisable clinical decision support system for neurological diseases. Front Artif Intell 2020; 3: 23.

39. Rhodius-Meester HFM, van Maurik, Koikkalainen, et al. Selection of memory clinic patients for CSF biomarker assessment can be restricted to a quarter of cases by using computerized decision support, without compromising diagnostic accuracy. PLoS One 2020; 15: e0226784.

40. Saribudak, Subick, Kim, et al. Gene expressions, hippocampal volume loss, and MMSE scores in computation of progression and pharmacologic therapy effects for Alzheimer's disease. IEEE/ACM Trans Comput Biol Bioinform 2020; 17: 608–622.

41. Cai, Yan, Zhou, et al. Towards effective classification of aMCI based on resting-state multiscale brain features and machine learning approaches. Wirel Commun Mob Comput 2021; 2021: 9975237.

42. Canavan, Maguire, Bucholc. Development of a two-state gaussian hidden Markov model for modelling dementia progression in patients with mild cognitive impairment. In: 2021 IEEE 9th International Conference on Healthcare Informatics (ICHI) Victoria, BC, Canada, IEEE, 9–12 Aug 2021, pp.113–119.

43. Dyrba, Hanzig, Altenstein, et al. Improving 3D convolutional neural network comprehensibility via interactive visualization of relevance maps: evaluation in Alzheimer's disease. Alzheimers Res Ther 2021; 13: 191.

44. Shoaip, Rezk, El-Sappagh, et al. A comprehensive fuzzy ontology-based decision support system for Alzheimer's disease diagnosis. IEEE Access 2021; 9: 31350–31372.

45. Kachouri, Houmani, Garcia-Salicetti, et al. A new scheme for the automatic assessment of Alzheimer's disease on a fine motor task with Transfer Learning. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) Mexico, IEEE, 1–5 Nov. 2021, pp.3823–3829.

46. Oltu, Akşahin, Kibaroğlu. A novel electroencephalography based approach for Alzheimer’s disease and mild cognitive impairment detection. Biomed Signal Process Control 2021; 63: 102223.

47. Suárez-Araujo, García Báez, Cabrera-León, et al. A real-time clinical decision support system, for mild cognitive impairment detection, based on a hybrid neural architecture. Comput Math Methods Med 2021; 2021: 5545297.

48. Venugopalan, Tong, Hassanzadeh, et al. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci Rep 2021; 11: 3254.

49. Araújo, Veloso, Gomes, et al. A novel panel of plasma proteins predicts progression in prodromal Alzheimer's disease. J Alzheimers Dis 2022; 88: 549–561.

50. Chen, Shen, et al. A dominant set-informed interpretable fuzzy system for automated diagnosis of dementia. Front Neurosci 2022; 16: 867664.

51. Chun, Park, Kim, et al. Prediction of conversion to dementia using interpretable machine learning in patients with amnestic mild cognitive impairment. Front Aging Neurosci 2022; 14: 898940.

52. El-Sappagh, Saleh, Ali, et al. Two-stage deep learning model for Alzheimer’s disease detection and prediction of the mild cognitive impairment time. Neural Comput Appl 2022; 34: 14487–14509.

53. Ilias, Askounis. Explainable identification of dementia from transcripts using transformer networks. IEEE J Biomed Health Inform 2022; 26: 4153–4164.

54. Salami, Bozorgi-Amiri, Hassan, et al. Designing a clinical decision support system for Alzheimer’s diagnosis on OASIS-3 data set. Biomed Signal Process Control 2022; 74: 103527.

55. Reinke, Doblhammer, Schmid, et al. Dementia risk predictions from German claims data using methods of machine learning. Alzheimers Dement 2022; 19: 477–486.

56. Ullah, Jamjoom. A deep learning for Alzheimer's stages detection using brain images. Comput Mater Contin 2022; 74: 1457–1473.

57. Almohimeed, Saad, Mostafa, et al. Explainable artificial intelligence of multi-level stacking ensemble for detection of Alzheimer's disease based on particle swarm optimization and the sub-scores of cognitive biomarkers. IEEE Access 2023; 11: 123173–123193.

58. Bhattarai, Rajaganapathy, Das, et al. Using artificial intelligence to learn optimal regimen plan for Alzheimer’s disease. J Am Med Inform Assoc 2023; 30: 1645–1656.

59. Chai, et al. Classification of mild cognitive impairment based on handwriting dynamics and qEEG. Comput Biol Med 2023; 152: 106418.

60. Chen, Wang, Zhang, et al. Multi-feature fusion learning for Alzheimer's disease prediction using EEG signals in resting state. Front Neurosci 2023; 17: 1272834.

61. Di Febbo, Ferrante, Baratta, et al. A decision support system for Rey-Osterrieth complex figure evaluation. Expert Syst Appl 2023; 213: 119226.

62. Emmanuel, Jabez. An advanced adaptive neuro-fuzzy inference system for classifying Alzheimer's disease stages from SMRI images. In: 2023 Advanced Computing and Communication Technologies for High Performance Applications (ACCTHPA) Ernakulam, India, 20–21 Jan. 2023, pp.1–8, IEEE.

63. Moguilner, Whelan, Adams, et al. Visual deep learning of unprocessed neuroimaging characterises dementia subtypes and generalises across non-stereotypic samples. eBioMedicine 2023; 90: 104540.

64. Rahim, Abuhmed, Mirjalili, et al. Time-series visual explainability for Alzheimer's disease progression detection for smart healthcare. Alexandria Eng J 2023; 82: 484–502.

65. Park, Shim, Suh, et al. Development and validation of an automatic classification algorithm for the diagnosis of Alzheimer's disease using a high-performance interpretable deep learning network. Eur Radiol 2023; 33: 7992–8001.

66. Tomassini, Sbrollini, Morettini, et al. CLAUDIA: cloud-based automatic diagnosis of Alzheimer's prodromal stage and disease from 3D brain magnetic resonance. In: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS) L'Aquila, Italy, 22–24 June 2023, pp.450–455, IEEE.

67. Zhang, Yuan, et al. Identifying underlying patterns in Alzheimer's disease trajectory: a deep learning approach and Mendelian randomization analysis. eClinicalMedicine 2023; 64: 102247.

68. Ayus, Gupta. A novel hybrid ensemble based Alzheimer’s identification system using deep learning technique. Biomed Signal Process Control 2024; 92: 106079.

69. Shi, Cui, et al. A spatiotemporal graph transformer approach for Alzheimer’s disease diagnosis with rs-fMRI. Comput Biol Med 2024; 178: 108762.

70. Khatri, Kwon. Diagnosis of Alzheimer's disease via optimized lightweight convolution-attention and structural MRI. Comput Biol Med 2024; 171: 108116.

71. Lei, Liang, Xie, et al. Hybrid federated learning with brain-region attention network for multi-center Alzheimer's disease detection. Pattern Recognit 2024; 153: 110423.

72. Liu, Miao, et al. HAMMF: hierarchical attention-based multi-task and multi-modal fusion model for computer-aided diagnosis of Alzheimer’s disease. Comput Biol Med 2024; 176: 108564.

73. Mitelpunkt, et al. A hierarchical attention-based multimodal fusion framework for predicting the progression of Alzheimer’s disease. Biomed Signal Process Control 2024; 88: 105669.

74. Cui, Liu, et al. A multi-graph cross-attention based region-aware feature fusion network using multi-template for brain disorder diagnosis. IEEE Trans Med Imaging 2024; 43: 1045–1059.

75. Qiu, Yang, Xiao, et al. 3D Multimodal fusion network with disease-induced joint learning for early Alzheimer’s disease diagnosis. IEEE Trans Med Imaging 2024; 43: 3161–3175.

76. Rahim, El-Sappagh, Rizk, et al. Information fusion-based Bayesian optimized heterogeneous deep ensemble model based on longitudinal neuroimaging data. Appl Soft Comput 2024; 162: 111749.

77. Sun, Leng, et al. A knowledge graph-based recommender system for dementia care: design and evaluation study. Int J Med Inform 2024; 191: 105554.

78. Tang, Wang, et al. S2VQ-VAE: semi-supervised vector quantised-variational autoencoder for automatic evaluation of trail making test. IEEE J Biomed Health Inform 2024; 28: 4456–4470.

79. Tian, Zhang. MSCLK: multi-scale fully separable convolution neural network with large kernels for early diagnosis of Alzheimer’s disease. Expert Syst Appl 2024; 252: 124241.

80. Wang, Piao, Huang, et al. Joint learning framework of cross-modal synthesis and diagnosis for Alzheimer’s disease by mining underlying shared modality information. Med Image Anal 2024; 91: 103032.

81. Yuan, et al. Interpretable medical deep framework by logits-constraint attention guiding graph-based multi-scale fusion for Alzheimer’s disease analysis. Pattern Recognit 2024; 152: 110450.

82. Young, Dworak EMM, Byrne, et al. Protocol for a construct and clinical validation study of MyCog Mobile: a remote smartphone-based cognitive screener for older adults. BMJ Open 2024; 14: e083612.

83. Zuo, Chen CLP, et al. Prior-guided adversarial learning with hypergraph for predicting abnormal connections in Alzheimer’s disease. IEEE Trans Cybern 2024; 54: 3652–3665.

84.

Dalboni da Rocha

Bramati

Coutinho

, et al. Fractional anisotropy changes in parahippocampal cingulum due to Alzheimer’s disease. Sci Rep 2020; 10: 2660.

85.

Gupta

Lee

Choi

, et al. Early diagnosis of Alzheimer’s disease using combined features from voxel-based morphometry and cortical, subcortical, and hippocampus regions of MRI t1 brain image. PLoS One 2019; 14: e0222446.

86.

Shukla

Tiwari

. Review on Alzheimer disease detection methods: automatic pipelines and machine learning techniques. Sci 2023; 5: 13.

87.

Bron

Smits

Van Der Flier

, et al. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: the cad dementia challenge. Neuroimage 2015; 111: 562–579.

88.

Wang

Shen

Wang

, et al. Ensemble of 3d densely connected convolutional network for diagnosis of mild cognitive impairment and Alzheimer’s disease. Neurocomputing 2019; 333: 145–156.

89.

Zhou

Wang

Zhu

. Feature selection based on mutual information with correlation coefficient. Appl Intell (Dordr) 2022; 52: 5457–5474.

90.

Venkatesh

Anuradha

. A review of feature selection and its methods. Cybern Inform Technol 2019; 19: 3–26.

91.

Guyon

Elisseeff

. An introduction to variable and feature selection. J Mach Learn Res 2003; 3: 1157–1182.

92.

Ghojogh

Samad

Mashhadi

, et al. Feature selection and feature extraction in pattern analysis: A literature review. arXiv 2019; 190502845.

93.

Tibshirani

. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol 1996; 58: 267–288.

94.

Hussain

Rehman

Othman

MTB

, et al. Accessing artificial intelligence for fetus health status using hybrid deep learning algorithm (AlexNet-SVM) on cardiotocographic data. Sensors 2022; 22: 5103.

95.

Govindarajan

. A hybrid RBF-SVM ensemble approach for data mining applications. Int J Intell Syst Appl 2014; 6: 84–95.

96.

Afzali

HHA

Karnon

. Specification and implementation of decision analytic model structures for economic evaluation of health care technologies. In: Culyer

(ed.) Encyclopedia of health economics. San Diego, CA: Elsevier, 2014, pp.340–347.

97.

Dietterich

. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 2000; 40: 139–157.

98.

Fathi

Ahmadi

Dehnad

. Early diagnosis of Alzheimer’s disease based on deep learning: a systematic review. Comput Biol Med 2022; 146: 105634.

99.

Krizhevsky

Sutskever

Hinton

. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012; 25: 1097–1105.

100.

He

Zhang

Ren

, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp.770–778, IEEE.

101.

Hochreiter

Schmidhuber

. Long short-term memory. Neural Comput 1997; 9: 1735–1780.

102.

Jang

JSR

. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 1993; 23: 665–685.

103.

Gruber

. Toward principles for the design of ontologies used for knowledge sharing? Int J Hum Comput Stud 1995; 43: 907–928.

104.

AlSharabi

Salamah

Abdurraqeeb

, et al. EEG signal processing for Alzheimer's disorders using discrete wavelet transform and machine learning approaches. IEEE Access 2022; 10: 89781–89797.

105.

Matuchansky

. Deep medicine, artificial intelligence, and the practising clinician. Lancet 2019; 394: 736.

106.

Shaban-Nejad

Michalowski

Buckeridge

. Explainability and interpretability: keys to deep medicine. In: Shaban-Nejad

Michalowski

Buckeridge

(eds) Explainable AI in healthcare and medicine: building a culture of transparency and accountability. Cham: Springer International Publishing, 2021, pp.1–10.

107.

Biran

Cotton

. Explanation and justification in machine learning: a survey. In: IJCAI-17 Workshop on Explainable AI (XAI), 2017, pp.8–13.

108.

Lundberg

Lee

. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017). Long Beach, California, USA, 2017, pp.4768–4777, Curran Associates Inc.

109.

Xue

, et al. Use of machine learning to develop and evaluate models using preoperative and intraoperative data to identify risks of postoperative complications. JAMA Netw Open 2021; 4: e212240.

110.

Molnar

. Interpretable Machine Learning. 3rd ed. Victoria, BC: Leanpub, 2025. http://leanpub.com/interpretable-machine-learning

111.

Ribeiro

Singh

Guestrin

. "Why should I trust you?": Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, CA, USA, 2016, pp.1135–1144, Association for Computing Machinery.

112.

El-Sappagh

Alonso-Moral

Abuhmed

, et al. Trustworthy artificial intelligence in Alzheimer's disease: state of the art, opportunities, and challenges. Artif Intell Rev 2023; 56: 11149–11296.

113.

Tjoa

Guan

. A survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Trans Neural Netw Learn Syst 2020; 32: 4793–4813.

114.

Friedman

. Greedy function approximation: a gradient boosting machine. Ann Stat 2001; 29: 1189–1232.

115.

Goldstein

Kapelner

Bleich

, et al. Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. J Comput Graph Stat 2015; 24: 44–65.

116.

Mendel

Bonissone

. Critical thinking about explainable AI (XAI) for rule-based fuzzy systems. IEEE Trans Fuzzy Syst 2021; 29: 3579–3593.

117.

Markus

Kors

Rijnbeek

. The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J Biomed Inform 2021; 113: 103655.

118.

Mahadevaiah

Bermejo

, et al. Artificial intelligence-based clinical decision support in modern medical physics: selection, acceptance, commissioning, and quality assurance. Med Phys 2020; 47: e228–e235.

119.

Sutton

Pincock

Baumgart

, et al. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med 2020; 3: 17.

120.

Xue

Kowshik S

Lteif

, et al. AI-based differential diagnosis of dementia etiologies on multimodal data. Nat Med 2024; 30: 2977–2989.

121.

Chan

Andre

Herrera

, et al. Incremental learning in a multilayer neural network as an aid to Alzheimer's disease diagnosis. In: Proceedings of IEEE Systems Man and Cybernetics Conference - SMC, Le Touquet, France, 17–20 October 1993, pp.1–4, vol.4. IEEE.

122.

Lin

Liu

, et al. A mini review of transforming dementia care in China with data-driven insights: overcoming diagnostic and time-delayed barriers. Front Aging Neurosci 2025; 17: 1554834.

123.

Grueso

Viejo-Sobera

. Machine learning methods for predicting progression from mild cognitive impairment to Alzheimer’s disease dementia: a systematic review. Alzheimers Res Ther 2021; 13: 1–29.

124.

Kaur

Mittal

Bhatti J

, et al. A systematic literature review on the significance of deep learning and machine learning in predicting Alzheimer's disease. Artif Intell Med 2024; 154: 102928.

125.

Borchert

Azevedo

Badhwar

, et al. Artificial intelligence for diagnostic and prognostic neuroimaging in dementia: a systematic review. Alzheimers Dement 2023; 19: 5885–5904.

126.

Saikia

Kalita

. Alzheimer disease detection using MRI: deep learning review. SN Comput Sci 2024; 5: 507.