Abstract
Background
Early diagnosis of dementia is essential for enabling timely interventions that may slow disease progression, improve patient outcomes, and reduce healthcare costs. This study aims to develop machine learning models to predict dementia risk using longitudinal electronic health record (EHR) data.
Objective
This research aims to develop and evaluate machine learning models for dementia risk prediction using longitudinal EHR data from routine clinical care and to identify key clinical features associated with elevated dementia risk.
Methods
We conducted an incidence-based case-control study using EHR data from the UMass Memorial Health system (2017–2024) to develop a dementia risk prediction model.
Results
This study included 5622 dementia cases and 44,976 controls. The XGBoost model achieved the highest AUC (0.802), with top predictors including thyroid-stimulating hormone (TSH), vitamin B12, and HDL cholesterol. Model performance was consistent across sexes and remained robust in multiple sensitivity analyses.
Conclusions
Machine learning models that integrate comorbid conditions and longitudinal laboratory test patterns show potential for predicting dementia risk. These findings highlight the promise of routinely collected EHR data as a scalable, low-cost resource for identifying individuals at elevated risk for dementia.
