Abstract
Background
Alzheimer's disease (AD) patients frequently present to emergency departments (EDs) with complex comorbidities that complicate triage and management. Yet, little is known about how these multimorbidity patterns have evolved over time.
Objective
To identify temporal shifts in comorbidity-based phenotypes among older adults with AD visiting EDs between 2007 and 2022 using unsupervised clustering methods.
Methods
We analyzed ED visits for adults aged ≥60 with an AD diagnosis from the Nationwide Emergency Department Sample (NEDS) for the years 2007, 2012, 2017, and 2022. Using ICD-9/10 codes, we mapped diagnoses to 30 clinically relevant comorbidities per year and applied the k-means clustering method to identify subgroups based on diagnostic co-occurrence. Heatmaps summarized cluster compositions across timepoints.
Results
A four-cluster solution was fit for each year, and cluster composition shifted over the 15-year span. Earlier cohorts (2007–2012) were dominated by cardiovascular and respiratory clusters (e.g., CHF, CAD, respiratory failure), while more recent cohorts (2017–2022) showed increased prevalence of nonspecific, frailty-related presentations (e.g., fatigue, GERD, general symptoms). Despite rising ED utilization among older adults, the proportion of visits documenting AD declined from 2.59% in 2007 to 1.34% in 2022, potentially reflecting shifts in coding, outpatient management, and diagnostic overshadowing by acute symptoms.
Conclusions
The comorbidity landscape of AD-related ED visits is changing, with a shift toward vaguer syndromes and complex multimorbidity. These findings underscore the need for dementia-aware triage strategies and dynamic phenotyping tools to improve emergency care for cognitively impaired older adults.
Keywords
Introduction
Alzheimer's disease (AD) is a complex neurodegenerative condition that often presents alongside a wide array of chronic comorbidities, especially in older adults seeking acute care. In emergency departments (EDs), where AD-related visits are expected to rise due to demographic aging, the clinical picture is rarely defined by cognitive symptoms alone. Instead, patients often present with non-neurological conditions, ranging from cardiovascular instability to infections, metabolic decompensation, and vague geriatric syndromes, resulting in heterogeneous care needs, uncertain triage, and variable outcomes.1–4
Despite this complexity, current ED care and population health approaches often treat AD patients as a homogeneous group. Traditional comorbidity indices capture disease burden but fail to account for distinct multimorbidity phenotypes, i.e., recurrent combinations of coexisting conditions that may reflect shared etiologies, care needs, or risk trajectories.5–7 Recognizing these patterns may offer a path toward more precise, syndromically aware models of emergency care for AD.
Unsupervised machine learning algorithms, such as k-means, enable the identification of hidden subgroups within high-dimensional health data without requiring a priori labels. Applied to large-scale administrative datasets, these methods have revealed clinically meaningful subtypes in heart failure, diabetes, and sepsis,8–12 but remain underexplored in AD populations, especially within emergency care settings. Whether distinct and stable comorbidity profiles exist across AD patients, and how they evolve over time, is largely unknown.
Beyond k-means, hierarchical clustering and model-based clustering frameworks (e.g., Gaussian mixture models) have been used to derive clinically meaningful subgroups in multimorbidity phenotyping and to explore heterogeneity in dementia and cognitive decline.13–20 This study extends that literature by focusing on longitudinal, population-level phenotyping of AD-associated emergency presentations and explicitly examining how comorbidity-defined subgroups evolve across time and coding eras. Importantly, by applying a consistent clustering pipeline across four nationally representative timepoints spanning ICD-9-CM and ICD-10-CM eras, we address the technical gap of limited long-horizon ED-based multimorbidity phenotyping in AD.
To address this gap, we analyzed nationally representative ED data from the Healthcare Cost and Utilization Project (HCUP) Nationwide Emergency Department Sample (NEDS) across four timepoints (2007, 2012, 2017, and 2022). We applied unsupervised clustering methods to identify recurrent multimorbidity patterns among AD-related ED visits and examined their evolution over 15 years. Our goal was twofold: (1) to characterize comorbidity-driven phenotypes that may inform triage, care planning, and prognosis, and (2) to assess their translational relevance for clinical trials, resource allocation, and population health monitoring.
Our findings reveal both temporal consistency and evolution in comorbidity clustering, including the emergence of nonspecific, frailty-related clusters in recent years. These results highlight the potential of clustering as a tool for redefining subtypes of AD in real-world clinical contexts and suggest new directions for precision care in aging populations.
Methods
Data source and study population
We analyzed data from the 2007, 2012, 2017, and 2022 releases of the Nationwide Emergency Department Sample (NEDS) (Table 1), the largest all-payer ED database in the United States, developed by the Healthcare Cost and Utilization Project (HCUP). NEDS captures a stratified 20% sample of hospital-owned ED visits across states, with approximately 25–35 million unweighted visits annually. After applying discharge-level weights, each dataset represents over 120 million encounters nationally.
Table 1. AD-related and total ED visits among older adults (aged ≥60) in NEDS by year.
Our study focused on ED visits among adults aged 60 and older with an AD diagnosis. We queried all diagnosis fields (up to 15 in the ICD-9-CM years, 2007 and 2012, and up to 40 in the ICD-10-CM years, 2017 and 2022) to identify visits with at least one AD-related code (331.0 for ICD-9-CM; G30.x and F00.x for ICD-10-CM). NEDS is an administrative ED database: it contains no cognitive testing, biomarker data, or chart-adjudicated dementia diagnoses, and it does not capture dementia severity (e.g., cognitive scores or staging). AD is therefore identified by recorded ICD codes rather than clinical adjudication. Our comorbidity set is based on the 30 most frequent conditions per year; clinically important diagnoses (e.g., acute ischemic/hemorrhagic stroke or TIA) may therefore be present in the data but absent from the displayed set if they fall below the year-specific top-30 frequency threshold. Future work will incorporate fixed, clinically curated condition sets to ensure inclusion of key neurologic events.
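The cohort definition above can be sketched in pandas. This is a minimal illustration, not the study's code: the toy data, the `age` column name, the helper `flag_ad_visits`, and the assumption that codes are stored without decimal points (so 331.0 appears as "3310") are all hypothetical.

```python
import pandas as pd

# Hypothetical toy extract: one row per ED visit, ICD-9-CM codes stored
# without decimal points (an assumption). All values are made up.
visits = pd.DataFrame({
    "age": [72, 58, 81, 66],
    "dx1": ["4280", "3310", "5990", "78079"],
    "dx2": ["3310", "4019", "3310", None],
})


def flag_ad_visits(df, dx_cols, ad_prefixes, min_age=60):
    """Keep visits aged >= min_age with an AD code in any diagnosis field."""
    has_ad = pd.Series(False, index=df.index)
    for col in dx_cols:
        # Prefix matching lets a prefix such as "G30" cover all G30.x codes.
        codes = df[col].astype("string")
        has_ad |= codes.str.startswith(ad_prefixes, na=False).astype(bool)
    return df[(df["age"] >= min_age) & has_ad]


# ICD-9-CM era: a single AD code. For ICD-10-CM years the prefixes would
# follow the codes named in the text, e.g. ("G30", "F00").
ad_2007 = flag_ad_visits(visits, ["dx1", "dx2"], ad_prefixes=("3310",))
print(ad_2007.index.tolist())
```

In this toy example the second visit carries the AD code but is excluded by the age filter, which is why the age and diagnosis conditions are combined in one mask.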
Identification and grouping of comorbidities
To characterize clinical complexity, we extracted all secondary diagnosis codes from AD visits and identified the 30 most common comorbidities for each study year (Figure 1). Diagnosis fields were tallied, excluding the AD code and non-clinical entries (e.g., trauma, screening, or administrative codes). We then aggregated related codes into clinically meaningful categories (e.g., all heart failure codes mapped to “CHF”). The final comorbidity matrix for each year consisted of binary indicators (1 = presence, 0 = absence) for the 30 top-ranked conditions per patient.
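The construction of the binary comorbidity matrix might look like the sketch below. The mapping `CATEGORY_MAP`, the helper `build_comorbidity_matrix`, and the toy visits are hypothetical; the sketch also assumes codes are mapped to categories before ranking, which the text does not specify.

```python
from collections import Counter

import pandas as pd

# Hypothetical code-to-category mapping (illustrative, far from complete).
CATEGORY_MAP = {"4280": "CHF", "42822": "CHF", "4019": "Hypertension", "5990": "UTI"}
AD_CODES = {"3310"}  # the AD code itself is excluded from the tally

# Toy data: each inner list holds the diagnosis codes of one AD visit.
visit_codes = [
    ["3310", "4280", "4019"],
    ["3310", "42822"],
    ["3310", "5990", "4019"],
]


def build_comorbidity_matrix(visit_codes, top_n=30):
    """Return a visits x conditions matrix of 0/1 flags for the top_n categories."""
    # Collapse each visit's codes to a set of categories, so a condition
    # counts once per visit even when several related codes are present.
    per_visit = [
        {CATEGORY_MAP[c] for c in codes if c not in AD_CODES and c in CATEGORY_MAP}
        for codes in visit_codes
    ]
    counts = Counter(cat for cats in per_visit for cat in cats)
    top = [cat for cat, _ in counts.most_common(top_n)]
    rows = [[int(cat in cats) for cat in top] for cats in per_visit]
    return pd.DataFrame(rows, columns=top)


matrix = build_comorbidity_matrix(visit_codes, top_n=2)
print(matrix)
```

Here two distinct heart failure codes both map to "CHF", mirroring the aggregation step described above, and "UTI" is dropped because it falls below the top-2 cutoff used for the toy data.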

Figure 1. Workflow for identifying multimorbidity clusters in Alzheimer's disease ED visits. Comorbidity indicators were extracted from diagnostic codes, filtered by frequency, and clustered using unsupervised machine learning to identify recurring patient subgroups.
Unsupervised clustering of comorbidity patterns
To identify latent multimorbidity phenotypes among AD-associated ED visits, we applied k-means clustering to the year-specific binary comorbidity matrix (1 = condition present; 0 = absent). Clustering was performed independently for each study year to characterize temporal shifts in phenotype composition rather than imposing a single pooled cross-year solution. We implemented k-means using scikit-learn (sklearn.cluster.KMeans), which partitions observations into k clusters by minimizing the within-cluster sum of squares under the Euclidean-distance objective. Cluster assignments were summarized as cluster-level comorbidity prevalences, i.e., the proportion of patients in each cluster with each condition.
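A minimal sketch of this step on synthetic data follows. The matrix, the condition names, and the `n_init` and `random_state` settings are assumptions for illustration; the text does not report the initialization settings used in the study.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for one year's binary comorbidity matrix (made-up data).
rng = np.random.default_rng(0)
conditions = ["Hypertension", "CHF", "UTI", "Fatigue", "GERD", "CKD"]
X = pd.DataFrame(
    (rng.random((300, len(conditions))) < 0.3).astype(int), columns=conditions
)

# k = 4 as in the text; n_init and random_state are illustrative choices.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Cluster-level prevalence: the mean of a 0/1 column within a cluster is the
# proportion of that cluster's rows with the condition.
prevalence = X.groupby(labels).mean().T
prevalence.columns = [f"Cluster {c}" for c in prevalence.columns]
cluster_sizes = pd.Series(labels).value_counts().sort_index()

print(prevalence.round(2))
print(cluster_sizes)
```

Because the features are binary, each fitted centroid coordinate can be read as an approximate within-cluster prevalence, which is why a prevalence table is a natural summary of a k-means solution here.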
The number of clusters (k) was evaluated across candidate values (k = 2–9) using the elbow method (inertia; within-cluster sum of squares). As shown in Supplemental Figure 1, inertia decreased monotonically without a single dominant inflection point across years, a pattern that can occur in high-dimensional, sparse binary feature spaces where incremental increases in k yield steady improvements in within-cluster dispersion. Accordingly, the elbow plot was used as a screening tool rather than a strict decision rule. We selected k = 4 based on a pragmatic and reproducible criterion that balanced (i) preservation of clinically meaningful heterogeneity (avoiding over-aggregation at low k), (ii) avoidance of excessive fragmentation into very small clusters at higher k, and (iii) use of a constant k across years to support longitudinal comparison of cluster composition. Because k-means can be sensitive to initialization and sampling variability, formal stability assessments (e.g., repeated refitting across random seeds and resampled datasets with agreement metrics such as the adjusted Rand index) are prioritized for future confirmatory work.
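The elbow screen and the seed-based stability check mentioned above could be sketched as follows. The data are synthetic, and the number of seeds and the `n_init` value are arbitrary assumptions; this is one possible form of the stability assessment, not an analysis the study performed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic binary matrix standing in for one study year (made-up data).
rng = np.random.default_rng(0)
X = (rng.random((300, 6)) < 0.3).astype(int)

# Elbow screen: inertia (within-cluster sum of squares) for k = 2..9.
inertia_by_k = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(2, 10)
}

# Seed stability at k = 4: refit with different seeds and compare each
# labeling to a reference using the adjusted Rand index (1.0 = identical
# partitions, values near 0 = chance-level agreement).
reference = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
ari_by_seed = {
    seed: adjusted_rand_score(
        reference,
        KMeans(n_clusters=4, n_init=10, random_state=seed).fit_predict(X),
    )
    for seed in range(1, 6)
}

for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
print(ari_by_seed)
```

The adjusted Rand index is invariant to label permutation, so it compares partitions directly even though k-means numbers its clusters arbitrarily from run to run.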
Visualization and interpretation
Cluster-level heatmaps were generated to visualize comorbidity prevalence within each subgroup. Each cell represented the mean proportion of patients in a cluster who had a specific comorbidity, color-coded from white (0%) to dark blue (100%). Numerical values within cells (e.g., 0.43 = 43%) reinforced interpretability. These visual summaries allowed for cross-year comparisons of evolving multimorbidity phenotypes in the aging AD population.
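An annotated heatmap of this kind can be sketched with matplotlib alone. The prevalence values below are invented for illustration and are not study results; the study's own figures used matplotlib/seaborn, and `seaborn.heatmap` with `annot=True` would be an equivalent route.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Made-up prevalence table: rows are conditions, columns are clusters.
conditions = ["Hypertension", "CHF", "Fatigue"]
clusters = ["Cluster 0", "Cluster 1", "Cluster 2", "Cluster 3"]
prevalence = np.array([
    [0.50, 0.30, 0.40, 0.45],
    [0.05, 0.10, 0.25, 0.60],
    [0.00, 0.02, 1.00, 0.03],
])

fig, ax = plt.subplots(figsize=(6, 3))
# Fix the color scale to [0, 1] so shades are comparable across years.
image = ax.imshow(prevalence, cmap="Blues", vmin=0.0, vmax=1.0, aspect="auto")
ax.set_xticks(range(len(clusters)))
ax.set_xticklabels(clusters)
ax.set_yticks(range(len(conditions)))
ax.set_yticklabels(conditions)

# Write the numeric proportion in each cell; switch to white text on dark cells.
for i in range(prevalence.shape[0]):
    for j in range(prevalence.shape[1]):
        value = prevalence[i, j]
        ax.text(j, i, f"{value:.2f}", ha="center", va="center",
                color="white" if value > 0.6 else "black")

fig.colorbar(image, ax=ax, label="Proportion of cluster")
fig.tight_layout()
```

Pinning `vmin` and `vmax` matters for cross-year comparison: without it, each year's heatmap would be scaled to its own range and identical shades would not mean identical prevalences.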
Computational environment and reproducibility
All analyses were conducted in Python 3.12 using Google Colab. Data processing employed pandas for manipulation, scikit-learn for clustering, and matplotlib/seaborn for visualization. The pipeline was fully modular, allowing replication across all four datasets. ICD preprocessing, comorbidity binarization, clustering, and visualization scripts are available upon request.
Results
AD-associated ED visits
Total ED visits by patients aged 60 and older rose across the four sampled years, from 5,469,205 in 2007 to 6,724,932 in 2012, 8,541,352 in 2017, and 9,332,991 in 2022, reflecting the growing burden of acute care utilization among older adults over time.
Within this population, the number of ED visits involving a diagnosis of AD was relatively stable in absolute terms, ranging from 141,393 in 2007 to 125,461 in 2022. However, when examined as a proportion of total older adult ED visits, the percentage of AD-related visits declined steadily over the 15-year period: from 2.59% in 2007 to 2.12% in 2012, 1.80% in 2017, and 1.34% in 2022 (Table 1).
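As a quick arithmetic check, the 2007 and 2022 proportions can be reproduced from the counts reported above. Only those two years are included because the text gives AD-visit counts for them alone; the variable names are illustrative.

```python
# (AD-related visits, total visits among adults aged >= 60), from the text.
counts = {
    2007: (141_393, 5_469_205),
    2022: (125_461, 9_332_991),
}

# Share of older-adult ED visits carrying an AD diagnosis, in percent.
ad_share_pct = {year: 100 * ad / total for year, (ad, total) in counts.items()}

for year, pct in ad_share_pct.items():
    print(year, f"{pct:.2f}%")
```

The calculation shows why the proportion fell by nearly half while the absolute count changed only modestly: the denominator grew by roughly 70% over the period.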
Top 30 comorbidities among AD-associated ED visits
Across the four NEDS datasets (2007, 2012, 2017, and 2022), we identified the 30 most frequent comorbidities among ED visits involving patients with AD (Supplemental Tables 1–4). Diagnosis codes were extracted from all available fields (dx1–dx15 for ICD-9-CM and i10_dx1–i10_dx40 for ICD-10-CM), and binary indicators were generated to denote the presence or absence of each condition. To harmonize the analysis across coding eras, diagnoses were mapped to consistent clinical categories.
The most prevalent comorbid conditions consistently included hypertension, type 2 diabetes, hyperlipidemia, urinary tract infection (UTI), coronary artery disease (CAD), anemia, congestive heart failure (CHF), atrial fibrillation, chronic kidney disease (CKD), chronic obstructive pulmonary disease, pneumonia, depression, hypothyroidism, fatigue, acute kidney failure, gastroesophageal reflux disease (GERD), uncontrolled diabetes, dehydration, syncope, old myocardial infarction (Old MI), coronary atherosclerosis, pressure ulcer, general symptoms, abnormal glucose, chronic systolic heart failure, anemia of chronic kidney disease, convulsions, asthma, anemia of chronic disease, and respiratory failure. These conditions were used to construct the patient-level comorbidity matrix for cluster analysis.
Comorbidity cluster profiles by year
The k-means clustering method (k = 4) was applied separately to each year's binary comorbidity matrix. The clusters represent data-driven groupings of patients with similar comorbidity profiles.
In the 2007 data (Figure 2), Cluster 0 showed high prevalence of hypertension (0.51) and UTI (0.21). Cluster 1 was dominated by acute kidney failure (0.92) and dehydration (0.41). Cluster 2 included patients with fatigue (1.00) and CHF (0.26). Cluster 3 was marked by CHF (0.62) and atrial fibrillation (0.38).

Figure 2. Comorbidity profile by cluster among Alzheimer's disease ED visits in 2007 (k-means, k = 4). This heatmap illustrates the prevalence of the 30 most common comorbidities among patients with AD who visited the ED in the 2007 NEDS, stratified into four clusters identified through k-means clustering. Each cell represents the proportion of patients within a given cluster who had the specified comorbidity, with values ranging from 0.00 to 1.00, shown both numerically and as a color gradient (white = 0, dark blue = 1.0). Comorbidities are listed on the y-axis and cluster labels (Cluster 0 through Cluster 3) are on the x-axis. Distinct multimorbidity patterns were observed: Cluster 0 showed elevated rates of hypertension and urinary tract infection; Cluster 1 was defined by extremely high prevalence of acute kidney failure and dehydration; Cluster 2 was dominated by fatigue and moderate hyperlipidemia; and Cluster 3 was characterized by a cardiovascular profile including congestive heart failure and atrial fibrillation. These clusters reflect heterogeneous comorbidity phenotypes within the AD population and underscore the diversity of clinical presentations among older adults with AD in the ED setting.
In the 2012 dataset (Figure 3), Cluster 0 was characterized by CHF (0.98) and atrial fibrillation (0.36). Cluster 1 showed generally low prevalence across comorbidities. Cluster 2 had high fatigue prevalence (1.00) and moderate hyperlipidemia (0.39). Cluster 3 was defined by uncontrolled diabetes (1.00) and elevated hyperlipidemia (0.34).

Figure 3. Comorbidity profile by cluster among Alzheimer's disease ED visits in 2012 (k-means, k = 4). This heatmap visualizes the distribution of the 30 most prevalent comorbidities among patients with AD visiting the ED in the 2012 NEDS, stratified into four clusters generated using k-means clustering. Comorbidities are listed along the y-axis, and clusters (Cluster 0 to Cluster 3) are shown along the x-axis. Each cell represents the proportion of patients in a given cluster who had that specific comorbidity, with values ranging from 0.00 to 1.00. These proportions are indicated both by numerical values within the cells and by a color gradient, where darker shades of blue represent higher prevalence. Distinct cluster patterns were observed in this year. Cluster 0 was dominated by congestive heart failure (0.98) and atrial fibrillation (0.36), representing a cardio-centric profile. Cluster 1 showed moderate prevalence across a wide range of comorbidities, with no single dominant feature. Cluster 2 was defined by fatigue (1.00), as well as moderate levels of hyperlipidemia and GERD. Cluster 3 showed a sharp signature of uncontrolled diabetes (1.00) and elevated rates of acute kidney failure and hypertension. These patterns reflect the emergence of heterogeneous and clinically meaningful subgroups within the AD population, capturing both chronic disease burden and frailty-related syndromes during emergency care.
In 2017 (Figure 4), Cluster 0 exhibited higher rates of hypertension (0.55) and hyperlipidemia (0.22). Cluster 1 was dominated by respiratory failure (1.00) and acute kidney failure (0.35). Cluster 2 showed elevated frequencies of CAD (0.52), hyperlipidemia (0.55), and GERD (0.33). Cluster 3 included patients with hypertension (0.42) and hyperlipidemia (0.44).

Figure 4. Comorbidity profile by cluster among Alzheimer's disease ED visits in 2017 (k-means, k = 4). This heatmap displays the distribution of the 30 most prevalent comorbidities among ED visits by patients with AD in the 2017 NEDS, stratified into four distinct clusters generated using k-means clustering. The y-axis lists each comorbidity, while the x-axis shows the four patient clusters derived from similarities in comorbidity profiles. Each cell contains both a numerical value and a color intensity corresponding to the proportion of patients within that cluster who exhibited the given comorbidity, ranging from 0.00 (white/light blue, no cases) to 1.00 (dark blue, all patients). In this year's cohort, Cluster 0 was characterized by elevated hypertension (0.55), Cluster 1 was uniquely defined by respiratory failure (1.00) and acute kidney failure (0.35), Cluster 2 showed the highest prevalence of hyperlipidemia (0.55), coronary artery disease (0.52), and GERD (0.33), while Cluster 3 exhibited the highest proportion of general symptoms (1.00) and a strong presence of hyperlipidemia (0.44). The emergence of clusters defined by respiratory, metabolic, and gastrointestinal conditions, as well as vague, nonspecific symptom codes, highlights the increasing clinical complexity and syndromic ambiguity of AD presentations in the ED by 2017. These data-driven subgroupings may reflect evolving patterns of disease burden, diagnostic coding practices, and health system utilization in aging populations with advanced neurocognitive disorders.
In the 2022 dataset (Figure 5), Cluster 0 was defined by fatigue (1.00), hyperlipidemia (0.40), and hypertension (0.43). Cluster 1 again exhibited low prevalence across comorbidities. Cluster 2 was marked by general symptoms (1.00) and hyperlipidemia (0.24). Cluster 3 showed high rates of GERD (1.00) and hypertension (0.43).

Figure 5. Comorbidity profile by cluster among Alzheimer's disease ED visits in 2022 (k-means, k = 4). This heatmap illustrates the comorbidity structure of AD patients presenting to the ED in the 2022 NEDS, stratified into four clusters using k-means clustering (k = 4). The y-axis lists the 30 most prevalent comorbid conditions, while the x-axis shows the four identified clusters. Each cell is shaded according to the proportion of patients in that cluster with the corresponding condition, with darker blue hues indicating higher prevalence. Numeric values denote the exact proportion (e.g., 0.43 represents 43% of patients in that cluster exhibiting hypertension). In 2022, Cluster 0 was uniquely characterized by universal prevalence of fatigue (1.00), indicating a symptom-dominant presentation possibly linked to advanced frailty or caregiver-observed functional decline. Cluster 1 lacked clear dominant features but showed modest rates across multiple conditions. Cluster 2 was defined by universal prevalence of general symptoms (1.00), suggesting a nonspecific or syndromic presentation pattern. Cluster 3 showed exclusive elevation in gastroesophageal reflux disease (GERD; 1.00), highlighting a subgroup presenting with gastrointestinal complaints. Overall, the 2022 clusters reflect a marked departure from prior years, with a striking shift toward vague or symptom-based diagnoses rather than classical organ-specific conditions. These patterns may signal increased documentation of non-specific geriatric syndromes, rising reliance on ED services in late-stage AD, or evolving diagnostic ambiguity associated with cognitive decline. This visual summary underscores the growing clinical complexity and heterogeneity in how AD patients engage with emergency care at the population level.
Discussion
In this study, we investigated the heterogeneity of clinical presentations among AD patients seeking emergency care. We applied unsupervised clustering to comorbidity profiles derived from the NEDS across four time points: 2007, 2012, 2017, and 2022. Using the top 30 most prevalent comorbidities as a proxy for each patient's overall medical complexity, we applied k-means clustering to identify emergent subgroups based solely on patterns of co-occurrence. Our findings reveal both temporal consistency and evolution in multimorbidity phenotypes, highlighting the dynamic nature of aging and dementia-related emergency care over a 15-year span (Figure 6).

Figure 6. Temporal evolution of multimorbidity patterns in Alzheimer's disease patients visiting EDs. Visual summary of shifting comorbidity clusters from 2007 to 2022, highlighting the transition from organ-specific failures to nonspecific symptom-dominant profiles.
Trends in AD emergency visits over time
Prior to clustering, we examined trends in the absolute and relative frequency of AD-related visits among older adults (aged ≥60 years) presenting to the ED. While the overall number of emergency department visits by older adults increased markedly over time, from 5.47 million in 2007 to 9.33 million in 2022, the number of visits that included an AD diagnosis did not follow a similar trajectory. In fact, the absolute number of AD-related visits remained relatively stable, ranging from approximately 141,000 in 2007 to 125,000 in 2022. As a proportion of total visits by older adults, AD-related visits declined steadily over the study period, falling from 2.59% in 2007 to 1.34% in 2022.
This pattern should be interpreted as coded ascertainment in ED records, not as a direct measure of AD prevalence in the underlying population. AD and related dementias are projected to increase substantially over the coming decades,21 yet a stable absolute count and declining proportion of AD-coded ED encounters can occur if the diagnosis is under-recognized or under-recorded in routine care and emergency settings,22–24 and if diagnostic labeling shifts toward nonspecific dementia categories rather than AD-specific codes.25,26 In addition, ED documentation commonly prioritizes the acute presenting problem; as medical complexity increases, chronic neurocognitive disorders may be less likely to be documented when not central to the visit, consistent with diagnostic overshadowing and known challenges in ED dementia detection and documentation.23,24 Coding-era effects may further contribute: the ICD-9 to ICD-10 transition introduced changes in code structure and documentation workflows that can influence observed prevalence trends in administrative datasets independent of true epidemiology,27 and claims-based dementia ascertainment remains sensitive to how diagnoses are captured in coded records.26,28 Finally, evolving geriatric syndromes, such as frailty and cognitive–physical vulnerability, may increasingly drive ED utilization and hospitalization, potentially obscuring AD-specific labeling within broader, nonspecific clinical presentations.29 Accordingly, ICD-era and documentation effects are best regarded as plausible, hypothesis-generating mechanisms; NEDS cannot disentangle true epidemiologic change from coding and documentation dynamics. Future confirmatory work will evaluate these mechanisms using diagnosis-position definitions, broader dementia case definitions to assess diagnostic substitution, and interrupted time-series approaches around the ICD transition.27
Taken together, these findings suggest that although the visible footprint of AD in ED records has diminished in proportional terms, the underlying burden may be increasingly hidden within complex multimorbidity and vague geriatric syndromes. This underscores the need for phenotyping approaches that can detect AD-related needs even when explicit diagnostic labeling is absent.
Evolving multimorbidity patterns
In 2007, comorbidity clusters were centered around well-defined syndromes, chiefly CHF, respiratory failure, and fatigue. These clusters mirrored canonical geriatric syndromes and emphasized the role of cardio-pulmonary decompensation as a major driver of emergency presentations in cognitively impaired individuals.30 Notably, fatigue, often considered a soft or subjective symptom, emerged as a cluster-defining feature even in the earliest cohort, reinforcing its clinical salience in frail older adults with dementia.
By 2012, there was a discernible shift toward more chronic disease-focused clusters, with CAD, CKD, and chronic systolic heart failure becoming prominent. This suggests not only an aging cohort with accumulating comorbidities but also enhanced survivorship and diagnostic labeling for chronic cardiovascular and renal conditions.31,32
In 2017, clusters highlighted acute-on-chronic instability, specifically respiratory failure, acute kidney injury, uncontrolled diabetes, and pneumonia, suggesting rising clinical complexity and overlapping metabolic, infectious, and pulmonary risks.33,34 These patterns may reflect a transitional phase in emergency medicine's response to dementia care, where acute complications intersect with underlying multimorbidity in an aging and medically complex population.
By 2022, we observed a striking emergence of clusters characterized by non-specific complaints such as fatigue, general symptoms, and GERD. These presentations reflect a broader shift away from organ-specific diagnoses toward syndromic patterns commonly associated with advanced frailty, homeostatic vulnerability, and complex geriatric syndromes.35,36 Several interrelated factors may contribute to this evolution: (1) increased coding of vague symptoms under ICD-10;37 (2) the aging of the AD population, with a greater proportion of patients in late-stage dementia and multimorbidity;38 (3) reduced access to outpatient or palliative care, leading to emergency departments functioning as safety nets for chronic symptom management;39 and (4) caregiver-driven ED utilization, wherein subtle behavioral changes or nonspecific symptoms (e.g., reduced intake, lethargy, agitation) prompt ED visits out of concern or exhaustion.40
Strengths and contributions
One of the primary strengths of this study is the use of a large, nationally representative ED dataset spanning a decade and a half. This enables both robust population-level inference and longitudinal trend analysis across healthcare delivery and coding eras.41 Our use of unsupervised clustering on real-world diagnostic data avoids assumptions about disease relationships and permits the discovery of naturally occurring clinical phenotypes.42,43 The consistency of the four-cluster solution across years suggests stable underlying patterns, while the shift in cluster composition over time highlights temporal evolution in AD care, comorbidity burden, and potentially even diagnostic behaviors.
The identification of symptom-dominant clusters (e.g., fatigue, general symptoms, GERD) also serves to re-center the clinical conversation around the subjective and ambiguous experiences of dementia patients, who may not be able to articulate traditional organ-specific complaints.44
Limitations and methodological caveats
Despite its strengths, several limitations merit consideration. First, our clustering relied on binary indicators of presence or absence for the top 30 comorbidities, which, while pragmatic and reproducible, do not capture the clinical severity, chronicity, or temporal progression of each condition.10,43 Second, the exclusion of key contextual variables such as medication use, cognitive or functional status, and social determinants of health limits the interpretability and clinical richness of the derived clusters.45
Additional limitations include the claims-based ascertainment of AD (ICD-coded diagnoses without clinical adjudication) and the absence of validated dementia severity, functional status, medication use, caregiver availability/living arrangement, or other direct measures of social support; these factors may substantially influence ED utilization and the clinical context of presentation. NEDS also does not allow reliable classification of hospital stroke/intervention capabilities or neurology service availability, limiting our ability to interpret the presence or management context of acute cerebrovascular events at the facility level. Finally, we present cross-year differences in cluster composition descriptively; formal statistical testing of temporal changes and quantitative stability analyses (e.g., resampling-based agreement metrics) will be prioritized in future work.
Third, while our approach identified consistent and interpretable patterns, clustering results are inherently sensitive to preprocessing decisions. The choice of diagnostic categories, ICD mappings, and thresholding strategies (e.g., frequency-based top-30 selection) could influence both the shape and stability of identified clusters. Although we harmonized comorbidity definitions across ICD-9-CM and ICD-10-CM eras and prioritized clinical interpretability, alternative aggregation schemas may yield different groupings.46,47
Fourth, our data source, the NEDS, relies on hospital administrative claims and diagnosis codes, which are prone to documentation biases.8 Transitions in coding systems, evolving provider practices, and variable hospital billing incentives may introduce inconsistencies. Additionally, comorbidities that are not directly relevant to the index ED visit may be undercoded. AD identification relied on diagnosis codes in an administrative dataset rather than clinically adjudicated diagnoses; misclassification and undercoding are possible. In addition, NEDS does not capture dementia severity (e.g., staging, cognitive scores) or functional impairment. As a result, clusters should be interpreted as comorbidity-based phenotypes of coded ED presentations rather than severity-stratified clinical subtypes.
Fifth, although k-means is a widely used unsupervised learning technique, it assumes relatively simple cluster geometry and may not capture more complex, nested, or probabilistic subgroup structure. In multimorbidity phenotyping, hierarchical clustering and other data-driven clustering frameworks have been used to derive clinically meaningful disease patterns in older adults, and systematic reviews highlight substantial variation in multimorbidity pattern-discovery methods.14,15,48 Likewise, in dementia and AD research, hierarchical clustering and related unsupervised stratification approaches have been used to characterize clinical heterogeneity, and model-based mixture approaches, including Gaussian mixture models, have been applied to identify latent subgroups relevant to cognitive decline and AD progression.18,49,50 Future work could evaluate whether these alternative methods yield complementary ED-based phenotypes and improve stability or interpretability across coding eras.
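One way such a comparison could be set up is sketched below on synthetic data. The choice of Ward linkage, diagonal covariances, and k = 4 for every method are assumptions made for illustration; a Gaussian mixture is used because the text names it, although a Bernoulli mixture (latent class model) may be a more natural fit for binary indicators.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Synthetic binary matrix standing in for one study year (made-up data).
rng = np.random.default_rng(0)
X = (rng.random((300, 6)) < 0.3).astype(float)

# Fit three clustering frameworks with the same number of groups.
labelings = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X),
    "ward": AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X),
    "gmm": GaussianMixture(
        n_components=4, covariance_type="diag", random_state=0
    ).fit_predict(X),
}

# Agreement of each alternative with the k-means partition (adjusted Rand index).
agreement = {
    name: adjusted_rand_score(labelings["kmeans"], labels)
    for name, labels in labelings.items()
    if name != "kmeans"
}
print(agreement)
```

High agreement would suggest the phenotypes are not an artifact of the k-means objective, while low agreement would flag clusters whose interpretation depends on the method chosen.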
Temporal comparisons of cluster composition across years are presented descriptively. Future confirmatory analyses will incorporate formal inference to test changes in cluster prevalence and within-cluster comorbidity prevalence over time, using appropriate multiplicity control.
NEDS does not include patient-level measures of caregiver availability, living arrangement, or social support, which are likely determinants of ED utilization patterns in dementia. We therefore cannot directly assess how caregiver context modifies cluster membership or temporal trends.
Prior multimorbidity and dementia phenotyping studies have applied hierarchical and model-based clustering frameworks to derive clinically meaningful subgroups, and these alternative frameworks may be evaluated in future ED-based phenotyping studies.7,12,36,37
Finally, as with all unsupervised methods, clustering results are exploratory. They neither reveal causal pathways nor predict outcomes such as mortality or readmission. Future studies should combine clustering with outcome prediction models or external validation.
Clinical and translational implications
Our findings challenge the conventional view of AD patients as a clinically uniform population within emergency care settings. Instead, the data indicate that AD patients presenting to the ED can be meaningfully classified into reproducible subtypes based on distinct multimorbidity profiles. This stratification carries several important implications for clinical care, trial design, and health system innovation.
First, incorporating subtype information into clinical workflows could enhance decision-making in emergency contexts. Cluster membership may help guide triage urgency, diagnostic testing, and referral strategies. For example, patients characterized by symptom-dominant clusters, such as those presenting primarily with fatigue, syncope, or general debility, may benefit from early geriatric or palliative care consultation, while individuals in cardiorenal clusters might require prompt cardiology evaluation and more intensive stabilization protocols.
Second, the identified clusters offer a framework for stratifying participants in future clinical trials. Interventional studies, particularly those targeting transitions of care or hospital avoidance in dementia populations, could use these clusters to define more coherent inclusion criteria or to tailor interventions to biologically and clinically relevant subgroups. This stratified approach may increase both the internal validity and translational potential of trials by accounting for the heterogeneity that would otherwise dilute effect sizes.
Third, the rising prevalence of non-specific symptom clusters over time suggests a growing misalignment between the traditional emergency medicine paradigm and the complex needs of advanced dementia care. These findings raise the possibility that EDs may need to be reimagined, both structurally and operationally, to better accommodate cognitively impaired older adults with complex and often non-acute presentations. This may include redesigning care pathways, staffing models, or reimbursement strategies to prioritize holistic assessment and transitional planning.
Finally, this study demonstrates how unsupervised machine learning methods applied to routine diagnostic data can uncover clinically meaningful phenotypes. This scalable approach to patient stratification could be extended to electronic health record systems, home-based care platforms, or post-acute datasets to develop dynamic risk models and personalized care pathways. As healthcare systems move toward precision medicine, leveraging such clustering methods may offer a low-cost and data-driven way to align care delivery with patient-specific needs in complex, high-risk populations like those with AD.
Conclusion
Over a 15-year span, unsupervised clustering of ED visits revealed shifting patterns of multimorbidity among older adults with AD. Early years (2007–2012) were dominated by clusters reflecting cardio-respiratory instability and classical geriatric syndromes, whereas later years (2017–2022) showed an emergence of fatigue-based, gastrointestinal, and general symptom clusters. These findings suggest a transformation in the clinical phenotype of AD-related ED utilization, from acute organ-specific crises to broader syndromic frailty and metabolic vulnerability. Our results underscore the heterogeneity of AD presentations in acute care settings and support the potential of machine learning-based phenotyping to guide precision triage, anticipatory care planning, and data-driven stratification for clinical trials. As AD populations age and care pathways evolve, dynamic surveillance of comorbidity clusters can enhance real-world understanding and inform targeted geriatric, palliative, and emergency care interventions.
Supplemental Material
Supplemental material, sj-docx-1-alz-10.1177_13872877261430952 for Detecting multimorbidity patterns in Alzheimer's disease using unsupervised machine learning: A nationwide emergency department study (2007–2022) by Tursun Alkam, Ebrahim Tarshizi and Andrew H. Van Benschoten in Journal of Alzheimer's Disease
Footnotes
Acknowledgements
The authors would like to thank the Healthcare Cost and Utilization Project (HCUP) for providing access to the Nationwide Emergency Department Sample (NEDS) database.
Ethical considerations
This study used publicly available, de-identified data from the Nationwide Emergency Department Sample (NEDS) database managed by the Healthcare Cost and Utilization Project (HCUP) and did not involve direct human subject research. As such, it was deemed exempt from institutional review board (IRB) approval.
Consent to participate
Not applicable
Consent for publication
Not applicable
Author contribution(s)
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The data used in this study are available from the Healthcare Cost and Utilization Project (HCUP) Nationwide Emergency Department Sample (NEDS) database. Restrictions apply to the availability of these data, which were used under license for the current study and are not publicly available.
Supplemental material
Supplemental material for this article is available online.
References