Abstract
Inconsistent measurement of multimorbidity hinders comparative research and evidence synthesis. To address this, we conducted a narrative review informed by a structured scoping search of three major databases. We synthesized formal expert-consensus studies to identify a foundational core set of conditions that can serve as a pragmatic starting point for improving research comparability. Our synthesis of three expert-consensus studies identified a core set of 105 consistently recommended condition categories at the International Classification of Diseases, 10th revision (ICD-10) three-digit level. These categories span 10 major clinical domains and cover key conditions such as ischemic heart disease, diabetes, dementia, and depression. To facilitate application, we mapped this core set to both ICD-10 and preliminary ICD-11 codes. A foundational consensus for measuring multimorbidity is emerging. We propose a “Core-plus-Customization” framework in which the identified core set serves as an evidence-based starting point for research that requires local adaptation, rather than as a prescriptive “minimum dataset.” This operationalized core set provides a practical tool to begin standardizing measurement in the field.
Introduction
Multimorbidity—generally understood as the presence of multiple health problems in a single individual—has become increasingly widespread, representing a primary challenge for contemporary healthcare systems.1,2 It is consistently associated with poorer quality of life, accelerated functional decline, increased mortality risk, and markedly greater use of healthcare resources, including more frequent hospitalizations, longer inpatient stays, and higher medication loads.3–5 The dual trends of population aging and the rising prevalence of chronic diseases have further exacerbated the burden of multimorbidity, exposing fundamental limitations in care models that were designed for single-disease management. 6 In-depth research on multimorbidity has therefore become a priority on the global health agenda. 7
Although researchers broadly agree on the conceptual idea of “multiple conditions,” operational definitions of multimorbidity vary widely. Differences span the minimum number of conditions required—some studies define multimorbidity as two or more concurrent conditions, whereas others set the bar at three or more—as well as the composition and size of disease lists, the grouping or weighting of conditions, and the data sources used (electronic health records, administrative claims, and survey self-report).8,9 This methodological heterogeneity produces strikingly divergent prevalence estimates and hampers cross-study comparisons, evidence synthesis, and the design of effective interventions.8,10 While all these factors contribute to the heterogeneity, the composition of the condition list itself is arguably the most foundational element. Establishing a consensus on which conditions to include is therefore a critical first step toward standardization.
To address this long-standing measurement inconsistency, numerous scholars have called for a standardized core set of conditions that should be included whenever multimorbidity is assessed.8,11,12 Several pioneering studies have attempted to establish such core sets through formal expert-consensus techniques (e.g., Delphi panels and nominal-group methods).13,14 While these commendable independent efforts represent significant progress, they also introduce a new risk: the formation of separate, potentially incompatible “consensus standards” in different countries or regions. This makes the task of integrating their findings particularly urgent. To our knowledge, no review has yet synthesized and critically compared these consensus-derived condition lists. This gap limits methodological standardization and the comparability of future research.
This narrative review synthesizes these pioneering efforts. We aim to provide a comprehensive overview of consensus-based approaches by identifying relevant studies, comparing their methodologies, and synthesizing their resulting condition lists. Through this integrated analysis, we distil a cross-study core set of conditions and explore the contextual factors driving heterogeneity, thereby laying a robust foundation for harmonized measurement in the field of multimorbidity.
Methods
Guideline and protocol
This study is a narrative review, conducted and reported in accordance with the Scale for the Assessment of Narrative Review Articles (SANRA) guidelines. 15 To ensure a comprehensive and transparent evidence base, this review was informed by a structured scoping search conducted in accordance with the methodological guidance for scoping reviews. 16 For reporting transparency of the search process, we have followed the PRISMA Extension for Scoping Reviews (PRISMA-ScR) guidelines (Supplemental Table S1). 17 A brief study protocol was established a priori and is available from the corresponding author upon reasonable request; we note as a limitation that it was not publicly pre-registered.
Eligibility criteria
Our search and inclusion criteria were guided by the Population, Concept, and Context (PCC) framework to ensure clarity: Population—stakeholders contributing to expert consensus on adult multimorbidity measurement; Concept—consensus-derived condition lists intended for multimorbidity measurement; Context—any geographical or healthcare setting for research use. 18
Based on this framework, we established explicit eligibility criteria. We included original research articles published in peer-reviewed journals that employed a formal, systematic expert-consensus method—such as the Delphi technique, Nominal Group Technique, or a structured expert panel classification—to generate a list of conditions. 19 A primary or secondary objective of the study had to be the creation of this list for the purpose of measuring multimorbidity in research. Only articles published in English were considered.
We excluded studies that focused on comorbidity (i.e. conditions surrounding a specific index disease), as well as reviews, commentaries, editorials, conference abstracts, or opinion pieces. Studies that used consensus methods for other purposes (e.g., developing clinical guidelines, care models, or core outcome sets) or that collected expert opinions without a formal, reproducible consensus process were also excluded.
We exclusively included studies using formal expert-consensus methods because the primary aim of this review was to synthesize evidence from processes specifically designed to create deliberated, rather than purely prevalence-based, core condition lists. This focus on expert judgment is critical for identifying a clinically and socially meaningful set of conditions for measurement.
Information sources and search strategy
A comprehensive search was performed across three electronic databases (PubMed, Embase, and CINAHL) from inception to 2 July 2025. The search strategy was constructed around three core concepts: (1) multimorbidity and multiple chronic conditions; (2) expert consensus methods (e.g., Delphi, consensus, and expert panel); and (3) measurement and definition (e.g., condition list, disease list, measurement, and definition). To ensure comprehensive retrieval, the search was supplemented by screening grey literature from relevant organizational websites (e.g., World Health Organization) and review protocol registries (e.g., PROSPERO). Furthermore, backward and forward citation tracking of all included studies was conducted. The full, reproducible search strategy is provided in Supplemental Table S2.
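The three-concept structure described above can be sketched as a Boolean query builder. This is only an illustration: the terms are the examples named in the text rather than the full strategy in Supplemental Table S2, the names `CONCEPTS`, `quote`, and `build_query` are hypothetical, and database-specific field tags and controlled vocabulary are deliberately left out.

```python
# Illustrative only: these terms are the examples named in the text,
# not the full strategy reported in Supplemental Table S2.
CONCEPTS = {
    "multimorbidity": ["multimorbidity", "multiple chronic conditions"],
    "consensus_method": ["Delphi", "consensus", "expert panel"],
    "measurement": ["condition list", "disease list", "measurement", "definition"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(concepts: dict[str, list[str]]) -> str:
    """OR the synonyms within each concept, then AND the concept blocks."""
    blocks = [
        "(" + " OR ".join(quote(term) for term in terms) + ")"
        for terms in concepts.values()
    ]
    return " AND ".join(blocks)


query = build_query(CONCEPTS)
print(query)
```

Joining synonyms with OR inside a concept and joining concepts with AND means a record must match at least one term from each of the three concepts to be retrieved.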
Selection of sources of evidence
The literature selection process consisted of two sequential phases. First, two reviewers (WS and YZ) independently screened the titles and abstracts of all identified records. Subsequently, they independently assessed the full texts of all potentially eligible articles. Any disagreements throughout the process were resolved through discussion or, if necessary, arbitration by a third reviewer (RS).
Data charting process
We designed and calibrated a standardized data charting form to systematically extract key information from the included studies. One reviewer (WS) charted the data, and a second reviewer (YZ) verified the entirety of the extracted data for accuracy and completeness. The charted data items included: (i) basic study information (first author, publication year, and country/region); (ii) methodological details (consensus method, composition and size of the expert panel, presence of patient and public involvement (PPI), and the specific criteria or thresholds for achieving consensus); and (iii) study outputs (the complete list of recommended health conditions or problems).
Synthesis of results
Given the anticipated heterogeneity in the methodologies, contexts, and objectives of the included studies, we employed a narrative synthesis approach to integrate the findings. This synthesis involved three steps: (1) a descriptive and qualitative comparison of the methodological pathways used by the studies; (2) standardization and matching of the condition lists to identify a “core set” of conditions consistently recommended across studies; and (3) a contextual analysis to explore and interpret the reasons for any observed differences among the lists, considering factors such as geography, healthcare system, and expert panel composition. We used the International Classification of Diseases, 10th revision (ICD-10) as the primary classification system, as it was the basis for all included expert consensus studies, ensuring reproducible alignment. To enhance the future utility of our findings, we provided a preliminary mapping to the International Classification of Diseases, 11th revision (ICD-11) in Supplemental Table S4. Because ICD-11 often requires post-coordination and mappings are not always one-to-one (non-bijective), these ICD-11 codes are provided as best-fit references rather than prescriptive substitutions.
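The standardization step in (2) can be illustrated with a small sketch that collapses more specific ICD-10 codes to their three-digit (three-character) category so that lists from different studies can be matched. The function name `to_three_character` and the accepted input formats are assumptions made for illustration; the review does not describe the tooling actually used.

```python
import re

# Matches an ICD-10 code such as "I25", "I25.1", "i251" or "E11.9".
# The first three characters form the category; the rest is the subdivision.
_ICD10_PATTERN = re.compile(r"^([A-Z][0-9][0-9A-Z])\.?([0-9A-Z]{0,4})$")


def to_three_character(code: str) -> str:
    """Return the three-character ICD-10 category for a code, e.g. 'I25.1' -> 'I25'."""
    cleaned = code.strip().upper()
    match = _ICD10_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"Not a recognisable ICD-10 code: {code!r}")
    return match.group(1)


# Hypothetical input: codes at mixed levels of detail collapse to two categories.
raw_codes = ["I25.1", "I25.9", "E11.9", " e119 "]
categories = sorted({to_three_character(code) for code in raw_codes})
print(categories)
```

Matching at the category level trades clinical detail for comparability, which is why the ICD-11 mapping described above is offered only as a best-fit reference.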
Results
Study selection
The literature selection process is detailed in the PRISMA flow diagram (Figure 1). Our initial database and supplementary searches identified 666 records. After deduplication and title and abstract screening, four articles proceeded to full-text assessment for eligibility. Of these, one article was excluded because, despite using a consensus methodology, it did not produce a finalized list of conditions for multimorbidity measurement, which was a primary eligibility criterion for this synthesis. A final total of three studies therefore met the pre-specified eligibility criteria and were included in the review.

Figure 1. PRISMA flow diagram of the study selection process.
Characteristics of included studies
The three included studies varied significantly in their geographical context, core methodology, expert panel composition, and primary outputs (Table 1). Two studies employed the Delphi method, while one utilized an expert panel classification approach. The respective aims were to provide a comprehensive classification framework for an older population, establish a core set of conditions for international research, and define measurement criteria for a specific primary care system.
Table 1. Characteristics of included studies.
Note. N: number of participants; PPI: patient and public involvement; ICD-10: International Classification of Diseases, 10th revision.
Panelist geography as reported in the original publication by Ho et al. 13: 53.3% of panelists were from Europe and 20.7% from North America.
A key source of methodological heterogeneity lay in how each study operationalized the concept of multimorbidity, with notable differences in the multimorbidity threshold, the source of the condition list, and the criteria for defining a “chronic” condition (Supplemental Table S3).
A primary point of divergence was the number of conditions required to define multimorbidity. While the studies by Calderón-Larrañaga et al. 14 and Ho et al. 13 both used a threshold of two or more conditions, the consensus in the study by Tyagi et al. 20 established a stricter criterion of three or more conditions. The criteria for defining a condition as “chronic” also varied. For example, Calderón-Larrañaga et al. 14 focused on the condition's duration and its impact on disability or need for long-term care, while Tyagi et al. 20 specified a minimum duration of 6 months alongside requirements for persistence and patient impact. In the study by Ho et al., 13 the public panel reached consensus on a 12-month duration, whereas the professional panel agreed on a 6-month threshold; this split highlights the complexity of defining “long-term” conditions. These foundational differences in definition directly influenced the composition of the final condition lists produced by each study.
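To make the effect of the threshold concrete, the sketch below applies a cut-off of two or more and of three or more conditions to the same cohort. The cohort is entirely hypothetical and the function name `prevalence` is ours; the point is only that the choice of threshold alone changes the estimate.

```python
def prevalence(condition_counts: list[int], threshold: int) -> float:
    """Share of people with at least `threshold` conditions."""
    if not condition_counts:
        raise ValueError("condition_counts must not be empty")
    cases = sum(1 for n in condition_counts if n >= threshold)
    return cases / len(condition_counts)


# Hypothetical cohort of ten people: number of listed conditions per person.
cohort = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

two_plus = prevalence(cohort, threshold=2)    # 7 of 10 people qualify
three_plus = prevalence(cohort, threshold=3)  # 4 of 10 people qualify
print(f">=2 conditions: {two_plus:.0%}; >=3 conditions: {three_plus:.0%}")
```

In this toy cohort the estimate falls from 70% to 40% when the threshold moves from two to three conditions, with no change in the underlying data.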
Methodological approaches to consensus
The Delphi method
Two studies employed the Delphi method, which commenced with a pre-defined candidate list of conditions derived from the literature. Experts were asked to rate or select items over multiple anonymous online survey rounds. After each round, statistically aggregated, anonymized group feedback was provided to all panelists, allowing them to revise their judgments in subsequent rounds until a pre-specified consensus threshold was met. 19
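The round-by-round logic can be sketched as a simple threshold check. The 70% agreement level, the binary include/exclude vote, and all names below are assumptions for illustration only; each included study defined its own rating scale and consensus criteria, and real Delphi designs often also define consensus to exclude.

```python
def consensus_reached(votes: list[bool], threshold: float = 0.70) -> bool:
    """True if the share of 'include' votes meets the (hypothetical) threshold."""
    if not votes:
        raise ValueError("votes must not be empty")
    return sum(votes) / len(votes) >= threshold


# Hypothetical round: each panelist votes True (include) or False (exclude).
round_votes = {
    "condition_a": [True] * 18 + [False] * 2,  # 90% vote to include
    "condition_b": [True] * 11 + [False] * 9,  # 55% vote to include
}

# Items meeting the threshold are retained; the rest return in the next round
# together with the anonymized group feedback.
retained = [name for name, votes in round_votes.items() if consensus_reached(votes)]
carried_forward = [name for name in round_votes if name not in retained]
print(retained, carried_forward)
```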
Expert panel classification
The study by Calderón-Larrañaga et al. utilized a distinct approach, beginning with a consensus definition of “chronic disease” agreed upon by a multidisciplinary team. Panel members then independently reviewed all four-digit codes in the ICD-10. 21 After disagreements were resolved through discussion, the expert panel grouped all qualifying codes into 60 broad disease categories based on clinical criteria. 14
Synthesis of recommended condition lists
Overall composition of the lists
The final condition lists produced by the three studies varied in number and format. Ho et al. proposed a two-tiered list comprising 24 “always include” and 35 “usually include” conditions. 13 Tyagi et al. 20 identified a single list of 23 conditions. Calderón-Larrañaga et al. 14 constructed a comprehensive framework of 60 broad disease categories.
Identification of a core condition set
Despite these differences, a detailed cross-study comparison of the condition lists revealed a robust consensus. To enable this comparison, the lists from each study were first standardized to ICD-10 three-digit categories. Categories present in all three studies constituted the 105-item core set, while categories present in at least two studies formed a broader 381-item high-agreement set. The 105-item core set spans 10 major clinical domains, including key cardiovascular, metabolic, and neuropsychiatric diseases (Table 2), underscoring a strong foundational consensus on the cornerstones of multimorbidity. The full, operationalized lists for both sets, with corresponding ICD-11 codes for each condition, are provided in Supplemental Tables S4 and S5.
Table 2. Major clinical categories within the core consensus set of conditions.
Note. ICD-10: International Classification of Diseases, 10th revision; ICD-11: International Classification of Diseases, 11th revision. The table presents a summary of the major clinical categories within the core consensus set. The complete, detailed operational list of every condition, its specific ICD-10 codes, and a preliminary mapping to ICD-11 is available in a single, comprehensive table in Supplemental Table S4.
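The matching rule used to derive the two sets (present in all three studies versus present in at least two) can be sketched as follows. The three lists are toy examples, not the actual lists from the included studies, and the variable names are hypothetical.

```python
from collections import Counter

# Toy lists, already standardized to ICD-10 three-character categories.
# These are NOT the actual lists from the three included studies.
study_lists = {
    "study_a": {"I25", "E11", "F32", "G30", "B20"},
    "study_b": {"I25", "E11", "F32", "B20", "M81"},
    "study_c": {"I25", "E11", "F32", "G30", "J44"},
}

# Count in how many studies each category appears.
counts = Counter(code for codes in study_lists.values() for code in codes)
n_studies = len(study_lists)

# Core set: recommended by every study. High-agreement set: by at least two.
core_set = {code for code, n in counts.items() if n == n_studies}
high_agreement_set = {code for code, n in counts.items() if n >= 2}

print(sorted(core_set))
print(sorted(high_agreement_set))
```

Under this rule the core set is always a subset of the high-agreement set, which is consistent with the 105-item and 381-item sets reported above.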
Discussion
Principal findings and interpretation
Despite heterogeneity in aims and target populations across the included studies, we observed a convergence on 105 ICD-10 categories. This cross-context overlap suggests that these conditions form a robust foundation for multimorbidity measurement in high-income settings. This finding supports using the core set as a backbone for research comparability, while the variations beyond this core underscore the necessity of the “Customization” element in our proposed framework. The conditions in this core set largely mirror the leading causes of death and disability in the Global Burden of Disease studies, which provides an epidemiological rationale for their centrality. 22 Notably, the consistent inclusion of mental health conditions like depression counterbalances the tendency in some measurement models to overlook psychosocial domains, supporting a more integrated, biopsychosocial perspective.
Beyond the core consensus, the variations among the lists are also informative. These differences are not random but reflect key distinctions between the studies, such as methodology (e.g., comprehensive bottom-up classification vs. focused Delphi techniques), panel composition (especially the inclusion of patient representatives, which added patient-centric conditions), and local context (such as including specific risk factors relevant to a primary care setting). Together, these findings point to a more effective measurement paradigm: the “Core-plus-Customization” model. This model is not just a technical solution but also has practical importance, particularly for making underrepresented conditions more visible by allowing regions to add locally prevalent diseases, thus preventing them from being “invisible” in statistics.23,24 To make this framework operational and prevent a return to measurement inconsistency, we propose two reporting standards for future studies: (1) researchers should always report outcomes based on the core set to ensure a comparable baseline across studies; and (2) they should additionally report outcomes for the extended “core + local add-ons” list, providing a transparent appendix of all added conditions and their codes. This dual-reporting approach preserves a comparable backbone while enabling full context specificity.
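A minimal sketch of the dual-reporting idea is given below, assuming each person's diagnoses are already available as ICD-10 three-character categories. The function names, the toy core set, the local add-ons, and the default threshold of two conditions are all assumptions made for illustration, not part of the proposed standard.

```python
def count_conditions(patient_codes: set[str], condition_list: set[str]) -> int:
    """Number of a patient's categories that appear on the given condition list."""
    return len(patient_codes & condition_list)


def dual_report(
    patient_codes: set[str],
    core_set: set[str],
    local_addons: set[str],
    threshold: int = 2,
) -> dict:
    """Report multimorbidity status on the core list and on core + local add-ons."""
    extended = core_set | local_addons
    core_n = count_conditions(patient_codes, core_set)
    extended_n = count_conditions(patient_codes, extended)
    return {
        "core_count": core_n,
        "core_multimorbidity": core_n >= threshold,
        "extended_count": extended_n,
        "extended_multimorbidity": extended_n >= threshold,
        "addons_used": sorted(local_addons),  # transparent appendix of additions
    }


# Hypothetical example: a toy core set plus two locally added categories.
core = {"I25", "E11", "F32"}
addons = {"B20", "A15"}
patient = {"E11", "B20"}

report = dual_report(patient, core, addons)
print(report)
```

Here the same person is not counted as multimorbid on the core list alone but is counted on the extended list, which is exactly the difference that reporting both figures makes visible.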
Strengths and limitations
The primary strength of this study lies in its novelty: to our knowledge, it is the first narrative review to systematically synthesize these independent consensus-building efforts. We followed the PRISMA-ScR checklist in reporting the literature search to keep the process transparent.
However, the study's limitations are also clear. The principal limitation is the very small number of included studies (n = 3), which, while a finding in itself, highlights that the use of formal consensus methods to define multimorbidity is still in its infancy.
A second, and perhaps most significant, limitation is the overrepresentation of high-income countries within the source studies. This geographical bias is reflected in the composition of our synthesized lists. For instance, while chronic infectious diseases such as HIV/AIDS are present in our broader 381-item “high-agreement set,” they are strikingly absent from the more stringent 105-item “core set.” This is particularly telling, as multimorbidity in many low- and middle-income countries is driven by both communicable and non-communicable diseases, with HIV being a central factor. 25 This absence underscores a critical gap and demonstrates that our synthesized core set cannot be considered “global.” It highlights the urgent need for new consensus studies grounded in the epidemiological realities of diverse populations. Finally, in accordance with narrative review methodology, we did not conduct a formal risk-of-bias assessment. While we performed a descriptive check of the reporting completeness of the included studies using the ACCORD checklist (Supplemental Table S6), this does not constitute a formal quality appraisal, and the methodological rigor of the primary studies should be considered when interpreting our findings.
Implications for research
Based on our findings, we propose several directions for future research. First, our identification of both a stringent 105-item “core set” and a broader 381-item “high-agreement set” provides researchers with flexible, evidence-based tools. We propose that the core set serve as an evidence-based “starting point” for studies requiring high comparability, rather than being adopted wholesale as a fixed, prescriptive “minimum dataset.” Second, new consensus studies in diverse regions are urgently needed. For this, we suggest a methodological approach that synthesizes established best practices, such as those used in core outcome set development. 26 This “hybrid” approach would first employ a data-driven approach, using local health data to identify a candidate list of high-burden or “hub” conditions.27,28 This list is then refined and validated by a multi-stakeholder expert panel, including patients, through a formal consensus process. This approach aims to merge empirical evidence with expert wisdom to create contextually relevant measurement standards. Third, the core condition set itself should be a “living list.” Emerging chronic issues like long COVID should be considered for future inclusion.29,30 Likewise, macro-level factors like climate change may reshape the chronic disease landscape and warrant attention.31,32
Implications for clinical practice and policy
For clinical practice, it is crucial that this foundational research set is not misconstrued as a restrictive “clinical checklist.” Such a use would risk marginalizing patients with conditions absent from the list, directly contradicting the principles of patient-centered care. For health policy, a shared core set enables more reliable monitoring and more equitable resource allocation. However, a significant “implementation gap” exists between establishing a consensus and achieving its widespread adoption. Therefore, future work must draw on implementation science to ensure that research consensus translates into practical benefit.33,34
Conclusion
In conclusion, by systematically synthesizing pioneering expert-consensus studies, this review provides an evidence-based, foundational core set of conditions to tackle the long-standing measurement inconsistency in the field of multimorbidity. Promoting this consensus-driven standard can foster more coherent, comparable, and impactful research and, ultimately, improve the health and well-being of people living with multimorbidity worldwide.
Supplemental Material
Supplemental material, sj-docx-1-sci-10.1177_00368504251409841 for Synthesizing expert consensus to define a core condition set for multimorbidity measurement: A narrative review by Weihao Shao, Zuolin Lu, Xiaoxia Wei, Yunyuan Kong, Yutong Long, Yue Zhang and Ruitai Shao in Science Progress
Footnotes
Acknowledgements
We would like to express our gratitude to the authors and participants of the primary studies included in this review, whose original work formed the foundation of our synthesis. The authors confirm that no AI-assisted technologies were used in the generation of this manuscript, including data analysis, interpretation, or the writing of the text.
Ethics approval and consent to participate
Because this study relies solely on published literature, no ethics approval was required.
Author contributions
Weihao Shao conceived and designed the study, conducted the literature search and formal analysis, and wrote the original draft of the manuscript. Yue Zhang, Zuolin Lu, Yunyuan Kong, Xiaoxia Wei, and Yutong Long contributed to the literature search, data extraction and analysis, and critically reviewed the manuscript. Ruitai Shao contributed to the conceptualization of the study, supervised the project, secured funding, and critically reviewed and edited the manuscript. All authors have read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (2021-RC330-004); the non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (2022-ZHCH330-01); the Disciplines Construction Project: Population Medicine (WH10022022010); and the Special Research Fund for Central Universities, Peking Union Medical College (3332025142).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
All data generated or analyzed during this study are included in this published article and its supplementary information files.
Supplemental material
Supplemental material for this article is available online.
References
