Abstract
This study aimed to systematically assess the quality of frailty assessment tools for elderly Chinese speakers in order to identify and recommend high-quality tools and to provide theoretical support for healthcare professionals. We searched multiple databases, including PubMed, Embase, Web of Science, CINAHL, China National Knowledge Infrastructure (CNKI), China Science and Technology Journal (VIP), and WanFang Data, for studies on the psychometric properties, localization, and cross-cultural adaptation of Chinese frailty assessment instruments for the elderly. The search period ran from database inception to April 1, 2025. Two researchers independently screened the literature and extracted data. Risk of bias was assessed using the Risk of Bias checklist of the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN). Measurement properties were evaluated against the COSMIN criteria, and the modified Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) system was used to determine the recommendation level. A total of 35 studies involving 25 Chinese frailty assessment scales were included, none of which reported cross-cultural validity or responsiveness. Based on evidence of low quality or above, the Frailty Assessment Scale for Hospitalized Elderly Patients, C-RGA, and GFI-C were deemed to possess “sufficient” content validity and internal consistency. These 3 assessment tools were recommended as Grade A, with 12 others classified as Grade B, 7 as Grade C, and 3 as Grade D. Various Chinese frailty assessment instruments for the elderly exist, but their quality varies greatly. The Frailty Assessment Scale for Hospitalized Elderly Patients, C-RGA, and GFI-C are suitable for the assessment needs of most elderly populations and are provisionally recommended for use in clinical and institutional settings pending further validation.
Nevertheless, none of the tools can yet be considered fully validated for broad clinical or research use without further rigorous psychometric evaluation.
Introduction
China has the world’s largest elderly population, and by 2035, adults aged ≥60 years are expected to comprise 30% of the population. “Healthy aging” has therefore become a core objective of the Healthy China 2030 strategy.1,2 Frailty is an important and extensively studied condition in geriatric medicine, characterized by reduced physiological reserves and heightened vulnerability to stressors, leading to adverse outcomes such as disability, hospitalization, and mortality.3,4 It directly affects older adults’ quality of life and imposes a considerable burden on families and healthcare systems. Epidemiological data suggest that the prevalence of frailty among Chinese older adults may reach 22.4%,5 underscoring the urgent need for accurate and feasible assessment tools. Assessing frailty status helps clinicians understand patients’ health trajectories, stratify risks, and provide individualized interventions tailored to functional capacity and care goals.6
Currently, frailty is commonly measured using various assessment instruments, ranging from phenotype-based scales (e.g. Fried Frailty Phenotype) to multidimensional questionnaires (e.g. Tilburg Frailty Indicator, Edmonton Frail Scale). However, previous research has primarily summarized the psychometric evidence for frailty scales among Western or English-speaking populations. 7 Although these studies have enhanced global understanding of frailty assessment, evidence regarding the performance of these tools when adapted into Chinese or developed for Chinese elderly people remains limited.
Given the cultural, linguistic, and healthcare system differences that may influence item interpretation and response patterns, evaluating the psychometric properties of Chinese-language frailty assessment instruments is essential. This includes examining content validity, structural validity, internal consistency, reliability, and construct validity, to ensure that these measures are both conceptually and statistically sound for the target population.
Therefore, the present study systematically reviews and evaluates the methodological quality and measurement properties of Chinese-language frailty assessment instruments using the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) checklist. 8 By identifying high-quality and psychometrically robust tools, this review aims to provide evidence-based guidance for researchers and healthcare professionals involved in frailty screening and management among Chinese-speaking older adults.
Methods
Research Design
This systematic review was registered in PROSPERO, an international prospective register of systematic reviews, and was conducted in accordance with the PRISMA-COSMIN recommendations.9,10
Literature Inclusion and Exclusion Criteria
Inclusion criteria: (1) Studies focusing on the development and validation of frailty assessment tools for older adults; (2) Participants aged ≥60 years; (3) Research encompassing the development, cross-cultural adaptation, or validation of psychometric properties for patient-reported outcome measures (PROMs); and (4) Reporting of at least one psychometric property of the instrument.
Exclusion criteria: (1) Conference abstracts, proceedings, or other non-full-text publications; (2) Studies that exclusively applied assessment instruments as outcome measures in interventional trials; (3) Secondary research publications (e.g. reviews, systematic reviews); (4) Studies with missing essential psychometric information (e.g. reliability coefficients, validity indices, factor structure), duplicate publications, or unavailable full texts; and (5) Instruments that are not in Chinese or have not been validated in China.
Literature Search Strategy
The search was restricted to Chinese and English language publications due to the study’s focus on tools developed or validated for elderly Chinese speakers. Instruments validated in unrelated linguistic and cultural contexts were excluded to ensure measurement equivalence. While this may introduce a risk of selection bias, it was deemed necessary to maintain cultural and linguistic relevance to the target population.
A comprehensive literature search was conducted in PubMed, Embase, Web of Science, CINAHL, China National Knowledge Infrastructure (CNKI), VIP, and WanFang databases to identify studies on frailty assessment instruments for older adults. The search encompassed records from database inception to April 1, 2025. Supplementary manual searches of reference lists were performed. Search strategies utilized both controlled vocabulary (subject headings) and free-text terms. Chinese search concepts included: “frailty/vulnerability/weakness/frailty syndrome,” “elderly/older adults/aged,” “instruments/scales/assessment,” and “reliability/validity/psychometric properties.” Corresponding English terms covered: “Aged/older adults,” “Frailty/frail elderly,” “Assessment instruments/scales,” and related measurement properties. The search strategy using PubMed as an example is provided in Supporting information.
Literature Selection and Data Extraction
Two reviewers independently performed literature screening, data extraction, and cross-verification against predefined inclusion/exclusion criteria. Discrepancies were resolved through consensus discussion or third-reviewer adjudication. All identified records were managed using NoteExpress 3.9.0. For eligible studies, reviewers independently extracted the following data: authors, year, instrument name, region, sample size, number of dimensions, number of items, completion time, and retest time.
Quality Assessment Methods
Two rigorously trained reviewers independently assessed methodological quality and psychometric properties following the COSMIN Risk of Bias checklist (version 2.0) and COSMIN guidelines for PROMs. 8 Any discrepancies were resolved by a third reviewer.
Assessment of Methodological Quality
Two independent reviewers evaluated the methodological quality of all outcome measurement instruments (OMIs) in studies assessing frailty among older adults using the COSMIN Risk of Bias checklist (version 2.0).11,12 The checklist covers ten domains: PROM development, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, construct validity (hypothesis testing), and responsiveness. Each domain was rated as very good, adequate, doubtful, or inadequate, with the overall domain rating determined by the lowest rating of any item in that domain (the “worst score counts” principle).
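The lowest-rating rule described above can be illustrated with a minimal sketch. The item ratings below are hypothetical and are not taken from any included study; the names `Rating` and `domain_rating` are introduced here for illustration only.

```python
from enum import IntEnum


class Rating(IntEnum):
    """COSMIN four-point scale, ordered from worst (1) to best (4)."""
    INADEQUATE = 1
    DOUBTFUL = 2
    ADEQUATE = 3
    VERY_GOOD = 4


def domain_rating(item_ratings):
    """Return the overall rating of one domain: the lowest item rating."""
    if not item_ratings:
        raise ValueError("a domain needs at least one rated item")
    return min(item_ratings)


# Hypothetical item-level ratings for one study's structural validity domain.
items = [Rating.VERY_GOOD, Rating.ADEQUATE, Rating.DOUBTFUL]
print(domain_rating(items).name)  # DOUBTFUL
```

A single doubtful item is enough to pull the whole domain down to doubtful, which is why incomplete reporting weighs so heavily in the ratings.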
Assessment of Measurement Properties
Measurement properties were synthesized using the COSMIN criteria for good measurement properties. Each psychometric property—content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, construct validity (hypothesis testing), and responsiveness—was rated as sufficient (+), insufficient (−), inconsistent (±), or indeterminate (?), based on empirical evidence.
Quality of Evidence and Grading of Recommendations
The quality of evidence for each measurement property was appraised according to the modified GRADE approach. 13 Each psychometric property initially received a high evidence rating and could be downgraded based on 4 domains: risk of bias, inconsistency, indirectness, and imprecision. The final evidence quality was classified into 4 levels: high, moderate, low, or very low. To enhance clarity, the grading system was revised as follows: Grade A (Strong): Instruments with sufficient content validity and internal consistency (at least low-quality evidence). Grade B (Moderate evidence): Instruments with adequate measurement properties but requiring further validation. Grade C (Limited evidence): Instruments showing partial or inconsistent psychometric support. Grade D (Insufficient evidence): Instruments with inadequate measurement properties or very low-quality evidence.
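As a rough illustration of the downgrading logic and the Grade A criterion described above, the following sketch starts each property at high-quality evidence and steps down one level per downgrade point. The function names, the dictionary layout, and the example values are assumptions made for illustration; they do not reproduce the review's actual decision records.

```python
# Evidence levels ordered from worst to best.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_evidence(downgrades):
    """Start at 'high' and step down by the total number of downgrade levels.

    downgrades maps each domain (risk of bias, inconsistency, indirectness,
    imprecision) to a non-negative number of levels to subtract.
    """
    if any(points < 0 for points in downgrades.values()):
        raise ValueError("downgrade points must be non-negative")
    total = sum(downgrades.values())
    # The scale is floored at 'very low'.
    return LEVELS[max(len(LEVELS) - 1 - total, 0)]


def meets_grade_a(content_validity, internal_consistency):
    """Grade A check: both properties rated '+' with at least low evidence.

    Each argument is a (rating, evidence_level) pair.
    """
    def sufficient(prop):
        rating, level = prop
        return rating == "+" and LEVELS.index(level) >= LEVELS.index("low")

    return sufficient(content_validity) and sufficient(internal_consistency)


# Hypothetical example: one level lost to risk of bias, one to imprecision.
example = {"risk_of_bias": 1, "inconsistency": 0,
           "indirectness": 0, "imprecision": 1}
print(grade_evidence(example))  # low
print(meets_grade_a(("+", "moderate"), ("+", "low")))  # True
```

Only the Grade A rule is encoded, because it is the only grade the text defines operationally; Grades B to D depend on reviewer judgement.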
Results
Search and Study Selection
The initial search yielded 1567 records. Following sequential screening, 35 studies14-48 met the inclusion criteria (13 in English and 22 in Chinese), collectively identifying 25 distinct frailty assessment instruments designed for Chinese-speaking older adults. All English-language publications reported studies validating Chinese-language versions of the instruments. The literature selection process is illustrated in Figure 1.

Figure 1. PRISMA flowchart.
Study Characteristics
The 35 studies included encompassed 25 frailty assessment tools (see Table 1 for details). Six were original Chinese instruments,28,31,34,35,41 while 19 were translations or cross-cultural adaptations of existing tools. Sample sizes ranged from 40 to 600 participants, with reported mean ages of subjects between 61.8 and 93.1 years. Most studies were conducted in community or hospital settings. All scales underwent psychometric testing using Chinese versions; no original English-language instruments were directly tested.
Table 1. Included Instruments and Description of Their Characteristics.
NR = not reported.
Methodological Quality and Psychometric Property Evaluation
No included studies evaluated cross-cultural validity/measurement invariance, and responsiveness remained unassessed across all instruments. The methodological quality and psychometric properties of the 25 instruments are summarized in Table 2.
Table 2. Evaluation of Methodological Quality and Measurement Properties of Included Studies.
Note. Methodological quality: V indicates very good; A indicates adequate; D indicates doubtful; I indicates inadequate; a indicates expert consultation; b indicates consultation with the target population. Measurement properties: + indicates sufficient; − indicates insufficient; ? indicates indeterminate; NR indicates not reported.
Instrument Development
Twelve studies14,21,25-28,32,36-38,42,44 demonstrated rigorous scale development methodology, featuring: (a) explicit operationalization of measured constructs, (b) expert consultation and cognitive interviews with target populations, and (c) cross-cultural adaptation protocols for Chinese versions. Although verbatim transcription practices were not specified, comprehensive content coverage warranted an overall “adequate” methodological quality rating. The remaining thirteen studies16,24,29-31,34,35,40,41,43,45-47 exhibited significant methodological shortcomings, including omission of cognitive interviews with stakeholders, unreported moderator qualifications, and absence of verbatim interview documentation. These deficiencies resulted in a “doubtful” methodological quality rating.
Content Validity
Analysis of 20 studies evaluating this characteristic revealed 3 distinct methodological tiers. Six studies28,32,36-38,40 demonstrated rigorous practice through cognitive interviews with experts and target populations, systematically assessing item relevance, comprehensiveness, and comprehensibility. These studies demonstrated sound methodological quality and robust measurement properties. Fourteen studies14,20,25-27,30,31,34,35,41-44,46 showed methodological shortcomings, such as undocumented research processes, unclear statistical methods, and absent stakeholder interviews, resulting in doubtful methodological ratings and insufficient evidence for measurement properties. Five studies16,24,29,45,47 entirely omitted content validity reporting, resulting in inadequate methodological quality ratings and insufficient validity assessment. This stratified analysis highlights the critical interdependence between methodological rigour and content validity assurance in OMI development.
Structural Validity
Structural validity assessment revealed critical methodological distinctions across the evidence base. Twelve studies16,24,25,29,36,38,40-42,44-46 demonstrated fundamental limitations through omission of both exploratory and confirmatory factor analyses (EFA/CFA), warranting inadequate methodological ratings and insufficient evidence for structural validity. Among partial implementations, one study 14 employed EFA exclusively (adequate methodology) but undermined its findings by omitting cumulative variance reporting, yielding insufficient evidence. Conversely, 6 CFA-only investigations27,31,34,35,37,43 exemplified sound execution with sample sizes exceeding the 7:1 participant-to-item ratio threshold (very good methodology); however, the lack of prior EFA validation reduced confidence in the model’s construct representation. Two studies22,30 adopted correlational techniques, demonstrating strong item-dimension-total correlations (70.9% variance explained) and improved fit indices (adequate methodology). However, their avoidance of factor analytic approaches resulted in insufficient psychometric evidence. The strongest validation emerged from 4 studies26,28,32,47 implementing integrated EFA-CFA methodologies, achieving very good methodological quality with sufficient structural validity evidence. Notably, one such study 47 reported confirmatory fit indices that met the benchmarks (CFI > 0.90; RMSEA < 0.08). This stratification demonstrates that methodological completeness, particularly the use of sequential EFA-CFA procedures, is essential for establishing robust structural validity, whereas standalone analytic methods do not provide sufficient psychometric support.
Internal Consistency
Twenty-two studies14,16,22,24-28,30-32,34-38,40-44,46,47 reported the internal consistency of their assessment instruments and were rated as having “adequate” methodological quality. Among these, one instrument 25 reported a Cronbach’s α below 0.7, resulting in “insufficient” measurement properties. Conversely, the remaining 21 studies (84%)14,16,22,24,26-28,30-32,34-38,40,41,43,44,46,47 reported α coefficients ≥0.7, indicating “sufficient” measurement properties. Three additional studies28,41,44 failed to report Cronbach’s α coefficients, leading to a rating of “inadequate” methodological quality and “insufficient” measurement properties. Overall, 84% of scales demonstrated sufficient internal consistency, though the lack of factor validation constrained the interpretation of results.
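For readers less familiar with the statistic, the sketch below computes Cronbach’s α from raw item scores using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) and compares it with the 0.7 threshold applied above. The response matrix is invented for illustration and does not come from any included study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2:
        raise ValueError("at least two respondents are required")
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items.
responses = [
    [2, 3, 3],
    [1, 1, 2],
    [3, 3, 2],
    [0, 1, 1],
    [2, 2, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), "sufficient" if alpha >= 0.7 else "insufficient")
```

Because α rises with the number of items and assumes a single underlying dimension, a value above 0.7 is easier to interpret when structural validity has been established first.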
Stability
Twenty studies14,16,25-32,34-36,38,40,43-47 assessed instrument stability via test-retest reliability coefficients. Due to unreported key methodological factors (e.g. participant characteristics, implementation protocols), these studies received a “doubtful” methodological quality rating. Among them, one study 29 reported an intraclass correlation coefficient (ICC) < 0.7, yielding “insufficient” measurement properties. The remaining instruments demonstrated “sufficient” measurement properties, though 2 studies32,40 substituted Pearson correlation coefficients (>0.7) for ICCs, which nevertheless indicated adequate stability. Overall, stability was sufficient in most studies, but limited methodological reporting reduced confidence in these estimates.
Measurement Error
Only a single study 45 reported the Minimum Detectable Change (MDC) value for its assessment instrument. However, due to its short retest interval, methodological quality was rated as “inadequate,” resulting in an “indeterminate” evaluation of measurement error.
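To show what an MDC value represents, the sketch below uses the commonly cited formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. Whether the included study used exactly these formulas is not stated in this review, so this is an assumption, and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the baseline standard deviation and the test-retest ICC."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1 - icc)


def mdc95(sd, icc):
    """Smallest change exceeding measurement error at 95% confidence."""
    return 1.96 * math.sqrt(2) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs: SD of 2.0 scale points and an ICC of 0.84.
print(round(mdc95(2.0, 0.84), 2))  # 2.22
```

Under these assumed inputs, a change of roughly 2 points or less between two assessments could not be distinguished from measurement noise.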
Criterion Validity
According to COSMIN guidelines, the original instrument constitutes the criterion reference only when directly compared to its abbreviated version.12 Among 5 studies16,24,27,35,47 assessing criterion validity, 2 studies24,27 did not explicitly report this property. Instead, they inferred validity through predictive experiments and Pearson correlations, resulting in a “doubtful” methodological quality rating and “indeterminate” measurement properties. The remaining 3 studies16,35,47 quantified criterion validity, although none used the original instrument as the comparator. Notably, 2 studies16,47 reported inter-scale correlation coefficients >0.70, yielding “sufficient” measurement properties despite an “inadequate” methodological quality rating due to the absence of criterion reference comparison. The final study 35 demonstrated “insufficient” measurement properties. These findings suggest that although some instruments align partly with established frailty constructs, formal criterion validation remains underdeveloped.
Hypotheses Testing for Construct Validity
Eight studies16,24-26,43,44,46,47 assessed construct validity through hypothesis testing (evaluating convergent and/or discriminant validity). However, due to insufficient reporting of subgroup characteristics, methodological quality was rated as “doubtful.” Among these, 4 studies16,44,46,47 demonstrated ≥75% agreement with hypothesized relationships, resulting in “sufficient” measurement properties. Conversely, the remaining 4 studies24-26,43 yielded “insufficient” measurement properties.
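The 75% rule for hypothesis testing can be expressed as a small helper. This is a sketch under the assumption that each predefined hypothesis is simply counted as confirmed or not; the function name and the counts are hypothetical.

```python
def rate_hypothesis_testing(confirmed, total, threshold=0.75):
    """Rate construct validity from the share of confirmed hypotheses.

    Returns '+' (sufficient), '-' (insufficient), or '?' (indeterminate)
    when no hypotheses were defined.
    """
    if total == 0:
        return "?"
    if not 0 <= confirmed <= total:
        raise ValueError("confirmed must lie between 0 and total")
    return "+" if confirmed / total >= threshold else "-"


# Hypothetical counts: 6 of 8 hypotheses confirmed is exactly 75%.
print(rate_hypothesis_testing(6, 8))  # +
print(rate_hypothesis_testing(5, 8))  # -
```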
Evidence Grading and Recommendations for the Included Assessment Instruments
All downgrading decisions were made using the modified COSMIN-GRADE approach to assess the overall quality of evidence regarding the psychometric properties of each tool. Based on whether content validity and internal consistency were sufficient (with at least low-quality evidence), the 25 tools were categorized into 4 recommendation grades (A, B, C, and D). Comprehensive details are provided in Table 3.
Table 3. Synthesis of Measurement Properties of the Included Studies and Recommendations.
①Overall rating; ②Evidence level; + indicates sufficient; − indicates insufficient; ? indicates indeterminate; NR indicates not reported.
Grade A: Strong Provisional Recommendation
The Frailty Assessment Scale for Hospitalized Elderly Patients, 28 C-RGA, 32 and GFI-C 47 were classified as Grade A, reflecting strong evidence and sufficient measurement properties.
The Frailty Assessment Scale for Hospitalized Elderly Patients demonstrated sound methodological quality and robust measurement properties in content validity assessment. This represents one of the most robust validation studies employing a comprehensive Exploratory Factor Analysis-Confirmatory Factor Analysis methodology, exhibiting excellent methodological quality and sufficient evidence of structural validity.
The C-RGA demonstrated sufficient internal consistency (Cronbach’s α ≥ 0.7). Its content validity was assessed through cognitive interviews with experts and the target population, employing rigorous methodology.
The GFI-C also demonstrated sufficient internal consistency. Its validation study received a very good methodological rating for structural validity, employing an integrated EFA-CFA approach with confirmatory fit indices meeting benchmarks (CFI > 0.90; RMSEA < 0.08). Furthermore, it showed sufficient construct validity in hypothesis testing (≥75% of hypotheses confirmed).
These 3 instruments have demonstrated sufficient content validity and internal consistency, covering a large-scale, diverse population of middle-aged and elderly individuals across varied settings. It is provisionally recommended that these 3 instruments be used in clinical and institutional settings, with further validation anticipated.
Grade B: Moderate Evidence
Twelve assessment tools14,16,27,30,31,34,36,38,40,44-46 were categorized as Grade B, indicating moderate evidence quality with acceptable psychometric performance across multiple domains such as internal consistency and construct validity. Instruments including the TFI, 16 CFAI, 27 and CFI-36 30 demonstrated sound internal consistency and construct validity. However, their generalizability and clinical applicability remain constrained by the absence of rigorous cross-cultural validation and responsiveness testing. Notably, Grade B tools such as the APAFOP-C 45 warrant attention for their innovative, context-specific assessment paradigms, which contrast markedly with traditional questionnaire-based instruments. Whilst their measurement properties require further validation, such tools may offer a valuable alternative for assessing frailty in settings with specific spatial or clinical constraints.
In summary, Grade B scales may be cautiously employed in research and practice, particularly for frailty screening and early identification within specific contexts. Future studies should undertake confirmatory analyses, multicentre validation, and longitudinal responsiveness testing to strengthen their evidence base.
Grade C and D: Limited or Insufficient Evidence
Seven tools22,24-26,35,37,43 were rated as Grade C, and a further 3 tools29,41,42 were rated as Grade D, indicating limited or very low evidence quality that reflects serious methodological flaws in their underlying validation studies.
Common weaknesses in these tools include lack of content validity assessment, uncertain reliability outcomes (e.g. ICC < 0.7 or failure to report Cronbach’s α), and reliance on flawed analytical procedures such as using item correlations alone without structural validation. Their inconsistent psychometric results carry a risk of misclassification, potentially leading to inappropriate clinical interventions.
Consequently, without further rigorous psychometric evaluation, none of these 25 Chinese instruments can be considered sufficiently validated for widespread clinical or research application. The 3 Grade A instruments are only provisionally endorsed. Whilst Grade C instruments should not be entirely excluded from potential clinical use, their use should be limited to settings in which they undergo stringent further validation, such as dedicated validation studies.
Discussion
The Landscape of Frailty Assessment and the Significance of This Study
This systematic review, rigorously adhering to the COSMIN guidelines, comprehensively reveals for the first time the measurement characteristics and methodological quality of frailty assessment tools for the elderly population in China. Findings indicate that the increasing number of assessment tools reflects heightened awareness of frailty screening within Chinese clinical practice. However, compared to international evaluations of established scales, the evidence base for Chinese assessment tools remains fragmented. This situation underscores the urgent need for recommendations of high-quality, evidence-based tools, which cannot be met solely by relying on international English-language systematic reviews. Overall, the fragmented assessment landscape directly stems from prevalent methodological shortcomings in validation studies, offering limited guidance for clinicians and policymakers in effectively selecting tools.
Clinical and Theoretical Implications of Key Methodological Deficiencies
Macro-level analysis has revealed pervasive shortcomings that directly compromise the clinical utility and dissemination potential of these instruments. Content validity forms the bedrock of measurement properties, ensuring scales comprehensively represent the frailty construct being assessed. Robust development methodologies are paramount for establishing content validity in outcome measurement instruments (OMIs). 49 Nevertheless, we identified 3 critical theoretical and clinical shortcomings in the validation of Chinese assessment tools. Firstly, none of the included instruments assessed responsiveness. Given that frailty is a dynamic, reversible syndrome, tools incapable of reliably detecting changes following clinical interventions face fundamental limitations in their utility for evaluating treatment effects, monitoring prognosis, and allocating resources. Moreover, although 20 instruments were adapted from English versions, none reported cross-cultural validity or measurement invariance. This theoretical flaw means we cannot guarantee these tools assess the same concept of “frailty” across different cultural backgrounds, regions, or dialects among Chinese older adults. This limits the instruments’ generalizability and may introduce assessment bias. Finally, regarding criterion-related validity, some studies relied on measures such as FRAIL or the Fried frailty phenotype as “criterion reference” for validation. Future studies should clearly define appropriate criterion references and provide supporting evidence for the reliability and validity of these reference tools themselves. Concurrently, addressing the prevalent issue of incomplete EFA-CFA methodologies is essential to ensure rigorous construct validity. 50
Clinical Application and Policy Recommendations
Based on the evidence grading, this study provides clear guidance for clinical practice. Clinicians are advised to prioritize the use of the Frailty Assessment Scale for Hospitalized Elderly Patients, C-RGA, and GFI-C, which have received Grade A provisional recommendations. These tools possess the most robust evidence regarding content validity and internal consistency, meeting the assessment needs of elderly populations across diverse care settings. Grade B tools may be employed in specific settings, subject to further validation. Furthermore, this study identifies a systematic evidence gap in cross-cultural validity and responsiveness for Chinese assessment tools, providing a clear roadmap for future research.
From a policy perspective, the current fragmented evidence landscape constrains national or regional health authorities in promoting unified frailty screening standards. Policymakers should encourage and fund multicentre, cross-regional collaborative studies, particularly longitudinal validation of Grade A tools concerning responsiveness, measurement error, and cross-cultural validity. Concurrently, researchers, clinicians, and public health institutions must strengthen collaboration, prioritizing the assessment of responsiveness and measurement error to enhance tool quality. This will ensure all instruments better align with the practical needs of China’s rapidly ageing and heterogeneous population, thereby promoting equity in health assessment.
Implications for Future Practice and Research
The use of frailty assessment instruments in older populations has proven effective for early detection and diagnosis of frailty, thereby enabling timely interventions and support for patients. 51 Findings from this study provide a foundation for researchers to select the most appropriate frailty assessment instruments for older adults. This selection enhances assessment accuracy, facilitates early and precise diagnosis of frailty, and ultimately improves patient outcomes. Furthermore, the results offer scientific grounding for continuous refinement of existing assessment instruments.
The review revealed that most frailty assessment instruments have been validated in overly narrow contexts. Many instruments target specific clinical populations (for example, the G8 instrument for older cancer patients, 36 TSFI for trauma patients, 40 and Frail-NH for institutionalized older residents 25), with limited testing in general community populations. Additionally, the instruments were predominantly validated in urban Han Chinese samples, without evaluating measurement invariance across ethnic, regional, or linguistic groups. Given China’s vast demographic and sociocultural diversity, the absence of cross-cultural validation constitutes a significant gap, undermining the equity and generalizability of frailty screening efforts.
To address these challenges, future research should prioritize developing and validating instruments using rigorous methodologies with cultural adaptation and multilingual versions. Mixed-methods approaches that integrate qualitative inputs with quantitative testing are essential for enhancing content validity. Moreover, combining confirmatory factor analysis (CFA) with item response theory (IRT) can strengthen structural and construct validity. 52 While prior studies employed telephone-based assessment methods with limited sample sizes, they still provided valuable insights. Future research should integrate assessment methods with digital health platforms to improve usability.
Strengths and Limitations
This study possesses 3 strengths: (1) adherence to COSMIN guidelines with reporting according to the PRISMA-COSMIN checklist; (2) comprehensive search and literature collection across English and Chinese databases; and (3) evaluation of methodological quality and measurement properties of frailty instruments based on COSMIN guidelines.
Limitations include: (1) The scope was intentionally restricted to frailty assessment instruments developed or validated for use among Chinese-speaking older adults. Although this focus aligns with the study objective, it inherently introduces selection bias. Instruments validated exclusively in non-Chinese linguistic or cultural contexts were excluded, even when they may possess strong psychometric properties. This restriction limits the generalizability of the findings and may bias the evidence map toward instruments with China-specific validation studies. Future research should incorporate broader cross-cultural comparisons and evaluate measurement equivalence across linguistic groups to mitigate this limitation; (2) Methodological and measurement-property evaluations were based primarily on recently published studies. Earlier validation research, though potentially more rigorous, may have been omitted, which could influence the overall evidence grading; (3) Some COSMIN criteria involve subjectivity, introducing potential risk of bias in the corresponding evaluations. Future studies should continuously update this review as new validation evidence emerges, incorporate cross-cultural adaptation data where available, and enhance transparency to reduce misclassification and selection bias.
Conclusion
This systematic review, based on the COSMIN guidelines, assessed the methodological quality and psychometric properties of 25 instruments designed to evaluate frailty in older adults in China. Whilst the increasing number of assessment tools reflects heightened attention to frailty among China’s ageing population, findings indicate that most instruments exhibit significant methodological limitations. Only 3 scales (the Frailty Assessment Scale for Hospitalized Elderly Patients, the C-RGA, and the GFI-C) demonstrated sufficient content validity and internal consistency to be provisionally recommended. However, none of the included studies assessed cross-cultural validity/measurement invariance or responsiveness, and measurement error was generally overlooked. Existing evidence indicates that China’s frailty assessment framework remains fragmented, offering limited guidance to clinicians and policymakers. While certain tools demonstrate potential in specific contexts, their broader application requires careful selection and further validation. Moreover, Grade C tools should not be excluded from potential clinical use but should instead prompt more rigorous validation studies. To advance frailty research, future efforts should focus on developing and validating assessment tools through longitudinal studies, cross-cultural adaptations, and modern psychometric methods (e.g. item response theory), prioritizing responsiveness and applicability across diverse regions and populations. Researchers, clinicians, and public health institutions must collaborate to foster standardized frailty assessment tools tailored to China’s rapidly ageing and demographically heterogeneous population.
Supplemental Material
Supplemental material, sj-docx-1-inq-10.1177_00469580251411630 for Measurement Properties of Chinese-Language Frailty Assessment Instruments for Older Adults: A Systematic Review Following COSMIN Guidelines by Hang Gao, Yaorong Liu, Biao Guo, Shuyun Zhao and Wei Du in INQUIRY: The Journal of Health Care Organization, Provision, and Financing
Supplemental material, sj-docx-2-inq-10.1177_00469580251411630 for Measurement Properties of Chinese-Language Frailty Assessment Instruments for Older Adults: A Systematic Review Following COSMIN Guidelines by Hang Gao, Yaorong Liu, Biao Guo, Shuyun Zhao and Wei Du in INQUIRY: The Journal of Health Care Organization, Provision, and Financing
Footnotes
Acknowledgements
We thank all authors for their guidance and support in preparing this article.
Abbreviations
COSMIN: COnsensus-based Standards for the selection of health Measurement INstruments
GRADE: Grading of Recommendations, Assessment, Development, and Evaluation
KCL-SC: Kihon Checklist-Simplified Chinese
TFI: Tilburg Frailty Indicator
EFS: Edmonton Frailty Scale
FRAIL: Fatigue, Resistance, Ambulation, Illnesses, and Loss of Weight
FRAIL-NH: FRAIL Nursing Home
VES-13: Vulnerable Elders Survey
CFAI: Comprehensive Frailty Assessment Instrument
CSHA-CFS TV: Chinese-Canadian Study of Health and Aging Clinical Frailty Scale Telephone Version
CFI-36: Chinese Frailty Indicator
C-RGA: Chinese Version of Rapid Geriatric Assessment
G8: Geriatric 8-Item Questionnaire
aCGA: Abbreviated Comprehensive Geriatric Assessment
CP-FI-CGA: Care Partner Frailty Index Comprehensive Geriatric Assessment
GFI: Groningen Frailty Indicator
TSFI: Trauma-Specific Frailty Index
CFS: Clinical Frailty Scale
C-RNLI: Reintegration to Normal Living Index Chinese version
JFS-C: Japan Frailty Scale Chinese version
APAFOP-C: Assessment of Physical Activity in Frail Older People
PFS: Pittsburgh Fatigability Scale
Ethical Considerations
Ethical approval was not required for this study because it is a systematic review based on previously published data.
Consent to Participate
Informed consent was not applicable as this study did not involve human participants directly.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Ministry of Education’s Fund for Humanities and Social Sciences (24YJAZH032).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Information can be obtained from the supporting information or by contacting the corresponding author.
Supplemental Material
Supplemental material for this article is available online.
