Abstract
Purpose:
Many standardized outcome measures exist to measure recovery after surgical fixation of distal radius fractures; however, choosing the optimal instrument is difficult. We evaluated responsiveness, ceiling/floor effects, and criterion validity over multiple time intervals across a 2-year follow-up period for six commonly used instruments.
Methods:
A total of 259 patients who received open reduction and internal fixation for distal radius fractures between 2012 and 2015 were recruited. Patients were administered the Patient-Rated Wrist Evaluation (PRWE), Shortened Disabilities of the Arm, Shoulder and Hand questionnaire (QuickDASH), Green and O’Brien score (Cooney modification) (CGNO), Gartland and Werley score (Sarmiento modification) (SGNW), flexion-extension arc (FEArc), and grip fraction test (GripFrac) at 1.5, 3, 6, 12, and 24 months postoperatively. Responsiveness was evaluated by calculating standardized response means (SRM) and Cohen’s d effect sizes (ES), and by correlating each instrument’s change scores against those of QuickDASH and PRWE, which were also used as external comparators to assess criterion validity. Ceiling/floor effects were calculated for all measures at each time point.
Results:
SRM (1.5–24 months) were 1.81, 1.77, 1.43, 1.16, 2.23, and 2.45, and ES (1.5–24 months) were 1.81, 1.82, 1.95, 1.31, 1.99, and 2.90 for QuickDASH, PRWE, CGNO, SGNW, FEArc, and GripFrac, respectively. Spearman correlation coefficients against QuickDASH at 24 months were 0.809, 0.248, 0.563, 0.285, and 0.318 for PRWE, CGNO, SGNW, FEArc, and GripFrac, respectively. Significant ceiling effects (at least 15% of patients reaching the maximum score) were observed at or before 6 months for PRWE and SGNW.
Conclusions:
Our evidence supports the use of QuickDASH, PRWE, FEArc and GripFrac up to 6 months postsurgery, and QuickDASH and PRWE after 6 months.
Level of evidence:
Level II.
Keywords
Introduction
Standardized outcome measurement in orthopedics is essential for distinguishing the effects of different treatment methods and aiding research to produce better ones. Distal radius fractures are among the most common fractures and occur in people of all ages through a variety of mechanisms, from simple falls to high-energy sports injuries. 1 A wide range of standardized instruments is available to evaluate patient outcomes for these and other upper limb injuries; however, choosing the optimal tool can be challenging.
Different instruments measure different dimensions of clinically relevant outcome variables. Commonly measured outcomes of distal radius fractures include joint alignment, range of motion, strength, pain, task-specific functioning, perceptions of daily living, and emotional and mental health. These variables comprise five levels of quality of life: (1) biological and physiological status, (2) symptoms, (3) function, (4) general health perceptions, and (5) overall quality of life. 2 Because each patient is unique in his or her values, concerns, and expectations, it is difficult to strictly define a “best measure.” A questionnaire that measures the ability to perform heavy chores may be of little relevance to a low-demand elderly patient, and a measure that reveals a perfect range of motion may overestimate the patient’s actual wellbeing. Due to the subjective nature of quality of life, standardized measurement becomes increasingly complex at each additional level, 2 and yet increasingly relevant to the patient.
Choosing the ideal instrument is further complicated by the fact that the importance of certain outcome variables changes across the rehabilitation period. In the early stages following surgery, pain tends to be the most salient outcome, whereas the ability to perform tasks becomes of greater concern as healing progresses and pain subsides. 3 Return to work could be a priority immediately following treatment, or not at all, depending on the patient. Thus, we hypothesize that certain scoring systems are better suited than others for distinguishing good and bad outcomes across different phases of recovery.
We assessed the performance of six commonly used outcome instruments in terms of their responsiveness, ceiling/floor effects and criterion validity over a 2-year period following surgical treatment of distal radius fractures. Based on their popularity in the literature and relevance to our daily practice, the instruments selected were: (1) the Shortened Disabilities of the Arm, Shoulder and Hand questionnaire (QuickDASH), 4 (2) the Patient-Rated Wrist Evaluation (PRWE), 5 (3) wrist flexion and extension range of motion arc (FEArc), (4) handgrip strength fraction (GripFrac), (5) the Cooney modification of the Green and O’Brien score (CGNO), 6 and (6) the Sarmiento modification of the Gartland and Werley score (SGNW). 7,8 We summarize the evidence for the use of each instrument across the rehabilitation period based on an adapted set of predefined criteria.
Methods
Study protocol
This prospective cohort study was carried out in a publicly funded university healthcare institute in a high-income region (GDP per capita USD$48,915). 9 Patients who received open reduction and internal fixation (ORIF) for a distal radius fracture between July 2012 and June 2015 were screened for enrollment. Inclusion criteria were: (1) AO/OTA classification type 23.A, B, or C 10 distal radius fracture, (2) treatment within 3 weeks of injury, (3) treatment by volar or dorsal plate fixation, (4) willingness to participate in a protocol-driven rehabilitation schedule, and (5) expected capacity to complete multiple consecutive questionnaires for up to 1 year. Exclusion criteria were: (1) treatment delayed beyond 3 weeks of injury, (2) pathological fracture, (3) polytrauma (Injury Severity Score >16), (4) concomitant upper limb trauma, and (5) compromised cognitive state. Ethical approval was waived for non-interventional studies on routinely collected clinical data by our institutional review board at the time of the study. Verbal informed consent was obtained from all patients prior to study inclusion. All patient-rated outcome instruments were administered under supervision by trained research personnel, and all observer-rated items were graded by physicians or occupational therapists.
Patients followed a standardized rehabilitation regime at the same designated outpatient rehabilitation center under the supervision of physiotherapists and occupational therapists. The rehabilitation protocol dictated early gentle active range of motion training before 1.5 months, progressing to passive range of motion and strengthening exercises after 1.5 months if fracture union was evident on AP and lateral radiographs.
All patients were seen in the outpatient follow-up clinic at 1.5, 3, 6, 12 and 24 months post-surgery with all six outcome instruments administered at each visit. The study period of 24 months was chosen based on our hypothesis that the change in outcome measures might not plateau before 2 years.
Outcome measures
The Shortened Disabilities of the Arm, Shoulder and Hand questionnaire (QuickDASH)
QuickDASH is a patient-rated measure of function, symptoms, and quality of life pertaining to the upper limb. 4 It was developed as a more convenient abbreviation of the original Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire, a highly validated measure of upper extremity functional status. 11 QuickDASH consists of 11 self-administered items with a final score ranging from 0 (least disability) to 100 (most severe disability). It has been extensively validated in the literature and has demonstrated good concurrent validity and responsiveness compared to the original DASH in the context of distal radius fractures. 12 A language- and culture-validated version of the QuickDASH that matched the study population 13 –15 was used in our study.
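As an illustration of how an 11-item questionnaire maps onto a 0–100 scale, the sketch below applies the commonly published QuickDASH scoring rule (items rated 1–5, at most one item missing). The rule itself is not restated in the text above, so it should be read as an assumption about the instrument rather than as a description of the study's software; the function name is hypothetical.

```python
def quickdash_score(items):
    """Return a 0-100 disability score (higher = worse), or None if unscorable.

    `items` is a list of 11 responses, each 1-5 or None when unanswered.
    Assumes the commonly published rule: ((sum / n) - 1) * 25 with n >= 10.
    """
    if len(items) != 11:
        raise ValueError("QuickDASH has 11 items")
    answered = [x for x in items if x is not None]
    if any(x not in (1, 2, 3, 4, 5) for x in answered):
        raise ValueError("each answered item must be rated 1-5")
    if len(answered) < 10:
        return None  # more than one missing item: score not computed
    return (sum(answered) / len(answered) - 1) * 25
```

A patient answering 3 on every item would score 50 under this rule.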
Patient-Rated Wrist Evaluation (PRWE)
PRWE is a patient-rated, wrist-specific instrument developed specifically to measure pain and disability in patients after distal radius fracture. 5 It consists of two subscales: Pain, which contains 5 items rated from 0–10, and Function, which consists of 10 items rated from 0–10. The Function score is divided by 2 and added to the Pain score to give a total score out of a maximum of 100 points, with higher scores indicating poorer results. PRWE is commonly used and extensively validated in the literature, 16 although in this study we used a local language- and culture-matched version that has been validated to a lesser extent. 17
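The scoring rule described above can be sketched as follows. This is a minimal illustration that assumes each subscale is a simple sum of its item ratings; the function name and input validation are hypothetical.

```python
def prwe_total(pain_items, function_items):
    """Total PRWE score (0-100, higher = poorer result) from 0-10 item ratings."""
    if len(pain_items) != 5 or len(function_items) != 10:
        raise ValueError("expected 5 pain items and 10 function items")
    all_items = list(pain_items) + list(function_items)
    if any(not 0 <= x <= 10 for x in all_items):
        raise ValueError("each item must be rated from 0 to 10")
    pain = sum(pain_items)               # Pain subscale: 0-50
    function = sum(function_items) / 2   # Function subscale: 0-100 halved to 0-50
    return pain + function

print(prwe_total([3, 4, 2, 5, 1], [2] * 10))  # 15 + 10 = 25.0
```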
Flexion-extension arc range (FEArc) and grip strength fraction (GripFrac)
Range of motion and grip strength are clinician-rated tests that have been shown to be sensitive to change and to significantly predict DASH scores. 18 For the FEArc test, wrist joint range of motion was manually assessed via a goniometer. Grip strength was assessed via hydraulic hand dynamometer (JAMAR, Bolingbrook, IL) with the patient seated, elbow flexed to 90 degrees, and forearm in the neutral position. GripFrac was calculated as the grip strength fraction of the injured side to the contralateral side, expressed as a percentage. Hand dominance was not considered in the GripFrac measurement as we aimed to evaluate the measurement properties of the instrument as a whole, and due to a lack of reliable pre-injury data for grip strength. Both FEArc and GripFrac were measured and documented by an occupational therapist with an average of three trials recorded.
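The GripFrac calculation can be sketched as below, assuming that the three trials on each side are averaged before the ratio is taken. The function name and the sample readings are hypothetical.

```python
from statistics import mean

def grip_frac(injured_trials, contralateral_trials):
    """Grip strength of the injured side as a percentage of the contralateral side."""
    injured = mean(injured_trials)
    contralateral = mean(contralateral_trials)
    if contralateral <= 0:
        raise ValueError("contralateral grip strength must be positive")
    return 100 * injured / contralateral

# Hypothetical dynamometer readings, three trials per side (same unit on both sides).
print(grip_frac([18, 20, 22], [38, 40, 42]))  # 50.0
```

Because the result is a ratio, the measurement unit cancels out, and values above 100 are possible when the injured side is the stronger one.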
Cooney modification of the Green and O’Brien score (CGNO)
CGNO is a clinician-rated, wrist-specific assessment measuring pain (25 points), functional status (25 points), range of motion (25 points), and grip strength (25 points) (Online Appendix A). 6 This measure has not been extensively validated although it is credited for its simplicity and ease of use. 19 The final score is graded as Excellent (90–100 points), Good (80–89 points), Fair (65–79 points) or Poor (<65 points).
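The grading bands above translate directly into a lookup. The sketch below assumes the four domain scores have already been assigned by the clinician; the function name is hypothetical.

```python
def cgno_grade(pain, function, motion, grip):
    """Return (total, grade) for four CGNO domain scores of 0-25 points each."""
    parts = (pain, function, motion, grip)
    if any(not 0 <= p <= 25 for p in parts):
        raise ValueError("each domain is scored from 0 to 25 points")
    total = sum(parts)
    if total >= 90:
        grade = "Excellent"
    elif total >= 80:
        grade = "Good"
    elif total >= 65:
        grade = "Fair"
    else:
        grade = "Poor"
    return total, grade
```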
Sarmiento modification of the Gartland and Werley score (SGNW)
SGNW is a mixed clinician- and patient-rated, wrist-specific assessment system (Online Appendix B). 7 The tool consists of clinician-rated items including residual deformity (3 points), range of motion and grip strength (5 points), nerve compression (3 points), finger stiffness (2 points), and arthritis change (5 points), as well as a patient-rated subjective evaluation (5 points). Several different methods of scoring have been reported in the literature. 20 Our method allowed for a maximum score of 24 points, with 0–2 points being graded as “Excellent,” 3–8 points as “Good,” 9–20 points as “Fair” and 21 or more points as “Poor.” This measure has commonly been reported in the literature despite a lack of validation. 21
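Unlike CGNO, SGNW is a demerit-style score, so lower totals indicate better outcomes. A minimal sketch of the grading bands stated above, taking an already summed total as input (the function name is hypothetical):

```python
def sgnw_grade(total):
    """Grade an SGNW total, where lower totals indicate better results."""
    if total < 0:
        raise ValueError("SGNW total cannot be negative")
    if total <= 2:
        return "Excellent"
    if total <= 8:
        return "Good"
    if total <= 20:
        return "Fair"
    return "Poor"
```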
Statistical analysis
Standardization of scores
To enable direct comparison of mean scores between outcome measures, each score was standardized to a score of 0–100, with 0 representing lowest function and most severe symptoms, and 100 representing best function and least severe symptoms. For the purposes of this study, the maximum value obtained from the sample (140 degrees) was considered the ceiling for the standardized FEArc scale. For GripFrac, values greater than 100 were truncated to a maximum score of 100. The conversions were calculated as follows:
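One set of conversions consistent with the description above is sketched below. It assumes simple linear rescaling: scales on which higher scores are worse (QuickDASH, PRWE, SGNW) are inverted, SGNW is rescaled from its 24-point maximum, FEArc is expressed relative to the 140-degree sample maximum, and GripFrac is capped at 100. These formulas are an assumption made for illustration and may differ in detail from those actually used in the study.

```python
def standardize(instrument, raw):
    """Map a raw score onto 0-100, where 100 is the best function (assumed formulas)."""
    if instrument in ("QuickDASH", "PRWE"):
        return 100 - raw                 # 0-100 scale, higher = worse: invert
    if instrument == "CGNO":
        return raw                       # already 0-100, higher = better
    if instrument == "SGNW":
        return 100 * (24 - raw) / 24     # 0-24 scale, higher = worse: invert and rescale
    if instrument == "FEArc":
        return 100 * raw / 140           # degrees relative to the sample maximum
    if instrument == "GripFrac":
        return min(raw, 100)             # percentages above 100 are truncated
    raise ValueError(f"unknown instrument: {instrument}")
```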
Responsiveness
Responsiveness is the ability of an outcome measure to detect a change in the construct of interest. 22 In addition to validity and reliability, responsiveness is an essential property of repeated outcome measures when a change in the construct of interest is expected to have occurred. 23 Responsiveness was evaluated using a mixed distribution- and criterion-based approach. In a distribution-based approach, the distribution and change scores of the sample are analyzed. 24 In a criterion-based approach, as defined by the COSMIN panel, change scores are correlated against those of a comparator measure that is presumed to be a “gold standard.” 25
For the distribution-based approach, Cohen’s d effect size (ES) and standardized response mean (SRM) were calculated for the six outcome measures for the 1.5- to 3-month, 3- to 6-month, 6- to 12-month, 12- to 24-month and 1.5- to 24-month time intervals. Both indices were calculated since there is no consensus on which index is superior. 26 ES was calculated as the mean change in score over an interval divided by a standard deviation of the scores themselves, and SRM as the mean change in score divided by the standard deviation of the change scores.
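The two indices can be illustrated with a short sketch on paired scores. SRM divides the mean change by the standard deviation of the change scores. For Cohen's d, the denominator used here is the pooled standard deviation of the two time points, which is an assumption: some authors use the baseline standard deviation instead. Function names and sample values are hypothetical.

```python
from statistics import mean, stdev

def srm(baseline, follow_up):
    """Standardized response mean: mean change / SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

def cohens_d(baseline, follow_up):
    """Cohen's d with a pooled SD in the denominator (assumed variant)."""
    pooled_sd = ((stdev(baseline) ** 2 + stdev(follow_up) ** 2) / 2) ** 0.5
    return (mean(follow_up) - mean(baseline)) / pooled_sd

# Hypothetical standardized scores (0-100, higher = better) for five patients.
early = [40, 50, 60, 55, 45]
late = [60, 65, 85, 70, 70]
print(srm(early, late), cohens_d(early, late))
```

In this toy sample the mean change is the same in both numerators, so the two indices differ only through their denominators.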
For the criterion-based approach, change scores for FEArc, GripFrac, CGNO, and SGNW were calculated for the 1.5- to the 24-month interval, and correlated against corresponding change scores for QuickDASH and PRWE using Spearman’s rho correlation coefficient. Correlation coefficient values were taken to represent high (>0.7), moderate (0.5–0.7), low (0.3–0.5) or negligible (<0.3) correlations. 28 QuickDASH and PRWE were considered valid comparators based on their extensive validation and demonstrated responsiveness in the literature. 11,12 Higher correlation of change scores with QuickDASH and PRWE was considered better evidence for responsiveness to changes in pain and function. Furthermore, all change scores and effect sizes were expected to occur in the direction indicating improved function and/or symptoms, since the true change was assumed to occur in this direction for all outcome measures at all time intervals. Values in the opposite direction would be taken as evidence for poor responsiveness for a given outcome measure.
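A minimal sketch of this step, using only the standard library: Spearman's rho is computed as the Pearson correlation of the ranks (with tied values sharing their average rank), and the coefficient is then labeled using the thresholds stated above. Function names are hypothetical, and statistical packages provide equivalent routines.

```python
from statistics import mean

def ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def strength(rho):
    """Label a coefficient: high (>0.7), moderate (0.5-0.7), low (0.3-0.5), negligible (<0.3)."""
    r = abs(rho)
    if r > 0.7:
        return "high"
    if r >= 0.5:
        return "moderate"
    if r >= 0.3:
        return "low"
    return "negligible"
```

Here `x` and `y` would be the 1.5- to 24-month change scores of an instrument and of a comparator for the same patients.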
Ceiling and floor effects
Ceiling and floor effects occur when a substantial proportion of patients obtain the maximum or minimum score for a given scoring system. Instrument scales should be “in-range” in order to discriminate outcomes at patients’ best or worst statuses. The proportions of patients reaching the maximum and minimum scores for each scoring system were calculated for each follow-up interval. Ceiling or floor effects were considered significant if 15% of patients or more reached the upper or lower bounds of the scale, as per the definition by McHorney. 29 Ceiling/floor effects of 15% or greater at or prior to 6 months follow-up were considered unsatisfactory as full recovery was not expected to occur by this time in a majority of cases. It should be noted that while ceiling effects are expressed for GripFrac for the purposes of this study, in practice there is no reliable maximum score since the relative strength of the contralateral wrist varies by patient.
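The 15% rule can be sketched as follows for scores on the standardized 0–100 scale. The function name, the returned dictionary keys, and the default bounds are hypothetical.

```python
def ceiling_floor(scores, minimum=0, maximum=100, threshold=0.15):
    """Proportions of patients at the scale bounds, flagged against the 15% rule."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    ceiling = sum(s >= maximum for s in scores) / n
    floor = sum(s <= minimum for s in scores) / n
    return {
        "ceiling": ceiling,
        "floor": floor,
        "ceiling_significant": ceiling >= threshold,
        "floor_significant": floor >= threshold,
    }
```

For example, 4 of 20 patients at the maximum score gives a ceiling proportion of 0.2, which would be flagged as significant.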
Criterion validity
In order to “double-check” for any unexpected non-agreements, all outcome instruments were correlated against PRWE and QuickDASH scores for each follow-up point using Spearman’s rho correlation coefficient, with coefficient values taken as high (>0.7), moderate (0.5–0.7), low (0.3–0.5) or negligible (<0.3) correlations. 28 PRWE and QuickDASH were assumed to be valid comparators due to their extensive validation in the literature. 11,12
Handling of missing data
As there currently exists no consensus on the best method for handling missing data when assessing PROM measurement properties, 30 missing instrument responses were excluded pairwise to minimize data exclusion.
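Pairwise exclusion means that a patient is dropped only from analyses involving a value that he or she is missing, rather than from the whole dataset. A minimal sketch, assuming missing responses are stored as `None` (function name and data are hypothetical):

```python
def pairwise_complete(x, y):
    """Keep only the positions at which both measures have a value."""
    kept = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in kept]
    ys = [b for _, b in kept]
    return xs, ys

# Hypothetical scores for five patients; None marks a missing response.
quickdash = [30, None, 12, 8, 20]
prwe = [28, 35, None, 10, 18]
fearc = [90, 100, 120, None, 110]

print(pairwise_complete(quickdash, prwe))   # patients 0, 3 and 4 are retained
print(pairwise_complete(quickdash, fearc))  # patients 0, 2 and 4 are retained
```

Each pair of measures therefore retains a different subset of patients, which preserves more data than listwise deletion at the cost of varying sample sizes across analyses.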
Results
A total of 259 patients (147 female, 112 male) were recruited. Mean age was 55.6 years (range 16–86 years). The number of patients with complete data at 12 and 24 months was 209 and 111, respectively. The number of patients who completed a minimum of two, three, four or all five follow-ups was 251, 186, 130, and 52, respectively. Fracture characteristics are presented in Table 1.
Table 1. Fracture characteristics of 259 patients.
Patient outcomes
All six outcome measures demonstrated improvements across all time intervals. Table 2 shows the non-standardized mean and SD of the six measures at each follow-up point. Figure 1 shows the mean trends of the six outcome measures on the standardized 0–100 scale with 95% confidence intervals.
Table 2. Mean, SD (raw scores) and number of patients with complete data at each follow-up point for the six outcome measures.

Figure 1. Converted (0–100) mean trends across follow-up with error bars representing 95% confidence intervals.
Responsiveness by distribution-based approach
All outcome measures demonstrated mean differences in the direction indicating improved function and/or symptoms, as hypothesized. All outcome measures had moderate to high (0.5 or higher) ES and SRM between 1.5 and 3 months. Between 3 and 6 months, FEArc, GripFrac and CGNO had moderate to high ES and SRM, while QuickDASH, PRWE and SGNW had one or both effect indices fall into the low range. By 6 months, all measures had low ES and SRM except GripFrac, which had an SRM of 0.59. SGNW had the lowest effect sizes overall (1.5–24 months). These results alone suggest that SGNW is not responsive to change in the context of distal radius fractures. ES and SRM values for each follow-up interval are presented in Table 3. Trends in ES and SRM across time intervals are displayed in Figure 2.
Table 3. Cohen’s d (d) and standardized response mean (SRM) of outcome measures for each follow-up interval.a
a Directionality is not displayed as all effect sizes were in the expected direction indicating better functionality/decreased pain.

Figure 2. Cohen’s d and SRM across follow-up intervals for the six outcome measures.
Responsiveness by criterion-based approach
PRWE and QuickDASH change scores correlated highly with each other overall (rs = 0.754) (Table 4). FEArc and GripFrac change scores showed moderate correlations with QuickDASH and low correlations with PRWE. CGNO and SGNW change scores showed low to negligible correlations with both QuickDASH and PRWE, suggesting a lack of responsiveness in these measures.
Table 4. Spearman correlation coefficients of 1.5- to 24-month change scores with QuickDASH and PRWE change scores.a
a All correlations were statistically significant.
Ceiling and floor effects
Ceiling effects generally increased at each subsequent time interval. Some patients who reached the ceiling for a given score were subsequently lost to follow-up, particularly after 12 months. SGNW had the greatest ceiling effects at all time intervals. PRWE and SGNW demonstrated significant ceiling effects at or prior to 6 months follow-up. No significant floor effects were observed for any outcome measure. Proportions of ceiling and floor effects are presented in Figure 3.

Figure 3. Ceiling and floor effects of the six outcome measures.
Criterion validity
All outcome measures displayed highly statistically significant correlations (p < 0.01) with QuickDASH and PRWE at all follow-up points, except FEArc with QuickDASH at 24 months (p < 0.05). Correlations with QuickDASH and PRWE generally decreased with time. PRWE correlated highly with QuickDASH at all follow-up points. FEArc showed low correlations with QuickDASH and PRWE at all time points. GripFrac, CGNO, and SGNW showed low to moderate correlations with QuickDASH and PRWE. Values for Spearman correlations with QuickDASH and PRWE are presented in Table 5. Scatter plots of relationships between criterion comparators and other outcome measures are presented in Figure 4. A summary of evidence for use of each outcome measure is provided in Table 6.
Table 5. Spearman correlation coefficients of raw instrument scores with QuickDASH and PRWE at each follow-up point.a
a Directionality is not displayed as all correlations were in the expected direction indicating better functionality/decreased pain. All correlations were significant at the 0.01 level except where indicated.
b Correlation is significant at the 0.05 level (two-tailed).

Figure 4. Scatter plots of correlations between criterion comparators (QuickDASH and PRWE) and other outcome measures. Standardized (0–100) scores for QuickDASH, PRWE, CGNO and SGNW were used, whereas raw scores were used for FEArc and GripFrac.
Table 6. Summary of evidence for use.
+: evidence for use; −: evidence against use; ∅: neutral evidence.
Discussion
Based on our evaluation of responsiveness, ceiling/floor effects, and criterion validity, there is good evidence to support the use of QuickDASH, PRWE, FEArc and GripFrac up to 6 months postsurgery to evaluate recovery following distal radius fractures. After 6 months, our data support the use of QuickDASH and PRWE only, as there is negative evidence for responsiveness or criterion validity for the other measures. Our study provides evidence for just a few of the measurement properties that are of interest when selecting an outcome measure for clinical or research use. Recently, efforts have been made to reach consensus on which measurement properties of outcome measures are most important and how these properties should be measured. 31,32 Optimal selection of outcome measures ultimately relies on the accumulation of high-quality evidence and critical consideration of available information. We demonstrate that responsiveness and ceiling/floor effects vary across different phases of the rehabilitation period. While many previous studies have followed up to 6 or 12 months, this period may not be of sufficient length to evaluate final patient outcomes after treatment of distal radius fractures as it is evident from our data that patients continue to improve beyond 12 months.
Responsiveness, also known as longitudinal validity, is the ability of an outcome measure to detect a meaningful change in the construct of interest. 25 Traditional distribution-based measures of responsiveness include effect sizes, such as Cohen’s d (ES) and the standardized response mean (SRM). 33 These indices quantify the signal-to-noise ratio for the observed change in an outcome measure. More recently, it has been argued that effect sizes alone are inadequate for determining an instrument’s responsiveness since they contain no information about whether the observed change in an instrument is due to a corresponding change in the construct of interest. 34 The COSMIN panel, therefore, proposes two valid methods for evaluating responsiveness: first, by comparing change scores of an instrument to those of a “gold standard” or external criterion (criterion approach), and second, by testing predefined hypotheses about the magnitude and direction of correlations of an instrument’s change scores against those of other measures which have been shown to be adequately responsive (hypothesis-based approach). 22 Neither approach provides a quantitative measure of an instrument’s responsiveness; as the panel notes: “There is no criterion to decide whether an instrument is valid or responsive. Assessing validity or responsiveness is a continuous process of accumulating evidence.” 34
We employed a mixed distribution- and criterion-based approach to provide evidence for or against the responsiveness of the outcome measures. In our view, a responsive outcome measure should (1) have an adequately high signal-to-noise ratio in order to detect a change, and (2) change correspondingly to the construct of interest. Both conditions are necessary, as a large effect size may not necessarily correspond to a large change in the construct of interest, and a measure that detects the change in the desired construct may be subject to a high degree of variance that can lead to uncertainty in measurements. We therefore calculated both ES and SRM, two of the most commonly used effect sizes, and also correlated the change scores of the instruments against those of the QuickDASH and PRWE. ES and SRM provide slightly different values due to differences in the calculation of the denominator for each ratio; however, general trends over time were the same between the indices.
QuickDASH and PRWE are two of the most commonly used PROMs following distal radius fractures. Both measure what are generally considered to be important aspects of wrist health, and have been shown to have good validity, reliability and responsiveness in the context of distal radius fractures. 11,12 Correlation of the change scores of these two PROMs against each other confirmed a high level of agreement (rs = 0.754) and further justifies the use of either as a valid comparator for the other four measures.
Both FEArc and GripFrac are indicated to be responsive by both the distribution- and criterion-based approaches, particularly in the early recovery period, where they demonstrated large effect sizes in addition to moderate correlations with QuickDASH and PRWE. CGNO is also indicated to be responsive by both approaches, although with lower effect sizes in the 1.5- to 3-month interval and lower correlations with QuickDASH and PRWE than FEArc and GripFrac. SGNW is not indicated to be responsive by either approach, particularly after 6 months, where it demonstrated the lowest effect sizes and correlations with QuickDASH and PRWE as compared to the other instruments.
It should be noted that multiple modifications and conflicting scoring methods for the Gartland and Werley system have been reported in the literature, which has led to confusion among authors. 20 Our scoring method allows for a maximum of 24 points, whereas other methods might have led to different results. However, based on our findings and those of previous studies, there appears to be overall negative evidence for the responsiveness of the Gartland and Werley score. 11 In the presence of other instruments that have demonstrated good validity, reliability, and responsiveness, there appears to be little justification for its use when measuring recovery following distal radius fractures.
Ceiling and floor effects further complicate outcome assessment as they hinder the ability of outcome measures to detect improvement or deterioration once the maximum or minimum score has been reached. 29 Floor effects were not significant in any of the evaluated instruments. Ceiling effects were observed, particularly at later time intervals; however, it can generally be expected that a larger proportion of patients will reach the upper bounds of an outcome measure at later stages of the recovery period.
Our study has several limitations. First, the results apply to the evaluated instruments as estimators only of the constructs measured by the comparators (i.e. function and symptoms as measured by QuickDASH and PRWE). Responsiveness and criterion validity depend on the construct of interest, and evaluation of these properties yields different results depending on the chosen comparator measures. 25 Second, the generalizability of our study is limited to patients who have suffered distal radius fractures and been treated via ORIF. The applicability of these instruments to patients with different injuries or interventions requires evaluation through separate studies. Finally, our study is limited by the lack of an anchor measure, which typically involves a patient-rated measure of subjective change (e.g. “better,” “worse,” or “unchanged”) between time points. Such a measure provides an external criterion that more closely relates measured change to patient experience and can be used to estimate the minimal clinically important difference (MCID) of an instrument. 33
In conclusion, our evidence supports the use of QuickDASH, PRWE, FEArc and GripFrac up to 6 months postsurgery, and QuickDASH and PRWE after 6 months. Other measurement properties of outcome measures that lie outside the scope of this study remain relevant, and additional high-quality evidence should be considered to fully inform the clinician’s choice of instrument.
Supplemental material
Supplemental Material, sj-docx-1-osj-10.1177_2309499020971866 for A comparison of six outcome measures across the recovery period after distal radius fixation—Which to use and when? by Christian Fang, Evan Fang, Dennis KH Yee, Kenny Kwan, Gladys Leung and Frankie Leung in Journal of Orthopaedic Surgery
Supplemental material
Supplemental Material, sj-docx-2-osj-10.1177_2309499020971866 for A comparison of six outcome measures across the recovery period after distal radius fixation—Which to use and when? by Christian Fang, Evan Fang, Dennis KH Yee, Kenny Kwan, Gladys Leung and Frankie Leung in Journal of Orthopaedic Surgery
Footnotes
Acknowledgments
The authors wish to thank the AO Trauma Asia-Pacific Research Grant for funding this study. They also wish to thank Margaret Ho, Elaine Tian, Lorraine Cheung, Grace Ho and Kathine Ching for data collection and keeping.
Author contributions
CF: guarantor, principal investigator, manuscript preparation, data analysis, performing surgeries and follow-ups; EF: medical writer, manuscript preparation, data analysis; DY: monitoring visits, performing surgery and follow-ups, data analysis, manuscript preparation; KK: research grant application, manuscript review; GL: performing therapy and ensuring quality of scores, manuscript review; FL: study and grant application, manuscript preparation, study supervision.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: CF and FL are speakers for DePuy Synthes.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the AO Trauma Asia Pacific Grant [AOTAP 12-09].
Supplemental material
Supplemental material for this article is available online.
References
