Abstract
Background:
Few studies have investigated the relationship between the Patient-Reported Outcomes Measurement Information System (PROMIS) and legacy patient-reported outcome (PRO) measurements.
Purpose:
To compare patient-reported outcomes from the PROMIS physical function (PF) and upper extremity (UE) platforms against one another and against legacy PRO measurements to assess the potential strengths and weaknesses of the National Institutes of Health PROMIS initiative and expand on the use of PRO measurements in clinical orthopaedic practice.
Study Design:
Systematic review; Level of evidence, 4.
Methods:
A systematic search of the PubMed, Embase, and Cochrane Library databases was conducted following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. All English-language studies published between 2017 and 2019 using PROMIS to evaluate patients for shoulder surgery were analyzed. PROs were compared based on the survey administered and the shoulder condition being investigated. Study quality was evaluated using the Modified Coleman Methodology Score and the Methodological Index for Non-Randomized Studies score.
Results:
We included 9 studies (5 studies were level 2; 3 studies were level 3; 1 study was level 4) encompassing a total of 1130 patients (60.2% male; mean age, 52.6 ± 16.5 years; mean BMI, 29.8 ± 2.8 kg/m2). Of these, 6 studies administered the PROMIS PF, and 6 studies administered the PROMIS UE. The strongest correlation was between PROMIS PF computer adaptive test and the 36-Item Short Form Health Survey Global Health (SF-36 GH) (r = 0.75). The highest overall correlation with the PROMIS UE was found with the American Shoulder and Elbow Surgeons (ASES) Shoulder Score (r = 0.70). The lowest correlations were found between PROMIS PF and the Marx Shoulder Activity Scale (r = 0.08) and the PROMIS UE and the Marx Shoulder Activity Scale (r = 0.18).
Conclusion:
From available data, the PROMIS PF and PROMIS UE were most closely correlated with outcomes measured by the SF-36 GH. The PROMIS UE alone was most correlated with ASES Shoulder Score. Thus, either PROMIS PF or UE may provide a possible alternative to legacy PRO measurements but with a lower overall number of questions and higher generalizability. Future research should compare the time and question burden of the various PROMIS platforms with a more consistent evaluation of standard PRO measurements.
Patient-reported outcome (PRO) measurements play an important role in evaluating a patient’s perspective on clinical care, clinical research, and health care policy. However, as new PRO instruments are developed, patients may experience “survey fatigue” from question burden, and providers face the challenge of deciding which PRO instrument to administer and to whom, as well as potential ceiling effects, especially as patients age. 4,5,13 To mitigate some of the limitations of earlier-generation PRO tools, the National Institutes of Health (NIH) developed a PRO measurement platform that is applicable to the general population and can be administered and scored in a standardized fashion, allowing comparison across a wider range of clinical scenarios. 13,18 This initiative, known as the Patient-Reported Outcomes Measurement Information System (PROMIS), attempts to address the criticism that many common forms were created and administered without proper statistical validation. 23 PROMIS can be administered in 2 formats: the Short Form (SF), which contains 2 to 8 questions, and the computer adaptive test (CAT), an algorithm-based questionnaire that selects each question based on the patient’s previous responses, so the number of questions administered varies from patient to patient. 13
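The CAT behavior described above (each question chosen from earlier answers, so test length varies) can be illustrated with a minimal Python sketch. It assumes a simple dichotomous two-parameter logistic (2PL) item response model, a hypothetical 7-item bank, maximum-information item selection, and a stopping rule on the posterior standard error. Actual PROMIS item banks use polytomous items with their own calibrations and stopping rules, none of which are reproduced here.

```python
import math

# Hypothetical item bank for illustration only: (discrimination a, difficulty b)
# under a dichotomous 2PL model. Real PROMIS banks use polytomous items.
ITEM_BANK = {
    "dress_self": (1.2, -2.0),
    "push_heavy_door": (1.5, -1.0),
    "reach_shelf": (1.9, -0.5),
    "carry_groceries": (2.2, 0.0),
    "lift_overhead": (1.7, 0.8),
    "throw_ball": (1.4, 1.6),
    "overhead_sport": (1.1, 2.3),
}

# Grid of latent-trait (theta) values from -4 to 4 used for posterior estimation.
GRID = [x / 10 for x in range(-40, 41)]


def p_endorse(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses):
    """Posterior mean and SD of theta on GRID with a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for item, endorsed in responses.items():
            p = p_endorse(t, *ITEM_BANK[item])
            w *= p if endorsed else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, max_items=8, se_stop=0.3):
    """Give the most informative unused item until the SE or length limit is hit."""
    responses = {}
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    while len(responses) < max_items and se > se_stop:
        remaining = [i for i in ITEM_BANK if i not in responses]
        if not remaining:
            break
        best = max(remaining, key=lambda i: item_information(theta, *ITEM_BANK[i]))
        responses[best] = answer(best)
        theta, se = eap_estimate(responses)
    return theta, se, list(responses)


# Simulated respondent (hypothetical) who endorses every item easier than theta = 1.0.
theta, se, items = run_cat(lambda item: ITEM_BANK[item][1] < 1.0)
print(f"theta={theta:.2f}, SE={se:.2f}, items asked={len(items)}")
```

Because the first item is chosen at the prior mean, a respondent's path through the bank diverges as soon as answers differ, which is why CAT length is patient specific.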
PROMIS measures can be divided into the PROMIS physical function (PF) and the PROMIS global health (GH). The PROMIS PF considers social function, pain, fatigue, and emotional distress. The PROMIS GH considers overall physical function, pain, fatigue, emotional distress, and social health.
Shoulder injuries and diseases are common conditions seen by orthopaedists. 22 Often, surgical intervention is required to help a patient regain function and relieve pain in conditions such as rotator cuff disease, osteoarthritis, and shoulder instability. Because surgical repair may improve overall quality of life but can temporarily reduce it in the immediate postoperative period, PRO instruments can help assess a patient’s perception of the clinical intervention both pre- and postoperatively. Additionally, PRO measurements provide meaningful data on the success of a procedure based on scores such as pain intensity, pain interference (PI), fatigue, and sleep disturbance, all of which are domains within the NIH PROMIS. 3,4,12
Traditionally, shoulder-related PROs have been assessed via questionnaires, such as the Simple Shoulder Test (SST) and the American Shoulder and Elbow Surgeons (ASES) Shoulder Score, often termed legacy PROs. 4,14 However, such PRO instruments may lack the generalizability of PROMIS and can be time-consuming, display ceiling effects, and pose an increased question burden. Considering that the goal of the PROMIS survey is to broaden outcomes reporting among various diagnoses and assess big-picture outcomes in a timely, consistent fashion, its use in patients with orthopaedic shoulder conditions holds significant potential for quantifying patient perspective as well as clinician performance. The purpose of this review is to assess the performance of different PROMIS platforms against one another and compare them to legacy PRO measurements in patients undergoing surgical intervention for shoulder conditions.
Methods
This systematic review was conducted by 2 independent reviewers (I.S. and J.-R.H.S.) according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, using the corresponding PRISMA checklist and template. Each reviewer searched and documented results from 3 databases (PubMed, Embase, and the Cochrane Library) using the following search terms with no additional exclusionary criteria: ((“Orthopedic” OR “Orthopaedic” OR “Orthopedics” OR “Orthopaedics”)) AND (“PROMIS” OR “Patient Reported Outcomes Measurement Information System”) AND (“shoulder” OR “shoulders”). All English-language studies published between 2017 and 2019 using PROMIS to evaluate patients for shoulder surgery were analyzed. A total of 123 articles were identified through the database search; after removal of duplicates and abstract screening, 30 studies were determined to be eligible. Eligibility criteria included studies using any PROMIS questionnaire to report outcome measures in patients undergoing surgical intervention for shoulder disease or injury; thus, studies that met the search criteria but did not use PROMIS were eliminated. Exclusion criteria were studies not using the PROMIS score, studies not examining shoulder outcomes, and studies not evaluating patients pre- or postoperatively. After a final screening that eliminated studies not reporting PROMIS outcomes, studies not reporting patient-reported outcomes, and studies with incomplete data, 9 studies were included in the review (Figure 1). Data extraction was performed by one reviewer (I.S.) and verified by the other (J.-R.H.S.). Funding and third-party involvement were not required to obtain any of the collected data.
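The boolean search string and the screening counts reported above can be reproduced with a short Python sketch. The term groups and counts come from this section; the helper names are hypothetical, and each database's actual query syntax (field tags, quoting rules) may differ from this generic form.

```python
# Search-term groups as listed in the Methods (straight quotes used here).
TERM_GROUPS = [
    ["Orthopedic", "Orthopaedic", "Orthopedics", "Orthopaedics"],
    ["PROMIS", "Patient Reported Outcomes Measurement Information System"],
    ["shoulder", "shoulders"],
]

# Screening counts reported in the Methods, in PRISMA order.
FUNNEL = [
    ("identified through database search", 123),
    ("eligible after duplicate removal and abstract screening", 30),
    ("included after final screening", 9),
]


def build_query(groups):
    """OR the synonyms within each group, then AND the groups together."""
    clauses = ["(" + " OR ".join(f'"{term}"' for term in group) + ")" for group in groups]
    return " AND ".join(clauses)


def exclusions(funnel):
    """Number of records removed between consecutive screening stages."""
    return [(later[0], earlier[1] - later[1]) for earlier, later in zip(funnel, funnel[1:])]


print(build_query(TERM_GROUPS))
for stage, removed in exclusions(FUNNEL):
    print(f"{removed} records removed before: {stage}")
```

The funnel arithmetic shows that 93 records were removed at duplicate removal and abstract screening and 21 more at the final screening.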

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram for patients undergoing surgical intervention for shoulder conditions in studies using the Patient-Reported Outcomes Measurement Information System (PROMIS) to report outcome measures.
Reporting Outcomes
The outcomes extracted and pooled from the studies included demographic data (age, percentage male, and body mass index [BMI]), shoulder condition or main concern, and timing of PROMIS survey administration (pre- or postoperatively). Additional data included scores on all reported PROMIS domains and formats (PF, CAT, and upper extremity [UE]), the ASES Shoulder Score, Marx Shoulder Activity Scale, 36-Item Short Form Health Survey (SF-36) GH or PF, EuroQol 5 Dimensions (EQ-5D), Western Ontario Rotator Cuff (WORC) index, Western Ontario Shoulder Instability (WOSI) index, Western Ontario Osteoarthritis Shoulder (WOOS) index, Single Assessment Numeric Evaluation (SANE), postoperative visual analog scale (VAS) for pain, and SST.
Study Method Assessment
Study quality was assessed via the Methodological Index for Non-Randomized Studies (MINORS) score. MINORS is a bias assessment tool based on 8 criteria for noncomparative studies and 12 criteria for comparative studies. Scores range from 0 to 16 for noncomparative studies and from 0 to 24 for comparative studies. 21 Each criterion is scored independently: 0 (not reported), 1 (reported but inadequately), or 2 (reported adequately).
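A minimal Python sketch of MINORS totaling as described above: it checks the item count (8 noncomparative, 12 comparative) and that each item is scored 0, 1, or 2, then returns the total and the maximum possible score. The function name and the item scores in the example are hypothetical.

```python
# MINORS: 8 items for noncomparative studies, 12 for comparative studies,
# each scored 0 (not reported), 1 (reported but inadequately), or 2 (adequately).
ITEM_COUNT = {False: 8, True: 12}
ALLOWED = {0, 1, 2}


def minors_score(item_scores, comparative):
    """Return (total, maximum possible) for one study's MINORS item scores."""
    expected = ITEM_COUNT[comparative]
    if len(item_scores) != expected:
        raise ValueError(f"expected {expected} item scores, got {len(item_scores)}")
    if any(s not in ALLOWED for s in item_scores):
        raise ValueError("each MINORS item must be scored 0, 1, or 2")
    return sum(item_scores), 2 * expected


# Hypothetical item scores for one noncomparative study.
total, maximum = minors_score([2, 2, 2, 1, 2, 1, 1, 2], comparative=False)
print(f"MINORS {total}/{maximum}")
```

Since the maximum is twice the item count, the 0-16 and 0-24 ranges follow directly from the 8 and 12 criteria.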
Methodological quality was also evaluated using the Modified Coleman Methodology Score (MCMS), which ranges from 0 to 100. Scores of 85-100 are excellent; 70-84, good; 55-69, fair; and < 55, poor. 7 For correlation coefficients, r ≥ 0.7 was considered a strong correlation.
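The MCMS quality bands and the correlation cutoff used in this review can be expressed as 2 small functions. This is an illustrative Python sketch; it treats each band edge as an inclusive lower bound (an assumption for noninteger scores, such as averages) and applies the r ≥ 0.7 cutoff to averages reported later in the Results.

```python
def mcms_category(score):
    """Map a Modified Coleman Methodology Score (0-100) to its quality band."""
    if not 0 <= score <= 100:
        raise ValueError("MCMS must be between 0 and 100")
    if score >= 85:
        return "excellent"
    if score >= 70:
        return "good"
    if score >= 55:
        return "fair"
    return "poor"


def is_strong_correlation(r, threshold=0.7):
    """Apply the review's cutoff: r >= 0.7 counts as a strong correlation."""
    return r >= threshold


print(mcms_category(81.8))          # average MCMS reported in the Results
print(is_strong_correlation(0.74))  # PROMIS PF CAT vs SF-36 GH average
print(is_strong_correlation(0.68))  # PROMIS UE vs ASES average
```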
Statistical Analysis
Given the limited specifications of the reported outcome measures and the heterogeneity of the included studies, no formal pooled analyses or meta-analyses were performed. Instead, this review groups the descriptive statistics, demographics, categorical variables, and outcome measures already presented in each respective study.
Results
Patient Demographics
We included 9 studies (5 studies were level 2, 1,2,6,11,17 3 studies were level 3, 9,19,20 and 1 study was level 4 15 ) with a total of 1130 patients (60.2% male; mean age, 52.6 ± 16.5 years; mean BMI, 29.8 ± 2.8 kg/m2). 1,2,9,11,19 Of these, 6 studies 1,2,6,9,11,17 administered the PROMIS PF, and 6 studies 1,2,9,11,15,19 administered the PROMIS UE.
A total of 569 patients had a primary concern and/or diagnosis of rotator cuff disease (3 studies) 1,17,19 ; 274 patients had concerns or a diagnosis of shoulder arthritis or glenohumeral osteoarthritis (3 studies) 6,9,20 ; 142 patients had concerns of recurrent shoulder instability (2 studies) 2,11 ; and 145 patients had a subscapularis (SSc) tear (1 study) 15 (Table 1). The mean age at the time of treatment was 56.6 years for rotator cuff disease, 62.7 years for arthritis, 24.6 years for recurrent shoulder instability, 67.6 years for unspecified conditions requiring total shoulder arthroplasty, and 59.3 years for SSc repairs (Table 1).
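As an arithmetic check, the 4 diagnostic subgroup counts listed above sum to the 1130 patients in the pooled cohort. The short Python sketch below verifies this and computes each group's share of the cohort, using only the counts given in this paragraph.

```python
# Patients per primary diagnosis, as reported in the Results.
GROUPS = {
    "rotator cuff disease": 569,
    "shoulder arthritis / glenohumeral OA": 274,
    "recurrent shoulder instability": 142,
    "subscapularis tear": 145,
}
REPORTED_TOTAL = 1130

total = sum(GROUPS.values())
assert total == REPORTED_TOTAL, "subgroup counts do not sum to the reported total"

# Share of the pooled cohort in each diagnostic group, in percent.
shares = {name: round(100 * n / total, 1) for name, n in GROUPS.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Roughly half of the pooled cohort had rotator cuff disease, which is worth keeping in mind when interpreting pooled averages.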
Characteristics of Included Study Participants a
a All values (and SDs when available) are reported based on what was provided in each study. BMI, body mass index; GH OA, glenohumeral osteoarthritis; NR, not reported; RC, rotator cuff; SI, shoulder instability; SSc, subscapularis.
Study Method Assessment
Table 2 presents the MINORS and MCMS scores of the included studies. The average MINORS score for noncomparative studies was 12.4 (of a possible 16), indicating overall adequate reporting on the specified criteria. The average MCMS score was 81.8, indicating overall good methodological quality of the included studies.
Country of Origin
All of the included studies 1,2,6,9,11,15,17,19,20 took place in the United States, and their authors specified where they received approval to conduct their study (Table 2).
Included Study Characteristics a
a Study design was obtained from the respective publications except for the 2 cross-sectional studies, Patterson et al and Saad et al, whose levels of evidence were determined by the reviewers (I.S., J.-R.H.S.) for this review. AJSM, American Journal of Sports Medicine; JSES, Journal of Shoulder and Elbow Surgery; MCMS, Modified Coleman Methodology Score; MINORS, Methodological Index for Non-Randomized Studies; NR, not reported; OJSM, Orthopaedic Journal of Sports Medicine; PDR, procedure date range (MM/YYYY).
Conflict of Interest
All studies 1,2,6,9,11,15,17,19,20 included disclosures or disclaimers of potential conflicts of interest.
Surgical Technique
Two studies 9,15 provided details on the operative procedure undertaken. Dowdle et al 9 analyzed only patients undergoing primary total shoulder arthroplasty, so that study had no comparison groups. Monroe et al 15 compared PROs among patients undergoing isolated and combined arthroscopic SSc tendon repairs and found that outcomes were similar irrespective of the size of the SSc tear and regardless of whether there were concurrent tears of the supraspinatus or infraspinatus; those investigators also reported that biceps abnormality was common in patients with rotator cuff tears. Each study described the intervention undertaken (Table 3).
Study Populations and Outcomes a
a See Appendix Table A1 for a summary of the included studies. ASES, American Shoulder and Elbow Surgeons; CAT, computer adaptive test; EQ-5D, EuroQol 5 Dimensions; PF, physical function; PI, pain interference; PRO, patient-reported outcome; PROMIS, Patient-Reported Outcomes Measurement Information System; RC, rotator cuff; RCR, rotator cuff repair; SF-36, 36-Item Short Form Health Survey; SSc, subscapularis; TSA, total shoulder arthroplasty; UE, upper extremity.
Clinical Outcomes
PROMIS
The PROMIS UE and PROMIS PF CAT correlated well with each other, with an average r = 0.68 across the 4 studies 1,2,9,11 that used both. In addition, the PROMIS UE correlated moderately (r < 0.7) with the ASES Shoulder Score, with an average r = 0.68 from the 5 studies 1,2,9,11,19 that used both, followed by the SF-36 GH, with an average r = 0.67 from the 4 studies 1,2,9,11 that compared them (Tables 4 and 5).
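The text does not specify how the per-study correlation coefficients were averaged. The Python sketch below contrasts an unweighted arithmetic mean with one common alternative, averaging on Fisher's z scale (optionally weighted by n - 3). The coefficients and sample sizes are hypothetical, not values from the included studies.

```python
import math
from statistics import fmean


def arithmetic_mean_r(rs):
    """Unweighted average of per-study correlation coefficients."""
    return fmean(rs)


def fisher_z_mean_r(rs, ns=None):
    """Average via Fisher's z-transform, optionally weighted by n - 3 per study."""
    zs = [math.atanh(r) for r in rs]  # requires -1 < r < 1
    weights = [1.0] * len(rs) if ns is None else [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)


# Hypothetical per-study coefficients and sample sizes, for illustration only.
rs = [0.60, 0.70, 0.75]
ns = [120, 80, 200]
print(round(arithmetic_mean_r(rs), 3))
print(round(fisher_z_mean_r(rs), 3))
print(round(fisher_z_mean_r(rs, ns), 3))
```

For positive coefficients the Fisher-z average is never smaller than the simple mean, so the choice of method can shift an average across a cutoff such as r = 0.7.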
Study PROMIS Domains a
a CAT, computer adaptive test; Dn, depression; NA, not applicable; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcomes Measurement Information System; UE, upper extremity.
Association of PROMIS UE (at Earliest Time Point) With Other Outcome Measures a
a Correlation only at the earliest time point is reported because that was the only time point consistently reported across all of the included studies. All reported r and P values were obtained from the analyses performed in each respective study. Type of statistical analysis performed is listed in Appendix Table A2. ASES, American Shoulder and Elbow Surgeons Shoulder Score; CAT, computer adaptive test; Dn, depression; EQ-5D, EuroQol 5 Dimensions; GH, global health; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcomes Measurement Information System; SF-36, 36-Item Short Form Health Survey; SST, Simple Shoulder Test; UE, upper extremity; WOOS, Western Ontario Osteoarthritis Shoulder Index; WORC, Western Ontario Rotator Cuff Index; WOSI, Western Ontario Shoulder Instability Index.
b Divergent validity is listed for Hajewski et al comparing the PROMIS UE with the Marx Shoulder Activity Scale and the EQ-5D.
The PROMIS PF CAT was most closely correlated with the SF-36 GH, with an average r = 0.74 across the 4 studies 1,2,9,11 that compared them, followed by the EQ-5D, with an average r = 0.67 from the 6 studies 1,2,9,11,17,20 that compared them (Table 6).
Overall, the weakest correlations were between the PROMIS UE and the Marx Shoulder Activity Scale, with an average r = 0.08 among the 3 studies 2,9,11 that compared them. The PROMIS PF CAT and the Marx Shoulder Activity Scale had an average r = 0.18 (Tables 5 and 6).
Correlation of PROMIS PF CAT (or PROMIS Global-10 With PF) at Earliest Time Point With Other Outcome Measures a
a All reported r and P values were obtained from the analyses performed in each respective study. Type of statistical analysis performed is listed in Appendix Table A2. Correlation only at the earliest time point is reported because that was the only time point consistently reported across all of the included studies. ASES, American Shoulder and Elbow Surgeons Shoulder Score; CAT, computer adaptive test; Dn, depression; EQ-5D, EuroQol 5 Dimensions; GH, global health; NR, not reported; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcomes Measurement Information System; SANE, Single Assessment Numeric Evaluation; SF-36, 36-Item Short Form Health Survey; SST, Simple Shoulder Test; UE, upper extremity; WOOS, Western Ontario Osteoarthritis Shoulder Index; WORC, Western Ontario Rotator Cuff Index; WOSI, Western Ontario Shoulder Instability Index.
b Hajewski et al reported values for comparison with the Marx Shoulder Activity Scale and the EQ-5D as divergent validity.
All studies 1,2,9,11 that compared the PROMIS UE with the PROMIS PF CAT, the WORC, WOSI, or WOOS, the ASES Shoulder Score, the SF-36, or the EQ-5D reported statistically significant correlation coefficients (P < .05), a measure of the strength of association between scores (Table 5).
All studies that compared the PROMIS PF CAT and ASES Shoulder Score, SF-36 GH, or EQ-5D reported correlation coefficients with statistical significance (P < .05) (Table 6).
PROs
Of the 9 studies, 4 studies 1,2,9,11 reported on both the PROMIS PF CAT and the UE domains. Two studies 17,20 used the PROMIS Global-10, 1 study 15 used only the PROMIS UE, and 1 study 6 used the PROMIS PF, PI, and depression domains. The PROMIS Global-10 is a set of 10 questions designed to work for a variety of health conditions and serves to assess multiple aspects of health and functioning including physical and mental health, social health, pain, fatigue, and overall perceived quality of life (Tables 5 and 6).
Patient Satisfaction
Ultimately, a broad goal of PRO measurements is to assess patient satisfaction through tangible domains, such as mental health, physical health, emotional health, and social health. 8 Dynamic measurements, such as PROs, can help clinicians evaluate their performance and make changes to their practice based on direct patient feedback; further, having a consistent platform, with a standardized, established set of outcomes to assess, can make practical review simpler and more likely to elicit relevant changes.
Discussion
Among the studies that evaluated PROMIS against legacy PRO measurements, against its various platforms, and over time, there were a few common areas for improvement: increasing accessibility, educating physicians to expand administration rates, and following up to track patient progress. With more data at different time points and reports on physical rehabilitation, surgical technique, and consistent BMI logging, PROMIS could surpass many of the nonstandardized PRO measurements. Because the PROMIS PF CAT aims to serve such a broad population, it could be beneficial outside of orthopaedics and may be applicable to physician, procedure, and even hospital evaluation given how simple the test is to administer. 5,18 The pitfalls of electronic administration need to be considered, but with increasing access to electronic platforms, such as applications on smartphones, tablets, and computers, there is tremendous potential for mass data collection via PRO instruments such as PROMIS. 19 An adaptation of the PROMIS platform for highly active, athletic populations remains unavailable, but such a form could improve score correlations among PRO measurements by removing ceiling effects.
Additionally, the NIH PROMIS platform was normalized to the average American based on 2000 US census data. Although PROMIS first launched in 2004, all of the studies in this review were published between 2017 and 2019, at least 13 years later, which opens the potential for discrepancy between the baseline standardization and the current population. The US census data may not adequately reflect the population at large in the United States and may not generalize to other populations, so large-scale uniformity is unlikely. 10 Nonetheless, of the 7 studies 1,2,9,11,17,19,20 that reported on ceiling effects, 3 studies 1,9,20 reported no significant floor or ceiling effects with the PROMIS PF CAT. Anthony et al, 2 in their study of patients with shoulder instability, reported ceiling effects with the PROMIS UE in 28.6% of patients aged 18 to 21 years, which is the most significant report among the included studies.
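The normalization issue above can be made concrete. PROMIS scores are reported as T-scores with a mean of 50 and an SD of 10 in the reference population, so a score of 50 describes the reference sample's average rather than the current population's. The Python sketch below assumes a standardized latent-trait estimate (theta) and a normal reference distribution; the function names are hypothetical.

```python
from statistics import NormalDist

# T-score metric: mean 50, SD 10 in the reference population.
REFERENCE = NormalDist(mu=50.0, sigma=10.0)


def theta_to_t(theta):
    """Convert a standardized latent-trait estimate to a T-score."""
    return 50.0 + 10.0 * theta


def t_to_reference_percentile(t_score):
    """Percentile relative to the reference population, assuming normality."""
    return 100.0 * REFERENCE.cdf(t_score)


for t in (40.0, 50.0, 60.0):
    print(t, round(t_to_reference_percentile(t), 1))
```

If the present-day population has shifted relative to the reference sample, the same T-score would map to a different percentile of today's patients, which is the discrepancy the paragraph describes.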
Hajewski et al 11 reported on the ceiling and floor effects of the PROMIS UE and PF CAT and found significant ceiling effects in the PROMIS UE at 6 months (68.1% of included PROs) and at 2 years (67.0% of included PROs). Patterson et al 19 reported that the PROMIS UE had ceiling effects in 3% of patients, suggesting that the appropriate PROMIS instrument may depend on the population to which it is administered. Because the ceiling data were reported only as percentages and were present for only 1 of the conditions considered (instability), insufficient information was available to pool and analyze in this review.
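Ceiling and floor effects such as those reported above are typically expressed as the percentage of respondents who reach the highest or lowest attainable score. The Python sketch below computes both. The 15% flagging threshold is a commonly used convention assumed here (this review does not state one), and the scores and 0-100 range are hypothetical; for a PROMIS CAT, the attainable extremes depend on the specific item bank.

```python
def floor_ceiling_pct(scores, min_score, max_score):
    """Percent of respondents at the lowest and highest attainable scores."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    floor = 100.0 * sum(s <= min_score for s in scores) / n
    ceiling = 100.0 * sum(s >= max_score for s in scores) / n
    return floor, ceiling


def exceeds_threshold(pct, threshold=15.0):
    """Flag an effect using an assumed 15% cutoff."""
    return pct > threshold


# Hypothetical scores on a 0-100 instrument.
scores = [100, 100, 85, 70, 100, 55, 90, 100, 40, 0]
floor, ceiling = floor_ceiling_pct(scores, 0, 100)
print(floor, ceiling, exceeds_threshold(ceiling))
```

Reporting only the resulting percentages, without the underlying score distributions, is what makes such effects hard to pool across studies.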
The results of this systematic review suggest that the PROMIS domains (UE and PF) demonstrate moderate to strong correlations with legacy PRO instruments in patients undergoing surgical intervention for shoulder conditions. Using PROMIS to assess patients undergoing surgical intervention for shoulder injury or disease could simplify the administration and analysis of PRO instruments for physicians as well as lower the question burden for patients, as the goal of the NIH PROMIS initiative is to reduce the number of PRO measurements in use by offering a few broader sets of standardized surveys. 16
Question Burden and Survey Timing
One of the most obvious drawbacks of using surveys to collect data from patients is incomplete responses due to “survey fatigue,” or question burden. Survey developers and administrators need to be sensitive to how much it is reasonable to ask of a patient. CATs can record how long a patient takes to complete a questionnaire and even how long the patient spends on each question.
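The per-question timing described above can be sketched as a small Python class that records when a question is shown and when it is answered. The class and method names are hypothetical, and the example injects a fake clock so the output is deterministic; a real CAT platform would capture these times in its own way.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TimedSurvey:
    """Record how long a respondent spends on each displayed question."""
    clock: Callable[[], float] = time.monotonic
    durations: Dict[str, float] = field(default_factory=dict)
    current: Optional[str] = None
    started_at: float = 0.0

    def show(self, item_id: str) -> None:
        # Start timing when the question is displayed.
        self.current = item_id
        self.started_at = self.clock()

    def answer(self) -> None:
        # Stop timing and store the elapsed seconds for the displayed question.
        if self.current is None:
            raise RuntimeError("answer() called before show()")
        self.durations[self.current] = self.clock() - self.started_at
        self.current = None

    def total_seconds(self) -> float:
        return sum(self.durations.values())


# Fake clock readings (seconds) so the example is deterministic.
ticks = iter([0.0, 4.5, 5.0, 12.0])
survey = TimedSurvey(clock=lambda: next(ticks))
for item in ("item_a", "item_b"):
    survey.show(item)
    survey.answer()
print(survey.durations, survey.total_seconds())
```

Per-item durations like these would allow the time and question burden of different PROMIS platforms to be compared directly.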
Limitations
As with any systematic review, there are limitations to the data provided. In this study, only complete data were analyzed. This means that important variables, such as completion rate, internal consistency, reproducibility, reliability, sensitivity to change, and ceiling effects, do not have pooled data available for comparison. Nonetheless, given that the PROMIS platform is relatively new, this study serves as a starting point for gathering what common data are available.
Conclusion
From available data, the PROMIS PF and PROMIS UE were most closely correlated with outcomes measured by the SF-36 GH. The PROMIS UE alone was most correlated with the ASES Shoulder Score. Thus, either the PROMIS PF or UE may provide a possible alternative to legacy PRO measurements but with a lower overall question burden and higher generalizability. Future research should compare the time and question burden of the various PROMIS platforms with a more consistent evaluation of standard PRO measurements.
Footnotes
Final revision submitted January 29, 2020; accepted February 13, 2020.
One or more of the authors has declared the following potential conflict of interest or source of funding: R.M.F. has received grant support from Arthrex; educational support from Arthrex, Medwest, and Smith & Nephew; speaking fees from Arthrex; and royalties from Elsevier. J.T.B. has received research support from Biomet and Stryker; consulting fees from DJO, Encore Medical, and Smith & Nephew; and royalties from Shukla Medical. E.C.M. has received research support from Arthrex, Biomet, Breg, Mitek, Ossur, Smith & Nephew, and Stryker; consulting fees from DePuy and Zimmer Biomet; speaking fees from Arthrex; and royalties from Elsevier and Zimmer Biomet. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Appendix
Patient-Reported Outcome Instruments Administered in Each Respective Study a
| Study | Assessment Method (Correlation Coefficient) | PRO Instruments Compared | Total | ||||||
|---|---|---|---|---|---|---|---|---|---|
| PROMIS Only | ASES | Marx Shoulder Activity Scale | SF-36 GH or PF | EQ-5D | WORC, WOSI, or WOOS | Other | |||
| Anthony (2017) 1 | Pearson and/or Spearman | X | X | GH | X | WORC | 5 | ||
| Anthony (2017) 2 | Pearson and/or Spearman | X | X | PF | X | WOSI | 5 | ||
| Chen (2019) 6 | Pearson | X | 2 | ||||||
| Dowdle (2017) 9 | Pearson and/or Spearman | X | X | PF | X | WOOS | 5 | ||
| Hajewski (2019) 11 | Spearman | X | X | GH and PF | WOSI | 4 | |||
| Monroe (2019) 15 | NA | X | Postoperative VAS | 2 | |||||
| Nicholson (2019) 17 | Spearman | X | X | WORC | SANE | 4 | |||
| Patterson (2018) 19 | Pearson | X | SST, PI | 3 | |||||
| Saad (2018) 20 | Spearman | X | X | WOOS | SANE | 4 | |||
a ASES, American Shoulder and Elbow Surgeons Shoulder Score; EQ-5D, EuroQol 5 Dimensions; GH, global health; NA, not available; PF, physical function; PI, pain interference; PRO, patient-reported outcome; PROMIS, Patient-Reported Outcomes Measurement Information System; SANE, Single Assessment Numeric Evaluation; SF-36, 36-Item Short Form Health Survey; SST, Simple Shoulder Test; WOOS, Western Ontario Osteoarthritis Shoulder Index; WORC, Western Ontario Rotator Cuff Index; WOSI, Western Ontario Shoulder Instability Index; VAS, visual analog scale for pain.
