Abstract
Background
Patient pain and clinical function are important factors in decision-making for patients with glenohumeral osteoarthritis (GHOA). The correlation of radiographic severity of arthritis and demographic factors with modern patient-reported outcome measures has not yet been well defined.
Methods
This cross-sectional study included 256 shoulders in 246 patients presenting with isolated GHOA. All patients underwent standard radiographs and completed the American Shoulder and Elbow Surgeons score, Simple Shoulder Test (SST), Shoulder Activity Scale, Visual Analog Scale, and Patient-Reported Outcome Measurement Information System (PROMIS) computer adaptive tests at the time of presentation. Radiographs were graded according to the Samilson–Prieto classification. Mean pain and functional scores were compared across radiographic grades of osteoarthritis (OA) and across demographic factors.
Results
There were 6 shoulders rated as grade 1 OA, 41 shoulders as grade 2, 146 shoulders as grade 3a, and 63 shoulders as grade 3b. There was excellent interobserver reliability in grade of OA (κ = 0.77). There were no significant differences in patient-reported pain or any validated measure of clinical function between radiographic grades of OA (P > .05). Males reported higher function and lower pain scores than females (P = .001–.066), although only the values for the SST and PROMIS physical function test were clinically relevant.
Discussion
While gender correlated with pain and function, the clinical relevance is limited. Radiographic severity of GHOA does not correlate with patient-reported pain and function, and symptoms should remain the primary determinants of surgical decision-making. Further investigation is necessary to examine whether radiographic severity of OA influences improvement following operative intervention in this population.
Keywords
Introduction
The shoulder is a complex joint with a variety of pathologies that can cause pain and dysfunction, with the reported lifetime prevalence of shoulder pain ranging from 7% to 67%. 1 Nearly 20 million people in the United States reported shoulder pain in 2008 alone, making it the third most common musculoskeletal complaint following back and knee pain, and that number is expected to increase as the population continues to age. 2
Glenohumeral osteoarthritis (GHOA) is a common cause of shoulder pain and dysfunction that is associated with lower functional levels than other shoulder conditions, independent of age.3,4 As a result, patients experience a decreased quality of life, and the condition poses a substantial socioeconomic burden. 5 While several radiographic classifications have been developed to describe the degenerative changes seen in GHOA, it is unclear to what extent the radiographic severity of GHOA should factor into clinical decision-making and timing of surgery.6–13
Over the past 2 decades, there has been increased focus on understanding patients’ perceptions of their illness and its effect on their quality of life, and patient-reported outcomes (PROs) have become increasingly used to aid decision-making and to monitor response to treatments. 14 This focus on PROs has led to the development of over 30 outcome measure instruments for shoulder pathology alone, although the data comparing radiographic severity of GHOA with PRO measures (PROMs) are limited.15,16
Increasing osteophyte size has been shown to correlate with decreased range of motion; however, radiographic severity does not correlate with reported pain scores. 6 Pain and range of motion alone do not entirely capture how patients function in their daily lives, and the correlation of radiographic severity with patient perceptions of their illness remains incompletely answered. The purpose of this study is to evaluate the association between radiographic severity of GHOA and patient demographic factors with pain and function as determined by PROMs.
Methods
Patient Selection
This was a retrospective cross-sectional case series. All clinic visits at our high-volume regional-referral shoulder arthroplasty center were reviewed from August 2015 to August 2017. Patients with an isolated International Classification of Diseases, 10th Revision (ICD-10) code of M19.011 (primary osteoarthritis [OA], right shoulder), M19.012 (primary OA, left shoulder), or M19.019 (primary OA, unspecified shoulder) were included for review. Clinic notes and radiographs were reviewed to ensure that included patients had an isolated diagnosis of GHOA. Exclusion criteria were coexisting diagnoses (eg, full-thickness rotator cuff tear, inflammatory arthritis, cervical radiculopathy), ipsilateral shoulder surgery within the past year, outcome measures collected more than 1 year from the time of radiographs, and incomplete medical records.
Data Collection
All patient information was collected at the initial clinic visit. Patient demographics collected included gender, age, body mass index (BMI), previous surgery, smoking status, hand dominance, and laterality of GHOA. PROMs collected included Visual Analog Scale (VAS, range 0–10), American Shoulder and Elbow Surgeons score (ASES, range 0–100), 17 Simple Shoulder Test (SST, range 0–12), 18 Shoulder Activity Scale (SAS, range 0–20), 19 Patient-Reported Outcome Measurement Information System (PROMIS) physical function (PF), PROMIS upper extremity (UE), and PROMIS pain interference (PI).20,21 PROMIS instruments are administered as computer adaptive tests (CATs). Responses to the first prompt guide the system’s choice of the next question. The CAT continues until either the standard error drops below 3.0 on the T score metric or the patient has answered 12 questions, whichever occurs first. PROMIS instruments are calibrated with a score of 50 as the average for the U.S. general population with a standard deviation of 10. For all outcome measures collected, a higher score corresponds to more of the concept being measured (ie, a higher ASES score translates to higher function, a lower PROMIS PI score translates to less pain).
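The CAT stopping rule and the T score calibration described above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and chosen only for illustration; this is not the PROMIS software itself, only the two rules as stated in the text.

```python
# Illustrative sketch only: names are hypothetical, not part of any PROMIS API.
MAX_ITEMS = 12       # item cap stated in the text
SE_THRESHOLD = 3.0   # standard-error cutoff on the T score metric
T_MEAN = 50.0        # U.S. general population average
T_SD = 10.0          # U.S. general population standard deviation


def cat_should_stop(items_answered: int, standard_error: float) -> bool:
    """Stop when the standard error is below 3.0 or 12 items are answered."""
    return standard_error < SE_THRESHOLD or items_answered >= MAX_ITEMS


def t_score_to_sd_units(t_score: float) -> float:
    """Express a T score as standard deviations from the population mean."""
    return (t_score - T_MEAN) / T_SD
```

For example, a PROMIS PF T score of 40 corresponds to 1 standard deviation below the U.S. general population average.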
Radiographs of the affected shoulder were obtained at the initial clinic visit. Three observers analyzed all radiographs: 1 board-certified shoulder and elbow surgeon and 2 orthopedic surgery residents. Radiographic severity was rated according to the Samilson–Prieto scale. 10 The true anterior–posterior (Grashey) radiograph was used to measure the size of the inferior osteophyte. Inferior osteophytes measuring <3 mm were rated as grade 1, osteophytes between 3 mm and 7 mm were rated as grade 2, and osteophytes >7 mm were rated as grade 3. As many patients presenting with GHOA have large inferior osteophytes, the grade 3 osteophytes were divided into 2 subgroups: inferior osteophytes measuring between 8 mm and 15 mm were rated as grade 3a, and inferior osteophytes measuring >15 mm were rated as grade 3b.
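The grading thresholds above can be summarized in a short Python sketch. The function name is hypothetical. The text gives grade 3 as >7 mm and grade 3a as 8 mm to 15 mm, so this sketch assumes that any measurement above 7 mm and up to 15 mm falls into grade 3a; how fractional measurements between 7 mm and 8 mm were handled is not stated.

```python
def samilson_prieto_grade(osteophyte_mm: float) -> str:
    """Map inferior osteophyte size (mm) to the modified grade used here.

    Assumption: sizes above 7 mm and up to 15 mm are treated as grade 3a.
    """
    if osteophyte_mm < 0:
        raise ValueError("osteophyte size cannot be negative")
    if osteophyte_mm < 3:
        return "1"
    if osteophyte_mm <= 7:
        return "2"
    if osteophyte_mm <= 15:
        return "3a"
    return "3b"
```

Under these assumptions, a 5 mm osteophyte is grade 2 and a 16 mm osteophyte is grade 3b.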
Statistical Analysis
No a priori sample size determination was performed as this was a retrospective study and all available patients were included.
Interobserver reliability was assessed using the intraclass correlation coefficient (kappa, κ). Interobserver reliability <0.40 was rated as poor, between 0.40 and 0.59 as fair, between 0.60 and 0.74 as good, and ≥0.75 as excellent. 22 In the setting of interobserver disagreement, radiographic grade for statistical analysis was determined by the majority opinion among the 3 observers.
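The reliability bands and the majority rule can be sketched in Python as follows. The function names are hypothetical. The text does not state how a three-way disagreement was resolved, so this sketch simply returns `None` in that case, and it treats a value of exactly 0.75 as excellent.

```python
from collections import Counter
from typing import Optional, Sequence


def reliability_band(kappa: float) -> str:
    """Label a reliability coefficient using the cutoffs given in the text."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa < 0.75:
        return "good"
    return "excellent"


def majority_grade(ratings: Sequence[str]) -> Optional[str]:
    """Return the grade chosen by more than half of the observers.

    Returns None when no grade has a strict majority (an assumption; the
    handling of a three-way split is not described in the text).
    """
    grade, count = Counter(ratings).most_common(1)[0]
    return grade if count > len(ratings) / 2 else None
```

With these cutoffs, the observed κ of 0.77 falls in the excellent band, and ratings of 3a, 3a, and 3b resolve to grade 3a.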
Data normality was assessed using the Shapiro–Wilk test. Equality of variance between groups was assessed with the Levene test. One-way analysis of variance testing with a post hoc Tukey analysis was performed between grades of radiographic arthritis to assess for differences in patient-reported pain and function, as measured by PROMs. The influence of gender, dominant extremity involvement, BMI >30 kg/m2, and history of previous surgery on PROMs was assessed with the Student t test or Mann–Whitney U test based on data normality. A P value of <.05 was considered to be significant.
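As an illustration of the one-way analysis of variance step, the following Python sketch computes the F statistic from grouped scores using only the standard library. It is not the software used in this study, the function name is hypothetical, and the sample numbers in the usage note are invented for illustration rather than taken from the study data. Obtaining a P value additionally requires the F distribution, for which a library routine such as SciPy's `scipy.stats.f_oneway` could be used.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df).

    Raises ZeroDivisionError if there is no within-group variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For the invented groups `[1, 2, 3]`, `[2, 3, 4]`, and `[3, 4, 5]`, the between-group and within-group sums of squares are both 6, giving F = 3.0 with 2 and 6 degrees of freedom.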
Results
Patient Demographics
During the study period, 1497 patients presented with an ICD-10 code corresponding to GHOA. After excluding those with concomitant diagnoses and those with incomplete survey data, 256 shoulders in 246 patients were eligible for inclusion. Reasons for exclusion are shown in Figure 1. Overall demographics are shown in Table 1. A slight majority of patients were male and the dominant extremity was involved in approximately half. The majority of patients had no prior history of shoulder surgery on the affected extremity. Of the 42 who underwent prior surgery, 3 had an isolated biceps procedure, 1 had a capsular release, 1 had a distal clavicle resection, 14 had arthroscopic debridements, 3 had an arthroscopic labral repair, 1 had an open capsular shift, 13 had arthroscopic rotator cuff repairs, 3 had arthroscopic superior labrum tear from anterior to posterior (SLAP) repairs, and 3 had “soft tissue stabilization” procedures. The previous surgeries were done at an average of 12 years prior to presentation (range 1–49 years).

Figure 1. Exclusion criteria. PROMIS, Patient-Reported Outcome Measurement Information System.
Table 1. Patient Demographics.
Abbreviation: SD, standard deviation.
Radiographic Severity
Of the 256 shoulders, 6 were rated as grade 1, 41 shoulders as grade 2, 146 shoulders as grade 3a, and 63 shoulders as grade 3b. There was excellent interobserver reliability in the radiographic grading of GHOA (κ = 0.77).
Association With PROs
Overall means and standard deviations for ASES, SAS, SST, VAS, PROMIS PF, PROMIS UE, and PROMIS PI are shown in Table 2. While there were consistent measurable disabilities and functional limitations of the shoulder in this cohort, there were no significant differences in any patient-reported pain or functional outcome measures among the different radiographic grades of GHOA (P = .16–1.0). The results of the Tukey post hoc test are shown in Table 3.
Table 2. Descriptive Statistics of Outcome Measures.
Abbreviations: ASES, American Shoulder and Elbow Surgeons; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcome Measurement Information System; SAS, Shoulder Activity Scale; SD, standard deviation; SST, Simple Shoulder Test; UE, upper extremity; VAS, Visual Analog Scale.
Table 3. Tukey Post Hoc Analysis Based on Radiographic Osteoarthritis.
Abbreviations: ASES, American Shoulder and Elbow Surgeons; CI, confidence interval; HSD, honest significant difference; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcome Measurement Information System; SAS, Shoulder Activity Scale; SD, standard deviation; SST, Simple Shoulder Test; UE, upper extremity; VAS, Visual Analog Scale.
Compared to females, males showed higher function and lower pain scores across all outcome measures except VAS (though VAS approached statistical significance). The mean difference between genders was 7.5 points for ASES, 3.7 points for SAS, 2.0 points for SST, 4.9 points for PROMIS PF, 3.4 points for PROMIS UE, and 2.2 points for PROMIS PI.
Dominant extremity involvement was associated with higher PROMIS PF (mean difference of 2.2 points), but otherwise showed no differences among outcome measures. A history of prior surgery was associated only with a higher SST compared to those without any prior surgery on the affected shoulder (mean difference 1.1 points). A BMI over 30 kg/m2 was associated with a lower SST, lower PROMIS PF, and higher PROMIS PI compared to those with a lower BMI (mean difference 0.7 points, 2.7 points, and 2.4 points, respectively). Results of the Student t tests are shown in Tables 4 to 7.
Table 4. Patient Outcomes by Gender.
Abbreviations: ASES, American Shoulder and Elbow Surgeons; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcome Measurement Information System; SAS, Shoulder Activity Scale; SD, standard deviation; SST, Simple Shoulder Test; UE, upper extremity; VAS, Visual Analog Scale.
Table 5. Patient Outcomes by Extremity Involved.
Abbreviations: ASES, American Shoulder and Elbow Surgeons; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcome Measurement Information System; SAS, Shoulder Activity Scale; SD, standard deviation; SST, Simple Shoulder Test; UE, upper extremity; VAS, Visual Analog Scale.
Table 6. Patient Outcomes by Previous Surgery.
Abbreviations: ASES, American Shoulder and Elbow Surgeons; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcome Measurement Information System; SAS, Shoulder Activity Scale; SD, standard deviation; SST, Simple Shoulder Test; UE, upper extremity; VAS, Visual Analog Scale.
Table 7. Patient Outcomes by BMI.
Abbreviations: ASES, American Shoulder and Elbow Surgeons; BMI, body mass index; PF, physical function; PI, pain interference; PROMIS, Patient-Reported Outcome Measurement Information System; SAS, Shoulder Activity Scale; SD, standard deviation; SST, Simple Shoulder Test; UE, upper extremity; VAS, Visual Analog Scale.
Discussion
PROMs have been increasingly used to quantify patient pain and function preoperatively and monitor improvement following surgery. While factors such as mental health have been shown to influence outcome measures in patients with GHOA, the correlation of radiographic severity of GHOA with traditional PROMs and PROMIS is not well defined. 23 Previous studies have noted that the size of the inferior osteophyte correlates with decreased functional range of motion. 6 However, patient pain, clinical function, and ability to perform daily activities are still considered to be the predominant indications to pursue arthroplasty. We demonstrate that the radiographic severity of GHOA does not correlate with pain scores or any of the other commonly used PROMs in this study.
Patient demographics of our cohort are representative of the typical population that develops GHOA. The majority of patients presenting for evaluation had more severe radiographic findings, although nearly 20% of patients were graded as Samilson–Prieto I or II. Mean PROs were also typical of what would be expected in a population presenting with OA, reporting higher pain and worse PF compared to the general U.S. population. 24
When evaluating for differences in PROMs between categories of radiographic severity, we were unable to identify any significant differences. This includes measures of both clinical function (ASES, SST, SAS, PROMIS PF, and PROMIS UE) and pain (VAS, PROMIS PI). This expands on prior studies that showed poor correlation between radiographic severity and Constant scores. 6 A lack of correlation between severity of radiographic OA and PROM scores contrasts with OA in the lower extremity, where increasing severity of knee OA is associated with higher pain scores and lower quality of life. 25 This discrepancy may be, in part, related to the different weight-bearing requirements among the joints. The knee is a weight-bearing joint that commonly withstands 2 to 3 times the force of body weight with routine activities. 26 Conversely, shoulder joint reaction force is <50% of body weight in most activities of daily living, with a maximum contact force of only 1.6 times body weight for a sit-to-stand task. 27 Patients may therefore be able to tolerate GHOA that is significantly worse radiographically before it causes symptoms that prompt physician evaluation. Additionally, while the Samilson–Prieto scale is a common radiographic classification of OA used in research, it does not account for loss of joint space, posterior subluxation of the humeral head, or glenoid erosion. It is possible that factors not assessed in the Samilson–Prieto classification play important roles in function and may explain the absence of correlation between radiographic severity and PROM scores. Future investigations utilizing classification systems that assess deformity in other planes, such as the Walch classification, or joint space narrowing, such as the Kellgren–Lawrence classification, may identify a correlation between radiographic deformity and outcome scores. 28
Interestingly, males reported statistically significantly higher function and lower pain scores than females in our cohort. However, with large patient cohorts, it is important to distinguish between statistical significance and clinical relevance. Prior studies of shoulder arthritis populations have identified minimal clinically important differences (MCIDs) of 13.6 points for the ASES, 1.5 points for the SST, and 1.6 points for the VAS.29,30 No studies have yet been performed to establish the MCID for the PROMIS instruments in a shoulder arthritis population, although several studies have identified MCIDs in the range of 3.5 to 5.0 points for the PROMIS instruments in other disease processes.31–33 Using these MCID values as proxies for clinical relevance shows that despite the statistically significant differences in outcome measures between males and females, only the SST and PROMIS PF values are likely to be clinically relevant. Similarly, the few statistically significant differences in outcome measures based on dominant extremity involvement, prior surgery, and BMI over 30 kg/m2 are likely not clinically relevant. We were unable to identify specific parameters within the SST and PROMIS PF questionnaires that would explain why these instruments identified what are likely clinically significant differences between genders while the other instruments did not.
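The comparison of the gender differences against the MCID values can be made explicit with a small Python sketch. The mean differences are those reported in the Results, and the ASES and SST thresholds are the MCIDs cited above. For the PROMIS instruments, this sketch assumes the lower bound (3.5 points) of the range reported in other disease processes, since no shoulder-specific MCID exists. SAS and VAS are omitted because no MCID (SAS) or mean difference (VAS) is given in the text. The variable and function names are hypothetical.

```python
# Mean male-female differences reported in the Results (points).
GENDER_MEAN_DIFF = {
    "ASES": 7.5,
    "SST": 2.0,
    "PROMIS PF": 4.9,
    "PROMIS UE": 3.4,
    "PROMIS PI": 2.2,
}

# MCID thresholds; PROMIS values are an assumption (lower bound of 3.5-5.0).
MCID = {
    "ASES": 13.6,
    "SST": 1.5,
    "PROMIS PF": 3.5,
    "PROMIS UE": 3.5,
    "PROMIS PI": 3.5,
}


def clinically_relevant(diffs: dict, thresholds: dict) -> list:
    """List measures whose absolute mean difference meets or exceeds the MCID."""
    return [
        name
        for name, diff in diffs.items()
        if name in thresholds and abs(diff) >= thresholds[name]
    ]


relevant = clinically_relevant(GENDER_MEAN_DIFF, MCID)
```

Under these assumptions, `relevant` contains only the SST and PROMIS PF. If the upper bound of 5.0 points were used for the PROMIS instruments instead, the PROMIS PF difference of 4.9 points would fall just short of the threshold.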
A major strength of this study is the relatively large number of patients included in the cohort. To our knowledge, this is the largest study evaluating the association of PROs and radiographic severity of GHOA. Study limitations include a retrospective design and cross-sectional analysis. The retrospective nature required identification of patients via an ICD-10 code corresponding to GHOA. This may have led to the exclusion of patients whose visits were not coded appropriately. The cross-sectional design allows us to determine association but not causation. Additionally, we are unable to draw conclusions on responsiveness to treatment, such as whether or not radiographic severity influences outcomes following operative intervention. Another limitation is that we did not assess shoulder function directly, such as range of motion or strength. Advanced imaging was also not routinely performed, so we could not exclude other pain generators such as undiagnosed rotator cuff tears or biceps disease. A further limitation is the uneven distribution of patients across radiographic severity groups. The majority of patients had more advanced GHOA (grade 3a or 3b), with a relatively low number of patients with less radiographically severe (grade 1 or 2) GHOA. As such, the study may be underpowered to demonstrate differences in PROMs by radiographic severity of GHOA, which raises the possibility of a Type II error. It is possible that patients with less severe radiographic OA may be less symptomatic and less commonly warrant evaluation by a shoulder and elbow surgeon. Interestingly, however, among those patients with less radiographically severe OA who were symptomatic enough to present to our clinic for evaluation, there was no difference in patient-reported pain and function compared to those with more severe radiographic findings.
Further investigations currently underway are evaluating the association between preoperative radiographic severity of GHOA and outcomes following operative intervention.
Conclusion
Radiographic severity of GHOA does not correlate with patient-reported pain and function, and patient demographic factors do not show a correlation that is clinically relevant. While radiographs are important to identify the etiology of shoulder pain and help inform treatment options, patient pain and clinical function should remain the predominant indications for surgery regardless of the severity of radiographic arthrosis of the joint.
Footnotes
Authors’ Note
The work for this manuscript was performed at Washington University in St. Louis in St. Louis, Missouri.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors, their immediate families, and any research foundation with which they are affiliated have not received any financial payments or other benefits from any commercial entity related to the subject of this article. Dr Keener is a paid consultant for Arthrex and Wright Medical, receives research support from the National Institutes of Health and Zimmer/Biomet, receives royalties from Shoulder Innovations and Wright Medical, and is on the editorial staff for the Journal of Shoulder and Elbow Surgery. Dr Chamberlain is a paid consultant for DePuy Synthes, Wright Medical, and Arthrex and received research support from the National Institutes of Health, Zimmer/Biomet, and the Orthopaedic Research and Education Foundation. Dr Aleem, Dr Kohan, Dr Hill, and Dr Lamplot have no conflicts to disclose.
Ethical Approval
This study was approved by the institutional review board prior to initiation of the project (Washington University in St. Louis Institutional Review Board protocol #201611040).
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
