Abstract
Background:
The glenohumeral joint combines a large range of motion with insufficient bony stabilization, making it susceptible to instability and dislocations. Arthroscopic surgery is routinely used as a diagnostic tool and has been considered the gold standard for the diagnosis of shoulder lesions. However, several studies have demonstrated variability in intraobserver and interobserver agreement.
Purpose:
To evaluate interobserver and intraobserver agreement in the assessment of intra-articular lesions associated with shoulder instability among fellowship-trained shoulder surgeons.
Study Design:
Cohort study (diagnosis); Level of evidence, 3.
Methods:
A total of 24 arthroscopic videos from patients treated for recurrent shoulder instability were shown to a group of 10 fellowship-trained shoulder surgeons who are members of the Multicenter Orthopaedic Outcomes Network (MOON) Shoulder Group. The videos were presented to the surgeons on 2 different occasions at least 2 months apart. The surgeons were asked to classify labral tears by their position, type, and extension and to record any other intra-articular abnormality and their preferred treatment. No patient history or physical examination data were provided. The primary outcome was the median overall percentage of agreement for the surgeons performing a video review, measured for each variable evaluated. Intraclass correlation coefficients were used to evaluate continuous variables, and kappa values were used for categorical items.
Results:
Interobserver agreement was good for anterior labral lesions and Hill-Sachs lesions and moderate for lesions of the superior labrum, posterior labrum, anterior sublabral foramen, and position and extension of the tear. Intraobserver agreement was either good or very good for all variables evaluated, except for being poor for inferior labral lesions and moderate for lesions of the posterior labrum and meniscoid superior labrum.
Conclusion:
Interobserver and intraobserver reliability for the arthroscopic assessment of labral tears in patients with recurrent shoulder instability were good to moderate for the majority of anatomic structures assessed. There was relatively good agreement between shoulder instability surgeons on assessing and documenting shoulder instability–associated abnormalities. These findings are important when interpreting collaborative clinical cohort studies with numerous surgeons involved in the research.
The glenohumeral joint is the most commonly dislocated joint in the body, with an overall incidence of up to 23.9 per 100,000 per year. It is especially common among the young active male population (78%), with high redislocation rates after an initial traumatic dislocation. 18
The glenohumeral joint combines a large range of motion with insufficient bony stabilization, making it susceptible to instability and dislocations. It is stabilized by both static and dynamic mechanisms. The static mechanisms include the bony configurations of the glenoid and the humerus, the glenoid labrum, the joint capsule, and the glenohumeral ligaments. The dynamic mechanisms include the muscles of the rotator cuff and, to a lesser degree, the long head of the biceps and the deltoid muscle.
A glenohumeral dislocation usually results in labral damage and may result in a fracture of the anteroinferior portion of the glenoid rim in association with soft tissue damage. This damage can contribute to recurrent anterior instability of the shoulder. Lesions of the glenoid rim and the corresponding Hill-Sachs impaction fracture of the humeral head have been reported in up to 97% and 90%, respectively, of initial dislocations in young athletes. 15 For optimal outcomes, the surgeon must be well aware of the normal anatomy, the various anatomic variants, and the abnormality that can be encountered. 16
Arthroscopic surgery is routinely used as a diagnostic tool and has been considered the gold standard for the diagnosis of shoulder lesions in previous studies evaluating the diagnostic ability of different imaging techniques. 5,6,8–10,12,13 However, studies evaluating shoulder arthroscopic surgery as a diagnostic tool have demonstrated variability in intraobserver and interobserver agreement when surgeons are describing and characterizing intra-articular structures. 4,11,14,17 To our knowledge, only 1 study has evaluated this specifically for patients with shoulder instability, and that study evaluated only interobserver variability, 14 while 2 other studies have specifically evaluated patients with superior labrum anterior posterior (SLAP) lesions. 4,17
It is important to develop an understanding of interobserver reliability for clinical care, research, educational, and communication purposes. Interobserver agreement is relevant when orthopaedic surgeons interpret and apply the literature to their patients. More caution should be used in the assessment and treatment of lesions with low agreement. For multisurgeon collaborative research in which participant inclusion and exclusion may be related to intraoperative findings, an understanding of interobserver reliability and agreement among surgeons is crucial to ensuring that the same patients are being entered into the study or excluded across all participating sites. 4 In addition, reliability data for collaborative multicenter cohort studies evaluating numerous intraoperative variables can provide insight into the validity of results determined from analysis of collected information. Videotaped surgical procedures are a useful tool for exploring interobserver reliability because they present the same visual information to every reviewer.
The purpose of this study was to evaluate interobserver and intraobserver agreement among fellowship-trained shoulder surgeons in the assessment of intra-articular lesions, specifically labral lesions, in patients with shoulder instability. The goal of this study was to determine if these lesions could be identified and categorized reproducibly by surgeons performing arthroscopic surgery for shoulder instability.
Methods
Twenty-four shoulder arthroscopic surgery videos corresponding to different patients with recurrent shoulder instability (anterior, posterior, and multidirectional) were compiled. Each video consisted of standard diagnostic arthroscopic surgery with visualization obtained through the posterior portal and the anterosuperior rotator interval portal. The videos demonstrated probing of all pertinent anatomic structures, as done with routine shoulder diagnostic arthroscopic surgery. Diagnostic arthroscopic surgery was performed by 3 fellowship-trained surgeons (B.R.W., C.M.H., E.E.S.). There were 20 videos in the beach-chair position and 4 videos in the lateral decubitus position. Exemption from approval for this study was obtained from the University of Iowa Institutional Review Board.
The videos were shown to a group of 10 experienced sports medicine fellowship–trained shoulder surgeons who are members of the Multicenter Orthopaedic Outcomes Network (MOON) Shoulder Group. The videos were presented to the surgeons on 2 different occasions at least 2 months apart. For the second evaluation, the videos were arranged in a different order.
Surgeons were asked to classify labral tears by their type, position, extension, and preferred treatment based on the Shoulder Instability Operative Form used by the MOON Shoulder Group to record data in ongoing studies (Table 1). The position and extension of the tear had to be described on a visual representation of the glenoid: 360° divided evenly into 20 segments covering 18° each (Figure 1). There was no restriction to the number of diagnoses by structure that the surgeons could choose within the given alternatives in the questionnaire. There was no previously established “correct answer” for each of the videos. The reviewing surgeons performed their assessment independently and were blinded to the final treatment. No patient history or physical examination data were provided for the videos.
Table 1. Classification of Tear Types, Position/Extension, and Preferred Treatment Method
Abbreviations: ALPSA, anterior labral periosteal sleeve avulsion; GLAD, glenolabral articular disruption; SLAP, superior labrum anterior posterior.

Figure 1. Diagrams used by surgeons to document the location of labral pathology.
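As an illustration of the 20-segment glenoid diagram described above, the following minimal Python sketch maps an angular position to one of the 18° segments and lists the segments spanned by a tear. It is not part of the study's methods. The zero mark, the direction of increasing angle, the 0-based numbering, and the names `segment_index` and `tear_segments` are all assumptions made for the example; the study's form may number segments differently.

```python
SEGMENTS = 20
SEGMENT_WIDTH_DEG = 360 / SEGMENTS  # 18 degrees per segment, as in the diagram


def segment_index(angle_deg: float) -> int:
    """Return the 0-based segment containing an angle.

    The angle is measured from an assumed zero mark on the glenoid face;
    the study's own reference point is not specified here.
    """
    return int((angle_deg % 360) // SEGMENT_WIDTH_DEG)


def tear_segments(start_deg: float, end_deg: float) -> list[int]:
    """List the segments covered by a tear running from start to end.

    The tear is assumed to run in the direction of increasing angle,
    wrapping past 360 degrees when the end angle is smaller than the start.
    """
    first = segment_index(start_deg)
    last = segment_index(end_deg)
    count = (last - first) % SEGMENTS + 1
    return [(first + i) % SEGMENTS for i in range(count)]
```

Under these assumptions, a tear drawn from 350° to 20° covers three segments, and its extension can be approximated as the number of covered segments multiplied by 18°.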
Two different basic modalities were used for the evaluation of the data obtained. As multiple options could be selected per variable, diagnostic categories were collapsed by anatomic region to calculate the kappa value for intraobserver and interobserver agreement (Table 2). For those items with a defined ordinal scale (ie, classification of SLAP tears), the original classification scheme was preserved.
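For readers unfamiliar with the kappa statistic, the sketch below computes Cohen's kappa for a single pair of raters on a collapsed category (for example, lesion present or absent in one anatomic region). It is illustrative only: the study's analysis was performed in SAS, the text does not state which kappa variant was used, and the function name `cohen_kappa` and the sample ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: proportion of cases rated identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of each rater's marginal proportions,
    # summed over categories.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category for every case.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: 1 = lesion present, 0 = absent, for four videos.
rater_1 = [1, 1, 0, 0]
rater_2 = [1, 0, 0, 0]
kappa = cohen_kappa(rater_1, rater_2)  # observed 0.75, expected 0.50
```

In this hypothetical example the raters agree on 3 of 4 videos (75%), but because half of that agreement is expected by chance, kappa is 0.5. This illustrates why percentage agreement can be higher in absolute value than the corresponding kappa coefficient.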
Table 2. Categories of Intraobserver and Interobserver Agreement Based on Kappa Coefficient
Agreement for the type of lesion (frayed, cracked, detached, etc) within an anatomic region (anterior labrum, inferior labrum, etc) was expressed as a percentage of agreement, both within a surgeon’s own responses and between surgeons. Percentage of agreement between observers was chosen as the statistical representation of reliability because it provides a clear, straightforward, and easily interpreted statistical assessment of the data. 3 The median pairwise agreement was selected over the mean as the primary measure to represent agreement for each variable because it is unaffected by extreme values. 2 Agreement among all possible surgeon pairwise combinations was calculated and interpreted with the scale previously described by Sasyniuk et al 14 (Table 3). Statistical analysis was performed with SAS software (version 9.4; SAS Institute).
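The median pairwise percentage of agreement described above can be sketched as follows. This is a minimal illustration and not the study's SAS code; the function names and the sample ratings for three hypothetical surgeons are assumptions made for the example. With 10 surgeons, as in this study, there would be 45 distinct surgeon pairs.

```python
from itertools import combinations
from statistics import median


def percent_agreement(ratings_a, ratings_b):
    """Percentage of cases on which two raters chose the same category."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def median_pairwise_agreement(ratings_by_surgeon):
    """Median percentage agreement over all distinct pairs of surgeons.

    ratings_by_surgeon maps a surgeon label to that surgeon's ratings,
    listed in the same video order for every surgeon.
    """
    pairs = combinations(sorted(ratings_by_surgeon), 2)
    scores = [
        percent_agreement(ratings_by_surgeon[p], ratings_by_surgeon[q])
        for p, q in pairs
    ]
    return median(scores)


# Hypothetical ratings of lesion type for four videos by three surgeons.
ratings = {
    "A": ["detached", "frayed", "normal", "detached"],
    "B": ["detached", "frayed", "normal", "frayed"],
    "C": ["detached", "normal", "normal", "frayed"],
}
result = median_pairwise_agreement(ratings)  # pair scores: 75, 50, 75
```

Here the pairwise scores are 75%, 50%, and 75%, so the median is 75%. The single low pair does not pull the summary value down, which is the property that motivated using the median rather than the mean.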
Table 3. Strength of Intraobserver and Interobserver Agreement Based on Value
Source: Sasyniuk et al. 14
Results
Intraobserver agreement was very good for anterior labral tears and good for the superior labrum, Hill-Sachs lesions, Buford complex, anterior sublabral foramen, position and extension of the tear, and preferred treatment (Table 4). Intraobserver agreement was moderate for the posterior labrum and meniscoid superior labrum. Intraobserver agreement was poor for the assessment of the inferior labrum.
Table 4. Intraobserver and Interobserver Agreement of Tear Type, Position/Extension, and Preferred Treatment Method
Abbreviation: SLAP, superior labrum anterior posterior.
Interobserver agreement was good for the anterior labrum and Hill-Sachs lesions and moderate for the superior labrum, posterior labrum, anterior sublabral foramen, and position and extension of the tear. Interobserver agreement was fair for the Buford complex and poor for the inferior labrum and meniscoid superior labrum.
Agreement for the type of lesion (frayed, cracked, detached, etc) for each structure analyzed showed higher absolute values compared with the kappa coefficient. Median overall agreements were as follows: anterior labrum, 92%; inferior labrum, 67%; superior labrum, 75%; posterior labrum, 75%; and Hill-Sachs lesions, 83%.
Overall agreement on preferred treatment was 67%. The largest difference observed for preferred treatment was variability in the inclusion of capsular plication along with labral repair. The second largest difference was moderate agreement on the diagnosis of SLAP tears, secondarily influencing treatment options. If these 2 factors are accounted for, overall agreement on preferred treatment increases to 75%.
Discussion
Arthroscopic surgery is routinely used for the diagnosis of intra-articular abnormalities. Historical studies have demonstrated that routine magnetic resonance imaging misses labral lesions associated with shoulder instability in approximately 40% of cases; hence, arthroscopic surgery remains an important tool for diagnosing labral tears. 1,7 In the current study, agreement among surgeons was the best for anterior labral lesions and Hill-Sachs lesions and worst for the inferior labrum. To our knowledge, this is the first study to present intraobserver reliability for labral lesions in a population with shoulder instability. Previous studies evaluating agreement for anterior, inferior, and posterior labral lesions have assumed higher intraobserver reliability of shoulder arthroscopic surgery as a diagnostic tool based on interobserver agreement, but a second round of evaluations was not performed. 14 Intraobserver agreement was good or very good for the majority of variables evaluated; however, it was poor for the inferior labrum (kappa = 0.20) and moderate for the meniscoid superior labrum (kappa = 0.59) and posterior labrum (kappa = 0.58).
Sasyniuk et al 14 showed varying interobserver agreement depending on the structure examined in the shoulder. They showed very good reliability (>80%) for the anterior labrum and Hill-Sachs lesions, poor agreement (<40%) for the glenoid and anteroinferior glenohumeral ligament, and intermediate for all other structures examined in a group of patients with anterior shoulder instability. Their findings indicated that the interpretation of intra-articular lesions in the glenohumeral joint could vary depending on the observer. 14
We also described agreement for position and extension of the tear on a visual representation of the glenoid: 360° divided evenly into 20 segments covering 18° each. We showed this documentation technique to have good intraobserver reliability and moderate interobserver reliability. This could indicate that surgeons have better agreement when shown which part of the labrum is affected on a visual model than they do when classifying it as an anatomic area, especially in the case of inferior labral lesions.
A wide spectrum of intraobserver variability in the classification and treatment of SLAP lesions was also noted by Gobezie et al, 4 demonstrating moderate intraobserver reliability in both the classification and the treatment of labral injuries. When individual classifications and treatments were analyzed, the kappa values ranged from poor to moderate. The Gobezie et al 4 study designated a correct answer for each case and analyzed agreement in each case accordingly. Wolf et al 17 found that interobserver variability among experienced arthroscopic specialists evaluating the superior labrum and SLAP tears using the Snyder classification was significant, and analysis of intraobserver variability showed only moderate agreement for both diagnosis and treatment (kappa = 0.54 and 0.45, respectively). They also found that interobserver agreement decreased with the introduction of clinical vignettes. The Wolf et al 17 study evaluated agreement, and no “correct” answer was assumed for each case. The results of the current study were similar, with moderate intraobserver agreement (kappa = 0.48) among surgeons.
A strength of this study is that the video recordings presented the same information to each surgeon for assessment. The limitations include the inability of the reviewers to probe structures themselves to determine their patency and the lack of history and physical examination data. Structures were probed in the videos; however, not having tactile feedback could negatively influence the ability of the surgeon to properly identify the abnormality. The only information given to surgeons was that all videos were recorded in patients being treated for shoulder instability of some variety. In the actual clinical setting, treatment is decided by a combination of history and physical examination findings, including examination under anesthesia, imaging, and intraoperative results. This study presents an analysis of only one aspect of this equation.
Videos were of varying lengths and quality. However, each selected video was carefully reviewed and deemed to appropriately demonstrate all necessary anatomic structures by the senior author (B.R.W.). While all surgeons were fellowship trained and board certified, the level of experience of each surgeon was not evaluated. Even though previous studies 14 have failed to prove a difference between more experienced and less experienced surgeons, they still advocate for the inclusion of only expert-level surgeons for reliability studies. Last, some videos were recorded in the beach-chair position and some were recorded in the lateral decubitus position. It is possible that surgeons’ decisions were affected by the orientation of the videos depending on their preferred positioning for shoulder arthroscopic surgery in usual practice. It is also possible that poor agreement on assessing the inferior labrum could be related to the majority of the videos being recorded in the beach-chair position, as visualization of this area is more challenging in this position.
This study compared intraobserver reliability for each surgeon involved. As expected, it was higher than interobserver reliability, but several variables showed only moderate agreement, and the inferior labrum showed poor agreement. The data collection and documentation of intraoperative findings can pose a challenge in multicenter collaborative research. Understanding the difficulties and limitations in acquiring and recording data is important for the interpretation of subsequent collaborative study outcomes. Overall, this study demonstrated good to moderate agreement in the assessment and treatment of most anatomic structures related to shoulder instability. Some variability was demonstrated in the preferred treatment of labral tears, which should be kept in mind when using these results to establish agreement among enrolling surgeons and the validity of future outcome studies from this cohort data set.
Conclusion
Interobserver and intraobserver reliability of intra-articular lesions associated with shoulder instability among fellowship-trained shoulder surgeons varied depending on the structure evaluated. Generally, agreement was good for the majority of structures evaluated.
Footnotes
One or more of the authors has declared the following potential conflict of interest or source of funding: B.R.W. is a consultant for ConMed Linvatec, has received educational support from Wardlow Enterprises, and is on the Scientific Advisory Board for UnitedHealthcare. B.U. has received educational support from Arthrex. C.M.H. has received educational support from Arthrex, Zimmer Biomet, and Pacira Pharmaceuticals and has received research support from Tornier. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval for this study was waived by the University of Iowa Human Subjects Office/Institutional Review Board.
