Abstract
Aim
(1) To determine the interobserver reliability of magnetic resonance classifications and lesion instability criteria for capitellar osteochondritis dissecans lesions and (2) to assess differences in reliability between subgroups.
Methods
Magnetic resonance images of 20 patients with capitellar osteochondritis dissecans were reviewed by 33 observers: 18 orthopaedic surgeons and 15 musculoskeletal radiologists. Observers were asked to classify the osteochondritis dissecans according to the classifications developed by Hepple, Dipaola/Nelson and Itsubo, as well as to apply the lesion instability criteria of DeSmet/Kijowski and Satake. Interobserver agreement was calculated using the multirater kappa (k) coefficient.
Results
Interobserver agreement ranged from slight to fair: Hepple (k = 0.23); Dipaola/Nelson (k = 0.19); Itsubo (k = 0.18); DeSmet/Kijowski (k = 0.16); Satake (k = 0.12). When classifications/instability criteria were dichotomized into either a stable or an unstable osteochondritis dissecans, there was more agreement for Hepple (k = 0.52; p = .002), Dipaola/Nelson (k = 0.38; p = .015), DeSmet/Kijowski (k = 0.42; p = .001) and Satake (k = 0.41; p < .001). Overall, agreement was not associated with the number of years in practice or the number of osteochondritis dissecans cases encountered per year (p > .05).
Conclusion
One should be cautious when assigning grades using magnetic resonance classifications for capitellar osteochondritis dissecans. When making treatment decisions, one should rather use relatively simple distinctions (e.g. stable versus unstable osteochondritis dissecans; lateral wall intact versus not intact), as these are more reliable.
Introduction
Treatment strategies and operative planning for osteochondritis dissecans (OCD) lesions of the capitellum are based on the stability, size and location of the lesion, in addition to the severity of symptoms and capitellar physis status, among other factors.1–3 Non-operative treatment is advocated for a stable OCD (i.e. intact cartilage) in the setting of an open capitellar growth plate.4 A surgical approach is indicated for an unstable OCD.3,5,6 An OCD is considered unstable if magnetic resonance (MR) images demonstrate discontinuity of the cartilage, a high signal intensity interface between the fragment and its bed, or articular defects.7–9 Arthroscopic debridement with bone marrow stimulation3,5,10,11 or fragment fixation6,12 may lead to satisfactory outcomes in an unstable OCD without involvement of the lateral margin of the capitellar wall. More invasive treatment by means of osteochondral autologous transplantation is suggested for large (>10 mm), unstable lesions that involve the lateral wall of the capitellum.13–17
Various imaging modalities have been used to characterize OCD of the capitellum, including radiography, ultrasonography and computed tomography (CT), but most commonly MR imaging is performed.7,9,18–21 Both CT and MR have been shown to correlate well with intraoperative findings7,9; however, staging of an OCD on CT images has been shown to be interpreted inconsistently among surgeons specialized in upper extremity injuries.22 Because surgical decision making and preoperative planning depend heavily on the stability, size and location of the lesion, it is important to know whether physicians interpret MR images regarding these characteristics in a consistent manner.
The goals of this study were: (1) to determine the interobserver reliability of existing MR classification systems and lesion instability criteria for capitellar OCD lesions; (2) to assess differences in interobserver reliability between subgroups (e.g. years in practice, number of OCD cases per year, specialty).
Materials and methods
This study was approved by our institutional review board (protocol no. 2009P001019/MGH).
Physicians from different continents were invited to participate in this interobserver study via an e-mail that included a study description. We invited orthopaedic surgeons who were fellowship trained in shoulder and elbow injuries and/or sports-related injuries, as well as radiologists who were fellowship trained in musculoskeletal imaging. Invitations were sent only to physicians who were known to at least one of the authors.
Participating physicians (i.e. observers) were asked to review MRs of 20 patients selected from our retrospective database. MR selection was performed by one of the authors, a musculoskeletal fellowship trained radiologist (FJS), using the Kijowski/DeSmet lesion instability criteria.8,23 We sought to select a representative variety of OCDs. Accordingly, we selected 12 MRs with an unstable OCD (Figure 1), 6 MRs with a stable OCD (Figure 2) and 2 MRs from patients with an unremarkable elbow. In these two patients, an MR was performed because of ongoing pain localized at the radio-capitellar joint; however, the images demonstrated no abnormalities. No more than 20 MRs were selected because reviewing 20 MRs according to multiple classifications/criteria is time consuming for the observers.24 Selecting more MRs would have resulted in fewer observers completing the study.
Figure 1. Images of an unstable OCD in the left elbow of a 15-year-old male patient. (a) Coronal T1 and (b) sagittal PDFS (proton density fat suppressed) images from MR of the left elbow showing articular surface collapse with fluid undercutting a cortical ossific fragment on the sagittal image. Mild surrounding bone marrow edema in addition to cartilage irregularity and loss are also seen.

Figure 2. Images of a stable OCD in the left elbow of a 13-year-old male patient. (a) Sagittal proton density fat saturated and (b) axial proton density images from MR of the left elbow showing subchondral bone marrow edema of the capitellum with an intact overlying cortical margin, lack of fluid signal undercutting the cortex or cartilage, and no cystic change in the capitellum.

The mean age of the patients at the time of MR was 15.4 years (range, 11 to 17); 11 patients were male and 9 were female.
MRs obtained locally were performed on a 1.5T or 3T scanner using a standard departmental protocol, including proton density (PD) axial, T2FS (fat suppressed) axial, T2FS coronal, T1 coronal, T2 GRE (gradient recalled echo) coronal and PDFS sagittal pulse sequences. As a tertiary referral center, we often receive and interpret MRs from outside institutions with variable pulse sequences, vendors and magnet strengths. All MRs, whether obtained locally or from an outside institution, included fat-saturated, fluid-sensitive sequences (T2FS or PDFS) in the coronal and sagittal planes for adequate assessment of the capitellum. MRs with intra-articular contrast were not selected.
One of the authors (RB), who was not involved in patient care, removed all identifying information from the MR images and uploaded the Digital Imaging and Communications in Medicine (DICOM) files to a web-based study platform (www.shoulderelbowplatform.com). Observers evaluated the MR images using a built-in, web-based DICOM viewer and could adjust brightness, contrast, window leveling and zoom, and measure distances. All questions related to one case had to be completed before proceeding to the next case. Observers completed the study at their own pace and, if needed, on various computers.
MR classification systems and lesion instability criteria for capitellar OCD.
MR: magnetic resonance; OCD: osteochondritis dissecans.
Statistical analysis
Agreement among observers was calculated using the multirater kappa (k) coefficient as described by Siegel and Castellan. Point estimates and two-sided 95% confidence intervals (CIs) were calculated as well.22,27–29 The multirater kappa is a commonly used statistic to describe chance-corrected agreement in interobserver studies. A value of 0 indicates no agreement beyond chance alone.30,31 A value of 0.01 to 0.20 is defined as slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and more than 0.80, near-perfect agreement.30,31 Differences in agreement between subgroups (e.g. years in practice, number of OCD cases per year, specialty) were analyzed using a Z-test.22,27,29 Statistical analysis was performed with Stata 12.0 (StataCorp LP, College Station, TX, USA).
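The multirater kappa of Siegel and Castellan is closely related to Fleiss' kappa: it compares the mean per-case agreement among raters with the agreement expected by chance from the marginal category frequencies. A minimal sketch of the computation (the rating counts below are invented for illustration, not the study's data) might look like:

```python
from typing import List

def multirater_kappa(counts: List[List[int]]) -> float:
    """Chance-corrected multirater agreement (Fleiss-style kappa).

    counts[i][j] = number of observers assigning case i to category j;
    every case must be rated by the same number of observers.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])  # raters per case
    n_cats = len(counts[0])
    # Observed agreement P_i for each case: proportion of rater pairs that agree
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases
    # Marginal category proportions and expected chance agreement P_e
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Invented example: 3 cases, 2 observers, 2 categories (stable / unstable)
counts = [[2, 0], [1, 1], [0, 2]]
kappa = multirater_kappa(counts)  # -> 1/3, i.e. fair agreement
```

Note that kappa penalizes agreement attributable to chance: in the example the observers give identical labels on two of three cases, yet kappa is only 0.33 because half of all pairwise agreement would be expected by chance given the 50/50 category marginals.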
Results
Participants
Observer demographics (n = 33).
Interobserver agreement
Interobserver reliability of characterization of capitellar osteochondritis dissecans using MR imaging.
MR: magnetic resonance; CI: confidence interval.
Significantly more agreement when observers' responses were dichotomized: stable OCD versus unstable OCD.
Significantly more agreement when observers' responses were dichotomized: lesions ≤10 mm versus >10 mm.
When observers' responses for classification systems were dichotomized into either a stable or unstable OCD, there was more agreement for the Hepple (moderate, k = 0.52; p = .002) and Dipaola/Nelson classification (fair, k = 0.38; p = .015). Similarly, when observers' responses for instability criteria were dichotomized into either a stable OCD (i.e. none of criteria present) or unstable OCD (i.e. one or more criteria present), agreement significantly improved for DeSmet/Kijowski (moderate, k = 0.42; p = .001) and Satake (moderate, k = 0.41; p < .001).
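The mechanism behind this improvement can be illustrated on mock data: observers who disagree on the exact grade often still agree on the stable/unstable side of the cut-off, so collapsing the grades raises the kappa. The four-grade ratings below are invented for illustration; the grouping of grades 1–2 as stable and 3–4 as unstable is an assumption of this sketch, not the study's exact mapping.

```python
from collections import Counter
from typing import Hashable, List, Sequence

def fleiss_kappa(ratings: List[Sequence[Hashable]], categories: Sequence[Hashable]) -> float:
    """Fleiss-style multirater kappa from raw labels.

    ratings[i] = labels given to case i (same number of raters per case).
    """
    n_cases, n_raters = len(ratings), len(ratings[0])
    counts = [[Counter(r)[c] for c in categories] for r in ratings]
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters)
           for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Invented ratings: 4 cases, 3 observers, grades 1-4
grades = [[1, 2, 2], [3, 4, 3], [1, 1, 2], [4, 3, 4]]
full = fleiss_kappa(grades, categories=[1, 2, 3, 4])  # -> 1/9 (slight)

# Dichotomize: grades 1-2 -> "stable", 3-4 -> "unstable"
binary = [["stable" if g <= 2 else "unstable" for g in case] for case in grades]
dichotomized = fleiss_kappa(binary, categories=["stable", "unstable"])  # -> 1.0
```

In this constructed example the observers never agree unanimously on a grade, yet they agree perfectly on stability, so the dichotomized kappa rises from slight to perfect; the study's real gains (e.g. 0.23 to 0.52 for Hepple) follow the same pattern less dramatically.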
Interobserver agreement was fair for lesion size assessment (k = 0.24); agreement improved to moderate when lesion size was dichotomized into lesions ≤10 mm versus >10 mm (k = 0.41; p < .001) (Table 3). Agreement was slight with regard to involvement of the lateral capitellar wall (k = 0.16).
Observers' mean confidence in their responses ranged from 2.5 to 2.7 across classifications/criteria on a scale of 1 to 4 (Table 3). None of the classification systems or instability criteria demonstrated significantly higher or lower confidence levels than the others (p > .05).
Factors associated with interobserver agreement
Interobserver agreement by years in practice: 0–10 years versus >10 years.
CI: confidence interval.
Significantly more agreement among observers 0–10 years in practice than observers >10 years in practice.
Interobserver agreement by capitellar OCD cases per year: 0–10 cases versus >10 cases.
CI: confidence interval.
Interobserver agreement by location of practice: North America versus Europe.
CI: confidence interval.
Interobserver agreement by specialty: orthopaedic surgeons versus musculoskeletal radiologists.
CI: confidence interval.
Significantly more agreement among radiologists than surgeons.
Discussion
The present investigation is the first to evaluate the interobserver reliability of existing MR classifications among a large group of physicians. Overall, there was limited interobserver reliability for classifications and lesion instability criteria, as well as for lesion size assessment and lateral capitellar wall involvement. Reliability improved significantly (fair to moderate) when classification systems and instability criteria were simplified (i.e. stable versus unstable lesion). Similarly, reliability improved significantly (moderate) when OCD size was dichotomized into small (≤10 mm) and large (>10 mm) lesions. Overall, reliability was independent of the number of years in practice and of the number of capitellar OCD cases a physician encountered per year.
The interobserver reliability for existing classification systems was lower in our investigation than in two previous studies.7,32 Itsubo et al. reported an intraclass correlation coefficient of 0.82 to 0.88 (i.e. good to excellent reliability) for their own classification,7 based on three orthopaedic surgeons who evaluated 52 MRs of capitellar OCD. The substantially higher agreement in their group may be the result of comprehensive experience in using their own classification since 2006.7 This discordance may also be due to the fact that many more observers were included in the present study. Ellerman et al. determined the agreement for the Dipaola/Nelson classification based on MRs of knee OCD.32 The authors reported moderate agreement among two musculoskeletal-trained radiologists and one musculoskeletal radiology fellow, compared to slight agreement in our investigation concerning capitellar OCD. The higher reliability reported by these authors may be due to the fact that the Dipaola/Nelson classification was originally developed for knee and talar OCD.18,26
Interestingly, our results showed substantially more agreement when classification systems7,25 and instability criteria8,9,23 were simplified. In other words, there is more consistency among observers in determining whether an OCD is stable or unstable (fair to moderate) than in classifying its specific stage (slight to fair). This finding suggests that one should be cautious when assigning grades using MR classifications. When making treatment decisions, one should rather use simplified distinctions (e.g. stable versus unstable OCD; lateral wall intact versus not intact), as these seem more reliable. The finding that relatively simple distinctions are more reliable than classifications consisting of multiple subgroups is in line with previous studies that determined the reliability of classifications of proximal femur fractures.29,33
Overall, interobserver reliability did not differ between experienced and less experienced physicians: neither more years in medical practice nor a higher number of OCD cases per year was associated with more agreement (p > .05). This finding is consistent with two studies investigating the reliability of fracture characteristics on radiographs.29,34 It could be that the learning curve for assigning grades or measuring the width of an OCD is steeper in the first few years of practice and subsequently reaches a plateau, or that these skills are simply not experience dependent. Interestingly, only for the Hepple classification did we find more agreement among observers with up to 10 years in practice than among observers with more than 10 years in practice. We hypothesize that this is the result of a relatively large number of radiologists in the first group (11 radiologists) compared to the second (3 radiologists). Subgroup analysis demonstrated that radiologists interpreted two classification systems (including Hepple) and lesion size more consistently than surgeons. This indicates that, although agreement is still limited (fair), consulting a musculoskeletal radiologist should be part of routine clinical care in patients with suspected capitellar OCD.
One of the strengths of this study is that it investigated the interobserver reliability of MR classifications among a large group of observers (n = 33), which allowed subgroup analysis. Furthermore, this is the first study to compare agreement between orthopaedic surgeons and musculoskeletal radiologists in such a large group. Lastly, all available classification systems were evaluated simultaneously by the same group of observers. However, the findings of this study should be interpreted in light of some limitations. First, observers did not receive any additional training regarding the classification systems and instability criteria. Observers who were not familiar with these may have been more consistent if they had received some form of training.28,35 For instance, the use of an image atlas at the time of assigning grades may have been helpful. Second, neither patient history nor physical examination findings were provided, whereas treatment decisions in orthopaedic practice are based on the whole patient rather than solely on MR imaging. Although we aimed to determine the helpfulness of MR without any potential bias, adding this information might have led to more agreement. Third, most reviewers were not familiar with the web-based DICOM viewer, which, though adequate for the purposes of this study, was not a fully functioning PACS workstation and lacked certain functionality, such as the ability to cross-reference images; this led to difficulty navigating the cases and may have decreased confidence.
The present study highlights the need to develop and test MR classifications that are both relevant and reliable. Improved diagnostic MR protocols and training may lead to more consistency among physicians and, ultimately, to classifications that are reliable enough to be used in prognostic and therapeutic studies. Also, when comparing studies, it is important to use reliable distinctions to classify an OCD so that patient characteristics are comparable between studies.
Conclusions
This investigation adds to a growing body of evidence indicating that relatively simple distinctions on MR images are more reliable. One should be cautious when assigning grades using MR classifications in the assessment of capitellar OCD. When making treatment decisions, one should rather rely on simplified distinctions (e.g. stable versus unstable OCD; lateral wall intact versus not intact), alongside other factors such as the severity of symptoms and capitellar physis status.
Footnotes
Collaborators
Patzer T., Ovesen J., Oliviveira A.M., Meehan T.M., Bryan Jr. R.G., Lenobel S., Simeone F.J., Godoy I.R., Taneja A.K., Lambers Heerspink F.O., Porcellini G., Duncan S.F., Patino J.M., Marinelli A., Barco R., Bhatia D.N., Bilsel K., Chang C.Y., van Eck C.F., Kaar S.G., Palmer W.E., Shafritz A.B., Torriani M., Wong T.T., Oppenheimer J.D., Caputo A.E., Waryasz G.R., Arrigoni P., Freehill M.T., van Bergen C.J., Lin D.J., Shahid K.R., and Vicentini J.
IRB approval
Massachusetts General Hospital IRB; protocol #: 2009P001019/MGH.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: One author (RB) receives support from small grants to cover expenses for living and housing during research activities: an amount less than USD 10,000 from the International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine (ISAKOS) Foundation (San Ramon, USA), an amount less than USD 10,000 from the Marti-Keuning Eckhart Foundation (Lunteren, Netherlands), an amount less than USD 10,000 from the Hendrik Muller Foundation, and an amount less than USD 10,000 from the Anna Foundation (Oegstgeest, Netherlands).
