Abstract
Background
Radiographs are part of routine clinical care after radial head arthroplasty (RHA). The aim of this diagnostic study was to assess the interobserver reliability of radiographic assessment following RHA.
Methods
Anteroposterior (AP) and lateral radiographs of 24 consecutive patients who underwent press-fit bipolar RHA were evaluated with respect to 14 parameters by 14 orthopaedic surgeons specializing in elbow surgery: shaft loosening (AP, lateral), subcollar bone resorption, nonbridging heterotopic ossification, capitellar erosion, capitellar osteopenia, implant size, ulnohumeral joint gapping, ulnohumeral joint degeneration, proximal radio-ulnar joint congruency, stem size, stem positioning (AP, lateral) and component dissociation or polyethylene wear of the head with increased angulation of the head in relation to the shaft. Observer agreement was evaluated using the multirater kappa (κ) measure.
Results
Nine of 14 parameters had poor interobserver agreement [κ = 0.0 to 0.20, confidence interval (CI) = 0.0 to 0.31]. Four parameters had fair agreement: subcollar bone resorption (κ = 0.27, CI = 0.12 to 0.40), capitellar erosion (κ = 0.30, CI = 0.20 to 0.40), ulnohumeral joint degeneration (κ = 0.35, CI = 0.22 to 0.51) and stem positioning in AP view (κ = 0.24, CI = 0.14 to 0.36). One parameter had moderate agreement: nonbridging heterotopic ossification (κ = 0.47, CI = 0.31 to 0.64).
Conclusions
The overall interobserver reliability of radiographic assessment following press-fit bipolar RHA was poor among experienced elbow surgeons. Therefore, radiographic evaluation after RHA should be interpreted with caution when making treatment decisions.
Introduction
Radial head arthroplasty (RHA) is indicated for selected displaced or comminuted radial head fractures, especially in unstable elbows or when stable reconstruction cannot be achieved with open reduction and internal fixation.1–5 RHA is also performed in patients with persistent post-traumatic elbow symptoms, including nonunion and malunion of the radial head, and symptomatic elbow instability following previous excision of the radial head.3,4,6–8
Clinical and radiographic follow-up is important after RHA because loosening of the prosthesis, osteolysis, erosion of the capitellum, and implant failure can affect long-term outcomes.1,9,10 Conventional radiographs are widely used in orthopaedic practice to monitor adverse events after RHA.1,9,10 Several studies have demonstrated that radiographic abnormalities (e.g. heterotopic ossification, subcollar bone resorption, and signs of implant loosening) are associated with poorer clinical outcomes after RHA.9–11 Ha et al. 9 reported an association between radiographic abnormalities and the presence of symptoms (e.g. pain, restricted range of motion and instability). Additionally, Rotini et al. 10 found a significant association between heterotopic ossification and restricted range of motion at 2-year follow-up.
Ideally, radiographic assessment would be reliable and consistent among surgeons because it is widely used for follow-up of RHA and clinical research purposes. However, interpretation of radiographs may vary from surgeon to surgeon,12–14 and thus it would be useful to determine whether radiographic assessment for signs of malpositioning and loosening in RHA is reliable among experienced elbow surgeons. To our knowledge, this has not yet been investigated.
The purpose of the present study was to evaluate the interobserver reliability of radiographic assessment following press-fit bipolar radial head arthroplasty. We hypothesized that there would be moderate interobserver agreement among experienced elbow surgeons for the evaluation of 14 radiographic parameters in press-fit bipolar RHA.
Materials and methods
Orthopaedic surgeons from several countries who specialize in elbow surgery were invited to participate in this interobserver study. They were asked to evaluate postoperative radiographs of 24 consecutive patients who had undergone press-fit bipolar radial head arthroplasty (RHS®; Tornier, Montbonnot-Saint-Martin, France). Indications for RHA were post-traumatic symptoms following a radial head fracture, such as persistent pain, restricted range of motion and instability. The mean (SD) radiographic follow-up was 27 (10) months. The right elbow was treated in 13 patients and the left elbow in 11 patients.
Plain anteroposterior (AP) and lateral radiographs of the elbow were available for each patient. A research fellow not involved in patient care removed all identifying information from the radiographs and uploaded the Digital Imaging and Communications in Medicine (DICOM) files to a web-based study platform (www.shoulderelbowplatform.com). Observers evaluated the radiographs using a built-in DICOM viewer, which allowed them to adjust brightness, contrast and window leveling and to zoom in. All questions related to one case had to be completed before proceeding to the next case. The observers completed the study at their own pace and in their own time, and could use different computers if necessary.
Radiographic assessment after radial head arthroplasty (n = 14).
AP, anteroposterior.
This retrospective study was approved by the Medical Ethics Committee of our institution. Data were collected as part of routine clinical care and each patient consented that their patient data could be used for scientific purposes.
Statistical analysis
Agreement among observers was calculated using a multirater kappa (κ), as described previously.20–22 Point estimates and two-sided 95% confidence intervals (CIs) were calculated for each radiographic parameter. The multirater kappa is a commonly used statistic to describe chance-corrected agreement in interobserver studies.20–22 A value of 0 indicates no agreement beyond chance alone, a value of –1.00 indicates total disagreement and a value of +1.00 represents total agreement.21,22 A value of 0.01 to 0.20 is defined as poor agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement.21,22 Subgroups (number of years in practice and number of RHAs performed per year) were compared using the Z-test.23,24 Multirater kappas were calculated using Stata, version 12.0 (StataCorp LP, College Station, TX, USA).
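The sketch below illustrates one common formulation of a multirater kappa, Fleiss' kappa, together with the agreement categories defined above. It is an assumption that this matches the statistic Stata computed for this study; the function names `fleiss_kappa` and `agreement_label` are hypothetical, and confidence intervals are not computed here.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' multirater kappa for a cases x categories count table.

    counts[i][j] is the number of observers who assigned case i to
    response category j. Every case must be rated by the same number
    of observers.
    """
    n_cases = len(counts)
    if n_cases == 0:
        raise ValueError("need at least one case")
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two observers per case")
    for row in counts:
        if len(row) != n_categories or sum(row) != n_raters:
            raise ValueError("every case needs the same categories and number of ratings")

    # Observed agreement for each case: share of observer pairs that agree.
    p_cases = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_cases) / n_cases

    # Chance agreement from the overall proportion of ratings in each category.
    total = n_cases * n_raters
    p_categories = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_chance = sum(p * p for p in p_categories)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


def agreement_label(kappa: float) -> str:
    """Map kappa to the agreement categories used in this study.

    Values at or below zero are grouped with 'poor', matching the Results,
    where kappa = 0.0 to 0.20 is reported as poor agreement.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For a single parameter, each row of `counts` would be one of the 24 cases and each column one response category, so with 14 observers every row sums to 14. Assuming the implant-size categories were ('too small', 'just right', 'too large'), the case in which eight observers rated the size as just right and six as too large would be encoded as `[0, 8, 6]`.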
Results
Participants
Observer demographics (n = 14).
Interobserver agreement
Observers had poor interobserver agreement in nine of 14 evaluated radiographic parameters (κ = 0.0 to 0.20, CI = 0.0 to 0.31). Two examples of parameters with poor agreement (i.e. implant size and loosening of the shaft) are shown in Figs 1 and 2. There was fair agreement in four parameters: signs of subcollar bone resorption (κ = 0.27, CI = 0.12 to 0.40), signs of erosion of the capitellum (κ = 0.30, CI = 0.20 to 0.40), signs of degeneration of the ulnohumeral joint (κ = 0.35, CI = 0.22 to 0.51) and stem positioning in AP view (κ = 0.24, CI = 0.14 to 0.36). There was moderate agreement for assessment of nonbridging heterotopic ossification (κ = 0.47, CI = 0.31 to 0.64) (Fig. 3). Additional analysis performed by dichotomizing the responses of shaft loosening (parameters 1 and 2) into ‘loosening present’ or ‘loosening not present’ did not improve interobserver agreement. A detailed summary for each parameter is shown in Table 3.
Radiographs of a 55-year-old woman who underwent radial head arthroplasty after a comminuted radial head fracture and dislocation. Observers disagreed on radial head implant size (diameter): eight observers assessed the implant size as just right and six assessed it as too large.
Radiographs of a 48-year-old woman who underwent radial head arthroplasty for post-traumatic symptoms following radial head excision and a fracture of the proximal ulna that had previously been treated operatively. Observers disagreed on signs of loosening of the shaft: six of 14 observers reported shaft loosening on the anteroposterior view and 10 of 14 on the lateral view.
Radiographs of a 56-year-old man who underwent radial head arthroplasty and lateral collateral ligament repair for a malunion following a comminuted radial head fracture. All observers agreed on the presence of nonbridging heterotopic ossification.
Interobserver agreement of radiographic assessment after radial head arthroplasty (n = 14).
AP, anteroposterior.


For each of the 14 parameters, subgroup analysis showed no significant difference in interobserver agreement between surgeons who performed more than five RHAs per year and surgeons who performed between one and five RHAs annually (p > 0.05). For parameter 5 (presence of capitellar erosion) and parameter 14 (component dissociation or polyethylene wear of the head with increased angulation of the head in relation to the shaft), there was greater agreement among surgeons who had been in practice for ≤ 10 years than among surgeons who had been in practice for > 10 years (p < 0.001). No significant difference in agreement was found for the remaining 12 parameters (p > 0.05).
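A minimal sketch of a Z-test for comparing two kappa values, such as those of two surgeon subgroups, is given below. It assumes the common normal-approximation form z = (κ1 − κ2) / √(SE1² + SE2²), with standard errors recovered from symmetric 95% CIs; whether the cited method23,24 uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math
from typing import Tuple


def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate a standard error from a symmetric two-sided 95% CI."""
    return (upper - lower) / (2.0 * z_crit)


def compare_kappas(k1: float, se1: float, k2: float, se2: float) -> Tuple[float, float]:
    """Z-test for the difference between two kappa estimates.

    Treats the two estimates as independent and returns
    (z, two-sided p-value) under a normal approximation.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    if se_diff == 0.0:
        raise ValueError("standard errors must not both be zero")
    z = (k1 - k2) / se_diff
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Recovering a standard error from the CI width is only an approximation, because CIs for kappa are not always symmetric; where the software reports a standard error directly, that value would be preferable. Treating the subgroup estimates as independent is also an approximation when both subgroups rate the same cases.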
Discussion
Plain radiographs are routinely obtained during follow-up after radial head arthroplasty, so it is important to determine whether radiographic assessment is reliable among surgeons. The present study shows that the interobserver agreement for radiographic evaluation after placement of a press-fit bipolar radial head prosthesis is poor, even among experienced elbow surgeons. Only one of 14 parameters (i.e. nonbridging heterotopic ossification) showed moderate interobserver agreement. Fair agreement was found in four parameters: signs of degeneration of the ulnohumeral joint, signs of capitellar erosion, signs of subcollar bone resorption and stem positioning in AP view. Interobserver agreement was poor for the remaining parameters.
No direct comparison can be made because RHA is a fairly new procedure and, to our knowledge, no studies have investigated the reliability of radiographic assessment following it. Nevertheless, similar values were reported in previous studies that evaluated the reliability of radiographic assessment in total hip arthroplasty.12–14 Smith et al. 14 found limited interobserver agreement (κ = 0.26) in an assessment of radiolucency and loosening in the Gruen zones, a system that divides the interface between the femoral component and the femur into seven zones.25 Muir and colleagues described comparable kappa values (κ = 0.24 to 0.41) in an evaluation of the Engh Grading Scale, a scale that is widely used in the follow-up of uncemented total hip arthroplasty.13 They found no difference in interobserver variability among observers from different backgrounds: two arthroplasty surgeons, a senior orthopaedic resident and a radiologist.13 In our study, for two parameters (presence of capitellar erosion, and component dissociation or polyethylene wear of the head with increased angulation of the head in relation to the shaft), there was greater interobserver agreement among surgeons who had been in practice for ≤ 10 years than among surgeons in practice for > 10 years. Because this difference was found in only two parameters, and because interobserver agreement was still only fair at best, we consider that no firm conclusions can be drawn from the subgroup analysis performed in the present study.
Limited interobserver reliability of radiographic assessment may be attributable to its subjective nature and to the fact that most parameters require a categorical response. For example, in our study, to assess stem size, observers had to indicate whether the stem was ‘too large’, ‘too small’ or ‘just right’. Similarly, to assess gapping of the ulnohumeral joint, observers responded with either ‘yes’ or ‘no’. We hypothesize that replacing qualitative (categorical) parameters with quantitative (continuous) measurements may improve interobserver reliability; however, Al-Ahaideb et al. 12 found only slightly greater reliability after adding a quantitative component to the radiographic evaluation of total hip arthroplasty. Although overall interobserver agreement was poor in our study, it is important to note that agreement for nonbridging heterotopic ossification was moderate (κ = 0.47).
The findings of the present study have implications for radiographic assessment in the clinical setting and for its use in research. Radiographic evaluation after RHA is a routine part of follow-up surveillance in orthopaedic practice. However, given the inconsistency among observers in the evaluation of 24 consecutive cases, postoperative radiographs should be interpreted with caution and should not be relied on alone. The overall poor interobserver agreement should also be taken into consideration when multiple observers perform radiographic assessment in longitudinal follow-up studies of RHA. Future studies should focus on standardizing radiographic assessment following RHA. Precise and universal definitions and terminology for radiographic parameters need to be developed to improve interobserver agreement among surgeons. Furthermore, training in radiographic assessment is needed because RHA is a relatively new procedure with which many surgeons are not yet intimately familiar. Future studies should also investigate observer agreement for different designs of radial head prostheses, with and without the use of cement.
Some limitations of the present study should be taken into consideration. First, all patients in the present study had a press-fit bipolar radial head prosthesis. Therefore, our findings may not be applicable to other types of ingrowth radial head prostheses or to cases in which cement was used. Second, none of the observers received specific training prior to participating in the present study. Interobserver agreement may have been greater if the observers had been trained in how to evaluate plain radiographs. Third, to date, none of the 24 consecutive patients has needed subsequent surgery at a mean follow-up of 27 months. This may explain why only a few obvious radiographic abnormalities were observed.
Conclusions
The present study demonstrates overall poor interobserver reliability of radiographic assessment in press-fit bipolar radial head arthroplasty among surgeons with elbow expertise. In a clinical or research setting, caution is warranted when interpreting postoperative radiographs in RHA.
Footnotes
Acknowledgments
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical Review and Patient Consent
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. For this type of study, formal consent is not required. Ethics Committee Review (Amphia Hospital, Breda, The Netherlands) was not required for the present study according to Dutch law because it was a retrospective review that used anonymous data.
