Abstract
Purpose
To develop and evaluate the reliability of an explicit set of parameters and criteria for simple bone cysts (SBCs) and to evaluate the reliability of single versus serial chronological reading methods.
Methods
Radiographic criteria were developed based on the literature and expert consensus. A single anteroposterior/lateral radiograph pair from each of 32 subjects with an SBC was evaluated by three radiologists. A second reading was then conducted using revised criteria including a visual schematic. In the third reading the same images were assessed but radiologists had access to images from two additional time points. Inter-rater reliability was assessed after each reading using kappa (κ) and percentage agreement for categorical and binary parameters and intra-class correlation coefficient (ICC) for continuous parameters.
Results
Parameters that were revised with more explicit definitions including the visual schematic demonstrated consistent or improved inter-rater reliability with the exception of continuous cortical rim present and cyst location in the metaphysis and mid-diaphysis. Cortical rim displayed only slight reliability throughout (κ = -0.008 to 0.16). All other categorical parameters had a percentage agreement above 0.8 or a moderate (κ = 0.41 to 0.60), substantial (κ = 0.61 to 0.80) or almost perfect inter-rater reliability (κ = 0.81 to 1.0) in at least one reading. All continuous parameters demonstrated excellent inter-rater reliability (ICC > 0.75) in at least one reading with the exception of scalloping (ICC = 0.37 to 0.70). Inter-rater reliability values did not indicate an obviously superior method of assessment between single and serial chronological readings.
Conclusion
Explicit criteria for SBC parameters used in their assessment demonstrated improved and substantial inter-rater reliability. Inter-rater reliability did not differ between single and serial chronological readings.
Level of Evidence
Not Applicable
Introduction
Simple bone cysts (SBCs), otherwise known as unicameral bone cysts, are the most common benign bone lesions in children and adolescents.1,2 Patients with SBCs usually present with fracture or pain in the affected limb.3–5 SBCs have a high probability of recurrence and repeat fracture after treatment,3,6–10 and may even cause discrepant limb lengths or angular deformity.1,4,7,11 While benign, SBCs can have a life-long impact on children through risk of fracture and activity restriction; thus, clinicians need reliable measures for their assessment.8,10,12
The radiographic features of SBC can include those used for diagnosis such as the location of the cyst, prognosis such as the cyst index, 13 or as an outcome of treatment, for instance, the Neer classification. 14 Other SBC features are also consistent with various benign bone lesions, such as cortical reaction. However, few of these parameters have been shown to be reliable and the absence of specific criteria may partially explain the conflicting findings as to which characteristics can be used to predict healing or fracture.12,13
Beyond the criteria used to evaluate an evolving disease such as SBC, whether radiographs should either be interpreted individually without review of prior images or assessed in a chronological series has received little attention in the orthopaedic literature. While serial chronological readings may introduce bias from prior examinations, 15 it may be advantageous to have an illustration of the chronological progression of the disease which could lead to assessments more sensitive to change.15,16
The purpose of this study was: a) to evaluate the reliability of radiographic criteria used in diagnosis, prognosis or outcome of SBC as a typical benign bone lesion; and b) to evaluate the reliability of a single
Materials and methods
Patient sample
The patient sample consisted of 32 patients aged four to 14 years (median ten years) with a radiographic diagnosis of an SBC. All participants were previously enrolled in a multi-centre trial for the treatment of an SBC. 17 For each study subject, an anteroposterior (AP) and lateral (Lat) radiograph of the cyst at baseline, one year and two years post-treatment were obtained and de-identified. Approval for this study was granted by the institutional research ethics board of the Hospital for Sick Children, Toronto, Ontario, Canada.
Development of radiographic parameters
A literature search using MEDLINE was conducted to collate radiographic characteristics of SBCs. The following search string was used: bone cysts/and (unicameral or multicameral or simple), radiography, treatment outcome, treatment failure and aneurysmal bone cysts. The search was limited to the English language and filtered to include humans and children 0 to 18 years old. Redundant parameters describing similar characteristics were consolidated or eliminated. Parameters for characterization of bone cysts were defined either by consensus of authors or by modified criteria from existing definitions in the literature (Table 1). In some cases, variables were assigned reference images (Fig. 1) and those parameters related to continuous data such as cortical thinning were divided into categorical scales such as 0% to 25%, 25% to 50%, etc. Factors were grouped into those used for diagnosis, prognosis and outcome assessment of SBC.
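The conversion of a continuous percentage into a categorical scale can be sketched as follows. This is an illustrative sketch only: the text names the bands "0% to 25%, 25% to 50%, etc." but does not say which band a boundary value belongs to, so the two upper bands, the half-open intervals and the function name `to_category` are assumptions made here.

```python
from bisect import bisect_right

# Assumed cut points and labels; only the first two bands are named in the text.
CUTS = [25.0, 50.0, 75.0]
LABELS = ["0% to 25%", "25% to 50%", "50% to 75%", "75% to 100%"]

def to_category(percent: float) -> str:
    """Map a continuous percentage onto an ordinal band.

    Assumption: bands are half-open [low, high), so a value of exactly 25
    is placed in "25% to 50%"; 100 is placed in the top band.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    # bisect_right counts how many cut points are <= percent,
    # which is the index of the band the value falls in.
    return LABELS[bisect_right(CUTS, percent)]

# Hypothetical cortical thinning estimates from one reader.
for value in (10.0, 25.0, 62.5, 100.0):
    print(value, "->", to_category(value))
```

Stating the boundary rule explicitly matters for reliability: two readers who estimate the same value at a cut point would otherwise be free to place it in different bands.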
Diagnostic, prognostic and outcome parameters and their definitions, criteria and interpretation when present in simple bone cysts
parameters that were illustrated on the visual schematic created for the second reading
AP, anteroposterior; Lat, lateral

Product scale developed to represent the cyst grade spectrum.
Radiographic assessment
The readers involved were three board-certified paediatric musculoskeletal radiologists with 29, seven and 13 years of experience after training, respectively (PB, ASD, JS). Each radiograph was assigned a subject ID and images were read using picture archiving and communications system workstations with the default calibration settings and measurement tools available in clinical practice. The radiologists had no prior discussion regarding measurement or specific instructions other than the criteria provided. Data were recorded on a paper case report form and later entered into a secure REDCap database. 18
First reading
AP/Lat images from one of the three time points available for each of the 32 patients were randomly selected and reviewed by the three radiologists. After an initial analysis, with the assistance of one of the radiologists (ASD), more precise definitions were added to the criteria with particular attention to those with poor reliability (Table 1). A visual schematic illustrating how to classify various cystic features was also created (Fig. 2).

Visual schematic developed to illustrate criteria for radiologists in the second and third readings.
Second reading
Upwards of one year after the first reading, the two radiologists (PB, JS) not involved in the revisions to the parameters evaluated the same 32 AP/Lat radiographs using the revised criteria, case report form and the visual schematic.
Third reading
At least two weeks after the second reading, the two radiologists (PB, JS) evaluated the same 32 radiographs using the criteria. However, each radiologist had the ability to view all three images for the corresponding patient simultaneously and had knowledge of their chronological order (serial chronological method).
Statistical analysis
Based on the number of radiologists per reading, sample size calculations yielded estimates of 32 patients required for the first reading and 19 required for the second and third reading. 19 All inter-reader reliability was determined using the intra-class correlation coefficient (ICC) for continuous parameters and the kappa statistic (κ) and percentage agreement for categorical or binary parameters. ICC values range from 0 to 1 and values above 0.75 were considered to have excellent reliability. 20 Kappa values range from -1 to 1 21 and were categorized as poor (κ < 0), slight (κ = 0 to 0.20), fair (κ = 0.21 to 0.40), moderate (κ = 0.41 to 0.60), substantial (κ = 0.61 to 0.80) and almost perfect (κ = 0.81 to 1). 22 The kappa statistic tends to penalize the reliability of data with an imbalanced distribution that is heavily skewed towards positive agreement,23–27 as agreement attributed to chance is then assumed to be high which then lowers the kappa value. 28 Therefore, percentage agreement was also used as a primary measure of reliability in this study.
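The effect of a skewed distribution on kappa can be illustrated with a short sketch. The ratings below are invented for illustration and are not study data. The sketch assumes unweighted Cohen's kappa for two raters, because the text does not state which kappa variant was used, and the function names are hypothetical.

```python
from collections import Counter

def percentage_agreement(a, b):
    """Share of subjects that both raters placed in the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters (variant assumed here)."""
    n = len(a)
    p_o = percentage_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each rater's marginal frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical category throughout
    return (p_o - p_e) / (1.0 - p_e)

def kappa_band(kappa):
    """Band labels as quoted in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical ratings of 32 subjects ("P" = present, "A" = absent).
# Skewed: nearly every subject is rated "present" by both readers.
skew_a = ["P"] * 29 + ["P", "P", "A"]
skew_b = ["P"] * 29 + ["A", "A", "P"]
# Balanced: the same three disagreements, but both categories are common.
bal_a = ["P"] * 15 + ["A"] * 14 + ["P", "P", "A"]
bal_b = ["P"] * 15 + ["A"] * 14 + ["A", "A", "P"]

for name, a, b in (("skewed", skew_a, skew_b), ("balanced", bal_a, bal_b)):
    k = cohen_kappa(a, b)
    print(f"{name}: agreement={percentage_agreement(a, b):.3f} "
          f"kappa={k:.3f} ({kappa_band(k)})")
```

Both hypothetical data sets have 29 of 32 subjects in agreement, yet kappa is slightly below zero for the skewed set and about 0.81 for the balanced set. The only difference is the chance agreement term, which is about 0.91 when almost all ratings fall in one category and 0.5 when the categories are evenly used.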
Results
Out of the 32 patients, 22 (69%) were male and ten (31%) were female. There were 24 (75%) and eight (25%) SBCs located in the humerus and femur, respectively. A summary of the resulting reliability values for bone measurements and each of the parameters can be found in Table 2. All categorical parameters demonstrated moderate to almost perfect kappa values during at least one of the readings with the exception of the percentage of continuous cortical rim and cyst location, which consistently had a poor or slight kappa value. The percentage of scalloping present around the cyst border was the only continuous parameter to consistently demonstrate less than excellent reliability.
Kappa (κ), percentage agreement (%) and intra-class correlation coefficient (ICC) values for all parameters
Single hyphens indicate value not calculated due to data classification as continuous or categorical. Dashed lines are indicative of a parameter that was irrelevant to that particular stage of reading. n, sample size; CI, confidence interval
Within the diagnostic parameters, cyst location and periosteal reaction demonstrated variability in kappa values; however, both had percentage agreement values above 0.8 in at least one of the readings. Loculation had fair to substantial reliability across readings. Prognostic parameters based on continuous data demonstrating consistently excellent reliability included cyst dimensions, cyst volume and tubulation measurements. Cyst activity and indices demonstrated less than excellent reliability in the first reading only; in the other readings they also showed excellent reliability. Prognostic parameters based on categorical data, including the presence and absence of tubulation, whether the tubulation caused expansion to the bone and the presence and absence of scalloping, demonstrated slight to substantial kappa values but had percentage agreement values above 0.8 across the second and third readings. The outcome parameter of fracture had fair to substantial kappa values but percentage agreement values above 0.8. Cyst grade had fair to moderate reliability. Bone measurements consistently demonstrated excellent reliability with the exception of the width of the nearest physis in the first reading.
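For readers unfamiliar with the ICC used for the continuous parameters, the calculation can be sketched as follows. The form of ICC used in this study is not stated, so the sketch assumes the two-way random effects, absolute agreement, single-measurement form, usually written ICC(2,1). The data are the six-subject, four-judge worked example from Shrout and Fleiss (1979), not measurements from this study, and the function name is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per subject and one value per rater in each row.
    The choice of this ICC form is an assumption made for illustration.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Worked example from Shrout and Fleiss (1979): 6 subjects rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(f"ICC(2,1) = {icc:.2f}; excellent by the 0.75 threshold: {icc > 0.75}")
```

Shrout and Fleiss report ICC(2,1) = 0.29 for this table, which would fall well short of the 0.75 threshold used here for excellent reliability. Because this form penalizes systematic differences between raters, a reader who consistently measures larger than a colleague lowers the ICC even when the two rank subjects identically.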
Overall, the second reading using the revised criteria and schematic displayed either consistent (difference < 0.10) or higher reliability values compared with the first reading for all continuous parameters and for all categorical parameters with the exception of the location in the metaphysis and mid-diaphysis and cortical rim for percentage agreement. Using the schematic and revised criteria, loculations, cyst activity, widest width of tubulation, expected bone width in tubulated area, percentage of scalloping around the cortical rim and bone measurements all demonstrated improved reliability between the first and second reading.
Comparing single and serial-chronological methods of reading
Between the single and serial chronological series methods, inter-rater reliability values did not indicate an obviously superior method. Parameters for which kappa values and percentage agreement had improved by more than 0.2 (20%) were cyst location in the metaphysis and mid-diaphysis. Number of loculations had a kappa and percentage agreement value that decreased by more than 0.2 (20%). Continuous parameters had ICC values that remained stable across methods.
Discussion
Reliable radiographic assessment is essential for the diagnosis, prognosis and outcome of benign bone lesions. Although parameters such as the cyst grade, index and activity had explicitly defined criteria in the literature,3,13,14,29 the remainder of SBC parameters did not. This study initially evaluated the inter-rater reliability of parameters based on descriptions that were available, and after a schematic and precise criteria were added, reliability generally improved. Thus, the translation of these descriptive characteristics into specific binary categories, ordinal scales and quantitative measurements depicted in the revised criteria should guide clinicians when evaluating benign bone lesions such as SBCs.
Many of the SBC parameters that were evaluated in this study are also relevant to other bone lesions.30–36 In contrast to prior studies, one notable trend is that the kappa values for the categorical parameters in our study displayed greater variance across readings than those reported in prior literature.30–33,35 The likely explanation is that whereas prior studies almost exclusively used binary scales (i.e. absent or present), this study used ordinal scales. Multiple categories provide clinicians with more specific information but may affect reliability.
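The influence of the number of categories on kappa can be illustrated with a small sketch. The grades are invented, not study data. The sketch assumes unweighted Cohen's kappa for two readers, and it shows one possible pattern rather than a general rule: collapsing categories does not always raise kappa.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters (variant assumed here)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical four-level ordinal grades (0 to 3) from two readers, 12 subjects.
# The three disagreements are all between adjacent grades.
grade_a = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
grade_b = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]

# Collapse to a binary scale: grade 0 = absent, any higher grade = present.
binary_a = [g > 0 for g in grade_a]
binary_b = [g > 0 for g in grade_b]

kappa_ordinal = cohen_kappa(grade_a, grade_b)
kappa_binary = cohen_kappa(binary_a, binary_b)
print(f"ordinal kappa = {kappa_ordinal:.2f}, binary kappa = {kappa_binary:.2f}")
```

In this hypothetical case two of the three adjacent-grade disagreements disappear once the scale is collapsed, so kappa rises from about 0.67 to 0.75. Unweighted kappa also treats a one-grade disagreement the same as a three-grade disagreement, which is a further reason ordinal scales can appear less reliable than binary ones.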
Diagnostic factors, such as the cyst's location,4,7 number of loculations,7,8,37 the presence of a cortical rim,7,29,38 and evidence of periosteal reaction30,39 have been used to identify SBCs. The inter-rater reliability of the number of loculations and of periosteal reaction in this study was similar to or better than that reported in studies of other benign bone lesions such as aneurysmal bone cysts 30 or malignant bone lesions, so these are useful parameters.31–33 The criteria for the cortical rim demonstrated consistently poor reliability in this study due to the ambiguity of the cyst walls. Comparable reliability values for cortical rim in the literature have varied depending on whether the study evaluated cortical rim thickening or destruction.31–33 The lack of a consistent method of assessment and the variability in reliability indicate that this parameter is unreliable; we therefore recommend against its use to assess SBCs.
Prognostic information such as scalloping of the cyst wall and tubulation 40 (widening) have been associated with non-healing and cyst recurrence. Additionally, active cysts1,29,37 or an increase in cyst volume7,41 have been associated with fracture. Prior studies of lesions such as enchondroma and chondrosarcoma have demonstrated high percent agreement, 33 with slight to fair kappa values. 31 The reliability of tubulation, lesion activity and volume has not been previously investigated, but these parameters demonstrated excellent values in this study and thus are useful. The cyst index as proposed by Kaelin and MacEwen 13 is another prognostic tool that has been used to assess SBCs.1,8,10,42 Vasconcellos et al 34 reported poor inter-observer reliability, most likely due to the use of trapezoidal shapes to approximate the cyst area for the equation. This can create unreliability due to the variable configuration of SBCs. To address this, one study used tracing software to determine the area and reported substantial reliability. 35 The current study created two modified cyst indices (A and B) based on a rectangular approximation. Both demonstrated excellent reliability in the second and third readings. However, as Cyst index B was more reliable in the first reading and also accounts for cyst depth, this is the preferred method.
Currently, the most commonly used measure of outcome in the form of cyst healing is the Neer Classification, 14 and the modified version by Hashemi-Nejad and Cole. 3 Reliability values in this study were consistently lower than previous studies reporting substantial or almost perfect reliability (κ = 0.76 to 0.88).35,36 This raises concerns that cyst healing may not be a reliable measure of treatment success. Identifying the presence of fracture had an excellent percentage agreement as previously reported. 33
The second objective of this study was to assess the reliability of reading radiographs individually
There are many orthopaedic measurements that lack specific criteria, which can cause variability in clinical evaluations. Many studies of measurement variability assess reliability once and often conclude unreliability. This study was relatively unique in that after finding poor reliability values in the first reading, more specific criteria and a schematic were developed which enhanced the reliability of all but two parameters. As a result, variables that we recommend for use in the assessment of SBCs include all that were studied with the exception of the cortical rim. Future studies should aim to not only evaluate reliability but to also implement strategies to enhance reliability.
The main limitation for the current study is the generalizability of results. Reliability values and measurements were obtained in a study setting from two to three radiologists, all with long-term experience. However, radiologists did not receive a formalized tutorial session prior to readings, to mimic clinical practice and to minimize a false elevation in reliability due to a study setting. Second, this study only investigated variability due to the clinician. 44 Other sources of variance such as patient or procedure may affect the reliability and thus the reliability found in this study should be considered a slight overestimate of the true reliability. 44 Finally, this study evaluated the radiographic assessment of SBCs and, therefore, the results cannot necessarily be extrapolated to other benign bone lesions.
In conclusion, the majority of the radiographic criteria developed for the diagnosis, prognosis and outcome of SBC were found to be reliable and improved with explicit criteria and a visual schematic. Inter-rater reliability did not differ between single and serial-chronological readings.
Footnotes
Acknowledgements
We would like to thank Peggy Law for her help on this project.
SC: data collection, all of the above.
