Abstract
Introduction
The updated Bosniak classification in 2019 (v2019) addresses vague imaging terms and revises the criteria with the intent to categorise a higher proportion of cysts in lower-risk groups and reduce benign cyst resections. The aim of the present study was to compare the diagnostic accuracy and inter-observer agreement rate of the original (v2005) and updated classifications (v2019).
Method
Resected/biopsied cysts were categorised according to Bosniak classifications (v2005 and v2019) and the diagnostic accuracy was assessed with reference to histopathological analysis. The inter-observer agreement of v2005 and v2019 was determined.
Results
The malignancy rate of the cohort was 83.6% (51/61). Using v2019, a higher proportion of malignant cysts were categorised as Bosniak ≥ III (88.2% vs 84.3%) and a significantly higher percentage were categorised as Bosniak IV (68.6% vs 47.1%; p = 0.049) in comparison to v2005. v2019 would have resulted in fewer benign cyst resections (13.5% vs 15.7%). Calcified cysts had a lower rate of malignancy than non-calcified cysts (57.1% vs 91.5%; RR, 0.62; p = 0.002). The inter-observer agreement of v2005 was higher than that of v2019 (kappa, 0.70 vs 0.43).
Discussion
The updated classification improves the categorisation of malignant cysts and reduces benign cyst resection. The low inter-observer agreement remains a challenge to the updated classification system.
Introduction
The Bosniak classification stratifies renal cysts using radiological appearances to predict the risk of malignancy and guide management. Based on malignancy rates, the literature supports discharge of Bosniak I and II cysts, imaging follow-up of Bosniak IIF cysts and surgical intervention for Bosniak III and IV cysts. Categorising cysts according to the original Bosniak classification has a number of shortcomings. It suffers from subjective criteria, significant interobserver variation and frequent inappropriate resection of benign cysts.1–5 Low-grade malignant cysts are also frequently resected, often conferring little survival benefit. 6
With the aim of addressing these limitations, Silverman et al. developed the updated classification.7,8 The updated guidelines clarify vague imaging terms and revise the criteria with a view to reducing the rate of benign cyst resections and improving inter-observer agreement. Recent studies suggest an improvement in inter-observer agreement with the updated classification; however, the inter-observer agreement between radiologists of differing levels of experience remains unclear. Whether the updated classification system improves the diagnostic accuracy of Bosniak categorisation for malignancy is also uncertain.
The present study aims to compare the diagnostic accuracy and inter-observer agreement rate between the updated (v2019) and original (v2005) classification systems.
Method
Population cohort
This is a further update to our previous publication, in which the process of cohort identification was described. 5 Briefly, all patients with complex renal cysts diagnosed between January 2009 and December 2019 over a defined geographical area of approximately 420,000 people were recorded. Those who did not have a triple-phase characterisation CT scan or histopathology from a resected/biopsied Bosniak cyst were excluded from the analysis. Data were collected from multiple electronic databases using deterministic records-linkage methodology and patients were tracked using a unique 10-digit Community Health Index number. This yielded demographic data, CT images for characterisation and pathological analysis.
Imaging protocol
It is well established that reliable characterisation of renal lesions requires multi-phasic CT. 5 At our institution the CT protocol for the characterisation of renal lesions has evolved over time. Previously, only pre-contrast and nephrographic (100 s) phases were included. More recently an arterial phase (20–30 s) was added to aid assessment of vascular extension of tumours and to delineate vascular anatomy prior to possible surgery. It has also been suggested that, in rare cases, hyper-vascular RCCs may be better detected in the arterial phase due to rapid washout.6,7 Our dataset reflects this evolution, with both dual- and triple-phase imaging being used in this study, which includes patients from 2009 to 2018. In all cases, patients underwent a minimum of dual-phase imaging following IV administration of 100 ml of an iodine-based contrast agent (Omnipaque) (Figure 1).9–11

Example of CT protocol for a Bosniak IV cyst (v2019). A – precontrast; B – arterial phase; C – nephrographic phase.
Comparison of v2005 and v2019
The CT scans were reviewed by one well-experienced uroradiologist (more than five years’ experience in uroradiology) and one less-experienced uroradiologist (less than five years’ experience). The Bosniak cysts were first characterised using v2005. After an interval of at least one month, the cysts were re-characterised according to v2019 by the same uroradiologists (blinded to the results of the v2005 analysis). The diagnostic accuracy of both classification systems was calculated with reference to the histopathological analysis. The degree of agreement between v2005 and v2019 was assessed using quadratic weighted kappa scores.
Cyst characteristics were compared between the benign and malignant sub-groups using chi-squared analysis. Statistical significance was considered for p-values less than 0.05.
Interobserver agreement
The interobserver agreement of v2005 was determined by comparing the categorisations of the above well-experienced uroradiologist (more than five years’ experience) with those of the less-experienced uroradiologist (less than five years’ experience). This was repeated for v2019. Both well- and less-experienced radiologists categorised the cysts to offer insight into the usability of the updated classification system between those with varying experience levels. Inter-observer agreement was quantified by calculating a quadratic weighted kappa score. All statistical tests were conducted using the STATA/IC 19.1 statistical package.
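As an illustration of the agreement statistic described above, the following is a minimal Python sketch of a quadratic weighted kappa for two raters on the ordinal Bosniak scale. It is not the statistical package routine used in the study; the function name `quadratic_weighted_kappa`, the category ordering and the example ratings are hypothetical and are included only to show how the weights penalise larger disagreements more heavily.

```python
from collections import Counter

# Ordinal Bosniak categories, ordered from lowest to highest risk (assumed ordering).
CATEGORIES = ["I", "II", "IIF", "III", "IV"]


def quadratic_weighted_kappa(rater_a, rater_b, categories=CATEGORIES):
    """Cohen's kappa with quadratic disagreement weights for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Observed joint counts and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 for the most distant pair.
            w = ((i - j) ** 2) / ((k - 1) ** 2)
            weighted_observed += w * observed[(i, j)] / n
            weighted_expected += w * marg_a[i] * marg_b[j] / (n * n)

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings for six cysts; these are NOT data from the study.
experienced = ["II", "IIF", "III", "IV", "IV", "III"]
less_experienced = ["II", "III", "III", "IV", "III", "IIF"]
print(f"kappa = {quadratic_weighted_kappa(experienced, less_experienced):.2f}")
```

Under this weighting, a disagreement of I versus IV carries the full penalty of 1, whereas an adjacent-category disagreement such as IIF versus III carries 1/16 on a five-category scale.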
Results
There were 61 complex renal cysts included in the study, identified in 59 patients (median age, 62 years; M:F, 1.3:1.0). The malignancy rate of the cohort was 83.6% (51/61), confirmed by resection in 60 cases and biopsy in 1 case. The histopathology of the malignant cysts was as follows: 80.4% clear cell (41), 13.7% papillary (7), 3.9% chromophobe (2) and 2.0% collecting duct tumour (1). The 10 benign cysts comprised seven simple cysts, two oncocytomas and one mixed epithelial and stromal tumour. Radiological findings of the cysts categorised by v2005 and v2019 are reported in Tables 1 and 2, respectively.
v2005 findings.
v2019 classification findings.
Diagnostic accuracy of v2005 and v2019
The categorisation of benign and malignant cysts is demonstrated in Table 3. Using v2019, 88.2% of malignant cysts (45/51) were categorised as Bosniak III/IV versus 84.3% (43/51) using v2005 (p = 0.56). When using Bosniak ≥ III as a threshold for surgery, v2019 would have resulted in the resection of fewer benign cysts (7/52; 13.5%) in comparison with v2005 (8/51; 15.7%; p = 0.97). Sixty-nine percent of malignant cysts (35/51) were categorised as Bosniak IV using v2019, compared to 47.1% (24/51) using v2005 (p = 0.049). Of the cysts categorised as III/IV using v2019, the malignancy rate was 86.5% (45/52), compared with 84.3% (43/51) using v2005 (p = 0.75) (Table 4).
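The proportions quoted above follow directly from the reported counts, and the Python sketch below recomputes them. It also applies a Pearson chi-squared test to the Bosniak IV comparison (35/51 vs 24/51) purely as an illustration. The text does not state which test produced the reported p-values for these comparisons, and the helper names `proportion` and `chi_squared_2x2` are assumptions of this sketch, so the output is not expected to reproduce p = 0.049 exactly.

```python
import math


def proportion(events, total):
    """Return events / total as a fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total


def chi_squared_2x2(a, b, c, d, yates=False):
    """Pearson chi-squared test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). The optional Yates continuity correction
    is an assumption of this sketch; the source does not specify it.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the paragraph above.
print(f"Malignant cysts in III/IV, v2019: {proportion(45, 51):.1%}")
print(f"Malignant cysts in III/IV, v2005: {proportion(43, 51):.1%}")
print(f"Benign among III/IV, v2019: {proportion(7, 52):.1%}")
print(f"Benign among III/IV, v2005: {proportion(8, 51):.1%}")

# Malignant cysts graded Bosniak IV versus not IV under each version.
chi2, p = chi_squared_2x2(35, 51 - 35, 24, 51 - 24)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```

Because the same cysts were graded under both versions, a test for independent groups is a simplification here; a paired analysis would need the per-cyst cross-tabulation, which is not reproduced in the text.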
Categorisation of benign and malignant cysts, v2005 and v2019.
Categorisation of cysts by Bosniak grade, v2005 and v2019.
Cyst characteristics were compared between the benign and malignant groups. Cysts with calcification had lower rates of malignancy versus those without calcification (57.1% vs 91.5%; RR, 0.62; p = 0.002) in the v2019 classification. Other than this finding, no specific cyst characteristic was positively or negatively associated with malignancy (p > 0.05).
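The relative risk quoted above is the ratio of the two malignancy rates. A minimal Python sketch, using only the percentages reported in the text (the function name `relative_risk` is hypothetical):

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of the event rate in the exposed group to that in the unexposed group."""
    if not (0 <= risk_exposed <= 1 and 0 < risk_unexposed <= 1):
        raise ValueError("risks must be proportions, with a non-zero comparator")
    return risk_exposed / risk_unexposed


# Malignancy rates reported for calcified (57.1%) and non-calcified (91.5%) cysts.
rr = relative_risk(0.571, 0.915)
print(f"RR = {rr:.2f}")  # prints "RR = 0.62"
```

A relative risk below 1 indicates that, in this cohort, calcified cysts were less often malignant than non-calcified cysts.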
The agreement rate between v2005 and v2019 was poor, demonstrated by a weighted kappa of 0.48 (Table 5). v2019 upgraded 26.2% (16/61) of cysts and downgraded 11.5% (7/61) of cysts. Of the 16 upgraded cases in v2019, 10 were due to enhancing wall nodularity and eight to enhancing septal nodularity, otherwise categorised as Bosniak IIF/III in v2005. There was also one case of septal irregularity categorised as Bosniak III on the updated system which was categorised as Bosniak IIF with v2005.
Comparison of categorisation, v2005 versus v2019.
Interobserver agreement of v2005 and v2019
Interobserver agreement between the two radiologists using v2005 was good, with a kappa of 0.70 (Table 6). The interobserver agreement using v2019 was lower, with a kappa of 0.43 (Table 6).
Interobserver agreement; v2005 and v2019.
Discussion
The present study demonstrates an improvement in diagnostic accuracy with the utilisation of v2019. A higher proportion of malignant cysts were categorised as Bosniak ≥ III (88.2% vs 84.3%) and a significantly higher percentage were categorised as Bosniak IV (68.6% vs 47.1%; p = 0.049). The rate of benign cyst resection was also reduced (13.5% vs 15.7%). v2019 upgraded a substantial number of cysts (26.2%), mostly due to enhancing wall or septal nodularity, which would otherwise be characterised as Bosniak IIF/III. Inter-observer agreement of v2019 was poor and was lower than that of v2005 (0.43 vs 0.70).
Prior to the introduction of the IIF category, the malignancy rate of resected Bosniak III cysts was low (31–45%); however, with the introduction of IIF, studies reported increased malignancy rates in the Bosniak III category (60–81.8%).12–15 Despite this improvement, the unnecessary resection of benign cysts remained and prompted the updated classification.7,8 Similar to our study, two other studies have demonstrated increased specificity and a reduction in benign cyst resections when using v2019. When also using Bosniak ≥ III as a threshold for surgery, Tse et al. found that v2019 resulted in fewer benign cyst resections (52.4% vs 66.7%). 16 Similarly, Park et al. found improved specificity with v2019 (70–73% vs 50–56.7%). 17
As a result of the cyst upgrades, the rate of malignant cyst resection in the Bosniak ≥ III subgroup appeared to increase from 84.3% to 88.2%. Whilst Chan et al. showed maintained sensitivity with v2019 (100% vs 100%), no other studies have demonstrated a finding such as that presented here. 18 Instead, Tse et al. actually found a reduction in malignant cyst resections when using Bosniak ≥ III as a threshold for surgery (81.6% vs 89.2%). 16 Both Tse et al. and Yan et al. found a higher proportion of malignant IIF cysts with v2019.16,19 Although this may be an expected consequence of v2019, the reason for our contrasting findings is clear. v2019 frequently upgraded Bosniak IIF/III cysts with enhancing wall/septal irregularities to Bosniak IV under the term ‘enhancing nodules’, which resulted in the more appropriate categorisation of malignant cysts. 8 The significant implication of this finding is that v2019 may actually reduce benign cyst resections without compromising on rates of malignant cyst resection.
The agreement between v2005 and v2019 is low but expected (kappa = 0.48). Cyst upgrades using v2019 were often due to enhancing wall/septal nodules (Bosniak IV) categorised as Bosniak IIF/III with v2005. Cyst downgrades were often due to soft tissue components (Bosniak IV) on v2005 that did not meet the criteria for ‘enhancing nodule’ on v2019. Interestingly, there were actually more cyst upgrades than downgrades with the new classification (26.2% vs 11.5%), which was not the intention of v2019. It must be noted that the cohort only includes resected cysts and does not include those monitored under surveillance imaging, which may have influenced this finding.
Previous studies have demonstrated either higher inter-observer agreement with v2019 compared to v2005 or no significant difference.16,18–22 Nevertheless, it must be stated that the inter-observer agreement rate remains low and is a concern going forward. For example, Yan et al. quote a kappa ranging between 0.26 and 0.47 and Shampain et al. report a Gwet coefficient of 0.50.19,20 The present study is no different and reports a comparably low inter-observer agreement (kappa = 0.43). Our unique stance on inter-observer agreement is the comparison between a well-experienced (more than five years) and a less-experienced (less than five years) radiologist to determine the usability of the updated classification system between those with varying experience levels. The reduction in agreement with v2019 could suggest difficulty with inexperienced users, perhaps secondary to the more precise criteria. Similarly, Shampain et al. found a lower inter-observer agreement (Gwet, 0.50 vs 0.54) between novices compared to experts when categorising using v2019.
Despite the updated guidance, the benefit of surgical resection of Bosniak cysts remains uncertain. The cancer-specific mortality rate of our cohort has previously been reported and is very low (1.7% over 77.6 months follow-up), with the vast majority of deaths in complex renal cystic disease being related to non-cancer-specific causes. 6 The difference in mortality between resected and conservatively managed cysts was not significant in surgical candidates. As such, the identification and resection of more aggressive malignant cysts may be more important and is an area for future research. Of course, when recommending surgical resection, clinicians should always account for the patient's comorbid status when aiming to minimise operative risk, particularly when survival benefit is unclear.
More recently, there has been increasing research into the role of artificial intelligence in the detection of malignant cysts. Miskin et al. performed a retrospective study of 147 patients to determine whether CT texture-based machine learning algorithms improve the detection of malignant Bosniak cysts and found that the approach showed potential (AUC = 0.80). 23 He et al. conducted a similar study using a fusion feature-based machine learning algorithm and demonstrated excellent diagnostic performance, even with external validation (AUC = 0.93). 24 Although this is not explored in the present study, the role of artificial intelligence as an alternative to classification systems is an area for further research.
The present study has a number of limitations. Although the findings are significant, the study suffered from a small sample size, which was secondary to the inclusion of only resected/biopsied cysts. Significant findings may have been overlooked, such as increased malignancy rates in Bosniak II/IIF categories. Such studies are also unable to estimate true sensitivity and specificity, because histopathology is only available in a subgroup of cysts (i.e., those resected/biopsied). The use of multi-centre data may improve the sample size of similar studies that include resected cysts, and help compare the two classification systems. Finally, as the aim of the present study was specifically to compare CT characterisation, cysts characterised by MRI were not included in the paper.
In conclusion, the utilisation of v2019 increases the malignancy rate of Bosniak III/IV groups and reduces the number of benign cyst resections. Inter-observer agreement remains a challenge even in v2019.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
