Abstract
BACKGROUND:
Proliferation rate is a major determinant of the biologic behavior of the tumor and provides information that can be used to guide treatment decisions.
METHODS:
This ring study included 27 pathologists from 14 institutions and assessed inter-observer concordance between pathologists in Croatia. We analyzed the Ki-67 proliferative index on ten randomly selected breast cancer samples, comparing visual assessment by light microscopy with digital image analysis results from one central laboratory, which served as the referral value.
RESULTS:
When Ki-67 was analyzed as a numeric value, a high concordance rate was found between the Ki-67 scores visually assessed in all participating institutions and the referral value assessed by digital image analysis (ICC 0.76, 95% CI 0.58–0.91; Krippendorff’s alpha 0.79, 95% CI 0.58–1.00). Concordance was better in slides with higher Ki-67 values. When Ki-67 values were categorized according to the generally accepted 20% cut-off value, the concordance rate among participants was lower.
CONCLUSION:
Proliferation remains one of the most important parameters for tumor characterization and is helpful in making clinical decisions, but it should be used with great caution. Standardization of Ki-67 assessment is essential, and the proliferation index should be expressed as an exact numeric value. For patients with a proliferative index near the cut-off value, other factors must be considered in making clinical decisions.
Introduction
To guide therapeutic decisions in breast cancer patients, oncologists use traditional clinicopathological parameters such as histological type, grade, tumor size and nodal status, together with biomarkers such as estrogen receptors (ER), progesterone receptors (PR), human epidermal growth factor receptor 2 (HER2) and the proliferation index. Tumor cell proliferation may be assessed by a variety of methods, including mitotic count, flow cytometric estimation of the fraction of cells in the S-phase of the cell cycle, and immunohistochemical determination of proliferation-associated antigens. Ki-67 is a nuclear protein expressed in all active phases of the cell cycle (G1, S, G2, and mitosis) and absent in the resting phase (G0), and it has attracted a lot of attention in recent years as an important indicator of tumor cell proliferation [1]. According to the recommendations of the “International Ki-67 in Breast Cancer Working Group”, the proliferation index assessed with the Ki-67 monoclonal antibody (MIB1 clone) is a low-cost method, suitable for widespread use in clinical practice, and the current assay of choice [2].
Proliferation rates estimated with the Ki-67 antigen provide information on the biologic behavior and aggressiveness of the tumor, and also serve as a decision tool for or against chemotherapy in hormone receptor positive breast cancer by subclassifying tumors into the good-prognosis luminal A subgroup and the poorer-prognosis luminal B subgroup [3–5].
In 2011, a group of experts published recommendations for Ki-67 assessment in breast cancer, but inconsistency is still observed in routine diagnostics, even among some of the world’s most experienced laboratories, mostly in moderately differentiated breast cancer, owing to preanalytical and analytical variations [2,6,7].
Since the Ki-67 biomarker is widely used as a parameter for treatment decisions in breast cancer patients, reliable results are essential. This ring study was conducted to assess inter-observer concordance between pathologists in Croatia. We analyzed the Ki-67 proliferative index on ten breast cancer samples, comparing visual assessment by light microscopy with digital image analysis results, which served as the referral value.
Methods
This cross-sectional study was carried out during February and March 2016 in 14 Croatian Pathology Departments, with 27 pathologists included. The study was designed as an evaluation of the proliferation index assessed by IHC staining.
Ten breast cancer paraffin blocks were randomly selected from the archive of the Department of Pathology, Forensic Medicine and Cytology, University Hospital Centre Split (N = 10). The samples were re-cut and re-stained with the fully automated Benchmark staining system (Ventana Medical Systems) using clone MIB-1 (1:100, Dako). The immunostained whole tissue sections were sent to all other Pathology Departments (University Hospital Centre Zagreb, Clinic for Tumors Zagreb, Clinical Hospital Dubrava Zagreb, University Hospital Center “Sestre Milosrdnice” Zagreb, Faculty of Medicine of the University of Rijeka, University Hospital Osijek, General Hospital Dubrovnik, General Hospital Vinkovci, General Hospital Slavonski Brod, General Hospital Šibenik, General Hospital Pula, General Hospital Zadar, General Hospital Varaždin) for interpretation, in order to determine inter-observer concordance between pathologists from different Departments.
At the Department of Pathology, University Hospital Centre Split, the Ki-67 proliferative index was determined by manually counting 1000 tumor cells using the Olympus Image Analyser (magnification ×400) at the hot spots and at the periphery of the invasive component. It was expressed as the percentage of tumor cells showing nuclear immunoreactivity among the total number of tumor cells in the counted area, according to the recommendations of the “International Ki-67 in Breast Cancer Working Group” [2]. Nuclear positivity was defined as any staining, regardless of stain intensity. These values were considered the referral values.
At all other institutions, the Ki-67 proliferative index was determined on the same whole tissue samples by visual assessment using light microscopy (magnification ×400). It was likewise expressed as the percentage of cells showing nuclear immunoreactivity among 1000 invasive tumor cells at the “hot-spots” and at the periphery of the invasive component, according to the previously mentioned recommendations [2]. Nuclear positivity was defined as any staining, regardless of stain intensity. The participating pathologists were blinded to the results of the referral laboratory.
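The index described above is a simple proportion, and a minimal Python sketch may make the arithmetic explicit. The function name `ki67_index` and the example count are hypothetical and are not taken from the study; the only assumption is that positive nuclei are counted among a fixed number of invasive tumor cells.

```python
def ki67_index(positive_nuclei: int, counted_tumor_cells: int) -> float:
    """Return the Ki-67 index as a percentage of the counted invasive tumor cells."""
    if counted_tumor_cells <= 0:
        raise ValueError("counted_tumor_cells must be positive")
    if not 0 <= positive_nuclei <= counted_tumor_cells:
        raise ValueError("positive_nuclei must lie between 0 and counted_tumor_cells")
    # Any nuclear staining counts as positive, regardless of intensity.
    return 100.0 * positive_nuclei / counted_tumor_cells


# Hypothetical count: 310 stained nuclei among 1000 counted tumor cells.
example_index = ki67_index(310, 1000)
print(f"Ki-67 index: {example_index:.1f}%")
```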
Statistical data analysis
All 27 pathologists analyzed all ten preparations. To assess inter-observer concordance, we calculated the percentage of agreement, intraclass correlation coefficients (ICC), Gwet’s AC, and Krippendorff’s alpha (𝛼). Following the McGraw and Wong convention, we used a two-way random effects, absolute agreement, single rater ICC model, which corresponds to ICC(2,1) in the Shrout and Fleiss convention. To assess the validity of visual assessment, we calculated the mean of differences from the referent value assessed by digital image analysis and the difference as a percentage of the referent value. We calculated coefficients of variation by dividing the standard deviations by the means. We set statistical significance at two-tailed p < 0.05 and all confidence intervals at the 95% level. We performed the statistical data analysis in R (R Core Team, 2018) [8].
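For readers unfamiliar with the two-way random effects, absolute agreement, single rater ICC, the following Python sketch shows how it can be computed from a complete slides-by-raters table using the usual two-way ANOVA mean squares. It is an illustration only, not the R code used in the study; the function name `icc_2_1` and the small example table are assumptions made for demonstration, and confidence intervals are not computed.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random effects, absolute agreement, single rater ICC.

    `ratings` holds one row per slide and one column per rater,
    with no missing values.
    """
    n = len(ratings)       # number of slides
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 slides and >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for slides (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Small illustrative table (6 targets, 4 raters); not data from this study.
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_2_1(example_ratings)
print(f"ICC = {example_icc:.2f}")
```

Because the rater variance enters the denominator, systematic over- or under-scoring by individual raters lowers this coefficient, which is the behavior wanted when absolute agreement rather than mere consistency is of interest.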

Figure 1. Absolute differences of quantitative visual assessments from the numeric referent values by the absolute magnitude of the referent value; the dotted line represents the locally estimated scatterplot smoothing (LOESS) curve with 80% span.
Results
Ten randomly selected, centrally stained breast cancer samples representing a wide range of Ki-67 scores were used for Ki-67 assessment. All tumors were invasive carcinomas of the NOS histological subtype. The study included 27 pathologists from 14 institutions (Table 1).
Table 1. Sample of pathologists by institution, type of institution and settlement size (n = 27)
UHC = university hospital centre; UH = university hospital; CH = clinical hospital; GH = general hospital.
Mean value of Ki-67 of all ten preparations assessed by quantitative, digital image analyzer in Clinical Hospital Centre Split was 31. Mean (SD) Ki-67 visually assessed by 26 pathologists from other Departments was 33 (7.5). Total range of visually assessed Ki-67 was 29 (17–46) with coefficient of variation COV = 23%. Mean (SD) absolute deviation of visually assessed Ki-67 from referent value established by digital image analyzer in Clinical Hospital Centre Split was 6 (SD 4.3). This was a relative deviation of 20% (SD 13.8%) (Table 2). Overall percentage agreement between observers was 97%, ICC was 0.76 (95% CI 0.58–0.91), Gwet’s AC was 0.84 (95% CI 0.74–0.93), and Krippendorff’s alpha 0.79 (95% CI 0.58–1.00).
Table 2. Ki-67 assessment by particular pathologists and differences from the referent median value for all 10 preparations
Data are sorted by the difference relative to the numeric referent value. Abbreviations: Δ = mean of differences from the referent value; Δ% = mean of differences as the percentage of the referent value; 𝛼 = Krippendorff’s Alpha concordance with the referent values; UHC = university hospital centre; UH = university hospital; CH = clinical hospital; GH = general hospital. ∗Percentage points.
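The deviation measures reported here (absolute difference from the referent value, the same difference as a percentage of the referent value, and the coefficient of variation) can be sketched in a few lines of Python. The function name `deviation_summary` and the readings below are hypothetical and serve only to show the arithmetic for a single slide; they are not the study data.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def deviation_summary(visual_scores: Sequence[float], reference: float) -> Dict[str, float]:
    """Summarize how visual scores for one slide deviate from its referent value."""
    abs_diffs = [abs(score - reference) for score in visual_scores]
    mean_abs_diff = mean(abs_diffs)
    return {
        # Mean absolute difference, in percentage points of Ki-67.
        "mean_abs_diff": mean_abs_diff,
        # The same difference expressed as a percentage of the referent value.
        "mean_abs_diff_pct": 100.0 * mean_abs_diff / reference,
        # Coefficient of variation: sample standard deviation divided by the mean.
        "cov_pct": 100.0 * stdev(visual_scores) / mean(visual_scores),
    }


# Hypothetical visual readings of one slide with a referent value of 30%.
example_summary = deviation_summary([25.0, 30.0, 35.0, 40.0], reference=30.0)
for name, value in example_summary.items():
    print(f"{name}: {value:.2f}")
```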
In the University Hospital Centre Split 70% of tumors were recognized as having Ki-67 ≥ 20%. Mean (SD) percentage of samples from other departments with Ki-67 ≥ 20% was 65% (17.3%). Mean (SD) absolute deviation of visually assessed Ki-67 from referent value was 15 (SD 9.5). This was a relative deviation of 21% (SD 13.7%) (Table 2). Total range of visually assessed prevalence of Ki-67 ≥ 20% was 60% (40%–100%). Overall percentage agreement between observers was 71%, Gwet’s AC was 0.47 (95% CI 0.10–0.83), and Krippendorff’s alpha 0.79 (95% CI 0.08–0.65).
Accuracy by the absolute magnitude of Ki-67
Accuracy of quantitative assessment was better in slides with higher Ki-67 values (Table 3). The percentage difference between the visual quantitative assessment and the referent values assessed by digital image analysis decreased as the absolute magnitude of the true Ki-67 value increased (Figure 1). In the three slides with Ki-67 < 20%, the agreement between the 26 pathologists was weak: for the numeric Ki-67 values, ICC was 0.23 (95% CI 0.06–0.93), Gwet’s AC was 0.45 (95% CI −0.10–0.99), and Krippendorff’s alpha was 0.16 (95% CI −0.31–0.63). In the seven slides with Ki-67 ≥ 20%, the agreement was good: ICC was 0.77 (95% CI 0.54–0.95), Gwet’s AC was 0.79 (95% CI −0.66–0.93), and Krippendorff’s alpha was 0.74 (95% CI −0.46–1.00).
Table 3. Ki-67 assessment of particular preparations and differences from the referent value
Preparations are sorted in ascending order by the referent value. Abbreviations: Δ = mean of differences from the referent value; SD = standard deviation of differences from the referent value; COV = coefficient of variation; Δ% = difference as the percentage of the referent value.
As expected, in the qualitative assessment of Ki-67 categorized according to the ≥20% cut-off, the percentage of accurate assessments improved the further the Ki-67 value was from the 20% cut-off (Table 3, Figure 2). The assessments were of very low accuracy in samples with Ki-67 values between 11% and 24%. For the categorized Ki-67 values, the agreement was very low in the three slides with Ki-67 < 20%: overall percentage agreement between observers was 63%, Gwet’s AC was 0.39 (95% CI −0.79–1.00), and Krippendorff’s alpha was 0.07 (95% CI −0.23–0.38). In the seven preparations with Ki-67 ≥ 20%, the agreement of visual assessments was somewhat better, but still poor: overall percentage agreement between observers was 74%, Gwet’s AC was 0.63 (95% CI 0.21–1.00), and Krippendorff’s alpha was 0.18 (95% CI 0.01–0.34).
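The effect of the cut-off on accuracy can be illustrated with a short Python sketch that dichotomizes scores at ≥20% and reports the percentage of raters whose category matches that of the referent value. The function names and all readings below are hypothetical; they are chosen only to contrast a slide near the cut-off with one far from it.

```python
from typing import Sequence

CUTOFF = 20.0  # the >= 20% threshold discussed in the text


def is_high(ki67_percent: float, cutoff: float = CUTOFF) -> bool:
    """Classify a Ki-67 value as 'high' when it reaches the cut-off."""
    return ki67_percent >= cutoff


def percent_correct(visual_scores: Sequence[float], reference: float,
                    cutoff: float = CUTOFF) -> float:
    """Percentage of raters whose high/low call matches the referent call."""
    reference_call = is_high(reference, cutoff)
    matches = sum(is_high(score, cutoff) == reference_call for score in visual_scores)
    return 100.0 * matches / len(visual_scores)


# Hypothetical slide near the cut-off: small numeric differences flip the category.
near_cutoff = percent_correct([15, 18, 22, 25], reference=19)
# Hypothetical slide far from the cut-off: larger numeric spread, same category.
far_from_cutoff = percent_correct([48, 55, 60, 70], reference=62)
print(near_cutoff, far_from_cutoff)
```

In this toy example the readings near the cut-off differ from the referent value by at most 6 percentage points yet only half are classified correctly, whereas the readings far from the cut-off differ by up to 14 points and are all classified correctly.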

Figure 2. Percentage of the ten preparations correctly assessed qualitatively, by the absolute magnitude of their referent values; the dotted line represents the locally estimated scatterplot smoothing (LOESS) curve with 80% span (n = 26 raters).
Discussion
The Ki-67 proliferation index has been widely used as a prognostic and predictive marker in breast cancer. Several studies have shown that a high proliferative index in breast cancer is associated with poorer outcome [3,9]. The Ki-67 proliferation index has also improved the prediction of treatment response in breast cancer patients receiving neoadjuvant treatment [10,11].
Aiming for better analysis, reporting, and use of Ki-67, the “International Ki-67 in Breast Cancer Working Group” has published recommendations for the assessment of Ki-67 in breast cancer [2]. Despite these guidelines, several studies have reported failures in reproducibility due to preanalytical and analytical variations, especially in moderately differentiated breast carcinomas, for which misinterpretation of the Ki-67 proliferation index may result in over- or undertreatment of patients [6,7,12].
This ring study was conducted to assess inter-observer concordance between pathologists from different institutions in Croatia. We analyzed the Ki-67 proliferative index on ten randomly selected breast cancer samples, comparing visual assessment by light microscopy with digital image analysis results from one central laboratory, which served as the referral value. It is important to state that all pre-analytical variations were excluded in our study, since all stains were performed in one central laboratory, which significantly contributed to the high inter-observer concordance rate. Even so, substantial variability in Ki-67 scoring has been observed among some of the world’s most experienced laboratories even when preanalytical variations were excluded by scoring centrally stained slides, owing to inconsistency in the scoring system [13].
Automated digital image analysis is considered a superior method for determining the Ki-67 proliferation index, especially in moderately differentiated breast carcinomas, but because of its high cost a considerable number of pathology laboratories in our country and elsewhere rely on visual assessment.
When we analyzed Ki-67 as a numeric value, a high concordance rate was found between the Ki-67 scores visually assessed in all participating institutions and the referral value assessed by digital image analysis in Clinical Hospital Centre Split.
Mean value of Ki-67 of all ten preparations assessed by quantitative, digital image analyzer in Clinical Hospital Centre Split was 31. Mean (SD) Ki-67 visually assessed by 26 pathologists from other Departments was 33 (7.5). Total range of visually assessed Ki-67 was 29 (17–46) with coefficient of variation COV = 23%. Mean (SD) absolute deviation of visually assessed Ki-67 from referent value established by digital image analyzer in Clinical Hospital Centre Split was 6 (SD 4.3). This was a relative deviation of 20% (SD 13.8%) (Table 2). Overall percentage agreement between observers was 97%, ICC was 0.76 (95% CI 0.58–0.91), Gwet’s AC was 0.84 (95% CI 0.74–0.93), and Krippendorff’s alpha 0.79 (95% CI 0.58–1.00).
These results are in concordance with other studies suggesting that visual assessment and digital image analysis could both be used to assess Ki-67 proliferating index in clinical practice [12,14].
It is important to state that all participants in our study used the same scoring criteria based on the recommendations of the “International Ki-67 in Breast Cancer Working Group” [2]. The proliferative index was determined on whole tissue samples and calculated as the percentage of cells showing nuclear immunoreactivity among 1000 invasive tumor cells. The selected areas included “hot-spots” and the periphery of the invasive component, based on the assumption that regions of increased proliferation are biologically active and most relevant for prognosis [2,15]. Nuclear positivity was defined as any staining, regardless of stain intensity.
According to the study by Mikami et al., the concordance among pathologists was also higher when the assessed field was predetermined, indicating that the selection of the evaluation area is critical for obtaining a reproducible Ki-67 proliferative index in breast cancer [16].
In the second international study by Polley et al., in which centrally stained tissue microarray cases were evaluated according to defined scoring instructions, the inter-laboratory reproducibility was higher compared to the first study in which laboratories used their own scoring methods on centrally stained cases [7,13].
In 2011, at the St Gallen International Breast Cancer Conference, the Ki-67 proliferation index was introduced as an important tool for subclassifying hormone receptor positive breast cancer into Luminal A and Luminal B subgroups, in order to select patients who may benefit from adjuvant chemotherapy [4]. The original recommendation for the Ki-67 cut-off value was set at 14%, according to results assessed by gene expression profiling proposed by Cheang et al. [4,17].
Since then, the immunohistochemical criteria for subclassification of the two luminal groups have been changing. In 2013, the cut-off value for Ki-67 was set at 20% [5]. The Panel noted that standardized cut-offs for Ki-67 have not been established, but the majority of the Panel voted that a threshold of ≥20% was clearly indicative of ‘high’ Ki-67 status. They also added the value of PR for distinguishing between Luminal A and Luminal B, based on the research by Prat et al., who indicated that PR ≤ 20% corresponds to the Luminal B subtype [5,18].
However, at the St Gallen Conference in 2015, the majority of the panelists voted that the minimum value of Ki-67 required for subclassifying Luminal B tumors should be at least 20–29%, and that values of 10% or less are clearly low and can be used for subclassifying tumors into the Luminal A group. There is still uncertainty regarding type assignment and therapeutic options for tumors with intermediate Ki-67 levels [19].
When we categorized Ki-67 values according to the generally accepted 20% cut-off value, we noticed a lower concordance rate among participants in our study.
In the University Hospital Centre Split 70% of tumors were recognized as having Ki-67 ≥ 20%. Mean (SD) percentage of samples from other departments with Ki-67 ≥ 20% was 65% (17.3%).
As expected, in the qualitative assessment of Ki-67 categorized according to the ≥20% cut-off, the percentage of accurate assessments improved the further the Ki-67 value was from the 20% cut-off, and the assessments were of very low accuracy in samples with Ki-67 values between 11% and 24%. In this region, the percentage of Ki-67 values accurately classified as being below or above the ≥20% cut-off was not significantly different from guessing. For the categorized Ki-67 values, the agreement was very low in the three slides with Ki-67 < 20%. In the seven preparations with Ki-67 ≥ 20%, the agreement of visual assessments was somewhat better, but still poor.
The choice of the cut-off point has a major impact in practice, as it determines which patients will receive more aggressive therapy. Even if a common cut-off point for Ki-67 is eventually agreed upon, transforming a continuous variable such as the Ki-67 index into two categories is not reliable enough for making definite clinical decisions. According to Royston et al., “the model with binary variable is less reliable than continuous values”, because it suggests that tumors with Ki-67 levels close to the cut-off point on either side are very different, whereas in reality their biologic behavior can be quite similar [20].
Also, the cut-off used for the differentiation of Luminal tumors might have limited applicability to other tumor groups, since baseline Ki-67 values for triple negative and HER2 positive tumors are much higher [21–24]. Therefore, the clinical utility of Ki-67 as a prognostic marker might be more apparent if it were considered within more narrowly defined tumor subgroups.
Conclusion
The assessment of proliferation remains one of the most important parameters for tumor characterization. Since it is used in making clinical decisions, standardization of Ki-67 assessment is considered essential, and results should be used with great caution. The Ki-67 proliferation index should be expressed as an exact numeric value, and for patients whose results are near the cut-off value, other factors must be considered in making decisions about adjuvant therapy, such as histological grade, nodal status, tumor size, results from multigene expression assays, and the patient’s comorbidities and preferences.
