Abstract

Estrogen receptor (ER) and progesterone receptor (PgR) status determines the treatment options available to breast cancer patients. Since the measurement of these proteins was first described, there have been challenges to their overall accuracy. Even today, in the age of molecular tools and automation, there is still a surprisingly high rate of misclassification and, consequently, a negative impact on patient treatment and outcome. Here we examine this problem and suggest a solution.
Breast cancer is one of the most common cancers among women worldwide. Surgery, radiotherapy and sometimes chemotherapy are the optimal treatment options for patients diagnosed with this disease. In addition, adjuvant therapy is available to patients with tumors expressing the ER and/or the PgR. Approximately 70–75% of breast cancers express ER and/or PgR and are thus eligible for endocrine therapy (also referred to as estrogen-receptor modulators), using drugs such as tamoxifen, aromatase inhibitors and gonadotropin-releasing hormone agonists [1]. While only a percentage of ER/PgR-positive patients will respond to therapy, ER/PgR-negative cases do not respond. Here we examine the current status of this companion diagnostic test.
The importance of ER as a molecular marker for the prognosis and response to therapy in breast cancer has been known since the late 1970s [2]. Many studies have been published since, demonstrating that ER and PgR are strongly predictive markers of response to endocrine therapy. The first methodology developed to assess the presence of ER (and subsequently of PgR) was a ligand-binding assay (LBA), which involved the competitive binding of a radiolabeled steroid to the receptor of interest. This assay reported the femtomoles of receptor protein per milligram of total cytosol protein (fmol/mg). While it represented a new paradigm for matching patients to therapy and was really the first companion diagnostic test in oncology, its main disadvantage was that it was labor intensive and difficult to scale. The use of antibodies in an enzyme-linked immunoassay was an improvement, and Pasic et al. showed that the assay was equivalent to the LBA, paying special attention to the cut point and noting that a value of 15 fmol/mg of cytosol protein by the immunoassay was equivalent to 10 fmol/mg by the LBA [3]. However, in the 1980s highly specific monoclonal antibodies and immunohistochemical techniques became available and the cut point was lost in the move to a semiquantitative assessment of these receptors in the nuclei of the tumor cells [4]. The classic work by Harvey et al. in the Allred group demonstrated that immunohistochemistry (IHC) was equivalent or superior to LBA in predicting response to hormonal therapy [5]. Shortly thereafter, IHC became the standard clinical approach to breast cancer diagnostics via hormone receptor analysis. This work also established a cut point (an Allred score of three or higher for positivity) and exposed the weakness of subjective scoring: the IHC scores showed a bimodal distribution for a type of dataset that the LBA had previously shown to be unimodal [6].
Since then, several assay formats using a variety of ER and PgR antibodies have been validated against clinical benefit from endocrine therapy [5,7,8]. Some have been cleared by the US FDA as class II in vitro diagnostic devices.
Despite the availability of FDA-cleared reagents and clinically validated methods, there is still significant variability between laboratories, leading to reduced test accuracy and substantial patient misclassification and false-negative results. Patients testing negative (even though their tumors really do express ER) do not receive adjuvant endocrine therapy and are thus denied the potential therapeutic benefits of one of the best breast cancer drugs. A comprehensive report from the UK National External Quality Assessment Service (NEQAS) described an assay reproducibility study conducted in several European laboratories, in which overall agreement on the degree of ER expression in shared samples was approximately 63.2%, with the majority of participating laboratories reporting lower ER expression levels than the central reference laboratory [9]. Data from a large international clinical trial comparing hormonal therapies in receptor-positive breast cancer showed a false-negative rate of approximately 69% for ER in patients who were enrolled in the trial as receptor-negative based on the local laboratory report and who were subsequently retested by a central laboratory [10]. Finally, and much more distressing, was the experience in the Canadian maritime provinces where a 40% false-negative rate resulted in undertreatment and a number of deaths [11,12].
These alarming events prompted the American Society of Clinical Oncology (ASCO) and the College of American Pathologists (CAP) to convene an international Expert Panel to develop guidelines and recommendations for assay validation and quality control [13]. This document addresses the basic requirements for hormone-receptor testing, from the preanalytical phases of tissue fixation and handling to the reporting of the final test results, and indicates that ER and PgR status is to be determined on all invasive and recurring breast cancers. The Panel strongly recommended the inclusion of positive and negative controls in each staining batch, along with controls displaying intermediate reactivity, to ensure that the assay is maintaining appropriate sensitivity. They also changed the cutoff to distinguish ‘positive’ from ‘negative’ cases from >10% to ≥1% positive tumor cells. The intent of this change was to capture the low-expressing cases that may still benefit from endocrine therapy. Previous papers [14,15] and a recent meta-analysis from the Early Breast Cancer Trialists' Collaborative Group (EBCTCG) with over 20,000 patients showed that “even in marginally ER-positive disease, the recurrence reduction was substantial” [16].
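The effect of the cutoff change can be illustrated with a minimal Python sketch. The function name `er_status`, the cutoff labels and the 5% example case are hypothetical and are not taken from the guideline text; the sketch encodes only the two percentage rules stated above and, like the guideline, ignores staining intensity.

```python
def er_status(percent_positive: float, cutoff: str = "asco_cap") -> str:
    """Classify a case from the percentage of tumor nuclei that stain positive.

    Hypothetical helper: only the two percentage cutoffs described in the
    text are modeled; staining intensity is not considered.
    """
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percent_positive must be between 0 and 100")
    if cutoff == "asco_cap":
        # Revised rule: at least 1% of tumor cells positive.
        positive = percent_positive >= 1.0
    elif cutoff == "previous":
        # Earlier rule: more than 10% of tumor cells positive.
        positive = percent_positive > 10.0
    else:
        raise ValueError(f"unknown cutoff: {cutoff}")
    return "positive" if positive else "negative"


# A made-up low-expressing case with 5% of tumor nuclei stained.
print(er_status(5.0, "previous"))  # negative
print(er_status(5.0, "asco_cap"))  # positive
```

A case of this kind is the low-expressing group that the revised cutoff was intended to capture.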
We believe the well-intentioned ASCO/CAP Panel may have missed the mark. The key variable that they were unable to address is intensity. While they accept ‘any intensity’ as positive, there was no attempt to define or standardize a threshold. Our recent work showed that it is the intensity threshold, not the percentage of cells positive, that is the key variable that leads to misclassification. We used AQUA® technology (HistoRx, Branford, CT, USA), based on quantitative immunofluorescence (QIF), to define the absolute amount of ER measurable in breast cancer cases. AQUA technology is an antibody-based method that provides objective and continuous protein expression scores for tissue by using standardized automated fluorescence microscopy and advanced image analysis algorithms. Analysis of ER using AQUA technology in cohorts of breast cancer patients results in a continuous unimodal distribution of expression scores [17]. Using a series of cell line standards and an index set of cases, we defined a reproducible and highly sensitive threshold for ER detection. When applied to two retrospective cohorts, we found misclassification (false negative) rates between 10 and 20% [18]. We consider these cases to be false negatives because the outcomes of these patients were similar to those of patients whose tumors were positive by both assays (QIF and IHC).
This paper raised two key issues that the field needs to address. The first is, “Does this matter to patients?” The proper way to address this question would be to begin a randomized, double-blinded, multicenter, prospective trial where patients who score positive by the AQUA test but negative by conventional IHC are randomized to endocrine therapy (tamoxifen). However, this trial is ethically impossible since any patient that tests ER positive by any test should be treated with endocrine therapy. Even if it were possible, it would take 10–20 years to yield a meaningful result. Instead we can look to historic data, as recently published by the EBCTCG [16]. Using a quantitative test (the LBA) the authors showed that even near the threshold of that test (10–19 fmol/mg cytosol protein) there is substantial benefit from endocrine therapy (relative risk = 0.67). Thus, we can conclude that those false-negative cases are important. Given that there are at least 160,000 new cases of invasive breast cancer per year in the USA and 25% of them are called ER negative, a 10–20% false-negative rate means between 4000 and 8000 women are undertreated every year in the USA alone.
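The arithmetic behind this estimate can be checked with a short Python sketch. The function name `undertreated_range` is hypothetical; the inputs are the figures quoted above (160,000 new cases per year, 25% called ER negative, a 10–20% false-negative rate). Percentages are passed as whole numbers so that the calculation stays in exact integer arithmetic.

```python
def undertreated_range(new_cases: int, er_negative_pct: int,
                       fn_low_pct: int, fn_high_pct: int) -> tuple[int, int]:
    """Return the (low, high) number of false-negative cases per year.

    All percentages are whole numbers (e.g. 25 means 25%).
    """
    # Cases reported as ER negative each year.
    called_negative = new_cases * er_negative_pct // 100
    # Portion of those that are false negatives, at each end of the range.
    low = called_negative * fn_low_pct // 100
    high = called_negative * fn_high_pct // 100
    return low, high


print(undertreated_range(160_000, 25, 10, 20))  # (4000, 8000)
```

That is, 160,000 × 0.25 = 40,000 cases are called ER negative, and 10–20% of 40,000 gives 4000–8000.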
The second issue is, “Is this happening in Clinical Laboratory Improvement Amendments (CLIA)-certified labs and if so, how do we fix it?” In studies not yet published but reported at the San Antonio Breast Cancer Symposium (SABCS), we have shown that the problem is caused by variability both between methods and between labs [19]. When we compared the QIF method to conventional 3,3'-diaminobenzidine-based IHC, we found a series of cases near the threshold that were positive by QIF but negative by IHC [18]. This was caused by very pale staining that was hidden by the hematoxylin counterstain. In fact, subsequent assessment of serial sections of these cases showed definitive very pale nuclear staining in a high percentage of nuclei. The second problem is variability between labs. In other work reported at USCAP, we looked at the thresholds from four different labs (three of which are CLIA certified) with different staining protocols, autostainers and antibodies [20]. Here the greatest discrepancy between laboratories was 30%. This indicates that the least sensitive lab had a 30% discordance rate compared with the most sensitive lab, and again the discordant cases were all proven by outcome data to be false negatives.
We believe the solution to the problem is to use a sensitive method combined with a standardized approach. Although it may be possible to increase the sensitivity of the 3,3'-diaminobenzidine assay or use less hematoxylin counterstain, we believe QIF solves both the sensitivity and the standardization issues. In our experience it is hard to absolutely standardize the precipitation reaction that leads to 3,3'-diaminobenzidine visualization. Rather, we recommend using QIF with a carefully constructed standardized index array to be run with each staining batch (in a manner similar to that described in Welsh et al. [18]). When the index array is measured and it correctly defines the cases and cell lines around the threshold, then one can have confidence that the rest of the cases in the batch will also have the same correct threshold. Although this does not correct for preanalytic variables, the ASCO/CAP guidelines did a great job at addressing that issue. This method, in conjunction with a highly standardized platform (<2% coefficient of variation [21]), can provide a solution to the problem of breast cancer misclassification. Employing this methodology could dramatically affect outcomes for thousands of women in the USA alone. It is our hope that bodies like the ASCO/CAP Panel will consider these data in future recommendations.
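The batch-level check described above might be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Welsh et al. [18]: the `IndexSpot` type, the `batch_passes` function, the spot names, the scores and the threshold value are all hypothetical. The sketch assumes only that each index-array entry has a known expected call and a quantitative score measured in the current batch, and that a batch is accepted when the threshold reproduces every expected call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSpot:
    """One entry of a hypothetical index array (cell line or index case)."""
    name: str
    expected_positive: bool  # known status of this standard
    score: float             # quantitative score measured in this batch


def batch_passes(index_array: list[IndexSpot],
                 threshold: float) -> tuple[bool, list[str]]:
    """Check that the threshold reproduces the expected call for every spot.

    Returns (ok, names of spots whose call disagrees with expectation).
    """
    mismatches = [
        spot.name
        for spot in index_array
        if (spot.score >= threshold) != spot.expected_positive
    ]
    return (not mismatches, mismatches)


# Made-up scores for standards that sit close to an assumed threshold of 1.0.
index_array = [
    IndexSpot("cell_line_A", expected_positive=False, score=0.4),
    IndexSpot("cell_line_B", expected_positive=True, score=1.6),
    IndexSpot("case_1", expected_positive=False, score=0.8),
    IndexSpot("case_2", expected_positive=True, score=1.2),
]
ok, mismatches = batch_passes(index_array, threshold=1.0)
print(ok, mismatches)  # True []
```

If a batch were stained less sensitively, so that a near-threshold positive standard fell below the threshold, the check would fail and name the discordant standard, flagging the batch for review before any patient case is reported.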
Footnotes
E Romeo and MD Gustavson are employees at HistoRx, Inc., sole licensee of the AQUA® Technology from DL Rimm's laboratory at Yale University; DL Rimm is founder, consultant and stockholder of HistoRx, Inc. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
