Abstract
Analysis of formalin-fixed paraffin-embedded (FFPE) tissue by immunohistochemistry (IHC) is commonplace in clinical and research laboratories. However, reports suggest that IHC results can be compromised by biospecimen preanalytical factors. The National Cancer Institute’s Biospecimen Preanalytical Variables Program conducted a systematic study to examine the potential effects of delay to fixation (DTF) and time in fixative (TIF) on IHC using 24 cancer biomarkers. Differences in IHC staining, relative to controls with a DTF of 1 hr, were observed in FFPE kidney tumor specimens after a DTF of ≥2 hr. Reductions in H-score and/or staining intensity were observed for c-Met, p53, PAX2, PAX8, pAKT, and survivin, whereas increases were observed for RCC1, EGFR, and CD10. Prolonged TIF of 72 hr resulted in significantly reduced H-scores of CD44 and c-Met in kidney tumor specimens, compared with controls with 12-hr TIF. An elevated probability of altered staining intensity due to DTF was observed for nine antigens, whereas for prolonged TIF an elevated probability was observed for one antigen. Results reported here and elsewhere across tumor types and antigens support limiting DTF to ≤1 hr when possible and fixing tissues in formalin for 12–24 hr to avoid confounding effects of these preanalytical factors on IHC.
Introduction
Developed decades ago, formalin fixation and paraffin embedding remains a routine method of tissue preservation for histological and immunohistochemical characterization in both clinical and research settings.1,2 Immunohistochemistry (IHC) allows both antigen localization and semiquantitative analysis, thereby preserving the context of expression within the larger morphological landscape of the tissue or tumor. 3 Because of these benefits, IHC is commonly used for immunophenotypic subtype diagnosis, pathogen detection, characterization of cellular infiltrates, and assessing cancer progression. 4
Although IHC is well studied and accepted as a reliable technique in both clinical and research laboratories, the extent and intensity of immunostaining can be adversely affected by suboptimal specimen fixation and processing. 5 Formalin fixation and paraffin embedding is a multistep process, and there are differences in reagents, temperatures, durations, and stages among studies and across institutions and laboratories. A review by the Biorepositories and Biospecimen Research Branch (BBRB) at the National Cancer Institute (NCI) identified 15 preanalytical factors with reported effects on immunohistochemical staining from the literature. Two preanalytical factors, delay to fixation (DTF) and duration of formalin fixation, also referred to as time in fixative (TIF), elicited significant changes in the immunostaining of clinical biomarkers.6–8 The earliest alteration in IHC results associated with DTF was reported for progesterone receptor (PR) and estrogen receptor (ER) expression in breast cancer tissue. 9 Furthermore, formalin fixation over the weekend (48–72 hr) as opposed to a weekday (24 hr) was also associated with a significantly higher frequency of negative ER and PR results, 6 indicating the importance of TIF in analyzing IHC results.
Guidelines developed by several organizations for breast,10–13 lung, 14 and colorectal 15 cancer specimens recommend limiting DTF to no longer than 1 hr and applying a TIF between 6 and 72 hr. However, these recommendations are primarily based on studies conducted in breast tumors; there is limited evidence on whether DTF and TIF thresholds observed in one tissue type can be applied to another.
A review of studies evaluating DTF or TIF sensitivity in different tissue types other than breast suggested that optimal and acceptable DTF and TIF durations may depend on both the tissue type and biomarker evaluated by IHC. A study on lung tumors reported a progressive reduction in the percentage of PD-L1 immunopositive tumor cells with increasing DTF, beginning after 1 hr at room temperature. 16 However, several colorectal biomarkers, such as HIF-1α, GLUT-1, Ki-67, and CDX, were not affected by a DTF of 60–180 min, 17 although levels of several phosphorylated proteins, including phosphorylated epidermal growth factor receptor (pEGFR), significantly declined.18–20 In ovarian tumor specimens, 6–24% of phosphorylated sites examined were affected by a DTF of up to 1 hr. 21
Evidence of TIF-mediated effects on IHC beyond the breast tumor model is also limited, and optimal fixation durations may vary among tissue types and biomarkers evaluated. Experiments on tonsils revealed optimal immunostaining with a TIF of 12 hr 22 or 24–36 hr 23 compared with shorter or longer durations. However, equivalent immunostaining of the antigens evaluated was reported with fixation times of 24–72 hr in lung 24 and colorectal 25 specimens.
Meta-analysis of studies investigating the DTF and TIF effects can be difficult due to differences ranging from study design to details of fixation and paraffin processing. Although clear differences in tissue types and timepoints investigated are obvious hurdles to such comparisons, more nuanced differences can also confound meta-analysis. Variations in specimen size; fixation and processing reagents, conditions, and durations; and the storage duration of blocks and/or slides before analysis have documented effects on IHC. 5 Furthermore, immunohistochemical staining of biomarkers in breast tumors is influenced by differences in the source of antibodies,9,26 formalin-fixed paraffin-embedded (FFPE) block age, 27 storage conditions, 28 and storage duration of slide-mounted sections.29,30 Preanalytical factors can also be interdependent. Although formalin fixation is a chemical clock reaction, additional time may be required for large specimens due to the rate at which formalin penetrates tissue. Minimization of confounding variables, although difficult, is essential, as is thorough reporting of elements relating to biospecimen handling. 31
The NCI BBRB initiated the Biospecimen Preanalytical Variables (BPV) Program in 2010 to better understand the impact of discrete preanalytical factors on molecular and proteomic analyses in cancer research. We report effects of DTF and TIF on immunohistochemical staining of a set of existing and promising cancer biomarkers in up to four different tumor/tissue types investigated under the BPV Program (Supplementary Table 1, Fig. 1). The biomarkers included those involved in diagnosis, prognosis, and treatment decisions. A subset of tissue types and biomarkers was also used to explore tissue-specific differences between DTF and TIF for a given biomarker and to evaluate potential ramifications of plausible FFPE block storage for 1 year before analysis. Results presented herein, in concert with those previously reported by the BPV Program,32–35 will provide a more comprehensive understanding of the tissue-wide effects of DTF and TIF as well as fit-for-purpose guidance.
Materials and Methods
Collection and Processing of Tumor Tissues
For the BPV Program, solid tumor biospecimens were collected and processed at four medical centers (Biospecimen Source Sites [BSS]) under a single set of standard operating procedures (SOPs) (https://biospecimens.cancer.gov/programs/bpv/bpv_sops.asp). Solid tumor specimens from 111 cases of renal cell carcinoma (kidney), 28 cases of ovarian carcinoma (ovary), 8 cases of lung adenocarcinoma and squamous cell carcinoma (lung), and 29 cases of colorectal adenocarcinoma (colon) were collected using institutional review board–approved protocols for human biospecimen collection for research purposes in accordance with the Declaration of Helsinki 1975, as revised in 1983 (Emory University IRB00045796 [approved March 2, 2013]; University of New Mexico IRB00000591 [approved June 28, 2012]; University of Pittsburgh IRB0106147 [approved May 28, 2014], IRB0411047 [approved July 18, 2014], IRB09502110, IRB0506140 [approved May 28, 2014], and IRB056140 [approved June 9, 2014]; and Boston Medical Center IRB00000376 [approved February 5, 2014]). Patients who donated tissues for the study gave their explicit informed consent.
A tissue segment ≥1.0 cm³ was dissected from each surgically resected tumor and then divided into six similar-sized tissue pieces to yield a snap-frozen tissue control, an FFPE quality control (QC), and four experimental tissue segments processed at various DTF or TIF timepoints (Supplementary Fig. 1).34,35 DTF (i.e., cold ischemia time) was defined as the time that elapsed between surgical excision and placement of the tissue in formalin; specimen transport, grossing, collection of clinical specimens, and dissection of tissue segments for DTF and TIF experimental modules occurred during the DTF. A common remote data entry system was used at all participating medical centers to record all data, including times for DTF and TIF.
The snap-frozen tissue segment served as a gold standard reference for DNA and RNA analyses. The FFPE QC tissue segment was used to confirm conformance to histological quality criteria and was processed with a DTF of 1 hr and a TIF of 23 hr due to the small size (0.33 cm thick) of each tissue segment. Experimental DTF tissue segments included kidney, colon, ovarian, and lung tumor specimens that were placed in a humidified chamber at room temperature for 1, 2, 3, or 12 hr and then fixed in 10% neutral buffered formalin (NBF) for 23 hr (Fig. 1A). Additional details relating to DTF during BPV experiments have been previously reported. 34 Experimental TIF tissue segments were limited to kidney tumor specimens that had a DTF of 1 hr before being fixed in 50 ml of 10% NBF for 6, 12, 23, or 72 hr (Fig. 1B). Each BSS tumor specimen was fixed and processed on site at the participating medical institution using a dedicated FFPE tissue processor (Peloris II; Leica Biosystems, Inc., Buffalo Grove, IL) and BPV-specified reagents and protocols. 34 FFPE blocks were shipped to the project’s Comprehensive Biospecimen Resource at the Van Andel Research Institute (VARI), where specimens were cataloged, processed, and stored. Additional details on the patient consent process, infrastructure, workflow, eligibility criteria, and specimen processing in the BPV Program have been reported 34 and can also be found online (https://biospecimens.cancer.gov/programs/bpv/bpv_page2.asp).

Figure 1. Tumor tissue processing and fixation schemes for the Biospecimen Preanalytical Variables (BPV) Research Program. Tumor from each case was divided into six equally sized pieces to yield segments for snap-freezing (frozen), a formalin-fixed paraffin-embedded (FFPE) quality control, and four experimental segments. Analysis was conducted in two phases that allowed the addition of a tumor type and/or additional biomarkers and a comparison of immunopositive staining in case-matched specimens after 1 year of FFPE block storage. (A) Delay to fixation (DTF) was investigated in kidney, colon, lung, and ovarian tumor specimens at a DTF of 1 (image analysis control), 2, 3, or 12 hr at room temperature. (B) Time in fixative (TIF) was investigated in kidney tumor specimens fixed in 10% neutral buffered formalin for 6, 12 (image analysis control), 23, or 72 hr before final processing.
Immunohistochemistry
FFPE tissue sections (5 µm) were prepared at VARI from each tissue segment, mounted onto positively charged glass microscope slides, and shipped to a Clinical Laboratory Improvement Amendments (CLIA)/College of American Pathologists (CAP)–accredited pathology clinical laboratory for IHC analysis. A total of 33 clinically relevant antigenic targets (Table 1) were analyzed using IHC, with approximately 5500 slides generated from 720 tumor segments that reflected a total of 179 cases. Target antigens bound by specific antibodies during the IHC procedure generally localized to one of three cell regions: nucleus, cytoplasm, or cell membrane. As per existing clinical protocols for the tumor type evaluated, IHC assays were performed with the Dako Autostainer Plus Staining System (Santa Clara, CA), BOND Advanced Staining Solutions (Leica Biosystems, Inc.), or the Ventana BenchMark Ultra IHC/ISH staining system (F. Hoffmann-La Roche, Ltd.; Basel, Switzerland).
Table 1. Details of Immunohistochemistry Parameters and Conditions.
Abbreviations: Ab, antibody; ER2, epitope retrieval solution 2; CC1, cell conditioning 1; HRP, horseradish peroxidase.
Immunohistochemical staining was performed in two phases that were approximately 1 year apart to (1) increase the number of antigens and antibodies evaluated, (2) attempt to replicate a subset of results, and (3) accommodate the addition of lung specimens (Fig. 1). Details that include the specific antibodies and dilutions used, antigen retrieval methods, IHC parameters, and positive and negative controls for each antigen are provided in Table 1 and Supplementary Tables 2 and 3. Each IHC run was performed with positive and negative control slides. Endogenous peroxidase activity was quenched with 3% hydrogen peroxide, and an automated Tissue-Tek Film Coverslipper was used with xylene-activated adhesive-backed film (Sakura; Torrance, CA).
All slides were manually scored by one surgical pathologist who was blinded to the experimental DTF or TIF protocol of individual tissue segments. Immunohistochemical staining was scored based on both the percentage of positively stained tumor cells (0–100%) and staining intensity (0, 1, 2, 3; Table 2). IHC slides were then digitally imaged using the Aperio ScanScope XT imaging system (Leica Biosystems, Inc.).
Table 2. Immunohistochemistry Scoring Criteria for Intensity by Pathologist Visual Acuity.
Image Analysis
Image analysis using Aperio was conducted to generate a set of analytical data complementary to that from the pathologist scoring described above. Image analysis of each IHC slide was limited to a region of interest (ROI), a relevant contiguous tumor area defined using the Aperio software outline tool. A subregion within the ROI was designated manually as a region of exclusion to omit stained areas from software-generated scoring that contained necrotic material, folded tissue layers, tissue regions detached from the slide, or other artifacts (e.g., dense smudges, foreign material). White regions or spaces were automatically excluded from the algorithm analysis. Saved annotated images were evaluated using the image analysis algorithm of interest (nuclear, cytoplasmic, or membranous staining) to capture the degree of staining for each antigen–antibody interaction.
Aperio Image Analysis Toolbox algorithms (Leica Biosystems, Inc.) were used to analyze and score immunopositive staining for each tumor segment. BPV tumor segments considered to be “optimally” fixed (<1 hr DTF, 12 hr TIF) were used to calibrate, train, and validate software-generated scores for each tumor type and antigen–antibody combination. A normalized optical density value for red/green/blue was selected for each IHC stain using a color deconvolution algorithm for each antibody. Optical density values for each antibody could vary depending on possible combinations of tumor type, antigen–antibody pattern, chromogen, and IHC platform. “Training slides” were a subset (5.5% of the total slides) of all the slides manually scored by the designated study pathologist for staining intensity (1, 2, 3) and the percentage of tumor cells with positive immunostaining (0–100%) to identify threshold values for software-generated scoring. Thresholds for intensity scores, which also reflect adjusted hematoxylin and chromogen color calibration, were specific to each tumor type and targeted antigen (Table 3). Training sets included up to three cases for each intensity score (1, 2, 3) per tumor type and antigen. A second subset of optimally fixed tumor segments was used to validate the software-generated score using the pathologist’s manual score.
Table 3. Antibody-Specific Image Analysis Light Value Thresholds for Immunohistochemical Staining for Each Tumor Type.
H-scores were calculated by multiplying the percentage of positive staining (be it tumor area within the ROI or positively stained nuclei) by the intensity of the scored stain provided by the Aperio algorithm (expressed as an integer; Table 3). Membranous and cytoplasmic stains generated higher overall H-scores than did nuclear stains due to a larger potential area for positive staining.
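As a minimal sketch of this arithmetic, the Python snippet below computes an H-score on the 0–300 scale reported in this study from the percentages of cells (or tumor area) at each intensity level. It assumes the conventional weighted-sum definition (1 × %1+ + 2 × %2+ + 3 × %3+); the function name and the example percentages are hypothetical and are not taken from the Aperio algorithms.

```python
def h_score(pct_1, pct_2, pct_3):
    """Return an H-score (0-300) from the percentages staining 1+, 2+, and 3+.

    Assumes the conventional weighted sum; this is an illustration,
    not the Aperio implementation.
    """
    for pct in (pct_1, pct_2, pct_3):
        if not 0 <= pct <= 100:
            raise ValueError("each percentage must be between 0 and 100")
    if pct_1 + pct_2 + pct_3 > 100:
        raise ValueError("stained percentages cannot exceed 100 in total")
    # Weight each percentage by its intensity level and sum.
    return 1 * pct_1 + 2 * pct_2 + 3 * pct_3


# Hypothetical specimen: 20% of cells at 1+, 30% at 2+, 10% at 3+ (40% unstained).
example = h_score(20, 30, 10)
```

Under this definition, a shift of cells from 3+ to 2+ lowers the H-score even when the total percentage of positive cells is unchanged, which is why the score and the intensity distribution are examined together.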
Experimental comparisons were drawn between the software-generated scores of optimally fixed and processed tumor segments and those with different DTF and TIF timepoints. Tumor segments used in both the training and validation sets were included in the final analysis.
Statistical Analysis
Immunohistochemical staining scores for each antigen were analyzed by tumor type and the preanalytical factors (DTF, TIF) investigated. Notably, 0–3 intensity scores generated by the Aperio software were captured from total pixels per cell to estimate cytoplasmic areas; in these cases, values were estimated using the results of other digital algorithms for a respective tissue in a random forest regression (covariates were percent 1, percent 2, and percent 3 cells), using R (v. 3.6.0) and the randomForest package. 36 All statistical tests were two-sided, and significance (false discovery rate [FDR]/p value) was set at <0.05 after Benjamini–Hochberg FDR adjustment for multiple testing, and 95% false coverage intervals are reported. 37
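The Benjamini–Hochberg adjustment mentioned above can be illustrated with a short Python sketch. The analysis in this study was run in R; the function below is only a hand-written illustration of the standard step-up procedure, and the raw p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five antigen/timepoint comparisons.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

In this hypothetical set, the third and fourth comparisons have raw p values below 0.05 but are no longer significant after adjustment, which is the behavior the FDR threshold is meant to enforce.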
Linear mixed-effects models with random effects for patient ID nested within BSS were used to compare mean and percent differences [100 × (1 − delayed H-score/1-hr fixation H-score)] in H-scores (0–300) between reference tumor segments (DTF of 1 hr, TIF of 12 hr) and experimental DTF and TIF timepoints using the R package lme4. 38 These models were used for data collected during both phases I and II. Logistic mixed-effects models with random effects for patient ID nested within BSS were used to model the probability of the intensity score (0–3) increasing or decreasing independently in specimens from DTF and TIF experiments relative to the reference timepoint (1 hr for DTF, 12 hr for TIF) using the R package lme4. 38 Results from these models are reported as predicted probabilities with 95% false coverage prediction intervals. A subset of tissues and antigens was replicated to confirm that findings were consistent between phases I and II.
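One way to write the two models described above is sketched below. The notation is an assumption introduced here for illustration (it does not appear in the original analysis): i indexes patients, j indexes BSS, and t indexes the DTF or TIF timepoint, with the reference timepoint as baseline.

```latex
\begin{align}
% Linear mixed-effects model for the H-score
H_{ijt} &= \beta_0 + \beta_t + u_j + v_{i(j)} + \varepsilon_{ijt} \\
% Logistic mixed-effects model for an increase (or decrease) in intensity score
\operatorname{logit}\,\Pr(Y_{ijt} = 1) &= \gamma_0 + \gamma_t + a_j + b_{i(j)}
\end{align}
```

Here \beta_t and \gamma_t are fixed effects of timepoint t (zero at the reference), u_j and a_j are random intercepts for BSS, v_{i(j)} and b_{i(j)} are random intercepts for patient nested within BSS, and \varepsilon_{ijt} is the residual error; all random terms are assumed to be independent and normally distributed with mean zero. In lme4 formula syntax, a model of this shape would typically be written along the lines of `h_score ~ timepoint + (1 | bss/patient_id)`, where the variable names are hypothetical.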
Rationale for Statistical Analysis Approach
We tested the effects of DTF and TIF on antigen expression detected by IHC using linear and logistic mixed-effects models focusing on two metrics: (1) percentage change in H-scores with increasing DTF or TIF, and (2) probability of an increase or decrease in IHC score intensity with progressive DTF or various TIFs.
Taking both metrics under consideration permits extrapolation of significant differences between DTF and TIF timepoints, and the likelihood of changes in immunostaining intensity due to DTF and TIF. These inferences are crucial, given that for many antigens evaluated in this study there is no standardized immunostaining scoring scheme or a widely accepted threshold for overexpression.
Results
Delay to Fixation
For the 24 antigens evaluated in four tissues (111 kidney, 29 colon, 28 ovarian, and 8 lung tissue specimens) at 2-, 3-, and 12-hr DTF timepoints, there was significant variability in percentage change in H-scores relative to 1-hr DTF controls and the probability of different intensity scores with progressive DTF. Percentage change in H-scores is defined as [100 × (1 − delayed H-score/1-hr fixation H-score)]. Results from statistical analysis, including p values, are provided in Table 4 and Supplementary Tables 4 and 5. We report below the tested antigens in groups based on potential clinical relevance of the antigen and/or statistical significance of the results.
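The percentage-change definition above can be checked with a small worked example. The Python sketch below applies the formula to raw H-scores; the function name and the input values are hypothetical, and the estimates reported in this study come from mixed-effects models rather than from a direct ratio of two scores.

```python
def percent_change(delayed_h, reference_h):
    """Return 100 * (1 - delayed / reference).

    Positive values indicate a reduction relative to the reference;
    negative values indicate an increase.
    """
    if reference_h == 0:
        raise ValueError("reference H-score must be nonzero")
    return 100 * (1 - delayed_h / reference_h)


# Hypothetical H-scores: a drop from 180 to 90, and a rise from 120 to 150.
reduction = percent_change(90, 180)
increase = percent_change(150, 120)
```

With this sign convention, the first call evaluates to a 50% reduction and the second to a 25% increase (a value of -25).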
Table 4. Summary of Percentage Change in H-scores and Probability of Altered Staining Intensity With DTF.
Abbreviations: DTF, delay to fixation; FDR, false discovery rate.
Antigens Affected by DTF With Potential for Clinical Impact
PAX8 (Kidney and Ovary), PAX2 (Kidney), and WT1 (Ovary)
PAX8 is clinically used to differentiate primary lung tumor from metastatic renal cell cancer (PAX8-positive) in cases of occult primary tumors and also to differentiate metastatic breast cancer (PAX8-negative) from metastatic ovarian cancer (PAX8-positive).39–41 We tested the effect of DTF on PAX8 expression in the kidney and ovary. In kidney, the PAX8 H-score decreased by 51% and 93% after a DTF of 3 and 12 hr, respectively (p=0.001) (Fig. 2A, Table 4). These findings translate to an elevated probability of a decline in the PAX8 intensity score in kidney specimens with a DTF of 2, 3, and 12 hr (24%, 39%, and 79% probability of a reduction in intensity score, respectively). Although a similar expression pattern was observed for PAX8 in the ovary, the percentage change in the PAX8 H-score relative to 1-hr DTF controls was not significant at the 2-, 3-, or 12-hr DTF timepoints (Fig. 2A, Table 4). However, a reduction in the percentage of 3+ stained cells over the DTF time course translated to a 40% probability of a decline in PAX8 immunostaining intensity for ovary specimens having a 3- or 12-hr DTF. PAX2, which is structurally similar to PAX8 and has similar clinical utility, 41 also displayed a significant decrease in H-score (p<0.0001) and an elevated probability of a decrease in immunostaining intensity in 2-, 3-, and 12-hr DTF kidney specimens, with a shift in the percentage of 2+ cells (Fig. 2B, Table 4). WT1 is a common tumor suppressor gene that is frequently mutated in Wilms’ tumor and is also an established potent transcription factor (TF). 42 Clinical research suggests that WT1 may be used to distinguish metastatic ovarian tumors (WT1-positive) from breast cancer (WT1-negative). 43 Although the H-score decreased over the DTF time course, these changes were not statistically significant. However, a shift in the percentage of 3+ cells to 2+ cells with progressive DTF resulted in a consistently elevated probability of a decrease in WT1 intensity score with a DTF of ≥2 hr (Fig. 2C, Table 4). Findings were consistent for PAX8 (kidney, ovary) and WT1 (ovary) in replicate phase II experiments (Supplementary Fig. 2, Supplementary Table 4). Our results suggest that DTF affects both the intensity score (Fig. 2A, panel iv for kidney and ovary) and the absolute mean change in H-score (Fig. 2A, panel i for kidney and ovary), that the effect on PAX8 is similar across the tissue types investigated, and that structurally similar antigens, such as PAX2, may be similarly affected by DTF.

Figure 2. Antigens affected by delay to fixation (DTF) with potential for a clinical impact. (A) PAX8 (kidney and ovary), (B) PAX2 (kidney), (C) WT1 (ovary), (D) p53 (kidney, ovary), and (E) ER (ovary). Panels: (i) Immunohistochemistry scores (H-scores, 0–300) from tumor specimens with a DTF of 1, 2, 3, or 12 hr at room temperature were plotted by antigen. Dots represent the mean H-score of individual samples. (ii) Percentage change in H-scores: the bold dot represents the expected percentage change in H-scores; error bars represent the 95% false coverage interval; the gray dotted line depicts no change; red depicts false discovery rate (FDR)-adjusted p<0.05. (iii) Proportion of cells that stained 1+, 2+, or 3+ in a respective tissue by antigen. (iv) Probability of a change (increase [red] or decrease [blue]) in immunostaining intensity (0–3+) relative to the ≤1-hr DTF control in tumor specimens is depicted by a dot for each biomarker and DTF timepoint; the 95% false coverage interval is shown with error bars.
p53 (Kidney and Ovary)
Immunohistochemical detection of p53 expression has widespread clinical application, given that TP53 mutations are the most commonly detected genetic abnormalities in human tumors. 44 Whereas wild-type p53 has a short half-life of 20 min and is often not detectable by IHC, 45 mutant p53 has a longer half-life and is detectable by IHC. We investigated the sensitivity of p53 immunostaining to DTF in ovary and kidney specimens. In the kidney, significant reductions in H-score were observed after a DTF of 3 and 12 hr relative to 1-hr DTF controls (p=0.0147). A shift in the percentage of 2+ to 1+ cells conferred a 21% probability of a decline in p53 staining intensity in 12-hr DTF kidney specimens (Fig. 2D, Table 4). In the ovary, effects of DTF on p53 immunostaining were inconclusive. A DTF of up to 12 hr did not significantly alter p53 immunostaining in phase I experiments, but phase II experiments displayed a significant percentage reduction in H-score at 12-hr DTF (p=0.0014) relative to that of 1-hr DTF controls (Supplementary Fig. 2, Supplementary Table 5). However, the conflicting probability of both an increase and decrease in staining intensity suggests that p53 immunostaining is variable in these ovary specimens.
ER (Ovary)
ER, an estrogen-activated nuclear TF that regulates growth and differentiation, 46 has been extensively studied in breast cancer as a key biomarker for hormonally responsive tumors. ERα expression has been associated with better prognosis in epithelial ovarian cancers in some retrospective meta-analyses. 47 Therefore, we investigated the potential effects of DTF on ER immunostaining in ovarian tumors. Although the maximum percentage change in H-score was an 8.6% decrease (at 12 hr), a complete loss of 3+ staining in 3- and 12-hr DTF specimens conferred an elevated probability of a decline in ER intensity score in ovary specimens (40% and 30%, respectively) (Fig. 2E, Table 4). IHC staining for ER of ovarian specimens during phase II showed similar results (Supplementary Fig. 2, Supplementary Table 5).
Antigens With Statistically Significant DTF Effects but Unknown Clinical Impact
c-Met (Lung and Kidney)
c-Met is the cell surface receptor for the hepatocyte growth factor (HGF), 48 and dysregulation of the c-Met/HGF axis has been implicated in a wide range of cancers, including bladder, renal, thyroid, and lung cancers. c-Met is overexpressed in approximately 40–60% of lung cancer patients, which is much higher than the incidence of genomic amplification, 49 suggesting a mechanism of c-Met overexpression independent of genomic amplification or gain-of-function mutations. Although c-Met is not used routinely in the clinic, meta-analyses of several studies reveal that high c-Met expression can be associated with poor overall survival in renal cell cancer. 50 In this study, c-Met exhibited a progressively decreasing percentage change in H-scores with DTF in both the kidney and the lung, although statistically significant differences relative to 1-hr DTF controls were limited to 12-hr DTF kidney specimens (p=0.048) (Fig. 3A, Table 4). Both kidney and lung specimens had reductions in the percentage of 3+ stained cells during the DTF time course, resulting in an elevated probability of a decrease in the c-Met intensity score with a DTF of 2 hr or longer (kidney, 29–38%; lung, 13–25%) (Fig. 3A, Table 4).

Figure 3. Antigens with significant delay to fixation (DTF) effects but uncertain clinical impact. (A) c-Met (kidney and lung), (B) pAKT (kidney), (C) survivin (kidney), (D) RCC1 (kidney), (E) CD10 (kidney), and (F) EGFR (kidney, colon, lung). Panels: (i) Immunohistochemistry scores (H-scores, 0–300) from tumor specimens with a DTF of 1, 2, 3, or 12 hr at room temperature were plotted by antigen. Dots represent the mean H-score of individual samples connected by bold lines. (ii) Percentage change in H-scores: the bold dot represents the expected percentage change in H-scores; error bars represent the 95% false coverage interval; the gray dotted line depicts no change; red depicts false discovery rate (FDR)-adjusted p<0.05. (iii) Proportion of cells that stained 1+, 2+, or 3+ in a respective tissue by antigen. (iv) Probability of a change (increase [red] or decrease [blue]) in immunostaining intensity (0–3+) relative to the ≤1-hr DTF control in tumor specimens is depicted by a dot for each biomarker and DTF timepoint; the 95% false coverage interval is shown with error bars.
pAKT and Survivin (Kidney)
Phosphoproteins such as pAKT are traditionally difficult to detect by IHC, as they display a staining intensity of less than 1+ in most cases. However, pAKT and survivin are both potential prognostic markers of renal cell carcinoma.51,52 In the kidney cohort, pAKT and survivin exhibited low staining intensity (1+) with no significant evidence of alteration by a DTF of up to 12 hr and a low probability of altered staining intensity for both antigens (Fig. 3B and C). H-scores, however, were significantly decreased with a DTF of 3 hr for pAKT (p=0.048) and after 12 hr for both pAKT (p=0.0036) and survivin (p=0.0048) (Fig. 3B and C, Table 4). The practical and clinical implications of these results are unclear.
EGFR (Colon, Kidney, and Lung), RCC1 (Kidney), and CD10 (Kidney)
EGFR, the epidermal growth factor receptor and a receptor tyrosine kinase, represents an actionable target in lung cancers associated with specific oncogenic driver mutations that show a clinical response to tyrosine kinase inhibitors.53,54 However, for such cases, clinicians rely on next-generation sequencing (NGS) to identify the molecular target, as the measurement of EGFR by IHC has been historically challenging. In cases of colon cancer, however, approximately 80% of tumors exhibit overexpression without underlying amplification or oncogenic mutations. 55 Kidney, colon, and lung specimens were examined for potential DTF-mediated effects on EGFR immunostaining (Fig. 3D, Table 4). In kidney, EGFR H-scores decreased by 21.6% and 23.9% at 2- and 3-hr DTFs (p<0.05), before increasing by 11.5% at 12 hr. In colon specimens, H-score decreased by 27% at 2-hr DTF and increased by 0.9% at 3-hr and by 19.7% at 12-hr DTF. Although these phase I changes in colon were not significant, a significant percentage increase in H-score (88.4%) at the 12-hr DTF was observed during phase II experiments (p=0.0388) (Fig. 3D, Table 4, Supplementary Fig. 2, Supplementary Table 5).
RCC1 and CD10 have shown promise in early studies in identifying renal cell carcinoma subtypes.56,57 CD10 has also been used to differentiate metastatic renal cell carcinoma (CD10-positive) from clear cell ovarian cancer (CD10-negative). When evaluated in the kidney cohort, effects of DTF on RCC1 and CD10 immunostaining were limited to the 12-hr timepoint, with significant percentage increases in H-score relative to 1-hr DTF controls for RCC1 (p<0.0001) and CD10 (p=0.048). An increase in the percentage of 2+ and 3+ stained cells for RCC1 and CD10, respectively, translated to an elevated probability of an increase in staining intensity of 36% for RCC1 and 33% for CD10 (Fig. 3E and F, Table 4). Results of RCC1 staining were similar in phase II experiments (Supplementary Fig. 2, Supplementary Table 5).
Antigens With No Observed DTF Effects
Several categories of antigens had undetectable or minimal percentage changes in H-scores with DTF and negligible probabilities of change in staining intensity.
β-Catenin (Colon), CA125 (Ovary), CK7 (Ovary), and TTF-1 (Lung)
β-catenin, CA125, CK7, and TTF-1 had subtle changes in H-score relative to 1-hr DTF controls, with little probability of change in staining intensity. Immunostaining for these antigens was intense but specific. When translocated to the nucleus via Wnt activation, β-catenin acts as a TF. 58 Therefore, the antibody must be saturated to detect the nuclear translocation of β-catenin by IHC,59,60 as evidenced by 100% of cells with a 3+ staining intensity. The intense immunostaining we observed for CK7, CA125, and TTF-1 may be due to the structural integrity of the antigen and is likely unrelated to DTF (Fig. 4A–D, Table 4).

Figure 4. Antigens with no observed delay to fixation (DTF) effects. (A) β-catenin (colon), (B) TTF-1 (lung), (C) CK7 (ovary), and (D) CA125 (ovary). Panels: (i) Immunohistochemistry scores (H-scores, 0–300) from tumor specimens with a DTF of 1, 2, 3, or 12 hr at room temperature were plotted by antigen. Dots represent the mean H-score of individual samples connected by bold lines. (ii) Percentage change in H-scores: the bold dot represents the expected percentage change in H-scores; error bars represent the 95% false coverage interval; the gray dotted line depicts no change; red depicts false discovery rate (FDR)-adjusted p<0.05. (iii) Proportion of cells that stained 1+, 2+, or 3+ in a respective tissue by antigen. (iv) Probability of a change (increase [red] or decrease [blue]) in immunostaining intensity (0–3+) relative to the ≤1-hr DTF control in tumor specimens is depicted by a dot for each biomarker and DTF timepoint; the 95% false coverage interval is shown with error bars.
Antigens With Non-significant Effects of DTF
Many of the antigens examined by IHC did not exhibit a significant change in H-score after a DTF of ≤12 hr or noticeable changes in staining intensity. Risks of a change in intensity for these antigens remained low or displayed wide 95% false coverage intervals (Supplementary Fig. 3, Supplementary Table 4). Antigens that were not affected significantly by a DTF of 12 hr included MUC1, CD44, and CAIX in kidney; CK20, p16, CDX2, MSH6, and COX2 in colon; MUC1 and p16 in ovary; and COX2 and Napsin-A in lung. Napsin-A IHC expression was seen in the cytoplasm and nucleus. In our experiments, we measured both the nuclear and cytoplasmic expression values by IHC, and neither changed significantly during DTF (Supplementary Fig. 3I).
Time in Fixative
To determine whether the duration of formalin fixation affects IHC of clinically relevant antigens, kidney specimens with a 1-hr DTF and a TIF of 6, 12, 23, or 72 hr were examined. A total of 13 antigens were evaluated by IHC in 60 kidney specimens, and statistical results, including p values, are summarized in Table 5. Represented as a percentage change relative to 12-hr TIF controls, significant reductions of 23% and 17% were observed for c-Met and CD44 H-scores, respectively (both p<0.0001), after a TIF of 72 hr (Fig. 5A and B, Table 5). A 72-hr TIF also resulted in a 51% and 26% probability of a reduction in staining intensity for c-Met and CD44, respectively (Fig. 5A and B, Table 5). This elevated risk was supported by an observable shift in the percentage of cells with a 3+ and/or 2+ staining intensity at 6- and 12-hr TIFs to a 1+ staining intensity at the 72-hr TIF for both c-Met and CD44. RCC1 displayed consistent estimated effects in IHC scores of kidney specimens during phase I and II experiments, although statistically significant percentage changes in H-scores were only observed in phase II kidney specimens (p<0.05) (Supplementary Figs. 4J and 5, Supplementary Table 6). The remaining antigens demonstrated non-significant changes in H-score and either a low probability of change in staining intensity or wide 95% false coverage intervals for shorter and longer TIFs compared with controls fixed for 12 hr (Supplementary Fig. 4A–K, Supplementary Fig. 5, Table 5). Interestingly, antigens that showed a statistically significant decrease with higher DTF, such as PAX8, PAX2, and p53, were unaffected by the TIFs examined (Table 5).
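The two staining metrics reported above can be illustrated with a short sketch. It assumes the conventional H-score definition (the percentage of cells staining 1+, 2+, and 3+, weighted by 1, 2, and 3, giving a range of 0–300) and a simple per-specimen percentage change relative to a case-matched control. The function names and the input percentages are hypothetical and are not taken from the study data; the percentage changes reported in Table 5 are model-based estimates, which this sketch does not reproduce.

```python
def h_score(pct_1, pct_2, pct_3):
    """H-score (0-300) from the percentage of cells staining 1+, 2+, and 3+."""
    if min(pct_1, pct_2, pct_3) < 0 or pct_1 + pct_2 + pct_3 > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_1 + 2 * pct_2 + 3 * pct_3


def percent_change(test, control):
    """Percentage change of a test H-score relative to its matched control."""
    if control == 0:
        raise ValueError("control H-score must be non-zero")
    return 100.0 * (test - control) / control


# Hypothetical specimen: staining shifts from 2+/3+ toward 1+ after prolonged fixation.
control_12hr = h_score(20, 40, 30)   # illustrative 12-hr TIF control
test_72hr = h_score(50, 30, 10)      # illustrative 72-hr TIF aliquot
print(control_12hr, test_72hr, round(percent_change(test_72hr, control_12hr), 1))
# prints: 190 140 -26.3
```

In this illustration the share of stained cells is unchanged (90% in both aliquots), yet the shift from 3+ and 2+ toward 1+ lowers the H-score by about a quarter, which is why the intensity distribution and the H-score are reported side by side.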
Summary of Percentage Change in H-scores and Probability of Altered Intensity With TIF.
Abbreviations: TIF, time in fixative; FDR, false discovery rate.

Effects of time in fixative (TIF) on immunohistochemistry scores. (A) c-Met (kidney) and (B) CD44 (kidney). Panels: (i) Immunohistochemistry scores (H-scores, 0–300) in tumor specimens with a TIF of 6, 12, 23, or 72 hr at room temperature were plotted by antigen. Dots represent the mean H-score of individual samples connected by bold lines. (ii) Percentage change in H-scores: the bold dot represents the expected percentage change in H-scores; error bars represent the 95% false coverage interval; the gray dotted line depicts no change; red depicts false discovery rate (FDR)-adjusted p<0.05. (iii) Proportion of cells that stained 1+,2+, or 3+ in a respective tissue by antigen. (iv) Probability of a change (increase [red] or decrease [blue]) in immunostaining intensity (0–3+) relative to the 12-hr TIF control among tumor specimens is depicted by a dot for each biomarker and TIF timepoint; the 95% false coverage interval is shown with error bars.
Discussion
The BPV Program was conducted at four U.S. medical institutions using a set of common SOPs, a common remote data entry system, and the same tissue processing instrumentation across institutions to control, as much as possible, preanalytical factors from post-excision to fixation and from fixation to processing. BPV specifically defined DTF as cold ischemia, that is, time of removal from the body to placement in formalin. Thus, DTF here includes all preanalytical factors that may be encountered after warm ischemia and before fixation. Tissues for this study were represented by tissue aliquots with known DTF and TIF. These aliquots were from the same tissue block that was obtained from a single resection. All IHC analyses were performed by a single commercial clinical pathology laboratory to minimize assay variability.
Systematic analysis of the BPV colon, kidney, lung, and ovarian tumor specimens revealed significant DTF and TIF effects on IHC staining of several antigens evaluated, which have been identified as potential therapeutic targets and markers of cancer prognosis or diagnosis. The most remarkable effect observed was a progressive reduction in H-score and/or staining intensity with DTF of ≥2 hr or prolonged formalin fixation (72-hr TIF).
Of the 24 antigens evaluated for DTF sensitivity, 17 antigens remained stable during a DTF ≤3 hr; however, significant and potentially relevant alterations in six other antigens were observed after a DTF of 2 or 3 hr relative to controls (1-hr DTF). Data presented here point to five antigens that could potentially impact an aspect of clinical care: PAX8, PAX2, WT1, ER, and p53. These DTF-associated changes ranged in magnitude from a 20% decline in PAX2 H-score in kidney after a 2-hr DTF to a 50% reduction in PAX8 H-score in kidney and a complete loss of 3+ ER staining in ovary after a 3-hr DTF. Considering the H-scores of controls and the percentage reductions observed after DTF, staining metrics of these antigens straddled IHC thresholds used for clinical trial eligibility, prognosis, and determining primary vs metastatic tumor origin (Supplementary Table 7). For example, a 40% decline in p53 H-score after a 3-hr DTF could result in a lower percentage of p53 immunopositive cells than the >10% threshold specified for use of p53 as a prognostic biomarker of renal cell carcinoma.61,62 PAX8 has shown promise in distinguishing renal tumors of primary and metastatic origin alone 63 or in combination with PAX2, 41 and H-scores for PAX8 and PAX2 observed during the DTF time course lie on the spectrum of those identified for differentiating primary and metastatic tumors. PAX8 is also a promising prognostic marker for ovarian cancer with a threshold of 3+ staining in ≥51% cells. 64 Although the percentage decrease in H-score during the DTF time course was not significant, we observed a reduction in the percentage of 3+ PAX8-stained cells in ovary specimens. 
Furthermore, although different immunoscoring schemes preclude direct comparisons, the reduction in proportion of 3+ WT1 immunopositive cells we observed after a DTF ≥2 hr may straddle the eligibility criteria (an immunoreactive score [IRS] of 4–12) for a phase I clinical trial on a WT1 vaccine to prevent ovarian cancer recurrence (NCT02737787).
pAKT, listed as an antigen with a DTF effect of unknown clinical impact, displayed a significant reduction in H-score of 57% after a DTF of 3 hr. In fact, the largest percentage decrease in H-score relative to controls occurred in pAKT (a 121% decline) in 12-hr DTF kidney specimens. The instability of phosphoproteins during ischemia is well known, and declines in pAKT levels during cold ischemia have been previously reported.18,65,66 Although four other antigens (c-Met, survivin, RCC1, and CD10) are promising biomarkers in cancer research, a significant percentage decrease in H-score or an elevated probability of a change in intensity score was limited to the 12-hr DTF timepoint, and it is unclear whether the magnitude of change we observed may carry clinical relevance. Notably, both RCC1 and CD10 displayed a significant percentage increase in H-score after a 12-hr DTF compared with 1-hr DTF controls, although they differed from one another in the magnitude of effect. An exact mechanism of action in response to DTF remains unknown for these antigens in our study, but increases in both mRNA and protein levels with progressive cold ischemia have been reported previously for other biomarkers.33,67
Several antigens were evaluated for potential DTF effects in more than one tumor tissue type. Although tissues differed in baseline H-scores, similar patterns of change were observed during the DTF time course. Nonetheless, statistically significant changes in H-score were limited to a single tissue (kidney), which for some antigens may be attributable to tissue-specific differences in the magnitude of DTF effects. For example, although robust reductions in H-score were noted for PAX8 in 3-hr DTF kidney specimens, minimal declines were observed in ovarian specimens (51% vs 14%). With a few exceptions, the type of DTF effect, a significant decline in H-score, was consistent across biomarkers and tissue types, suggesting peptide/protein degradation as a probable mechanism rather than an induced ischemic response. Tissue-specific differences in ischemic response and degradation have been reported in the literature, and their existence is well accepted due to baseline differences in metabolic rates and proteolytic activity between tissues. 68 One exception to the pattern of declining H-score with progressive DTF was EGFR. In each tissue examined (kidney, colon, lung), EGFR displayed a reduction in H-score at a DTF of 2 hr before rebounding to a higher H-score than observed in the 1-hr DTF control. Although the mechanism of action remains unclear, others have reported a subtle increase in EGFR protein over a shorter DTF time course of 0–45 min. 18 c-Met (kidney, lung) and PAX8 (kidney and ovary) also displayed similar patterns of the DTF effect across tissue types examined; therefore, we cannot exclude the possibility of a DTF effect in these tissues given the comparatively smaller sample sizes of lung (8 cases), ovary (28 cases), and colon (29 cases) specimens compared with kidney specimens (111 cases) and the higher levels of variability observed in some tissues.
Of the 14 biomarkers evaluated in kidney for potential TIF effects, two (c-Met and CD44) displayed significant reductions in H-score relative to controls after a TIF of 72 hr that also corresponded to an elevated probability of a decline in c-Met immunostaining intensity. Our findings of biomarker-specific susceptibility to prolonged TIF are in agreement with previous studies.22,69,70 Our results are also consistent with studies reporting declines in immunostaining for other antigens after 72 hr of formalin fixation in different tissue types.71,72 The mechanism of a TIF-mediated reduction in immunostaining is likely the result of formalin-induced crosslinking or a conformational change that precludes antibody–antigen binding. Optimization of antigen retrieval techniques alone73–76 or in conjunction with antibody titrations 77 can mitigate the effects of overfixation to a large extent. However, the minimum time in formalin that is required for effective fixation is less clear. Although a TIF of 24 hr is generally recommended,5,78 minimum TIF depends on both the size of the tissue specimen and time required for chemical fixation. Results presented here and those published previously agree that formalin fixation of small tissue segments (0.33 cm³) for 6 hr produces sufficient IHC staining.22,79,80 Notably, Chu et al. 79 also observed a small reduction in the intensity of immunostaining in specimens fixed for 22 hr compared with case-matched controls fixed for 6 hr, although the mechanism of this difference remains unclear. Taken together, these findings highlight the importance of standardizing tissue fixation protocols for the size of the tissue specimen and the antigen of interest, particularly when clinical applications depend on IHC results.
For example, our results demonstrated that c-Met, a tyrosine kinase involved in tumor growth and metastasis in renal cell carcinoma, 81 was adversely affected by prolonged TIF; IHC staining of c-Met can be used as a marker of acquired resistance to EGFR inhibitors.82,83
The BPV Program was designed to mirror clinical scenarios as much as possible by focusing on common tumor types, established and promising cancer biomarkers, and assays currently used in clinical and research laboratories. Furthermore, the DTF and TIF timepoints examined in this study reflect delays and durations that are plausible in a clinical setting, and all IHC analyses were conducted in a CLIA/CAP-accredited laboratory. Supplementary Table 7 shows results in the context of cited clinical research. However, the IHC staining metrics presented here (percentage change in H-score and the probability of an increase or decrease in immunostaining intensity) cannot provide specific H-score thresholds for acceptable DTF- or TIF-induced effects because the clinical significance of such changes will largely rest with the antigen/antibody evaluated and the immunostaining score of an individual specimen, whether scored by H-score, Allred score, IRS score, or percentage of positively stained cells.
As previously noted, studies under the BPV Program were carefully designed to minimize sources of variability, and potentially confounding factors were examined in detail when possible. Two specific preanalytical factors, FFPE storage duration and batch effect, were examined for potential effects after additional biomarkers and an additional tissue type were included in an analysis 1 year after the first. By running DTF and TIF experimental tissue segments alongside case-matched controls and presenting differences in immunostaining as a percentage change relative to controls, we were able to minimize the impact of common differences encountered between IHC runs, such as differences in lot numbers of primary antibody and detection kits. When sections from the same FFPE blocks were evaluated 1 year later for 9 antigens (EGFR, p53, RCC1, CDX2, CK20, MSH6, ER, PAX8, WT1), significant differences were limited to EGFR immunostaining in 2-hr DTF colon specimens, and the change in H-score was moderate (∆H-score = 12.06) (Supplementary Figs. 2 and 5). This suggests that FFPE block storage of up to 1 year and associated batch effects were not confounding variables for IHC for most of the specimens and antigens investigated. Others have reported comparable immunostaining among archival FFPE blocks stored for 10 years84,85 or longer relative to results obtained immediately after processing when IHC analysis was performed concurrently. We also examined whether DTF and/or TIF effects might be driven by the intracellular location of antigens targeted. Although biomarkers significantly affected by DTF in kidney specimens included nuclear proteins, effects were not exclusive to nuclear proteins, and not all the nuclear antigens evaluated were significantly affected. Similarly, TIF-affected biomarkers were not restricted to one intracellular location but included peptides/proteins localized to the cell membrane and cytoplasm.
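The rationale for expressing differences as a percentage change relative to case-matched controls stained in the same run can be illustrated with a minimal sketch. It assumes, purely for illustration, that a run-level difference (for example, a change in antibody lot) scales every H-score in that run by a common factor; the function names, the scaling factors, and the H-scores are hypothetical, and an additive shift between runs would not cancel in the same way.

```python
def percent_change(test, control):
    """Percentage change of a test H-score relative to its matched control."""
    return 100.0 * (test - control) / control


def observed_scores(true_control, true_test, run_factor):
    """H-scores as observed in a run that scales all staining by run_factor.

    The multiplicative run effect is an illustrative assumption, not a
    property established by the study.
    """
    return true_control * run_factor, true_test * run_factor


# The same hypothetical case stained in three runs with different overall sensitivity.
for run_factor in (0.8, 1.0, 1.25):
    control, test = observed_scores(200.0, 150.0, run_factor)
    raw_difference = test - control
    relative = percent_change(test, control)
    print(run_factor, raw_difference, relative)
```

Under this assumption the raw difference in H-score varies from run to run, whereas the percentage change relative to the same-run control stays constant, which is the behavior the case-matched design relies on.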
Studies conducted under the BPV Program reflect an invested effort to identify effects of discrete preanalytical factors while minimizing potentially confounding variability. Results from previous studies coupled with data presented from the BPV Program support limiting DTF to 1 hr or less when possible and keeping TIF within 12–24 hr. These results also highlight the critical importance of recording specific DTF times as well as the time a specimen is placed in and removed from formalin, as recommended for breast tumor specimens. Longer DTFs (≤3 hr) and TIFs (<72 hr) may yield acceptable immunohistochemical results but may require optimization experiments for the type and size of tissue and the biomarker of interest. Tissue microarrays containing a specimen set exposed to a panel of DTFs and/or TIFs would be an ideal methodological control for assessing both acceptable thresholds and antibody titrations. However, DTF should be extended with extreme caution, as it may cause irreparable antigen degradation. Conversely, conformational changes and crosslinking caused by overfixation may be mitigated to some extent by analytical optimization.
Similar recommendations to limit DTF and TIF are upheld by BPV and other reported data for multiple downstream analyses of FFPE tissues: DNA quality, 34 NGS, 86 RNA quality, 34 quantitative RT-PCR, 33 microarray, 87 and RNA sequencing. 35
Remnant biospecimens from the BPV research program, including tissue microarrays, are available for further research; for additional information, please see https://pbc.vai.org/bpv/. Additional data generated from the BPV study are available for further analysis in the database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001304.v1.p1) under controlled access (study accession phs001304).
Supplemental Material
Supplemental material, sj-pdf-1-jhc-10.1369_0022155421995600 for Impact of Preanalytical Factors on the Measurement of Tumor Tissue Biomarkers Using Immunohistochemistry by Aditi Bagchi, Zachary Madaj, Kelly B. Engel, Ping Guan, Daniel C. Rohrer, Dana R. Valley, Emily Wolfrum, Kristin Feenstra, Nancy Roche, Galen Hostetter, Helen M. Moore and Scott D. Jewell in Journal of Histochemistry & Cytochemistry
Footnotes
Acknowledgements
We thank the research participants for their generous donation of biospecimens, which made this study possible. We thank the personnel in the Pathology and Biorepository Core of the Van Andel Research Institute who managed the Biospecimen Core Resource for the Biospecimen Preanalytical Variables project. We also acknowledge the VARI Bioinformatics and Biostatistics Core for their expert review and data analysis. In addition, we thank Mitchell Gail and Michael Sachs of National Cancer Institute (NCI) and Mary Barcus of Leidos Biomedical Research, Inc. We also thank the following current and former members of the NCI team for their contributions: Latarsha Carithers, Hana Odeh, Merlyn Rodrigues, and Philip Branton. We thank the following current and former members of the Leidos Biomedical Research, Inc. team for their contributions: Rachana Agarwal, Leslie Sobin, Conrado Soria, and Jasmin Bavarva. We also thank additional members of the NCI Biospecimen Preanalytical Variables Research Program team, led at the University of New Mexico by Therese Bocklage, at Emory University by Gabriel Sica, at the University of Pittsburgh by Rajiv Dhir, and at Boston Medical Center by Christopher Andry.
Competing Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
All authors have contributed to the article as follows: AB organized the writing, revisions, and design of the manuscript; conducted image analysis of cases; refined graphs and tables for the manuscript; and contributed to statistical analysis and checking all technical details of the manuscript. ZM provided biostatistical analysis of data, developed graphs and tables, and contributed significantly to interpreting results and writing the manuscript. KE performed a literature review on biospecimen science on the fixation of tissues via the BBRB, assisted significantly in writing the manuscript, and helped edit and revise the manuscript. PG reviewed methods and procedures of the tissue procurement sites and clinical laboratory that performed the IHC. She validated several methodological facts and contributed to the overall design and collection of standard operating procedures (SOPs) of the biospecimen procurement sites. She also helped write the Methods and Discussion sections of the manuscript. DR managed operations of the VARI Comprehensive Biospecimen Resource (CBR) and contributed to the management (specimen kits, training, shipments, and receipts) of biospecimens and development of the Methods section of the manuscript. DV contributed to the management of biospecimens (specimen kits, training, shipments, and receipts); quality management of the VARI CBR, including SOPs and documentation of the specimens used in IHC; and development of the Methods section of the manuscript. EW provided biostatistical analysis for image analysis data. NR managed the team of personnel involved in the management of collection sites and development of SOPs, and contributed to writing and revisions of the manuscript. GH as the CBR pathologist provided primary review and marking of tissue regions for IHC image analysis and provided considerable help in writing the manuscript and interpreting the meaning of IHC results in research and/or clinical settings.
HM was significantly involved in the study design of this work; oversaw the teams leading the collection, management, and analysis of tissue samples; and contributed to writing the manuscript, edits, and revisions. SJ was Principal Investigator of the CBR and was charged with the overall operations and management of biospecimens, defined the methodological design for image analysis of IHC slides, provided review and defined regions in tissues to be analyzed by imaging analysis, and contributed to writing, results interpretation, and revisions of the manuscript.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project has been supported in whole or in part by federal funds from the National Cancer Institute, National Institutes of Health, and Leidos Biomedical Research, Inc., contract number 10XS1035. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services.
References
