Automated Selection of DAB-labeled Tissue for Immunohistochemical Quantification

Abstract

The increased use of immunohistochemistry (IHC) in both clinical and basic research settings has led to the development of techniques for acquiring quantitative information from immunostains. Staining correlates with absolute protein levels and has been investigated as a clinical tool for patient diagnosis and prognosis. For these reasons, automated imaging methods have been developed in an attempt to standardize IHC analysis. We propose a novel imaging technique in which brightfield images of diaminobenzidine (DAB)-labeled antigens are converted to normalized blue images, allowing automated identification of positively stained tissue. A statistical analysis compared our method with seven previously published imaging techniques by measuring each one's agreement with manual analysis by two observers. Eighteen DAB-stained images showing a range of protein levels were used. Accuracy was assessed by calculating the percentage of pixels misclassified using each technique compared with a manual standard. Bland-Altman analysis was then used to show the extent to which misclassification affected staining quantification. Many of the techniques were inconsistent in classifying DAB staining due to background interference, but our method was statistically the most accurate and consistent across all staining levels.

Keywords

image analysis; immunohistochemistry; growth factors; diaminobenzidine; normalized blue

The increased use of immunohistochemistry (IHC) in both clinical and basic research settings has led to the development of techniques for acquiring quantitative information from immunostaining levels (Huang et al. 1996; Lehr et al. 1997; Matkowskyj et al. 2000; Seidal et al. 2001). A correlation between IHC staining and protein levels has been shown using accepted measurement methods, including Western blotting analysis (Venter et al. 1987; Podhajsky et al. 1997; Dias et al. 2000) and enzyme immunoassays (Aasmundstad et al. 1992; Lehr et al. 1997; Bhatnagar et al. 1999; Simone et al. 2000). In addition, quantitative IHC techniques have often yielded clinically important information regarding patient diagnosis, prognosis, or both (Seidal et al. 2001). However, these methods may produce conflicting results when performed in different laboratories because of differences in staining and analysis protocols (Seidal et al. 2001). Acceptance of protocol and analysis standards is essential for the comparison of results across laboratories (Allred et al. 1998; Grzybicki and Moore 1999; Dowsett et al. 2000; Braun and Harbeck 2001; Ermert et al. 2001; Hanna 2001; O'Leary 2001; Schnitt 2001; Williams et al. 2001).

Computer-assisted analysis of immunostains has been shown to lessen the variation in analysis of staining levels (Seidal et al. 2001). Advanced digital image processing systems have become more readily available and easier to use and could become an important tool for IHC analysis. A variety of computer-assisted methods have been developed to aid in the analysis of digital images of immunostains (Goto et al. 1992; Huang et al. 1996; Montironi et al. 1996; Kohlberger et al. 1999; Lehr et al. 1997, 2001; Ruifrok 1997; Smejkal and Shainoff 1997; Ma and Lozanoff 1998; Vilaplana and Lavaille 1999; Matkowskyj et al. 2000; Ruifrok and Johnston 2001; Underwood et al. 2001; King et al. 2002; McGinley et al. 2002).

Horseradish peroxidase (HRP) is an enzyme marker frequently used for IHC staining. Diaminobenzidine (DAB), followed by a hematoxylin counterstain, can be used to identify the HRP-labeled antigen. DAB reacts with HRP to give a brown coloration, and hematoxylin stains the background tissue blue. Visually, an observer can differentiate between the two colors when the antigen staining is high. However, low levels of stained antigen are difficult to separate from areas of dark counterstain. This difficulty leads to variations in the results of manually analyzed staining patterns. Imaging methods that reduce dependence on observer analysis would decrease the time of analysis and lessen the variation in results among different laboratories.

In the development of image-processing methods for the delineation of DAB-stained tissue from background hematoxylin, conventional techniques are hampered by high levels of nonspecific selection (Ruifrok 1997; King et al. 2002). Various investigators have developed more advanced imaging methods using both interactive (Lehr et al. 1997; Vilaplana and Lavaille 1999; Underwood et al. 2001; King et al. 2002; McGinley et al. 2002) and automated imaging techniques (Goto et al. 1992; Montironi et al. 1996; Kohlberger et al. 1999; Ruifrok 1997; Smejkal and Shainoff 1997; Ma and Lozanoff 1998; Ruifrok and Johnston 2001). Fully automated methods would allow rapid analysis of IHC staining with limited observer bias, but none of the previous methods has shown the statistical accuracy required of an analysis standard.

We have developed a method of image analysis that permits automated selection of positively stained tissue areas using digital images of DAB-labeled immunostains. Brightfield (24-bit) images of DAB-labeled stains are converted to 8-bit normalized blue images (BN), which accurately delineate stained tissue from background counterstaining. This method is simple and can be performed with most conventional image-processing software programs. Here we describe our method and seven other published techniques. A statistical analysis is presented comparing the methods on the basis of their agreement with manual selection of stained tissue.

Materials and Methods

Animal Model and Surgery

New Zealand White rabbits have been used as an animal model to study growth factor localization in a healing tooth extraction socket (Lalani et al. in press). Slides from this study were used to evaluate the imaging techniques. Four incisor teeth (two maxillary and two mandibular) were extracted atraumatically from each animal using dental elevators and general anesthesia. The resultant tooth extraction sockets were allowed to heal for selected time frames (for this study, all samples were taken after 2 weeks of healing), at which time the animals were sacrificed. The jaw bone containing the tooth extraction sockets was harvested from each rabbit and processed for IHC analysis (Bourque et al. 1993; Mori et al. 1988). Animal surgery, postoperative care, and sacrifice were carried out per National Institutes of Health guidelines and with the approval of The University of Texas Health Science Center at Houston Animal Welfare Committee (HSC-AWC-99–107).

Immunohistochemistry

The harvested tissue was fixed with paraformaldehyde-lysine-phosphate fixative (McLean and Nakane 1974), decalcified using ethylenediaminetetraacetic acid-glycerol (Bourque et al. 1993; Mori et al. 1988) for 5 weeks, washed to remove the glycerol, and frozen in OCT embedding medium (Tissue-Tek; Torrance, CA). Sections were cut 10-μm thick on a cryostat and placed on 3-aminopropyltriethoxysilane-coated slides.

IHC staining was performed using the avidin-biotin complex (ABC) technique for detection of antibodies to the following growth factors: vascular endothelial growth factor (VEGF), bone morphogenetic protein type 2 (BMP-2), and fibroblast growth factor type 2 (FGF-2). These growth factors were selected because after 2 weeks of healing their levels in the tooth socket are expected to differ from one another. Slides were incubated in 3% H₂O₂ in methanol for 1 hr to quench endogenous peroxidase activity. The sections were washed twice with PBS for 5 min each and then incubated with normal blocking serum for 1 hr. This was followed by incubation with goat anti-human (BMP-2 and FGF-2; Santa Cruz Biotechnology, Santa Cruz, CA) and mouse anti-human (VEGF; Upstate Biotechnology, Lake Placid, NY) primary antibodies overnight at 4°C in a humidity chamber. Sections were then washed with PBS, again twice for 5 min each, and incubated with biotinylated rabbit anti-goat (BMP-2 and FGF-2) or horse anti-mouse (VEGF) secondary antibody (Vectastain ABC Elite kit; Vector Laboratories, Burlingame, CA) for 1 hr. Sections were then treated with ABC solution (Vectastain ABC Elite kit) for 1 hr, washed with PBS, and incubated with DAB substrate for 10 min. Counterstaining was carried out with Harris hematoxylin (Sigma-Aldrich; St Louis, MO). Controls were routinely included.

An additional image (a generous gift from Guido Sclabas, MD) is presented to show the applicability of this technique to a cytoplasmic DAB stain. The image is a section of human pancreatic cancer stained with goat anti-human Trk B (Santa Cruz Biotechnology), an antibody that stains the cytoplasmic domain of the Trk B receptor. The secondary antibody used was HRP-conjugated anti-goat (DAKO; Carpinteria, CA), and the image was acquired on a Nikon ECLIPSE E400 microscope using a Sensys digital camera.

Imaging

Brightfield images were captured from immunostains of the healing socket using an Olympus IX-50 inverted microscope (Olympus; Melville, NY) with a ×20 objective lens (NA 0.40) and an Olympus 750 color CCD camera. IPLab 3.5 image analysis software (Scanalytics; Fairfax, VA) allowed computer control of image acquisition and all image processing. Four images were acquired from a randomly selected location in each slide, white-balanced, and averaged to minimize noise. No further image processing was performed before image analysis. The resulting images were 637 × 480 pixels, with a pixel resolution of 0.61 μm. Six slides were analyzed per growth factor, resulting in 18 total images analyzed in the study.

Manual Selection

Manual selection is currently accepted as the most accurate technique for classifying an area as stained positively for DAB chromogen. To allow for variation between observers, images of DAB-stained tissue were examined and positively stained areas were selected by two independent observers (EMB and ZL) using the “paintbrush” and “region of interest” tools in IPLab. Selection was repeated on 3 different days to allow for intraobserver variation.

Automated Techniques

In conventional digital imaging, brightfield images are a composite of three 8-bit monochromatic channels (red, green, and blue), resulting in a 24-bit color image. The imaging techniques discussed below are based on converting these images into a format that allows maximal separation of DAB-stained pixels from background tissue (i.e., largest dynamic range). Imaging software can then be used to set thresholds for automated DAB selection and quantitation of staining parameters (e.g., percentage of area stained, intensity, integrated optical density).

The red, green, and blue channels of brightfield images can be normalized by the sum of the three channels. Pixel values of 8-bit BN images are calculated using the following equation:

$$BN = \frac{255 * \mathrm{Blue}}{\mathrm{Red} + \mathrm{Green} + \mathrm{Blue}} \qquad (1)$$

where Red, Green, and Blue are the pixel values of the three monochromatic channels, and * indicates multiplication.
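As a minimal sketch of Equation 1 (not the IPLab script used in this study), the conversion can be written in plain Python. The example pixel values, the handling of pure black pixels, the threshold of 70, and the assumption that DAB-stained pixels fall below the threshold are illustrative choices, not values taken from the study.

```python
def normalized_blue(red, green, blue):
    """Equation 1 for a single pixel, rounded to an 8-bit value."""
    total = red + green + blue
    if total == 0:
        # Assumption: a pure black pixel has no defined ratio, so return the
        # neutral value 85 (255 / 3) instead of dividing by zero.
        return 85
    return round(255 * blue / total)


def bn_image(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of BN values."""
    return [[normalized_blue(*pixel) for pixel in row] for row in rgb_image]


# Hypothetical pixels: white background, brownish DAB, bluish nucleus.
example_row = [(255, 255, 255), (150, 100, 60), (70, 80, 170)]
bn_row = bn_image([example_row])[0]

# Hypothetical threshold: brown pixels have a low blue fraction, so values
# below the threshold are treated as DAB-stained in this sketch.
HYPOTHETICAL_THRESHOLD = 70
dab_mask = [value < HYPOTHETICAL_THRESHOLD for value in bn_row]
print(bn_row, dab_mask)
```

In this toy example the brownish pixel receives a lower BN value than either the white background or the bluish nucleus, which is the separation a single threshold relies on.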

We analyzed our images by converting them to BN images for automatic classification of tissue stained for the DAB chromogen and compared our results to seven other published automated selection methods: blue channel (Blue) (King et al. 2002), green filtered (Green) (Smejkal and Shainoff 1997), Hue (Ma and Lozanoff 1998), Hue-Saturation-Intensity (HSI) (Goto et al. 1992; Kohlberger et al. 1999), Green divided by Blue (G/B) (Montironi et al. 1996), Brown (Ruifrok 1997), and Color Deconvolved (CD) (Ruifrok and Johnston 2001).

Selection of DAB-stained pixels by setting a threshold on the blue channel values is the best option of the RGB monochromatic channels, but background interference can be considerable (Kuyatt et al. 1993; Ruifrok 1997; King et al. 2002).

A green filter in the light path enhances the sensitivity of Western blotting analysis to low DAB staining levels and decreases the inclusion of background staining (Smejkal and Shainoff 1997). The green channel of the filtered brightfield image was extracted and used to set thresholds and identify DAB-stained pixels.

The HSI coordinate system is intended to more closely correspond with the way humans perceive color (Goto et al. 1992; Castleman 1996; Ma and Lozanoff 1998; van der Laak et al. 2000), and conversion of RGB images to the three 8-bit HSI images is an option available on most modern imaging systems (including IPLab). Hue is a measurement of the dominant wavelength (van der Laak et al. 2000) or color quality (Ma and Lozanoff 1998). Hue images may provide a more useful means for assessing color change and have been suggested to enhance the separation of DAB and hematoxylin staining (Ma and Lozanoff 1998). There are many mathematical techniques for calculating the image Hue from RGB images (Goto et al. 1992; Castleman 1996; Ma and Lozanoff 1998; van der Laak et al. 2000), all of which give similar results (unpublished data).

Although the Hue image has been used alone, other investigators have selected DAB-stained tissue using combined thresholding along all HSI channels (Goto et al. 1992; Kohlberger et al. 1999). For one analysis, we set a threshold for the Hue image only, and in another we classified DAB-stained pixels by combining the thresholds of all three HSI images, which allowed maximal selection of the DAB-labeled tissue.
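The difference between a Hue-only threshold and combined HSI thresholds can be sketched as follows. This is an illustration under stated assumptions: the hexcone hue from Python's standard `colorsys` module stands in for whichever Hue formula an imaging system uses, saturation and intensity follow the common HSI definitions, and the threshold ranges and pixel values are hypothetical.

```python
import colorsys


def hsi_8bit(red, green, blue):
    """Return (hue, saturation, intensity), each scaled to 0-255."""
    r, g, b = red / 255, green / 255, blue / 255
    # Assumption: the hexcone hue from colorsys stands in for the HSI hue.
    hue = colorsys.rgb_to_hsv(r, g, b)[0]
    intensity = (r + g + b) / 3
    saturation = 0.0 if intensity == 0 else 1 - min(r, g, b) / intensity
    return tuple(round(255 * v) for v in (hue, saturation, intensity))


def passes_all(hsi, ranges):
    """True when every channel lies inside its (low, high) range."""
    return all(low <= value <= high for value, (low, high) in zip(hsi, ranges))


# Hypothetical (low, high) ranges for hue, saturation, and intensity.
HYPOTHETICAL_RANGES = ((0, 40), (50, 255), (0, 200))

pixels = {
    "background": (255, 255, 255),
    "dab": (150, 100, 60),
    "nucleus": (70, 80, 170),
}
for name, rgb in pixels.items():
    hsi = hsi_8bit(*rgb)
    hue_low, hue_high = HYPOTHETICAL_RANGES[0]
    hue_only = hue_low <= hsi[0] <= hue_high
    print(name, hsi, hue_only, passes_all(hsi, HYPOTHETICAL_RANGES))
```

In this toy case the white background pixel has a hue of 0 and passes the Hue-only test, whereas the combined test rejects it on saturation and intensity.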

Brown, G/B, and CD are mathematical translations of RGB information obtained from brightfield images. Brown images are proposed to allow maximal separation of DAB and hematoxylin (Ruifrok 1997). Pixel values for Brown images are calculated by the following equation:

$$\mathrm{Brown} = \mathrm{Blue} - 0.3 * (\mathrm{Red} + \mathrm{Green}) \qquad (2)$$

G/B images have been used in the study of DAB-stained endothelial cells, and the technique is said to allow clear distinction between DAB-stained area and background hematoxylin (Montironi et al. 1996). Quite simply, G/B images are calculated by dividing the green channel of the brightfield image by the blue channel:

$$\frac{G}{B} = \frac{\mathrm{Green}}{\mathrm{Blue}} \qquad (3)$$
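Equations 2 and 3 are simple per-pixel transforms, sketched below in Python. The example pixel values are hypothetical, the raw results are left unscaled (any mapping to 8-bit values is not specified here), and returning infinity for a zero blue channel is an assumption of this sketch.

```python
def brown_value(red, green, blue):
    """Equation 2: Brown = Blue - 0.3 * (Red + Green), left unscaled."""
    return blue - 0.3 * (red + green)


def green_over_blue(green, blue):
    """Equation 3: Green / Blue."""
    if blue == 0:
        # Assumption: report an unbounded ratio rather than raising an error.
        return float("inf")
    return green / blue


# Hypothetical pixels: white background, brownish DAB, bluish nucleus.
for name, (r, g, b) in {
    "background": (255, 255, 255),
    "dab": (150, 100, 60),
    "nucleus": (70, 80, 170),
}.items():
    print(name, round(brown_value(r, g, b), 2), round(green_over_blue(g, b), 2))
```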

CD was developed through deconvolution mathematics to determine the relative contributions of each of the RGB color channels to DAB staining (Ruifrok and Johnston 2001). Parameters are given in the original article for calculating CD, but they will vary for different staining and imaging protocols. To use this technique appropriately, new parameters were determined for the protocols of our laboratory. Briefly, the monochromatic pixel values (Red, Green, Blue) are first converted to optical density (OD) values. Typically, a density standard is used to calibrate imaging systems for conversion to OD values, but Ruifrok and Johnston (2001) propose capturing a control image of a blank field of view and using it to calculate OD values according to the following equation:

$$OD(r, g, b) = -\log_{10}\left(\frac{(\mathrm{Red}, \mathrm{Green}, \mathrm{Blue})}{(\mathrm{Red}, \mathrm{Green}, \mathrm{Blue})_{\mathrm{control}}}\right) \qquad (4)$$

where r, g, and b are the optical density values for each monochromatic color and (Red, Green, Blue)_control are the pixel values for the image of the blank field. Samples stained for DAB only were used to determine the mean OD values for DAB-stained tissue in the absence of hematoxylin. The same was done for tissue stained with only hematoxylin or eosin. Using these values and the mathematical procedure given by Ruifrok and Johnston (2001), we determined CD images for our laboratory to be calculated as:

$$CD = 3.63 * b - 2.60 * r - 1.28 * g \qquad (5)$$
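Equations 4 and 5 can be combined into a short per-pixel sketch. The coefficients are those reported in Equation 5 for our laboratory and would differ for other staining and imaging protocols; the blank-field control value, the example pixels, and the clamping of zero pixel values to 1 are assumptions made only for illustration.

```python
import math


def optical_density(value, control_value):
    """Equation 4 for one channel: OD = -log10(value / control)."""
    # Assumption: clamp zero pixel values to 1 so the logarithm is defined.
    value = max(value, 1)
    return -math.log10(value / control_value)


def color_deconvolved(pixel, control):
    """Equation 5 applied to the per-channel optical densities."""
    r, g, b = (optical_density(v, c) for v, c in zip(pixel, control))
    return 3.63 * b - 2.60 * r - 1.28 * g


# Hypothetical blank-field control and example pixels.
control = (255, 255, 255)
for name, pixel in {
    "background": (255, 255, 255),
    "dab": (150, 100, 60),
    "nucleus": (70, 80, 170),
}.items():
    print(name, round(color_deconvolved(pixel, control), 3))
```

With these toy values the brownish pixel yields a positive CD value and the bluish pixel a negative one, while a pixel identical to the blank field yields zero.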

Threshold values for each of the methods were set after examining preliminary images stained at the same time as the 18 images used in the statistical study. For any IHC staining procedure, control slides of tissue with a known staining pattern should be included to verify the accuracy of the results. Images of these control tissues are examined to determine the threshold. Values should be selected that maximize selection of the known positive tissue in the control while minimizing background interference. These images were used only for determining threshold parameters and were not included in the statistical study. The threshold values of each technique were held constant in the analysis of all 18 images. Scripts were written in IPLab that allowed automatic analysis of the 18 images, including conversion of RGB images to the required formats, pixel classification with constant threshold values for each technique, and analysis of staining. Once thresholds were set, the process was completely automated for the analysis of all 18 images using all techniques. Again, all of these processes can be automated using most image processing software programs.
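One way to make the threshold choice on control images explicit is a simple sweep, sketched below. This is not the procedure used in this study, where thresholds were set after examining the control images; it is an illustrative alternative that assumes a pixel-level mask of known positive tissue is available for the control and that DAB-stained pixels lie below the threshold. All values are hypothetical.

```python
def best_threshold(values, is_positive, candidates=range(256)):
    """Return (threshold, errors) minimizing misclassified control pixels.

    A pixel is called DAB-stained when its value is below the threshold.
    Ties are resolved in favor of the lowest candidate threshold.
    """
    best_t, best_errors = None, None
    for t in candidates:
        errors = sum((v < t) != p for v, p in zip(values, is_positive))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


# Hypothetical control pixels (processed 8-bit values) and known labels.
control_values = [40, 52, 60, 85, 90, 130]
control_labels = [True, True, True, False, False, False]
print(best_threshold(control_values, control_labels))
```

Once chosen, the threshold would be held constant for all study images, as described above.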

Analysis

The accuracy of these methods was assessed based on the percent agreement with manual selection of stained tissue and by calculating the variation in quantitation of a staining parameter. For calculation of percent agreement, a gold standard for each image was constructed from the six (two observers, 3 days each) manual analyses. Any pixel selected as DAB-stained by three or more of the manual observations was designated as DAB-positive. The automated methods were compared to this manual gold standard and the percentage of misclassified pixels (i.e., pixels not in agreement with the gold standard) was calculated for each method. This is a direct measurement of the agreement of each technique with manual analysis.
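The construction of the gold standard and the misclassification measure can be sketched as follows. Masks are represented as nested lists of 0/1 values, and the six example masks and the automated mask are hypothetical.

```python
def gold_standard(manual_masks, min_votes=3):
    """Mark a pixel DAB-positive when at least min_votes masks selected it."""
    rows, cols = len(manual_masks[0]), len(manual_masks[0][0])
    return [
        [sum(mask[y][x] for mask in manual_masks) >= min_votes for x in range(cols)]
        for y in range(rows)
    ]


def percent_misclassified(auto_mask, gold_mask):
    """Percentage of pixels where the automated mask disagrees with gold."""
    total = wrong = 0
    for auto_row, gold_row in zip(auto_mask, gold_mask):
        for a, g in zip(auto_row, gold_row):
            total += 1
            wrong += bool(a) != bool(g)
    return 100 * wrong / total


# Six hypothetical manual selections (two observers, 3 days each) of a
# one-row, four-pixel image.
manual = [
    [[1, 1, 0, 0]], [[1, 1, 0, 0]], [[1, 0, 0, 0]],
    [[1, 1, 1, 0]], [[1, 0, 1, 0]], [[1, 0, 0, 0]],
]
gold = gold_standard(manual)
automated = [[1, 0, 0, 0]]
print(gold, percent_misclassified(automated, gold))
```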

It is important to understand how this misclassification affects the measurement of staining parameters (e.g., percent area, integrated optical density, mean intensity). IHC staining levels have been shown to correlate with protein measurements (Aasmundstad et al. 1992; Lehr et al. 1997; Podhajsky et al. 1997; Bhatnagar et al. 1999; Dias et al. 2000; Simone et al. 2000) and even DNA measures (Venter et al. 1987; Ratcliffe et al. 1997; Gerdes et al. 1998; Jacobs et al. 1999; Lehr et al. 2001) of highly amplified genes. There are a number of ways to measure the “level” of IHC stains. In some instances, the area of tissue stained for DAB is used (percent area). This is the area of the image classified as DAB stained, divided by the total image area:

$$\text{percent area} = \frac{\sum \text{Stained area}}{\text{Total image area}} \qquad (6)$$

The percent area of DAB-stained tissue for all images was determined for each manual observation and the eight automated methods. The statistical determination of agreement in percent area between manual analysis and the automated techniques is described below.
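A minimal sketch of the percent area calculation, assuming the classified image is available as a nested list of booleans and that the ratio is expressed as a percentage (multiplied by 100). The example mask is hypothetical.

```python
def percent_area(mask):
    """Stained pixels divided by total pixels, expressed as a percentage."""
    total = sum(len(row) for row in mask)
    stained = sum(1 for row in mask for selected in row if selected)
    return 100 * stained / total


# Hypothetical 2 x 4 classification mask with three stained pixels.
example_mask = [
    [True, False, False, False],
    [True, True, False, False],
]
print(percent_area(example_mask))
```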

Statistics

The mean (±SEM) of percentage misclassified pixels was determined for each technique. Lower values indicate more accurate methods, with 0% representing exact agreement with manual analysis. Difference testing was performed using a paired Student's t-test. Statistical significance was defined as p≤0.05.
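The summary statistics above can be sketched with the Python standard library. The data are hypothetical, and the sketch stops at the paired t statistic; a p value would come from the t distribution with n - 1 degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def paired_t_statistic(first, second):
    """Paired Student's t statistic for two equally long samples."""
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical misclassification percentages for two methods on four images.
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [2.0, 2.5, 4.5, 5.0]
print(mean_and_sem(method_a), paired_t_statistic(method_a, method_b))
```

For a comparison across 18 paired images (17 degrees of freedom), a two-sided test at the 0.05 level corresponds to an absolute t statistic of roughly 2.11 or greater.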

Manual measurement of the percent area of stain is not absolute because of inter- and intraobserver differences. Concordance correlation (r) and the slope of the regression line for the mean percent area of each observer were calculated and the data plotted against the line of perfect agreement (observer one mean = observer two mean). Linear regression provides a graphical representation of agreement and correlation measures the linearity of the data. This method gives insight into the accuracy of the data but cannot be used as an independent measure of agreement (Hansen et al. 1998; Daly and Bourke 2000; King et al. 2002).
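The point that correlation alone does not establish agreement can be illustrated with a short sketch. It computes the Pearson correlation, the regression slope, and Lin's concordance correlation coefficient; the choice of Lin's formula is an assumption, since the exact formula is not specified here, and the data are hypothetical.

```python
from math import sqrt


def agreement_summary(x, y):
    """Return (pearson_r, regression_slope, lin_concordance) for y versus x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    pearson = cov_xy / sqrt(var_x * var_y)
    slope = cov_xy / var_x
    concordance = 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
    return pearson, slope, concordance


# Hypothetical percent area values: observer two reads one point higher.
observer_one = [1.0, 2.0, 3.0]
observer_two = [2.0, 3.0, 4.0]
print(agreement_summary(observer_one, observer_two))
```

In this toy case the correlation and slope are both 1 even though the two observers never agree, whereas the concordance coefficient is penalized by the constant offset.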

Calculation of 95% confidence intervals using Bland-Altman analysis is the most direct way to measure agreement between the observers (Bland and Altman 1996; Daly and Bourke 2000; King et al. 2002). The mean percent area value across all six manual measurements was calculated for each image. The mean difference ($\bar{d}$) and standard deviation (s) between each observer value and this mean value were calculated. The mean manual value then has a 95% confidence interval of ±1.96s (Bland and Altman 1996; Daly and Bourke 2000; King et al. 2002).

Pixel misclassification results in inaccuracies in stain quantitation. Results of percent area measurements by the automated techniques were compared with mean manual values using similar analyses: concordance correlation and Bland-Altman agreement analysis. When two techniques (in this study, manual and each individual automated method) are compared, automated percent area measurements are expected, with 95% confidence, to differ from the mean manual value by the following interval:

$$\text{Limits of agreement} = \bar{d} \pm 1.96\,s \qquad (7)$$

This range is the confidence limit for the techniques when the percent area is measured. The narrower the range of agreement, the more precise the method. The closer these values are to zero, the more accurate the method. A fundamental assumption for Bland-Altman analysis is that $\bar{d}$ must be independent of the measurement magnitude. This was verified by calculating correlation coefficients for plots of $\bar{d}$ against mean observer values (data not shown).
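The limits of agreement, taken here as the mean difference plus and minus 1.96 standard deviations (consistent with the ±1.96s interval described above), can be sketched as follows. The percent area values are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(automated, manual):
    """Return (mean difference, lower limit, upper limit)."""
    diffs = [a - m for a, m in zip(automated, manual)]
    d_bar = mean(diffs)
    s = stdev(diffs)  # sample standard deviation of the differences
    return d_bar, d_bar - 1.96 * s, d_bar + 1.96 * s


# Hypothetical percent area values for four images.
automated_area = [11.0, 20.5, 31.5, 41.0]
manual_area = [10.0, 20.0, 30.0, 40.0]
d_bar, lower, upper = limits_of_agreement(automated_area, manual_area)
print(round(d_bar, 2), round(lower, 2), round(upper, 2))
```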

Results

Images

Two weeks after tooth extraction, a healing socket is characterized by the formation of bone spicules within the soft tissue matrix. Figure 1A shows a representative image of a healing socket stained for VEGF. Areas of DAB-labeled tissue are seen close to dark hematoxylin-stained nuclei. The 8-bit processed images resulting from the automated techniques evaluated in this study are shown in Figures 1B-1H.

Hue (Figure 1B) clearly differentiates between the dark nuclei and DAB-stained tissue. However, distinguishing DAB staining patterns in the bone spicule is difficult. The pattern of differential staining in the bone is lost. The Blue (Figure 1C) and Green (Figure 1D) images provide a clearer picture of the overall tissue morphology. Variations in staining within the spicules are evident, but stained nuclei are not easily distinguished from low-intensity areas of DAB stain. The Brown technique (Figure 1E) enhances areas of low staining compared to the Green and Blue methods. Differentiation between some nuclei and DAB-stained tissue is improved but the majority of nuclei remain indistinguishable from low-intensity DAB staining.

In the G/B (Figure 1F), CD (Figure 1G), and BN images (Figure 1H), cell nuclei are converted to a distinctly different intensity level and are easily distinguished from the stained tissue. Unlike with the Hue method (Figure 1B), this is accomplished while maintaining the differential staining pattern within the tissue. DAB staining can easily be separated from both the dark blue nuclei and the light hematoxylin intensity in the background, while maintaining the staining pattern in the tissue.

Misclassification

All methods allowed rapid analysis of the 18 images using programs written in IPLab. Staining levels in all 18 images with all techniques were analyzed in less than 5 min using the preset thresholds.

Table 1 reports the mean percentage of pixels misclassified using each image processing technique. The BN method resulted in the lowest percentage of pixels misclassified (0.32 ± 0.21%). The CD method resulted in the second lowest level of misclassification (1.94 ± 0.36%). Hue alone resulted in higher misclassification (18.62 ± 3.60% misclassified), but thresholding along all HSI images greatly reduced the misclassification (2.07 ± 0.33% misclassified). Images from G/B were misclassified by 3.89 ± 0.75%, with the Brown method higher at 7.74 ± 1.38%. Blue and Green images yielded levels of misclassification similar to Hue (15.67 ± 2.12% and 19.86 ± 3.95%, respectively).

Figure 1

(A) Example brightfield image of VEGF immunostaining in a tooth extraction socket. The socket at 2 weeks after extraction is characterized by the formation of spicules of bone within the soft tissue matrix. Brown is positive VEGF stain and blue is hematoxylin counterstain. (B-H) 8-bit images of A calculated using each processing method for comparison. Negative controls are not shown. (B) Hue, (C) Blue, (D) Green, (E) Brown, (F) G/B, (G) CD, and (H) BN. Bar = 50 μm.

Table 1

Percentage of misclassified pixels in DAB-stained images using eight automated techniques (±SEM)

Imaging method	Misclassified pixels (%)
BN	0.32 ± 0.21
CD	1.94 ± 0.36
HSI	2.07 ± 0.33
G/B	3.89 ± 0.75
Brown	7.74 ± 1.38
Blue	15.67 ± 2.12
Hue	18.62 ± 3.60
Green	19.86 ± 3.95

Observer Variation

Table 2 summarizes the concordance correlation, Bland-Altman analysis, and 95% confidence ranges using manual and automated methods to quantitate staining via measurement of the percent area of DAB stain. Figure 2A is a plot of the mean values for percent area from each observer. The data indicate almost perfect agreement between observer one and observer two (r = 0.9979; slope of regression line = 1.02). According to Bland-Altman analysis, the confidence interval for manual measurements of percent area is ±1.64 percentage points (Figure 2B). This indicates that any single manual measurement of percent area is within 1.64 percentage points (95% confidence) of the gold standard.

Percent Area

The mean differences ($\bar{d}$) between the automated and manual measurements of percent area were less than 3 percentage points for all methods (Table 2). A better determination of accuracy is the 95% CI because it includes the entire range of possible values. Automated measurements of percent area using BN strongly correlated with manual measurements (r=0.9996) but were slightly higher on average ($\bar{d}$=0.47 percentage points). BN measurements of percent area fell between −0.63 and 1.58 percentage points (95% CI) of the mean manual measurement. This interval is within the CI of the mean manual value (±1.64 percentage points), indicating that the BN technique was as accurate as the gold standard.

Table 2

Correlation coefficients, mean differences, and confidence intervals of manual and automated measurements of the percent area of DAB staining^a

Imaging method	Correlation coefficient (r)	Average difference ($\bar{d}$)	95% Confidence interval
Manual	0.9979	0	-1.64 to 1.64
BN	0.9996	0.47	-0.63 to 1.58
CD	0.9989	0.06	-1.36 to 1.48
HSI	0.9972	0.53	-1.19 to 2.25
G/B	0.9577	0.17	-4.96 to 5.30
Brown	0.8972	1.27	-13.24 to 15.79
Blue	0.9472	2.86	-7.60 to 13.32
Hue	0.7852	-2.31	-26.85 to 22.24
Green	0.8304	-2.76	-25.35 to 19.82

^aPercent area = area of the image classified as DAB stained divided by the total image area.

Figure 2

Observer comparison. (A) Scattergram of observer one and observer two measurements of percent area. Solid line is the linear regression of the data. Dotted line is the line of perfect agreement (y=x). (B) Plot of differences between observer one and observer two percent area measurements compared with mean observer values. Solid line is the average difference, and the dashed lines represent 95% confidence intervals.

The CD method was also as accurate as the gold standard. The measurements strongly correlated with manual measurements (r=0.9989), with a very low average difference ($\bar{d}$=0.06). Although the CI for the CD method (-1.36 to 1.48 percentage points) is broader than that of the BN method, it is still within the CI of the mean manual value. All other methods yielded CIs falling outside the range of the manual value.

Combined thresholding of all HSI images had the third highest correlation with manual measurements (r=0.9972) (Table 2). HSI measurements were slightly higher on average than manual values ($\bar{d}$=0.53) and approximated the mean manual value with a narrow confidence range (-1.19 to 2.25 percentage points). This range was 1.23 percentage points broader than that of BN, and the maximal value (2.25) was slightly higher than the maximal confidence value of the manual measurements (1.64).

Figure 3

BN vs manual. (A) Scattergram of BN percent area vs mean observer measurements of % area. Solid line is the linear regression of the data. Dotted line is the line of perfect agreement (y=x). (B) Plot of differences between BN and mean observer percent area measurements against mean observer values. Solid line is the average difference, and the dashed lines represent 95% confidence intervals. The dotted lines are confidence intervals for the mean manual value.

The average differences using CD and G/B images were low ($\bar{d}$=0.06 and $\bar{d}$=0.17, respectively). Although the mean percent misclassification from G/B images was low, the CI for measuring percent area was very broad (-4.96 to 5.30 percentage points). All other methods gave extremely poor results, with CI ranges broader than 20 percentage points.
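The comparisons of confidence ranges can be checked directly from the intervals reported in Table 2. The sketch below only restates those tabulated limits and derives the widths; the variable names are illustrative.

```python
# 95% confidence limits (percentage points) as reported in Table 2.
table_2 = {
    "Manual": (-1.64, 1.64),
    "BN": (-0.63, 1.58),
    "CD": (-1.36, 1.48),
    "HSI": (-1.19, 2.25),
    "G/B": (-4.96, 5.30),
    "Brown": (-13.24, 15.79),
    "Blue": (-7.60, 13.32),
    "Hue": (-26.85, 22.24),
    "Green": (-25.35, 19.82),
}

# Width of each confidence range, rounded to two decimals.
widths = {method: round(high - low, 2) for method, (low, high) in table_2.items()}

# Automated methods whose interval lies entirely inside the manual interval.
manual_low, manual_high = table_2["Manual"]
within_manual = [
    method
    for method, (low, high) in table_2.items()
    if method != "Manual" and low >= manual_low and high <= manual_high
]

broader_than_20 = [method for method, width in widths.items() if width > 20]
print(widths)
print(within_manual, broader_than_20)
```

The widths confirm that only the BN and CD intervals fall inside the manual interval and that the HSI range is 1.23 percentage points wider than the BN range.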

Figure 3A is a plot of the BN values against the mean manual measurements. Strong agreement is indicated, as most of the points and the regression line fall near the line of perfect agreement (BN = mean manual). Figure 3B plots the differences between the BN values and the mean manual measurements against the mean manual values for all 18 images. The confidence range of the BN technique (dashed lines) falls within the confidence range of the manual value (dotted lines).

Figure 4 is an example of the automated selection of DAB-labeled tissue from an area of a VEGF immunostained sample. In the brightfield image (Figure 4A), areas of DAB-labeled tissue are seen along with dark hematoxylin-stained nuclei. The brightfield image is converted to a BN image (Figure 4B) and, using preset thresholds, areas of DAB stain are selected and pseudo-colored yellow (Figure 4C). Staining parameters (e.g., percent area, mean optical density) can then be calculated from these yellow pixels. These processes (conversion to BN, setting thresholds, calculation of stain properties) can be performed and automated using one of many image processing systems.

Discussion

This study presents a technique for automated selection and analysis of DAB-stained samples. The use of BN images is simple and allows rapid, user-independent analysis. In addition, we present a review and analysis of seven previously published automated methods. This work should provide a starting point for research laboratories seeking automated imaging techniques for quantitative immunohistochemistry.

A complex spatial and temporal expression of proteins influences both normal tissue physiology and the evolution of many pathologies. IHC is a great advantage to biomedical studies because it allows antigen-specific analysis while maintaining overall tissue morphology. However, it is difficult to compare results among laboratories because of different methods of staining and analysis. Standardization of methods is essential for effective communication among investigators.

Manual selection is currently accepted as the most accurate method for classifying tissue as DAB positive or negative, but these measurements are subject to observer variation. The manual values are not absolute and have a limited confidence range. Our study agrees with those of previous investigators who have reported low but significant variations in the manual analysis of IHC stains (Jagoe et al. 1991; Kay et al. 1996; Giatromanolaki et al. 1997; Polkowski et al. 1997; Hansen et al. 1998; Belien et al. 1999).

Figure 4

(A) Brightfield image of VEGF immunostaining in a tooth extraction socket. (B) 8-bit BN image created from image A. (C) Image processing software can be programmed to select positively stained areas based on preset thresholds. Areas selected as DAB-stained are pseudo-colored yellow. Bar = 25 μm.

Many methods have been developed to rapidly differentiate DAB-stained tissue from background staining levels (Goto et al. 1992; Montironi et al. 1996; Lehr et al. 1997; Ruifrok 1997; Smejkal and Shainoff 1997; Ma and Lozanoff 1998; Kohlberger et al. 1999; Ruifrok and Johnston 2001; Vilaplana and Lavaille 1999; King et al. 2002; McGinley et al. 2002), but a consensus on the most accurate method has not been reached. We present our method and seven other methods found in the literature. This study indicates that converting brightfield images to 8-bit BN images allows for rapid, automated selection of DAB-labeled tissue from background hematoxylin. Fewer than 1% of the pixels are misclassified when the BN images are used with a constant threshold to select DAB-stained areas, and quantitation of the percent area is accurate within the confidence intervals of manual analysis. The BN method has been used to study the spatial and temporal localization of growth factors in a tooth extraction socket (Lalani et al. in press) and to assist in the reconstruction of microvascular networks from immunostains for CD31 (Brey et al. 2002).

The CD method also allows separation of DAB-stained areas from background staining with a low level of pixel misclassification. Its confidence range for percent area measurements falls within the CIs of manual analysis. This method could be used in automated analysis as well, but it requires the determination of parameters unique to the imaging and staining protocols of each individual laboratory. Careful technique must be employed in parameter determination. In the initial work, single Red, Green, and Blue pixel values found in a pure stain of DAB (Ruifrok and Johnston 2001) were used in determining the CD parameters. In practice, there will be a range of values for each color channel, depending on the level of DAB staining. In our work, parameter values and analysis results varied with the particular values selected from the pure stain, with optimal results achieved using mean values (unpublished results). We therefore suggest using the mean values from the pure stains in parameter determination when the CD technique is used.

Combined thresholds for all HSI images or single thresholds for G/B images give the next lowest levels of misclassification. HSI quantitation of percent area has a CI only 1.43 percentage points greater than that of BN. This range falls slightly outside the CI of manual analysis. HSI may not be as accurate as BN, but it has good accuracy and could be considered for automated analysis. Although visual inspection of G/B images indicates good color separation, pixel misclassification leads to a confidence range of more than 10 percentage points for measuring percent area. This is unacceptable for precise differentiation of staining areas. Published results using G/B for analysis only state that it allows “almost clear-cut separation,” but quantitative verification is not given (Montironi et al. 1996).

Many of the other methods investigated result in high percentages of pixel misclassification. Although combining thresholds for all HSI images results in 2% misclassification, setting thresholds on the hue image alone results in 18% of pixels being misclassified and leads to inaccurate quantitation. A study has suggested advantages to the Hue method (Ma and Lozanoff 1998), but a statistical analysis of results was not performed.

As expected, the Blue, Brown, and Green methods yielded poor results. The Blue channel is the most accurate of the conventional RGB channels, but it is known to require user interaction to eliminate nonspecific selection (King et al. 2002). The Brown method was developed based on specific staining data. A study of the automated Brown method showed 2.4% pixel misclassification (Ruifrok 1997), but interactive evaluation of the images and deletion of problem areas were required before analysis. In the present study, the Brown method was employed without prior image manipulation. The Green method (Smejkal and Shainoff 1997) was designed to improve sensitivity to low levels of DAB staining, but for use in the absence of hematoxylin counterstaining. These results are not useful for quantitation of stains with low antigen levels (percent area <10%).

The IHC stains studied here were for VEGF, BMP-2, and FGF-2, all secreted proteins present in the stroma of the healing socket. The BN method is not limited to secreted proteins. Figure 5 is an example image of an immunostain for the intracellular domain of the Trk B receptor and its conversion to an 8-bit image using the BN technique. The DAB-positive area is clearly separated from the dark nuclei within the cell.

Nuclear stains, however, will confound the results of any of the imaging techniques presented. As shown in Figure 1, the successful methods clearly separate DAB-stained tissue from dark blue nuclei. Although all DAB-positive tissue areas receive some contribution from hematoxylin, it is typically of low intensity. Nuclear stains that mix high-intensity hematoxylin with DAB staining will yield inconsistent results.

The selection of BN is based on the color theory of additive primary colors (i.e., R, G, B), also referred to as machine color space. In this color space, equal proportions of R, G, and B produce white, equal proportions of two of the three additive colors produce the three secondary colors (cyan, magenta, and yellow), and unequal proportions of two or three create other colors. DAB-positive areas result in brown pigmentation, which is produced from 1 part blue, 1 part green, and 4 parts red (i.e., 1B + 1G + 4R). Blue or background-stained areas are produced from pixels that are predominantly B, with low contributions from R and G creating different shades. The B values for brown DAB-stained areas and dark blue-stained nuclei are similar, but the contributions from G and R are much greater for the DAB. Therefore, normalizing the B value by the sum of all the color values (i.e., BN) will give values that allow clear separation of brown from blue coloration.
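The normalization argument can be checked on a few representative pixels. The following is a minimal sketch in plain Python, not the script used in this study: the function names and pixel values are hypothetical, and scaling B/(R + G + B) by 255 to obtain an 8-bit value is an assumption made for illustration.

```python
def normalized_blue(pixel):
    """Map one (R, G, B) pixel to an 8-bit value, 255 * B / (R + G + B)."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0  # assumption: a pure black pixel carries no color information
    return round(255 * b / total)


def bn_image(rgb_image):
    """Convert a nested list of (R, G, B) pixels to a nested list of BN values."""
    return [[normalized_blue(p) for p in row] for row in rgb_image]


# Hypothetical pixels. The brown and nuclear pixels share the same B value (60),
# but the brown pixel follows the 4R:1G:1B proportions described above.
brown = (240, 60, 60)
nucleus = (20, 20, 60)
white = (255, 255, 255)

bn_row = bn_image([[brown, nucleus, white]])[0]
bn_brown, bn_nucleus, bn_white = bn_row
```

Although the brown and nuclear pixels have identical B values, the brown pixel yields a BN value below that of white (one third of full scale), whereas the blue-dominated pixel yields a value well above it, which is what allows a single threshold to separate the two.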

Figure 5 (A) Brightfield image of pancreas immunostained for the cytoplasmic domain of Trk B. (B) 8-bit BN image created from image A.

A technique that accurately analyzes images is only one of many steps in the IHC analysis process that must be standardized. Others include staining protocols (e.g., fixative, dilutions), the selection of appropriate control slides (Seidal et al. 2001), and determination of the staining parameters that best correlate with the amount of protein present. Automated imaging methods are sensitive to laboratory technique. Although the BN imaging technique is highly accurate in delineating brown areas of tissue from the background hematoxylin, it cannot separate nonspecific stain from specific stain. Rigorous staining and imaging protocols must be employed to ensure accurate results. In addition, this technique may not work in tissue with high brown pigmentation such as high-melanin skin, as these areas will most likely be interpreted as positive for DAB. This method can be applied using most image processing and microscopy systems. However, pilot studies should provide statistical verification at each individual laboratory before the technique is used as a research tool.

The methods presented here are automated. Once a threshold is selected from trial images, all subsequent images are analyzed using the same preset threshold. Interactive evaluation of each individual image is not required as it is with other methods (Lehr et al. 1997; Vilaplana and Lavaille 1999; Underwood et al. 2001; King et al. 2002; McGinley et al. 2002). Writing a script, or program, in IPLab that performs this task is simple and straightforward. In addition, this method can be easily employed using other image analysis software, including NIH Image (although the programming is more complex with this software).
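The preset-threshold workflow described above can be outlined in a few lines. This is a hedged sketch in plain Python rather than the IPLab script itself: the threshold value, the image names, and the pixel values are all hypothetical, and it assumes that DAB-positive pixels are those whose 8-bit BN value, taken here as 255 * B / (R + G + B), falls below the threshold.

```python
DAB_THRESHOLD = 64  # hypothetical 8-bit BN cutoff, chosen once from trial images


def is_dab(pixel, threshold=DAB_THRESHOLD):
    """True if the pixel's BN value falls below the preset threshold."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return False  # assumption: pure black pixels are never selected
    return 255 * b / total < threshold


def dab_percent_area(rgb_image, threshold=DAB_THRESHOLD):
    """Percent of pixels in an RGB image selected as DAB positive."""
    pixels = [p for row in rgb_image for p in row]
    selected = sum(1 for p in pixels if is_dab(p, threshold))
    return 100.0 * selected / len(pixels)


# Hypothetical 2 x 2 images built from three illustrative colors.
BROWN, BLUE, WHITE = (240, 60, 60), (20, 20, 60), (255, 255, 255)
images = {
    "section_01": [[BROWN, BLUE], [WHITE, WHITE]],
    "section_02": [[BROWN, BROWN], [BLUE, WHITE]],
}

# Every image is analyzed with the same threshold; no per-image interaction.
results = {name: dab_percent_area(img) for name, img in images.items()}
```

The point of the design is that the threshold is fixed before the batch is run, so differences in percent area between images reflect the images rather than operator choices.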

Acknowledgements

Supported by grants from the NIH (T32-GM08362, HL 18672, HL 62341, CA 16672, P30-CA16672), Army Department of Defense (DAMD 17-99-1-9268), Whitaker Foundation, and Robert A. Welch Foundation Grant (C-938).

We would like to thank Dr Guido Sclabas from the Department of Surgical Oncology at The University of Texas M. D. Anderson Cancer Center for the Trk B immunostain.

References

Aasmundstad, Haugen, Johannesen, Hoe, Kvinnsland (1992) Oestrogen receptor analysis: correlation between enzyme immunoassay and immunohistochemical analysis. J Clin Pathol 45:125–129

Allred, Harvey, Berardo, Clark (1998) Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Mod Pathol 11:155–168

Belien, Somi, de Jong, van Diest, Baak (1999) Fully automated microvessel counting and hot spot selection by image processing of whole tumour sections in invasive breast cancer. J Clin Pathol 52:184–192

Bhatnagar, Tewari, Bhatnagar, Austin (1999) Comparison of carcinoembryonic antigen in tissue and serum with grade and stage of colon cancer. Anticancer Res 19:2181–2189

Bland, Altman (1996) Measurement error and correlation coefficients. Br Med J 313:41–42

Bourque, Gross, Hall (1993) A histological processing technique that preserves the integrity of calcified tissues (bone, enamel), yolky amphibian embryos, and growth factor antigens in skeletal tissue. J Histochem Cytochem 41:1429–1434

Braun, Harbeck (2001) Recent advances in technologies for the detection of occult metastatic cells in bone marrow of breast cancer patients. Breast Cancer Res 3:285–288

Brey, King, Johnston, McIntire, Reece, Patrick Jr (2002) A technique for quantitative three-dimensional analysis of microvascular structure. Microvasc Res 63:279–294

Castleman (1996) Color and multispectral image processing. In Castleman, ed. Digital Image Processing. Englewood, NJ, Prentice Hall, 547–562

Daly, Bourke (2000) Bias and measurement error. In Daly, Bourke, eds. Interpretation and Uses of Medical Statistics. Oxford, Blackwell Science, 381–420

Dias, Chen, Dilday, Palmer, Hosoi, Singh, et al. (2000) Strong immunostaining for myogenin in rhabdomyosarcoma is significantly associated with tumors of the alveolar subclass. Am J Pathol 156:399–408

Dowsett, Cooke, Ellis, Gullick, Gusterson, Mallon, Walker (2000) Assessment of HER2 status in breast cancer: why, when and how? Eur J Cancer 36:170–176

Ermert, Hocke, Duncker, Seeger, Ermert (2001) Comparison of different detection methods in quantitative microdensitometry. Am J Pathol 158:407–417

Gerdes, Nielsen, Mohr, Pfeiffer, Knoop, Rose, Horder, et al. (1998) Correlation between molecular genetic analysis and immunohistochemical evaluation of the epidermal growth factor receptor and p185HER2. Anticancer Res 18:2529–2534

Giatromanolaki, Koukourakis, Theodossiou, Barbatis, O'Byrne, Harris, Gatter (1997) Comparative evaluation of angiogenesis assessment with anti-factor-VIII and anti-CD31 immunostaining in non-small cell lung cancer. Clin Cancer Res 3(12 Pt 1):2485–2492

Goto, Nagatomo, Hasui, Tamanaka, Murashima, Soto (1992) Chromaticity analysis of immunostained tumor specimens. Pathol Res Pract 188:433–437

Grzybicki, Moore (1999) Implications of prognostic markers in brain tumors. Clin Lab Med 19:833–847

Hanna (2001) Testing for HER2 status. Oncology 61(suppl 2):22–30

Hansen, Grabau, Rose, Bak, Sorensen (1998) Angiogenesis in breast cancer: a comparative study of the observer variability of methods for determining microvessel density. Lab Invest 78:1563–1573

Huang, Chen, Tietz (1996) Immunocytochemical detection of regional protein changes in rat brain sections using computer-assisted image analysis. J Histochem Cytochem 44:981–987

Jacobs, Gown, Yaziji, Barnes, Schnitt (1999) Specificity of HercepTest in determining HER-2/neu status of breast cancer using the United States Food and Drug Administration-approved scoring system. J Clin Oncol 17:1983–1987

Jagoe, Steel, Vucicevic, Alexander, Van Noorden, Wootton, Polak (1991) Observer variation in quantification of immunohistochemistry by image analysis. Histochem J 23:541–547

Kay, Barry Walsh, Whelan, O'Grady, Leader (1996) Inter-observer variation of p53 immunohistochemistry: an assessment of a practical problem and comparison with other studies. Br J Biomed Sci 53:101–107

King, Brey, Youssef, Johnston, Patrick Jr (2002) Quantification of vascular density using a semi-automated technique for immuno-stained specimens. Anal Quant Cytol Histol 24:39–48

Kohlberger, Breitenecker, Kaider, Losch, Gitsch, Breitenecker, Kieback (1999) Modified true-color computer assisted image analysis versus subjective scoring of estrogen receptor expression in breast cancer: a comparison. Anticancer Res 19:2189–2193

Kuyatt, Reidy, Hui, Jordan (1993) Quantitation of smooth muscle cell proliferation in cultured aorta. Anal Quant Cytol Histol 15:83–87

Lalani, Wong, Brey, Mikos, Duke (in press) Spatial and temporal localization of TGF-β1, BMP-2, and PDGF-A in healing tooth extraction sockets in a rabbit model. J Oral Maxillofac Surg

Lehr, Jacobs, Yaziji, Schnitt, Gown (2001) Quantitative evaluation of HER-2/neu status in breast cancer by fluorescence in situ hybridization and by immunohistochemistry with image analysis. Am J Clin Pathol 115:814–822

Lehr, Mankoff, Corwin, Santeusanio, Gown (1997) Application of Photoshop-based image analysis to quantification of hormone receptor expression in breast cancer. J Histochem Cytochem 45:1559–1565

Ma, Lozanoff (1998) A full color system for quantitative assessment of histochemical and immunohistochemical staining patterns. Biotech Histochem 74:1–9

Matkowskyj, Schonfeld, Benya (2000) Quantitative immunohistochemistry by measuring cumulative signal strength using commercially available software Photoshop and Matlab. J Histochem Cytochem 48:303–311

McGinley, Knott, Thompson (2002) Semi-automated method of quantifying vasculature of 1-methyl-1-nitrosourea-induced rat mammary carcinomas using immunohistochemical detection. J Histochem Cytochem 50:213–222

McLean, Nakane (1974) Periodate-lysine-paraformaldehyde fixative: a new fixative for immunoelectron microscopy. J Histochem Cytochem 22:1077–1083

Montironi, Diamanti, Thompson, Bartels (1996) Analysis of the capillary architecture in the precursors of prostate cancer: recent findings and new concepts. Eur Urol 30:191–200

Mori, Sawai, Teshima, Kyogoku (1988) A new decalcifying technique for immunohistochemical studies of calcified tissue, especially applicable to cell surface marker demonstration. J Histochem Cytochem 36:111–114

O'Leary (2001) Standardization in immunohistochemistry. Appl Immunohistochem Mol Morphol 9:3–8

Podhajsky, Bidanset, Caterson, Blight (1997) A quantitative immunohistochemical study of the cellular response to crush injury in optic nerve. Exp Neurol 143:153–161

Polkowski, Meijer, Baak, ten Kate, Obertop, Offerhaus, van Lanschot (1997) Reproducibility of p53 and Ki-67 immunoquantitation in Barrett's esophagus. Anal Quant Cytol Histol 19:246–254

Ratcliffe, Wells, Wheeler, Memoli (1997) The combination of in situ hybridization and immunohistochemical analysis: an evaluation of HER-2/neu expression in paraffin embedded breast carcinomas and adjacent normal-appearing breast epithelium. Mod Pathol 10:1247–1252

Ruifrok (1997) Quantification of immunohistochemical staining by color translation and automated thresholding. Anal Quant Cytol Histol 19:107–113

Ruifrok, Johnston (2001) Quantification of histochemical staining by color deconvolution. Anal Quant Cytol Histol 23:291–299

Schnitt (2001) Breast cancer in the 21st century: neu opportunities and neu challenges. Mod Pathol 14:213–218

Seidal, Balaton, Battifora (2001) Interpretation and quantification of immunostains. Am J Surg Pathol 25:1204–1207

Simone, Remaley, Charboneau, Petricoin, Glickman, Emmert-Buck, Fleischer, et al. (2000) Sensitive immunoassay of tissue cell proteins by laser capture microdissection. Am J Pathol 156:445–452

Smejkal, Shainoff (1997) Enhanced digital imaging of diaminobenzidene-stained immunoblots. Biotechniques 22:462

Underwood, Gibran, Muffley, Usui, Olerud (2001) Color subtractive-computer-assisted image analysis for quantification of cutaneous nerves in a diabetic mouse model. J Histochem Cytochem 49:1285–1291

van der Laak, Pahlplatz, Hanselaar, de Wilde (2000) Hue-saturation-density (HSD) model for stain recognition in digital images from transmitted light microscopy. Cytometry 39:275–284

Venter, Tuzi, Kumar, Gullick (1987) Overexpression of the c-erbB-2 oncoprotein in human breast carcinomas: immunohistological assessment correlates with gene amplification. Lancet 2:69–72

Vilaplana, Lavaille (1999) A method to quantify glial fibrillary acidic protein immunoreactivity on the suprachiasmatic nucleus. J Neurosci Methods 88:181–187

Williams, Buscarini, Stein (2001) Molecular markers for diagnosis, staging, and prognosis of bladder cancer. Oncology 15:1469–1470