Abstract
In certain cases, quantitative tissue structural data derived from tissue sections may be required to make critical decisions in the drug development or risk assessment process. Most frequently, these questions center on test article–related effects on cell number. In this opinion article, the limitations of estimating cell number by standard cell or nuclear profile counts from sections/blocks collected for routine histopathology are discussed from both a scientific and regulatory perspective and contrasted with the robust, sensitive, statistically based methods of design-based stereology. Specific existing industry practices are reviewed. Recent advances in stereological theory, software, hardware, and automated immunohistochemical staining now make it feasible to implement unbiased stereological methods to assess test article–related effects on cell number in a regulatory toxicology setting. These design-based stereological methods for counting cells are recommended when the quantification of small changes in cell number is critical to the risk assessment or decision-making process. These methods provide levels of sensitivity and statistical guarantees of accuracy that no other currently available tissue section–based methodology can provide.
Introduction
The human visual system is highly optimized for pattern recognition, but is a poor judge of spatial or density information.
Histopathology is a balance of interpretational art and objective knowledge dependent on visual pattern recognition to identify changes in tissue sections and knowledge of what those patterns mean with respect to tissue injury. Qualitative histopathology has become a core component of the database required for the overall, weight-of-evidence, risk assessment process that meets regulatory needs in the majority of cases. However, when quantitative morphological endpoints are needed, the human visual system has limited sensitivity to detect the subtle quantitative changes required for certain specific endpoints, especially with respect to cell number (Wanke 2002).
In 1966, Ewald Weibel, a pioneer in the field of stereology, noted that “biological morphologists still shy away from quantitation of their results for the primary reason that direct measurements of structures on sections is very cumbersome and yields doubtful results” (Weibel et al. 1966). While an exponential expansion of computer-based technologies and image analysis tools has made a wide variety of morphometric measurements practical, in 2009, many morphometric measurements still yield doubtful results because of the particular morphometric methods used, the technical implementation of those methods, or both. In the past two decades, there have been revolutionary advances in modern morphometric methods known as design-based stereology. Basic principles of these methods are presented in a review article in this issue of Toxicologic Pathology (Boyce 2010). This opinion article will consider the scientific and regulatory consequences of the use of various quantitative approaches to estimate cell number and strive to provide convincing arguments that only design-based stereology is applicable if the numbers really matter. Only design-based stereological methods provide the necessary sensitivity and the guarantee of accuracy required for critical decisions based on quantitative tissue structural endpoints derived from tissue sections. Accuracy of the data is guaranteed by the statistical and mathematical basis of stereological methods. Accuracy is used here in its pure statistical sense and conveys that the average of a set of estimates will become arbitrarily close to the true population mean with replication (i.e., repeated sampling and counting). The issues affecting the practical implementation of design-based stereology in the regulatory toxicology environment will be discussed, and the current positions of regulatory guidelines and scientific disciplines will be summarized, with respect to the generation of quantitative data.
The Consequences of Morphometric Methodological Decisions
Assumption-Based (Biased) Methods: Post Hoc Nonstereological Particle Counts
Qualitative histopathology is most efficient when it is based on standardized “optimum” tissue sections, typically through the center of the organ in question, and many morphometric needs are identified post hoc, based on changes in the patterns seen. Many nonstereological morphometric methods to estimate cell number based on nuclear or cell profile counts have been applied to the optimum tissue sections routinely provided to the toxicologic pathologist for post hoc quantification. These methods may seem practical, timely, and cost-effective but are highly problematic because the optimum tissue sections are highly biased, nonrandom tissue samples, and profile counts in a section provide information about profile counts only in that two-dimensional (2D) section; unfortunately, these counts have no known mathematical relationship to cell number in the three-dimensional (3D) tissue.
Consequently, because all statistical methods for hypothesis testing presume random sampling, a between-group statistical comparison of such counts is not a sound basis for conclusions regarding cell number. Because a microtome sectioning plane samples cells proportional to particle size (or, more specifically, height) and not proportional to cell number, the number of profiles in a section has no known mathematical relationship to the actual number of cells in 3D tissue space in an organ, and thus, the profile counts in a section are not valid estimates of particle number. Using such counts implicitly assumes that particle size, shape, and distribution are uniform and homogeneous in control and test article–treated animals. It cannot be assumed that these biases are equal and will cancel out by comparing controls with test article–treated animals because test article treatment may alter particle size, shape, and distribution and the way the tissue responds to processing. The impact, in both magnitude and direction, of these biases or assumptions on estimates is never known and cannot be measured. Hence, accuracy cannot be guaranteed; only unbiased or assumption-free methods can guarantee statistical accuracy. Increasing estimate precision by counting more will not make estimates accurate if generated using methods with these assumptions or biases; they become only more precisely inaccurate. Because the reference space (e.g., organ volume) may not have been defined prospectively, data are typically reported as densities (e.g., profiles per field or per section area) or ratios (e.g., labeling indices). Furthermore, because treatment may change the reference space (i.e., organ volume) and processing-induced tissue shrinkage may be different in controls and test article–treated animals, changes in density or ratios may be misleading and not equate with changes in total particle number per organ.
The critical consequence of a biased or assumption-based method is an uncontrolled risk of both false-positive and false-negative outcomes.
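The size dependence of profile counts can be illustrated with a short numerical sketch. It is not taken from this article; it uses the classical textbook relation for idealized conditions (convex particles, no lost caps), in which the expected number of profiles per unit section area equals the numerical density multiplied by the sum of mean particle height and section thickness. Because mean particle height is unknown in practice and may differ between groups, profile counts alone cannot be converted to cell number. All values below are hypothetical.

```python
def expected_profiles_per_area(n_v, mean_height_um, section_thickness_um):
    """Idealized expected profile count per µm² of section: N_A = N_V * (H + t).

    n_v is the true numerical density (cells per µm³), H the mean particle
    height perpendicular to the section plane, t the section thickness.
    """
    return n_v * (mean_height_um + section_thickness_um)


# Hypothetical scenario: the true cell density is IDENTICAL in both groups,
# but treatment enlarges nuclei from 8 µm to 10 µm in height.
true_density = 1e-4        # cells per µm³ (assumed value)
section_thickness = 5.0    # µm (assumed value)

control = expected_profiles_per_area(true_density, 8.0, section_thickness)
treated = expected_profiles_per_area(true_density, 10.0, section_thickness)

# Relative change in profile count despite no change in cell number.
apparent_change = treated / control - 1.0
print(f"Apparent change in profile count: {apparent_change:.1%}")
```

In this sketch, the profile count rises by about 15% although the number of cells is unchanged, which is the kind of false-positive signal described above.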
Design-Based (Unbiased) Stereological Methods
Design-based stereology is a statistical methodology that can provide estimates of total volume, surface area, length, or cell number in an organ, making no assumptions about the structures in an organ. Strict adherence to the principles of stereology guarantees accuracy (i.e., the mean of the estimates comes closer and closer to the true value with replication). Accurate estimation of particle numbers from tissue sections became possible only in 1984 with the invention of the physical disector (Sterio 1984). The physical disector consists of a pair of thin sections separated by a known distance; for cell number estimation, these are typically serial sections. The area of the section multiplied by the disector height (the distance separating the paired sections or section thickness for serial sections) constitutes a volume of tissue. Because number is 0-dimensional, cells can only be sampled proportional to their number by sampling within a volume of tissue. Within this volume, only profiles appearing in one section and absent in the other are counted, ensuring that cells are counted only once and independently of their height or size. When combined with an estimator known as the fractionator, where cells are counted in a known fraction of the organ, thin paraffin sections can be used. These are amenable to automated staining and long-term archiving. The latest versions of stereology software make extensive use of scanned-in digital images with automated, highly precise alignment of thin-section image pairs and automated in silico sampling of the regions to be counted. Counts for each animal can typically be collected in 1 hour or less. Because each step of the electronic process can be documented by image archival with the associated electronic data files, and the thin sections can be archived as raw data, the entire analytical process can be reconstructed for auditing or verification purposes.
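The arithmetic behind the disector and the fractionator can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular software: the function names, the three-stage sampling scheme, and all numerical values are hypothetical. The disector count (profiles present in one section but absent in the other) is written as `q_minus`.

```python
from fractions import Fraction


def disector_numerical_density(q_minus, frame_area_um2, disector_height_um, n_disectors):
    """Cells per µm³: disector count divided by the tissue volume sampled."""
    sampled_volume_um3 = frame_area_um2 * disector_height_um * n_disectors
    return q_minus / sampled_volume_um3


def fractionator_total(q_minus, sampling_fractions):
    """Total cell number: disector count divided by each known sampling fraction."""
    total = Fraction(q_minus)
    for fraction in sampling_fractions:
        if not 0 < fraction <= 1:
            raise ValueError("each sampling fraction must lie in (0, 1]")
        total /= fraction
    return total


# Hypothetical animal: 150 cells counted in 60 disectors,
# each with a 10,000 µm² counting frame and a 5 µm disector height.
density = disector_numerical_density(150, 10_000.0, 5.0, 60)

# Hypothetical fractionator design: 1/4 of the tissue slabs, 1/10 of the
# sections from those slabs, and 1/20 of the section area were sampled.
total = fractionator_total(150, [Fraction(1, 4), Fraction(1, 10), Fraction(1, 20)])
print(f"Numerical density: {density:.1e} cells per µm³")
print(f"Fractionator estimate of total number: {int(total)}")
```

Note that the fractionator estimate depends only on the count and the sampling fractions, so it does not require knowledge of section thickness or tissue shrinkage.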
Prior to the development of software to align high-magnification fields in physical disector sections, the optical disector was developed to address the alignment issue. Cells are counted within a known volume of a thick tissue section as they appear in focal planes created by focusing through the section using high-numerical-aperture objectives or a confocal microscope. Although the optical disector is a robust tool, many technical issues may limit its utility in the regulatory toxicology setting. These include immunohistochemical staining protocols that are typically manual and time-consuming, and nonuniform z-axis shrinkage of the thick section, which can affect cell number estimates and requires mathematical correction. This shrinkage can continue over time after coverslipping. Because a minimal section thickness of 25 µm is required at the time of counting to avoid biases related to artifacts at the section plane and to ensure an adequate disector height, prompt counting shortly after sectioning/staining may be required. The reporting of data captured from sections of inadequate thickness is not uncommon with this method, and because of the potential biases, these data must be critically reviewed.
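The role of section thickness in the optical disector can be made concrete with a small sketch of the commonly used optical fractionator formula, in which the count is divided by the section sampling fraction, the area sampling fraction, and the height sampling fraction (disector height divided by measured section thickness). The formula is standard in the stereology literature rather than drawn from this article; the function name and example values are hypothetical, and the 25 µm threshold is simply the minimum cited above.

```python
def optical_fractionator_total(q_minus, section_fraction, area_fraction,
                               mean_thickness_um, disector_height_um,
                               min_thickness_um=25.0):
    """Estimate total cell number from optical disector counts.

    Raises ValueError when the measured section thickness at the time of
    counting is below the minimum, because such data are potentially biased.
    """
    if mean_thickness_um < min_thickness_um:
        raise ValueError("section too thin at time of counting; data need critical review")
    if not 0 < disector_height_um < mean_thickness_um:
        raise ValueError("disector height must be positive and less than section thickness")
    height_fraction = disector_height_um / mean_thickness_um
    return q_minus / (section_fraction * area_fraction * height_fraction)


# Hypothetical design: every 6th section, 5% of section area,
# 14 µm disector height in sections measured at 28 µm.
estimate = optical_fractionator_total(200, 1 / 6, 0.05, 28.0, 14.0)
print(f"Optical fractionator estimate: {estimate:.0f} cells")
```

Because the measured thickness enters the estimate directly, continued z-axis shrinkage after coverslipping changes the height sampling fraction, which is one reason prompt counting matters.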
The consequences of selecting design-based stereology include the following: Because these are statistical methods, they cannot be validated by animal studies; they are validated via mathematical proofs (Student’s t-test is not validated by animal studies!). However, validation of the technical aspects of implementing the stereological design (e.g., staining protocols and microtome calibration) is required because variations in technical quality can introduce bias in the estimate. Because the design of tissue-sampling protocols is critical and the entire organ must be available for sampling, prospective designs must be in place prior to necropsy and tissue trimming. This does not mean whole-body perfusions are required. Sampling of the fresh organ can occur at the time of necropsy after organ weights are collected to obtain a stereologically valid sample of the organ. New fast fractionator sampling methods facilitate and expedite this process (Mirabile et al. 2010). Because these methods are assumption free or unbiased and modern systematic sampling routines are highly efficient with respect to reducing variance, these methods are both accurate and precise. Because 8 to 12 high-quality pairs of thin slides must be collected using a form of random sampling from each organ for estimating cell number using the physical disector, adequate technical resources are required. Notably, sufficient numbers of skilled histotechnologists are needed to produce high-quality sections in a timely manner, and high throughput requires capital investments including automated stainers and motorized microtomes, stereological software, and whole-slide imaging capabilities.
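The form of random sampling most often used for this purpose is systematic uniform random sampling: a random start within the first sampling interval followed by a fixed step. The sketch below assumes a hypothetical organ exhaustively sectioned into 100 candidate section-pair positions and a sampling period of 10, which yields a number of pairs within the 8 to 12 range mentioned above; the function name, seed, and values are illustrative only.

```python
import random


def systematic_uniform_random_sample(n_items, period, rng):
    """Return indices chosen with a random start in [0, period) and a fixed step.

    Every item has the same probability (1 / period) of being sampled,
    which is what makes the resulting estimate unbiased.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    start = rng.randrange(period)
    return list(range(start, n_items, period))


rng = random.Random(2010)  # fixed seed so the sampling scheme can be documented and audited
pair_positions = systematic_uniform_random_sample(n_items=100, period=10, rng=rng)
print(pair_positions)
```

Recording the seed or the random start alongside the data is one way to make the sampling step reconstructible for audit purposes.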
Do the Methods Matter?
Direct comparisons of estimates derived using assumption-based and stereological methods are not common; however, Mendis-Handagama and Ewing (1990) conducted an elegant series of method comparisons using a model of testicular atrophy in rats and hamsters to compare estimates of Leydig cell number in control and atrophic testes. The methods included design-based stereology (the physical disector method with no assumptions) and assumption-based methods, which assumed (1) tissue fixation/processing was similar in all treatment groups, (2) Leydig cell nuclei were uniform spheres in both control and treated animals, and (3) treatment did not affect overall organ volume. Key findings that invalidated the assumptions included smaller testis volume in the atrophic testes (as one would expect), nonuniform or irregular Leydig cell nuclear shape and size in atrophic testes, and differing degrees of fixation/processing-related tissue shrinkage in control and treated animals. The impact of these findings on the Leydig cell counts is graphically illustrated in Figure 1 and includes the following: In control rat and hamster testes, where there was minimal tissue shrinkage and uniform nuclear shape, the design- and assumption-based methods gave similar results because all assumptions turned out to be reasonably valid. The assumption-based method found a marked increase in Leydig cell density (number per cm³) in the atrophic testes. When corrected for the effects of tissue shrinkage, and when assumptions about nuclear shape were removed by using design-based stereology, cell density substantially decreased, indicating that shrinkage was greater in treated atrophic testes and the assumption that Leydig cells were uniform spheres in atrophic testes was erroneous. The design-based disector method found a marked decrease in the total number of Leydig cells in atrophied compared with control testes.
These data illustrate the “reference trap,” where reporting particle densities might give the impression of an increase in cell numbers when in fact significant decreases occurred. These data also illustrate the fallacy of assuming that one can control for the effects of tissue shrinkage and other technical issues by treating control and treated animal tissues in an identical manner.
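The reference trap reduces to simple arithmetic: total number equals numerical density multiplied by the volume of the reference space. The sketch below uses hypothetical values chosen only for illustration; they are not the data of Mendis-Handagama and Ewing (1990).

```python
def total_cell_number(density_per_cm3, organ_volume_cm3):
    """Total cells in the organ: numerical density times reference volume."""
    return density_per_cm3 * organ_volume_cm3


# Hypothetical values: treatment shrinks the organ to one-third of its volume.
control_density, control_volume = 20e6, 1.5   # cells per cm³, cm³
treated_density, treated_volume = 30e6, 0.5   # cells per cm³, cm³

control_total = total_cell_number(control_density, control_volume)
treated_total = total_cell_number(treated_density, treated_volume)

density_change = treated_density / control_density - 1.0
total_change = treated_total / control_total - 1.0
print(f"Density change: {density_change:+.0%}, total number change: {total_change:+.0%}")
```

In this illustration, density rises by 50% while the total number of cells falls by 50%, so a report limited to density would point in the wrong direction.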

Figure 1. Graphical presentation of figure 3 from Mendis-Handagama and Ewing (1990) summarizing the data for control and test article–treated testes (testosterone and 17β estradiol–induced atrophy) in hamsters (solid circles) and rats (open circles). Graph 1 shows data generated under three assumptions (similar tissue shrinkage in all groups, Leydig cell nuclei were uniform spheres, and no change in organ volume); the data imply test article treatment increased Leydig cell numerical density in both hamsters and rats. Graph 2 is the same data corrected for shrinkage, demonstrating a comparative reduction in density estimates due to substantially more shrinkage in test article–treated testes in both species. Graph 3 illustrates the data with removal of the assumption that Leydig cells are uniform spheres by estimating cell density using the unbiased disector principle. The data now indicate that the assumption of uniform spheres holds for control testes from both species because two-dimensional (2D; graphs 1 and 2) and three-dimensional (3D; graph 3) estimates are similar; however, test article treatment induces variability in Leydig cell size, resulting in overestimation of density in treated testes by 2D profile counts. Graphs 4 and 5 illustrate the effect of removing the assumption that test article treatment did not alter organ volume by multiplying 2D and 3D density estimates, respectively, by organ volume to obtain total Leydig cell number. As with the Leydig cell density estimates, data in graphs 4 and 5 demonstrate that the total Leydig cell number in test article–treated testes is overestimated by 2D profile counts. Graph 5 presents the accurate data, which indicate decreased total Leydig cell numbers in test article–treated testes compared with controls, the anticipated effect of testosterone and 17β estradiol treatment.
Regulatory Considerations
Basic compliance with the principles of Good Laboratory Practice (GLP) is a reasonable expectation. In the United States Code of Federal Regulations, Title 21, Food and Drugs, section 58.120(a)(6), for example, the Nonclinical Laboratory Study (NLS) Protocol is expected to include “a description of the experimental design, including the methods for the control of bias.” Furthermore, section 58.185(a)(3) states that the NLS Report shall include the “statistical methods employed for analyzing the data.” To align with these regulations, the protocol description of a post hoc nonstereological cell count might have to read, “Cell profiles were counted on nonrandom routine tissue sections, with no control of bias,” and the report might best read, “The cell counts were analyzed by analysis of variance, noting that the expectation of random sampling was not met.”
Beyond the guarantee of statistical accuracy, design-based stereology offers increased sensitivity and precision. Stereological methods are sensitive and can detect small changes because these methods estimate the true geometric structural feature, not a biased and noisy surrogate 2D parameter. The statistical sampling principles reduce variance, thus increasing the precision or reproducibility of the estimates. Critical questions such as “Is there a small but significant loss of neurons?” cannot be answered by either the bench pathologist or a surrogate 2D parameter. Since small changes in neuron numbers are significant, only stereological approaches can provide the necessary sensitivity to detect such changes. Therefore, a reasonable regulatory expectation for critical numerical data would be the following: When small changes in cell number are critical to the risk assessment process, decisions should be based on design-based stereological data.
The validation of design-based stereology software systems in the GLP environment should be in compliance with appropriate internal standard operating procedures for computerized systems, including documented system installation and verification of function by the vendor and the generation of installation-, operational-, and/or performance-qualification documentation, as appropriate. A pair of disector sections, scanned images of those sections, and an appropriate sampling protocol for cells in those sections should be archived along with data collected from several replicates of the sampling protocol. Periodic, routine revalidation of the system is recommended.
Current Scientific and Regulatory Guidelines
Several scientific and regulatory guidelines affecting the generation of morphometric data are available. Major journals in the neurosciences and nephrology have issued policy statements promoting the use of stereological methods to generate quantitative data (West and Coleman 1996; Saper 1996; Madsen 1999). An official research policy from the leading U.S. and European respiratory societies defining design-based stereology as the preferred method for lung quantitation has been recently published (Hsia et al. 2010).
With respect to regulatory guidelines, the OECD Guideline for the Testing of Chemicals, Draft (2003) Proposal for a New Guideline 426: Developmental Neurotoxicity Study states in paragraph 42, “Stereology may be used to identify treatment-related effects on parameters such as volume or cell number for specific neuroanatomic regions.” A Society of Toxicologic Pathology “Best Practices” publication for neuropathologic assessment in developmental neurotoxicity testing includes an excellent discussion of current practices with respect to morphometry (Bolon et al. 2006). This article discusses the 2D or linear morphometric measurements used in these studies and briefly addresses stereological methods, stating that “the potentially greater sensitivity of stereology only applies to those situations in which neural cell numbers are affected.”
With respect to assumption-based methodologies to estimate cell number, various labeling indices (LI) are perhaps the most prevalent examples in the regulatory environment. The industry has devoted significant resources to the refinement of these methodologies leading to, for example, the RITA-CEPA recommendations for standardized assessment of cell proliferation incorporating BrdU staining protocols (Nolte et al. 2005). While highly standardized and reproducible, these methods are encumbered with the risk of sampling- and assumption-driven biases as discussed previously. These include potential bias introduced by test article–related effects on cell or nuclear size or, more specifically, height relative to the plane of section. Unfortunately, with a change in the rates of cell death or cell division, changes in cell/nuclear size would not be unexpected. Where the magnitude of the between-group differences in LI is small, there is no way to differentiate effects on size/height from effects on rates of cell division. If accurate LI estimates are critical for risk assessment, LI data as currently generated should be interpreted with caution.
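The confounding of nuclear height with labeling index can be illustrated numerically. The sketch below is not part of the RITA-CEPA recommendations or of this article; it applies the same idealized textbook relation for profile counts (expected profiles proportional to number multiplied by the sum of mean height and section thickness) separately to labeled and unlabeled nuclei. All names and values are hypothetical.

```python
def profile_labeling_index(true_li, labeled_height_um, unlabeled_height_um,
                           section_thickness_um):
    """Labeling index expected from 2D profile counts under idealized conditions.

    Each population contributes profiles in proportion to its number
    fraction times (mean nuclear height + section thickness).
    """
    labeled = true_li * (labeled_height_um + section_thickness_um)
    unlabeled = (1.0 - true_li) * (unlabeled_height_um + section_thickness_um)
    return labeled / (labeled + unlabeled)


# Hypothetical case: the true labeling index is 10% in both groups,
# but labeled nuclei in the treated group are 10 µm high instead of 8 µm.
control_li = profile_labeling_index(0.10, 8.0, 8.0, 5.0)
treated_li = profile_labeling_index(0.10, 10.0, 8.0, 5.0)
print(f"Control LI: {control_li:.1%}, treated LI: {treated_li:.1%}")
```

Here the profile-based labeling index rises from 10% to roughly 11.4% with no change in the true fraction of labeled cells, a difference of the same order as a small treatment effect.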
Ovarian primordial follicle counts are another example of assumption-based morphometry. Current best practice (Regan et al. 2005) is to count all primordial follicle profiles in five sections of each ovary cut from the central one-third of each organ. This approach assumes uniform distribution and size of primordial follicles across all groups. If a test article effected a small reduction in follicle size or caused metabolic effects that exaggerated processing-induced shrinkage, this could be manifested as a reduced number with this 2D method. Here, design-based stereological protocols could be easily implemented since two-thirds of the ovary is already being sectioned. The critical question in the risk assessment of a given test article is, “Do accurate ovarian follicle counts really matter?”
The Future Is Now
Clearly, the choices of morphometric methods do matter because they can greatly affect the veracity of the data and thus, as pointed out by Gundersen and coworkers (1988) with respect to pathology, “Unbiased quantitative data may mean the difference between ‘interesting observations’ and real knowledge.”
Recent theoretical and technical advances now make it feasible to use stereological methods to estimate test article–related effects on cell number in a regulatory toxicology setting. However, stereological methods should be used judiciously and selectively in this setting. In most cases, the current paradigm for hazard identification and NOAEL/NOEL setting successfully meets all needs. However, in a few cases, quantitative data on cell number derived from histological sections may be needed in the decision-making process. If test article–related effects are sufficiently large (e.g., a three- to four-fold increase in BrdU labeling index), post hoc profile counting as previously described may have adequate sensitivity to detect a test article–related directional signal; this may meet development needs if knowledge of the true magnitude of change is not critical. However, when changes in cell number are small (e.g., a 40% increase in total number of BrdU-labeled cells), the inherent biases in post hoc morphometric approaches may lead to conclusions directionally opposite from the truth, and the truth (the true population mean) is never known. When test article–related effects on cell numbers may be small and cause potentially adverse, unmonitorable, and/or irreversible injury, and when detection of changes in cell number from histological sections is critical in the risk assessment process to limit human exposure, design-based stereological methods should be the morphometric method of choice. Adherence to stereological principles with strict attention to the technical details of their implementation provides the necessary sensitivity to detect small changes and ensures the statistical accuracy of the data (the mean of the estimates gets closer and closer to the true population mean with replication), thus providing a firm scientific basis on which to base decisions.
