Abstract
Method validation is a cornerstone on which biomarker development and utilization rest. However, given the abundance of biomarker candidates that are being identified and characterized, validation of these entities for use in nonclinical studies can be complex. The objective of this continuing education course was to review current practices and challenges encountered during the validation of methods for the analysis of novel biomarkers. Additionally, the importance of biological validation and correlation with pathology end points for biomarker candidates was discussed. This article is a summary of the materials presented at the 36th Annual Symposium of the Society of Toxicologic Pathology for a continuing education course titled “Current Practices and Challenges in Method Validation.” The speakers were subject-matter experts in the validation of quantitative mass spectrometry, multiplex binding assays, biological biomarkers, and immunophenotyping, and in anatomic and clinical pathology considerations in biomarker qualification.
Introduction
A continuing education workshop presented at the 36th Annual Symposium of the Society of Toxicologic Pathology addressed current practices, problems, and future directions of method validation. It focused on nontraditional instrumentation for clinical pathology evaluation. Quantitative mass spectrometry, an integral part of a growing number of clinical pathology laboratories, was presented because it can be an unfamiliar modality to some yet has a myriad of applications in preclinical development. Multiplex ligand-binding assays were also presented, as they are being used to support many stages of drug development, but their complexity creates unique challenges compared to single-plex assays and necessitates various adjustments for validation and sample analysis. Pitfalls and challenges encountered with the use of commercial immunoassay kits or the use of in vitro diagnostic kits versus research kits were also presented. Such issues were discussed in short case presentations by several speakers. The most recent bioanalytical method validation guidance and the statistical rigor necessary for a good validation were reviewed, as well as the role of pathologists in biomarker qualification.
Challenges in Quantitative Mass Spectrometry Assay Validations
Jennifer L. Colangelo
The application of liquid chromatography–mass spectrometry (LC-MS) assays in the clinical pathology laboratory is increasing, including their application in drug development studies for toxicology assessments. In the clinic, the mass spectrometry platform has become the gold standard for some tests, such as vitamin D, triiodothyronine (T3), and thyroxine (T4), and is widely used for others, including newborn screening tests, drugs of abuse, testosterone, and other steroids (Wu and French 2013). Assays developed for human samples can be modified for application in preclinical species, enabling the translation of data between preclinical models and humans for safer drug development. As these assays become more widely used in drug development toxicology evaluations, robust assay validations are important to enable high-quality data generation and interpretation. The caveats of validations for mass spectrometry are very similar to the validations performed on other bioanalytical platforms, yet some unique considerations should be taken into account.
LC-MS Validation Parameters
The experimental flow for various quantitative assays developed on LC-MS platforms is very similar in general. First, an internal standard is added to the sample. Typically, this standard is a stable labeled version of the analyte. Second, the sample is processed, so that it is compatible with the separation and detection methods. This preparation can be a simple dilution of the sample, a protein precipitation of the sample, or a complex combination of extractions. Many options exist, and the optimum choices are dependent on the physical and chemical properties of the analyte as well as the biological sample being assessed. Finally, the sample is injected into the LC system where the analytes are separated from other endogenous molecules. The MS monitors the effluent for the intact ion and a fragment ion unique to the analyte of interest based on their retention time, providing selectivity for the analyte of interest.
The requirements for the validation of an LC-MS assay are very similar to other bioanalytical assays (Moein, El Beqqali, and Abdel-Rehim 2017). A calibration curve is constructed from standards at various concentrations to cover the range needed for the assay. Quality control (QC) standards at four concentrations over the calibration range are used to evaluate accuracy, intra-assay precision, and inter-assay precision. Multiple runs with replicates are evaluated over nonconsecutive days. The validation typically includes an evaluation of recovery of the analyte from the sample preparation, as well as a determination of whether or not carryover occurs in the process. Stability of the analyte should also be evaluated, as well as any other properties of the molecule that may affect analysis, such as light sensitivity.
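The accuracy and precision assessment described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, accuracy is expressed as percent bias from the nominal concentration, and inter-assay precision is simplified to the % CV of the run means (an ANOVA-based variance component estimate is a common, more rigorous alternative).

```python
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation (%) = sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100.0

def percent_bias(values, nominal):
    """Accuracy as percent deviation of the mean result from the nominal value."""
    return (mean(values) - nominal) / nominal * 100.0

def summarize_qc(runs, nominal):
    """Summarize one QC level.

    runs: list of runs, each a list of replicate measured concentrations.
    nominal: nominal (spiked) concentration of the QC level.
    """
    intra_cv = [percent_cv(run) for run in runs]      # within-run % CV, one per run
    inter_cv = percent_cv([mean(run) for run in runs])  # % CV of run means (simplified)
    pooled = [x for run in runs for x in run]
    return {
        "intra_cv": intra_cv,
        "inter_cv": inter_cv,
        "bias": percent_bias(pooled, nominal),
    }

# Hypothetical data: three runs on nonconsecutive days, three replicates each
example = summarize_qc([[9.0, 10.0, 11.0], [10.0, 11.0, 12.0], [8.0, 9.0, 10.0]], nominal=10.0)
```

The same summary would be repeated for each QC level across the calibration range; the acceptance limits applied to the resulting values are set by the laboratory and the intended use of the assay.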
LC-MS assays have the capability to quantitate one analyte or multiple analytes (Kruve et al. 2015a, 2015b). For example, individual bile acids, steroids, or multiple forms of vitamin D can be multiplexed with LC-MS assays. Assessments can be conducted with standards independently, or a mixture of standards can be used. The accuracy and precision of the different analytes may vary due to physical and chemical differences. Often, as the number of analytes increases, favorable accuracy and precision values are more difficult to obtain.
Challenges with LC-MS Validations
Although many of the challenges with LC-MS validations are similar to those of other bioanalytical platforms, this platform has some unique considerations, one of which is ion suppression or enhancement. On occasion, the ionization of the analyte in the mass spectrometer is affected by a substance in the matrix (Annesley 2003; Van Eeckhaut et al. 2009). This effect is often due to the affinity of the analyte and the substance to ions in the matrix. Ion suppression and enhancement are not common occurrences. When they do occur, quantitation becomes difficult, and sometimes compensation for the effect cannot be performed.
Defining the lower limit of quantitation (LLOQ) is another challenge for the MS platform (Kruve et al. 2015a, 2015b). Most of the guidelines for assay validation provide an acceptable signal-to-noise ratio to determine the LLOQ for the analyte. However, laboratory practices vary in how the region of noise is identified and how the signal is calculated, which complicates data comparisons between laboratories. Having a consistent, defined method is beneficial.
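To illustrate why the choice of noise definition matters, the sketch below computes a signal-to-noise ratio for the same chromatographic trace under two conventions: noise as the standard deviation of a baseline region, and noise as the peak-to-peak amplitude of that region. The function names, the index-based windows, and the example trace are all hypothetical, and neither convention is presented here as the one required by any guideline.

```python
from statistics import mean, stdev

def _peak_height(trace, peak_window, noise_window):
    """Peak apex minus the mean baseline taken from the noise window."""
    noise = trace[noise_window[0]:noise_window[1]]
    peak = trace[peak_window[0]:peak_window[1]]
    return max(peak) - mean(noise), noise

def sn_baseline_sd(trace, peak_window, noise_window):
    """S/N with noise defined as the sample SD of the baseline region."""
    height, noise = _peak_height(trace, peak_window, noise_window)
    return height / stdev(noise)

def sn_peak_to_peak(trace, peak_window, noise_window):
    """S/N with noise defined as max minus min of the baseline region."""
    height, noise = _peak_height(trace, peak_window, noise_window)
    return height / (max(noise) - min(noise))

# Hypothetical trace: six baseline points followed by a three-point peak
trace = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 2.0, 22.0, 2.0]
by_sd = sn_baseline_sd(trace, peak_window=(6, 9), noise_window=(0, 6))
by_p2p = sn_peak_to_peak(trace, peak_window=(6, 9), noise_window=(0, 6))
```

For this trace the two conventions give clearly different ratios from identical data, which is the practical reason a laboratory benefits from fixing one definition, and one noise region, before comparing LLOQ values.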
Comparisons of data between laboratories and between platforms can be challenging, and this is true for LC-MS applications. When data from an LC-MS platform are compared to other bioanalytical platforms, often differences in the concentrations reported are observed and can be a result of many factors, including, but not limited to, antibody cross-reactivity, interferences, and sample integrity. Identifying the source of bias can be challenging, yet calibrated reference standards are one tool that can aid in better platform comparisons (Bower et al. 2014). The number of certified standards available commercially is limited, and data from LC-MS platforms are even more limited. As the availability of certified standards for LC-MS applications expands, confidence in the accuracy of measurements can be increased, and higher-quality assessments between laboratories and platforms will be enabled for those analytes.
Properly validated LC-MS assays are necessary for the successful evaluation and integration of results from drug development toxicology studies. Considerations for the validation of LC-MS methods are very similar to those for other bioanalytical methods. The entire validation must demonstrate that the assay is suitable for its intended application to provide confidence in the data generated. As with any platform, understanding the caveats of MS methods is important to ensure the validation encompasses critical elements. As LC-MS assays are deployed in toxicology studies, robust validations will be necessary to ensure data integrity and regulatory acceptance.
Challenges in Multiplex Binding Assays
Simon Lavallée
The Luminex™ technology (Luminex, TX, USA) performs bead-based assays. The beads are polystyrene microspheres (approximately 5.6 µm in size) or magnetic beads. Each bead can be conjugated with a different capture molecule. That molecule can be an antibody, a peptide, or a receptor depending on what is to be tested. For the detection of cytokines, antibodies are most often conjugated to the surface of the beads. For example, an anti-IL-2 antibody will be conjugated to a bead to detect IL-2. Once captured, the analytes are identified by detection antibodies labeled with biotin and streptavidin. Each bead is internally dyed with a mixture of two spectrally distinct fluorophores: a red dye between 600 and 780 nm wavelength, and an infrared dye above 780 nm wavelength. The ratio between the two dyes varies from bead to bead and gives each bead a unique spectral signature, or spectral address, which is called a region. Luminex created 100 different regions using the ratio of red and infrared dye, which allows the instrument to identify each bead. Therefore, the instrument has the potential to multiplex 100 different analytes.
The Mesoscale™ platform (Mesoscale Discovery, MD, USA) performs multiarray assays with an electrochemiluminescence method. Capture antibodies are attached to the bottom of the microplates by high-binding carbon electrodes. Once captured, the analytes are detected by antibodies labeled with electrochemiluminescent Sulfo-Tag labels. An electric current is applied to the electrode placed underneath the plate. Voltage penetrates a few microns at the surface of each well. Only the electrochemiluminescent label of the detection antibodies bound to the antigen complex at the bottom of the plate is subjected to the voltage and emits light. Assays are available in single-spot or multi-spot plates of up to 10 spots for a 96-well plate.
For both platforms, it is possible to develop a customized assay. The quality of the reagents is the determining factor of the performance of the assay. Both platform vendors have good practices on lot qualification, but variability of kits, reagents, and instrumentation can be encountered.
Advantages and Challenges of Multiplex Assays
The advantages of multiplex assays include the very small sample volumes required for analysis (25 µl/well for the Luminex instead of 100 µl/well for a regular enzyme-linked immunosorbent assay [ELISA]). The Luminex methodology can be combined with the Curiox technology (Curiox Droparray™), which allows the miniaturization of the assays, requires 60% to 80% less reagent, and decreases the sample volume to 5 to 20 µl per well. These platforms reduce the cost and time of analysis considerably when compared to single immunoassays. Up to 100 analytes can be assessed simultaneously on the Luminex platform.
The following are unique challenges when developing and validating multiplex assays: the data generated by a single plate of a multiplexed assay are equivalent to multiple ELISA plates, and data analysis is more complex and takes longer to complete; validation of several parameters at once may not be optimal for all analytes assessed in the same panel, and a more extensive validation may be required for some analytes; due to the complexity of reagents, kit lot-to-lot variability is often observed; the reading time is long, which may cause a drift effect (for example, the reading time may be up to 45 min for the Luminex as opposed to a few seconds when using a spectrophotometer); and the use of beads in the assays may cause technical issues (for example, when using polystyrene beads, filter plates may be blocked by beads during the washing steps, and magnetic beads may also aggregate during an assay).
Regulatory Considerations for the Validation of Biomarker Assays
Bioanalytical method validation guidance documents from the U.S. Food and Drug Administration (FDA 2013) and the European Medicines Agency (EMA 2011) do not provide specific guidelines for the validation of ligand-binding assays in nonclinical studies. However, there are several key aspects of assay validation that need to be addressed when biomarker data are used to support a regulatory action, such as the pivotal determination of safety and/or effectiveness to support labeled dosing instructions for any given drug. Method validation for biomarker assays should address the same questions as method validation for pharmacokinetic (PK) assays. The accuracy, precision, selectivity, range, reproducibility, and stability of a biomarker assay are important characteristics that define the method. Hence, the approach used for PK assays should be the starting point for the validation of biomarker assays. However, it is recognized that some characteristics may not apply or that different considerations may need to be addressed in the validation of biomarkers. During an American Association of Pharmaceutical Scientists (AAPS) Crystal City VI meeting held in September 2015, the validation process for biomarker assays as outlined in the 2013 FDA guidance document (FDA 2013) was discussed. Specifically, the differences between biomarker assays and PK assays were examined, and it was agreed that the guidelines developed for PK assays in the FDA guidance document may not be directly applied to biomarker assays. A consensus was achieved on the importance of validation parameters assessed using endogenous biomarkers in matrices, with an emphasis on the importance of demonstrating parallelism (Lowes and Ackermann 2016). Also, diagnostic kit validation data provided by the manufacturer may not ensure reliability of the kit method for drug development purposes. The performance of diagnostic kits should be assessed in the facility conducting the sample analysis.
Validation of Multiplexed Assays in Support of Preclinical Studies
Parameters evaluated for the validation of multiplex assays include selectivity, linearity of dilution/parallelism and prozone effect (to assess specificity), range of response, intra- and inter-assay precision, and (relative) accuracy. Stability should also be evaluated at room temperature, at 4°C, over freeze and thaw cycles, and during storage at −20/−80°C (Lee et al. 2006). Additional considerations and potential sources of challenges in the validation of multiplex biomarker assays, including premixed reagents, verification of the vendor’s claim of performance, matrix-based QC materials, generating species-specific endogenous samples, and the importance of parallelism, are discussed below.
The reference material and calibrators, coated plates, conjugated beads, and detection antibodies are often premixed, thus limiting the opportunities to adjust the assay range for individual analytes. Reagent volumes are restricted and are often only in a ready-to-use format, making it difficult to prepare required validation samples or modify the concentrations for both testing and optimizing assay conditions. However, most vendors will have the flexibility to provide additional reagents for testing. Moreover, lot-to-lot variability makes it difficult to stay within the validated assay range across multiple lots of kits.
It is important to always verify the information provided by the vendor. Based on the requirements of the assay, the researcher’s expectations of performance may differ from those of the vendor. Individual analytes may have uneven levels of performance, dynamic range, and sensitivity, which may lead to issues with QC placement. Reference materials from different kits may also give varying values, making comparison of absolute concentration difficult.
Challenges can be encountered when using matrix QC samples, as reference material prepared in matrix may not behave the same way as when prepared with the kit’s diluent. In some cases, preparing the calibration in diluted matrix will ensure adequate performance of the QC samples prepared in matrix, but may affect the assay’s dynamic range and sensitivity. Some QC samples may have measurable matrix endogenous concentrations, precluding the use of matrix for the preparation of the calibration curve. It is possible that different lots of matrices need to be screened before a lot with no detectable endogenous concentrations can be identified. Matrices may also be treated to remove the endogenous biomarker.
In vitro/in vivo stimulation experiments may be beneficial in assay validation. Specifically, generation of samples with endogenous analytes may be used for testing for adequate interspecies cross-reactivity. Examples of situations where stimulation experiments can be helpful include using a human-specific kit for the measurement of nonhuman primate biomarkers or evaluation of the sensitivity of the method. Samples from in vitro/in vivo stimulation experiments may also be used as endogenous biomarker controls as necessary to monitor lot-to-lot variability.
It is essential to evaluate parallelism at an early stage of development. Parallelism demonstrates that the endogenous biomarker concentration is consistent with the calibration curve. Optimizing the minimal required dilution, the minimum sample dilution providing optimal accuracy and precision, has an effect on sensitivity and therefore needs to be examined as a part of parallelism.
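One common way to examine parallelism is to serially dilute a sample containing the endogenous biomarker, read each dilution against the calibration curve, multiply each result by its dilution factor, and check how consistent the dilution-corrected concentrations are. The sketch below follows that approach under stated assumptions: the function names are hypothetical, consistency is summarized as the % CV of the corrected values, and the `max_cv` limit and the requirement of at least three dilutions are illustrative choices rather than values taken from the course material.

```python
from statistics import mean, stdev

def dilution_corrected(measured, dilution_factors):
    """Multiply each measured concentration by its dilution factor."""
    return [m * d for m, d in zip(measured, dilution_factors)]

def parallelism_cv(measured, dilution_factors):
    """% CV of the dilution-corrected concentrations across the series."""
    corrected = dilution_corrected(measured, dilution_factors)
    return stdev(corrected) / mean(corrected) * 100.0

def minimal_required_dilution(measured, dilution_factors, max_cv=30.0, min_points=3):
    """Return the smallest dilution from which the remaining series is parallel.

    Dilutions must be listed from least to most dilute. Returns None when no
    starting dilution leaves at least `min_points` values within `max_cv`.
    """
    for start in range(len(measured) - min_points + 1):
        cv = parallelism_cv(measured[start:], dilution_factors[start:])
        if cv <= max_cv:
            return dilution_factors[start]
    return None

# Hypothetical series: the 1:2 dilution under-recovers, later dilutions agree
factors = [2, 4, 8, 16, 32]
measured = [30.0, 25.0, 12.5, 6.25, 3.125]
mrd = minimal_required_dilution(measured, factors, max_cv=10.0)
```

In this made-up series the least diluted sample recovers low, so the search settles on the next dilution. Because every additional dilution step raises the lowest endogenous concentration that can be quantified, the chosen minimal required dilution and the assay sensitivity have to be weighed together.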
Issues caused by lot-to-lot variability include the quality of recombinant proteins, coupling procedures for capture and critical reagent, and general lack of information on the identity and characterization of the critical reagent. Securing a large number of kits to cover validation and sample analysis, batch analysis, and bridging between different kits by using matrix-based endogenous QC material can help to address the issue of lot-to-lot variability.
The optimization of analytical conditions for each analyte in a multiplex assay often requires adjustments in assay performance and may impact the assay’s capability to measure biologically relevant changes. These adjustments are necessary due to the high number of individual components in multiplexed assays. Validation of multiplexed assays may therefore require several iterations before the validation is completed. Lot-to-lot variation will often result in significant changes in assay range. Therefore, it is necessary to have processes in place to ensure that the performance of a new lot is still within the conditions of the previously validated assay.
Strategies and Pitfalls in Biological Biomarker Validations
Adam D. Aulbach
With the introduction of a myriad of commercially available ligand-binding biomarker kits into nonclinical research settings over the last decade, the importance of sound assay validation practices cannot be overstated. Although the guidance on fit-for-purpose method validation of biomarker assays has been presented in the literature (Lee et al. 2005; Lee et al. 2006; FDA 2013), these documents largely focus on traditional aspects of analytical validation including intra-/inter-assay precision and accuracy, dynamic range, dilutional linearity/parallelism, sensitivity/specificity, storage and stability, and robustness. Of equal importance as analytical validation is the ability of the method to acceptably identify, measure, or predict the concept of interest within a biological test system. This process has several synonyms which include biological validation, species qualification or just qualification, proof-of-concept, in vivo qualification, clinical validation, or PK/pharmacodynamic (PD) modeling, among others. Biological validation experiments often follow the verification of acceptable analytical performance and should be included as part of the overall validation package for ligand-binding biomarker assays.
Components of Biological Validations
At the foundation of any biological validation is the generation of positive control samples, that is, samples that have been influenced through experimental manipulation to contain either increased or decreased concentrations of the analyte of interest in order to demonstrate proof-of-concept within a biological test system. These samples can be created from animal test systems, or may be produced in vitro, so long as they are in the appropriate species matrix and are predictably and discernibly identifiable relative to unmanipulated or baseline reference samples. Strategies can range from administration of a well-characterized toxin or environmental challenge to an animal test system, to spiking a pro-inflammatory compound into whole blood. The sophistication and complexity of any given biological validation should be balanced between the goals of the validation, feasibility, animal welfare, and cost/resource availability.
A core component of most biological validations is the correlation of biomarker findings with established gold standards. Comparing biomarker data to results from well-characterized modalities like histopathology, clinical pathology (e.g., hematology and clinical chemistry), PK data, and in-life findings is critical in demonstrating the utility of an assay in preclinical settings. This facet of a biological validation helps translate data from a relatively unknown or unproven biomarker assay to more familiar and traditional toxicology end points. Creative experimental designs can also be used to determine other aspects of the assay including timing and kinetics of biomarker responses, sensitivity and specificity relative to gold standards, and generation of important historical control data for interpretive purposes.
In Vivo Animal Studies
Utilizing live animal test systems is often the most straightforward, rewarding, and “data-rich” approach for biological validations. This method typically employs a basic toxicity-to-effect approach and uses well-characterized compounds and animal models with sound literature support. It often allows collections at multiple time points to assess biomarker kinetics, half-life, and reversibility, as well as assessment of related end points to inform on predictability and sensitivity, ultimately resulting in a more robust and well-characterized evaluation. When using test compounds, it is important to be mindful of appropriate compound selection as not all toxins will affect an organ system uniformly (e.g., proximal tubule vs. collecting duct injury). This aspect may be important if working with a biomarker that has specificity to a particular anatomic location or region within an organ, for example, kidney injury molecule-1 (proximal tubule) versus renal papillary antigen-1 (renal papilla/collecting duct; Bonventre et al. 2010). Besides compound/toxin administration, other in vivo strategies may include surgical (e.g., myocardial infarction and ovariohysterectomy) or environmental (e.g., altering light cycles and cage movement) manipulations.
Although there are many benefits to using animal models for biological biomarker validations, there are several drawbacks. In vivo animal studies are significantly more expensive (tens to hundreds of thousands of U.S. dollars) than any other form of biological validation, depending on the scope of the project. This is due in large part to animal- and husbandry-related costs. Lastly, using animals, and potentially animal life, to simply help validate a biomarker assay should not come without serious consideration. Animal welfare and the 3 Rs (Replace, Reduce, and Refine) should be at the forefront of study design for any in vivo biological biomarker validation using animals.
Ex Vivo/In Vitro Studies
As opposed to in vivo animal studies, ex vivo and in vitro studies do not generally require the termination of animals, although the necessary biological samples (e.g., blood, urine, tissue) are collected from animals. By nature, in vitro biological validations are typically less expensive than their in vivo counterparts but also generally yield a less robust data set (i.e., may only answer a single question, did you see the anticipated response?) and may not be completely biologically applicable, because the biological response being measured occurs outside of the applicable animal or human test system and hence may lack the influence of all relevant factors, cells, proteins, and so on. In addition to the usual blood or urine samples used in these types of experiments, filtered tissue homogenates may be a viable option, especially if one has access to excess tissues from a necropsy laboratory. Examples of in vitro biological validations include spiking a stimulant into whole blood or leukocyte isolates (e.g., cytokine release assays and platelet activation by flow cytometry) and using tissue homogenates to determine specificity of muscle injury assays (e.g., cardiac troponins and creatine kinase-MB).
Linearity Kits and Reference Materials
Linearity kits are commercially available collections of reference materials that are used to verify linearity, calibration, and/or dynamic range of an assay or for the purposes of troubleshooting improper assay performance. Linearity kits include a material that contains a known concentration of the analyte of interest and may be distributed as a single vial or multiple levels/concentrations. Because linearity kits contain a known concentration of a biomarker, they can be used in a variety of ways to establish “proof-of-concept” for biomarker validations. The actual components of these products are rarely native species-specific proteins, will differ on a case-by-case basis, and may not always be clearly described (e.g., “…animal serum to which ‘substances’ have been added…”); hence, they should be used with that consideration in mind. Additionally, species-specific linearity kits and reference materials are not always available for all biomarkers, especially for obscure and novel analytes. However, reference materials can be a low-cost option to independently verify the ability of a biomarker assay to measure the analyte of interest using materials separate from those included by the manufacturer.
Other Strategies
Other “low-tech” strategies to help verify the performance of a biomarker assay include tracking analytes over time in naturally occurring phases (e.g., juvenile growth and maturation, diurnal/seasonal rhythm, and reproductive cycles), comparing baseline biomarker values to published literature (provided the methods are the same), and sending aliquots to outside reference laboratories for comparison analysis. These approaches are low cost and can often provide some degree of confidence that the method has reasonable utility.
The primary goal of a biological validation is to demonstrate that the assay “works” and that it measures the analyte in the context in which it is being used. It is critical to compare biomarker results with those from established gold standards and related end points to formulate a relationship between known and unknown assessment modalities. Design biological validation experiments to maximize the information gained from the work, while minimizing resources and animal use. Biological validations can be expensive, but they pale in comparison to the incurred costs of making incorrect conclusions about the safety of a test molecule. At a minimum, consider sending sample aliquots to a reference laboratory to inform the utility of the assay.
Validation Challenges in Flow Cytometry
Marie-Soleil Piché
Flow cytometry is a laser-based technology that analyzes multiple characteristics of a single particle (usually cells). It allows multiparametric analysis of thousands of particles per second and helps to adequately identify or functionally characterize complex cell populations of interest. It is often used in basic research, discovery, preclinical, and clinical trials. With the increasing proportion of biologics in the pipeline, flow cytometry has proven itself to be an indispensable tool in many cases to assess safety, receptor occupancy (RO), or PD.
During the preclinical phase of the development of new drugs, flow cytometry has routinely been used for assessing the immunotoxic effects of a candidate drug by evaluating the immunophenotype of various cell populations in whole blood, tissues, or other matrices. In addition, RO is commonly included in preclinical programs to evaluate the binding of the drug to target cells. Flow cytometry can also be used to assess PD markers of interest for further elucidation of on- or off-target effects. In a clinical setting, flow cytometry can be used for safety, RO, and PD assessments, but it can also be used for diagnostic purposes. Therefore, it is a valuable tool during drug development. However, flow-based methods are challenging to develop and validate because they involve a cellular measure, which is variable, and there is often a lack of reference material available. The reagents used are complex and sometimes unstable, especially when tandem dyes are being used, although reagents have greatly improved and this has become less of an issue in recent years. Flow cytometry assays can be used for multiple different purposes, and it is important to know up front what the flow assay will be used for in order to conduct the appropriate validation to support good laboratory practice (GLP) studies.
There are currently no guidelines for the validation of flow cytometry methods in the context of preclinical studies. Some initiatives have been taken by the AAPS flow cytometry steering committee as well as by the International Council for Standardization of Haematology/International Clinical Cytometry Society committee in the writing of guidance documents describing flow cytometry method validation (Barnard 2012; Cunliffe et al. 2009; O’Hara et al. 2011; Wood et al. 2013; Wu, Patti-Diaz, and Hill 2010). These recommendations have not yet been integrated in an official document released by the regulatory agencies as has been done for other analytical methodologies. The parameters most commonly assessed during the validation of immunophenotyping panels with abundant cell surface markers include antibody titration, interactions between antibodies, precision (intra-assay/inter-assay/inter-analyst), day-to-day variability, specificity, stability, and establishment of a reference range. However, a fit-for-purpose approach (Lee et al. 2006) should always be taken when validating a flow cytometry method. The design and depth of validation required for a specific flow cytometry–based assay will mainly be driven by the purpose of the assay and by the future use of the data generated with the assay.
Each laboratory has slightly different approaches for the validation of flow cytometry methods. However, the parameters tested are generally dealt with in a similar manner. Below is a short description of how these parameters can be tested.
Titration of antibodies: The optimal antibody titers are evaluated by comparing the staining intensity of positive and negative cells, expressed as the signal-to-noise ratio. Each antibody is titrated and the ratio is calculated. The optimal titer gives the clearest separation between the negative and positive populations while keeping the shift of the negative population (background signal) to a minimum.
Precision of the assay: Samples from at least 3 donors are processed at least 5 times within the same experiment. The same samples are then reprocessed at least 3 times by different analysts. This enables the determination of the intra-assay precision, inter-assay precision, and inter-analyst variability of the assay. The acceptance criteria are generally set at a % CV ranging between 20% and 30%, depending on the frequency of the population of interest.
Specificity: To verify that the response obtained is specific to the antibody used, isotype-matched controls are tested. Isotype controls are expected to yield a negative signal, whereas the specific antibodies to the markers of interest are expected to yield a positive signal.
Interaction between antibodies: It is important to test the interaction of the antibodies within a panel to understand whether one antibody negatively impacts the staining of another antibody. The samples are stained with the complete panel and with panels in which one antibody is removed at a time (fluorescence minus one [FMO]). The antibody interaction can be evaluated by checking that there is no nonspecific binding in the FMO channel (the channel missing the antibody) or by confirming that the relative percentages of the populations of interest that remain reportable in the FMO controls differ by less than 20% from those obtained with the complete panel.
Day-to-day variability: This parameter evaluates the reproducibility of the method over time. Samples from at least 3 animals are analyzed on 3 different occasions, and the % CV across all occasions is calculated. Acceptance criteria between 20% and 25% CV are usually considered acceptable. Whenever feasible, this assessment also helps in understanding biological changes when samples are collected on different occasions.
Reference range: This parameter establishes the biological variability of the markers/function analyzed in the species/population of interest in order to drive the interpretation of study data. Samples from at least 5 healthy animals/sex are recommended. Additional animals/sex can be included in cases where a large inter-animal variability is identified. Repeat (e.g., weekly) measures in the same animals can be included for the assessment of intra-individual variability. Collecting samples under consistent conditions (e.g., time of day and fed/fasted status) can help to diminish inter- and, particularly, intra-animal variability.
Stability: The stability of both unstained samples and stained/fixed samples must be established. The stability time and temperature tested are based on the conditions to which study samples are expected to be exposed. Results from samples stained immediately after collection and acquired immediately after staining (reference samples) are compared to those from samples stained a few hours or days after collection, or from processed samples stored for a given period of time prior to acquisition on a flow cytometer. Percentage differences between the reference samples and stability samples are then calculated. Acceptance criteria varying between 25% and 30% are usually considered acceptable, depending on the overall variability of the method established during the precision assessment.
The parameters discussed above are the most commonly tested. However, depending on the method, additional parameters may be beneficial.
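The % CV and percentage-difference calculations referred to above can be illustrated with a minimal Python sketch. The replicate values, the function names, and the specific limits chosen (20% for precision, 25% for stability) are assumptions made for illustration only; each laboratory would set its own limits within the ranges quoted in the text, and the use of the sample (n - 1) standard deviation is likewise an assumption.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def percent_difference(reference, test):
    """Absolute difference of a stability sample relative to its reference, in percent."""
    return 100.0 * abs(test - reference) / reference


# Hypothetical data: % positive cells for one donor, 5 replicates in one experiment.
replicates = [12.1, 11.8, 12.6, 12.3, 11.9]
cv = percent_cv(replicates)
# Assumed limit: 20% CV, the lower end of the 20%-30% range quoted in the text.
precision_ok = cv <= 20.0

# Hypothetical stability pair: reference sample vs. the same sample after storage.
reference_value = 12.0
stored_value = 10.5
diff = percent_difference(reference_value, stored_value)
# Assumed limit: 25%, the lower end of the 25%-30% range quoted in the text.
stability_ok = diff <= 25.0

print(f"%CV = {cv:.2f}, precision acceptable: {precision_ok}")
print(f"% difference = {diff:.2f}, stability acceptable: {stability_ok}")
```

The same `percent_cv` helper could be applied to replicates within a run, across analysts, or across occasions; only the grouping of the input values changes.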
Case Studies
Four case studies were presented to illustrate different approaches taken to validate flow cytometry methods and what challenges were encountered.
The first case study described the validation of an assay used to monitor the efficacy of an immunomodulatory drug. The assay was later used to support two preclinical studies in which a neutralizing chemokine drug was tested. The validation included all the parameters described above in addition to agonist titration, activation time course, and in vitro drug titration. Agonist titration and activation time course were assessed, given that the methodology required cell activation. Indeed, the efficacy of the drug was monitored through the inhibition of cell activation. Therefore, the appropriate concentration of agonist, as well as appropriate incubation time, was required in order to capture the inhibition. In vitro drug titration in this case helped in the assessment of the appropriate concentration of drug to be tested in the first in vivo studies.
The second case study described the validation of a flow cytometry method for the measurement of platelet activation. This assay was designed to either monitor in vivo platelet activation or to assess in vitro platelet activation in the context of antiplatelet drug testing. In addition to the parameters described above, agonist titration, activation time course, and precision of activation were performed. For this methodology, sample collection was critical in order to obtain reproducible results. Moreover, variability in sample staining was observed, which was mitigated by performing replicates for activation and staining.
The third case study described the validation of a flow cytometry assay for the measurement of basophil activation in the context of a phase III clinical study. The parameters described above were tested but with emphasis on the stability testing given that samples from all over the world were received by the test site for testing. Extensive stability was tested on the samples, and the acceptance criteria were not met. However, positive samples remained positive after stability testing, whereas negative samples remained negative. Therefore, it was decided to use the method qualitatively and to define samples based on their positive or negative status rather than by the evaluation of their percentage of activated cells.
The fourth case study described the validation of a method to measure regulatory T cells, a very small population, in rat whole blood and thymus. The validation parameters described above were tested in addition to the determination of a limit of detection (LOD). The LOD is generally validated only when small populations of cells of interest are investigated. It is obtained by looking at the percentage of positive cells in the quadrant of the cell population of interest with the FMO tubes. This represents the frequency of false-positive events or the background noise in the quadrant of interest. This experiment is repeated with multiple samples, and the mean and SD of the percentage of background noise are calculated. The mean + 3SD is defined as the LOD. For abundant cell populations, the percentage of background noise does not impact the overall interpretation of the results. However, for smaller populations, this percentage is important in the interpretation of the data.
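As a minimal sketch of the LOD calculation described above (mean + 3SD of the background in the FMO tubes), the following Python example uses hypothetical background percentages. The values, the function names, and the use of the sample (n - 1) standard deviation are assumptions for illustration and are not taken from the case study.

```python
from statistics import mean, stdev


def limit_of_detection(background_percentages):
    """LOD = mean + 3 * SD of the % positive events seen in the FMO tubes."""
    return mean(background_percentages) + 3 * stdev(background_percentages)


def is_above_lod(measured_percentage, lod):
    """True when a measured population frequency exceeds the background-derived LOD."""
    return measured_percentage > lod


# Hypothetical % of events falling in the quadrant of interest in FMO tubes
# from six different samples (background noise only).
fmo_background = [0.02, 0.03, 0.01, 0.04, 0.02, 0.03]
lod = limit_of_detection(fmo_background)

print(f"LOD = {lod:.4f}% of parent population")
print("0.50% reportable:", is_above_lod(0.50, lod))
print("0.03% reportable:", is_above_lod(0.03, lod))
```

With these assumed numbers, a population measured at 0.03% would be indistinguishable from background, which illustrates why the LOD matters mainly for rare populations.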
Most case studies presented did not represent conventional validations since acceptance criteria were not met for all parameters. However, by adapting the reporting strategy, it was shown that the methods could be used within the preestablished limitations. A fit-for-purpose approach should be taken when validating flow cytometry methods, as the steps included in the validation and development of an assay depend on the purpose of the assay. Therefore, not all of the parameters described herein are necessarily required in every assay being developed.
The Roles of the Anatomic and Clinical Pathologist in Nonclinical Safety Biomarker Qualification
Daniela Ennulat
Traditionally, new biomarkers gained regulatory exposure and acceptance primarily through data submitted in drug approval submissions, with minimal use outside of the proprietary drug development space, and essentially no sharing of data or marker experience in the public domain. The Critical Path initiative was launched in 2004 by the FDA to modernize drug and medical device development, and novel biomarkers were identified as an integral part of the “tool kit” considered essential for increasing success of drug safety evaluation (Woodcock and Woosley 2008). The first organized biomarker development efforts were to qualify novel urinary markers of renal injury in rats and began as a collaboration between the FDA and scientists from industry and academia. This collaboration was expanded to include additional scientists from pharma, academia, and international regulatory agencies as the Predictive Safety Testing Consortium (PSTC) in 2005, with development of a pilot process for interaction between scientists from regulatory agencies and sponsors in 2006 that culminated in the development of a formal Biomarker Qualification Program in 2009 (Woodcock et al. 2011). Seven urinary biomarkers of renal injury in rat submitted in a Voluntary Exploratory Data Submission by the PSTC Nephrotoxicity Working Group were qualified in 2008 (Vaidya et al. 2010), followed in 2010 by two urinary markers submitted by the HESI Nephrotoxicity Working Group (Harpur et al. 2011).
Increasingly, the biomarker qualification process has shifted from a formal qualification process to issuance of a Letter of Support (LoS) by the FDA and EMA. The purpose of an LoS is to encourage further exploratory use for promising new markers to ensure continued development, increased visibility, and increased understanding of marker biology. Two additional rat urinary markers received an LoS in 2016 (Phillips et al. 2016; FDA 2018), and most of the urinary markers qualified in rat in the original renal biomarker qualification submission have now received an LoS for exploratory clinical use from the FDA (2016a). To date, more clinical than nonclinical LoS have been issued, including exploratory markers of progression of drug-induced liver (FDA 2016b) or vascular injury (FDA 2016c), as well as a variety of prognostic, PD and even functional markers (FDA 2018).
For any new biomarker, understanding and implementation do not stop with either a formal regulatory qualification or LoS. Rather, biological qualification of new biomarkers is an iterative process that ideally extends beyond regulatory interactions into longitudinal evaluation and learning as implementation of new biomarkers increases over time. Veterinary anatomic and clinical pathologists have unique expertise in comparative medicine to increase the success of safety biomarker development and translation.
Selection of candidate biomarkers is a pivotal decision point in the development cycle of new biomarkers in any species. While most novel biomarker development efforts are currently occurring in the clinical space, there are still many toxicities that lack translatable, noninvasive safety biomarkers which would benefit from nonclinical biomarker development collaborations. In the past, many novel markers were identified using -omics platforms, primarily gene expression, for selection of proteins of interest. However, this approach often leads to false discovery and less than optimal selection of a candidate biomarker because concordance between message and protein is poor, and this relationship is unique and rarely known for each mRNA–protein pair (Guo et al. 2008). Another key consideration for selection of candidate safety markers is the relationship of the marker to the disease process. Thus, systems biology rather than transcriptomics approaches are generally better for selection of disease-relevant biomarkers through identification of markers related to a pathophysiologic process that have the greatest potential of being specific to the injury, measurable, and translatable across species. Combining systeomics (computational techniques used to analyze how biological systems interact) with the comparative medical expertise of veterinary pathologists can help optimize safety marker identification and translation through consideration of variables that may be species-, analyte-, or assay-specific, such as dynamic range (e.g., glutamate dehydrogenase [GLDH] generally has larger excursions than alanine aminotransferase [ALT] in rat liver injury), species relevance (e.g., GLDH and sorbitol dehydrogenase [SDH] are preferred over ALT as hepatocellular injury markers in minipig), circulating half-life (e.g., false negatives due to the short circulating half-life of cardiac troponin I), analyte stability (e.g., long vs. short storage stability of microRNA in comparison with enzyme activity markers), or effect of disease-related modifications (e.g., underestimation of early albuminuria caused by the lack of immunoreactivity of urinary albumin fragments).
While the selection of a marker for qualification is often based on assay availability, a key aspect of biomarker qualification is assay validation. In addition to the rigorous validation practices described earlier, characterization of biological variability is essential for implementation of new biomarkers. This is not often done in novel biomarker qualification submissions, but understanding of the intra- and interindividual variability of a new marker and the effects of sex, age, diet, or circadian rhythm is essential for the interpretation of novel biomarker data.
Design and Execution of Biomarker Qualification Studies
Like the use of nonclinical studies to identify target organ toxicities in nonclinical drug development, investigative studies with well-characterized tool toxicants are commonly used to anchor the phenotype of novel safety markers based on histopathology findings and changes in traditional clinical pathology parameters. Dose selection is a critical component of biomarker qualification study design because the target organ toxicities commonly evoked by tool toxicants are generally much more severe than those seen in the drug development “space”. Thus, dose-range finding studies may be needed to ensure that a continuum of injury is obtained in biomarker qualification studies to assess the sensitivity of novel markers for early or less severe tissue injury. In nonclinical species, it is also critical to be mindful of differences in the injury response to toxicants that are related to strain or source, sex, or age, as this can have unexpected effects on study outcomes.
Biomarker qualification study designs are generally standardized, with sampling at study termination and inclusion of select tissues in addition to the target organ to assess marker specificity. However, while terminal sampling enables correlation with histopathology, it provides no insight into the chronology or dynamics of changes in novel biomarkers. To better understand the biology of a new marker while simultaneously minimizing animal usage, instead of using studies with multiple cohorts of animals terminated at different time points, it is possible to bridge data across studies, using histopathology data from earlier, shorter-duration studies to provide histopathologic context for biomarker changes measured in nonterminal samples from longer-duration studies. When bridging data across studies, it is important, however, to ensure parity of doses or systemic exposure, collection time points, and test animals across studies.
Controversy over the use of blinded versus open (knowledge of treatment group) histopathology evaluation has been long-standing; however, for biomarker qualification studies, it is acceptable for the pathologist to have access to study data other than data for the novel biomarker undergoing evaluation (Burkhardt et al. 2011). Knowledge of treatment group allows the study pathologist to calibrate background or spontaneous findings, while a targeted masked evaluation of the entire study is used to refine severity grading of specific target organ toxicities. Because meta-analyses of diagnostic utility are typically run on novel biomarker data generated by multiple sponsors over extended periods of time, use of agreed morphologic diagnoses from a standardized histopathology lexicon across studies is imperative. Consistency in both histopathology diagnoses and severity grading across studies is important even when histopathology findings are used as binary end points indicating the presence or absence of a target organ toxicity as is most often the case in diagnostic performance evaluation. This is particularly true for spontaneous lesions such as chronic progressive nephropathy (CPN) or single-cell hepatocellular or pancreatic acinar cell necrosis where discrimination between test article–related and potentially spontaneous or background findings may be necessary.
Role of the Veterinary Pathologist in the Meta-analysis of Novel Biomarker Data
Even with use of a histopathology lexicon, data curation to consolidate morphologic diagnoses is required prior to statistical analysis to position the histopathology data to accurately reflect primary versus secondary injury, minimize diagnostic redundancy, and maximize statistical rigor. For example, with a renal tubular lesion, observations such as degeneration and necrosis, vacuolation, or basophilia might exist within a study. Degeneration and necrosis would be considered primary manifestations of tubular injury and grouped together for analysis since they represent a continuum. Vacuolation could be a degenerative change, phospholipidosis, or an innocuous finding and would need to be assessed within the context of other study data (e.g., presence or absence of degeneration or necrosis, changes in urinalysis or clinical chemistry parameters, and use of cyclodextrin vehicles) or knowledge of the test article. Findings such as tubular basophilia or regeneration would need to be differentiated from one another and from the basophilia of early CPN. Tubular basophilia can be an early degenerative change and might be evaluated statistically as such depending on the context of other study findings. Since an increased incidence or severity of CPN could be considered a manifestation of renal injury, knowledge of control animal findings is essential for discriminating an incipient CPN from an early degenerative change and manifestation of test article–mediated renal tubular injury.
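The consolidation step described above can be pictured with a small Python sketch. It groups degeneration and necrosis into one primary-injury end point and only flags context-dependent findings (vacuolation, basophilia) for pathologist review rather than classifying them automatically. The term lists, the function name `curate_renal_findings`, and the animal records are hypothetical and are not drawn from any actual lexicon or study.

```python
# Hypothetical lexicon terms; a real study would use its agreed histopathology lexicon.
PRIMARY_TUBULAR_INJURY = {"tubular degeneration", "tubular necrosis"}
CONTEXT_DEPENDENT = {"tubular vacuolation", "tubular basophilia"}


def curate_renal_findings(diagnoses):
    """Consolidate one animal's renal diagnoses.

    Returns a tuple:
      - binary end point: True if any primary tubular injury term is present
      - sorted list of findings that require pathologist review in context
    """
    terms = {d.strip().lower() for d in diagnoses}
    has_primary_injury = bool(terms & PRIMARY_TUBULAR_INJURY)
    needs_review = sorted(terms & CONTEXT_DEPENDENT)
    return has_primary_injury, needs_review


# Hypothetical individual animal data.
animals = {
    "A001": ["Tubular necrosis", "Tubular vacuolation"],
    "A002": ["Tubular basophilia"],
    "A003": [],
}

for animal_id, findings in animals.items():
    injury, review = curate_renal_findings(findings)
    print(animal_id, "primary injury:", injury, "| review:", review)
```

The sketch deliberately leaves the interpretation of flagged findings to the pathologist, since the text makes clear that this judgment depends on control animal findings and other study data.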
Other important logistical considerations include the use of consistent and standardized tissue sampling, processing tissues together by organ rather than by treatment group, and removal from formalin within 48 hr if immunohistochemistry or in situ hybridization is needed. It is extremely important to maintain study metadata (e.g., study pathologist, incidence tables, and individual animal data) in addition to the study report to clarify questions that may arise later during the qualification process.
Veterinary anatomic and clinical pathologists have unique expertise in comparative medicine that can enrich safety biomarker development and biological qualification. The complementary perspectives of the anatomic pathologist for the phenotypic anchoring of safety biomarker changes, combined with the expertise of the clinical pathologist for understanding the pathophysiological and analytical aspects unique to each novel biomarker, are essential for novel biomarker development and translation. From input into selection of candidate biomarkers, nonclinical study design, and study data evaluation, to data curation for statistical evaluation, to progressive qualification by observational learning as novel biomarkers are implemented, veterinary pathologists play an essential role in novel biomarker development.
Footnotes
Author Contributions
Authors contributed to conception or design (FP, JC, SL, AA, MP, DE, LB, AM); data acquisition, analysis, or interpretation (FP, JC, SL, AA, MP, DE, LB, AM); drafting the manuscript (SL, AA, MP, DE, LB); and critically revising the manuscript (FP, JC, SL, AA, MP, DE, LB, AM). All authors gave final approval and agreed to be accountable for all aspects of work in ensuring that questions relating to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Declaration of Conflicting Interests
The author(s) declared no potential, real, or perceived conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
