Sage Journals: Discover world-class research

Abstract

One of the emerging concepts for reducing animal use in non-clinical studies is the use of virtual control groups (VCGs) to replace concurrent control groups (CCGs). A VCG is generated from historical control data to match a specific study design. This review focuses on two recently published proof-of-concept (POC) studies conducted in rats. One major issue seen consistently across these POC studies was the non-reproducibility of some quantitative endpoints between the CCG and the VCG, with clinical pathology parameters being the most affected. The inconsistencies observed in clinical pathology parameters when using VCGs may lead to: (1) misconceptions about the accuracy and sensitivity of traditional clinical pathology biomarkers and their implications for safety monitoring in the clinic; (2) inability to correctly identify and characterize organ dysfunctions; (3) interference with the weight-of-evidence approach used to identify hazards in toxicologic clinical pathology and in toxicology studies at large; and (4) wrong interpretations and data reproducibility issues. Other alternatives for reducing animal use in toxicology studies are also discussed, including blood microsampling for toxicokinetics, scientifically justified use of recovery animals, and appropriate use of and continued investment in new alternative methods.

Keywords

clinical pathology, toxicology, virtual control group, concurrent control group, 3Rs

Introduction

The concept of using virtual control groups (VCGs) involves generating control data from historical control data (HCD) that match a specific study design, including animal demographics, study procedures, and environmental conditions, in order to replace a concurrent control group (CCG) in an in vivo animal study.⁴⁶ The use of VCGs as a replacement for concurrent controls has recently been evaluated closely in the preclinical space, because this approach might reduce animal use in toxicology studies by up to 25%.^1,17

In the human and veterinary medical diagnostic fields, population-based reference values (RVs) are commonly used for clinical decisions, because CCGs are not practical in this setting. The concept of population-based RVs was first introduced in human medicine in 1969, before its application in veterinary medicine.^14-16,29-31 RVs are generally reported as reference intervals (RIs), bounded by lower and upper reference limits, that encompass the central 95% of values from a healthy reference population selected using predefined criteria. The statistical method used to determine the reference limits depends on the number and distribution of the RVs. Nonparametric methods are recommended when at least 120 RVs are available.²² When fewer than 120 samples are available (eg, between 40 and 120 samples), alternative methods such as bootstrapping, robust, or parametric methods are recommended for RI generation.^11,23,24 In veterinary medicine, only a small number of reference samples can be collected in some scenarios, such as for special and wild animal species and for neonates. In such cases, when between 20 and 39 reference samples are available, RIs should be calculated using robust (distribution-independent) or parametric (if normality can be established) methods. To underscore the uncertainty inherent in such small sample sizes, 90% confidence intervals of the reference limits should be calculated and reported along with the histogram, the mean or median, and the minimum and maximum values, to support informed clinical decisions. When fewer than 20 reference samples are available, RIs should not be determined, because such small samples are unlikely to represent the distribution of a variable within the population.⁵²
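The sample-size rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not a validated RI tool. The helper names are hypothetical. The nonparametric limits use the rank convention r = p(n + 1) with linear interpolation between order statistics, which is one common convention. The parametric option assumes normality (mean ± 1.96 SD). The 90% confidence intervals of the limits come from a simple percentile bootstrap. Robust (biweight) estimation is omitted for brevity.

```python
import math
import random
import statistics


def choose_ri_method(n, normal=False):
    """Map the number of reference values to the approach described in the text."""
    if n >= 120:
        return "nonparametric"
    if n >= 40:
        return "bootstrap, robust, or parametric"
    if n >= 20:
        return "parametric" if normal else "robust"
    return None  # fewer than 20 values: an RI should not be determined


def nonparametric_limit(values, p):
    """Percentile at rank p * (n + 1), linearly interpolated between order statistics."""
    x = sorted(values)
    n = len(x)
    r = p * (n + 1)
    lo = min(max(math.floor(r), 1), n)  # clamp ranks to the range 1..n
    hi = min(max(math.ceil(r), 1), n)
    return x[lo - 1] + (r - math.floor(r)) * (x[hi - 1] - x[lo - 1])


def parametric_interval(values):
    """Central 95% interval assuming normality: mean +/- 1.96 SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return m - 1.96 * s, m + 1.96 * s


def bootstrap_ci(values, p, n_boot=2000, seed=0):
    """Percentile-bootstrap 90% confidence interval for one nonparametric limit."""
    rng = random.Random(seed)
    estimates = sorted(
        nonparametric_limit(rng.choices(values, k=len(values)), p)
        for _ in range(n_boot)
    )
    k = int(0.05 * n_boot)  # drop the lowest and highest 5% of estimates
    return estimates[k], estimates[n_boot - 1 - k]


# Usage with simulated (not real) analyte values
rng = random.Random(1)
sample = [rng.gauss(140.0, 2.5) for _ in range(150)]
print(choose_ri_method(len(sample)))  # nonparametric
lower = nonparametric_limit(sample, 0.025)
upper = nonparametric_limit(sample, 0.975)
print(f"nonparametric RI: {lower:.1f} to {upper:.1f}")
print("90% CI of lower limit:", bootstrap_ci(sample, 0.025))
print("parametric RI:", parametric_interval(sample))
```

The bootstrap interval shows the uncertainty around a reference limit. That uncertainty widens as the number of reference values falls, which is why the text asks for it to be reported with small samples.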

In the toxicologic clinical pathology field, the concept of HCD for data interpretation is generally similar to the use of RIs in the diagnostic field. HCD for specific endpoints represent a range of values considered normal in clinically healthy individual animals of a given species within a set of predefined stratified classifications. HCD are usually generated from values obtained from stock colony animals residing in the test facility, or from individual control animals and/or pretest values from actual toxicology studies. As a result, the individual animals used to generate HCD are not homogeneous. For example, one study might use sedation for radiographs, whereas another might use indwelling catheters placed before study initiation. Some studies might not perform any venipuncture before study start, whereas others may conduct multiple rounds of venipuncture. Test results from control animals given saline weekly may be combined with values from control animals that are 20 weeks into a study with daily oral administration of a corn oil vehicle.² For an HCD set to be an ideal comparator for a given study, it must be designed so that only data derived from samples collected under the same conditions as the study population are included. Most HCD sets are not stratified at this level of granularity (eg, vehicle type: saline vs corn oil; anesthesia exposure for examinations; administration route: oral vs intravenous vs subcutaneous; number of blood draws) and therefore cannot serve as a useful comparator for any given study.² This granularity matters because these preanalytical factors are known to affect clinical pathology parameters.⁵¹ Hence, using control data generated under conditions different from those of a specific study might lead to misinterpretation of clinical pathology data.
This is why CCGs are considered better comparators than HCD for clinical pathology data interpretation in toxicology studies.
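The sketch below shows why coarse stratification matters. It filters a toy HCD table by progressively more study-design covariates and reports which control records remain comparable. The records, the field names, the values, and the `matching_pool` helper are all hypothetical. The only point is that each added stratum (vehicle, route, anesthesia, number of blood draws) narrows the pool that truly matches a given study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlRecord:
    # Hypothetical preanalytical covariates for one historical control animal
    animal_id: str
    vehicle: str
    route: str
    anesthesia: bool
    blood_draws: int


def matching_pool(records, study, keys):
    """Return records whose covariates equal the study's values for every key."""
    return [r for r in records if all(getattr(r, k) == study[k] for k in keys)]


hcd = [
    ControlRecord("A1", "saline", "intravenous", False, 1),
    ControlRecord("A2", "saline", "intravenous", True, 3),
    ControlRecord("A3", "corn oil", "oral", False, 1),
    ControlRecord("A4", "corn oil", "oral", False, 4),
    ControlRecord("A5", "saline", "subcutaneous", False, 1),
]
study = {"vehicle": "saline", "route": "intravenous", "anesthesia": False, "blood_draws": 1}

# Add one stratification key at a time and watch the comparable pool shrink
keys = []
for key in ["vehicle", "route", "anesthesia", "blood_draws"]:
    keys.append(key)
    pool = matching_pool(hcd, study, keys)
    print(keys, [r.animal_id for r in pool])
```

With all four keys applied, only one of the five toy records remains a like-for-like comparator.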

Nevertheless, by virtue of how they are constructed,^1,17 VCGs are meant to address most of the concerns and liabilities that make HCD a poor comparator for toxicologic clinical pathology data interpretation. Furthermore, with the availability of the Standard for Exchange of Nonclinical Data (SEND),¹³ in vivo toxicology data are now more accessible and interoperable. Terminology has also been streamlined to support automated, large-scale, and robust data analyses. The general assumption is that SEND data sets are consistent and complete, particularly since the SEND guidance was implemented. In our experience, however, some covariates that are known to have physiological effects, and thus to affect clinical pathology values, are usually not documented in SEND study data sets. For instance, the animals' age, vendor, diet, fasting status, restraint method before blood sampling, site and timing of blood collection, housing, animal origin, and animal technician experience are among the preanalytical variables that may affect clinical pathology data⁵¹ and are not recorded in SEND. Although VCGs are considered a vast improvement over HCD, they are not yet equivalent to concurrent controls as comparators for toxicologic clinical pathology data interpretation, as revealed by the two publications discussed below.
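One practical check before building a VCG is to compare the covariates a matching strategy needs against the fields a data set actually records. The sketch below does this with plain Python sets. The covariate names are illustrative labels taken from the list above, not SEND variable names. The `audit_covariates` helper and the "recorded" field set are hypothetical examples and are not a statement about the contents of any particular SEND domain.

```python
# Preanalytical covariates named in the text (illustrative labels only)
REQUIRED_COVARIATES = {
    "age", "vendor", "diet", "fasting_status", "restraint_method",
    "blood_collection_site", "blood_collection_time", "housing",
    "animal_origin", "technician_experience",
}


def audit_covariates(recorded_fields, required=REQUIRED_COVARIATES):
    """Split required covariates into those available and those missing."""
    recorded = set(recorded_fields)
    return sorted(required & recorded), sorted(required - recorded)


# Hypothetical field list for one study data set
available, missing = audit_covariates({"diet", "housing", "body_weight"})
print("available:", available)
print("missing:", missing)
```

Any covariate that lands in `missing` cannot be used as a matching criterion. This is the gap the paragraph above describes.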

Findings From Recent Virtual Control Group Efforts

Two proof-of-concept (POC) studies evaluating the implementation of VCGs for the interpretation of legacy studies were recently published.^1,17

For the first study, Gurjanov et al¹⁷ generated VCG data from control animal data gathered from the HCD repository for preclinical safety studies at Bayer AG, Germany. The authors used inclusion and exclusion criteria to match the animals in the three legacy studies selected for the POC study. The selection criteria included body weight, study duration (28 days), study year (2021 or 2022), number of dose groups, rat strain (Han Wistar), test facility, treatment vehicle, animal supplier, non-fasting status, animal housing, and timing of clinical pathology and anatomic pathology sample collections. For each study, statistical analysis was carried out independently with each VCG iteration, using the same approach as with the CCG.
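The text does not detail how Gurjanov et al drew each VCG iteration, so the following is only a sketch of one plausible scheme. It filters the historical pool with inclusion criteria and then draws several independent random samples without replacement, each the size of the original control group. The criteria fields, the pool, the group size, and the `build_vcgs` helper are hypothetical.

```python
import random


def build_vcgs(pool, criteria, group_size, n_iterations, seed=0):
    """Draw independent virtual control groups from animals meeting all criteria."""
    eligible = [a for a in pool if all(a.get(k) == v for k, v in criteria.items())]
    if len(eligible) < group_size:
        raise ValueError("not enough matching historical controls")
    rng = random.Random(seed)
    # Each iteration is a fresh sample without replacement from the eligible pool
    return [rng.sample(eligible, group_size) for _ in range(n_iterations)]


# Hypothetical historical control pool (strain, duration in days, fasting status, sex)
pool = [
    {"id": f"H{i}", "strain": "Han Wistar", "duration": 28, "fasted": False,
     "sex": "M" if i % 2 else "F"}
    for i in range(40)
]
criteria = {"strain": "Han Wistar", "duration": 28, "fasted": False, "sex": "M"}
vcgs = build_vcgs(pool, criteria, group_size=10, n_iterations=3, seed=42)
for i, vcg in enumerate(vcgs, start=1):
    print(f"VCG iteration {i}:", [a["id"] for a in vcg])
```

Each iteration can then be run through the same statistical pipeline as the CCG, mirroring the independent per-iteration analysis described above.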

As a first step, the authors evaluated how well the reanalysis with VCGs reproduced the statistical significance and direction of changes. When the CCG was replaced with a VCG, 31% (34 of 108), 40% (43 of 107), and 49% (57 of 115) of the quantitative parameters were not statistically reproducible in legacy studies A, B, and C, respectively (Table 1). This led the authors to conclude that the statistical results for the quantitative parameters in the original analysis with the CCG, most of which were clinical pathology parameters, were not fully reproducible in the reanalysis with VCGs. In the second step, subject matter experts (SMEs) re-evaluated, in a non-blinded manner, the parameters that were statistically different between the CCG and the VCGs. In all studies, the SMEs concluded that while some of the statistical differences between the CCGs and the VCGs did not appear biologically relevant, others were interpreted as biologically relevant and even test article related. For instance, in legacy study A, after the CCG was replaced with a VCG, the increases in water consumption and bilirubin concentration in dosed animals seen in the original study were no longer observed. Decreases in glucose were noted in all dose groups in females vs only in high dose males in the original study. GGT increases were observed in all dose groups and both sexes vs no GGT changes in the original study. Increases in relative liver weight were noted in all dose groups vs only in the mid and high dose groups in the original study (Table 2). In legacy study B, the increases in total bilirubin and urine protein/creatinine ratio, along with decreased chloride concentrations, were detectable from the lowest dose upward vs only in the mid and high dose groups in the original study. Moreover, the decrease in protein was not reproduced with the VCGs.
In addition, new findings not observed in the original study, such as increases in alanine aminotransferase (ALT), basophils, GGT, and glucose, appeared with the VCG (Table 2). In legacy study C, several test article-related changes in quantitative parameters were not reproducible using the VCG. The decrease in food intake was reproducible only in males, not in females. The increase in serum protein in females and the increased protein/creatinine ratio in high dose males were not reproducible with the VCG. Instead, decreased protein concentrations were observed in both sexes, indicating misinterpretation. Increases in calcium were not reproducible with the VCG. Significant decreases in chloride, detected only in females in the original study, were observed in males of all dose groups with the VCG. Moreover, the increases in absolute liver weight in females observed with the CCG were not reproducible with the VCG (Table 2).

Table 1.

Summary of statistically significant non-reproducible quantitative parameters from the first study after replacing CCGs with VCGs.

	Legacy study A	Legacy study B	Legacy study C
Total number of quantitative parameters	108	107	115
Number and % of statistically significant non-reproducible parameters	34 (31%)	43 (40%)	57 (49%)
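The percentages in Table 1 follow directly from the counts, and recomputing them makes the proportions explicit. The exact values are 31.5%, 40.2%, and 49.6%. The whole-number values in the table match these figures when truncated rather than rounded (57/115 would round to 50%).

```python
# Counts from Table 1: (non-reproducible parameters, total quantitative parameters)
table1 = {"A": (34, 108), "B": (43, 107), "C": (57, 115)}


def percent(part, whole):
    """Share of parameters as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


for study, (n_diff, n_total) in table1.items():
    print(f"Legacy study {study}: {n_diff}/{n_total} = {percent(n_diff, n_total)}%")
```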

Table 2.

Summary of biologically relevant non-reproducible quantitative parameters from the first study after replacing CCGs with VCGs.

Quantitative parameters with changes considered biologically relevant	CCG	VCG
Legacy study A
↑ Water consumption	+	–
↑ Bilirubin	+	–
↑ GGT	–	+
↓ Glucose	+ (males, high dose group)	+ (females, all dose groups)
Legacy study B
↑ Bilirubin	+ (mid and high dose)	+ (all dose groups)
↑ Protein/creatinine ratio (urine)	+ (mid and high dose)	+ (all dose groups)
↓ Chloride	+ (mid and high dose)	+ (all dose groups)
↓ Protein	+	–
↑ ALT	–	+
↑ GGT	–	+
↑ Glucose	–	+
Legacy study C
↓ Food intake	+ (both sexes)	+ (males only)
↑ Protein	+	–
Protein/creatinine ratio	↑	↓
↑ Calcium	+	–
↓ Chloride	+ (females, mid and high dose)	+ (males, all dose groups)
↑ Absolute liver weight	+ (females)	–

Adapted from Gurjanov et al.¹⁷
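Table 2 amounts to comparing two sets of findings for each study. The minimal sketch below uses legacy study A and encodes each finding as a (direction, parameter) pair. It classifies findings as reproduced, lost with the VCG, or new with the VCG. The encoding, including keying glucose by sex and dose scope, and the `compare_findings` helper are assumptions made for illustration. Gurjanov et al's own comparison also relied on statistical testing and SME review.

```python
def compare_findings(ccg, vcg):
    """Classify findings as reproduced, lost with the VCG, or new with the VCG."""
    return {
        "reproduced": sorted(ccg & vcg),
        "lost_with_vcg": sorted(ccg - vcg),
        "new_with_vcg": sorted(vcg - ccg),
    }


# Legacy study A from Table 2; glucose is keyed by sex and dose scope
ccg_a = {
    ("increase", "water consumption"),
    ("increase", "bilirubin"),
    ("decrease", "glucose (males, high dose)"),
}
vcg_a = {
    ("increase", "GGT"),
    ("decrease", "glucose (females, all doses)"),
}
for label, findings in compare_findings(ccg_a, vcg_a).items():
    print(label, findings)
```

Because the glucose decrease shifts to a different sex and dose range, it appears as both lost and new rather than reproduced. This is consistent with how the table treats it.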

Despite these discrepancies, the overall conclusions of these studies with regard to the No-Observed-Adverse-Effect Level (NOAEL) or the severely toxic dose in 10% of animals (STD₁₀) remained unchanged when the studies were reanalyzed with the VCGs.

For the second study, Andaya et al¹ generated their VCG data with an approach similar to that of the first study, but with slightly stricter selection criteria. They used HCD from an internal repository of non-clinical studies conducted at Genentech, Inc (GNE) from 2010 to 2022 in Sprague-Dawley rats, with study durations of 5 to 14 days. In addition to the selection criteria employed by Gurjanov et al,¹⁷ Andaya et al¹ included diet type, route of administration, dosing frequency, dose volume, blood collection method, blood collection frequency, tissue processing protocol, and clinical pathology analysis method to match the legacy study chosen for the POC exercise. The authors were able to generate 2 sets of VCG data. Gurjanov et al¹⁷ had conducted a statistical preanalysis to identify quantitative parameters with statistically significant differences between the VCGs and the CCG, after which SMEs reviewed the affected parameters in a non-blinded manner. Andaya et al did not conduct such a preanalysis. Once the VCGs were generated, 3 sets of data (2 VCGs and 1 CCG) were sent to the SMEs for interpretation in a blinded fashion to prevent interpretation bias. Overall, this POC study appeared to follow stricter guidelines that more closely mimic real-life scenarios. After interpretation by the SMEs, the results showed no differences in body weight, organ weight, macroscopic, or histopathology findings between the VCGs and the CCG.
However, for clinical pathology, mild changes in the low and mid dose groups were associated with misinterpretations: 18 of the 21 findings observed at the lower doses using VCGs did not align with findings noted with the CCG.¹ New changes observed with the VCGs also lacked correlative evidence in in-life clinical observations (data not shown) and histopathology endpoints. For example, with the VCGs, minimal to mild increases in red cell mass parameters (~5% increase) were observed, whereas with the CCG, the red cell mass parameters trended toward decreases. In addition, the lower dose groups showed minimal to mild increases in albumin (~14%), total protein (~9%), sodium (~3%), potassium (~15%), and chloride (~3%) concentrations and in glutamate dehydrogenase (GLDH) activity (2×), along with decreases in glucose (~24%) and inorganic phosphate (~18%) concentrations (Table 3). The changes in albumin, total protein, sodium (Na+), potassium (K+), and chloride (Cl−) concentrations were suggestive of dehydration, and the changes in glucose and inorganic phosphate concentrations were suggestive of decreased food consumption. Moreover, the decreased glucose concentrations in the lower dose groups were inconsistent with the increased glucose concentration in the high dose group in one of the VCGs. These changes were not observed with the CCG. There were also no abnormal in-life clinical observations (data not shown) or histopathology findings (for GLDH) to support these clinical pathology changes in the VCGs, indicating misinterpretations. In the high dose group, where the clinical pathology changes were more severe, these changes were largely consistent between the VCGs and the CCG, apart from a few inconsistencies.
The authors also assessed the coefficients of variation (CVs) of the clinical pathology parameters in the VCGs and the CCG. The VCGs had more clinical pathology parameters with higher variability: 16 parameters had higher CVs in the VCGs vs only 3 parameters with higher CVs in the CCG. Because these inconsistencies were largely limited to clinical pathology changes in the low and mid dose groups, replacing the CCG with the VCGs did not alter the NOAEL of this study (mid dose).
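The two summary statistics used above are percent change relative to the control mean and the coefficient of variation (CV = SD / mean). The sketch below computes both. All values are invented for illustration, and the helper names are hypothetical. In practice the comparison would be run per parameter and per sex, and "higher variability" would be judged against whatever criterion the authors applied, which is not specified here.

```python
import statistics


def percent_change(treated, control):
    """Percent difference of the treated-group mean from the control-group mean."""
    c = statistics.mean(control)
    return 100 * (statistics.mean(treated) - c) / c


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical albumin values (g/dL) for one sex
ccg = [3.0, 3.1, 2.9, 3.0, 3.0]
vcg = [2.6, 3.0, 2.8, 3.2, 2.9]
low_dose = [3.1, 3.0, 3.2, 3.1, 3.1]

print(f"low dose vs CCG: {percent_change(low_dose, ccg):+.1f}%")
print(f"low dose vs VCG: {percent_change(low_dose, vcg):+.1f}%")
print(f"CV CCG {cv_percent(ccg):.1f}%, CV VCG {cv_percent(vcg):.1f}%")
```

In this toy example, the same low dose group shows roughly twice the apparent change against the more variable, lower-mean VCG as against the CCG.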

Table 3.

Summary of biologically relevant non-reproducible quantitative parameters from the second study after replacing CCGs with VCGs.

	CCG			VCG1			VCG2
Dose groups	Low	Mid	High	Low	Mid	High	Low	Mid	High
Hematology
Hematocrit	-	-	↓	↑	-	↓	↑	-	↓
Hemoglobin	-	↓	↓	↑	-	↓	↑	-	↓
Red blood cells	↓	↓	↓	-	-	-	-	-	-
RDW	↓	↓	↓	-	-	-	-	-	-
Reticulocytes	-	-	↓	↑	↑	-	↑	↑	-
MCHC	-	-	↑	↓	↓	-	↓	↓	-
Clinical chemistry
Albumin	-	-	↓	↑	↑	-	↑	↑	-
Chloride	-	-	↓↓	↑	↑	↓	-	-	↓
GLDH	-	-	↑↑↑	-	↑	↑↑↑	-	↑	↑↑↑
Glucose	-	-	↑↑↑	↓↓	↓↓↓	↑↑	-	-	↑↑↑
Inorganic phosphate	-	-	↑	↓↓	↓↓	↓	↓↓	↓↓	↓
Potassium	-	-	↓↓	↑	↑	↓	-	-	↓
Sodium	-	-	↓↓	↑	↑	↓	↑	↑	↓
Total protein	-	-	↑↑	↑	↑	↑↑	↑	↑	↑↑

Adapted from Andaya et al.¹ One arrow indicates minimal to mild; two arrows, mild to moderate; three arrows, moderate to marked; four arrows, marked.

Abbreviations: CCG, concurrent control group; VCG, virtual control group; MCHC, mean corpuscular hemoglobin concentration; RDW, red cell distribution width; GLDH, glutamate dehydrogenase.

Potential Implications of Virtual Control Groups on Clinical Pathology Data Interpretations

Across the two reports, four legacy toxicology studies (three in the first report and one in the second) were reanalyzed with VCGs. The overall conclusions of these studies with regard to the NOAEL or STD₁₀ were not affected. However, there were consistent issues with quantitative parameters, with clinical pathology endpoints being the most affected, even though the two reports used slightly different approaches to generate their VCGs. This indicates that, regardless of the approach used to generate VCGs, clinical pathology misinterpretations in toxicology studies might be inherent to their use. The NOAEL or STD₁₀ were not affected because adversity calls (eg, the NOAEL) are predominantly determined by histopathology,³⁹ except in some scenarios where the affected clinical pathology parameters have no histopathology correlates (eg, red blood cells [RBCs], platelets, and coagulation endpoints). In these POC studies, histopathology interpretations were not affected by replacing the CCGs with VCGs. Nevertheless, based on the observations to date on the utility of VCGs, the following are some potential risks of using VCGs for clinical pathology interpretation in drug development studies.

1) Misconception About the Accuracy and Sensitivity of Traditional Clinical Pathology Biomarkers and Their Implications for Safety Monitoring in the Clinic: The NOAELs of the POC studies were not affected because the histopathology interpretations were not affected by the VCGs. It should be noted, however, that histopathology is rarely used for safety monitoring in the clinic. This aspect of drug development relies predominantly on traditional and non-traditional clinical pathology safety biomarkers.^6,28,33,41,42,53,54 Hence, generating non-clinical data that falsely imply that safety biomarkers are not accurate or sensitive enough for safety monitoring in the clinic is alarming. For example, in the second study with the CCG, the trend toward decreases in RBC mass in the lower dose groups was sustained, with greater severity, in the high dose group, suggesting a dose-dependent effect of the test article. However, neither VCG identified this trend in the lower dose groups. In fact, the RBC mass parameters trended in the opposite direction, suggesting hemoconcentration. In the real world, this failure of the VCGs to capture the correct trend would have led to the misinterpretation that the test article affected the RBC endpoints only at the high dose, falsely indicating a steep dose-response for RBC toxicity. It would also have led to a misconception about how sensitively the RBC endpoints can detect this toxicity in the clinic before it becomes severe, which could endanger patients. Therefore, looking beyond non-clinical toxicology studies to their implications for clinical trials, the industry runs the risk of drawing questionable conclusions about the utility of safety biomarkers for clinical safety monitoring when VCGs replace CCGs for clinical pathology interpretation in non-clinical toxicology studies.

2) Risk of Inability to Correctly Identify and Characterize Organ Dysfunctions: Histopathology data, on which NOAELs are often based, largely measure structural tissue injury. Clinical pathology data can reflect both structural injury and the dysfunction that may accompany it. In some cases, organ dysfunction might not be associated with any structural injury. For instance, a recent report showed how a new molecular entity (NME) interfered with vitamin K's ability to activate coagulation factors,⁴⁸ one of the many functions of the liver, without any histopathologic evidence of liver injury. No coagulation parameters were affected in the POC studies to date. However, given the inherent errors that VCGs introduce into clinical pathology interpretation, the industry should recognize that this risk would likely accompany the use of VCGs for clinical pathology data interpretation in non-clinical toxicology studies. Another instance of dysfunction mischaracterization was observed in legacy study B of the first study. There, increases in urine protein/creatinine ratio along with decreased chloride concentrations were detectable at the lowest dose with the VCG vs only in the mid and high dose groups in the original study. This falsely indicated that renal dysfunction might be present in the low dose group, a finding clearly absent in the low dose group with the CCG.

3) Negative Effect on the Weight-of-Evidence Approach Used in Identifying Hazards in Toxicologic Clinical Pathology and Toxicology Studies at Large: In toxicologic clinical pathology, interpretations are usually based on a weight-of-evidence (WOE) approach,^2,3,26,39 in which dose dependency and correlation with other related endpoints are used to reach accurate interpretations. As shown in these POC studies, this WOE approach was negatively affected when CCGs were replaced by VCGs. In the second study, there was a pattern of misfits in some clinical pathology parameters. With the VCGs, increases in albumin concentration were observed in the low and mid dose groups, which may indicate dehydration/hemoconcentration, with no albumin change in the high dose group. This made no scientific sense for a test article-related change, as the albumin increase would be expected to worsen at the high dose. With the CCG, the only albumin change was a decrease in the high dose group. This decrease was indicative of underlying inflammation or liver dysfunction, as supported by other correlative clinical pathology and histopathology findings. The pattern of this change bolstered the clinical pathologist's confidence in interpreting it as a test article-related effect. In contrast, the pattern observed with the VCGs made it difficult for the expert to reach a reasonable and conclusive interpretation of the test article effect on albumin concentration. Similar misfits were noted with other clinical pathology parameters when using VCGs vs the CCG. Owing to all these misfits, the clinical pathologist could easily identify the CCG as the actual control group even though the study was blinded. In the real world, deviations from control that do not show a clear dose dependency are often disregarded as analytical noise.
However, given the sheer number of parameters affected in just one study, a qualified clinical pathologist would have been concerned about the reliability of the data for interpretation. This would have triggered investigations into the preanalytical, analytical, and postanalytical procedures used to generate the data, because it would be impossible to reliably distinguish true, subtle test article-related changes from analytical noise. In some situations minimal/mild changes could be noise, and in others they could be meaningful. The VCG completely removes the opportunity to distinguish between the two. It should also be noted that investigating the cause of these deviations would be impossible with VCGs, because some of the covariates responsible for them may not be recorded in the database. Therefore, these results reveal the potential of VCGs to undermine the WOE approach used to identify hazards in toxicologic clinical pathology and toxicology studies at large. VCGs increase the possibility of erroneous clinical pathology interpretations that do not agree with other toxicology endpoints.

4) Wrong Interpretations and Data Reproducibility Issues: In legacy study A of the first study, GGT increased in all dose groups in both sexes with the VCG, whereas there were no such increases with the CCG. In the real world, a qualified clinical pathologist could not ignore this change. In rats, serum GGT has low baseline activity because of low hepatic GGT activity and a short circulating half-life of ~30 minutes,^7,10,25,27,45 so GGT is considered a less sensitive biomarker in this species. When a GGT change is observed, especially in the pattern seen in the VCG reanalysis of legacy study A, it cannot be ignored even without an adverse histopathology correlate. Such a finding would be interpreted as test article related, because it could reflect drug-metabolizing enzyme induction,^12,34,40,43,49,55 leading to wrong interpretations. Moreover, given the inherent errors that VCGs introduce into clinical pathology interpretation, these data would quite likely not be reproducible if the study were conducted at another facility using a different VCG.

5) Potential for More Errors in a More Biologically Variable Animal Species: Appropriate implementation of VCGs could reduce animal use in rodent toxicology studies. However, the species in which its impact would be felt most is the non-human primate (NHP), given the shortage of these animals.^50,56 The issues noted in the less biologically variable species used in these POC studies (ie, rats) cast some doubt on the successful application of this strategy in a highly biologically variable species such as NHPs. Moreover, the NHP shortage has led to changes in the origin of the NHPs used, adding to the variability in this species. Because origin is not captured in SEND data sets, this further complicates creating a reliable VCG for NHPs. For these reasons, considerably more work must be done to ensure that the use of VCGs does not lead to questionable data interpretations in hazard identification that informs safety monitoring strategies in clinical trials.
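Part of the weight-of-evidence reasoning in item 3 is a dose-pattern check: a test article effect is expected to keep its direction and not fade as dose increases. The sketch below encodes rows of Table 3 as signed arrow counts (+1 per ↑, −1 per ↓, 0 for no change). It flags patterns that change sign or vanish at a higher dose. This rule and the helper names are a deliberate, hypothetical simplification. Real interpretation also weighs magnitude, correlated endpoints, and histopathology.

```python
def parse_arrows(cell):
    """Convert a Table 3 cell such as '↑↑', '↓', or '-' to a signed severity score."""
    return cell.count("↑") - cell.count("↓")


def dose_pattern_consistent(cells):
    """True if, once a change appears, higher doses keep its direction and do not lose it."""
    scores = [parse_arrows(c) for c in cells]  # ordered low, mid, high
    seen = 0  # sign of the first change observed; 0 until one appears
    for s in scores:
        sign = (s > 0) - (s < 0)
        if seen == 0:
            seen = sign
        elif sign != seen:
            return False
    return True


# A few rows of Table 3 (low, mid, high dose)
table3 = {
    ("Albumin", "CCG"): ["-", "-", "↓"],
    ("Albumin", "VCG1"): ["↑", "↑", "-"],
    ("Glucose", "CCG"): ["-", "-", "↑↑↑"],
    ("Glucose", "VCG1"): ["↓↓", "↓↓↓", "↑↑"],
    ("Total protein", "CCG"): ["-", "-", "↑↑"],
    ("Total protein", "VCG1"): ["↑", "↑", "↑↑"],
}
for (param, group), cells in table3.items():
    print(param, group, dose_pattern_consistent(cells))
```

The albumin and glucose rows for VCG1 fail the check, matching the misfits described in item 3. The corresponding CCG rows pass.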

Potential Causes of Errors for Clinical Pathology Data With Virtual Control Group and Possible Mitigating Solutions

Errors in laboratory data can be caused by preanalytical, analytical, and postanalytical covariates, some of which might be difficult to control for when generating VCGs.²¹ In these POC studies, the authors did a good job of controlling for many of these covariates so that the VCGs matched each legacy study well. Despite this effort, the clinical pathology interpretations were still marred by errors. In fact, Andaya et al¹ noted that the vehicle, dose volume, and housing conditions were incompletely matched because they were not readily searchable in their in-house historical control database. Moreover, neither study included variables such as assay kits and instrumentation. The first study by Gurjanov et al¹⁷ also did not include diet type, route of administration, dosing frequency, dose volume, blood collection method, blood collection frequency, tissue processing protocol, or clinical pathology analysis method. As discussed by Andaya et al,¹ individual covariate effects may be small, but their additive effect can lead to large variability in the VCGs, because the totality of variability is unique to each study. The authors believed this explained why more clinical pathology parameters showed higher variability in the VCGs than in the CCG. Therefore, when VCGs are generated for a study, considerable effort must go into ensuring that all necessary covariates are controlled for. That said, some of this information might not be accessible for creating a perfect VCG. The industry must therefore accept that VCGs used for toxicologic clinical pathology interpretation will be prone to errors. Another mitigating solution is a hybrid approach in which a combination of CCG and VCG is used for study interpretation.
Another benefit of a hybrid approach is that it continues to generate control data for replenishing the VCG pool every 3 to 5 years to account for data drift. Animals undergo biological/genetic drift over time, and clinical pathology parameters drift because of technical changes or variability. If CCGs were fully replaced by VCGs, there would be no new data to replenish the VCG pool, leading to erroneous data interpretation.
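The argument that small covariate effects add up can be made concrete. Assume each unmatched covariate shifts a study's control mean by an independent random offset with variance σ_k². A VCG pooled across studies then has variance σ_within² + Σσ_k², whereas a CCG from a single study carries only σ_within². The simulation below checks this under simplifying assumptions: independent, normally distributed offsets, with all numbers and the helper name hypothetical.

```python
import math
import random
import statistics


def pooled_sd_simulation(within_sd, covariate_sds, n_studies=400, n_per_study=10, seed=0):
    """Simulate controls pooled across studies, each with random covariate offsets."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_studies):
        # One offset per unmatched covariate, shared by all animals in this study
        offset = sum(rng.gauss(0.0, sd) for sd in covariate_sds)
        pooled.extend(rng.gauss(offset, within_sd) for _ in range(n_per_study))
    return statistics.stdev(pooled)


within_sd = 1.0
covariate_sds = [0.3, 0.3, 0.3, 0.3]  # four small, unmatched covariate effects
expected = math.sqrt(within_sd**2 + sum(sd**2 for sd in covariate_sds))
simulated = pooled_sd_simulation(within_sd, covariate_sds)
print(f"single-study SD: {within_sd:.2f}")
print(f"expected pooled SD: {expected:.2f}, simulated: {simulated:.2f}")
```

Each covariate alone adds little, but together the four inflate the pooled SD by roughly 17% in this toy setting.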

Another area of concern for VCG generation is the use of statistical significance, as employed by Gurjanov et al¹⁷ in the first study. They used it to identify VCG clinical pathology parameters whose values differed from their CCG counterparts before the SMEs evaluated these differences. The issue with this approach is that no single clinical pathology parameter should be interpreted in isolation. Any clinical pathology parameter is always interpreted in the context of other clinical pathology endpoints, because a change in one parameter is part of a broader pathophysiological process that may have affected other endpoints. The constellation of these changes is what helps a qualified clinical pathologist distinguish a real change from noise when the changes are minimal to mild. For example, a minimal to mild decrease in albumin concentration in an animal could be due to a negative acute phase response, liver dysfunction, renal loss, normal biological variation, or analytical noise. Suppose the SME is given only the statistically significant albumin change, without the complete data set showing whether related endpoints were affected by liver dysfunction, underlying inflammation, or renal dysfunction. The SME may then falsely conclude that the observed change was not biologically significant. This method of VCG validation therefore does not match how clinical pathology data are interpreted in the real world, as it can falsely suggest the absence of a biological change. In contrast, Andaya et al¹ used the real-world approach and were able to show how VCGs can impair the ability to distinguish noise from real change in clinical pathology data interpretation. This approach showed that, compared with the CCG, the VCG completely removes the opportunity to distinguish between a real change and noise.
Therefore, as the industry moves forward with the development of VCGs, caution must be taken on relying on statistical significance while ignoring the true biological significance of a change in a clinical pathology parameter.
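The contrast between screening parameters one at a time and interpreting them together can be sketched as follows. As a stand-in for the isolated screen, the sketch computes a Welch t statistic for each parameter and applies an arbitrary |t| ≥ 2 cutoff. The helper names, the `RELATED` grouping of albumin with globulin, and the albumin and globulin values (g/dL) are hypothetical illustrations. They are not the method or data of either POC study; in practice, the related endpoints are chosen by the clinical pathologist from the whole data set.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(treated, control):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


def isolated_screen(control, treated, t_cutoff=2.0):
    """Flag each parameter on |t| alone, without reference to other endpoints."""
    return sorted(p for p in treated if abs(welch_t(treated[p], control[p])) >= t_cutoff)


def with_context(flagged, control, treated, related):
    """Report each flagged parameter together with its related endpoints.

    Values are treated-minus-control mean differences, so the direction of
    change in every related endpoint is visible, whether or not it was flagged.
    """
    report = {}
    for p in flagged:
        panel = [p] + [q for q in related.get(p, []) if q in treated]
        report[p] = {q: round(mean(treated[q]) - mean(control[q]), 2) for q in panel}
    return report


# Hypothetical group data (g/dL) and a hypothetical grouping of related endpoints.
control = {"albumin": [3.5, 3.6, 3.4, 3.5, 3.6], "globulin": [2.5, 2.6, 2.4, 2.5, 2.6]}
treated = {"albumin": [3.2, 3.3, 3.1, 3.2, 3.3], "globulin": [2.5, 2.8, 2.4, 2.7, 2.6]}
RELATED = {"albumin": ["globulin"]}

flagged = isolated_screen(control, treated)
print(flagged)  # ['albumin']
print(with_context(flagged, control, treated, RELATED))
```

Only albumin passes the isolated screen. The context view also shows that globulin moved in the opposite direction, and the clinical pathologist would weigh that pattern together with the rest of the data set.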

Opportunities for Virtual Control Group Utility: Dose-Range Finding Studies

As highlighted above, more work needs to be done before VCGs can be implemented in toxicology studies. That said, there may be scenarios where VCGs can be used, such as dose-range finding (DRF) toxicology studies. In these studies, tolerability is the primary endpoint. The aim is to identify the maximum tolerated dose (MTD), the highest dose that can be given before adverse effects become unacceptable.²⁰ This information guides the selection of safe dose levels for subsequent studies. Subtle pathologic changes, such as those observed in the low- and mid-dose groups of the second study, would therefore not affect the outcome of DRF studies. In addition, there is anecdotal evidence that some drug development scientists do not include a CCG in DRF studies. A VCG would therefore be an improvement over DRF studies that lack a control group. This is especially true given the strong alignment between VCGs and CCGs observed in the second study when pathology changes were moderate to marked in severity, that is, changes that align with tolerability.

Other Alternatives to Reduce Animal Use in Toxicology Studies

Since the primary aim of VCGs is to reduce animal use, the industry should also adopt and invest in other emerging 3Rs approaches that can reduce animal use without compromising the data generated. These opportunities include blood microsampling for toxicokinetic (TK) assessment in toxicology studies,^5,19 reduced use of recovery animals in toxicology studies,^9,35,38 and replacement of in vivo studies with in vitro models where appropriate (for example, for screening back-up molecules).

For blood microsampling, advances in bioanalytical techniques have made it possible to use smaller sample volumes (≤50 µL) for TK assessment in toxicology studies. Microsampling allows TK samples to be collected from the main study animals, which reduces or avoids the use of satellite TK animals. It does so without negatively affecting clinical pathology parameters in toxicology studies.^8,18,32,36,37

Secondly, some recently published reports have suggested a more science-driven, case-by-case approach to including recovery animals in toxicology studies, rather than inclusion based on standard company practice or perceived regulatory expectations.^35,44 Wide adoption of this practice would also help reduce animal use in toxicology studies without compromising the data generated.

Finally, continued investment in, and adoption of, NAMs will also help reduce the use of animals in drug development studies. In December 2022, the United States enacted the FDA Modernization Act 2.0, under which NAMs can be used to reduce or replace conventional animal studies, where appropriate. Although the industry welcomes this development, it is widely recognized that meaningful impacts are not expected in the near term. The path to validation, routine regulatory acceptance,⁴ and eventual widespread deployment is expected to be gradual and to span decades.⁴⁷ Nevertheless, there are scenarios where NAMs could be beneficial in the short term. For example, a lead molecule may have a specific organ toxicity that has been observed and well characterized during prior assessment. NAMs could then be used to screen a backup molecule for that toxicity, rather than using in vivo studies for such screening.

Conclusion

In conclusion, we have reviewed the utility of VCGs in toxicology studies. Replacing CCGs with VCGs has the potential to reduce animal use by up to 25% in experimental animal studies. However, the use of VCGs for clinical pathology data may compromise data interpretation, even when the overall study outcome is not affected. Conventional and non-conventional safety biomarkers are valuable for safety monitoring in the clinic. Our assessment suggests that the use of VCGs in non-clinical safety studies may lead to misconceptions about the accuracy and sensitivity of these biomarkers, thereby affecting their use for safety monitoring in the clinic. Therefore, more work is needed to address these vital concerns before VCGs are implemented.

We also propose other approaches that can help reduce animal use in drug development studies. These include wide adoption of blood microsampling for TK assessment, inclusion of recovery animals only when scientifically justified, and greater investment in, and adoption of, NAMs where appropriate.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: All work was supported by Genentech, a member of the Roche group.

Ethical Approval

For the two primary studies reviewed in this article, the following ethical statements were made in the original publications. Gurjanov et al¹⁷ stated that the animals used in their studies were kept and treated in accordance with the German Animal Welfare Act and that the studies were approved by the competent state authorities. Andaya et al¹ stated that all historical studies were conducted at Genentech, a testing facility fully accredited by the Association for Assessment and Accreditation of Laboratory Animal Care. They also stated that all procedures in the study complied with the Animal Welfare Act and were approved by the local Institutional Animal Care and Use Committee.

ORCID iD

Adeyemi O. Adedeji

References

1. Andaya, Sullivan, Pourmohamad, et al. A proof-of-concept rat toxicity study highlights the potential utility and challenges of virtual control groups. ALTEX. 2024;41(4):647-659. doi:10.14573/altex.2404201.

2. Aulbach, Vitsky, Arndt, et al. Interpretative considerations for clinical pathology findings in nonclinical toxicology studies. Vet Clin Pathol. 2019;48(3):383-388. doi:10.1111/vcp.12773.

3. Aulbach, Vitsky, Arndt, et al. Overview and considerations for the reporting of clinical pathology interpretations in nonclinical toxicology studies. Vet Clin Pathol. 2019;48(3):389-399. doi:10.1111/vcp.12772.

4. Avila, Bebenek, Mendrick, Peretz, Yao, Brown. Gaps and challenges in nonclinical assessments of pharmaceuticals: an FDA/CDER perspective on considerations for development of new approach methodologies. Regul Toxicol Pharmacol. 2023;139:105345. doi:10.1016/j.yrtph.2023.105345.

5. Bertani, Donadi, Franchi, et al. Blood microsampling in cynomolgus monkey and evaluation of plasma PK parameters in comparison to conventional sampling. J Pharmacol Toxicol Methods. 2023;123:107298. doi:10.1016/j.vascn.2023.107298.

6. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther. 2001;69(3):89-95. doi:10.1067/mcp.2001.113989.

7. Boyd. The mechanisms relating to increases in plasma enzymes and isoenzymes in diseases of animals. Vet Clin Pathol. 1983;12(2):9-24. doi:10.1111/j.1939-165x.1983.tb00609.x.

8. Caron, Lelong, Bartels, et al. Clinical and anatomic pathology effects of serial blood sampling in rat toxicology studies, using conventional or microsampling methods. Regul Toxicol Pharmacol. 2015;72(3):429-439. doi:10.1016/j.yrtph.2015.05.022.

9. Chapman, Andrews, Bajramovic, et al. The design of chronic toxicology studies of monoclonal antibodies: implications for the reduction in use of non-human primates. Regul Toxicol Pharmacol. 2012;62(2):347-354. doi:10.1016/j.yrtph.2011.10.016.

10. Clampitt, Hart. The tissue activities of some diagnostic enzymes in ten mammalian species. J Comp Pathol. 1978;88(4):607-621. doi:10.1016/0021-9975(78)90014-2.

11. Clinical and Laboratory Standards Institute. Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory; Approved Guidelines. 3rd ed. Malvern, PA: Clinical and Laboratory Standards Institute; 2008.

12. Ennulat, Walker, Clemo, et al. Effects of hepatic drug-metabolizing enzyme induction on clinical pathology parameters in animals and man. Toxicol Pathol. 2010;38(5):810-828. doi:10.1177/0192623310374332.

13. Hock, Pugsley. Drug Discovery and Evaluation: Safety and Pharmacokinetic Assays. New York, NY: Springer; 2022.

14. Friendship, Lumsden, McMillan, Wilson. Hematology and biochemistry reference values for Ontario swine. Can J Comp Med. 1984;48(4):390-393.

15. Gräsbeck. The evolution of the reference value concept. Clin Chem Lab Med. 2004;42(7):692-697. doi:10.1515/CCLM.2004.118.

16. Gräsbeck, Saris. Establishment and use of normal values. Scand J Clin Lab Invest. 1969;26:S62-S63.

17. Gurjanov, Vieira-Vieira, Vienenkoetter, Vaas, Steger-Hartmann. Replacing concurrent controls with virtual control groups in rat toxicity studies. Regul Toxicol Pharmacol. 2024;148:105592. doi:10.1016/j.yrtph.2024.105592.

18. Hackett, Kinderknecht, Niemuth, et al. A factorial analysis of drug and bleeding effects in toxicokinetic studies. Toxicol Sci. 2019;170(1):234-246. doi:10.1093/toxsci/kfz092.

19. Harstad, Andaya, Couch, et al. Balancing blood sample volume with 3Rs: implementation and best practices for small molecule toxicokinetic assessments in rats. ILAR J. 2016;57(2):157-165. doi:10.1093/ilar/ilw023.

20. Herlich, Taggart, Proctor, et al. The non-GLP toleration/dose range finding study: design and methodology used in an early toxicology screening program. Proc West Pharmacol Soc. 2009;52:94-98.

21. Hooijberg, Leidinger, Freeman. An error management system in a veterinary clinical laboratory. J Vet Diagn Invest. 2012;24(3):458-468. doi:10.1177/1040638712441782.

22. Horn, Pesce. Reference intervals: an update. Clin Chim Acta. 2003;334(1-2):5-23. doi:10.1016/s0009-8981(03)00133-5.

23. Horn, Pesce, Copeland. A robust approach to reference interval estimation and evaluation. Clin Chem. 1998;44(3):622-631.

24. Horn, Pesce, Copeland. Reference interval computation using robust vs parametric and nonparametric analyses. Clin Chem. 1999;45(12):2284-2285.

25. Huseby, Kindberg, Grostad, Berg. Clearance of purified human liver gamma-glutamyltransferase after intravenous injection in the rat. Clin Chim Acta. 1992;205(3):197-203. doi:10.1016/0009-8981(92)90060-4.

26. Kerlin, Bolon, Burkhardt, et al. Scientific and regulatory policy committee: recommended (“best”) practices for determining, communicating, and using adverse effect data from nonclinical studies. Toxicol Pathol. 2016;44(2):147-162. doi:10.1177/0192623315623265.

27. Lahrichi, Ratanasavanh, Galteau, Siest. Effect of chronic ethanol administration on gamma-glutamyltransferase activities in plasma and in hepatic plasma membranes of male and female rats. Enzyme. 1982;28(4):251-257. doi:10.1159/000459109.

28. Lesko, Atkinson Jr. Use of biomarkers and surrogate endpoints in drug development and regulatory decision making: criteria, validation, strategies. Annu Rev Pharmacol Toxicol. 2001;41:347-366. doi:10.1146/annurev.pharmtox.41.1.347.

29. Lumsden, Mullen. On establishing reference values. Can J Comp Med. 1978;42(3):293-301.

30. Lumsden, Mullen, McSherry. Canine hematology and biochemistry reference values. Can J Comp Med. 1979;43(2):125-131.

31. Lumsden, Rowe, Mullen. Hematology and biochemistry reference values for the light horse. Can J Comp Med. 1980;44(1):32-42.

32. NC3Rs. Microsampling. Accessed June 3, 2024. https://www.nc3rs.org.uk/3rs-resources/microsampling

33. Newby, Rodriguez, Finkle, et al. Troponin measurements during drug development—considerations for monitoring and management of potential cardiotoxicity: an educational collaboration among the Cardiac Safety Research Consortium, the Duke Clinical Research Institute, and the US Food and Drug Administration. Am Heart J. 2011;162(1):64-73. doi:10.1016/j.ahj.2011.04.005.

34. Nishimura, Teschke. Effect of chronic alcohol consumption on the activities of liver plasma membrane enzymes: gamma-glutamyltransferase, alkaline phosphatase and 5’-nucleotidase. Biochem Pharmacol. 1982;31(3):377-381. doi:10.1016/0006-2952(82)90185-x.

35. Pandher, Leach, Burns-Naas. Appropriate use of recovery groups in nonclinical toxicity studies: value in a science-driven case-by-case approach. Vet Pathol. 2012;49(2):357-361. doi:10.1177/0300985811415701.

36. Powles-Glover, Kirk, Jardine, Clubb, Stewart. Assessment of haematological and clinical pathology effects of blood microsampling in suckling and weaned juvenile rats. Regul Toxicol Pharmacol. 2014;69(3):425-433. doi:10.1016/j.yrtph.2014.05.006.

37. Powles-Glover, Kirk, Wilkinson, Robinson, Stewart. Assessment of toxicological effects of blood microsampling in the vehicle dosed adult rat. Regul Toxicol Pharmacol. 2014;68(3):325-331. doi:10.1016/j.yrtph.2014.01.001.

38. Prior, Andrews, Cauvin, et al. The use of recovery animals in nonclinical safety assessment studies with monoclonal antibodies: further 3Rs opportunities remain. Regul Toxicol Pharmacol. 2023;138:105339. doi:10.1016/j.yrtph.2023.105339.

39. Ramaiah, Tomlinson, Tripathi, et al. Principles for assessing adversity in toxicologic clinical pathology. Toxicol Pathol. 2017;45(2):260-266. doi:10.1177/0192623316681646.

40. Rambabu, Matsuda, Katunuma. Studies on turnover rates of rat gamma-glutamyltranspeptidase after chronic ethanol administration in vivo. Biochem Med Metab Biol. 1986;35(3):335-344. doi:10.1016/0885-4505(86)90091-5.

41. Reagan. Troponin as a biomarker of cardiac toxicity: past, present, and future. Toxicol Pathol. 2010;38(7):1134-1137. doi:10.1177/0192623310382438.

42. Sasseville, Mansfield, Brees. Safety biomarkers in preclinical development: translational potential. Vet Pathol. 2014;51(1):281-291. doi:10.1177/0300985813505117.

43. Satoh, Igarashi, Hirota, Kitagawa. Induction of hepatic gamma-glutamyl transpeptidase in rats by repeated administration of aminopyrine. J Pharmacol Exp Ther. 1982;221(3):795-800.

44. Sewell, Chapman, Baldrick, et al. Recommendations from a global cross-company data sharing initiative on the incorporation of recovery phase animals in safety assessment studies to support first-in-human clinical trials. Regul Toxicol Pharmacol. 2014;70(1):413-429. doi:10.1016/j.yrtph.2014.07.018.

45. Siska, Schultze, Ennulat, et al. Scientific and regulatory policy committee points to consider: integration of clinical pathology data with anatomic pathology data in nonclinical toxicology studies. Toxicol Pathol. 2022;50(6):808-826. doi:10.1177/01926233221108887.

46. Steger-Hartmann, Kreuchwig, Vaas, et al. Introducing the concept of virtual control groups into preclinical toxicology testing. ALTEX. 2020;37(3):343-349. doi:10.14573/altex.2001311.

47. Stresser, Kopec, Hewitt, et al. Towards in vitro models for reducing or replacing the use of animals in drug testing. Nat Biomed Eng. 2024;8(8):930-935. doi:10.1038/s41551-023-01154-7.

48. Terrett, Katavolos, et al. Discovery of TRPA1 antagonist GDC-6599: derisking preclinical toxicity and aldehyde oxidase metabolism with a potential first-in-class therapy for respiratory disease. J Med Chem. 2024;67(5):3287-3306. doi:10.1021/acs.jmedchem.3c02121.

49. Teschke, Neuefeind, Nishimura, Strohmeyer. Hepatic gamma-glutamyltransferase activity in alcoholic fatty liver: comparison with other liver enzymes in man and rats. Gut. 1983;24(7):625-630. doi:10.1136/gut.24.7.625.

50. Tian. China is facing serious experimental monkey shortage during the COVID-19 lockdown. J Med Primatol. 2021;50(4):225-227. doi:10.1111/jmp.12528.

51. Tripathi, Everds, Schultze, et al. Deciphering sources of variability in clinical pathology. Toxicol Pathol. 2017;45(1):90-93. doi:10.1177/0192623316675766.

52. Vap, Harr, Arnold, et al. ASVCP quality assurance guidelines: control of preanalytical and analytical factors for hematology for mammalian and nonmammalian species, hemostasis, and crossmatching in veterinary laboratories. Vet Clin Pathol. 2012;41(1):8-17. doi:10.1111/j.1939-165X.2012.00413.x.

53. Wang, Ward. Opportunities and challenges of disease biomarkers: a new section in the Journal of Translational Medicine. J Transl Med. 2012;10:220. doi:10.1186/1479-5876-10-220.

54. Weingand, Brown, Hall, et al. Harmonization of animal clinical pathology testing in toxicity and safety studies. The Joint Scientific Committee for International Harmonization of Clinical Pathology Testing. Fundam Appl Toxicol. 1996;29(2):198-201.

55. Yamada, Wilson, Lieber. The effects of ethanol and diet on hepatic and serum gamma-glutamyltranspeptidase activities in rats. J Nutr. 1985;115(10):1285-1290. doi:10.1093/jn/115.10.1285.

56. Yost, Downey, Ramos. Nonhuman Primate Models in Biomedical Research: State of the Science and Future Needs. In: Yost, Downey, Ramos, eds. The National Academies Collection: Reports funded by National Institutes of Health. Washington, DC: National Academies Press; 2023.

Virtual Control Groups in Non-clinical Toxicity Studies: Impacts on Toxicologic Clinical Pathology Data Interpretation
