Abstract
Throughout the last 50 years, the paradigm for carcinogenicity assessment has depended on lifetime bioassays in rodents. Since 1997, the International Conference on Harmonisation (ICH) S1B guideline has permitted the use of a 2-year rodent bioassay (usually in the rat) together with an alternative, genetically modified mouse model to support cancer risk assessment of pharmaceuticals. Since its introduction, it has become apparent that many of the stated advantages of the 6-month Tg mouse bioassay have, in fact, not been realized, and the concern exists that an albeit imperfect 2-year mouse bioassay has been replaced by a similarly imperfect 6-month equivalent. This essay argues strongly that model systems using cancer as the end point should be discontinued, and that the recent initiatives from the Organization for Economic Cooperation and Development and the International Programme on Chemical Safety on “mode of action,” “adverse outcome pathways,” and the “human relevance framework” should be embraced as the basis for risk assessments grounded in the available science. The recently suggested revisions to the ICH S1 guidelines, utilizing carcinogenicity assessment documents, go some way toward developing a science-based risk assessment that does not depend almost entirely on a single, imperfect, cancer-based end point in nonrelevant animal species.
Introduction
Depending on your point of view, the carcinogenicity study, as currently practiced, is too long, too short, too sensitive, lacking specificity, or as good a model as we can manage currently. One thing that is irrefutable is the longevity of the design. There can be few other aspects of scientific endeavor that have been practiced relatively unchanged for over 50 years.
For almost as long as the test has been performed, we have understood it has shortcomings—as does any model. As far back as 1955, Arnold Lehman of the Food and Drug Administration (FDA) was quoted as saying, “Positive results in an animal test can be taken as a suspicion that the chemical under study may be carcinogenic, but do not prove it to be so” (Lehman et al. 1955, 679–748). So, despite the fact that we have known for a long time that the data generated from carcinogenicity studies are suspect, we continue to be wedded to the rodent bioassay paradigm.
History of the Bioassay
The basis of our current approach to testing for carcinogenesis in animal models dates back to the contribution of Garth Fitzhugh to the black book in 1949. This proposed animal group numbers of 10 + 10, high-dose groups with significantly greater exposure than was likely to be experienced by man, and intermediate-dose groups (Lehman et al. 1949). There was a major design change in 1962 by John and Elizabeth Weinberger and a few more modifications along the way, but in principle this design has become the study that we are all familiar with today. It is notable that, despite all the scientific advances that have taken place in the intervening 65 years, the 2 basic premises of the test remain unchanged: first, that effects in laboratory animals are applicable to man unless proven otherwise; and second, that using higher doses than we, as humans, are exposed to, over the shorter life span of a rodent, is a valid method of identifying the hazard under investigation.
It is worth noting that in historical terms the original outline of the carcinogenicity test sits somewhat closer to the discovery of penicillin than it does to the current day. Since that time we have come to understand that cancer is a disease brought about by a combination of environment and genes and to be a product of culture, diet, and lifestyle. The one-hit approach, whereby a single DNA lesion could result in neoplasia, was discredited and discarded as a paradigm for cancer many years ago. What we knew of the carcinogenic process in 1949 compared to our current state of knowledge surely suggests that a reassessment of the original paradigm is substantially overdue. With this in mind, we have recently been examining how the existing designs for lifetime bioassays meet the requirements of an assay predicting the potential for cancer development in man. As somewhere between 1,500 and 2,000 bioassays of one sort or another have been carried out since their inception, we should surely be able to identify where refinements are needed and whether, and what, changes are required.
Modeling Cancer
It goes without saying that all models are a simplification of the “true” situation and that as soon as we begin to model, we begin to err. To quote George Box, “All models are wrong, but some are more useful than others” (Box 1975). We need to recognize that the purpose of the model should dictate the approach, and in the case of modeling cancer, we need to use a model that relates the outcome of cancer, and the preceding changes, in a meaningful way.
Although there are some rare cancers that are genetically inherited and can occur in early or middle age, cancer is predominantly a disease of old age. We recognize that the carcinogenic process involves changes (mutations) to target genes in progenitor cells. Following mutation, or epigenetic alteration, of target genes, clonal expansion and subsequent genetic instability result in an accumulation of mutations, which in time may present as neoplasia. This is a complex process, which goes a long way to explaining why cancer needs time to develop. The principles apply equally to animals and man, and therefore we should probably consider that a relevant model utilizing cancer as the end point will require enough time for this end point to develop.
With respect to this question, the most obvious starting point should be whether the current design of the lifetime bioassay is likely to represent an accurate model of carcinogenesis. In terms of designing a suitably sensitive study of carcinogenic potential, it would be difficult to disagree with any of the following standards (Hayes et al. 2011):
- the animal model should be sensitive to the study end points;
- the model should characterize both the test chemical and the administered dose;
- the study should employ a challenging dose and duration of exposure;
- sufficient numbers of animals per dose group should be used to power the ability of the study to detect an outcome;
- the study should use multiple dose groups to detect dosage effects;
- there should be complete, peer-reviewed, histological evaluations; and
- the design should be adequate to allow evaluation of data using pairwise comparisons and analyses of trends that rely on survival-adjusted tumor incidence—assuming that you allow your animals to become sufficiently aged for this to be an issue.
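The group-size point above can be made concrete with a minimal sketch (assumptions: tumors arise independently across animals, "detection" means observing at least one tumor-bearing animal, and the group sizes and incidences shown are purely illustrative):

```python
def p_detect_at_least_one(n: int, incidence: float) -> float:
    """Probability that a group of n animals contains at least one
    tumor-bearing animal, given a true per-animal tumor incidence."""
    return 1.0 - (1.0 - incidence) ** n

# Illustrative only: the 10 + 10 groups of the 1949 outline versus the
# 50 animals per sex per group typical of modern 2-year designs.
for n in (10, 50):
    for incidence in (0.01, 0.05, 0.10):
        p = p_detect_at_least_one(n, incidence)
        print(f"n={n:2d}  incidence={incidence:.0%}  P(>=1 tumor)={p:.2f}")
```

Even this crude calculation shows why small groups struggle with low-incidence tumors: with 10 animals and a true incidence of 1%, a tumor-bearing animal appears in fewer than 1 study in 10.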
Clearly, it will never be sufficient to look at the results of a bioassay in isolation; any bioassay will need to be assessed in a context that takes into account genotoxicity, structure activity relationships, results from other bioassays, and other mechanistic investigations, including toxicokinetics, metabolism, and genetics.
“Carcinogenicity Assessment Documents” (CADs) and the Immediate Future
Having recognized that the current paradigm has been in place, almost unchanged, for 50 years, recent moves have been made to improve the testing regime. One of the considerations of the recent International Conference on Harmonisation (ICH) S1 expert working group was to encourage an intercompany exercise to evaluate the potential of utilizing data from shorter-term studies to predict a carcinogenicity outcome for small-molecule pharmaceuticals. The suggestion from this group is that a CAD should be prepared to address the carcinogenic risk of the new chemical as predicted by specific end points. The CAD would be submitted by the sponsor to justify a “waiver request” that would omit the conduct of 2-year rat carcinogenicity studies, if the evidence was that a 2-year rat carcinogenicity study would not add appreciable value to that assessment (Federal Register Vol. 78, No. 52).
The reasonable assumption being made by this proposal is that the accumulated knowledge of pharmacologic targets and pathways, together with toxicological and other data, can anticipate to a high degree the outcome of 2-year rat carcinogenicity studies. Whether this will predict the risk of human carcinogenicity of a given pharmaceutical is another matter altogether, and it could be argued that this new guidance fails at the basic level, as we already have a pretty shrewd idea that a carcinogenicity study may not provide a clear indication of human risk in these circumstances.
The proposal is to distribute new chemical entities (NCE) into 3 categories. These categories are as follows:
Category 1: So likely to be tumorigenic in humans that a product would be labeled as such, and a 2-year rat study would not add value.
Category 2: The available sets of pharmacologic and toxicologic data indicate that tumorigenic potential for humans is uncertain, and so a 2-year rat study is likely to add value to human risk assessment.
Category 3a: Likely to be tumorigenic in rats, but not in humans, through established, well-recognized mechanisms known to be irrelevant to humans, so that a 2-year rat study would not add value.
Category 3b: So likely not to be tumorigenic in either rats or humans that no 2-year rat study is needed.
Although a rat carcinogenicity study will no longer be required for NCEs in category 3, a transgenic mouse study or a 2-year mouse study would still be required. In many ways, all this seems a bit counterintuitive. Surely, the whole point of the weight-of-evidence argument is that pharmaceuticals falling into categories 3a and 3b do not require any further testing, any more than those in category 1; thus the requirement for a mouse study seems completely against the stated objective of making a judgment based on a weight-of-evidence assessment of the available toxicology data.
In the first instance, this is to be an information-gathering exercise, stretching out to cover 50 or so submissions, and no waivers will be granted. This is expected to last for about 2 years but may be extended—almost certainly if previous form is followed. According to the Federal Register, these data are to be “requested.” It is unclear just how benign the nature of this request will prove to be.
It is questionable as to whether this series of proposals will have any material effect on the decision-making process in the pharmaceutical industry. It seems that classification into a category other than 2 is going to require concurrence of opinion from 3 fairly diverse standpoints—the CAD has to be made simultaneously available to the United States, European, and Japanese authorities, and it would appear that they all need to agree that a 2-year study is unlikely to add value before a waiver would be granted.
Given the rather dubious advantages attached to this exercise and the rather open-ended nature of the process, it seems quite probable that many companies will decide the effort involved outweighs any potential gains and will simply press on down the traditional route of 2-year rodent carcinogenicity bioassays.
The International Life Sciences Institute (ILSI) Initiative and the Strain of Choice for Replacement of the Mouse Bioassay
Doubtless everyone is by now familiar with the ILSI Health and Environmental Sciences Institute initiative to identify a new model for carcinogenicity testing. However, looking at this initiative retrospectively, it does rather appear to have been an attempt to demonstrate that the new models would not prove to be oversensitive, an area of great concern when the models were initially under consideration. In this respect, the initiative succeeded very well. All 3 noncarcinogens that were tested in the multilaboratory collaboration proved to be negative in all 5 mouse models (the p53 knockout, the rasH2 transgenic model, the TgAC transgenic model, the xeroderma pigmentosum complementation group A [XPA] homozygous knockout, and the neonatal mouse model), and the majority of rodent carcinogens that had scored positively in the lifetime rodent bioassays were also negative in the transgenic models (Cohen, Robinson, and MacDonald 2001).
In p53+/− mice, 10 out of 12 rodent carcinogens tested were negative, and the remaining 2 showed an equivocal response. For the rasH2 model, the response was comparable: 2 out of 11 rodent carcinogens scored positively, while in the Xpa/p53 double transgenic mouse model, none of the 6 rodent carcinogens were positive. Thus, it was concluded, with this rather limited sample, that the transgenic mice were not overly sensitive, and rather than being more subject to false positives than the standard bioassay, actually reduced the number of what were considered to be false positives. An important consideration at this point is the conclusion that chemicals classified as “rodent-only” carcinogens pose no risk to man, but more on this later. In terms of a direct comparison with the rodent lifetime bioassays, a review of the Tg bioassays suggested that they were similarly as sensitive in the detection of known human carcinogens as the lifetime bioassays (Alden et al. 2011).
When it came to detecting false negatives, only the rasH2 strain was capable of detecting the carcinogenic property of the human carcinogen phenacetin. Phenacetin is regarded as a weak carcinogen, and at the time it was considered that the 6-month exposure period in the transgenic assay was insufficient to detect tumor induction, implying that a longer, albeit undetermined, period of dosing may have permitted this chemical to be detected as positive. Estradiol, another known human carcinogen, was not positively identified in any of these assays. The point being made here is not that the transgenic bioassays have no use in screening for carcinogenicity of potential new chemicals, but rather that they show no scientific advantages over the existing 2-year bioassays in terms of sensitivity, although they do appear to show fewer positive outcomes for the rodent-only carcinogens and therefore may have increased specificity for detecting human carcinogens. However, there remain significant questions as to whether the assays are capable of performing to the standards that would be acceptable in a genuine alternative to 2-year rodent bioassays.
Shortcomings of the Current Paradigm
In terms of being a genuine alternative to the traditional rodent bioassay, it is pertinent to highlight the shortcomings of the current paradigm and to ask which aspects could potentially be improved by the use of the transgenic models. In support of current designs, it is stated that most known human carcinogens also cause tumors in rodents, but this claim is based on quite poor data. It is rather surprising to note that toxicology sets a far higher standard for in vitro genotoxicity tests than for the bioassay.
In fact, if we consider only confirmed human carcinogens, there are adequate data for only 10 chemicals. Of these 10 chemicals, 6 have been tested in a National Toxicology Program lifetime study in both rats and mice. This is to some extent down to the nature of known human carcinogens. The International Agency for Research on Cancer list of known human carcinogens comprises around 113 of what could perhaps best be described as items. The list includes such things as sunlight, betel nut chewing, and human viruses, but relatively few simple chemicals. Given the nature of many of these known human carcinogens, a simple comparison to a rodent model is not very practical, and as a result there are relatively few robust data supporting the acknowledged human chemical carcinogens, despite the large number of rodent carcinogenicity studies performed.
Of the 10 that have been investigated, there were 4 human carcinogens that were negative in rat bioassays—arsenic, azathioprine, Myleran (busulfan), and nickel compounds—and 3 human carcinogens that were negative in mouse bioassays—aflatoxin B1, arsenic, and nickel compounds. When considering failings of the study designs to adequately detect the carcinogenicity of these compounds, it was concluded that “… some of these bioassays must be considered inadequate for judging the absence of carcinogenicity, since there were various limitations on the way they were performed: too few animals, too short a duration, too low exposure concentrations, too limited pathology.…” (J. Huff 1999, 56–79).
Although these shortcomings can be addressed by adjusting the study designs, the fact remains that under standard protocols the 2-year rodent bioassay appears to lack sensitivity; these studies indicate a sensitivity of only 50–90%, depending upon how the data are analyzed, individually or with the 2 species combined (Ennever and Lave 2003). More often this appears to be the result of insufficient exposure time, rather than dose. It is therefore questionable whether the current design really adequately predicts risk for human beings in their 70s and 80s, rather than those in their 50s and 60s. The extended 2½- to 3-year bioassays of the European Ramazzini Foundation of Oncology and Environmental Sciences on toluene, benzene, radiation, and aspartame (Soffritti et al. 2004, 2007) clearly indicate that the results of the normal 2-year bioassay potentially underestimate true carcinogenic potency, and that true rodent lifetime studies may more accurately reflect the likely impact of some environmental chemicals (Haseman et al. 2001; J. Huff, Jacobson, and Davis 2008).
More significantly, the current bioassay is not designed to evaluate the impact of pre- and early postnatal exposures on later life. Current work on transgenerational effects, male-mediated teratogenesis, and the general critical impact of early developmental windows, all point to the need to look at a study design that encompasses a lifetime in the truest sense of the word when considering unintentional environmental exposure to pollutants (Dolinoy, Huang, and Jirtle 2007; Newbold and McLachlan 1982; Sonne et al. 2008; Swan 2006).
There are plenty of indications that, in terms of a purely epigenetic effect, there may well be some pediatric treatments in regular use that have not been correctly assessed in terms of long-term risk. Even very short-term exposure of young rats to such agents at therapeutic doses can lead to increased tumorigenesis in later life and a 20% reduction in life span. There is also quite extensive evidence of increased susceptibility of rats to induced tumors when given carcinogens during the neonatal period (Agrawal and Shapiro 2005; Toth 1968).
Given that the laboratory mouse and rat used in the bioassay are genetically, and metabolically, much closer to each other than they are to man, one is forced to ask how likely it is that a positive carcinogenic outcome in a single species would predict a carcinogenic outcome in man. It seems highly probable that a single-species carcinogenic effect, in the presence of a negative outcome in the second-species bioassay, is pathway, and hence species, specific. This in itself should not necessarily be considered insignificant in terms of human risk assessment, as that pathway may well exist in man, even if only in a minority of the population. In contrast, a carcinogenic outcome in both rat and mouse bioassays is less likely to be pathway specific and more likely to have relevance to human risk assessment. The question really is whether this is the best, or the most efficient, way to elucidate such species-specific pathways.
The rasH2 Mouse as a Model for Carcinogenesis
Although the U.S. Environmental Protection Agency, the European pesticide regulatory agencies, and the UK Chemicals Regulation Directorate are displaying a caution that illustrates the scientific and regulatory resistance that exists to the blanket introduction of the transgenic assays as an automatic replacement for the 2-year mouse bioassay, the FDA in the United States and the European Committee for Proprietary Medicinal Products now allow the use of transgenic mice in the regulatory testing of pharmaceutical compounds as an alternative to a second lifetime bioassay.
Of the potential Tg models that are available, it would appear that the rasH2 mouse is the model of choice and may be the only alternative in vivo test of carcinogenicity in the future, if the CAD recommendations become a reality, although clearly the option of carrying out a standard 2-year mouse bioassay will still be available.
The rasH2 mouse carries 3 copies of a normal (nonmutated) human H-ras gene under the control of its own human promoter. After exposing these mice to chemical carcinogens, the expression of H-ras is increased approximately 2-fold. The increased expression of the human wild-type ras gene, in conjunction with other changes induced by a carcinogenic compound, is considered to increase the predisposition of the mouse to develop tumors. Since these additional changes can be genetic, as well as epigenetic, the rasH2 model is expected to respond both to genotoxic agents and to those acting through a nongenotoxic mode of action. RasH2 transgenic mice have been shown to start developing tumors spontaneously at the age of 6 months, with the most commonly encountered being lung adenomas and adenocarcinomas, papillomas of the forestomach, harderian gland adenomas, splenic hemangiomas and hemangiosarcomas, and lymphomas (Nambiar, Turnquist, and Morton 2012).
One of the reasons that this strain has been deemed to be a particularly suitable model is that overexpression of ras protein and mutations in the ras gene are commonly observed oncogenic changes in human tumors, such as those that arise in the pancreas, colon, and lung (Tamaoki 2001). Defects in ras genes are also frequently observed in spontaneous and chemically induced tumors in rodents. For some human cancers, such as exocrine pancreatic cancer, the incidence of ras mutation can be as high as 80%, whereas for others, such as breast and colon cancer, it can be below 10% (Bos 1989). Overall, however, ras defects are present in a minority (∼16%) of human tumors, and even then are only one of the many mutations that make up the average neoplasm (Prior, Lewis, and Mattos 2012). Thus, it is a little disingenuous to suggest that the introduction of this transgene in some way humanizes the neoplastic process in this mouse strain.
The greatest concern with regard to the design of these studies is the length of exposure. As previously stated, neoplasia is predominantly a human disease of old age, with the vast majority of neoplasms arising in people over 50 years of age. Experimental studies investigating the time-related onset of spontaneous tumors have shown that few appear in rodent studies before 18 months, a pattern that appears to be repeated for nongenotoxic, chemically induced neoplasms.
If we are really looking for effects in old animals, it could be argued that even the current 2-year designs are inadequate, as they could be considered to terminate the animals at a time when they are only just the wrong side of middle age. However, since sufficient survival, even to 2 years, can be problematical for some of the strains used, considerable modification of study design, particularly in terms of dietary administration, would be needed to correct for this problem. If the length of exposure to a chemical and the age of the animal are as important as they may be, then to reduce this potentially highly significant factor further would seem to be a clear indication that using the rasH2 mouse as a model for carcinogenesis has at least this one major shortcoming.
There are several examples of pharmaceutical products whose tumors have a latency well in excess of 6 months. Peroxisome proliferator–activated receptor γ agonists, for instance, only induce fibrosarcomas/hemangiosarcomas at quite a late stage. These are tumors in which ras mutations are not common, so it is highly questionable whether the model would be sufficiently sensitive for this type of compound. The relationship of chemical exposure to the end point measured in the bioassay is really the point of contention here. If the models were measuring other end points, such as induced organ growth or induced cell proliferation, for example, then clearly a 6-month end point would be sufficient to detect such changes sensitively. The unknown factor in the design of a transgenic mouse bioassay is whether a 6-month exposure period will be sufficient to detect cancer as an end point, when everything that we know about the natural history of cancer, and its biogenesis, suggests that the process requires a sufficiently long period of time for the multiple genetic mutations to accumulate within the neoplasm (J. Huff, Jacobson, and Davis 2008). Although the introduction of the ras transgene has effectively removed one stage in the process of neoplasia, whether 6 months can be considered a sufficient time to permit the development of cancer by low-potency carcinogens is open to question.
In support of this assertion, consider the nonresponsiveness of the rasH2 transgenic animal model to phenolphthalein, a chemical that is clastogenic and positive in the bone marrow micronucleus test. In standard rodent bioassays, this chemical causes an increase in adrenal pheochromocytomas and kidney adenomas in F344 rats, lymphomas and sex cord stromal tumors in B6C3F1 mice, and thymic lymphomas in a 6-month p53 knockout mouse bioassay. This also raises the possibility that there may well be a sensitivity issue with this model, over and above the question of the duration of testing.
Many of the positive points made by the proponents of the rasH2 model are well founded, if not necessarily compelling, but there is one claim that may be readily challenged: will it really improve the prediction of relevant cancer risk for man? If the assumption that rodents adequately predict human risk is incorrect, it is hard to see why developing a model of the original standard will give a better approximation than the original test. Indeed, we are far from persuaded that there is any hard evidence that this transgenic mouse model is better at predicting human carcinogenicity than the existing rodent bioassays, and there is some evidence to suggest that it may be worse.
As noted above, much of the stimulus to develop new models has been predicated on the desire to decrease the number of what are assumed to be false positives, but it is far from clear that decreasing the sensitivity will guarantee better predictivity. Nor is it clear that such a move would be acceptable or desirable to the consumer, in this case the regulatory agencies, and, more importantly, to society as a whole.
It is scientifically hard to justify that modeling the standard 2-year bioassay advances the argument; to do that, you really need to change the basic concept, not revisit the way you test for the same end point. While we subscribe to the argument that the transgenic bioassay is a potentially useful model, we would suggest an alternative approach using alternative end points clearly related to the induction of cancer, but that do not require extended dosing periods to be demonstrable.
Traditional Bioassay, Pros and Cons
If we accept that there are some areas of concern over the use of the rasH2 mouse model, are the arguments for maintaining the status quo sufficient to make the current 2 species rodent bioassay paradigm the most desirable way forward?
Before we discard the current paradigm completely, it may be pertinent to consider how we might assess the validity of rodent bioassays and whether they can be improved upon. In terms of trying to validate the standard bioassay, it is clear that the public cannot be deliberately exposed to potential carcinogens, but what about a near relative? How well does a comparison of primate and rodent carcinogenicity data for chemicals stand up? Our best tool in this respect is the Carcinogenic Potency Database (CPDB), which potentially offers us the opportunity for just such a comparison (Gold et al. 1999, 2005).
Approximately half of the 22 rodent carcinogens reported in the CPDB were negative in the monkey species used—most probably due to the low doses used. The available data showed that the liver was the most common target site of chemically induced cancer in both primates and rodents (Gold et al. 1999). This is despite the fact that, in contrast to the situation in rodents, the control groups of primates had no spontaneous liver tumors in these lifetime studies. This suggests that the presence of the spontaneous liver tumors in the rodents, especially in some strains of mice, does not compromise their ability to detect hepatic carcinogens, as has been argued in the past (Carmichael et al. 1997).
For chemicals that were positive in both monkeys and rodents, there was a common target site between primates and at least one rodent species for all the chemicals tested, except for the alkylating chemotherapeutic drug melphalan, which was interpreted as being carcinogenic in monkeys only on the basis of “all malignant tumors” (Gold et al. 1999). Given the available information, which admittedly is less than ideal, the indications are that the sensitivity of the rodent bioassay is good. What is less certain, and what the general criticism revolves around, is the question of specificity (production of false positives), and unfortunately there is really no objective way of estimating this (Ennever and Lave 2003).
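The sensitivity/specificity distinction being drawn here can be sketched with hypothetical counts (the tallies below are invented for illustration; as noted above, the true-negative count needed for specificity is effectively unobtainable for human carcinogenicity):

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    # Fraction of genuine carcinogens the assay calls positive.
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    # Fraction of genuine noncarcinogens the assay calls negative.
    # For human risk this denominator is effectively unknowable:
    # we can rarely establish that a chemical is truly noncarcinogenic in man.
    return true_neg / (true_neg + false_pos)

# Hypothetical tallies for illustration only.
print(sensitivity(true_pos=9, false_neg=1))    # 0.9
print(specificity(true_neg=3, false_pos=2))    # 0.6
```

The asymmetry is the crux of the criticism: sensitivity can at least be estimated against the short list of accepted human carcinogens, whereas the false-positive rate has no comparable gold standard.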
On the basis of these data, one could perhaps conclude that the bioassay has been reasonably successful—but if that is the case, is there potential to improve or streamline current practices? Two-year rodent carcinogenicity studies use large numbers of animals (approximately 1,000 per compound for 2-species studies) and, at least in preclinical terms, are expensive studies to conduct. One way to reduce the cost of these studies is to consider whether we could reduce animal use.
The rat bioassay has, by and large, not been the subject of intense critical appraisal, but there have been several reviews of the doubtful contribution of long-term studies in the mouse to the risk assessment process in man.
The review of Doe and associates suggested that the use of the mouse does not provide significant additional contribution to risk assessment (Alden et al. 1996; Doe et al. 2006; Ward 2007; Osimitz et al. 2013). Based on these reviews, there has been a strong case made for not maintaining the requirement for the mouse bioassay and focusing more on a better understanding of the mode of action in producing a toxic event when assessing carcinogenic risks from chemically induced toxicity (Boobis et al. 2006; Sonich-Mullin et al. 2001). It has also been suggested that carcinogenic risk assessment should be performed on the basis of a life span study in a single rodent species, in combination with short-term genotoxicity tests and mechanistic information (van Oosterhout et al. 1997).
Other paradigms that have been mooted as an alternative to the current life span bioassays in 2 rodent species, for assessment of human pharmaceuticals, include a long-term study in the rat supplemented by a short- or medium-term in vivo rodent test, such as an initiation-promotion model. The most recent proposal is the rat/transgenic protocol combination that is before us currently (Flammang et al. 1997; McClain et al. 2001; ICH S1B 1997).
Much of the antipathy toward the mouse as a model is based on the high spontaneous rate of liver tumors in male B6C3F1 mice. We would suggest a paradigm that uses male rats in combination with female mice as a study design. This is not a new idea, and the available data indicate that such a study would deliver a sensitivity of 90% (compared to a 72% positive rate in the rat bioassay alone), while simultaneously avoiding the concerns over the B6C3F1 male (Gold et al. 1989). As such, we feel this idea is well worth revisiting, as the sensitivity is probably better than would be achieved from the rat/transgenic combination, and it provides a significant reduction in animal usage.
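The arithmetic behind combining two arms can be sketched under an independence assumption (the per-arm sensitivities below are assumed values for illustration, not the Gold et al. figures; in practice, carcinogens missed by one species are often also missed by the other, so independence overstates the gain):

```python
def combined_sensitivity(s_arm1: float, s_arm2: float) -> float:
    """Sensitivity of a two-arm design that scores positive if either
    arm detects the carcinogen, assuming the arms miss independently."""
    return 1.0 - (1.0 - s_arm1) * (1.0 - s_arm2)

# Illustrative per-arm sensitivities (assumed, not measured values).
print(combined_sensitivity(0.72, 0.65))
```

On these assumed figures, the two-arm design detects about 90% of carcinogens, even though each arm alone misses roughly a quarter to a third of them.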
One thing that remains immutable is the passage of time: nothing can be done about the duration from study start to reporting, often quoted at around 3 years, but in a properly planned program there is no reason this should delay the regulatory process.
Interpretation of Study Data
Generation of data from carcinogenicity studies is frequently less of an issue than the interpretation of equivocal results. There is a vast database for both the B6C3F1 and CD1 mice and for the Sprague-Dawley, Wistar, and F344 rat. Pathologists are familiar with the interpretation of such data and in making risk assessments on the basis of the data in the face of both positive and negative results. In the past, the maximum tolerated dose (MTD) was required as the top dose (Food and Drug Administration 2008), and this was broadly criticized scientifically because many of the compounds tested appeared to be carcinogenic only following exposures at these dose levels (Apostolou 1990; Carr and Kolbye 1991; Haseman and Seilkop 1992; McConnell 1989). Because of the associated toxicity resulting from these high doses, the data generated from such studies could be complex, but again pathologists were used to dealing with this and became skilled at differentiating direct from secondary effects of treatment. However, because of the complications inherent in the use of the MTD approach, refinements in dose selection, at least for pharmaceuticals, have led to the application of more rational doses based upon pharmacokinetics, toxicity, pharmacodynamics, and maximal feasible dose of the chemical under investigation (ICH 2008).
Experience has also permitted the discrimination between spontaneous background pathology and compound-induced effects. While scientific debate will doubtless continue where equivocal increases in the incidences of common background pathologies are found, the collective experience of the profession, together with the availability of detailed historical control data on tumor incidences, enables rational debate in the majority of cases (Deschl et al. 2002; Keenan et al. 2009).
There was an initial assumption that interpretation of the transgenic studies would prove more straightforward. However, initial claims of a lack of spontaneous tumors in untreated control transgenic mice seem to have been overly optimistic, and it is clear that spontaneous neoplasia will continue to complicate the interpretation of treatment-related cancer in the Tg assays as it does in standard 2-year bioassays, even if to a lesser extent. Compounds are still required to be dosed at high levels in the Tg assays, so the dose-related toxicities that plagued the 2-year rodent bioassay remain relevant to the 6-month transgenic mouse bioassay.
Of concern in the shorter term is the comparative lack of experience that pathologists have in interpreting changes in the rasH2 strain, in terms of both their histological diagnosis and their pathological significance. Even once this has been overcome, the extrapolations needed in the interpretation of the data are likely to be as great as in the traditional assay, and the probability of induction of strain-specific tumors does not appear to be materially different.
The rodent bioassay has also been criticized for being an experimental black box, in that it fails to generate mode-of-action information. This is patently untrue: although it may not generate mechanistic data that are directly relevant to man, the nature of any tumors induced frequently indicates the likely pathways. No one who has seen prolactin-induced mammary tumors could seriously doubt that. In addition, the shorter-term studies used to set doses for the 2-year bioassay are indicative of much, although not all, of the target organ toxicity observed in the 2-year bioassay (Reddy et al. 2010; Jacobs 2005).
The Case for Abandonment of the Bioassay
Given that nearly half of all animal bioassays generate positive results, it would appear that there are many more rodent carcinogens than we currently suspect are human carcinogens. Beyond this issue, it is necessary to question not just the quality of the data, but how it is utilized. In a retrospective analysis of 182 pharmaceutical candidates, only 8% of positive findings from the carcinogenicity studies were not predicted by a positive genotoxicity result, data from rodent toxicity studies, or from their significant disruption of hormonal homeostasis. This 8% proved to operate through rodent-specific mechanisms (Sistare et al. 2011).
However, none of this appears to have significantly affected the regulatory process, as all these candidates subsequently became marketed products. In fact, some 56% of marketed drugs are reported to have a positive carcinogenicity finding in at least one species, but of the drugs currently labeled as possible human carcinogens, 15% are negative in carcinogenicity studies (van Oosterhout et al. 1997; Alden et al. 2011; Friedrich and Olejniczak 2011). It would thus seem that the data generated from rodent bioassays are far from being as critical as one might imagine, at least with respect to pharmaceuticals.
This leads one to a consideration of the overall value of conducting these studies: if the data are apparently considered insufficiently meaningful to direct the regulation of many pharmaceuticals, should we abandon the assay? We know that the rodent bioassay, when properly conducted, is an accurate assay for detecting both genotoxic and nongenotoxic rodent carcinogens, but it is unrealistic to expect it to be an absolute model for detecting human carcinogens; after all, it is a model. The fact remains that such assays have a rodent focus, and if we are really to improve our risk assessment for man, we should probably focus on human carcinogenicity and not rodent carcinogenicity.
The tests we currently have for the assessment of likely genotoxic carcinogens, while not perfect, are very good. The current battery of in vitro and in vivo tests does well in terms of sensitivity and specificity, and new tests promise to improve that further in the immediate future. Unfortunately, there is no such clear strategy in place for the assessment of nongenotoxic carcinogens, other than the rodent bioassays.
Many nongenotoxic mechanisms are now well established, and we have a secure understanding of many of the effects of exaggerated pharmacological activity. Data from the long-term use of the contraceptive pill, hormone replacement therapies, and from transplant patients on long-term immunosuppression have clearly identified the risks attached, but importantly we have been able to contextualize those risks and manage the expectations arising from the necessary use of these chemicals. In terms of management of these risks, it is well within our grasp to assess alterations to hormone levels and to measure chemical-induced effects on the immune system, both in vivo and in vitro.
We have begun to take tentative steps toward a weight of evidence approach to the risk assessment process. We are well positioned to investigate the effects of repeated hits on the same molecular target, perform accurate metabolite identification, identify receptor presence and density, and investigate the effects of overloading of defense and repair mechanisms. Clearly, our knowledge of the molecular changes involved is incomplete, but with the advent of the “omic” technologies we should be in a position to investigate these, and other molecular pathways, sooner rather than later.
If we assume that the best model for humans is “human,” then it follows that scientific progress down that path should lead to the abandonment of the long-term rodent bioassay. Before this can become a reality, we need to develop “pathway-based” methods to mimic human biology, rather than rely on our current dependence upon the results of animal testing for our predictions of human toxicity. It is equally clear that this is not a simple undertaking and that this modeling process will certainly involve multiple, interconnected, pathway-based assays to make predictions. These methods may not be one-to-one replacements of the animal-based tests, but will instead predict likely safe exposures for specific toxicity pathways, rather than target organ toxicity per se (Lin and Will 2011; Groh et al. 2015; Jackson 1995; Li and Chan 2009; Vinken 2013).
In the 1960s, when the basic outline of the carcinogenicity study design was established (Jacobs and Hatfield 2013), in vitro and in silico approaches were largely nonexistent. In the last decade, in vitro and in silico approaches using human cells and human-derived data have become sophisticated and both physiologically and toxicologically relevant. Despite these advances, very few human-based tests have achieved formal validation. This is largely because the validation process is too time-consuming and relies too heavily upon animal in vivo data as a gold standard comparator for the in vitro approaches, tests that themselves have never been subjected to anything like the same scrutiny.
The science and technology of 21st-century toxicology is changing too rapidly for a validation process that typically takes a decade or more to complete. Validation procedures must be streamlined to accommodate the pace of change of today's science, and be sufficiently flexible to accommodate a process of continual improvement in testing methods, rather than the step-change approach we are currently working with. In short, the current validation processes are unsuited to the needs of 21st-century toxicology (National Academy of Sciences 2007; Hartung 2009; Andersen and Krewski 2009). As a first step, we need a vision of how in vitro tests will be applied to the determination of safe exposures. Given the possible number of tests, some lowering of the regulatory bar will be required, which may necessitate changes to environmental and regulatory law.
Conclusions
On the face of it, there is really no reason why we should not trust ourselves to make a sound value judgment based upon the known toxicity of test items. The reason we seem unwilling to go down this path is most likely down to the human condition. We are generally very poor at making valid risk assessments. Society usually overestimates risk in situations where we rely on others for quantification on our behalf, or where we are dependent on the judgment of an external party. For instance, the majority of people perceive driving as safer than flying, despite all the evidence to the contrary.
Public demand weighs heavily on governmental agencies to control this risk. Our fear of cancer has resulted in a conservatism that has led to the perpetuation of the rodent carcinogenicity study, despite the knowledge that it has many shortcomings in terms of both true risk assessment and hazard identification. This reality means that the bioassay is likely to continue in some shape or form, rightly or wrongly, for some time to come. In conclusion, then, it seems prudent to accept this reality and consider the best short-term fix.
If we are looking to replace the standard bioassay, at the very least the new assay should limit, and preferably eliminate, the shortcomings of the current 2-year rodent bioassay. It is therefore only reasonable to expect the alternative assay to successfully identify known, or strongly suspected, human carcinogens. The new assay should also identify compounds, negative in the lifetime rodent bioassays, as noncarcinogens, and if we are looking for improvements then the alternative model should also test negatively for the so-called false positives that appear in the standard bioassay.
The proposed new approach, using a 2-year rat bioassay supplemented, or replaced, by a 6-month transgenic mouse bioassay, does not really signify a new way of doing things at all since we are still wedded to using a rodent in some shape or form, in an extended testing regime, to assess carcinogenic potential. This cannot really be considered to represent a paradigm shift, rather a refinement of our current approach. The basic question is whether this refinement is a true improvement. If we accept the premise that the mouse bioassay is an imperfect test, then the rationale for replacement with another imperfect assay, the 6-month transgenic bioassay, remains unclear. There is acceptance that rodent bioassays in general do not provide unequivocal proof of potential human carcinogenicity, nor do they, by themselves, appear to prove any specific mode of action. If we consider the criticisms leveled at the mouse 2-year bioassay and critically appraise the improvements offered by the transgenic alternative, then we are left with the conclusion that the latter fails to live up to the requirements of a replacement for the 2-year rodent bioassay in many respects. In terms of improvement, the transgenic mouse models do, of course, use a smaller number of animals and can be reported more quickly, with all that implies for cost reduction. However, this seems insufficient justification, given the basic failings of the model, which surely should be the primary consideration.
While the shortcomings of the current 2-year rodent paradigm with regard to specificity seem likely to remain unresolved, the more pressing question is whether the assay is sufficiently sensitive to allow detection of potential carcinogens. It remains to be proven incontrovertibly that most so-called rodent carcinogens do not pose a risk to man. This is likely to be unknowable in the immediate future, as decades of observation would be required before incontrovertible human evidence of cancer became apparent for any of these rodent-only carcinogens in the population. In the absence of any clear evidence to the contrary, it is surely unethical to be conducting experiments on the public at large if animal evidence indicates a potential risk. Although the possibility that the period of testing may be inadequate is likely to strike fear into the heart of most pharmaceutical chief executives, the evidence is clear enough. If we are to continue down the route of the rodent lifetime bioassay, we should at least perform the test in the most honest way possible and learn how to manage the data it generates.
There is a strong argument to be made that the science should move away from trying to establish human carcinogenicity risk with bioassays, especially in the case of environmental contaminants such as agrochemicals. In their stead, we should move toward a more complete use of all the data available from in vitro and in vivo assays, in order to generate a stronger scientific basis for the risk assessment. To put it another way, we need to change the basic approach, rather than revisit the way we test for carcinogenesis.
Returning to the start of this discussion, we suggested that depending on your point of view, the carcinogenicity study, as currently practiced, is too long, too short, too sensitive, or lacking specificity. Clearly, however, it is not as good a model as we would like to have at our disposal. We would suggest that efforts to develop meaningful in vitro methodologies be redoubled and that a sensible strategy toward validation be developed. In the meantime, the rodent bioassay should be refined to a multispecies (male rat/female mouse) design with true lifetime exposure; whatever the question, the transgenic mouse assay is not the answer.
Footnotes
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) received no financial support for the research, authorship, and/or publication of this article.
*
This is an opinion article submitted to the Regulatory Forum and does not constitute an official position of the Society of Toxicologic Pathology or the journal Toxicologic Pathology. The views expressed in this article are the sole opinions of the authors and do not represent the policies, positions, judgments, or guidances of the author’s employers including the U.S. FDA. The Regulatory Forum is designed to stimulate broad discussion of topics relevant to regulatory issues in toxicologic pathology. Readers of Toxicologic Pathology are encouraged to send their thoughts on these articles or ideas for new topics to
.
Author Contribution
Authors contributed to conception or design (ND and JF); data acquisition, analysis, or interpretation (ND and JF); drafting the manuscript (ND); and critically revising the manuscript (JF). All authors gave final approval and agreed to be accountable for all aspects of work in ensuring that questions relating to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
