The NTP 2-year bioassay: Controversies in counting rodent tumors to predict human cancer

Abstract

In late 2016, an examination was initiated of the 60 studies constituting the 2-year National Toxicology Program (NTP) inhalation study database, with the purpose of evaluating the clinical relevance to humans of chemical induction of bronchioloalveolar carcinomas in mice. An anecdote of unknown origin held that tumor concordance between rats and mice was 70%; the very high level of discordance in tumor formation and tumor site between rats and mice that the inhalation data actually showed was therefore a surprising but interesting outcome. Further examination of the NTP inhalation database led to an initial publication in 2017.¹ The inability to extrapolate the results from the inhalation route to the other routes of exposure (feed, gavage, drinking water, dermal, intraperitoneal injection) in the NTP database motivated Smith and Perfetti to analyze the studies on these other routes of exposure in terms of rat–mouse tumor incidence and site concordance, and in terms of the correlation of rodent tumorigenicity with Ames test results and with other tests of genotoxicity as reported by the NTP.²,³

Over the last several years, Smith and his colleagues have closely followed the deliberations of the NTP expert panel regarding the potential genotoxicity and tumorigenicity of 1-bromopropane. From this experience, the emphasis placed by the NTP on historical data rather than on more recently conducted, state-of-the-art, GLP genotoxicity assays was striking. In the analysis of the feed, gavage, drinking water, dermal, and intraperitoneal injection 2-year NTP studies, Smith and Perfetti found a poor correlation between positive Ames test results and the development of rodent tumors and, concomitantly, a similarly poor correlation between negative Ames test results and the absence of rodent tumors. The inability to determine the quality of the Ames tests and other tests of genetic toxicity confounded the evaluation of whether the Ames test data were actually displaying poor predictive power or whether the Ames test data contained an inherently high error rate that interfered with the statistical analysis.

Given the limitations of the data in the NTP database, Smith and Perfetti shifted the focus of the analysis to the molecular determinants of rodent tumorigenicity. Smith and Perfetti had a long-standing collaborative relationship with Dr Corwin Hansch,⁴–⁶ the originator of the QSAR concept and methodology, and following his passing, with his former postdoctoral fellow Dr Rajni Garg and her former postdoctoral fellow Dr Gene Ko. In collaboration with Drs Garg and Ko, Smith and Perfetti correlated the following for all the chemicals in the NTP database: Ames mutagenicity, structural alerts of carcinogenicity, Hansch QSAR parameters (ClogP, CMR, MgVol), tumor site concordance/multiplicity, and tumorigenicity rank. One of the conclusions was that since the clinical relevance of rodent tumors to human tumors cannot be determined from this data set, development of a comparative scale wherein the rodent tumorigenicity of a chemical can be compared under similar conditions to the rodent tumorigenicity of a similar chemical is the best that can be accomplished at this time, albeit of limited usefulness in a regulatory setting.

Following completion of the collaboration with Drs Garg and Ko,⁷ a meeting was held with the USEPA Risk Assessment Division. At this meeting, EPA personnel expressed their confidence in EPA’s predictive program termed Oncologic. Following the meeting with EPA, Smith and Perfetti examined the predictive ability of the Oncologic expert system.⁸ The degree of correlation between tumors predicted by OncoLogic™ (Oncologic) and actual tumor formation as observed in the NTP 2-year rodent studies is lower for “justification reports” that incorporate historical data than for “data reports” that do not. The correlation between the ordinal ranking of the observed carcinogenicity of parent NTP chemicals and the predicted “level of carcinogenicity concern” from the justification reports obtained from Oncologic is poor (r = 0.56). Similarly, the correlation between the ordinal ranking of the carcinogenicity of metabolites from parent NTP chemicals and the predicted “level of carcinogenicity concern” from the justification reports obtained from Oncologic is also poor (r = 0.43). In contrast, the correlation between the ordinal ranking of the observed carcinogenicity of parent NTP chemicals and the predicted “level of carcinogenicity concern” from the data reports obtained from Oncologic is comparatively better (r = 0.75). The correlation between the ordinal ranking of the carcinogenicity of metabolites from parent NTP chemicals and the predicted “level of carcinogenicity concern” from the data reports generated from Oncologic is also comparatively good (r = 0.68).

In the last paper of the series, Smith, Perfetti, and colleagues highlighted the importance of the role of mitogenesis in mutagenesis but did not provide an historical perspective as to how this hypothesis was developed.⁹ In the 1980s, Cohen, Ellwein, and colleagues conducted a series of studies that demonstrated that cellular proliferation could amplify the background mutation rate, thereby increasing tumor formation in experimental animals.¹⁰–¹³ The studies of Moolgavkar and Knudson also played an important role during this era.¹⁴ Throughout the 1990s, Ames and Gold incorporated these new findings into their thinking, resulting in a series of publications, one of which is highlighted by Smith and Perfetti, that is, Ames and Gold.¹⁵ This paper published in Science was actually a commentary on the Cohen and Ellwein paper published in the same issue.¹⁰ Similarly, Smith and Perfetti cited the recent paper by Tomasetti and Vogelstein¹⁶ but did not provide the historical background that their conclusions were basically in concert with those expressed by Armitage and Doll in 1954,¹⁷ nor did they note criticisms of the analysis by Tomasetti and Vogelstein in their paper. Both Armitage and Doll¹⁷ and Tomasetti and Vogelstein¹⁶ held stem cell number and proliferation rates constant. However, either can be influenced by environmental factors. A broader model incorporating the potential environmental influence on cell number, proliferation rates, and mutation rates (secondary to DNA reactive carcinogens) formed the basis for the Cohen and the Moolgavkar models.

The diversity of statistical methods employed, and number of comparisons made, makes it difficult for an outside reviewer to evaluate the mathematical aspects regarding the quality or accuracy of the extensive statistical analyses conducted by Smith and Perfetti. However, several general limitations of the statistical approach employed by these authors are relevant. First, the fundamental problem with the NTP database is rooted in molecular genetics and cannot be addressed directly via a statistical approach, that is, rodent tumors developing from short-term high-dose chemical exposures are not pathogenetically, molecularly or etiologically comparable to human tumors developing from lower dose chemical exposures (or to accumulation of either lifestyle-induced or random DNA mutations and repair errors) occurring over decades-long time frames. Smith and Perfetti were only able to address the clinical inapplicability of the rodent tumor data indirectly, that is, through their demonstration of the high degree of discordance in tumor development and organ site location between rats and mice despite their relative phylogenetic proximity compared with humans. Second, the restriction of considering any positive Ames test result as indicative of a positive genotoxicity result overall, while a practice of many NTP expert panels, necessarily introduces some unquantifiable degree of error into the statistical analysis. Third, while only a secondary consideration to the authors, the heterogeneity of “genetic toxicology assays other than Ames” limits the interpretability of these reported correlations. (Smith and Perfetti deemphasized these tests after the inhalation and feed route analyses in recognition of the interpretability issue.)

The authors’ development of an ordinal rank scale for the relative tumorigenicity of chemicals in rodents facilitated several interesting correlations. First, positive results for structural alerts of carcinogenicity are strongly associated with ordinal ranks of increased tumorigenicity. Second, MgVol showed an average increase with ordinal rank of tumor potency. Therefore, smaller molecular volumes were associated with higher levels of tumorigenicity, an observation consistent with steric considerations. Third, positive Ames test results correlated with categorical and ordinal ranks of increased tumorigenicity. The rank sum test shows that the trend in Ames versus ordinal ranking is highly significant when all routes of administration are combined, somewhat overcoming the false positive and false negative rates observed when the Ames data are examined on a route-by-route basis. This is a very important observation in relation to the quality of the statistical analysis, in that the correlation of positive Ames tests with more tumor induction in the rodents ameliorates the concern that a single positive Ames test report is enough to classify a chemical as positive in the Ames test. This is particularly critical for Ames assays performed before standardized methods and interpretation criteria were established over the past few decades by OECD. From a regulatory rather than mechanistic standpoint, the self-referential nature of such an ordinal rank scheme calls its potential usefulness into question. Specifically, if the NTP studies are not providing a clinically relevant representation of human cancer risk, is it particularly useful to rank order the results?

In addition to the statistical limitations noted above, Smith and Perfetti noted an apparent anomaly in the fourth paper of the six-paper series without providing strong support for its putative incongruity, that is, no apparent relationship between ClogP and categorical or ordinal ranking of rodent tumorigenicity is seen. The mean ClogP for Ames positive chemicals was 1.424 (154 observations). The mean ClogP for Ames negative chemicals was 2.046 (325 observations). As noted above, it was demonstrated that chemicals testing positive in the Ames test tend to possess higher ordinal ranks for rodent tumorigenicity. Since chemicals testing positive in the Ames test had a statistically significantly lower ClogP value than Ames negative chemicals, the absence of an apparent relationship between higher ClogP values and rodent tumorigenicity is to be expected.

Recently, the USEPA has embarked on a multiyear program to reduce or eliminate the use of laboratory animals. A similar effort is being undertaken by the pharmaceutical industry through the International Conference on Harmonization.¹⁸ A result reported by the authors in the fourth paper of the series of six papers⁸ might be particularly relevant to this effort by EPA. In table 2 and figures 7 and 8, Smith and Perfetti showed the relationships between structural alerts of carcinogenesis, categorical rank (1–48), and ordinal rank (1–135). They reported their results as follows:

The Mann–Whitney–Wilcoxon rank sum test shows that the trend in structural alerts versus category ranking is highly significant (Z = −7.03; p value near 0); that is, positive structural alerts results are strongly associated with categorical ranks of increased tumorigenicity. The Mann–Whitney–Wilcoxon rank sum test shows that the trend in structural alerts versus ordinal ranking is highly significant (Z = −7.02; p value near 0). That is, positive structural alert results are strongly associated with ordinal ranks of increased tumorigenicity.
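For readers less familiar with the rank sum test quoted above, the following is a minimal Python sketch of how a Z statistic of this kind can be obtained with the large-sample normal approximation. It is not the authors' code or data: the two lists of ranks are hypothetical, the function names are illustrative, no tie correction is applied to the variance, and the convention that a lower rank number means greater tumorigenicity is an assumption.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """Z for the Wilcoxon rank sum of group_a (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                                  # rank sum of group_a
    mean_w = n_a * (n_a + n_b + 1) / 2                    # expectation under H0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)    # standard deviation under H0
    return (w - mean_w) / sd_w


def two_sided_p(z):
    """Two-sided p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# HYPOTHETICAL tumorigenicity ranks (assumed: 1 = most tumorigenic).
alert_positive = [1, 2, 3, 5, 6, 8]
alert_negative = [4, 7, 9, 10, 11, 12]

z = rank_sum_z(alert_positive, alert_negative)
print(f"Z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

Under the assumed rank direction, a negative Z indicates that the alert-positive group occupies the lower-numbered ranks, which is the direction of association described in the quoted passage.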

In the sixth and final paper of the series, Smith and Perfetti⁹ provide the following information regarding the sensitivity and specificity of using structural alerts of carcinogenicity to screen chemicals from the NTP database:

One hundred thirty-four of the 479 chemicals tested by way of inhalation, feed, gavage, drinking water, dermal administration, or intraperitoneal injection were negative [for tumor induction] in male and female rats and in male and female mice. Fifty-four of these 134 chemicals were ubiquitously negative for neoplasia; but nonetheless, contained a structural alert representing a false-positive rate of 40% (54/134). There were 330 chemicals that induced at least one tumor. Of these 330 chemicals, 54 chemicals did not possess a structural alert for carcinogenicity resulting in a false negative rate of 54/330 (16.4%).

From these results, one wonders if the current extremely expensive and time-consuming process of conducting 2-year rodent cancer bioassays is providing results superior to the consideration of structural alerts of carcinogenicity.
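The rates quoted above follow directly from the reported counts. The short Python sketch below reproduces the arithmetic and derives the corresponding sensitivity and specificity, treating the presence of a structural alert as the screening result and induction of at least one tumor as the reference outcome. The variable names are illustrative, and the sensitivity and specificity values are derived here as complements of the quoted rates rather than taken from the source.

```python
def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts as quoted from Smith and Perfetti.
tumor_negative_total = 134         # negative in both sexes of rats and mice
tumor_negative_with_alert = 54     # of those, chemicals carrying a structural alert
tumor_positive_total = 330         # induced at least one tumor
tumor_positive_without_alert = 54  # of those, chemicals with no structural alert

false_positive_rate = percent(tumor_negative_with_alert, tumor_negative_total)
false_negative_rate = percent(tumor_positive_without_alert, tumor_positive_total)

# Derived quantities (complements of the two rates above).
specificity = 100.0 - false_positive_rate
sensitivity = 100.0 - false_negative_rate

print(f"False-positive rate: {false_positive_rate:.1f}%")  # 40.3%
print(f"False-negative rate: {false_negative_rate:.1f}%")  # 16.4%
print(f"Specificity:         {specificity:.1f}%")          # 59.7%
print(f"Sensitivity:         {sensitivity:.1f}%")          # 83.6%
```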

While the efforts of Smith, Perfetti, Garg, Ko, and Anderson represent an extensive statistical analysis of the NTP database, statistical analysis can only provide a retrospective critique of the extant data and does not address the major problem, that is, the mechanisms underlying the formation of human cancers are poorly modelled by chronic studies conducted in rats and mice. Although most known human carcinogens are positive in the rodent bioassay, the reverse is not true. Numerous chemicals are positive in the rodent bioassay, producing tumors by modes of action not relevant to humans or only at doses that are not relevant to human exposures. Additionally, a significant shortcoming of the NTP studies is their almost exclusive reliance on genotoxicity as the sole mechanism of carcinogenicity. Clearly, carcinogenicity can also occur via epigenetic mechanisms.¹⁹

In the future, the screening of chemicals should incorporate the most recent knowledge of human tumor biology. While the recommendations of Smith, Perfetti, and colleagues are consistent with their analysis, these recommendations only represent an incremental step forward on a much longer path toward mechanism-based assessment of human cancer risk. After reviewing the proposed mechanisms described by Cohen, Ellwein, Moolgavkar, Knudson, Ames, Gold, and others, and the results from their own analysis of the NTP database, Smith and Perfetti⁹ made two limited recommendations: (1) genotoxicity evaluations should be made based on recent state-of-the-art assays conducted on pure samples with a certificate of analysis employing accepted protocols; and (2) regulatory agencies should consider the possibility that cytotoxicity induced by high doses can sometimes induce rodent tumors by mechanisms that do not represent an increased risk of cancer to humans. Acceptance of these recommendations is consistent with a best science approach to carcinogen hazard assessment.

Footnotes

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The commentary along with the six (6) papers will be republished following additional peer review as a supplement by the Sage Publishing Company in Toxicology Research and Application with Drs Carr J. Smith and Thomas A. Perfetti serving as guest editors.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Berry, Cohen, Hayes, and Kaminski were compensated by Albemarle Corporation, a global specialty chemicals company, to review and comment on six (6) manuscripts originally published in Toxicology Research and Application. The commentary piece was written by these individuals without input from the funding source.

References

1. Smith, Anderson. High discordance in development and organ site distribution of tumors in rats and mice in NTP two-year inhalation studies. Toxicol Res Appl 2017; 1: 1–22.

2. Smith, Perfetti. Tumor site concordance and genetic toxicology test correlations in NTP two-year feed studies. Toxicol Res Appl 2017; 1: 1–12.

3. Smith, Perfetti. Tumor site concordance and genetic toxicology test correlations in NTP two-year gavage, drinking water, dermal, and intraperitoneal injection studies. Toxicol Res Appl 2018a; 2: 1–18.

4. Smith, Hansch. The relative toxicity of compounds in mainstream cigarette smoke condensate. Food Chem Toxicol 2000; 38(7): 637–646.

5. Smith, Perfetti, Garg, et al. IARC carcinogens reported in cigarette mainstream smoke and their calculated log P values. Food Chem Toxicol 2003; 41(6): 807–817.

6. Smith, Perfetti, Garg, et al. Utility of the mouse dermal promotion assay in comparing the tumorigenic potential of cigarette mainstream smoke. Food Chem Toxicol 2006; 44(10): 1699–1706.

7. Smith, Perfetti, et al. Ames mutagenicity, structural alerts of carcinogenicity, Hansch QSAR parameters (ClogP, CMR, MgVol), tumor site concordance/multiplicity, and tumorigenicity rank in NTP 2-year rodent studies. Toxicol Res Appl 2018; 2: 1–14.

8. Smith, Perfetti. Comparison of carcinogenicity predictions by the Oncologic expert system, and National Center for Toxicological Research liver cancer database (NCTRlcdb) with NTP two-year rodent study tumorigenicity results. Toxicol Res Appl 2018b; 2: 1–11.

9. Smith, Perfetti. The ‘false positive’ conundrum in the NTP 2-year rodent cancer study database. Toxicol Res Appl 2018c; 2: 1–13.

10. Greenfield, Ellwein, Cohen SM. A general probabilistic model of carcinogenesis: analysis of experimental urinary bladder cancer. Carcinogenesis 1984; 5(4): 437–445.

11. Cohen, Ellwein. Cell proliferation in carcinogenesis. Science 1990; 249(4972): 1007–1011.

12. Cohen, Ellwein. Genetic errors, cell proliferation, and carcinogenesis. Cancer Res 1991; 51: 6493–6505.

13. Cohen, Purtillo, Ellwein. Pivotal role of increased cell proliferation in human carcinogenesis. Mod Pathol 1991; 4: 371–382.

14. Moolgavkar, Knudson. Mutation and cancer: a model for human carcinogenesis. J Natl Cancer Inst 1981; 66: 1037–1052.

15. Ames, Gold. Too many rodent carcinogens: mitogenesis increases mutagenesis. Science 1990; 249(4972): 970–971.

16. Tomasetti, Vogelstein. Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 2015; 347(6217): 78–81.

17. Armitage, Doll. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer 1954; 8(1): 1–12.

18. Ohno. ICH guidelines—implementation of the 3Rs (refinement, reduction, and replacement): Incorporating best scientific practices into the regulatory process. ILAR J 2002; 43(S1): S95–S98.

19. Herceg, Lambert M-P, van Veldhoven, et al. Towards incorporating epigenetic mechanisms into carcinogen identification and evaluation. Carcinogenesis 2013; 34(9): 1955–1967.