Abstract
This paper considers the impacts of various patterns of differential or excess mortality on the biological and statistical interpretation of 2-year rodent carcinogenicity studies. It provides suggestions on experimental design that are intended to maximize the value of such studies for carcinogenic risk assessment. Specifically, it recommends dose reduction, possibly to the level of dose cessation, when biologically feasible and considers the merits of termination of the entire study as alternatives to the commonly employed strategy of terminating particular dose groups. It then recommends statistical analysis modifications that are appropriate when these suggestions on experimental design are adopted. One of the recommended modifications is a new statistical test to determine whether a dose group exceeds the maximum tolerated dose (MTD) on the basis of mortality. While the authors provide recommendations for the most commonly occurring exigencies, they acknowledge the need for and strongly support the practice of active engagement of the appropriate regulatory agency, e.g., the FDA, prior to any action.
Introduction
The assessment of the human safety of a pharmaceutical often includes the study of carcinogenic risk in 2-year rodent bioassays. The biological premise for this testing is that exposure for long duration at up to maximum tolerated doses in a relatively small number of animals will be informative about the risks of lower doses and shorter exposures in humans. Consequently, the standard designs of these studies employ lifetime exposures at up to maximum tolerated doses using sample sizes of at least 50 animals per dose per sex. In order to maximize statistical power, trend tests are commonly employed in the analysis of tumor incidence; an age adjustment is commonly incorporated in order to avoid bias.
Interpretation of these studies becomes more difficult on both biological and statistical grounds when the treatment groups differ substantially in their mortality rates and/or the mortality rates are extremely high.
Biologically, high mortality rates in and of themselves have the obvious effect of limiting full lifetime assessment of the treatment in affected groups. When high mortality rates are coupled with tumor findings, interpretation becomes even more problematic; the challenge, as discussed in the next paragraph, is then to decide whether the tumors are relevant to human risk assessment.
In the most common situation where not all of the increased mortality is attributable to tumors, the dose is, by definition, above the MTD. Dosing at levels above the MTD is known to have the potential to perturb biochemical pathways and can result in tumor formation by nongenotoxic mechanisms. Current thinking considers tumors at such dose levels irrelevant to human risk assessment if and only if they occur at a high multiple of the anticipated human exposure and neither the tumors nor their non-neoplastic precursor lesions are observed at the MTD or below. Conversely, when all of the increased mortality is due to tumors, the dose cannot be deemed above the MTD based on mortality.
Unless there is other evidence that dosing has occurred above the MTD (e.g., a substantial body weight reduction) or there is a defensible argument that the tumors arise by a mechanism not applicable to humans, tumors arising in such circumstances are usually regarded as relevant to human risk assessment. In rare circumstances, such a conclusion can be mitigated (but not eliminated) by a lack of findings at lower doses when those doses result in exposures that are high multiples of the expected human exposure. Thus, determining whether the dose in question exceeds the MTD can be critical to the ultimate assessment of the compound’s carcinogenic potential.
From a statistical perspective, any study mortality has a detrimental effect on the power of tests for dose response in tumor incidence rates. However, continuing the study to its scheduled completion will neither affect the validity of these tests nor exacerbate the problem of reduced power unless the sample sizes become both very small and extremely unbalanced. Quantitatively, one reasonable rule of thumb using purely statistical considerations would be to refrain from terminating a group unless its sample size was less than 10 and some other group’s sample size was at least 5 times as large or its sample size was less than 5 and some other group’s sample size was at least 3.5 times as large.* If the study has dual control groups, they should be pooled when determining this ratio. Note that the sample sizes specified in this rule of thumb are smaller than the FDA currently recommends allowing.
If the study is continued to completion when this rule so dictates, the information gained in the last part of the study will usually provide at least a small amount of additional power even though relatively few animals remain alive. Thus, early termination of one or more groups in the presence of high mortality is advisable for statistical reasons only under conditions that can never arise under current FDA policy and would occur in practice only rarely even under a more statistically optimal policy. It is usually contemplated exclusively on biological grounds, i.e., for the prevention of lost tissue due to autolysis.
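The purely statistical rule of thumb stated above can be written as a short check. This is a minimal sketch, not a validated tool: the function names `pool_controls` and `termination_statistically_justified` are hypothetical, and the sketch assumes that "sample size" means the number of animals currently surviving in each group.

```python
def pool_controls(control_group_sizes):
    """Pool dual control groups into one count, as the rule of thumb requires."""
    return sum(control_group_sizes)


def termination_statistically_justified(group_n, other_group_ns):
    """Return True only if the rule of thumb would permit terminating the group.

    group_n        -- surviving animals in the group under consideration
    other_group_ns -- surviving animals in every other group, with dual
                      controls already pooled into a single entry
    """
    largest_other = max(other_group_ns)
    # Condition 1: fewer than 10 animals and some other group at least 5 times as large.
    small_and_unbalanced = group_n < 10 and largest_other >= 5 * group_n
    # Condition 2: fewer than 5 animals and some other group at least 3.5 times as large.
    tiny_and_unbalanced = group_n < 5 and largest_other >= 3.5 * group_n
    return small_and_unbalanced or tiny_and_unbalanced
```

For example, a high dose group with 8 survivors would not meet the rule against dual control groups of 20 and 22 animals taken separately, but would meet it once the controls are pooled to 42, which is why the pooling step matters.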
Below we consider the statistical ramifications of differential or excess mortality. We propose alternatives to early termination of particular dose groups in certain situations and suggest modifications to the standard statistical analyses when the various design modifications are employed.
We discuss 4 situations involving increased mortality occurring in:
only the high dose group
the high dose and other treated groups (but not the controls)
a treated group or groups other than the high dose (but not the controls)
the controls and possibly one or more treated groups
The design modifications we consider are two different dose reduction strategies (one of which includes the possibility of dose cessation), treatment group termination with or without histological examination, and study termination.
All discussion applies to 2-year bioassays in rats and mice. The understanding throughout is that the male and female data from these studies are analyzed separately, and thus recommendations are specific to the sex(es) with increased mortality issues.
In all cases, increased mortality becomes problematic only when the absolute number of animals (not the percent surviving) in one or more groups becomes too small. Thus, the percentage of mortality that can be tolerated in a given study depends on the initial sample sizes of the treatment groups.
Increased Mortality in Only the High-Dose Group
In this situation, the 3 issues listed here will be paramount in the analysis and interpretation of the study. Consequently, our design modification recommendations are intended to maximize statistical power to determine whether:
(a) the incidence of tumors is increased in the high dose,
(b) the incidence of tumors is increased in the mid dose, and
(c) the high dose is above the MTD based on mortality considerations.
In (b) above and throughout the rest of this paper, we use the term “mid-dose” to represent the second highest dose, regardless of how many groups are present in the study.
The Design Modification Recommendations
We recommend that, when biologically feasible, the high dose be reduced to a level no lower than that of the mid dose and the animals be kept on study to the scheduled terminal sacrifice. Such an attempt to “save” the high dose is warranted even if the excess mortality occurs extremely early in the study. This dose reduction strategy maximizes animal exposure to treatment and preserves the rank ordering of the doses. Maximizing animal exposure works to ensure biological validity of the study. Preserving the rank ordering of the doses justifies use of trend testing methodology across all dose groups, which is particularly important in situations where the high dose is ultimately judged not to exceed the MTD.
If reducing the dose as described in the preceding paragraph fails to satisfactorily modulate the high dose mortality, or if dose reduction is not attempted because the study directors believe a priori that it will fail, the recommended design modification depends on how far the study has progressed.
If it is early enough in the study that the excess mortality clearly was not caused by tumors, then the high dose can be assumed to be toxic enough to exceed the MTD. In this case, we endorse the common practice of terminating only the high dose and doing so without histological evaluation, thereby obviating the need for the analysis of high dose tumor incidence. Reasonable scientists will disagree on how far into the study they would remain comfortable with this strategy. At its core, this is a biological decision. For what it is worth, we offer our observation that an upper limit for “early” in this context seems to typically fall somewhere in the range of 12–15 months after study start. If a group must be terminated early, doing it early enough to justify bypassing histological evaluation is least problematic and least controversial in terms of analysis, interpretation, and design modification issues. To accomplish this, it is crucial to identify doses above the MTD as early in the study as possible in order to allow attempts at dose reduction (to see if mortality can be stabilized) to be completed in an expeditious manner.
If the study has progressed to a point where it is too late to terminate the high dose group without histological evaluation, then consideration should be given to a more drastic dose reduction to a level below that of the mid dose (perhaps even to zero). If this action is taken, the dose ranking is usually destroyed, and the most defensible strategy for the analysis of high dose tumor incidence is the two-group comparison of high dose versus controls. However, it might be possible in some cases to provide biological justification that the dose ranking is still intact based on lifetime dose, thereby preserving the validity of a trend testing approach.
If both of the above dose reduction strategies are either rejected or tried without success, the only remaining design modification choices are early termination of either just the high dose or all groups (in that sex). Early termination of the high dose (or any group) after 12–15 months but prior to the other groups creates problems, sometimes extreme, with the statistical analysis in general and with the trend test in particular. The strongly adverse effect on power is described in some detail in the next paragraph. On the other hand, early termination of all groups foregoes the potential information in the remaining weeks of exposure in the groups below the high dose. If this decision point arises extremely late in the study, so that very little time remains, perhaps the statistical benefits of sacrificing all groups at the same time might outweigh the potential information to be gained from lower doses by terminating only the high dose. Although there are no rigorous criteria to aid this choice, one reasonable rule of thumb might be to terminate all groups during or after Week 100 and to terminate just the high dose before Week 100.
If the decision is made to terminate the high dose group after 12–15 months but earlier than the other groups, either before or after trying one or both of the dose reduction strategies described above, we recommend sacrificing some control animals at the same time the high dose is terminated. The appropriate number of control animals is not clear cut, but one reasonable rule of thumb is the lesser of 12 or the number of high dose animals remaining just prior to the group’s termination. This is necessary to maximize the statistical power of the analysis of high dose tumor incidence. We note, however, that many (perhaps most) tumor types will still have insufficient power despite this action. For example, suppose that we are dealing with the extreme case of an old age tumor which rarely appears before Week 95, and suppose that there are 15 animals remaining in the high dose at Week 95.
If we sacrifice the 15 high dose animals and 12 controls at that time, none of the animals that died earlier provide any information about this tumor type. For this design modification, our recommended statistical analysis, described below, is that high dose tumor incidence should be assessed via a two-group comparison versus controls. This is equivalent to doing a study with sample sizes of 15 and 12. Obviously, there would be almost no power for detecting an increase in the incidence of this tumor type in the high dose. Tumor types that have a greater frequency of early onset would have greater effective sample sizes and hence greater power; and it is these tumor types whose analyses can be helped somewhat by sacrificing some control animals at the same time that the high dose is terminated, as otherwise the high dose animals sacrificed at the group’s termination would contain no statistically usable information for the two-group comparison. But even for these tumor types, the effective sample sizes would still be smaller than usual, and power is still hampered by the necessity for a two-group comparison rather than a trend test. These severe power issues are the main reason why we regard this option (early termination of just one group) as at best a last resort to be used only if none of the other alternatives are at all feasible.
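The old-age tumor example above can be explored numerically. The sketch below is an illustration under stated assumptions, not the analysis recommended in this paper: it substitutes an unadjusted one-sided Fisher exact test for the age-adjusted two-group comparison, all function names are hypothetical, and the tumor rates passed to `power_two_group` are invented inputs chosen by the user.

```python
from math import comb


def controls_to_sacrifice(high_dose_remaining, cap=12):
    """Rule of thumb from the text: the lesser of 12 or the high dose animals remaining."""
    return min(cap, high_dose_remaining)


def fisher_one_sided_p(x_trt, n_trt, x_ctl, n_ctl):
    """Upper-tail hypergeometric p-value for an increase in the treated group."""
    total = x_trt + x_ctl
    upper = min(n_trt, total)
    tail = sum(comb(n_trt, k) * comb(n_ctl, total - k) for k in range(x_trt, upper + 1))
    return tail / comb(n_trt + n_ctl, total)


def binom_pmf(k, n, p):
    """Binomial probability of k tumor-bearing animals out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def power_two_group(n_trt, n_ctl, p_trt, p_ctl, alpha=0.05):
    """Exact power: total probability of all outcomes that the test rejects."""
    power = 0.0
    for x_t in range(n_trt + 1):
        for x_c in range(n_ctl + 1):
            if fisher_one_sided_p(x_t, n_trt, x_c, n_ctl) <= alpha:
                power += binom_pmf(x_t, n_trt, p_trt) * binom_pmf(x_c, n_ctl, p_ctl)
    return power


# Hypothetical usage: 15 high dose animals remain, so 12 controls are sacrificed.
n_ctl = controls_to_sacrifice(15)
example_power = power_two_group(15, n_ctl, p_trt=0.30, p_ctl=0.05)
```

Comparing `example_power` with `power_two_group(50, 50, 0.30, 0.05)` for the same assumed rates gives a rough sense of how much is lost when the effective sample sizes shrink to 15 and 12.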
The reduction in the number of control animals going the full two years via the sacrifice of some controls to coincide with the early termination of the high dose will usually not be problematic for the analysis of tumor incidence at the (lower) dose levels that receive the full two years of exposure, especially in cases where two control groups are maintained. While statistical theory mandates treating the two control groups as one in all statistical analyses, there is no harm in sacrificing animals so as to equalize the remaining sample sizes of each of the two groups. In studies with one control group equal in sample size to the treated groups, a bit more care is required in deciding how many control animals to sacrifice early, as there might be an occasional situation where the potential tradeoff between power for assessing tumor incidence at the high dose and power for assessing tumor incidence at lower doses turns out to be of practical importance.
The Analysis Modification Recommendations
To provide some context, in the more common situation where excess high dose mortality is not an issue, the almost universally used procedure is to perform the analysis of high dose tumor incidence using trend testing methodology and to regard this as the primary analysis. In such situations, there is no reason to perform any analysis to determine whether the high dose exceeds the MTD due to mortality, and the analysis of mid dose tumor incidence is typically done as a follow-up trend test only for those tumor types where significantly increased tumor incidence was detected in the high dose group.
In situations where high-dose mortality is an issue, we recommend the following modifications to the analysis.
Both the high dose and the mid dose should be analyzed for increased incidence for every tumor type. When the high dose is above the MTD, the analysis of the mid dose is the primary analysis of tumor incidence, and analysis of the high dose plays just a secondary role. When the high dose is not above the MTD, the analysis of the high dose retains its usual role as the primary analysis of tumor incidence. Even in this situation, we recommend performing a secondary analysis to detect increased tumor incidence in the mid dose for all tumor types rather than just the ones where incidence was significantly increased in the high dose. The reason is that the determination as to whether or not the high dose is above the MTD is very often not clear cut, and if there is any ambiguity at all about this point, it can sometimes be helpful to be able to exhibit a completely clean mid dose. Regarding the implementation of these analyses, a trend test is still always recommended for the analysis of mid dose tumor incidence. The high dose should also be analyzed for tumor incidence using a trend test unless the group was terminated early or the dose was reduced sharply enough to destroy the dose ranking. In either of the latter situations, a two-group comparison of the high dose versus controls is recommended.
We recommend the use of an analysis to determine whether or not the mortality and tumor patterns indicate that the high dose is above the MTD. This analysis should be conducted (one analysis per sex, not a separate analysis for each tumor type) in the same manner as the usual survival analysis, except that only deaths of animals without any fatal tumors in any site should be counted as “real” (uncensored) events. Note that this requires lethality information even in sites such as the skin and mammary gland, where it might not normally be provided. We have never seen this analysis proposed previously and believe that this is a new idea. If this test produces a significant result, it provides statistical evidence that the high dose has exceeded the MTD based on mortality considerations. The justification for this claim is that a significant increase in deaths that are not tumor related implies almost by definition that the dose is above the MTD. Conversely, tumor-related deaths cannot be used as a basis for any inference about the MTD; the tumors that killed the animals might have been caused by either above-MTD toxicity or relevant carcinogenic mechanisms. It is worth repeating that MTD information can be critical to the study director in deciding the relevance of high dose tumor findings for human risk.
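The event recoding at the heart of this test can be sketched as follows. This is a minimal illustration under assumptions, not a definitive implementation: it uses a plain two-group log-rank test (high dose versus controls) with a two-sided p-value as a stand-in for "the usual survival analysis", and the `Animal` record, its field names, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from math import erfc, sqrt


@dataclass
class Animal:
    group: str         # e.g. "control" or "high"
    week: int          # week of death or sacrifice
    died: bool         # False for a scheduled or early sacrifice
    fatal_tumor: bool  # True if a tumor at any site was judged fatal


def is_event(animal):
    """Only deaths of animals with no fatal tumor at any site count as events."""
    return animal.died and not animal.fatal_tumor


def mtd_mortality_logrank(animals, control="control", treated="high"):
    """Return (observed minus expected events in treated, chi-square, two-sided p)."""
    data = [a for a in animals if a.group in (control, treated)]
    event_weeks = sorted({a.week for a in data if is_event(a)})
    excess, variance = 0.0, 0.0
    for t in event_weeks:
        at_risk = [a for a in data if a.week >= t]
        n = len(at_risk)
        n_trt = sum(a.group == treated for a in at_risk)
        d = sum(is_event(a) for a in at_risk if a.week == t)
        d_trt = sum(is_event(a) for a in at_risk if a.week == t and a.group == treated)
        excess += d_trt - d * n_trt / n
        if n > 1:
            # Hypergeometric variance, which handles tied event times.
            variance += d * (n_trt / n) * (1 - n_trt / n) * (n - d) / (n - 1)
    if variance == 0:
        return excess, 0.0, 1.0
    chi2 = excess**2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return excess, chi2, erfc(sqrt(chi2 / 2))
```

Animals that die with a fatal tumor are treated exactly like sacrificed animals: they leave the risk set at their week of death without contributing an event. A positive first return value together with a small p-value would point toward excess non-tumor mortality in the treated group.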
Increased Mortality in the High Dose and Other Treated Groups (but not the Controls)
This situation is very similar to the case above. One needs to decide which of the various design changes (mild dose reduction to a level no lower than that of the next highest dose; treatment group termination early enough to forego histological examination; drastic dose reduction below the level of the next highest dose, which includes the possibility of dose cessation; treatment group termination with histological examination; or study termination) to employ and (for all choices except study termination) at which doses. The analysis options remain as above.
Increased Mortality in a Treated Group or Groups Other than the High Dose (but not the Controls)
This scenario presents the paradoxical situation where mortality is not dose dependent. Because the high dose mortality is unaffected, we support the common practice in which such studies are run to completion without any intervention. The rationale is that the lack of excess mortality in the high dose group allows for a biologically valid assessment of carcinogenic risk at the best estimate of the maximum tolerated dose. However, the reduction of any dose to a level no lower than the next lower dose can cause no harm and would be justified whenever this might substantially improve the final survival numbers.
Increased Mortality in the Controls Only or in the Controls and One or More Treated Groups
High control group mortality is relatively rare but, depending on when the mortality accelerates, can have very serious implications. When it occurs early, doubt regarding study validity is almost unavoidable. When it occurs relatively late, partial information on the carcinogenic risk of the compound may still be obtainable. Early termination of the entire study is recommended in most such situations. Continuation of the study should be considered only in the case where the number of surviving animals is balanced across all groups (including the controls, pooled together if there are two of them), so that a small amount of additional exposure to the compound can be achieved without risking the validity of the statistical analysis. In such a case, the study should be frequently monitored for continued sample size balance across the treatment groups.
The FDA Guidelines
The section of the Guidelines pertaining to early termination is attached below. It is our interpretation that these guidelines are motivated by biological considerations rather than statistical ones. The recommendations on when to terminate dose groups in terms of percentage surviving are, in our view, intentionally vague in order to be applicable to a fairly wide range of sample sizes. We note that they seem to be written in the context of a study design that includes only a single control group equal in sample size to the treatment groups.
Conclusions
We have proposed several recommendations for dealing with differential or excess mortality in 2-year rodent carcinogenicity studies. In general, we discourage early termination of dose groups if other options are biologically feasible. Our motivation is maximization of study utility for carcinogenic risk assessment. The reader will note that we have attempted to address the most common patterns of differential or excess mortality in a systematic fashion. We hope we have provided a useful conceptualization of the problem and some practical suggestions for addressing it. We do, however, acknowledge that no aspect of the conduct of an experiment as complex as the bioassay can be addressed by a simple algorithm. Consequently, we agree fully with the suggestion presented in the FDA Guidelines that sponsors confer with the agency when confronted with a study presenting mortality related issues.
FDA Guidelines
The following is from the FDA Guidance for Industry: Statistical Aspects of the Design, Analysis, and Interpretation of Chronic Rodent Carcinogenicity Studies of Pharmaceuticals (Draft May 2001):
“However, early termination of a study for mortality, even if unavoidable, may render a study uninformative, leaving too few animals living long enough to represent adequate exposure to the chemical. This is especially important in the evaluation of the design validity of a negative study. In general, a 50 percent survival rate to weeks 80 to 90 of the 50 initial animals in any treatment group is considered adequate. The percentage can be lower or higher if the number of animals used in each treatment/sex group is larger or smaller than 50, but between 20 to 30 animals should be still alive during these weeks (Lin and Ali 1994). Whether a study could be terminated before the scheduled termination date if the survival of any treatment group goes below 50 percent or 20 to 30 surviving animals (provided that sufficient numbers of animals were exposed through week 80 to 90) depends on the situation. For example, there is no reason to stop a study if the survival of only the low-dose group and/or the medium-dose group is altered, because the control vs. high-dose comparison will still be informative. If the survival of the high-dose group falls below 50% or 20–30 surviving animals after week 80, the study should be continued, either stopping dosing of animals in the high dose or terminating only the high dose group, because the comparison of at least the control and low/middle doses would still be informative (the high dose comparison would depend on the situation). A study could be terminated early if the survival of the control group (or groups) goes below 50 percent or 20–30 surviving animals after weeks 80 to 90 as the later comparisons would not be informative. Others have suggested, for example, that an experiment be terminated early when the survival of the control or low-dose group is reduced to 20–25 percent of the original number of animals. 
If the mortality is increased only in the high-dose group, consideration can be given to early termination of that group (OFR 1985). Because early study termination poses complex problems, it is strongly recommended that a decision to terminate a study or a study group early be made with input from the Center and the medical division responsible for the review of the associated application.
If in discussions with CDER, the Center approves the early termination of a study under this recommendation, the study’s sponsor can be assured that the study will be considered by the Center as valid in terms of adequate duration of drug exposure.”
Footnotes
* Although this rule of thumb was included only in a fairly late revision at the request of a reviewer, an even later reviewer strongly suggested that we also provide detail to justify it. After considering the length and complexity of a full explanation, we have decided not to provide one. Briefly, the combination of at least one very small sample size and one comparatively large one leads to both power and bias issues for the analysis, regardless of whether the best analysis turns out (see ensuing discussion) to be a trend test or a pairwise comparison against controls.
