Abstract
Benchmark Dose Model software (BMDS), developed by the U.S. Environmental Protection Agency, involves a growing suite of models and decision rules now widely applied to assess noncancer and cancer risk, yet its statistical performance has never been examined systematically. As typically applied, BMDS also ignores the possibility of reduced risk at low doses (“hormesis”). A simpler, proposed Generic Hockey-Stick (GHS) model also estimates benchmark dose and potency, and additionally characterizes and tests objectively for hormetic trend. Using 100 simulated dichotomous-data sets (5 dose groups, 50 animals/group), sampled from each of seven risk functions, GHS estimators performed about as well or better than BMDS estimators, and a surprising observation was that BMDS mis-specified all of six non-hormetic sampled risk functions most or all of the time. When applied to data on rodent tumors induced by the genotoxic chemical carcinogen anthraquinone (AQ), the GHS model yielded significantly negative estimates of net potency exhibited by the combined rodent data, suggesting that—consistent with the anti-leukemogenic properties of AQ and structurally similar quinones—environmental AQ exposures do not likely increase net cancer risk. In addition to its simplicity and flexibility, the GHS approach offers a unified, consistent approach to quantifying environmental chemical risk.
1. INTRODUCTION
Dose-response assessment is an essential step in characterizing the extent to which environmental contaminants pose human health risks (NRC, 1983, 1994). Benchmark Dose Model software (BMDS) and related procedures (EPA 2000a, 2010a,b) comprise a suite of dose-response models and a related set of quasi-statistical and decision rules developed by the U.S. Environmental Protection Agency (EPA) that are increasingly broadly applied as an integral component of regulatory noncancer and cancer risk assessment for environmental chemicals. In particular, this approach is used to estimate “benchmark dose” (BMD) and corresponding low-dose dose-response slope, or “potency,” exhibited by a set of experimental toxicity data. This multi-model approach is first summarized, some concerns about it are discussed in view of constraints on low-dose dose-response information typically available from toxicity studies, and then its performance when applied to simulated sets of quantal (i.e., dichotomous-response) dose-response data is compared to that of a simpler “Generalized Hockey-Stick” (GHS) model proposed to estimate BMD and potency. Results of these analyses are presented, together with an application of the GHS model, to estimate potential human cancer risk posed by an environmental chemical contaminant known to both increase and reduce cancer incidence in chronically exposed rodents.
1.1 BMDS Modeling Procedure
The BMDS approach is used to characterize toxicity risk for both non-cancer and cancer endpoints based on quantitative analysis of either quantal or continuous dose-response data, and is typically applied to identify or characterize toxicity using data that exhibit a significantly elevated toxicity response and, moreover, represent the most sensitive identified endpoint, species, sex, and strain (EPA 2000a, 2010a,b). The resulting characterization is expressed in terms of estimated BMD and its lower (typically one-tail 95%) confidence bound (BMDL), both of which pertain to a user-specified level of “benchmark response” (BMR), in excess of background risk, that is estimated to lie within or near the observed range of response. For many types of quantal toxicity data, BMR = 0.1 typically, by default, and corresponding BMD and BMDL estimates are denoted BMD10 and BMDL10 (EPA 2000a, 2010b); for brevity, these estimates are denoted herein as d10 and
1.2 Some Drawbacks of the BMDS Approach
The six-step BMD “decision tree” recommended to evaluate BMDS modeling output does not address or control for procedure-wise error rates that result from its application of multiple statistical procedures. For example, to evaluate output from fits of each of nine primary and/or seven alternative quantal BMDS dose-response models to quantal (dichotomous) data, the software provides multiple options for parameter choices, the decision tree recommends how to interpret the estimates and statistics generated in order to choose among the array of BMDL estimates generated (EPA 2000a, 2010a b). These guidelines address neither how well the recommended modeling procedure actually performs in terms of accuracy or coverage by estimators used in the procedure, nor whether simpler procedures might yield results of similar or greater reliability, nor how the application of multiple statistical tests to multiple model-specific fits affects the statistical interpretation of modeling results obtained. Users are required simply to assume that returned results are meaningful and reliable when applied to relatively small sample sizes like those typical of toxicity data sets.
The BMDS approach relies exclusively on numerical methods to obtain maximum likelihood estimates (MLEs) that have valid and optimal asymptotic properties, but that may be biased when applied to realistic data sets involving relatively few observations. EPA (2000b) recommended “If asymptotic normality cannot be assumed either because the sample size is too small … or because MLEs were not (or could not) be obtained, bootstrap methods should be employed [… as] a versatile nonparametric method that can be used in a wide variety of situations to obtain the sampling distribution of any model parameter.” Consistent with this recommendation, particularly for small sample sizes (e.g., less than ∼30 to 50), parametric bootstrap methods (see DiCiccio and Efron 1996) have been applied as an alternative approach to estimate parameters of quantal dose-response response models, (Portier and Hoel 1983; Bailer and Portier 1988; Morris 1988; Smith and Sielkin 1988; Foster and Bischof 1991; Bailer and Smith 1994; Al-Saidy et al. 2003; Nitcheva et al. 2007; Swartout 2007; Zhu et al. 2007). Bootstrap methods remain unavailable through BMDS to estimate model parameter values or confidence limits, as do modified profile likelihood methods that can also reduce bias and improve coverage accuracy for small samples (see, e.g., Barndorff-Nielsen 1983; Brazzale and Davison 2008).
None of models in the BMDS suite typically applied for routine potency or BMD estimation are particularly suited to characterize a hormetic dose-response relationship (i.e., one in which response at least initially declines with increasing dose). Rather, nearly all models assume that risk or detriment in excess of background increases monotonically with increasing dose. BMDS procedures include an analysis of residuals that should identify any clearly anomalous set of data that is inconsistent with this assumption of monotonically increasing risk. Although unconstrained polynomial models offered by BMDS for continuous or quantal data could be applied in any such case, these particular models are not typically considered in regulatory applications, in view of their essentially arbitrary ability to fit a curve to any pattern of data without reference to a plausible mechanistic or biological basis. To the extent there is no intrinsic plausibility of quantal BMDS models applied, the BMDS procedure amounts to an inefficient way to fit an arbitrary smooth curve and confidence bounds through binomially distributed data. To the extent that fits of meaningful BMDS models are affected by data at relatively high dose(s), the modeling effort is effectively “wasted” by addressing data that, by definition, are less relevant or irrelevant to the intended BMD and/or potency (i.e., low-dose dose-response) measure(s) of interest.
1.3 Generic Hockey-Stick (GHS) Model
By definition, only information about response at the lower end of the dose-response curve bears most clearly and directly on BMD and potency. Moreover, it is typically impossible from a statistical standpoint to rule out the possibility that the observed data were sampled from a dose-response relationship that either (1) contains a (non-zero) linear coefficient in dose, or (2) reflects a mixture of two or more functions (e.g., response patterns pertaining to two or more corresponding phenotypes and/or genotypes), one of which contains a (non-zero) linear coefficient in dose. For any risk function R(d) of dose d that is monotonically proportional to a polynomial in d, such as the traditional multistage risk model (Anderson et al. 1983), a linear term q d of that polynomial with q ≠ 0 must dominate in the limit as d → 0. Consequently, for data that at some dose(s) exhibit significantly increased risk above background, plausible BMD and potency estimates and their confidence bounds can always be obtained using a suitably generic “hockey-stick” (GHS) version of such a multistage-type risk model. By definition, such a model contains an unconstrained linear coefficient in dose combined with enough additional non-negatively constrained polynomial terms to estimate both BMD and potency aspects of increased risk.
The present study was undertaken to compare BMD and potency estimates obtained using a relatively simple GHS model (described in Methods) to those obtained using the BMDS approach, using quantal dose-response data simulated at specified doses from six specified models of net risk of extra response above corresponding assumed independent background rates of response. An additional “hormetic” risk function was also considered, simply to illustrate the relative flexibility of a GHS model, and also to characterize negative/hormetic dose-response relationships, and the straightforward approach it offers to test objectively for such possible negativity, either as an attribute of one specific data set, or as a net characteristic implied by a set of related data sets that exhibit a combination of significantly positive and significantly negative dose-response relationships. The latter GHS-model capability was illustrated specifically as described below.
1.4 GHS Estimation of Net Anthraquinone Cancer Risk
Anthraquinone (AQ) increased the incidence of several types of tumor in rats or mice chronically exposed by diet in a National Toxicology Program (NTP) bioassay, but also markedly reduced the incidence of mononuclear cell leukemia (MCL) in male and female rats (NTP 2005). This reduction was considered a direct effect of AQ (NTP 2005; pp. 83, 86, 94):
Several drugs are based on the AQ ring system, including the anthracycline glycosides doxorubricin and daunorubricin, which are used extensively in cancer chemotherapy as well as newer chemotherapeutic agents such as mitoxantrone…. The incidences of MCL were markedly reduced in exposed male and female rats. Although splenic toxicity is often correlated with reduced incidences of MCL …, it is unlikely that the mild nature of the lesions that occurred in the spleen in the current study could account for the dramatic decrease in incidences. This suggests that the reduction was due to a direct effect of AQ or its metabolite(s) on the development of MCL. Similar decreases have been observed in the 2-year studies of 1-amino-2,4-dibromoanthraquinone and emodin…. Decreased incidences of MCL in male and female rats were attributed to exposure to AQ.
It is noteworthy in this regard that, in 2009, NTP started using Harlan Sprague-Dawley rats for its future studies, due to health-related concerns about their F344N rat colony, including a high incidence of leukemia (King-Herbert and Thayer 2006; King-Herbert et al. 2010). However, EPA (2005) cancer risk assessment methods make no default assumption concerning tumor site-concordance between species; rather, tumor potency exhibited in available bioassay animal data is typically assumed to demonstrate potential cancer potency in humans.
AQ-related compounds can induce apoptotic cell killing by a variety of molecular mechanisms, including inhibition of protein kinase CK2 and/or topoisomerase II (Hartley et al. 1990; Sengupta 1993; De Moliner et al. 2003; Koceva-Chyta et al. 2005), and can directly inhibit human tumor cell proliferation (Cichewicz et al. 2004). It is particularly noteworthy that 1,4-diamino-substituted AQ antitumor agents are cytotoxic to human leukemic cells, and that dose-response kinetics for cell killing examined in detail for 1,5- and 1,8-diamino-substituted anthraquinones in these cells, and in LoVo cells (a human Caucasian colon adenocarcinoma cell line) and Chinese hamster ovary cells exposed to other substituted anthraquinones, clearly exhibit linear-no-threshold-like, “one-hit” kinetics of induced cell killing (Kimler and Cheng 1982; Drewinko et al. 1983; Hartley et al. 1990). Such evidence for linear cell-killing kinetics for AQ-related compounds was the basis for the present illustrative application of a GHS model to characterize observed AQ-induced reductions in rat MCL risk, in view of potential human cancer risk posed by environmental AQ exposures (CalEPA 2006). Specifically, this information was used to combine estimates of AQ-induced tumor induction and suppression in rodents to estimate the potential net carcinogenic potency of AQ to environmentally exposed humans.
2. METHODS
All calculations described were performed using Mathematica® 7.0.1 software (Wolfram 2010) and related RiskQ software (Bogen 2002).
2.1 GHS Model
The GHS model described extends an earlier approach developed to assess and model potential response reduction with increasing dose (Nascarella et al. 2009), by adding polynomial flexibility to a hockey-stick model of quantal-response data that provides quantitative estimates of corresponding BMD and potency and associated estimation errors. Specifically, the linearized multistage model (Anderson et al. 1983) was modified as follows. BMDS software implements a BMD version of the linearized multistage model referred to as the quantal Multistage Cancer (MC) model. Specifically, the GHS function 1 – exp(–Σ i qi di), i ∊ G for G = any combination of ≤g elements of {0, 1, …, g–1, g+1}, was used to model the risk or probability R(d) of quantal (i.e., dichotomous) response among ni individuals in the ith of g dose groups (including the control group) as a function of dose d. Extra risk A(d) over an assumed independent background risk p0 = R(0) was calculated using Abbot's correction as A(d) = [R(d)–p0]/(1–p0). In the multistage model, Max(i) = g–1 and all the exponentiated polynomial coefficients qi are constrained to be non-negative. In contrast, in the GHS model, Max(i) = g+1, and all coefficients qi are constrained to be non-negative, except for the linear (potency) coefficient, q1 (for convenience hereafter denoted simply as q), which is constrained only to ensure that R(d) ≥ 0 for all d ≤ Max(dj), where dj = denotes the jth dose (2 ≤ j ≤ g) included in the data set being fit. The GHS model thus can have a slope that is negative as d→0, and otherwise has somewhat greater flexibility than the MC model to reflect more abrupt (albeit, as with the MC model applied using its default assumptions, only monotonically increasing) nonlinearity in dose-response.
Parameters and confidence limits for the GHS model were calculated using a modified version of the method for fitting a multistage model to quantal data described previously (Bogen 1994; Bogen and Witschi 2002), whereby each transformed data set {dj, –ln(1–Rj)} is fit by completely analytic, nonnegativity-constrained weighted least-squares regressions of all polynomials assuming binomial sampling error, with the best fit defined as that fit (among all good fits obtained by this procedure) which yields the minimum value of chi-square, χ2 = Σ
j
(SR
j
)2 = Σ
j
[Rj – n
i
R(dj) + 1/2]2/{nj R(dj)[1–R(dj)]}, with respect to observed data {dj, Rj}, where R(dj) = predicted response at dose dj, and SR
j
= the jth standardized residual. The GHS model applies this method to all polynomials implied by sets G defined above, and (as the only GHS procedure that may involve numerical optimization, if necessary) solves one or more convex-polynomial roots to impose the additional constraint mentioned concerning the linear coefficient q. Good fits were defined as those with χ2 p-values >0.05 and Max(|SR
j
|) of < 2, where Max(|SR
j
|) = the maximum absolute squared residual (MASR). As with the linearized multistage model (Anderson et al. 1983), initially poor fits (if g > 2) were re-fit after sequential elimination of the highest dose group until a good fit was obtained. Each fitted GHS model yielded a direct estimate of q, whereupon d10 was calculated as the root in dose of BMR = 0.1 = A(d10). Distributions characterizing estimation error in q and d10, from which were calculated the upper one-tail 95% confidence limit (q*) on potency q and lower one-tail 95% confidence limit (
GHS fits could sometimes be improved (e.g., for data sets exhibiting a relatively steep positive or negative initial slope) by imposing the additional constraint that a good fit must include an estimated non-zero linear coefficient, in which case the GHS model was referred to as being “linearized.” Even the linearized GHS model can fail to provide a meaningful estimate of potency if data happen to exhibit a steeply negative dose-response. For quantal data points {dj, Rj} in such cases, an alternative “complementary linearized GHS” procedure instead fits a standard one-hit model R(d) = 1 – (1–r′)exp(–a d) to the complementary data {dj, 1–Rj}, and to then estimate the potency b of monotonic risk suppression as b = a r/(1–r) pertaining to the original data, where non-complementary background risk r is, in this case, estimated by r = 1–r′ (see Appendix).
In the context of the GHS model, potency q (or q*) is the value (or upper bound) of slope or derivative of A(d) with respect to d in the limit as d → 0. Conditional on a linear (one-hit) model such as A(d10) = BMR = 1 – exp(–q d10), and on q* > q > 0, it follows that d10 = –ln(1–BMR)/q and
2.2 Comparison of BMDS vs. GHS Estimates of BMD and Potency
The BMDS approach was implemented using EPA software and corresponding recommended “decision-tree” procedures for interpreting BMDS ouput for fits to quantal data (EPA 2000a, 2010a b), except that pertaining to visual inspection of plotted fits because this recommended step could not be automated. BMDS methodology was applied (in automated batch-file mode) conditional on model-specific default assumptions, to fit each set of simulated data to the following eight BMDS quantal models: quantal linear (QL), multistage cancer (MC), gamma (GM), logistic (LG) probit (PR), Weibull (WB), log-logistic (logLG), and log-probit (logPR). This subset of the current set of nine dichotomous models offered by BMDS was deemed adequate to characterize BMDS performance for a relatively large set of quantal models typically applied. The BMDS model subset used includes all six of the six non-hormetic risk models used to simulate quantal data (as described below), and excludes the dichotomous Hill model (added to BMDS in 2008) and seven “dichotomous alternative” models offered that differ from corresponding default dichotomous models only insofar as they estimate background in a different way (e.g., background dose instead of background risk). BMDS “decision-tree” steps 1–5 (EPA 2010b) were applied to each data set by: (1) setting BMR = 0.1, (2) fitting each of the eight models listed above (dropping the highest dose group), (3) retaining only “good” fits defined as those with χ2 goodness-of-fit p-values >0.1; (4) collecting the sorted BMDL and corresponding Akaike Information Criterion (AIC) values (BMDL j and AIC j , j = 1,…k with k ≤ 8) calculated for each jth retained fit; and finally (5) selecting BMDL1 = Min(BMDL j ) unless (a) BMDL1 was excluded as an “outlier” or (b) BMDLmin ∊ BMDL s (for any sorted BMDL subset specified by s = 1,…, m with m ≤ k) with BMDL m /BMDL1 ≤ 3, in which case the BMDL associated with Min(AIC s ) was selected, or if AIC s values were all equal, then the geometric mean of all BMDL s values was selected. In BMDS methodology, Step 5a is addressed, but not defined explicitly (EPA 2010b). To implement Step 5a, BMDL1 was defined as an outlier if a p-value of ≤0.001 was produced by nested F-test for outliers (Selvin 1995) comparing the variance of the set of log-transformed BMDL s values with vs. without BMDL1, provided that m ≥ 3 and Max(AIC s ) – AIC1 < 2. That is, if m < 3 or the latter AIC difference was ≥2, then BMDL1 was accepted as a potentially meaningful BMDL estimate. After applying the five steps described, the BMDS quantal model(s) associated with the selected BMDL value was (or were, with equal weight) recorded as the corresponding “BMDL-associated” risk model(s).
BMD and potency estimates obtained by the BMDS and GHS methods described above were compared for 100 sets of data simulated conditional on doses, background (p0) risks, numbers nj of exposed animals, and each of the seven quantal risk models summarized in Table 1, assuming binomial sampling error about model-specific predicted risks conditional on nj. As noted above, six of the seven risk models are quantal models specifically included in BMDS and the seventh is a hormetic quantal response model included to illustrate GHS-model flexibility. These comparisons addressed: difference (bias) between the arithmetic mean of each estimate made by obtaining a (for BMDS estimates, BMDL-associated) fit to simulated data in relation to its corresponding expected value listed in Table 1, corresponding standard error of each estimate, the significance of estimated bias as assessed by t-test adjusted (as indicated by a p-value denoted padj) for multiple (here, seven) independent comparisons using Hommel's modified Bonferroni-type procedure (Wright, 1992), and statistical coverage (a summary measure of performance) by the estimators q* and
Doses and risk models used to simulate quantal response data.a
A total of N = 50 animals was assumed in each dose group. exp(x) = ex for any x where e is Euler's constant; Φ = the cumulative standard normal distribution function; G(a, b, x) = the cumulative distribution function evaluated at x for a generalized gamma distribution with shape parameter a and scale parameter b. A total of n = 100 data sets, each ith set (for i = 1,…, n) containing the five data points {Doseij, Responseij} = {dij, nij*/N} (for j = 1,…,5), were simulated using each of the indicated models, where each of the simulated nij* values was sampled randomly and independently from a binomial distribution with parameters N and R(dij), using risk functions R(d) defined in column 5 each evaluated at corresponding doses d = dij. Models L and LQ have corresponding BMDS quantal dose-response model names: Quantal Linear (QL) and Multistage Cancer (MC), respectively.
q = potency (limiting value of increased risk per unit dose as dose approaches zero, assuming an independent background risk); a negative value indicates risk-reducing potency.
BMD = BMD10 = d10 = benchmark dose assuming a Benchmark Response (BMR) level of 0.10, defined as target excess risk assuming an independent background risk level p0, i.e., assuming BMR = 0.10 = [R(d10)–p0]/(1–p0).
GSH and BMDS estimates of d10 and
2.3 GHS Estimation of Net Cancer Potency of Anthraquinone
Absent adequate epidemiological data as reviewed by NTP (2005), net potential human cancer risk of AQ was estimated from NTP bioassay data involving: groups of 50 male and 50 female F344/N rats fed diets containing 0, 469, 938, or 1,875 ppm (for males 0, 20, 45, 90, and 180 mg/kg/day, and for females 0, 25, 50, 100, and 200 mg/kg/day) of AQ for 105 weeks; and groups of 50 male and 50 female B6C3F1 mice fed diets containing 0, 833, 2,500, or 7,500 ppm (for males 0, 90, 265, or 825 mg/kg/day, and for females 0, 80, 235, or 745 mg/kg/day) of AQ for 105 weeks. Survival of all groups of male rats was similar. Survival of exposed groups of female rats was significantly greater than that of corresponding control rats. Mean survival of AQ-exposed female mice equaled or exceeded that of control mice; survival of exposed male mice was reduced significantly only in the highest exposure group; and the highest exposed male mouse dose group also exhibited >3-fold more “natural” deaths than did male control mice (NTP 2005, pp. 8–9). Because survival was not reduced in the rats or female mice studied, and was not reduced in any dosed group of male mice except (significantly) in the highest dose group, time-to-tumor risk models were not considered relevant to characterizing BMD and potency based on the NTP (2005) bioassay data for AQ described. Instead, tumor data were adjusted for intercurrent mortality differences among dose groups, as is done for the “Poly3” test that is routinely applied by NTP to assess the significance of dose-related effects on tumor incidence (Bailer and Portier 1988). Tumor types analyzed were limited to those tumor sites, or combinations of tumor sites, reported by NTP (2005) to have a tumor incidence that was statistically significantly elevated (p < 0.05) in a trend-wise fashion, among those tumor types for which NTP concluded there was biologically meaningful evidence of a dose-related trend. Potencies for benign tumor types were not calculated if the tumor-specific incidence data indicated unambiguously that potency estimated for corresponding combined benign and malignant tumors would substantially dominate that estimated only for benign tumors.
The estimated potency distribution for AQ-reduction of MCL risk was fit using the complementary linearized GHS procedure (Section 2.1) for sex-specific MCL incidence data after adjusting for intercurrent mortality. Resulting estimated distributions of sex-specific potency were given equal weight, and the result was weighted equally with that for the one other significantly (positively) affected rat tumor type. Sex-specific potency distributions for the most elevated tumor type observed in mice were also weighted equally. Resulting species-specific distributions were weighted equally. For each species/sex/type-specific rodent potency q estimated for animals of weight W (kg), a corresponding human-equivalent (HE) potency (qHE) was estimated as qHE = q [(70 kg)/W]1/3 (Anderson et al. 1983; EPA 2005). Net AQ potency (in rodents or humans), aggregating over all tumor-suppressing and all tumor-inducing potencies, was estimated using a modification (see Appendix) of an approach previously described to estimate aggregate excess risk for nonthreshold, quantal, toxic end points caused by exposure to multiple non-hormetic agents, assuming independent actions and background risks (NRC 1994). This modification provides a general approach to calculating the net potency of any set of jointly induced and suppressed toxic risks, assuming independent corresponding background risks.
3. RESULTS
3.1 Comparison of BMDS and GHS Estimates of BMD and Potency
Table 2 summarizes BMD and potency estimation achieved by applying the BMDS procedure to 100 sets of response data at five doses that were simulated based on each of the seven assumed quantal risk models described in Table 1, which include six of the eight quantal BMDS risk models that were fit to each simulated data set, and a hormetic risk model that illustrates a dose-response pattern not typically addressed by any of the quantal BMDS models routinely applied in regulatory-compliance contexts. Thus, a total of 700 data sets were fit to each of eight BMDS quantal risk models, with convergence on d10 and
BMDS model output for 100 sets of response data at five doses simulated from different dose-response patterns.
The seven risk models (Mdata = L, LQ, PR, LG, WB, GM, or H) used to generate simulated data, the doses, the parameters (q and d10) and the corresponding model-dependent expected parameter values and corresponding units are defined in Table 1. EZ = the expected value of parameter Z. Symbols
Out of 100 Mdata-specific sets of simulated data, n = the number of data sets for which the BMDS procedure yielded convergent estimates for parameters q and d10.
RMCV = root mean square (RMS) coefficient of variation = 100%(RMSE/EZ), where RMSE = RMS error estimated as Sqrt[Σ{(Zi–EZ)2/[(ni–1)]}.
Padj = p-value from t-test of difference between
Coverage = probability that Z* is ≥EZ or is ≤EZ for Z=q or Z=d10, respectively; value in parentheses = Prob(q* ≤ 0).
Table 3 summarizes the degree to which BMDL-associated risk models identified by BMDS tended to mis-specify the correct non-hormetic risk model from which simulated data were actually sampled. Rates of correct risk-model specification, shown in the shaded diagonal array of cells in Table 3, were all <33%, with an average value (±1 SD) of 12% ± 13%, and was 0% for quantal data simulated using a Weibull (WB) model. Nearly all (65) convergent BMDL-associated quantal models fit to hormetic data were approximately evenly divided between multistage cancer (MC) and gamma (GM) models.
Percent agreement between the models used to generate simulated data, and BMDL-associated risk models identified by BMDS.
The seven risk models (Mdata = L, LQ, PR, LG, WB, GM, or H) used to generate simulated data, the doses, the parameters (q and d10) and the corresponding model-dependent expected parameter values and corresponding units are defined in Table 1.
Out of 100 Mdata-specific sets of simulated data, n = the number of data sets for which the BMDS procedure yielded convergent estimates for parameters q and d10.
BMDS dichotomous dose-response model types (Mfit) used to fit to each set of simulated data: QL = Quantal linear, MC = Multistage cancer, PR = Probit, LG = Logistic, WB = Weibull, GM = Gamma, logPR = log-Probit, and logLG = log-Logistic. Percentage values (Pi) listed across all model types in each of the seven Mdata-specific rows are defined as follows, for j = 1,…,7: P j = 100% mj/n, where mj = the number of fits involving the jth Mdata type. For this calculation, Mdata values of L and LQ were assumed to be equivalent to Mfit values of QL and MS, respectively. Row-specific Pi values may sum to >100% due to rounding. BMDS fits, and associated P-values, may reflect either multiple Mfit types that when optimized had an equivalent form, or a geometric mean of Mfit-specific fits that all yielded similar BMDL estimates within a 3-fold factor (see Methods). In each case k total models contributed to a “best fit” Mfit to data simulated from the jth Mdata type, each of the k contributing model types was given a weight of 1/k when calculating mj and Pj. The difference between 100.0 and each Pi value listed in bold typeface indicates the magnitude of non-concordance observed between corresponding values of Mdata and Mfit.
Performance of BMD and potency estimation using the GHS approach is summarized in Table 4. The GHS approach yielded (as defined above) plausibly unbiased estimates of q and plausibly adequate q* coverage for four and six, respectively, of the seven risk models (including the hormetic model) used to simulate quantal data, and yielded plausibly unbiased estimates of d10 and plausibly adequate
GHS model output for 100 sets of response data at five doses simulated from different dose-response patterns.
Risk models (L, LQ, PR, LG, WB, GM, H), doses, the number of animals assumed in each dose group, parameters (q and d10) and corresponding model-dependent expected values of the model parameters (and corresponding parameter units) are all defined in Table 1. Definitions of Z, EZ, RMCV and Padj are given in Table 2. P-values < 10−15 are listed as 0. The GHS model was fit to a total of 700 data sets, consisting of 100 sets simulated assuming each of the seven indicated dose-response function shapes, to obtain the listed corresponding estimates for parameters q and d10. Estimates
Coverage = probability that Z* is ≥EZ or is ≤EZ for Z=q or Z=d10, respectively; value shown in parentheses is Prob(q* ≤ 0).
Figure 1 compares d10 and

GHS estimates of (
The top and bottom plots of Figure 2 compare d10 and

Comparison of (
3.2 GHS Estimate of Net Anthraquinone Cancer Potency
Tumor types deemed by NTP (2005) to have been affected by chronic dietary exposure of rats and mice to AQ are listed in Table 5, together with corresponding estimates of rodent potency and human-equivalent GHS potency, and associated confidence bounds (q* and HE q*, respectively). These tumor types include mononuclear cell leukemia (MCL) in male and female rats; renal cell adenoma or carcinoma (RTAC) in female rats; hepatoblastoma (benign or malignant, HB), hepatocellular carcinoma (HC), and/or hepatocellular adenoma or carcinoma (HAC) in male and/or female mice. The GHS estimates of AQ potency for suppressing MCL in male and female rats are summarized in Figure 3. The estimated MCL potencies for male and female rats are both significantly negative (p < 0.01), which illustrates the ability of the GHS model to provide an objective statistical test of whether an agent exhibits a truly negative initial dose-response trend.
GHS-model estimates of AQ tumorigenic potency in rodents.
MCL = mononuclear cell leukemia, RTAC = renal cell adenoma or carcinoma, HB = hepatoblastoma (benign or malignant), HC = hepatocellular carcinoma, HAC = hepatocellular adenoma or carcinoma.
q = tumorigenic potency (limit on increased risk per unit dose d as d→0); columns 3 and 5 list estimated expected values of q; asterisk (*) indicates 1-tail 95% confidence bound(s); HE q-subscript indicates human-equivalent potency, derived assuming qHE = q(70 kg/w)1/3 where w = adult animal body weight in kg, and w was assumed to be 0.445 and 0.280 kg, and to be 0.050 and 0.055 kg, for male and female rats, and for male and female mice, respectively, used in the NTP (2005) rodent cancer bioassay of AQ.

Cumulative distribution of anti-tumorigenic AQ potency at suppressing spontaneous mononuclear cell leukemia in male and female F344/N rats, based on GHS-analysis of corresponding NTP (2005) cancer bioassay data.
The initial GHS fit to all female rat RTAC data yielded a predicted control incidence rate (7.5%) that, assuming binomial sampling error, is statistically inconsistent (p = 0.00061) with the corresponding historical control rate of 1/901 (NTP 2005, Table B4a). The listed RTAC potency, obtained after deleting the two highest dose groups, yields a GHS fit predicting 0% incidence in the control group, which is statistically consistent with the historical data. The female mouse control incidence rate of 6/48 (12.5%) of HAC or HB (or HAC alone) was significantly less than the corresponding rate of 273/852 (32.0%) exhibited in Battelle Columbus Laboratories’ historical control data (NTP 2005, Table D4a) among untreated female B6C3F1 mice (p = 0.0021, by 2-tail Fischer exact test). The latter control data exhibited identical incidence rates and range for HAC alone or for combined HAC or HB. The control rate listed for these tumors is also less than the historical control rate for female B6C3F1 mice in all NTP contract laboratories, as reported by NTP (2006) (444/1601 = 27.7% for HAC or HB, 443/1601 = 27.7% for HAC alone, p = 0.011 by 2-tail Fischer exact test). Thus, it can be argued that potency information for AQ-induced HAC or HB in female B6C3F1 mice should not be used to estimate potential net potency of AQ in humans, because the corresponding bioassay data are anomalous. However, in the present study, this information was (conservatively) used for GHS-based estimation of net potency, as described in Methods.
The GHS-based estimate of net human-equivalent tumorigenic AQ potency (qHE), shown in Figure 4, has an expected value, and 50th, 95th, and 97.5th values of –0.077, –0.071, –0.0080 and 0.0018 (mg/kg/day)−1, respectively, and indicates that Prob(qHE > 0) = 0.028. The latter probability illustrates how the GHS model can be combined with the method described in the Appendix to assess the likelihood of net positive or net negative potency of an agent or exposure scenario that jointly induces and suppresses cancer risk.

GHS-based estimate of net human-equivalent tumorigenic AQ potency (QHE), calculated as a stochastic difference between (a) the stochastic sum of positive carcinogenic potencies estimated from data indicating the ability of AQ to induce renal tubular cell adenomas or carcinomas in female F344/N rats and hepatoblastomas in male B6C3F1 mice, and (b) estimated AQ potency at suppressing spontaneous leukemia in F344/N rats, after first applying interspecies adjustments to all estimated rodent potencies involved. Dashed vertical line corresponds to QHE = 0.
4. DISCUSSION
The GHS model is simpler to use to estimate BMD and potency aspects of low-dose dose-response than the multi-model BMDS numerical optimization and decision-tree procedure (EPA 2000a, 2010b), for five key reasons:
GHS-based estimates of BMD and potency obtained for simulated non-hormetic data sets were found to be largely redundant, in the sense that these two measures of low-dose dose-response conveyed essentially equivalent information (Figure 1). This result raises a fundamental question concerning the entire rationale of using a BMD approach. If BMD and BMDL can nearly always be predicted very accurately, or otherwise typically at least fairly accurately (within a factor of 1.5), from corresponding estimates of potency and upper-bound potency, respectively, and if substantial deviations from this predictive relationship are in each case statistically supported by available low-dose data, why not base decisions solely on the estimated value or upper bound of the maximum slope (i.e., potency) of low-dose dose-response that is statistically consistent with those data? A key point of the GHS model is that by applying it, relevant available data can be used to estimate potency or upper-bound potency in a way that is statistically consistent with those data. By applying the GHS model, these estimates are not pre-judged by inferring any limitation on their meaning or reliability by reference to a BMD or BMDL that is in turn defined by a BMR selected without reference to the data being fit. A focus on estimating slope per se, which is provided by the GHS approach, does not necessarily imply a belief, or policy of inferring, that a low-dose slope estimate that is statistically consistent with a set of dose-response data is meaningfully interpreted at doses far below the dose associated with the lowest observed response above background. Such a belief or policy is typically adopted for ionizing radiation, radiomimetic chemical carcinogens, and many or most other genotoxic chemical carcinogens (e.g., EPA 2005). For other (presumed “nonlinear”) endpoints, the predictive potency/BMD relationship observed in this study could be applied to implement the usual method of deriving a reference dose/concentration (EPA 2002), and to help formulate unified risk-based approaches to regulating environmental exposures associated with different types of toxic endpoints (NRC 2009).
GHS-based estimates of BMD and potency performed about as well or better than corresponding BMDS estimates, at least for quantal data of the type investigated (Tables 2 and 4). Specifically, the GHS approach clearly outperformed the BMDS approach in estimating potency and its upper confidence limit, which is understandable insofar as the BMDS approach was not designed to estimate this low-dose dose-response characteristic, despite widespread use of the BMDS approach to do just this (see Introduction). The BMDS approach yielded plausibly unbiased estimates of BMD for more (six of seven) types of simulated quantal data than did the GHS approach (four of seven types). However, the GHS approach yielded fairly good BMDL coverage (≥0.89) for all seven data types simulated, while the BMDS approach did so for only four of the seven data types. Risk acceptability decisions based on BMD methods typically are based on potency or reference dose/concentration calculations involving BMDL rather than BMD estimates (e.g., Cal EPA DPR 2004; EPA 2000a, 2002, 2005; CalEPA OEHHA 2009).
BMD estimates and approximately 62% of BMDL estimates obtained by both methods applied to simulated sets of non-hormetic data were highly correlated (Figure 2). Approximately 35% of BMDL estimates obtained by the GHS approach were <75% below (i.e., more conservative than) corresponding BMDS estimates, and approximately 3% of BMDLs estimated by BMDS had very low values far below the corresponding GHS estimates (Figure 2). The BMDS approach failed, on average, about 90% of the time to correctly identify BMDL-associated quantal dose-response models that were used to simulate corresponding sets of non-hormetic dose-response data sets analyzed, which all were generated assuming 50 animals in each of five dose groups (Tables 1 and 3). The sample size used in this study exceeds that of many if not most toxicity data sets used for regulatory risk assessment. The very low rate of successful model identification indicates that even this sample size is far from approximating asymptotic conditions under which maximum-likelihood BMDS methods guarantee a 100% success rate. The low success rate does not indicate any fundamental flaw in the BMDS approach, because it was not specifically optimized to correctly match models used to estimate BMDL with those that actually generate data to be fit. Rather, the low success rate indicates the degree to which model-specific information turned out not to be relevant to estimates obtained by using the BMDS approach.
Modifications could be made to the BMDS-recommended model-evaluation decision tree (EPA 2000a, 2010b) to enhance its reliability and performance. Such modifications were not investigated in view of the adequate performance of the proposed, simpler GHS approach to modeling quantal data, which can readily be extended to address the case of continuous data.
The application of the GHS model to anthraquinone data, together with the net-potency-calculation method described in the Appendix, illustrate how these approaches can be used to help assess whether a net positive or negative cancer risk is posed by environmental or dietary exposures to one or more agents that, singly or jointly, exhibit an ability to both induce and suppress cancer risks. Agents or agent mixtures with such a capacity currently pose a regulatory dilemma, insofar as no consensus exists on how (or whether) to perform quantitative net-potency assessment for such agents, rather than simply ignoring demonstrated tumor-suppressing capacity. From a public health perspective, risk management must necessarily consider indirect (unintended) imposition or augmentation of net expected harm, if and whenever it may be likely to occur—a point emphasized in a key National Research Council recommendation (NRC 1994). The key question upon which a determination of significant net cancer risk hinges is whether or not hypothesized significant anti-carcinogenic effects are plausibly induced at low environmental exposure levels with a linear no-threshold dose-response; that is, with the same type of low-dose dose-response relationship that is typically assumed for many, if not most, chemical carcinogens. This appears to be likely in the case of AQ-induced MCL suppression (see Introduction). For this type of agent, such information must be incorporated into net-potency estimation in order to implement the National Research Council recommendation that quantitative probabilistic approaches be used to ensure that regulatory decisions inflict no public health detriment (NRC 1994).
Footnotes
APPENDIX:
ACKNOWLEDGEMENTS
Many thanks to Paul Booth at Exponent, who prepared DOS batch files that enabled automated applications of EPA BMDS software with multiple sets of input data and model specifications used in this study, and for reviewer comments that improved this manuscript.
