Abstract
Law and science combine in the estimation of risks from endocrine disruptors (EDs) and actions for their regulation. For both, dose–response models are the causal link between exposure and probability (or percentage change) of adverse response. The evidence that leads to either regulations or judicial decrees is affected by uncertainty and limited knowledge, raising difficult policy issues that we enumerate and discuss. In the United States, some courts have dealt with EDs, but causation based on animal studies has been a stumbling block for plaintiffs seeking compensation, principally because those courts opt for epidemiological evidence. The European Union (EU) has several regulatory tools and ongoing research on the risks associated with bisphenol A, under the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) Regulation and other regulations or directives. The integration of a vast (in kind and in scope) number of research papers into a statement of causation for either policy or to satisfy legal requirements, in both the United States and the EU, relies on experts. We outline the discursive dilemma and issues that may affect consensus-based results and a Bayesian causal approach that accounts for the evolution of information, yielding both value of information and flexibility associated with public choices.
Introduction
Causation and its scientific evidence are necessary, but not sufficient, components of risk–cost–benefit analysis in public health. Causation is key to understanding and justifying alternative choices of allowable exposure (eg, concentrations, doses, and dose rates). Causal analyses rely on dose–response models that relate exposure (eg, mass/volume) or dose (eg, mg/kg body weight/d) to specific adverse health effects, expressed as either cumulative probability of response or percentage effect. Many of these models are monotonic; others are nonlinear (eg, nonmonotonic) at low doses. Causation is part of the legal evidence that justifies the often costly choices to reduce exposure to hazardous agents. An agency or other authority’s standard setting may trigger judicial review by those who believe that the regulation went too far (and thus imposed costly actions on them) or not far enough (and thus endangers the public). This article deals with the legal–regulatory aspects of exposure to chemical endocrine disruptors (EDs) under United States and European Union (EU) law by developing the following aspects: judicial acceptance of scientific expert testimony, policy science acceptance of conflicting scientific evidence, aggregation of informed beliefs, choices justified by theoretical principles, and probabilistic causation based on Bayesian reasoning.
The effects of EDs at very low doses are unlike those predicted by the linear no-threshold model (LNT) at low doses (a monotonic function) and its main alternatives, the J-shaped biphasic (hormetic) models for cancer and the U-shaped dose response (both of which are nonmonotonic) for other end points. The reason is that it is increasingly clear that the correct dose–response model for those disruptors is nonmonotonic, due to biological mechanisms that are present at very low levels of exposure, but not at higher levels, as is commonly assumed for carcinogenic and other toxic end points. A class of models 1 that allows for all of these alternative qualitative behaviors may be useful to describe possible counterintuitive properties.
Causal models are essential components of risk–cost–benefit analyses generally required under US environmental and health legislation and European constitutional law, under the Precautionary Principle. Specifically, the EU’s Consolidated Treaties 2 deal with the protection of the environment and public health and “the prudent and rational utilization of natural resources.” They state that: Union policy on the environment shall aim at a high level of protection taking into account the diversity of situations in the various regions of the Union. It shall be based on the precautionary principle and on the principles that preventive action should be taken, that environmental damage should as a priority be rectified at source and that the polluter should pay. 2
The Commission stresses that the precautionary principle may only be invoked in the event of a potential risk and that it can never justify arbitrary decisions. The precautionary principle may only be invoked when the 3 preliminary conditions are met: identification of potentially adverse effects, evaluation of the scientific data available, and the extent of scientific uncertainty.
The European Commission guidance interprets this command by stating that 3 : … the general principles of risk management … are: proportionality between the measures taken and the chosen level of protection, nondiscrimination in application of the measures, consistency of the measures with similar measures already taken in similar situations or using similar approaches, examination of the benefits and costs of action or lack of action, and review of the measures in the light of scientific developments.
The EU and its Member States have different views on the importance of causal evidence and on the dose–response models used to assess risks associated with exposure. For example, if a carcinogen acts directly on a gene, then the assumption is that there is no threshold for that carcinogen, and the LNT is used. When the regulation involves a tumorigenic dose (TDx%) that has been determined to cause cancer in 25% (or less) of the animals in a study, the tolerable exposure level for humans is 1000 times lower than the TDx%. If a carcinogen’s mode of action is epigenetic, it may be characterized by an experimental threshold (and thus the cognizant public agency may use factors of safety to yield a tolerable dose for humans). Some Member States do not use these approaches but rely on the scientific consensus about the danger from exposure. 6 The difference in setting acceptable or tolerable doses is that some Member States use an LNT model, whereas others opt for the no observed adverse effect level (NOAEL) or the lowest observed adverse effect level (LOAEL), and thus obtain thresholds. That is, the experimental exposure is decreased, through factors of safety, to establish a legally justified acceptable exposure. For example, under the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) Regulation, discussed later, the derived no-effect level (DNEL) “represents a level of exposure above which humans should not be exposed.” 7 When no DNEL can be derived, “REACH requires a qualitative assessment to be performed.” 7 Also, “for non-threshold endpoints, if data allow, the development of a (semi) quantitative reference value (the DMEL, derived minimal effect level) may be useful.” 7
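The arithmetic behind the two routes just described can be sketched in a few lines. This is an illustration only: the function names, the numerical inputs, and the default safety factors of 10 and 10 are assumptions made for the example, and the divisor of 1000 follows our reading of the non-threshold rule stated above. None of these values is taken from a specific regulation.

```python
from typing import Sequence


def tolerable_dose_non_threshold(td_mg_per_kg_day: float,
                                 divisor: float = 1000.0) -> float:
    """Non-threshold route: divide the tumorigenic dose by a fixed divisor.

    The divisor of 1000 is an assumption based on the text above.
    """
    if td_mg_per_kg_day <= 0 or divisor <= 0:
        raise ValueError("dose and divisor must be positive")
    return td_mg_per_kg_day / divisor


def tolerable_dose_threshold(noael_mg_per_kg_day: float,
                             safety_factors: Sequence[float] = (10.0, 10.0)) -> float:
    """Threshold route: divide an experimental NOAEL by the product of safety factors.

    The default factors (10 x 10) are hypothetical placeholders.
    """
    if noael_mg_per_kg_day <= 0:
        raise ValueError("NOAEL must be positive")
    product = 1.0
    for factor in safety_factors:
        if factor < 1:
            raise ValueError("safety factors are expected to be >= 1")
        product *= factor
    return noael_mg_per_kg_day / product


# Hypothetical inputs, chosen only to make the arithmetic visible.
print(tolerable_dose_non_threshold(50.0))   # TD of 50 mg/kg/d
print(tolerable_dose_threshold(5.0))        # NOAEL of 5 mg/kg/d
```

The sketch makes one point explicit: both routes reduce an experimental dose by a chosen factor, so the legal question is which factor, and which starting dose, can be justified by the evidence.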
Endocrine disruptors are ubiquitous and can be introduced in the body through various routes; a key mechanism of action is preventing a natural hormone from binding to its receptor. A vexing issue with these disruptors is their higher potency at low doses than at higher doses. 8 This would prevent the inclusion of either a threshold or a J-shaped biphasic mechanism in which the descending arm of the J-shaped curve implies some reduction in the percentage response. Its ascending arm depicts increasing percentage adverse response. In the EU, these and many other toxicological findings have led to calls for a strategy that goes beyond REACH as well as bans on bisphenol A (BPA) by a number of Member States of the EU. For instance, the European Food Safety Authority (EFSA) 9 reconfirmed a tolerable daily intake (TDI) of 0.05 mg/kg body weight, although, in the United States, the US Food and Drug Administration (FDA) considers BPA to be safe. The EFSA’s reevaluation of BPA states that (emphasis omitted) 10 :
EFSA’s comprehensive re-evaluation of … (BPA) exposure and toxicity concludes that BPA poses no health risk to consumers of any age group (including unborn children, infants and adolescents) at current exposure levels. Exposure from the diet or from a combination of sources (diet, dust, cosmetics and thermal paper) is considerably under the safe level (the “tolerable daily intake” or TDI).
Studies conducted with low and high doses of BPA show effects at the low dose that are not apparent after exposure to the high dose, unlike the great majority of toxic chemicals. 11 Very low levels of BPA, through several pathways, can stimulate cellular response. Hence, this compound can be much more potent than previously thought, thus pointing to a revised TDI, which is what the EU is doing. Regarding dose–response mechanisms for EDs, different pathways lead to different nonlinear dose–response curves (for the same outcome). Others believe that the descending and ascending arms of a U-shaped dose response are due to different processes, so that the curve itself combines 2 processes, each of which is monotonic (one decreasing, the other increasing) 12,13 : a proliferative mechanism in one region and an inhibitory one in the other. 14
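The idea that a U-shaped curve can arise from 2 monotonic processes can be illustrated numerically. The sketch below is hypothetical: the Hill-type functional form, the parameter values (maximum effect of 10, half-effect doses of 1 and 100), and the dose grid are assumptions chosen for illustration, not estimates for BPA or any other ED.

```python
def hill(dose: float, half_effect_dose: float, slope: float = 1.0) -> float:
    """Saturating fraction in [0, 1) that increases monotonically with dose."""
    return dose ** slope / (half_effect_dose ** slope + dose ** slope)


def decreasing_process(dose: float) -> float:
    """Monotonically decreasing component (hypothetical parameters)."""
    return 10.0 * (1.0 - hill(dose, half_effect_dose=1.0))


def increasing_process(dose: float) -> float:
    """Monotonically increasing component (hypothetical parameters)."""
    return 10.0 * hill(dose, half_effect_dose=100.0)


def total_response(dose: float) -> float:
    """Sum of the 2 monotonic components; the sum itself is nonmonotonic."""
    return decreasing_process(dose) + increasing_process(dose)


DOSES = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]  # arbitrary units
for d in DOSES:
    print(f"dose={d:>7}: down={decreasing_process(d):6.3f} "
          f"up={increasing_process(d):6.3f} total={total_response(d):6.3f}")
```

With these assumed parameters the total response falls and then rises across the grid, although neither component changes direction. A model fitted only to the high-dose region would therefore miss the behavior at low doses.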
Perspective on Public Regulatory Actions: Judicial Review
One of the clearest enunciations of practical risk-based policy analysis is that of the US EPA 4 : conducts risk assessment to provide the best possible scientific characterization of risks … on a rigorous analysis of available information and knowledge, … a summary of the confidence or reliability of the information available to describe the risk, … can help guide risk managers to decisions that mitigate environmental risks at the lowest possible cost and which will stand up if challenged in the courts.
In Industrial Union Department, AFL-CIO v. American Petroleum Institute (448 US 607, 1980), the Court held that the Occupational Safety and Health Administration (OSHA) had the burden of showing that it is at least more likely than not (greater than 50% evidence) that long-term exposure to 10 ppm of benzene presents a significant risk of material health impairment. The Court required OSHA to develop better evidence of leukemia from occupational exposure to airborne benzene, concluding that “safe” is “not equivalent to risk-free.” Yet, the Court also held that the significance of risk is not a mathematical straitjacket, and OSHA’s findings of risk need not approach anything like scientific certainty. Moreover, the Court stated that “the reviewing court must take into account contradictory evidence in the record…, but the possibility of drawing two inconsistent conclusions from the evidence does not prevent an administrative agency’s findings from being supported by substantial evidence.”
However, the Court did not provide the risk acceptance criterion of “risk significance” (which is a number that US federal agencies provide, such as 1 in a million lifetime probability of cancer from environmental exposure). 15 The Court held that when statutes contain terms such as “substantial release” and “significant amounts,” an agency must establish “a rational connection between the facts … and the choices.” The “rational connection” implies a weak (constitutional law) standard of review that ends up being deferential to agency rulemaking. 15,16
In another case, Baltimore Gas and Electric Co v. Natural Resources Defense Council, the Court unanimously reversed the DC Court of Appeals judgment that the US Nuclear Regulatory Commission (US NRC) had acted “arbitrarily and capriciously” (note 1). The Court held that a “most deferential” approach should be applied to an agency engaged in making legitimate predictions of risks that fell “within its area of expertise at the frontiers of knowledge, and when the resolution of … fundamental policy questions lies … with … the agency to which Congress has delegated authority.” The US NRC won. In addition, a US Court of Appeals held that the US NRC could adopt conservative assumptions “risking error on the side of over-protection rather than under-protection… when those assumptions have scientific credibility” (note 2). This result was a “satisfactory basis” for finding an “unreasonable risk” (statutorily defined in 5 USC §2058(f)(3)(A)). This court accepted the Commission’s bounding that a risk “somewhere between one in two thousand and one in fifty million, is appropriately left to the Commission’s discretion, so long as it was reasonable.” “Reasonable” is clarified in Chevron v. Natural Resources Defense Council (note 3), where the Court stated that: The Administrator’s interpretation represents a reasonable accommodation of manifestly competing interests and is entitled to deference: the regulatory scheme is technical and complex, the agency considered that matter in detailed and reasoned fashion, and the decision involves reconciling conflicting policies … Judges are not expert in the field, and are not part of either political branch of the Government… When a challenge to an agency construction of a statutory provision, fairly conceptualized, really centers on the wisdom of the agency’s policy, rather than whether it is a reasonable choice within a gap left open by Congress, the challenge must fail.
European Union: Aspects of EDs Regulatory Law
In the EU, “all human health risk assessments of chemicals include hazard identification, dose-response assessment, exposure assessment and risk estimation/characterization.” 18 What is the flexibility inherent in using alternative dose–response models? Dose–response assessment consists of the mechanistic formulation and estimation of the parameters of the function or relationship between dose, or level of exposure, to a substance and the incidence of diseases. Hence, a choice is implicit, but how is it achieved? An indication of how the EU might deal with EDs, and thus BPA, is suggested by the EU Water Framework Directive (2000/60/EC), which looks at them as substances of equivalent concern to substances of relevance to the REACH Regulation. 7 Endocrine disruptors are referred to in several EU regulations, for instance, (1) REACH (EC No 1907/2006), (2) the Plant Protection Products Regulation (EU No 1107/2009), (3) the Biocidal Products Regulation (EU No 528/2012), and (4) the Cosmetics Regulation (EU No 1223/2009). Specifically, chemical agents should be identified as EDs through well-enunciated criteria for identification and approved testing methods/methods of detection and any other aspect of best practice and consistency with the state of the science (eg, Regulation EEC No 2377/90 [EU]). The REACH (EC No 1907/2006; EC 2006) illustrates key aspects of EU’s regulatory law (through its secondary legislation). Because REACH is a Regulation, every Member State of the EU has to integrate it in their national legislation exactly as it is, unlike a Directive. The REACH is explicit 7 : “This Regulation should ensure a high level of protection of human health and the environment as well as the free movement of substances, on their own, in preparations and in articles, while enhancing competitiveness and innovation. This Regulation should also promote the development of alternative methods for the assessment of hazards of substances.” The REACH also states that 7 : “it is based on the principle that it is for manufacturers, importers and downstream users to ensure that they manufacture, place on the market or use such substances that do not adversely affect human health or the environment. Its provisions are underpinned by the precautionary principle.”
This approach arises from a very different command than that of US laws because the Precautionary Principle is a fundamental constitutional principle to justify environmental choices. This Principle may have the effect of trumping the choice of not acting because of scarce information on potential public exposure to agents that could cause serious or irreversible harm. For example, if the EU were to use the LNT because of lack of certainty about a mechanism of action associated with an ED, that choice would prima facie appear to be conservative and thus protective. Yet, this causal model is being increasingly demonstrated to be less conservative for EDs, due to the fact that they cause increasing harm at very low doses, larger than that predicted by the LNT. The REACH 7 also defines the role of the European Commission in assessing the evidence about EDs in Article 138(7): “By 1 June 2013 the Commission shall carry out a review to assess whether or not, taking into account latest developments in scientific knowledge, to extend the scope of Article 60(3) to substances identified under Article 57(f) as having endocrine disrupting properties. On the basis of that review the Commission may, if appropriate, present legislative proposals.” For our work and the suggestions contained in later sections of this article, that basis includes 7 :
… hazard identification for the effect based on all available non-human information;
—the establishment of the quantitative dose (concentration)—response (effect) relationship.
…
When it is not possible to establish the quantitative dose (concentration)–response (effect) relationship, then this should be justified and a semiquantitative or qualitative analysis shall be included…
Unfortunately, the devil is in the details, and these do not appear to be specified. For example, under REACH 7 : If one study is available, then a robust study summary should be prepared for that study. If there are several studies addressing the same effect, then, having taken into account possible variables (eg conduct, adequacy, relevance of test species, quality of results, etc.), normally the study or studies giving rise to the highest concern shall be used to establish the DNELs, and a robust study summary shall be prepared for that study or studies and included as part of the technical dossier. Robust summaries will be required for all key data used in the hazard assessment. If the study or studies giving rise to the highest concern are not used, then this shall be fully justified and included as part of the technical dossier, not only for the study being used but also for all studies demonstrating a higher concern than the study being used. It is important irrespective of whether hazards have been identified or not that the validity of the study be considered.
These qualitative clarifications can be expressed formally and thus limit possible ambiguities. This reasoning should be extended to empirical results that—for some EDs—confirm a region of probable supralinearity of the dose response, provided that the scientific evidence on point produces sound theoretical understanding of the biological pathways and empirically validated experimental results applicable to humans (if that is the species of concern). Later sections of this article explain and exemplify how these aspects comply with the full extent of the Precautionary Principle, as the Treaty on the Functioning of the European Union (TFEU) commands.
Endocrine Disruptors: Admissibility of Scientific Expert Opinions in Judicial Proceedings
Endocrine disruptors in the US State Courts
Scientific experts deal with causation in both regulatory law and in litigation designed to provide monetary or other relief to individuals who believe that they were harmed by exposure to EDs. How to deal with uncertain evidence of cause and effect for diethylstilbestrol (DES) was assessed in a California case, Sindell v. Abbott Laboratories (26 Cal.3d 588, 163 Cal. Rptr. 132, 607 P.2d 924, cert. denied, 449 U.S. 912, 101 S.Ct. 285, 66 L.Ed.2d 140, 1980). There, the California Supreme Court developed a then novel legal theory: market share liability. The California Supreme Court held that a plaintiff could recover by showing that her injuries had been caused by DES and by joining as defendants “the manufacturers of a substantial share of the DES which her mother might have taken.” Each DES manufacturer becomes “liable for the proportion of the judgment represented by its share of [the] market unless it demonstrates that it could not have made the product which caused plaintiff’s injuries.” The New York Court of Appeals adopted Sindell in Hymowitz v. Eli Lilly and Co, 73 N.Y.2d 487, cert. denied, 110 S.Ct. 350 (1989). Causation specific to EDs was addressed in Beck v. Koppers Inc (2006 US Dist. Lexis 2551 [N. D. Miss. 2006]), where it was claimed that dioxins cause an increased risk of breast cancer.
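The apportionment rule quoted from Sindell reduces to simple arithmetic, sketched below. The manufacturer names, market shares, and judgment amount are invented for illustration, and the sketch assumes that each joined defendant pays only its own share, with no reallocation of the share of an exculpated or absent manufacturer. That assumption is ours and is not drawn from the opinion as quoted.

```python
from typing import Dict, FrozenSet


def market_share_liability(judgment: float,
                           market_shares: Dict[str, float],
                           exculpated: FrozenSet[str] = frozenset()) -> Dict[str, float]:
    """Apportion a judgment among joined defendants by market share.

    market_shares maps each joined manufacturer to its fraction of the market.
    Manufacturers in `exculpated` have shown they could not have made the
    product that caused the injury and therefore pay nothing.
    """
    if judgment < 0:
        raise ValueError("judgment must be non-negative")
    if any(s < 0 for s in market_shares.values()) or sum(market_shares.values()) > 1.0 + 1e-9:
        raise ValueError("shares must be non-negative and sum to at most 1")
    return {name: judgment * share
            for name, share in market_shares.items()
            if name not in exculpated}


# Hypothetical example: 3 joined manufacturers holding 75% of the market.
shares = {"Maker A": 0.40, "Maker B": 0.25, "Maker C": 0.10}
awards = market_share_liability(1_000_000.0, shares, exculpated=frozenset({"Maker C"}))
for name, amount in awards.items():
    print(f"{name}: {amount:,.2f}")
```

Under these assumptions the plaintiff does not recover the full judgment when the liable defendants hold less than the whole market, which is one practical consequence of substituting market share for proof of individual causation.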
Epidemiological evidence is critical to a finding of cause and effect that is accepted as sound scientific evidence by courts seeking to resolve civil litigation in toxic torts (eg, Brock v. Merrell Dow Pharm. 884 F.2d 166 [5th Cir. 1989]; Chambers v. Exxon Corp. 81 F. Supp. 2d 661 [M. D. La. 2000], affirmed 247 F.3d 240 [5th Cir. 2001]), whereas animal studies may suffice for regulatory law choices. Although we cannot discuss here how animal studies (as well as other studies) can establish causation absent epidemiological evidence, US courts will generally not accept animal evidence as analogous to human exposure and response if the exposure to which the animals are subject differs from its human equivalent, as held by the US Supreme Court in General Electric Co. v. Joiner, 522 US 136 (1997). The Court found that expert testimony relying on that evidence was not reliable. Moreover, interspecies conversions are also deemed to be unreliable (In re Human Tissue Product Liability Litigation, 582 F. Supp. 2d 644 [D.N.J. 2008]), as are comparisons based on structure activity (McClain v. Metabolife, 401 F. 3d 1233 [11th Cir. 2005]). Clearly, scientific evidence is introduced by experts who justify their choices and conclusions in administrative proceedings, such as Science Advisory Boards or in Congressional testimonies, as well as in trials. Hence, it is also important to understand how expert opinions are evaluated before being introduced in a trial, where that evidence undergoes further scrutiny and rebuttals through testimony by expert witnesses.
US Supreme Court: Admitting Expert Testimony in Judicial Proceedings
In 1993, the US Supreme Court, in Daubert (note 4), addressed which scientific results could be allowed in court, before trial (admissibility hearings precede trial and are under the exclusive jurisdiction of judges). Before Daubert, epidemiological evidence of the effect of Bendectin (note 5) based on chemical structure activity, in vitro, animal tests, and recalculations that had not been peer reviewed did not meet the legal test that controlled the admissibility of scientific evidence: the Frye test (note 6). It is a test of “general admissibility,” stating that: (j)ust when a scientific principle or discovery crosses the line between the experimental and the demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs. … the “general acceptance” test is not a necessary precondition to the admissibility of scientific evidence under the Federal Rules of Evidence, but the Rules of Evidence—especially Rule 702—do assign to the trial judge the task of ensuring that an expert’s testimony both rests on a reliable foundation and is relevant to the task at hand. Pertinent evidence based on scientifically valid principles will satisfy those demands.
Neither scientific nor legal causation can hope for complete certainty. Statistical evidence alone, even when based on well-conducted studies, can be rebutted at trial using contradictory scientific theories and data. The strength of the legal causation combines scientific evidence and the legal standard of proof appropriate to the case, if and only if that evidence has met the burden of admissibility. In the United States, when scientific uncertainty about causation was stated at trial as a “possibility,” it was held to be insufficient (note 7) to demonstrate legal causation by the “preponderance of the evidence” test (the “more likely than not,” >50%, test) (note 8). Of course, probabilistic reasoning alone will not be adequate for correctly integrating heterogeneous evidence and complex causal models of disease. As can be inferred from our discussion of US law, the acceptability of new science by the courts is a serious concern due to the inherent resistance of the judiciary to accepting new methods (eg, the doctrine of stare decisis) and the fact that judges are the “gate-keepers of science.”
Policy Science Controversies for BPA
We summarize aspects of the reasoning affecting policy debates concerning the effects of BPA 14 in Table 1. To clarify, we cite 14 a US FDA statement that this agency believes that “there is a large body of evidence that indicates that FDA-regulated products containing BPA currently on the market are safe and that exposure levels to BPA from food contact materials, including for infants and children, are below those that may cause health effects.” To reach this conclusion, the FDA had relied on 2 animal studies. In the first, its authors concluded that there was no effect from BPA exposure. 21 However, because the Sprague Dawley rat is insensitive to estrogens and the study design lacked positive controls, it cannot be determined whether BPA had no effect on the multiple generations of animals exposed or whether the rats were simply insensitive to estrogens. 22 The other study did not find low-dose effects of BPA. 23 However, this study had questionable positive controls. 24 There is disagreement at the consensus level, as a subcommittee of the US FDA in 2008 (Science Board Subcommittee on Bisphenol A) concluded that “coupling together the available qualitative and quantitative information (including application of uncertainty factors) provides a sufficient scientific basis to conclude that the margins of safety defined by FDA as ‘adequate’ are, in fact, inadequate.” 25 A summary discussion of the effects of endocrine-disrupting chemicals (EDCs) states that it can be estimated “with confidence” that 20 :
Table 1. Implications of BPA-Related Controversies Regarding Evidentiary Issues of Cause and Effect a on Regulatory and Tort Law in the United States and the EU.
Abbreviations: BPA, bisphenol A; DES, diethylstilbestrol; EU, European Union; PBPK, physiologically based pharmacokinetic.
a Developed from Vandenberg et al 14 ; citations from her work.
Effects will be due to “multiple hits” of environmental exposures and may occur only after a latent period of months to decades, requiring a lifespan research approach, including prospective human studies. There are multiple specific windows of enhanced susceptibility to metabolic disruptors across the lifespan, including paternal, in utero, early childhood, prepuberty, pregnancy (for the mother) menopause, and aging. Development, in utero and during the first few years of life, is the most sensitive window of susceptibility for metabolic disruption. The 2 sexes show differential susceptibility to metabolic disruption as well as different critical windows for, and different effects of, exposure. Understanding environmental effects on these diseases requires sensitive measures of personal exposures and sensitive end points to identify phenotypes. Effects of EDC exposure will vary depending on cooccurrence of other environmental stressors such as prescription drugs, sleep, hypercaloric diet, activity, stress, socioeconomic status, infections, microbiome, anxiety–depression, and so on, requiring a detailed analysis of potential interacting and confounding factors.
In judicial processes, the probative value of the evidence provided by both parties is assessed individually and then heuristically aggregated by decision makers (a jury or one or more judges). In agency rulemaking, that evidence is aggregated by agency experts so that a subset of that evidence is used numerically to establish a standard. We turn to this aspect next, focusing on regulatory law rather than judicial aggregation. We outline a quantitative approach to aggregating experts’ opinions that allows a formal justification for any asserted consensus (or, more precisely, aggregate opinion) because it makes the aggregation transparent. We suggest that this aspect has been neglected but is needed. For example, in the EU, there are several initiatives (not yet completed) regarding the robustness, openness, and transparency of scientific assessments that directly affect the regulation of EDs. A critical aspect is coordination and consistency of the assessments conducted within a set of initiatives collectively falling under the PROMETHEUS project. The EFSA suggests a thorough emphasis on probabilistic reasoning (frequentist and subjective). 10 Additionally, EFSA discussions suggest a differentiation between concepts such as “weight of evidence” 10 and the US EPA’s concept of “evidence integration” under its Integrated Risk Information System (IRIS) program. 26 The EFSA is concerned with the balance of the overall scientific evidence. 10 In particular, for EDs, there are different sources of uncertainty (eg, diet is more certain than nondietary exposures). It considers the strength of that evidence (eg, quality and confidence) and its dynamics (eg, how changes can affect a decision). Although these concepts aid the choice of the evidence to be used in setting standards by providing well-documented knowledge, they can be strengthened by additional quantitative analyses.
Consensus-Based Aggregation of Evidence
A limited search of several government Web sites does not disclose the details needed to understand how consensus is actually reached. Thus, it is difficult to discern voting criteria (other than majority-type criteria), how votes are counted and weighted (eg, Borda counts), and the details of aggregating expert opinions. Table 2 contains the typical elements of a situation in which 3 experts are assessing the evidence before them. The experts are independent and should vote on all of the elements of a causal argument: antecedent, logical connection, and consequent. For simplicity, we use Boolean (true [T] or false [F]) states, do not include probabilities, and use a logical connection (AND) and a single If… Then statement. The rule for aggregating the experts’ judgments is simple majority over all of the 3 components of the choice (using 3 rather than more experts is a minor simplification). The important issue regarding aggregation is the “discursive dilemma”: results are false, under the majority rule, whereas the same rule makes the process true and the premises true. All of the information summarized in Table 2 is essential for understanding consensus-based choices. The solution to this dilemma requires considering alternatives that are beyond our discussions. 27,28 Moreover, when there are 3 or more voters, it is well known that only a dictatorial rule is satisfactory if voting (which is an aggregation of individual preferences) has to meet principles such as unanimity, anonymity, monotonicity, and systematicity. 28
Table 2. Individual Opinions and Aggregate Results: the Discursive Dilemma.
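The discursive dilemma can be reproduced with a minimal sketch. The 3 voting profiles below are a standard textbook configuration chosen for illustration; they are an assumption and are not copied from the table. Each expert votes on the antecedent, on the logical connection (the If… Then statement), and on the consequent, and each expert is individually consistent.

```python
from typing import Dict, List

Judgment = Dict[str, bool]

# Hypothetical votes of 3 independent experts (illustrative values only).
EXPERTS: List[Judgment] = [
    {"antecedent": True,  "connection": True,  "consequent": True},
    {"antecedent": True,  "connection": False, "consequent": False},
    {"antecedent": False, "connection": True,  "consequent": False},
]


def is_consistent(judgment: Judgment) -> bool:
    """Accepting the antecedent and the If-Then rule commits one to the consequent."""
    accepts_premises = judgment["antecedent"] and judgment["connection"]
    return not (accepts_premises and not judgment["consequent"])


def majority(votes: List[bool]) -> bool:
    """Simple majority: True when more than half of the votes are True."""
    return sum(votes) > len(votes) / 2


def aggregate(experts: List[Judgment]) -> Judgment:
    """Apply the majority rule separately to each component of the argument."""
    return {key: majority([expert[key] for expert in experts])
            for key in experts[0]}


collective = aggregate(EXPERTS)
print("individually consistent:", [is_consistent(e) for e in EXPERTS])
print("majority judgment:", collective)
print("majority judgment consistent:", is_consistent(collective))
```

In this configuration the majority accepts the antecedent and the connection but rejects the consequent, so the group holds a position that none of its members could hold consistently. Voting only on the premises, or only on the conclusion, would each restore consistency but would give opposite outcomes.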
Operators for combining data include intersection (the min), various types of averages, and union (the max; note 9). Averages (note 10), from the arithmetic mean to the weighted generalized average, fall between the results obtained using intersections and unions:
Conjunctive (intersection; logical AND; t-norms; Min): given partial scores, the aggregate score will be high iff all of the partial scores are high.
Disjunctive (union; logical OR; t-conorms; Max): given partial scores, the aggregate score will be low iff all of the partial scores are low. Limitations similar to those of conjunctive aggregation operators apply.
Compensatory (eg, averaging operators): low and high scores compensate each other. Averaging operators are monotonic and idempotent. Order statistics apply to cardinal information, and these operators are idempotent, continuous, monotonic, neutral, and compensative.
Noncompensatory (symmetric sums): scale reversal does not affect the results. These operators are continuous, nondecreasing, and commutative.
Computational aggregation has formal properties such as closure, continuity, idempotency, commutativity, and others. Closure means that the aggregation of uncertain numbers results in an uncertain number. Idempotency means that aggregating identical uncertain quantities returns that same quantity, with the same uncertainty. Finally, the commutative property implies order independence, and continuity implies that a small change in an input produces only a small change in the final result.
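As an illustration, the operator classes and two of the formal properties can be checked numerically. This is a minimal sketch, assuming partial scores on a 0 to 1 scale; the scores and the function names `conjunctive`, `disjunctive`, and `compensatory` are hypothetical and chosen only for the example.

```python
from statistics import mean


def conjunctive(scores):
    """Intersection-type aggregation (Min)."""
    return min(scores)


def disjunctive(scores):
    """Union-type aggregation (Max)."""
    return max(scores)


def compensatory(scores):
    """Averaging aggregation (arithmetic mean)."""
    return mean(scores)


operators = (conjunctive, disjunctive, compensatory)
scores = [0.25, 0.75, 0.5]  # hypothetical partial scores

# Averages fall between the conjunctive and the disjunctive results.
bounded = conjunctive(scores) <= compensatory(scores) <= disjunctive(scores)

# Idempotency: aggregating identical scores returns that same score.
idempotent = all(op([0.5, 0.5, 0.5]) == 0.5 for op in operators)

# Commutativity: the order of the scores does not change the result.
commutative = all(op(scores) == op(scores[::-1]) for op in operators)

print(bounded, idempotent, commutative)
```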
Discussion
Ideally, Aumann’s 29 proofs that “like-minded” decision makers “cannot agree to disagree,” provided they have the same information and knowledge (encoded by priors and likelihoods), should be the practical terminal target of the trajectories shown in Figure 1. However, several researchers and practitioners have shown that Aumann’s convergence does not occur and that rational behaviors (by 2 or more stakeholders with independent beliefs) do not converge because of the different strategies taken by those actors to achieve their goals. We assume that a public decision has a common basis of scientific facts and knowledge and that these are available to all parties. Figure 1 is a hypothetical description of asymmetries regarding the eventual convergence of the trajectories of concern, knowledge, and an aggregate measure of epistemic uncertainty about that knowledge, conditioned on the nature of the hazard and time. The terminal state is labeled “truth.” The time horizon is the interval of time characterized by an initial policy concern about a hazard and its eventual regulation through any legally based process. We omit, for brevity, the fact that standard-setting processes involve changes subsequent to the enactment of the standard, some of which are induced by litigation or by legislatively mandated revisions that account for new scientific or technological developments. It follows that each trajectory is not deterministic.

Figure 1. Plot of uncertainty and knowledge over the time horizon of the concern.
Finally, an aspect of this discussion has to do with (1) agreement/disagreement and (2) the asymmetric information that may be available to stakeholders, contrary to Aumann’s assumptions. Flexibility is the ability to consider alternative theories and decide on one to be optimal even though others disagree. This implies an explicit, formal analysis of the amount of information and knowledge. Often, depending on who the stakeholders are and their level of access to research funds, the information/knowledge/processing triplet can be asymmetric: some have more than others, and thus, uncertainty is higher for those with less. The higher the uncertainty, the greater the value of flexibility: “it is a question of the optimal course of action, with insufficient time or objective data for beliefs to converge.” 30 Flexibility, however, has nothing to do with asymmetry of information because, even when there is symmetry of information and knowledge, beliefs may not converge. Therefore, commonality of interests does not guarantee that beliefs converge.
Informing the Public
Both the EU and the United States have legislation that commands public involvement and information in regulatory choices. The Office of Management and Budget (OMB, an agency of the White House) established Agency Information Quality Guidelines that control the collection, processing, and dissemination of information that has to do with risk assessment. 31
The OMB refers to the Safe Drinking Water Act (SDWA) as the gold standard for justifying public decision making based on the minimization of risk (note 11). There, “influential information” is defined to be that scientific, financial, or statistical information which will have or does have a clear and substantial impact on important public policies or important private sector decisions. Moreover, the OMB Guidelines also allow an individual to bring civil lawsuits to challenge the value of the influential information, including risk assessments. The SDWA (administered by the EPA) applies to the OMB’s influential information, as follows 32:
Use of science in decision making. In carrying out this section, and to the degree that an Agency Action is based on science, the Administrator shall use: the best available, peer-reviewed science and supporting studies conducted in accordance with sound and objective scientific practices; and data collected by accepted methods or best available methods (if the reliability of the method and nature of the decision justify use of the data).
Public information.… the Administrator shall ensure that the presentation of the information on public health effects is comprehensive, informative, and understandable. The Administrator shall, in a document made available to the public in support of a regulation promulgated under this section, specify, to the extent practicable: each population addressed by any estimate of public health effects; the expected risk or central estimate of risk for the specific populations; each appropriate upper-bound or lower-bound estimate of risk; each significant uncertainty identified in the process of the assessment of public health effects and studies that would assist in resolving the uncertainty; and peer-reviewed studies known to the Administrator to support, are directly relevant to, or fail to support any estimate of public health effects and the methodology used to reconcile inconsistencies in the scientific data.
Regulation Should Follow Decision Theoretic Principles
Causal arguments directed to justify EDs policy combine qualitative and quantitative descriptions because (1) causal explanations must be given to the stakeholders; (2) laws and regulations are based on scientific and legal causation; (3) scientific reasoning and explanations are or attempt to be causal; (4) stakeholders need to know how and why risky events can generate adverse consequences; (5) risk factors are used in apportioning liability to the sources of the hazard or hazards; and (6) risk reduction and minimization actions require causal knowledge to be fair and equitable.
The EFSA apparently used methods such that uncertainties are analyzed one by one and combined with expert judgments to yield a final TDI value. We suggest that the criterion selected to justify the results of any final (in the regulatory sense) reevaluation of BPA might be based on a specific decision theoretic criterion, which would support informed judgments that yield the final regulatory number sought for BPA. The reason is that different criteria yield different solutions, but each has formal properties that may be preferable on a case-by-case basis.
We disassociate “choices” (the items under analysis from which an optimal or preferred one is demonstrably superior to other choices being analyzed) from “decisions.” Decisions are public actions undertaken by public decision makers who are legally bound to make them and who may be prosecuted for failure to act. Those decisions may or may not conform with the results of the analysis of choices and their optimality or preferential rankings and can be scrutinized by a court. Some elementary analytical criteria for justifying a choice, from a set of possible choices, to inform decision makers properly 33-35 are briefly discussed next, as a means to provide an initial answer to the policy science issues developed throughout this article.
Several steps for modeling causation include 36: (1) identify a consistent, nonrandom association between exposure and response; (2) identify, explain, and include, in the causal system, the physics of the relation; (3) eliminate, or at least explicitly account for, the effect of confounding factors; (4) eliminate, or at least explicitly account for, sampling, information, and modeling biases; (5) test and confirm temporal precedence and conditional independence; and (6) develop, explain, and confirm the effect of policy interventions through changes in the value of the variables affected (for a baseline causal model) on response.
For simplicity, uncertainty is handled through probabilities and their calculus. This suggests the following protocol: (1) define and identify the boundaries of the risky choice or problem; (2) define working hypotheses and conjectures and give a qualitative description of the processes leading to all relevant outcomes; (3) determine the state of knowledge about the decision process being studied; (4) assess the need for additional information and the stopping rules to limit additional costly information, based on value of information and value of flexibility calculations; (5) use experimental data to provide likelihoods (conditional probabilities); (6) choose the criterion that is consistent with risk reduction (eg, minimax cost, minimax regret, and maximum expected monetary value or utility) and explain the rationale for the choice of criterion for selecting an alternative 37; (7) identify the optimal choice, communicate it to the decision maker, and resolve outstanding issues; and (8) present conclusions consisting of joint or marginal probability distributions and their (appropriate) moments, as needed by the stakeholders.
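The value-of-information step of the protocol can be illustrated with a small calculation. The sketch below computes the expected value of perfect information (one common value-of-information measure, used here as an assumption about what such a calculation might look like). All actions, states, payoffs, probabilities, and the study cost are hypothetical and do not come from any assessment discussed in this article.

```python
# Hypothetical net benefits (arbitrary units) of three regulatory choices
# under two states of the world.
payoffs = {
    "no_action": {"low_risk": 100, "high_risk": -200},
    "restrict": {"low_risk": 60, "high_risk": 20},
    "ban": {"low_risk": 10, "high_risk": 40},
}
prior = {"low_risk": 0.7, "high_risk": 0.3}  # hypothetical current beliefs


def expected_value(action):
    """Expected net benefit of an action under the current beliefs."""
    return sum(prior[s] * payoffs[action][s] for s in prior)


# Best choice, and its expected value, with current information only.
best_now = max(payoffs, key=expected_value)
value_now = expected_value(best_now)

# With perfect information the best action would be chosen state by state.
value_perfect = sum(prior[s] * max(payoffs[a][s] for a in payoffs) for s in prior)

# Expected value of perfect information: an upper bound on what any
# additional study can be worth for this choice.
evpi = value_perfect - value_now

study_cost = 25  # hypothetical cost of an additional study
study_could_pay_off = study_cost < evpi

print(best_now, round(evpi, 2), study_could_pay_off)
```

A simple stopping rule follows: a study that costs more than this upper bound cannot be justified on value-of-information grounds alone.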
The combination of these 2 sets leads to an analysis of choices that can have different implications for informing decision makers. An analyst may use criteria such as the maximization or minimization of an objective function (eg, min [expected number of malformations in a population at risk]). Often used alternative criteria are as follows.
Max (expected utility) or max (expected net monetary values) criterion. The underlying theory is expected utility theory (EUT), which is a theory of rational choice. That is, if its axioms (note 12) are accepted, then the best choice that the single decision maker can make is the one that has the highest expected utility (or the highest expected net monetary value). Choices are described as probabilistic gambles (in which [0 ≤ probability ≤ 1]). A theorem demonstrates that the optimal choice based on EUT is guaranteed by the decision maker’s consistency with those axioms. 38 If monetary values are used instead of utilities, the criterion is the maximization of the net expected benefits from each action, over all possible actions. Because empirical evidence indicates that individuals often do not seek to maximize their expected utility, the criterion of maximization of the expected utility is normative but only weakly predictive of actual behavior. The empirical findings that demonstrate violations of the axioms and assumptions that characterize EUT have led to several new theoretical variants, such as prospect theory. Those weaken one or more of the axioms (eg, the independence axiom in weighted utility theory and in rank-dependent utility theory) and are more consistent with human behavior.
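A short numerical sketch may help to separate the two versions of this criterion. The payoffs, probabilities, and the exponential utility function below are hypothetical and chosen only for illustration; the utility function is one common way to encode risk aversion and is not taken from the sources cited here.

```python
import math

# Hypothetical net monetary values (arbitrary units) for three choices.
payoffs = {
    "no_action": {"low_risk": 100, "high_risk": -200},
    "restrict": {"low_risk": 60, "high_risk": 20},
    "ban": {"low_risk": 10, "high_risk": 40},
}
prob = {"low_risk": 0.9, "high_risk": 0.1}  # hypothetical probabilities


def utility(x, risk_tolerance=100.0):
    """Hypothetical concave (risk-averse) exponential utility."""
    return 1.0 - math.exp(-x / risk_tolerance)


def expected(action, value=lambda x: x):
    """Expectation of value(payoff) over the states, for one action."""
    return sum(prob[s] * value(payoffs[action][s]) for s in prob)


# Max (expected net monetary value).
best_emv = max(payoffs, key=expected)
# Max (expected utility) for the risk-averse decision maker.
best_eu = max(payoffs, key=lambda a: expected(a, utility))

print("expected monetary value:", best_emv)
print("expected utility:", best_eu)
```

With these illustrative numbers the two versions select different choices, because the concave utility penalizes the large loss attached to taking no action.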
Maximin, maximax, and minimax criteria. The maximin criterion consists of selecting the choice whose minimum payoff is the largest among the minimum payoffs of the choices available. The maximax criterion selects the choice with the maximum of the maximum payoffs. An alternative is to minimize the maximum loss; this is the minimax criterion. Although these criteria are deterministic, they may be used in situations where there is considerable uncertainty and the decision maker feels uncomfortable in assigning probability numbers to outcomes. Probabilities are not used in these calculations, and the choice is said to be made under uncertainty.
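These criteria can be sketched with a small payoff table. The table is hypothetical; the loss table is assumed, for illustration, to be the negative of the payoff table; and the minimax regret variant (mentioned earlier among the possible criteria) is included for comparison.

```python
# Hypothetical payoffs (arbitrary units); no probabilities are used.
payoffs = {
    "no_action": {"low_risk": 100, "high_risk": -200},
    "restrict": {"low_risk": 60, "high_risk": 20},
    "ban": {"low_risk": 10, "high_risk": 40},
}
states = ("low_risk", "high_risk")

# Maximin: the choice with the best worst-case payoff.
maximin = max(payoffs, key=lambda a: min(payoffs[a].values()))

# Maximax: the choice with the best best-case payoff.
maximax = max(payoffs, key=lambda a: max(payoffs[a].values()))

# Minimax: the choice with the smallest worst-case loss (loss = -payoff here).
losses = {a: {s: -v for s, v in row.items()} for a, row in payoffs.items()}
minimax = min(losses, key=lambda a: max(losses[a].values()))

# Minimax regret: regret is the shortfall from the best payoff in each state.
best_in_state = {s: max(payoffs[a][s] for a in payoffs) for s in states}
regret = {a: {s: best_in_state[s] - payoffs[a][s] for s in states} for a in payoffs}
minimax_regret = min(regret, key=lambda a: max(regret[a].values()))

print(maximin, maximax, minimax, minimax_regret)
```

When losses are simply negative payoffs, as assumed here, minimax and maximin pick the same choice; they can differ when losses are measured on a separate scale.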
The pessimism–optimism criterion. This criterion uses a coefficient (bounded between 0 and 1, inclusive; it is not a probability measure) to capture attitudes that fall between pessimism and optimism. It includes the maximin and maximax criteria as special cases and is neither deterministic nor probabilistic. 39
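A minimal sketch of this criterion follows, assuming the usual weighted form in which each choice is scored by alpha times its worst payoff plus (1 - alpha) times its best payoff. The convention that alpha = 1 is fully pessimistic is an assumption made here (some authors use the reverse), and the payoff table and the function name `hurwicz_choice` are hypothetical.

```python
# Hypothetical payoffs (arbitrary units).
payoffs = {
    "no_action": {"low_risk": 100, "high_risk": -200},
    "restrict": {"low_risk": 60, "high_risk": 20},
    "ban": {"low_risk": 10, "high_risk": 40},
}


def hurwicz_choice(table, alpha):
    """Pick the choice maximizing alpha * worst + (1 - alpha) * best.

    alpha is a pessimism coefficient in [0, 1], not a probability:
    alpha = 1 reproduces maximin and alpha = 0 reproduces maximax.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1, inclusive")

    def score(action):
        values = list(table[action].values())
        return alpha * min(values) + (1 - alpha) * max(values)

    return max(table, key=score)


for alpha in (0.0, 0.5, 1.0):
    print(alpha, hurwicz_choice(payoffs, alpha))
```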
In the next section, we suggest how a Bayesian approach meets these desiderata, particularly when the model consists of a Bayesian network (BN) that allows feedbacks. We also note that Bayesian methods combine expert opinions (as prior probabilities or distributions) with empirical results (as likelihoods). Decisions taken by duly empowered decision makers are informed by probabilistic analysis that develops the optimal or preferred choice, selected on criteria such as the EUT, which is then provided to the decision maker for his or her consideration but may not be used because other factors (eg, geopolitical) may affect his or her final decision.
Probabilistic Causation: Outline of a Bayesian Solution
Probabilities or probability distributions represent uncertain knowledge and beliefs. Specifically, prior probability distributions (such as density functions for continuous data and probability mass functions for discrete distributions) represent prior knowledge, information, judgments, and beliefs for the independent variables. 40,41
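Likelihood functions represent empirical evidence. All empirical data and modeling information are summarized by likelihood functions. Given a probability model pr(y; x, b), the corresponding likelihood function for the parameter b (which may be a vector) is L(b) = pr(y; x, b), regarded as a function of b for the observed values of x and y.
Updating prior beliefs with new evidence uses likelihoods. Given the prior beliefs F(b) and the likelihood L(b), Bayes’ rule yields the posterior beliefs (F|L), which are proportional to L(b) × F(b). The posterior beliefs (F|L) thus follow from the prior beliefs that are encoded in F and from the assumptions and evidence encoded in L, which depend on the probability model, pr, short for pr(y; x, b).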
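The updating step can be sketched numerically. The example below is a minimal illustration, assuming a single parameter b (the probability of response at a fixed exposure), a flat prior, a binomial likelihood, and a grid approximation; the counts and all function names are hypothetical.

```python
def posterior_on_grid(prior, likelihood, grid):
    """Posterior weights proportional to prior(b) * likelihood(b) on a grid."""
    unnormalized = [prior(b) * likelihood(b) for b in grid]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical bioassay result: 3 responders among 20 exposed subjects.
n_exposed, n_responding = 20, 3

# Candidate values of b, the probability of response.
grid = [i / 1000 for i in range(1, 1000)]


def flat_prior(b):
    """Prior beliefs F(b): every value of b is equally credible a priori."""
    return 1.0


def binomial_likelihood(b):
    """Likelihood L(b) of the observed counts (constant factor omitted)."""
    return b ** n_responding * (1 - b) ** (n_exposed - n_responding)


posterior = posterior_on_grid(flat_prior, binomial_likelihood, grid)
posterior_mean = sum(b * p for b, p in zip(grid, posterior))

# For a flat prior the exact (conjugate Beta) posterior mean is
# (n_responding + 1) / (n_exposed + 2), which the grid result should match.
exact_mean = (n_responding + 1) / (n_exposed + 2)
print(round(posterior_mean, 4), round(exact_mean, 4))
```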
In practice, uncertainty about the correct model, pri, out of several alternative models, is often the largest uncertainty affecting the analysis. To account for it, let {pr1,…, prn} denote the set of alternative models that are known to be (or are considered) mutually exclusive and collectively exhaustive of all possible probability models. Let L1,…, Ln denote the likelihood functions for the alternative models, and let w1,…, wn be the corresponding judgmental probabilities, also called weights of evidence, that each model is correct. If the models are mutually exclusive and collectively exhaustive, these weights must sum to 1. The posterior probability distribution that is obtained from the prior F, the data (x, y), and the model weights is the weighted sum [w1(F|L1) + … + wn(F|Ln)]. This approach has advantages: it shows the distribution of values for the true parameter b (or vector of parameters), and it distinguishes the contributions of different sources of evidence, identifying specific areas where additional research is most likely to make a significant difference in reducing final uncertainty. This deals with the uncertainty of model building. Despite the advantages, probabilistic methods have limits to their ability to represent uncertainty. When more than one decision maker is involved, the unified probabilistic presentation of different types of uncertainties (eg, observer-independent stochastic characteristics with theoretical assumptions, subjective judgments, and speculations about unknown parameter and variable values) is as much a liability as it is an asset. 42
There are disadvantages. Two individuals with identical but incomplete objective information might express their beliefs with different prior probability distributions. Moreover, probability models cannot adequately express ambiguities about probabilities. For example, an estimated probability of .50 that a coin will come up heads on the next toss based on lack of information may not be distinguished from an estimated .50 that is based on 10 000 observations. The Bayesian view is that an analyst normatively should use his or her own knowledge and beliefs to generate a probability model when objective knowledge is incomplete or even inadequate. The opposing view is that the analyst has no justification, and should not be expected or required, to provide numbers in the absence of substantial and relevant knowledge. When the correct model is unknown and multiple models and weights are used, or when multiple sources of evidence giving partially conflicting posterior probabilities are combined, the resulting aggregate posterior probability distribution is ambiguous. An infinite variety of alternative models and weights are mapped by the formula [w1(F|L1) + … + wn(F|Ln)] onto the same aggregate posterior probability distribution for the risk estimate. A partial solution to these problems is to present posterior distributions and corresponding weights for each model separately. Some of the knowledge used to draw practical conclusions about risks can be abstract or nonquantitative. Other aspects of qualitative knowledge can be used to constrain probabilistic calculations but cannot be represented by probabilities. A set of mutually exclusive, collectively exhaustive hypotheses about the correct risk model is seldom known, making the use of probabilistic weights of evidence for different possible models inexact.
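The ambiguity of the aggregate posterior can be shown with a small example. In this sketch, two different hypothetical sets of models and weights are combined with the weighted-sum formula and yield exactly the same aggregate distribution over three possible risk levels. All numbers and the function name `aggregate_posterior` are illustrative assumptions; exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction as Fr


def aggregate_posterior(weights, posteriors):
    """Weighted sum w1(F|L1) + ... + wn(F|Ln) of discrete posteriors."""
    if sum(weights) != 1:
        raise ValueError("model weights must sum to 1")
    n_values = len(posteriors[0])
    return [
        sum(w * p[i] for w, p in zip(weights, posteriors))
        for i in range(n_values)
    ]


# Hypothetical posteriors over three risk levels (low, medium, high).
set_a = aggregate_posterior(
    [Fr(1, 2), Fr(1, 2)],
    [[Fr(8, 10), Fr(2, 10), Fr(0)], [Fr(2, 10), Fr(2, 10), Fr(6, 10)]],
)
set_b = aggregate_posterior(
    [Fr(1, 4), Fr(3, 4)],
    [[Fr(2, 10), Fr(2, 10), Fr(6, 10)], [Fr(6, 10), Fr(2, 10), Fr(2, 10)]],
)

# Different models and weights, identical aggregate distribution.
print(set_a == set_b, [float(p) for p in set_a])
```

Reporting each model's posterior and weight separately, as suggested above, keeps this information from being lost in the aggregate.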
Probability models inherently make the “closed-world assumption” that all the possible outcomes of a random experiment are known and can be described (and, in Bayesian analysis, assigned prior probabilities). This assumption can often be unrealistic because the true mechanisms may later turn out to be something entirely unforeseen. Conditioning on alternative assumptions about mechanisms only gives an illusion of completeness when the true mechanism is not among those considered.
Overall, the usefulness of Bayesian analysis is that it provides a formal method for updating scientific knowledge by requiring the researcher to think probabilistically about events and the causal structure linking them. The analyst must disclose the reasons for his or her choice of prior distribution, show them through the form of the prior distribution, and clarify them by giving the reasons for adopting that distribution over its alternatives. This requires the risk assessor to specify how past information can be folded into a distribution function and to assess the impact on the likelihood, which links the sample design to the structure of the model most likely to be determined by that sample. Finally, the analysis is transparent, can be discussed for lack of completeness, arbitrariness of assumptions, and adequacy of experimental results, and links directly to value of information and flexibility. 43,44 The analysis is independently replicable, using alternative models, and the difference between the results can be studied under a common and axiomatically correct methodology.
The last step provides further integration through probabilistic networks at various biological levels and their combination across those levels. It begins by developing the network that plausibly represents the disease of concern, then chooses probabilities to characterize uncertainty, conditions them on current information and knowledge, and infers accordingly. In short, a Bayesian directed acyclic graph (DAG), and a modification to account for feedbacks, describes the key subprocesses causing a disease. Bayesian networks rely on DAGs to represent relationships between random variables. Acyclic implies no feedbacks, 40 a condition that can be relaxed (note 13). The structure of the graph is based on known or hypothesized relations (eg, a node, an arc, and another node). Bayesian networks describe probabilistic conditioning and thus the dependence–independence structure among the variables of the network. There must be “meaningful directionality” between the variables in the network. Testing “whether a proposed set of causal relationships is consistent with the available temporal probabilistic information” can be done with BNs. 40 An advantage of BNs arises from the causal meaning of the directed arcs. If S1 represents a BN, S2 another BN, and if pr(S1|D) is larger than pr(S2|D), then the higher probability, given the data D, makes the causal links in S1 the stronger of the two. A further advantage of a BN is the concise and comprehensive representation of the relationship between the risk factors. However, the “closed-world” assumption, which assumes knowledge of the distribution of each random variable, can be too demanding. This assumption involves knowing all relevant factors and separating them into the causally relevant factors, background factors, and irrelevant factors.
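A very small network can make the conditioning structure concrete. The sketch below assumes a hypothetical three-node chain, exposure (e) to an intermediate end point (m) to disease (d), with binary variables and invented conditional probabilities; it is acyclic, so feedbacks are not represented, and inference is done by brute-force enumeration rather than with any BN library.

```python
from itertools import product

# Hypothetical probabilities for the chain e -> m -> d (1 = present, 0 = absent).
p_e = {1: 0.3, 0: 0.7}
p_m_given_e = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.1, 0: 0.9}}    # p_m_given_e[e][m]
p_d_given_m = {1: {1: 0.4, 0: 0.6}, 0: {1: 0.05, 0: 0.95}}  # p_d_given_m[m][d]


def joint(e, m, d):
    """Joint probability factorized along the arcs of the DAG."""
    return p_e[e] * p_m_given_e[e][m] * p_d_given_m[m][d]


def conditional(target, evidence):
    """pr(target | evidence) by enumeration; both are dicts such as {"d": 1}."""
    numerator = denominator = 0.0
    for e, m, d in product((0, 1), repeat=3):
        world = {"e": e, "m": m, "d": d}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(e, m, d)
            denominator += p
            if all(world[k] == v for k, v in target.items()):
                numerator += p
    return numerator / denominator


# Predictive query: probability of disease given exposure.
print(round(conditional({"d": 1}, {"e": 1}), 4))
# Diagnostic query: probability of exposure given disease.
print(round(conditional({"e": 1}, {"d": 1}), 4))
# Conditional independence implied by the chain: given m, e adds nothing about d.
print(round(conditional({"d": 1}, {"m": 1, "e": 1}), 4),
      round(conditional({"d": 1}, {"m": 1, "e": 0}), 4))
```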
An aspect of this analysis that is useful in deciding between competing and alternative causal structures uses the principles of “stability” (meaning that there is an isomorphism between 2 structures) and “minimality” (meaning that the least complex of 2 structures, T1 and T2, given the same data, is preferred). Minimality relates to the mathematical form and number of variables of the causal network. Stability relates to the probability of events in T1 that make events in T2 more probable; it refers to the lack of extraneous probabilistic conditional independences in a BN. 40 This principle states that it is most improbable that the 2 competing structures overlap: a causal structure is minimal relative to a larger set of potential structures if no element of the minimal structure is preferred over those of any other structure in the class of structures considered. Together, minimality and stability assess the uniqueness of the network. This is one of 2 important aspects of obtaining a solution, the other being its existence.
Conclusions
We believe that society can be better served when the full protocol used to decide on what is asserted through consensus is independently replicable and when its key axioms are tested. For instance, to the extent that EFSA’s efforts are intended both for internal use and as a means to inform scientific advisory groups, the discussions we provide can be useful toward formulating rigorous science-policy choices. We have linked legal and scientific aspects of causation using the United States and the EU as paradigms to help public decision making to limit chemical exposures, in particular EDs (BPA being prototypical). We conclude that US federal administrative law has developed to the point that it has the necessary and sufficient qualitative aspects to deal with human health risks associated with exposure to EDs. This is not yet the case for the EU, as its efforts are still under development. We suggest that the EU should take note of the ways in which the United States has dealt with the admissibility of scientific evidence, in terms of relevance and reliability, and how it has balanced that evidence, a major concern of EFSA and US agencies. To advance this discourse, we conclude that a plausible way to deal with uncertain causation is provided by Bayesian methods, which we briefly outline. The reason for focusing on those methods is that they are understood by the courts, are used by agencies of the government, and are prevalent in causal reasoning about uncertain exposure and response, conditioned on one or more biological or other mechanisms. Hence, these methods are established even though they may not be the best under all uncertain situations.
We find that consensus guides public policy choices, obtained through their formal analysis, to inform decision makers and stakeholders. The protocols for assessing evidence for or against a particular effect of exposure canonically consist of premises, rules, and conclusions. We offer initial guidance on how to formally frame the discourse that leads to an assertion of effect from exposure. This is a critical aspect in supporting a policy because the aggregation of experts’ opinions results in several paradoxes. One of the rules discussed, simple majority, can be robust under very specific circumstances. However, most alternative voting criteria involve a number of complexities that, paradoxically as well, generally lead to a dictatorial solution.
As both case and regulatory law indicate, the preferable form of evidence is human exposures and responses through a well-specified probabilistic model that can be used for prediction. Hence, regarding both factual and theoretical evidence, we conclude that epidemiology, although it has several problems, would be the best evidence for supporting regulatory law choices. However, we also conclude that linking EDs to the outcomes observed in mammalian species, or other testing protocols, is necessary to buttress epidemiological results, both at the ultimate end point level and at intermediate end points leading to it. The latter is essential for formulating the mechanistic aspects of epidemiological causal models. We conclude that a properly designed multilevel BN can approximate what is needed to produce replicable causal analyses and thus correctly inform standard setting. The final choice of standard value is a political choice by duly empowered officials. It is therefore outside the purview of the analysts. This, of course, has a parallel in what happens in judicial decisions: the assignment of culpability is not for the lawyers to make. It is the exclusive province of judges.
Acknowledgments
PFR thanks Louis Anthony (Tony) Cox Jr and Huaxia Sheng for discussions and joint work that have led to some of the ideas contained in this paper, as well as 2 anonymous referees for their comments. A special debt of gratitude goes to Elena Fabbri for all her efforts in steering this special issue and reviewing this paper: her contribution goes beyond collegial duty.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
