Abstract
Proportionality Analysis (PA) is usually perceived as applying a rationality-based formula to determine whether a legal act is (un)constitutional. However, behavioral economics suggests that decisionmakers—including judges—may be susceptible to various cognitive biases, which implies that PA might be similarly affected. Using a vignette experiment, we examine how different framings of legal cases influence PA judgments across three groups: administrative judges, law students, and non-law students. Results show that judges demonstrate minimal susceptibility to framing effects when conducting PA, suggesting that legal expertise and professional experience can provide significant protection against cognitive biases in judicial decision-making. These findings provide reassuring evidence for the rationality of PA as applied by professional judges, while demonstrating the debiasing impact of legal training and expertise. However, we also find that judges remain susceptible to other behavioral effects when making decisions that are unrelated to PA. We discuss the relevance of our findings for the current debate surrounding constitutional review, contrasting PA—used frequently around the globe—with the specific constitutional review process in the United States.
Introduction
Proportionality Analysis (PA) is a framework used by courts when reviewing clashes between rights and public interests. It embodies the old principle that one should not use a sledgehammer to crack a nut (Brems & Lavrysen, 2015). 1 Thus, if a public measure (e.g., a legal act or an administrative decision) infringes on a right, PA asks whether that infringement is proportional. Answering the question requires taking a set of analytical steps, such as placing weights on rights and applying an optimization formula to resolve clashes.
The logic underlying PA has had immense success in both international law and the national law of many countries (see Barak, 2012; Bulman-Pozen & Seifter, 2023; Collings & Barclay, 2022; Greene, 2018; Jackson, 2015; Popelier & Van De Heyning, 2013; Stone Sweet & Mathews, 2008). Scholars have, consequently, described PA as a “central feature of rights reasoning” (Klatt & Meister, 2012, p. 691), a “master concept of public law” (Mathews, 2017, p. 2), or even a “global constitutional principle” (cf. Kremnitzer et al., 2020, p. 9; Peters, 2017; Petersen, 2017, p. 6). 2 The United States is a notable exception: it does not formally engage in PA (Aleinikoff, 1986; Cohen-Eliya & Porat, 2011; Tebbe & Schwartzman, 2021) and relies on categorical definitions and tiered scrutiny rather than explicit proportionality balancing (see, e.g., Greene, 2018). However, scholars have noted that even U.S. constitutional law engages in balancing of rights and interests, at least implicitly (Waldron, 2003). 3 This emphasizes the importance of understanding how judges engage in such practices, whether through formal PA or other balancing mechanisms.
The prevailing understanding of these practices, particularly PA, has been shaped by certain assumptions about judicial reasoning. Hitherto, PA has been largely seen as a process in which judges rationally apply formulas to determine the outcome of legal cases (see Alexy, 2017; Brems & Lavrysen, 2015; Möller, 2012; Popelier & Van De Heyning, 2013). But does PA truly lead to a more rational decision? From numerous experiments, we know that adjudicators, such as judges and arbitrators (Franck et al., 2016; Guthrie et al., 2001; Helm et al., 2016; Rachlinski & Wistrich, 2018; Spamann & Klöhn, 2016), might fall prey to various heuristics and biases. In other words, “judges are incurably human” (Frank, 1931, p. 24), just like everybody else (Posner, 1993).
However, there is only scant evidence on the role of such biases in PA and no direct evidence on the effects of well-established behavioral effects, such as framing or loss aversion. For example, Sulitzeanu-Kenan et al. (2016) ask Israeli legal experts to conduct a PA of targeted killing scenarios. They find that the evaluation is affected by the experts’ policy preferences and various dimensions of the case’s facts, but they do not test for specific biases and heuristics. Broude and Levy (2019) similarly use scenarios that arise in international humanitarian law to investigate how a specific bias (the “outcome bias”) affects PA. 4 Their study contrasts laymen and experts, but does not include judges. Steiner et al. (2022) contrast different versions of PA (described in terms of either necessity or balancing) and find that a necessity test yields stronger protection of human rights compared to balancing. Their study, hence, also does not directly measure behavioral biases. Kantorowicz-Reznichenko et al. (2022) use a vignette study that describes a demonstration prohibited by the authorities. Varying the identity of the demonstrating organization, they elicit views about the legality of the prohibition using a simplified version of PA. 5 Their study focuses on detecting an ideological (and not a behavioral) bias and uses a lay sample. Our study, thus, differs from all the existing studies in two main ways: (i) we test for behavioral biases in the form of framing effects and (ii) we contrast different subject groups with varying degrees of legal expertise: non-law students, law students, and judges. 6
Specifically, we are interested in whether legal expertise can work as a debiasing mechanism, enabling subjects to overcome the (potential) influence of framing effects. While many studies generally find cognitive biases also among expert adjudicators (outside the context of PA), some studies do find evidence of diminished bias in experts, including legal experts, compared to other populations (see, e.g., Broude & Levy, 2019; Mizrahi, 2018; Shereshevsky & Noah, 2017). 7 Our experiment tackles this issue directly, thereby contributing also to the literature comparing laymen and experts.
In our experiment, subjects conduct a full PA of different vignettes (legal cases) by answering questions on the traditional tests of proportionality: whether the act in question falls within the scope of a right (and if so, whether it concerns the core of the right); whether the purpose of the act is proper (or legitimate); whether the measures used are suitable and necessary; and whether the act adequately balances public interests and rights. Each vignette has two versions (only one of which is presented to each subject), which differ only in their framing. The differences in framing each correspond to a typical behavioral effect, mostly focusing on the role of loss aversion (we included one additional vignette unrelated to loss aversion; see Appendix B for details).
We find clear evidence of framing effects in PA in some (but not all) contexts, especially for non-law students. Namely, non-law students were more likely to classify an act as disproportional if (i) its goal was framed as achieving a gain (rather than averting a loss) and (ii) the act was framed as converting a stochastic loss into a certain loss (rather than converting a stochastic gain into a certain gain). 8 Law students, however, were only affected by the latter: they were unaffected by the framing of the goal but were as strongly affected as non-law students when the act was framed as converting a stochastic loss into a certain loss. Judges were only weakly affected by the framing in both of these vignettes. Specifically, framing had no effect on judges’ overall classification of the act as (dis)proportional; it only affected some of the interim stages of PA.
To test whether judges were able to avert the influence of framing due to their expertise, we presented them with standard problems from behavioral economics that focus on specific biases, such as overconfidence and the conjunction fallacy. Interestingly, judges did demonstrate susceptibility to these biases, suggesting that they are not generally immune to behavioral biases. Rather, once one steps out of the domain of expertise, judges are still very much susceptible to standard behavioral effects. Overall, our findings suggest that framing effects do seem to matter in some contexts and that expertise may de-bias individuals in their professional context, but not in other contexts.
Importantly, we do not submit that PA is unhelpful for the process of legal reasoning. The need to structure one’s thought as a step-by-step analysis might even operate as a debiasing device, moving one’s thought process from System 1 (quick, emotional) to System 2 (slow, deliberate, rational). 9 Yet, biases may still arise at each step of the way, possibly with spillover effects from one stage to the other. Whether this is the case is the object of this study.
The article’s contribution is fourfold. First, to our knowledge, we are the first to test whether framing affects PA. We do so using a unique subject group, consisting of administrative judges and students in Germany. The evidence suggests that professional judges conducting PA are largely protected from framing effects that significantly influence other decision-makers. This finding challenges some previous studies, which found that judges in the United States are susceptible to framing in a legal context (see Rachlinski & Wistrich, 2018, with eight such studies), but is otherwise in line with the aforementioned general literature that finds that expertise can mitigate bias. Second, we demonstrate a clear gradient where legal training progressively reduces susceptibility to framing effects, with judges showing minimal bias, law students showing intermediate effects, and non-law students showing more substantial bias. Third, we show that this protection is domain-specific: judges remain susceptible to biases outside their area of expertise. Fourth, we enrich the discussion on the rationality of PA, pointing out how additional biases may affect it more generally (notwithstanding the possibility that the discursive method underlying PA might lead to more rational decisions compared to other alternatives).
The remainder of the article is organized as follows: the second part (“The Theory of Proportionality and its Applications”) provides an introduction to PA, with specific attention placed on its different elements. The third part (“Biases and Heuristics Relevant to Proportionality Analysis”) gives a brief overview of behavioral effects (biases and heuristics) that have been identified by the literature on behavioral law and economics, particularly those that potentially apply to PA. The fourth part (“Hypotheses Development”) follows by connecting these behavioral effects to the distinct elements of PA and by developing hypotheses. The fifth part (“Experiment”) is dedicated to our experiment, describing its design and results. The sixth part (“Discussion”) discusses the general importance of our findings and elaborates on how and why they are relevant to the United States. The seventh and last part concludes.
The Theory of Proportionality and its Applications
PA is a structured judicial methodology employed in constitutional and administrative review to evaluate whether governmental actions that restrict individual rights or freedoms are justified under the circumstances (Barak, 2012; Bendor & Sela, 2015). This analytical framework systematically examines whether state interventions achieve a proper balance between legitimate governmental objectives and the degree of rights infringement imposed. The analysis typically proceeds through several sequential stages, looking at the aim of the government’s actions, the connection between the aim and the means, the availability of alternative measures, and a (normative) cost-benefit calculation.
Many scholars view PA as a rationality-based procedure (Brems & Lavrysen, 2015; Popelier & Van De Heyning, 2013), close to the reasoning of a cost-benefit analysis. Each step is designed to introduce rational scrutiny, to ensure the measure that infringes on human rights is not arbitrary or excessive. This view is reflected in the influential conceptualization of PA developed by Alexy (2000, 2003, 2014, 2017), which remains prominent in legal scholarship. However, critics argue that PA is not truly objective but rather masks moral and political decision-making behind formal legal reasoning (Greene, 2018; Webber, 2010). 10 Waldron (2003) has raised similar concerns about judges who implicitly balance rights and public interests in the U.S. context, though he does not address behavioral biases in judicial decision-making.
Although the precise application and the steps taken in PA differ across jurisdictions (see, e.g., Andenas & Zleptnig, 2006), we will follow a stylized version, omitting some of the fine-grained details that comparative legal scholars discuss. In particular, we consider a two-stage model of PA for constitutional review, similar to the one used, for instance, in Germany, Canada, South Africa, and Israel (see Barak, 2012; Bendor & Sela, 2015; Cohen-Eliya & Porat, 2010; Greene, 2018; Jackson, 2004; Petersen, 2020). This model is based on the distinction between the scope of the right in question and the extent of its protection (Barak, 2012; Bulman-Pozen & Seifter, 2023). Thus, judges first consider aspects pertaining to the petitioner’s right itself (do the petitioner’s claims demonstrate that a protected right is being interfered with? If so, is it an interference with the core of the right?). Provided that an interference with the right was found, PA has to be applied to determine whether the interference was justified.
PA is composed of four elements (or “prongs”): (i) a proper purpose (if a statute is under consideration), (ii) suitability (or rational connection), (iii) necessary means (or least intrusive means), and (iv) proportionality stricto sensu (balancing). 11 Suitability and necessity (the second and third prongs) focus on instrumental rationality and empirical concerns, whereas a proper purpose and balancing (the first and fourth prongs) are partially normative: they “express the requirement that principles be realized to the greatest possible extent given countervailing normative concerns” (Kumm, 2007, p. 137). We discuss each prong in further detail below.
Proper Purpose
Let us turn first to the proper purpose of statutes. Deciding whether a purpose is “proper” can be challenging, as it is a value-laden component (see Schlink, 2012). Nonetheless, in some cases, it is relatively easy to satisfy the threshold requirement of a proper purpose, especially if the constitution does not explicitly restrict aims or impose special requirements (Kumm, 2007). For instance, Germany’s Basic Law does not have an explicit constitutional foundation for a “proper purpose”. The purpose has to be legitimate, but does not have to be compelling or enumerated. The German approach simply requires legality of an administrative act or general constitutionality of the purpose (Grimm, 2007; Schlink, 2012). Israel, 12 Canada, 13 and South Africa 14 have explicit foundations (see Barak, 2012). As an example from the international sphere, Article XX of the General Agreement on Tariffs and Trade and Article XIV of the General Agreement on Trade in Services of the World Trade Organization enumerate purposes allowed to be pursued by restrictive trade measures (Bartels, 2015), reflecting the idea of proper purpose as well.
Suitability and Necessity
The principle of suitability covers the question of whether the measure or the law is suitable to achieve its goal. The mode of reasoning is a ‘means-end’ relationship (see, e.g., Kretzmer, 2013). Alexy (2000) uses the following formalization: suppose that a measure M interferes with principle P1 in order to (supposedly) promote principle P2. If M does not actually promote P2, then omitting it improves on P1 at no cost. Hence, this step reflects the idea of Pareto-optimality (Alexy, 2014; Chang & Dai, 2021; Petersen, 2020; van Aaken, 2003): omitting M improves one position without detriment to the other.
Yet, as the measure applied by a sensible legislator will usually promote their aim at least to some degree, there is a next step: that of “necessity”. The necessity test covers whether there are other, less intrusive means, which are equally able to achieve the stated goal of the measure. 15 This principle requires that if there are two measures, M1 and M2, which promote P2 equally, then the less intrusive measure (i.e., the measure detracting least from P1) is chosen. Again, this closely reflects Pareto-optimality (Alexy, 2000): if there exists a less intrusive measure that is equally suitable for achieving its goal, then switching to it improves on P1 without any cost. To avoid over-reliance on intuitions, both the suitability and necessity tests would, in theory, require an empirical analysis to determine the relevant probabilities, benefits, and costs of each principle under each of the possible measures (see, e.g., Greene, 2018).
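The Pareto logic behind the suitability and necessity tests can be illustrated with a minimal sketch. The `Measure` class, its numeric `promotion` and `intrusion` scores, and the three example measures are hypothetical and serve only as an illustration; in practice, neither quantity is measured with this kind of precision. The sketch also treats an alternative as a valid comparator when it is at least as effective as the challenged measure, which is an assumption slightly broader than "equally effective".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Measure:
    name: str
    promotion: float  # hypothetical score: how far the measure advances the public goal
    intrusion: float  # hypothetical score: how far the measure detracts from the right


def is_suitable(measure: Measure) -> bool:
    """Suitability: the measure promotes its goal at least to some degree."""
    return measure.promotion > 0


def is_necessary(measure: Measure, alternatives: Sequence[Measure]) -> bool:
    """Necessity: no alternative is at least as effective while intruding less."""
    return not any(
        alt.promotion >= measure.promotion and alt.intrusion < measure.intrusion
        for alt in alternatives
    )


# Hypothetical measures restricting a demonstration.
ban = Measure("total ban", promotion=0.8, intrusion=0.9)
permit = Measure("permit scheme", promotion=0.8, intrusion=0.4)
notice = Measure("notice requirement", promotion=0.3, intrusion=0.1)

print(is_suitable(ban))                     # True
print(is_necessary(ban, [permit, notice]))  # False: the permit scheme dominates the ban
print(is_necessary(permit, [ban, notice]))  # True
```

In this toy example the total ban fails the necessity test because the permit scheme achieves the same goal at a lower cost to the right, which is exactly the Pareto improvement described above. The notice requirement is less intrusive still, but it is not a valid comparator because it is less effective.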
Proportionality Stricto Sensu or Balancing
The final prong, proportionality stricto sensu (balancing), 16 captures a balance between the satisfaction of one principle and the detriment to another. To capture this balance, Alexy (2003) developed a formula that combines several elements: the weight of the colliding principles in the abstract, the weight of these principles in the concrete case, and the reliability of empirical assumptions. 17 This formula establishes scales on the intensity of interference with rights (“light”, “moderate”, “serious”) and the importance of the right (e.g., the right to life has a greater abstract weight than freedom of expression, though the concrete weight has to be determined on a case-by-case basis). These scales are supposed to be justified by giving reasons in order to be contestable. Alexy’s formula assigns numerical values to each of the elements (Klatt & Meister, 2012), 18 which should then enable the decision maker to determine whether there is “significant disproportionality between the marginal benefit to the government and the marginal cost to the rights bearer” (Greene, 2018, p. 59). 19
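To make the structure of such a formula concrete, the following minimal sketch computes a ratio of the kind commonly presented as Alexy's weight formula: the product of intensity, abstract weight, and reliability on the side of the right, divided by the same product on the side of the competing goal. The numeric values are assumptions taken from the standard presentation of the formula (a geometric scale of 1, 2, 4 for "light", "moderate", "serious", and 1, 0.5, 0.25 for decreasing reliability); the function names, dictionary keys, and example case are hypothetical.

```python
# Assumed triadic scale (2^0, 2^1, 2^2) for intensity of interference / importance.
SCALE = {"light": 1, "moderate": 2, "serious": 4}
# Assumed reliability values (2^0, 2^-1, 2^-2) for the empirical premises.
RELIABILITY = {"reliable": 1.0, "plausible": 0.5, "not evidently false": 0.25}


def side(principle: dict) -> float:
    """Product of intensity, abstract weight, and reliability for one principle."""
    return (
        SCALE[principle["intensity"]]
        * principle["abstract_weight"]
        * RELIABILITY[principle["reliability"]]
    )


def weight_ratio(right: dict, goal: dict) -> float:
    """Ratio of the right's side to the goal's side."""
    return side(right) / side(goal)


def verdict(ratio: float) -> str:
    """Read the ratio: above 1 the right prevails, below 1 the goal prevails."""
    if ratio > 1:
        return "disproportionate"
    if ratio < 1:
        return "proportionate"
    return "stalemate"


# Hypothetical case: a serious interference resting on reliable premises,
# weighed against a moderately important goal with merely plausible premises.
right = {"intensity": "serious", "abstract_weight": 2, "reliability": "reliable"}
goal = {"intensity": "moderate", "abstract_weight": 2, "reliability": "plausible"}

ratio = weight_ratio(right, goal)
print(ratio, verdict(ratio))  # 4.0 disproportionate
```

The sketch shows why framing could matter at this stage: moving a single input one step on the scale (say, from "moderate" to "serious") doubles that side of the ratio, so a shift in how an interference or a goal is perceived can flip the overall verdict.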
Differences Across Jurisdictions
While all jurisdictions that apply PA go through somewhat similar steps, there are some differences in the fine-grained details (Greene, 2018; Grimm, 2007; Kremnitzer et al., 2020; Petersen, 2017), some of which are more important than others. For instance, Canada considers whether the purpose of the act is of a “pressing and substantial concern”, whereas Germany only requires a “legitimate purpose” (Grimm, 2007, p. 388). However, Germany then considers whether the purpose is sufficiently important as part of proportionality stricto sensu, such that the final outcome of whether the act is struck down is still quite similar in both countries. 20 Thus, although each country takes into consideration different factors in a different order, the relevant factors can arguably be found somewhere along the prongs of PA, often leading to de facto identical decisions. 21 In our experiment, we focus attention on the subjects’ stated overall conclusion of whether the law is proportional but also look at subjects’ responses to the questions about the steps (prongs) that precede the overall conclusion, thus making those fine-grained insights relevant for all jurisdictions, no matter how they use the prongs.
Furthermore, we submit that our study of PA with German judges offers relevant insights for American constitutional jurisprudence. At first glance, there are some methodological differences between U.S. judicial review and PA (see generally Cohen-Eliya & Porat, 2010, 2021; Greene, 2018; Lord, 2023). 22 In particular, the American approach, often characterized by Dworkin’s idea of ‘rights as trumps,’ is sometimes seen as categorical—suggesting that litigants either possess rights that override governmental action or they do not (Greene, 2018, p. 65; Lord, 2023, pp. 13–14). American constitutional review also uses a seemingly different process, applying levels of scrutiny (see, e.g., Bulman-Pozen & Seifter, 2023): (i) “rational basis” (for non-fundamental, non-suspect or quasi-suspect rights) that examines only if laws rationally relate to their aims; (ii) “intermediate scrutiny” that requires important interests and substantial means-ends relationships (e.g. Klein, 1984); 23 and (iii) “strict scrutiny” (for fundamental rights) that demands compelling state interests with narrowly tailored implementation. 24
Nevertheless, the American approach is not fundamentally different from PA. Like PA, American constitutional analysis often requires identification of a governmental purpose (e.g., an important interest under intermediate scrutiny or a compelling interest under strict scrutiny) and an ends-means analysis (e.g., “narrow tailoring” under strict scrutiny, or a rational relationship between the challenged law and a legitimate government interest under rational-basis review). We submit that much of our discussion on framing is relevant to U.S. constitutional review as well, including the formulation of important interests.
More generally, the discussion of PA connects to current U.S. legal debate through three main points. First, prominent legal scholars have advocated for the Supreme Court to adopt PA as an alternative to its current approach, with notable articles in the Harvard Law Review (Greene, 2018), Columbia Law Review (Bulman-Pozen & Seifter, 2023), and the Yale Law Journal (Jackson, 2015) all supporting this shift. Critics of the Dobbs 25 decision have also suggested PA could have yielded different outcomes.
Second, some scholars argue that proportionality is already implicitly used by some Supreme Court justices (Cohen-Eliya & Porat, 2011) and increasingly applied in other contexts, including content moderation by social media platforms (Douek, 2021). More broadly, as the idea of proportionality is, to some extent, already recognized in U.S. constitutional law (Jackson, 2004), 26 judges may employ elements of proportionality reasoning, 27 making our findings applicable even without formal adoption of PA. Third, the United States is subject to international law and may be a party to litigation taking place in tribunals that implement PA explicitly. Therefore, our findings are informative for understanding whether heuristics and biases are likely to emerge within such litigation.
Biases and Heuristics Relevant to Proportionality Analysis
Legal institutions have built on particular assumptions of what may be termed ‘perfect’ rationality, roughly conforming to Rational Choice theory (see, e.g., Chapman, 1994; Jolls et al., 1998), which assumes the rationality of adjudicators and other actors applying the law (implicitly or explicitly). The rationality assumption has been called into question by cognitive psychologists and behavioral economists, such as Daniel Kahneman, Amos Tversky, and Gerd Gigerenzer. 28 These scholars explored systematic biases and heuristics—“mental shortcuts” or “rules of thumb” used in decision-making—that counter the rationality assumption, searching for a more realistic model of human behavior. Subsequently, the value and validity of applying a rationalist theory to legal questions has also been questioned by a movement often referred to as “behavioral law and economics” (see generally Zamir & Teichman, 2018). Behavioral law and economics focuses on systematic divergences from perfect rationality mainly using experimental studies in the lab under controlled conditions. Yet PA itself has largely been untouched by the behavioral analysis of law.
Kahneman and Tversky mainly dealt with facts and elementary logic, demonstrating that heuristics sometimes lead to errors. Kahneman (2011) differentiates between a fast and a slow system of human decision-making (so-called “dual system” theory). 29 The first system (System 1) is intuition, and the second (System 2) is logical thinking and reasoning. Intuitive decisions occur quickly, automatically, simultaneously, and without effort; they are associative and emotional. This system is prone to cognitive errors (see, e.g., Morewedge & Kahneman, 2010). The second system is, by contrast, slow, controlled, rule-governed, flexible, and non-emotional. It requires effort. Human beings often switch between these two systems when they have reason to do so; for instance, when they become aware of earlier failures of their own doing (Kahneman, 2011; 2013). The tendency to rely on System 1 has mainly been measured using the Cognitive Reflection Test (“CRT”; Frederick, 2005), 30 which we also use in our experiment as a measure of individual proneness to fall prey to (System 1’s) intuitions. In the following paragraphs, we briefly explain the biases that might influence PA, with a special view to the type of subjects participating in our experiment.
In standard rational-choice models (the “expected utility” concept), as in Alexy’s weight formula in PA, the utility of each possible outcome is weighted by its (objective) probability. However, many experiments reveal that individuals may deviate from expected-utility maximization, for instance, because their utility function is ‘rank-dependent’, that is, people assign different weights to different outcomes, and these weights may diverge from the simple objective probabilities (for an overview, see Diecidue & Wakker, 2001). The most prominent version of such a function is so-called “Prospect Theory” (Tversky & Kahneman, 1979; 1992).
In a nutshell, Prospect Theory identifies three key effects: (i) reference-dependent utility, (ii) loss aversion, and (iii) diminishing sensitivity. Let us briefly explain each in turn. The first effect implies that individuals do not think of payoffs in absolute terms, but rather compare them to a baseline, a “reference point”. Payoffs that are above the reference point are perceived as a gain, whereas those below the reference point are perceived as a loss (Kőszegi & Rabin, 2006; 2007; for a summary, see Feess & Sarel, 2022; Sarel, 2022). For example, an employee who was expecting a large bonus at the end of the year but received a small bonus gets a positive sum of money, but may still perceive it as a loss because it is less than expected.
The second effect, loss aversion, relates to how people weigh losses vs. gains: Prospect Theory predicts that individuals care more about incurring a loss than an equally sized gain. In other words, the increase in utility from a 100 Euro gain is less than the increase in utility when averting a 100 Euro loss. Most studies suggest that losses are approximately twice as powerful, psychologically, as gains (see Brown et al., 2024).
The third effect, diminishing sensitivity, describes how individuals become less sensitive to changes in value as the magnitude increases. This means that the psychological impact of an additional dollar gained or lost decreases as the total amount grows larger (in absolute terms). Prospect theory illustrates this with an S-shaped utility function that is concave in the domain of gains and convex in the domain of losses—changes close to the reference point have a large marginal effect on utility (as the slope is steeper), whereas changes far away from the reference point have a much smaller marginal effect. The most important implication of this effect is a divergence in risk attitudes across domains: individuals are risk-averse in the domain of gains but risk-seeking in the domain of losses (see, e.g., Sarel, 2022; Shefrin & Statman, 2003). 31
Consequently, when the situation involves losses (compared to the reference point) individuals are more willing to take risky gambles and potentially incur large losses (to which they are marginally less sensitive), as long as they do not have to incur a loss with certainty. 32
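The three effects can be made concrete with a minimal sketch of a prospect-theory value function. The functional form and the parameter values (curvature 0.88, loss-aversion coefficient 2.25) are assumptions based on the commonly cited estimates in Tversky and Kahneman (1992); the function names and the example amounts are hypothetical, and probability weighting is deliberately left out to keep the illustration short.

```python
ALPHA = 0.88   # assumed curvature parameter (diminishing sensitivity)
LAMBDA = 2.25  # assumed loss-aversion coefficient


def value(x: float) -> float:
    """Subjective value of outcome x, measured relative to a reference point of 0."""
    if x >= 0:
        return x ** ALPHA                # concave in the domain of gains
    return -LAMBDA * (-x) ** ALPHA       # convex and steeper in the domain of losses


def prospect_value(outcomes) -> float:
    """Value of a gamble given as (probability, outcome) pairs, using raw probabilities."""
    return sum(p * value(x) for p, x in outcomes)


# Loss aversion: a loss of 100 weighs more heavily than a gain of 100.
print(abs(value(-100)) > value(100))      # True

# Risk aversion in gains: a sure 50 beats a 50% chance of 100.
gain_gamble = [(0.5, 100), (0.5, 0)]
print(value(50) > prospect_value(gain_gamble))   # True

# Risk seeking in losses: a 50% chance of losing 100 beats a sure loss of 50.
loss_gamble = [(0.5, -100), (0.5, 0)]
print(prospect_value(loss_gamble) > value(-50))  # True
```

Both gambles have the same expected monetary value as the sure option, so an expected-value maximizer would be indifferent in each comparison. The reversal between the two domains arises only from the curvature of the value function around the reference point, which is why describing the same case in terms of gains or in terms of losses can change the preferred option.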
Prospect theory is closely related to framing (see, e.g., Rachlinski & Wistrich, 2018), as the same situation can be either framed as a gain or as a loss, subsequently affecting how people perceive the payoffs from their decisions. Unlike Rational Choice Theory, which assumes description-invariance (that is, equivalent formulations of a choice problem should give rise to the same preference order; see Arrow, 1982), framing effects imply that logically equivalent presentations of a circumstance might nonetheless lead individuals to different choices. For instance, decisions may vary depending on whether circumstances are presented as positive or negative. Decisions about medical interventions are a typical example: A standard (rational choice) model would predict that patients would choose the most secure therapeutic method independent of how the choice is presented to them (for example, as death rates or survival rates) but framing effects imply that if a relatively safe therapeutic method is presented to a patient in terms of death rates (that is, potential loss), and an unsafe method is presented in terms of survival rate (that is, potential gain), then patients might suddenly prefer the unsafe over the safe method. 33 Interestingly, framing affects not only patients and their relatives, but also medical staff (i.e., experts; Druckman, 2004). 34 Thus, even expert decisions can be considerably influenced by factors that determine the manner in which a problem is presented, including the law (Rachlinski & Wistrich, 2018). In our context, this implies that adjudicators may reach different rulings, depending on whether different elements of the case are presented negatively or positively, as this can affect whether they perceive the legal outcome of the case as a gain or as a loss.
These perceptions may be related either to the judge’s own ‘payoffs’ from adjudicating (for example, judges may perceive being overturned on appeal as a loss, causing them to be more careful to avoid errors) 35 or to the payoffs of the litigating parties (which the judge may care about). The important point is that framing can distort a rational decision, working via System 1.
There are many biases documented in the behavioral economics literature that could potentially affect PA, including the availability heuristic, representativeness heuristic, certainty effect, anchoring effects (Tversky & Kahneman, 1973, 1974, 1981), 36 and the hindsight bias (Guthrie, 2006, p. 432). 37 However, some biases seem especially likely to arise in PA due to its sequential multi-prong structure. For instance, a judge who decides that a petitioner’s claim resides within the scope of a right takes a sort of mental step in the direction of declaring the act as disproportional. The judge may then proceed down that path (irrespective of the facts) due to either a psychological need for maximal coherence 38 or a psychological need for consistency (for a meta-analysis, see Mullen & Monin, 2016). The judge might also perceive a deviation away from the direction of the initial step as a loss, echoing the “endowment effect” (Kahneman et al., 1990; Knetsch, 1989). 39 However, our experiment largely focuses on framing effects related to prospect theory, which we examine in detail in the following section.
Hypotheses Development
Behavioral Effects in Proportionality Analysis
If all human beings succumb to biases and heuristics, including (as hitherto found in research) judges and arbitrators, it is interesting to explore whether these also affect PA. 40 While judges may derive some direct utility from their PA decisions, the decision primarily affects the parties to the dispute. It is not immediately obvious to what degree judges internalize the impact of their decisions on the parties: do judges care whether PA imposes a loss on the petitioner or on the defendant? There is some experimental evidence suggesting that biases such as the endowment effect may be weaker when decisions are taken by agents on behalf of third parties. 41 When judges conduct a PA, their decision is not precisely “on behalf” of the parties, but the same mechanism that weakens the bias may nonetheless arise.
In previous studies on judges and arbitrators (not in the context of PA), some biases and heuristics were indeed found to be present (Franck et al., 2016; Guthrie et al., 2001; Helm et al., 2016; Rachlinski & Wistrich, 2018; Spamann & Klöhn, 2016). Subjects were specifically found to be at least somewhat susceptible to framing effects: one study found that judges reached different decisions (in an analysis unrelated to PA), depending on whether the defendant’s payoffs in a civil case were framed as a gain or a loss (Guthrie et al., 2001), but were less susceptible to these effects compared to other decision-makers (experts and laypeople). Another study found evidence of framing effects among U.S. judges in eight experiments on civil-dispute settings (product liability, contracts, bankruptcy, and others; Rachlinski & Wistrich, 2018). 42
Yet it is possible that a framing effect only arises for some judges and not others, due to differences in institutional, cultural, or personal attributes. For instance, unlike U.S. judges, the German administrative judges who participated in our experiment have been educated in the civil-law tradition and are used to operating under the constraints of the German court system. This might entail various biasing or debiasing mechanisms that are absent in the U.S. 43 As one example, a civil-law judge who is more used to methodically applying codified rules might react differently to how a case is framed compared to a common-law judge who is more used to applying case law. 44
Whether PA is more or less susceptible to behavioral biases is far from straightforward. On the one hand, the need to structure one’s thought in steps might operate as a debiasing device, forcing the decisionmaker to exert more cognitive effort and potentially switch from System 1 to System 2. On the other hand, each step may be subject to biases, which may even spill over from one step to another.
For instance, consider the issue of uncertainty in PA. Following Alexy, all steps of PA (except the proper purpose) are based on, or partially influenced by, factual uncertainty. Inter alia, uncertainty in PA translates into probabilities, for instance, the probability that the means chosen will achieve the purpose of the act (as part of suitability) or the probability that a less intrusive measure will be as effective (as part of necessity). Furthermore, in proportionality stricto sensu, all weights are multiplied with probabilities.
When dealing with probabilities as part of PA, several heuristics and biases discussed above may play a role. First, when the legislator claims that P2 will be achieved with some probability of X%, this may already anchor the adjudicator to that X. Second, judges may misestimate or misperceive probabilities due to many of the aforementioned biases (availability heuristic, representativeness heuristic, and so on). 45 Third, judges may respond to how the facts or legal questions are framed. Henceforth, we will focus on this third channel, specifically on how framing connects with the effects identified by Prospect Theory (reference-dependence, loss aversion, and diminishing sensitivity).
Framing Uncertainty in PA: Reference Point and Diminishing Sensitivity
A legal case can be framed as involving either losses or gains, depending on which reference point it induces, and thereby influence whether the judge is more or less sensitive to the consequences of striking down a legal act. Specifically, judges would prefer a riskless prospect to a risky prospect of equal expected value in a gain frame, but prefer the opposite in a loss frame.
To understand how this applies to PA, consider that legal measures often involve uncertain outcomes—what we call “stochastic” effects, meaning the results involve some degree of randomness or probability rather than certainty. For instance, a security measure might reduce terrorist attacks by some unknown amount, or an environmental regulation might prevent an uncertain number of health problems. The question becomes: how do decision makers evaluate policies that convert these uncertain (stochastic) outcomes into more certain ones? We hypothesize that:
Decision makers tend to see an act as more proportional if it is framed as converting a stochastic gain into a certain gain rather than converting a stochastic loss into a certain loss, ceteris paribus.
In other words, when a policy is framed as securing uncertain benefits (like “this measure might prevent some attacks”), people are risk-averse and appreciate measures that make those benefits more certain. But when the same policy is framed in terms of uncertain losses (like “without this measure, some attacks might occur”), people become more accepting of that uncertainty—they are less motivated to eliminate the risk through government intervention. This difference in risk tolerance across gain and loss frames would cause decision makers to view the same policy as more or less proportional depending solely on how it is presented.
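The risk attitudes behind this hypothesis can be illustrated with a small numerical sketch. The value function below is the standard power form from prospect theory; the parameter values are the median estimates reported by Tversky and Kahneman (1992) and the stakes are hypothetical. Both are assumptions chosen for illustration only and are not estimated from, or used in, our experiment. Probability weighting is deliberately left out.

```python
# Illustrative prospect-theory value function (no probability weighting).
# ALPHA and LAMBDA are assumptions taken from Tversky & Kahneman (1992);
# they are not parameters estimated in this study.
ALPHA = 0.88   # curvature: diminishing sensitivity
LAMBDA = 2.25  # loss-aversion coefficient


def value(x: float) -> float:
    """Subjective value of an outcome x measured relative to the reference point."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** ALPHA)


def prospect_value(outcomes: list[tuple[float, float]]) -> float:
    """Probability-weighted sum of subjective values for (probability, outcome) pairs."""
    return sum(p * value(x) for p, x in outcomes)


# Hypothetical stakes: a sure 50 versus a 50% chance of 100 (same expected value).
sure_gain = prospect_value([(1.0, 50.0)])
risky_gain = prospect_value([(0.5, 100.0), (0.5, 0.0)])
sure_loss = prospect_value([(1.0, -50.0)])
risky_loss = prospect_value([(0.5, -100.0), (0.5, 0.0)])

prefers_certainty_in_gains = sure_gain > risky_gain
prefers_gamble_in_losses = risky_loss > sure_loss
print(prefers_certainty_in_gains, prefers_gamble_in_losses)  # True True
```

The reversal is driven by the curvature parameter alone: the function is concave over gains and convex over losses, so the certain option is valued above the gamble in the gain frame and below it in the loss frame. The loss-aversion coefficient scales both loss values equally and therefore does not affect the comparison within the loss frame.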
Framing the Legal Act’s Purpose: Reference Point and Loss Aversion
Whether a purpose of the challenged measure is perceived as “proper” may also depend on how it is framed. The formulation of governmental purposes can vary along several dimensions that may influence their perceived legitimacy and weight. Purposes can be presented abstractly (such as “environmental protection”) or concretely (such as “preventing 10,000 premature deaths annually”). They can focus on collective benefits or highlight the plurality of individuals who would gain from the measure. Additionally, purposes framed as addressing highly salient concerns like national security may naturally receive more weight from decision-makers.
Most relevant to our study is how purposes can be framed in terms of gains versus losses. The same governmental objective can be stated either as avoiding a negative outcome (highlighting what would be lost in the absence of the measure) or as securing a positive benefit (highlighting what would be gained from the measure). For instance, an anti-terrorism measure can be framed either as “averting losses from terrorist attacks” (loss frame) or as “promoting a more secure society” (gain frame). Similarly, a public health intervention can be presented as “avoiding the dangers of poor health and death” (loss frame) or as “promoting healthy lifestyles” (gain frame). Such framings can potentially affect the judges’ reference point, which again determines whether the outcomes are perceived as a loss or a gain.
Given people’s tendency toward loss aversion—caring more about preventing losses than achieving equivalent gains—we expect that purposes framed as preventing negative outcomes will seem more compelling and legitimate than those framed as achieving positive outcomes. We therefore hypothesize:
Decision-makers tend to see an act as more proportional if it is framed such that its purpose is to prevent a loss rather than to achieve a gain, ceteris paribus.
Individual Susceptibility to Framing: the Effect of Legal Training
In addition to the general influences of behavioral effects, our study also focuses on the role of experience and expertise. From an economic standpoint, specialization entails the exploitation of comparative advantages: those who become legal experts can use their time more effectively and incur lower effort costs while working on legal questions (see, e.g., Rachlinski et al., 2007). The traditional perspective in common law views legal expertise more as a superior ability to conduct analogical reasoning by “recognizing a similarity between the facts of some previous case and the facts of the instant case” (Schauer & Spellman, 2017, p. 249). Thus, the idea is that lawyers and judges are able to engage in analogical reasoning that differs from that of laymen. Schauer and Spellman (2017, p. 261) argue, however, that lawyers are not experts in analogical reasoning as such, but rather that they possess an ability to “see analogies that others do not and […] see structural and relational similarities (and differences) when others see only surface similarities and differences.” In other words, the expertise of lawyers lies in their ability to retrieve relevant sources of comparison within the legal domain and identify similarities (Schauer & Spellman, 2017, p. 263). This ability is generated through “immersion in legal categories – through study or practice or both” (Schauer & Spellman, 2017, p. 264).
But how does such legal expertise affect the susceptibility to cognitive biases? Does the ability to identify similarities mean that legal experts would be able to distinguish the facts of the case from how they are framed? Given the aforementioned findings in the literature that legal experts might be less susceptible to framing effects, we hypothesize that:
Legal expertise mitigates the framing effects.
Specifically, we expect that framing effects will be strongest among non-law students, weaker among law students, and even weaker (or non-existent) among judges. This reflects the notion that expertise, knowledge, and experience should help avoid the effects of biases due to framing. 46 To be clear, we do not claim that this reflects a cardinal scale: the difference in expertise between law students and judges may be drastically larger than the difference between law students and non-law students. We only assume that there is an ordinal ranking of expertise, such that judges have more expertise than law students, who in turn have more expertise than non-law students.
Experiment
We designed an online experiment to test our four hypotheses. Sub-part 5.1. describes the experimental design. Sub-part 5.2. outlines our procedures. Sub-part 5.3. presents descriptive statistics and our findings.
Experimental Design
Our experiment presents subjects with three vignettes, each containing a brief legal case, followed by a series of questions that capture the various prongs of PA. We developed two versions for each vignette that differ only in framing, each tailored to test (at least) one of the possible biases and heuristics reviewed above. As subjects only see one version of each vignette, we can attribute any differences in the decisions to a framing effect. Furthermore, all three vignettes are purposefully built around novel problems (mostly dealing with new technologies) to avoid a situation where a subject’s familiarity with some existing cases would potentially affect their decision.
While the choice to use novel problems may somewhat detract from the vignette’s realism (as subjects have not faced such scenarios before), we submit that this is irrelevant for our study, for multiple reasons. First, the exact degree of realism is generally irrelevant because it is completely orthogonal to our framing treatments (for a discussion of how orthogonality in vignettes ensures validity, see Su & Steiner, 2020). In other words, as the degree of realism remains fixed across our treatments, which differ only in framing, it does not matter how realistic the underlying scenario is. Second, many of the existing experiments on framing include scenarios that lack realism, 47 and our vignettes do not raise any unusual difficulties in that regard. 48 Third, what may seem implausible in other countries may not be implausible in Germany or Europe. 49 The vignettes are presented to subjects in a randomized order, so that we can rule out any interference of ‘order effects’. For ease of presentation, however, we number the vignettes.
The vignettes are all written in German, but a full-text English translation is provided in our supplemental materials.
Vignette 1: Cryptocurrencies. Our first vignette tests our H1 (recall: this hypothesis deals with diminishing sensitivity and risk attitudes, that is, with the framing of the consequences of the disputed law as either a certain loss or a certain gain). Subjects are presented with a scenario where a fictitious state requires individuals who trade in cryptocurrencies to acquire a “laymen’s certificate” as a pre-condition for selling. The claim of the state in this scenario is that this certificate helps sellers to avoid scams that otherwise would cause some people to lose their money. Our treatment then lies in the framing of the state of the world with the act versus without the act (that is, if it is nullified by the court). The treatment closely follows the approach of Tversky and Kahneman (1981) in their seminal article on the framing of decisions (see also Rachlinski & Wistrich, 2018). In their original article, subjects had to choose between two options that yield the same expected outcome, which is framed either as a gain (lives saved) or as a loss (deaths), where one option is stochastic whereas the other entails certainty. Following the same logic, our vignettes say that there are 600,000 market participants, and differ only in the description of the two options faced by the subjects. For instance, the gain frame says that: “(a) Without the act that requires a laymen certificate, it is estimated that with 1/3 probability no market participant will keep their money, and with 2/3 probability all of the market participants will keep their money. (b) With the act that requires a laymen certificate, 400,000 market participants will keep their money.”
Whereas the loss frame says that: “(a) Without the act that requires a laymen certificate, it is estimated that with 1/3 probability all of the market participants will lose their money, and with 2/3 probability none of the market participants will lose their money. (b) With the act that requires a laymen certificate, 200,000 market participants will lose their money.”
Note that the only difference is in the use of “keep” vs. “lose”, but numerically both versions are identical. Following H1, we expect that in the loss-frame treatment, subjects will tend to avoid option (b) because it refers to a certain loss, and prefer the stochastic option (a). Conversely, the opposite should occur in the gain-frame treatment.
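The numerical equivalence of the two frames can be checked directly. The sketch below uses only the figures quoted in the vignette (600,000 participants; probabilities of 1/3 and 2/3; 400,000 keeping or 200,000 losing their money); the function and variable names are ours and purely illustrative.

```python
from fractions import Fraction

PARTICIPANTS = 600_000  # market participants stated in the vignette


def expected_count(lottery):
    """Expected count for a list of (probability, count) pairs."""
    return sum(p * n for p, n in lottery)


# Gain frame: counts are participants who KEEP their money.
gain_without_act = [(Fraction(1, 3), 0), (Fraction(2, 3), PARTICIPANTS)]
gain_with_act = [(Fraction(1), 400_000)]

# Loss frame: counts are participants who LOSE their money.
loss_without_act = [(Fraction(1, 3), PARTICIPANTS), (Fraction(2, 3), 0)]
loss_with_act = [(Fraction(1), 200_000)]


def losers_to_keepers(lottery):
    """Restate a lottery over losers as a lottery over keepers."""
    return [(p, PARTICIPANTS - n) for p, n in lottery]


# Both frames describe the same lotteries once expressed in a common unit.
same_without_act = sorted(losers_to_keepers(loss_without_act)) == sorted(gain_without_act)
same_with_act = losers_to_keepers(loss_with_act) == gain_with_act

# Within each frame, the stochastic and the certain option share one expected value.
same_expected_value = expected_count(gain_without_act) == expected_count(gain_with_act)
print(same_without_act, same_with_act, same_expected_value)  # True True True
```

Exact fractions are used instead of floating-point numbers so that the comparison of expected values is not affected by rounding.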
Vignette 2: Bees. Our second vignette tests our H2 (recall: this hypothesis deals with the framing of the purpose of the act as preventing a loss or achieving a gain). Subjects are presented with a scenario where a fictitious state mandates the use of a special spray in gardens of private residences (house or apartment). In the gain-frame treatment, the purpose of the act is framed as aiming to “protect the life of bees and thus lead to a healthy and sustainable environment”, whereas the loss-frame treatment says instead that the purpose of the act is to “prevent the death of bees and thus avoid an unhealthy and unsustainable environment”.
Thus, the only difference is in whether the purpose of the law is framed as acquiring a gain or preventing a loss. Following our H2, we expect that subjects will be more likely to classify the act as proportional in the loss frame treatment.
To avoid confusion, note the following key difference between Vignette 1 and Vignette 2: in the first vignette, the loss-frame treatment refers to a legal act that may induce a certain loss (instead of a stochastic one) whereas in the second vignette, the loss-frame treatment refers to a legal act that may avert a loss. That is why we anticipate judges in a loss-frame treatment to be more supportive of the legal act in Vignette 2 but less supportive in Vignette 1.
PA prongs. To facilitate a full-blown PA, we elicit subjects’ agreement (on a 7-point Likert scale) with the following statements: 50
1. The act meets the scope of the right. 51
2. The act meets the core of the right.
3. The act promotes a proper purpose.
4. The act is suitable to achieve its purpose.
5. The act is necessary to achieve its purpose. 52
6. The act is proportional in the strict sense. 53
7. The act is overall proportional.
Our main variable of interest is the seventh item—an overall judgment of proportionality. However, we also measure the other prongs to get a better understanding of the mechanisms leading subjects to their judgment. Additional measurements are described below.
Procedures
Our experiment was conducted separately on two samples: students (law/non-law) and judges. For the student sample, we imposed a minimum sample size requirement to ensure sufficient statistical power. For the judges, we simply ran the experiment on all those who participated. Our supplemental materials contain a brief power analysis, showing that our sample is of sufficient size to detect even fairly small effects.
We ran the experiment first with the student sample, which included law students and non-law students. The experiment was conducted online during the months of September-October 2021 using the standard software Qualtrics. 54 Subjects were invited to participate directly through the social science lab at the University of Hamburg, restricting the invitation to students only, and imposing a comparable group size of law students and non-law students. We asked subjects to complete the survey in one go and most answers (over 90%) were completed within 1 hour from the beginning of the survey. 55 Subjects were paid a fixed fee of 9 EUR for their participation. After completing a consent form, subjects proceeded to answer the three vignettes (our vignettes 1 and 2, and the additional vignette described in Appendix B, in a randomized order, as mentioned). Thereafter, subjects were asked to complete the aforementioned CRT 56 (in which the items were again presented in random order) and then filled out a brief questionnaire. The questionnaire measured variables such as basic demographics (for example, gender, age-cohort, mother tongue, education; see our supplemental materials for full details), whether the subject had previous acquaintance with behavioral economics, and a question about the general tendency to take risks in order to measure risk aversion on a 10-point scale (Dohmen et al., 2010). In total, 110 law students and 121 non-law students participated.
Thereafter, we ran the experiment with German administrative judges. Procedures were similar to those used for the student sample, with a few exceptions: First, judges were invited to participate through the President of the highest administrative court of the state of Lower Saxony (Niedersachsen), in conjunction with an invitation to a judicial workshop, which took place in June 2022. All judges were from the same federal state within Germany: Lower Saxony. Second, for obvious reasons, judges did not receive any payment for their answers. Third, judges were asked a few additional questions related to their experience on the bench and several questions that tested whether they fall prey to biases outside of their domain of expertise. We describe these additional questions in more detail in the sub-part on “Are Judges Susceptible to Behavioral Effects Outside Their Domain of Expertise?” below. We also introduced minor changes for practical reasons. 57 Importantly, all of the additional questions were asked only after the vignette study was completed, so they could not confound our results. Answers from judges were gathered in the months leading up to the judicial workshop (February – April 2022), yielding 86 valid responses. 58
Descriptive Statistics and Findings
Table 1. Comparison of Descriptive Statistics
Note. This table compares the descriptive statistics between the three subject groups. We used the following statistical tests: Pearson’s
Proportionality Assessment
Figure 1 compares the means of subjects’ choices for Overall Proportionality in the two main vignettes across the subject groups (for our additional vignette, see Appendix B). Table 2 complements the relevant information for this comparison, presenting a statistical comparison of all seven PA items. 60 This comparison yields several insights.
Figure 1. Comparison of Mean Agreement Rates with Overall Proportionality
Note. This figure compares the mean rate of agreement with the statement that the act is overall proportional. The dark gray bars correspond to the loss frames, whereas the light gray bars correspond to the gain frames. The lines on top of the bars represent 95% confidence intervals.
Table 2. Descriptive Statistics of the Different Prongs of PA
Note. This table compares the agreement rates with the seven PA prongs. For each prong, the median (IQR) is presented for each of the treatments, separated by group. Within each group, a between-treatment comparison is conducted using a Wilcoxon rank-sum test, yielding the p-value listed in the table. Note that a higher IQR corresponds to the group whose values are (significantly or insignificantly, depending on the p-value) higher.
First, in the crypto vignette, both law and non-law students express a higher agreement rate with the statement that the act is overall proportional in the gain-frame treatment (light gray bars) compared to loss-frame treatment (dark gray bars). This difference is statistically significant at the 1% level in the non-law group and 5% level in the law group. 61 Judges demonstrate the opposite trend, but the difference becomes insignificant once control variables are accounted for (see below). Thus, for the students, but not the judges, the behavior is in line with our H1: the loss frame induces subjects to see the act as less proportional when it is framed as converting a stochastic loss to a certain loss.
Second, in the bees vignette, we find stark differences: we find a framing effect for non-law students, but no effect for law students or judges. Hence, our H2 seems to hold for non-law students, but not for law students and judges. The direction of the effect on the non-law students is as expected: these subjects tend to view the act as more proportional when its purpose is framed as preventing a loss. Turning to Table 2 reveals that, interestingly, the framing affected not only the proper purpose prong but almost all prongs: in the loss-frame treatment, non-law subjects were also more likely to view the act as meeting the scope of the right, and as unsuitable, unnecessary, and unbalanced. Furthermore, some of the judges’ prongs seem to be affected—those referring to the scope of the right and suitability (in the same direction as non-law students) and the prong concerning the core of the right (for which non-law students were unaffected).
As a robustness check, we also run linear (OLS) regressions, which enable us to compare the groups while also controlling for underlying differences between the subjects (for example, their previous knowledge of behavioral economics). Our regression model is then
Table 3. Effect of Framing (Average Marginal Effects)
Note. This table presents (average) marginal effects of the framing treatment on each of the three subject groups: non-law students, law students, and judges. Robust standard errors are in parentheses. OLS coefficients can be found in Table A2 in Appendix A. Control variables are: Female dummy, age, German mother tongue, Knowledge of Behav. Econ., CRT: Correct answers, and order effects. *p < .1; **p < .05; ***p < .01.
Table 3 suggests that our findings are robust to the inclusion of controls (that is, even after we control for differences between subjects, the results persist). Starting from the crypto vignette (column 1), the effect (of being assigned to the gain frame) is positively significant for both non-law (p < .01) and law (p < .05) students, but insignificant for judges. Panel B complements the information by showing that judges indeed differ from the students (as they are unaffected whereas students are affected) but also that law and non-law students are equally affected (the difference of 0.077 is insignificant). Next, in the bees vignette (column 2), the effect on non-law students is negatively significant (p < .001) but the effect on judges and law students is insignificant. Summing up, our OLS regressions reinforce all of the aforementioned findings.
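For readers who wish to see how group-specific marginal effects of this kind can arise from a single regression, one specification consistent with the note to Table 3 is sketched below. It is an assumption offered for illustration, not a reproduction of our estimated model: the symbols, the choice of non-law students as the reference group, and the coding of the framing dummy as 1 for the gain frame are all hypothetical notation.

```latex
% Illustrative specification (assumed notation, not the estimated model):
% Y_i : agreement with "the act is overall proportional" (7-point scale)
% F_i : 1 if subject i saw the gain frame, 0 if the loss frame
% L_i : 1 if law student;  J_i : 1 if judge  (non-law students as reference)
% X_i : controls listed in the table note
\begin{align*}
Y_i &= \beta_0 + \beta_1 F_i + \beta_2 L_i + \beta_3 J_i
      + \beta_4 (F_i \times L_i) + \beta_5 (F_i \times J_i)
      + \gamma^{\top} X_i + \varepsilon_i \\[4pt]
\text{Effect of framing, non-law students:} &\quad \beta_1 \\
\text{Effect of framing, law students:}     &\quad \beta_1 + \beta_4 \\
\text{Effect of framing, judges:}           &\quad \beta_1 + \beta_5
\end{align*}
```

Under this sketch, the between-group comparisons correspond to the interaction coefficients: the difference between law and non-law students is \beta_4, and the difference between judges and non-law students is \beta_5.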
Are Judges Susceptible to Behavioral Effects Outside Their Domain of Expertise?
As mentioned, we also set out to measure whether the German administrative judges, who are experts on PA, fall prey to biases outside their domain of expertise. This measurement included three tasks, all of which were presented after all vignettes had been completed. Thus, these measurements could not confound our results.
The first task was the CRT, which we also presented to the other subjects in our sample. Recall from Table 1 that judges answered approximately 2 (out of 3) questions of the CRT correctly, which means they are at least somewhat susceptible to a behavioral bias (namely, they choose, on average, one answer that is incorrect). The second task relates to overconfidence in judicial decision-making outside the experiment and, therefore, could be presented to judges but not to the other subjects. The behavioral economics literature reveals that people are generally prone to overoptimism and overconfidence (Guthrie et al., 2001). In particular, people tend to believe that they are relatively better than others (a “better than average” effect; Alicke, 2005). The potential for overconfidence can have important implications for judges, as it could “prevent judges from maintaining an awareness of their limitations . . . [and] may make it hard for judges to recognize that they can and do make mistakes” (Guthrie et al., 2001, p. 811).
To measure overconfidence, we asked the judges to rate their relative performance across four judicial parameters: the rate of being overturned (OCapprate), their ability to assess witnesses (OCwitness), their procedural efficiency (OCefficiency), and the degree of justice that their decisions yield (OCjustice). Specifically, we asked them to place themselves in one of four quartiles: (1) the top 25%, (2) the second quartile (25%–50%), (3) the third quartile (50%–75%), or (4) the bottom 25%. The answers given by the judges are depicted in Figure 2. The figure shows that most judges placed their ability in the second quartile and over 13% placed themselves in the highest quartile (exact numbers are provided in Table A3 in Appendix A). Of course, by definition, only 50% of judges can actually belong in the two upper quartiles. Hence, this points to overconfidence.
Figure 2. Overconfidence Among Judges
To provide more context as to the degree of overconfidence, Tables A4 and A5 in Appendix A compare the judges in our experiment with other comparable subjects in the existing literature, such as international arbitrators and U.S. administrative law judges, in terms of performance in the CRT and overconfidence measures. These tables reveal that the overconfidence trend among the German judges seems overall similar to previous experiments, where the only conspicuous difference is that German judges tend to classify themselves as belonging to the second quartile (and less so to the first quartile) more often. Thus, although they are overconfident overall, they are more modest than international arbitrators and U.S. administrative law judges.
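The 50% benchmark behind the overconfidence measure can be made concrete with a short sketch. The counts below are hypothetical and are not the responses of the judges in our experiment (those are reported in Table A3 in Appendix A). The exact binomial test is likewise an illustrative choice, not a test reported in this paper, and it rests on the simplifying assumption that, absent overconfidence, each respondent would place themselves in the upper half with probability 0.5.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))


# HYPOTHETICAL self-placements for one parameter (not the study's data):
# number of respondents choosing quartile 1 (top) through quartile 4 (bottom).
placements = {1: 10, 2: 50, 3: 30, 4: 10}

n = sum(placements.values())
upper_half = placements[1] + placements[2]
share_upper_half = upper_half / n

# One-sided test of "no more than half place themselves in the upper half".
p_value = binomial_upper_tail(upper_half, n)
print(f"{share_upper_half:.1%} place themselves in the upper half")
```

A share well above 50% with a small one-sided p-value would be read as a "better than average" pattern in the sense described above.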
Lastly, we presented judges with a classical problem in behavioral economics: the “Linda problem” (Tversky & Kahneman, 1983). In a nutshell, in the Linda problem, subjects are presented with a brief description of a female student, who was active in social causes during her studies and was also outspoken and very bright. 64 The subject then needs to evaluate which of several descriptions of this person’s future is more likely. Among the options is one case that “sounds” correct because it is representative (for example, the woman will be a bank teller who is also active in the feminist movement) but actually captures a subset of another statement (for example, “Linda works at the bank and does activity X” is a subset of “Linda works at the bank”). In our experiment, subjects received eight statements about the woman and were asked to rank them from most likely to least likely. 65 Subjects who mark the special case as more likely than the general case in this task fall prey to a so-called “conjunction fallacy” (mistakenly thinking that the co-occurrence of two events is more likely than the occurrence of only one of them). Such a fallacy occurs because of a mental shortcut known as the “representativeness heuristic”, a tendency to make decisions based on prototypes or stereotypes that seem representative, while ignoring statistics.
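The classification rule for this task can be stated compactly: a subject commits the fallacy whenever the conjunction is ranked above its own component. The sketch below illustrates that rule. The statement labels and the two example rankings are hypothetical placeholders, not the wording or the responses from our experiment.

```python
def commits_conjunction_fallacy(ranking, conjunction, component):
    """Return True if `conjunction` is ranked as more likely than `component`.

    `ranking` is ordered from most likely (index 0) to least likely.
    Because P(A and B) <= P(A), such an ordering is a logical error.
    """
    return ranking.index(conjunction) < ranking.index(component)


# HYPOTHETICAL labels for eight statements; not the wording used in the experiment.
ranking_a = ["activist", "teller_and_activist", "teacher", "bookseller",
             "teller", "social_worker", "insurance_agent", "league_member"]
ranking_b = ["activist", "teacher", "teller", "bookseller",
             "teller_and_activist", "social_worker", "insurance_agent", "league_member"]

fallacy_a = commits_conjunction_fallacy(ranking_a, "teller_and_activist", "teller")
fallacy_b = commits_conjunction_fallacy(ranking_b, "teller_and_activist", "teller")
share = sum([fallacy_a, fallacy_b]) / 2
print(fallacy_a, fallacy_b, share)  # True False 0.5
```

Only the relative position of the two critical statements matters; the ranks given to the remaining six statements do not affect the classification.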
The judges in our experiment exhibited the conjunction fallacy to a great extent, with 79.2% falling prey to it. Again, to put this in context, one study found that 92% of arbitrators fall prey to the conjunction fallacy (Helm et al., 2016). However, another (earlier) study found that less than 42% of German students demonstrated the fallacy (Fiedler, 1988). In comparison, judges in our experiment demonstrate a moderate susceptibility to the conjunction fallacy. Nonetheless, the key insight is that we do find evidence of susceptibility to overconfidence and the conjunction fallacy outside of the judges’ domain of expertise.
Discussion
General
In this study, we set out to check whether PA, which is usually considered a rationality-based process by its proponents (Alexy, 2017; Brems & Lavrysen, 2015; Möller, 2012; Popelier & Van De Heyning, 2013), is influenced by the biases and heuristics established in the behavioral economics literature. Our experiment focused on framing effects, where we let subjects conduct a PA in three vignettes (the two main vignettes and the additional vignette described in Appendix B) in which we manipulated the framing of the case. We then compared the answers of our three groups of subjects: non-law students, law students, and judges.
The findings overall support the conjecture that framing effects can play a role in PA. In particular, we find strong evidence of a framing effect when the act was framed as inducing a certain loss rather than as a certain gain. This effect was prevalent both for those with some degree of legal expertise (law students) and laymen (non-law students), but not for judges. Yet, in our bees vignette, neither judges nor law students were influenced by the framing, whereas the non-law students were. This seems to suggest that legal expertise can mitigate (or even eliminate) the framing effects in some contexts. The diminished framing effects among those with legal expertise are consistent with the findings of previous experiments that measured framing in other contexts (that is, not on PA; see, e.g., Broude & Levy, 2019; Mizrahi, 2018; Shereshevsky & Noah, 2017). Thus, we show that this insight extends to PA as well, which is especially interesting since one of the criticisms of PA has been that it opens a gate to subjective evaluations of the decision-maker (see, e.g., Gunn, 2005; Kaplow, 2019). We can thus show that certain behavioral biases can affect PA as a discursive tool for balancing decisions but also that legal expertise can mitigate the biases. The fact that judges are found to be less biased is “good news” for the proponents of PA, as it suggests that framing is less likely to influence the judgments in actual court cases. For further insights arising from the additional vignette, see our Appendix B.
At the same time, the findings also suggest that legal expertise (of judges) does not mitigate biases outside the domain of expertise: First, judges fared only slightly better than the students on the CRT, meaning they tended to answer intuitively, and wrongly. This is in line with other findings on judges and international arbitrators (see Table A4 in Appendix A). Second, the classical conjunction fallacy was found with judges, tested in a neutral, non-judicial setting. Third, judges were overconfident in their abilities as judges on several variables. The latter finding is in line with other studies administered to U.S. judges and international arbitrators on overconfidence in their ability to assess witness credibility, make quality decisions, provide parties with procedural efficiency, and avoid being challenged on appeal (see Table A5 in Appendix A). The judges in our experiment demonstrated overconfidence, especially in whether they delivered justice, but to a lesser degree: they put themselves predominantly in the second highest quartile and not in the highest, as international arbitrators and U.S. judges did in previous studies.
Application to American Balancing
Recall that several existing studies indicated that U.S. judges do exhibit susceptibility to framing effects, but in experiments conducted only in non-administrative and non-constitutional legal contexts. 66 Thus, we do not know whether U.S. judges are susceptible to framing in a setting similar to our study. The uncertainty here is twofold: First, U.S. judges may be inherently different from German judges (e.g., due to differences in legal tradition or training) and therefore differ in susceptibility. Second, the different structure of U.S. constitutional review, which focuses on categories and tiered scrutiny, may make any judge either more or less susceptible to biases compared to PA. Yet, despite these differences, our findings are still informative, with the necessary modifications, for American judicial review. Recall that scholars have noted at least three reasons why the U.S. should care: a growing call to adopt PA in the U.S., the implicit application of PA-like reasoning (balancing) in existing court cases, and the application of PA in international law cases. Furthermore, given the claims that PA and US-style review do not yield substantially different results anyway, the difference may be one of degree rather than kind.
In particular, we submit that our two main vignettes are also of direct relevance to the United States. First, the uncertainty element (as in the cryptocurrency vignette) and the idea of diminishing sensitivity are equally present in US-style balancing. It is possible, for instance, that if upholding the act were framed as preventing a certain loss, judges would be more prone to applying a more lenient review standard (for example, rational basis instead of intermediate review) or would apply the standard itself differently (for example, concluding that even under strict scrutiny, the law should be upheld). 67 Second, purpose framing (as in the bees vignette) has particular significance, as scrutiny levels in the US depend on legislative purpose.
Limitations
While our study arguably provides an important contribution, it is also subject to some limitations. First, we cannot formally distinguish between legal training and (self-)selection into the legal profession in general and into the judiciary in particular. Although law students and judges may, in theory, hold some “special traits” that make them less susceptible to framing compared to non-law students, those special traits would only be relevant under very specific conditions. In particular, such traits would have to explain why judges differ from the general pool of law students (some of whom would likely become judges) in a way that happens to be correlated with susceptibility to framing. A much simpler explanation for our finding, following the principle of “Occam’s Razor,” would be that legal expertise and experience do mitigate the effect of framing. Because German administrative judges must engage in proportionality analysis on a regular basis, they seem highly likely to face different frames of similar facts in their line of work. Moreover, lawyers presumably try to present the version of the facts most favorable to their client using, inter alia, framing. Thus, a good judge needs to “filter out” the description and focus on the case in order to avoid mistakes. This experience may help judges develop expertise that enables them to circumvent the effects of framing. The same might hold, to a lesser extent, for law students, who face case studies in their legal training.
Second, as our sample consists only of German subjects, we cannot rule out that the effects may differ in other jurisdictions, particularly in common-law countries. A recent cross-country experimental study found no effects between common law and civil law judges in sentencing decisions (Spamann et al., 2021), and a similar study on PA might need to be conducted to confirm whether our results are generalizable to other jurisdictions.
Third, our sample consists of three representative groups along a continuum of legal expertise: non-law students, law students, and (administrative) judges. There are, of course, other groups along the spectrum, including non-student laymen, lawyers, legal interns, paralegals, law professors, and other judges. However, these groups may differ from one another in many other features apart from expertise (for example, in the level of education, income, or analytical skills). While future studies may benefit from looking at such differences, we believe that our study design enables us to look at the most relevant control groups.
Finally, a general difficulty with framing is that it is not binary: one may frame the same set of facts in many different ways. Our ceteris paribus setting allows us to contrast two versions for each vignette, but of course one could introduce others as well (for example, by making some points more or less salient). We leave such endeavors for future research.
Conclusion
The rationality assumption in adjudication, whereby interpreters decide objectively and without cognitive errors, has long been challenged on numerous accounts, both theoretically and empirically. Legal theorists have often articulated their suspicion that interpretation is the result of the result: interpreters tend to confirm their initial intuition through interpretation. Cognitive studies can help illuminate that suspicion. Experiments on the psychology of judging demonstrate that cognitive biases and heuristics affect the decision-making of national court judges and international arbitrators, such that their decision-making process may deviate from perfect rationality.
We contribute to this discussion in several ways. First, we provide reassuring evidence that professional judges conducting PA are largely protected from framing effects that significantly influence other decision-makers. This finding challenges concerns about judicial susceptibility to cognitive biases in constitutional review and supports the practical rationality of PA when applied by trained legal experts. Our findings demonstrate that legal expertise serves as an effective debiasing mechanism within judges’ professional domain. While non-law students showed substantial framing effects and law students showed intermediate effects, judges remained largely unaffected by the same framings in their PA judgments. However, we also find that judges remained susceptible to biases outside their field of expertise. Thus, judges seem to be able to override their intuitions, but only when deciding in their professional context. Second, we enrich the discussion on the rationality of proportionality analysis. Although we show that proportionality analysis can be prone to framing among non-experts, these framing effects follow predictable patterns consistent with established behavioral theories. Third, we add to the more general discussion on whether experiments with students allow inferences about the decision-making of professionals. Although many studies have found cognitive biases in expert adjudicators, there have also been studies showing that biases may be generally diminished in experts compared to other populations (see, e.g., Broude & Levy, 2019; Mizrahi, 2018; Shereshevsky & Noah, 2017). Our results are in line with these findings, as we find that legal expertise mitigates framing effects. This seems important for our understanding of constitutional review as it is practiced today: we should be aware of the role of framing effects on the one hand, but can be less concerned about their influence when judges are experts on the other.
Furthermore, our findings suggest that constitutional cases should be allocated to specialized judges (as is often the case anyway, whenever cases are submitted to a specialized constitutional court) and that training and experience could potentially debias judges who are insufficiently specialized. Determining the costs and benefits of such interventions requires further study, but our findings are nonetheless important in identifying a potential debiasing mechanism. Overall, our findings provide empirical support for the rationality of PA as practiced by professional judges, while highlighting the importance of legal expertise in constitutional adjudication.
Supplemental Material
Supplemental Material for Framing Effects in Proportionality Analysis: Experimental Evidence by Anne van Aaken and Roee Sarel in Journal of Law & Empirical Analysis
Footnotes
Author Note
A theoretical precursor of this article with comparative constitutional law and more biases and heuristics covered can be found in van Aaken (2019). An older working paper version is available at
. The experiment is registered on osf.io.
Acknowledgements
We are grateful for the invitation to an expert workshop in Paris in March 2016 organized by the Israel Democracy Institute with the support of the European Research Council and for the helpful comments by the participants. On the theoretical part, we would also like to thank Mattias Kumm, Samuel Issacharoff, and Aharon Barak for helpful comments on the article, as well as the participants of the Global and Comparative Public Law Colloquium at NYU (2016), a workshop at Humboldt University (2017), the ILE Jour Fixe in Hamburg (2021), panels at ICON-S (2021 and 2022), the European Association of Law & Economics Conference (2022), the French Law & Economics Association Conference (2022), the Conference on Empirical Legal Studies (2022), and the FUELS seminar (2023). For the empirical part and other helpful comments, we would like to thank Christoph Engel, Eberhard Feess, Sven Hoeppner, Niels Petersen, Holger Spamann, Roseanna Sommers, Doron Teichman, and Justus Vasel. We would also like to thank the editor, Dan Klerman, and several anonymous referees for their helpful comments. We are grateful to the Administrative Judges Association in Lower Saxony (Niedersachsen), Germany, for permitting us to conduct the experiment. We gratefully acknowledge funding from the Alexander von Humboldt Foundation.
Ethical Considerations
The experiment received ethics approval from the ethics committee of the economics faculty (WISO research lab) of the University of Hamburg.
Consent to Participate
Written informed consent to participate was obtained, both generally by the WISO Lab and specifically for the experiment before participation began.
Author Contributions
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The experiment was funded by the Alexander von Humboldt Foundation.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data will be posted on a public repository (e.g., OpenICPSR) upon publication.