
Abstract

Condorcet’s Jury Theorem states that the correct outcome is reached in direct majority voting systems with sufficiently large electorates as long as each voter’s independent probability of voting for that outcome is greater than 1/2. Previous research has found that switching to a hierarchical system always leads to an inferior result. Yet, in many situations direct voting is infeasible (e.g., due to high implementation or infrastructure costs), and hierarchical voting may provide a reasonable alternative. This paper examines differences in accuracy rates of hierarchical and direct voting systems for varying group sizes, abstention rates, and voter competences. We derive three main results. First, we prove that indirect two-tier systems differ most from their direct counterparts when group size and number are equal (i.e., when each equals $\sqrt{N_{d}}$ , where N_d is the total number of voters in the direct system). In multitier systems, we prove that this difference is maximized when group size equals $\sqrt[n]{N_{d}}$ , where n is the number of hierarchical levels. Second, we show that while direct majority rule always outperforms indirect voting for homogeneous electorates, hierarchical voting gains in accuracy when either the number of groups or the number of individuals within each group increases. Third, we prove that when voter abstention and competency are correlated within groups, hierarchical systems can outperform direct voting. The results have implications beyond voting, including information processing in the brain, collective cognition in animal groups, and information aggregation in machine learning.

Keywords

Condorcet's Jury Theorem, information aggregation, decision making, voting, abstention

Introduction

Throughout the social, physical, and biological sciences, researchers are interested in better understanding collective decisions. Whether we are studying signal processing in the brain, artificial neural networks, animal communication, organizational design, or voting and elections, the structure of information transmission can help or hinder group performance. In this paper, we examine decision making with two outcomes using majority rule. Specifically, we extend a foundational theorem from social choice theory, the Condorcet Jury Theorem, to investigate how hierarchical majority elections shape collective accuracy. In hierarchical (or indirect) voting systems, preferences are first aggregated within subgroups and then aggregated across groups.

Condorcet’s Jury Theorem (de Condorcet, 2014) characterizes the probability that an electorate reaches a mutually beneficial outcome in a majority-rule election where voters are uncertain about which of two options leads to the best outcome. The theorem is a direct consequence of the weak law of large numbers (Kalai and Safra, 2006) and states that if each voter’s independent probability of supporting the best outcome is greater than 1/2 (that is, they are more likely to vote correctly than incorrectly), the group will increase its chance of arriving at the correct decision as more voters are added to the electorate. If each voter’s probability of voting correctly is less than 1/2, the likelihood of selecting the correct outcome decreases with more voters, and the probability of obtaining the correct outcome is maximized with a single decision maker. (When voters are equally likely to vote for the correct and incorrect outcome, the probability of a correct collective decision is 1/2 for any size electorate.)

Several remarks are in order to clarify the simplifying assumptions made by Condorcet. First, the theorem assumes a homogeneous pool of voters—both with respect to preferences and accuracy. Everyone votes for the correct outcome with equal probability. Second, it neglects social influence; voters are independent and cannot sway each other’s decisions (Böttcher et al., 2017). Third, in Condorcet’s setting, majority rule is applied to all decision makers simultaneously using direct elections. Fourth, the model assumes no abstention. Everyone votes with certainty.¹

The influence of certain aspects of heterogeneity, voter dependence, and hierarchical voting is discussed in Boland (1989). Heterogeneity in information can be captured by substituting the mean voter’s opinion for each individual’s probability of voting for the correct outcome. In addition, positive correlations among voters’ decisions have a negative impact on the effectiveness of direct majority voting systems, a finding confirmed more recently in Kaniovski and Zaigraev (2011). Previous work has shown that the probability of accepting the correct decision under Condorcet’s assumptions is larger in direct voting systems than in certain indirect systems (Boland, 1989; Boland et al., 1989). As discussed in Berg and Paroush (1998), indirect voting may nevertheless be preferable to direct voting when there is a tradeoff between effectiveness (or accuracy) and implementation costs, which may increase with both group number and size.

According to May’s theorem (May, 1952), majority voting is the only voting rule that satisfies certain fairness properties. Alternative rules, such as unanimity voting, are often inferior to majority voting because they confer larger type I and type II errors (Feddersen and Pesendorfer, 1999). For example, Feddersen and Pesendorfer (1998) find that unanimity rule makes juries more likely both to convict an innocent person and to acquit a guilty person.² For social choice problems with at least three voting alternatives, Arrow’s impossibility theorem (Arrow, 1950; Miller, 2019; Sen, 2020) states that there exists no social welfare function (or rank-order electoral system) that maps individual preferences to a societal preference order satisfying desired fairness conditions. As a further generalization of Condorcet’s direct binary choice voting scheme, majority runoff elections with three candidates and a continuous range of voter preferences are modeled in Bouton and Gratton (2015). In such multi-stage elections, the Condorcet winner (i.e., the candidate who wins a majority of head-to-head contests against any other candidate) may not even participate in the final runoff round (Bouton and Gratton, 2015).

In this paper, we examine differences in direct and indirect voting systems, focusing on variations in the size and number of groups, abstention rates, and voter competences. We start by introducing a simple model of hierarchical voting where preferences are first aggregated within subgroups and then aggregated across groups. We confirm that for any size electorate and number of groups, direct systems outperform indirect ones, so long as each voter’s probability of voting for the correct outcome, p, is greater than 1/2. Simply put, indirect voting introduces room for error. Although under certain conditions indirect systems approach the same outcome as in the direct voting model, adding layers of aggregation is always strictly dominated in expectation.

A natural inquiry for hierarchical voting systems concerns how accuracy depends on both the number and size of groups. In accordance with Condorcet’s Jury Theorem, we show that the accuracy of a hierarchical voting system increases with the number of voters if both the number of layers and groups are kept constant. Previous research examining two-level hierarchical voting suggests that collective outcomes are greater for a large number of small groups than for a small number of large groups (Boland, 1989). A more recent study on binary decision making in animal groups provides numerical evidence that the difference between a two-tier indirect system (in our case, a voting system in which opinions are aggregated in two stages) and a direct system is greatest when the number of groups is similar to the number of members per group (Kao and Couzin, 2019). Using an asymptotic expansion of the derivative of the reliability function (or Banzhaf number [Berg, 1997]), we prove that the outcome for any indirect system with n tiers and N_d voters differs most from a direct voting system when group size and number equal $\sqrt[n]{N_{d}}$ .

Finally, we examine how voter abstention impacts (and indeed changes) the superiority of direct versus indirect voting. To describe heterogeneous multitier voting systems analytically, we develop a corresponding generating function approach that does not rely on approximations to describe heterogeneity among voters. We find that when abstention and competency are independent, direct voting remains superior. However, when a person’s likelihood of turning out is correlated with their vote—as is likely the case in many electoral and managerial contexts—indirect voting may gain in its ability to represent the entire electorate. Indeed, in some cases, hierarchical voting dramatically outperforms direct voting. Another way to explain this finding is that when a voting system is meant to represent the preferences of eligible, rather than actual, voters, indirect elections provide a potential correction to direct voting.

Although the main focus of our work concerns elections, hierarchical systems of information aggregation play an important role in various other contexts. Majority rules and hierarchical decision making are relevant throughout studies of social choice (Brandt et al., 2016) and organizational design (Christensen et al., 2021; Christensen and Knudsen, 2010; Csaszar and Eggers, 2013; Gersbach et al., 2022; Sah and Stiglitz, 1984, 1988), but also appear in research about collective cognition in animal groups (Couzin, 2009). Previous studies have identified hierarchical or modular interaction structures in ant (Mersch et al., 2013; Pinter-Wollman et al., 2011) and honeybee (Naug, 2008) colonies, as well as with bird flocks (Nagy et al., 2010) and elephant herds (Wittemyer et al., 2005). At the individual level, information processing in the brain (Friston, 2008) is known to be affected by the hierarchical organization of cortical areas (Zeki and Shipp, 1988). Information aggregation that is based on majority voting is also relevant to describe synchronous interactions between grid cells in cellular automata (Gärtner and Zehmakan, 2017) and to perform renormalization operations in statistical mechanics (Böttcher and Herrmann, 2021). Artificial neural networks (Richards et al., 2006), and in particular ensemble machine learning, combine outcomes of individual classifiers using (weighted) majority voting to enhance model performance (Dietterich, 2000). In addition, von Neumann (1956) and Moore and Shannon (1956a, 1956b) examine how to optimize hierarchical computing circuits by combining unreliable electrical components. Thus, identifying factors that impact accuracy in hierarchical voting systems is crucial for our broad understanding of preference and information aggregation across fields.

A simple model of hierarchical voting

We consider two voting systems: one with indirect, or hierarchical, voting and another with direct voting. The hierarchical voting system we study is represented by a regular tree network G(n, k) with n layers. Each node (except leaf nodes which represent the “electorate”) has k child nodes, where k is an odd number. The total number of nodes in such an indirect majority voting model is $N = \sum_{i = 0}^{n} k^{i} = (k^{n + 1} - 1) / (k - 1)$ (including the top node). In comparison, the number of voters in a direct voting system is N_d = kⁿ. Nodes can be in two different states: “0” and “1,” representing two different policy positions. Initially, we set each node in the bottom layer to 1 with probability p. Next, we apply a recursive majority rule (Häggström et al., 2006) to update the states of all non-leaf nodes. That is, the state of a node is set to 0 if the majority of its children are in state 0. Otherwise, it will be set to 1. Note that a tie cannot occur because k is an odd number. Figure 1(a) shows an example of a hierarchical voting system with n = 4 layers.
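The recursive majority rule described above can be illustrated with a minimal Python sketch. It collapses a list of kⁿ leaf states to the state of the top node and estimates the proportion of runs in which the top node ends in state 1 by Monte Carlo sampling. The function names (`recursive_majority`, `simulate`), the leaf ordering (consecutive blocks of k leaves share a parent), and the fixed seed are illustrative assumptions rather than part of the model description.

```python
import random


def recursive_majority(leaves, k):
    """Collapse leaf states (0/1) to the top-node state by majority votes in blocks of k.

    Assumes len(leaves) is a power of the odd group size k and that consecutive
    blocks of k entries share a parent node (an illustrative ordering).
    """
    states = list(leaves)
    while len(states) > 1:
        if len(states) % k != 0:
            raise ValueError("number of leaves must be a power of k")
        # A parent is in state 1 if more than half of its k children are in state 1.
        states = [1 if 2 * sum(states[i:i + k]) > k else 0
                  for i in range(0, len(states), k)]
    return states[0]


def simulate(n, k, p, trials, seed=0):
    """Monte Carlo estimate of the probability that the top node is in state 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        leaves = [1 if rng.random() < p else 0 for _ in range(k ** n)]
        hits += recursive_majority(leaves, k)
    return hits / trials


# Four of nine leaves in state 1, yet the top node ends in state 1.
print(recursive_majority([1, 1, 0, 1, 1, 0, 0, 0, 0], 3))
print(simulate(n=3, k=3, p=0.6, trials=2000))
```

In the first call, the leaf states (1, 1, 0 | 1, 1, 0 | 0, 0, 0) yield a top-node state of 1 although only four of the nine leaves are in state 1, which is the kind of deviation from direct voting examined in the remainder of the text.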

Figure 1.

Hierarchical and direct majority voting. (a) Example of a hierarchical voting system with four layers. Blue and red nodes are in states 0 and 1, respectively. (b) Proportion of correct voting outcomes in three direct voting systems (denoted by P_d(p) in the text) and three indirect voting systems (denoted by P_n(p) in the text) as a function of the probability that an individual in the electorate votes for the correct outcome (denoted by p in the text). Different curves correspond to different numbers of layers n. In the shown examples, we set n = 3, 4, 5 and the number of voters in each majority voting group is k = 3. (c) Difference between P_d(p), the proportion of correct voting outcomes in direct voting systems, and P_n(p) as a function of p. The number of voters in the direct voting system is N_d = kⁿ. All parameters are as in (b). In the vicinity of the Condorcet threshold, p = 0.5, the slopes of the functions describing the voting outcome of direct majority voting are steeper than those observed in hierarchical voting systems with the same number of voters in the bottom layer, illustrating that direct majority systems lead to the correct outcome more often if p > 0.5.

To characterize a hierarchical voting system mathematically, we denote by P_n(p) and 1 − P_n(p) the probabilities that the top node is in state 1 and 0, respectively. According to Condorcet’s theorem (de Condorcet, 2014), we have

\lim_{n \to \infty} P_{n} (p) = \begin{cases} 0 & p < 0.5 \\ 0.5 & p = 0.5 \\ 1 & p > 0.5 . \end{cases}

(1)

For systems of finite size, the probability P_n(p) does not jump from 0 to 1 at p = 0.5, but instead smoothly approaches 1 for large enough values of p (Figure 1(b)). The probability, P_n(p), that the top node of such a finite system is in state 1 can be defined recursively in terms of the model parameters. Let X_i be a binomial random variable representing the number of nodes in state 1 in stage i (1 ≤ i ≤ n). That is, X_i ∼ B(k, p_i) where p_i is the probability that a node in stage i is in state 1. For fixed values of k, n, and P₀(p) = p, then, the recursion relation is given by

P_{i} (p) = \Pr \left( X_{i - 1} > \frac{k}{2} \right) = \sum_{j = \lceil k / 2 \rceil}^{k} \binom{k}{j} P_{i - 1}^{j} {(1 - P_{i - 1})}^{k - j},

(2)

where ⌈k/2⌉ denotes the ceiling of k/2 (i.e., the smallest integer greater than or equal to k/2).
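As a worked illustration of the recursion in equation (2), consider the smallest nontrivial group size, k = 3. The choice k = 3 and the value p = 0.6 are picked here only for illustration.

```latex
% Equation (2) with k = 3: the sum runs over j = 2, 3
P_{i} = \binom{3}{2} P_{i-1}^{2} (1 - P_{i-1}) + \binom{3}{3} P_{i-1}^{3} = 3 P_{i-1}^{2} - 2 P_{i-1}^{3}

% One step of the recursion starting from P_0 = p = 0.6
P_{1} = 3 (0.6)^{2} - 2 (0.6)^{3} = 1.08 - 0.432 = 0.648

% Fixed points of the map x -> 3 x^2 - 2 x^3
3 x^{2} - 2 x^{3} = x \iff x (2 x - 1)(x - 1) = 0 \iff x \in \{ 0, 1/2, 1 \}
```

The derivative of this map, 6x(1 − x), equals 3/2 at x = 1/2 and 0 at x = 0 and x = 1, so iterates move away from 1/2 and toward 0 or 1, in line with the limiting behavior in equation (1).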

For a comparison with a direct voting system, we denote by P_d(p) and 1 − P_d(p) the corresponding probabilities of reaching a correct and incorrect voting outcome. Similar to equation (2), we define the reliability function of the direct voting system as

P_{d} (p) = \Pr \left( X > \frac{N_{d}}{2} \right) = \sum_{j = \lceil N_{d} / 2 \rceil}^{N_{d}} \binom{N_{d}}{j} p^{j} {(1 - p)}^{N_{d} - j},

(3)

where X denotes the number of voters in state 1. In the limit N_d → ∞ and N → ∞, it holds that P_d(p) = P_n(p). This equality is approximately fulfilled if the number of voters is sufficiently large. Figure 1(c) shows the difference P_d(p) − P_n(p) for different values of n. We observe that the probability of reaching a correct voting outcome in the direct voting system is larger than the corresponding probability in the indirect voting system if the majority of voters in the electorate is able to identify the correct outcome (i.e., P_d(p) > P_n(p) for p > 0.5). In other words, direct voting systems lead to the correct voting outcome more often than hierarchical systems if the probability of voting for the correct outcome is larger than 0.5. In the remainder of this work, we say that a direct system outperforms an indirect one if and only if P_d(p) > P_n(p) for p > 0.5, thereby assuming that the electorate is more likely to vote for the right outcome. However, it is important to note that the studied reliability functions are symmetric around p = 0.5, and thus in cases where p < 0.5 the opposite system will always perform best. We also observe in Figure 1(c) that differences between direct and hierarchical voting systems vanish for sufficiently small or large values of p. The greater the number of layers n, the more pronounced the differences in the vicinity of p ≈ 0.5.

In the following theorem, we formalize and prove these observations for hierarchical majority voting systems with n > 1 layers.

Theorem 1

Let P_d(p) and P_n(p) be the probabilities of reaching the correct voting outcome in direct and hierarchical majority voting systems with the same number of voters in the electorate who vote for the correct voting outcome with probability p. For a hierarchical voting system with at least n = 2 layers, it holds that

(i) P_d(p) = P_n(p) for p ∈ {0, 0.5, 1},

(ii) P_d(p) < P_n(p) for p ∈ (0, 0.5),

(iii) P_d(p) > P_n(p) for p ∈ (0.5, 1).

(Points (i-iii) imply that the probability of accepting the correct voting outcome is larger in direct majority systems compared to hierarchical ones as long as p ∈ (0.5, 1).)

The proof of Theorem 1 is in Appendix 1. For indirect voting systems with one layer, similar results are presented in Boland (1989) and Boland et al. (1989), where the authors consider an indirect voting system with n₁ voter groups, each of size n₂ (n₁ and n₂ being odd integers). Our proof applies to hierarchical voting systems with n layers, making it more general than the proof presented in Boland et al. (1989). Moreover, we also quantify the slope of the reliability functions P_d(p) and P_n(p) at p = 0.5.
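The three statements of Theorem 1 can be checked numerically for small systems by evaluating equations (2) and (3) directly. The following Python sketch is a spot check under assumed parameter values (k = 3, n = 2 and 3), not a substitute for the proof; the function names are hypothetical.

```python
from math import comb


def majority_prob(m, q):
    """Pr(X > m/2) for X ~ Binomial(m, q), with m odd (equations (2) and (3))."""
    return sum(comb(m, j) * q**j * (1 - q)**(m - j)
               for j in range((m + 1) // 2, m + 1))


def hierarchical(p, n, k):
    """P_n(p): apply the majority recursion of equation (2) n times, starting from P_0 = p."""
    q = p
    for _ in range(n):
        q = majority_prob(k, q)
    return q


def direct(p, n, k):
    """P_d(p) for a direct vote among N_d = k**n voters (equation (3))."""
    return majority_prob(k**n, p)


for n in (2, 3):
    for p in (0.4, 0.5, 0.6):
        print(n, p, direct(p, n, 3), hierarchical(p, n, 3))
```

For each n, the printed values can be compared against points (i) to (iii): the two probabilities coincide at p = 0.5, the hierarchical value is larger at p = 0.4, and the direct value is larger at p = 0.6.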

An overview of the main quantities and parameters used in our hierarchical voting model is provided in Table 1.

Table 1.

Model quantities and parameters. An overview of the main quantities and parameters used in the hierarchical voting model.

Symbol	Definition
G(n, k)	A regular tree network describing a hierarchical voting system with n layers and branching number k
N	Total number of nodes (or voters)
N_d	Number of nodes (or voters) in direct voting system
p	Nodes in the bottom layer are in state “1” and “0” with probability p and 1 − p, respectively
P_i(p)	The probability that the majority of nodes in layer i ≥ 1 are in state “1” given that nodes in the bottom layer are in state “1” with probability p
P_d(p)	The probability that the majority of nodes in the bottom layer are in state “1” given that nodes in the bottom layer are in state “1” with probability p
α	The abstention probability

Optimal group size and number

In a hierarchical voting system, the probability of reaching the correct outcome depends not only on the number of layers but also on the voting group size and the number of groups. In a two-tier system (i.e., a voting system in which opinions are aggregated over n = 2 layers) with l groups and k voters per group (Figure 2), we have

P_{1} (p) = \Pr \left( X_{1} > \frac{k}{2} \right) = \sum_{j = \lceil k / 2 \rceil}^{k} \binom{k}{j} p^{j} {(1 - p)}^{k - j}

(4)

and

P_{2} (p) = \Pr \left( X_{2} > \frac{l}{2} \right) = \sum_{j = \lceil l / 2 \rceil}^{l} \binom{l}{j} P_{1}^{j} {(1 - P_{1})}^{l - j}

(5)

where P₁(p) and P₂(p) refer to the probabilities of reaching the correct outcomes in layers 1 and 2, respectively. In Figures 2(a) and (b), we show two two-tier voting systems with (k, l) = (5, 3) and (k, l) = (3, 5), respectively. Although the distribution of voters in both electorates is identical, the voting outcome differs because of the different voting group structure.
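Equations (4) and (5) can be evaluated directly to compare the two group structures with (k, l) = (5, 3) and (k, l) = (3, 5). The sketch below is a minimal Python illustration; the value p = 0.6 and the function names are assumptions made for the example.

```python
from math import comb


def majority_prob(m, q):
    """Pr(X > m/2) for X ~ Binomial(m, q), with m odd."""
    return sum(comb(m, j) * q**j * (1 - q)**(m - j)
               for j in range((m + 1) // 2, m + 1))


def two_tier(p, k, l):
    """P_2(p) for l groups of k voters: equation (4) feeds into equation (5)."""
    p1 = majority_prob(k, p)      # equation (4): a single group is correct
    return majority_prob(l, p1)   # equation (5): a majority of groups is correct


p = 0.6  # illustrative voter competence
few_large_groups = two_tier(p, k=5, l=3)
many_small_groups = two_tier(p, k=3, l=5)
direct = majority_prob(15, p)
print(few_large_groups, many_small_groups, direct)
```

Both two-tier values lie below the direct value for the same 15 voters, and they differ from each other, which reflects the fact that P₂(p) is not symmetric in k and l.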

Figure 2.

Effect of different group sizes on performance of two-tier voting systems. (a) A two-tier electoral system with l = 3 groups and k = 5 voters within each group. Arrows pointing upwards (red) and downwards (blue) represent individuals who vote for the correct and incorrect outcome, respectively. (b) For the same distribution of voter types in the electorate, the voting outcome is different than in (a) because of the different voting group structure. (c) The probability P₂(p) of a correct outcome for different sized electorates, N_d = kl, and p = 0.5001 as a function of the number of groups l. For two of the selected electorate sizes, N_d = 1,125 and 3,375, the corresponding square roots are not integers (approximately 34 and 58, respectively). Hence, the minimum of P₂(p) is attained for numbers of groups l_min = 25 and 45 that are close but not equal to $\sqrt{N_{d}}$. For N_d = 2,025, the minimum of P₂(p) is attained at $l_{\min} = \sqrt{N_{d}} = 45$. (d) Black dots indicate the numbers of groups with the lowest performance, l_min, for all electorates with integer square roots. The solid black line describes the number of groups with the lowest performance according to $l_{\min} = \sqrt{N_{d}}$.

The probability P₂(p) of reaching a correct voting outcome increases with the number of voters in the electorate N_d = kl for p > 0.5. If the number of voters N_d is constant, what is the most effective and least effective composition of voters per group k and numbers of groups l? Mathematically, for constant k, l, l > k, and p > 0.5, the probability P₂(p) is maximized if one uses l groups with k members each (Berg, 1997). The opposite holds for l < k. Hence, the reliability function P₂(p) is not symmetric in k, l. Earlier related observations (Boland, 1989) led to the conjecture that the collective performance of a group of independent voters is larger for a large number of small groups than for a small number of large groups. Indeed, next to direct voting, the most effective two-tier voting system is that with the smallest possible number of voters per group k since it resembles direct voting most closely. However, this observation does not imply that the least effective voting system is that with the maximum possible number of voters per group for constant N_d. Other work has suggested that the difference between indirect and direct voting models is greatest when group size and number are similar (Kao and Couzin, 2019). Kao and Couzin (2019) state that “the modular structure that leads to the lowest collective accuracy occurs very close to when there are $\sqrt{N_{d}}$ subgroups with $\sqrt{N_{d}}$ individuals per subgroup”.

To examine this question, we use an asymptotic expansion of the derivative of the reliability function about one of its fixed points and formulate the following Square Theorem (Theorem 2), which shows that P_n(p) is minimized when k = l.

Theorem 2

For p ∈ (0.5, 1), the probability of reaching a correct voting outcome in a two-tier voting system is minimized asymptotically if the number of groups approaches the number of voters per group, that is, if

(k_{\min}, l_{\min}) = (\sqrt{N_{d}}, \sqrt{N_{d}}) .

(6)

In Appendix 2, we present a proof of the Square Theorem and provide numerical evidence that the above square root relation also holds for small electorates. According to the Square Theorem, when group size and number are equal, the indirect model deviates most from the direct model. Thus, for p > 0.5, arriving at the “correct” decision is least likely when k = l. However, when p < 0.5, an indirect model with k = l is most likely to produce the correct result.³
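The square-root relation can be probed numerically for small electorates. The sketch below is a minimal Python illustration and not the procedure used in Appendix 2: it scans all factorizations N_d = kl with odd factors (including the trivial ones with k = 1 or l = 1) and returns the pair with the lowest P₂(p). The value p = 0.5001 follows the Figure 2 caption; the function names and the log-space evaluation via `lgamma` are implementation assumptions.

```python
from math import exp, lgamma, log


def majority_prob(m, q):
    """Pr(X > m/2) for X ~ Binomial(m, q), m odd, summed in log space to avoid overflow."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    total = 0.0
    for j in range((m + 1) // 2, m + 1):
        log_term = (lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
                    + j * log(q) + (m - j) * log(1.0 - q))
        total += exp(log_term)
    return total


def two_tier(p, k, l):
    """P_2(p) for l groups of k voters each (equations (4) and (5))."""
    return majority_prob(l, majority_prob(k, p))


def worst_split(n_d, p):
    """Return the pair (k, l) with k * l = n_d (n_d odd) that minimizes P_2(p)."""
    splits = [(n_d // l, l) for l in range(1, n_d + 1, 2) if n_d % l == 0]
    return min(splits, key=lambda kl: two_tier(p, kl[0], kl[1]))


print(worst_split(81, 0.5001))
print(worst_split(225, 0.5001))
```

The same function can be called with `worst_split(2025, 0.5001)` to compare against the value l_min = 45 reported for N_d = 2,025 in Figure 2(c).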

We also prove in Appendix 2 that the Square Theorem can be extended to hierarchical voting systems with n layers and group sizes k⁽¹⁾, k⁽²⁾, …, k⁽ⁿ⁾. The accuracy of such a multitier voting system is minimized for $k_{\min}^{(1)} = k_{\min}^{(2)} = \dots = k_{\min}^{(n)} = \sqrt[n]{N_{d}}$. For example, in a three-tier system with N_d = 729 voters, indirect and direct systems deviate most when there are 9 groups of 9 groups of 9 voters, where $\sqrt[3]{729} = 9$. In a four-tier system with 6,561 voters, of the 35 possible voting compositions, the difference between direct and indirect systems is maximized for group sizes $\sqrt[4]{6,561} = 9$; in an electorate with 2,313,441 voters, out of 471 possible four-tier compositions, the maximum difference occurs at $\sqrt[4]{2,313,441} = 39$.
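The composition counts quoted above can be reproduced by enumerating ordered factorizations of N_d into n odd group sizes. The Python sketch below makes two assumptions that are not spelled out in the text: a composition is an ordered tuple (bottom tier first), and every group has at least three members. It also locates the least accurate three-tier composition for N_d = 729 at the illustrative value p = 0.5001.

```python
from math import comb, isqrt


def divisors(m):
    """All positive divisors of m in increasing order."""
    small = [d for d in range(1, isqrt(m) + 1) if m % d == 0]
    return sorted(set(small + [m // d for d in small]))


def compositions(n_d, tiers, min_size=3):
    """Ordered tuples of group sizes (each >= min_size) whose product is n_d."""
    if tiers == 1:
        return [(n_d,)] if n_d >= min_size else []
    out = []
    for d in divisors(n_d):
        if d >= min_size and n_d // d >= min_size:
            out.extend((d,) + rest
                       for rest in compositions(n_d // d, tiers - 1, min_size))
    return out


def majority_prob(m, q):
    """Pr(X > m/2) for X ~ Binomial(m, q), with m odd."""
    return sum(comb(m, j) * q**j * (1 - q)**(m - j)
               for j in range((m + 1) // 2, m + 1))


def n_tier(p, sizes):
    """Reliability of a multitier system with the given group sizes, bottom tier first."""
    q = p
    for k in sizes:
        q = majority_prob(k, q)
    return q


print(len(compositions(6561, 4)), len(compositions(2313441, 4)))
print(min(compositions(729, 3), key=lambda s: n_tier(0.5001, s)))
```

Under these assumptions, the enumeration gives the counts of 35 and 471 four-tier compositions stated in the text, and the three-tier search returns equal group sizes of 9.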

Figure 2(c) shows the reliability function P₂(p) for different numbers of voters in the electorate, N_d, as a function of the number of groups l. In accordance with Condorcet’s Jury Theorem, we observe that the accuracy of a two-tier voting system increases with the number of voters N_d if the number of groups is kept constant. This finding also extends to hierarchical voting systems with more than two layers (see Appendix 1). We also observe in Figure 2(c) that the minimum of P₂(p) is obtained for numbers of groups l that are close to $\sqrt{N_{d}}$. For two of the selected electorate sizes (N_d = 1,125 and 3,375), the values of $\sqrt{N_{d}}$ are not integers. Hence, the numbers of groups associated with the lowest collective accuracies, l_min, are close to but not equal to $\sqrt{N_{d}}$. As shown in Figure 2(d), the relation $l_{\min} = \sqrt{N_{d}}$ holds exactly if the square root of the electorate size N_d is a possible group size (i.e., an odd integer).

The intuition behind the Square Theorem follows from the law of large numbers. First, in a direct voting system, as N_d increases, the share of voters needed to pass any particular outcome decreases. (For example, if there are 11 total voters, 6 (54.5%) are needed to vote for an outcome for it to pass. If there are 21 total voters, 11 (52.4%) are needed to pass an outcome.) Second, a direct system will always require more votes to win than an indirect system with the same number of voters. Consider an election where N_d = 81. In a direct system, at least 41 voters must support any given outcome for it to pass. However, in a two-tiered hierarchical system with 3 groups of 27 voters, the minimum number of voters is equal to 28 (14 in each of 2 groups). (The same minimum number of voters is obtained when there are 27 groups of 3 voters.) And even fewer voters (25) are needed if there are 9 voters in each of 9 groups (where exactly 5 voters favor a measure in each of 5 groups). As the minimum number of voters required to sway an outcome decreases, an indirect voting system deviates even more from a direct system.
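The vote counts in this example follow from multiplying bare majorities across tiers. A minimal Python sketch (the function name `min_winning_votes` is hypothetical) reproduces the arithmetic for N_d = 81:

```python
from math import prod


def min_winning_votes(group_sizes):
    """Fewest supporters that can carry a hierarchical majority vote.

    group_sizes lists the (odd) group size at each tier; a bare majority,
    (k + 1) // 2, is needed at every tier, so the minima multiply.
    """
    return prod((k + 1) // 2 for k in group_sizes)


print(min_winning_votes((81,)))    # direct vote among 81 voters
print(min_winning_votes((27, 3)))  # 3 groups of 27 voters
print(min_winning_votes((3, 27)))  # 27 groups of 3 voters
print(min_winning_votes((9, 9)))   # 9 groups of 9 voters
```

The four calls return 41, 28, 28, and 25, matching the numbers given in the paragraph above.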

In fact, because the probability a correct outcome is achieved increases with the number of required votes to win an election, the fewest voters necessary to achieve any outcome in a two-tier hierarchical system always occurs when group size equals group number. We prove this, as well as the extension to an n-tier hierarchical system as stated in Theorem 3, in Appendix 3.

Theorem 3

The fewest voters necessary to sway an election occurs when $k_{\min}^{(i)} = \sqrt[n]{N_{d}}$ (1 ≤ i ≤ n), where $k_{\min}^{(i)}$ is the number of voters (groups) at level i.
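For the two-tier case, the statement of Theorem 3 can be illustrated with a short calculation. The following lines are a sketch for n = 2 with odd k and l and do not reproduce the general proof in Appendix 3; the symbol m(k, l) for the minimum number of supporters is introduced here only for illustration.

```latex
% A bare majority of voters, (k + 1)/2, in a bare majority of groups, (l + 1)/2
m(k, l) = \frac{k + 1}{2} \cdot \frac{l + 1}{2} = \frac{N_{d} + k + l + 1}{4}, \qquad N_{d} = k l

% For fixed N_d, the arithmetic-geometric mean inequality bounds k + l from below
k + l \geq 2 \sqrt{k l} = 2 \sqrt{N_{d}}, \quad \text{with equality if and only if } k = l = \sqrt{N_{d}}

% Check with N_d = 81
m(9, 9) = \frac{81 + 18 + 1}{4} = 25, \qquad m(3, 27) = m(27, 3) = \frac{81 + 30 + 1}{4} = 28
```

Because N_d is fixed, m(k, l) depends on the group structure only through k + l, which is smallest when group size and group number coincide.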

Thus, the probability that a minority of voters sways the overall outcome toward the incorrect choice when p > 0.5, or the correct choice when p < 0.5, is maximized when the number of groups in each hierarchy equals $\sqrt[n]{N_{d}}$ . In the standard Condorcet set-up, where all members in society benefit most from the same “correct” outcome, hierarchical models with few groups of many voters or many groups of fewer voters outperform those where group size and number are equal. In contrast, in a competitive environment, where voters hold differing preferences over two outcomes (rather than simply differing levels of accuracy), the outcome preferred by a majority of voters may fail if voters supporting the minority opinion are concentrated in a minimum-winning number of groups. Intentional redistricting to favor one party over another is well documented throughout U.S. history. Theorem 3 suggests this partisan gerrymandering may be easier in cases where the number of groups approaches group size. Moreover, as we show in the next section, when voter participation is not guaranteed, and abstention is positively correlated with accuracy, an indirect model may be more likely to elicit the correct outcome, even in the standard Condorcet set-up.

Influence of abstention

Thus far, we have assumed that all voters participate fully in the election. But in any system, people abstain from voting. In some cases, such as primary or midterm elections in the U.S., less than half of all eligible voters cast their ballot. More broadly, abstention-like behavior is also relevant to describe passive components in other information aggregation systems, such as non-functioning neurons in artificial neural networks (Douglas and Yu, 2018) and impaired prefrontal cortical areas in lesioned brains (Dehaene and Changeux, 1997). Studying the effect of abstention on hierarchical information processing is thus relevant in different scientific fields. In this section, we examine how abstention shapes the tradeoffs between direct and indirect voting.

To model abstention in a multi-stage voting system, let $X_{i}^{(0)}$ , $X_{i}^{(1)}$ , and $X_{i}^{(2)}$ be the number of nodes in layer i that are in state “0,” “1,” and “2,” respectively. Nodes in state “2” represent abstaining voters. In a two-tier voting system, uniform abstention (i.e., an equal probability of abstaining for all voters) can be incorporated in equations (4) and (5) via a multinomial distribution. That is,

P_{1} (p) = \Pr \left( X_{1}^{(1)} > \frac{k}{2} \right) = \sum_{(x_{1}, x_{2}, x_{3}) \in S (k)} \binom{k}{x_{1}, x_{2}, x_{3}} p^{x_{1}} \alpha^{x_{2}} {(1 - p - \alpha)}^{x_{3}}

(7)

and

P_{2} (p) = \Pr \left( X_{2}^{(1)} > \frac{l}{2} \right) = \sum_{(x_{1}, x_{2}, x_{3}) \in S (l)} \binom{l}{x_{1}, x_{2}, x_{3}} P_{1}^{x_{1}} \alpha^{x_{2}} {(1 - P_{1} - \alpha)}^{x_{3}}

(8)

where

S (k) := \left\{ (x_{1}, x_{2}, x_{3}) \mid x_{1} \in \{ \lceil k / 2 \rceil, \dots, k \} \land x_{2} \in \{ 0, \dots, k - x_{1} \} \land x_{3} = k - x_{1} - x_{2} \right\}

and α (0 ≤ α ≤ 1 − p) is the abstention probability. The condition 0 ≤ α ≤ 1 − p guarantees that the term 1 − p − α in equation (7) is nonnegative.

In the above description of (homogeneous) abstention, direct voting is still superior to indirect voting systems since α only reduces the domain of p from [0, 1] to [0, 1 − α] without altering other previous results.
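The multinomial sum in equation (7) can be transcribed directly. The Python sketch below reads the equation literally, with state 1 occurring with probability p, abstention with probability α, and state 0 with probability 1 − p − α; the function names and the parameter values in the example are assumptions. It also compares the result with a plain binomial tail in p, since summing the multinomial terms over the abstention count x₂ leaves the marginal distribution of x₁.

```python
from math import comb, factorial


def multinomial(k, x1, x2, x3):
    """Multinomial coefficient k! / (x1! x2! x3!) with x1 + x2 + x3 = k."""
    return factorial(k) // (factorial(x1) * factorial(x2) * factorial(x3))


def majority_with_abstention(q, alpha, m):
    """Literal evaluation of equation (7): sum over the index set S(m)."""
    if q + alpha > 1.0 + 1e-12:
        raise ValueError("requires 0 <= alpha <= 1 - q")
    rest = max(0.0, 1.0 - q - alpha)  # probability of state 0, clipped at zero
    total = 0.0
    for x1 in range((m + 1) // 2, m + 1):      # nodes in state 1
        for x2 in range(0, m - x1 + 1):        # abstaining nodes (state 2)
            x3 = m - x1 - x2                   # nodes in state 0
            total += multinomial(m, x1, x2, x3) * q**x1 * alpha**x2 * rest**x3
    return total


def binomial_tail(m, q):
    """Pr(X > m/2) for X ~ Binomial(m, q), with m odd."""
    return sum(comb(m, j) * q**j * (1 - q)**(m - j)
               for j in range((m + 1) // 2, m + 1))


p, alpha, k = 0.55, 0.2, 5  # illustrative values with alpha <= 1 - p
print(majority_with_abstention(p, alpha, k), binomial_tail(k, p))
```

The two printed values agree up to rounding, which is consistent with the remark that uniform abstention restricts the admissible range of p without changing the form of the reliability function.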

In real-world scenarios, abstention is often correlated within voting groups. Residents of rural communities may have longer transportation times to reach their polling location, and thus vote at lower rates in the absence of mail-in ballots. Alternatively, voters may be discouraged from voting in urban areas when there are long lines to vote. Turnout is associated with political interest, information, and education (Sondheimer and Green, 2009), which are not homogeneously distributed across districts. People in economically disadvantaged districts may have fewer financial, temporal, and informational resources available that make voting accessible. Abstention is also related to an election’s competitiveness (Cancela and Geys, 2016; Simonovits, 2012) and the number of races on the ballot (Garmann, 2016; Kogan et al., 2018)—both of which can vary across districts.⁴ Even random shocks—like a power outage, hail storm, or freeway accident—can cause geographical correlations in abstention. In a managerial setting, people in certain divisions of a firm may be less informed or interested in company decisions on which they have a vote. And, while all union members can participate and vote in labor decisions, full-time employees, as well as those receiving higher pay and requiring greater skill, are historically more likely to participate in union votes (Kolchin and Hyclak, 1984). Thus, for any number of reasons, it is important to examine cases where abstention is not homogeneously distributed across groups.

To model heterogeneous abstention and voter competences within districts, we incorporate the following modifications into our mathematical framework. We again consider a two-tier voting system and account for different abstention probabilities, α_j ∈ [0, 1] (1 ≤ j ≤ l), and voter competences, p_j, in each of the l voting groups (or jurisdictions). The number of voters in group j that do not abstain is ${\tilde{k}}_{j} = k (1 - α_{j})$ , where α_j is chosen such that ${\tilde{k}}_{j} \geq 3$ is an odd number. The probability that the correct voting outcome is achieved in voter group j is

P_{1, j} (p_{j}) = \Pr (X_{1, j}^{(1)} > \frac{{\tilde{k}}_{j}}{2}) = \sum_{m = ⌈ \frac{{\tilde{k}}_{j}}{2} ⌉}^{{\tilde{k}}_{j}} \binom{{\tilde{k}}_{j}}{m} p_{j}^{m} {(1 - p_{j})}^{{\tilde{k}}_{j} - m} .

(9)

According to a theorem by Hoeffding (Boland, 1989; Hoeffding, 1956; Percus and Percus, 1985), the probability of a correct voting outcome after information aggregation across all voter groups satisfies

P_{2} (p) = \Pr (X_{2}^{(1)} > \frac{l}{2}) \geq \sum_{j = ⌈ \frac{l}{2} ⌉}^{l} (\begin{array}{c} l \\ j \end{array}) {\bar{p}}_{1}^{j} {(1 - {\bar{p}}_{1})}^{l - j}

(10)

if the mean voter competence ${\bar{p}}_{1} = {\bar{p}}_{1} (p) = (P_{1,1} + \dots + P_{1, l}) / l \geq 1 / 2 + 1 / (2 l)$ (Boland, 1989), where p = (p₁, …, p_l). A similar bound can be given for the probability of a correct outcome in direct voting systems:

P_{d} (p) = \Pr (X > \frac{N_{d}}{2}) \geq \sum_{j = ⌈ \frac{N_{d}}{2} ⌉}^{N_{d}} (\begin{array}{c} N_{d} \\ j \end{array}) {\bar{p}}^{j} {(1 - \bar{p})}^{N_{d} - j},

(11)

which holds if $\bar{p} = (p_{1} + \dots + p_{N_{d}}) / N_{d} \geq 1 / 2 + 1 / (2 N_{d})$. For the above examples with two levels of hierarchy and three subgroups, the number of voters in the direct system is $N_{d} = {\tilde{k}}_{1} + {\tilde{k}}_{2} + {\tilde{k}}_{3}$. Hoeffding’s bound implies that direct heterogeneous voting systems outperform direct homogeneous voting systems with mean voter competency $\bar{p}$ if the value of $\bar{p}$ is larger than or equal to 1/2 + 1/(2N_d).

Previous work has relied on different bounds (Boland, 1989; Hoeffding, 1956; Hodges and Le Cam, 1960) to characterize heterogeneous systems. Instead of using Hoeffding bounds in equations (10) and (11), we derive a generating function in Appendix 4 to obtain an analytical expression for the reliability function of a two-tier voting system with abstention. To the best of our knowledge, this is the first paper to directly capture underlying heterogeneity in voting models without relying on further approximations. We find that the reliability function of a heterogeneous two-tier system is

P_{2} (p) = \frac{1}{Q} (\frac{P_{1,1} P_{1,2}}{Q_{1,1} Q_{1,2}} + \frac{P_{1,1} P_{1,3}}{Q_{1,1} Q_{1,3}} + \frac{P_{1,2} P_{1,3}}{Q_{1,2} Q_{1,3}} + \frac{P_{1,1} P_{1,2} P_{1,3}}{Q_{1,1} Q_{1,2} Q_{1,3}}),

(12)

where $Q_{1, j} = 1 - P_{1, j}$ and $Q = 1 / \prod_{j = 1}^{3} Q_{1, j} = 1 / (Q_{1,1} Q_{1,2} Q_{1,3})$, in line with the definition of Q in Appendix 4. For the corresponding direct system, we find that

P_{d} (p) = \frac{1}{Q} \sum_{j = ⌈ \frac{N_{d}}{2} ⌉}^{N_{d}} C_{j} (p_{1} / (1 - p_{1}), \dots, p_{N_{d}} / (1 - p_{N_{d}})),

(13)

where $Q = 1 / \prod_{j = 1}^{N_{d}} (1 - p_{j})$ and C_j is the jth elementary symmetric function. As detailed in Appendix 4, we use Newton’s identities to recursively calculate C_j. Such generating function approaches are useful to analytically capture the properties of general heterogeneous multitier voting systems and complement earlier approaches that mainly relied on using different bounds (Boland, 1989; Hoeffding, 1956; Hodges and Le Cam, 1960).

Contrary to the differences between homogeneous and heterogeneous direct voting systems, as described by the Hoeffding bound (11), we find that hierarchical homogeneous systems outperform heterogeneous hierarchical systems. That is, when $\bar{p} > 0.5$ , systems where all voters have equal values of p_j are associated with higher levels of P₂ than those where individuals have heterogeneous values of p_j, even with a constant mean.

For heterogeneous abstention and homogeneous voter competences (i.e., p_j = p), direct voting systems are still preferable over indirect ones. However, if abstention rates are associated with greater within-group voter accuracy, hierarchical voting can outweigh the potential underrepresentation of voter groups with high abstention rates in direct voting systems. Thus, indirect voting can provide an opportunity not only for more efficient preference aggregation, but also for greater representation. All else equal, this would suggest that countries with significant geographic correlation between vote choice and abstention should achieve greater representation with large legislatures.

Figure 3(a) shows an example of a two-tier voting system in which heterogeneous abstention and voter competences impact the final voting outcome. There are l = 3 voter groups with ${\tilde{k}}_{1} = {\tilde{k}}_{2} = 3$ (α₁ = α₂ = 0.4) and ${\tilde{k}}_{3} = 5$ (α₃ = 0) participating voters, and the voter-group competences are p₁ = p₂ = p and p₃ = 1 − p, respectively. Figure 3(b) shows the corresponding direct voting system. For 0.5 < p < 1, hierarchical voting is associated with a larger probability of a correct voting outcome [Figure 3(c)]. For p ≈ 0.5, the Hoeffding bounds [dashed lines in Figure 3(c)] provide good characterizations of P₂(p) and P_d(p). In the limit p → 1, the Hoeffding bound of P₂(p) reaches a value of 0.74 and that of P_d(p) reaches a value of about 0.62, significantly underestimating the true outcome probability: in this limit, both voting systems lead to the correct outcome with probability 1, because two groups outvote one group in the two-tier voting system and 6 voters outvote 5 voters in the direct voting system. To verify the analytical results in equations (12) and (13), we simulated 10,000 independent voting realizations [red and black disks in Figure 3(c)] and found excellent agreement between our numerical and analytical results. For values of p between 0.5 and 0.8, the accuracy of the direct voting system is between 0.5 and about 0.6. In contrast, the accuracy of the indirect system reaches 0.8. To put this result in more practical terms, even if 80% of the voters in the first and second groups vote for the correct outcome, the direct system produces the correct outcome on average in only slightly more than 60% of the elections. For the same value of p (i.e., p = 0.8), the indirect system leads to the correct outcome in more than 80% of the elections. The maximum difference between the reliability functions of the two described voting systems is about 0.26 and is reached for p ≈ 0.9 [Figure 3(d)].
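The numbers quoted for this example can be reproduced with a short script. The following is a minimal sketch, not the authors' code: it expands the generating function $\prod_{i} (q_{i} + p_{i} t)$ by direct polynomial multiplication instead of using Newton's identities, and all function names are hypothetical.

```python
from math import comb


def majority_prob_homogeneous(n, p):
    """Probability that more than n/2 of n i.i.d. voters (n odd) are correct."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m)
               for m in range(n // 2 + 1, n + 1))


def pmf_number_correct(probs):
    """Coefficients of prod_i (q_i + p_i t), i.e. Pr(number correct = s)."""
    coeffs = [1.0]
    for p in probs:
        new = [0.0] * (len(coeffs) + 1)
        for s, c in enumerate(coeffs):
            new[s] += c * (1 - p)      # this voter is wrong
            new[s + 1] += c * p        # this voter is correct
        coeffs = new
    return coeffs


def majority_prob_heterogeneous(probs):
    """Probability of a correct majority among independent, unequal voters."""
    n = len(probs)
    return sum(pmf_number_correct(probs)[n // 2 + 1:])


def figure3_example(p):
    """Return (P_2, P_d, bound on P_2, bound on P_d) for the Figure 3 setup."""
    sizes = [3, 3, 5]                  # participating voters per group
    comps = [p, p, 1 - p]              # voter competence per group
    group = [majority_prob_homogeneous(k, c) for k, c in zip(sizes, comps)]
    voters = [c for k, c in zip(sizes, comps) for _ in range(k)]
    p2 = majority_prob_heterogeneous(group)
    pd = majority_prob_heterogeneous(voters)
    # Lower bounds (10) and (11): homogeneous systems with the mean competence.
    p2_bound = majority_prob_homogeneous(len(group), sum(group) / len(group))
    pd_bound = majority_prob_homogeneous(len(voters), sum(voters) / len(voters))
    return p2, pd, p2_bound, pd_bound


if __name__ == "__main__":
    for p in (0.5, 0.8, 0.9, 1.0):
        print(p, [round(x, 4) for x in figure3_example(p)])
```

The bounds are evaluated for every p, although inequalities (10) and (11) are only guaranteed when the mean-competence conditions stated above hold.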

Figure 3.

Heterogeneous abstention in indirect and direct voting systems. (a) An indirect voting system with l = 3 voter groups and voter-group competences p₁ = p₂ = p and p₃ = 1 − p. The number of voters per group are ${\tilde{k}}_{1} = {\tilde{k}}_{2} = 3$ (α₁ = α₂ = 0.4) and ${\tilde{k}}_{3} = 5$ (α₃ = 0), respectively. (b) The corresponding direct voting system. (c) The probability of a correct voting outcome P(p) for the direct and indirect voting systems as shown in (a,b). Solid red and black lines are the analytical solutions (12) and (13), respectively. Dashed lines are the corresponding Hoeffding bounds (10) and (11). Red and black disks are numerically obtained voting outcomes that are based on 10,000 i.i.d. samples. (d) The difference between the probabilities of a correct voting outcome in a direct and an indirect voting system, P_d(p) and P₂(p), as a function of p.

Discussion and conclusion

Most democracies use some form of indirect voting, be it through representative democracy (e.g., members of parliament selecting a Prime Minister) or an electoral college (e.g., the U.S. presidential race). In the U.S. electoral college, whichever candidate wins a plurality of a state’s votes receives all of that state’s electoral college votes. (The exception is in Maine and Nebraska, where electoral votes can be split across candidates.) As a result, the electoral college winner may not win the popular vote, a result that has occurred in five presidential elections, including George W. Bush’s (2000) and Donald Trump’s (2016) wins. Had the U.S. tallied votes using a direct election instead, Al Gore and Hillary Clinton would have won the respective elections. Of course, if the U.S. switched to a popular vote to select the president, voter turnout would likely change as well. Candidates would spend less time than they currently do on populous battleground states, such as Florida, Ohio, and Pennsylvania, and more time mobilizing votes in large safe states that candidates currently all but ignore (except to fundraise), such as California or Texas. Electoral colleges are rarely used outside of the U.S. to select the head of government, though many countries have experimented with electoral colleges over time. For example, Charles de Gaulle was chosen under an electoral college in the first election of the Fifth Republic. After that, the country moved to a direct election for president. Indirect systems are frequently employed when selecting leaders within parties, for example to set aside shares of votes for members of parliament, rank and file party members, and unions.

In this paper, we have investigated differences in accuracy between hierarchical voting systems and their direct counterparts for varying voter competences, group sizes, and abstention rates. In our analysis of two-tier voting systems, we build on previous work (Boland, 1989; Kao and Couzin, 2019) and prove that the lowest collective accuracy is reached if the number of voters per group is equal to the number of groups. Moreover, we generalize this finding to multitier hierarchical voting systems, proving that collective accuracy is minimized asymptotically when group sizes equal $\sqrt[n]{N_{d}}$ , where N_d is the total number of voters in an n-level system.

For homogeneous competences and abstention rates across voter groups, direct voting always outperforms hierarchical voting. For heterogeneous voter competences and abstention rates, we develop a generating function approach to analytically describe voting systems without relying on approximations used in earlier studies (Boland et al., 1989; Hoeffding, 1956; Hodges and Le Cam, 1960). We provide an example illustrating how indirect voting systems can correct for the underrepresentation of voters in groups or jurisdictions with high abstention rates. It is worth noting that most governments allocate seats in proportion to population, rather than turnout, for precisely these reasons. If abstention is correlated with accuracy or preferences, indirect systems can level the playing field by ensuring that districts with lower turnout are equally represented. Moreover, these methods may prove useful for research on hierarchical systems in areas outside politics—especially where information processing units fail or become inactive, such as for artificial neural networks with idle activation functions (Douglas and Yu, 2018) or lesioned brains (Dehaene and Changeux, 1997).

We see a number of directions for future research. First, it would be interesting to investigate a dual outcome model in which voters receive utility from both the overall and local outcome (that is, the within-group winner). Such a model could be used in the standard Condorcet framework, where there exists one “correct” outcome, or it could be extended to allow for heterogeneous preferences over two competing outcomes. Second, future research may allow individuals to base their decision on multiple cues (perhaps from different sources) about the accuracy of each outcome, rather than a single signal (Kao and Couzin, 2014). Third, individuals (and animals) do not act independently; rather, they exchange information with diverse actors in forming group decisions (Adler and Gordon, 1992; Kao and Couzin, 2019). Thus, future research may investigate different effects of homophily (i.e., the tendency of individuals to form groups with similar individuals) on the tradeoffs between hierarchical and direct voting (Kossinets and Watts, 2009; Massen and Koski, 2014). Last, it would be fruitful to test the empirical implications of our work by examining multi-candidate elections (Boehmer and Schaar, 2022), electoral volatility, the quality of representation, and voter satisfaction as a function of different indirect electoral designs.


Acknowledgements

The authors thank Wendelin Werner for his lecture on “Randomness and Stability” at GYSS 2020 that inspired parts of this work. The authors also thank Malte Henkel, PJ Lamberson, Josh LeClair, and two anonymous reviewers for valuable comments, as well as Xiaofeng Lin for research assistance. We are especially grateful to Scott E. Page, who suggested that the Square Theorem may apply beyond two layers.

Declaration of conflicting interests

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

LB acknowledges financial support from the SNF (P2EZP2_191888).

ORCID iDs

Lucas Böttcher

Georgia Kernell

Notes

Direct versus hierarchical voting systems

To formulate our proof of Theorem 1, let k = 2k′ + 1 and $N_{d} = 2 N_{d}^{'} + 1$ , such that (14)

\begin{matrix} P_{d} (p) : = P_{d} (p; N_{d}^{'}) = \sum_{j = N_{d}^{'} + 1}^{2 N_{d}^{'} + 1} (\begin{matrix} 2 N_{d}^{'} + 1 \\ j \end{matrix}) p^{j} {(1 - p)}^{2 N_{d}^{'} + 1 - j} \\ = {(1 - p)}^{N_{d}^{'}} p^{N_{d}^{'} + 1} (\begin{matrix} 2 N_{d}^{'} + 1 \\ N_{d}^{'} + 1 \end{matrix})_{2} F_{1} (1, - N_{d}^{'}; N_{d}^{'} + 2; \frac{p}{p - 1}) \end{matrix}

and (15)

\begin{matrix} P_{i + 1} (p) : = P_{i + 1} (p; k^{'}) = g (P_{i} (p)) = \sum_{j = k^{'} + 1}^{2 k^{'} + 1} (\begin{matrix} 2 k^{'} + 1 \\ j \end{matrix}) P_{i}^{j} {(1 - P_{i})}^{2 k^{'} + 1 - j}, \\ = {(1 - P_{i})}^{k^{'}} P_{i}^{k^{'} + 1} (\begin{matrix} 2 k^{'} + 1 \\ k^{'} + 1 \end{matrix})_{2} F_{1} (1, - k^{'}; k^{'} + 2; \frac{P_{i}}{P_{i} - 1}), \end{matrix}

where we identified the truncated binomial sum with the function g, and ₂F₁ is the ordinary hypergeometric function.

Point (i) of Theorem 1 (P_d(p) = P_n(p) for p ∈ {0, 0.5, 1}) can be readily obtained from the definitions of P_d(p) and P_i+1(p).

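• For p = 0, we find P_d(p) = P_n(p) = 0 because the prefactors $p^{N_{d}^{'} + 1}$ and $P_{i}^{k^{'} + 1}$ in equations (14) and (15) vanish (note that $_{2} F_{1} (a, b; c; z = 0) = 1$ ).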

• For p = 0.5, we obtain $P_{1} (p) = {(1 - 0.5)}^{k^{'}} \, {0.5}^{k^{'} + 1} \, 4^{k^{'}} = 0.5$ and thus P_i(p) = 0.5 for all i ∈ {1, …, n}. Similarly, we find P_d(p) = 0.5 for p = 0.5 and conclude that P_d(p) = P_n(p) = 0.5 for p = 0.5.

• For p = 1, the last term in the sums (14) and (15) dominates; i.e., P_d(p) = P_n(p) = 1. Alternatively, we can use the identity

(16)

\begin{array}{c} \lim_{p \to 1} {(1 - p)}^{k^{'}}_{2} F_{1} (1, - k^{'}; k^{'} + 2; \frac{p}{p - 1}) = \lim_{p \to 1} {(1 - p)}^{k^{'}} {(1 - p)}^{- k^{'}} [\frac{k^{'}! (k^{'} + 1)!}{(2 k^{'} + 1)!} + O {(p - 1)}^{2}] \\ = \frac{k^{'}! (k^{'} + 1)!}{(2 k^{'} + 1)!} = {(\begin{array}{c} 2 k^{'} + 1 \\ k^{'} + 1 \end{array})}^{- 1} \end{array}

to show that P_n(p) = 1 (and similarly P_d(p) = 1).

A graphical interpretation of P_i+1(p) = P_i(p) for p ∈ {0, 0.5, 1} is that 0, 0.5, and 1 are fixed points of the iteration P_i+1(p) = g(P_i(p)) [black disks in Figure 4(a)].

To prove points (ii) and (iii), we will use that P_d(p) and P_n(p) are monotonically increasing with p and convex for p ∈ (0, 0.5) (Boland, 1989). For P_d(p) this follows directly from the definition (14), and for P_n(p) this follows from the way it is constructed via an iteration from P₀(p), as we illustrate in Figure 4(a). All that is left to show is that the derivative $P_{n}^{'} (p)$ is smaller than the derivative $P_{d}^{'} (p)$ at p = 0.5. As pointed out by Berg (1997), “the derivatives at this point are crucial when two such functions are compared,” which motivates our following comparison of $P_{n}^{'} (p)$ and $P_{d}^{'} (p)$ . The derivative of P_d(p) with respect to p at p = 0.5 is (17)

\begin{aligned} \partial_{p} P_{d} (p = 0.5) & = {0.25}^{N_{d}^{'} - 1} \frac{(2 N_{d}^{'} + 1)!}{N_{d}^{'}!} [0.25 \frac{{}_{2} F_{1} (1, - N_{d}^{'}; N_{d}^{'} + 2; - 1)}{(N_{d}^{'} + 1)!} + 0.5 N_{d}^{'} \frac{{}_{2} F_{1} (2,1 - N_{d}^{'}; N_{d}^{'} + 3; - 1)}{(N_{d}^{'} + 2)!}] \\ & = {0.25}^{0.5 (k^{n} - 1)} \frac{(k^{n})!}{(0.5 (k^{n} - 1))!} [\frac{{}_{2} F_{1} (1,0.5 (1 - k^{n}); 0.5 (k^{n} + 3); - 1)}{(0.5 (k^{n} + 1))!} \\ & \quad + (k^{n} - 1) \frac{{}_{2} F_{1} (2,0.5 (3 - k^{n}); 0.5 (k^{n} + 5); - 1)}{(0.5 (k^{n} + 3))!}] . \end{aligned}

Figure 4.

Fixed-point iteration and hierarchical voting. (a) The probability P_n(p) of reaching the correct voting outcome in a hierarchical voting system as a function of p, the probability that voters in the electorate vote for the correct outcome. One can obtain P_n+1(p) from P_n(p) via an iteration (black arrows) from n = 1 → n + 1 = 2. Black disks indicate fixed points of the iteration. (b) The slope of P_n(p) (black disks) and P_d(p) (blue triangles) at p = 0.5. For n ≥ 2, $P_{d}^{'} (p = 0.5)$ is larger than $P_{n}^{'} (p = 0.5)$ . In both panels, we set the number of child nodes to k = 3.

For P₁(p) = g(p), we find (18)

\begin{aligned} \partial_{p} g (p = 0.5) & = {0.25}^{k^{'} - 1} \frac{(2 k^{'} + 1)!}{k^{'}!} [0.25 \frac{{}_{2} F_{1} (1, - k^{'}; k^{'} + 2; - 1)}{(k^{'} + 1)!} + 0.5 k^{'} \frac{{}_{2} F_{1} (2,1 - k^{'}; k^{'} + 3; - 1)}{(k^{'} + 2)!}] \\ & = {0.25}^{0.5 (k - 1)} \frac{k!}{(0.5 (k - 1))!} [\frac{{}_{2} F_{1} (1,0.5 (1 - k); 0.5 (k + 3); - 1)}{(0.5 (k + 1))!} \\ & \quad + (k - 1) \frac{{}_{2} F_{1} (2,0.5 (3 - k); 0.5 (k + 5); - 1)}{(0.5 (k + 3))!}] . \end{aligned}

The derivative of P_n(p) is (19)

\partial_{p} P_{n} (p) = \partial_{p} (\underbrace{g \circ g \circ \dots \circ g}_{n \text{ times}}) (p) = \underbrace{(\partial_{p} g) (p) \, (\partial_{p} g) (g (p)) \cdots (\partial_{p} g) (\underbrace{g \circ \dots \circ g}_{n - 1 \text{ times}} (p))}_{n \text{ factors}} .

Using g(p = 0.5) = 0.5 yields (20)

\begin{aligned} \partial_{p} P_{n} (p = 0.5) & = {[\partial_{p} g (p = 0.5)]}^{n} \\ & = {0.25}^{0.5 n (k - 1)} \frac{{(k!)}^{n}}{{[(0.5 (k - 1))!]}^{n}} {[\frac{{}_{2} F_{1} (1,0.5 (1 - k); 0.5 (k + 3); - 1)}{(0.5 (k + 1))!} + (k - 1) \frac{{}_{2} F_{1} (2,0.5 (3 - k); 0.5 (k + 5); - 1)}{(0.5 (k + 3))!}]}^{n} . \end{aligned}

Note that ∂_pP_d(p = 0.5) = ∂_pP_n(p = 0.5) for n = 1 and arbitrary k. According to equation (20), $\log [\partial_{p} P_{n} (p = 0.5)]$ grows linearly with n (with slope $\log [\partial_{p} g (p = 0.5)]$ ). Because of the (kⁿ)! term in equation (17), the slope of ∂_pP_d(p = 0.5) for constant k is larger than that of ∂_pP_n(p = 0.5). Instead of the linear growth of $\log [\partial_{p} P_{n} (p = 0.5)]$ with n, we find that $\log [\partial_{p} P_{d} (p = 0.5)]$ involves a $\log [(k^{n})!] = n k^{n} \log (k) - k^{n} + O [\log (k^{n})]$ term. In Figure 4(b), we show ∂_pP_d(p = 0.5) and ∂_pP_n(p = 0.5) for k = 3 as a function of n, confirming that ∂_pP_d(p = 0.5) is larger than ∂_pP_n(p = 0.5) for n ≥ 2. Given the discussed properties of P_d(p) and P_n(p), we have thus shown that for n ≥ 2, P_d(p) < P_n(p) for p ∈ (0, 0.5) (point ii) and P_d(p) > P_n(p) for p ∈ (0.5, 1.0) (point iii).
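The comparison of slopes at p = 0.5 can be checked numerically. The sketch below is an illustration under one assumption that is not part of the derivation above: instead of the hypergeometric expressions in equations (17) and (18), it uses the standard closed form for the derivative of a binomial tail, which for an odd number N of voters gives the slope $N \binom{N - 1}{(N - 1) / 2} 2^{- (N - 1)}$ at p = 0.5. The function names are hypothetical.

```python
from math import comb


def majority_slope_at_half(n):
    """Slope at p = 0.5 of the majority function for an odd number n of voters."""
    return n * comb(n - 1, (n - 1) // 2) / 2 ** (n - 1)


def slopes(k, n):
    """Return the slopes of P_n and P_d at p = 0.5 for group size k and n levels."""
    hierarchical = majority_slope_at_half(k) ** n   # n-fold product, as in (20)
    direct = majority_slope_at_half(k ** n)         # direct system with k**n voters
    return hierarchical, direct


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, slopes(3, n))
```

For k = 3 the two slopes coincide at n = 1 (both equal 1.5) and the direct slope is larger for every n ≥ 2 that the script evaluates, in line with Figure 4(b).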

Group size associated with lowest performance

Figure 5.

Number of groups with the lowest performance as a function of the electorate size. Black dots were obtained by numerically determining the minimum of the product (24). Note that equation (24) admits real-valued solutions of l_min. The solid black line describes the number of groups with the lowest performance according to $l_{\min} = \sqrt{N_{d}}$ .

We aim at determining the values of k (number of voters per group) and l (number of groups) that are associated with the lowest collective accuracy in a two-hierarchy system [i.e., with the smallest value of P₂(p; k) for p > 0.5 in equation (5)]. As in Appendix 1, we use k′ = (k − 1)/2 and l′ = (l − 1)/2. Note that l′ is a function of k′ for fixed N_d = kl.

To determine the values of k′ and l′, and hence of k and l, that are associated with the smallest values of P₂(p; k) for p > 0.5, we study the slope of P₂(p; l′) as a function of l′ in the vicinity of p = 0.5. Since the considered reliability functions are concave increasing in the interval [1/2, 1] (Boland, 1989; Berg, 1997), it is sufficient to evaluate P₂(p; l′) for a fixed value of p and varying l′. An expansion of P₂(p; l′) about (p, l′) = (0.5, l′), evaluated at p = 0.5 + Δp with Δp > 0, yields (21)

\begin{matrix} P_{2} (p = 0.5 + Δ p; l^{'}) = 0.5 + Δ p {\frac{\partial P_{2}}{\partial p} |}_{(p, l^{'}) = (0.5, l^{'})} + O (Δ p^{2}) . \end{matrix}

The derivative ∂_pP₂ (or Banzhaf number [Berg, 1997]) is (22)

\frac{\partial P_{2}}{\partial p} (P_{1} (p = 0.5; k^{'}); l^{'}) = \frac{\partial P_{1}}{\partial p} (p = 0.5; k^{'}) \frac{\partial P_{2}}{\partial p} (P_{1} (p = 0.5; k^{'}); l^{'}) = \frac{\partial P_{1}}{\partial p} (p = 0.5; k^{'}) \frac{\partial P_{2}}{\partial p} (p = 0.5; l^{'}) .

As a side note, observe that it is symmetric in (k, l) = (k, N_d/k). This symmetry in (k, l) is a consequence of the fixed-point behavior of P₂ at p = 0.5 [black disks in Figure 4(a)].

For evaluating ∂_pP₂, we use equation (18), which we rewrite to obtain (23)

\begin{array}{c} \partial_{p} g (p = 0.5; k^{'}) = 1 + \frac{4 k^{'} Γ (k^{'} + 3 / 2)_{2} F_{1} (2,1 - k^{'}; k^{'} + 3; - 1)}{\sqrt{π} Γ (k^{'} + 3)}, \end{array}

and find (24)

\frac{\partial P_{2}}{\partial p} (P_{1} (p = 0.5; k^{'}); l^{'}) = \partial_{p} g (p = 0.5; k^{'}) \partial_{p} g (p = 0.5; l^{'}) .

Invoking Stirling’s approximation in the limit of large k′, we approximate the ratio of the Gamma functions in equation (23) according to (25)

\frac{Γ (k^{'} + 3 / 2)}{Γ (k^{'} + 3)} \sim {k^{'}}^{- \frac{3}{2}} .

To derive the corresponding asymptotic relation for the hypergeometric function $_{2} F_{1} (2,1 - k^{'}; k^{'} + 3; - 1)$ in equation (23), we use relation 7.3.6.3 from Prudnikov et al. (1990) and obtain (26)

{}_{2} F_{1} (2, 1 - k^{'}; k^{'} + 3; - 1) = \frac{k^{'}}{2} - \frac{\sqrt{π}}{4} \sqrt{k^{'}} + \frac{3}{2} + O ({k^{'}}^{- \frac{1}{2}}) \quad (k^{'} \to \infty) .

We thus find (27)

\partial_{p} g (p = 0.5; k^{'}) \sim \frac{2 (k^{'} + 3)}{\sqrt{π k^{'}}} .

For large k and l, we set k′ ∼ k/2, l′ ∼ N_d/(2k) and obtain (28)

\frac{\partial P_{2}}{\partial p} (P_{1} (p = 0.5; k / 2); N_{d} / (2 k)) \sim \frac{2 (k + 6) (6 k + N_{d})}{π k \sqrt{N_{d}}} .

The minimum of equation (28) is attained for (29)

(k_{\min}, l_{\min}) = (\sqrt{N_{d}}, \sqrt{N_{d}}) .

We thus conclude that the minimum of the first derivative of P₂(p; l) in the vicinity of p = 0.5 is attained asymptotically for $l_{\min} = k_{\min} = \sqrt{N_{d}}$ . Hence, $P_{2} (p; l = \sqrt{N_{d}})$ corresponds to the reliability function with the smallest collective accuracy for p > 0.5 and the largest collective accuracy for p < 0.5.

We derived equation (29) using different asymptotic relations which are valid for large k′ and l′. Based on numerical calculations, we observe that the minimum of the product (24) follows the same square root law for small values of k′ and l′ (Figure 5).
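A small numerical check of the square root law for finite electorates can be sketched as follows. It is restricted to odd integer factorizations N_d = kl rather than the real-valued solutions shown in Figure 5, and it assumes the standard closed form $N \binom{N - 1}{(N - 1) / 2} 2^{- (N - 1)}$ for the slope of the majority function at p = 0.5 in place of equation (23). The function names are hypothetical.

```python
from math import comb


def majority_slope_at_half(n):
    """Slope at p = 0.5 of the majority function for an odd number n of voters."""
    return n * comb(n - 1, (n - 1) // 2) / 2 ** (n - 1)


def slope_product_by_group_size(n_d):
    """Map each odd divisor k of n_d to the slope product of a (k, n_d / k) system."""
    products = {}
    for k in range(1, n_d + 1, 2):
        if n_d % k == 0:
            l = n_d // k
            products[k] = majority_slope_at_half(k) * majority_slope_at_half(l)
    return products


def worst_group_size(n_d):
    """Group size k with the smallest slope product, i.e. the lowest accuracy."""
    products = slope_product_by_group_size(n_d)
    return min(products, key=products.get)


if __name__ == "__main__":
    for n_d in (81, 225, 625):
        print(n_d, worst_group_size(n_d), slope_product_by_group_size(n_d))
```

For the odd perfect squares tried here, the smallest product is found at k = l, while k = 1 and k = N_d (both equivalent to direct voting) give the largest.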

The above proof can be extended to general hierarchical systems with groups of sizes k⁽¹⁾, k⁽²⁾, …, k⁽ⁿ⁾ in layers 1, 2, …, n, respectively. The total number of voters in the electorate is N_d = k⁽¹⁾k⁽²⁾⋯k⁽ⁿ⁾. In accordance with equations (24) and (27), observe that the derivative of the reliability function P_n(p) of a hierarchical voting system with n layers factorizes at the fixed point p = 0.5 and is given by (30)

\begin{array}{c} \partial_{p} P_{n} (p = 0.5) & = \prod_{i = 1}^{n} \partial_{p} g (p = 0.5; k^{' (i)}) \end{array}

(31)

\sim \prod_{i = 1}^{n} \frac{\sqrt{2} (k^{(i)} + 6)}{\sqrt{π k^{(i)}}},

where k′⁽ⁱ⁾ ∼ k⁽ⁱ⁾/2 and k⁽ⁿ⁾ = N_d/(k⁽¹⁾⋯k⁽ⁿ⁻¹⁾). Determining the minimum of equation (31) yields the following n − 1 equations with n − 1 unknowns: (32)

\begin{array}{c} k_{\min}^{(1)} k_{\min}^{(2)} \dots k_{\min}^{(n - 2)} {k_{\min}^{(n - 1)}}^{2} & = N_{d} \\ k_{\min}^{(1)} k_{\min}^{(2)} \dots {k_{\min}^{(n - 2)}}^{2} k_{\min}^{(n - 1)} & = N_{d} \\ ⋮ \\ k_{\min}^{(1)} {k_{\min}^{(2)}}^{2} \dots k_{\min}^{(n - 2)} k_{\min}^{(n - 1)} & = N_{d} \\ {k_{\min}^{(1)}}^{2} k_{\min}^{(2)} \dots k_{\min}^{(n - 2)} k_{\min}^{(n - 1)} & = N_{d} . \end{array}

The solution of the above set of equations is $k_{\min}^{(1)} = k_{\min}^{(2)} = \dots = k_{\min}^{(n)} = \sqrt[n]{N_{d}}$ .

The fewest voters needed to sway an outcome occurs at $\sqrt{N_{d}}$

In a two-tier hierarchical system, the lowest number of voters needed to support a winning measure is equal to (33)

(\frac{k + 1}{2}) (\frac{\frac{N_{d}}{k} + 1}{2}) = \frac{N_{d} + k + \frac{N_{d}}{k} + 1}{4},

where k denotes the odd number of voters in each of the N_d/k groups.

Taking the derivative of equation (33) with respect to k gives (34)

\frac{1}{4} - \frac{N_{d}}{4 k^{2}} .

Setting this to zero and solving for k reveals that $k_{\min} = \sqrt{N_{d}}$ . Given the fact that k_min must be a positive number, this implies a unique extremum. Taking the second derivative of equation (33) yields (35)

\frac{N_{d}}{2 k^{3}} .

This number is positive for $k_{\min} = \sqrt{N_{d}}$ ; therefore, it is a unique minimum.
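As an illustration, equation (33) can be evaluated over all admissible odd group sizes of a small electorate. This is a minimal sketch with hypothetical function names, and it assumes that N_d is odd so that every divisor k and the corresponding number of groups N_d/k are odd.

```python
def fewest_supporting_voters(n_d, k):
    """Equation (33): bare majorities in a bare majority of the n_d / k groups."""
    l = n_d // k
    return ((k + 1) // 2) * ((l + 1) // 2)


def minimizing_group_size(n_d):
    """Odd divisor k of n_d that minimizes the number of voters in equation (33)."""
    candidates = [k for k in range(1, n_d + 1, 2) if n_d % k == 0]
    return min(candidates, key=lambda k: fewest_supporting_voters(n_d, k))


if __name__ == "__main__":
    n_d = 225
    for k in (1, 3, 5, 9, 15, 25, 45, 75, 225):
        print(k, fewest_supporting_voters(n_d, k))
    print("minimizer:", minimizing_group_size(n_d))
```

With N_d = 225, a direct vote (k = 1) requires 113 supporters, whereas 15 groups of 15 voters require only 64.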

More generally, in an n level hierarchy with k⁽ⁱ⁾ groups (voters) at each level i = 1, …, n, where k⁽ⁱ⁾ is odd for all i, the fewest voters needed to support a winning measure can be found by searching for the minimum of

f (k^{(1)}, \dots, k^{(n - 1)}) = (\frac{k^{(1)} + 1}{2}) (\frac{k^{(2)} + 1}{2}) \dots (\frac{\frac{N_{d}}{k^{(1)} k^{(2)} \dots k^{(n - 1)}} + 1}{2}) .

For each i ∈ {1, …, n − 1} the partial derivative of this quantity with respect to k⁽ⁱ⁾ is

\frac{\partial}{\partial k^{(i)}} f (k^{(1)}, \dots, k^{(n - 1)}) = \frac{(\prod_{j \neq i} (k^{(j)} + 1)) (k^{(i)} \prod_{j = 1}^{n - 1} k^{(j)} - N_{d})}{2^{n} k^{(i)} \prod_{j = 1}^{n - 1} k^{(j)}} .

The unique positive solution to $\nabla f (k_{\min}^{(1)}, \dots, k_{\min}^{(n - 1)}) = (0, \dots, 0)$ is $k_{\min}^{(i)} = \sqrt[n]{N_{d}}$ and can be found by setting $k_{\min}^{(i)} (\prod_{j = 1}^{n - 1} k_{\min}^{(j)}) = N_{d}$ . Observe that the resulting set of equations is equivalent to equation (32). The eigenvalues of the Hessian at this solution are

\{\frac{n {(1 + \sqrt[n]{N_{d}})}^{n - 2}}{2^{n} \sqrt[n]{N_{d}}}, \frac{{(1 + \sqrt[n]{N_{d}})}^{n - 2}}{2^{n} \sqrt[n]{N_{d}}}\} .

Since these are both positive, the Hessian is positive definite, so $k_{\min}^{(i)} = \sqrt[n]{N_{d}}$ for i = 1, …, n is a minimum of f.

Generating function approach

Let X_i be a Bernoulli random variable for which Pr(X_i = 1) = p_i and Pr(X_i = 0) = q_i = 1 − p_i. In what follows, we assume that p_i ≠ 1. To derive the probability mass function of the number of positive outcomes (36)

Σ_{n} \equiv \sum_{i = 1}^{n} X_{i}

we start from the generating function (Percus and Percus, 1985) (37)

G_{n} (t) = \sum_{l = 0}^{\infty} t^{l} \Pr (Σ_{n} = l)

which converges for |t| ≤ 1. Using that $G_{n} (t) = E (t^{Σ_{n}}) = E (t^{\sum_{i} X_{i}}) = \prod_{i} E (t^{X_{i}}) = \prod_{i} (q_{i} + p_{i} t)$, the generating function can be written as (Percus and Percus, 1985) (38)

G_{n} (t) = \prod_{i = 1}^{n} q_{i} \prod_{i = 1}^{n} (1 + \frac{p_{i}}{q_{i}} t) .

The probability that Σ_n is equal to s ≤ n is (39)

\begin{matrix} \Pr (Σ_{n} = s) = \frac{1}{s!} {\frac{\partial^{s} G_{n} (t)}{\partial t^{s}} |}_{t = 0} \\ = \prod_{i = 1}^{n} q_{i} \sum_{1 \leq j_{1} < j_{2} < \dots < j_{s} \leq n} \frac{p_{j_{1}}}{q_{j_{1}}} \dots \frac{p_{j_{s}}}{q_{j_{s}}} \\ = \frac{1}{Q} C_{s} (p_{1} / q_{1}, \dots, p_{n} / q_{n}), \end{matrix}

where $Q = 1 / \prod_{i} q_{i}$ and C_s is the sth elementary symmetric polynomial.

To calculate the sth elementary symmetric polynomial, we utilize the recursion relation (“Newton’s identities”) (Mead, 1992) (40)

\begin{array}{c} C_{s} & = \frac{1}{s} \sum_{k = 0}^{s - 1} {(- 1)}^{k} C_{s - 1 - k} σ_{k + 1}, s > 0, \end{array}

(41)

\begin{matrix} C_{0} = 1, \end{matrix}

where (42)

σ_{s} (p_{1} / q_{1}, \dots, p_{n} / q_{n}) = \sum_{i = 1}^{n} {(\frac{p_{i}}{q_{i}})}^{s} .

To apply equation (39) to the voting system with abstention and heterogeneous opinions (see main text), positive outcomes (X_i = 1) have to be in the majority. For n = 2n′ + 1, positive outcomes are in the majority if Σ_n ≥ n′ + 1. The corresponding probability is (43)

\Pr (Σ_{n} \geq n^{'} + 1) = \sum_{s = n^{'} + 1}^{n} \Pr (Σ_{n} = s) .

As an example, we consider the case with n = 3 independent Bernoulli random variables. Positive outcomes are in the majority if Σ₃ ≥ 2. Using the elementary symmetric polynomials for n = 3 and s = 2, 3 (Table 2), the probability of Σ₃ ≥ 2 is (44)

\Pr (Σ_{3} \geq 2) = \frac{1}{Q} (\frac{p_{1} p_{2}}{q_{1} q_{2}} + \frac{p_{1} p_{3}}{q_{1} q_{3}} + \frac{p_{2} p_{3}}{q_{2} q_{3}} + \frac{p_{1} p_{2} p_{3}}{q_{1} q_{2} q_{3}}),

where $Q = 1 / (q_{1} q_{2} q_{3})$.
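The recursion in equations (40) to (42) and the majority probability (43) can be sketched in a few lines. The code below is an illustration rather than the authors' implementation; the function names are hypothetical, and it assumes p_i ≠ 1 for all i so that the ratios p_i/q_i are finite.

```python
from math import prod


def elementary_symmetric(x, s_max):
    """Return [C_0, ..., C_{s_max}] of the values x via Newton's identities (40)-(42)."""
    # sigma[s] is the power sum of order s; index 0 is unused.
    sigma = [0.0] + [sum(xi**s for xi in x) for s in range(1, s_max + 1)]
    c = [1.0]                                            # C_0 = 1, equation (41)
    for s in range(1, s_max + 1):
        total = sum((-1) ** k * c[s - 1 - k] * sigma[k + 1] for k in range(s))
        c.append(total / s)                              # equation (40)
    return c


def majority_probability(p):
    """Equation (43) for an odd number of independent Bernoulli variables."""
    n = len(p)
    q = [1.0 - pi for pi in p]                           # requires p_i != 1
    ratios = [pi / qi for pi, qi in zip(p, q)]
    c = elementary_symmetric(ratios, n)
    # Pr(sum = s) = (prod_i q_i) * C_s, equation (39), summed over s > n/2.
    return prod(q) * sum(c[n // 2 + 1:])


if __name__ == "__main__":
    p = [0.6, 0.7, 0.8]
    print(majority_probability(p))                       # compare with equation (44)
```

For n = 3 the result can be compared term by term with equation (44). For large n or for p_i close to 1, the alternating sums in the recursion may lose floating-point precision, so a direct expansion of the product in equation (38) may be preferable in practice.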

References

Adler

Gordon

(1992) Information collection and spread by networks of patrolling ants. The American Naturalist 140: 373–400.

Arrow

KJA

(1950) Difficulty in the concept of social welfare. Journal of Political Economy 58: 328–346.

Article 7, Treaty on European Union . https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A12012M007.

Auriol

Gary-Bobo

(2012) On the optimal number of representatives. Public Choice 153: 419–445.

Berg

Paroush

(1998) Collective decision making in hierarchies. Mathematical Social Sciences 35: 233–244.

Berg

(1997) Indirect voting systems: Banzhaf numbers, majority functions and collective competence. European Journal of Political Economy 13: 557–573.

Boehmer

Schaar

(2022) Collecting, classifying, analyzing, and using real-world elections. arXiv preprint arXiv:2204.03589.

Boland

Proschan

Tong

(1989) Modelling dependence in simple and indirect majority systems. Journal of Applied Probability 26: 81–88.

Boland

(1989) Majority systems and the condorcet jury theorem. Journal of the Royal Statistical Society: Series D (The Statistician) 38: 181–189.

10.

Böttcher

Herrmann

(2021) Computational Statistical Physics. Cambridge, UK: Cambridge University Press.

11.

Böttcher

Nagler

Herrmann

(2017) Critical behaviors in contagion dynamics. Physical Review Letters 118: 088301.

12.

Bouton

Gratton

(2015) Majority runoff elections: strategic voting and Duverger’s hypothesis. Theoretical Economics 10: 283–314.

13.

Brandt

Conitzer

Endriss

, et al. (2016) Handbook of Computational Social Choice. Cambridge, UK: Cambridge University Press.

14.

Cancela

Geys

(2016) Explaining voter turnout: a meta-analysis of national and subnational elections. Electoral Studies 42: 264–275.

15.

Christensen

Knudsen

(2010) Design of decision-making organizations. Management Science 56: 71–89.

16.

Christensen

Dahl

Knudsen

et al. (2021) Context and aggregation: an experimental study of bias and discrimination in organizational decisions. Organization Science.

17.

Couzin

(2009) Collective cognition in animal groups. Trends in Cognitive Sciences 13: 36–43.

18.

Csaszar

Eggers

(2013) Organizational decision making: an information aggregation view. Management Science 59: 2257–2277.

19.

de Condorcet

(2014) Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Cambridge, UK: Cambridge University Press.

20.

Dehaene

Changeux

J-P

(1997) A hierarchical neuronal network for planning behavior. Proceedings of the National Academy of Sciences 94: 13293–13298.

21. Dietterich (2000) Ensemble methods in machine learning. International Workshop on Multiple Classifier Systems. Berlin, Heidelberg: Springer, 1–15.

22. Douglas SC (2018) Why ReLU units sometimes die: analysis of single-unit error backpropagation in neural networks. In 2018 52nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, October 28–31, 2018, 864–868. IEEE.

23. Feddersen, Pesendorfer (1998) Convicting the innocent: the inferiority of unanimous jury verdicts under strategic voting. American Political Science Review 92: 23–35.

24. Feddersen, Pesendorfer (1999) Elections, information aggregation, and strategic voting. Proceedings of the National Academy of Sciences 96: 10572–10574.

25. Friston (2008) Hierarchical models in the brain. PLOS Computational Biology 4: e1000211.

26. Garmann (2016) Concurrent elections and turnout: causal estimates from a German quasi-experiment. Journal of Economic Behavior & Organization 126: 167–178.

27. Gärtner, Zehmakan (2017) Color war: cellular automata with majority-rule. International Conference on Language and Automata Theory and Applications. Cham: Springer, 393–404.

28. Gersbach, Mamageishvili, Tejada (2022) Appointed learning for the common good: optimal committee size and monetary transfers. Games and Economic Behavior 136: 153–176.

29. Häggström, Kalai, Mossel (2006) A law of large numbers for weighted majority. Advances in Applied Mathematics 37: 112–123.

30. Hodges, Le Cam (1960) The Poisson approximation to the Poisson binomial distribution. The Annals of Mathematical Statistics 31: 737–740.

31. Hoeffding (1956) On the distribution of the number of successes in independent trials. Annals of Mathematical Statistics 27: 713–721.

32. Kalai, Safra (2006) Threshold Phenomena and Influence: Perspectives from Mathematics, Computer Science, and Economics. Oxford, UK: Oxford University Press.

33. Kaniovski, Zaigraev (2011) Optimal jury design for homogeneous juries with correlated votes. Theory and Decision 71: 439–459.

34. Kao, Couzin (2014) Decision accuracy in complex environments is often maximized by small group sizes. Proceedings of the Royal Society B: Biological Sciences 281: 20133305.

35. Kao, Couzin (2019) Modular structure within groups causes information loss but can improve decision accuracy. Philosophical Transactions of the Royal Society B 374: 20180378.

36. Kogan, Lavertu, Peskowitz (2018) Election timing, electorate composition, and policy outcomes: evidence from school districts. American Journal of Political Science 62: 637–651.

37. Kolchin, Hyclak (1984) Participation in union activities: a multivariate analysis. Journal of Labor Research 5: 255–263.

38. Kossinets, Watts (2009) Origins of homophily in an evolving social network. American Journal of Sociology 115: 405–450.

39. Massen, Koski (2014) Chimps of a feather sit together: chimpanzee friendships are based on homophily in personality. Evolution and Human Behavior 35: 1–8.

40. May (1952) A set of independent necessary and sufficient conditions for simple majority decision. Econometrica: Journal of the Econometric Society 20: 680–684.

41. Mead (1992) Newton’s identities. The American Mathematical Monthly 99: 749–751.

42. Mersch, Crespi, Keller (2013) Tracking individuals shows spatial fidelity is a key regulator of ant social organization. Science 340: 1090–1093.

43. Miller (2019) Reflections on Arrow’s theorem and voting rules. Public Choice 179: 113–124.

44. Moore, Shannon (1956a) Reliable circuits using less reliable relays, part I. Journal of the Franklin Institute 262: 191–208.

45. Moore, Shannon (1956b) Reliable circuits using less reliable relays, part II. Journal of the Franklin Institute 262: 281–297.

46. Moskowitz, Schneer (2019) Reevaluating competition and turnout in U.S. House elections. Quarterly Journal of Political Science 14: 191–223.

47. Nagy, Ákos, Biro, et al. (2010) Hierarchical group dynamics in pigeon flocks. Nature 464: 890–893.

48. Naug (2008) Structure of the social network and its influence on transmission dynamics in a honeybee colony. Behavioral Ecology and Sociobiology 62: 1719–1725.

49. Percus (1985) Probability bounds on the sum of independent nonidentically distributed binomial random variables. SIAM Journal on Applied Mathematics 45: 621–640.

50. Pinter-Wollman, Wollman, Guetz, et al. (2011) The effect of individual variation on the structure and function of interaction networks in harvester ants. Journal of the Royal Society Interface 8: 1562–1573.

51. Prudnikov, Bračkov, Marichev, et al. (1990) Integrals and Series. Volume 3, More Special Functions. Amsterdam, Paris, New York: Gordon & Breach Science Publishers, Inc.

52. Richards, Seung, Pickard (2006) Neural voting machines. Neural Networks 19: 1161–1167.

53. Sah, Stiglitz (1984) The architecture of economic systems: hierarchies and polyarchies. Tech. Rep., National Bureau of Economic Research.

54. Sah, Stiglitz (1988) Committees, hierarchies and polyarchies. The Economic Journal 98: 451–470.

55. Sen (2020) Majority decision and Condorcet winners. Social Choice and Welfare 54: 211–217.

56. Simonovits (2012) Competition and turnout revisited: the importance of measuring expected closeness accurately. Electoral Studies 31: 364–371.

57. Sondheimer, Green (2009) Using experiments to estimate the effects of education on voter turnout. American Journal of Political Science 54: 174–189.

58. von Neumann (1956) Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata Studies 34: 43–98.

59. Wittemyer, Douglas-Hamilton, Getz (2005) The socioecology of elephants: analysis of the processes creating multitiered social structures. Animal Behaviour 69: 1357–1371.

60. Zeki, Shipp (1988) The functional logic of cortical connections. Nature 335: 311–317.

Examining the limits of the Condorcet Jury Theorem: Tradeoffs in hierarchical information aggregation systems
