Abstract
Survey researchers are sometimes tasked with sifting through possible ways of phrasing a question in order to discover the wording that maximizes the proportion of respondents who answer in a particular way. One example occurs when a political interest group seeks to phrase a ballot measure on a given topic in a way that will garner as much voter support as possible. The present study illustrates how adaptive sample allocation can be used quickly and efficiently for this purpose. Using the example of a ballot measure on ranked choice voting, we show that adaptive design was better suited to identify the most promising ballot wording than a more conventional static design. We provide an open-source software R package and instructions for survey researchers seeking to implement adaptive experiments.
Introduction
Scholars have long suspected that public support for ballot measures may vary substantially according to how these proposals are described in the synopsis presented to voters (Burnett and Kogan, 2015; Gafke and Leuthold, 1979; Schuman and Presser, 1996). Support for this hypothesis draws on decades of surveys showing that public responses to policies can vary markedly depending on whether transfer payments are described as “welfare” or “aid to the poor” (Jacoby, 2000) or whether legal prohibitions are said to “forbid” an activity or merely “not allow” it (Hippler and Schwarz, 1986).
The hypothesis that alternative descriptions of a given policy may have a substantial effect on public support is of enormous interest to advocacy groups seeking to put a ballot measure before voters. If an appealing wording confers an extra few percentage points of voter support, the advocacy group may be less reliant on advertising and other costly outreach efforts that attempt to shift voter support during the campaign. Of course, advocacy groups do not have free rein to choose any title and description they like; in many jurisdictions, they may only propose certain wording and hope that election administrators adopt it. But which wording should the advocacy group propose?
In order to inform their proposed ballot measure wording, advocacy groups sometimes commission polls that randomly vary the ways in which a given ballot measure is described (Barber et al., 2017; Downey and Oklobdzija, 2018). The aim is to discover which wording maximizes voter support. If resources were unlimited, this exercise would be straightforward: to find which of k possible descriptions works best for a sample of size N, randomly assign N/k respondents to each description and assess which wording maximizes the share of yes votes. In practice, however, advocacy groups face a variety of practical constraints that restrict the range of feasible research designs. First, they typically have a polling budget that allows for at most a few thousand respondents. Second, they often face a deadline for submitting their proposed language, which in turn limits the number of days that the poll can be in the field. Third, ballot measures may be presented to state or local electorates, for which the number of available respondents per day may be small. Given these constraints, an attractive research approach is an adaptive survey experiment in which the allocation of respondents to each unique ballot measure wording is updated as results come in.
Building on prior work that describes the statistical theory underlying adaptive survey design and illustrates the idea of a competition among alternative ballot wordings (Offer-Westort et al., 2021), the present paper offers a worked example of this design in practice, based on a recent study undertaken on behalf of an interest group seeking to win support for a statewide ballot measure to institute ranked choice voting (RCV). In Supplemental Material Section A, we provide practical guidelines with example code for survey researchers seeking to implement adaptive experiments.
Response-adaptive design
To appreciate the intuition underlying adaptive experimental design, suppose that an interest group wants to test how best to phrase a ballot measure on a minimum wage increase. To do so, the group commissions a poll of 1,000 eligible voters to assess which of 20 potential ballot measure wordings maximizes voter support. When applying a conventional static design, a pollster would ordinarily assign these 20 wordings (i.e., treatment arms) at random and in equal proportion across the total sample and compare the share of yes votes across all arms. This one-shot experimental design may have difficulty determining the optimal ballot wording, since each arm is allotted only 50 observations.
Adaptive design helps overcome this problem by assigning more subjects to better performing arms. Instead of analyzing the data only at the end of the trial, an adaptive survey periodically evaluates interim results to update its assessment of each arm's performance (Bechhofer, 1954; Bubeck et al., 2011). In our running example, a pollster could split the total sample into equally sized batches and conduct experiments for each batch sequentially (for example, 100 subjects per batch over 10 batches). Not knowing at the outset which arms will perform especially well, the pollster typically begins by assigning subjects with equal probability. However, once early results are in hand, the pollster can use them to determine how to allocate subjects in the next round. That is, a researcher can assign more respondents to the arms that seem to perform well (e.g., generate higher shares of “yes” votes among respondents). Adaptive design proceeds in this manner until a stopping rule is invoked, which might refer to a budget constraint, a total N, or a target level of statistical uncertainty (Berman et al., 2018; Rao et al., 2008).
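To fix ideas, the sketch below illustrates this batch-sequential structure in R. The function and argument names are illustrative rather than taken from our software package, and the allocation rule is left as a pluggable argument so that the Thompson sampling rule described in the next subsection can replace the equal-probability default.

```r
# A minimal sketch of the batch-sequential structure described above.
# Names are illustrative, not drawn from our package; the allocation rule
# is pluggable so that an adaptive rule can replace the equal-probability default.
run_batched_experiment <- function(n_batches = 10, batch_size = 100, k = 20,
                                   allocate = function(yes, n) rep(1 / k, k)) {
  yes <- rep(0, k)  # running count of "yes" responses per arm
  n   <- rep(0, k)  # running count of respondents per arm
  for (b in seq_len(n_batches)) {
    # Equal allocation in the first batch; thereafter defer to the allocation rule
    probs <- if (b == 1) rep(1 / k, k) else allocate(yes, n)
    arms  <- sample(seq_len(k), batch_size, replace = TRUE, prob = probs)
    # In a real study these outcomes come from fielding the survey batch;
    # here they are simulated purely so the sketch runs end to end.
    outcomes <- rbinom(batch_size, 1, 0.6)
    yes <- yes + tabulate(arms[outcomes == 1], nbins = k)
    n   <- n + tabulate(arms, nbins = k)
  }
  list(yes = yes, n = n)
}

res <- run_batched_experiment()
res$n  # respondents allocated to each of the 20 arms after 10 batches
```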
Adaptive allocation of subjects
The goal of an adaptive experiment, as it pertains to optimal ballot wording discovery, is to maximize the share of yes votes. Intuitively, the pollster would like to invest more subjects in ballot wordings that look especially promising based on data collected up to that point. One of the most common algorithms used to allocate subjects adaptively is Thompson sampling (Thompson, 1933). Thompson sampling is rooted in a Bayesian probabilistic model of which arm is best and assigns treatments accordingly. The intuition behind this algorithm is simple: the more confident we are that a given arm is best, the more likely we are to assign that arm in the next batch.
In the first period, there are no data to generate a probability; therefore, treatment arms are sampled with equal probability. In the next period, the Thompson sampling algorithm uses the data collected from the previous period to generate the probability that each arm is best. For example, if the model suggests that there is a 60% chance that ballot wording A is best among all wordings tested, 60% of the subjects in the next period will be randomly selected to receive wording A. Similarly, if ballot wording B is accorded only a 5% chance of being best according to the algorithm, in expectation, 5% of the subjects will receive wording B in the next round. This probabilistic calculation is updated over time as more data are accumulated. Supplemental Material Section C describes the statistical theory underlying this algorithm.
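As an illustration, the following R sketch implements a standard textbook version of this calculation for binary (yes/no) outcomes, assuming independent Beta(1, 1) priors on each arm's support rate; the exact model we used is described in Supplemental Material Section C, so this should be read as a minimal sketch rather than our production code.

```r
# A minimal sketch of Thompson sampling for binary ("yes"/"no") outcomes,
# assuming independent Beta(1, 1) priors on each arm's support rate.
prob_best <- function(yes, n, n_draws = 1e5) {
  k <- length(yes)
  # One posterior draw per arm per simulation: Beta(1 + successes, 1 + failures)
  draws <- sapply(seq_len(k), function(a) rbeta(n_draws, 1 + yes[a], 1 + n[a] - yes[a]))
  # Fraction of simulations in which each arm has the highest support rate
  tabulate(max.col(draws), nbins = k) / n_draws
}

# Hypothetical interim data for three wordings after one batch
yes <- c(30, 22, 25)
n   <- c(40, 40, 40)
p   <- prob_best(yes, n)   # wording 1 should receive the largest probability
next_batch <- sample(seq_along(p), size = 100, replace = TRUE, prob = p)
table(next_batch)          # wording 1 receives most of the next batch, in expectation
```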
Correcting for bias
A key component of adaptive designs is that the probability of assignment to each arm depends on observed outcomes, which fluctuate over time. For example, suppose that ballot wording A performed poorly (and therefore received a very low probability of being best) in the first period; it could still perform better in the next round and, consequently, be accorded a much higher probability of being best. This observed variability is more pronounced in the early stages of the study when data are limited. Therefore, even if we are only interested in estimating the share of yes votes in each arm at the end of the study, those final sample means are prone to bias (Nie et al., 2018). To correct for this bias, the inverse probability weighting (IPW) estimator is used, under which each observation is weighted by the inverse of the probability of assignment to its arm (Dimakopoulou et al., 2017).
However, IPW estimators may exhibit large variance. For example, if ballot wording A is accorded a 10% probability of being best in the second period, the outcomes from subjects who receive wording A during that period are weighted by 1/0.10 = 10. If wording A is then accorded a 70% probability of being best in the third period, those outcomes are weighted by 1/0.70, or roughly 1.43. Across all periods, then, the weighted average share of yes votes up-weights the sample mean from period 2 while down-weighting the sample mean from period 3. This can result in a higher standard error of the estimate than under a standard static design with the same number of observations. Alternative IPW estimators exist that account for these statistical challenges (Hadad et al., 2021). However, in the context of ballot wording testing, the primary objective is to identify the best performing arm based on the estimated mean rather than to obtain precise estimates for all arms. The standard IPW estimator suffices for this goal because the best performing arm tends to receive the largest number of subjects.
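For concreteness, the following R sketch computes a normalized (Hájek-style) IPW mean for a single arm, mirroring the numeric example above. The outcomes are hypothetical, and the estimator we report may differ in implementation details (see Dimakopoulou et al., 2017).

```r
# A sketch of a normalized (Hajek-style) IPW mean for a single arm.
ipw_mean <- function(y, assign_prob) {
  # y: binary outcomes for respondents assigned to this arm
  # assign_prob: the arm's assignment probability in the period in which
  #              each respondent was interviewed
  weighted.mean(y, w = 1 / assign_prob)
}

# Wording A: 10% assignment probability in period 2 (weight 10),
#            70% in period 3 (weight ~1.43)
y2 <- rbinom(20, 1, 0.6)    # hypothetical period-2 outcomes
y3 <- rbinom(140, 1, 0.7)   # hypothetical period-3 outcomes
ipw_mean(c(y2, y3), c(rep(0.10, 20), rep(0.70, 140)))
```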
Advantages
Adaptive designs have several advantages over static designs. Here, we focus our discussion on benefits as they relate to discovering optimal ballot wording. First, researchers (and advocacy groups) can evaluate a larger initial set of alternatives. Rather than pre-selecting only a few treatment arms for investigation, researchers can test a larger set of treatment arms simultaneously at lower cost, because the design reallocates resources to the most promising treatments as the experiment proceeds. This feature is particularly appealing for ballot wording testing: there are often many candidate wordings that a researcher would like to test, and the adaptive design lets researchers systematically evaluate all of them rather than having to narrow them down to a smaller set a priori, as would ordinarily be done under a static design. Relatedly, adaptive experiments can also be useful as pilot studies that quickly identify and eliminate less promising treatment options, leaving the more favorable ones to be tested in a traditional non-adaptive setting.
Second, adaptive design allows researchers to get an answer with a limited number of observations. Under a standard static design, a sample of size N is evenly divided across k arms. When the sample size within each arm is relatively low, the static design is unable to obtain an accurate estimate for any of the arms. Adaptive design invests more observations in the promising arms while economizing on the poorly performing ones. This dynamic allocation elevates the likelihood of discovering the best arm. As we show in the results section, we observe this uneven allocation in our application, where the best performing arm received more than five times as many observations as the worst arm.
We note that an adaptive design is particularly useful under the prior belief that some arms are truly better than others. If every arm were equally good, the benefit of using an adaptive design would be marginal. However, when advocacy groups commission such polls, they do so because they believe that performance will vary across different ballot measure wordings. Given these prior beliefs, adaptive allocation of subjects is a sensible design choice. A further assumption underlying this type of adaptive design is that the true underlying performance of the treatment arms is stable during the period of the study. The Thompson sampling algorithm proposed here is designed to estimate fixed success probabilities (e.g., the probability of voting yes on a given ballot measure). This choice makes sense for the topic of ranked choice voting, which receives relatively little attention from the media. However, if we were studying a topic in a fast-moving environment where true preferences over ballot measures are non-stationary, over-time changes in success probabilities could gradually degrade the performance of the algorithm (Granmo and Berg, 2010).
Ballot measure on ranked choice voting
Design
Question and treatment texts.
Our adaptive experiment was conducted over 10 periods, each time assigning treatment arms to roughly 200 respondents according to the adaptive algorithm. As our objective is to maximize voter support, we use the term “best arm” to refer to the ballot wording that yields the highest share of yes votes. In the first period, all arms were sampled with equal probability. In subsequent periods, we used standard Thompson sampling, which adaptively allocates more subjects to the best performing arm as it emerges. That is, the Thompson sampling algorithm generates, for each arm, the probability that it is the best of all arms, and we use these probabilities to assign respondents in the next round.
In the first period, with a treatment assignment probability of 1/11 for all arms, the number of respondents assigned to arm 1 through 11 was as follows: [24, 25, 13, 22, 23, 25, 17, 22, 15, 24, 17]. From this sample, we observed the following average support rate for each arm: [0.50, 0.68, 0.62, 0.77, 0.83, 0.76, 0.76, 0.77, 0.67, 0.71, 0.59]. Based on these observed mean outcomes, the Thompson sampling algorithm then generated probabilities that each arm is best as follows: [0.01, 0.02, 0.02, 0.14, 0.34, 0.11, 0.14, 0.14, 0.04, 0.04, 0.01]. As expected, the algorithm allots higher probability to arm 5, which had the highest share of yes votes in the first period. We then randomly assigned respondents according to this probability distribution for the second period. The number of respondents assigned to each arm in period 2 was as follows: [0, 2, 2, 32, 68, 27, 23, 31, 4, 6]. We continued this procedure until the end of period 10.
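For readers who wish to trace the mechanics, the R sketch below approximately reproduces the period-1 probabilities from the counts and means just reported, under the assumption of a Beta-Bernoulli model with uniform priors (our reading of the algorithm; see Supplemental Material Section C). The implied yes counts are rounded to integers, so the output should only roughly match the reported vector.

```r
# Approximate reconstruction of the period-1 probabilities reported above,
# assuming Beta(1, 1) priors; discrepancies reflect rounding and Monte Carlo error.
n1    <- c(24, 25, 13, 22, 23, 25, 17, 22, 15, 24, 17)
mean1 <- c(0.50, 0.68, 0.62, 0.77, 0.83, 0.76, 0.76, 0.77, 0.67, 0.71, 0.59)
yes1  <- round(n1 * mean1)   # implied "yes" counts per arm

set.seed(1)
draws  <- sapply(seq_along(n1), function(a) rbeta(1e5, 1 + yes1[a], 1 + n1[a] - yes1[a]))
p_best <- tabulate(max.col(draws), nbins = length(n1)) / 1e5
round(p_best, 2)   # arm 5 should receive the largest probability of being best

# These probabilities then serve as assignment probabilities for period 2
arms_period2 <- sample(seq_along(n1), size = 200, replace = TRUE, prob = p_best)
table(arms_period2)
```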
Results
Figure 1 shows the inverse probability weighted average proportion of yes votes for each wording. Overall, the average support rate is high across the board: every arm has an estimated average approval of 50% or more. We note that the wordings with the highest vote support have the smallest standard errors, as these arms tend to receive more respondents than arms with lower support rates. For example, the poorest performing wording (arm 9) received fewer than 60 respondents in total, producing a standard error as high as 8 percentage points, whereas the most promising wording (arm 10) received over 300 respondents by the end of the study, resulting in a much more precise estimate (a standard error of 3 percentage points).
Under a standard static design, each arm would have received about 180 subjects. But by letting treatment assignment adapt to past performance and allocating more observations to the most promising arms, the adaptive design allots nearly twice as many observations to the top performing arms and, consequently, is more likely to discover the best option.
Our adaptive experiment identifies a set of top finishers. Arm 10 garnered the highest vote support in the end, with an estimated proportion of 75% yes votes. But the winning arm is closely followed by arm 4 (73%), arm 6 (72%), and arm 5 (71%). Indeed, the probabilities generated by the Thompson sampling algorithm reflect the close finish: at the end of the last period, the probabilities of being best were 0.19, 0.17, 0.07, and 0.18 for arms 10, 4, 6, and 5, respectively. With similarly high probabilities of being best across several arms, the best arm was not easily discernible: adaptive design invests across a set of promising arms, rather than a single arm, when there is no one clear winner.
Discovering a set of good arms may be an advantage in practice when a decision-maker prefers to have backup options from which to choose. In this case, the sponsoring organization may not be able to convince election officials to adopt its preferred wording. Nevertheless, our best guess is that arm 10 is the most promising option despite the presence of close runners-up. Ballot measures are often decided by narrow vote margins (Barber et al., 2017). When such small differences can flip the ballot result, the best performing option is the safest bet. But it remains a bet, not a sure thing: at the end of ten rounds, the posterior probability that the winning arm is indeed best remains just 0.19. We conducted our experiment over 10 periods with the intention of getting results quickly, but in practice, a researcher may run additional rounds of adaptive experiments with larger samples in order to obtain a more precise estimate of the best arm.
Performance under static design: simulation
Simulation statistics.
Column “Observed Est (S.E.)” shows the average approval rate and standard error estimated from the adaptive experiment, as shown in Figure 1. Treating these as the truth, we first evaluate how often a static design is able to detect the best arm as compared to an adaptive design. We find that arm 10—the best performing arm according to the experiment—turns up as the best arm in 43% of the static experiments, whereas it is selected as best in 55% of adaptive experiments.
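The static-design leg of this comparison can be sketched in a few lines of R: split the sample evenly across arms, draw outcomes from fixed "true" support rates, and record how often arm 10 has the highest sample mean. In the sketch below, only the rates for arms 4, 5, 6, and 10 are taken from the text; the remaining values are illustrative stand-ins rather than the estimates used in our simulations, and the adaptive leg (which requires re-running the full Thompson procedure) is omitted.

```r
# Sketch of the static-design leg: ~180 respondents per arm, outcomes drawn
# from fixed support rates. Values for arms 4, 5, 6, and 10 come from the
# text; all other values are illustrative stand-ins, not estimates from the paper.
true_means <- c(0.62, 0.66, 0.60, 0.73, 0.71, 0.72, 0.68, 0.67, 0.55, 0.75, 0.63)
n_per_arm  <- 180
n_sims     <- 5000

set.seed(1)
wins <- replicate(n_sims, {
  sample_means <- rbinom(length(true_means), n_per_arm, true_means) / n_per_arm
  which.max(sample_means) == 10
})
mean(wins)  # share of simulated static experiments in which arm 10 is ranked best
```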
On the other hand, the root mean squared error (RMSE) across simulations suggests that the static design is less susceptible to random sampling variability. The additional uncertainty surrounding the adaptive design reflects the use of inverse probability weights, particularly in those simulated experiments where the best arm does not quickly achieve a high sampling probability. For example, under the adaptive design, the standard error around the estimate for the best performing arm was, on average, about twice as large as under a comparable static design.
Taken together, these simulations show that, even under a scenario where the true best arm is closely followed by the runners-up, adaptive designs tend to outperform static designs when the goal is to discover the best performing arm. The adaptive design incurs some loss of precision in estimating the performance of each arm. The former goal was of greater importance to our interest group collaborator.
In Supplemental Material Section I, we present additional simulations that vary the length of the experiment, the total sample size, and the total number of arms to assess the relative performance of static and adaptive designs.
Conclusion
By empirically testing the performance of alternative ballot wordings, advocacy groups can potentially gain a few extra percentage points of voter support, reducing their reliance on costly outreach efforts during campaigns. The paper suggests an adaptive survey experiment as an attractive approach that helps researchers discover optimal ballot wording efficiently, addressing practical considerations such as budget constraints, time pressures, and limited availability of respondents. Using the example of a ballot measure on ranked choice voting, we demonstrate various features of adaptive design vis-à-vis a more conventional static design. Overall, adaptive survey design may provide a foundation for advocacy groups to make informed decisions about how they present their ballot measures to the public. To facilitate the use of adaptive design, we provide an open-source R package and instructions for survey researchers seeking to implement adaptive experiments.
We note that adaptive designs encompass a much broader class of designs that differ in their objectives (Agrawal and Goyal, 2013; Athey et al., 2022), sampling algorithms (Carpentier et al., 2011), and estimators (Dimmery et al., 2019; Hadad et al., 2021). These alternatives lie beyond the scope of this paper, which introduces an adaptive design that is easy to implement and most relevant to political interest groups: identifying the optimal wording for a ballot measure. Adaptive designs, however, are applicable to broader contexts and have demonstrated their effectiveness in various policy-making efforts, such as helping refugees find work (Caria et al., 2020) or reducing the sharing of COVID-19 misinformation (Offer-Westort et al., 2024).
That said, we are quick to acknowledge the limitations of adaptive designs in real-world applications. The advantages of adaptive allocation disappear when the treatment arms have identical success probabilities. Although practitioners might be tempted to test a wide array of interventions (e.g., different ballot wordings, advertisements, and influencers), they should also reflect on whether the interventions are different enough from one another to plausibly produce different outcomes. Another consideration is whether the success probabilities are plausibly stable over time. In our application, there was no reason to believe that public opinion was in flux while our survey was in the field, but the stability assumption might be dubious when interventions are deployed amid rapidly changing political campaigns, where interventions that work well one day might be ineffective a few days later. In such environments, what is learned from any experiment, whether adaptive or static, may have a short shelf life.
Acknowledgments
These authors contributed equally to the research design and writing. The first author created the accompanying R package. We would like to thank Alexander Coppock, Alec R. Ewig, Anja Kilibarda, and Molly Offer-Westort for helpful comments.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The survey included in this article was funded by Oregon Ranked Choice Voting Advocates, which is a 501(c)(4) nonprofit organization. Neither author received compensation for the work described here, nor do they have financial or family ties to the survey firm or interest group that funded this study.
Data availability statement
The data and code required to reproduce results reported in this article can be found at https://doi.org/10.7910/DVN/O273OW.
Supplemental Material
Supplemental material for this article is available online.