Abstract
Lottery-based identification strategies offer potential for generating the next generation of evidence on U.S. early education programs. The authors’ collaborative network of five research teams applying this design in early education settings and methods experts has identified six challenges that need to be carefully considered in this new context: (a) available baseline covariates that may not be very rich; (b) limited data on the counterfactual; (c) limited and inconsistent outcome data; (d) weakened internal validity due to attrition; (e) constrained external validity due to who competes for oversubscribed programs; and (f) difficulties answering site-level questions with child-level randomization. The authors offer potential solutions to these six challenges and concrete recommendations for the design of future lottery-based early education studies.
Decades of research show that preschool helps prepare children for kindergarten, and in some contexts, improves participants’ outcomes into adulthood (Phillips et al., 2017; Yoshikawa et al., 2013). But much of this evidence comes from older, small programs, which differ substantially from modern preschools in their curriculum, funding, diversity of children served, and alternative options. More evidence on today’s large-scale public programs is needed for guiding policy and practice (Phillips et al., 2017). This is especially the case in the wake of a pandemic that has been particularly devastating for the early care and education sector (Weiland et al., 2021) and in light of policy proposals to expand public preschool programs to all U.S. 3- and 4-year-olds (White House, 2021).
Lottery-based school assignment systems used in many cities across the United States have potential for helping generate this needed evidence. In these systems, when programs are oversubscribed, a random process is used to choose among the applicants. Sometimes, these lotteries are generated via separate applications to individual schools and other times, by centralized school choice systems across an entire district. In both cases, this creates a natural experiment in which some children are granted access to particular schools or programs and others are not.
In the elementary and secondary school contexts, researchers have leveraged this random assignment to estimate the causal impacts of charter schools (Abdulkadiroğlu et al., 2011; Dynarski et al., 2019; Unterman, 2017; Unterman et al., 2016) and small schools of choice (Bloom & Unterman, 2014). This design has been leveraged in preschool in only two peer-reviewed studies to date (Gray-Lobe, Pathak, & Walters, 2023; Weiland et al., 2020) to examine the longitudinal impacts of enrolling in public preschool (vs. not), though at least five teams total (represented on our authorship team) are now using this methodological approach to investigate policy and practice questions in large-scale systems.
In this article, we bring together lessons and examples drawn from a collaborative network comprising these five teams and a group of methods experts. Our network began organically, with researchers considering a lottery-based design connecting with those who were already in the process of doing so. Eventually, a conference grant from the Spencer Foundation provided us with resources to more formally engage with one another. Together, we illustrate how, when moving the lottery design into a new context, there are shared challenges that need to be carefully considered. The six challenges we cover are: (a) available baseline covariates that may not be very rich; (b) limited data on the counterfactual; (c) limited and inconsistent outcome data; (d) weakened internal validity due to attrition; (e) constrained external validity due to who competes for oversubscribed programs; and (f) difficulties answering site-level questions with child-level randomization.
As we illustrate, these challenges are not necessarily unique to early education studies but, in many cases, are exacerbated compared with lottery-based studies with older children or with other causal designs with preschool programs. For example, lottery-based studies typically rely on administrative records for characteristics of children to assess whether random assignment was successful, whether balance was maintained despite any attrition, and to boost power by explaining residual variance in the outcome. However, the most useful and convincing characteristic, prior test scores, is not available for lottery-based early education studies, while lottery studies after third grade typically can access multiple years of such data.
Following the example of pedagogical guides that have helped improve applied randomized trial and regression discontinuity studies in education (Calonico et al., 2017; Duflo, Glennerster, & Kremer, 2007; Imbens & Lemieux, 2008; Lipsey et al., 2015; Murnane & Willett, 2010), our primary goal is to improve the application of the lottery-based design in current and future early education studies. We also hope our network serves as one model of how to do so: via researchers working on different studies in the same content area and/or using the same methodological approach collaborating to address shared challenges. Our secondary goal is also to serve as a case study more broadly of how context can affect study design in applied education work. In our view, this study design has the potential to provide much-needed evidence on many critical early education questions. But without careful attention to its particularities, we fear it instead could be a source of randomization in search of a question. In other words, the tail could wag the dog and the opportunity to systematically tackle the most pressing questions in the field will not be realized.
In the sections that follow, we first explain the design and describe potential opportunities for building new evidence on early education programs using lottery-based designs. Then, drawing from the experiences of our five teams, we detail and provide examples of six challenges and possible solutions in early education lottery studies that are critical to consider a priori. We conclude with recommendations for the design of future lottery-based early education studies.
Basic Features of Lottery-Based Studies
Lottery-based studies of education programs are possible because of the school choice systems now in place in many U.S. localities. The design of these systems, including how choice programs are advertised and explained to parents, varies from place to place. For example, in some school settings, families submit individual applications to individual schools that then conduct their own lotteries, when there are more applicants than seats. In these studies, students who won the lottery formed the treatment group and those who lost, the control group (e.g., Abdulkadiroğlu et al., 2011; Dynarski et al., 2019; Unterman, 2017; Unterman et al., 2016). Standard methods in randomized trials (e.g., Angrist & Pischke, 2008; Bloom, 2005; Murnane & Willett, 2010) were then used to estimate both the impacts of treatment assignment and, under assumptions, of enrollment.
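As a minimal sketch of the two estimands mentioned above, the following Python example computes the impact of treatment assignment (intent to treat) and, via the standard Wald ratio, the impact of enrollment for compliers. It assumes a single lottery block, a hypothetical record layout of (won lottery, enrolled, outcome), and made-up toy values; applied studies would typically also account for lottery blocks and covariates.

```python
from statistics import mean

def itt_and_late(records):
    """records: list of (won_lottery, enrolled, outcome) tuples (hypothetical layout)."""
    winners = [r for r in records if r[0] == 1]
    losers = [r for r in records if r[0] == 0]
    # Intent-to-treat effect: difference in mean outcomes by lottery result.
    itt = mean(r[2] for r in winners) - mean(r[2] for r in losers)
    # First stage: how much winning the lottery raises the enrollment rate.
    first_stage = mean(r[1] for r in winners) - mean(r[1] for r in losers)
    # Wald estimator of the local average treatment effect (effect on compliers).
    late = itt / first_stage
    return itt, first_stage, late

# Toy data, invented for illustration only.
toy = [
    (1, 1, 12.0), (1, 1, 10.0), (1, 0, 6.0), (1, 1, 12.0),
    (0, 0, 6.0), (0, 0, 8.0), (0, 1, 10.0), (0, 0, 4.0),
]
itt, first_stage, late = itt_and_late(toy)
print(itt, first_stage, late)
```

The ratio is only interpretable as an enrollment effect under the usual instrumental variables assumptions (for example, that winning the lottery affects outcomes only through enrollment).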
Other studies have leveraged school choice systems that are based on the deferred acceptance (DA) algorithm (Abdulkadiroğlu, 2011; Roth, 2008). Although the specific assignment rules vary from setting to setting, this approach allows applicants to centralized systems, such as a large school district, to reveal their true preference order and reduce gaming behaviors, such as ranking a less desired and less popular school first to improve chances of a match. In these systems, which are in place in many large U.S. school districts, parents rank schools within a given set of choices, and slots are assigned on the basis of their preferences as much as possible. Schools can rank applicants according to particular criteria as well. For example, they can give higher preference, and thus a greater likelihood of a match, to students with siblings in the school already and/or students who live in a particular geographic area. Each family is assigned a random number (unknown to them) at the beginning of the process. As with individual lotteries, when programs are oversubscribed, the random number is used as a tie breaker (or coin flip) between children with the same priority and preference for the school.
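To make the mechanics concrete, here is a minimal sketch of a student-proposing deferred acceptance routine in which schools order applicants by priority group and break ties with a random lottery number. The function name, data layout, and the convention that lower numbers are better are all assumptions made for illustration; actual district systems differ in their rules.

```python
def deferred_acceptance(prefs, priority, lottery, capacity):
    """
    prefs:    {student: [school, ...]} ranked choices (every school must be in capacity)
    priority: {(student, school): int}, lower = higher priority; missing = no priority
    lottery:  {student: int}, assumed here that a lower number is a better draw
    capacity: {school: int} open seats
    """
    def rank_key(student, school):
        # Priority group first, random lottery number as the tie breaker.
        return (priority.get((student, school), 99), lottery[student])

    next_choice = {s: 0 for s in prefs}
    held = {school: [] for school in capacity}
    unassigned = list(prefs)
    while unassigned:
        student = unassigned.pop()
        if next_choice[student] >= len(prefs[student]):
            continue  # ranked list exhausted; student stays unmatched
        school = prefs[student][next_choice[student]]
        next_choice[student] += 1
        # School tentatively holds the proposal, then rejects its lowest-ranked
        # applicant if it is over capacity.
        held[school].append(student)
        held[school].sort(key=lambda s: rank_key(s, school))
        if len(held[school]) > capacity[school]:
            unassigned.append(held[school].pop())
    return {s: school for school, students in held.items() for s in students}

# Hypothetical applicants: B has sibling priority at North.
prefs = {"A": ["North", "West"], "B": ["North", "West"], "C": ["North"], "D": ["West", "North"]}
match = deferred_acceptance(
    prefs,
    priority={("B", "North"): 0},
    lottery={"A": 3, "B": 40, "C": 1, "D": 2},
    capacity={"North": 1, "West": 2},
)
print(match)
```

In this toy run, B takes the single North seat despite the worst lottery number because priority is considered before the random number, while C, who ranked only North, ends up unmatched.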
There are multiple analytic approaches to estimating impacts of a given education program leveraging the lotteries created by the DA algorithm. One is to leverage only students’ first-choice lottery (Lincove, Valant, & Cowen, 2018; Weiland et al., 2020). Another is using the first lottery in which a student competes regardless of choice order (e.g., if the student was shut out of their first choice entirely but then competed in a lottery for their second choice; Bloom & Unterman, 2014). More recently, scholars have developed DA propensity score or assignment score approaches with the goal of including more students in the sample, increasing the statistical power and enhancing the generalizability of the impact estimates (Abdulkadiroğlu, Angrist, Narita, & Pathak, 2017). Empirical work comparing the first choice and first lottery approach in New York City’s Small Schools of Choice (Bloom & Unterman, 2014) and the Boston Prekindergarten program (Weiland et al., 2020) found no meaningful differences in treatment impacts between the first lottery and first choice analytic approaches. A similar rigorous analysis comparing the estimates from these two approaches with the newer assignment score approach across a diverse set of sites would greatly add to the field’s understanding of the trade-offs of these approaches. We return to this in our recommendations section.
Regardless of their analytic approach, empirical studies show that the lotteries generated by these school choice systems have strong internal validity; that is, they result in treatment and control groups that are essentially randomized in a coin-flip-like procedure and that are equal in expectation before a given intervention begins (Murnane & Willett, 2010). However, importantly, not all applicants in these systems are randomized, no matter the analytic approach the research team chooses. Only students who compete for oversubscribed schools are randomized and sometimes, only a minority of students are randomized to a relatively small number of schools. This has implications for external validity, or the generalizability of impacts estimated using this approach, an important issue we return to in our “Challenges and Possible Solutions” section.
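The intuition behind an assignment score can be illustrated by simulation: redraw the lottery many times and record how often a given applicant receives an offer. The sketch below is a deliberately simplified, single-school version with hypothetical inputs; it is not the estimator from the cited work, which accounts for applicants' full ranked lists and all priority rules.

```python
import random

def simulated_offer_rate(seats, higher_priority_applicants, group_size,
                         draws=20000, seed=0):
    """Share of simulated lotteries in which one applicant in the marginal
    priority group gets an offer (single-school simplification)."""
    rng = random.Random(seed)
    # Seats remaining once all higher-priority applicants are admitted.
    seats_left = max(0, seats - higher_priority_applicants)
    offers = 0
    for _ in range(draws):
        order = list(range(group_size))
        rng.shuffle(order)  # a fresh random ordering of the priority group
        # Applicant 0 is the student of interest; she gets an offer if she
        # falls within the remaining seats.
        if order.index(0) < seats_left:
            offers += 1
    return offers / draws

# Hypothetical inputs: 5 seats, 2 higher-priority applicants, 6 in her group.
rate = simulated_offer_rate(seats=5, higher_priority_applicants=2, group_size=6)
print(round(rate, 2))
```

With these inputs the simulated rate should sit close to the analytic value of 3/6, since three seats remain for six equally ranked applicants. Applicants with similar scores could then be grouped for analysis.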
Because these studies are just beginning in early education, we have so far only limited answers to important questions like which types of preschools are oversubscribed, who is ultimately randomized within these systems, how children who are randomized differ from those who are not, and how this may vary over time. These have been questions within their own right to date in some research studies (e.g., Balu, Condliffe, & Hennessy, 2021; Braga et al., 2023; Greenberg et al., 2020; Weiland et al., 2020). We return to them in our key recommendations around how to lay the groundwork for a strong lottery-based study in early education and as part of the field’s broader research agenda.
DA Lottery Assignment Example
To build intuition, in Figure 1, we provide a concrete idea of what the matching process looks like for a hypothetical 4-year-old preschool applicant in a DA choice system, following the DC Public Schools explainer for parents (My School DC, 2019). In our example, not all preschool applicants are assigned a seat (i.e., the treatment-control contrast is between the preschool program and all local alternatives to it) and the researcher wants to identify the effect of winning a seat in the program versus being lotteried out. The numbers on children’s shirts are their random lottery numbers. As shown in Figure 1, a child’s family has ranked three schools they would like her to attend: North, West, and East elementary schools. She is the only child with glasses in Figure 1, with random number 16. For simplicity, we will refer to her as Student 16. Her first choice, North, gives priority to students with siblings. Her second and third choices give priority to both students with siblings and children with a geographic area preference that we refer to as in-boundary status. These priorities are hierarchical (e.g., the system assigns those with sibling and in-boundary status first, then siblings, and then children with in-boundary status). Student 16 has no priority at North, sibling preference at West, and in-boundary preference at East.

Figure 1. School choice process for a hypothetical preschool applicant in a deferred acceptance choice system.
Every student in the system is assigned a random number (unknown to them) as the first step in the assignment process. Student 16’s position in line reflects the combination of her priority at each school and her random number within her priority group. Her ranked schools can each admit 20 students total, and by the time her application is considered, they have different numbers of seats still unfilled (assume, as in D.C., that the seats were filled by students who attended the school’s 3-year-old program in the prior year and are “moving up” to the 4-year-old program). Student 16 is unmatched to her first choice (North) because she is 9th in line and only 2 seats are open. Her second choice (West) has 5 seats open and she is 6th in line, so she is again unmatched. Her third choice (East) has 5 seats open; she is 3rd in line and matches here.
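Student 16's outcome can be traced with a few lines of Python. This is only a sketch of the final comparison at each school, using the positions in line and open seats given above; it takes each position as given rather than deriving it from the full assignment algorithm, and the function name is hypothetical.

```python
def first_match(ranked_schools):
    """ranked_schools: list of (name, place_in_line, open_seats) in preference order.
    Returns the first school where the place in line fits within the open seats."""
    for name, place_in_line, open_seats in ranked_schools:
        if place_in_line <= open_seats:
            return name
    return None  # unmatched at every ranked school

# Values taken from the Student 16 example.
student_16 = [("North", 9, 2), ("West", 6, 5), ("East", 3, 5)]
match = first_match(student_16)
print(match)
```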
In a first-choice lottery analytic approach, Student 16 would not be part of the lottery sample; her first choice (North) was filled before her priority group was considered. In a first-lottery analytic approach, Student 16 is in the control group for the intent to treat estimates of the effects of being randomly assigned to the preschool program at West (i.e., because of competing in a siblings lottery at West and not matching there) and a crossover or always-taker in a local average treatment effect analysis of the effects of enrolling in the preschool program (i.e., because of her match at East, assuming she enrolls there). In an assignment score analytic approach, Student 16 is considered a member of the treatment group, with a probability of treatment assignment that falls between 0 and 1, as she faced risk for not being assigned to the program. In this regression-based analysis, she will be analyzed within a small set of students that had a similar probability of assignment (referred to here as her random assignment block).
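The difference between the first-choice and first-lottery samples can be sketched as a classification rule: an applicant is randomized at a school only if her priority group is the one in which seats run out. The group sizes below are hypothetical values chosen to be consistent with Student 16's stated places in line (9th, 6th, and 3rd); they are not taken from Figure 1.

```python
def lottery_status(seats, higher_priority_applicants, group_size, place_in_group):
    """Classify an applicant at one school as 'not_randomized', 'won', or 'lost'."""
    seats_left = seats - higher_priority_applicants
    # If the group is entirely shut out or entirely admitted, the random
    # number plays no role and there is no lottery for this applicant.
    if seats_left <= 0 or seats_left >= group_size:
        return "not_randomized"
    return "won" if place_in_group <= seats_left else "lost"

# (school, status) in Student 16's preference order; counts are hypothetical.
choices = [
    ("North", lottery_status(seats=2, higher_priority_applicants=4, group_size=10, place_in_group=5)),
    ("West", lottery_status(seats=5, higher_priority_applicants=2, group_size=6, place_in_group=4)),
    ("East", lottery_status(seats=5, higher_priority_applicants=1, group_size=8, place_in_group=2)),
]

# First-choice approach: include her only if she was randomized at her top choice.
in_first_choice_sample = choices[0][1] != "not_randomized"
# First-lottery approach: use the first school at which she was randomized.
first_lottery = next(((s, st) for s, st in choices if st != "not_randomized"), None)
print(in_first_choice_sample, first_lottery)
```

Under these assumed counts, she is excluded from the first-choice sample and enters the first-lottery sample as a lottery loser at West, matching the description above.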
Data in Lottery-Based Studies
Another important feature of lottery-based studies is the data used, beyond students’ choice data. To our knowledge, all lottery-based studies conducted to date have relied solely on administrative data, or data collected as part of the typical operation of a given district or school system. Student characteristics such as race/ethnicity, gender, and dual-language learner status, for example, are commonly tracked in educational administrative data. Other commonly available fields in administrative data include students’ past and future test scores, attendance, disciplinary records, special education status, and grade retention (i.e., potential outcomes in a lottery-based study). To date, researchers in lottery-based evaluations have not engaged in primary data collection such as surveys or classroom observations. However, because of more limited administrative data available for early education studies, several of our five teams are now attempting to collect such data, as we detail later in this article.
Advantages of Lottery-Based Studies for Answering Pressing Questions in Early Education
Thousands of families now apply to public preschool programs that use lottery-based assignment systems, presenting opportunities to address new research needs with no or limited disruption to a locality’s standard operations. Lottery-based studies too have potential strengths over alternatives. First, lottery studies offer the opportunity to study policy initiatives in real time and in their natural form. It can take years otherwise to rally support for and design an experimental test that can identify the causal effects of an intervention or policy. When random assignment changes natural operations as well, there is a possibility too that any detected effects are lottery induced (i.e., John Henry and Hawthorne effects; Murnane & Willett, 2010), a threat ruled out (or at least reduced) in naturally occurring lotteries.
In addition, in randomized trials, many families are reluctant to consent to participate in studies or simply forget to return consent forms. This threatens external validity, as consenting families may not be representative of the population of interest. Notably, working with the constraints of their context and design, the consent rate in one of the directly assessed cohorts of the randomized trial of the Tennessee Voluntary Pre-K study was only 24% (Lipsey, Farran, & Durkin, 2018). And even in randomized trials with relatively high rates of parental consent, for example, 80% or higher, researchers may still find some differences in the characteristics of students who consent and those in the broader population. There can also be biasing attrition among those who consent, a threat to internal validity.
The potentially large numbers of students randomly assigned in naturally occurring lotteries each year also may permit more precise estimation of effects for important subgroups, particularly if leveraged across multiple cohort years. For example, there is evidence that dual-language learners benefit more from public preschool programs than their monolingual peers (Phillips et al., 2017), but randomized trials and birthday-cutoff-based regression discontinuity studies (Gormley, Phillips, & Gayer, 2008; Wong et al., 2008) often lack the statistical power to examine effects separately by specific home language. If enough members of subgroups compete in naturally occurring lotteries, lottery-based studies may permit more specificity in estimating effects for different language subgroups which could be very informative, given stark differences in language structures, immigration histories, and home cultures of multi-lingual learner groups in the U.S.
Another feature of the lottery-based design that may be beneficial in building the next generation of evidence is that random assignment occurs within lottery blocks (i.e., within smaller sets of applicants to particular schools). For example, returning to Figure 1, Student 16 did not compete against every student for West Elementary (her second choice and her first lottery); she competed only against those with the same priority at the school (i.e., the students shown with orange shirts). Her block in a first-choice and first-lottery analytic approach then is the West Elementary and sibling preference combination. Essentially, each of these blocks represents a “mini-experiment” within the full applicant sample. Recent advances in evaluation methods have highlighted how blocked random assignment can be used to move beyond average impacts to examine how effects vary across schools and the factors that predict this variation (Bloom et al., 2017). For example, in a Boston Prekindergarten lottery-based study, effects on all third grade outcomes varied substantially across blocks and the best school-level predictor of this variation was school standardized test scores (Unterman & Weiland, in press). Preschool lottery contexts are very promising for additional such evidence. Blocked random assignment otherwise can be quite difficult to implement and is often underpowered for impact variation analyses in the early education context because of factors such as small numbers of classrooms in centers compared with K–12 settings (Sabol et al., 2022).
There may too be a parallel need for new research designs in the changing policy context. For example, multiple states and cities have moved in recent years to fund their own universal programs. These changes mean that researchers will no longer be able to rely on the kinds of scarcity and oversubscription that have permitted past studies of the causal effects of a given public preschool program versus alternatives, as all children in those systems now will be offered a seat (e.g., Lipsey et al., 2018; Puma et al., 2012). The changing policy context also raises new policy questions and thus introduces a need for a new generation of early education evidence. For example, some localities have introduced public preschool programs in part to attract and retain students in a given system (a new outcome in the literature), and two lottery-based early education studies indeed have demonstrated large positive effects on this outcome (Monarrez et al., 2020; Weiland et al., 2020). With the vast majority of 3- to 5-year-olds already in out-of-home care of some kind, some scholars have argued too for a pivot away from preschool versus none questions to a focus on how to build high-quality programs at scale, such as through comparing types of early childhood education (ECE) or features of ECE (Bassok & Engel, 2019; Weiland, 2018). Lottery-based methods may provide opportunities to meet the new moment and new needs in the field.
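The "mini-experiment" logic can be sketched as follows: estimate a winner-minus-loser difference within each lottery block, then combine the block estimates. The weights used here, n × p × (1 − p) with p the share of winners in the block, are the weights implied by a regression with block fixed effects; the block labels and outcome values are hypothetical, and this is not the specific cross-site variation method cited above.

```python
from collections import defaultdict
from statistics import mean

def block_impacts(rows):
    """rows: (block_id, won, outcome). Returns per-block impact and pooling weight."""
    by_block = defaultdict(lambda: {0: [], 1: []})
    for block, won, y in rows:
        by_block[block][won].append(y)
    results = {}
    for block, arms in by_block.items():
        if not arms[0] or not arms[1]:
            continue  # no winner/loser contrast in this block
        n = len(arms[0]) + len(arms[1])
        p = len(arms[1]) / n  # share of the block that won the lottery
        results[block] = {
            "impact": mean(arms[1]) - mean(arms[0]),
            "weight": n * p * (1 - p),
        }
    return results

def pooled_impact(results):
    """Weighted average of the block-level impacts."""
    total = sum(r["weight"] for r in results.values())
    return sum(r["impact"] * r["weight"] for r in results.values()) / total

# Toy data for two hypothetical blocks (school and priority-group combinations).
rows = [
    ("West-sibling", 1, 10.0), ("West-sibling", 1, 12.0),
    ("West-sibling", 0, 8.0), ("West-sibling", 0, 6.0),
    ("East-inboundary", 1, 9.0), ("East-inboundary", 0, 8.0),
    ("East-inboundary", 0, 7.0), ("East-inboundary", 0, 9.0),
]
results = block_impacts(rows)
pooled = pooled_impact(results)
print(results, pooled)
```

Inspecting the block-level impacts alongside the pooled estimate is what opens the door to asking how effects vary across schools, although real blocks are often too small for their individual estimates to be reliable on their own.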
Five Current Lottery-Based Early Education Studies
Before turning to the challenges that the lottery-based design presents in the early education context, we briefly describe the aforementioned five ongoing lottery-based early education studies represented among our authorship team, which provide the basis for our understanding of and sensitivity to these challenges. We also summarize key information about these five studies in Table 1. Together, the site-based teams and methods experts form a collaborative network aiming to identify best practices for design and analysis, common challenges, and potential solutions for this future preschool research. As we describe below, each of the five teams too is addressing pressing questions in the field and breaking new ground as part of the next generation of evidence on the impacts of public preschool programs.
Table 1. Summary and Key Features of the Five Lottery-Based Preschool Studies
Note: BPS = Boston Public Schools; CLASS = Classroom Assessment Scoring System; DCPS = DC Public Schools; ITT = intent to treat; LATE = local average treatment effect; PK = prekindergarten; PL = professional learning; TBD = to be determined.
Estimated (to be determined).
Boston Instructional Alignment Study
Curriculum alignment has emerged as a leading hypothesis about how best to build on children’s preschool gains, so that preschool attenders do not merely repeat in kindergarten the content they have already learned and therefore lose the opportunity to build on their preschool skills (Harding, McCoy, & McCormick, 2020; Stein & Coburn, 2023). However, limited rigorous empirical work has examined the effects of alignment. Only two studies have done so using study designs that could identify causality, both focused on math curriculum alignment and both finding positive effects (Clements et al., 2013; Mattera et al., 2021).
Using naturally occurring lotteries from Boston’s application of the DA algorithm and in partnership with the Boston Public Schools Department of Early Childhood, the study team, comprising researchers at MDRC and the University of Michigan, is examining the impact of Boston’s rollout of an aligned prekindergarten and kindergarten curriculum and professional development approach on children’s language, literacy, and math skills in third grade (McCormick et al., 2022). The study breaks new ground in the field as the first-ever test of a district-created aligned curriculum across multiple learning domains and of a district rollout approach in the early years. In addition, the study will examine a set of exploratory research questions that estimate impacts on school persistence, attendance, receipt of special education services, and grade retention, as well as whether effects vary by student subgroup characteristics. The study team is leveraging administrative data on three cohorts of students who applied to the program in 2012–2013, 2013–2014, and 2014–2015 to estimate impacts, for a lottery sample total of 2,656 students (out of 10,318 applicants [26%]). A complier average causal effect analysis will estimate the effect of a student winning their first lottery and enrolling in the aligned school, compared with students who lost their first lottery and did not enroll in an aligned school.
DC Public Prekindergarten: Impacts on 3-Year-Olds
Policy proposals under both the Obama and Biden administrations aimed to expand public preschool to all 3- and 4-year-olds in the country (White House, 2013, 2021). Although there is ample evidence that such programs improve the school readiness of 4-year-olds (Phillips et al., 2017), there is very little such evidence for 3-year-olds, particularly using experimental methods in large samples (Head Start and Early Head Start are the exception; Love et al., 2005; Puma et al., 2012). This is due to the very practical reason that only one U.S. locality, Washington, D.C., offers public preschool to all 3-year-olds in the district.
Since 2019, a team of researchers from the Urban Institute has been studying D.C.’s program with support from the D.C. Office of the State Superintendent of Education. Their work spans both retrospective impact analysis of recent cohorts of both 3- and 4-year-olds (Braga et al., 2024), as well as a prospective study of the impacts on 3-year-olds (in collaboration with researchers at the University of Michigan and School Readiness Consulting). As summarized in Table 1, the randomized subsample size for the retrospective study is approximately 5,600 students (about 22% of applicants to the 3-year-old program), while for the prospective study of the 2025 and 2026 cohorts, the target sample size is 2,500. Outcomes drawn from administrative data are similar to those in the Boston study (with the additions of school and residential mobility outcomes). Prospective study outcomes include directly assessed measures of children’s language, literacy, math, executive function, social-emotional skills, and racial attitudes at the end of their 3-year-old and 4-year-old years, with plans to follow children beyond these years in future work. The team is using the assignment score analysis strategy described earlier (Abdulkadiroğlu et al., 2017; Monarrez et al., 2020) to estimate the impacts of enrolling in the program versus being randomized out and experiencing a different care setting in the 3-year-old year.
Montessori
There are currently more than 3,000 Montessori schools in the United States, 560 of which are public schools and more than 150 of which serve public preschool and kindergarten students (National Center for Montessori in the Public Sector, n.d.). Despite the model’s popularity and growing prevalence in public schools, no large-scale evaluation of the efficacy of the Montessori model on children’s academic, social, and emotional skills has been conducted until now. This evidence is critical for addressing the question about what kind of public preschool can produce which learning gains and for which students.
A team of researchers at the American Institutes for Research are collaborating to conduct the study. Drawing on a sample of 22 public Montessori schools around the United States that use lotteries to admit 3-year-old students, the team aims to estimate the impacts of the Montessori model through the end of kindergarten. They also plan to explore heterogeneity by student subgroup, incorporation of Montessori principles (i.e., fidelity), and the counterfactual. Outcome measures will be directly assessed by trained study staff members; these will include widely used measures of language, literacy, math, and executive function, as well as more novel measures that tap constructs that align directly with the Montessori theory of change around building persistence and problem solving skills. Unique among the five teams, their lotteries are drawn from both applications to individual oversubscribed schools and to schools participating in centralized choice systems using the DA algorithm.
New Orleans Study of Prekindergarten Quality
High-quality prekindergarten programs can lead to substantial short-term academic and cognitive gains for children (e.g., Gormley et al., 2005; Wong et al., 2008). However, how best to define and measure quality in ECE remains an open question. Prior research on “high-quality” programs has used a variety of definitions, including both structural and process features of care (Yoshikawa et al., 2013). Notably, however, government and research-based definitions may not match parents’ definitions of quality. Differences could arise if parents’ quality criteria differ (e.g., if parents incorporate elementary school considerations when choosing school-based prekindergarten programs) or if parents judge programs differently using similar criteria (e.g., parents have different ways of assessing teacher quality).
With this proposed study, a team of researchers across Tulane University, the Brookings Institution, and the University of Maryland will compare government-defined quality and parent-defined quality. The former draws on scores obtained through systematic classroom observations using the Classroom Assessment Scoring System measure (Pianta & Hamre, 2009); the latter draws on parents’ ranked requests. Using New Orleans’s centralized school assignment lottery, the team will examine how children and families’ short-term academic, cognitive, and socioemotional outcomes are affected by winning a seat in (a) their top-choice prekindergarten programs or (b) prekindergarten programs rated highly by the state government. The team will use data from seven cohorts of applicants (2017–2018 through 2023–2024), with an estimated lotteried sample size of 4,500 (of roughly 15,000 total applicants [30%]) to calculate treatment effects. Exploratory analyses will examine the role of elementary school and teacher quality in sustaining gains, teachers’ and administrators’ beliefs about the effects of prekindergarten, and effects of offering prekindergarten on school composition and outcomes.
New York City Pre-K for All Professional Learning Study
Over the last 15 years, rigorous studies have shown that some kinds of preschool programs produce larger child learning gains than others (Phillips et al., 2017; Yoshikawa et al., 2013). These studies have pushed beyond the question of whether preschool works (or not) to how to deliver high-quality preschool experiences. For example, models that use play-based curricula with a scope and sequence and that focus on a particular learning domain outperform those that use more general curricula that do not share these features (Clements & Sarama, 2008; Clements et al., 2013; Morris et al., 2014). Such findings have helped fuel policy and practice attention to the specific malleable, active ingredients in large, at-scale programs.
Researchers at New York University partnered with the New York City Department of Education (DOE) to answer a pressing how question in their context: the effects of several distinct professional learning (PL) “series” for prekindergarten teachers on children’s learning. The PL series offered teachers in a given site training on (a) an evidence-based math curriculum (Clements & Sarama, 2008) and research-based interdisciplinary units developed by the New York City DOE; (b) integrating the arts (visual arts, music, dance, theater) into instruction; (c) integrating strategies drawn from an evidence-based program known as ParentCorps (Brotman et al., 2011) for supporting family engagement, child social-emotional development, and trauma-informed care; or (d) topics aligned to the district’s quality standards (i.e., business as usual in this system). The team originally planned to leverage child-level lotteries that occur within New York City’s DA algorithm for preschool seats. They intended to identify lottery “winners” and “losers” for the three contrasts of interest (i.e., each PL series vs. business as usual) and to collect data via direct assessments in preschool and kindergarten on approximately 800 lottery children per contrast (2,400 children total across the three contrasts). However, they found a number of methodological challenges with the child-level lottery design. Most important for the purposes of this article, they found that site characteristics (such as quality and site type) were correlated with PL series, in part because site assignment to PL is not entirely random (e.g., sites’ PL preferences are taken into account when the New York City DOE assigns them to a PL series). As a result, the child-level lotteries would not have allowed the team to isolate the effect of PL series from other site characteristics.
Subsequently, the study team learned that some of the DOE’s PL tracks had limited capacity, and it was highly likely that the number of sites interested in participating in those PL tracks would exceed capacity. They then worked with the DOE to develop a process in which a subset of sites were randomized to their first choice PL track or to a more general business-as-usual PL. They were able to study the impact of PL track using a cluster randomized design, with sites randomized into different PL series, which was a substantially stronger design for testing their research question about PL specifically.
Challenges and Possible Solutions
As these five studies exemplify, there is considerable opportunity to leverage naturally occurring lotteries to answer pressing questions facing at-scale early education programs. However, there is a set of challenges that must be handled carefully in the design and analysis of these studies for their full potential to be realized. We discuss each challenge and possible solutions below, drawing on examples from the ongoing early education lottery studies described in the previous section.
Challenge #1: Limited Child-Level Covariates
Problem
Information on study participants’ baseline characteristics is an essential part of studies that aim to identify causal impacts. In randomized designs, baseline covariates are used to assess internal validity, meaning whether the treatment and control groups are equivalent at baseline and for confirmatory outcomes at follow-up (i.e., whether random assignment worked and whether differential attrition may have biased treatment effect estimates; Murnane & Willett, 2010). Covariates also can increase statistical power by explaining some of the residual variance in the relevant outcomes. This can result in cost and time savings by reducing required sample sizes, as well as in more precise treatment effect estimates. Covariates also may be used to examine the heterogeneity of treatment effects (Bloom & Michalopoulos, 2013). For example, children from families with low incomes, dual-language learners, and Latino children in particular tend to benefit more from public preschool than their peers (Phillips et al., 2017). Baseline measures of such key dimensions allow researchers to examine whether the effects of a given early childhood intervention similarly vary. Finally, covariates are critical for examining external validity, or to whom impact estimates apply, a topic we cover in more detail under “Challenge #5: External Validity.”
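To make the statistical power point concrete, the following minimal sketch uses a widely used approximation for individually randomized designs, in which the minimum detectable effect size (MDES) shrinks with the square root of 1 − R², where R² is the share of outcome variance explained by baseline covariates. The function names, the multiplier of 2.8 (roughly 80% power at a two-tailed alpha of .05), and all example values are illustrative assumptions, not figures from the five studies. The sketch also ignores lottery blocks, clustering, and noncompliance.

```python
import math


def mdes(n_total, r2=0.0, share_treated=0.5, multiplier=2.8):
    """Approximate MDES (in SD units) for a simple individually randomized design.

    r2 is the assumed share of outcome variance explained by baseline covariates.
    multiplier=2.8 is an assumed value (about 80% power, two-tailed alpha = .05).
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    allocation = share_treated * (1.0 - share_treated)
    return multiplier * math.sqrt((1.0 - r2) / (allocation * n_total))


def n_required(target_mdes, r2=0.0, share_treated=0.5, multiplier=2.8):
    """Approximate total sample size needed to detect target_mdes."""
    allocation = share_treated * (1.0 - share_treated)
    return math.ceil(multiplier ** 2 * (1.0 - r2) / (allocation * target_mdes ** 2))


if __name__ == "__main__":
    # Hypothetical comparison: no covariates vs. a pretest explaining half the variance.
    for r2 in (0.0, 0.5):
        print(r2, round(mdes(800, r2=r2), 3), n_required(0.20, r2=r2))
```

Under these assumptions, a covariate set that explains half of the outcome variance would roughly halve the sample needed to detect a given effect, which is why the absence of pretests in these systems is costly.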
Covariate information for lottery-based early education studies tends to be sparse. To illustrate this point, we display available covariate information for the five lottery-based studies in Table 2. As shown, arguably the richest and most useful covariates, students’ prior test scores, are not collected at the time of preschool application in these (or to our knowledge, any) early childhood systems that use a lottery-based choice process. In some contexts, data are especially sparse because of efforts to reduce administrative burdens of application and to improve the equity of take up. For example, in D.C., only students’ ages, addresses, and languages of application are available for all applicants. In New York City, detailed demographic and screening covariate information is available only for preschool applicants who subsequently enroll in preschool.
Data Sources, Covariates, and Counterfactual Data Across the Five Lottery-Based Preschool Studies
Note: ECE = early childhood education; EF = executive function; IEP = individualized education program; TBD = to be determined; SNAP = Supplemental Nutrition Assistance Program.
In contrast, lottery-based K–12 education studies tend to have much richer data available. For example, studies of New York City’s Small Schools of Choice program had 9 years of administrative data on applicants, covering basic student demographic characteristics, such as age, race, ethnicity, free- or reduced-price lunch status, English language learner status, and special education status, and scores from students’ prior New York State standardized tests, such as seventh- and eighth-grade English language arts and mathematics (Bloom & Unterman, 2014). These data permitted that study team to illustrate empirically that random assignment “worked,” providing two equivalent treatment and control group samples at baseline, as well as to examine whether balance was maintained throughout the follow-up period. In addition, that study team used these baseline data to compare students in the lottery sample with other students attending New York City Small Schools of Choice, as well as other high school students across the New York City School District. Furthermore, these data have enabled policy-relevant student subgroup analyses of variation in impacts, exploring, for example, whether Small Schools of Choice impacts differed between students who entered high school performing below grade level in mathematics and English language arts and students who had previously performed at higher levels. Finally, these rich covariate data, especially highly predictive prior test scores, enabled the study team to conduct a rigorous propensity score matching analysis and estimate the effects of Small School enrollment for all students attending Small Schools of Choice, not just those who were in a Small Schools of Choice lottery, thereby helping broaden the population of children who were studied.
Possible Solutions
Avenues for addressing this issue include building collection of richer data into the preschool application process, adding baseline parent surveys, and adding pretests. Taking each in turn, research teams could work with a locality to add questions to application intake forms. For example, a locality could ask parents to report on maternal education or family income when applying to its prekindergarten program (as in New Orleans). Of course, any additions must be balanced against administrative burden for participants and equity issues. Research has already shown that some of the groups most likely to benefit from public preschool programs are the least likely to apply (Shapiro et al., 2019) and that administrative burden is a barrier for some families interested in public preschool programs (Weixler et al., 2020). If data collection additions hurt application rates, from a study design perspective, the additional data gained may not offset potential losses in statistical power (e.g., fewer students randomized and fewer lottery blocks) or in generalizability. New Orleans has tried to strike this balance by e-mailing parents an optional survey after they submit their school choices as an additional data collection mechanism.
Collectively, we have found that public systems generally have not been able or willing to make changes to their application processes because of costs, logistics, privacy, and potential equity issues. Accordingly, some research teams have turned to baseline surveys for a subset of applicants to gather such data (see Table 2). Baseline surveys add cost to studies and are difficult to administer to all applicants. Parent surveys, when not required for school entrance, also typically have lower response rates and can be biased toward groups more likely to complete them. In addition, teams may have to wait until postrandomization to collect such data, which is not ideal because randomization can influence families’ responses and willingness to participate (Murnane & Willett, 2010). For example, the D.C. team is planning to administer a family survey to gather richer data. These surveys will be collected postrandomization because the team will need to know which families were randomized, given the intricacies of the city’s assignment process writ large. Because of costs, they will also administer the survey only to a subsample; thus they need to know who was actually randomized to draw that subsample. One possible solution is that if localities help parents complete applications in centralized locations, as Boston, D.C., and New Orleans do, this process might feasibly be leveraged in future studies to consent parents and facilitate survey completion among all families or among a sample representative of the full range of program applicants.
The lack of child-level pretest data in these systems is important because child-level pretest data tend to explain more of the variation in child-level outcomes than other covariates, providing more of a statistical power boost. 5 Pretest data also can provide more convincing evidence of baseline balance by treatment status and be used to create subgroups to test whether, as in prior literature, young children with lower pretest scores show larger gains than their peers in public preschool studies (Bitler, Hoynes, & Domina, 2014; Bloom & Weiland, 2015). Currently, three research teams are planning to collect these data prospectively in their lottery-based studies, using external trained data collectors (see Table 3). One team that is not (Boston) attempted prospective data collection in a lottery-based study before the pandemic. However, they faced power limitations due to large numbers of control crossovers. These limitations were compounded by small lottery blocks (i.e., the smaller set of students within which a given student was randomized): nonconsenting students resulted in incomplete blocks that could not contribute to estimates of treatment impacts. By contrast, a large-scale study of Tulsa’s pre-K program was able to enlist teachers to assess all incoming students just before school began, at teacher meet-and-greet sessions with each individual child (Gormley et al., 2008). In lottery-based studies, it may be possible to similarly enlist school personnel for pretest assessments or to use the state-mandated direct assessments of children’s school readiness in place in some states for this purpose. Because of logistical limitations, pretest assessments in such cases may have to occur after random assignment but before the intervention begins. This is not ideal timing because treatment assignment in theory may influence scores even before the intervention begins (Murnane & Willett, 2010). But such data would still be very valuable for the reasons we have outlined (enhancing internal validity, statistical power, and external validity plus making it possible to study the heterogeneity of impacts).
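The incomplete-block problem noted above can be sketched in a few lines. A lottery block contributes to a within-block impact estimate only if at least one lottery winner and at least one lottery loser in that block have outcome data. The record layout, function name, and example records below are hypothetical and are not drawn from any of the five studies.

```python
from collections import defaultdict


def usable_blocks(records):
    """Return the lottery blocks that retain both arms after nonconsent or attrition.

    records: iterable of (block_id, won_lottery, has_outcome) tuples (assumed layout).
    A block is usable only if it has at least one winner AND one loser with outcome data.
    """
    arms_seen = defaultdict(set)
    for block_id, won_lottery, has_outcome in records:
        if has_outcome:
            arms_seen[block_id].add(bool(won_lottery))
    return {block for block, arms in arms_seen.items() if arms == {True, False}}


if __name__ == "__main__":
    # Hypothetical blocks: A keeps both arms; B loses its only control; C never had a winner.
    example = [
        ("A", True, True), ("A", False, True),
        ("B", True, True), ("B", False, False),
        ("C", False, True),
    ]
    print(sorted(usable_blocks(example)))
```

With small blocks, a single nonconsenting family can remove an entire block from the analytic sample, which is the mechanism behind the power loss described above.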
Outcome Data Across the Five Lottery-Based Preschool Studies
Note: EF = executive function; IEP = individualized education program; PL = professional learning.
Where additional data collection is not possible because of resources or other constraints, researchers may be able to leverage low-cost, publicly available data in some cases. For example, students’ addresses are commonly available in these systems and can be used to link to neighborhood characteristics such as poverty levels and education levels. As we detail later, this is a strategy some of our teams have found helpful when covariates are sparse.
Challenge #2: Limited Data on the Counterfactual
Problem
Multiple evaluation frameworks emphasize the importance of identifying not just whether an intervention “works” but whether it works compared with a well-identified counterfactual condition (Murnane & Willett, 2010; Weiss, Bloom, & Brock, 2014). Past empirical studies of early childhood programs provide rich illustrations of why this is important. For example, using a principal stratification framework, Feller et al. (2016) found that the effects of Head Start depended on what child care was like under the counterfactual condition, with effects concentrated in the subgroup of children who would have stayed home if they were not offered Head Start. In addition, Duncan and Magnuson (2013) demonstrated descriptively that since the early days of public preschool evaluation in the 1960s, immediate posttreatment impacts have declined and that the much greater availability of alternative programs is a prime explanation for why.
Identifying the treatment-control contrast is critical to all education studies. In ECE research, studies generally can identify what treatment group members experienced through available program and study-collected data. But identifying what control group children experienced is often more difficult in ECE than K–12 research. This is due in part to the U.S. policy context. For example, once children turn 5 in the United States, they are eligible for free public education and the vast majority of these children enroll. Consequently, their educational settings are tracked by public data systems. In contrast, ECE is voluntary and supports for ECE data systems are fragmented and uneven across the country (Chaudry et al., 2021). The counterfactual accordingly tends to consist of a wider range of settings than in K–12 studies, with less administrative data to describe the mix of alternatives in a given context.
In some systems, families do in fact provide information on children’s care settings at age 4 when they register for kindergarten (see Table 2). In Boston, these data were useful for understanding the alternative care settings for those children who competed in a lottery, lost the lottery, and ultimately did not enroll in the Boston program (i.e., the control compliers; Weiland et al., 2020). These data also allowed the team to identify the alternative care settings for all applicants who did not enroll, regardless of whether they participated in a lottery for an oversubscribed school. As shown in Figure 2, nearly all of the lottery control compliers in the Boston study enrolled in some out-of-home care, with nearly half in private settings and 88% in another preschool program. Among all applicants, the mix of settings was different, with fewer children in other preschool programs and in different types of programs. These data were essential for interpreting the causal impacts of the Boston program and assessing their generalizability.

Non-Boston Prekindergarten care settings in the year before kindergarten for lottery sample control compliers versus all applicants.
Ideally, to interpret study results, we would have information not just on alternative care setting type but about important features of the child-care setting like its quality, curriculum, and teacher qualifications. But here too, the United States’ decentralized, fragmented early education system means such data are rarely available. This issue is not unique to lottery-based early education studies; other study designs often face this challenge too. But this is another area where lottery-based early education studies are at a disadvantage versus studies of older children, in which many features of K–12 public schools are already centralized and publicly available.
Possible Solutions
Data on counterfactual child care for studies of preschool programs can be gathered similarly to covariates, by building in questions in the registration process for kindergarten (as Boston and New Orleans do) and/or through surveys of families. The Boston example suggests that gathering both the name of the program and its type is beneficial for cleaning and verification purposes, as is the use of prepopulated lists with validated names and types. Where this is not possible, a model is Gray-Lobe et al.’s (2023) approach of triangulating publicly available Head Start data, private school enrollment data from the National Center for Education Statistics Private School Survey, and U.S. Census Bureau and American Community Survey data with school district data to approximate what the counterfactual might have been during their study’s focal years.
Data on the features of alternative care settings could be gathered via surveying these settings once they are reported by parents. If data are gathered early in the year before kindergarten, when children are enrolled in the alternative setting, observational quality assessments might also be possible. In the Head Start Impact Study, for example, the study team collected such data on the settings of control group children not enrolled in Head Start (Puma et al., 2012). These data were used in subsequent analyses to understand the contribution of treatment-control contrast in program quality to impacts on children (Friedman-Krauss, Connors, & Morris, 2016). Such data require substantial additional funding to gather but should be prioritized by funders and researchers in future lottery-based studies of early childhood when possible.
Challenge #3: Limited Outcome Data
Problem
To our knowledge, all published lottery-based studies have leveraged administrative data to obtain outcome measures for their samples of interest. For example, a study of the impacts of Michigan’s largest charter school network used state records of students’ math and reading test scores, grade retention, special education placement, and disciplinary incidents in Grades 3 to 8 (Dynarski et al., 2018). Another charter school study leveraged participants’ voting records to explore the effect of education on civic participation (Cohodes & Feigenbaum, 2021). Other such prominent examples include New York City Small Schools of Choice (Unterman & Haider, 2019), which leverages district administrative records for Grades 9 to 12, National Student Clearinghouse data for postsecondary enrollment records and degree attainment, and New York State unemployment insurance data for employment and earnings outcomes.
However, sometimes, there are gaps in what is available for outcome measures and when it is available for lottery-based studies. For example, the Michigan charter study did not include measures of children’s moral character, a central focus of the charter network (Dynarski et al., 2018). In addition, there were lotteries in that study that began in kindergarten but some outcome measures, such as math and reading test scores, were not available until third grade. Consequently, that research team could not identify whether there were different or cumulative effects across grades for children in the early grades.
These issues of when and what are particularly pronounced in all early education studies that rely on administrative data, not just lottery-based studies. For example, in propensity score–based studies of Tulsa’s pre-K program that rely on state and district records and difference-in-difference studies of state pre-K programs that use the National Assessment of Educational Progress, academic outcomes are not available until third or fourth grade (Fitzpatrick, 2008; Hill, Gormley, & Adelstein, 2015). This timing is problematic given considerable evidence that the largest benefits of a given preschool program occur at the end of the program and may no longer be detectable by the end of kindergarten on widely used measures in the field (Lipsey et al., 2018; Puma et al., 2012). Evidence also shows that whether the preschool boost is sustained can depend on children’s educational experiences in the early elementary years (Johnson & Jackson, 2019; Mattera et al., 2022; Unterman & Weiland, in press). But without data on children’s outcomes before third grade, we cannot distinguish programs with no impact at all from programs with a strong initial impact that faded because of subsequent experiences. The practice and policy implications in the two scenarios are very different, making this limitation a major one for evidence-based improvement efforts.
On the what (or substance) side, the best evaluations are theory based (Murnane & Willett, 2010). In early education, they engage deeply with theoretical frameworks on how early education programs support children and families, in which domains, and through which contextual mediators and moderators. Educational administrative data generally lack measures of possible mediators and moderators, as well as some of the key outcomes of early education programs such as child social-emotional development, behavior, family engagement and maternal employment. Accordingly, studies that rely on administrative data only available in public education systems may miss or underestimate the potential effects of these programs.
Possible Solutions
Recognizing the limitations of the timing and content (i.e., when and what) of outcomes available in administrative data, some of our five teams have begun or are planning prospective data collection with direct assessments of young children. As shown in Table 3, for example, the Montessori team is collecting outcome data on children at the end of children’s 3- and 4-year-old preschool years and at the end of kindergarten. Their work includes widely used measures in the field of children’s math, language, and early literacy that will permit cross-study comparability. They are also collecting more novel data on children’s skills that match the unique theory of the Montessori model (i.e., persistence and a mastery orientation). The D.C. team too plans to collect widely used measures of children’s language, literacy, math, executive function, and social-emotional skills to compare results to other early childhood impact studies. They also plan to add measures of children’s racial attitudes new to preschool evaluation, following one of the hypothesized benefits of D.C. programs. That is, because child care and early education programs are more segregated than K–12 settings (Greenberg & Monarrez, 2019), the study team hypothesizes that school-based preschool, universally available and administered by lottery, may be more racially mixed than available alternatives and have the institutional support necessary to address early explicit bias. To our knowledge, these dynamics have not yet been studied. However, research shows that children can distinguish between racial groups by 3 months, show favorable attitudes toward their own racial group by 9 months, and use racial stereotypes by 6 years, making public preschool a potentially important time to support the development of inclusive social skills and intergroup attitudes (Kelly et al., 2005; Lee, Quinn, & Pascalis, 2017; Pauker, Ambady, & Apfelbaum, 2010).
Notably, however, prospective outcome data collection can be very difficult in lottery-based early education studies. The Boston Alignment team, for example, ultimately decided against attempting prospective data collection via direct child assessments. Preschool blocks can be quite small compared with those in K–12; losing just a few families across blocks can result in incomplete blocks and then worsen both statistical power and external validity issues. Differential attrition in particular was too large a risk, given that families who lost the lottery were not particularly motivated to participate in assessments. Consent rates might also have varied substantially across blocks, presenting design decisions around whom to sample and include. In addition, if compliance is relatively low, very large numbers of participants are needed to generate sufficient statistical power to detect intervention effects.
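The compliance point can be illustrated with a rough calculation. Under the common approximation that the standard error of the complier (instrumental variables) effect equals the intent-to-treat standard error divided by the difference in enrollment rates between lottery winners and losers, the sample needed to detect a given complier effect grows with the inverse square of that difference. The function name and the enrollment rates below are hypothetical, and the approximation treats the compliance rate as known rather than estimated.

```python
def sample_inflation(enroll_rate_winners, enroll_rate_losers):
    """Approximate factor by which required sample size grows under partial compliance.

    Assumes SE(complier effect) ~= SE(intent-to-treat effect) / compliance, where
    compliance is the winner-loser difference in enrollment rates (an approximation).
    """
    compliance = enroll_rate_winners - enroll_rate_losers
    if compliance <= 0:
        raise ValueError("winners must enroll at a higher rate than losers")
    return 1.0 / compliance ** 2


if __name__ == "__main__":
    # Hypothetical scenarios: (share of winners enrolling, share of losers enrolling).
    for winners, losers in [(1.0, 0.0), (0.8, 0.3), (0.7, 0.4)]:
        print(winners, losers, round(sample_inflation(winners, losers), 1))
```

In this sketch, a 50-percentage-point difference in enrollment between winners and losers implies roughly four times the sample that full compliance would require for the same detectable complier effect.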
Enriched administrative data may be another possibility. Many school districts are now adopting benchmark assessments to monitor student progress in the early grades. Some state laws, such as third-grade reading laws, even require such assessments. For example, the Michigan Education Data Center is gathering and cleaning such data from benchmark assessments required by the state’s third-grade reading law. Unlike with state standardized tests in third grade and up, districts tend to have leeway in which benchmark or progress monitoring assessments they choose, which can lead to inconsistency in the outcomes available for preschool studies. For example, Boston used an early reading assessment called DIBELS (Dynamic Indicators of Basic Early Literacy Skills) for many years while surrounding districts did not, and furthermore, such data were not compiled at the state level. Accordingly, a study of Boston Prekindergarten that leveraged these data could do so only for children who remained in Boston Public Schools (Weiland, Unterman, & Shapiro, 2021). But when available and when equivalent across districts, these data offer promise for providing more timely, policy-relevant evidence on the effects of preschool programs.
Challenge #4: Attrition
Problem
As mentioned previously, empirical studies have shown that the lotteries generated by these school choice systems have strong internal validity; that is, they result in treatment and control groups that were essentially randomized in a coin-flip-like procedure and that are equal in expectation before a given intervention begins (e.g., Bloom & Unterman, 2014; Gray-Lobe et al., 2023). However, a more vexing problem, as it tends to be for most studies in education that in principle can identify causal effects, is attrition (i.e., when students disappear from the follow-up dataset). To be fully credible, researchers must show (a) that there has not been differential attrition by treatment status and (b) that there is still balance in baseline characteristics for the nonattritors (Krueger & Zhu, 2004; Murnane & Willett, 2010). Both analyses are straightforward to conduct and are standard in empirical research. But when evidence of biasing attrition is found, there are no simple fixes that can fully restore confidence in the internal validity of a study’s impact estimates.
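As a minimal sketch of check (a), the function below compares attrition rates between lottery winners and losers with a pooled two-proportion z-test. This is a simplification: analyses of lottery data would typically also account for lottery blocks (for example, by regressing an attrition indicator on treatment status with block fixed effects). The function name and the counts in the example are hypothetical and are not taken from any study discussed here.

```python
import math


def differential_attrition(n_winners, lost_winners, n_losers, lost_losers):
    """Compare attrition rates by lottery status with a pooled two-proportion z-test.

    Returns overall attrition, the winner-minus-loser attrition gap, z, and a
    two-sided p-value. Ignores lottery blocks (a simplifying assumption).
    """
    rate_winners = lost_winners / n_winners
    rate_losers = lost_losers / n_losers
    pooled = (lost_winners + lost_losers) / (n_winners + n_losers)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_winners + 1.0 / n_losers))
    if se == 0:
        raise ValueError("attrition is 0% or 100% in the pooled sample; test undefined")
    z = (rate_winners - rate_losers) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"overall": pooled, "differential": rate_winners - rate_losers,
            "z": z, "p_value": p_value}


if __name__ == "__main__":
    # Hypothetical counts: 500 winners (100 lost) and 500 losers (125 lost).
    print(differential_attrition(500, 100, 500, 125))
```

Check (b) would then repeat the baseline balance comparison on the nonattritors only, using whatever baseline covariates the system provides.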
Issues of attrition can be exacerbated in lottery-based early education studies for several reasons. First, features of systems play an important role. In some preschool systems, like in New York City, students are only given a unique identifier that follows them through 12th grade if they enroll in public preschool. Students who apply but do not enroll can receive a unique identifier if they enroll later, in kindergarten or beyond. But matching them to their preschool enrollment records requires additional matching processes that are resource intensive. In New York City, about 11% to 18% of prekindergarten applicants who participated in a lottery for an oversubscribed site did not enroll in any prekindergarten slot. There was evidence this occurred differentially, with 11% to 16% of lottery winners not enrolling versus 16% to 18% of lottery losers. Unfortunately, in this instance, there is a differential attrition issue, but no demographic data available on the children that are missing outcome data, making it very difficult to assess the extent of the attrition-induced bias. In contrast, in an instance such as the study of New York City’s Small (High) Schools of Choice, when students choose to leave the district after participating in a lottery, their demographic and prior academic achievement data are available and extensive sensitivity tests are possible (Bloom & Unterman, 2014).
Second, families are often more mobile during the early childhood years than when their children are older. Accordingly, in many contexts, families of young children may be more likely to move out of a given locality, particularly if they do not receive a school they would like their child to attend through a lottery system. If statewide data are available, children can be followed into other localities (via either a unique identifier or an additional matching process otherwise). But if not, differential attrition can be a difficult problem. For example, preliminary evidence shows that about 69% of children who applied to D.C.’s preschool program for 3-year-olds and participated in a lottery were enrolled in DC Public Schools in kindergarten 2 years later. The 31% who were not are lost to the study team using D.C. administrative data. As we show in Supplemental Materials Table S1, there was evidence of differential attrition by treatment status, though this difference is relatively small (about 4 percentage points) when controlling for the likelihood of being matched to a 3-year-old program. In Supplemental Materials Table S2, early evidence shows that balance was fairly well maintained on the limited baseline characteristics available.
Possible Solutions
Common advice in the education research field is to try to avoid attrition and, when you cannot, to do your best to understand it (i.e., who attrited and why; Murnane & Willett, 2010). On the prevention side, research teams facing differential attrition problems can work to create robust longitudinal datasets that span multiple school districts and states. Researchers can also encourage states and localities to assign a unique identifier at preschool application (or even birth) to allow more seamless tracking of children for research purposes. 6 Finally, on the understanding attrition side, collecting richer baseline data on student demographics and pretests, as we discussed under “Challenge #1: Limited Child-Level Covariates,” can allow deeper insight into which students are attriting and thus better assessment of the potential effects of attrition on internal validity.
Challenge #5: External Validity
Problem
All empirical education studies have to contend with external validity, or to whom impact estimates apply. If effects are heterogeneous, results of a given study generalize only to the population they represent (Murnane & Willett, 2010). For example, if a research team randomly sampled students from only elementary schools in the northern end of a district, the subsequent study’s results apply technically only to elementary school students in that part of the district. They do not apply to middle school students in that same district, to elementary school students in another district, nor to elementary school students in other schools in the same district. The reason is that students in the study may differ from other students in ways that make an intervention, program, or policy affect them differently than students elsewhere (i.e., effects may be heterogeneous). In empirical research, determining to whom the researcher would like to generalize is a critical step in making sampling decisions.
Methods for assessing external validity are generally quite simple. 7 Researchers compare the characteristics of participants and settings on average in their study to those of the population. Similar characteristics indicate that study results are more applicable to the population; differences indicate that results are less generalizable.
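One simple way to operationalize this comparison is a standardized difference for each characteristic: the gap between the study sample mean and the population mean, divided by the population standard deviation. Scaling by the population standard deviation is one convention among several, and the function name, data layout, and example values below are hypothetical.

```python
import statistics


def standardized_differences(sample, population):
    """Standardized gap between study-sample and population means, per covariate.

    Both arguments map a covariate name to a list of numeric values (assumed layout).
    Each gap is scaled by the population standard deviation (one common convention).
    """
    gaps = {}
    for name, population_values in population.items():
        population_sd = statistics.pstdev(population_values)
        if population_sd == 0:
            raise ValueError(f"{name} does not vary in the population")
        gap = statistics.fmean(sample[name]) - statistics.fmean(population_values)
        gaps[name] = gap / population_sd
    return gaps


if __name__ == "__main__":
    # Hypothetical indicator for free or reduced-price lunch eligibility (1 = eligible).
    all_applicants = {"frl_eligible": [1, 1, 1, 0, 1, 0, 1, 1]}
    lottery_sample = {"frl_eligible": [1, 0, 0, 1]}
    print(standardized_differences(lottery_sample, all_applicants))
```

Values near zero suggest the randomized sample resembles the population on that characteristic, while larger absolute values flag dimensions on which generalization is less warranted.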
In lottery-based early education studies, the core external validity issue is that the lotteries are naturally occurring, within oversubscribed programs. Researchers have no control over who is ultimately randomized; external validity is not a study design feature that can be manipulated by the research team to answer the question of interest. Rather, after randomization occurs, the research team then learns who was randomized and thus to whom studies that leverage this randomization would apply. Entire schools (and the students who applied to those schools) can be left out of a given sample as well, if they were not oversubscribed.
So far, external validity findings from preschool lottery studies show that this issue can have major implications for study design and interpretation. For example, in Washington, D.C., from 2014 to 2018, about 25,197 families applied for a 3-year-old seat, and 5,997 ultimately competed in a lottery (24%). As shown in Table 4, there were large differences in neighborhood income, racial composition, and educational attainment when comparing all applicants, the randomized sample, and those who complied with their lottery assignment. For example, median neighborhood income for applicants was about $81,000 versus $107,000 for the randomized sample and $141,000 for compliers. Nearly half of lottery compliers are drawn from just one ward or neighborhood (Ward 6), even though only 16% of applicants live in this ward.
Characteristics of D.C. 3-Year-Old Preschool Applicant Population, Applicants Who Participated in a Lottery, and Lottery Compliers Among Applicants, 2014 to 2018
Source: Authors’ calculations using My School DC administrative lottery data, Office of the State Superintendent of Education enrollment data, and data from census-type sources.
Note: Median income and educational attainment are obtained from the 2015–2019 American Community Survey estimates at the census block-group level. Population and racial and ethnic shares are derived from the 2020 decennial census population tables at the census block level.
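The D.C. figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative and are not taken from the study's data files.

```python
# Figures quoted in the text for D.C. 3-year-old applicants, 2014 to 2018.
applicants = 25_197
randomized = 5_997

# Share of all applicants who ultimately competed in a lottery.
share_randomized = randomized / applicants

# Median neighborhood income (USD) by group, as quoted in the text.
median_income = {
    "applicants": 81_000,
    "randomized": 107_000,
    "compliers": 141_000,
}

# Each group's median income relative to the full applicant pool.
income_ratio = {
    group: value / median_income["applicants"]
    for group, value in median_income.items()
}

print(f"Share of applicants randomized: {share_randomized:.1%}")
for group, ratio in income_ratio.items():
    print(f"{group}: {ratio:.2f}x applicant median income")
```

On these figures, roughly 23.8% of applicants were randomized, and median neighborhood income was about 1.32 times the applicant median for the randomized sample and about 1.74 times for compliers.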
External validity findings from the study of Boston’s rollout of an aligned prekindergarten and kindergarten curriculum offer interesting evidence that suggests that the design may address some questions better than others (McCormick et al., 2022). An earlier lottery-based study of the effects of Boston Prekindergarten versus alternatives (Weiland et al., 2020) found substantial differences in background characteristics between those randomized in the lottery process and the full set of applicants. For example, among randomized applicants, 51% qualified for free or reduced-price lunch and 28% were White, versus 65% and 17% of all applicants, respectively. Lotteries were also highly concentrated in a subset of schools. Accordingly, the authors took care to caveat that their study results applied to more advantaged students who wanted to attend a subset of oversubscribed district schools and did not represent effects for the full set of students who wanted to attend. In contrast, as shown in Supplemental Materials Table S3, applicants to schools implementing the aligned curriculum and applicants who participated in a lottery were much more similar to the full set of applicants (e.g., 67% of applicants were eligible for free or reduced-price lunch, vs. 68% for applicants to aligned schools and 61% of the lottery sample).
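To put the Boston comparisons on a common scale, one option is a standardized difference between the lottery sample and the full applicant pool. The sketch below is a minimal illustration using the proportions quoted above. Scaling by the applicant-pool standard deviation is one convention among several and is an assumption here, not the method used in the cited studies; the function name is hypothetical.

```python
from math import sqrt


def standardized_difference(p_sample: float, p_population: float) -> float:
    """Difference in proportions, scaled by the population SD of a binary trait."""
    return (p_sample - p_population) / sqrt(p_population * (1 - p_population))


# (lottery sample proportion, all-applicant proportion), as quoted in the text.
comparisons = {
    "earlier study, free or reduced-price lunch": (0.51, 0.65),
    "earlier study, White": (0.28, 0.17),
    "aligned-curriculum study, free or reduced-price lunch": (0.61, 0.67),
}

results = {
    label: standardized_difference(p_sample, p_population)
    for label, (p_sample, p_population) in comparisons.items()
}

for label, value in results.items():
    print(f"{label}: {value:+.2f} SD")
```

Under this convention, the earlier study's lottery sample differs from all applicants by roughly 0.29 standard deviations on both characteristics, whereas the aligned-curriculum lottery sample differs by roughly 0.13 on lunch eligibility, consistent with the text's description of that sample as much more similar to the full applicant pool.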
Possible Solutions
Given researchers’ lack of control of the randomization process, lottery-based studies may answer a different question than the research team intended at the outset, a problem of which limited external validity is a symptom. A way forward in improving external validity with lottery-based early education studies is to obtain prior lottery data and covariate information to first understand who was randomized in a given system in past years and what settings are represented in the randomized subset of applicants (assuming similar processes from one year to the next). The design of future such studies should be informed heavily by these analyses, so that researchers can be more certain of what research questions they can address with data from these systems and determine whether these are the policy questions of interest. This will likely require more funder support for less definitive, exploratory analyses.
These early-stage analyses can also help build the case for alternative designs. Lottery-based designs are attractive because they do not disrupt localities’ normal operations. However, demonstrating that data from these systems may not answer the question of interest may help persuade decision makers to allow other designs, such as randomizing classrooms or schools, that can better answer the research questions of interest.
If previous data show that lotteries from these systems can answer questions of interest for a locality and the broader field, external validity can be assessed following models from K–12 and from preschool specifically. For example, Abdulkadiroğlu et al. (2011) provide an excellent road map in their lottery-based study of charter schools for assessing to whom study results are likely to generalize, including a lottery-based propensity score validation approach for examining whether students not in the subsample randomized would likely experience benefits if enrolled in charter schools instead of alternatives. The first lottery-based preschool study (Weiland et al., 2020) followed and extended this example, ultimately examining the concentration of lotteries in certain schools, the characteristics of the lottery subsample versus all applicants, differences in the counterfactual between the lottery subsample control group versus all applicant nonenrollees, and whether lottery impact findings likely generalized to all applicants (they did not).
External validity work does depend on having good covariate, counterfactual, and education setting data at hand. Our possible solutions to those challenges also apply for addressing and solving external validity challenges.
Challenge #6: Answering Site-Level Questions With Child-Level Randomization
Problem
As public preschool programs have become more common, there has been increasing interest in not just whether to fund preschool but how to make it more effective (Weiland, 2018). Localities that administer these programs tend to be particularly interested in such questions. Should they hire teachers with bachelor’s degrees? Should they continue using their current curriculum or switch to an alternative? What is the best assessment system for providing actionable, feasible, and valid information on student learning?
Teams are just beginning to explore when and how to leverage the preschool lotteries created in school choice systems to address such site-level questions. As described, one of our teams is using student-level lotteries to examine the impact of Boston’s rollout of an aligned prekindergarten and kindergarten curriculum on students’ learning in third grade. And in New York, the city was interested in estimating the impacts of different PL for preschool teachers on student learning in its universal preschool system. New York University researchers initially proposed using child-level lotteries to do so.
Ultimately, the New York City team found that they could not answer the city’s questions using the child-level lotteries from the DA system. Sites had selected into different PL series, and one of several methodological challenges was that series were associated with other characteristics of sites. This is not surprising in a system that relies on program leaders’ rank-ordering preferences among the PL series, as well as site need and series capacity, to make PL series assignments.
Estimates leveraging the child-level lotteries would accordingly represent the joint impact of all the characteristics of sites, not just their different PL series. Said differently, it would be impossible to disentangle the effect of each PL series track from site type, the children who attend these sites, and the site’s assessed Classroom Assessment Scoring System quality scores prior to participating in the series. The study team learned that, in fact, site-level randomization had occurred because of constraints on capacity for each series (i.e., PL tracks were oversubscribed and, subsequently, sites were randomly assigned to their first or second choices). They pivoted to leverage this source of randomization and to conduct a cluster randomized trial instead.
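The confounding problem described above can be illustrated with a small simulation. This is a sketch under invented assumptions, not the New York City team's analysis: the effect sizes, the site quality values, and all names below are hypothetical. Each site runs a valid child-level lottery, but sites that chose series A also happen to have higher baseline quality.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

TRUE_PL_EFFECT = 0.10   # assumed extra gain from series A relative to series B
QUALITY_EFFECT = 0.50   # assumed gain per unit of baseline site quality

# Hypothetical sites: higher-quality sites disproportionately selected series A.
sites = (
    [{"series": "A", "quality": 0.6} for _ in range(20)]
    + [{"series": "B", "quality": 0.2} for _ in range(20)]
)


def lottery_estimate(site, n_per_arm=500):
    """Winner-minus-loser mean outcome for one site's child-level lottery."""
    gain = QUALITY_EFFECT * site["quality"]
    if site["series"] == "A":
        gain += TRUE_PL_EFFECT
    winners = [gain + rng.gauss(0, 1) for _ in range(n_per_arm)]
    losers = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    return mean(winners) - mean(losers)


# Average the within-site lottery estimates by PL series, then contrast them.
est_a = mean(lottery_estimate(s) for s in sites if s["series"] == "A")
est_b = mean(lottery_estimate(s) for s in sites if s["series"] == "B")
naive_contrast = est_a - est_b

# What the contrast actually targets: the PL effect plus the quality gap.
quality_gap = (
    mean(s["quality"] for s in sites if s["series"] == "A")
    - mean(s["quality"] for s in sites if s["series"] == "B")
)
joint_effect = TRUE_PL_EFFECT + QUALITY_EFFECT * quality_gap

print(f"Contrast of lottery estimates (A minus B): {naive_contrast:.2f}")
print(f"Joint effect it targets: {joint_effect:.2f}")
print(f"PL series effect alone: {TRUE_PL_EFFECT:.2f}")
```

Each within-site lottery estimate is internally valid, yet the contrast between series recovers the bundled effect of series and site quality rather than the series effect alone. Randomizing sites to series, as in the cluster randomized trial the team pivoted to, breaks the link between series and site quality.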
The Boston team grappled with similar site selection issues as schools selected into implementing the aligned curriculum (or not) per the district’s autonomous schools model. Site characteristics were similarly correlated with alignment status. In designing their study, the team accordingly was careful to be clear they were testing not the effects of alignment on its own but the district’s rollout of an aligned curriculum. Given the paucity of causal evidence on this topic, the district and research team felt the study would still answer a vital question, even if it could not isolate the effects of alignment alone. This issue is akin to other circumstances in which a set of schools is targeted for additional resources because of low student achievement levels. For example, in studies of the effects of School Improvement Grant funds for schools with chronically low academic achievement, researchers used various analytic approaches to estimate the effects of School Improvement Grant funds, while acknowledging that any effects may also be the result of a package of supports that schools attract when in need of intervention (Dee, 2012; Dragoset et al., 2017; LiCalsi et al., 2015).
Possible Solution
This issue is critical to address in the design phase. Just as in our external validity solutions, a concrete way forward is to obtain prior system data to first understand in past years who was randomized in a given system and what settings are represented in the randomized subset of applicants. These data, along with close communications and interviews with staff in a given locality, can help pinpoint where a setting-level intervention is implemented, the selection process into implementation, and site characteristics that may be correlated with a given intervention. These data and analytics can help the study team and locality understand what question the design can versus cannot answer. From there, a pivot may be in order (as in New York City) to a different research design.
Summary: Recommendations for Designing Preschool Lottery Studies
Our joint work on leveraging naturally occurring early education lotteries illuminates both the promise and challenges of this design in this new context. As we highlighted in our introduction, many of the challenges of this design are the same as in any empirical education study, particularly those aiming to identify causal relations. But some of these challenges are exacerbated in lottery-based early education studies and require careful handling in study design, analysis, and interpretation.
For future such studies, we offer the following recommendations.
When designing lottery-based studies, start with the program’s theory of change, a locality’s research questions, and gaps in the broader research evidence base. The highest quality and most useful educational empirical studies for guiding policy and practice tend to combine these three essential elements when identifying research questions. Furthermore, some of the solutions to the design challenges we identified are more likely to be successful if the locality views them as addressing their own central questions (i.e., support for additional data collection and administrative systems changes).
In the study design phase, explore which types of preschools are oversubscribed, who is ultimately randomized within these systems, how children who are randomized differ from those who are not, and how this may vary over time and across localities. Because these studies are nascent, we currently have limited answers to these important questions. Exploratory work on these questions has been helpful so far in understanding what lottery studies can and cannot do (e.g., Balu et al., 2021; Braga et al., 2023; Greenberg et al., 2020; Weiland et al., 2020) and deepening localities’ understanding of how their lottery-based systems are working.
Identify the covariates, outcomes, and counterfactual data that are available from administrative data. Use field-based efforts and supplements to the preschool application process to address any important gaps in these data, such as the lack of rich covariates and the lack of a measure of the key outcome that the program was supposed to move. As a lower cost strategy, leverage publicly available sources of data to enhance available district and state school data.
To limit attrition problems, leverage existing administrative datasets, and in the longer run, consider opportunities to improve state and local administrative datasets. The latter is admittedly ambitious but may be possible especially within the context of long-standing research-practice partnerships and via funding opportunities such as the Institute of Education Sciences Statewide Longitudinal Data Systems Grant Program. This work with partners also could include setting up systems for tracking students across localities using a common identifier from the time of preschool application.
Where data enhancements are not possible because of resource and other practical constraints, be clear to the locality about the potential limitations of the analysis. This a priori clarity can also aid the researcher in the write-up and interpretation of results.
Anticipate the external validity of a lottery-based study from past years’ data and use it to determine a priori what research questions a lottery-based study can answer well and which ones require a different design. Because the pandemic has changed enrollment patterns for young children especially (Bassok & Shapiro, 2021; Greenberg, 2021; Weiland et al., 2021), studies with cohorts after the pandemic began might be better informed by data from cohorts from 2021 onward than by data from prepandemic cohorts.
Carefully weigh trade-offs in choosing an analytic strategy for estimating impacts from a lottery-based design. In particular, one must choose among samples drawn from children’s first-choice lotteries, samples drawn from children’s first lotteries, and assignment score approaches. For example, first-choice lottery samples may be the smallest in size but the clearest in the treatment effect estimated: the effect of winning a seat in a student’s first-choice school. Although the first lottery approach may include more students than a first-choice approach and have greater statistical power, the treatment becomes a bit more muddled when the lotteries include students competing for their less-preferred schools as well. Sensitivity tests, such as those conducted by Bloom and Unterman (2014), may be useful when choosing between these two approaches. The assignment score approach in theory may include more students than these other two options and (potentially) improve external validity, but it also includes a mixed sample of students across the choice spectrum and may not permit predicting cross-site variation, as two students with the same assignment score (block) may have applied to different schools with different characteristics (Bloom et al., 2017). More research comparing these approaches directly in the preschool space is needed. Teams should weigh the trade-offs between them carefully, determine which best answers their particular research questions, and, as a robustness check, ideally conduct the analysis multiple ways.
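The difference between the first-choice and first-lottery samples can be made concrete with a short sketch. The `Application` record, its fields, and the toy applicants below are hypothetical. Real assignment systems encode rankings, priorities, and lottery participation differently, so this is an illustration of the two sample definitions rather than any team's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class Application:
    child_id: str
    # Ranked choices, most preferred first. Each entry is
    # (school, whether the child was subject to a lottery at that school).
    choices: list[tuple[str, bool]]


def first_choice_sample(apps):
    """Children who were randomized at their first-ranked school."""
    return [a.child_id for a in apps if a.choices and a.choices[0][1]]


def first_lottery_sample(apps):
    """Children randomized anywhere, keyed to the first lottery they entered."""
    sample = {}
    for a in apps:
        for rank, (school, randomized) in enumerate(a.choices, start=1):
            if randomized:
                sample[a.child_id] = (school, rank)
                break
    return sample


# Toy applicants for illustration only.
apps = [
    Application("c1", [("S1", True), ("S2", False)]),   # lottery at first choice
    Application("c2", [("S1", False), ("S3", True)]),   # lottery at second choice
    Application("c3", [("S2", False)]),                 # never randomized
]

first_choice = first_choice_sample(apps)
first_lottery = first_lottery_sample(apps)

print("First-choice sample:", first_choice)
print("First-lottery sample:", first_lottery)
```

In this toy example the first-lottery sample contains every child in the first-choice sample plus a child whose first lottery was at a second-ranked school. That illustrates the trade-off described above: a larger sample, but a treatment that mixes seats at more and less preferred schools.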
For site-level questions, pinpoint where a setting-level intervention is implemented, the selection process into implementation, and site characteristics that may be correlated with a given intervention. This work is critical as site characteristics can be confounded with the main characteristic of interest. Pivot to a different research design, if child-level randomization cannot satisfactorily answer a site-level question.
Find opportunities to connect with colleagues engaged in similar work. As we described, collaboration between our five teams began organically, with researchers considering a lottery-based design connecting with those who were already in the process of doing so. A conference grant from the Spencer Foundation provided us with resources to more formally engage with one another. As teams leverage lotteries in other contexts and to address other questions, similar collaborative networks have a role to play in improving applied studies and accordingly shaping future evidence-based policy and practice.
Finally, we also hope that funders will begin to recognize the potential contributions of the lottery-based design for building the next generation of evidence on early education programs. Funding for the early-stage work to identify what questions these lottery-based early education studies can answer in a given context and the relevance of those questions to practice partners is essential. Prospective field work in these studies can also be very challenging and may require additional resources, beyond those required in other kinds of studies that can identify causal impacts. We hope that illuminating the particularities and nuances of the design across our five studies can also inform funder priorities and decisions.
Rigorous design has long characterized early education studies, dating back to the landmark Perry and Abecedarian studies in the 1960s and 1970s. And since about 2000, there has been a dramatic rise in the use of methods that can identify causal effects of education programs, practices, and policies more broadly. In addition to improving early education studies directly, we hope that our joint work also serves as a case study of how educational context can affect study design when moving a study design into a new educational topic area.
Supplemental Material
Supplemental material, sj-docx-1-ero-10.1177_23328584241231933 for Lottery-Based Evaluations of Early Education Programs: Opportunities and Challenges for Building the Next Generation of Evidence by Christina Weiland, Rebecca Unterman, Susan Dynarski, Rachel Abenavoli, Howard Bloom, Breno Braga, Anne-Marie Faria, Erica Greenberg, Brian A. Jacob, Jane Arnold Lincove, Karen Manship, Meghan McCormick, Luke Miratrix, Tomás E. Monarrez, Pamela Morris-Perez, Anna Shapiro, Jon Valant and Lindsay Weixler in AERA Open
Footnotes
Authors’ Note
The authors are listed alphabetically after the third author.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Spencer Foundation provided funding that supported this work, through their conference grants program. We would like to thank the funders of the project described: Institute of Education Sciences (Boston, D.C., Montessori, and New York teams), Arnold Ventures (Boston), and the Heising-Simons Foundation (D.C.).
Notes
Authors
CHRISTINA WEILAND is an associate professor at the University of Michigan. Her research focuses on the effects of early childhood interventions and public policies on children’s development, especially on children from families with low incomes.
REBECCA UNTERMAN is a senior associate at MDRC. She is interested in the effects of P–12 policies on students’ outcomes and in research methods.
SUSAN DYNARSKI is the Patricia Albjerg Graham Professor of Education at the Harvard Graduate School of Education. Her research focuses on understanding and reducing inequality in education.
RACHEL ABENAVOLI is a research scientist at Child Trends. The work represented here was completed while she was at New York University and does not necessarily reflect the views of Child Trends. Her research focuses on understanding and strengthening the contexts, programs, and policies that foster children’s development, promote equity, and support educators’ well-being.
HOWARD BLOOM served as MDRC’s chief social scientist from 1999 to 2017 and, before that, taught quantitative methods for program evaluation for two decades to graduate students in public policy and management at Harvard University and New York University.
BRENO BRAGA is a principal research associate at the Urban Institute. His research has covered topics such as the effects of childhood exposure to the earned income tax credit on health outcomes and the impact of access to individual development accounts on asset building.
ANNE-MARIE FARIA is the CEO and founder of Harmony Research, LLC, a woman-owned small business focused on conducting rigorous research in early childhood education and child welfare. Prior to launching Harmony Research, LLC, Dr. Faria served as a principal researcher at AIR for 14 years. Her primary research foci are family and child well-being and documenting and supporting high-quality early childhood education.
ERICA GREENBERG is a senior fellow and PK–12 team lead in the Center on Education Data and Policy at the Urban Institute. Her research interests span early childhood and K–12 education, focusing on public prekindergarten, child care, and issues of equity.
BRIAN A. JACOB is the Walter H. Annenberg Professor of Education Policy and a professor of economics at the Ford School. His current research focuses on the intersection between the education and child welfare systems and the connection between education and the labor market.
JANE ARNOLD LINCOVE is a professor of public policy at the University of Maryland, Baltimore County. Her research focuses on the implementation and effects of market-based policies in public education in the United States and developing countries.
KAREN MANSHIP is a principal researcher with the American Institutes for Research. Her research focuses on early childhood and K–12 education policy and finance.
MEGHAN MCCORMICK was a senior associate at MDRC at the time this work was completed. Her work uses experimental and quasi-experimental approaches to estimate the impacts of school- and home-based programs and policies on children’s academic, behavioral, and social-emotional outcomes, with a focus on identifying programs and policies that promote equitable opportunities and outcomes for children and families living in poverty.
LUKE MIRATRIX is an associate professor at the Harvard Graduate School of Education. His primary research focus is on causality with a focus on developing methodology to assess and characterize treatment effect heterogeneity in randomized clinical trials and observational studies.
TOMÁS E. MONARREZ is a labor economist and senior research fellow at the Federal Reserve Bank of Philadelphia. His research focuses on the economics of education.
PAMELA MORRIS-PEREZ is a professor of applied psychology at the NYU Steinhardt School of Culture, Education, and Human Development. She conducts research at the intersection of developmental psychology, suicidology, education, and policy.
ANNA SHAPIRO is an associate policy researcher at the RAND Corporation. Her research focuses on the effects of early childhood programs and policies on young children’s cognitive and socioemotional outcomes.
JON VALANT is a senior fellow in governance studies at the Brookings Institution and the director of the Brown Center on Education Policy at Brookings. He specializes in PK–12 education policy and politics. Much of his research examines inequities in U.S. schools and the policies that mitigate or exacerbate those inequities.
LINDSAY WEIXLER is an assistant professor at Tulane University. Her research interests include early childhood education and child and adolescent development in educational settings.
References
