Abstract
Many states and communities have invested in public early childhood education programs to improve children’s readiness to enter school and narrow achievement gaps in later grades. This study asked whether and how Wisconsin’s universal state-funded prekindergarten program, Wisconsin 4K, has improved student achievement and helped to reduce the achievement gap. Using publicly available data from 2002–2003 to 2013–2014, the study examined the effects of the program, which features high participation rates and part-day delivery modes. The results showed that Wisconsin 4K enhanced third-grade reading achievement but not math achievement in the participating districts. Its effects were larger for non-White and economically disadvantaged students than for their White and affluent counterparts. The policy implications for large-scale universal pre-K are discussed.
A large body of research suggests that early-age interventions can yield substantial educational and social benefits by enhancing children’s school readiness (Almond & Currie, 2010), particularly for disadvantaged children (Fryer et al., 2020; Magnuson & Duncan, 2016). Educational policies that embrace high-quality early childhood education (ECE) systems are powerful tools for strengthening national human capital and reducing social inequality (Hahn & Barnett, 2023). In addition, the rapid growth in female labor force participation has led policymakers worldwide to consider expanding public entitlements to childcare for working parents (Raikes et al., 2023). Therefore, many states and countries have invested in public ECE programs (Yang et al., 2024). Among the various forms of public support for ECE, there is growing interest in large-scale universal prekindergarten (UPK) programs.
State-funded prekindergarten (pre-K) is defined as formal, state-funded, and regulated programs offering group instruction through public schools or private childcare centers for 3- and 4-year-old children (Cascio, 2021). Pre-K is distinct from the Head Start program, which is funded by the federal government for low-income families and provides holistic assistance covering health care, nutrition, home visits, and academic learning. Prior to the outbreak of COVID-19 in the 2019–2020 academic year, approximately half of 4-year-olds who were enrolled in any formal nonparental care attended state-funded pre-K, which accounted for 34% of all 4-year-olds (Friedman-Krauss et al., 2021). Depending on the state in which they live, children could attend one of two types of pre-K programs. The first type is UPK, which accepts all age-eligible children for enrollment; and the second type is targeted pre-K programs, which serve only families that meet certain eligibility requirements, such as those based on family income or child disability status (Blau, 2021). As of 2019–2020, 10 states and the District of Columbia offered UPK, and 33 states provided targeted pre-K. Seven states, including California, Colorado, and Michigan, are currently working toward scaling up UPK for 4-year-olds (Friedman-Krauss et al., 2023). Wisconsin’s commitment to ECE is deeply rooted, with a universal framework dating back to 1848, when the state constitution included for the first time universal education for 4- and 5-year-olds. With many variations since then, the state has earnestly supported large-scale UPK in its districts for nearly four decades (Wisconsin Department of Public Instruction [DPI], 2016).
Despite the growing interest in large-scale UPK, evidence is lacking as to whether UPK improves cognitive and noncognitive outcomes or narrows the achievement gap in the short and long terms and how these effects vary by program characteristics, such as specific levels of quality and intensity. Sources supporting UPK often draw on evidence from small-scale ECE interventions such as Perry Preschool or targeted pre-K initiatives in other states (Blau, 2021). However, these findings from targeted pre-K are not necessarily applicable to UPK due to the unique aspects of the universal nature of the latter program, such as its broad accessibility and potential peer effects within diverse classrooms (Dotterer et al., 2013; van Huizen & Plantenga, 2018). In addition, while many studies have found positive and sustained effects of high-quality targeted pre-K in the medium and long term, others have shown null (fading out) or detrimental effects (e.g., Lipsey et al., 2018). This variability makes it difficult to draw conclusions about the success of UPK from the evidence regarding targeted pre-K (Duncan et al., 2022). Notably, the evidence bolstering UPK’s positive effects on educational outcomes in the United States is concentrated in a few states, such as Oklahoma and Georgia. Recent reviews have underscored the urgent need for comprehensive research on ECE programs, including UPK, emphasizing the need to delve deeper to determine “what works, for whom, and why” in terms of the effectiveness of different strategies for various groups (Cascio, 2021; Duncan et al., 2022).
Wisconsin’s UPK program, Wisconsin 4K, is funded and administered by state and local education departments and offered in public elementary schools and community-based childcare centers. A feature that distinguishes the Wisconsin 4K program from other states’ UPK programs is its “thin and wide” approach, providing pre-K opportunities to a large 4-year-old population by offering cost-effective options such as part-day schedules. Despite its long history, the Wisconsin 4K program has received less academic attention in the form of evaluation studies than other state policies. The present study investigates the association between Wisconsin’s UPK and district-level third-grade reading and math scores and examines how the impact varies for economically disadvantaged students and students of color compared with their nondisadvantaged peers. Using state administrative data, it tracks changes in test scores across districts that initiated 4K in different years between 2001 and 2014. The aim is to provide a more detailed understanding of 4K, including its funding, curricula, and teacher workforce, as well as the broader state context. The study’s findings contribute to the existing body of evidence on the sustained effects of large-scale state-funded UPK programs, thus informing the development and implementation of more effective and inclusive ECE policies.
Evidence of the Impact of UPK
Phillips et al. (2017) summarized nearly 40 studies that documented the short-term and sustained impacts of state-funded targeted and universal pre-K programs on academic achievement and showed that the results of these pre-K impact studies varied significantly with factors such as research design, sample size, program quality, and alternatives to pre-K. Despite this variability, other review studies have suggested that the benefits of UPK tend to be greater than or equal to those of targeted programs (Cascio, 2021; Duncan et al., 2022).
Georgia and Oklahoma, having been recognized as “model states” due to their substantial enrollment and early expansions in the 1990s, offer noteworthy cases for studying the impact of UPK (Cascio & Schanzenbach, 2013; Wong et al., 2008). The UPK in Tulsa, Oklahoma, positively impacts school attendance, academic performance, Advanced Placement participation, and college enrollment (Amadon et al., 2022; Gormley et al., 2023). Similarly, research on Georgia’s UPK program revealed significant and lasting positive effects on fourth-grade reading and math scores (Fitzpatrick, 2008). Gray-Lobe et al. (2023) revealed that Boston’s UPK increased the rate of high school graduation and college attendance, although there was no evidence that the program increased academic achievement from Grades 3 through 10. Only one study examined Wisconsin 4K’s medium-term effect. Artz and Welsch (2016) found that from 1997 to 2005, when statewide enrollment reached 30%, Wisconsin 4K was positively associated with fourth-grade math scores. Outside the United States, Berlinski et al. (2008) reported that universal preschool in Uruguay reduced grade retention and school dropout rates in primary education. Blau (2021) reviewed European studies, concluding that universal preschool programs had substantial short- and long-term positive effects, including on IQ, academic achievement, and earnings. However, the significant benefits were concentrated among disadvantaged children, with relatively modest benefits for more advantaged children.
Given that universal programs typically incorporate more diverse students in a single classroom than targeted programs, most research on UPK has reported heterogeneous treatment effects based on family background. There is consistent empirical evidence that universal programs confer greater benefits for children from low-income families, with the impacts of such programs on high-income families tending to be moderate or negative (Cascio, 2021; Morris et al., 2018; van Huizen & Plantenga, 2018). The differential impacts of UPK may be attributable to differences in care settings across family income levels (Feller et al., 2016). Similarly, Cascio (2021) provided a framework suggesting that the benefits of universal ECE programming are related to a quality gap between the counterfactual environment and the program, suggesting that the impact of UPK is inherently greater for disadvantaged groups than for their more advantaged peers. Without UPK, children from lower-income households might rely on informal arrangements, such as care from relatives, friends, or neighbors; or potentially lower-quality alternative childcare (Duncan et al., 2022). In contrast, children from more advantaged families are more likely to attend formal, higher-quality childcare programs or receive home care from more educated caregivers because the parents can afford to purchase these services without public programs (Flood et al., 2022; Kline & Walters, 2016).
(Dis)Advantages of Universal ECE Programs Over Targeted Pre-K
The literature has outlined mechanisms by which large-scale universal ECE programs can yield significant benefits that are as great as, or even greater than, those observed in targeted programs (Barnett, 2010). First, compared with selective programs, UPK can reach a significantly larger proportion of families in need. Despite the expansion of public investment in ECE programs, disparities in access to high-quality programs based on socioeconomic status persist (Cloney et al., 2016; Stahl et al., 2018). For instance, state-funded targeted pre-K programs and the federal Head Start program only accommodate approximately 25% of 3- and 4-year-olds from low-income families (Friedman-Krauss et al., 2021). Given limited funding that cannot cover every child, selective ECE programs are more likely to open in communities with high concentrations of eligible children, leaving many eligible children in areas with lower concentrations unserved. Even when the targeted program is available, families may have concerns about the potential stigma associated with receiving welfare or negative peer effects, and they may then opt out (Barnett, 2010).
In addition, while family income is a major contributor to the school readiness gap, other factors, such as parental education, race and ethnicity, family structure, and immigration status, also contribute to differences in school readiness. Children from families above the low-income threshold also face challenges in accessing high-quality ECE programs, largely due to their limited availability and high cost (Marshall et al., 2013). By lowering barriers across the board, UPK promotes the inclusion of children from diverse backgrounds, potentially making UPK more effective than targeted programs in reaching a broader range of children who are in need of public support for reasons beyond family income (Barnett & Frede, 2010).
Second, the effect of UPK could be boosted in that children in UPK may benefit more from positive peer effects in mixed-background classrooms, as children from disadvantaged families can benefit from playing and learning through interactions with advantaged peers (Ribeiro et al., 2017). Although there are concerns about the potential negative effects on children with more preparation, who may be influenced by peers with less preparation, evidence suggests that less-prepared children tend to benefit from being around more-prepared peers (Chen et al., 2020; Weiland & Yoshikawa, 2014). Furthermore, indirect benefits may accrue to children who do not participate in the programs. If a significant proportion of students enter K–12 schools well prepared owing to their UPK experience, teachers can elevate their instructional standards (Magnuson et al., 2007). The literature has underscored the challenges that teachers face in addressing individual needs in classrooms with widely varying levels of readiness, and teachers often focus more on less-ready students (Duflo et al., 2011). Similarly, having UPK programs equip a majority of children with noncognitive skills, such as socioemotional competencies and discipline, allows K–12 teachers to focus more on instruction. This may improve educational outcomes not only for UPK participants but also for those who do not participate. Considering these aspects of high accessibility, peer influence, and spillover effects on nonparticipants, UPK may be more effective and equitable than targeted ECE models.
Although the theoretical effectiveness of UPK is recognized, the practicality of alternative policy options and the consequences of UPK for other services should be considered. One concern about the UPK policy is that a widespread and free ECE program could displace children from private childcare providers and Head Start programs (Bassok et al., 2014). Small and home-based childcare settings, which are often more accessible to disadvantaged families, could be negatively affected by UPK, with negative consequences for children outside UPK (Bassok et al., 2016). Similarly, UPK would be considered inefficient if it undermines private programs that may provide more effective care services (Kline & Walters, 2016). Although many studies have focused on individual participants in UPK, few have examined the broader net effects, including negative impacts on nonparticipants and those in other programs. The current study uses district-average academic achievement as a measure to assess the overall impact of UPK policies on the entire student population.
Because of their broad coverage, UPK programs generally have higher costs than targeted programs. Allocating additional funding to UPK could result in reduced budgets for other educational initiatives that could provide greater benefits for K–12 education systems (Graue et al., 2017; Hustedt & Barnett, 2011). Despite evidence driving policy support and expansion efforts aimed at closing achievement gaps (Frede & Barnett, 2011), debates over UPK’s cost-effectiveness and practical feasibility persist. These debates often revolve around the concern that the average benefits of UPK may not justify the expenditure unless the program benefits all participating children, including those from high-income families, or substantially improves the outcomes of children from low-income families (Duncan et al., 2022). Given the strong evidence for the long-term effects of targeted ECE initiatives, particularly Head Start (e.g., M. J. Bailey et al., 2021), it can be argued that targeted pre-K may be a better use of resources than a universal policy, as targeted programs can provide high-quality education to the smaller number of children in greatest need at a similar cost (Blau, 2021).
Given the advantages and disadvantages of UPK relative to the alternative, calculating the magnitude of benefits by subgroup could be useful in justifying the choice of a pre-K policy from a cost-effectiveness perspective. However, compared with the extensive evidence of large effects for targeted ECE programs, the evidence of UPK’s effects is sparse, making it difficult to assess policy effectiveness and equity between the two options. In this regard, the experience of Wisconsin 4K, which reaches many families while being budget-conscious, could contribute to the ECE policy literature. The current study examines the relationships between the Wisconsin 4K program and third-grade test scores across income and race/ethnicity groups. It aims to provide evidence on whether UPK benefits all children or only disadvantaged ones and assesses UPK’s role in reducing the achievement gap.
Sustained Effect of Pre-K Programs
ECE experiments conducted before the 1980s, such as the Abecedarian and Perry Preschool programs, focused on disadvantaged children using intensive models. These studies reported substantial and lasting effects into adulthood, including higher earnings and lower crime rates, which are important justifications for expanding public efforts for ECE (García et al., 2017). However, recent research on large-scale pre-K policies, mostly targeted, has suggested that they may not produce lasting effects through K–12 grades, despite the presence of strong initial impacts (D. H. Bailey et al., 2020; Burchinal et al., 2022; Whitaker et al., 2023). Several studies have indicated that enrollment in UPK can enhance postsecondary outcomes, such as college enrollment rates, although the medium-term impacts of UPK on test scores are mixed (Gormley et al., 2023; Gray-Lobe et al., 2023). The mixed results for medium-term test scores might mean that test scores do not accurately reflect program effectiveness or that the benefits of skills obtained during ECE may reemerge later (D. H. Bailey et al., 2020). Given the scarce evidence of the persistent effects of large-scale pre-K, especially UPK, it is unclear whether persistent benefits can be expected and, if so, for which outcomes and through what mechanisms. Therefore, more evidence of sustained effects is needed.
D. H. Bailey et al. (2020) suggested that for pre-K programs to achieve long-term results, the skills and abilities they target must meet three essential criteria, referred to as the “trifecta”: malleable, foundational, and unlikely to develop in other settings. Skills that are both malleable and foundational, such as early math and literacy, attention and emotional self-regulation, and social skills, are often emphasized in ECE programs (D. H. Bailey et al., 2020). Meeting the third criterion is more challenging, however, because many early competencies fostered in pre-K can be developed elsewhere (Duncan et al., 2022). For example, basic math skills can be acquired not only in pre-K but also at home or during elementary school. As a result, children who do not attend pre-K may catch up in math by the end of kindergarten or later in first grade, leading to a perceived diminution of the impact of pre-K (Clements et al., 2013).
Fade-out can manifest as a combination of two scenarios: skill depreciation in the intervention group and skill catch-up in the control group (Abenavoli, 2019). In contrast with skill depreciation, the catch-up scenario suggests that both groups may benefit from UPK in the long run, as long as children in the treatment group retain the knowledge and skills they acquired in pre-K. Accordingly, it could be a misinterpretation to consider pre-K policies unsuccessful in terms of sustained effects, as the actual outcomes for both groups could be improved relative to their counterfactual outcomes in the absence of the pre-K program. Methodologically, studies of pre-K effects have referred to the narrowing of the difference in outcomes between treatment and control group students as the fade-out effect (D. H. Bailey et al., 2020; Phillips et al., 2017). However, the absence of differences due to catch-up does not necessarily mean that ECE policies have no lasting impact. The district-level comparisons of precohort and postcohort data used in the current study can shed light on whether UPK policies have medium-term effects, such as on third-grade test scores.
Features of Wisconsin 4K
The State of Wisconsin has a longstanding commitment to UPK that can be traced back to 1848 when the State Constitution advocated for school districts to be as uniform as possible and provided free education to all children between the ages of 4 and 20 (Wisconsin DPI, n.d.-a). Nevertheless, the enrollment of 4-year-olds in Wisconsin 4K remained as low as 30% until 2000. From 2001 to 2020, almost 70% of Wisconsin school districts implemented the 4K program. In 2018, 98% of the state’s districts offered UPK, with approximately 75% of all 4-year-olds enrolled (Friedman-Krauss et al., 2021).
Participation in Wisconsin 4K is optional for school districts and families. Local districts decide whether to offer Wisconsin 4K and choose delivery modes, including days, hours, and locations. Enrollment follows the same process as K–12 schools: Families apply in spring, receive confirmation in summer, and classes begin in fall. State funding is based on full-time equivalent (FTE) student headcounts reported to the DPI in the third week of September. Each district receives funding based on the number of 4K enrollees, calculated as 0.5 or 0.6 FTE depending on annual instructional time. This formula-based funding aligns with K–12 levels, facilitating district-wide program opportunities. In the 2013–2014 academic year, about 30% of districts subcontracted with private childcare centers, Head Start agencies, family childcare providers, and faith-based centers; while 70% offered Wisconsin 4K exclusively through public schools. However, specific enrollment figures for public schools and other institutions are unavailable.
According to the standards for ECE policies in the United States suggested by Duncan et al. (2022), the quality of Wisconsin’s 4K program is moderately high. The state has established high standards for teacher qualifications and learning benchmarks. However, there are no specified regulations regarding the qualifications for teacher aides, class size, or child-to-teacher ratios. Moreover, most Wisconsin 4K programs are not full-day programs and do not operate 5 days a week (the funding scheme sets a base of 2.5 hours per day). In terms of cost, the program’s financial investment is moderate compared with that of UPK programs in other states (Wisconsin DPI, 2016).
The state requires teachers to hold a bachelor’s degree and an elementary or regular education license. The state offers developmentally appropriate learning standards—the Wisconsin Model Early Learning Standards (WMELS)—for young children from birth to first grade, which must be incorporated into the curriculum. The WMELS mandates 10 required subjects for the Wisconsin 4K curriculum: reading and language arts, mathematics, social studies, science, health, physical education, art, music, environmental education, and computer literacy. Multiple subjects can be combined into a single class. The state recommends that reading and language arts should constitute roughly 30% of the curriculum, with the other nine subjects collectively making up 70%, and that up to a third of each day should be allocated to student self-directed activities (Wisconsin DPI, n.d.-b).
As of 2013–2014, the per-child expenditure for Wisconsin 4K was $5,618. This figure was lower than Oklahoma’s $7,678 per-child expenditure but higher than Georgia’s $3,746 per-child expenditure in the same year (Barnett et al., 2015). Only 2% of districts offered full-day, 5-days-a-week programs in 2013–2014. These included Milwaukee, the state’s largest district, which funded its full-day, 5-days-a-week programs by combining state funds with other sources such as local tax revenues and donations. In contrast, approximately 90% of districts provided part-time programs (see Supplemental Appendix A in the online version of the journal). Thus, Wisconsin 4K adopted a thin and wide coverage strategy, achieving high enrollment through part-day schedules.
The Wisconsin Context
During the early 2000s, when Wisconsin 4K rapidly expanded, approximately 30% of 4-year-olds received in-home care, while 70% used various out-of-home childcare options, such as pre-K, private childcare, parochial childcare, and Head Start (The Wisconsin Council on Children and Families, 2010). However, even in 2020, by which time most school districts had initiated Wisconsin 4K, the overall supply of licensed childcare for children under the age of 5 across Wisconsin remained insufficient. On average, there were three children under the age of 5 for each licensed childcare slot, a shortage that can be termed a “childcare desert” (Malik et al., 2018). This inadequacy was more pronounced in rural areas, where the ratio of children to slots (3.4) was roughly twice that in urban areas (1.7). Therefore, the counterfactual settings of Wisconsin 4K were a blend of informal and formal childcare, with informal care predominating in rural locales. Geographically, 97% of Wisconsin’s land area is rural, housing 30% of the population.
Wisconsin’s population is predominantly White (87%), followed by 6.7% Black, 3.0% Asian (mainly Hmong), and 7.6% Hispanic as of 2022 (U.S. Census Bureau, n.d.). The state exhibits significant racial disparities in terms of academic achievement. According to a national achievement gap database (Reardon et al., 2021), in 2013–2014, the gaps between Black and White students were approximately 0.91 SD for reading and 1.12 SD for math, and the gaps between Hispanic and White students were approximately 0.73 SD for reading and 0.70 SD for math in Wisconsin. These gaps exceeded the national average.
Current Study
This study examines the relationship between the district-level adoption of the Wisconsin 4K program and the academic performance of students, considering the program’s characteristics and context. As an alternative to the full-day instruction provided by UPK programs in states like Oklahoma, Wisconsin 4K offers a part-day model. This approach necessitates examination of the effectiveness of a program with shorter instructional hours and whether it provides robust educational benefits. Building on the work of Artz and Welsch (2016), this study uses multiple administrative data sources spanning extended periods, including more recent years. Additionally, it scrutinizes the heterogeneous effects of UPK by race/ethnicity and family poverty status to determine the degree to which Wisconsin 4K helps to narrow achievement gaps. The study is guided by two research questions: (a) To what extent is the introduction of Wisconsin 4K associated with reading and math achievement among third-grade students at the district level? (b) Does this effect vary among racial and ethnic groups and children from different income backgrounds?
Method
Data and Measures
District-Level 4K Implementation
This study utilized publicly accessible statewide administrative data from Wisconsin. Initially, among the 412 school districts, 300 were identified as having adopted UPK from the 2001–2002 to 2013–2014 school years, based on the History of 4K and 5K in Wisconsin dataset provided by the Wisconsin DPI. However, owing to missing district reading and math scores, eight school districts were omitted, resulting in a final analytical sample of 292 districts spanning 12 years.
Some districts may have initiated pilot programs in a limited number of schools or classrooms 1 or 2 years prior to a full district-wide UPK launch, as indicated by a DPI staff consultant. Unfortunately, these pilot programs were not captured in the historical data. To address this issue, the study used school enrollment data and treated a district as having implemented district-wide UPK only in years when its 4K enrollment rate exceeded 30%. This 30% threshold was chosen based on visual observation of the enrollment distribution across the entire analytical sample. The enrollment rate graph in Supplemental Appendix B, in the online version of the journal, displays a notable increase at the 60% mark; 30% is the midpoint of the range from 0% to 60%, over which the enrollment rate remains consistently low. A series of sensitivity checks was conducted to verify that the results remained consistent across alternative cutoff points below 60%. Each district-year observation was coded as one if the district offered a UPK program in that year and zero otherwise.
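The coding rule described above can be illustrated with a minimal Python sketch. The function name, the district labels, and the enrollment values are hypothetical and are used only for illustration; the only element taken from the text is the rule that a district-year counts as district-wide UPK when its 4K enrollment rate exceeds the cutoff (30% by default).

```python
def code_upk_indicator(enrollment_rates, threshold=0.30):
    """Code each district-year as 1 if its 4K enrollment rate exceeds
    the threshold (district-wide UPK) and 0 otherwise."""
    return {key: int(rate > threshold) for key, rate in enrollment_rates.items()}


# Hypothetical district-year enrollment rates (illustrative values only).
rates = {
    ("District A", 2005): 0.05,  # pilot-scale enrollment, coded 0
    ("District A", 2006): 0.72,  # district-wide launch, coded 1
    ("District B", 2005): 0.00,
    ("District B", 2006): 0.30,  # exactly at the cutoff, so not above it
}

upk = code_upk_indicator(rates)

# A sensitivity check simply repeats the coding with another cutoff below 60%.
upk_alt = code_upk_indicator(rates, threshold=0.50)
```

Rerunning the analysis with indicators such as `upk_alt` is one simple way to carry out the kind of cutoff sensitivity check the text mentions.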
4K Enrollment Rates
The 4K enrollment rate for each district and year was calculated by dividing the number of 4K enrollments by the kindergarten (5K) enrollments in the subsequent year. The resulting rates could overstate actual 4K enrollment because the denominator is the following year’s kindergarten enrollment, which covered only about 90% of 5-year-olds. For example, an actual 4K enrollment of 80% would appear as roughly 89% when divided by 90%. The gap between kindergarten enrollment and 4K enrollment was notably small in districts whose 4K enrollment rates exceeded 100% (Supplemental Appendix C in the online version of the journal). Therefore, the 4K enrollment rates should be interpreted with caution. Data from the DPI also provide enrollment details disaggregated by race, ethnicity, and socioeconomic status. One potential issue with the data relates to how school districts identify economically disadvantaged students, either through eligibility for free or reduced-price meals or by reference to household income. As the part-day pre-K program does not include lunch provision and the reporting of household income is optional (particularly for children in non-public-school 4K settings), a significant number of children not initially identified as low-income at 4K were identified as such in full-day kindergarten in the following year. This discrepancy leads to a significant underestimation of the enrollment rate of economically disadvantaged students. Because of this measurement issue regarding the enrollment rate of low-income children, this study did not utilize the enrollment rate of free or reduced-price lunch (FRL) students for statistical analyses. No similar issues were detected in the enrollment data of the different racial/ethnic groups.
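The inflation in the measured rate can be made concrete with a small Python sketch. The function name and the cohort counts are hypothetical; the 80% and 90% figures follow the worked example in the text.

```python
def enrollment_rate_4k(enrolled_4k, enrolled_5k_next_year):
    """Measured 4K enrollment rate: 4K enrollment divided by the
    following year's kindergarten (5K) enrollment."""
    if enrolled_5k_next_year <= 0:
        raise ValueError("kindergarten enrollment must be positive")
    return enrolled_4k / enrolled_5k_next_year


# Hypothetical cohort of 100 four-year-olds (illustrative values only).
cohort_size = 100
enrolled_4k = 80   # 80% of the cohort attends 4K
enrolled_5k = 90   # about 90% of the cohort enrolls in kindergarten a year later

measured_rate = enrollment_rate_4k(enrolled_4k, enrolled_5k)  # 80 / 90, roughly 0.89
actual_rate = enrolled_4k / cohort_size                       # 0.80
```

Because the denominator is smaller than the true cohort, the measured rate exceeds the actual rate, and it can exceed 100% in districts where 4K enrollment is larger than the next year's kindergarten enrollment.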
District Third-Grade Reading and Math Scores
This study used each district’s proportion of third-grade students scoring proficient in reading and math on the state standardized test, the Wisconsin Knowledge and Concepts Examination (WKCE). The WKCE was a proficiency examination designed to meet the requirements of the federal No Child Left Behind legislation and was administered from the 2002–2003 to the 2013–2014 school years. This standardized test was composed of some items specifically designed for Wisconsin alongside commercially developed questions used nationwide. It was administered in Wisconsin public schools each fall. The DPI website provides district percentages of all students at four levels: below basic, basic, proficient, and advanced. In 2013–2014, across the state, 6% of third graders were identified as below basic in reading tests, 21% as basic, 52% as proficient, and 18% as advanced. For math scores, 14% of third graders were identified as below basic, 35% as basic, 41% as proficient, and 8% as advanced. While the WKCE was in use, the state department articulated a long-term goal for all students to attain the advanced or proficient level, with an exception for students with disabilities. The current study aggregated the percentages of the proficient and advanced levels, following the lead of previous district-level research (e.g., Artz & Welsch, 2016; Fahle et al., 2021). Over the 12 years analyzed, the WKCE changed the test sets and criteria for the four proficiency levels, resulting in different distributions across the years. Thus, this study combined the percentages of students at the proficient and advanced levels and standardized the result within each year and subject to have a mean of zero and a standard deviation of one.
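The standardization step can be sketched in Python as follows. This is a minimal illustration, not the study's code: the record layout, field names, and values are hypothetical, and the use of the population standard deviation (rather than the sample version) is an assumption, since the text does not specify which was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_groups(records):
    """Add a z-score of (proficient + advanced) percentages, standardized
    separately within each (year, subject) group."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["year"], r["subject"])].append(r["pct_proficient"] + r["pct_advanced"])

    # Assumption: population standard deviation; the source does not say which.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    out = []
    for r in records:
        m, s = stats[(r["year"], r["subject"])]
        combined = r["pct_proficient"] + r["pct_advanced"]
        z = (combined - m) / s if s > 0 else 0.0
        out.append({**r, "z_score": z})
    return out


# Hypothetical district records for one year and subject (illustrative values).
records = [
    {"district": "A", "year": 2013, "subject": "reading", "pct_proficient": 45, "pct_advanced": 15},
    {"district": "B", "year": 2013, "subject": "reading", "pct_proficient": 52, "pct_advanced": 18},
    {"district": "C", "year": 2013, "subject": "reading", "pct_proficient": 55, "pct_advanced": 25},
]

standardized = standardize_within_groups(records)
z_scores = [r["z_score"] for r in standardized]
```

Standardizing within year and subject removes shifts in the proficiency distribution caused by changes in test forms and cut scores, so the scores compare districts relative to the same year's statewide distribution.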
District-level test proficiency scores were also available disaggregated by economically disadvantaged status at third grade and by race/ethnicity. However, to safeguard privacy, data for subgroups comprising fewer than six individuals from each district were not provided. This resulted in a considerable amount of missing data for district test results for students of color. For instance, district test scores for Black and Hispanic students were only available for 46 and 119 of the 292 districts, respectively, despite enrollment data indicating the presence of at least one Black or Hispanic student enrolled in 4K in 257 and 280 districts. Conversely, district score data for White students were available for 285 districts. As such, the outcome variables for students of color predominantly represented proficiency scores in districts with more than seven Black and Hispanic students. Only five districts had missing outcome measures for economically disadvantaged students.
Demographic and Economic Covariates
This study examined multiple control variables that encompass the demographic and economic characteristics of the districts. First, the district’s per-pupil expenditure was included because a district’s financial resources can be associated with test scores, 4K adoption, and the overall economic condition of the district. The expenditure data were sourced from the Local Education Agency Finance Survey of the Common Core of Data. Second, the covariates included the number and proportion of children living in poverty, based on the Small Area Income and Poverty Estimates provided by the U.S. Census Bureau. Third, total school enrollment, which represents district size, and subsequent-year kindergarten enrollment, which may closely approximate the population of 4-year-olds, were also included in our analysis. Fourth, the proportions of Black, Hispanic, and Asian kindergarten students were considered. Fifth, given that declining test scores may prompt a district to adopt a new policy, we included proficiency scores from the year preceding the program’s initiation. Finally, following Curran’s (2015) suggestion regarding potential neighboring effects, we accounted for the possibility that a district is more likely to start 4K when more of its neighbors offer the program by controlling for the percentage of districts within a 40-mile radius (equivalent to 10% of all distances between pairs of district offices) that had a 4K program.
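The neighboring-district covariate could be computed along the following lines. This is a sketch under stated assumptions: great-circle (haversine) distance between district offices, exclusion of the district itself from its own neighborhood, and hypothetical coordinates and 4K statuses. The study does not specify its distance calculation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def neighbor_4k_share(district, offices, has_4k, radius_miles=40.0):
    """Percentage of other districts within the radius that offer 4K."""
    lat, lon = offices[district]
    neighbors = [
        other
        for other, (o_lat, o_lon) in offices.items()
        if other != district
        and haversine_miles(lat, lon, o_lat, o_lon) <= radius_miles
    ]
    if not neighbors:
        return 0.0  # assumption: no neighbors within the radius counts as 0%
    return 100.0 * sum(has_4k[n] for n in neighbors) / len(neighbors)


# Hypothetical district office coordinates (latitude, longitude) and 4K status.
offices = {
    "A": (43.0, -89.0),
    "B": (43.2, -89.0),   # about 14 miles from A
    "C": (43.0, -89.5),   # about 25 miles from A
    "D": (44.5, -89.0),   # about 104 miles from A
}
has_4k = {"A": False, "B": True, "C": False, "D": True}

print(neighbor_4k_share("A", offices, has_4k))  # 50.0
```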
Table 1 summarizes the data used in this study, featuring the district test scores, 4K enrollment, and control variables utilized in the analyses. These data encompass 292 districts with 3,495 year-by-district observations over 12 years. Fifty-nine percent of these observations fell into the posttreatment period, whereas 41% were captured prior to 4K implementation. Within the data, the overall enrollment rate for 4K programs, when available, was 88.4%, which might have been inflated over the actual 4K enrollment rates. The data revealed disparities in the 4K enrollment rates among the different racial groups. Whereas White students had an average 4K enrollment rate of 89.7%, students of color had lower rates: Hispanic students, 72.5%; Black students, 65.5%; and Asian students, 70.1%. The average test scores were slightly lower during the posttreatment period than before the treatment.
Unweighted Descriptive Statistics for 292 Districts in Wisconsin from 2002–2003 Through 2013–2014 (N = 3,495)
Note. Analytic districts included 292 districts that changed their 4K status between 2003 and 2014. 4K participation rates considered only those district-year observations that implemented 4K and had at least one 4K enrollment from each race or ethnicity. The 4K participation rate was calculated by dividing 4K enrollment by the kindergarten enrollment of the following year and could exceed 100%. Rates exceeding 200% were excluded. FRL refers to recipients of the free or reduced-price lunch program.
Analytic Strategies
The current study used cohort comparisons within the same districts by exploiting variations in the 4K program implementation status and third-grade academic achievements. District-level studies have tended to focus on policy, specifically the impact of policy implementation at scale. Owing to the universal nature of Wisconsin 4K programs, the district-level effect can account for the potential spillover of 4K participants affecting nonparticipants and the catch-up effects after treatment, which are difficult to capture in a student-level study. Moreover, district-level estimates can capture the net impact by accounting for unobserved yet plausible indirect effects, such as the 4K program’s impact on private childcare markets or changes in parental educational motivations and investments in response to policy changes in schools. In addition, district-level studies have fewer measurement errors than student-level analyses (Artz & Welsch, 2016). Nonetheless, the policy effects estimated in the current study should be interpreted as effects on district-average scores, which might differ from the expected effects on individual students.
With the purpose of policy evaluation at the district level, this study estimated the intention-to-treat (ITT) effects of 4K implementation on academic achievement and the treatment-on-the-treated (TOT) effects of 4K enrollment rate on the same outcomes. The ITT effects are a reasonable estimate of the overall policy effects on achievement that can be expected in the field when districts implement 4K programs in anticipation of less than full participation owing to the voluntary nature of the program. Given the use of district-level analysis in this study, the TOT estimate should be viewed as a dosage effect, indicating the expected change in district test scores with the 4K enrollment rate, rather than the effect of 4K on its participants.
The primary analysis adopted a staggered difference-in-differences (DID) specification to estimate the ITT and TOT effects. A DID specification assumes parallel trends, meaning that, in the absence of treatment, outcomes in the treatment and control groups would have followed the same trend over time. In this study, the parallel trend assumption can be met if the 4K implementation is exogenous; thus, the difference between the control and treatment groups after the 4K implementation can be interpreted as a quasi-causal effect.
This study relied on assumptions regarding the exogeneity of 4K. First, parents may not have the power to change the 4K starting year of their children. Second, parents do not move or enroll their children across district lines because of 4K. Third, district 4K implementation was associated with district third-grade test scores only through 4K enrollment. With limited data, it was not possible to validate all these assumptions. Instead, the results of an event study analysis, which is described below, indicated that there were no significant changes in the selected covariates before and after 4K implementation, supporting the exogeneity of 4K (Supplemental Appendix D in the online version of the journal).
Furthermore, the staggered DID method utilizes multiple treatment timings and is thus considered more rigorous than single-time DID. Instead of calculating the difference between treatment and control groups, staggered DID methods assess the differences between the treated and the not-yet-treated groups that will be treated in the next period. The likelihood that violations of the exogeneity assumptions occurred systematically at 12 different treatment times is low, making this a conservative estimate. Conventionally, a two-way fixed-effect (TWFE) specification is widely used to assess a treatment effect in cases of multiple treatment times (Callaway & Sant’Anna, 2021; Imai & Kim, 2021). The following TWFE equation was used for the ITT effect for the first research question:
$$\text{Outcomes}_{d,y+4} = \beta D_{dy} + X_{d,y-1}\Gamma + A_d + B_y + \varepsilon_{dy} \qquad (1)$$

where $\text{Outcomes}_{d,y+4}$ denotes the district-level third-grade reading and math scores in district $d$ in year $y + 4$. $X_{d,y-1}$ is a set of time-varying district-level variables for year $y - 1$, with coefficient vector $\Gamma$. $A_d$ and $B_y$ are the vectors of the district and year dummy variables that account for the district and year fixed effects, respectively. $\varepsilon_{dy}$ is the error term. The variable of interest is $D_{dy}$, which is a binary variable that equals one in the years in which district $d$ implemented 4K, and zero otherwise. Therefore, the coefficient $\beta$ represents the impact of 4K implementation on third-grade reading and math scores. The year fixed effects control for statewide trends, such as the Great Recession, which shaped academic achievement over time. The district fixed effects control for time-invariant unobserved district characteristics that affect the outcomes. As suggested by Baker et al. (2022), the analyses were conducted with and without control variables. If the data satisfy the parallel trend assumption, $\beta$ can be interpreted as the quasi-causal ITT effect of 4K implementation.
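A minimal sketch of how a TWFE coefficient is obtained may help fix ideas. It assumes a balanced panel, no covariates, no weights, and noise-free hypothetical data, and it indexes outcomes by the 4K cohort year rather than by the testing year four years later. It is not the estimation code used in the study, which also clustered standard errors and weighted by district size.

```python
def double_demean(panel):
    """Remove unit and period means from {(unit, period): value} (balanced panel)."""
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    grand = sum(panel.values()) / len(panel)
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    period_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    return {
        (u, t): value - unit_mean[u] - period_mean[t] + grand
        for (u, t), value in panel.items()
    }


def twfe_beta(outcome, treated):
    """OLS slope of the outcome on the treatment dummy after two-way demeaning."""
    y = double_demean(outcome)
    d = double_demean(treated)
    return sum(d[k] * y[k] for k in d) / sum(d[k] ** 2 for k in d)


# Hypothetical districts that start 4K in different years; true effect is 0.10 SD.
start_year = {"A": 2004, "B": 2006, "C": 2008}
district_effect = {"A": 0.3, "B": -0.2, "C": 0.0}
years = range(2002, 2010)

treated = {
    (d, y): 1.0 if y >= start_year[d] else 0.0 for d in start_year for y in years
}
outcome = {
    (d, y): district_effect[d] + 0.01 * (y - 2002) + 0.10 * treated[(d, y)]
    for d in start_year
    for y in years
}

beta_hat = twfe_beta(outcome, treated)
print(round(beta_hat, 3))  # 0.1
```

With a constant treatment effect, the district and year fixed effects absorb the level differences and the common trend, so the coefficient recovers the effect exactly in this noise-free example.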
To test the heterogeneous effects of the 4K policy on academic achievement by race and ethnicity for the second research question, the following ITT model was used:
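$$\text{Outcomes}_{g,d,y+4} = \beta D_{dy} + \delta\,(D_{dy} \times S_g) + \lambda S_g + X_{d,y-1}\Gamma + A_d + B_y + \varepsilon_{gdy} \qquad (2)$$

where $\text{Outcomes}_{g,d,y+4}$ denotes the district averages of the third-grade reading and math proficiency scores for subgroup $g$ in district $d$ in year $y + 4$. $S_g$ is an indicator representing the outcomes of FRL, non-FRL, or each racial/ethnic group, and $\delta$ is the coefficient on its interaction with 4K implementation. FRL status was a binary variable set to one for the FRL group and zero for the non-FRL group. The race/ethnicity subgroups included dummies for the Hispanic, Black, Asian, and White groups. Similar to Equation (1), $X_{d,y-1}$, $A_d$, and $B_y$ denote the time-varying controls, the district fixed effects, and the year fixed effects, respectively.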
In addition, this study used a two-stage least squares (2SLS) regression analysis model to estimate the TOT effects. The 2SLS approach helped to examine potential dosage effects representing how varying degrees of 4K program participation may be associated with district test scores. In the 2SLS model, a binary variable for 4K implementation was used as a conditional exogenous instrument variable with the controls. The following equations were used for the 2SLS model:
$$4K_{dy} = \pi D_{dy} + X_{d,y-1}\Phi + A_d + B_y + \nu_{dy} \qquad (3)$$

$$\text{Outcomes}_{d,y+4} = \theta\,\widehat{4K}_{dy} + X_{d,y-1}\Gamma + A_d + B_y + \varepsilon_{dy} \qquad (4)$$

where $4K_{dy}$ represents the 4K participation rate in district $d$ in year $y$. In the first stage, $4K_{dy}$ was regressed on the instrumental variable $D_{dy}$, representing the 4K implementation in district $d$ in year $y$. The predicted 4K enrollment rate, $\widehat{4K}_{dy}$, was used to predict the outcome variable $\text{Outcomes}_{d,y+4}$ in the second stage. The coefficient $\theta$ is the TOT effect, interpreted as the expected change in the outcome associated with a 1% change in the 4K participation rate. It is possible that there were nonlinear relationships between district enrollment rates and outcomes, as there is variation in enrollment rates across districts. However, models with quadratic and cubic terms supported the linearity of the association. As a result, we used Equation (4) in the TOT model specification. All of the time-varying control variables were included in the second stage. To examine heterogeneous TOT effects by subgroup, the same interaction terms in the ITT model specification were added in the second-stage model.
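With a single binary instrument and a single endogenous regressor, the 2SLS estimate equals the reduced-form (ITT) coefficient divided by the first-stage coefficient. The sketch below illustrates that relationship on noise-free hypothetical data, with fixed effects removed by two-way demeaning and no covariates. The enrollment rate is expressed as a proportion (0 to 1), so the coefficient corresponds to moving from no participation to full participation; these choices are assumptions for illustration only.

```python
def double_demean(panel):
    """Remove unit and period means from {(unit, period): value} (balanced panel)."""
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    grand = sum(panel.values()) / len(panel)
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    period_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    return {
        (u, t): value - unit_mean[u] - period_mean[t] + grand
        for (u, t), value in panel.items()
    }


def slope(regressor, outcome):
    """OLS slope of outcome on regressor after two-way demeaning of both."""
    x = double_demean(regressor)
    y = double_demean(outcome)
    return sum(x[k] * y[k] for k in x) / sum(x[k] ** 2 for k in x)


# Hypothetical districts: staggered 4K start years and post-start enrollment rates.
start_year = {"A": 2004, "B": 2006, "C": 2008}
rate_after = {"A": 0.9, "B": 0.7, "C": 0.8}
district_effect = {"A": 0.3, "B": -0.2, "C": 0.0}
years = range(2002, 2010)

implemented = {
    (d, y): 1.0 if y >= start_year[d] else 0.0 for d in start_year for y in years
}
enrollment = {k: rate_after[k[0]] * implemented[k] for k in implemented}
# True effect of full participation is 0.12 SD in this simulated example.
outcome = {
    (d, y): district_effect[d] + 0.01 * (y - 2002) + 0.12 * enrollment[(d, y)]
    for d in start_year
    for y in years
}

first_stage = slope(implemented, enrollment)   # instrument -> enrollment rate
reduced_form = slope(implemented, outcome)     # instrument -> outcome (ITT)
theta_hat = reduced_form / first_stage         # 2SLS estimate with one instrument

print(round(theta_hat, 3))  # 0.12
```

Because participation is below 100% after implementation, the reduced-form coefficient is scaled up when converted into an effect per unit of participation.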
In Equation (4), the TOT effect indicates how an increase of 1% in the 4K enrollment of a racial/ethnic subgroup was associated with district reading and math scores for the subgroup. As the endogenous variable 4
Robustness Checks
Although researchers have widely used the TWFE specification for staggered DID models, recent papers have raised the concern that the treatment effect from a TWFE regression model could be biased (Callaway & Sant’Anna, 2021; Goodman-Bacon, 2021). First, the regression implicitly gives more weight to units treated toward the middle of the panel, where the variance of the treatment indicator is greater. For example, the treatment effect of a district starting 4K in 2008 would contribute more to the average effect than a district starting 4K in 2013. Second, when the treatment effects vary with the time elapsed since treatment, known as dynamic effects, a TWFE specification yields biases because it uses already-treated units as controls for later-treated units. When the dynamic effects are larger for the earlier treated units, the TWFE could yield a biased estimate that has the opposite sign of the true effect; this is called the negative weight problem (Goodman-Bacon, 2021).
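The second source of bias can be seen in a small noise-free simulation. The two hypothetical units below have effects that grow by 0.1 SD with each year of exposure; the setup (two units, five periods, no covariates) is an illustrative assumption and is not drawn from the study's data.

```python
from statistics import mean


def double_demean(panel):
    """Remove unit and period means from {(unit, period): value} (balanced panel)."""
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    grand = sum(panel.values()) / len(panel)
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    period_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    return {
        (u, t): value - unit_mean[u] - period_mean[t] + grand
        for (u, t), value in panel.items()
    }


def twfe_beta(outcome, treated):
    """OLS slope of the outcome on the treatment dummy after two-way demeaning."""
    y = double_demean(outcome)
    d = double_demean(treated)
    return sum(d[k] * y[k] for k in d) / sum(d[k] ** 2 for k in d)


# Hypothetical units: one treated early (period 2), one treated late (period 4).
start = {"early": 2, "late": 4}
periods = range(1, 6)

treated = {(u, t): 1.0 if t >= start[u] else 0.0 for u in start for t in periods}
# Dynamic effect: 0.1 SD in the first treated period, growing by 0.1 each period.
outcome = {
    (u, t): 0.1 * (t - start[u] + 1) if t >= start[u] else 0.0
    for u in start
    for t in periods
}

true_att = mean(outcome[k] for k in outcome if treated[k] == 1.0)
beta_hat = twfe_beta(outcome, treated)

print(f"average effect among treated cells: {true_att:.3f}")  # 0.217
print(f"TWFE estimate: {beta_hat:.3f}")                       # 0.017
```

The TWFE estimate is far below the average effect because the still-growing outcomes of the early-treated unit serve as the comparison for the late-treated unit.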
An event study specification is a useful tool to check to what extent the treatment effects vary over time (Baker et al., 2022). As a robustness check, the event study specification replaced the 4K implementation dummy with relative year dummies from the first treatment year. The following equation was used:
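$$\text{Outcomes}_{d,y+4} = \sum_{k \neq -1} \beta_k \,\mathbf{1}[\,y - T_d = k\,] + X_{d,y-1}\Gamma + A_d + B_y + \varepsilon_{dy} \qquad (5)$$

where $T_d$ is the first year in which district $d$ implemented 4K, $\mathbf{1}[\,y - T_d = k\,]$ is a dummy for the $k$th year relative to that first year, and the year before implementation ($k = -1$) is the omitted reference category. The remaining terms are defined as in Equation (1).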
Although event study analysis can capture dynamic effects, the specification cannot show different effects over different treatment times. Callaway and Sant’Anna (2021) suggested a robust staggered DID alternative to address these issues. The proposed method first estimates the group-time average treatment effect ATT(g, t) for group g at time t, where “group” is defined by when units are first treated. This staggered DID method was applied to check the robustness of the main treatment effect estimated with Equation (1). For this purpose, the Stata command csdid, which implements the Callaway and Sant’Anna (2021) estimator, was used.
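The group-time average treatment effect can be sketched in its simplest unconditional form: the change in outcomes from the period before first treatment (g − 1) to period t for the cohort first treated in g, minus the same change among units not yet treated by t. The sketch assumes not-yet-treated comparison units, no covariates, and noise-free hypothetical data; it illustrates the idea and is not the csdid implementation.

```python
from statistics import mean


def att_gt(outcome, first_treated, g, t):
    """Unconditional ATT(g, t) using not-yet-treated units as the comparison group."""
    if t < g:
        raise ValueError("this sketch only covers post-treatment periods (t >= g)")
    base = g - 1
    cohort = [u for u, s in first_treated.items() if s == g]
    controls = [u for u, s in first_treated.items() if s > t]
    if not cohort or not controls:
        raise ValueError("need at least one cohort unit and one not-yet-treated unit")

    def change(unit):
        return outcome[(unit, t)] - outcome[(unit, base)]

    return mean(change(u) for u in cohort) - mean(change(u) for u in controls)


# Hypothetical districts with staggered first-treatment years; true effect 0.10 SD.
first_treated = {"A": 2004, "B": 2006, "C": 2008}
district_effect = {"A": 0.3, "B": -0.2, "C": 0.0}
years = range(2002, 2010)
outcome = {
    (d, y): district_effect[d]
    + 0.01 * (y - 2002)
    + (0.10 if y >= first_treated[d] else 0.0)
    for d in first_treated
    for y in years
}

print(round(att_gt(outcome, first_treated, g=2004, t=2005), 3))  # 0.1
print(round(att_gt(outcome, first_treated, g=2006, t=2007), 3))  # 0.1
```

Because each ATT(g, t) compares a cohort only with units that are still untreated at t, already-treated units never serve as controls, which avoids the negative weight problem.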
The robustness analysis also included falsification tests with 3- and 5-year gaps between the treatment year and the outcome instead of a 4-year interval, thus making a fuzzy sample. For example, using a 3-year gap led to the inclusion of outcomes from cohorts that did not receive 4K in the treatment group. If the
Furthermore, if full-day programs in a handful of districts had significantly larger effects than part-day programs in the majority of districts, the ITT effects may have been overestimated and may not have represented the pre-K effect in Wisconsin. To test this hypothesis, subgroup analyses were performed by excluding the seven districts that reported full-day programs. Finally, replicated models were used to test whether the results were sensitive to the different universal 4K definitions set at 30%.
Results
Table 2 shows the ITT effects of 4K on third-grade academic achievement. The 4K effects on reading scores were 0.091 SD in the model without controls and 0.104 SD in the model with controls. Considering the distribution of WKCE, 0.091 SD was equivalent to a 15% increase in the number of students who achieved proficient and advanced levels of reading, and 0.104 SD was equivalent to a 20% increase in the number of students who achieved proficient and advanced levels of reading. In addition, Table 2 shows that the positive effect of 4K on reading was concentrated in FRL students at 0.165 and 0.177 SD, equivalent to a 40% increase in students achieving proficient and advanced levels. However, there was no statistically significant impact on reading among non-FRL students. The positive effect of 4K was much greater, at a statistically significant level, for the reading scores of Hispanic students (0.323 and 0.333 SD, equivalent to a 50% increase). Given that the gap between White and Hispanic students in reading scores was approximately 0.8 SD, nearly 40% of the White–Hispanic reading score gap could be reduced by 4K implementation. The associations were positive for the scores of Black students and negative for the scores of White students, but they did not reach statistically significant levels after including the control variables.
Intention-to-Treat Effects of Wisconsin 4K on District 3rd-Grade Reading and Math Achievement
Note. M.E. stands for marginal effect. The dependent variables have been standardized to have a zero mean and a standard deviation of one. Marginal effects for subgroup outcomes, such as FRL and students of color, were determined by summing the coefficient of 4K and an interaction effect between 4K and the respective subgroup. Standard errors are clustered at the district level. Analyses were weighted based on the number of students in each district. Time-varying controls for districts encompass total enrollment; child poverty rates; per-pupil expenditure; median household income; the percentage of individuals with a college degree; the percentage of people of color; the count of 4-year-olds; the count of kindergarten students; the percentage of Black, Hispanic, and Asian students in kindergarten; and the percentage of neighboring districts that have implemented the 4K program. Full results can be found in Supplemental Appendices E and F in the online version of the journal.
*p < .05, **p < .01, ***p < .001.
In contrast to the reading results, Table 2 shows that 4K was not significantly associated with overall third-grade math scores. The associations were not statistically significant for either FRL or non-FRL student scores. However, the effects of 4K on math scores were statistically significant for Black students at 0.254 and 0.253 SD, equivalent to a 38% increase in the share of students reaching the proficient and advanced levels in math. Given the 1.1 SD math score gap between White and Black students, 4K reduced the math gap between Black and White students by approximately 20%. The associations between 4K and math scores for Hispanic, Asian, and White students were not statistically different from zero.
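The gap-reduction figures follow from dividing each subgroup effect by the corresponding achievement gap. The short calculation below uses the estimates without controls and the gaps reported in the text; the helper name `share_of_gap_closed` is hypothetical.

```python
def share_of_gap_closed(effect_sd: float, gap_sd: float) -> float:
    """Effect size expressed as a percentage of an achievement gap (both in SD)."""
    return 100.0 * effect_sd / gap_sd


# Hispanic reading effect (0.323 SD) against the White-Hispanic reading gap (0.8 SD).
reading_hispanic = share_of_gap_closed(0.323, 0.8)
# Black math effect (0.254 SD) against the White-Black math gap (1.1 SD).
math_black = share_of_gap_closed(0.254, 1.1)

print(f"{reading_hispanic:.1f}%")  # 40.4%
print(f"{math_black:.1f}%")        # 23.1%
```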
Table 3 presents the TOT effects of 4K on the outcomes. The results show that full participation in 4K in the district might be associated with third-grade reading scores of 0.113 SD without controls and 0.129 SD in the model with controls. These figures are equivalent to an increase of approximately 15% in the district’s reading test scores. The presumed maximum effect was slightly higher than the ITT effects in Table 2 because the ITT results incorporated nonparticipating children. The TOT effect of 4K with the full participation of Hispanic children on Hispanic students’ reading was 0.488 SD, representing a nearly 50% increase in proficiency and advanced achievement in reading. The TOT effects were greater than the ITT effects because the observed participation rates for students of color were lower than average. In addition, Table 3 shows that 4K was significantly associated with math achievement only for Black students, at 0.385 SD, representing a 40% increase in advanced and proficient math achievement. Supplemental Appendices E, F, G, and H, in the online version of the journal, present the results shown in Tables 2 and 3 with a complete set of variables.
Treatment-on-Treated Effects of Wisconsin 4K Program on District 3rd-Grade Academic Achievement
Note. M.E. stands for marginal effect. The dependent variables have been standardized to have a zero mean and a standard deviation of one. Marginal effects for subgroup outcomes, such as FRL and students of color, were determined by summing the coefficient of 4K and an interaction effect between 4K and the respective subgroup. Standard errors are clustered at the district level. Analyses were weighted based on the number of students in each district. Time-varying controls for districts encompass total enrollment; child poverty rates; per-pupil expenditure; median household income; the percentage of individuals with a college degree; the percentage of people of color; the count of 4-year-olds; the count of kindergarten students; the percentage of Black, Hispanic, and Asian students in kindergarten; and the percentage of neighboring districts that have implemented the 4K program. Full results can be found in Supplemental Appendices G and H in the online version of the journal.
*p < .05, ***p < .001.
Figure 1 presents the results of the event study analysis. District reading scores after implementation tended to be significantly higher than reading scores in the year before 4K implementation. The left panel of Figure 1 shows no clear dynamic effects (effects that grow or shrink over time) on reading achievement, which reduces the chance of biased estimates of the ITT effects. The effects of 4K on reading tended to be consistent from the 2nd year to the 8th year of 4K. The average coefficients for the years after 4K implementation were 0.114 SD for reading and 0.075 SD for math. Unlike the reading estimates, the math estimates were not statistically significant over time, consistent with the ITT results. Figure 1 also supports the parallel trends assumption because the coefficients for the years before 4K began were not significantly different from zero in either the reading or the math model.

Wisconsin 4K effects on district reading and math scores over time.
Table 4 presents the results of the robust staggered DID method proposed by Callaway and Sant’Anna (2021). The single parameters for the effects of 4K on reading scores ranged from .067 to .161 in the unconditional models and from .079 to .115 in the conditional models, all of which were similar to the ITT effects on reading scores. The effects of 4K on math ranged from −.015 to .077 in the unconditional models and −.054 to .037 in the conditional models, confirming the nonsignificant ITT effect on overall math scores. Figure 1 and Table 4 suggest that the ITT effects were robust to the problems inherent in the TWFE specification. Supplemental Appendix I, in the online version of the journal, shows that the ITT results were similar to the results from other models using different 4K implementation cut-off points. The falsification models with 3- or 5-year gaps instead of 4-year gaps between 4K implementation and outcomes did not detect significant relationships between 4K and the outcome variables (Supplemental Appendix J in the online version of the journal). The subgroup analyses, which excluded seven districts with full-day pre-K, did not produce any significant changes to the main findings, despite a slight reduction in the overall impact of 4K on reading and math scores.
Aggregated 4K Treatment Effect Estimates From the Staggered Difference-in-Differences Model Suggested by Callaway and Sant’Anna (2021)
Discussion
The current study found that Wisconsin’s UPK was positively associated with district-average third-grade reading achievement but not math achievement. The positive association with reading scores was concentrated among students from economically disadvantaged families and Hispanic students. The 4K program was also positively related to the math scores of Black students. The analyses using district 4K enrollment rates suggest that full participation in 4K could increase academic achievement more than the observed effects, particularly for Hispanic and Black students, who tend to enroll at a lower rate than their White peers. These findings add evidence to the pre-K literature that Wisconsin’s UPK, featuring high accessibility and a mostly part-day format, could have sustained positive effects on academic achievement. Furthermore, the findings by subject and subgroup provide a more nuanced and contextual understanding of UPK. The following sections discuss several key findings.
Average UPK Effects on Third-Grade Academic Achievement
The association between Wisconsin’s UPK program and district reading achievement, an effect of about 0.1 SD in both the ITT and TOT analyses, was notably smaller than other reported findings, which typically ranged from 0.2 SD to 0.4 SD in medium-term reading scores (Amadon et al., 2022; Cascio & Schanzenbach, 2013). Although making direct comparisons between UPK programs in different states is challenging owing to variations in metrics and study periods, the modest effect sizes in Wisconsin could be attributed to two possible reasons. First, certain characteristics of Wisconsin’s UPK program, such as a moderate state commitment to quality assurance and funding constraints leading to predominantly part-time schedules, offer insights into these results. For instance, the UPK program in Oklahoma, which has been shown to have a more significant impact on student reading outcomes, with an increase of around 0.4 SD, primarily features full-day schedules and adheres to stricter state regulatory benchmarks (Friedman-Krauss et al., 2021). The duration of instruction, particularly in full-day pre-K programs, plays a crucial role in program efficacy, and full-day programs often have a more significant impact on school readiness than do part-day programs (Atteberry et al., 2019). The state’s funding structure invariably influences local districts to favor part-day programs. These programs, with their restricted instructional hours, may not yield results comparable to those in states such as Oklahoma. Moreover, the part-time care arrangements of these pre-K programs may limit their appeal to some parents, thus influencing which children are enrolled.
Additionally, the lack of statewide regulations concerning classroom features such as teacher-to-student ratios or class sizes may prompt local districts to overpopulate classrooms and occasionally to integrate 3- and 5-year-olds into the same classrooms. The literature has suggested that high structural quality benchmarks are indispensable for fostering process quality, enhancing teacher responsiveness, fostering bonds between educators and students, and facilitating tailored instruction (NICHD Early Child Care Research Network, 2002). Oklahoma set a commendable standard with rigorous regulations, such as maintaining a 1:10 adult-to-student ratio and capping class sizes at 20 students (Friedman-Krauss et al., 2021). Nevertheless, data limitations make it difficult to determine the degree to which variations in the regulatory benchmarks of state UPK policies correlate with the observed effect sizes.
Second, the nature of the counterfactual settings may shed light on the observed effect of the Wisconsin UPK program. There is an absence of specific data detailing the proportion of 4-year-old children transitioning from informal childcare or parental care to structured 4K classes, but the available statistics suggest that nearly 30% of these children previously attended center-based childcare. This implies that not every child transitioned from informal childcare settings to 4K classes upon the initiation of the UPK program. For those who previously had access to high-quality childcare in a counterfactual setting, the impact of the policy may be marginal. Consequently, there is a compelling need for future research to determine the extent to which structural disparities and counterfactual settings contribute to the modest effects observed in Wisconsin’s UPK program.
An interesting finding of this study is the stronger association between Wisconsin’s 4K program and advancements in reading achievement than in mathematical achievement. This trend aligns with empirical research and meta-analyses that have consistently found that the benefits of pre-K programs tend to be more pronounced for literacy than for math skills (Barnett et al., 2018; Cascio, 2021). Concurrently, national studies have indicated a relative scarcity of mathematical instruction within pre-K curricula (Cross et al., 2009). Wisconsin also places a heightened emphasis on reading and literacy, with the recommendation that literacy subjects constitute 30% of the curriculum but an absence of specific guidelines regarding the allocation of time for math subjects. The predominantly part-day schedules of these programs, which allocate considerable time to noninstructional activities such as lineups, snack breaks, and cleaning, further curtail opportunities for math instruction. Children are frequently exposed to environments that promote English communication skills with peers and adults during noninstructional periods, in contrast with their limited exposure to math. In contrast, other UPK programs, such as Georgia’s UPK, which has been shown to have significant effects on medium-term mathematical achievement, deliberately prioritize math instruction times (Early et al., 2019).
Artz and Welsch (2016), examining an earlier phase of Wisconsin’s UPK from 1997–1998 to 2005–2006, when 4K enrollment reached approximately 30%, detected a marginally significant positive effect on fourth-grade math outcomes but no discernible impact on reading scores. To investigate this inconsistency, the present study used an ITT model with a subsample of year-district observations spanning the period 2002–2003 to 2005–2006. The subsample analysis revealed that the point estimate of the 4K effect was larger for third-grade math scores (β = 0.077, p = .219, N = 1,162) than for reading scores (β = 0.054, p = .419, N = 1,162). However, neither effect was statistically distinguishable from zero. Although imprecise, these estimates are consistent with the possibility that 4K may have been more conducive to math achievement than reading during the early implementation years, specifically from 1997–1998 to 2001–2002. Artz and Welsch (2016) reported that the effects of 4K on math were principally localized within districts characterized by lower achievement levels. Synthesizing these observations, the differential impact of UPK on math achievement may be attributable to the unique contextual characteristics of the districts that implemented 4K prior to 2000. However, it is noteworthy that the current study, which used a more expansive dataset, found that the overall UPK program was not associated with district-average math achievement.
Heterogeneous Effects by Subgroups
Despite a modest association with overall reading scores and a null effect on math scores, the results of this study demonstrated more pronounced effects of the Wisconsin 4K program for students from disadvantaged backgrounds. Specifically, the effect sizes were approximately 0.3 SD for Hispanic students’ reading scores, 0.17 SD for children from low-income families in reading, and approximately 0.25 SD for Black students in math.
These heterogeneous effects may indicate substantial disparities in counterfactual learning environments, which in turn foster different degrees of alignment with school readiness and diverse opportunities for skill development elsewhere (D. H. Bailey et al., 2020; Cascio & Schanzenbach, 2013). For example, 4K serves as an advantageous setting for English language acquisition before entering elementary school, particularly for children who primarily speak a different language at home and have not previously attended center-based childcare (Watts et al., 2023). Similarly, 4K programs have been found to have a more significant impact on Black children in contexts where the home culture may prioritize domains other than mathematical skills (Tudge & Doucet, 2004) or where their alternative childcare options would be of lower quality (Chaudry et al., 2011).
In light of evidence regarding the reading achievement of Black children in the United States, it is essential to consider broader explanations for the varying sustained effects on academic achievements. The role of the Wisconsin 4K model in supporting reading development among Black children, given the alignment of their primary language with that of the school, invites a nuanced analysis of its impact within the context of systemic challenges, including economic disparities and educational inequities. It is not clear why Wisconsin 4K was not found to be related to Black students’ third-grade reading scores at a statistically significant level. Given the context of social and economic marginalization faced by Black children, the lower quality of school environments may hinder the persistent effects of the program for these children (Iruka et al., 2022). Nonetheless, these findings suggest that the Wisconsin 4K model could be instrumental in significantly reducing academic achievement gaps by compensating for the lack of stimulating environments.
Sustained Effects in Third Grade
This study provides evidence that the impacts of Wisconsin’s UPK extend beyond the initial year, implying that its benefits persist through at least the third grade. Two distinct hypotheses emerged when interpreting these outcomes, each offering an explanation for the observed medium-term effects. On the one hand, the program’s influence on reading may remain consistent across grades and continue to be sustained in subsequent grades. On the other hand, the notable policy impact observed in the third grade might not yet capture the true fadeout effect but, rather, result from a substantial increase at the end of the program. For instance, it might be reasonable to anticipate a systematic decline in the effects of 4K by the third grade. From this perspective, the influence of the 4K initiative on third-grade reading skills might gradually taper off, potentially becoming negligible over more extended durations, such as into the fourth and fifth grades and beyond. Therefore, future research probing the effects of Wisconsin’s UPK on subsequent grades could provide valuable insights into the most suitable explanation for whether the impact is sustained or experiences a gradual fadeout.
Policy Implications
The findings of this study have several important policy implications for the implementation of large-scale UPK programs. First, the results underscore a significant policy consideration for communities grappling with the initiation of expansive UPK programs. Many communities face challenges in implementing large-scale, high-quality, full-day 4K programs due to political barriers and taxpayer reservations (Zigler et al., 2011). In situations characterized by constrained resources and public reluctance to fund UPK, communities could initially consider deploying a more economically feasible program, mirroring the strategy adopted by numerous districts in Wisconsin.
This study provides compelling evidence that a thin and wide approach can improve a district’s overall academic achievement in the medium term, with larger effects for children from disadvantaged backgrounds than for those from nondisadvantaged backgrounds. For communities with a pressing need to cater to students with less preparation, such as those with higher levels of poverty or in rural areas, a cost-effective large-scale UPK could be an appropriate starting point. Subsequently, as resources become available and community buy-in strengthens, these communities can pivot toward a more comprehensive full-day program. The model adopted by Wisconsin offers valuable insights for states in comparable situations, such as Indiana, where pre-K infrastructure is nascent; and both Minnesota and Ohio, where public pre-K programs currently serve only 10% of 4-year-olds (Friedman-Krauss et al., 2021). An incremental approach driven by pragmatic considerations can enable communities to provide meaningful early educational experiences while respecting budgetary constraints and evolving public sentiment.
Second, enhancing the quality of UPK programs appears to be a pivotal factor in magnifying their benefits for all students irrespective of their socioeconomic backgrounds. The observed null effects for White students and for children not from low-income families suggest that UPK does not significantly augment learning environments for these specific demographic groups. This aligns with an extensive body of literature that has similarly found the benefits of UPK to be predominantly concentrated among children from disadvantaged families (Blau, 2021; Duncan et al., 2022). A full-day instruction format could be a promising strategy for enhancing the overall impact on all participants and offering more equitable childcare benefits across social strata (Atteberry et al., 2019). Ultimately, increasing the overall quality of the program to the extent that it surpasses the stimulation provided by home or alternative environments could lead to benefits for all children (Cascio, 2021).
Finally, the data show that 4K enrollment rates tend to be lower among students of color, which has been consistently reported in other states. Research has suggested that traditionally underrepresented groups and lower-income families encounter obstacles that impede their ability to enroll in and attend public pre-K programs (Barnett, 2010). The demonstrable increase in the effect size from ITT to TOT for low-income students and students of color in this study indicates the opportunity to derive further benefits from 4K by enhancing access to serve more disadvantaged student populations. The part-day structure of Wisconsin’s 4K may contribute to this enrollment gap, as it may conflict with the schedules of working parents. These families could face the additional burden of arranging supplementary childcare during non-4K hours, thus incurring extra costs and possibly necessitating complex transportation logistics (Pilarz et al., 2019). Consequently, some families might find it more convenient and economical to bypass 4K entirely and opt for informal childcare arrangements or parental supervision. Moreover, the presence of language barriers, compounded by a lack of translation services, can further exacerbate enrollment disparities (Hill et al., 2019). Policy adjustments such as introducing full-day programs or providing ancillary services, such as transportation and translation, could address the specific barriers associated with part-day programming.
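As a purely illustrative sketch of why a gap between ITT and TOT estimates can signal room for further gains from broader access, consider the standard no-show (Bloom) adjustment. This is not necessarily the estimator used in this study, and the symbols and numbers are hypothetical, chosen only for illustration. Suppose a district-level ITT effect $\tau_{\mathrm{ITT}}$ arises entirely from the share $p$ of children who enroll, with no spillover onto nonparticipants (an assumption made here for simplicity). The implied effect per participant is then

```latex
\[
\tau_{\mathrm{TOT}} = \frac{\tau_{\mathrm{ITT}}}{p},
\qquad \text{e.g., } \tau_{\mathrm{ITT}} = 2,\ p = 0.5
\;\Rightarrow\; \tau_{\mathrm{TOT}} = \frac{2}{0.5} = 4 .
\]
```

Under this simplification, the lower the enrollment share of a group, the more the ITT estimate understates the effect on those who actually participate, so raising enrollment would be expected to move the district-level effect closer to the per-participant effect.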
Limitations
This study has several limitations that warrant acknowledgment. First, although the district-level analysis provides unique insights from a policy perspective, potential measurement errors may undermine its comparability to previous research using student-level data. Moreover, using district-wide percentages to evaluate proficiency levels may not accurately reflect the subtleties of student performance given the diverse range of high and low scores. This mode of representation can give rise to an ecological fallacy, in which conclusions drawn from aggregate data are wrongly applied to individuals, because analyses at different levels may lead to divergent conclusions. Thus, it is prudent to interpret the findings strictly in terms of district-level changes and to exercise caution in extending such interpretations to individual student benefits. Further studies could investigate the congruence between district- and student-level findings.
Second, the statistical analysis in this study focused on districts that implemented 4K between 2002 and 2014. The study excluded districts that adopted 4K either before 2002 or after 2014. Consequently, the generalizability of the findings is confined to the analyzed districts, restricting extrapolation to other Wisconsin districts. For instance, the study omitted Milwaukee Public Schools, a pioneer in offering full-time 4K and Wisconsin’s largest district, characterized by rich diversity and pronounced achievement gaps. The impact of 4K in Milwaukee could differ considerably within the counterfactual framework. By extension, the effects of 4K could be relatively muted in districts that embraced 4K later and in affluent and suburban districts (Yang, 2024). Subsequent studies should explore the heterogeneity of the 4K effects, considering factors such as the timing of 4K adoption and district-specific attributes, such as location, socioeconomic backdrop, and the local childcare market.
Third, although the data do not reveal significant disparities in student demographics before and after the policy’s implementation, potential threats to internal validity remain and pose challenges in verifying causal treatment effects. A prominent concern is the possible migration of students across district boundaries during the posttreatment phase. Parents may strategically enroll their children in neighboring districts offering 4K programs, only to return to their original districts for subsequent elementary education. This was evident in the 4K enrollment rates, which exceeded 100% in the data. Student migration may have introduced a downward bias to the observed associations by attenuating the distinctions between treatment districts that had adopted the 4K programs and those that had not yet implemented them. Unfortunately, this study was constrained by the unavailability of individual-level student data to track transitions across districts. Future research should aim to obtain more detailed tracking information, allowing for a more nuanced estimation of the 4K effects by considering potential migration scenarios.
Finally, the interpretation of TOT effects was focused on presumed dosage effects rather than direct causal impacts on the treated—in this case, 4K participants. The percentage of 4K enrollment potentially reflects both impacts on actual participants and the peer effect on nonparticipants. Given the constraints of our data, isolating the peer effect is beyond the scope of this study. Future research using individual administrative data could separate the impact of 4K from the peer effect on individuals and suggest how the peer effect on nonparticipants varies depending on enrollment rates.
Conclusion
The study aims to demonstrate the effectiveness of the Wisconsin UPK program, which serves as an example of how moderately funded and less-intensive policies can still yield significant educational gains for disadvantaged groups and reduce academic achievement gaps. The findings contribute to the literature on ECE policies and emphasize the importance of considering program features and contexts when designing and implementing these policies. Policymakers and researchers should focus on the nuanced design and implementation of ECE programs that not only increase their accessibility but also enhance the expected benefits across diverse participant groups by scrutinizing counterfactual settings and program features. Such an approach, grounded in evidence and tailored to the multifaceted realities of children’s early educational experiences, has the potential to maximize the effectiveness of ECE policies in increasing academic achievement. The study’s exploration of the Wisconsin UPK model serves as both a contribution to and a call for a more comprehensive and context-aware understanding of the dynamics shaping the effectiveness of early education policies.
References
