Abstract
Using Utah state data, this study examines the association between language program type (dual language immersion [DLI], sheltered English instruction [SEI], and English as a second language [ESL]) since first grade and the third-grade basic literacy skills of Spanish-speaking English learners (ELs) in the United States. We employ propensity score matching (PSM) to generate matched samples using child and family factors known to be associated with children’s early literacy: child sex, immigrant status, unhoused status, special education status, child met Dynamic Indicators of Basic Early Literacy Skills (DIBELS) benchmark at start of first grade, as well as family income and parental language (DLI & ESL, n = 380; DLI & SEI, n = 380; SEI & ESL, n = 550). Regression models comparing early literacy outcomes for each matched group indicate small trends, based on effect sizes, in favor of DLI programs when compared to other program types, although differences were not statistically significant. Implications for policy and practice related to bilingual education are addressed.
The aim of this study is to identify the type of language programs associated with the early literacy of Spanish-speaking English learners (ELs) in the United States. There are a variety of language program types available to ELs that range from bilingual models to fully English-only models—a range that includes, but is not limited to, dual language immersion (DLI), transitional bilingual education (TBE), heritage language support, sheltered English instruction (SEI), full immersion, partial immersion, and English as a Second Language (ESL) (U.S. Department of Education, 2012). Evidence that these programs support the academic achievement of ELs, specifically of Spanish-speaking ELs, is mixed at best (Garcia, 2018; Marian et al., 2013; National Academies of Science, Engineering, and Medicine, 2017; Slavin et al., 2011; Steele et al., 2017; Umansky et al., 2016; Valentino & Reardon, 2015; Watzinger-Tharp et al., 2018). Unfortunately, given the variability of samples, methods, and contexts of the research examining these effects, it is challenging to determine if the mixed evidence is due to methodological variability or substantive differences in program effects.
To examine the relation between the types of language programs ELs receive and their literacy outcomes, we focus on Spanish-speaking ELs, given the significant size of this population of students in the United States (U.S. Department of Education, n.d.). In particular, we center our investigation on Utah and its three most common language program types for Spanish-speaking ELs: DLI, SEI, and ESL. Utah is also one of the first states to systematically offer DLI instruction, which is thought to benefit ELs by leveraging students’ home language (e.g., Spanish, Chinese) and the language of instruction (i.e., English in the United States) to support long-term learning, alongside other language programs for ELs (i.e., SEI and ESL).
Hence, state-level data from Utah allow us to explore the following research question: Among students identified as Spanish-speaking ELs, are there differences in third-grade basic early literacy skills by language program (DLI vs. SEI; DLI vs. ESL; SEI vs. ESL)? In the following sections, we (a) describe the educational context in Utah, focusing on the three most common instructional models that serve ELs in the state, (b) provide a brief history of language program options and implications for ELs in the United States, and (c) briefly summarize prior research conducted with ELs receiving DLI instruction in Utah.
Utah as a Case Study
The demographic characteristics of ELs in Utah mirror national trends. As of the 2014–2015 academic year (i.e., the first observed year in this study), nearly 25% of Utah’s students were from a racial or ethnic minority group, with 16% of the total population identified as Hispanic (Utah State Board of Education Department of Data and Statistics, personal communication, 2023), slightly lower than the national rate, with 25.9% of the U.S. school-age population identified as Hispanic (National Center for Education Statistics, 2023). The share of school-age children with one or more foreign-born parents in Utah versus the United States showed a similar pattern, with 18% in Utah and 26% nationwide (Sugarman & Geary, 2018).
To better prepare its graduates to compete in a global market economy, Utah passed an initiative in 2008 to establish 100 DLI programs across the state by 2015. With clearly articulated program guidelines and statewide funding, this effort dramatically increased the number of DLI programs across the state. With high public demand, there were already 116 DLI programs across five languages with more than 25,000 children enrolled by fall 2014 (Leite, 2013). Consequently, the three dominant programmatic offerings for EL students in Utah are DLI, SEI, and traditional ESL instruction, which are the primary focus of the present study.
Using state-level educational data made available by the Utah State Board of Education, we examine the potential differential effect of three language instruction programs most commonly used to serve ELs within the state of Utah. Using these state-level data as a case study, we are able to examine different language instructional programs across the entire state, such that findings are not driven by local differences in policy and/or population. We believe this approach is a significant advantage over prior research that may be limited by sample selection.
Brief History of Language Education in the United States
The Bilingual Education Act of 1968 was the first legislation to recognize the rights of linguistic minorities in the United States and provided federal funds to encourage states to experiment with new models of language instruction (Nieto, 2009). While this legislation did not require a specific instructional model, an amendment in 1974 required programs to articulate student goals and progress reports. Simultaneously, judicial decisions (Castañeda v. Pickard, 1981; Lau v. Nichols, 1974) recognized ELs’ right to equitable and accessible grade-level content and the right to English language support for English language acquisition (Gandara & Escamilla, 2016; Nieto, 2009), placing significant responsibility at the district level. In 1994, the Bilingual Education Act was reauthorized, reiterating the importance of English mastery among students with limited English proficiency with a stated preference for bilingual programs that develop bilingual proficiency in English and the student’s heritage language (Nieto, 2009). This resulted in the emergence of two-way bilingual programs (i.e., programs that include students who are native English speakers and native speakers of another language who are also identified as ELs, with formal instruction in both languages for all students). However, growing backlash throughout the 1980s and 1990s eliminated bilingual education in some states and resulted in the resurgence of English-only programs. In turn, some states adopted instructional models such as the SEI model, which focused primarily on rapid English language acquisition. By the start of the 21st century, the implementation of No Child Left Behind in 2001 resulted in the elimination of the Bilingual Education Act and an exclusive focus on testing in English (Escamilla et al., 2022; Goldenberg & Wagner, 2015). The changing policy priorities over time have resulted in the current array of programs available within and across states.
Three Types of Instructional Programs for ELs
The types of instructional programs employed to support English language acquisition vary, with different goals and target populations (for a review of different language models, see D. L. Baker et al., 2016; Serafini et al., 2022). In this study, we focus on three common types of instructional programs for ELs: ESL, SEI, and DLI. ESL and SEI programs are primarily considered “English-only” models, while DLI programs are considered bilingual programs (U.S. Department of Education, 2012). ESL programs focus on promoting the transition to English for children who primarily speak a different language at home or those who are labeled as having limited English proficiency (Francis et al., 2019; Gandara & Escamilla, 2016). Employing either a push-in or pull-out model, students receive targeted English-only instruction to expeditiously promote English-language proficiency. Students enrolled in ESL programs may do so in the context of a TBE program in which the home language is used to support the acquisition of academic content with the goal of moving toward English-only instruction (U.S. Department of Education, 2012). Similarly, SEI is an instructional model used in subject area classes that are typically taught in English but within which some or all students are ELs. The goal of this approach is to make subject area content more accessible while simultaneously promoting English language acquisition (Short et al., 2011). However, ESL and SEI programs, with their focus on promoting written and oral language skills in English, have been criticized as subtractive bilingual models, in that English language acquisition often occurs at the expense of the home language (C. Baker & Jones, 1998; Chang-Bacon, 2022; Wong Fillmore, 1991).
In contrast, the DLI model is designed to promote the development of bilingualism for all children in the classroom, including children who primarily speak a different language at home as well as monolingual English speakers. Bilingualism in DLI (and TBE) contexts involves the development of both English and the native language, as well as cultural competence across cultures (Gandara & Escamilla, 2016), even as they vary in their orientation toward bilingualism as a goal. The theoretical foundation for DLI models rests on an interdependence hypothesis and transfer theories, which suppose that the skills and knowledge present in a student’s home language form the foundation for and support development in a second language (Cummins, 1979; MacSwan & Rolstad, 2005). These models provide ELs with opportunities for academic development that leverage the resources of the home language while also supporting the development of English language skills (Boyle et al., 2016). For some DLI programs, the stated long-term goal is the development of biliteracy (Arteagoitia & Yen, 2020). The language of instruction, which varies substantially across these programs, is a “nonignorable factor” when considering the early literacy of ELs (Francis et al., 2019, p. 35).
Mixed Evidence for Bilingual and English-Only Instructional Models
The research evidence comparing outcomes for students in bilingual or English-only models varies significantly in samples, assessment types, inclusion criteria by age or grade, methodological approaches (e.g., experimental, quasiexperimental, observational), and type of outcome (e.g., oral language in English, reading in English, reading in home language). This variability challenges our ability to compare across studies or to make significant claims about the benefits of each program type. As noted by D. L. Baker et al. (2016), the variability in terms and definitions referring to programs serving EL populations “makes it difficult for parents and others to understand the differences among programs and their potential advantages or disadvantages” (p. 824). Nonetheless, in this section, we broadly review the evidence base for bilingual and English-only instructional models, with a primary focus on studies that focus on Spanish-speaking participants.
Substantial research evidence offers support for bilingual education models (e.g., August & Shanahan, 2006; Collier & Thomas, 2004, 2017; Greene, 1997; Willig, 1985; Wong-Fillmore & Valdez, 1986). Long-term evidence on participating in different types of EL instructional programs is mixed, with some suggesting that DLI instruction is positive for long-term ELA/reading and math (Marian et al., 2013; Valentino & Reardon, 2015), while others present a more complex picture describing different patterns of results by outcome—with some studies documenting nonsignificant but positive trends, others demonstrating no trends, and yet others documenting small but statistically significant results—for students in bilingual programs compared to those in English-only models (e.g., Garcia, 2018; Slavin et al., 2011; Steele et al., 2017; Umansky et al., 2016; Watzinger-Tharp et al., 2018).
As early as preschool, Spanish-speaking students participating in bilingual programs that support their home language appear to demonstrate language and early literacy gains in Spanish and no or limited differences in English that reduce over time compared to those in English-only models (Durán et al., 2015; Rojas et al., 2019). With similar program comparisons, others have found similar benefits in Spanish alongside benefits for students’ English language outcomes (Lara-Alecio et al., 2022). However, studies with different populations and assessments have found different patterns when comparing students in bilingual programs to those in English-only programs. For example, Nakamoto et al. (2012) found no meaningful differences in English and Spanish reading comprehension across first through third grade among Spanish-speaking students in Texas participating in different types of bilingual programs but found a slight advantage in English scores in students attending English programs compared to those receiving some form of bilingual instruction. Yet a different pattern was found by Lara-Alecio et al. (2022), who conducted a randomized controlled trial with Spanish-speaking students in Texas comparing K–3 early literacy outcomes in English for students in bilingual and English-only programs. Their findings suggest that the early literacy skills of Spanish-speaking students in the bilingual program lagged behind those of students in the SEI programs only at kindergarten entry. However, students in the bilingual program demonstrated greater growth than those in the English-only program through third grade, such that the difference in early literacy outcomes in English had reduced significantly by this time. Leveraging random assignment by lottery at pre-K or kindergarten entry into DLI programs, with a sample that included 17% Hispanic/Latinx students, Steele et al. (2017) found that students assigned to the dual language bilingual program improved their fifth- and eighth-grade reading and were less likely to remain EL classified in later grades, with greater benefits for students whose home language matched the language of the bilingual program. With a sample of dual language learners in Miami that included 86% Latinx students, Serafini et al. (2022) found that students in dual language bilingual programs performed better on standardized math assessments and obtained higher GPAs in fifth grade compared to students in models that lacked home-language supports (e.g., SEI, ESL). Diemer et al. (2021) show similar findings in a study comparing the early reading outcomes of EL students in Spanish dual language bilingual programs in the Midwest with those of EL students in more traditional English-only settings. Taken together, these studies suggest that dual language bilingual programs may be especially beneficial for Spanish-speaking ELs who enter these programs early in their formal school trajectory.
Several studies examining one type of English-only model, specifically SEI programs, have primarily focused on case studies of professional development and fidelity of implementation (e.g., Echevarria, Richards-Tutor, Chinn, & Ratleff, 2011; Echevarria et al., 2006; Piazza et al., 2020; Ringler et al., 2013; Short et al., 2012). Of the few studies to compare these English-only models to other models of instruction, one investigation of middle school science classrooms with populations of more than 5% EL students found no statistically significant differences, although trends suggested that students in SEI may have small benefits in some science outcomes compared to ELs receiving regular science instruction (Echevarria, Richards-Tutor, Canges, & Francis, 2011). As with this study in science classrooms, a significant portion of the research evidence on SEI English-only instructional models has taken place with students in middle and high school (e.g., Echevarria et al., 2006; Short et al., 2011, 2012). More broadly, research comparing EL students in monolingual English-only programs to those participating in bilingual programs in kindergarten and the early elementary grades finds that students participating in monolingual English-only programs have lower academic outcomes than those in bilingual programs (e.g., Serafini et al., 2022; Umansky, 2016).
Prior Studies of ELs in Utah
Although much commentary exists on the nature of the DLI initiative in Utah and the extent to which this program may benefit non-ELs over ELs (e.g., Freire et al., 2022; Valdez et al., 2016), few studies have examined students’ academic outcomes. To our knowledge, two studies to date have examined the academic performance of ELs as a function of program participation as early as third grade (i.e., Steele et al., 2019; Watzinger-Tharp et al., 2018). Steele et al. (2019) compared the average English Language Arts (ELA) performance of ELs enrolled in DLI programs with that of ELs enrolled in non-DLI programs in the same schools. Findings indicate that DLI students outperformed non-DLI students on ELA in third grade, suggesting benefits of DLI program participation for EL students. However, improved ELA performance was limited to comparisons between DLI and non-DLI program participants when enrolled in the same school and may not generalize to students across schools. In a study focused on math achievement, Watzinger-Tharp et al. (2018) used propensity score matching to compare non-DLI and DLI students in third and fourth grade. Study results suggested that DLI students being taught in the target language performed as well as non-DLI students on math achievement tests delivered in English in third grade, controlling for ELA achievement. The findings highlight the importance of using quasi-experimental methods, such as propensity score matching, to isolate language program effects across different outcomes.
While both studies suggest that participation in DLI programs may result in academic performance equal to or better than non-DLI programs, neither study included controls for the type of instructional program in which non-DLI EL participants were enrolled. In seeking to understand how to best serve ELs, it is important to know if these findings hold true when comparing children’s performance across the three most common programs offered to ELs (ESL, SEI, and DLI). Using propensity score matching to more effectively isolate the effects of each program type, the current study leverages Utah state data to examine differences in third-grade English literacy skills for Spanish-speaking ELs enrolled in different EL programs.
The Present Study
As the number of students in schools who are identified as English learners continues to rise, a major focus of education policy at the federal, state, and local levels has become how best to serve those students. Central to education policy decisions must be the recognition that ELs come to school with a diversity of linguistic, intellectual, and cultural assets (Callahan & Gándara, 2014) that are to be built upon rather than erased. Umansky and Porter (2020) have proposed a framework for EL education policy organized around three core principles: understanding student needs and assets, accessibility to high-quality instruction, and effective system conditions. In light of this current policy context, the present study focuses on the second core principle by comparing outcomes for Spanish-speaking ELs across the three dominant programmatic offerings for EL students found in Utah schools (DLI, SEI, and ESL). We focus on these three programs, each grounded in a different theory of language acquisition with its own goals, as previously outlined, given their historical and practical importance in schools. Our focus in conducting the present study is not to investigate the underlying mechanisms at play; instead, it is motivated by a need to identify the programs that best support the early literacy of Spanish-speaking EL students, using rigorous statistical methods.
Third grade marks a critical turning point in schooling as the elementary language arts curriculum shifts from an emphasis on learning how to read to using reading more independently to learn across content areas (Workman, 2014). Children who are still struggling readers in third grade tend to do less well academically moving forward (Hernandez, 2011; Karasinski & Anderson, 2017). This shift in performance expectations and curricular focus in third grade reinforces the importance of accessibility to effective instructional programs in the early grades for ELs. By examining the association between EL program participation in first and second grade (DLI, SEI, ESL) and children’s third-grade English literacy skills, we hope to discern the extent to which each of these programs is likely to produce skilled readers by third grade.
In this study, we employ quasi-experimental methods (i.e., propensity score matching [PSM]) to examine the following research question: among Spanish-speaking EL children in the state of Utah, are there differences in third-grade basic literacy skills when comparing different English language programs (DLI vs. SEI; DLI vs. ESL; SEI vs. ESL)? Because students are not randomly assigned to EL program participation, research that compares EL students in different types of programs may be limited by selection bias that can occur when confounding factors are not adequately accounted for in the analyses. PSM is a widely accepted method to reduce the potential bias resulting from group comparisons with observational data where random assignment to a condition is not possible. By applying PSM to large state data, where ELs were able to enroll in a variety of English language programs, our study focuses on exploring differences in third-grade literacy among EL students participating in different English language programs (DLI, SEI, ESL) in first and second grade.
Findings from this study have the potential to inform education policy specific to Spanish-speaking ELs, an important contribution given that ELs are a rapidly growing population in the United States, in some states making up 20% of the school-aged population (Sugarman & Geary, 2018). For ELs in particular, identifying instructional programs that strengthen English and their home language skills is critical, as achievement in both languages provides access to broader educational opportunities while maintaining connections to family and culture. Finally, although our current study focuses on Utah, our investigation can offer a useful case study for other states interested in offering an array of instructional options for ELs.
Method
Data Source and Participants
This study relies on educational data collected and maintained by the Utah State Board of Education (USBE). Our aim is to explore differences in Spanish-speaking ELs’ early literacy skills related to the language program experienced in the early elementary period. The present study is part of a larger project that follows the 2014–2015 first-grade cohort through fifth grade. We specifically included students if they met the following criteria: started first grade in 2014–2015; were identified as “Hispanic” and “Spanish-speaking”; and were enrolled in one of the three program types for English learners: DLI, SEI, and ESL. These inclusion criteria yielded a sample of 4,574 Spanish-speaking ELs in the State of Utah.
It is important to note that DLI refers to programs that include both Spanish and English instruction, including two-way bilingual and dual-immersion programs, with the goal of maximizing bilingualism; sheltered instruction refers to programs in which EL students receive modified content instruction in English in their general classroom with the goal of maximizing English proficiency; and ESL refers to push-in or pull-out programs in which children are removed from their general classroom instruction for targeted English-only instruction. Over 87% of the students experienced two full years of instruction or dosage in the program that they started in first grade. In order to see true program differences that were not biased by changing program type across years, the sample was limited to include only students who received the same EL program type in first and second grade (i.e., those with 2 full years of program exposure; N = 3,993).
Overall, 4.8% of the sample were participating in DLI programs in first grade starting in the 2014 academic year, 7.0% of the sample were participating in SEI programs, and 88.3% were enrolled in ESL programs at the start of first grade. Demographic information for the sample is available in Table 1. Approximately 48% of the sample were classified as female. In 2015, 90% of the sample was identified as low-income, nearly 2% of the sample were identified as first-generation immigrant children (i.e., children were foreign-born), and approximately 95% of households indicated that parents spoke Spanish in the home.
Table 1. Demographic Information for Spanish-Speaking EL Students Receiving 2 Years of Dosage (N = 3,993)
Note. DIBELS = Dynamic Indicators of Basic Early Literacy Skills.
Variable used in the propensity score matching process.
Measures
Measures include third-grade early literacy; EL program type; demographic indicators such as child sex, immigrant status, unhoused status, and special education status; and two first-grade indicators, one of early language and one of meeting reading benchmarks. We report published reliability information for measures, as we do not have student item-level data to compute reliabilities for this specific sample.
Early Literacy Outcome
Students’ reading proficiency and early literacy skills were assessed using the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) assessment (Good & Kaminski, 2002; Kaminski & Good, 1996), which provides criterion-referenced target scores or benchmark goals that represent adequate reading skills for a particular grade and time of year. DIBELS measures student reading proficiency in the areas of letter naming fluency (LNF), phoneme segmentation fluency (PSF), nonsense word fluency (NWF), oral reading fluency (ORF), and retell fluency (RTF) as a measure of comprehension. Alternate-form reliability has been reported at .88 for LNF, .88 for PSF, .87 for NWF, and ranging from .89 to .94 for ORF. Test-retest reliability on the ORF ranged from .92 to .97 (Good & Kaminski, 2002).
This study used students’ DIBELS Reading Composite Score at the start of first and third grade. These composite scores can be interpreted relative to benchmark goals to determine if a student’s score is at or above the benchmark, below the benchmark, or well below the benchmark (below the cut point for risk). At the start of first grade, a composite score of 113 or above is considered at or above benchmark, a score between 97 and 112 is considered below benchmark, and a score of 96 or below is considered well below benchmark; at the start of third grade, a composite score of 220 or above is considered at or above benchmark, a score between 180 and 219 is considered below benchmark, and a score of 179 or below is considered well below benchmark (Dynamic Measurement Group, 2010).
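The benchmark bands described above can be illustrated with a short Python sketch. The function and constant names are hypothetical and are not part of DIBELS or of the study's analysis code. The first-grade values (a benchmark goal of 113 and a below-benchmark band starting at 97) are taken from the text; treating every score under 97 as well below benchmark is an assumption of the sketch.

```python
# Hypothetical helper names; thresholds for the start of first grade are
# taken from the text (benchmark goal 113, below-benchmark band 97-112).
FIRST_GRADE_START = {"benchmark_goal": 113, "cut_point": 97}


def classify_composite(score, benchmark_goal, cut_point):
    """Return the benchmark band for a DIBELS composite score.

    Assumption: any score under `cut_point` is "well below benchmark".
    """
    if score >= benchmark_goal:
        return "at or above benchmark"
    if score >= cut_point:
        return "below benchmark"
    return "well below benchmark"


def met_benchmark(score, benchmark_goal):
    """Binary indicator: 1 = at or above benchmark, 0 = otherwise."""
    return int(score >= benchmark_goal)


example_scores = [125, 113, 112, 97, 96]
example_bands = [classify_composite(s, **FIRST_GRADE_START) for s in example_scores]
example_flags = [met_benchmark(s, FIRST_GRADE_START["benchmark_goal"]) for s in example_scores]
```

Third-grade scores could be classified the same way by passing the third-grade benchmark goal and cut point as arguments.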
EL Program Type
Districts across the state of Utah offer a variety of instructional programs designed for ELs. EL program classifications include “heritage language instructional program,” “native language support,” “transitional bilingual program,” “two-way bilingual program,” “dual language program,” “English-as-a-second language program,” “partial English immersion,” “structured English immersion,” “total English Immersion,” and “sheltered English instruction.” In the fall of 2014, most first-grade students receiving EL services were enrolled in one of the following program types: “two-way bilingual program,” “dual language program,” “English-as-a-second language program,” and “sheltered English instruction.” Given that less than 1% of the sample was in the various other program types, we focused our analysis exclusively on the programs represented in the following composite variable. DLI refers to programs that include both Spanish and English instruction, including two-way bilingual and dual-immersion programs, representing programs with the stated goal of maximizing bilingualism in both languages; SEI refers to programs in which EL students receive modified content instruction in English in their general classroom with the goal of maximizing English proficiency; and ESL refers to pull-out programs in which children are removed from their general classroom instruction for targeted English-only instruction.
Child-Level Factors
The following child-level sociodemographic factors were included in our analyses based on indicators provided by the state: child sex, immigrant status, unhoused status, and special education status. Child sex was coded as female (1 = female; 0 = male), given prior research suggesting differences in academic achievement by sex (e.g., Reilly et al., 2019). Additionally, we included child immigrant status, which indicates child foreign-born status (1 = child is foreign-born; 0 = child is U.S.-born). Child immigrant status was included based on research pointing to the association between students’ immigrant status and academic achievement (Palacios & Bohlmann, 2020). Based on parent report of their child’s housing status, state data identify children who experienced houselessness in their first-grade year (1 = experienced being unhoused; 0 = did not experience being unhoused). We included students’ unhoused status given that it is a student stressor that has strong implications for students’ educational experiences, where students experiencing housing insecurity tend to be in schools and neighborhoods with limited economic, institutional, and relational resources (e.g., Dhaliwal et al., 2021; Herbers et al., 2012). Using the first- and third-grade databases, the two special education indicator variables specified whether the child was receiving special education services in first and third grade, respectively (1 = received special education services; 0 = did not receive special education services). We accounted for students’ special education status, based on national reports finding disability status to be linked with academic underachievement (U.S. Department of Education, 2019).
In addition to child-level sociodemographic factors, we included two indicators of early language and reading in 2015. The early reading indicator was conceptualized as whether or not the child met the benchmark for DIBELS at the start of first grade (1 = at or above benchmark; 0 = not above benchmark). We used the WIDA ACCESS composite scale score in 2015 as a continuous indicator to account for EL students’ early listening, speaking, reading, and writing skills in English (Kenyon, 2006). The reported reliability for the overall composite score in first grade is .95 (Bauman et al., 2007).
Family-Level Factors
Family income level and parental language background were included as family-level factors in our analyses. Two family income indicators were created using the first- and third-grade enrollment data, respectively, which indicated whether the family was identified as being low-income (1 = low-income; 0 = not low-income) based on free/reduced lunch eligibility. Family income level was included in our analysis given that household income status is often linked to compromised academic achievement (Duncan & Hoynes, 2021), especially English language proficiency (Collins & Toppelberg, 2021). In addition, we included a home language use indicator in first grade to identify parents who spoke Spanish (1 = parent speaks Spanish; 0 = other). By doing so, we aimed to account for bilingual home language input, which has been found to contribute to children’s oral language skills (Cha & Goldenberg, 2015; Collins & Toppelberg, 2021).
Data Analysis
Propensity Score Matching
PSM is a commonly used method to improve researchers’ ability to assess causal relationships from observational data when random group assignment is not possible. The benefit of random assignment is the assumption that confounding variables are evenly distributed across groups. When random group assignment is not possible, group comparisons are often biased given the unequal distribution of confounders. PSM is a statistical method that uses observational data to mimic the nature of random assignment—it uses statistical procedures to generate groups that have similar distributions on confounding variables (Austin, 2011).
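As an illustration of the matching idea only, the following Python sketch pairs each "treatment" unit with its nearest unused "control" unit on a precomputed propensity score. It is a simplified greedy nearest-neighbor matcher, not an optimal matching algorithm and not a reproduction of any statistical package's procedure. The function name, the unit identifiers, the scores, and the optional caliper on the raw score scale are all hypothetical; software may define a caliper on a different scale.

```python
def greedy_match(treated, control, caliper=None):
    """Pair each treated unit with its nearest unused control unit.

    treated, control: dicts mapping a unit id to its propensity score.
    caliper: optional maximum allowed score difference for a pair.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # copy so the caller's dict is untouched
    pairs = []
    # Process treated units in score order so results are reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Nearest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no acceptable match; this treated unit stays unmatched
        pairs.append((t_id, c_id))
        del available[c_id]  # matching without replacement (1:1)
    return pairs


# Hypothetical propensity scores (estimated probability of "treatment").
treated_scores = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
control_scores = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}

matched = greedy_match(treated_scores, control_scores, caliper=0.10)
print(matched)
```

In this toy example the third treated unit has no control within the caliper and is left unmatched, which mirrors how a caliper can reduce the matched sample size.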
For this study, we generated matched samples using seven child and family factors known to be associated with the outcome of interest: children’s early literacy. Specifically, matching variables, measured at the beginning of first grade, included child sex, immigrant status, unhoused status, special education status, whether the child met the DIBELS benchmark at the beginning of first grade, family income, and parental language. PSM was conducted using PSMATCH in SAS/STAT 14.2, with a 0.25 caliper requirement and optimal matching, yielding a 1:1 match for children in each of our “treatment” and “control” groups. Before matching, there were 190 students in DLI, 278 in SEI, and 3,525 in ESL. To examine differences between all three EL program types, PSM was conducted three separate times. First, we used PSM to generate matched samples of students in DLI with those in ESL (n = 380). Next, we generated a matched sample of students in DLI with those in SEI (n = 380). Third, after removing the ESL students that were included in the DLI/ESL matched sample, we generated a matched sample of students in SEI and ESL (n = 550). All students in DLI were successfully matched with ESL students as well as with students in SEI. Three of the 278 students enrolled in SEI did not match with ESL students. All matched samples had strong covariate balance, well below the 0.25 caliper requirement. For the DLI/ESL sample, all covariates had a standardized mean difference of .00; standardized mean differences ranged from –.02 to .05 for the DLI/SEI sample and from –.02 to .01 for the SEI/ESL sample.
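To make the balance statistic concrete, the sketch below computes a standardized mean difference for a binary covariate in Python. It uses one common pooled-variance definition; the text does not state which formula the software applied, so this choice is an assumption. The function name and the small data vectors are hypothetical and are not drawn from the study data.

```python
from math import sqrt


def smd_binary(treated, control):
    """Standardized mean difference for a 0/1 covariate.

    Pooled-variance form:
    (p_t - p_c) / sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)
    """
    p_t = sum(treated) / len(treated)
    p_c = sum(control) / len(control)
    pooled_var = (p_t * (1 - p_t) + p_c * (1 - p_c)) / 2
    if pooled_var == 0:
        # Both groups are constant: 0 if they agree, undefined otherwise.
        if p_t == p_c:
            return 0.0
        raise ValueError("SMD undefined: no variance in either group")
    return (p_t - p_c) / sqrt(pooled_var)


# Hypothetical matched samples: 1 = low-income, 0 = not low-income.
dli = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]  # 7 of 10 low-income
esl = [1, 1, 0, 0, 1, 1, 1, 1, 0, 1]  # 7 of 10 low-income
sei = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # 8 of 10 low-income

print(round(smd_binary(dli, esl), 2))
print(round(smd_binary(dli, sei), 2))
```

Groups with identical proportions yield a difference of zero, and values near zero across all covariates are what a well-balanced matched sample would show.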
Regression Models
For each matched sample, we examined group differences in DIBELS at the beginning of third grade by estimating an ordinary least squares (OLS) regression model with the primary variable of interest, the program indicator (DLI vs. ESL, DLI vs. SEI, and SEI vs. ESL), and three covariates: first-grade WIDA composite scale score, third-grade income, and third-grade special education status. We did not include WIDA composite scale scores in the matching process because scores were collected in the spring of first grade, and we created the matched samples using data from fall of first grade. Hence, we opted to control for students’ WIDA composite scale scores to account for initial language differences that emerge as early as first grade. Family income and special education status at the start of first grade were included in the matching process. However, we also opted to control for both income and special education indicators in third grade, as these factors are likely to change over time and are known to be associated with student achievement. Given the longitudinal nature of our study, the analytic samples differed in size from the three matched data sets described above. Specifically, 43 students were removed from the DLI/ESL sample for not having WIDA scores (a variable measured at the end of first grade), 40 students were removed from the DLI/SEI sample for not having DIBELS scores, and 72 students were removed from the SEI/ESL sample for not having WIDA scores. We examined the nature of the missing data patterns; we detected no statistically significant moderate or strong correlations between missing and observed variables or between missingness on multiple variables. Moreover, the amount of missing data across all students and variables, in the three different matched samples, ranged from 2.1% to 3.2%. When the sample size is large and missing data are less than 10%, listwise deletion does not cause any more bias than imputation (Cheema, 2014).
Finally, to address the inherent clustering of students in schools, all OLS models were estimated with robust standard errors via the SURVEYREG procedure (i.e., we used third-grade school ID with the CLUSTER option) in SAS/STAT 14.2. OLS assumptions were examined; no violations were noted.
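The two quantities discussed above, the share of missing values and the analytic sample that remains after listwise deletion, can be sketched in a few lines of Python. The record layout and field names (`dibels_g3`, `wida_g1`, `sped_g3`) are hypothetical and chosen only for illustration; missing values are represented as `None`.

```python
def missing_rate(rows, fields):
    """Share of missing cells (None) across the given fields."""
    cells = [row[f] for row in rows for f in fields]
    return sum(v is None for v in cells) / len(cells)


def listwise_delete(rows, fields):
    """Keep only rows with an observed value for every field."""
    return [row for row in rows if all(row[f] is not None for f in fields)]


# Hypothetical records; field names and values are illustrative only.
students = [
    {"id": 1, "dibels_g3": 310, "wida_g1": 285, "sped_g3": 0},
    {"id": 2, "dibels_g3": None, "wida_g1": 301, "sped_g3": 0},
    {"id": 3, "dibels_g3": 295, "wida_g1": None, "sped_g3": 1},
    {"id": 4, "dibels_g3": 342, "wida_g1": 320, "sped_g3": 0},
]
model_fields = ["dibels_g3", "wida_g1", "sped_g3"]

print(f"missing cells: {missing_rate(students, model_fields):.1%}")
analytic = listwise_delete(students, model_fields)
print([s["id"] for s in analytic])
```

Note that a small share of missing cells can still remove a larger share of students, because a student is dropped when any single model variable is missing.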
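For readers unfamiliar with cluster-robust standard errors, the Python sketch below shows the basic sandwich estimator for a regression with a single binary predictor. It is a deliberately reduced illustration: it omits the covariates used in the models, applies no small-sample or degrees-of-freedom correction, and is not a reimplementation of the SAS procedure. The function name, scores, and school labels are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ols_cluster_se(x, y, cluster):
    """Fit y = b0 + b1*x and return (b0, b1, cluster-robust SE of b1).

    Basic sandwich estimator with no small-sample correction.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # "Bread": inverse of X'X for the design matrix [1, x].
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the score contributions (regressor * residual) within each cluster.
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, cluster):
        resid = yi - b0 - b1 * xi
        scores[g][0] += resid
        scores[g][1] += xi * resid

    # "Meat": sum over clusters of the outer product of the cluster scores.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    # Variance of b1 is the [1][1] element of inv @ meat @ inv.
    var_b1 = sum(inv[1][i] * meat[i][j] * inv[j][1]
                 for i in range(2) for j in range(2))
    return b0, b1, sqrt(var_b1)


# Hypothetical data: program indicator, reading score, and school label.
program = [1, 1, 1, 1, 0, 0, 0, 0]
score = [12.0, 14.0, 15.0, 17.0, 10.0, 12.0, 13.0, 15.0]
school = ["A", "A", "B", "B", "C", "C", "D", "D"]

b0, b1, se_b1 = ols_cluster_se(program, score, school)
print(b0, b1, se_b1)
```

In this toy data the clustered standard error of the program coefficient is 1.5, compared with about 1.27 when every student is treated as a separate cluster, which illustrates why ignoring the nesting of students in schools can understate uncertainty.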
Results
Results from each of the three OLS regression models are provided in Table 2. Although there are no statistically significant differences for any of the three group comparisons, that does not mean that there are no differences between the groups. In addition to reporting statistical significance (i.e., p values), we also report effect sizes, which capture the magnitude of the association (Sullivan & Feinn, 2012).
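As a brief illustration of how an unstandardized coefficient relates to a standardized one, the Python sketch below rescales a slope by the ratio of the predictor and outcome standard deviations. This is one common definition of a standardized regression coefficient; the exact standardization applied by the software is not described in the text, so treat the formula as an assumption. The function name and the data values are hypothetical.

```python
from statistics import stdev


def standardized_beta(b, x, y):
    """Rescale an unstandardized slope b to standard-deviation units."""
    return b * stdev(x) / stdev(y)


# Hypothetical data: program indicator and reading scores.
program = [1, 1, 1, 1, 0, 0, 0, 0]
score = [12.0, 14.0, 15.0, 17.0, 10.0, 12.0, 13.0, 15.0]

# With a single binary predictor, the OLS slope is the difference in group means.
mean_1 = sum(s for p, s in zip(program, score) if p == 1) / program.count(1)
mean_0 = sum(s for p, s in zip(program, score) if p == 0) / program.count(0)
b = mean_1 - mean_0

beta = standardized_beta(b, program, score)
print(round(beta, 3))
```

Because the standardized coefficient is unit-free, it allows the magnitude of an association to be compared across models even when a p value does not reach a conventional threshold.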
Differences in DIBELS Scores at the Start of Third Grade by Program Type Comparing DLI and ESL, DLI and SEI, and SEI and ESL Based on Propensity Score Matched Samples, Adjusting for Nesting of Students in Schools
Note. Values reported in the table are unstandardized regression coefficients, standard error in parentheses, p values, and standardized regression coefficients. Due to missing data in the outcome, WIDA scores, and/or third-grade special education status, the analytic sample size differs from the matched sample size.
For the DLI vs. ESL program comparison the program indicator was coded as DLI = 1 and ESL = 0.
For the DLI vs. SEI program comparison the program indicator was coded as DLI = 1 and SEI = 0.
For the SEI vs. ESL program comparison the program indicator was coded as SEI = 1 and ESL = 0.
The first comparison was of third-grade DIBELS scores for students who received at least 2 years of instruction in DLI and ESL programs. Specifically, DIBELS scores at the beginning of third grade are predicted to be 0.12 standard deviations higher for students in DLI programs compared to those in ESL programs (p = .0637). We then compared third-grade DIBELS scores for students who received at least 2 years of instruction in DLI and SEI programs. A similar pattern was observed, such that DLI students were predicted to score 0.09 standard deviations higher than students in SEI programs (p = .1724). Although these differences are small, they are still worth noting, especially given the complex analytic procedures used in the comparisons, which rely on both PSM and robust standard errors. Finally, the difference between the two monolingual programs, SEI and ESL, was small in magnitude and did not meet the threshold to reject the null hypothesis.
Our rationale for including first-grade early language skills (i.e., WIDA overall scaled score), as well as third-grade income and special education, was to account for any potential differences between groups that may be related to these factors. In fact, first-grade WIDA scores are strong predictors of third-grade DIBELS, such that students with higher first-grade WIDA scores are likely to have higher third-grade DIBELS scores (DLI & ESL: β = .47, p < .0001; DLI & SEI: β = .46, p < .0001; SEI & ESL: β = .41, p < .0001). Being identified as receiving special education services in third grade, on the other hand, is associated with lower DIBELS scores at the start of that year (DLI & ESL: β = –.19, p < .0001; DLI & SEI: β = –.19, p < .0001; SEI & ESL: β = –.27, p < .0001).
Discussion
Our primary aim was to utilize propensity score matching to examine differences in third-grade basic literacy among Spanish-speaking EL students in the State of Utah enrolled in bilingual (i.e., DLI) and English-only (i.e., ESL and SEI) models. By relying on this rigorous quasi-experimental approach, we were able to find matching students in each program comparison that did not differ on observable characteristics. Further, to ensure the robustness of our models, we also included controls for first-grade WIDA scores, and indicators for third-grade student special education status and third-grade family income. Our findings indicated that there may be small differences in third-grade basic reading among EL students, such that those attending DLI programs in first and second grades may have a small advantage over students attending ESL programs. However, we are cautious in interpreting this small effect size, given that the difference fell just short of the threshold for rejecting the null hypothesis. The comparisons of third-grade basic literacy between EL students in DLI and SEI programs, as well as between SEI and ESL, yielded effect sizes that were small in magnitude and did not meet the statistical threshold to reject the null hypothesis.
This mixed pattern, with some indication of the benefits of bilingual education compared to English-only models when comparing DLI and ESL, but not when comparing DLI and SEI, mirrors the mixed pattern of findings evident in the extant literature regarding bilingual and English-only models (e.g., Garcia, 2018; Marian et al., 2013; Umansky et al., 2016; Valentino & Reardon, 2015; Watzinger-Tharp et al., 2018). We find this notable in a few ways. First, it is important to highlight that children attending bilingual programs are not falling behind in their early reading skills relative to their peers in monolingual programs overall. Moreover, at least in the case of comparisons between students in DLI and ESL, there is some suggestion that Spanish-speaking children in bilingual programs may be able to leverage skills in Spanish to support their reading development in English, as suggested by the linguistic interdependence hypothesis and transfer theories (Cummins, 1979; MacSwan & Rolstad, 2005). Second, although it is not possible to examine this hypothesis with the data available by third grade, it is reasonable to imagine that bilingual students are also benefiting from gains made in Spanish. Home language maintenance is important for various long-term student outcomes beyond the potential implications for supporting English language development, including preserving the ability to communicate with family and the protective mechanism of a strong cultural identity (e.g., National Academies of Science, Engineering, and Medicine, 2017). Third, it is interesting that we saw no evidence of meaningful differences between students in the bilingual program and students attending SEI programs. 
A key feature of SEI programs is the focus on providing instruction in a manner that is comprehensible to the student (Short et al., 2011), which may involve connecting concepts to students’ backgrounds and prior content area knowledge, as well as implementing an array of strategies to increase comprehensibility (e.g., using images or pictures, posters, diagrams, word walls, dictionaries, flow maps, or graphic organizers). It may be that these strategies provide underlying cognitive support for students similar to the way that the use of a student’s home language provides cognitive support for students in bilingual programs. Future research should examine the quality of instruction provided in DLI, SEI, and ESL programs using observational methods to elucidate whether students’ experiences vary significantly across these three settings.
Finally, the early indication of differences between students in bilingual programs and those experiencing English-only instruction through ESL programs is notable precisely because it occurs as early as third grade. All of the students in our study were identified as Spanish-speaking ELs as early as first grade. Given that oral English proficiency may take between 3 and 5 years and academic English proficiency may take between 4 and 7 years (Hakuta et al., 2000), we would expect academic English proficiency to emerge between third and sixth grades. Even finding small differences in basic English language reading skills by third grade may be a promising indicator that bilingual programs provide academic benefits for ELs over the long term.
Limitations
Despite the strengths associated with PSM in our study (primarily that it gives us some degree of confidence that underlying differences between students enrolled in each group are not driving any potential differences between programs in our third-grade reading outcome), there are also challenges associated with this approach. Limited power due to sample size constraints is one key challenge. For example, although each of our matched groups contained over 300 students, which provides adequate statistical power for standard OLS regression models, models estimated with robust standard errors require larger samples to maintain statistical power. Moreover, conducting this study with data from one state allows for an interesting examination of EL programs in a state that has significantly invested in EL education in the last decade. In fact, we argue that using state data strengthens the ecological validity of our study, as these are the programs and tools being used by educators within Utah to make decisions about their EL population. However, the degree to which our findings generalize to other states with different profiles of English learners is not clear. Similarly, our study focused exclusively on Spanish-speaking ELs. Although this makes for an important contribution to the literature given the significant size of the Spanish-speaking EL population throughout the United States, the findings may not generalize to other EL populations. It is especially important to replicate this type of quasi-experimental study using ecologically valid data with other language populations for whom bilingual programs exist but where the underlying components of the primary language differ significantly from English (e.g., differences in phonology, morphology, orthography, presence of cognates).
For example, recent meta-analyses of transfer among students learning Chinese and English suggest that even with limited shared features, students are able to transfer learning between languages (Yang et al., 2017).
The use of DIBELS reading composite score as our measure of reading proficiency also poses a limitation. DIBELS has been criticized by reading experts for reducing the complex processes required in reading to simple subskill components that may not accurately represent a child’s full reading ability and for placing greater significance on the speed of decoding over thoughtful reading (Goodman, 2006). Although DIBELS has undergone revisions to address some early critiques, criticisms remain, especially when used with ELs (Butvilofsky et al., 2021). Despite the criticism, DIBELS continues to be widely used by teachers and schools in making educational decisions for children and therefore has good ecological validity in research.
The present study focused on bilingual and English-only programs, specifically, DLI, ESL, and SEI programs. Though these are common programs utilized throughout the United States, they represent only three of many types of programs used to support EL students. The state of Utah relies on a wide variety of programs, including partial English immersion, total English immersion, transitional bilingual programs, and native language supports, in addition to DLI, ESL, and SEI. Unfortunately, due to small sample sizes, we were not able to include the full array of offerings in our analysis examining early reading outcomes in English by third grade for all program types. Future studies relying on data from other states with significant EL populations but with a different array of EL programs should consider replicating this type of analysis. Finally, we recognize the importance of assessing Spanish-speaking ELs’ early reading skills in both English and Spanish. Unfortunately, the state of Utah only collected Spanish language assessments in fourth grade and only for students in DLI programs. Despite the importance of assessing students across the full scope of their linguistic repertoires, we are not able to compare Spanish language outcomes for children across different EL programs. School districts investing in dual language programs should consider adding parallel assessments in the heritage language. For example, districts and schools using the DIBELS assessments in third grade should consider incorporating the Spanish language version of this assessment (i.e., Indicadores Dinámicos del Éxito en la Lectura [IDEL]; D. L. Baker et al., 2007) for all participating DLI students to allow examination of students’ full literacy repertoires in English and Spanish. This is especially important in contexts with the intended goal of developing students’ biliteracy skills.
Conclusion
Our aim with this study was to use propensity score matching to compare the third-grade basic literacy skills of Spanish-speaking EL students in Utah who were enrolled in bilingual (DLI) and English-only (ESL and SEI) programs to determine whether there were significant differences in early literacy outcomes among these students. We found some promising evidence to suggest that students in DLI programs have better third-grade reading skills compared to those in ESL programs. Additionally, the comparison between students in the DLI and SEI programs did not yield evidence of a significant difference in third-grade reading skills. Together, these findings suggest that students in bilingual programs are not underperforming relative to their fellow students in English-only programs, and in some cases may obtain higher reading skills by third grade. It may be that ELs’ early reading skills are stronger when they participate in instructional programs that focus on strengthening English as well as their home language skills. Instruction in both languages ultimately gives children access to broader educational opportunities while maintaining connections to family and culture.
Footnotes
Acknowledgements
We are grateful to the Utah State Board of Education, especially the team responsible for data and data privacy, who have been diligent and responsive partners in helping us access and understand the state data systems.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305B210008 to the University of Virginia. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
Open Practices
The data sharing agreement with the Utah State Board of Education (#USBE210073DA) restricts the authors from sharing the data with outside parties. Data requests should be directed to the USBE Department of Data and Statistics. Analysis files for this article can be found at
. SPSS files provide syntax for data preparation and variable creation; SAS files provide syntax for PSM and OLS analyses.
Authors
NATALIA PALACIOS is associate professor of education at the University of Virginia School of Education and Human Development, 417 Emmet Street South, Charlottesville, VA 22903.
NATALIE L. BOHLMANN is founding director at Interactions Matter, LLC., 4185 Cascina Way, Sarasota, FL 34238.
BETHANY A. BELL is associate professor of education at the University of Virginia School of Education and Human Development, 417 Emmet Street South, Charlottesville, VA 22903.
MIN HYUN OH is a postdoctoral fellow at the University of Virginia School of Education and Human Development, 417 Emmet Street South, Charlottesville, VA 22903.
YITONG YUE is a doctoral student at the University of Virginia School of Education and Human Development, 417 Emmet Street South, Charlottesville, VA 22903.
