Abstract
This U.S. study evaluated the effects of a reading intervention for emergent bilingual students with significant reading difficulties in Grades 6 and 7 within a multisite randomized controlled trial. Emergent bilinguals were randomized to a researcher-provided intervention (n = 171) or a business-as-usual comparison condition (n = 169). Results on a measure of word reading indicated significant differences favoring treatment after Year 1; however, there were no significant differences between groups on standardized measures of reading comprehension. Initial English vocabulary knowledge moderated the effect of treatment on reading comprehension at the beginning of the second year of intervention, indicating that students’ response to instruction varied as a function of their initial English language proficiency. The discussion focuses on interpreting these findings, with an emphasis on improving the effectiveness of interventions for secondary-grade emergent bilinguals with significant reading difficulties.
Despite influential findings showing that high-quality reading interventions can prevent reading difficulties among at-risk children in the primary grades (e.g., Mathes et al., 2005), many students demonstrate inadequate reading proficiency in the middle grades (Grades 4–9). In 2019, 27% of eighth graders performed below a basic level in reading (National Center for Education Statistics, 2020). Over the past two decades, studies investigating the effects of reading interventions for students with reading difficulties in the middle grades have consistently demonstrated low or no impact on reading comprehension outcomes (e.g., Cheung & Slavin, 2016; Donegan & Wanzek, 2021; James-Burdumy et al., 2012). Effects are consistently low when reading interventions are tested in rigorous randomized controlled trials (RCTs) using standardized measures of reading (Donegan & Wanzek, 2021). In a recent meta-analysis summarizing the effects of reading interventions for middle-grade struggling readers, Donegan and Wanzek (2021) found mean effect sizes ranging from g = 0.08 to 0.13 across reading outcomes. This was consistent with a previous review by Scammacca and colleagues (2015), which found an average effect size of 0.13 on standardized reading outcomes for struggling readers in Grades 4–12.
Few of these intervention studies focused on emergent bilingual (EB) students with reading difficulties in the middle grades, despite the expectation that EBs represent a growing subpopulation that may reach 25% of the total student population in the U.S. in the next decade (National Education Association, 2008). Richards-Tutor et al. (2016) conducted a meta-analysis examining the effects of reading interventions for EBs with reading difficulties (RD) in Grades K–12. The authors found that word reading interventions were beneficial, particularly when implemented in the early grades (K–1). They identified only three studies focused on improving reading comprehension in Grade 2 and beyond, determining that the effects were “minimal” across these studies with only a few exceptions (p. 164). Since the publication of the Richards-Tutor et al. review, additional studies have demonstrated the impact of interventions for improving word reading outcomes for EBs in the early grades (Dussling, 2020; Klingbeil et al., 2020). Relevant for older EBs, Vaughn, Martinez, et al. (2019; Williams & Vaughn, 2020) recently examined the effects of interventions for high school EBs with RD. Findings revealed small, statistically nonsignificant effects on standardized measures of word reading, fluency, and reading comprehension, underscoring the critical need for research on interventions with older EBs. Yet few studies have focused on improving reading outcomes, particularly reading comprehension, among EBs with RD in the middle grades (Capin et al., 2020).
Hall and colleagues (2017) examined the effects of classwide reading approaches in Grades 4 through 8 on the reading performance of EBs, reporting an overall mean effect size of g = 0.35 across reading measures. However, the mean effect size was near zero (g = 0.01) for standardized reading comprehension measures. Most studies examined the effectiveness of teaching reading comprehension strategies (e.g., self-questioning, summarizing) and addressed the underdeveloped language skills of EBs through intensive vocabulary instruction. Across all measure types, programs that targeted both reading comprehension and vocabulary (g = 0.39) yielded larger gains in comprehension than studies that focused only on vocabulary (g = 0.08). We interpret these findings as suggesting that instruction for EBs that directly targets academic language and reading comprehension is likely to be associated with improved reading outcomes. However, none of the interventions tested with older EBs focused on word reading, perhaps because they were not designed specifically for struggling readers.
The lack of rigorous research examining intensive interventions for EBs with RD, coupled with the limited success of classwide approaches in improving reading comprehension, underscores the need to develop and test reading interventions for EBs with significant RD in the middle school grades. Our research team sought to address this need by developing and testing the efficacy of an intensive, 2-year reading intervention, Reading Intervention for Students who are Emergent bilinguals with Reading Difficulties (RISE). The RISE intervention was designed to be both intensive and extensive, based on previous research with native English speakers showing that such interventions lead to positive changes for adolescents with significant reading problems. Vaughn et al. (2015) reported that a 2-year reading intervention implemented daily with ninth-grade students with significant reading problems yielded a strong, positive effect (g = 0.44) on a standardized measure of reading comprehension. Another study conducted by Vaughn and colleagues (2011) also reported significant effects on standardized measures of word reading (g = 0.28–0.44), text reading (g = 0.26–0.27), and reading comprehension (g = 0.52–0.56) after a second year of daily reading intervention. Both interventions targeted multiple components of reading (e.g., code- and meaning-based instruction), including multisyllabic word reading, vocabulary, reading fluency, and comprehension. However, these approaches have not been tested with EBs with RD.
Study Purpose
The purpose of this study was to conduct a multisite RCT to evaluate the effects of a comprehensive and extensive reading intervention for Spanish-speaking EB students with significant RD in Grades 6 and 7. One reason past interventions for native English speakers and EB students beyond the primary grades may have yielded small effects is that they did not address the word reading, vocabulary, and linguistic comprehension deficits that readers with significant reading problems present (Capin et al., 2023; Cho et al., 2019). Another explanation may be that the interventions lacked adequate instructional intensity (e.g., class sizes were too large, instruction was too brief) to fully address students’ difficulties. This may be particularly true for studies conducted with EBs, for whom few intensive interventions have been studied (Richards-Tutor et al., 2016; Roberts et al., 2022). To address the need for rigorous research for EB students with RD in the middle school grades, this study evaluated the effects of a 2-year, multicomponent intervention in a rigorously conducted, multisite RCT.
In our initial grant submission and preregistration, we hypothesized that students receiving the researcher-provided treatment would outperform students in the business-as-usual (BaU) condition on word reading and fluency at the end of Year 1. Informed by research indicating that more intensive and extensive interventions are needed for older students with significant reading comprehension difficulties (Scammacca et al., 2015; Vaughn et al., 2010, 2011), we hypothesized that there would be smaller differences on reading comprehension measures after 1 year of intervention and that significant between-group differences in comprehension would be evident after 2 years of intervention. We also hypothesized that initial vocabulary performance would not moderate students’ performance on code-based reading skills (i.e., word reading, reading fluency), but that students with higher levels of vocabulary knowledge at pretest would demonstrate greater improvement in reading comprehension than students with lower levels of vocabulary. This hypothesis was based on research showing that differences among subpopulations of bilingual students are associated with differences in academic growth trajectories, primarily for comprehension-related outcomes (Genesee et al., 2006; Hwang et al., 2015; Kieffer, 2008, 2011; Lawrence, 2012; Mancilla-Martinez & Lesaux, 2011). However, this relation has not been tested for EBs with significant RD, nor in studies where intensive interventions could affect trajectories. In summary, we addressed two research questions (RQs):
What are the effects of intensive reading intervention on reading outcomes for EB students in the middle school grades with significant RD after 1 and 2 years of reading intervention (RQ1)?
To what extent does students’ initial vocabulary moderate the effect of treatment on reading comprehension (RQ2)?
Method
Participants
Participants for this study came from six middle schools at two research sites in the southwestern United States. Of the six participating middle schools, four were urban and two were suburban. Enrollment at the schools ranged from 665 to 988 students (M = 812). With regard to student demographics, all of the schools received Title I funding and had high percentages of students eligible for free or reduced-price lunch (school M = 94.1%). Every school also had a higher proportion of students identified as limited English proficient (school M = 32%) than the state average, based on performance on the state’s English language proficiency exam (Texas English Language Proficiency Assessment System). The four urban schools were demographically similar, with student enrollment by ethnicity ranging from 91% to 98% Hispanic, 1% to 4% Black, and 1% to 3% White. The two suburban schools were more balanced in enrollment by ethnicity, ranging from 46% to 47% Hispanic, 33% to 35% White, and 13% Black.
Sixth- and seventh-grade students were selected for study participation based on four inclusion criteria: (a) failure on the state reading test in the prior school year, (b) current identification as limited English proficient or reclassification as English proficient in the past two school years according to performance on state tests, (c) a home language of Spanish, and (d) Mexican or Central American descent. We adopted the latter two criteria (home language of Spanish and Mexican or Central American descent) because our overall study had multiple goals, including the collection and analysis of epigenetic data, which required a relatively homogeneous sample with similar genetic admixture. We included students who were reclassified as nonlimited English proficient in the prior two school years because these students frequently require academic language support. Students identified as having disabilities and receiving special education services who met all other criteria were allowed to participate in the study. Our research team gave students who met these criteria, and their parents, the opportunity to consent to study participation. After obtaining consent, a total sample of 340 students (157 Grade 6 students and 183 Grade 7 students) was identified for participation.
Participating students who met inclusion criteria were blocked by school and randomly assigned to the RISE intervention condition (n = 171) or to continue in the BaU condition (n = 169). Students randomly assigned to the treatment condition received intervention in Years 1 and 2 (Grades 6 and 7), and students in both conditions were tested in the fall and spring of each year. We present student demographic information by condition in Table 1. Our study sample experienced attrition over the 2 years, particularly in Year 2 because of the COVID-19 pandemic. We discuss the impact of COVID-19 later in the Method section (see Impact of COVID-19 Pandemic) and describe attrition, and how we addressed it, in the data analysis and results sections.
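Blocking by school means that randomization happens separately inside each school, so each school contributes roughly equal numbers of treatment and comparison students. The sketch below illustrates that idea only; the seeded shuffle, the half-split rule, and the coin flip for odd-sized schools are our own assumptions, not the study's documented procedure, and the roster in the demo is hypothetical.

```python
import random
from collections import defaultdict


def block_randomize(students, seed=0):
    """Assign each student to "RISE" or "BaU", randomizing separately within each school.

    students: iterable of (student_id, school) pairs.
    Within a school the roster is shuffled and split in half; when the school
    has an odd number of students, a coin flip decides which arm gets the extra one.
    This splitting rule is an illustrative assumption.
    """
    rng = random.Random(seed)  # seeded so the assignment can be reproduced and audited
    by_school = defaultdict(list)
    for student_id, school in students:
        by_school[school].append(student_id)

    assignment = {}
    for school in sorted(by_school):  # fixed iteration order keeps results reproducible
        roster = sorted(by_school[school])
        rng.shuffle(roster)
        n_treat = len(roster) // 2
        if len(roster) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for position, student_id in enumerate(roster):
            assignment[student_id] = "RISE" if position < n_treat else "BaU"
    return assignment


if __name__ == "__main__":
    # Hypothetical roster: 340 students spread over six schools.
    demo = [(f"s{i:03d}", f"school_{i % 6}") for i in range(340)]
    arms = block_randomize(demo)
    for school in sorted({s for _, s in demo}):
        ids = [sid for sid, s in demo if s == school]
        n_rise = sum(arms[sid] == "RISE" for sid in ids)
        print(school, n_rise, len(ids) - n_rise)
```

Because each school is split on its own, school-level differences (e.g., in the BaU reading class offered) cannot end up concentrated in one condition, which is the usual motivation for blocking.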
Table 1. Demographic Information for Participants in the Extensive Reading Intervention Study.
Note. BaU = business-as-usual condition; INT = intervention condition; LEP = limited English proficient. All students identified as Hispanic.
RISE Intervention Procedures and Description
Students randomized to the RISE intervention condition received one period of instruction (approximately 45–50 min per day) from the research team for two consecutive school years. The RISE reading intervention was delivered by trained teachers to medium-sized groups of students (M = 6.3 students per group). All students, regardless of study condition, continued to receive their core instructional classes. Treatment students participated in the RISE intervention in place of an elective class. School staff reported that treatment students were removed from another reading intervention class, which they had been scheduled to take with school personnel because they had failed the state reading test the prior year. Students in the BaU condition remained in their school-provided reading intervention, which also met daily and had class sizes similar to those of the RISE treatment classes.
The RISE intervention is a multicomponent reading intervention intended to occur daily over two academic years, with an opportunity for students to extend their learning in the summer between Years 1 and 2. Grounded in the simple view of reading (Gough & Tunmer, 1986) and in previous research showing that EBs with RD in the middle grades experience both code-based and meaning-based difficulties (e.g., Capin et al., 2023; Cho et al., 2019), the intervention targets word reading, text reading fluency, academic vocabulary, and reading comprehension. This multicomponent approach was found to be effective with high school students in a prior study with native English speakers (Vaughn et al., 2015). We allocated more instructional time to code-based word reading and reading fluency instruction in the first semester of the first year of intervention. In subsequent semesters, instructional time shifted toward developing academic vocabulary and reading comprehension to help students read and understand complex grade-level texts. The rationale was that developing students’ word reading would improve their access to text, thus promoting fluency and vocabulary development (see Supplemental File F1 for an intervention timeline). During the summer between Years 1 and 2, treatment students received novels and accompanying reading guides that included comprehension and vocabulary support.
Word Reading
The intervention lessons included multisyllabic word reading and text fluency instruction taught with explicit instructional techniques. Advanced phonics instruction focused on teaching common vowel combinations (e.g., r-controlled vowels), affixes, and multisyllabic word reading. This instruction was coupled with “rule-breaker” instruction that focused on learning irregular words commonly found in texts. The word reading instruction followed an explicit instructional sequence of teacher modeling, guided practice with support, and independent practice (Archer & Hughes, 2013). Moreover, it was systematic: students practiced reading vowel patterns and affixes in isolation and then read words and texts containing those elements. Students also had opportunities to develop their word reading fluency by reading lists of regular and irregular words with partners, using timers to record their speed. Throughout word reading instruction, there was an emphasis on developing students’ English vocabulary knowledge, drawing on students’ Spanish language knowledge when possible.
Text Reading Fluency
Text reading fluency instruction included a repeated reading routine, which provided students with opportunities to read the same text multiple times. Teachers often modeled elements of fluent reading (i.e., accuracy, rate, and prosody) during the first read of the text. Students then worked in pairs to read the text two additional times. Teachers were asked to provide corrective feedback when needed. Although these activities focused primarily on developing efficient connected-text reading skills, teachers checked for word and concept understanding after reading to ensure that students maintained a focus on reading for understanding.
Reading Comprehension Instruction With Embedded Vocabulary and Self-Regulation Instruction
RISE teachers provided students with opportunities to practice reading informational and narrative texts. Although text complexity varied, most informational texts were at a middle school level and organized within unit topics focused on history or science. We selected middle school–level texts (which we called “stretch texts”) to challenge students to read beyond their current independent reading level and to learn grade-level content knowledge. Explicit routines were used to build students’ vocabulary and background knowledge before text reading to support EBs’ comprehension. Vocabulary instruction included the use of vocabulary graphic organizers with student-friendly explanations, visuals, and opportunities for peers to discuss key vocabulary in multiple contexts. This prereading instruction was brief and focused primarily on Tier 2 vocabulary words (Beck et al., 2013) that were likely to be found across grade-level texts. When key vocabulary appeared in texts, teachers and students discussed each word’s meaning in context to further support academic language development. Teachers also frequently built background knowledge on the day’s topic by showing images and short video clips and describing key information, particularly information that students needed to know to access the day’s text.
RISE teachers also used explicit instructional practices to teach reading comprehension strategies, such as summarizing, identifying the main idea, and making inferences. To support the gradual release of responsibility and students’ self-regulation, teachers taught students to use a self-monitoring document (see Supplemental File F2) that enabled students to set goals before reading, monitor their progress toward these goals while reading, and evaluate their progress after reading. Initially, the self-monitoring form identified goals for students related to monitoring text understanding and identifying unknown words and figuring out their meaning, as well as strategies for meeting these goals (e.g., identifying the main idea). Over time, students selected their own goals and strategies to apply during reading and were asked to reflect on their strategy use with a partner. To support students’ oral language development and reading comprehension, students worked on the structured text-based practices in cooperative learning groups. This gave students an opportunity to integrate language-focused instruction (e.g., vocabulary, syntax, and morphology) with higher-order cognitive processes (e.g., inference-making) while reading and acquiring knowledge from informational texts. As shown in the self-monitoring form, high priority was placed on teaching students to develop word consciousness (i.e., to recognize when they encounter an unknown word and try to learn its meaning) because EB students often do not know the meaning of words in grade-level texts. Teachers also explicitly taught students to highlight unknown vocabulary in texts and to use vocabulary learning strategies (looking within and around the word for clues) to try to determine the meaning of the unknown word.
Students were also provided opportunities to read age-appropriate novels. For example, students read Iqbal (D’Adamo, 2005), a fictionalized account of a Pakistani child who was sold into slavery and worked alongside other children in a carpet factory. Reading comprehension instruction was similar across text types, with one exception: for narrative texts, instruction also addressed key story grammar elements that are critical to narrative text structure (Bogaerds-Hazenberg et al., 2021), such as internal feelings, problems, and solutions.
Summer Book Club Program
To minimize potential summer learning loss and preserve the benefits of intervention experienced in Year 1, teachers provided students in the treatment condition with materials for a summer book club program during the summer between Years 1 and 2. Similar to home-based summer reading interventions, we provided students with texts and self-paced reading guides to support vocabulary acquisition and reading comprehension. These guides were aligned to the instructional practices students engaged with during the school year. Students selected three books from eight options that the research team chose based on a survey of students’ topic interests. The research team privileged texts identified as culturally relevant to Spanish-speaking children of Mexican and Central American ancestry because students had demonstrated higher engagement with such texts during the school year. Finally, we considered student reading ability and selected novels ranging from 600L to 840L in Lexile measure, which approximately corresponded to students’ reading performance on curriculum-based measures. To inform parents about the summer reading program, a parent literacy night was held at each campus.
Reading Interventionists
Reading interventionists were current or former credentialed public school teachers hired and trained by the research team to provide instruction. All interventionists were female, held at least a bachelor’s degree, and had at least 3 years of teaching experience. All reading interventionists completed an 8-hour training and then participated in practice sessions for 1 week. In addition to the initial training, members of the research team provided ongoing coaching and support while instruction was being implemented.
Fidelity
Treatment fidelity was measured as a multidimensional construct, with multiple indicators evaluating the extent to which the treatment was implemented as planned, including: (1) treatment adherence; (2) quality of instruction; (3) treatment dosage; and (4) treatment differentiation, an indicator of the differences between the treatment condition and the comparison condition. Adherence and quality were evaluated with a tailored observation instrument that our research team has used in over 20 randomized controlled trials. This observation instrument included adherence ratings on a 4-point Likert-type scale (1 = low adherence through 4 = good adherence) across nine instructional activities implemented within the intervention class. In addition, intervention quality was rated on eight global indicators of quality (e.g., quality of feedback, quality of behavior management) on a 7-point Likert-type scale (1 = poor quality through 7 = highest quality).
Intervention teachers audio-recorded all intervention lessons. We then randomly selected 10% of the total intervention lessons for coding, blocking by teacher. A trained team of coders established at least 90% interrater reliability before independently coding the selected lessons. Across intervention teachers and instructional components, adherence was generally good (median adherence by component = 3.2, SD = 1.0, range: 3.0–3.7). Across global quality indicators, interventionists generally scored high (mean quality across indicators = 5.8, SD = 1.2, mean range = 5.3–6.1).
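The two fidelity steps above, drawing 10% of lessons separately for each teacher and checking coder agreement, can be sketched as follows. The integer round-up rule (so every teacher contributes at least one lesson) and the use of simple percent agreement as the reliability index are our assumptions; the text does not state how the 10% was rounded or which agreement statistic was computed. Lesson IDs and teacher labels are hypothetical.

```python
import random
from collections import defaultdict


def sample_lessons_by_teacher(lessons, percent=10, seed=0):
    """Randomly draw about `percent`% of each teacher's lessons for fidelity coding.

    lessons: iterable of (lesson_id, teacher) pairs.
    Sampling is blocked by teacher so every interventionist is represented.
    The per-teacher count is rounded up (integer ceiling); this rounding
    rule is an assumption for illustration.
    """
    rng = random.Random(seed)
    by_teacher = defaultdict(list)
    for lesson_id, teacher in lessons:
        by_teacher[teacher].append(lesson_id)

    selected = []
    for teacher in sorted(by_teacher):
        ids = sorted(by_teacher[teacher])
        k = (percent * len(ids) + 99) // 100  # ceil(percent * n / 100) using integers only
        selected.extend(rng.sample(ids, k))
    return selected


def percent_agreement(ratings_a, ratings_b):
    """Share of items on which two coders assigned exactly the same rating."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)
```

For example, a teacher with 25 recorded lessons would contribute 3 coded lessons under this rule, and two coders who agree on 9 of 10 adherence ratings would reach 0.90, the threshold named above.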
BaU Instruction
We interviewed school staff to understand the counterfactual instruction that students randomized to the BaU condition received. School staff reported that all participating students were enrolled in a reading intervention class because all students had failed the state reading test the prior year. The reading instruction provided varied considerably across the six schools. Some schools reported that teachers used miscellaneous resources to teach reading, with an emphasis on study skills and test taking in preparation for the next state test. Other schools reported using SRA Corrective Reading (a multicomponent reading intervention) and Read to Achieve (a comprehension-focused reading program).
Impact of COVID-19 Pandemic on Study Implementation
During the second semester of Year 2 (late March 2020), all participating districts discontinued in-person instruction in response to the COVID-19 pandemic. As a result, in-person intervention was prematurely terminated for all students in the treatment condition. At the time of the unforeseen school closures, treatment students had completed approximately 70% of the intended lessons for the second year and about 85% of the intended intervention lessons across both years. We were unable to continue intervention lessons remotely for the same reasons that schools had difficulty engaging students in remote instruction. Participating schools reported that they attempted to provide instruction via videoconference in April and May of 2020, but most students did not participate because many families lacked the resources to engage in remote instruction. Some families did not have a device, such as a computer, to access videoconferencing. In other cases, families had devices, but no one was available during the school day to help their child access remote instruction.
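The two completion figures fit together under a simple assumption of our own, which the text does not state: that Year 1 lessons were delivered in full and that both years planned the same number of lessons, $N$. The share of the two-year total delivered is then

$$
\frac{1.00\,N + 0.70\,N}{2N} = \frac{1.70}{2} = 0.85,
$$

which matches the approximately 85% reported. If Year 1 delivery was incomplete or the two years planned different numbers of lessons, the two-year share would be weighted accordingly.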
Our research team was able to remotely administer a small number of assessments from May through June of 2020 to assess student learning postintervention. For posttesting, we selected assessments that could be easily implemented remotely. As shown in Table 2, these assessments included the GMRT-4 reading comprehension subtest, the TOSWRF-2, and the KTEA-3 SRF. We chose these assessments because they could be administered over the telephone, with the support of a parent, without deviating from standardized protocols. The assessment stimuli were packaged in nested envelopes so that student prompts and response packets remained inaccessible to students until the time of testing. Bilingual members of the assessment team, blind to students’ study condition, administered tests over speakerphone with an adult family member (e.g., parent) and the participating child. Although our preference would have been to administer the full battery of assessments at each time point, emerging research suggests that these types of measures can be reliably administered by telephone (e.g., Larner, 2021; Magimairaj et al., 2022).
Table 2. Test Administration Schedule in the Extensive Reading Intervention Study.
Note. X = data collected; U = unable to collect due to COVID-19; WJ-III = Woodcock Johnson-III; PV = Picture Vocabulary; GMRT-RC = Gates-MacGinitie Reading Test Reading Comprehension; KTEA-3 = Kaufman Test of Educational Achievement–Third Edition; LWR = Letter & Word Recognition; WRF = Word Recognition Fluency; WRMG = Word Reading in the Middle Grades; SRF = Silent Reading Fluency; TOSWRF-2 = Test of Silent Word Reading Fluency-2.
Measures
Members of the research team who were not involved in instruction and were blind to study conditions administered all assessments. The assessment team members received extensive training from a senior member of the research team and established 100% reliability in a mock testing session prior to administering assessments in the field. Further, all assessments were double-scored and -entered to ensure the reliable collection of data.
Kaufman Test of Educational Achievement (Third Edition) Letter Word Recognition (KTEA-3 LWR)
The KTEA-3 LWR subtest is an individually administered assessment of the student’s ability to accurately recognize letters and read words. The measure consists of letters followed by words of increasing difficulty. Split-half reliabilities for ages 13–15 are .96 to .97.
Word Reading in the Middle Grades (WRMG)
This researcher-developed measure consisted of 45 items administered individually. The first 15 items consisted of word parts (seven vowel digraphs and eight affixes). The next 15 items were multisyllabic words (ranging from 2 to 4 syllables) that followed common grapheme–phoneme correspondences, such as reproach and spaciously. The final 15 items were single-syllable and multisyllabic words that contained irregular grapheme–phoneme correspondences, such as although, instead, and rhythm. Test administrators asked students to identify the word parts and read the words.
KTEA-3 Word Recognition Fluency (KTEA-3 WRF)
The KTEA-3 WRF subtest is a test of word reading fluency. In this timed subtest, test administrators asked each student to read a list of single words aloud as quickly and accurately as possible during two 15-second trials. The test manual reports high alternate form reliability (.89) and concurrent validity (ranging from .71 to .90) for this age range.
Test of Silent Word Reading Fluency (2nd edition; TOSWRF-2)
The TOSWRF-2 is a timed measure of word reading fluency that can be administered individually or in groups. The test presents rows of words printed without spaces, and students are directed to draw lines separating as many of the unrelated words as possible in 3 min. Practice items are presented before administration to ensure that students understand the directions. The authors of the TOSWRF-2 report high test–retest reliability (.84 to .91).
KTEA-3 Silent Reading Fluency (KTEA-3 SRF)
The KTEA-3 SRF is a timed, individually administered test in which a student silently reads simple interrogative statements (e.g., “Is water dry?”) and marks yes or no for each. The items are intended to assess the student’s text-reading ability rather than their knowledge base, so the language and knowledge presented in the statements are simple. The KTEA-3 manual reports adequate split-half reliability (.82) and alternate-form reliability (.78).
Woodcock Johnson-III Picture Vocabulary (WJ-III PV)
We administered the WJ-III PV (Woodcock et al., 2001) subtest in the fall of Year 1 to estimate students’ general vocabulary knowledge in English. This information was used to characterize the sample and to assess the degree to which treatment effects varied with students’ English vocabulary knowledge (RQ2). The WJ-III PV is an individually administered measure of word knowledge and expressive vocabulary in which students identify the appropriate picture from among multiple choices. The WJ-III PV demonstrates high internal reliability (α = .81).
Gates-MacGinitie Reading Test Reading Comprehension Subtest
The Gates-MacGinitie Reading Test (GMRT-4) reading comprehension subtest is a timed, group-administered test of reading comprehension. Within a 35-min time limit, students read expository and narrative passages ranging from 3 to 15 sentences in length and answer three to six multiple-choice questions per passage. Items increase in difficulty as the student progresses through the test. Internal consistency reliability ranges from .91 to .93, and alternate-form reliability is reported as .80 to .87.
Analytic Methods
We used multilevel modeling (MLM; Raudenbush & Bryk, 2002) to evaluate the efficacy of the intensive reading intervention on reading outcomes and to estimate the moderating effect of initial vocabulary skills. We nested students within schools. We did not model classroom effects because middle school students have multiple teachers throughout the school day. Only students in the treatment condition received intervention from interventionists, which produces partial clustering. However, only two of six schools had more than one tutor in each year of the intervention, so we did not model clustering at the tutor level. Using R (R Core Team, 2020) and the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages, we estimated sample average treatment effects with restricted maximum likelihood (REML) estimation. We fit the following general model to estimate the treatment effect on all outcomes:
where $T_{ij}$ is the dummy-coded treatment variable and $\mathrm{PreC}_{ij}$ is the student-level pretest score centered on its school mean, $(\mathrm{Pre}_{ij} - \overline{\mathrm{Pre}}_{\cdot j})$. Restricted maximum likelihood estimation outperforms other estimators when variance components are estimated from a small number of clusters (McNeish & Stapleton, 2016). However, when random effects models could not be estimated because they failed to converge, we estimated fixed effects models instead. For these models with minimal clustering (i.e., when models reduced to fixed effects), we included a grand-mean centered pretest as a covariate. For each outcome, we modeled treatment effects after 1 year of treatment (Time Point 2), at the start of the second year of treatment (Time Point 3), and after 2 years of treatment (Time Point 4).
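For reference, a two-level random-intercept specification consistent with the variables defined above might be written as follows. This is a sketch under stated assumptions: only $T_{ij}$ and $\mathrm{PreC}_{ij}$ come from the text, while the outcome $Y_{ij}$, the coefficients $\beta_0$ through $\beta_2$, the school random intercept $u_{0j}$, the residual $e_{ij}$, and the variance symbols are assumed notation. The authors’ exact model (for example, whether a school-mean pretest term enters at Level 2) may differ.

```latex
Y_{ij} = \beta_0 + \beta_1 T_{ij} + \beta_2\,\mathrm{PreC}_{ij} + u_{0j} + e_{ij},
\qquad u_{0j} \sim \mathcal{N}(0, \tau_{00}), \quad e_{ij} \sim \mathcal{N}(0, \sigma^2),
\qquad \mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2}
```

Under this sketch, $\beta_1$ is the treatment effect for student $i$ in school $j$. When $\tau_{00}$ cannot be estimated (or the ICC is near zero), dropping $u_{0j}$ reduces the model to an ordinary fixed effects regression, which is the fallback described in the text.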
We fit the following model to evaluate the moderating effect of initial vocabulary scores:
where $\mathrm{WJ3C}_{ij}$ is the group-mean centered WJ-III PV score, $(\mathrm{WJ3}_{ij} - \overline{\mathrm{WJ3}}_{\cdot j})$. Again, when ICCs were equal to or less than .01, we fit fixed effects models with grand-mean centered covariates. To control the Type I error rate across multiple comparisons, we implemented the Benjamini–Hochberg procedure for controlling the false discovery rate (Benjamini & Hochberg, 1995). This procedure resulted in a critical value of
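The Benjamini–Hochberg step-up procedure named above can be sketched in Python as follows. The function name, the default false discovery rate q = .05, and the choice to report k·q/m as the “critical value” are assumptions for illustration; the text does not say which p-values entered the correction or which form of the critical value was reported.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure for controlling the FDR at level q.

    Returns (rejected, threshold): rejected[i] is True when hypothesis i is
    rejected, and threshold is k * q / m for the largest rank k whose sorted
    p-value passes (0.0 when nothing is rejected).
    """
    m = len(p_values)
    # Original positions ordered by ascending p-value (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected, k_max * q / m


# Hypothetical p-values for four contrasts (not the study's results).
print(benjamini_hochberg([0.04, 0.01, 0.03, 0.045]))
```

Because the procedure steps up from the largest passing rank, a small p-value that fails its own rank threshold (0.03 here) can still be rejected when a larger p-value passes at a higher rank.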
Results
Sample Attrition and Baseline Equivalence
There were no statistically significant differences in mean scores between study conditions at pretest (Fall of Year 1). Moreover, no pretest effect size difference exceeded 0.25, so baseline equivalence was established for the full sample according to What Works Clearinghouse (WWC, 2020) standards. However, attrition can undermine the assumption that the treatment and BaU groups do not differ on measured and unmeasured variables at baseline, thus threatening internal validity. We report attrition as the proportion of a randomized sample with missing outcome data. Differential attrition is the difference between the attrition rates of the treatment and BaU groups. Even in well-designed studies, overall and differential attrition can introduce bias by creating imbalance in a previously balanced design.
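The attrition definitions above reduce to simple proportions. The sketch below computes overall and differential attrition from randomized and analyzed counts. The randomized counts (171 treatment, 169 comparison) come from the study’s abstract; the analyzed counts in the usage line are hypothetical. Whether a given combination counts as “low” or “high” attrition depends on the WWC boundary tables, which this sketch does not reproduce.

```python
def attrition_rates(randomized_t, analyzed_t, randomized_c, analyzed_c):
    """Return (overall, differential) attrition as proportions.

    Attrition is the share of a randomized group with no outcome data;
    differential attrition is the absolute gap between the two groups' rates.
    """
    rate_t = 1 - analyzed_t / randomized_t
    rate_c = 1 - analyzed_c / randomized_c
    overall = 1 - (analyzed_t + analyzed_c) / (randomized_t + randomized_c)
    differential = abs(rate_t - rate_c)
    return overall, differential


# Randomized ns from the abstract; analyzed ns are hypothetical.
overall, differential = attrition_rates(171, 150, 169, 140)
print(f"overall = {overall:.3f}, differential = {differential:.3f}")
# prints: overall = 0.147, differential = 0.049
```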
To evaluate the threat of attrition, we followed the WWC (2020) recommendations by first calculating sample attrition and then assessing baseline equivalence when sample attrition was considered high. We present the rates of overall and differential attrition for each measure in Supplemental File S1. Although the WWC does not require reporting sample attrition by measure, we do so to be transparent about small differences in the number of students who took each test due to student absences (e.g., in Fall of Year 2, 286 students completed the GMRT whereas 292 students completed the KTEA-3 LWR). We treat these data as missing at random because the students who were absent during administration of the posttest battery had either transferred out of the participating school or could not be reached despite multiple contacts after schools closed due to COVID-19. Because missing data were largely confined to the dependent variable (less than 5% of cases were missing the covariate), we used listwise deletion to address missing outcome data (Jakobsen et al., 2017).
Applying the WWC (2020) standards, all measures collected in Spring of Year 1 had low levels of sample attrition based on the combination of overall and differential attrition. This suggests that attrition did not threaten the internal validity of the study after 1 year of instruction, and contrasts examining differences after 1 year of instruction met “WWC design standards without reservations” (WWC, 2020, p. 9). The combined rates of overall and differential attrition at the beginning of Year 2 also met WWC standards for a “tolerable level of potential bias” (p. 14), which suggests that attrition from children moving over the summer was also not a considerable threat to the study.
However, COVID-19 led to substantial attrition during Year 2, which reduced the size of the analytic sample in Spring of Year 2. For the three outcome measures administered in Spring of Year 2, the combination of overall and differential attrition did not meet WWC standards for an acceptable level of potential bias under either cautious or optimistic assumptions. For RCTs with high attrition, the WWC recommends that baseline equivalence between the treatment and comparison conditions be assessed in the analytic sample. We used Hedges’ g to index baseline differences for the analytic samples of the three outcome measures administered at Time Point 4. Because the TOSWRF-2 and KTEA-3 SRF were not administered at pretest (i.e., Fall of Year 1), we evaluated baseline equivalence for these analytic samples using the pretest KTEA-3 WRF, which, of all the measures collected at pretest, was the most similar to the TOSWRF-2 and the KTEA-3 SRF in design and in the construct assessed. Based on their effect sizes, the analytic samples for the KTEA-3 SRF (g = 0.52) and TOSWRF-2 (g = 0.38) did not satisfy equivalence standards (baseline effect size ≤ 0.25). This suggests that treatment and comparison students in the Spring of Year 2 analytic samples may not have been equivalent at baseline, although we encourage readers to consider this conclusion with caution. We evaluated baseline equivalence for the GMRT-4 RC using performance on the same measure at pretest (Fall of Year 1) and found that baseline equivalence was met (g = 0.25). Although we adjusted for pretest differences in the analytic sample when evaluating main effects, attrition represents a substantial concern when examining effects at the end of Year 2.
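A baseline equivalence check of this kind compares a standardized mean difference against fixed cutoffs. The sketch below assumes the WWC form of Hedges’ g (pooled standard deviation with the small-sample correction ω = 1 − 3/(4N − 9)) and the WWC cutoffs of 0.05 and 0.25 SD. The function names and the pretest summaries in the usage line are hypothetical, not the study’s data.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    omega = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample correction factor
    return omega * (mean_t - mean_c) / pooled_sd


def baseline_equivalence(g):
    """Classify a baseline difference using WWC-style cutoffs (0.05 and 0.25 SD)."""
    size = abs(g)
    if size <= 0.05:
        return "satisfied"
    if size <= 0.25:
        return "satisfied with statistical adjustment"
    return "not satisfied"


# Hypothetical pretest summaries for one analytic sample.
g = hedges_g(105.0, 15.0, 50, 100.0, 15.0, 50)
print(round(g, 3), baseline_equivalence(g))
# prints: 0.331 not satisfied
```

Under this classification, a difference such as g = 0.25 still satisfies equivalence only when the analysis adjusts for the pretest, which is why the main-effect models include a pretest covariate.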
Main Effects of Intervention (RQ1)
Our first research question addressed the effect of RISE on reading outcomes. Supplemental File S2 presents standard score means and standard deviations for the standardized outcome measures at pretest and posttest, as well as for the moderator variable (WJ-III PV). We used extended scale scores and growth scale scores for data analysis with these measures. We present extended and growth scale score means and standard deviations in Supplemental File S3 and unconditional model estimates in Supplemental File S4.
As shown in Tables 3A and 3B, results in the spring of Year 1 revealed a significant, positive effect favoring the treatment condition over BaU on the WRMG, β = 5.70, SE = 0.79, t(295.81) = 7.25, p < .001; g = 0.72, 95% CI [0.52, 0.92]. No significant treatment effects were found on standardized word reading measures: KTEA LWR (
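An effect size and its 95% confidence interval, as reported for the WRMG, can be related through a standard large-sample approximation to the standard error of g. The sketch below assumes that approximation and a normal critical value; the authors’ exact interval method is not stated here, and the group sizes in the usage line are hypothetical rather than the study’s analytic ns.

```python
import math
from statistics import NormalDist


def hedges_g_ci(g, n_t, n_c, level=0.95):
    """Approximate confidence interval for Hedges' g using a large-sample SE."""
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + g ** 2 / (2 * (n_t + n_c)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return g - z * se, g + z * se


# Hypothetical group sizes; not the study's analytic sample.
low, high = hedges_g_ci(0.72, 150, 148)
print(f"[{low:.2f}, {high:.2f}]")
```

The interval width is driven mostly by the group sizes, so a g of about 0.7 with roughly 150 students per condition yields an interval of about ±0.23.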
Estimating the Main Effect of Intervention Across Two Years: Fixed Effects.
Estimating the Main Effect of Intervention Across Two Years: Random Effects.
Note. Bolded values indicate statistically significant main effects (p < .05); ICC = intraclass correlation coefficient; g = Hedges’ g effect size; GMRT-RC = Gates-MacGinitie Reading Test–Reading Comprehension; KTEA-3 = Kaufman Test of Educational Achievement–Third Edition; LWR = Letter & Word Recognition; WRF = Word Recognition Fluency; WRMG = Word Reading in the Middle Grades; SRF = Silent Reading Fluency; TOSWRF-2 = Test of Silent Word Reading Fluency-2.
The Moderating Effect of English Vocabulary (RQ2)
Our second research question concerned the extent to which the effects of RISE on reading outcomes were moderated by students’ initial English vocabulary performance. Model estimates for the moderating effect of initial vocabulary scores on the relation between treatment and GMRT-RC are reported in Tables 4A and 4B. Initial vocabulary scores did not moderate treatment effects in Spring of Year 1 or Spring of Year 2. However, initial English vocabulary did moderate the relation between treatment and GMRT-RC scores at the beginning of Year 2, β = 0.56, SE = 0.19, t(268) = 2.983, p = .003. Using the Johnson–Neyman regions-of-significance technique (Johnson & Neyman, 1936), we found that the treatment effect was statistically significant where grand-mean centered initial WJ-III PV scores were below −9.96 or above 11.60. Figure 1 displays the interaction between treatment and initial vocabulary scores. Among students whose initial WJ-III PV scores fell below the lower boundary, those in the treatment condition had lower GMRT-RC scores than those in the BaU condition. Among students whose Fall of Year 1 WJ-III PV scores fell above the upper boundary, those randomized to the treatment condition had higher GMRT-RC scores than the BaU group in Fall of Year 2.
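The Johnson–Neyman boundaries follow from the conditional treatment effect in an interaction model. Assuming a treatment coefficient b_T and a treatment × vocabulary coefficient b_TV, the treatment effect at centered vocabulary score v is b_T + b_TV·v, and the points where it crosses significance solve a quadratic in v. The sketch below takes the coefficients, their sampling variances and covariance, and a critical t value as inputs. All names and the example numbers are hypothetical, not the study’s estimates.

```python
import math


def johnson_neyman(b_treat, b_inter, var_treat, var_inter, cov_ti, t_crit):
    """Johnson-Neyman boundaries for a treatment x moderator interaction.

    The conditional treatment effect at moderator value v is
    theta(v) = b_treat + b_inter * v, with sampling variance
    var_treat + 2 * v * cov_ti + v ** 2 * var_inter.
    Significance holds where theta(v) ** 2 >= t_crit ** 2 * Var(theta(v)),
    i.e., where a * v ** 2 + b * v + c >= 0 with the coefficients below.
    Returns (lower, upper, region), where region says whether the effect is
    significant "outside" or "inside" [lower, upper]; returns None when the
    significance status is the same for every v.
    """
    t2 = t_crit ** 2
    a = b_inter ** 2 - t2 * var_inter
    b = 2 * (b_treat * b_inter - t2 * cov_ti)
    c = b_treat ** 2 - t2 * var_treat
    if a == 0:
        raise ValueError("degenerate case (single linear boundary) not handled")
    disc = b ** 2 - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    lower, upper = sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))
    region = "outside" if a > 0 else "inside"
    return lower, upper, region


# Hypothetical estimates (not the study's values).
print(johnson_neyman(0.4, 0.05, 0.9, 0.0004, 0.0, 1.97))
```

A pattern like the one reported here, with significance below one boundary and above another, corresponds to the "outside" case, where the interaction is estimated precisely enough that the effect becomes significant at both extremes of the moderator.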
Estimating the Moderating Effect of Vocabulary on Reading Comprehension Across Two Years: Fixed Effects.
Estimating the Moderating Effect of Vocabulary on Reading Comprehension Across Two Years: Random Effects.
Note. Bolded values indicate statistically significant main effects (p < .05); ICC = intraclass correlation coefficient; g = Hedges’ g effect size; GMRT-RC = Gates-MacGinitie Reading Test–Reading Comprehension; KTEA-3 = Kaufman Test of Educational Achievement–Third Edition; LWR = Letter & Word Recognition; WRF = Word Recognition Fluency; WRMG = Word Reading in the Middle Grades; SRF = Silent Reading Fluency; TOSWRF-2 = Test of Silent Word Reading Fluency-2.

Figure 1. Interaction plot of RISE treatment and WJ-III Picture Vocabulary on GMRT reading comprehension.
Discussion
The goal of this randomized trial was to evaluate the efficacy of the RISE reading intervention on reading outcomes for Spanish-speaking EBs in middle school with significant, persistent RD. Following 1 year of intervention, we observed a statistically significant between-group difference (g = 0.72) on the WRMG word reading measure. There were no statistically significant differences on standardized measures of word reading, word reading fluency, or reading comprehension, though effect sizes typically favored treatment. Because of school closures related to the COVID-19 pandemic, at the end of Year 2 we used a novel remote testing protocol administered over the phone and facilitated by parents. We observed no statistically significant differences between intervention and BaU students on a measure of word reading fluency, a sentence verification task, or a reading comprehension measure. Given the uncertainty and difficulty associated with interpreting Year 2 effects amid unexpected and disruptive school closures, we divide our discussion by study year, beginning with Year 1 findings.
Effects of 1 Year of Intervention
We hypothesized that we would observe significant intervention effects on word reading and reading fluency following 1 year of intervention but would not observe significant effects on reading comprehension at that time. We observed statistically significant effects on the WRMG (g = 0.72) but not on standardized measures of word reading (g = 0.14) or word reading fluency (g = 0.01). There were no statistically significant effects on reading comprehension (g = 0.10). We interpret these effects as generally consistent with our Year 1 hypotheses, which were based on previous studies (Miciak et al., 2018; Vaughn, Roberts, Miciak, et al., 2019) suggesting that word reading and reading fluency may be more immediately malleable to intensive intervention than reading comprehension, which depends on broad domains such as linguistic comprehension and general knowledge.
The relatively large effect (g = 0.72) observed on the researcher-developed word reading measure is encouraging, particularly given the measure’s focus on advanced word reading skills and its greater sensitivity to small differences in performance at participants’ skill level. In the first semester of Year 1, the RISE intervention included a relatively intensive focus on building the code-based skills of word reading and connected text reading fluency, with tutors splitting instructional time evenly between code-based and meaning-based instruction. Code-based instruction included advanced phonics focused on vowel combinations, affixes, and multisyllabic word reading, as well as instruction in reading “rule-breaker” words (i.e., irregular, low-frequency words). The WRMG directly assessed students’ ability to apply these skills, with items consisting of word parts (i.e., vowel combinations, affixes), phonetically regular multisyllabic words, and irregular words. Although the effect on our standardized measure of word reading was not statistically significant, the observed effect size for word reading accuracy (g = 0.14) is consistent with previous meta-analytic effect sizes for word reading in intervention studies with struggling readers in Grades 4–12 (g = 0.14; Wanzek et al., 2013).
To put this effect size in the broader educational context, a 0.15 SD increase corresponds to the improvement on a standardized test that middle school students typically make over about three-quarters of a school year (Bloom et al., 2008). Put differently, this effect is similar to the effect of having a very good teacher rather than an average teacher for about 1 year (Hanushek, 2011). When this effect is interpreted alongside the large effect on the WRMG, we find the small, directionally appropriate (but not statistically significant) effect on word reading promising.
The lack of any discernible effect on word reading fluency is surprising and somewhat puzzling, as word and text reading fluency were a significant focus of the RISE intervention, particularly in Year 1. In previous studies with similar populations and interventions, we have observed significant effects on reading fluency (Miciak et al., 2018; Vaughn, Roberts, Capin, et al., 2019), although not always (Vaughn, Martinez, et al., 2019). In addition, an inspection of standard scores for both word reading and reading fluency following 1 year of intervention indicates that participants continued to demonstrate significant normative deficits in word reading, reading fluency, and reading comprehension. This is consistent with previous research showing that most older students with RD experience difficulties across code- and meaning-based skills (Cirino et al., 2013), including in samples restricted to EBs (Capin et al., 2023; Miciak et al., 2022). Although those deficits were most pronounced in reading comprehension, deficits in word reading and reading fluency should not be ignored, because accurate and effortless text reading is a gateway to comprehension (Stevens et al., 2017).
The absence of statistically significant between-group differences on a standardized measure of reading comprehension is consistent with previous research documenting the difficulty of remediating comprehension deficits in a single academic year (Clemens & Fuchs, 2022). Our findings suggest that this difficulty extends to EBs with significant RD. There are two potential (and partial) explanations. First, as noted above, most secondary students with RD experience deficits in foundational reading skills that make understanding grade-level texts difficult, and the EBs with significant RD in our study were not immune from these difficulties. Second, most secondary students with comprehension difficulties, particularly EBs, experience challenges in linguistic comprehension and general knowledge (e.g., Cirino et al., 2013). As students age, these domains become increasingly predictive of reading comprehension, particularly comprehension of texts featuring more complex language and ideas. However, these broad domains are not easily remediated; the sheer scope of skills and knowledge that fall under the umbrellas of linguistic comprehension and background knowledge belies hopes of quick remediation.
The preceding paragraph also formed the basis for our moderation hypothesis: that intervention participants with relatively higher baseline vocabulary levels would benefit from the intervention more than participants with more pronounced vocabulary deficits. We hypothesized that as the intervention improved foundational reading skills, these improvements would afford participants with relatively higher language proficiency greater access to text. We did not observe a statistically significant interaction in Spring of Year 1. However, in Fall of Year 2 (prior to Year 2 intervention), there was a significant interaction between treatment assignment and initial vocabulary on reading comprehension. Students in the treatment condition with higher initial vocabulary scores scored higher on the reading comprehension measure than students with similar vocabulary scores in the BaU comparison condition, though we interpret this finding with caution.
Year 2 Findings
School closures related to the COVID-19 pandemic interrupted the 2-year intervention. These closures occurred in March 2020, approximately 2 months before posttest. The research team designed a remote assessment battery that could be administered with the assistance of parents. Usable data were received from 151 participants. However, uptake differed across conditions (more treatment students participated in remote assessment), perhaps because intervention students and their families were more committed to the research study. In addition, an inspection of standard scores from Year 1 of the trial indicated that students who participated in the remote assessment battery tended to score higher at pretest. These factors complicate interpretation of Year 2 findings.
After controlling for baseline performance, we observed no statistically significant differences between students assigned to the treatment and comparison conditions. Effect sizes on the remote testing protocol were directionally appropriate and of a magnitude indicative of meaningful differences. However, the small sample size and differential attrition reduce confidence in interpreting these positive effects for treatment. Yet the converse must also be noted: data collected at the end of Year 2 do not persuade us to reject our initial hypotheses for the multiyear treatment. Students who enter middle school with significant RD, particularly EBs who face the compounding challenge of acquiring proficiency in a second language, are likely to require multiple years of instructional support to address multiple deficit areas. It is also possible that the target students’ language development needs were extensive enough that a schoolwide approach to language development would be necessary to adequately address language comprehension.
Limitations
As noted above, the findings from Year 2 should be interpreted with caution because of the disruptions introduced by the COVID-19 pandemic. Despite significant efforts, we were able to engage only 151 families (52% of the families who started the school year) in completing the remote testing protocol. This experience was consistent with that of many schools in high-poverty areas, which struggled to maintain contact with students when schools shifted to remote instruction (e.g., Stelitano et al., 2020). This led to high levels of attrition in our experiment, which introduces the possibility of bias in the Year 2 results. In addition, the remote assessment occurred in June 2020, following 3 months of inconsistent remote schooling. Our schools reported that a majority of the children in our study had no contact with their schools during this time. In many ways, this is equivalent to conducting posttest in the fall, following a full summer vacation. It is possible, then, that this delay in posttest administration affected the results, although this possibility is speculative. Despite these concerns, we are committed to publishing the results from Year 2 because the 2-year design represented our a priori, preregistered research plan. Also, although COVID-19 hampered our study of the efficacy of the RISE intervention, describing its influence on Year 2 of our school-based research sheds further light on the significant impact of the pandemic on schooling, particularly in high-poverty, urban environments.
Conclusion
We set out to evaluate the effects of a comprehensive, intensive reading intervention, provided for 2 years, for EBs with significant RD in the middle school grades. Based on the What Works Clearinghouse standards (WWC, 2020), we used procedures to maximize the rigor of our study, including (a) random assignment of students to study conditions; (b) well-defined sample selection criteria to allow for generalization of findings; (c) a sample size with sufficient statistical power; (d) thorough documentation of attrition and adjustment for differential attrition in analyses; (e) clear separation of intervention implementers and assessment data collectors to keep assessors blind to condition; (f) precise procedures for intervention implementation to allow for subsequent replication studies; (g) documentation of the core instructional components and the fidelity of implementation; (h) technically adequate, standardized measures of student outcomes, with multiple measures of constructs; (i) analyses that recognize the “nestedness” of educational data; (j) examination of a learner characteristic as a moderator of efficacy to explore variation in outcomes; and (k) preregistration of the study.
This study would have represented the largest RCT for an understudied population: Spanish-speaking adolescent EBs with RD. However, like many well-designed studies underway in March 2020, this randomized trial was interrupted when schools closed due to the COVID-19 pandemic. This interruption ended the intervention approximately 3 months early and forced the research team to implement a novel remote testing protocol facilitated by parents. At the end of Year 1 of the RCT, we observed statistically significant between-group differences on a researcher-developed word reading measure. However, there were no statistically significant differences on standardized measures of word reading, reading fluency, or reading comprehension at the end of Year 1 or Year 2. The Year 1 results suggest that single-year, small-group interventions may be insufficient to address the reading comprehension difficulties of middle school EBs in high-poverty settings. Future research may need to consider schoolwide approaches to instruction that ensure EBs with RD have opportunities to receive rich academic language instruction and engage in text-based activities across the school day. The Year 2 results must be interpreted with caution due to smaller sample sizes and differential uptake of the remote testing protocol.
Supplemental Material
sj-docx-1-rse-10.1177_07419325231213876 – Supplemental material for An Extensive Reading Intervention for Emergent Bilingual Students With Significant Reading Difficulties in Middle School
Supplemental material, sj-docx-1-rse-10.1177_07419325231213876 for An Extensive Reading Intervention for Emergent Bilingual Students With Significant Reading Difficulties in Middle School by Philip Capin, Jeremy Miciak, Bethany H. Bhat, Greg Roberts, Paul K. Steinle, Jack Fletcher and Sharon Vaughn in Remedial and Special Education
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This research was supported by grant P50 HD052117-07 from Eunice Kennedy Shriver National Institute of Child Health and Human Development at the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of Eunice Kennedy Shriver National Institute of Child Health and Human Development or the National Institutes of Health.
References