
Abstract

This U.S. study evaluated the effects of a reading intervention for emergent bilingual students with significant reading difficulties in Grades 6 and 7 within a multisite randomized controlled trial. Emergent bilinguals were randomized to a researcher-provided intervention (n = 171) or business-as-usual comparison condition (n = 169). Results on a measure of word reading indicated significant differences favoring treatment after Year 1; however, there were no significant differences between groups on standardized measures of reading comprehension. Initial English vocabulary knowledge moderated reading comprehension scores at the beginning of the second year of intervention, indicating that students’ response to instruction varied as a function of their initial English language proficiency. The discussion focuses on interpreting these findings with an emphasis on improving the effectiveness of interventions for secondary grade emergent bilinguals with significant reading difficulties.

Keywords

English language learners, middle school(s), reading instruction

Despite influential findings showing high-quality reading interventions can prevent reading difficulties among at-risk children in the primary grades (e.g., Mathes et al., 2005), many students demonstrate inadequate reading proficiency in the middle grades (Grades 4–9). In 2019, 27% of eighth graders performed below a basic level in reading (National Center for Education Statistics, 2020). Over the past two decades studies investigating the effects of reading interventions for students with reading difficulties in the middle grades consistently demonstrate low or no impact on reading comprehension outcomes (e.g., Cheung & Slavin, 2016; Donegan & Wanzek, 2021; James-Burdumy et al., 2012). Effects are consistently low when reading interventions are tested in rigorous randomized controlled trials (RCTs) using standardized measures of reading (Donegan & Wanzek, 2021). In a recent meta-analysis summarizing the effects of reading interventions for middle-grade struggling readers, Donegan and Wanzek (2021) found mean effect sizes ranging from g = 0.08 to 0.13 across reading outcomes. This was consistent with a previous review conducted by Scammacca and colleagues (2015), which found an average effect size of 0.13 on standardized reading outcomes for struggling readers in Grades 4–12.

Few of these intervention studies focused on emergent bilingual (EB) students with reading difficulties in the middle grades, despite the expectation that EBs represent a growing subpopulation that may reach 25% of the total student population in the U.S. in the next decade (National Educational Association, 2008). Richards-Tutor et al. (2016) conducted a meta-analysis examining the effects of reading interventions for EBs with reading difficulties (RD) in Grades K–12. The authors found word reading interventions were beneficial, particularly when implemented in the early grades (K–1). They identified only three studies focused on improving reading comprehension in Grade 2 and beyond, determining that the effects were “minimal” across these studies with only a few exceptions (p. 164). Since the publication of the Richards-Tutor et al. (2016) review, additional studies have demonstrated the impact of interventions for improving word reading outcomes for EBs in the early grades (Dussling, 2020; Klingbeil et al., 2020). Relevant for older EBs, Vaughn, Martinez, et al. (2019; Williams & Vaughn, 2020) recently examined the effects of interventions for high school EBs with RD. Findings revealed small, not statistically significant effects on standardized measures of word reading, fluency, and reading comprehension, underscoring the critical need for research on interventions with older EBs. Yet, few past studies focused on improving reading outcomes, particularly reading comprehension, among EBs with RD in the middle grades (Capin et al., 2020).

Hall and colleagues (2017) examined the effects of classwide reading approaches in Grades 4 through 8 on the reading performance of EBs, reporting an overall mean effect size of g = 0.35 across reading measures. However, the mean effect size was near zero (g = 0.01) for standardized reading comprehension measures. Most studies examined the effectiveness of teaching reading comprehension strategies (e.g., self-questioning, summarizing) and addressed the underdeveloped language skills of EBs through intensive vocabulary instruction. Programs that targeted both reading comprehension and vocabulary (g = 0.39) yielded larger gains in comprehension than studies that focused on only vocabulary (g = 0.08) across all measure types. We interpret these findings as suggesting that instruction for EBs that directly targets academic language and reading comprehension is likely to be associated with improved reading outcomes. However, none of the interventions tested for older EBs focused on word reading, perhaps because they were not designed specifically for struggling readers.

The lack of rigorous research examining intensive interventions for EBs with RD, coupled with the limited success of classwide approaches to improve reading comprehension, underscores the need to develop and test reading interventions for EBs in the middle school grades with significant RD. Our research team sought to address this need by developing and testing the efficacy of an intensive, 2-year reading intervention, Reading Intervention for Students who are Emergent bilinguals with Reading Difficulties (RISE). The RISE intervention was designed to be both intensive and extensive, based on previous research with native English speakers that shows these interventions lead to positive changes for adolescents with significant reading problems. Vaughn et al. (2015) reported that a 2-year reading intervention implemented daily with ninth-grade students with significant reading problems yielded a strong, positive effect (g = 0.44) on a standardized measure of reading comprehension. Another study conducted by Vaughn and colleagues (2011) also reported significant effects on standardized measures of word reading (g = 0.28–0.44), text reading (g = 0.26–0.27), and reading comprehension (g = 0.52–0.56) after a second year of daily reading intervention. Both interventions targeted multiple components of reading (e.g., code- and meaning-based instruction) including multisyllabic word reading, vocabulary, reading fluency, and comprehension. However, these approaches have not been tested with EBs with RD.

Study Purpose

The purpose of this study was to conduct a multisite RCT to evaluate the effects of a comprehensive and extensive reading intervention for Spanish-speaking EB students with significant RD in Grades 6 and 7. One reason past interventions for native English speakers and EB students beyond the primary grades may have yielded small effects is because they do not address the deficits that readers with significant reading problems present related to word reading, vocabulary, and linguistic comprehension (Capin et al., 2023; Cho et al., 2019). Another explanation may be that interventions did not include adequate instructional intensity (e.g., class sizes were too large, instruction was too brief) to fully address their difficulties. This may be particularly true for studies conducted with EBs, for whom few intensive interventions have been studied (Richards-Tutor et al., 2016; Roberts et al., 2022). To address the need for rigorous research for EB students with RD in the middle school grades, this study evaluated the effects of a 2-year, multicomponent intervention in a rigorously conducted, multisite RCT.

In our initial grant submission and preregistration, we hypothesized that students receiving the researcher-provided treatment would outperform students in the business-as-usual (BaU) condition on word reading and fluency at the end of Year 1. Informed by research that indicates more intensive and extensive interventions are needed with older students with significant reading comprehension difficulties (Scammacca et al., 2015; Vaughn et al., 2010, 2011), we hypothesized there would be smaller differences in reading comprehension measures after 1 year of intervention, and that significant between-group differences on comprehension would be evident after 2 years of intervention. We also hypothesized that initial vocabulary performance would not moderate students’ performance on code-based reading skills (i.e., word reading, reading fluency), but that students with higher levels of vocabulary knowledge at pretest would demonstrate greater improvement in reading comprehension than students with lower levels of vocabulary. This was based on research that shows differences among subpopulations of bilingual students are associated with differences in academic growth trajectories primarily for comprehension-related outcomes (Genesee et al., 2006; Hwang et al., 2015; Kieffer, 2008, 2011; Lawrence, 2012; Mancilla-Martinez & Lesaux, 2011). However, this finding has not been tested for EBs with significant RD, nor in studies where intensive interventions could affect trajectories. In summary, we addressed two research questions (RQs):

What are the effects of intensive reading intervention on reading outcomes for EB students in the middle school grades with significant RD after 1 and 2 years of reading intervention (RQ1)?

To what extent does students’ initial vocabulary moderate the effect of treatment on reading comprehension (RQ2)?

Method

Participants

Participants for this study came from six middle schools from two research sites in the southwestern United States. Of the six participating middle schools, four were urban and two were suburban. The enrollment at the schools ranged from 665 to 988 students, with a mean of 812. With regard to student demographics, all of the schools received Title I funding and had high percentages of students eligible for free or reduced-price lunch (school M = 94.1%). All schools had a higher proportion (school M = 32%) of students identified as limited English proficient based on their performance on the state’s English language proficiency exam (Texas English Language Proficiency Assessment System) than the state average. The four urban schools were similar demographically, with student enrollment by ethnicity ranging from 91% to 98% Hispanic, 1% to 4% Black, and 1% to 3% White. The remaining two suburban schools were more balanced in enrollment by ethnicity, ranging from 46% to 47% Hispanic, 33% to 35% White, and 13% Black.

Sixth and seventh grade students were selected for study participation based on four inclusion criteria: (a) failure on the state reading test in the prior school year, (b) current identification as limited English proficient or reclassification as English proficient in the past two school years according to performance on state tests, (c) a home language of Spanish, and (d) Mexican or Central American descent. We adopted the latter criteria (home language of Spanish and Mexican or Central American descent) as part of our overall study, which had multiple goals, including supporting the collection and analysis of epigenetic data, which required a relatively homogeneous sample with similar genetic admixture. We included students who were reclassified as nonlimited English proficient in the prior two school years because these students frequently require academic language support. Students identified as having disabilities and receiving special education services who met all other criteria were allowed to participate in the study. Our research team provided students who met these criteria and their parents an opportunity to consent to study participation. After obtaining consent, a total sample of 340 students (157 Grade 6 students and 183 Grade 7 students) was identified for participation.

Participating students who met inclusion criteria were blocked by school and randomly assigned to the RISE intervention condition (n = 171) or to continue with BaU condition (n = 169). Students randomly assigned to the treatment condition were treated in Years 1 and 2 (Grades 6 and 7) and students in both conditions were tested in the fall and spring of each year. We present student demographic information by condition in Table 1. Our study sample experienced attrition over the course of 2 years, particularly in Year 2 due to the COVID-19 pandemic. We discuss the impact of COVID-19 below in the Method section (see Impact of COVID-19 Pandemic) and present information related to attrition and how we addressed attrition in the data analysis and results section.
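The blocked random assignment described above can be illustrated with a minimal sketch. The article does not report the exact allocation procedure, so the equal split within each school, the coin flip for odd-sized blocks, the function name `block_randomize`, and the toy roster are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def block_randomize(students, seed=0):
    """Assign students to 'INT' or 'BaU' at random within each school block.

    `students` is a list of (student_id, school_id) pairs. Both identifiers
    are hypothetical placeholders, not fields from the study's data.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_school = defaultdict(list)
    for student_id, school_id in students:
        by_school[school_id].append(student_id)

    assignment = {}
    for school_id in sorted(by_school):
        ids = sorted(by_school[school_id])
        rng.shuffle(ids)
        # Assumption: split each block as evenly as possible; with an odd
        # block size, a coin flip decides which arm gets the extra student.
        extra = 1 if (len(ids) % 2 == 1 and rng.random() < 0.5) else 0
        n_int = len(ids) // 2 + extra
        for i, student_id in enumerate(ids):
            assignment[student_id] = "INT" if i < n_int else "BaU"
    return assignment


# Toy roster: three schools with 5, 6, and 7 students.
roster = [
    (f"s{school}_{k}", f"school{school}")
    for school, n in [(1, 5), (2, 6), (3, 7)]
    for k in range(n)
]
arms = block_randomize(roster, seed=42)
```

Randomizing within school keeps the two arms close to balanced at every site, which is consistent with the near-equal group sizes reported here (n = 171 and n = 169).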

Table 1.

Demographic Information for Participants in the Extensive Reading Intervention Study.

Demographic Variable	Full sample (N = 340)	BaU (n = 169)	INT (n = 171)
Age at the start of study
M (SD)	12.27 (0.73)	12.21 (0.7)	12.32 (0.76)
Grade at the start of study, n (%)
Grade 6	157 (46.18)	79 (46.75)	78 (45.61)
Grade 7	183 (53.82)	90 (53.25)	93 (54.39)
Gender, n (%)
Male	207 (60.88)	99 (58.58)	108 (63.16)
Female	133 (39.12)	70 (41.42)	63 (36.84)
Free or reduced-price lunch, n (%)
Yes	262 (77.06)	126 (74.56)	136 (79.53)
No	7 (2.06)	5 (2.96)	2 (1.17)
Missing	71 (20.88)	38 (22.49)	33 (19.3)
Special education, n (%)
Yes	51 (15)	23 (13.61)	28 (16.37)
No	289 (85)	146 (86.39)	143 (83.62)
LEP, n (%)
Currently identified as LEP	322 (94.71)	162 (95.86)	160 (93.57)
In monitoring	18 (5.29)	7 (4.14)	11 (6.43)

Note. BaU = business-as-usual condition; INT = intervention condition; LEP = limited English proficient. All students identified as Hispanic.

RISE Intervention Procedures and Description

Students randomized to the RISE intervention condition received one period of instruction (approximately 45 to 50 mins per day) from the research team for two consecutive school years. The RISE reading intervention was implemented by trained teachers to medium-sized groups of students (M = 6.3). All students regardless of study condition continued to receive their core instructional classes. Treatment students participated in the RISE intervention in place of an elective class. School staff reported treatment students were removed from another reading intervention class they were scheduled to take with school personnel because they had failed the state reading test during the prior year. Students in the BaU condition remained in their school-provided reading intervention, which also met daily and included similar class sizes as the RISE treatment classes.

The RISE intervention is a multicomponent reading intervention intended to occur daily over two academic years, with an opportunity for students to extend their learning in the summer between Years 1 and 2. Grounded in the simple view of reading (Gough & Tunmer, 1986) and based on previous research that shows EBs with RD in the middle grades experience code-based and meaning-based difficulties (e.g., Capin et al., 2023; Cho et al., 2019), the intervention targets word reading, text reading fluency, academic vocabulary, and reading comprehension. This multicomponent approach was found to be effective with high school students in a prior study with native English speakers (Vaughn et al., 2015). We allocated more instructional time to code-based word reading and reading fluency instruction in the first semester of the first year of intervention. In subsequent semesters, instructional time shifted to include more instruction focused on developing academic vocabulary and reading comprehension to help students read and understand complex grade-level texts. The rationale was that developing students’ word reading would contribute to their access to text reading, thus promoting fluency and vocabulary development (see Supplemental File F1 for an intervention timeline). During the summer between Years 1 and 2, treatment students received novels and accompanying reading guides that included comprehension and vocabulary support.

Word reading

The intervention lessons included multisyllabic word reading and text fluency instruction taught using explicit instructional techniques. Advanced phonics instruction focused on teaching common vowel combinations (e.g., “R” controlled vowels), affixes, and multisyllabic word reading. This instruction was coupled with “rule-breaker” instruction that focused on learning irregular words commonly found in texts. The word reading instruction followed an explicit instructional sequence of teacher modeling, guided practice with support, and independent practice (Archer & Hughes, 2013). Moreover, it was systematic: students practiced reading vowel patterns and affixes in isolation and then read words and texts with those elements. Students also had opportunities to develop their word reading fluency by reading lists of regular and irregular words with partners, using timers to record their speed. In all word reading instruction, there was an emphasis on developing students’ English vocabulary knowledge, drawing on students’ Spanish language knowledge when possible.

Text reading fluency

Text reading fluency instruction included the introduction of a repeated reading routine, which provided students with opportunities to read the same text multiple times. Teachers often modeled elements of fluent reading (i.e., accuracy, rate, and prosody) during the first read of the text. From there, students worked in pairs to read the text two additional times. Teachers were asked to provide corrective feedback when needed. Although these activities had a primary focus on developing efficient connected text reading skills, teachers checked for word and concept understanding after reading to ensure students maintained a focus on reading for understanding.

Reading Comprehension Instruction With Embedded Vocabulary and Self-Regulation Instruction

RISE teachers provided students with opportunities to practice reading informational and narrative texts. Although text complexity varied, most informational texts were at a middle-school level and organized within unit topics focused on history or science. We selected middle-school level texts (which we called “stretch texts”) to challenge students to read texts beyond their current independent reading level and learn grade-level content knowledge. Explicit routines were used to enhance children’s vocabulary and background knowledge before text reading to support EBs in understanding. Vocabulary instruction included the use of vocabulary graphic organizers with student-friendly explanations, visuals, and opportunities for peers to discuss key vocabulary in multiple contexts. This prereading instruction was brief and primarily focused on Tier 2-type vocabulary words (Beck et al., 2013) that were likely to be found across grade-level texts. When key vocabulary appeared in texts, teachers and students would discuss each word’s meaning in context to further support academic language development. Teachers also frequently developed background knowledge on the day’s topic by showing images and short video clips and describing key information, particularly information that students needed to know to access the day’s text.

RISE teachers also used explicit instructional practices to teach reading comprehension strategies, such as summarizing, identifying the main idea, and making inferences. To support the gradual release of responsibility and students’ self-regulation, teachers taught students to use a self-monitoring document (see Supplemental File F2) that enabled students to set goals before reading, monitor their progress toward these goals while reading, and evaluate their progress after reading. Initially, the self-monitoring form identified goals for students related to monitoring text understanding and identifying unknown words and figuring out their meaning, as well as strategies for meeting these goals (e.g., identifying main idea). Over time, students were able to select their own goals and strategies to apply during reading and were asked to reflect on their strategy use with a partner. To support students’ oral language development and reading comprehension, students worked on the structured text-based practices in cooperative learning groups. This provided students with an opportunity to integrate language-focused instruction (e.g., vocabulary, syntax, and morphology) as well as higher-order cognitive processes (e.g., inference-making) while reading and acquiring knowledge from informational texts. As shown in the self-monitoring form, a high priority was placed on teaching students to develop word consciousness (i.e., recognize when they come across an unknown word and try to learn the meaning of that word) because EB students often do not know the meaning of words in grade-level texts. Teachers also explicitly taught students to highlight unknown vocabulary when they came across it in texts and to use vocabulary learning strategies (look within and around the word for clues) to try to determine the meaning of the unknown word.

Students were also provided opportunities to read age-appropriate novels. For example, students read Iqbal (D’Adamo, 2005), a fictionalized account of a Pakistani child who was sold into slavery and worked alongside other children at a carpet factory. Reading comprehension instruction was similar across text types with one exception. For narrative texts, instruction also addressed key story grammar elements that are critical to narrative text structures (Bogaerds-Hazenberg et al., 2021), such as internal feelings, problems, and solutions.

Summer Book Club Program

To minimize the potential for summer learning loss and to maintain the benefits of intervention experienced in Year 1, teachers provided students in the treatment condition with materials to engage in a summer book club program during the summer between Years 1 and 2. Similar to home-based summer reading interventions, we provided students with texts and self-paced reading guides to support vocabulary acquisition and reading comprehension. These guides were aligned to the instructional practices students engaged with during the school year. Children chose three books from eight options that the research team had selected based on a student survey about topics of interest. The research team privileged texts that were identified as culturally relevant to Spanish-speaking children of Mexican and Central American family ancestries because students demonstrated higher engagement with these texts during the school year. Finally, we considered student reading ability and selected novels that ranged in Lexile from 600L to 840L, which approximately corresponded to students’ reading performance on curriculum-based measures. To support parent knowledge of the summer reading program, a parent literacy night was held at each campus.

Reading Interventionists

Reading interventionists were current or former credentialed public school teachers hired and trained by the research team to provide instruction. All interventionists were female, held at least a bachelor’s degree, and had at least 3 years of teaching experience. All reading interventionists participated in an 8-hour training and then participated in practice sessions for 1 week. In addition to the initial training, members of the research team provided ongoing coaching and support as instruction was being implemented.

Fidelity

Treatment fidelity was measured as a multidimensional construct, with multiple indicators evaluating the extent to which the treatment was implemented as planned, including: (1) treatment adherence; (2) quality of instruction; (3) treatment dosage; and (4) treatment differentiation, an indicator of the differences between the treatment condition and the comparison condition. Adherence and quality were evaluated with a tailored observation instrument our research team has utilized in over 20 randomized controlled trials. This observation instrument included adherence ratings on a 4-point Likert-type scale (1 = low adherence through 4 = good adherence) across nine instructional activities implemented within the intervention class. In addition, intervention quality was rated on eight global indicators of quality (e.g., quality of feedback, quality of behavior management) on a 7-point Likert-type scale (1 = poor quality through 7 = highest quality).

Intervention teachers audio-recorded all intervention lessons. We then randomly selected 10% of the total intervention lessons for coding, blocking by teacher. A trained team of coders established at least 90% interrater reliability before independently coding each lesson randomly selected for fidelity coding. Across intervention teachers and instructional components, adherence was generally good (median adherence by component = 3.2, SD = 1.0, range: 3.0–3.7). Across global quality indicators, tutors generally scored high (mean quality across indicators = 5.8, SD = 1.2, mean range = 5.3–6.1).
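The lesson selection for fidelity coding (a random 10% of lessons, blocked by teacher) can be sketched as follows. This is only an illustration: the function name `sample_lessons_for_coding`, the lesson and teacher identifiers, and the rule of rounding to the nearest whole lesson with a minimum of one per teacher are assumptions, not details reported in the article.

```python
import random
from collections import defaultdict


def sample_lessons_for_coding(lessons, fraction=0.10, seed=0):
    """Draw roughly `fraction` of recorded lessons within each teacher block.

    `lessons` is a list of (lesson_id, teacher_id) pairs (hypothetical names).
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_teacher = defaultdict(list)
    for lesson_id, teacher_id in lessons:
        by_teacher[teacher_id].append(lesson_id)

    selected = {}
    for teacher_id in sorted(by_teacher):
        pool = sorted(by_teacher[teacher_id])
        # Assumption: round to the nearest lesson, but code at least one
        # lesson for every teacher.
        k = max(1, round(fraction * len(pool)))
        selected[teacher_id] = sorted(rng.sample(pool, k))
    return selected


# Toy recordings: three teachers with 120, 80, and 100 lessons.
lessons = [
    (f"t{t}_l{i}", f"teacher{t}")
    for t, n in [(1, 120), (2, 80), (3, 100)]
    for i in range(n)
]
picked = sample_lessons_for_coding(lessons, fraction=0.10, seed=1)
```

Sampling within teacher rather than from the pooled recordings ensures that every interventionist is represented in the coded lessons in proportion to the number of lessons she taught.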

BaU Instruction

We interviewed school staff to understand the counterfactual instruction that students randomized to the BaU condition received. School staff reported that all participating students were enrolled in a reading intervention class because all students had failed the state reading test the prior year. The reading instruction provided varied considerably across the six school sites. Some schools reported that teachers used miscellaneous resources to teach reading with an emphasis on study skills and test taking in preparation for the next state test. Other schools reported using SRA Corrective Reading (a multicomponent reading intervention) and Read to Achieve (a comprehension-focused reading program).

Impact of COVID-19 Pandemic on Study Implementation

During the second semester of Year 2 (late March 2020), all participating districts discontinued in-person instruction in response to the COVID-19 pandemic. As a result, in-person instruction was prematurely terminated for all students in the treatment condition. At the time of the unforeseen school closures, treatment students had completed approximately 70% of the intended lessons for the second year and about 85% of the intended intervention lessons across both years. We were unable to continue with intervention lessons for the same reason that schools had difficulty engaging students in remote instruction. Participating schools reported that they attempted to provide instruction via videoconference in April and May of 2020, but most students did not participate in this instruction because many families did not have the resources to engage in remote instruction. Some families lacked resources such as a computer to access videoconferencing. Other families had devices, but no one was available during the school day to help their child access the remote instruction.

Our research team was able to remotely administer a small number of assessments in May through June of 2020 to assess student learning postintervention. Our research team selected assessments for posttesting that could be easily implemented remotely. As shown in Table 2, these assessments included the GMRT-4 reading comprehension, the TOSWRF-2, and the KTEA-3 SRF tests. We identified these assessments because they could be administered without deviations from standardized protocols over the telephone with the support of a parent. The assessment stimuli were packaged in envelopes within envelopes that allowed student prompts and response packets to remain inaccessible to students until the time of testing. Bilingual members of the assessment team, blind to students’ study condition, administered tests with students over the phone while on speakerphone with an adult family member (e.g., parent) and the participating child. Although our preference would have been to provide the full battery of assessments at each time point, emerging research suggests that these types of measures can be reliably applied via a telephone administration (e.g., Larner, 2021; Magimairaj et al., 2022).

Table 2.

Test Administration Schedule in the Extensive Reading Intervention Study.

	Year 1		Year 2
Measures	Fall (in-person)	Spring (in-person)	Fall (in-person)	Spring (tele-administration)
WJ-III: PV	X
GMRT-RC	X	X	X	X
KTEA-3: LWR	X	X	X	U
KTEA-3: WRF	X	X	X	U
WRMG		X		U
KTEA-3: SRF				X
TOSWRF-2				X

Note. X = data collected; U = unable to collect due to COVID-19; WJ-III = Woodcock Johnson-III; PV = Picture Vocabulary; GMRT-RC = Gates-MacGinitie Reading Test Reading Comprehension; KTEA-3 = Kaufman Test of Educational Achievement–Third Edition; LWR = Letter & Word Recognition; WRF = Word Recognition Fluency; WRMG = Word Reading in the Middle Grades; SRF = Silent Reading Fluency; TOSWRF-2 = Test of Silent Word Reading Fluency-2.

Measures

Members of the research team who were not involved in instruction and were blind to study conditions administered all assessments. The assessment team members received extensive training from a senior member of the research team and established 100% reliability in a mock testing session prior to administering assessments in the field. Further, all assessments were double-scored and -entered to ensure the reliable collection of data.

Kaufman Test of Educational Achievement (Third Edition) Letter Word Recognition (KTEA-3 LWR)

The KTEA-3 LWR subtest is an individually administered assessment of the student’s ability to accurately recognize letters and read words. The measure is comprised of letters and then words of increasing difficulty. Split-half reliabilities for ages 13–15 are .96 to .97.

Word Reading in the Middle Grades (WRMG)

The researcher-developed reading measure consisted of 45 items that were administered individually. The first 15 items consisted of word parts (seven vowel digraphs and eight affixes). The next 15 items included multisyllable words (ranging from 2 to 4 syllables) that followed common grapheme-phoneme correspondences, such as reproach and spaciously. The final 15 items included both single and multisyllable words that contained irregular grapheme-phoneme correspondences. Examples of these words include although, instead, and rhythm. Test administrators asked students to identify the word parts and read the words.

KTEA-3 Word Recognition Fluency (KTEA-3 WRF)

The KTEA-3 WRF subtest is a test of word reading fluency. In this timed subtest, test administrators asked each student to read a list of single words aloud as quickly and accurately as possible during two 15-second trials. The test manual reports high alternate form reliability (.89) and concurrent validity (ranging from .71 to .90) for this age range.

Test of Silent Word Reading Fluency (2nd edition; TOSWRF-2)

The TOSWRF-2 is a timed measure of word reading fluency that can be administered individually or in groups. The test presents rows of words with no spaces, and the directions prompt students to draw a line between as many unrelated words as possible in 3 min. Practice items are presented before administration to ensure students understand directions. The authors of the TOSWRF-2 report that test–retest reliability is high (range from .84 to .91).

KTEA-3 Silent Reading Fluency (KTEA-3 SRF)

The KTEA-3 SRF is a timed, individually administered test in which a student silently reads simple interrogative statements (e.g., is water dry?) and marks yes or no to each. The items are intended to assess the child’s text-reading ability rather than their knowledge base, so the language and knowledge presented in the statements are simple. The KTEA-3 manual reports adequate split-half reliability (.82) and alternative form reliability (.78).

Woodcock Johnson-III Picture Vocabulary (WJ-III PV)

We administered the WJ-III PV (Woodcock et al., 2001) subtest in Fall of Year 1 to obtain an estimate of students’ general vocabulary knowledge in English. This information was used to characterize the sample and assess the degree to which treatment effects varied based on students’ English language vocabulary knowledge (RQ2). The WJ-III PV is an individually administered measure of word knowledge and expressive vocabulary. Students are asked to identify the appropriate picture when provided with multiple choices. The WJ-III PV demonstrates high internal reliability (α = .81).

Gates-MacGinitie Reading Test Reading Comprehension Subtest

The Gates-MacGinitie Reading Test (GMRT-4) reading comprehension subtest is a timed, group-administered test of reading comprehension. Students are asked to read expository and narrative text passages ranging from 3 to 15 sentences in length and answer three to six multiple-choice questions per passage within the 35 min allotted. Items increase in difficulty as the student progresses through the test. Internal consistency reliability ranges from .91 to .93 and alternate form reliability is reported as .80 to .87.

Analytic Methods

We used multilevel regression (MLM; Raudenbush & Bryk, 2002) to evaluate the efficacy of the intensive reading intervention on reading outcomes and to estimate the moderating effect of initial vocabulary skills. We nested students in schools. We ignored the effect of the classroom because middle school students have multiple teachers throughout the school day. Interventionists delivered intervention to students in the treatment condition, which represents partial clustering. However, only two of six schools had more than one tutor during each year of the intervention, thus clustering at the tutor level was ignored. Using R (R Core Team, 2020) and the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages, we estimated sample average treatment effects with restricted maximum likelihood estimation (REML). We fit the following general model to estimate treatment’s effect on all outcomes:

$$\text{Level 1: } Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Pre}^{C}_{ij} + \beta_{2j}\,T_{ij} + e_{ij}$$

$$\text{Level 2: } \beta_{0j} = \gamma_{00} + u_{0j}$$

$$\beta_{1j} = \gamma_{10}$$

$$\beta_{2j} = \gamma_{20}$$

where T_ij is the dummy-coded treatment variable and Pre^C_ij is the student-level pretest centered on the school mean (Pre_ij − Pre_.j). Restricted maximum likelihood estimation outperforms other estimators when variance components are estimated from a small number of clusters (McNeish & Stapleton, 2016). However, if random effects models were not estimable due to a failure to converge, we estimated fixed effects models instead. We included a grand-mean centered pretest as a covariate for models with minimal clustering (i.e., when models reduced to fixed effects). We modeled effects across the outcomes after 1 year of treatment (Time Point 2), at the start of the second year of treatment (Time Point 3), and after 2 years of treatment (Time Point 4).
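The difference between centering the pretest on school means and centering it on the grand mean can be illustrated with a minimal Python sketch. The scores, school labels, and function names below are hypothetical; they are not study data and do not reproduce the authors' R code.

```python
from collections import defaultdict
from statistics import mean


def group_mean_center(scores, groups):
    """Subtract each cluster's mean from its members' scores (Pre_ij - Pre_.j)."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    group_means = {g: mean(values) for g, values in by_group.items()}
    return [s - group_means[g] for s, g in zip(scores, groups)]


def grand_mean_center(scores):
    """Subtract the overall sample mean from every score."""
    overall = mean(scores)
    return [s - overall for s in scores]


# Hypothetical pretest scores for five students nested in two schools.
pretest = [480.0, 500.0, 520.0, 510.0, 530.0]
school = ["A", "A", "A", "B", "B"]

group_centered = group_mean_center(pretest, school)
grand_centered = grand_mean_center(pretest)
print(group_centered)  # [-20.0, 0.0, 20.0, -10.0, 10.0]
print(grand_centered)  # [-28.0, -8.0, 12.0, 2.0, 22.0]
```

Group-mean centering removes between-school differences from the covariate, whereas grand-mean centering keeps them, which is why the choice matters once a model reduces to fixed effects.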

We fit the following model to evaluate the moderating effect of initial vocabulary scores:

$$\text{Level 1: } Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Pre}^{C}_{ij} + \beta_{2j}\,T_{ij} + \beta_{3j}\,\mathrm{WJ3}^{C}_{ij} + \beta_{4j}\,(\mathrm{WJ3}^{C}_{ij} \times T_{ij}) + e_{ij}$$

$$\text{Level 2: } \beta_{0j} = \gamma_{00} + u_{0j}$$

$$\beta_{1j} = \gamma_{10}$$

$$\beta_{2j} = \gamma_{20}$$

$$\beta_{3j} = \gamma_{30}$$

$$\beta_{4j} = \gamma_{40}$$

where WJ3^C_ij is the group-mean centered WJ-III score (WJ3_ij − WJ3_.j). Again, in cases where ICCs were equal to or less than .01, we fit fixed effects models with grand-mean centering of covariates. To control the Type I error rate across multiple tests, we implemented the Benjamini-Hochberg procedure for controlling the false discovery rate (Benjamini & Hochberg, 1995). This procedure resulted in a critical value of α = .0031. We reported Hedges’ g effect sizes (Hedges & Olkin, 1985).
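As a rough illustration of how a Benjamini-Hochberg critical value arises, the following Python sketch implements the standard step-up rule. The article reports only the resulting critical value, not the p-values entered into the procedure, so the inputs here are purely illustrative and the function name is our own.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns (critical_value, reject_flags), where critical_value is
    (k / m) * q for the largest rank k whose sorted p-value satisfies
    p_(k) <= (k / m) * q, or None if no hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    critical = k_max / m * q if k_max else None
    return critical, reject


# Illustrative p-values only; NOT the family of tests used in the study.
p_vals = [0.22, 0.001, 0.96, 0.003, 0.048, 0.34]
critical, rejected = benjamini_hochberg(p_vals, q=0.05)
print(critical, rejected)
```

With these made-up inputs, only the two smallest p-values fall at or below their rank-based thresholds, so the critical value is (2/6) × .05. Note that a nominally significant p = .048 is not rejected.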

Results

Sample Attrition and Baseline Equivalence

There were no statistically significant differences in mean scores between study conditions at pretest (Fall of Year 1). Moreover, there were no effect size differences greater than 0.25 at pretest and, thus, baseline equivalence was established for the full sample according to What Works Clearinghouse (WWC, 2020) standards. However, attrition can undermine the assumption that the treatment and BaU groups do not differ on measured and unmeasured variables at baseline, thus threatening internal validity. We report attrition as the proportion of a randomized sample with missing outcome data. Differential attrition describes differences between rates of attrition for the treatment group and BaU. Even in the most well-designed studies, patterns of overall and differential attrition can introduce bias by creating imbalance in a previously balanced design.
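The two attrition quantities defined above reduce to simple proportions. The sketch below assumes the randomized counts from the abstract (171 treatment, 169 comparison). The analyzed counts are hypothetical placeholders, because the per-measure counts are reported in a supplemental file.

```python
def attrition_rates(randomized_t, analyzed_t, randomized_c, analyzed_c):
    """Return (overall, differential) attrition as proportions.

    Overall attrition: share of all randomized students missing outcome data.
    Differential attrition: absolute difference between the two groups' rates.
    """
    missing_t = randomized_t - analyzed_t
    missing_c = randomized_c - analyzed_c
    overall = (missing_t + missing_c) / (randomized_t + randomized_c)
    rate_t = missing_t / randomized_t
    rate_c = missing_c / randomized_c
    differential = abs(rate_t - rate_c)
    return overall, differential


# Randomized counts are from the abstract; analyzed counts are HYPOTHETICAL.
overall, differential = attrition_rates(171, 150, 169, 142)
print(f"overall = {overall:.3f}, differential = {differential:.3f}")
```

The WWC evaluates the pair of values jointly against its attrition boundaries, which are not reproduced here.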

To evaluate the threat of attrition, we adhered to the WWC (2020) recommendations by first calculating sample attrition and then assessing baseline equivalence when sample attrition was considered high. We present the rates of overall and differential attrition for each measure in Supplemental File S1. Although the WWC does not require reporting sample attrition by measure, we do so to be transparent that there were small differences in the number of students who took each test (e.g., 286 students completed the GMRT whereas 292 students completed the KTEA-3 LWR in Fall of Year 2) due to student absences. We treat these data as missing at random because the students who were absent during the administration of the posttest battery had either transferred out of the participating school or could not be reached despite multiple contact attempts after schools closed due to COVID-19. Because we only had missing data on the dependent variable (and less than 5% on the covariate), listwise deletion was used to address missing outcome data (Jakobsen et al., 2017).

Applying the WWC (2020) standards, all measures collected at spring of Year 1 had low levels of sample attrition based on the combination of overall and differential attrition. This suggests that attrition did not threaten the internal validity of the study after 1 year of instruction and that contrasts examining differences after 1 year of instruction met “WWC design standards without reservations” (WWC, 2020, p. 9). The combined rates of overall and differential attrition also met WWC standards for “tolerable level of potential bias” at the beginning of Year 2 (p. 14), which suggests that attrition due to children moving over the summer was also not a considerable threat to the study.

However, COVID-19 led to substantial attrition during Year 2, which reduced the size of the analytic sample in Spring of Year 2. The combination of overall and differential attrition did not meet WWC standards for acceptable threat under cautious or optimistic assumptions for the three outcome measures administered in Spring of Year 2. For RCTs with high attrition, WWC recommends that baseline equivalence between the treatment and comparison conditions be assessed in the analytic sample. We used Hedges’ g to index baseline differences on the three outcome measures administered at Time 4. The TOSWRF-2 and KTEA-3 SRF measures were not administered at pretest (i.e., Fall of Year 1), so we evaluated baseline equivalence for these analytic samples using the pretest KTEA-3 WRF measure. We selected the KTEA-3 WRF because, of all the measures collected at pretest, it was the most similar to the TOSWRF-2 and the KTEA-3 SRF in design and in the construct assessed. Based on their effect sizes, the KTEA-3 SRF (g = 0.52) and TOSWRF-2 (g = 0.38) analytic samples did not satisfy equivalence standards (baseline effect size < 0.25). This suggests that treatment and comparison students in the spring of Year 2 analytic samples may not have been equivalent at baseline, although we encourage readers to consider this conclusion with caution. We evaluated baseline equivalence for the GMRT-4 RC using performance on the same measure at pretest (Fall Year 1) and found that baseline equivalence (g = 0.25) was met. Although we adjusted for pretest differences in the analytic sample when evaluating main effects, we note that attrition represents a substantial concern when examining effects at the end of Year 2.
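A baseline-equivalence check of this kind can be sketched in Python as follows. The formula is the common pooled-standard-deviation form of Hedges' g with the small-sample correction. The summary statistics are hypothetical and the function name is our own, so the output does not correspond to any value reported in the article.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    # Correction factor J = 1 - 3 / (4 * df - 1), with df = n_t + n_c - 2.
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction


# HYPOTHETICAL pretest summary statistics for an analytic sample.
g = hedges_g(mean_t=504.0, sd_t=20.0, n_t=80, mean_c=500.0, sd_c=20.0, n_c=70)
meets_threshold = abs(g) < 0.25  # threshold as stated in the text above
print(round(g, 3), meets_threshold)
```

Under these made-up inputs the baseline difference is about 0.2 standard deviations, which would fall under the 0.25 threshold cited in the text.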

Main Effects of Intervention (RQ1)

Our first research question addressed the effect of RISE on reading outcomes. Supplemental File S2 presents standard score means and standard deviations for the standardized outcome measures at pretest and posttest, as well as for the moderator variable (WJ-III PV). We used extended scale scores and growth scale scores for data analysis with these measures. We present extended and growth scale score means and standard deviations in Supplemental File S3 and unconditional model estimates in Supplemental File S4.

As shown in Tables 3A and 3B, results in the spring of Year 1 revealed a significant, positive effect favoring the treatment condition over BaU on the WRMG, β = 5.70, SE = 0.79, t(295.81) = 7.25, p < .001; g = 0.72, 95% CI [0.52, 0.92]. No significant treatment effects were found on standardized word reading measures: KTEA-3 LWR (β = 3.91, SE = 2.09, p = .06) and KTEA-3 WRF (β = 0.23, SE = 1.03, p = .82). This same pattern held in the fall of Year 2 on the KTEA-3 LWR (β = 2.16, SE = 2.24, p = .34) and KTEA-3 WRF (β = −0.86, SE = 1.21, p = .48). Results during the spring of Year 2 revealed nonsignificant differences on the TOSWRF-2 (β = 2.97, SE = 2.05, p = .15) and the KTEA-3 SRF (β = 3.17, SE = 2.84, p = .27). Note that the two-level multilevel models were singular on the TOSWRF-2 and KTEA-3 SRF because school-level (Level 2) variation in outcome scores was minimal, so we modeled fixed effects only. Although no significant differences were found in spring of Year 2, Tables 3A and 3B reveal a pattern of effect sizes favoring treatment over BaU. Scores on the GMRT-RC showed no significant differences between students in the treatment and the BaU conditions at the spring of Year 1 (β = 2.72, SE = 2.22, p = .22), fall of Year 2 (β = 0.12, SE = 2.41, p = .96), and spring of Year 2 (β = 3.02, SE = 3.18, p = .34). These multilevel models were likewise singular due to little variance in the outcome scores across schools, so we modeled fixed effects only.
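The confidence interval reported for the WRMG effect can be approximately recovered from the effect size and its standard error (g = 0.72, SE = 0.10 in Table 3A). The sketch below assumes a normal approximation with z = 1.96, which may differ from the method the authors used.

```python
def normal_ci(estimate, se, z=1.96):
    """Approximate 95% confidence interval: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


# WRMG effect size and standard error as reported in Table 3A.
lower, upper = normal_ci(0.72, 0.10)
print(round(lower, 2), round(upper, 2))  # 0.52 0.92

# Ratio of the rounded coefficient to its rounded SE; rounding of the
# published values may explain small gaps from the reported t statistic.
t_ratio = 5.70 / 0.79
print(round(t_ratio, 2))
```

The interval reproduces the reported [0.52, 0.92] to two decimals under this approximation.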

Table 3A.

Estimating the Main Effect of Intervention Across Two Years: Fixed Effects.

Fixed effects	Spring Year 1				Fall Year 2				Spring Year 2
Fixed effects	Estimate	SE	p	g(SE)	Estimate	SE	p	g(SE)	Estimate	SE	p	g(SE)
GMRT-RC
Intercept	486.92	2.69	<.001		494.18	1.73	<.001		503.16	2.31	<.001
Pretest	0.69	0.04	<.001		0.60	0.04	<.001		0.693	0.07	<.001
RISE	2.73	2.22	.221	0.10 (0.08)	0.12	2.41	.961	0.01 (0.09)	3.02	3.18	.344	0.12 (0.13)
KTEA-3: LWR
Intercept	517.45	2.91	<.001		522.35	2.94	<.001
Pretest	0.70	0.04	<.001		0.63	0.04	<.001
RISE	3.91	2.09	.06	0.14 (0.08)	2.16	2.24	.336	0.08 (0.08)
KTEA-3: WRF
Intercept	510.92	1.76	<.001		511.65	1.73	<.001
Pretest	0.82	0.03	<.001		0.81	0.04	<.001
RISE	0.23	1.03	.82	0.01 (0.06)	−0.86	1.21	.478	−0.05 (0.07)
WRMG
Intercept	30.65	0.80	<.001
Pretest	0.12	0.01	<.001
RISE	5.70	0.79	<.001	0.72 (0.10)
KTEA-3: SRF
Intercept									503.78	2.08	<.001
Pretest									0.17	0.08	.042
RISE									3.40	2.84	.233	0.21 (0.17)
TOSWRF-2
Intercept									78.15	1.70	<.001
Pretest									0.37	0.07	<.001
RISE									3.18	2.04	.122	0.24 (0.16)

Table 3B.

Estimating the Main Effect of Intervention Across Two Years: Random Effects.

Random effects	Spring Year 1		Fall Year 2		Spring Year 2
Random effects	Variance	ICC	Variance	ICC	Variance	ICC
GMRT-RC
Student-level	711.47
School-level	12.22	0.02
KTEA-3: LWR
Student-level	326.40		344.70
School-level	34.10	0.10	33.32	0.09
KTEA-3: WRF
Student-level	78.97		99.42
School-level	14.34	0.15	12.44	0.11
WRMG
Student-level	46.20
School-level	1.691	0.04
KTEA-3: SRF
Student-level					272.07
School-level					0.88	0.003
TOSWRF-2
Student-level					126.16
School-level					2.85	0.022

Note. Bolded values indicate statistically significant main effects (p < .05); ICC = intraclass correlation coefficient; g = Hedges’ g effect size; GMRT-RC = Gates-MacGinitie Reading Test–Reading Comprehension; KTEA-3 = Kaufman Test of Educational Achievement–Third Edition; LWR = Letter & Word Recognition; WRF = Word Recognition Fluency; WRMG = Word Reading in the Middle Grades; SRF = Silent Reading Fluency; TOSWRF-2 = Test of Silent Word Reading Fluency-2.
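The ICC values in Table 3B follow from the two variance components in each model: the school-level variance divided by the sum of school- and student-level variance. A minimal Python sketch using values from the table (the function name is our own):

```python
def icc(school_var, student_var):
    """Intraclass correlation: share of total variance lying between schools."""
    return school_var / (school_var + student_var)


# Variance components taken from Table 3B.
icc_wrf_spring_y1 = icc(14.34, 78.97)   # KTEA-3 WRF, Spring Year 1
icc_gmrt_spring_y1 = icc(12.22, 711.47)  # GMRT-RC, Spring Year 1
icc_srf_spring_y2 = icc(0.88, 272.07)    # KTEA-3 SRF, Spring Year 2

print(round(icc_wrf_spring_y1, 2))   # 0.15
print(round(icc_gmrt_spring_y1, 2))  # 0.02
print(round(icc_srf_spring_y2, 3))   # 0.003

# Decision rule stated in the Analytic Methods: ICC <= .01 -> fixed effects.
use_fixed_effects_srf = icc_srf_spring_y2 <= 0.01
print(use_fixed_effects_srf)  # True
```

The near-zero ICC for the KTEA-3 SRF is consistent with the decision, described in the text, to model fixed effects only for that outcome.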

The Moderating Effect of English Vocabulary (RQ2)

Our second research question concerned the extent to which the effects of RISE on reading outcomes were moderated by students’ initial English vocabulary performance. Model estimates for the moderating effect of initial vocabulary scores on the relation between treatment and GMRT-RC are reported in Tables 4A and 4B. No moderation effect of initial vocabulary scores was found during spring of Year 1 or spring of Year 2. However, initial English vocabulary did moderate the relation between treatment and GMRT-RC scores at the beginning of Year 2, β = 0.56, SE = 0.19, t(268) = 2.983, p = .003. Using the Johnson-Neyman regions of significance technique (Johnson & Neyman, 1936), we found the regions of significance were <−9.96 and >11.60 on the grand-mean centered initial WJ-III PV scores. Figure 1 displays the interaction between treatment and initial vocabulary scores. For students who scored below average on the WJ-III PV, those in the treatment condition had lower scores on GMRT-RC than the BaU group. For students with above-average performance on the WJ-III PV in Fall of Year 1, those randomized to the treatment condition showed higher GMRT-RC scores than the BaU group in Fall of Year 2.
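To make the interaction concrete, the conditional treatment effect at a given centered vocabulary score is the treatment coefficient plus the interaction coefficient times that score. The Python sketch below plugs in the Fall Year 2 GMRT-RC estimates from Table 4A (RISE = −0.34, WJ-III × RISE = 0.56). It computes point estimates only and does not reproduce the Johnson-Neyman significance test, which also requires the coefficient covariance matrix.

```python
def conditional_effect(vocab_centered, b_treat=-0.34, b_interaction=0.56):
    """Estimated treatment effect on GMRT-RC at a centered WJ-III PV score.

    Default coefficients are the Fall Year 2 estimates from Table 4A.
    """
    return b_treat + b_interaction * vocab_centered


# Evaluate at the boundaries of the reported regions of significance.
effect_low = conditional_effect(-9.96)
effect_high = conditional_effect(11.60)
# Centered vocabulary score at which the estimated effect changes sign.
crossover = 0.34 / 0.56

print(round(effect_low, 2))   # -5.92
print(round(effect_high, 2))  # 6.16
print(round(crossover, 2))    # 0.61
```

On these point estimates, the treatment effect is roughly −6 scale-score points at the lower boundary and roughly +6 at the upper boundary, changing sign just above the sample mean.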

Table 4A.

Estimating the Moderating Effect of Vocabulary on Reading Comprehension Across Two Years: Fixed Effects.

Fixed effects	Spring Year 1			Fall Year 2			Spring Year 2
Fixed effects	Estimate	SE	p	Estimate	SE	p	Estimate	SE	p
GMRT-RC
Intercept	487.01	2.74	<.001	494.10	1.65	<.001	503.49	2.30	<.001
Pretest	0.65	0.04	<.001	0.56	0.04	<.001	0.66	0.07	<.001
RISE	2.49	2.19	.255	−0.34	2.30	.881	2.44	3.15	.439
WJ-III	0.26	0.13	.052	0.16	0.14	.226	0.25	0.18	.173
WJ-III × RISE	0.10	0.18	.598	0.56	0.19	.003	0.12	0.25	.635
KTEA-3: LWR
Intercept	517.45	2.86	< .001	522.35	2.93	<.001
Pretest	0.67	0.04	<.001	0.59	0.04	<.001
RISE	4.10	2.06	.048	2.23	2.21	.314
WJ-III	0.35	0.13	.005	0.35	0.13	.008
WJ-III × RISE	−0.17	0.17	.343	−0.06	0.18	.751
KTEA-3: WRF
Intercept	510.97	1.81	<.001	511.64	1.73	<.001
Pretest	0.81	0.03	<.001	0.79	0.04	<.001
RISE	0.24	1.03	.815	−0.78	1.21	.519
WJ-III	0.17	0.06	.008	0.17	0.07	.023
WJ-III × RISE	−0.21	0.08	.02	−0.20	0.10	.052
WRMG
Intercept	30.67	0.81	<.001
Pretest	0.11	0.01	<.001
RISE	5.62	0.78	<.001
WJ-III	0.09	0.05	.061
WJ-III × RISE	−0.01	0.07	.907
KTEA-3: SRF
Intercept							503.71	2.13	<.001
Pretest							0.15	0.09	.109
RISE							3.41	2.84	.232
WJ-III							−0.01	0.16	.940
WJ-III × RISE							0.21	0.22	.341
TOSWRF-2
Intercept							78.17	1.73	<.001
Pretest							0.34	0.07	<.001
RISE							3.19	2.06	.123
WJ-III							0.03	0.16	.842
WJ-III × RISE							0.09	0.19	.621

Table 4B.

Estimating the Moderating Effect of Vocabulary on Reading Comprehension Across Two Years: Random Effects.

Random effects	Spring Year 1		Fall Year 2		Spring Year 2
Random effects	Variance	ICC	Variance	ICC	Variance	ICC
GMRT-RC
Student-level	355.54
School-level	27.12	0.07
KTEA-3: LWR
Student-level	317.54		334.09
School-level	32.75	0.09	33.25	0.091
KTEA-3: WRF
Student-level	77.56		98.49
School-level	15.58	0.17	12.50	0.11
WRMG
Student-level	45.52
School-level	1.79	0.04
KTEA-3: SRF
Student-level					272.47
School-level					1.68	0.01
TOSWRF-2
Student-level					126.98
School-level					2.88	0.02

Note. Bolded values indicate statistically significant main effects (p < .05); ICC = intraclass correlation coefficient; g = Hedges’ g effect size; GMRT-RC = Gates-MacGinitie Reading Test–Reading Comprehension; KTEA-3 = Kaufman Test of Educational Achievement–Third Edition; LWR = Letter & Word Recognition; WRF = Word Recognition Fluency; WRMG = Word Reading in the Middle Grades; SRF = Silent Reading Fluency; TOSWRF-2 = Test of Silent Word Reading Fluency-2.

Figure 1.

Interaction plot of RISE treatment and WJ-III Picture Vocabulary on GMRT reading comprehension.

Discussion

The goal of this randomized trial was to evaluate the efficacy of the RISE reading intervention on reading outcomes for Spanish-speaking EBs in middle school with significant, persistent RD. Following 1 year of intervention, we observed a significant between-group difference (g = 0.72) on the WRMG word reading measure. There were no statistically significant differences on standardized measures of word reading, word reading fluency, or reading comprehension, though effect sizes typically favored treatment. At the end of Year 2, because of school closures related to the COVID-19 pandemic, we used a novel remote testing protocol administered over the phone and facilitated by parents. We observed no statistically significant differences between intervention and BaU students on a measure of word reading fluency, a sentence verification task, or a reading comprehension measure. Given the uncertainty and difficulty associated with interpreting Year 2 effects amid unexpected and disruptive school closures, we divide our discussion by study year, beginning with Year 1 findings.

Effects of 1 Year of Intervention

We hypothesized that we would observe significant intervention effects on word reading and reading fluency following 1 year of intervention but would not observe significant effects on reading comprehension at that time. We observed statistically significant effects on the WRMG (g = 0.72) but did not observe statistically significant effects on standardized measures of word reading (g = 0.14) or word reading fluency (g = 0.01). There were no statistically significant effects for reading comprehension (g = 0.10). We interpret these observed effects as generally consistent with our Year 1 hypotheses, which were based on previous studies (Miciak et al., 2018; Vaughn, Roberts, Miciak, et al., 2019), suggesting that word reading and reading fluency may be more immediately malleable to intensive interventions than reading comprehension, which relies on impacting broad domains such as linguistic comprehension and general knowledge.

The relatively large effect (g = 0.72) observed on the researcher-developed word reading measure is encouraging, particularly in light of its focus on advanced word reading skills and its greater sensitivity to detect small differences in performance at participants’ skill level. In the first semester of Year 1, the RISE intervention included a relatively intensive focus on building the code-based skills of word reading and connected text reading fluency. Tutors split instructional time evenly between code-based instruction and meaning-based instruction. This code-based instruction included advanced phonics instruction focused on vowel combinations, affixes, and multisyllabic word reading. In addition, instruction focused on reading “rule-breaker” words—irregular words of low frequency. This word reading measure directly assessed students’ ability to apply these skills, with items consisting of word parts (i.e., vowel combinations, affixes), multisyllable phonetically regular words, and irregular words. Although the effect on our standardized measure of word reading was not statistically significant, it should be noted that the observed effect size for word reading accuracy (g = 0.14) is consistent with previous meta-analytic effect sizes for word reading for intervention studies with struggling readers in Grades 4–12 (g = 0.14; Wanzek et al., 2013).

To put this effect size in the broader educational context, a 0.15 SD increase represents improvements on a standardized test that can be expected from about three-quarters of an academic school year for middle school children (Bloom et al., 2008). Put differently, this effect is similar to the effect one would see from having a very good teacher versus an average teacher for about 1 year (Hanushek, 2011). When interpreted in combination with the large effect on the WRMG, we find the small, directionally appropriate (but not statistically significant) effect on word reading promising.

The lack of any discernible effect on word reading fluency is surprising and somewhat puzzling, as word and text reading fluency were a significant focus of the RISE intervention, particularly in Year 1. In previous studies with similar populations and interventions, we have observed significant effects on reading fluency (Miciak et al., 2018; Vaughn, Roberts, Capin, et al., 2019), although not always (Vaughn, Martinez, et al., 2019). In addition, an inspection of standard scores following 1 year of intervention indicates that participants continued to demonstrate significant normative deficits in word reading, reading fluency, and reading comprehension. This is consistent with previous research that finds most older students with RD experience difficulties across code- and meaning-based skills (Cirino et al., 2013), including samples restricted to EBs (Capin et al., 2023; Miciak et al., 2022). Although those deficits were most pronounced in reading comprehension, deficits in word reading and reading fluency should not be ignored, because accurate and effortless text reading is a gateway to comprehension (Stevens et al., 2017).

That we did not find statistically significant between-group differences in reading comprehension on a standardized measure is consistent with previous research documenting the difficulty of remediating comprehension deficits in a single academic year (Clemens & Fuchs, 2022). Our study findings suggest that the difficulty of improving reading comprehension among students with significant reading comprehension difficulties extends to EBs with significant RD. There are two potential (and partial) explanations for this difficulty. First, as noted above, most secondary students with RD experience deficits in foundational reading skills that make understanding grade-level texts difficult. The EBs with significant RD in our study were not immune from these difficulties. Second, most secondary students with comprehension difficulties, particularly EBs, experience challenges in linguistic comprehension and general knowledge (e.g., Cirino et al., 2013). As students age, these domains become increasingly predictive of reading comprehension, particularly comprehension of texts featuring more complex language and ideation. However, these broad domains are not easily remediated; the scope of skills and knowledge that fall under the umbrellas of linguistic comprehension and background knowledge belies such hopes.

The reasoning in the preceding paragraph also formed the basis for our moderation hypothesis: that intervention participants with relatively higher baseline vocabulary levels would benefit from the intervention more than participants with more pronounced vocabulary deficits. We hypothesized that as the intervention improved foundational reading skills, these improvements would afford participants with relatively higher language proficiency greater access to text. We did not observe a statistically significant interaction in Spring of Year 1. However, in fall of Year 2 (prior to Year 2 intervention), initial vocabulary significantly moderated the effect of treatment assignment on reading comprehension. Students in the treatment condition with higher initial vocabulary scores scored higher on the reading comprehension measure than students in the BaU comparison condition, though we interpret this finding with caution.

Year 2 Findings

School closures related to the COVID-19 pandemic interrupted the 2-year intervention. These closures occurred in March 2020, approximately 2 months before posttest. The research team designed a remote assessment battery that could be administered with the assistance of parents. Usable data were received from 151 participants. However, there was differential uptake across conditions (more treatment students participated in remote assessment), perhaps due to greater commitment among intervention students and families to the research study. In addition, an inspection of standard scores from Year 1 of the trial indicated that students who participated in the remote assessment battery tended to score higher at pretest. These factors complicate interpretation of Year 2 findings.

After controlling for baseline performance, we observed no significant differences between students assigned to the treatment and comparison conditions. Effect sizes on the remote testing protocol were directionally appropriate and of a magnitude indicative of meaningful differences. However, the small sample size and differential attrition reduce confidence in interpreting these observed positive effects for treatment. Yet, the converse must also be noted: data collected at the end of Year 2 do not persuade us to reject our initial hypotheses for the multi-year treatment. Students who enter middle school with significant RD, particularly EBs who face the compounding challenge of acquiring language proficiency in a second language, are likely to require multiple years of instructional support to address multiple deficit areas. It is also possible that the language development needs of the target students were extensive and that a school-wide approach to language development would be necessary to adequately address language comprehension.

Limitations

As noted above, the findings from Year 2 should be interpreted with caution, due to the disruptions introduced by the COVID-19 pandemic. Despite significant efforts, we were successful in engaging only 151 families (52% of the families who started the school year) to complete the remote testing protocol. This experience was consistent with that of many schools in high-poverty areas, which struggled to maintain contact with students when schools shifted to remote instruction (e.g., Stelitano et al., 2020). This led to high levels of attrition for our experiment, which introduces the possibility of bias for the Year 2 results. In addition, the remote assessment process occurred in June 2020, following 3 months of inconsistent remote schooling. Our schools reported that a majority of the children in our study had no contact with their schools during this time. In many ways, this is equivalent to conducting posttest in fall, following a full summer vacation. It is possible, then, that this delay in the administration of the posttest impacted results, although such assertions are speculative. Despite these concerns, we are committed to publishing the results from Year 2 because the 2-year design represented our a priori, pre-registered research plans. Also, although COVID-19 hampered our study of the efficacy of the RISE intervention, describing the influence of COVID-19 on our school-based research in Year 2 sheds further light on the significant impact of the pandemic on schooling, particularly in high-poverty, urban environments.

Conclusion

We set out to evaluate the effects of a comprehensive, intensive reading intervention, provided for 2 years, for EBs with significant RD in the middle school grades. Based on the What Works Clearinghouse standards (WWC, 2020), we used procedures to maximize the rigor of our study, including (a) random assignment of students to study conditions; (b) well-defined sample selection criteria to allow for generalization of findings; (c) use of a sample size with sufficient statistical power; (d) thorough documentation of attrition and adjustments for differential attrition in analyses; (e) clear division of intervention implementers and assessment data collectors to keep conditions blind; (f) precise procedures for intervention implementation to allow for subsequent replication studies; (g) documentation of the core instructional components and the fidelity of implementation; (h) use of technically adequate, standardized measures of student outcomes with multiple measures of constructs; (i) analyses that recognize the “nestedness” of educational data; (j) examination of a learner characteristic as a moderator of efficacy to explore variation in outcomes; and (k) pre-registration of the study.

This study would have represented the largest RCT for an understudied population, Spanish-speaking adolescent EBs with RD. However, like many well-designed studies underway in March 2020, this randomized trial was interrupted when schools closed due to the COVID-19 pandemic. This interruption ended the intervention approximately 3 months early and forced the research team to implement a novel remote testing protocol facilitated by parents. At the end of Year 1 of the RCT, we observed statistically significant between-group differences on a word reading measure. However, there were no statistically significant differences on standardized measures of word reading, reading fluency, or reading comprehension at the end of Year 1 or Year 2. The Year 1 results suggest that single-year, small-group interventions may be insufficient to address the reading comprehension difficulties of middle school EBs in high-poverty settings. Future research may need to consider school-wide approaches to instruction that ensure EBs with RD have opportunities to receive rich academic language instruction and engage in text-based activities across the school day. The Year 2 results must be interpreted with caution due to smaller sample sizes and differential uptake of the remote testing protocol.

Supplemental Material

Supplemental material, sj-docx-1-rse-10.1177_07419325231213876 for An Extensive Reading Intervention for Emergent Bilingual Students With Significant Reading Difficulties in Middle School by Philip Capin, Jeremy Miciak, Bethany H. Bhat, Greg Roberts, Paul K. Steinle, Jack Fletcher and Sharon Vaughn in Remedial and Special Education

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

This research was supported by grant P50 HD052117-07 from Eunice Kennedy Shriver National Institute of Child Health and Human Development at the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of Eunice Kennedy Shriver National Institute of Child Health and Human Development or the National Institutes of Health.
