Abstract
Students in grades 2 through 4 with significant word reading difficulties were randomly assigned to one of two 10-week interventions. In the
Difficulties in word reading are the primary cause of chronic reading failure (Perfetti, 1985; Share & Stanovich, 1995) and the most common factor associated with reading comprehension difficulties (Ritchey et al., 2015; Shankweiler et al., 1999). Slow and inaccurate word reading are hallmarks of word-level reading disability (WLRD), which may be formally identified as dyslexia or specific learning disability in basic reading, the largest subgroup of the learning disability population (Fletcher et al., 2018).
Numerous interventions and instructional approaches aimed at improving word reading skills have been developed, published, and studied, and they have been the subject of numerous literature syntheses and meta-analyses (e.g., Al Otaiba et al., 2023; Shanahan, 2023). The conclusion of these meta-analyses, especially the most contemporary ones, is that interventions demonstrate benefits on the reading skills of students with WLRD, but the effects are modest and inconsistent.
Programs based on the Orton-Gillingham approach have long been referred to as the preferred “gold standard” types of interventions for WLRD, as reflected in state legislation (Stevens et al., 2021), and case law (Sayeski & Zirkel, 2021). However, Stevens and colleagues’ (2021) meta-analysis indicated that although they are associated with some improvements in basic reading skills (
We interpret these recent reviews as indicating a need for reevaluation and improvement of interventions for students with WLRD. Overall effects tend to be modest, with interventions often producing large improvements in pseudoword decoding; however, they have weaker and inconsistent intervention effects on students’ generalized ability to read real words and text, especially for students with more significant reading difficulties. This evidence is consistent with scholars’ calls for critically evaluating interventions for students with WLRD. Compton et al. (2014) questioned whether interventions are adequately aligned with theories of reading and suggested that the way orthographic-phonological relations are typically taught in interventions for WLRD, with an emphasis on phonetically “regular” spelling-sound correspondences and sound-by-sound decoding, may not afford students the opportunities to develop knowledge of larger letter combinations or words with spelling-sound variability. This may result in a shallow set of decoding skills that are effective for phonetically regular pseudowords but have limited transfer to reading real words, where flexibility with spelling-sound correspondence is important (Compton et al., 2014; Ehri, 2017; Elbro & de Jong, 2017; Fletcher et al., 2018). Overall, existing interventions may have failed to sufficiently expose students to variability in pronunciations, which is necessary to promote generalization to untaught words and spelling patterns, or to employ generative and inductive learning strategies that are hallmarks of successful reading development. In addition, interventions may have focused insufficient attention on increasingly larger
This implicit perception of patterns and statistical regularities in stimuli across time and space—referred to as
This perception of statistical regularities is central to connectionist accounts of reading development (Foorman, 1994; Harm & Seidenberg, 2004; Seidenberg & McClelland, 1989), which posit that learning to read thousands of words results from a gradual buildup of connections among orthographic, phonological, and semantic units. Through extensive interaction with print and feedback on correct pronunciations, the learner becomes tuned to the statistical regularities of the writing system and thereby exhibits rule-like behavior without having experienced extensive rule-based instruction (Foorman, 1994). In a quasi-regular orthography like English, “rules” about spelling-sound correspondence do not always apply, but there are patterns and probabilities that can be perceived based on position in a word (e.g.,
Although statistical learning is likely involved in reading, we are not optimistic that it is something that can be “trained.” Instead, we think a more productive perspective is to consider how explicit phonics instruction and reading practice can be situated within contexts, or environments (Rueckl, 2010), that provide ideal conditions for statistical learning to occur. Statistical learning requires extensive exposure to the distributional properties of sensory input (Frost et al., 2019). It thrives on variability; otherwise, the perception of regularities in stimuli (i.e., when things do and do not occur together, when rules do and do not apply) would not be possible. It may be the case that, given evidence that interventions have had stronger effects on pseudoword decoding skills, interventions may have overemphasized spelling rules and provided insufficient access to the variability present in written English. This may result in success in reading words in which letters conform to their most common sound, but difficulty reading words in which letters deviate from their most common pronunciation.
A critical aspect of reading instruction and intervention is that the benefits should not be limited to reading words or specific letter patterns that were taught, but should ultimately result in inductive, flexible, generalized skills in reading words that were not part of instruction. Phonics instruction that teaches common grapheme-phoneme relations but also alternative pronunciations, situates instruction and supported practice within a word corpus in which spelling-sound correspondence is allowed to vary, and encourages flexibility in spelling-sound correspondence, may be a step toward creating an environment in which statistical learning is more likely to occur. Furthermore,
Purpose
This pilot study is part of a series of experiments to iteratively develop an intervention to improve word and text reading efficiency for elementary students with significant word reading difficulties. The purpose of this study was to investigate the extent to which a decoding intervention that directs students’ attention to larger letter units, intentionally interleaves words containing alternative pronunciations of targeted letter patterns, and aligns high-frequency words with decoding instruction improves generalized word reading skills for students with significant word reading difficulties in grades 2 through 4.
Method
Participants
Participants were recruited from five schools and three school districts in the Southwestern United States. All second-, third-, and fourth-grade students were administered the Test of Silent Reading Efficiency and Comprehension (TOSREC; Wagner et al., 2010). Individuals who scored at or below the 10th percentile received parent consent forms. Students who returned signed consent were administered the Sight Word Efficiency (SWE) subtest from the Test of Word Reading Efficiency, Second Edition (TOWRE-2; Torgesen et al., 2012). Students who performed at or below the 10th percentile on the SWE were enrolled. Sixty-six students met the inclusion criteria. The 10th percentile was selected because we sought students with significant word-reading difficulties. There are no universally agreed-upon criteria for identifying or operationally defining WLRD; however, the 10th percentile falls below the lower bound of the average range and is proximal to criteria used in other studies for identifying significant reading difficulties or WLRD, such as the 25th percentile (e.g., Christodoulou et al., 2014; Hoeft et al., 2011; Ozernov-Palchik et al., 2023), 20th percentile (e.g., Torgesen, 2006; Wagner et al., 2023), 16th or 15th percentiles (e.g., Alonzo et al., 2020; Compton et al., 2012; Perrachione et al., 2016), 10th percentile (e.g., Burns et al., 2020; Grigorenko et al., 2000; Torppa et al., 2015), and 5th percentile (e.g., Wagner, 2018; Wagner et al., 2023).
Randomization, Attrition, and Final Sample
Simple randomization was not possible because the schools did not have common times when all students were available for intervention, although common times were available within grade levels. Therefore, we employed a stratified randomization approach, in which we first assigned enrolled students within grade levels to intervention groups of 2 to 4 students, and then randomly assigned the groups to one of the two conditions. A total of 21 groups (11 Dual Treatment, 10 Integrated) were randomly assigned. Of the 66 students who met inclusion criteria, 58 students received pretests, intervention, and posttests (four students relocated to another school district during the study, two students were withdrawn due to scheduling conflicts, one student withdrew assent, and one student was withdrawn by their caregiver). Participant demographic characteristics of the final sample are reported in Table 1.
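The group-level assignment described above can be sketched as follows. This is an illustrative sketch only, not the procedure the study used: the group labels, the seven-groups-per-grade layout, the seed, and the choice to shuffle groups within each grade and alternate conditions from a random starting arm are all assumptions introduced here.

```python
import random

CONDITIONS = ("Dual Treatment", "Integrated")


def assign_groups(groups_by_grade, seed=2024):
    """Randomly assign intact instructional groups to conditions within each grade.

    groups_by_grade maps a grade level to a list of group labels (hypothetical).
    """
    rng = random.Random(seed)
    assignment = {}
    for grade in sorted(groups_by_grade):
        groups = list(groups_by_grade[grade])
        rng.shuffle(groups)
        # A random starting arm avoids always giving the extra group
        # to the same condition when a grade has an odd number of groups.
        start = rng.randrange(2)
        for position, group in enumerate(groups):
            assignment[group] = CONDITIONS[(start + position) % 2]
    return assignment


# Hypothetical layout: 21 groups, seven per grade (the real per-grade counts are not reported here).
example_groups = {
    grade: [f"grade{grade}-group{n}" for n in range(1, 8)] for grade in (2, 3, 4)
}
assignment = assign_groups(example_groups)
```

Because whole groups rather than individual students are the unit of assignment, any analysis of outcomes needs to account for the nesting of students within groups.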
Table 1. Participant Demographics, Final Analysis Sample (
Includes students identified with a disability and/or identified for services through Section 504.
Intervention Conditions
Conditions were referred to as
Instruction and practice in both conditions involved a total of 34 words per lesson. Both interventions involved (a) explicit and systematic decoding instruction in reading GPCs (which included letter combinations), (b) decoding and practicing reading words with those letter combinations, and (c) instruction and practice reading high-frequency words. However, the way instruction was delivered and the sequencing of content differed between the two conditions. Table 2 summarizes the contrast of instructional elements and sequence. An abbreviated description is provided below, and an additional description of the intervention conditions is included in the online Supplemental file.
Table 2. Comparison of Intervention Condition Elements and Content.
Dual Treatment Condition
The Dual Treatment condition reflected ways in which word reading instruction and interventions have often been structured. Decoding instruction and practice focused on standard (i.e., the most common) pronunciations of targeted GPCs, which were introduced in a sequence in which simpler ones were targeted before complex ones. Words emphasized spelling-sound consistency (i.e., “highly decodable”); that is, to the greatest extent possible, nontargeted graphemes in words also reflected the most commonly associated sounds. Decoding was taught explicitly on a sound-by-sound basis, in which the teacher modeled and prompted students to sound out and blend words using simple (i.e., 1-2 letter) GPCs. With a set of six words that contained the target GPC, tutors led students in chorally sounding out and blending each word, followed by another reading of the list using “whisper read,” a delayed prompting technique in which tutors pointed to each letter (pausing briefly on each one) while students whispered the sounds to themselves and then said the word aloud when signaled. The tutor provided affirmative feedback and repeated the word. All errors were corrected using sound-by-sound prompts and modeling.
Students then practiced reading a set of 24 words composed of (a) 12 words containing the target GPC (six words practiced previously and six new words), (b) 10 words containing the GPC targeted in the previous lesson (six words practiced previously and four new words), and (c) two words containing the GPC targeted two lessons earlier (both words were new to the student). In all words, the target GPC, as well as other letters, were associated with their standard pronunciations. Tutors led students in reading the list using echo reading (i.e., the tutor read the word, and students repeated) and whisper reading, followed by 1:1 practice with the tutor (students practiced with a partner while they waited for their turn).
High-frequency words targeted in the Dual Treatment condition were taught in a separate exercise after decoding instruction in each lesson. These high-frequency words were taught differently from words in decoding instruction. Words were introduced in order of their frequency in printed English, as reported at https://norvig.com/ngrams/, and therefore did not necessarily include the target GPCs for that lesson. Words were taught using an approach consistent with the tendency for reading instruction and intervention programs to treat instruction in high-frequency words differently from phonics instruction (Ferrell et al., n.d.). With a set of five words, tutors read and pointed to each word, prompting students to repeat it. Next, tutors and students engaged in orally spelling the word, followed by a choral reading of the word, and an additional student reread of the word. The list was reread using choral reading and individual student responses. Students then practiced a set of 10 high-frequency words, which included (a) five high-frequency words just introduced, (b) three high-frequency words targeted in the previous lesson, and (c) two high-frequency words targeted two lessons earlier. Practice involved tutor-led echo reading, choral reading of the word list, and 1:1 student practice reading to the tutor.
Integrated Word Reading Condition
Lessons in the Integrated condition included the same elements as the Dual Treatment condition (i.e., introduction of the target GPC, decoding instruction, high-frequency word instruction, and word reading practice), but content was presented and taught differently. We taught standard
Grapheme-phoneme correspondences were introduced in order of their frequency in printed English. Each GPC was taught beginning with its standard pronunciation, followed by instruction in which the GPC appeared with letters that commonly occur with it (e.g., “or” says /ore/ in
Next, high-frequency word instruction in the Integrated condition targeted five words. Although the same high-frequency words were taught in both conditions, in the Integrated condition, the order of introduction of the high-frequency words was based on the GPC or sound targeted in the lesson (as opposed to being ordered based on frequency in the Dual Treatment condition). In addition, in the Integrated condition, students were prompted to sound out the high-frequency words in the same manner as they did in decoding instruction (as opposed to a look-say approach in the Dual Treatment condition). Tutors led students in chorally segmenting and blending the words, followed by another reading of the list with whisper reading.
In contrast to the Dual Treatment condition, in which words in decoding and high-frequency word instruction were practiced separately, word reading practice in each lesson of the Integrated condition involved practice with a mixed set of 34 words. This set included 12 words that interleaved six standard pronunciation words and six alternate pronunciation words with the target GPC (three in each set were practiced in the previous activity and three were new), five standard pronunciation and five alternate pronunciation words with the GPC targeted in the previous lesson (three words in each set were practiced previously and two were new), one new word containing the standard pronunciation and one new word with the alternate pronunciation of the GPC targeted two lessons ago, the five high-frequency words introduced in the current lesson, three high-frequency words targeted in the previous lesson, and two high-frequency words targeted two lessons ago. Words in each list were randomly ordered. Tutors led students in reading this list through echo reading, whisper reading, and individual student practice with the tutor. Tutors corrected errors and provided support by modeling decoding using larger letter units.
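The composition of the Integrated practice list can be checked with a short sketch. The category counts below are taken from the description above; the category labels, the function name, the seed, and the placeholder word pools are hypothetical, and the sketch does not model the distinction between previously practiced and new words.

```python
import random

# (category label, number of words) as described for the Integrated practice list.
INTEGRATED_LIST_SPEC = [
    ("target GPC, standard", 6),
    ("target GPC, alternate", 6),
    ("previous-lesson GPC, standard", 5),
    ("previous-lesson GPC, alternate", 5),
    ("GPC from two lessons ago, standard", 1),
    ("GPC from two lessons ago, alternate", 1),
    ("high-frequency, current lesson", 5),
    ("high-frequency, previous lesson", 3),
    ("high-frequency, two lessons ago", 2),
]


def build_practice_list(word_pools, spec=INTEGRATED_LIST_SPEC, seed=0):
    """Draw the required number of words from each pool, then interleave them randomly."""
    rng = random.Random(seed)
    words = []
    for category, count in spec:
        pool = word_pools[category]
        if len(pool) < count:
            raise ValueError(f"Not enough words for category: {category}")
        words.extend(rng.sample(pool, count))
    rng.shuffle(words)  # words in each list were randomly ordered
    return words


# Placeholder pools stand in for real word lists, which are not reproduced here.
placeholder_pools = {
    category: [f"{category} #{i}" for i in range(1, count + 1)]
    for category, count in INTEGRATED_LIST_SPEC
}
practice_list = build_practice_list(placeholder_pools)
total_words = sum(count for _, count in INTEGRATED_LIST_SPEC)
```

Summing the counts gives 24 decoding words plus 10 high-frequency words, which matches the 34 words per lesson reported for both conditions.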
Daily and Periodic Review in Both Conditions
In addition to the embedded review described in both conditions above, nine review lessons occurred in both conditions. In both conditions, practice involved reading words in lists through tutor-led practice, partner practice, and 1:1 practice with the tutor. In the Dual Treatment condition review lessons, students practiced reading a list of words containing taught GPCs; six previously taught words per GPC taught across the previous six to eight lessons were included in each review word list. They then practiced reading a separate list that contained all high-frequency words introduced since the last review lesson. Tutors provided corrective feedback by prompting students to use individual letter sounds to decode words, and for the high-frequency word list, modeled the reading of the words as a whole unit.
In the Integrated condition review lessons, students practiced reading two word lists; the lists were composed of words from the decoding and high-frequency word sets, but were split into two separate lists to promote opportunities for student success in reading smaller sets of words. Across the two lists, words were randomly ordered and included (a) all high-frequency words targeted across the six previous lessons, and (b) six previously introduced words for each spelling pattern targeted in the last six lessons (three standard pronunciation and three alternate pronunciations of the GPC). Consistent with the Integrated condition, tutors provided corrective feedback by prompting students to decode words using large letter units.
Selection of Words for Instruction and Practice, and Alignment With Condition
Although they were ordered differently, both intervention conditions used the same content in terms of GPCs and words targeted across the lessons. We selected GPCs that are commonly taught in phonics instruction programs (e.g., Wilson Fundations). A total of 36 GPCs were targeted in instruction across lessons.
A corpus of 902 words was created for the intervention. This included 697 words that contained target GPCs that were targeted in decoding instruction and practice and 205 high-frequency words that were targeted in high-frequency word instruction and practice. Words with target GPCs were selected using
Shared and Unique Words Across Conditions
A total of 902 words were targeted in each condition (i.e., included in decoding instruction activities, high-frequency word instruction, and practice activities). A core set of 477 words occurred in instruction or practice opportunities in both conditions, which included 272 words targeted in decoding instruction or practice, and 205 words targeted in high-frequency word instruction or practice (the same high-frequency words were targeted in both conditions). However, because the Integrated condition included instruction and practice with words in which GPCs and other letters did not conform to their most common sound, 239 words occurred in the Integrated condition that did not appear in the Dual Treatment condition. In contrast, decoding instruction and practice in the Dual Treatment condition only included words with standard pronunciations of the target GPCs and other letters. Therefore, decoding instruction in the Integrated condition included 239 variable pronunciation words and 458 standard pronunciation words (272 of which were also taught in the Dual Treatment condition), whereas decoding instruction in the Dual Treatment condition included 697 standard words.
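The word counts reported above can be reconciled with simple arithmetic. In this sketch the constants are the figures stated in the text; the two "unique" quantities at the end are derived here by subtraction and are not reported in the source, so they should be read as inferences that assume no overlap beyond the shared sets described.

```python
# Figures stated in the text.
HIGH_FREQUENCY_WORDS = 205      # same words in both conditions
DECODING_WORDS = 697            # decoding words per condition
SHARED_DECODING_WORDS = 272     # standard words taught in both conditions
INTEGRATED_VARIABLE = 239       # variable pronunciation words (Integrated only)
INTEGRATED_STANDARD = 458       # standard pronunciation words in Integrated

total_per_condition = DECODING_WORDS + HIGH_FREQUENCY_WORDS        # 902
core_shared = SHARED_DECODING_WORDS + HIGH_FREQUENCY_WORDS         # 477
integrated_decoding = INTEGRATED_VARIABLE + INTEGRATED_STANDARD    # 697

# Derived quantities (inferred, not reported in the source).
integrated_only_standard = INTEGRATED_STANDARD - SHARED_DECODING_WORDS  # 186
dual_only_standard = DECODING_WORDS - SHARED_DECODING_WORDS             # 425
unique_per_condition = total_per_condition - core_shared                # 425
```

Under these assumptions, each condition contained 425 words that the other did not; in the Integrated condition these would be the 239 variable pronunciation words plus 186 standard pronunciation words.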
Tutors and Training
Tutors were current or former educators with experience implementing reading interventions or working with elementary-aged students. Tutors received explicit instruction in GPCs and training for both conditions using scripted lesson plans. Training involved modeling of both conditions by the first and second authors while tutors observed, followed by guided practice, partner practice, and independent rehearsal of both conditions. Tutors then delivered the intervention to project leads, who evaluated their adherence to the lesson protocols.
Implementation Fidelity
Fidelity was measured through in-person and audio-recorded observations of intervention sessions. We used a 0-2 scale (i.e., 0 =
Measures
Norm-Referenced Measures
The SWE subtest from the TOWRE-2 is an individually administered test of word reading efficiency. Scores consist of the number of words read correctly in 45 seconds from a list that increases in difficulty. Test-retest reliability for elementary students exceeds .89 (Torgesen et al., 2012).
The Test of Silent Word Reading Fluency, Second Edition (TOSWRF-2; Mather et al., 2014) consists of a series of words arranged in rows without spaces between them (e.g., ballpooltreerunbead); students complete the measure silently by marking a slash between individual words. The measure is scored in terms of the number of correct slashes in 3 minutes. The mean test-retest reliability for elementary students is .89; correlations with word- and text-reading exceed .70 (Mather et al., 2014).
The TOSREC is a group-administered test of reading efficiency and comprehension. Students silently read a series of sentences and verify the truthfulness of as many sentences as possible within 3 minutes. Form O was administered at pretest and posttest. Alternate-form reliability exceeds .85 across all grades and forms (Johnson et al., 2011; Wagner et al., 2010).
We used the Oral Reading Fluency (ORF) subtest from the Woodcock Reading Mastery Test, Third Edition (WRMT-3; Woodcock, 2011). Students read two passages based on their grade level, but are administered a passage from a lower level of difficulty if they score below a predetermined cut point. Scores are calculated based on the number of words read correctly, divided by the number of seconds needed to read the passage. The quotient is multiplied by 10 and converted to a standard score. Split-half reliability exceeds .90 and test-retest reliability exceeds .76 for elementary grades (Woodcock, 2011).
Researcher-Developed Measures
The Proximal Word Reading (PWR) task, developed for this study, measured students’ ability to read 300 words that were either (a) taught in the interventions or (b) not included in the intervention content but contained spelling patterns targeted in both conditions. To mitigate test fatigue, the words were randomly divided into six test forms of 50 words each. The test forms were administered in a randomized order for each student across the pre- and posttest sessions. On each test form, students were asked to read the words aloud while the examiner scored words pronounced correctly and incorrectly and recorded the amount of time needed to read each list. All words in the PWR were randomly sampled from their respective sets and representation was as follows: (a)
Letter Combination Accuracy included 34 letters and letter combinations taught in both conditions, which were randomly ordered in a list. Students were asked to say the sound of each letter combination while the examiner scored correct and incorrect responses. Nonstandard pronunciations for letter combinations were accepted if the pronunciation occurred in printed English (e.g., for the
We also assessed students’ ability to spell words that included letter combinations targeted across the lessons. We selected 10 words that each contained multiple targeted GPCs (e.g., theater, chain, cashew). Students spelled each word on a sheet of lined paper. We scored the measure two ways: (a) the number of words students spelled correctly, and (b) the correct letter sequences metric, in which one point is awarded for every correctly sequenced pair of letters including the first and last letters, which allows for scores that are sensitive to partially correct spellings (e.g., six points are possible for the word
Analyses
We fit a multilevel model to estimate the impact of the treatment on student outcomes, controlling for baseline pretest scores and accounting for group-level variability. Students were treated as Level 1 units nested within groups at Level 2. Fixed effects included grand-mean-centered pretest scores and the treatment condition, coded as 0 for the Dual Treatment condition and 1 for the Integrated condition. Random effects captured group-level variation, enabling a more precise estimation of the treatment effect. However, for some outcomes, the multilevel model did not converge. We implemented a single-level model to ensure stable and reliable results in these cases. Given the relatively small number of schools in the sample (
where
Results
Pretest Equivalence
We used the What Works Clearinghouse (WWC) standard to evaluate baseline equivalence across the intervention conditions (What Works Clearinghouse, 2020). The WWC requires that quasi-experimental and randomized trials with high attrition or compromised random assignment demonstrate equivalence between groups at pretest. Equivalence is evaluated by comparing baseline differences measured in standardized effect size (ES) units, where ESs less than or equal to 0.05 satisfy the equivalence standard, ESs between 0.05 and 0.25 (including 0.25) require statistical adjustment, and ESs greater than 0.25 fail to satisfy the baseline equivalence standard. Descriptive statistics for pretest comparisons are reported in Table 3. None of the pretest group differences were statistically significant. Effect sizes ranged from 0.01 to 0.18, except for Oral Reading Fluency (
Table 3. Descriptive Statistics.
Posttest Analyses
Results of the regression analysis estimating the main effect of intervention are reported in Table 4. Effect sizes are summarized in Table 5. Given the small sample size, the pilot nature of this study, and the developmental purpose of the overarching project, we considered effect sizes, which controlled for pretest scores and effects of clustering, as the primary indices of the direction of intervention effects. In contrast to the rules of thumb that Cohen (1992) offered, contemporary perspectives on effect size interpretation in education and social sciences research are more empirically informed and consider study and contextual elements (Hill et al., 2008; Kraft, 2020; Schmitt et al., 2017). Kraft (2020), based on numerous aspects relevant to educational research, offered updated general guidelines that suggested that effect sizes below 0.05 are considered small, 0.05 to 0.20 are considered moderate, and above 0.20 are considered large.
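As an illustration of how these guidelines could be applied to a table of effect sizes, the sketch below labels an estimate by magnitude and direction. It follows the thresholds as worded above (so exactly 0.20 is treated as moderate); the function name and the example values are hypothetical and are not results from this study.

```python
def interpret_effect_size(effect_size):
    """Label an effect size using the thresholds described in the text.

    Positive values are taken to favor the Integrated condition and negative
    values the Dual Treatment condition; magnitude is judged on the absolute value.
    """
    magnitude = abs(effect_size)
    if magnitude < 0.05:
        label = "small"
    elif magnitude <= 0.20:
        label = "moderate"
    else:
        label = "large"

    if effect_size > 0:
        direction = "favors Integrated"
    elif effect_size < 0:
        direction = "favors Dual Treatment"
    else:
        direction = "no difference"
    return label, direction


# Hypothetical values for illustration only.
examples = {es: interpret_effect_size(es) for es in (0.03, 0.10, 0.35, -0.35)}
```

A label of this kind describes magnitude only; with a sample this small, the confidence interval around each estimate remains essential for interpretation.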
Table 4. Results of the Regression Analysis Estimating the Main Effect of Intervention.
Table 5. Effect Size Estimates With 95% Confidence Intervals for All Outcomes (Positive Effect Sizes Favor the Integrated Condition).
On the standardized measures, while controlling for pretest scores, statistically significant differences at posttest were observed favoring the Integrated condition on SWE (
Considering the researcher-developed measures, posttest group differences were not statistically significant but effect sizes favored the Integrated condition on the total number of correctly read words on the PWR (
We also examined group differences on different word categories on the PWR, which included “taught” (i.e., appeared in both conditions) and “untaught” (i.e., did not appear in either condition) high-frequency words, untaught standard pronunciation (i.e., words in which letters conformed to their most common sound), and untaught variable pronunciation (i.e., words in which some letters did not conform to their most common sound) items. Results are reported in Table 4, and effect sizes are reported in Table 5. Effect sizes favored the Integrated condition across these categories (
Discussion
Influenced by calls for reading interventions to better improve skills beyond pseudoword decoding (Compton et al., 2014), this pilot experiment was part of the ongoing development of an intervention to enhance generalized word- and text-reading efficiency for elementary students with significant word-reading difficulties. Instruction compared a Dual Treatment condition to an Integrated condition that included exposure to variability in spelling-sound correspondence, attending to larger letter units, and alignment of high-frequency words with decoding instruction. Although students in both conditions improved on most measures from pretest to posttest, most effect sizes favored students in groups randomly assigned to the Integrated condition while adjusting for clustering and controlling for pretest performance.
At posttest, statistically significant group differences (and the largest effect sizes) favored the Integrated condition on standardized tests of word reading efficiency (TOWRE-2 SWE) and word recognition efficiency (TOSWRF-2). Nonstatistically significant effect sizes also favored the Integrated condition on a test of silent reading efficiency (TOSREC); however, examination of the data indicated a slight decline in the mean grade-based standard score for the Dual Treatment condition, whereas the pre- and posttest standard scores for the Integrated condition remained equivalent.
On researcher-developed measures, effect sizes favored the Integrated condition in associating sounds with the GPCs targeted in the interventions, even though all were taught in both conditions. Effect sizes also favored the Integrated condition on words in the PWR total score, including subscales for words taught in both conditions
Overall, these preliminary results suggest that the makeup of the Integrated condition may have benefited not only students’ ability to read content targeted in the intervention but also their generalized word reading skills, as evidenced by their performance on standardized and researcher-developed measures. Students in both conditions were equally exposed to a large number of words; however, students in the Integrated condition were exposed to much greater variability in spelling-sound correspondence.
The Integrated condition was designed to expose students to more diversity in spelling-sound correspondences by interleaving alternative pronunciations among standard pronunciations and to promote greater attention to larger letter units. In doing so, it is possible that this condition may have resulted in greater attention to word spellings, which may have (a) enhanced orthographic-phonological connections for words included in the intervention, and (b) resulted in a greater likelihood of recognizing statistical regularities in spelling patterns that promoted generalized word reading improvement. Statistical learning does not appear to develop quickly (Treiman & Kessler, 2022), and as noted earlier, we are doubtful that a domain-general ability to learn implicitly can be “trained” specifically for reading. A more reasonable direction, which we pursued in this study, is the possibility that explicit phonics instruction can be embedded within environments (Rueckl, 2010) that facilitate statistical learning. One way to establish this type of environment might be to teach and provide abundant opportunities to practice reading words (with immediate feedback) that reflect variability in spelling-sound correspondence, as we did in the Integrated condition. We can only speculate on whether the Integrated condition enabled statistical learning more so than the Dual Treatment condition, but the design of the Integrated condition was similar to how scholars have suggested that instruction might foster perception of probabilistic regularities across stimuli (e.g., Foorman, 1994; Treiman & Kessler, 2022); that is, by exposing students to variations in spelling-sound correspondence and instruction in letters that commonly co-occur.
Outcomes did not favor the Integrated condition on all measures, and one of those was the standardized ORF measure. This may be another example of the tendency for interventions to have weaker effects on text-reading fluency (Hall et al., 2023; Torgesen, 2006), but it is also possible that the lack of effect was due to issues with the measure. The ORF subtest from the WRMT-3 involves administering two passages based on the student’s grade level, but if students in grades 3 or 4 score below a minimum score on the first passage, a passage from a lower grade level is administered as the second passage. Standard scores can only be derived from two passages. In this study, 17 students saw an easier passage at pretest or posttest, while the other students saw the same two passages on both occasions. This approach is acceptable in an individual evaluation of students at a single point in time (i.e., the primary purpose of the WRMT-3), but is problematic for use on a repeated basis, as in pre- and posttest administration.
In addition, effect sizes favored the Dual Treatment condition at posttest on the spelling measure scored for correct spelling sequences. Instruction in the Dual Treatment condition, by focusing on consistency in spelling-sound correspondence and letter units, may have fostered a stronger ability to recall letter sequences from target words. Interestingly, this was the only occasion in which students in the Dual Treatment condition outperformed students in the Integrated condition; even in the case of the letter combination-sound accuracy measure, students in the Integrated condition outperformed the Dual Treatment condition in the recall of the sounds of printed letters. Although spelling and decoding rely on a foundation of common skills (Ehri, 2000), spelling words is more difficult than reading them across ability levels because correct spelling requires the recall of precise orthographic representations. This precision may have been more prevalent in the Dual Treatment condition with its focus on small grain sizes and consistency in GPC. However, the precision that enables spelling accuracy is not imperative for skilled word reading, given that even highly proficient readers are prone to spelling mistakes and may require significant cognitive effort to spell words they can recognize effortlessly in print. Therefore, the greater variability in the Integrated condition may have benefited students’ word reading skills but less so their recall of word spellings.
Limitations
One limitation of this study is the absence of a no-treatment control group. Our randomized two-group comparison design provided a more stringent test than comparing the Integrated condition to business-as-usual instruction, but we lacked data on students’ improvement relative to those who did not receive either of the interventions.
Second, this was a small and underpowered pilot study, which is likely the reason for the lack of statistical significance in most of the group comparisons. Mean scores in both conditions improved from pretest to posttest on most measures. Although effect sizes mostly favored the Integrated condition, conclusions regarding the benefits of the tested intervention elements await larger studies.
Third, the multilevel model did not converge for all outcomes. This may be due to limited data or small cluster sizes, which can affect the precision of some estimates. Future studies with larger samples may help address these modeling challenges.
Implications for Research and Intervention Development
The Integrated condition differed from the Dual Treatment condition in several ways, including variability in spelling-sound pronunciations, prompting decoding with larger letter units, and the approach to teaching high-frequency words. We considered these elements collectively as ways in which intervention may be aligned (or misaligned) with connectionist and statistical learning perspectives. Our results suggest that something about the Integrated condition benefited word learning and generalization; however, we cannot draw conclusions regarding which specific elements of the Integrated condition were more effective. Although the findings do not permit making specific practical recommendations, they raise worthwhile questions regarding how interventions for students with significant word reading difficulties may be improved.
First, our intervention provided nearly no instruction in spelling or syllable “rules,” as is common in many interventions designed for students with dyslexia. We provided explicit instruction and feedback in letter-sound correspondence and decoding, but this was predominantly in the context of opportunities to read many words. This was true of both conditions; however, the Dual Treatment condition was designed in a way that resembled rule instruction to an extent by emphasizing that “[letter] usually says [sound].” Kearns (2020) raised concerns about instruction that teaches students to memorize syllable division rules due to how inconsistently they actually apply. The same types of questions can be asked about the utility of teaching students to recite spelling rules, particularly for students with WLRD. It is possible that significant instruction in rules is inferior to providing students with abundant opportunities to read various types of words, accompanied by immediate, affirmative or corrective feedback on their pronunciations, which reinforces flexibility in spelling-sound correspondence. Such an approach may better engage the brain’s propensity to learn spelling-sound correspondence through exposure to letter position within words and surrounding letters that influence their pronunciation. Future studies might contrast instruction that emphasizes learning spelling rules with instruction in which rule instruction is minimized in favor of extensive and varied word reading practice.
Second, there is value in further investigating when and how to expose students to the variability present in a semi-transparent orthography like English. Although most programs do not avoid phonetically irregular words, many treat such words differently and separate them instructionally, referring to them as “tricky words” or “heart words” (i.e., an irregularly pronounced portion of a word must be memorized “by heart”), or suggest they need to be memorized as whole units rather than attacked through letter-sound correspondence. In contrast to this view, the Integrated condition in this study exposed students to the variation in spelling-sound correspondence from the beginning of each lesson, reminding students that “we have to be flexible decoders because sometimes [letters] say [alternate pronunciation],” and then immediately provided instruction and practice opportunities in words with both standard and alternative pronunciations. Our approach to the Integrated condition was influenced by the connectionist perspective, which argues that there are not different “types” of words, but rather that all words exist on a continuum of spelling-sound consistency (Seidenberg et al., 2022). Prior research indicates that there is no superior way of teaching students to read irregular words (Colenbrander et al., 2020). Some have suggested that students should be taught to attack such words like regular words with decoding support from a teacher (e.g., Carnine et al., 2004). Colenbrander et al. (2022) compared methods for teaching typically-developing kindergarten students to read irregular words and found that strategies that involved sounding out and adjusting pronunciations (i.e., like in the Integrated condition), or opportunities to spell the words, were superior to “look-say” methods of whole word reading.
Clearly, there remains a need to continue investigating instructional approaches for students with WLRD. Although there is theoretical and empirical support for teaching synthetic phonics (i.e., “sounding out”) as a foundation of decoding, research, including studies of students with reading difficulties, has not clearly established its superiority over analytic phonics, or superiority of any one approach including phonemic/segmental, analogy/onset-rime, or whole-word approaches over another (Castles et al., 2018; Levy et al., 1997; Lovett et al., 1990; McArthur et al., 2015; Shapiro & Solity, 2016; Wise, 1992; Wise et al., 2000). There is a need for continued research on the potential benefit of teaching students a combination of strategies for reading words (e.g., Castles et al., 2018; Hart et al., 1997), and how to flexibly apply them (e.g., Lovett et al., 2017), that may foster stronger spelling-sound connections and greater skill generalization.
Third, it is possible that historically, interventions have been too cautious in introducing spelling-sound variability, opting for consistency in GPC in efforts to increase students’ rates of success. This notion has long been present in interventions and supports for students with disabilities, aiming to maintain motivation and engagement. However, research has revealed the benefits of “desirable difficulties” (Bjork & Bjork, 2011) for human learning, such as intentionally introducing more variation in stimuli (e.g., through interleaved practice), varying practice conditions, and using testing as a learning tool. Missing from this work, however, are studies investigating if and how desirable difficulties benefit the word reading skills of students with or at risk for WLRD. In our Integrated condition, it is possible that the interleaving of standard and variable pronunciations from the start of each lesson created a desirable difficulty and required greater attention to the orthographic forms of the words. But our conclusions remain speculative. It is also worth noting that, although treatment acceptability was not part of the research questions in this study, to inform the overall project we asked students the extent to which they “liked” the intervention and believed it helped them “read better,” using a 0 to 2 scale (0 =
We strongly recommend further investigation into whether there is a benefit in carefully introducing desirable difficulties that may slow acquisition but could result in more durable and generalizable skills. These studies should also use delayed posttests to evaluate long-term retention, as research on desirable difficulties has demonstrated stronger retention of taught content compared with better short-term retention of the same content under less demanding conditions (Bjork & Bjork, 2011). Studies investigating interleaved practice and other desirable difficulties have not typically involved early literacy skills or included students with WLRD and disabilities (Richter et al., 2022), underscoring the need for further research in this area.
Finally, subsequent studies investigating these (and other) intervention elements should consider other experimental designs that support causal inferences. Obviously, there are advantages to studies with larger samples, including greater statistical power and the ability to test multiple treatment arms and control conditions. However, larger studies require greater resources and more intensive commitments from school partners in a climate where obtaining funding is increasingly difficult and schools are facing greater restrictions on their ability to support large studies. To advance research on intervention innovations in this context, researchers might consider quasi-experimental designs that account for maturation using additional pretest/posttest occasions, interrupted time series designs, or adding nonequivalent dependent variables, and consider underutilized randomized experimental designs such as factorial designs which are advantageous for testing combinations of treatments or elements (Shadish et al., 2002).
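As an illustration of how a factorial design could isolate individual elements, the sketch below enumerates the cells of a hypothetical 2 × 2 × 2 design that crosses the three elements on which the conditions differed. Treating each element as a two-level factor is an assumption made only for this example, as are the factor names, level labels, and helper functions; it does not describe the design of the present study.

```python
from itertools import product

# Hypothetical two-level factors based on the three elements named in the text.
# The labels are illustrative assumptions, not the study's actual conditions.
FACTORS = {
    "pronunciation_variability": ("consistent", "interleaved"),
    "letter_unit_size": ("small", "large"),
    "high_frequency_words": ("separate", "aligned_with_decoding"),
}


def factorial_cells(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def main_effect_groups(cells, factor):
    """Group cell indices by the level of one factor (for a main-effect contrast)."""
    groups = {}
    for index, cell in enumerate(cells):
        groups.setdefault(cell[factor], []).append(index)
    return groups


cells = factorial_cells(FACTORS)
unit_size_groups = main_effect_groups(cells, "letter_unit_size")
```

Here `cells` holds the eight treatment combinations, and `unit_size_groups` splits them into two sets of four, so every participant would contribute to the estimate of each element's main effect. This property is what makes factorial designs comparatively efficient when samples are limited.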
Conclusion
This pilot study suggested advantages of an intervention that exposed students to greater variability in spelling-sound correspondence, attended to larger GPC units, and aligned high-frequency word instruction with decoding instruction for elementary students with significant word reading difficulties. Our findings suggest there may be approaches that better expose students with WLRD to the variation present in semi-transparent orthographies, and have implications for subsequent studies that investigate instructional contexts that, combined with explicit instruction, take advantage of the brain’s ability to learn through frequency and probabilistic patterns.
Supplemental Material
sj-docx-1-ldx-10.1177_00222194261417589 – Supplemental material for A Pilot Study Examining Elements to Improve Generalized Word Reading Skills for Students With Significant Word Reading Difficulties
Supplemental material, sj-docx-1-ldx-10.1177_00222194261417589 for A Pilot Study Examining Elements to Improve Generalized Word Reading Skills for Students With Significant Word Reading Difficulties by Nathan H. Clemens, Alexis N. Boucher, Sharon Vaughn, Marcia A. Barnes, Greg Roberts, Anna-Mari Fall, J.E. Miller, Nancy Scammacca and Megan Osbon in Journal of Learning Disabilities
Footnotes
Acknowledgements
The opinions expressed in this article are those of the authors and do not represent the views of the Institute of Education Sciences or the U.S. Department of Education.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
This study was supported by a grant from the U.S. Department of Education, Institute of Education Sciences, Grant #R324A200209.
