Abstract
Dynamic assessments (DAs) of word reading skills demonstrate strong criterion reference validity with word reading measures (WRMs). However, DAs vary in the skills they assess, their format and administration method, and the types of words and symbols used in test items. These characteristics may have implications for assessment validity. To compare the validity of DAs of word reading skills on these factors of interest, a systematic review of five databases and the gray literature was conducted. We identified 35 studies that met the inclusion criteria of evaluating participants aged 4 to 10, using a DA of word reading skills, and reporting a Pearson’s correlation coefficient as an effect size. A random effects meta-analysis with robust variance estimation and subgroup analyses by DA characteristics was conducted. There were no significant differences in mean effect size based on administration method (computer vs. in-person) or symbol type (familiar vs. novel). However, DAs that evaluate phonological awareness or decoding (vs. sound-symbol knowledge), those that use a graduated prompt format (vs. test-teach-retest), and DAs that use nonwords (vs. real words) demonstrated significantly stronger correlations with WRMs. These results inform the selection of DAs in clinical and research settings and the development of novel, valid DAs of word reading skills.
Introduction
Literacy is a complex construct requiring the integration of multiple skills. Simply put, it can be described as the product of the ability to recognize or decode words and to comprehend language (Hoover & Gough, 1990). In this review, we define word reading skills as the subskills that comprise word recognition ability (e.g., Scarborough, 2001). These subskills (phonological awareness, knowledge of the alphabetic principle or sound-symbol knowledge, decoding, and sight word recognition) have been consistently found to be among the strongest and most accurate predictors of reading ability for young children (e.g., Catts et al., 2005; Ehri, 1998; Hogan et al., 2005). In speech-language pathology (SLP), psychology, and education, most word reading tools employ a static assessment (SA) paradigm (e.g., Phonological Awareness Test-2 [PAT-2:NU], Robertson & Salter, 2017; Woodcock Reading Mastery Test-III [WRMT-III], Woodcock, 2011). In SA, the examiner evaluates a child’s acquired knowledge in a given domain without providing prompts or feedback and then compares their performance to that of their peers (Grigorenko & Sternberg, 1998). For example, in a static decoding task, a child would be presented with a word (e.g., pup) and asked to read it. If the child struggled or made an error, their response would be marked as incorrect, and the examiner would continue to the next item.
Children from diverse linguistic backgrounds or those with fewer literacy experiences are prone to perform poorly on SAs because they have limited or different acquired knowledge compared to the English monolingual children for whom the tests are designed (Bedore & Peña, 2008; Ginsborg, 2006). In our previous example of a static decoding task, a bilingual child might struggle to read the word “pup” because of a lack of familiarity with the vocabulary term, or with the English letters or sounds. When many children underperform, floor effects arise, making it difficult to discern those who are truly at risk from those who have had insufficient linguistic or educational experiences. This can result in failure to identify word reading difficulties early (Catts et al., 2009).
Given these limitations, interest in alternative approaches, like dynamic assessments (DAs), has been increasing. While SAs measure a child’s acquired skills, DAs examine a child’s ability to learn a skill with support in the form of teaching, feedback, and prompting from the examiner within the test (Grigorenko & Sternberg, 1998). This approach reduces bias and misidentification of difficulty because the impact of previous linguistic or literacy experiences on test outcomes is minimized (Bedore & Peña, 2008; Petersen & Gillam, 2013). Reviews that have evaluated the use of DAs report promising findings on their utility and validity. DAs demonstrate greater predictive validity than SAs across several domains (e.g., DAs of cognitive ability, literacy, and mathematics; Caffrey et al., 2008). DAs of word reading can predict unique variance in later reading ability beyond SAs (Dixon, Oxley, Gellert, & Nash, 2023), can contribute to the accurate identification of reading difficulties (Dixon, Oxley, Nash, & Gellert, 2023), and demonstrate consistent validity with word reading outcome measures across typically developing, at-risk, bilingual, and monolingual children (Wood et al., 2024).
However, these DAs are also characterized by heterogeneity in the word reading skills they assess, their format and administration method, and the word and symbol types they use. Previous reviews have explored the impact of these factors in the domain of static assessment. For example, the factor of administration method (virtual vs. in-person) was considered by Alfano et al. (2024), who found that across the seven included studies there were no significant differences between online and in-person administration of pediatric language and literacy assessments. Outcomes such as these support the development of novel SA tools that can be administered online. Whether administration method or any of the other characteristics described below affects the validity of DAs has not yet been considered.
Research Aim
In this meta-analysis, we directly examine characteristics of DAs to determine which types of assessments, if any, are superior to others in terms of their criterion reference validity with word reading measures, across the five factors discussed below (word reading skill type, format, administration method, word type, and symbol type). Outcomes of the current meta-analyses have implications for the revision of existing assessments and the development of novel DAs.
Word Reading Skill Type
DAs have been designed to evaluate various skills associated with literacy development and ability, including decoding (e.g., Cho et al., 2017), phonological awareness (e.g., Gellert & Elbro, 2017b), sound-symbol knowledge (e.g., Clayton et al., 2018), morphological awareness (e.g., Navarro et al., 2018), expressive vocabulary development (Peña et al., 2001), oral narratives (Peña et al., 2014), reading comprehension (e.g., Gruhn et al., 2020), and working memory (e.g., Swanson, 1994). As stated, in this review we focus exclusively on DAs that evaluate the word reading skills of decoding, phonological awareness, or sound-symbol knowledge, because these skills have consistently been found to be among the strongest predictors of word reading ability (e.g., National Early Literacy Panel, 2008).
Historically, research on the validity of static measures of these word reading skills has found that all three correlate strongly with later word reading ability in alphabetic languages. The National Early Literacy Panel (2008), which included nearly 300 studies, and a meta-analysis of 60 effect sizes by Elbro and Scarborough (2003) both documented strong correlations between kindergarten letter(-sound) knowledge (
Rationale for analysis: In the realm of DA, there have been fewer systematic examinations of the capacity of these skills to predict later word reading. A recent systematic review of 18 studies found that, across studies, DAs of phonological awareness and decoding predicted between 1% and 21% additional unique variance in later word reading ability beyond traditional static measures, but that a DA of paired associate learning (a task akin to learning sound-symbol knowledge) accounted for only 6% unique variance (Dixon, Oxley, Gellert, & Nash, 2023), suggesting a different pattern of prediction between DAs and SAs. Given that DAs evaluate ability to learn rather than acquired knowledge, it may be that assessments that evaluate more complex skills better permit children to demonstrate their ability to learn in the context of DA. Simple sound-symbol knowledge tasks require that a child learn the name that corresponds to a symbol. However, complex phonological awareness tasks like phoneme substitution require that a child identify a sound, delete it, replace it with a new sound, and blend the new sounds together to form the new word. Complex decoding tasks require integration of both sound-symbol knowledge and phonological awareness skills. These more complex tasks might be better suited to capturing learning potential in the context of DA. Given this,
Format
DAs come in many formats, but there are two primary approaches (Lantolf & Poehner, 2004). Interactionist DA is unscripted and endeavors to modify cognitive or skill ability. The examiner responds contingently to the individual examinee and their capacities. Interventionist DA, however, more closely parallels SA. The examiner provides pre-defined levels of support in response to student performance. Its scripted nature requires less clinical skill and time to administer, and its standardization permits evaluation of its psychometric validity (Poehner, 2008). Interventionist DAs across cognitive domains demonstrate stronger predictive validity than interactionist DAs (Caffrey et al., 2008). In the field of word reading assessment, DAs are generally characterized as interventionist. The studies included in this review focus on two common formats of interventionist DA.
The first, referred to as the (test)/teach/retest (TT) format in this paper, consists of a static pre-test, followed by a dynamic teaching phase, and a static re-test (Budoff, 1987). During the teaching phase, children receive feedback and instruction. Not all assessments incorporate the initial static pre-test. If one is conducted, post-test performance is compared to the initial score to assess the difference in performance following teaching. When no pre-test is administered, the post-test measures how a child performs after receiving explicit dynamic instruction in a task.
The second approach, referred to as the graduated prompts (GP) format in this paper, combines the teaching and testing phases of the assessment within each item (Brown & Ferrara, 1985). Children are provided with feedback regarding their response. If the response is incorrect, a hierarchy of increasingly explicit prompts is provided until the child answers correctly or all prompts are exhausted. The greater the number of prompts required, the lower the score on an item (Brown & Ferrara, 1985). A previous review suggested that there are no differences in classification accuracy of reading disorder for DAs that use a GP versus TT format and that both formats are used with similar frequency in assessment of word reading (Dixon, Oxley, Nash, & Gellert, 2023).
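To make this scoring logic concrete, the following sketch illustrates one way a graduated-prompts item could be scored; the function name, the three-prompt hierarchy, and the 0–3 point scale are illustrative assumptions rather than the scheme of any particular included study.

```r
# Hypothetical graduated-prompts (GP) item scoring: the fewer prompts a child
# needs before responding correctly, the higher the item score. A child who
# never responds correctly (all prompts exhausted) receives the minimum score.
# The 3-prompt hierarchy and 0-3 point scale are illustrative assumptions.
score_gp_item <- function(prompts_used, answered_correctly, max_prompts = 3) {
  stopifnot(prompts_used >= 0, prompts_used <= max_prompts)
  if (!answered_correctly) {
    return(0)                 # all prompts exhausted without a correct answer
  }
  max_prompts - prompts_used  # e.g., 3 = correct with no prompting
}

score_gp_item(0, TRUE)   # 3: immediate correct response
score_gp_item(2, TRUE)   # 1: correct after two prompts
score_gp_item(3, FALSE)  # 0: prompts exhausted
```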
Rationale for analysis: Previous research has found that noncontingent DAs correlate more strongly with word reading outcomes than contingent ones (Caffrey et al., 2008).
Administration Method
Assessments, dynamic or otherwise, can be conducted in person or via computer. For example, many static tests originally developed for in-person use are now available through the computer-based platforms Q-global and Q-interactive (e.g., the WRMT-II test of decoding), and dynamic tests are being developed for both in-person (e.g., Gellert & Elbro, 2017a) and computer use (e.g., Aravena et al., 2013). Development of virtual or computer-based assessments has become increasingly important in the wake of the COVID-19 pandemic and the subsequent shift to distance learning (Campbell & Goldstein, 2022).
Rationale for analysis: Post-pandemic, many clinicians and researchers continue to operate virtually, and therefore the factor of administration method and its implications for validity should be considered. No significant differences have been found between administering an SA online versus in person (Alfano et al., 2024), but this factor has not been considered in the context of DAs, which can be administered in person (e.g., Spector, 1992), virtually by an examiner (e.g., Barker & Saunders, 2020), or in a computer program where no examiner is required (e.g., Aravena et al., 2018). A recent review found that computerized DAs were used less frequently than in-person measures (Dixon, Oxley, Nash, & Gellert, 2023), but the implications of administration method for the validity of DAs have not yet been considered quantitatively.
Word Type
Assessments of the word reading skills of phonological awareness and decoding use either real words or nonwords, which are made-up words that abide by the language’s phonotactic and orthotactic constraints (e.g., “meeb” in English). Commercially developed SA tools such as the Comprehensive Test of Phonological Processing-2 (CTOPP-2; Wagner et al., 2013), which evaluates phonological awareness, or the Woodcock Reading Mastery Test – Third Edition (WRMT-III; Woodcock, 2011), which evaluates decoding, include subtests with both words and nonwords. Some DAs, like the CUBED-3 dynamic test of decoding, include both a word reading and a nonword decoding measure (Petersen et al., 2016). However, many DAs have fewer subtests and tend to employ either words or nonwords. For example, Gellert and Elbro’s dynamic phoneme identification task uses real words (Gellert & Elbro, 2017b), but their dynamic decoding measure uses nonwords (Gellert & Elbro, 2017a).
Rationale for analysis: Reading words and nonwords is purported to tap into different processes (Shapiro et al., 2013). Children may initially recognize familiar words by sight without activating their knowledge of sound-symbol correspondences, phoneme blending, and decoding skills (Ehri & Wilce, 1985), for example, recognizing their name or a high-frequency word like “the” in print. However, when reading nonwords, decoding skills are necessary because these words are unfamiliar (Hoover & Tunmer, 1993). Similarly, word and nonword phonological awareness tasks may activate different skills. When real words are used, performance may be impacted by acquired vocabulary knowledge, while nonword tasks may provide a purer measure of phonological ability (Wagner et al., 2013). In the domain of oral language assessment, nonword repetition tasks have been shown to reduce bias against culturally and linguistically diverse children (Ortiz, 2021). Nonword decoding and phonological awareness tasks may similarly reduce bias against those with different or limited literacy experiences in assessments of word reading skills. Children enter kindergarten with a wide range of language and literacy abilities that can be attributed to linguistic diversity, their home literacy environment, access to books and libraries, or exposure to literacy instruction in preschool (Ackerman & Barnett, 2005). Importantly, nonword tasks do not disadvantage strong readers with advanced lexical knowledge (Castles et al., 2018), and they can account for significant unique variance in word reading ability beyond real word reading (e.g., Hogan et al., 2005). To date, no studies have considered the role of word type in the validity of DAs of word reading skills.
Symbol Type
Word reading assessments of sound-symbol knowledge (SSK) and decoding use either familiar or novel symbols. Typically, SAs use the letters or characters of the language for which they were created. For instance, in the PAT-2 (Robertson & Salter, 2017), the phoneme-grapheme subtest (a measure of sound-symbol knowledge) evaluates a child’s acquired knowledge of the relationship between familiar English letters and sounds, and the phoneme decoding subtest evaluates ability to read nonwords comprised of English graphemes. Recently, there has been increased interest in using novel symbols in DAs, as this permits evaluation of how well a child can learn new symbol-sound relationships (e.g., that a given novel symbol corresponds to the sound /m/; Gellert & Elbro, 2017a) and apply this knowledge to decode symbol-based words (e.g., that a sequence of novel symbols corresponds to the nonword /ma/; Gellert & Elbro, 2017a), while minimizing the influence of previous linguistic and literacy exposure.
Rationale for analysis: No prior reviews have examined whether symbol type (novel vs. familiar) affects the strength of DAs’ correlations with word reading ability. Primary studies suggest that DAs that use novel symbols can differentiate between typical readers and those with dyslexia (Aravena et al., 2013, 2018). These measures can explain unique variance in later reading ability beyond traditional measures for preliterate children (Horbach et al., 2015). When administered in kindergarten to predict ability in grade 1, a DA decoding measure that used novel symbols (Gellert & Elbro, 2017a) had superior diagnostic accuracy to one that used familiar letters (Petersen et al., 2016). Use of novel symbols is a recent development in the field of word reading assessment, and there has not yet been a systematic quantitative examination of the relative validity of these two approaches.
The Current Study
The current study investigates whether the characteristics of word reading skill, format, administration method, word type, and symbol type affect DAs’ validity, as measured by association with performance on word reading measures. We examine criterion validity, represented by the correlation between performance on a DA of word reading skills (phonological awareness, sound-symbol knowledge, or decoding) and a word reading measure (single real word or nonword accuracy or fluency). Like Caffrey et al. (2008), we use Pearson’s correlation coefficients as our effect size, given that these are the most commonly reported type of effect size across studies. We focus exclusively on DAs of word reading skills as they are best suited to evaluate and predict reading ability in our target demographic of children who are learning to read (Catts et al., 2005). We examine overall validity and stratify DAs into subgroups by word reading skill type (phonological awareness, sound-symbol knowledge, vs. decoding), format (graduated prompts vs. test-teach-retest), administration method (in-person vs. via computer), word type (real word vs. nonword), and symbol type (familiar vs. novel). We also conduct a comprehensive search of the gray literature and include studies published in languages other than English.
Overall rationale: Outcomes of this review will inform which characteristics of DAs of word reading skills are associated with the greatest criterion reference validity, as measured by the strength of correlation between performance on DAs and word reading measures. For clinicians, outcomes will provide insight into which DA measures are appropriate for use in their practice (e.g., is it preferable to use a DA that evaluates decoding or phonological awareness, one that uses nonwords or real words? Is it suitable to use computerized DAs, or are in-person DAs superior?). For researchers, a quantitative examination of how these factors affect the validity of DAs of word reading skills can inform revisions of existing measures or development of new tools. This can be achieved by modifying or developing tests with the characteristics shown to be most strongly associated with performance on word reading measures (e.g., when designing new DAs, which factors are important to consider?).
Research Questions and Hypotheses
Do the following five factors have implications for the criterion reference validity of dynamic assessments of word reading skills, as measured by the strength of correlation between performance on DAs and performance on WRMs?
1. Word reading skill type: We hypothesize that performance on DAs of decoding and phonological awareness will be more strongly correlated with performance on WRMs, relative to DAs of SSK, because we anticipate that these more complex tasks might be better suited to capturing learning potential in the context of DA.
2. Format: We hypothesize that performance on DAs that use a graduated prompts format will be more strongly correlated with performance on WRMs, relative to DAs that use a test-teach-retest format. The GP format is highly explicit and structured, while the TT format allows for greater flexibility in response to the student in the training/teaching portion, and previous work has found that more explicit approaches demonstrate greater criterion and predictive validity with outcome measures (Caffrey et al., 2008).
3. Administration method: We hypothesize that performance on DAs that are administered in person will be more strongly correlated with performance on WRMs, relative to DAs that are administered via computer. DA is characterized by increased interaction between examiner and examinee, and so it may be impacted to a greater extent by computer administration than static assessment, which is typically scripted and in which the examiner acts as an objective observer rather than an interactive participant.
4. Word type: We hypothesize that performance on DAs that use nonwords will be more strongly correlated with performance on WRMs, relative to DAs that use real words. Nonwords are unfamiliar to all children and may therefore be better suited to evaluate a child’s ability to learn decoding skills in assessment, whereas real words may already be known to children and therefore would not allow them to showcase their “learning” in the test.
5. Symbol type: We hypothesize that performance on DAs that use novel symbols will be more strongly correlated with performance on WRMs, relative to DAs that use familiar symbols. Novel symbols are unfamiliar to all children and may therefore be better suited to evaluate a child’s ability to learn sound-symbol correspondences, whereas familiar symbols (letters of the child’s own alphabet) may already be known and therefore would not allow children to showcase their “learning” in assessment.
Method
Protocol Availability
The review objectives and meta-analytic approach were planned a priori and detailed in a registered protocol on the Open Science Framework (Wood & Molnar, 2023).
Ethics Statement
Ethics approval was not required for this meta-analysis given that all data were collected from studies that are publicly accessible (University of Toronto, n.d.).
Eligibility Criteria
Study inclusion criteria were determined a priori and outlined in the review protocol (Wood & Molnar, 2023). Included studies are:
(i) Primary research articles found in peer-reviewed journals, or unpublished gray literature such as Master’s or Doctoral theses found in preprint repositories and on Google Scholar.
(ii) Studies that assessed children with a mean age between 4;0 and 10;0 who were monolingual or bi/multilingual, typically developing, at-risk for reading or diagnosed with a reading difficulty. Articles that included adults or children with other developmental challenges, such as hearing impairment, developmental language disorder, or autism spectrum disorder were excluded.
(iii) Articles that reported a correlation coefficient between a DA of a word reading skill, and a word reading measure, concurrently or longitudinally.
(iv) No limitation was placed on setting or location, but only articles written in English, French, Spanish, or a different language with full text translation to one of these languages were included.
Search Strategy and Information Sources
An initial search was carried out in five databases, MEDLINE, Embase, CINAHL (Cumulative Index to Nursing and Allied Health Literature), PsycINFO, and ERIC (Education Resources Information Centre), using the terms “dynamic assessment” and “literacy,” as well as their related keywords, in titles and abstracts. The search strategy was developed in consultation with a University of Toronto librarian. A complete list of the search terms used in each database can be found in Tables S1 and S2 of the Supplemental files. No filters were used. The equivalent terms “dynamic assessment” and “literacy” were searched in three preprint repositories: medRxiv, EdArXiv, and PsyArXiv. Forward searching was then completed on Google Scholar using the “cited by” function with included articles. To check whether any relevant articles were missed during the database, preprint, and Google Scholar searches, the reference lists of the included articles were reviewed and compared to the list of included articles. Finally, appeals for unpublished work were made via social media callouts, posts to mailing lists, and direct emails to labs across Canada, the United States, and Europe that reported conducting research in the field of literacy.
Study Selection Reliability
Study selection and data extraction were managed in Covidence (2023), a web-based software platform that facilitates completion of reviews. A team of 10 research assistants (RAs) assisted in title/abstract screening and full text review. At the title/abstract stage, RAs received a 1-hr training session covering key concepts and relevant terms (e.g., defining dynamic assessment and each word reading skill) and subsequently completed 100 practice title/abstract screenings on a mock review prior to screening in earnest. At this stage, two independent team members voted to include or exclude each record based on whether the title and abstract indicated that the paper evaluated a word reading skill DA. Across all pairs of reviewers, the weighted mean agreement was 94% and the Cohen’s kappa coefficient was .40, which is characterized as fair (McHugh, 2012). At the full text stage, RAs again received a 1-hr training session led by the first author detailing specific eligibility criteria (e.g., reviewing whether a word reading measure was included, whether the age group was correct, etc.). Each RA completed a practice full text review of a paper with feedback from the first author. Two independent reviewers then voted to include or exclude full texts based on whether they met the pre-defined eligibility criteria. At this stage, interrater agreement across pairs was 85% and the weighted mean Cohen’s kappa coefficient was .66, which is considered substantial (McHugh, 2012).
As demonstrated in Figure 1, 24 of the 4,824 records identified via the database search were relevant and included. A search of three preprint repositories yielded 850 articles, of which one was included. Forward searching of these 25 included articles via Google Scholar led to the identification of an additional 9 studies. The reference lists of the 34 articles were then reviewed to determine whether any relevant articles had been missed, and one additional study was identified through this process. Finally, callouts for unpublished studies or data were made to mailing lists, via posts to social media, and by directly contacting labs conducting literacy-related research across Canada, the United States, and Europe, but this did not lead to the identification of any additional relevant articles. In summary, 35 articles met the criteria for inclusion. The study identification process, including reasons for study exclusion (e.g., no use of a dynamic assessment of one of the three word reading skills, as in Navarro et al., 2018), is outlined in the PRISMA diagram below (Page et al., 2021).

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart.
Coding Data Items
Data from relevant articles were extracted using a custom template in Covidence, which is available on the Open Science Framework protocol site (Wood & Molnar, 2023). The first and second authors both extracted data from all articles and then compared and consolidated their findings. Any disagreements were resolved through discussion among all three authors. The following data points were extracted for each included study:
General Information
The study title, journal name, date of publication, DOI, author name(s), institutional affiliation(s), funding, any potential conflicts of interest, and the country in which the study took place.
Participant Characteristics
The number of participants included in analyses, the percentage of males, and the mean age of the children at the outset of the study were noted. A total of 6,683 participants were included. The overall mean age was 5 years 6 months, and the overall average percentage of males was 51%. Mean age was not reported for 14 studies, and the percentage of males was not reported for 9. We were able to confirm that studies that did not report mean age still met inclusion criteria, as all minimally reported the grade of participants (e.g., indicated that participants were in grade 1 and therefore between ages 4 and 10). Authors also extracted information regarding participant reading status (typically developing vs. at-risk), language status (monolingual vs. bilingual), and age (4–5, 6–7, vs. 8–9). These factors are examined in a separate paper evaluating the validity of DAs of word reading skills across diverse populations (Wood et al., 2024). Table 1 provides additional details regarding the mean age and percentage of males for the included studies.
Table 1. Number of Participants, Effect Size, Mean Age, Grade, % Males, Study Design, Skills Evaluated and Characteristics of DAs, and Word Reading Measures of Included Studies.
Effect Sizes
Pearson’s correlation coefficients representing the relationship between DAs and WRMs were extracted. All relevant correlation coefficients listed were noted (e.g., a DA that used two PA tasks and their correlations with WRMs). In some instances, a lower score on a DA translated to better performance. In these situations, negative correlations were transformed to positive ones for analysis. From the 35 studies (
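As an illustration of the sign-reflection step described above, the short R snippet below shows how an extracted correlation could be reflected when a lower DA score (e.g., fewer prompts required) indicates better performance; the data frame and column names are hypothetical.

```r
# Hypothetical example: when a lower DA score indicates better performance
# (e.g., fewer prompts needed), the extracted correlation with the WRM is
# negative, so its sign is reflected before pooling.
effects <- data.frame(
  study = c("Study A", "Study B"),
  r     = c(0.62, -0.48),                  # extracted Pearson correlations
  lower_da_score_is_better = c(FALSE, TRUE)
)
effects$r_adj <- ifelse(effects$lower_da_score_is_better, -effects$r, effects$r)
effects
```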
Measures
Dynamic Assessments (DAs)
In this review, DA is defined as an assessment that provides teaching, training, feedback on performance, and/or prompting during testing. In some instances, these measures were not reported as “DAs” but were described as paired associate learning tasks; this was typically the case for measures that evaluated SSK skills (e.g., Liu et al., 2021). We recorded the word reading skills evaluated (phonological awareness, sound-symbol knowledge, and/or decoding) and the task used to assess each skill (e.g., phonological awareness/phoneme blending). If multiple tasks were used to evaluate a skill, all tasks utilized were listed. A DA was characterized as SSK if the task involved learning the relationship between a visual referent (symbol or letter) and a syllable or phoneme. DAs were considered to evaluate PA skills if they assessed one or more of the auditory skills of rhyming, blending, segmenting, manipulating, deleting, or substituting phonemes, syllables, words, or onsets and rimes. Tasks that required a child to recognize more than one symbol-sound relationship and blend these sounds together to “read” multi-symbol words (e.g., CV, VC, CVC, etc.) were labeled as decoding tasks. Authors also noted the format of the DA (i.e., graduated prompts [GP] or train/test [TT]). DAs were considered GP if a series of prompts was used after a participant’s response to an individual test item, and were characterized as TT if they minimally incorporated a teaching/training phase followed by a separate static post-test. In terms of administration method (i.e., in person or computer), DAs that were conducted virtually by a clinician over the computer, or those that were computerized (i.e., no clinician), were considered computer-based administration, while all others were characterized as in-person. For word type, DAs that used words that exist in the lexicon of the language of testing were considered real words (e.g., cat in English), while those that used invented words (e.g., meeb in English) or words from other languages were considered nonwords (e.g., “copa,” a Spanish word, in an English task). Finally, authors indicated whether novel or familiar symbols were used. DAs that used letters or characters belonging to the orthography of the language of testing were said to use familiar symbols (e.g., using alphabet letters in an English DA), while those that used invented symbols or letters from orthographies distinct from the language of testing were characterized as using novel symbols (e.g., using Hebrew letters in an English DA measure).
Of the 192 effect sizes from the 35 included studies that examined use of a DA, most evaluated decoding (
Word Reading Measures (WRMs)
For the purposes of this review, WRMs are assessments that evaluate word reading ability of single real or nonwords using a correct/incorrect evaluation system and without provision of feedback, prompting or teaching. WRMs were conducted concurrently with the DA or longitudinally at a later timepoint. Of the 35 studies, most effect sizes represented longitudinal relationships (
Quality Appraisal Ratings
Following extraction, included studies were evaluated independently by the first and second authors using an adapted version of two quality assessment tools from the Joanna Briggs Institute (Moola et al., 2020). This tool is available on the Open Science Framework protocol page (Wood & Molnar, 2023). Most studies (
Overall, the quality appraisal consisted of eight items rated across five domains. Items regarding participants, flow and timing, and statistical analyses were assigned one point each, while items concerning the index test (DA) and the reference tests (WRMs) were worth two points each due to their greater significance in achieving the review objectives. Conflicts were resolved through discussion among all three authors. Quality scores were ranked as low quality (0%–33%), medium quality (34%–66%), or high quality (67%–100%). Only medium- and high-quality studies were eligible for inclusion in the analyses; in practice, no studies were excluded based on their score. The overall quality appraisal rating for each study is included in Table 1. Please refer to Table S3 in the Supplemental Material for individual ratings for each question for each study.
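As a minimal sketch of how an appraisal score maps onto these bands, the helper below converts points earned out of points possible into the low/medium/high categories described above; the function name and the example totals are illustrative assumptions.

```r
# Convert a quality appraisal score to the percentage bands described above:
# low (0-33%), medium (34-66%), high (67-100%). Items on participants, flow and
# timing, and statistics are worth 1 point; index- and reference-test items are
# worth 2, so the total possible points depends on the adapted tool.
quality_band <- function(points_earned, points_possible) {
  pct <- 100 * points_earned / points_possible
  if (pct <= 33) "low" else if (pct <= 66) "medium" else "high"
}

quality_band(7, 10)  # 70% -> "high" (example values only)
```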
Statistical Analyses
All statistical analyses were conducted in R using the metafor package (R Core Team, 2021; Viechtbauer, 2010). First, a random effects meta-analysis with robust variance estimation (RVE) was conducted to examine the overall mean effect representing the association between DAs of word reading skills and WRMs. Random effects models assume that variability can stem from multiple sources, both individual sampling error and heterogeneity between studies. This is appropriate in this instance given that studies evaluated populations of different ages, from distinct locations, using varied DAs and WRMs (Borenstein et al., 2010). We elected to use RVE because it permits inclusion of multiple effect sizes from a single study while accounting for dependence between samples (Pustejovsky & Tipton, 2022). Prior to analysis, the 192 effect sizes were transformed using Fisher’s r-to-z transformation (Corey et al., 1998). Effect sizes were nested within study clusters, with the assumption that effect sizes from the same cluster are correlated. A weighted average of these effect sizes was calculated and then transformed back to a Pearson’s correlation coefficient for interpretation as the overall effect. We then conducted five subgroup analyses to examine whether the strength of the association between DAs and WRMs differed across the test characteristics of skill type, administration method, format, word, and symbol type. These subgroup analyses were planned a priori. To account for multiple comparisons, we used a Bonferroni adjustment to set a new value for significance (
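The R sketch below outlines the modeling workflow described in this section using the metafor package; the data frame, its column names, and the toy values are hypothetical placeholders, and the exact model specification the authors used may differ in its details.

```r
library(metafor)

# Hypothetical input: one row per extracted effect size, with a study
# identifier, a within-study effect-size identifier, the Pearson correlation
# between the DA and the WRM, and the sample size (toy values only).
dat <- data.frame(
  study = c("A", "A", "B", "B", "C", "C", "D", "D"),
  es_id = 1:8,
  r     = c(0.55, 0.61, 0.48, 0.52, 0.70, 0.66, 0.43, 0.58),
  n     = c(120, 120, 85, 85, 200, 200, 60, 60)
)

# Fisher's r-to-z transformation; escalc() also computes the sampling
# variance vi = 1 / (n - 3) for each transformed effect size (yi).
dat <- escalc(measure = "ZCOR", ri = r, ni = n, data = dat)

# Random effects model with effect sizes nested within study clusters.
res <- rma.mv(yi, vi, random = ~ 1 | study/es_id, data = dat)

# Cluster-robust (RVE) standard errors to account for dependence among
# effect sizes from the same study (clubSandwich package required for the
# small-sample adjustment; set clubSandwich = FALSE to omit it).
res_rve <- robust(res, cluster = dat$study, clubSandwich = TRUE)

# Back-transform the pooled Fisher's z to a Pearson correlation.
predict(res_rve, transf = transf.ztor)
```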
Results
As demonstrated in Table 1, 35 studies reported 192 correlations between a DA of word reading skills (phonological awareness, sound-symbol knowledge, or decoding) and a WRM. The effect sizes from these 35 studies were included in the random effects meta-analysis with robust variance estimation examining the relationship between DAs of word reading skills and WRMs. The forest plot of these 192 effect sizes can be found in Figure S1 of the Supplemental Material. As anticipated, the overall mean effect size is large (
The contribution of individual effect sizes to heterogeneity was also examined via a Baujat plot (Figure S2 in the Supplemental Material; Baujat et al., 2002). Two effect sizes from Gellert and Elbro (2017a) were identified in the upper right quadrant of the plot, suggesting that they contributed significantly to heterogeneity. No plausible reasons for this were identified based on the characteristics of the study. We reran the meta-analysis with these effect sizes removed, but this did not change the magnitude of the overall effect (
Risk of Publication Bias
A funnel plot was generated to subjectively examine the risk of publication bias (see Figure S3 in the Supplemental Material). Visual inspection of the funnel plot suggests potential asymmetry. Several studies with small sample sizes and positive findings were identified and included, compared to relatively few studies with small sample sizes and negative findings (e.g., Horbach et al., 2018; Loreti, 2015); the former cluster at the bottom right of the funnel. There is a possibility that studies with negative outcomes were not completed, published, or deposited in the gray literature (Lee & Hotopf, 2012). However, this may also simply reflect the fact that the skills evaluated in the included DAs (phonological awareness, sound-symbol knowledge, and decoding) are known to correlate with word reading performance, so negative effects are not anticipated. Furthermore, despite the apparent visual asymmetry, Egger’s test was calculated and not found to be significant for presence of plot asymmetry (
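The following sketch shows how such checks could be implemented in metafor, continuing from the hypothetical dat and res objects defined in the earlier modeling sketch; note that, for a multilevel model, one common Egger-type approach is to add the standard error of each effect size as a moderator, which may not be the exact procedure used here.

```r
library(metafor)

# Funnel plot of the Fisher's z effect sizes against their standard errors
# (res is the multilevel model fitted in the earlier sketch).
funnel(res)

# Egger-type regression test for a multilevel model, approximated by adding
# the standard error (sqrt(vi)) as a moderator; a significant coefficient on
# sqrt(vi) suggests funnel plot asymmetry. This is one common approach and
# may differ from the exact test reported in the paper.
egger_mv <- rma.mv(yi, vi, mods = ~ sqrt(vi),
                   random = ~ 1 | study/es_id, data = dat)
summary(egger_mv)
```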
Subgroup Analyses
Subgroup analyses by DA word reading skill type, format, administration method, word type, and symbol type were planned a priori and were conducted to examine whether these characteristics have implications for the criterion reference validity of DAs with word reading measures. Mixed effects models were used to examine whether there were significant differences in mean effect sizes for DAs based on these factors. Results of these subgroup analyses are reported in Table 2. The adjusted significance value was set to
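To illustrate how such a subgroup (moderator) analysis might be specified, the sketch below adds a categorical moderator to the multilevel model from the earlier example; the skill_type column is a hypothetical placeholder, and the Bonferroni-adjusted alpha simply divides a conventional .05 by the five planned analyses.

```r
library(metafor)

# Hypothetical subgroup (moderator) analysis: does the DA-WRM correlation
# differ by word reading skill type? skill_type is an illustrative column
# added to the 'dat' object from the earlier modeling sketch.
dat$skill_type <- factor(c("decoding", "PA", "SSK", "PA",
                           "decoding", "SSK", "PA", "decoding"))

mod_skill <- rma.mv(yi, vi, mods = ~ skill_type,
                    random = ~ 1 | study/es_id, data = dat)
summary(mod_skill)          # omnibus (QM) test of the skill type moderator

# Bonferroni adjustment across the five planned subgroup analyses,
# assuming a conventional .05 family-wise alpha.
alpha_adjusted <- 0.05 / 5  # = .01
```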
Findings for each factor are described below:
Word reading skills. In line with our hypothesis, results indicate that there are significantly stronger correlations between DAs of phonological awareness and decoding and WRMs, relative to DAs of SSK. Though multiple comparisons for each of the three subgroups were not completed, the mean effect sizes for DAs of decoding (
Format. In line with our hypothesis, results indicate that there are significantly stronger correlations between DAs that use a graduated prompts format and WRMs (
Administration method. Contrary to our hypothesis, there were no significant differences in strength of correlational relationship between DAs administered in-person (
Word type. In line with our hypothesis, results indicate that there are significantly stronger correlations between DAs that use nonwords (
Symbol type. Contrary to our hypothesis, there were no significant differences in strength of correlational relationship between DAs that used familiar (
Table 2. Results of Subgroup Analysis by Skill Type, Format, Administration Method, Word, and Symbol Type.
Adjusted
Significant result.
Discussion
This review examined whether characteristics of dynamic assessments (DAs) of word reading skills (phonological awareness, sound-symbol knowledge, and decoding) affect their criterion reference validity, as measured by the strength of the correlational relationship with performance on a word reading measure (WRM). Thirty-five articles met the inclusion criteria of evaluating children with a mean age between 4;0 and 10;0 and reporting a Pearson’s correlation coefficient between a DA of a word reading skill and a WRM. This is the first review to directly and quantitatively examine the criterion reference validity of DAs of word reading skills with WRMs on the basis of these characteristics. Findings have important implications for informing clinical word reading assessment practices and for developing novel DA word reading tools.
Main Findings
As expected, results of the overall meta-analysis suggest that DAs of word reading are strongly correlated with WRMs. Regarding results of the subgroup analysis, DAs of phonological awareness and decoding, those that used a graduated prompts format, and those that used nonwords demonstrated greater strength of correlational relationship with WRMs than those that evaluated sound-symbol knowledge, used a test-teach-retest format, or used real words. There were no significant differences in terms of strength of correlational relationship between DAs and WRMs for factors of administration method (in-person vs. computer) or symbol type (familiar vs. novel).
Word Reading Skills
Analysis of
Format
The analyses evaluating DA
Administration Method
The analysis examining the role of
Word Type
Regarding
Symbol Type
This trend of unfamiliarity leading to increased capacity to evaluate ability to learn in DA was not reflected in the subgroup analysis by symbol type.
Despite this, these results still have important implications for the validity of DAs of word reading skills. The use of nonwords and novel symbols has the potential to further reduce bias against children with diverse linguistic experiences. When words and symbols are unfamiliar to all, lack of experience with the oral and print system of the language of evaluation has less impact on performance. Results of this review support the use and development of nonword, novel-symbol based DA measures. These types of tools could be used to equitably evaluate children with and without experience in the language of testing. This is in contrast with static tools, which are developed for, and can only be used in a valid, unbiased manner with, a single language group, typically a monolingual population. A DA of word reading skills that uses nonwords and novel symbols evaluates the ability to learn word reading skills while minimizing the impact of previous linguistic experience. This type of tool could act as an equitable alternative to language-specific SAs for culturally and linguistically diverse children for whom there are limited assessment tools.
Limitations
First, while we endeavored to examine relevant DA characteristics and their implications for validity, it is possible that other factors contribute to the overall strength of the relationship between DAs of word reading skills and word reading measures. One such factor is word reading measure type. In the Caffrey et al. (2008) review, the authors examined whether the type of outcome measure had implications for the validity of DAs and reported that researcher-developed tools demonstrated the largest mean effect size relative to norm- and criterion-referenced measures or teacher/clinician ratings. Regrettably, we were not able to replicate the subgroup analysis of word reading measure type fairly, since a large majority (156/192) of effect sizes represented correlations between DAs and a norm-referenced or criterion-referenced WRM, while significantly fewer (35/192) used a researcher-developed tool, and none used a teacher or clinician rating. Second, correlation coefficients were selected as the measure of effect size because they are consistently reported. While this allowed for the inclusion of additional studies, it means that only correlational inferences can be made about the results. Finally, it is possible that relevant studies were not identified because they were published in a language that our review team was not able to read (e.g., many studies in Korean and Hebrew were excluded at the title and abstract screening phase) or because they used key terms not captured by our search strategy.
Clinical Implications
The results of this systematic review and meta-analysis have implications for clinicians, such as speech-language pathologists, psychologists, and educators, who routinely evaluate word reading skills and who may require alternatives to static assessments for bi/multilingual students or those with limited literacy experiences. Outcomes suggest that, when possible, clinicians should favor DAs of phonological awareness and decoding skills that are structured in a graduated prompts format and that use nonwords comprised of familiar or novel symbols. Results indicate that these measures can be conducted in person or virtually, which is particularly relevant post-pandemic, as many professionals continue to evaluate children in a virtual context. An example that meets most of these criteria is the CUBED-3 dynamic decoding measure (DDM). The DDM, a measure developed on the basis of the included studies conducted by Petersen et al. (2016) and Petersen and Gillam (2015), uses a test-teach-retest approach rather than a graduated prompts approach, but evaluates both phonological awareness and decoding, uses nonwords comprised of familiar letters, and is administered in person. This measure can be used as a criterion-referenced screening tool, to set intervention targets, or to monitor progress (CUBED-3 Dynamic Decoding Measure [DDM]; Petersen & Spencer, 2023).
Implications for Tool Development
Results of this study also inform the development of novel DAs of word reading skills and revisions of existing tools. Notably, findings support the virtual administration of DAs of word reading skills. This is an important consideration given the significant increase in the provision of virtual care and tele-assessment since the COVID-19 pandemic (Campbell & Goldstein, 2021). Researchers should consider developing virtual versions of DAs to ensure that assessment of these critical early reading skills remains available during any future disruptions to in-person education or clinical services, or simply to ensure that children living in rural or remote areas have access to high quality, equitable dynamic assessments. Study outcomes also support the use of dynamic assessments of word reading skills that use nonwords and novel symbols. Not only does the unfamiliarity of words and symbols lend itself well to a DA task whose goal is to evaluate the ability to learn, but this type of measure may also minimize the linguistic and cultural bias associated with traditional static word reading skill tasks. Developing new DAs, or revising existing measures to include nonword, novel-symbol based versions, is critical for monolingual and bilingual children for whom there are no language-specific tests of word reading skills available.
Future Research
Beyond this, future studies can also directly compare DAs with differing characteristics, using research designs and statistical analyses that permit a better understanding of the causal role these factors play in predicting reading ability and identifying reading disorder at various timepoints in a child’s journey of learning to read. This can be achieved through longitudinal studies comparing the relative predictive validity of DAs that differ in their format, administration method, word and symbol type, or other relevant factors via regression or structural equation modeling. Studies should also explicitly examine whether specific characteristics of DAs of word reading skills have a greater capacity to limit floor effects associated with traditional static measures or result in improved diagnostic accuracy. Ideally, these studies should include populations for whom DA is purported to be most useful, particularly bilingual children and those with limited previous literacy experiences.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by a Social Sciences and Humanities Research Council of Canada (SSHRC) Canada Graduate Scholarship (Master’s), a Ministry of Colleges and Universities Ontario Graduate Scholarship, a SSHRC Canada Graduate Scholarship (Doctoral), and a Duolingo Dissertation Grant awarded to the first author; a University of Toronto Excellence Award awarded to the second author; and a Social Sciences and Humanities Research Council of Canada Insight Grant (#435-2024-0713) awarded to the third author.
Data Availability Statement
Additional supplemental materials are available on the Open Science Framework systematic review protocol page.
Supplemental Material
Supplemental material for this article is available online.
References