Abstract
This article reviews the administrative and psychometric properties of the Tests of Dyslexia–Comprehensive (TOD-C). The TOD-C is part of the multi-battery Tests of Dyslexia (TOD), which also includes a screener and a separate test for young children. The TOD-C measures reading, spelling, and linguistic characteristics of dyslexia. The TOD-C also includes tests of vocabulary and reasoning.
Introduction
Dyslexia—categorized as a specific learning disability/disorder in the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2022) and Individuals with Disabilities Education Act (Yudin, 2015)—is characterized by unexpected difficulties with accurate, fluent word reading and by poor spelling and decoding abilities, which are considered to result from phonological processing deficits (Lyon et al., 2003). The Tests of Dyslexia–Comprehensive (TOD-C; Mather et al., 2024) is a recently published test that purportedly assesses key characteristics of dyslexia via a comprehensive test battery.
Test Description
General Description
The Tests of Dyslexia (TOD) consists of three tests: the TOD-Screener (TOD-S), which is administered to all examinees; the TOD-Early (TOD-E), which is administered to pre-readers or emerging readers; and the TOD-C, which is administered to individuals starting in first grade. The TOD-S, TOD-E, and TOD-C yield various indexes to indicate dyslexia risk, but we have focused our review on the TOD-C as it is the most comprehensive test battery and incorporates TOD-S tests.
The TOD-C includes major elements in a dyslexia evaluation for individuals from first grade to adulthood and is intended to (a) distinguish individuals who have or are at risk for dyslexia from individuals with typical reading skills and (b) assess skills in reading, spelling, and linguistic processing. The TOD-C is administered on paper, and it takes 30–40 minutes to obtain three index scores. Notably, the TOD-C also requires administration of the TOD-S tests, which takes 10–15 minutes. Additional TOD-C tests take 5–10 minutes each to complete. Psychologists, reading specialists, educational diagnosticians, and related professionals may administer and interpret the TOD-C as long as they have prior knowledge of dyslexia and formal training in test administration, scoring, and interpretation. Examiners must also be familiar with diacritical marks in order to correctly pronounce letter sounds and words during administration.
Specific Description and Test Scores
The TOD-C produces three index scores: Dyslexia Diagnostic Index (DDI), Reading and Spelling Index (RSI), and Linguistic Processing Index (LPI). The DDI requires the completion of eight subtests (see Figure 1) and yields a score used to indicate the probability that an examinee has dyslexia; this probability index is an alternative descriptive classification that is based on percentile rank interpretation. The RSI measures overall reading and spelling ability and the LPI measures overall linguistic processing ability, with both indexes based on four tests (see Figure 1). The RSI and LPI tests contribute to the DDI, but having separate RSI and LPI scores allows for interpretive focus on each index’s domain.
Figure 1. TOD-C indexes and required tests per index. Note. TOD-C = Tests of Dyslexia–Comprehensive; TOD-S = Tests of Dyslexia–Screener.
The TOD-C also yields up to 15 composites. The reading and spelling domain has seven composites based on nine tests that assess related basic skills, knowledge, efficiency, and comprehension. The linguistic processing domain has four composites based on nine tests that assess related awareness, rapid naming, working memory, and processing. The vocabulary and reasoning domain has four composites based on four tests that assess related general vocabulary and reasoning skills (see Figure 2). TOD-C composites can be used to identify examinee strengths and weaknesses and to conduct comparisons of specific skills.
Figure 2. TOD-C composites (organized by domain) and required tests per composite.
The TOD-C provides age- and grade-based standard scores, percentile ranks, confidence intervals, and descriptive ranges for each test, composite, and index. Grade-based standard scores are based on spring/fall semesters at each grade level, and the age bands used for age-based standard scores widen with examinee age (e.g., for 6-year-olds, increments are every 3 months; for 9-year-olds, increments are every 6 months; and for adults, increments range from 5 to 15 years). The TOD-C allows examiners to compare index, composite, and test scores, with comparison results indicating statistical significance and frequency rates of differences. Additionally, the TOD-C provides growth scores and age- and grade-equivalent scores for individual tests. Scores can be obtained via hand scoring or web-based scoring.
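To make the relationship among standard scores, percentile ranks, and confidence intervals concrete, the following is a minimal sketch rather than the TOD-C scoring procedure. It assumes a normally distributed standard score metric with a mean of 100 and a standard deviation of 15 (consistent with this review's description of a standard score of 80 as one and a third standard deviations below the mean) and a confidence interval built from the standard error of measurement. The function names and the reliability value in the usage line are hypothetical, and the published norms tables may use a different method.

```python
from statistics import NormalDist

# Assumed score metric; not taken from the TOD-C norms book.
ASSUMED_MEAN = 100.0
ASSUMED_SD = 15.0


def percentile_rank(standard_score, mean=ASSUMED_MEAN, sd=ASSUMED_SD):
    """Percentile rank of a standard score under a normal model."""
    z = (standard_score - mean) / sd
    return 100.0 * NormalDist().cdf(z)


def confidence_interval(standard_score, reliability, sd=ASSUMED_SD, level=0.95):
    """Symmetric interval around the obtained score using the standard
    error of measurement, SEM = SD * sqrt(1 - reliability)."""
    sem = sd * (1.0 - reliability) ** 0.5
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (standard_score - z_crit * sem, standard_score + z_crit * sem)


if __name__ == "__main__":
    # Hypothetical obtained score and reliability, for illustration only.
    print(round(percentile_rank(80), 1))
    print(confidence_interval(80, reliability=0.91))
```

Under these assumptions, a lower reliability coefficient widens the interval around the same obtained score, which is one reason index scores are usually interpreted with more confidence than individual test scores.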
Test Materials
The TOD-C includes a manual, norms book (for converting raw scores to standard scores), test record form, examinee response booklet, rating scales, interventions and recommendations guidebook, and two test easels. Select test materials are available as PDFs, but paper materials (e.g., record form and response booklet) are required for administration. The manual includes descriptions, administration information, and scoring instructions for each test, as well as discussions about how tests are related to dyslexia and suggestions for interpreting test performance (e.g., explaining why examinees might perform poorly on a test). The manual also includes examples of score interpretations, suggestions for writing reports using the TOD-C, examples for completing record forms, and recommendations that could be given based on TOD-C results.
Test easels include administration instructions, scoring directions, lists of required materials, and stimulus pictures/words. A brief description of each test, starting points, and basal/discontinue rules are also included in easels for easy administration. The test record forms are used to record identifying information, general observations of behavior, and examinee responses, and they include administration instructions similar to those included in the easels. Additionally, for tests that have items to which examinees respond quickly (e.g., rapid letter naming), the record forms indicate when it is appropriate to turn the easel page so that examiners can do so without having to look up from the test record and unintentionally slow the examinee’s rate of responding.
Technical Adequacy
Test Development
The authors note that the TOD-C was designed to assess skills most relevant to a dyslexia diagnosis based on the following areas of difficulty: phonics skills, word reading, pseudoword reading, irregular word reading, spelling, reading fluency, reading comprehension, linguistic processing, phonological awareness, rapid automatized naming, auditory working memory, orthographic processing, and visual–verbal paired-associate learning. TOD-C pilot testing included examination of passing rates, examiner feedback, and Rasch model analyses to evaluate goodness-of-fit of test items/scales and differential item functioning. Items were eliminated if they did not fit measurement models, demonstrated bias, or compromised the precision of the measurement. The TOD-C was also reviewed by bias/fairness experts to further identify and remove items deemed to be potentially biased or offensive. Thus, the test authors suggest that the TOD-C is free from systemic gender, race/ethnicity, and socioeconomic status bias.
Standardization Sample
The TOD-C included 1,401 child and 347 adult examinees in standardization and validation samples. Examinee demographic characteristics (including gender, parents’ educational level, race/ethnicity, and U.S. geographic region) are reported separately for child and adult samples and closely match proportions of the 2020 U.S. Census; however, for geographic region, the south region was slightly overrepresented and the north region was slightly underrepresented. The authors do not provide demographic information for each age or grade level in the standardization samples; thus, there is no clear indication about the representativeness of demographic characteristics for the various age and grade-level groups in the TOD-C.
The test authors provide sample size information for the child sample by age (e.g., 6 years old and 7 years old) and grade level (e.g., first grade and second grade). Examinees from ages 8 through 12 years have the highest representation (with an average of 131 examinees per age year); conversely, samples were smallest for examinees aged 6 (n = 31), 7 (n = 56), and 18 years (n = 37). Examinees from third through seventh grade have the highest representation (with an average of 131 examinees per grade); conversely, samples were smallest for examinees in first grade (n = 50) and 12th grade (n = 70). For the adult sample, the authors provided sample size information based on multi-year age groups; the 18- to 23-year group (n = 86) represented approximately 32% of adult examinees, with older adults represented by smaller sample sizes. The TOD-C also included a clinical sample consisting primarily of examinees with an identified reading disability, attention-deficit/hyperactivity disorder, speech disorder, language disorder, autism spectrum disorder, or visual impairment. Finally, it is important to note that the TOD-C was standardized on individuals who demonstrated fluency in English, making it inappropriate to administer the TOD-C to individuals who do not demonstrate sufficient English proficiency. The TOD-C has not been translated into other languages.
Reliability
The test authors provide internal consistency estimates as evidence of reliability. Split-half reliability was used for most of the tests, while Rasch-based reliability was used for timed tests and tests with item sets. For index and composite scores, reliability estimates were calculated using the formula for reliability of linear combinations. Detailed internal consistency estimates are reported by age and grade and by tests, composites, and indexes. For TOD-C tests, most of the internal consistency estimates ranged from .80 to .90. For indexes, internal consistency estimates were between .80 and .90, which is considered excellent. Test–retest reliability, based on a subsample of 61 participants, was also satisfactory with coefficients for tests, indexes, and composites ranging from .70 to .95 (median = .84).
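The two reliability calculations named above can be illustrated with a short sketch. It assumes the common Spearman–Brown correction for split-half reliability and the textbook formula for the reliability of a unit-weighted sum of standardized scores; the review does not state which exact variants the test authors applied, and the function names and input values below are hypothetical.

```python
def spearman_brown(half_correlation):
    """Step up the correlation between two test halves to estimate
    the reliability of the full-length test."""
    return 2.0 * half_correlation / (1.0 + half_correlation)


def composite_reliability(reliabilities, correlations):
    """Reliability of a unit-weighted sum of standardized scores.

    reliabilities: reliability coefficient of each component test.
    correlations: full symmetric correlation matrix among the
        components, with 1.0 on the diagonal.
    """
    # Variance of the sum of standardized scores equals the sum of
    # every entry in the correlation matrix.
    composite_variance = sum(sum(row) for row in correlations)
    # Each standardized component contributes (1 - reliability) of
    # error variance to the composite.
    error_variance = sum(1.0 - r for r in reliabilities)
    return 1.0 - error_variance / composite_variance


if __name__ == "__main__":
    # Hypothetical values, not taken from the TOD-C manual.
    print(spearman_brown(0.80))
    print(composite_reliability([0.80, 0.80], [[1.0, 0.5], [0.5, 1.0]]))
```

The second function shows why index and composite scores tend to be more reliable than the tests that form them: positively correlated components add shared true-score variance faster than they add error variance.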
Validity
Content and construct validity seemed appropriate based on a review of the test development, authors’ evaluation of examinee performance across time, and inter-correlations between scores on conceptually similar and dissimilar tests. Confirmatory factor analysis (CFA), conducted with data from the standardization and clinical samples, was used to evaluate two models based on the three major indexes: (a) a one-factor model based on the DDI, which includes tests of reading, spelling, and linguistic processing; and (b) a two-factor model with two indexes (LPI and RSI). Goodness-of-fit statistics indicated acceptable (and nearly identical) fit across both models, suggesting that interpretation is appropriate using either model. Convergent validity was evaluated by comparing performance on the TOD-C to related tests (including tests of cognitive abilities, achievement, spoken language, and phonological processing) based on a sample that ranged from 42 to 57 examinees per test. The highest correlations (greater than .60) were found for measures related to spelling, word reading and decoding, and reading fluency. Moderate correlations (between .40 and .59) were found for measures related to receptive vocabulary, passage comprehension, and working memory. The lowest correlations (less than .40) were found for measures related to elision, phoneme isolation, and blending words, which are all in the TOD-C linguistic processing domain. Additionally, sensitivity and specificity estimates are reported for the DDI, which was used to differentiate between individuals with/at risk of having dyslexia and individuals with typical reading skills. Using a standard score cutoff of 80 (one and a third standard deviations below the mean) and clinically diagnosed (n = 160) and typically developing samples (n = 1285), sensitivity is .78 and specificity is .97.
Finally, results of validity testing from clinical-group samples demonstrated that examinees with dyslexia or other reading disabilities scored lower on the TOD-C than did examinees from other clinical samples (e.g., language disorder, ADHD).
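Because sensitivity and specificity describe performance within known groups, examiners may also want to know how likely it is that an examinee who scores below the cutoff actually has dyslexia. The sketch below shows the standard definitions and the Bayes-rule calculation of positive predictive value. The sensitivity (.78) and specificity (.97) are the values reported for the DDI; the base rate and the cell counts in the usage lines are hypothetical illustrations, not figures from the manual.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of examinees with the condition who score below the cutoff."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of examinees without the condition who score at or above the cutoff."""
    return true_negatives / (true_negatives + false_positives)


def positive_predictive_value(sens, spec, base_rate):
    """Probability that a flagged examinee has the condition, given the
    prevalence (base rate) in the population being tested."""
    flagged_with_condition = sens * base_rate
    flagged_without_condition = (1.0 - spec) * (1.0 - base_rate)
    return flagged_with_condition / (flagged_with_condition + flagged_without_condition)


if __name__ == "__main__":
    # Hypothetical cell counts, for illustration only.
    print(sensitivity(true_positives=78, false_negatives=22))
    # Reported DDI values with a hypothetical 10% base rate.
    print(positive_predictive_value(0.78, 0.97, base_rate=0.10))
```

The predictive value depends heavily on the assumed base rate, so the same cutoff will flag proportionally more false positives in a general school population than in a referred clinical sample.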
Commentary and Recommendations
The TOD-C is a technically sound measure of the characteristics most often associated with dyslexia. Many TOD-C tests are similar to those in other measures (as described below), but the TOD-C distinguishes itself as it allows for a specific focus on dyslexia, particularly through the DDI, which represents a unique index score that demonstrates appropriate validity for identifying individuals with difficulties characteristic of dyslexia. The TOD-C also allows for comprehensive evaluation and comparisons of specific characteristics of dyslexia, as it includes numerous composites focused on various reading, spelling, and linguistic skills. Administering the complete TOD-C battery can be time intensive, but doing so can yield significant information for identifying examinee strengths and weaknesses and for informing recommendations. Another distinguishing aspect of the TOD-C is that the test materials seem designed to be particularly useful for aiding examiners in interpretation and offering specific recommendations. For example, the manual includes discussions about why examinees may perform poorly on each test, as well as commentary about how each test may be associated with dyslexia. The TOD-C also includes a recommendations guidebook to aid in addressing specific aspects of dyslexia (e.g., phonics, spelling, and reading fluency), as well as an accommodations guide to support examinees with dyslexia. The recommendations seem to be evidence-based, as the authors provide lists of references for each targeted skill. However, it is important to note that the authors do not provide evidence that the use of test-based scores to inform the associated recommendations enhances their effectiveness.
That said, we recommend that examiners administer other tests, along with the TOD-C, to better account for possible reasons that examinees might score low on the TOD-C and to collect more comprehensive data that may be necessary for diagnostic and eligibility determinations. For example, students scoring in the high-risk range on the TOD-C may have intellectual disability or difficulties with speech, which will require comprehensive assessment in those domains. Additionally, while the vocabulary and reasoning tests from the TOD-C can be used as an estimate of cognitive ability, it would still be helpful to conduct more comprehensive intellectual assessment to more confidently determine an examinee’s overall cognitive ability and to explore skills in non–reading-related abilities (e.g., mathematics and listening comprehension), which may be expected for comprehensive evaluations of cognitive functioning. It is important to note that examinees from the late elementary to middle school grades are overrepresented in the TOD-C standardization sample. Thus, for younger examinees in first or second grade, it may be more appropriate to administer the TOD-E, which is designed for early readers.
It is also important to note that there is some similarity between the TOD-C and other, broader tests of achievement. For example, both the Wechsler Individual Achievement Test, Fourth Edition (WIAT-4; NCS Pearson, 2020) and Kaufman Test of Educational Achievement, Third Edition (KTEA-3; Kaufman & Kaufman, 2014) include dyslexia screeners, and the WIAT-4 and KTEA-3 also include tests (e.g., spelling, word reading, decoding, reading fluency, and reading comprehension) that assess skills that are reflected in the TOD-C. Thus, it is important for examiners to consider how (and if) the TOD-C will be more useful than other tests that might already be available to them. One potential difference, as previously noted, is that the TOD-C allows for more comprehensive measures of reading, spelling, and linguistic processing, as the TOD-C provides multiple composites (based on various tests) for these domains, whereas other measures (e.g., WIAT-4 and KTEA-3) might only provide a single test of related skills. If examiners administer all of the TOD-C tests, we recommend that they avoid administering similar subtests from other measures in order to avoid unnecessary, duplicate testing unless they have concerns about measurement error from the TOD-C. For example, if examiners complete the TOD-C and plan to also administer the KTEA-3, additional testing should be limited to skills that the TOD-C does not measure (e.g., mathematics, written expression) unless the examiners have a specific reason for testing the same domain across multiple tests.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
