Abstract
Curriculum-based measures (CBMs), which allow educators to monitor progress over time and make instructional decisions based on student performance, represent a fixture of school-wide approaches to reading instruction and data-based instructional frameworks. However, evidence supporting the use of CBM for students with intellectual disabilities is limited. This study evaluated the criterion validity of a reading CBM battery. Multiple CBM and standardized criterion measures were administered to elementary-age children (N = 56) with intellectual disabilities. Inferential analyses identified numerous domain-specific correlations between CBM and criterion measures; however, no single CBM emerged as a consistently stronger predictor of reading performance than the others. Findings provide qualified support for the use of CBM with children who have intellectual disabilities.
Reading intervention and assessment are increasingly associated with use of curriculum-based measures (CBM). Unlike traditional standardized measures or mastery measures developed by instructors, CBMs are tests of discrete academic skills (e.g., reading fluency and number identification) designed to be administered repeatedly to assess long-term student performance and guide instructional decisions (King et al., 2022; Tindal, 2013). CBM is used as a universal screening and progress monitoring tool within school-wide multi-tiered systems of support (MTSS) frameworks to allocate educational resources (Van Meveren et al., 2020). A related use of CBM is data-based individualization (DBI; Fuchs et al., 2021), which entails developing a goal for a student based on performance norms and the student’s level of performance and adjusting instruction based on the student’s progress. Sensitivity to changes in student behavior is one important feature of CBM; these tools may align closely with intervention procedures, providing a more efficient test of the effectiveness of instruction relative to larger, standardized measures (King et al., 2022).
CBM is of particular importance for instructional planning for students with critical reading needs, such as students with intellectual disabilities (ID; IQ scores below 70 or 75 coupled with adaptive behavior deficits). These learners have well-documented challenges associated with developing reading skills (e.g., Channell et al., 2013). Contemporary work in the field of reading instruction nonetheless demonstrates that students with ID can make critical improvements in reading with direct and explicit instruction (e.g., Lemons et al., 2018).
Increased attention to developing effective literacy instructional strategies for learners with ID necessitates comparable attention to measuring their skill development over time. That is, routine assessment is a critical aspect of DBI (e.g., Lemons et al., 2014, 2018), and while the reading intervention literature for students with ID is still limited, the research on whether CBMs are appropriate tools for these learners is scarcer still. The few studies in which researchers employed CBM as a metric of reading progress (e.g., Allor et al., 2010; King et al., 2022; Lemons et al., 2013) suggest progress on CBM emerges slowly or is otherwise undetectable when compared with proximal measures of performance. This is in stark contrast to the broader CBM literature, which includes considerable evidence for the validity of these measures as screening and progress monitoring tools for typically developing students, as well as those who are at risk for reading difficulties or who have specific learning disabilities (e.g., Fewster & MacMillan, 2002; Reschly et al., 2009). The National Center for Intensive Intervention (NCII, 2019) likewise evaluates the technical adequacy of CBM for general populations as part of its mission to disseminate effective practices, but provides limited insight into the suitability of CBM for children with ID.
Studies employing CBM as measures of intervention effectiveness for children with ID (e.g., King et al., 2022) have conducted severely underpowered correlation analyses among small samples (e.g., N = 17) that provide qualified support for the validity of reading CBM. However, studies specifically examining the validity of CBM for students with ID have primarily involved older individuals. Hosp et al. (2014) assessed the criterion validity of a variety of subject-specific CBM (e.g., reading, math, and writing) among postsecondary students with ID (N = 41) enrolled in a 2-year university certificate program. Oral reading fluency (ORF) and maze comprehension CBM results were correlated with the broad reading, reading fluency, and passage comprehension scores of the Woodcock–Johnson Tests of Achievement (WJTA). Analysis revealed moderate-to-strong correlations (e.g., .30–.75) between CBM and criterion measures; however, subsequent analyses (i.e., Meng’s z test) did not identify significant differences in the ability of ORF and maze measures to predict performance on the WJTA. A direct replication featuring the same measures and analyses (Hosp et al., 2018) and a separate sample of postsecondary students with ID (N = 45) produced similar results, with reading CBM predicting performance on criterion measures without demonstrating differences in predictive capability.
Purpose
Much of the research supporting the use of CBM stems from work with children with learning disabilities and other populations who do not share the cognitive profile of individuals with ID (Snyder & Ayres, 2020). There is, therefore, little evidence to support the suitability of CBM as an assessment or decision-making tool for elementary and middle school students with ID. Research validating the use of CBM is needed if children with ID are to be responsibly included in large-scale education initiatives predicated on CBM (Hosp et al., 2018). Demonstrating the validity of reading CBM for children with ID will likewise support the use of potentially effective instruction among this population (e.g., DBI; Fuchs et al., 2021).
One approach to validating CBM involves comparing outcomes to standardized assessments with established validity (i.e., criterion measures). The present study examined the criterion validity of CBM in reading for elementary students with ID. As in Hosp and colleagues’ previous work (2014, 2018), we aimed to study whether CBM outcomes were correlated with and predicted performance on domains encompassed by more comprehensive assessments. Guiding questions included (a) To what extent are CBM outcomes correlated with the criterion measures, and are those correlations significant? and (b) Are some CBM outcomes significantly better at predicting performance on criterion measures than others?
Method
Participants and Setting
Following approval by a university-affiliated institutional review board (IRB) and multiple district IRBs, potential participants were recruited from elementary and middle schools (n = 33) across districts (n = 8) in four southern states as part of a separate reading intervention study for struggling readers. Participants were eligible for this study if they (a) were diagnosed with ID; (b) spoke fluent English; (c) used speech as their primary form of communication; (d) possessed hearing and vision sufficient to complete the administered assessments; (e) had access to a paraprofessional for 30 to 40 min, 4 days per week; and (f) repeated a model of at least one letter name, sound, or word during screening. We excluded students able to read 17 words per minute (wpm) with 90% accuracy or 60 wpm with 80% accuracy. Also excluded were students who required intensive behavior support before academic intervention would be appropriate. The families of participants received a U.S. $50 gift card as an incentive for participating. The reading intervention study recruited 92 students. Due to issues with data collection, attendance, and an inability to confirm a diagnosis of ID, we retained a final sample of 56 for this study. Parents completed a demographic survey prior to students’ completion of the test battery. Among the members of the sample who reported their age (98%), the average age was 9.2 years (range = 6–15; SD = 2.3). The participant who did not report their age attended an elementary school. A description of additional demographic variables appears in Table 1.
Table 1. Demographics of Participants in Study of CBM and Children with ID.
Note. Total individuals = 56. Percentages refer to individuals for whom data are available. All individuals were diagnosed with intellectual disability (ID). Because some participants reported multiple disabilities, secondary disability totals exceed 100%.
Test administrators included three graduate students in special education and three project staff members, all supervised by a doctoral-level faculty member with expertise in reading for individuals with developmental disabilities. Test administrators received training on administration and were required to score a mock student’s answer script across all assessments with a minimum of 90% accuracy. Training continued until administrators met this criterion.
Measures
Criterion Measures
Participants completed two subtests of the Test of Word Reading Efficiency, Second Edition (TOWRE-2; Torgesen et al., 2012). The sight word efficiency (SWE) subtest required students to read as many nondecodable sight words as possible in 45 seconds. The phonemic decoding efficiency (PDE) subtest required students to read as many phonemically regular nonwords as possible in 45 seconds. Alternate form reliability for the SWE (.91) and PDE (.92) is acceptable (Tarar et al., 2015). The TOWRE-2 demonstrates excellent classification accuracy based on chronological age and reading ability. The norming sample of the TOWRE-2 was generally consistent with estimates of national population prevalence and accounted for ID.
Participants also completed the print knowledge (PK) and phonological awareness (PA) subtests of the Test of Preschool Early Literacy (TOPEL; Lonigan et al., 2007). The PK subtest encompasses items (n = 36) related to concepts of print, letter and word discrimination, letter–name identification, and letter–sound identification. The PA subtest assesses blending, segmenting, and phonemic awareness using 27 items. Internal consistency for the TOPEL is acceptable (range = .86–.96; Wilson & Lonigan, 2010). The TOPEL is highly correlated with alternative standardized literacy assessments and predictive of overall reading ability (Lonigan et al., 2007). The standardization sample for the TOPEL was stratified on the basis of numerous demographic variables, including disability status (Madle & Owens, 2010).
As the final standardized measure, participants completed three subtests of the Woodcock–Johnson IV (WJIV): letter–word identification (LWID), spelling, and passage comprehension. For LWID, students identified letters and words. The spelling subtest assessed the ability to print letters and spell words. During the passage comprehension assessment, students identified pictures corresponding with words and completed a cloze procedure, in which they read a sentence and determined the correct word to place in blank spaces. Split-half reliability across WJIV subtests is acceptable (.84–.94); in addition, all subtests have adequate content validity (Villarreal, 2015). The WJIV has a long history of application among individuals with ID and is an accepted test of achievement among this population (Cook, 2018).
CBMs
Administered AIMSweb assessments (Shinn & Shinn, 2002) included measures of letter–sound fluency (LSF), letter-naming fluency (LNF), phoneme segmentation fluency (PSF), nonsense word fluency (NWF), and ORF, as well as word identification fluency (WIF) developed by Fuchs et al. (2004). For LNF, students named randomly ordered letters. Test–retest reliability is acceptable (i.e., >.81; Clemens et al., 2017). The LSF measure required students to provide letter sounds without prompting when presented with randomly ordered lowercase letters. Test–retest reliability is acceptable (.80; Elliott et al., 2001). Phoneme segmentation fluency required students to segment words into their smallest component phonemes, with the score reflecting the number of sound segments stated. Test–retest reliability is .85 (O’Hearn, 2013). The NWF measure assessed students’ ability to decode phonemically regular nonwords, with a point awarded for each sound correctly identified. Alternate form reliability is acceptable (.83). Participants completed three first-grade-level AIMSweb ORF passages to measure their ability to accurately and fluently read connected text. Alternate form reliability exceeds .93 (NCS Pearson, 2012). The WIF measure presented 50 high-frequency words randomly sampled from a 100-word Dolch list (Fuchs et al., 2004). Alternate form reliability is .97.
All CBMs were scored in two ways: raw number of correct items and percentage correct. The raw number of correct items represented the total number of correct items a student obtained on each individual CBM task during the allotted time, regardless of the number of items attempted. Percentage correct was calculated by dividing the number of correct items by the total number of items attempted (i.e., correct plus incorrect items) and multiplying by 100. All measures were administered one time except for the ORF task, which was administered three times using three separate forms. For our analyses, we selected each participant’s median ORF score.
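To make the two scoring rules concrete, the sketch below scores a single timed CBM administration both ways and selects a median ORF score. It is a minimal illustration under assumed inputs; the function name and the example counts are hypothetical, not drawn from the study.

```python
from statistics import median

def score_cbm(correct: int, incorrect: int) -> dict:
    """Score one timed CBM administration under both study metrics."""
    attempted = correct + incorrect  # items attempted within the time limit
    return {
        "raw_correct": correct,  # total correct, regardless of items attempted
        "pct_correct": 100 * correct / attempted if attempted else 0.0,
    }

print(score_cbm(correct=18, incorrect=6))  # {'raw_correct': 18, 'pct_correct': 75.0}

# ORF was administered three times; the median score was retained for analysis.
orf_scores = [12, 18, 15]  # hypothetical scores from the three passages
print(median(orf_scores))  # 15
```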
Procedures
Project staff administered and scored reading assessments for participants during in-person, 1:1 sessions in an isolated setting within the participants’ schools. Assessments were administered in a single session punctuated by frequent breaks. Multiple ORF probes were administered within the same session. We scheduled assessment sessions during noninstructional time based on the availability of the student. If a student could not attend a scheduled session, project staff rescheduled the meeting with a cooperating special educator or paraprofessional. Including breaks, the duration of each assessment session was approximately 90 minutes per student. All assessments were video-recorded and administered in accordance with procedures specified in the relevant test administration manuals. A secondary observer reviewed 100% of the data collection sheets for scoring accuracy before the data were entered into a spreadsheet for analysis. Disagreements were discussed and reconciled prior to conducting the analysis.
Interobserver Agreement
Project staff involved in test administration reviewed an assessment protocol. Each assessor was required to score a mock script of student responses across all assessments with at least 90% accuracy before independently working in the field. Following data collection, a secondary observer randomly selected video-recorded assessment sessions for 21.4% (n = 12) of the students in the sample for the purposes of collecting interobserver agreement (IOA). Total agreement for each measure was determined by dividing the lower count of correct responses identified by the two observers by the higher count and multiplying by 100%. Acceptable levels of agreement were obtained for LNF (M = 94.28%; range = 75%–100%; SD = 7.47), LSF (M = 95.1%; range = 80%–100%; SD = 6.82), PSF (M = 88.80%; range = 35.9%–100%; SD = 20.74), NWF (M = 92.17%; range = 47%–100%; SD = 15.53), WIF (M = 96.58%; range = 75%–100%; SD = 7.22), and ORF (M = 96.58%; range = 75%–100%; SD = 7.22). Disparities were resolved via consensus.
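As an illustration of the total agreement calculation described above, a minimal sketch follows; the observer counts in the example are hypothetical.

```python
def total_agreement(count_a: int, count_b: int) -> float:
    """Total agreement IOA for one measure: the lower correct-response
    count divided by the higher count, multiplied by 100."""
    low, high = sorted((count_a, count_b))
    return 100.0 if high == 0 else 100.0 * low / high

# Example: the primary observer scored 46 responses correct on a probe;
# the secondary observer scored 43.
print(round(total_agreement(46, 43), 2))  # 93.48
```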
Data Analysis
Our data analysis plan involved several steps. First, we calculated descriptive statistics and determined whether the data met assumptions for subsequent analyses (i.e., normality and lack of outliers). We then calculated bivariate correlations between the CBMs to examine the relations between the outcomes. Although collinearity is primarily a concern in regression, we interpreted large correlations (i.e., r ≥ .80) as potential evidence of collinearity with the potential to distort subsequent analyses (Kim, 2019). We calculated bivariate correlations separately for the two CBM outcome types (i.e., raw number correct and percentage correct).
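A minimal sketch of this screening step appears below, using simulated data in place of the study’s scores. The column names mirror the CBM abbreviations, the thresholds are those cited in the text (Kim, 2019; Hair et al., 2010), and the data layout is an assumption for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
# Simulated stand-in for the real data: one row per student (N = 56),
# one column per CBM, scored as raw number correct.
cbm = pd.DataFrame(rng.poisson(20, size=(56, 6)),
                   columns=["NWF", "WIF", "LSF", "LNF", "PSF", "ORF"])

# Descriptive screening: skewness (|value| <= 2) and kurtosis (|value| <= 7).
print(cbm.apply(skew))
print(cbm.apply(kurtosis))

# Bivariate correlations among the CBMs, flagging r >= .80 as collinear.
corr = cbm.corr(method="pearson")
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # unique pairs only
print(corr.where(upper).stack().loc[lambda r: r >= 0.80])
```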
Next, we made a priori pairings of criterion measures (i.e., TOWRE, TOPEL, and WJIV) and content-appropriate CBM metrics (Hosp et al., 2014, 2018). For example, NWF, WIF, and ORF were paired with the SWE subtest of the TOWRE due to their overlap with sight word reading (e.g., Barth et al., 2014). Likewise, we paired the ORF CBM with the WJIV passage comprehension subtest due to the established relationship between fluency and comprehension (e.g., Tighe & Schatschneider, 2016). We made the following pairings between criterion measures and CBMs: TOWRE SWE would be predicted by NWF, WIF, and ORF; TOWRE PDE would be predicted by LSF, LNF, and PSF; TOPEL PK would be predicted by LSF and LNF; TOPEL PA would be predicted by NWF and PSF; WJIV LWID would be predicted by WIF, LNF, and ORF; WJIV spelling would be predicted by LSF, LNF, and PSF; and WJIV passage comprehension would be predicted by ORF and WIF. To answer our first research question, we calculated bivariate correlations between the specific criterion-CBM pairings, examining both the magnitude and statistical significance of the correlations. We ran correlations between the criterion measures and CBMs for both the raw correct and percentage correct outcomes. Results were interpreted relative to two separate criteria. For the purposes of providing initial evidence of validity, we considered correlations equal to or greater than .50 adequate, consistent with prior research (e.g., Good et al., 2013). Although our small, nonrepresentative sample is not consistent with the validity guidelines of the NCII, we also noted when coefficients met the more stringent NCII validity criterion of .60.
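The pairing scheme and thresholds above lend themselves to a compact encoding. The sketch below is an illustrative reading of that scheme; the dictionary, function name, and inputs are hypothetical and are not the study’s analysis code.

```python
from scipy.stats import pearsonr

# A priori criterion-CBM pairings described in the text.
PAIRINGS = {
    "TOWRE SWE": ["NWF", "WIF", "ORF"],
    "TOWRE PDE": ["LSF", "LNF", "PSF"],
    "TOPEL PK": ["LSF", "LNF"],
    "TOPEL PA": ["NWF", "PSF"],
    "WJIV LWID": ["WIF", "LNF", "ORF"],
    "WJIV spelling": ["LSF", "LNF", "PSF"],
    "WJIV passage comprehension": ["ORF", "WIF"],
}

def validity_correlation(criterion_scores, cbm_scores):
    """Correlate a criterion with a paired CBM and apply both thresholds."""
    r, p = pearsonr(criterion_scores, cbm_scores)
    return {"r": r, "p": p,
            "adequate (>= .50)": r >= 0.50,
            "NCII (>= .60)": r >= 0.60}
```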
Finally, to answer our second research question, we conducted Meng’s z tests (Meng et al., 1992) to examine whether one of the CBMs paired with a criterion measure was significantly better at predicting performance on that criterion measure. Meng’s z compares the correlations between two independent variables and one dependent variable: in the present study, two CBMs and a criterion measure. Calculations involve the correlation between the dependent (criterion) variable and the first independent variable, the correlation between the criterion and the second independent variable, and the correlation between the two independent variables themselves.
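Because the formula is only summarized here, the sketch below gives our reading of Meng’s z as published (Meng et al., 1992); the correlations in the example are hypothetical, and this is not the study’s analysis code.

```python
import math
from scipy.stats import norm

def mengs_z(r_y1: float, r_y2: float, r_12: float, n: int):
    """Meng's z for two correlated correlations sharing one criterion.

    r_y1, r_y2: correlations of each predictor (CBM) with the criterion
    r_12: correlation between the two predictors
    n: sample size
    """
    r2_bar = (r_y1 ** 2 + r_y2 ** 2) / 2          # mean squared correlation
    f = min((1 - r_12) / (2 * (1 - r2_bar)), 1.0)  # f is capped at 1
    h = (1 - f * r2_bar) / (1 - r2_bar)
    # math.atanh is Fisher's r-to-z transformation: 0.5 * ln((1+r)/(1-r))
    z = (math.atanh(r_y1) - math.atanh(r_y2)) * math.sqrt(
        (n - 3) / (2 * (1 - r_12) * h))
    p = 2 * norm.sf(abs(z))  # two-tailed p value
    return z, p

# Hypothetical comparison: WIF-criterion r = .90, ORF-criterion r = .80,
# WIF-ORF r = .96, N = 56.
print(mengs_z(0.90, 0.80, 0.96, 56))
```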
Results
Descriptive statistics appear in Table 2. We assessed skewness and kurtosis for all measures. Hair et al. (2010) suggest skewness and kurtosis values falling between ±2 and ±7, respectively, are acceptable for the purposes of our analyses. Most measures demonstrated appropriate levels of skewness and kurtosis, with the exception of the raw scores for WIF, PSF, and ORF. These values are likely due to the high degree of variability within our sample (see Table 2). We decided to conduct the analyses without removing outliers or measures for several reasons. First, skewness and kurtosis minimally exceeded the recommended values. Second, reducing the size of the sample through the elimination of outliers would diminish the statistical power of our analyses to detect significant effects (Hosp et al., 2018). Finally, variability is commonly observed in measures of reading performance for students with ID (e.g., King et al., 2022).
Table 2. Descriptive Statistics for Criterion Measures and CBMs.
Note. TOWRE = Test of Word Reading Efficiency–2nd ed.; SWE = sight word efficiency; PDE = phonemic decoding efficiency; TOPEL = Test of Preschool Early Literacy; PK = print knowledge; PA = phonological awareness; WJIV = Woodcock–Johnson IV; LWID = letter–word identification; CBM = curriculum-based measure; PassComp = passage comprehension; NWF = nonsense word fluency; WIF = word identification fluency; LSF = letter–sound fluency; LNF = letter-naming fluency; PSF = phoneme segmentation fluency; ORF = oral reading fluency.
Bivariate correlations between the CBM outcomes, for both raw number correct and percentage correct, are reported in Table 3. Correlations for the raw number correct outcomes are reported above the diagonal, while correlations for the percentage correct outcomes are reported below the diagonal. For the raw number correct outcomes, all CBMs were positively and at least moderately correlated, with correlations ranging from .30 to .96. All correlations were statistically significant except the lowest, between PSF and LNF (r = .30). Three correlations exceeded the .80 threshold indicative of collinearity: WIF and NWF (r = .81), LSF and NWF (r = .84), and ORF and WIF (r = .96).
Table 3. Correlations Between CBM Outcomes for Raw Number Correct and Percentage Correct.
Note. Values above the diagonal are correlations between the CBMs when scored for number of correct responses, while the values below the diagonal are correlations between the CBMs when scored for percentage accuracy. CBM = curriculum-based measure; NWF = nonsense word fluency; WIF = word identification fluency; LSF = letter–sound fluency; LNF = letter-naming fluency; PSF = phoneme segmentation fluency; ORF = oral reading fluency.
*Significant at p < .05. **Significant at p < .01. ***Significant at p < .001.
Correlations for the percentage correct outcomes showed a similar pattern. Namely, all outcomes were positively correlated, with correlations ranging from .13 to .86. As with the raw number correct outcomes, the PSF and LNF correlation was the smallest (r = .13), and it was the only correlation that was not statistically significant. Also akin to the previous results, the WIF and ORF correlation for the percentage correct outcomes exceeded the .80 threshold (r = .86), but the WIF and NWF (r = .60) and LSF and NWF (r = .58) correlations did not.
Correlations Between Criterion and CBMs
Correlations between the criterion measures and CBMs for the raw number correct outcomes are reported in Table 4. Of the total number of associations assessed (n = 17), 100% of raw score correlations met the .50 threshold for adequate validity. Fewer (70.59%; n = 12) met the NCII criterion of .60. All correlations were positive, moderate to strong (range: .54–.93), and statistically significant. The strength of relations between the CBMs and their associated criterion measures varied. For the SWE subtest of the TOWRE, WIF and ORF were the strongest predictors. For the PDE subtest of the TOWRE, LSF and PSF were the strongest predictors. The PK and PA subtests of the TOPEL were each associated with two CBMs (i.e., LSF and LNF, and NWF and PSF, respectively); in each case, the two predictors yielded similar correlations (range: .58–.66). The WIF and ORF measures were the strongest predictors of the LWID subtest of the WJIV, and LSF and LNF were the strongest predictors of performance on the spelling subtest of the WJIV. The WIF outcome yielded a strong correlation with the WJIV passage comprehension measure (r = .67), as did the ORF outcome (r = .59).
Table 4. Validity Correlations for CBM Raw Correct and Percentage Correct Outcomes.
Note. Correlations calculated between CBMs and content-appropriate standardized measures. All CBMs taken from AIMSweb 2.0. CBM = curriculum-based measure; TOWRE = Test of Word Reading Efficiency–2nd ed.; SWE = sight word efficiency; PDE = phonemic decoding efficiency; TOPEL = Test of Preschool Early Literacy; PK = print knowledge; PA = phonological awareness; LWID = letter–word identification; PassComp = passage comprehension; NWF = nonsense word fluency; WIF = word identification fluency; LSF = letter–sound fluency; LNF = letter-naming fluency; PSF = phoneme segmentation fluency; ORF = oral reading fluency; WJIV = Woodcock–Johnson IV.
***Significant at p < .001.
Correlations for the same criterion-CBM pairs for the percentage correct outcomes are reported in Table 4. Most correlations (82.35%; n = 14) met the .50 threshold for adequate validity, with only 58.82% (n = 10) consistent with the NCII threshold. All correlations were positive, ranging from weak to strong (range: .12–.83), and all were statistically significant except the correlation between the PDE subtest of the TOWRE and LNF. The percentage correct WIF and ORF outcomes yielded the strongest correlations with the SWE subtest of the TOWRE. The LSF and PSF outcomes yielded the strongest correlations with the PDE subtest of the TOWRE. The correlations between the CBMs and the PK and PA subtests of the TOPEL were strong. The WIF and ORF measures were the strongest predictors of performance on the LWID subtest of the WJIV. The LSF and PSF outcomes were the strongest predictors of performance on the WJIV spelling subtest. We observed positive, strong correlations between the WJIV passage comprehension subtest and the WIF and ORF measures (r = .73 and r = .66, respectively).
Difference Between Predictive Correlations
Meng’s z test results between correlation coefficients for criterion-CBM pairs for both raw number correct and percentage correct appear in Table 5. We observed significant differences only for CBMs scored by raw number correct; no comparisons for the percentage correct outcome were significant. First, WIF was a significantly better predictor than ORF of performance on the SWE subtest of the TOWRE, the WJIV spelling subtest, and the passage comprehension subtest of the WJIV. In addition, for the raw number correct outcome, LSF was a better predictor of the PDE subtest of the TOWRE than was PSF. The remaining comparisons were not statistically significant, suggesting that the paired CBMs had comparable power to predict the criterion measure despite the differences in correlation magnitude observed in the previous analytic step.
Table 5. Results for Differences Between Predictive Validity of CBMs.
Note. CBM = curriculum-based measure; TOWRE = Test of Word Reading Efficiency–2nd ed.; SWE = sight word efficiency; PDE = phonemic decoding efficiency; TOPEL = Test of Preschool Early Literacy; PK = print knowledge; PA = phonological awareness; WJIV = Woodcock–Johnson IV; LWID = letter–word identification; PassComp = passage comprehension; NWF = nonsense word fluency; WIF = word identification fluency; LSF = letter–sound fluency; LNF = letter-naming fluency; PSF = phoneme segmentation fluency; ORF = oral reading fluency.
*Significant at p < .05. ***Significant at p < .001.
Discussion
This study assessed the criterion validity of reading CBM for elementary-age children with ID. Analyses suggest that CBMs were significantly correlated with domain-specific criterion measures. In general, the raw number of items correct was a stronger predictor of performance than percentage correct. There were minimal differences in predictive power between most CBMs and the criterion measures, regardless of the outcome metric (i.e., raw number correct or percentage correct). Only four significant differences were identified: WIF was a better predictor than ORF for the SWE subtest of the TOWRE, the spelling subtest of the WJIV, and the passage comprehension subtest of the WJIV, and LSF was a better predictor than PSF on the PDE subtest of the TOWRE, in each case for the raw correct outcome only. Notably, we observed no significant differences in prediction for any CBM comparison with the percentage correct outcome. Results provide support for the use of reading CBM among children with ID.
In addition to featuring a younger population, this study featured a larger sample and a wider range of criterion measures than previous studies (e.g., Hosp et al., 2014, 2018). Comparisons are further impeded by our emphasis on early reading CBM. Nonetheless, results across studies remain relatively consistent, with correlations indicative of an association between ORF and criterion measures of passage comprehension (i.e., >.60; see Hosp et al., 2018). Our findings are also consistent with previously observed moderate-to-strong correlations between early reading CBM (i.e., LSF, WIF, ORF, and FSF) and standardized criterion letter- and word-identification subtests (King et al., 2022). Taken together, this scholarship supports the continued use of reading CBM in research and practice involving children with ID.
Results of this study contribute to work establishing the link between CBM administered on a weekly to biweekly basis and the conventional standardized measures used to assess reading progress in children with ID. This is important given the limited responsiveness of this population on measures thought to represent the acquisition of reading (e.g., Lemons et al., 2013), that is, standardized assessments (King et al., 2022). Findings do not, however, directly address the apparent lack of sensitivity of CBM to student progress (e.g., Allor et al., 2010).
With the exception of WIF and LSF, our analyses did not identify any CBM as a superior predictor of performance on criterion measures. This is consistent with Hosp and colleagues’ (2014, 2018) findings concerning the equivalence of reading comprehension CBM. However, Hosp et al. suggested that the lack of distinction between ORF and maze (i.e., a measure specifically designed to test comprehension) assessments as predictors of passage comprehension may stem from the small sample sizes used within their studies. When compared with ORF, our findings indicate WIF is a superior predictor of word identification and an equivalent predictor of comprehension. Eliminating potentially redundant CBM could reduce the resources practitioners and students expend on assessment. The utility of assessing students with ID using direct measures of connected text (i.e., ORF) nonetheless warrants further examination.
Limitations
This study had several notable limitations. Validity is a multifaceted concept only partially addressed in this study; additional work is needed to examine the various forms of evidence required to comprehensively evaluate validity. Due to the considerable burden associated with re-administrations of the assessment battery, we did not assess reliability, a prerequisite of validity, for the CBMs. We also did not counterbalance the assessments to avoid sequence effects. Although we encourage authors to address these issues in future studies, extensive documentation supports the reliability of these assessments among general populations (e.g., Clemens et al., 2017), and recent studies attest to the test–retest reliability (.81–.95) of similar measures (e.g., DIBELS ORF; Good et al., 2013) for students with ID (King et al., 2022). Finally, several correlations, as noted, exceeded the .80 threshold associated with collinearity. However, the high correlations between WIF and NWF and between LSF and NWF are inconsequential, as these measures were not paired for Meng’s z analysis. We did include ORF and WIF in Meng’s z analysis despite the strength of their correlation because of the theoretical alignment between the measures. Findings should, therefore, be interpreted with caution.
While ID was verified for each included participant based on parental reports and school documentation, we did not collect IQ scores at any point during the study and were therefore unable to assess the influence of cognitive ability on the relations between measures. Likewise, the modest sample size did not permit an evaluation of performance based on specific disabilities. We also restricted participation to children whose communication, sensory, and behavioral strengths were most conducive to reading instruction. Consequently, their performance may not reflect the broader population of students with ID (Snyder & Ayres, 2020). Although participants were drawn from several states, districts, and schools, we did not attempt to acquire a randomized, nationally representative sample. Consequently, our findings contribute to evidence supporting the use of CBM among children with ID but are not sufficiently rigorous to satisfy the standards of organizations such as the NCII (2019).
Implications for Practice
The results herein provide preliminary evidence of the appropriateness of CBMs for assessing the reading development of elementary-age students with ID. Our findings suggest that a range of CBMs may be used to adequately predict reading performance in specific domains, which may have implications for the type and number of CBMs educators should administer in practice. From the perspective of a practitioner, this means that using CBMs to approximate students’ reading abilities is feasible, perhaps saving considerable time given the relative ease of administering, scoring, and interpreting CBM tools. In addition, practitioners may be able to save time in some instances by avoiding the administration of redundant tools; however, we hesitate to make this recommendation explicit without further corroboration. Finally, the results indicate that the two ways of scoring CBMs, raw number correct and percentage correct, vary little in how well they predict performance on larger, standardized assessments.
Future Directions
Results suggest that, for a specific population of students with ID, CBM may be appropriate for screening and instructional decision-making. However, additional research is needed to gauge the effect of modifications (e.g., extra time) frequently used for children with ID on the utility of CBM (Snyder & Ayres, 2020). Findings from this study pertain to children for whom modifications appear to be unnecessary, a population that does not fully represent the range of students who receive special education services (e.g., Lemons et al., 2013).
The results of this study should not be generalized to CBMs that measure other academic skills (e.g., mathematics and writing). Hosp and colleagues (2014, 2018) demonstrated the usefulness of CBMs for reading, writing, and mathematics for young adults with ID. The convergence of our findings with Hosp and colleagues’ should not be taken to presuppose a similar convergence for mathematics or writing CBM tools for elementary-age students in this population. Future research should explicitly validate those tools for these learners.
A related issue involves the extent to which both sampling and the reporting of demographic characteristics inhibit examination of the influence of participant- or contextual-level variables on the efficacy of assessment and other factors relevant to instruction (King et al., 2022). Identifying large samples of students with ID, administering extensive assessment batteries, and other logistic challenges represent significant impediments to additional research in this area. Innovations in the analysis of smaller, single-case designs more commonly used to examine reading among populations who require intensive instructional supports may provide insight into variables that influence the effectiveness of reading instruction (Miočević et al., 2022).
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The research described herein was supported by the Institute of Education Sciences, U.S. Department of Education, through grant R324A190240. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
