Abstract
This study used a randomized controlled trial design to investigate the ROOTS curriculum, a 50-lesson kindergarten mathematics intervention. Ten ROOTS-eligible students per classroom (n = 60 classrooms) were randomly assigned to one of three conditions: a ROOTS five-student group, a ROOTS two-student group, and a no-treatment control group. Two primary research questions were investigated as part of this study: What was the overall impact of the treatment (the ROOTS intervention) as compared with the control (business as usual)? Was there a differential impact on student outcomes between the two treatment conditions (two- vs. five-student group)? Initial analyses for the first research question indicated a significant impact on three outcomes and positive but nonsignificant impacts on three additional measures. Results for the second research question, comparing the two- and five-student groups, indicated negligible and nonsignificant differences. Implications for practice are discussed.
The rising concern over early mathematics and its role in long-term development occurs as greater expectations around mathematics are codified, through initiatives such as the Common Core State Standards (CCSS; CCSS Initiative, 2010), and used to guide service delivery in schools. However, while expectations are increasing, national achievement data continue to show consistent and worrisome patterns of performance. Data from the 2015 National Assessment of Educational Progress indicate that relatively few students are likely to meet these new higher standards, with only 40% of students being classified as at or above proficiency and 18% as below basic. Data are even more concerning for minority students, low socioeconomic status students, and students with disabilities, as relatively few (16%–26%) are classified as at or above proficiency. In addition, National Assessment of Educational Progress data show that after positive gains from 1990 through 2007, fourth-grade achievement levels have largely stagnated.
Approaches to Address Low Mathematics Achievement
Solutions to address systematic low achievement in mathematics are confounded by the challenge that schools face in providing additional time for mathematics instruction at the early elementary grades. Data indicate that a relatively small amount of time during the school day is allocated to mathematics instruction (La Paro et al., 2009) and that time is likely spent toward core or Tier 1 instruction without added support for at-risk students. In addition, schools are not likely to have developed the same institutionalized supports around early mathematics as are in place for beginning reading instruction. Such supports include mandated time blocks of instruction, research-validated or research-based materials for core and intervention instruction, screening systems to identify at-risk students, progress-monitoring systems to measure growth over time, professional development on best practices, and implementation support provided by reading specialists or coaches (Balu et al., 2015; Clarke, Baker, & Chard, 2008).
The aforementioned supports are components of a response to intervention (RTI) service delivery model. RTI was initially conceptualized and operationalized as an alternate mechanism to determine eligibility for special education services (Individuals with Disabilities Education Act of 2004) by differentiating whether a student’s lack of growth was due to poor instruction or disability (Fuchs, Fuchs, & Hollenbeck, 2007). Aspects of RTI models have since been adopted more generally to support the academic and behavioral growth of all students (Fuchs & Vaughn, 2012; Vaughn & Swanson, 2015) through a multitier system of support (MTSS) service delivery model. Although multitier models vary, most have three tiers of support, with Tier 1 consisting of core instruction, Tier 2 including the use of standard protocol interventions typically delivered in small groups, and Tier 3 focused on individualized problem solving and more intensive degrees of support (e.g., one-on-one tutoring). MTSS service delivery models are increasingly common in the area of reading, with fewer applications in the area of mathematics. A self-report survey of mathematics practices indicated that only one-third of schools provided multitier systems of support in first grade. In contrast, for the same grade, 71% of schools reported providing full implementation of MTSS models in the area of reading (Balu et al., 2015).
A research base on early mathematics practices for use in MTSS in the early elementary grades is emerging and includes key elements, such as screening for mathematics difficulty in kindergarten (Fuchs et al., 2010; Smolkowski & Gunn, 2012) and a number of research studies investigating the efficacy of kindergarten intervention programs (Clarke, Doabler, Smolkowski, Kurtz Nelson, et al., 2016; Dyson, Jordan, & Glutting, 2013; Fuchs et al., 2005; Sood & Jitendra, 2013). A number of key elements exist across these intervention programs. As called for by experts, the focus of each program is on the development of number sense (Berch, 2005; Gersten & Chard, 1999) and whole number understanding (Clarke, Baker, & Fien, 2009; National Mathematics Advisory Panel, 2008). In addition, each program employs a systematic and explicit instructional framework (Archer & Hughes, 2011; Coyne, Kame’enui, & Carnine, 2011) that includes instructional design elements and features shown to be effective for at-risk learners (Baker, Gersten, & Lee, 2002; Gersten et al., 2009; Kroesbergen & Van Luit, 2003).
Despite broader advances made by the field in designing interventions based on these key elements and rigorous studies of their efficacy, there have been calls to examine more finite questions related to intervention effectiveness (Gersten, 2016; Ochsendorf, 2016) to help refine the implementation of multitier models to better meet the needs of all learners (Miller, Vaughn, & Freund, 2014). One potential mechanism for such investigations is examining the treatment or instructional intensity of intervention services.
Treatment Intensity and Small Group Instruction
Warren, Fey, and Yoder (2007) theorized that treatment intensity functioned as a generalized variable that offered a key mechanism by which to optimize intervention effects, and they specified a framework for quantifying treatment intensity of interventions by examining variables related to dose (i.e., teaching episode), dose form, dose frequency, and duration, resulting in a metric of cumulative intervention intensity. A small cluster of studies in the area of print awareness have examined finite manipulations of treatment intensity to more fully understand intervention impacts (e.g., Breit-Smith, Justice, McGinty, & Kaderavek, 2009; Ezell, Justice, & Parsons, 2000; Justice & Ezell, 2000, 2002; Justice, Kaderavek, Fan, Sofka, & Hunt, 2009; Justice, McGinty, Piasta, Kaderavek, & Fan, 2010; Lovelace & Stewart, 2007; McGinty, Breit-Smith, Fan, Justice, & Kaderavek, 2011). However, despite calls for such investigations, studies that isolate variables of treatment intensity are still relatively limited (Codding & Lane, 2015) in other academic areas, including mathematics. Given the research base on systematic and explicit instruction in mathematics as a cornerstone of programs designed for at-risk students (Baker et al., 2002; Gersten et al., 2009; Kroesbergen & Van Luit, 2003), the behaviors associated with this approach (e.g., teacher models, practice opportunities, feedback) are logical mechanisms to manipulate to increase treatment intensity.
A number of studies have begun to look at the impact of small groups as a proxy for treatment intensity. Small group instruction is considered crucial to the provision of interventions because it represents a mechanism for individualizing and intensifying instruction (Fien et al., 2011; Gersten et al., 2008), and the use of smaller groups is generally accepted practice within MTSS as a mechanism to increase intensity within and across tiers of instruction (Baker, Fien, & Baker, 2010; Denton et al., 2013).
Meta-analyses of studies on reading interventions show positive effects for small group instruction. Wanzek and Vaughn (2007) conducted a meta-analysis examining a range of variables for K–3 reading interventions, including group size ranging from one on one to small groups of eight students. Results indicated that smaller instructional groups were associated with greater effect sizes. In addition, Elbaum, Vaughn, Hughes, and Moody (2000) found significant positive effects for 1:1 reading interventions for at-risk elementary students. While such comparisons investigate the overall efficacy of small group instruction or 1:1 instruction, additional work has examined contrasting small group composition. Research indicates that variations in small group size are associated with key academic variables, including academic engaged time (Thurlow, Ysseldyke, Wotruba, & Algozzine, 1993). Vaughn, Thompson, Kouzekanani, and Dickson (2003) conducted a study in which three small group sizes were contrasted, holding all other variables constant (i.e., each group size used the same intervention program), and they found a significant impact for smaller group sizes (1:1 and 1:3) at posttest and follow-up when compared with a larger group (1:10) but no significant differences between 1:1 and 1:3. A study by Vaughn, Cirino, et al. (2010) also investigated manipulating group size (1:5 and 1:12–15) and found positive but nonsignificant results on reading outcomes for seventh- and eighth-grade students.
Within mathematics, relatively few investigations of varying group size have been conducted. B. R. Bryant et al. (2016) examined the impact of a Tier 3 intervention that was developed by systematically modifying a Tier 2 intervention to provide a more intensive instructional experience. Because multiple aspects of the intervention (e.g., dose, instructional design features) were modified along with group size, attributing cause to any single variable was not possible. However, results were generally positive, with 75% of participants showing performance after the intervention that would no longer indicate the need for Tier 3 services (i.e., >25th percentile on a distal measure). Such results thus indicate some promise for considering group size as a variable by which to increase intervention intensity in mathematics. Note that their work was across two studies and not a direct investigation of group size as an independent variable within a single study and that the results did not include an analysis of whether students maintained their gains (i.e., remained >25th percentile) at later time points. Despite mixed results related to group size and student outcomes, the interest in mechanisms to increase treatment intensity (Codding & Lane, 2015) within multitier systems of service delivery suggests that continued investigation into the role of group size and student outcomes is warranted.
Purpose and Research Questions
To date, we have found no studies in mathematics that manipulated small group size. The purpose of our research was to conduct an investigation of an early mathematics intervention delivered in different group size formats. The study used a randomized controlled trial design (blocking on classrooms) to investigate the ROOTS intervention in 69 kindergarten classrooms with approximately 10 eligible students per classroom. ROOTS is a 50-lesson Tier 2 kindergarten intervention curriculum. The goal of ROOTS is to support students’ conceptual understanding of and procedural fluency with critical whole number concepts. ROOTS is fully aligned to the kindergarten CCSS in the area of number and operations (CCSS Initiative, 2010). The research team randomly assigned these 10 students to one of three conditions (student:teacher ratio): a ROOTS large group (5:1), a ROOTS small group (2:1), and a no-treatment control group. Overall efficacy of the ROOTS intervention has been investigated (Clarke, Doabler, Smolkowski, Baker, et al., 2016; Clarke, Doabler, Smolkowski, Kurtz Nelson, et al., 2016) and found to positively affect student math outcomes. Two primary and one secondary research question were investigated as part of this study:
Research Question 1: What was the overall impact of the treatment (ROOTS intervention) as compared with the control (business as usual)?
Research Question 2: Was there a differential impact on teacher and student behaviors between the two treatment conditions (ROOTS large group vs. ROOTS small group)?
Research Question 3: Was there a differential impact on student outcomes between the two treatment conditions (ROOTS large group vs. ROOTS small group)?
We hypothesized that ROOTS would have a positive impact on student achievement when compared with the control condition. In addition, we hypothesized that there would be a significant difference in treatment intensity between ROOTS small group and ROOTS large group as measured by critical teacher and student behavior, and that there would be a differential impact on student outcomes for the ROOTS small group compared to the ROOTS large group.
Method
Participants
Schools
Fourteen elementary schools from four Oregon school districts participated in the present study. Three districts were located in rural and suburban areas of western Oregon and one in the Portland metropolitan area. District enrollment ranged from 2,736 to 39,002 students. Schools targeted for recruitment received Title I funding. Within these 14 schools, 0%–12% of students were American Indian or Native Alaskan; 0%–16%, Asian; 0%–9%, Black; 0%–74%, Hispanic; 0%–2%, Native Hawaiian or Pacific Islander; 19%–92%, White; and 0%–15%, more than one race. Within these same schools, 8%–25% of students received special education services; 5%–69% were English language learners; and 17%–87% were eligible for free or reduced-price lunch. School and district enrollment and demographics did not change significantly from Year 1 to Year 2.
Classrooms
Sixty-nine classrooms participated in the study (n = 37 classrooms in Year 1, n = 32 classrooms in Year 2). Each year represents a separate sample. In this study, the samples from each year were combined. Of the 69 classrooms, 63 offered a half-day kindergarten program, and 6 offered a full-day program. All classrooms provided mathematics instruction in English and operated 5 days per week. Across both years of the study, classrooms had an average of 25.06 students (SD = 5.60).
The 69 classrooms were taught by 31 teachers. Of the 31 teachers, 20 participated in both years of the study. Nine Year 1 teachers and seven Year 2 teachers taught two participating half-day classrooms (a.m. and p.m.). All were certified kindergarten teachers and participated for the full duration of the ROOTS study. Of the 31 teachers, 100% identified as female, 84% as White, and 10% as Asian American/Pacific Islander. One teacher identified as representing another ethnic group, and one teacher declined to provide ethnicity information. Teachers had an average of 16.45 years of teaching experience and 8.81 years of kindergarten teaching experience; 87% of teachers had a master’s degree in education; and 68% of teachers had completed an algebra course at the college level.
Criteria for participation
In each participating classroom, all students with parental consent were screened in the late fall of their kindergarten year. The screening process included the Assessing Student Proficiency in Early Number Sense (ASPENS; Clarke, Gersten, Dimino, & Rolfhus, 2011) and the Number Sense Brief (NSB; Jordan, Glutting, & Ramineni, 2008), which are standardized measures of early mathematics proficiency. Students were eligible for the ROOTS intervention and thus considered at risk for mathematics difficulties if they received an NSB score ≤20 and an ASPENS composite score in the strategic or intensive ranges.
Once students were determined eligible for the ROOTS intervention, the project’s independent evaluator separately converted students’ NSB and ASPENS scores into standard scores and then combined the two standard scores to form an overall composite score for each student. Composite scores within each classroom were then rank ordered, and the 10 lowest ROOTS-eligible students were randomly assigned to a two-student ROOTS intervention group (2:1), a five-student ROOTS intervention group (5:1), or a no-treatment control condition. Out of the 69 participating classrooms, 53 had at least 10 students who met ROOTS eligibility criteria. Fourteen classrooms in Year 1 and two classrooms in Year 2 had <10 ROOTS-eligible students, and in these instances classrooms were combined to create virtual ROOTS “classrooms.” The cross-class grouping procedure was applied seven times, with five sets of two classrooms combined to create five ROOTS classrooms and two sets of three classrooms combined to make two ROOTS classrooms. After these procedures were applied, a total of 60 ROOTS classrooms participated in this study.
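The composite-and-assign procedure described above can be illustrated with a minimal sketch. It rests on several assumptions that the text does not confirm: "standard scores" are taken to be z-scores computed over the supplied sample, the composite is their simple sum, and the 10 students are split 2/5/3 (the control count of three is inferred from 10 minus the two treatment group sizes). The function names, student IDs, and screening scores are hypothetical and do not come from the study.

```python
import random
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores (assumed form of 'standard score')."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


def assign_classroom(students, rng):
    """Randomly assign the 10 lowest-composite students in one classroom.

    `students` is a list of (student_id, composite) pairs.
    Returns a dict mapping student_id -> condition label.
    """
    lowest = sorted(students, key=lambda s: s[1])[:10]    # rank order, keep 10 lowest
    ids = [sid for sid, _ in lowest]
    rng.shuffle(ids)                                      # random order within the block
    labels = ["2:1"] * 2 + ["5:1"] * 5 + ["control"] * 3  # assumed 2/5/3 split
    return dict(zip(ids, labels))


# Hypothetical screening scores for 12 eligible students in one classroom.
ids = [f"s{i:02d}" for i in range(12)]
nsb = [8, 12, 15, 9, 18, 11, 14, 20, 10, 13, 16, 7]
aspens = [20, 35, 40, 22, 55, 30, 38, 60, 25, 33, 45, 18]

# Composite = sum of the two standardized scores (an assumption).
composites = [a + b for a, b in zip(standardize(nsb), standardize(aspens))]
assignment = assign_classroom(list(zip(ids, composites)), random.Random(2016))
```

Because the shuffle happens inside each classroom, every classroom contributes students to all three conditions, which is what blocking on classrooms is meant to achieve.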
Students
A total of 1,550 kindergarten students were screened for ROOTS eligibility. Of these students, 592 met eligibility criteria and were randomly assigned within each of the 60 classrooms to the two-student group condition (n = 120), the five-student group condition (n = 295), or the no-treatment control condition (n = 177). Student demographic information for all ROOTS eligible students is presented in Table 1.
Table 1. Descriptive Statistics for Student Characteristics by Condition
Note. The sample included 120 students in the two-student ROOTS group condition, 295 students in the five-student ROOTS group condition, and 177 students in the control condition. SPED = special education.
Values are presented as percentages unless noted otherwise.
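The classroom and student counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Classroom accounting, using the counts reported in the text.
classrooms_total = 69
classrooms_under_10 = 14 + 2                  # Year 1 + Year 2 classrooms with <10 eligible
classrooms_with_10 = classrooms_total - classrooms_under_10

# Cross-class grouping: {classrooms per set: number of sets}.
combined_sets = {2: 5, 3: 2}
classrooms_combined = sum(size * n for size, n in combined_sets.items())
virtual_classrooms = sum(combined_sets.values())
roots_classrooms = classrooms_with_10 + virtual_classrooms

# Student accounting by condition.
assigned = {"two-student": 120, "five-student": 295, "control": 177}
total_assigned = sum(assigned.values())
share_eligible = total_assigned / 1550        # eligible students / students screened

print(classrooms_with_10, classrooms_combined, roots_classrooms, total_assigned)
```

The combined classrooms account for exactly the classrooms with fewer than 10 eligible students, and the condition counts sum to the reported number of eligible students.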
Interventionists
ROOTS intervention groups were taught by district-employed instructional assistants and by interventionists hired specifically for this study. Among the interventionists, 89% identified as female, 93% as White, 4% as Hispanic, and 2% as another ethnicity. Most interventionists had previous experience providing small group instruction (93%) and had a bachelor’s degree or higher (58%). Interventionists had an average of 8 years of teaching experience; 20% had a current teaching license; and 63% had taken an algebra course at the college level.
Procedures
Intervention
ROOTS is a Tier 2 kindergarten program that consists of 50 lessons designed to build students’ whole number proficiency. The ROOTS intervention was delivered in 20-min small group sessions (two or five students) 5 days per week for approximately 10 weeks. For all students, instruction began in late fall and ended in the spring. The late fall start date was selected to provide students with opportunities to respond to core mathematics instruction and therefore minimize the identification of typically achieving students during the screening process. To ensure that ROOTS students received ROOTS instruction and core mathematics instruction, ROOTS occurred at times that did not conflict with core whole-class mathematics instruction.
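The scheduled dosage implied by these figures can be worked out directly. This is a rough sketch that assumes one lesson per session and no missed sessions, neither of which the text states; the constant names are illustrative.

```python
LESSONS = 50               # lessons in the ROOTS program
MINUTES_PER_SESSION = 20   # length of each small group session
SESSIONS_PER_WEEK = 5      # delivered 5 days per week

# Assumes one lesson is taught per session.
weeks = LESSONS / SESSIONS_PER_WEEK
total_minutes = LESSONS * MINUTES_PER_SESSION
total_hours = total_minutes / 60

print(f"{weeks:.0f} weeks, {total_minutes} min (about {total_hours:.1f} hr)")
```

Under these assumptions the schedule matches the reported duration of approximately 10 weeks.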
ROOTS instruction is aligned with CCSS for mathematics (CCSS Initiative, 2010) and recommendations from expert panels to focus intensively on whole number concepts and skills (Gersten et al., 2009). Specifically, ROOTS instruction emphasizes concepts from the Counting & Cardinality and Operations & Algebraic Thinking domains of the CCSS for mathematics to promote robust whole number sense for struggling students. The ROOTS instructional approach is drawn from principles of explicit and systematic mathematics instruction (Coyne et al., 2011; Gersten et al., 2009). In this way, lessons include explicit teacher modeling, deliberate practice, visual representations of mathematics, and academic feedback. ROOTS also provides frequent opportunities for students to verbalize their mathematical thinking and discuss problem-solving methods. For more information on ROOTS, see Clarke, Doabler, Smolkowski, Kurtz Nelson, et al. (2016).
Professional development
All interventionists participated in two 5-hr professional development workshops delivered by project staff. The first workshop focused on the instructional objectives and content of Lessons 1–25, whole number concepts and skills, empirically validated instructional practices in mathematics, and small group management techniques. The second workshop focused on the mathematics content emphasized in Lessons 26–50. Workshops provided opportunities for interventionists to practice and receive feedback on lesson delivery from instructional coaches and project staff. To promote implementation fidelity and enhance the quality of instruction, all interventionists received between two and four coaching visits from ROOTS coaches during intervention implementation. ROOTS coaches were former educators with specialized knowledge and training in the science of early mathematics instruction and effective small group instructional practices. Coaching visits consisted of direct observations of lesson delivery, followed by feedback on instructional quality and fidelity of intervention implementation.
Control condition
Core (Tier 1) mathematics instruction delivered in the kindergarten classroom served as the control condition or counterfactual in this study, as all participating treatment and control students received daily core mathematics instruction. For treatment students, ROOTS instruction was provided in addition to core mathematics instruction. The control condition was documented through teacher surveys and direct observations of core instruction. Observation and survey data reflected that teachers used a variety of published and teacher-developed mathematics programs during core instruction. The majority of teachers reported using Everyday Math as part of core instruction, and additional published programs included Houghton Mifflin, Bridges in Mathematics, Saxon Math, Investigations, and Engage New York.
Teachers reported that they provided an average of 31.32 min of daily mathematics instruction (SD = 9.88). Survey data also identified that all teachers included mathematics topics during calendar time. All teachers reported that counting and cardinality was incorporated into core mathematics instruction, and 97% of teachers reported that core mathematics instruction addressed operations and algebraic thinking as well as numbers and operations in base 10. Sixty-five percent of teachers noted that knowing number names and the count sequence was their first priority when teaching whole number concepts and skills, while 29% stated that counting to tell the number of objects was the primary instructional priority. All teachers reported that they provided whole group and teacher-led mathematics instruction, and the majority of teachers reported that they provided opportunities for peer or group work, independent student work, and math centers. Seventy-seven percent of teachers also reported providing small group mathematics instruction, and 65% of teachers stated that they provided individual mathematics instruction. Finally, teachers stated that they regularly incorporated explicit instructional practices into their core math instruction, such as demonstrations of mathematics concepts, guided practice, and opportunities for students to verbalize their mathematical thinking.
Information about the control condition was also gathered from direct observations of core mathematics instruction by trained project staff. Direct observations were conducted in each participating classroom. All observations indicated that ROOTS materials were not used during core instruction, and no evidence of treatment diffusion during core mathematics instruction was identified. Nearly all observations (98%) documented some form of teacher-led instruction, while other instructional formats were observed less frequently, including peer learning, independent student learning, mathematics centers, small group or 1:1 instruction, and instruction via technology. The majority of observations documented instruction on counting (83%) and operations and algebraic thinking (66%). Observations also showed clear evidence of the following principles of explicit and systematic instruction (Gersten et al., 2009): demonstrations of mathematics content, opportunities for group and individual verbalization of mathematics thinking, guided and independent practice opportunities, mathematics representations, and academic feedback. Observations indicated that teachers were less likely to provide scaffolded instruction for struggling students and written mathematics practice for all students.
Fidelity of implementation
Fidelity of implementation was measured via direct observations by trained research staff. Each ROOTS group was observed three times during the course of the intervention. On a 4-point scale (4 = all, 3 = most, 2 = some, 1 = none), observers rated the extent to which the interventionist (a) met the lesson’s instructional objectives, (b) followed the provided teacher scripting, (c) used the prescribed mathematics models for that lesson, and (d) taught the number of prescribed activities. For example, an interventionist received a rating of 3 for prescribed activities if she taught four of the five activities in an observed lesson. Observations indicated that interventionists delivered the majority of prescribed activities (M = 4.03 out of 5 activities, SD = 0.87). Interventionists were also observed to meet mathematics objectives (M = 3.43, SD = 0.74), follow teacher scripting (M = 3.20, SD = 0.77), and use prescribed mathematics models (M = 3.58, SD = 0.67). Intraclass correlation coefficients (ICCs) were calculated across observers for these items. ICCs for individual fidelity ratings indicated moderate to nearly perfect agreement: .92 for number of activities delivered, .72 for met mathematics objectives, .72 for followed teacher scripting, and .59 for used prescribed mathematics models. Landis and Koch (1977) characterize ICCs of .41 to .60 as moderate, .61 to .80 as substantial, and .81 to 1.00 as nearly perfect.
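The agreement labels applied to the fidelity ICCs follow from the bands quoted above, as the short sketch below illustrates. Only the three bands stated in the text are encoded; values below .41 are not characterized here and are labeled generically. The function name and dictionary keys are hypothetical.

```python
def agreement_band(icc):
    """Label an ICC using the three bands quoted in the text."""
    r = round(icc, 2)          # the bands are stated to two decimals
    if r >= 0.81:
        return "nearly perfect"
    if r >= 0.61:
        return "substantial"
    if r >= 0.41:
        return "moderate"
    return "below moderate"    # not characterized in the text


# Interobserver ICCs for the four fidelity items, as reported above.
fidelity_iccs = {
    "activities delivered": 0.92,
    "met objectives": 0.72,
    "followed scripting": 0.72,
    "used prescribed models": 0.59,
}
labels = {item: agreement_band(v) for item, v in fidelity_iccs.items()}

for item, label in labels.items():
    print(f"{item}: {label}")
```

Applied to the reported values, the labels span moderate to nearly perfect, consistent with the summary in the text.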
Measures
Students were administered five measures of whole number sense at pretest (T1) and posttest (T2). These measures included a proximal assessment of whole number understanding that measured skills taught during ROOTS, two distal measures of whole number sense, and a set of curriculum-based measures of discrete skills related to early number sense. In addition, a distal outcome measure was administered 6 months into students’ first-grade year (T3). Trained research staff administered all student measures, and interscorer reliability criteria ≥.95 were met for all assessments.
ROOTS Assessment of Early Numeracy Skills (RAENS; Doabler, Clarke, & Fien, 2012) is a researcher-developed instrument that was administered at T1 and T2. RAENS is individually administered and consists of 32 items assessing aspects of counting and cardinality, number operations, and the base-10 system. In an untimed setting, students are asked to count and compare groups of objects; write, order, and compare numbers; label visual models (e.g., 10-frames); and write and solve single-digit addition expressions and equations. The predictive validity of RAENS ranges from .68 to .83 for the Test of Early Mathematics Ability–Third Edition (TEMA-3) and the NSB. Interrater scoring agreement is reported at 100% (Clarke, Doabler, Smolkowski, Kurtz Nelson, et al., 2016).
Oral Counting–Early Numeracy Curriculum-Based Measurement (Clarke & Shinn, 2004) is a curriculum-based measure that requires students to orally count in English for 1 min. Oral counting scores have predictive validity, with spring criteria ranging from .46 to .72, as well as high interscorer (.99) and test-retest (.78) reliability.
ASPENS (Clarke et al., 2011) is a set of three curriculum-based measures validated for screening and progress monitoring in kindergarten mathematics. Each 1-min fluency-based measure assesses an important aspect of early numeracy proficiency, including number identification, magnitude comparison, and missing number identification. Test-retest reliabilities of kindergarten ASPENS measures are in the moderate to high range (.74–.85). Predictive validity of fall scores on the kindergarten ASPENS measures, with spring scores on the TerraNova 3, ranges from .45 to .52.
NSB is an individually administered measure with 33 items that assess counting knowledge and principles, number recognition, number comparisons, nonverbal calculation, story problems, and number combinations. Jordan and colleagues (Jordan et al., 2008; Jordan, Glutting, Ramineni, & Watkins, 2010) report a coefficient alpha for the NSB of .84 at the beginning of first grade and high levels of diagnostic accuracy, as measured by receiver operating characteristics.
TEMA-3 (Pro-Ed, 2007) is a standardized, norm-referenced, individually administered measure of beginning mathematics ability. The TEMA-3 assesses whole number understanding, including counting and basic calculations, for children ranging in age from 3 years to 8 years 11 months. The TEMA-3 reports alternate-form and test-retest reliabilities of .97 and .82–.93, respectively. The TEMA-3 manual reports concurrent validity with other mathematics measures ranging from .54 to .91.
Stanford Achievement Test–Tenth Edition (SAT-10; Harcourt Educational Measurement, 2002) and the Stanford Early School Achievement Test (SESAT) are group-administered, standardized, norm-referenced measures. Both measures are multiple choice and have two mathematics subtests: Problem Solving and Procedures. The SESAT is administered in the kindergarten year and the SAT-10 in first grade. The SAT-10 is a standardized achievement test with adequate and well-reported validity (r = .67) and reliability (r = .93). All treatment and control students were administered the SESAT at posttest (T2) and the SAT-10 midway through their first-grade year (T3).
Observations
To gain information about instructional interactions within the two- and five-student ROOTS groups, the Classroom Observations of Student-Teacher Interactions–Mathematics (COSTI-M; Doabler et al., 2015) measure was used during direct observations of ROOTS instruction. This observation measure is a modified version of the Smolkowski and Gunn (2012) early literacy observation instrument that was designed to document the frequency of explicit student-teacher instructional interactions that occur during kindergarten mathematics instruction. Observers used the COSTI-M to collect data on the frequency of teacher models, guided practice, unguided practice, individual practice, and group practice. Teacher models represented teachers clearly explaining and overtly demonstrating mathematical concepts, procedures, and skills. For example, teacher models might include a teacher describing the attributes of 3-dimensional shapes or showing students how to graph data with a bar graph. Guided practice was operationally defined as an opportunity for one student or multiple students to practice a mathematical concept, definition, procedure, strategy, fact, or task with varying levels of concurrent instructional support (physical or verbal) from the teacher. For example, guided practice might include a student or group of students counting with the teacher or tracing a number while directed by the teacher. Unguided practice was defined as an opportunity for one student or multiple students to independently practice a mathematical concept, definition, procedure, strategy, fact, or task without teacher support. Unguided practice might include a student identifying a numeral independently or answering a number combination (e.g., 5 + 1 = ) on one’s own. Individual practice opportunities were defined as any practice opportunity (guided or unguided) provided to one student, while group practice opportunities were defined as any practice opportunity provided to two or more students. 
One student counting out loud or identifying a numeral would be documented as an individual practice opportunity, while two or more students counting together or writing numerals would be recorded as a group practice opportunity.
Each of these variables is considered an indicator of instructional intensity, with more frequent instructional interactions indicating higher instructional intensity. Mean rates of these behaviors were calculated by dividing the frequency of each behavior during an observed lesson by the number of minutes in the observation. In addition, an “all practice” variable was calculated by summing all observed practice opportunities (i.e., guided, unguided, individual, and group practice) and dividing that total by the number of minutes in the observation.
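The rate calculation described above can be sketched in a few lines of Python. This is an illustration only: the function name `rates_per_minute`, the dictionary layout, and the counts are hypothetical and are not study data. The sketch follows the text in summing the four practice categories to form the "all practice" variable.

```python
def rates_per_minute(counts, minutes):
    """Convert behavior frequencies from one observed lesson into per-minute rates."""
    if minutes <= 0:
        raise ValueError("observation length must be positive")
    # One rate per coded behavior: frequency divided by minutes observed.
    rates = {name: n / minutes for name, n in counts.items()}
    # "All practice" as described in the text: the sum of the practice
    # categories divided by the length of the observation.
    practice = ("guided", "unguided", "individual", "group")
    rates["all_practice"] = sum(counts[k] for k in practice) / minutes
    return rates


# Hypothetical counts from a 20-minute observation (not study data).
example = rates_per_minute(
    {"teacher_models": 10, "guided": 12, "unguided": 8, "individual": 6, "group": 14},
    minutes=20,
)
print(example)
```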
Each ROOTS group was directly observed three times by trained observers. Observers completed a 6-hr training focused on direct observation procedures and use of the observation instrument. Prior to completing independent observations, observers were required to complete a video checkout in which they coded a 5-min video of small group kindergarten mathematics instruction. Next, observers completed a real-time checkout with a primary observer during a ROOTS observation. On both checkouts, observers were required to meet interobserver reliability standards ≥.85. Interobserver reliability ICCs for COSTI-M variables were as follows: .73 for teacher models, .91 for all practice, .94 for all individual practice, .96 for all group practice, .59 for all guided practice, and .72 for all unguided practice. These ICCs indicate nearly perfect agreement for all practice, all individual practice, and all group practice; moderate agreement for all guided practice; and substantial agreement for teacher models and all unguided practice (Landis & Koch, 1977). ICCs were also calculated across the three observations within each ROOTS group to provide an estimate of stability. Stability ICCs for COSTI-M variables were as follows: .06 for teacher models, .37 for all practice, .13 for all individual practice, .32 for all group practice, .40 for all guided practice, and .21 for all unguided practice. These ICCs represent moderate to low stability, indicating that rates of instructional interactions generally differed across observations.
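As a check on the verbal labels, the interobserver ICCs reported above can be mapped onto the Landis and Koch (1977) bands. The sketch below is illustrative: the function name is hypothetical, and it assumes the conventional cut points (.20, .40, .60, .80) that Landis and Koch proposed for agreement statistics.

```python
def landis_koch_label(value):
    """Return the Landis and Koch (1977) agreement label for a coefficient."""
    if value < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if value <= upper:
            return label
    raise ValueError("coefficient cannot exceed 1.0")


# Interobserver ICCs as reported in the text.
interobserver_icc = {
    "teacher models": 0.73,
    "all practice": 0.91,
    "individual practice": 0.94,
    "group practice": 0.96,
    "guided practice": 0.59,
    "unguided practice": 0.72,
}
labels = {name: landis_koch_label(icc) for name, icc in interobserver_icc.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```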
Statistical Analysis
Analyses were conducted to address three research questions. First, we assessed overall ROOTS intervention effects, with two- and five-student ROOTS groups as the intervention condition, on student outcomes using a mixed model (multilevel) Time × Condition analysis (Murray, 1998) designed to account for students partially nested within small groups (Baldwin, Bauer, Stice, & Rohde, 2011; Bauer, Sterba, & Hallfors, 2008). The study design called for the randomization of individual students to receive ROOTS, nested within two- or five-student ROOTS groups, or a nonnested comparison condition, and the analytic model must account for the potential heterogeneity among variances across conditions (Roberts & Roberts, 2005). In particular, the ROOTS groups required a group-level variance, while the unclustered controls did not. Furthermore, because the residual variances may have differed among conditions, we tested the assumption of homoscedasticity of residuals. The analysis tested for differences among conditions on gains in outcomes from the fall (T1) to spring (T2) of kindergarten and is described in detail by Clarke, Doabler, Smolkowski, Kurtz Nelson, et al. (2016) and Doabler, Clarke, Kosty, et al. (2016). The statistical model included time, coded 0 at T1 and 1 at T2; condition, coded 0 for control and 1 for ROOTS; and the interaction between the two. These models test for net differences among conditions (Murray, 1998), which provide an unbiased and straightforward interpretation of the results (Allison, 1990; Jamieson, 1999). For two outcomes—the SESAT (available only at posttest) and the SAT-10 (collected as a follow-up measure in Grade 1)—we used the analysis of covariance approach described by Bauer et al. (2008) and Baldwin et al. (2011).
Second, we tested whether two- and five-student ROOTS groups experienced differential rates of observed instructional interactions using independent-samples t tests.
Third, we examined the effects of the two- versus five-student ROOTS group size on student outcomes using a fully nested mixed-model (multilevel) Time × Condition analysis (Murray, 1998) to account for the intraclass correlation associated with students nested within ROOTS groups. Similar to the first set of analyses, the model included time, coded 0 at T1 and 1 at T2; condition, coded 0 for five-student ROOTS groups and 1 for two-student ROOTS groups; and the interaction between them. Mixed analysis of covariance models were used to analyze the SESAT and the SAT-10 measured at one time point.
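With the 0/1 coding described above, the fixed effects of a Time × Condition model have a simple reading in terms of the four cell means: the interaction equals the difference in gains between conditions. The sketch below illustrates only that arithmetic. It ignores the random effects and variance structures of the actual mixed models, and the function name and means are hypothetical.

```python
def time_by_condition_effects(means):
    """Fixed effects implied by cell means, keyed as (condition, time) with 0/1 codes."""
    intercept = means[(0, 0)]                       # reference condition at T1
    time = means[(0, 1)] - means[(0, 0)]            # gain in the reference condition
    condition = means[(1, 0)] - means[(0, 0)]       # pretest difference
    interaction = (means[(1, 1)] - means[(1, 0)]) - time  # difference in gains
    return {
        "intercept": intercept,
        "time": time,
        "condition": condition,
        "time_x_condition": interaction,
    }


# Hypothetical cell means (not study data): condition 0 = control, 1 = ROOTS.
effects = time_by_condition_effects(
    {(0, 0): 20.0, (0, 1): 30.0, (1, 0): 19.0, (1, 1): 35.0}
)
print(effects)
```

In this hypothetical case the intervention group gains 16 points and the reference group gains 10, so the interaction term is 6, while the condition term (−1) captures the pretest difference.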
Model estimation
We fit models to our data with SAS PROC MIXED version 9.2 (SAS Institute Inc., 2009) using restricted maximum likelihood, generally recommended for multilevel models (Hox, 2002). Maximum likelihood estimation for the Time × Condition analysis uses all available data to provide potentially unbiased results even in the face of substantial attrition, provided the missing data were missing at random (Graham, 2009). We did not believe that attrition or other missing data represented a meaningful departure from the missing-at-random assumption, meaning that missing data did not likely depend on unobserved determinants of the outcomes of interest (Little & Rubin, 2002). The majority of missing data involved students who were absent on the day of assessment (e.g., due to illness) or transferred to a new school (e.g., due to their families moving).
The models assume independent and normally distributed observations. We addressed the first, more important assumption (Van Belle, 2008) by explicitly modeling the multilevel nature of the data. The data in the present study also do not markedly deviate from normality; skewness and kurtosis fell within ±2.0 for all measures except for oral counting, where kurtosis was 3.1. Nonetheless, multilevel regression methods have also been found quite robust to violations of normality (e.g., Hannan & Murray, 1996).
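A normality screen of this kind can be sketched as follows. The text does not state which estimators were used, so this example assumes simple moment-based skewness and excess kurtosis, which can differ slightly from the bias-corrected values that statistical packages report. The function names and the scores are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (normal distribution = 0, 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_bounds(values, bound=2.0):
    """True when both statistics fall within plus or minus the bound."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) <= bound and abs(kurt) <= bound


# Hypothetical scores (not study data).
scores = [1, 2, 3, 4, 5]
skew, kurt = skewness_and_excess_kurtosis(scores)
print(skew, kurt, within_bounds(scores))
```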
Effect sizes
To ease interpretation, we computed an effect size, Hedges’s g (Hedges, 1981), for each fixed effect. Hedges’s g, recommended by the What Works Clearinghouse (2014), represents an individual-level effect size comparable to Cohen’s d (Cohen, 1988; Rosenthal & Rosnow, 2008).
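For readers unfamiliar with the statistic, a textbook two-group form of Hedges's g is sketched below. It is an assumption that this form conveys the idea adequately: the effect sizes reported in this study derive from model estimates and may have been computed differently. The function name and input values are hypothetical.

```python
import math


def hedges_g(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Standardized mean difference with the Hedges (1981) small-sample correction."""
    df = n_1 + n_2 - 2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_1 - mean_2) / pooled_sd        # uncorrected standardized difference
    correction = 1 - 3 / (4 * df - 1)        # common approximation to the correction factor
    return d * correction


# Hypothetical summary statistics (not study data).
g = hedges_g(mean_1=10.0, mean_2=8.0, sd_1=4.0, sd_2=4.0, n_1=20, n_2=20)
print(round(g, 3))
```

The correction factor approaches 1 as the sample grows, which is why g and Cohen's d are nearly identical in samples of the size used here.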
Results
Table 2 presents means, standard deviations, and sample sizes for the seven dependent variables by assessment time and condition. In what follows, we present results from tests of bias due to attrition, efficacy effects for ROOTS (Research Question 1), differential rates of instructional interactions between two- and five-student ROOTS groups (Research Question 2), and effects of the two- versus five-student ROOTS group size on student outcomes (Research Question 3).
Descriptive Statistics for Mathematics Measures by Condition and Assessment Time
Note. The sample sizes represent students with a particular measure at each assessment period. The complete sample included 120 students in the two-student ROOTS group, 295 students in the five-student ROOTS group, and 177 students in the control condition. NSB = Number Sense Brief; ASPENS = Assessing Student Proficiency in Early Number Sense; TEMA = Test of Early Mathematics Ability–Third Edition; RAENS = ROOTS Assessment of Early Numeracy Skills; SESAT = Stanford Early School Achievement Test; SAT-10 = Stanford Achievement Test–Tenth Edition.
Attrition
Student attrition was defined as students with data at T1 but missing data at T2, and we examined attrition with respect to the ROOTS-eligible sample of 592 students. Attrition rates were approximately 11% for all outcomes measured at T2. Only 9% (52) of students were missing all posttest data. The proportion of students missing all posttest data did not differ between the ROOTS condition, with 10% (41) missing, and the control condition, with 6% (11) missing, χ²(1) = 2.08, p = .1492. Although differential rates of attrition are undesirable, differential scores on mathematics tests present a far greater threat to validity, so we conducted an analysis to test whether student mathematics scores were differentially affected by attrition across conditions. We examined the effects of ROOTS condition (two- or five-student group), attrition status, and their interaction on T1 scores for all five measures available at T1. We found no statistically significant interactions or evidence that mathematics scores were differentially affected by attrition across conditions.
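The attrition comparison can be reproduced from the counts given in the text and in the note to Table 2 (41 of 415 ROOTS students and 11 of 177 control students missing all posttest data). The sketch below assumes a Pearson chi-square test without continuity correction, which the text does not state explicitly. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


roots_missing, roots_total = 41, 120 + 295   # two- plus five-student groups
control_missing, control_total = 11, 177
stat, p_value = chi_square_2x2(
    roots_missing, roots_total - roots_missing,
    control_missing, control_total - control_missing,
)
print(f"chi-square(1) = {stat:.2f}, p = {p_value:.4f}")
```

Under these assumptions the statistic rounds to 2.08 with p near .149, in line with the reported values.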
Efficacy Effects for ROOTS
Table 3 presents the results of the partially nested statistical models comparing gains between nested ROOTS students and unclustered control students. The table presents the results of the homoscedastic model if it was deemed equivalent to the more complicated heteroscedastic model (ASPENS, TEMA, and RAENS). Otherwise, we provide results for the heteroscedastic model (NSB, oral counting). The bottom two rows of the table show the likelihood ratio test results that compared homoscedastic residuals with heteroscedastic residuals. Although the variance structures differed between these models, the condition effect estimates and statistical significance values were very similar for the heteroscedastic and homoscedastic models.
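The model-selection rule described here, a likelihood ratio test on one additional variance parameter with a criterion α of .20, can be sketched as follows. The −2 log-likelihood values are hypothetical, as is the function name. The sketch assumes the two models share the same fixed effects, so that restricted likelihoods are comparable.

```python
import math


def residual_variance_lrt(neg2ll_homoscedastic, neg2ll_heteroscedastic, alpha=0.20):
    """Likelihood ratio test (df = 1) of homoscedastic vs. heteroscedastic residuals."""
    stat = max(neg2ll_homoscedastic - neg2ll_heteroscedastic, 0.0)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    # A small p value favors the more complicated heteroscedastic model.
    return stat, p_value, p_value < alpha


# Hypothetical -2 log-likelihoods (not study data).
stat, p_value, use_heteroscedastic = residual_variance_lrt(1000.0, 998.0)
print(stat, round(p_value, 4), use_heteroscedastic)
```

A liberal criterion such as .20 makes it easier to retain the heteroscedastic model when the evidence for unequal residual variances is only modest.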
Results From a Partially Nested Time × Condition Analysis on Fall-to-Spring Gains in Math Comparing Intervention Students Nested Within ROOTS Groups and Unclustered Control Students
Note. Table entries show parameter estimates with standard errors in parentheses. NSB = Number Sense Brief; ASPENS = Assessing Student Proficiency in Early Number Sense; TEMA = Test of Early Mathematics Ability–Third Edition; RAENS = ROOTS Assessment of Early Numeracy Skills.
Tests of fixed effects (first four rows) accounted for small groups as the unit of analysis within the intervention condition (ROOTS) and unclustered individuals in the control condition. The likelihood ratio test compared homoscedastic residuals with heteroscedastic residuals with a criterion α of .20 (df = 1).
p < .10. *p < .05. **p < .01. ***p < .001. ****p < .0001.
The models in Table 3 tested fixed effects for differences among conditions at pretest (condition effect), gains across time, and the interaction between the two. We found no statistically significant differences at pretest (p > .16 for all measures), which suggested that students were similar in the fall of kindergarten. We found statistically significant differences by condition in gains from fall to spring for three dependent variables. Students in the ROOTS condition made greater gains than control students on the ASPENS (t = 6.41, df = 272, p < .0001), TEMA standard scores (t = 3.76, df = 263, p = .0002), and RAENS (t = 9.36, df = 315, p < .0001). We did not detect statistically significant differences among conditions in gains on the NSB or oral counting or differences among conditions on the SESAT (p = .1117) or SAT-10 (p = .1253), both tested with the ASPENS and TEMA as pretest covariates. The Time × Condition model estimated differences in gains among conditions of 0.4 for the NSB (Hedges’s g = 0.09), 18.3 for the ASPENS (g = 0.52), 3.2 for oral counting (g = 0.14), 2.0 for the TEMA standard score (g = 0.25), and 4.7 for the RAENS (g = 0.76). The analysis of covariance model estimated differences between ROOTS and control conditions of 3.9 for the SESAT (g = 0.12) and 0.1 for the SAT-10 (g < 0.01).
Rates of Instructional Interactions
Table 4 presents descriptive statistics for the observed rates of instructional interactions as well as results of independent-samples t tests comparing rates of instructional interactions by ROOTS group size. Compared with the five-student ROOTS groups, two-student ROOTS groups experienced higher rates of individual practice opportunities (t = 4.25, p < .001, g = 0.78) and lower rates of group practice opportunities (t = −3.18, p = .002, g = −0.58). We found no effects of ROOTS group size on the rate of teacher models (p = .273), guided practice (p = .529), unguided practice (p = .131), or all practice combined (p = .309).
Results of Independent-Samples t Tests Comparing Rates of Instructional Interactions by Size of ROOTS Group
Note. Group t tests were based on 60 two-student ROOTS groups and 59 five-student ROOTS groups (df = 117).
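The degrees of freedom in the note (60 + 59 − 2 = 117) are consistent with a pooled-variance independent-samples t test, although the text does not state which variant was used. A minimal sketch of that test, with a hypothetical function name and made-up rates, is:

```python
import math
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Independent-samples t statistic and df, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical per-minute rates for two sets of groups (not study data).
t_stat, df = pooled_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, df)

# Degrees of freedom for the group counts given in the note.
df_reported = 60 + 59 - 2
```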
Effects of the Two- Versus Five-Student ROOTS Group on Student Outcomes
Table 5 presents the results of the fully nested statistical models comparing gains between two- and five-student ROOTS groups. The models in Table 5 tested fixed effects for differences among conditions at pretest (two-student ROOTS group effect), gains across time, and the interaction between the two. We found no statistically significant differences at pretest (p > .21 for all measures), which suggested that students were similar in the fall of kindergarten. We found no statistically significant differences by ROOTS group size in gains from fall to spring (p > .15 for all measures). The Time × Condition model estimated differences in gains between ROOTS group sizes of 0.0 for the NSB (Hedges’s g = 0.00), −4.8 for the ASPENS (g = −0.14), 2.0 for oral counting (g = 0.08), −0.1 for the TEMA standard score (g = −0.01), and 0.15 for the RAENS (g = 0.03). The analysis of covariance model estimated differences between two- and five-student ROOTS groups of 1.0 for the SESAT (g = 0.03) and 0.5 for the SAT-10 (g = 0.02).
Results From a Fully Nested Time × Condition Analysis on Fall-to-Spring Gains in Math Comparing Two- and Five-Student ROOTS Groups
Note. Table entries show parameter estimates with standard errors in parentheses. NSB = Number Sense Brief; ASPENS = Assessing Student Proficiency in Early Number Sense; TEMA = Test of Early Mathematics Ability–Third Edition; RAENS = ROOTS Assessment of Early Numeracy Skills.
Tests of fixed effects (first four rows) accounted for small groups as the unit of analysis within the two- and five-student ROOTS conditions.
p < .10. *p < .05. **p < .01. ***p < .001. ****p < .0001.
Discussion
As educators grapple with building and providing better services within RTI or MTSS frameworks, research examining intervention efficacy is crucial, as are research questions that focus on moderators and mediators of treatment impact (Miller et al., 2014), including those related to treatment intensity and the allocation of finite resources (Codding & Lane, 2015). Our examination of the ROOTS intervention program found that the program was effective overall, with a significant positive impact on 3 of 6 posttest measures and positive effect sizes on all measures. Results for the ROOTS program would be classified by the What Works Clearinghouse as having a “statistically significant positive impact.” Second, we found significant differences between the two- and five-student small groups, with the two-student small group providing a higher rate of individual practice opportunities and the five-student group providing a higher rate of group practice opportunities. No differences were found on other teacher and student behaviors. Despite finding differences on one measure of treatment intensity favoring the two-student small group (the rate of individual practice opportunities), we did not detect significant differences on student achievement outcome measures between the ROOTS two- and five-student groups.
The first finding related to the efficacy of the ROOTS intervention adds to the corpus of research on this particular intervention program and the general body of research on whole number interventions targeting the early elementary grades (e.g., D. P. Bryant et al., 2011; Clarke et al., 2014; Fuchs et al., 2005). For districts or schools implementing RTI or MTSS, the research base enables them to select from a growing number of programs (http://www.intensiveintervention.org/) that, if implemented with fidelity, can reasonably be expected to positively affect student outcomes and function as a component of a framework to support early mathematics achievement. Our second and third research questions present a more complex context in which to interpret our findings. While results from the study indicated a more intensive intervention experience for students in the two-student small group, this did not translate into greater student achievement outcomes. Critically, two things should be considered when contextualizing the results from the present study. First, from a school- or resource-based perspective, the lack of significant differences between small groups has significant resource allocation implications. Second, what does our lack of findings mean for the understanding of treatment intensity, and how should that guide future research efforts? We address each of these areas in turn.
At a federal level, there is an increasing interest in considering cost when examining the efficacy of educational programs. For example, the Institute of Education Sciences’ 2016 Special Education Grants Request for Applications requires a cost analysis section as part of the research plan for Goal 3 efficacy and replication grants: “The cost analysis should help schools and districts understand the monetary costs of implementing the intervention (e.g., expenditures for personnel, facilities, equipment, materials, training, and other relevant inputs),” and “Intervention costs can be contrasted with the costs of comparison group practice to reflect the difference between them” (p. 65). Cost analyses do not include measures of benefit (Levin, 1983), and procedures for examining costs have become relatively standardized (Levin & McEwan, 2001), thus enabling them to provide a quantitative metric to help examine questions related to cost-benefit and an additional and important lens through which to view and guide practice. For example, findings by Elbaum et al. (2000) related to the benefits of 1:1 instruction could and should be contrasted with reading interventions targeting similar content but utilizing larger group sizes (Gersten et al., 2008) and allow comparisons of programs providing similar benefits at varying costs (Keeney & Raiffa, 1993). In a similar vein, the work of Vaughn and colleagues (Vaughn, Cirino, et al., 2010; Vaughn et al., 2003) affords opportunities to integrate cost analyses and subsequent consideration of cost-benefit into the informal evaluation of an intervention program to complement the formal evaluation of whether or not a program is efficacious.
In a district setting where monetary resources are likely capped (e.g., a set amount exists to provide Tier 2 intervention services), schools can serve as many students as possible with the available dollars. Using a ROOTS grouping size of five would allow schools to serve 150% more students than if a two-student grouping was selected. In real terms, this is a significant difference and critical in the ability to implement a multitier model. For example, in a large-scale Institute of Education Sciences–funded efficacy trial (Clarke, Doabler, Fien, Baker, & Smolkowski, 2012), we found that approximately 70% of students entered kindergarten with some degree of risk in mathematics that, in a multitier model, would warrant additional Tier 2 services. Utilizing a cost-benefit framework allows schools to evaluate equivalent positive results between the two- and five-student small groups from a resource standpoint.
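The 150% figure follows from simple arithmetic if the budget is assumed to fund a fixed number of interventionist-led groups, so that cost per group does not depend on group size. That assumption, the function names, and the number of groups below are illustrative and do not come from the study.

```python
def students_served(groups_funded, group_size):
    """Students reached when a fixed budget funds a given number of groups."""
    return groups_funded * group_size


def percent_more(larger, smaller):
    """Percentage increase of the larger count over the smaller one."""
    return (larger - smaller) / smaller * 100


# Hypothetical budget that funds 10 groups.
served_five = students_served(10, 5)   # 50 students
served_two = students_served(10, 2)    # 20 students
increase = percent_more(served_five, served_two)
print(increase)
```

The ratio depends only on the group sizes (5/2 = 2.5 times as many students), so the percentage is the same for any number of funded groups.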
Limitations and Future Research
From a treatment intensity perspective, a number of factors are important to consider that relate to limitations of the current research and directions for future research. Although we describe our five-student small group as less intense, that should not be conflated with considering the group to lack intensity. That is, the experience of students in the five-student small group would be, by almost any analysis, an intensive educational experience. The ROOTS program was designed to incorporate effective instructional design elements for at-risk learners (Archer & Hughes, 2011), and the resulting instructional experience for students included high levels of teacher models and demonstrations, opportunities to respond, and academic feedback—all variables with a demonstrated positive relationship with student outcomes (Baker et al., 2002; Gersten et al., 2009; Kroesbergen & Van Luit, 2003). The experience of the five-student small group included high rates of individual practice compared with typical instruction with high rates of group practice. Note that the selection of group size for the study was driven by recommendations for group size (Gersten et al., 2009) and by practical research design considerations (e.g., potential attrition with one-student small groups). Thus, we are not able to make statements regarding contrasts related to other group sizes.
For programs that consider these instructional design features a priori in their design and development phases, it may be that a threshold effect exists wherein after a certain base rate of critical teacher and student behavior is reached, the value of providing additional opportunities to engage in those behaviors is limited (Doabler et al., 2017). If a threshold effect exists, that would mean that structuring groups, including reducing group size, in ways to increase behaviors thought to theoretically underlie the intervention may not result in hypothesized higher outcomes. Note that in this investigation, students in the five-student group experienced significantly higher rates of group practice. Well-designed instruction with group practice built into its architecture may enable group practice to elicit the same benefit as individual practice. Future research should continue to examine the role of critical teacher and student behaviors and their interaction with group size and student outcomes in a variety of contexts. Such designs should occur with programs designed to include critical instructional principles at high rates but also those designed with different parameters. The inclusion criterion to identify the study sample was not focused exclusively on identifying a high-risk sample (e.g., students with mathematics learning disabilities) but included a broader range of mathematics abilities. Relatedly, we did not investigate whether the impact of group sizes was mediated by initial skill status. For example, it may be reasonable to hypothesize that a student with relatively low risk (within the at-risk sample) gains equal benefit from either group size but a student with relatively high risk may gain differential benefit from the smaller small group.
It is also vital to examine how we defined and operationalized treatment intensity. Our operationalization of treatment intensity focuses on a narrow, albeit critical, set of behaviors and did not attempt to account for the quality of those specific behaviors. For example, while we captured the rate of teacher models, we did not analyze the overall quality of those models. Future research should examine overall quality of behaviors hypothesized to influence student outcomes. In addition, there is significant interest in the role of teacher content and pedagogical knowledge (Garet et al., 2016; Woodward, 2016). Our measurement net did not include measures of teacher knowledge. In a systematically designed program, like ROOTS and similar early mathematics intervention programs, the content knowledge of the instructor may play a vital role. A potential hypothesis is that a teacher with greater content knowledge would offer more sophisticated mathematics models and academic feedback and thus provide students with a more conceptually rich academic experience leading to greater mathematics outcomes. If such relationships are discovered in future studies, links to targeted professional development would be worth exploring despite mixed results from studies targeting that area (Gersten, Taylor, Keys, Rolfhus, & Newman-Gonchar, 2014).
Last, while we defined treatment intensity within the scope of one lesson (i.e., the teacher and student behaviors that occur within the lesson), treatment intensity can also be thought of as the scope of the intervention and the amount of content covered. Emerging evidence suggests that kindergarten mathematics content is often overfocused on basic concepts associated with smaller gains for almost all students, whereas a focus on advanced content has been positively associated with student learning (Engel, Claessens, & Finch, 2013; Engel, Claessens, Watts, & Farkas, 2016). While an approach of this nature is complicated in the context of working with at-risk or Tier 2 students, the point speaks to a broader issue. If we are to reduce achievement gaps and reset the foundation of students’ mathematical understanding such that they are able to acquire new material at the same rate as their peers, attempts to push the envelope in terms of the content covered in mathematics interventions are necessary. Doing so would require a significant rethinking of the traditional intervention model of providing interventions of limited duration and depth that are designed to build understanding that has already been mastered by same-grade peers.
Conclusion
Systematic examinations of intervention delivery options are of particular interest within multitier models (Al Otaiba, Kim, Wanzek, Petscher, & Wagner, 2014; Vaughn, Denton, & Fletcher, 2010) and resource-limited environments. While work in mathematics is just beginning, it has the potential to inform the field regarding best practices in intervention (Miller et al., 2014; Vaughn & Swanson, 2015) as we strive to better understand, study, and implement models of support that address the learning needs of all students in acquiring mathematics knowledge.
Footnotes
Acknowledgements
The opinions expressed are those of the authors and do not represent views of the Institute of Education Sciences or the U.S. Department of Education. An independent external evaluator and coauthor of this publication completed the research analysis described herein.
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Ben Clarke and Hank Fien are eligible to receive a portion of royalties from the University of Oregon’s distribution and licensing of certain ROOTS-based works. Potential conflicts of interest are managed through the University of Oregon’s Research Compliance Services.
Funding
This research was supported by the ROOTS Project (Grant R324A120304), funded by the Institute of Education Sciences, U.S. Department of Education.
Authors
BEN CLARKE, University of Oregon; research interests: early numeracy, curriculum-based measurement, instructional design.
CHRISTIAN T. DOABLER, University of Texas at Austin; research interests: curriculum design, mathematics interventions, learning disabilities.
DEREK KOSTY, Oregon Research Institute; research interests: design and analysis of complex efficacy and effectiveness trials.
EVANGELINE KURTZ NELSON, University of Oregon; research interests: Tier 2 mathematics interventions, supporting learners with intellectual and developmental disabilities.
KEITH SMOLKOWSKI, Oregon Research Institute; research interests: design and analysis of complex efficacy and effectiveness trials.
HANK FIEN, University of Oregon; research interests: early reading and mathematics interventions, formative assessments.
JESSICA TURTURA, University of Oregon; research interests: early math and reading intervention strategies and materials, factors predicting nonresponse to intervention, schoolwide multitiered systems of support.
