
Abstract

Tutoring ranks among the most versatile and potentially transformative educational tools available. Dozens of randomized experiments have evaluated preK–12 tutoring programs, varying widely in approaches, contexts, and costs. This article presents results from a systematic review and meta-analysis of tutoring field experiments. We develop a framework for understanding variation in tutoring program impact and examine effect sizes (ESs) across a range of characteristics. We find that tutoring programs yield consistently substantial positive impacts on learning, with an overall pooled ES of 0.288 SD (SE = 0.029, p < .001). ESs tend to be largest for programs that use teachers or paraprofessionals as tutors, are held in earlier grades, occur at least 3 days per week, and are held during school.

Keywords

meta-analysis, randomized evaluation, student learning, tutoring

PreK–12 tutoring interventions rank among the most widespread, versatile, and potentially transformative instruments in today's educational toolkit. As school systems across the globe expand and engage with increasingly diverse student populations, the importance of tutoring continues to grow. Although researchers have long studied tutoring, the 1980s saw the emergence of a distinct body of tutoring evaluation research that has grown exponentially. Given the wide-ranging use and robust body of empirical evidence on tutoring programs, synthesizing the evidence constitutes a key priority for education researchers. The present article advances this endeavor by developing a framework of tutoring program impact and using it to guide a meta-analysis of the randomized controlled trial (RCT) evidence.

Existing studies suggest strong promise for tutoring interventions. A recent meta-analysis of education interventions targeting low socioeconomic status (SES) primary school students found average impacts of 0.36 SD on learning outcomes from tutoring, the largest pooled effect of all 14 intervention types included (Dietrichson et al., 2017). Yet tutoring program models vary widely. A framework for comparing the full range of tutoring programs could elucidate the conditions under which alternative models are effective. Existing RCT findings on tutoring programs—refracted through this framework—could enable substantial policy gains by guiding practitioners in selecting options most likely to be effective for particular circumstances.

We address two main research questions: First, what are the impacts of preK–12 tutoring interventions on learning outcomes? Second, how do these effects vary across different categories of tutoring programs? The last meta-analysis of tutoring RCTs was published more than a decade ago and focused only on volunteer tutoring programs (Ritter et al., 2009). Meanwhile, a robust meta-analysis literature has emerged that compares the effects of a variety of math and reading intervention types (e.g., Dietrichson et al., 2020, 2021; Gersten et al., 2020; Neitzel et al., 2022; Pellegrini et al., 2021; Wanzek et al., 2016, 2018). However, these reviews focus on specific subjects and grades, and the bandwidth required for comparing across multiple intervention types constrains their ability to explore variation in tutoring program effectiveness. A central goal of the present article is to explore the relative effectiveness of alternative tutoring models across the full preK–12 grade range for both reading and math.

Our study sample spans a broad terrain in terms of research designs and programs evaluated. Studies of the program Reading Recovery range from an experiment with 52 students across 10 schools in Australia (Center et al., 1995) to a full-scale impact evaluation with 6,888 students in 1,222 schools from across the United States (Sirinides et al., 2018). Randomization occurs at the student, classroom, and school levels, and the studies compare students given tutoring interventions with those participating in regular school activities or close equivalents. Interventions range from intensive high-dosage models with near-daily instruction from certified teachers or paraprofessionals like Reading Recovery and Match (Guryan et al., 2023) to weekly tutoring delivered by minimally trained volunteers (e.g., Ritter & Maynard, 2008). Interventions cover a range of pedagogical activities and may occur during or after school.

The studies in our sample showed substantial positive impacts on learning outcomes, with an estimated pooled effect size (ES) of 0.288 SD (SE = 0.029). Although impacts are significant for most study subsets, they are stronger on average for teacher and paraprofessional than for nonprofessional and parent programs and stronger for earlier versus later grades. Average effects for reading and math interventions are similar, although reading programs yield higher ESs in earlier grades. Programs conducted during school tend to have larger impacts than those conducted after school, as do programs that are held at least 3 days per week. We next lay out the study's conceptual framework before presenting our methodological approach in the third section and empirical results in the fourth section. The fifth section reviews the study's main findings, contextualizing tutoring impact relative to comparable programs and outlining policy lessons and areas for future research.

Conceptual Framework

We follow Dietrichson et al. (2017) in defining tutoring programs as “supplemental pedagogical support from an instructor, either one-to-one or in a small group” (p. 255) because this definition reflects the category as it is most widely understood. Approaching tutoring as a “technology of skill formation” (Cunha et al., 2006, p. 705), that is, a family of interrelated inputs for improving learning, we ask how impact varies across context and consider how impact can be maximized net of costs. We first consider the mechanisms by which tutoring exerts impact and then consider factors most likely to shape impact. Figure S1 (available online) depicts our framework.

Tutoring may improve learning through additional instruction time when it does not substitute for classroom time spent on a given subject. Another potential channel is the customization of learning, that is, “teaching at the right level,” for example, through targeting a student's “zone of proximal development” (Vygotsky, 1978) and “scaffolding” (Bruner, 1983). When students in a classroom span a wide range of skill levels, teachers struggle to address the needs of all at once. The productivity of classroom time may thus decline as skill variation increases (Banerjee et al., 2016). Following this logic, tutoring programs may also yield positive externalities for students who do not receive the tutoring (Berlinski et al., 2022; Schwartz, 2005). Tutoring interventions may also embody a distinct pedagogical moment from classroom education. With fewer distractions, students may spend more time on task and approach content with more focus than in a classroom setting (Gest & Gest, 2005). Another potentially important element is the human connection in the tutor-student relationship (Juel, 1996; Neitzel et al., 2022, p. 174).

Given the above set of potential mechanisms, what elements of tutoring programs are most likely to shape impact? One likely factor is tutor skill. We expect that more highly educated, trained, and experienced tutors will have stronger skills. Tutor type may thus moderate tutoring impact. Training to conduct a particular intervention and other fine-grained elements of skill may represent another dimension of skill, but we focus on tutor type in the quantitative analyses because the categorization is clearer within the literature. Four broad categories of tutor type emerged inductively from our review: teachers, paraprofessionals, nonprofessionals, and parents. In teacher tutoring interventions, certified classroom teachers serve as tutors. The most prominent of these is Reading Recovery,¹ first piloted in New Zealand in 1979. The program launched in the United States (1984–1985) in the Columbus Public Schools before spreading more widely. Training for Reading Recovery tutors is extensive: Already trained and certified teachers undergo a year-long course and ongoing development activities (Sirinides et al., 2018).

Moving to the next category, paraprofessional tutoring interventions employ educators who are not certified teachers, such as school staff members, education students, and professional development fellows. For example, Number Rockets² employs school staff. AmeriCorps,³ a U.S.-government-funded fellowship program, provides tutors for other programs, from the elementary Minnesota Reading Corps (Markovitz et al., 2014) and Minnesota AmeriCorps math tutoring (Parker et al., 2019) to Saga Education secondary math tutoring in Chicago, New York City, and Washington, D.C.⁴ Fellows typically remain in programs for 1 or 2 years, receiving between several days’ and a few weeks’ training and close supervision. Although paraprofessionals generally receive less training than teachers, they may be at least as skilled as teachers at tutoring given that tutoring may depend on a unique skill set (Guryan et al., 2023).

Nonprofessional or volunteer tutoring interventions deploy tutors who are not professionally engaged in education, such as community residents and retired adults. For instance, in Reading Partners,⁵ unpaid community volunteers receive only about an hour of training and act as tutors, supervised by AmeriCorps fellows. Meanwhile, the AARP Foundation's Experience Corps matches schools with adults ages 50+ who tutor children on reading. Finally, parent tutoring interventions support caretakers in tutoring their own children.

One family of interventions that we elected not to include within the meta-analysis is that of peer or cross-age tutoring, that is, programs where the “tutors” are schoolmates of the tutees. We follow previous reviews in categorizing these as “cooperative learning” (Dietrichson et al., 2017, p. 254), which constitutes a distinct policy domain (McMaster et al., 2005).

Besides tutors’ skills, the effectiveness of tutoring programs may depend substantially on the curriculum. Programs cover different subjects (all studies in our sample covered literacy or math) and grade levels. Pedagogical approach and content also vary. For instance, some literacy programs focus on phonics and others on comprehension. Programs also vary in the level of structure the tutor follows. Programs like Reading Recovery and Number Rockets consist of structured lessons with detailed directives to the tutors. Each Reading Recovery lesson begins with “rereading familiar books” aloud, followed by targeted letter/word recognition activities, story composition, and reading a new book (Sirinides et al., 2018, p. 317). Number Rockets consists of about 45 scripted lessons delivered over the course of 4 months (Gersten et al., 2015). On the other end of the spectrum, interventions like the Northern Ireland nonprofessional elementary reading program Time to Read provide significant leeway to tutors (S. Miller & Connolly, 2013). Although subject and grade level are included within our quantitative meta-analyses, pedagogical approach and level of structure were not possible to meaningfully code.

Another potentially important set of factors is delivery mode. First, tutor-student ratio can vary from 1:1 up to 1:6 (no literature that we came across referred to teaching groups of seven or more as tutoring). Interventions with more students instructed by a single tutor at the same time should incur lower costs and may reduce stigma and activate social learning. However, more students per group may also reduce impact by dividing the tutor's attention.

In addition to the number of students participating in each session, whether tutoring occurs during or outside of school hours may shape impact. In our study sample, variation in the time and location of tutoring occurs within paraprofessional and nonprofessional tutoring programs. Number Rockets, Saga Education, and Time to Read all take place during school hours. Reading Partners may take place either during or after school. Where tutoring occurs may also matter. The home environment may be more relaxed but also more distracting (Villiger et al., 2019, p. 56). Important timing distinctions may exist even among programs held during school hours, particularly whether tutoring sessions replace classroom time on the same topic. Ideally, tutoring sessions would occupy time slots with the lowest opportunity costs, but this may be difficult to discern and coordinate. Both Reading Recovery and Number Rockets ask that schools schedule tutoring sessions to avoid conflicts with the respective subject of tutoring.

Tutoring programs vary widely in frequency, length, and overall duration. More days per week may increase impact until tutoring crowds out other learning inputs. Net of opportunity costs, one might expect longer sessions to yield stronger impacts until attention span or retention becomes an issue. Reading Recovery calls for daily 30-minute sessions, whereas Number Rockets calls for 40-minute sessions three or more times per week. Saga Education consists of daily 55-minute sessions. Program durations vary from several weeks to 2 school years, although most programs in our sample lasted between 10 weeks and 1 school year. In some interventions, students who improve more end participation earlier. Reading Recovery lasts 12 to 20 weeks, depending on the reading improvement rate. Given potential counterweighting between more instruction time and potential drawbacks like opportunity costs, attention span, and stigmatization (Marston, 1996), optimal tutoring dosage likely varies with context.

Method

To synthesize findings from the RCT tutoring research, we draw on recent advances in meta-analysis methodology (Pigott & Polanin, 2019). The “study”—defined as the enactment of a research design with a particular sample—constitutes the main level of analysis. We collected all eligible articles and aggregated them by study. Most studies reported multiple estimates (e.g., different measures, treatment arms, or subsamples). A study was included if it contained one or more estimates meeting the following criteria:

compares tutored to nontutored groups (excludes estimates where tutoring is bundled with other interventions or different tutoring models are compared to one another);

intended for students in prekindergarten through Grade 12;

RCT study design;

academic learning outcomes from independently created tests; researcher-designed measures were excluded because they may inflate effect sizes (Slavin & Madden, 2011);

published between 1980 and our final searches in February 2020 (although we used updated published versions if released between February 2020 and April 2023);

data required to calculate ES included in the article;

outcomes measured 3 months or less following intervention.
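The inclusion criteria above can be read as a predicate applied to each estimate, with a study entering the sample when at least one of its estimates passes. The following is a minimal sketch of that logic, not the authors' screening procedure: the class, the field names, and the grade coding (-1 for prekindergarten, 0 for kindergarten) are hypothetical, and the allowance for updated published versions released through April 2023 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    # All field names are hypothetical labels for the criteria listed above.
    tutored_vs_nontutored: bool       # not bundled, not tutoring-vs-tutoring
    min_grade: int                    # assumed coding: -1 = preK, 0 = K
    max_grade: int
    is_rct: bool
    independent_test: bool            # outcome from an independently created test
    pub_year: int
    pub_month: int
    has_es_data: bool                 # article reports data needed to compute an ES
    months_after_intervention: float  # time between program end and measurement


def is_eligible(e: Estimate) -> bool:
    """Apply every criterion; an estimate must pass all of them."""
    in_grade_range = e.min_grade >= -1 and e.max_grade <= 12
    in_window = (1980, 1) <= (e.pub_year, e.pub_month) <= (2020, 2)
    return (e.tutored_vs_nontutored and in_grade_range and e.is_rct
            and e.independent_test and in_window and e.has_es_data
            and e.months_after_intervention <= 3)


def study_included(estimates: list[Estimate]) -> bool:
    """A study is included if one or more of its estimates is eligible."""
    return any(is_eligible(e) for e in estimates)
```

Keeping the check at the estimate level mirrors the text: a study with one delayed follow-up measure is still included as long as another of its estimates meets all criteria.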

To identify eligible studies, we searched academic (Academic Search Complete, APA PsycInfo, Child Development & Adolescent Studies, EconLit, Education Abstracts, ERIC, JSTOR, SCOPUS, Web of Science), working paper (J-PAL, NBER, SSRN), university thesis/dissertation (Proquest Dissertations), and evaluation report databases (American Institutes for Research, Mathematica, MDRC, and NORC). We searched each database using the terms “tutor* & random*” (with asterisks indicating wildcard) or the closest equivalent given the specific setup of each database. We also conducted backward/forward bibliographic searching for each included article and several meta-analyses. Screening and coding were conducted by the first author with spot checks from research staff. All ambiguities were discussed with the full research team. Figure S2 (available online) shows the number of articles identified within each source.

Following recent consensus, we calculated Hedges’ g for the meta-analysis rather than relying on author-reported ESs (Pigott & Polanin, 2019). We used adjusted means and unadjusted standard deviations where reported and otherwise unadjusted means (Dietrichson et al., 2017, p. 255). To analyze the resulting dataset, we calculated pooled ESs and ran metaregressions using the user-written Stata program robumeta, which employs random-effects models with inverse variance weights and robust variance estimation (RVE) to avoid overweighting studies with more estimates (Tanner-Smith & Tipton, 2014). Our primary estimates consist of pooled ESs calculated for the full study sample and for relevant subsamples. To explore heterogeneity, we present 95% prediction intervals, which indicate “how widely the effects vary across populations” (Borenstein et al., 2017, p. 9). Additionally, we report Q and τ² values to show how the dispersion of observed and true effects, respectively, changes across estimates (Borenstein et al., 2017, pp. 11–12). We further add potential moderators as controls in multivariate metaregressions to explore associations at a greater level of nuance. Moderators are tested in clusters arranged to maximize insights while allowing sufficient statistical power. Although reliable power calculation approaches using RVE have yet to evolve, we follow the guideline of omitting significance tests with fewer than four degrees of freedom (Tanner-Smith & Tipton, 2014). All variables are coded as dummies, with the reference category encompassing estimates lacking the characteristic in question. Analysis of the quantitative results is accompanied by narrative analysis.
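To make the quantities in this paragraph concrete, the sketch below computes Hedges' g for a single treatment-control contrast and then pools a set of effects. It is an illustration under stated assumptions, not the procedure used in the article: pooling here uses the simpler DerSimonian-Laird random-effects estimator with one independent estimate per study, whereas the article relies on robumeta with RVE to handle multiple dependent estimates. The function names are hypothetical, and the prediction interval uses a caller-supplied critical value (1.96, a normal approximation, by default) rather than a t quantile.

```python
import math


def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Return (g, var_g) from group means, SDs, and sample sizes."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


def pool_random_effects(effects, variances, crit=1.96):
    """DerSimonian-Laird pooling (assumed stand-in for RVE).

    Returns the pooled ES, its SE, Q, tau^2, and an approximate
    prediction interval pooled +/- crit * sqrt(tau^2 + SE^2).
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two estimates to estimate tau^2")
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    half = crit * math.sqrt(tau2 + se**2)
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2,
            "pi": (pooled - half, pooled + half)}
```

The contrast between the standard error and the prediction interval is the point of the sketch: the former narrows as estimates accumulate, while the latter stays wide whenever τ² is large, which is why a highly significant pooled ES can coexist with a prediction interval that crosses zero.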

We took several measures to guard against influence from bias within studies and bias relating to the selection of studies (e.g., publication bias and small-study effects). With regard to bias within studies, limiting the sample to RCTs eliminated many quality issues usually faced in meta-analyses. Nonetheless, we coded studies for “risk of bias” (Pigott & Polanin, 2019). Our criteria are inspired by the Cochrane framework (Higgins et al., 2011), but we focus on three dimensions: the extent to which studies systematically reported information on (a) the intervention, (b) study design, and (c) relevant statistics. We created a bias risk index ranking studies from 1 to 3 on each of the aforementioned dimensions. Summing these scores yielded a 9-point index, with 1 representing the highest risk of bias and 9 representing the lowest. We then classified studies earning 8 or 9 points as “low bias risk” and the rest as “high bias risk.” Scores for each dimension are included in our dataset.

Publication bias—“one of the biggest threats to the validity of meta-analytic results”—occurs when “studies with no novel or positive results take more time to publish or are not published at all” and are thus “less likely to be included in the meta-analysis” (Fernández-Castilla et al., 2021, p. 125). Other forms of selection bias could also affect meta-analysis results, for example, if results affect the likelihood of an article appearing in databases or if authors selectively report results. A range of selection bias tests have been widely used over the past few decades, and the last several years have seen a proliferation of simulations optimizing these tests in a range of contexts (Marks-Anglin & Chen, 2020). Two main challenges remain. First, it is difficult to distinguish publication bias from situations where smaller scale programs actually generate larger impact, as when implementation is stronger or implementers more skilled. Such dynamics are important to consider when interpreting external validity but do not constitute bias. Second, formal tests in the context of RVE (Skeen et al., 2019, p. 4) and between-studies heterogeneity (Marks-Anglin & Chen, 2020, p. 729) remain poorly understood, with fluctuations in error rates and statistical power across study designs (Fernández-Castilla et al., 2021). For heterogeneous analyses like ours, it is “unrealistic to expect to disentangle the effects of publication bias and heterogeneity reliably” (Peters et al., 2010, p. 575).

We thus take three main approaches in generating suggestive evidence on selection bias. First, we compare ESs of published articles against those of unpublished articles at several sample size levels to parse out selection bias specific to the publication process. Second, we use a modified version of the Egger's regression test to statistically assess selection bias. Traditional Egger's tests regress ES on study-level standard error, but ESs are mechanically dependent on standard errors. Although traditional Egger's tests inflate Type 1 errors, replacing the standard error with a different variance indicator can remove the problem (Pustejovsky & Rodgers, 2019, p. 59). Therefore, we replace the standard error with the term $W_{i}$ as defined by the equation:

$$V_{i}^{d} = W_{i} + \frac{d^{2}}{2 f_{i}},$$

where $V_{i}^{d}$ is the variance of the ES, $d$ is the ES, and $f_{i}$ refers to degrees of freedom. Regressing ES on $W_{i}$ using the same RVE models as for our main estimates, we then observe the significance of the $W_{i}$ term and especially changes in the regression constant, which represents pooled ES (Rodgers & Pustejovsky, 2021). Finally, we present funnel plots in which ES is plotted against $W_{i}$ and asymmetries represent potential evidence of selection bias.
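The modified test can be illustrated in two steps: recover the term W_i by subtracting d²/(2f_i) from each ES variance, then regress ES on W_i. The sketch below is a simplified illustration rather than the article's estimation: it fits an ordinary inverse-variance weighted regression and returns point estimates only, whereas the article uses RVE models and examines the significance of the W_i term. All function names are hypothetical.

```python
def modified_precision_term(v, d, f):
    """W_i = V_i^d - d^2 / (2 f_i): the part of the ES variance
    that does not mechanically depend on the ES itself."""
    return v - d**2 / (2 * f)


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def egger_modified(effects, variances, dfs):
    """Regress ES on W_i with inverse-variance weights (assumed
    simplification of the RVE model). Returns (intercept, slope, W terms)."""
    w_terms = [modified_precision_term(v, d, f)
               for v, d, f in zip(variances, effects, dfs)]
    weights = [1 / v for v in variances]
    intercept, slope = weighted_linear_fit(w_terms, effects, weights)
    return intercept, slope, w_terms
```

In this setup a positive slope means that less precise estimates tend to report larger effects, and the intercept plays the role of the pooled ES after adjusting for that pattern. As the text notes, such a pattern may reflect genuine small-program advantages rather than bias.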

Results

The search and screening process yielded 89 studies. Table S1 (available online) lists all included studies along with relevant details. Table 1 shows the breakdown of the studies over intervention and study characteristics, disaggregated by subject and tutor type. Most categories are not mutually exclusive at the study level.

Table 1

Study Characteristics

		Subject			Tutor Type
	All	Literacy	Math	Teacher	Paraprofessional	Nonprofessional	Parent
	(1)	(2)	(3)	(4)	(5)	(6)	(7)
All	89	72	22	19	43	20	9
Subject
Literacy	72	72	5	17	28	20	9
Math	22	5	22	3	16	2	1
Grade
Preschool/kindergarten	17	11	6	1	10	4	2
Grade 1	42	37	8	10	19	11	2
Grades 2–5	45	40	10	10	17	14	6
Grades 6–11	10	8	5	1	5	3	1
Tutor:student ratio
1:1	60	54	10	13	20	19	9
1:2	15	9	6	2	13	1	0
1:3+	23	15	9	5	17	1	0
Setting
During school	75	59	18	19	41	16	1
After school	15	14	4	0	2	5	8
ELL	10	7	3	3	6	1	0
Foster	3	3	3	0	1	1	1
Bias index
Low risk	80	63	22	19	42	15	5
High risk	9	9	0	0	1	5	4
Student sample size
<51	18	17	2	3	5	8	3
51–100	21	20	4	5	8	5	3
101–200	23	18	5	6	15	0	2
201–400	12	10	3	3	7	2	1
>400	15	7	8	2	8	5	0
Publication status
Published	70	55	18	15	36	15	5
Unpublished	19	17	4	4	7	5	4

Note. Cells indicate the number of studies that have at least one estimate in our meta-analysis matching the categories defined by rows and columns. Most categories are not mutually exclusive. ELL = English language learners.

Literacy tutoring is far more common within our sample (81%) than math tutoring (25%). Paraprofessional tutoring accounts for the largest share of tutor type with nearly half of the study sample, followed by nonprofessional, teacher, and then parent tutoring. Almost all math interventions utilized paraprofessional tutors. Studies cluster overwhelmingly within elementary school, with fewer than 20% of interventions involving students in sixth grade and above. Almost half of all studies include first grade. During-school tutoring (84%) is more common in our sample than after-school tutoring (17%). Most variation comes from paraprofessional and nonprofessional tutoring because all teacher tutoring interventions in our sample occurred during school and all but one parent tutoring intervention occurred outside of school hours. Nearly 70% of studies include one-to-one tutoring, whereas about a quarter include treatment arms with three or more students per tutor. A relatively small but nonnegligible handful of studies looks specifically at effects of tutoring interventions for English language learners (ELL) and foster students. We coded studies into these categories if they were specifically discussed as part of the program model or if at least half of the sample was listed as falling into the category.

We next address our main research questions, the extent of tutoring impact and variation in impact over program characteristics and contexts. Tables 2 and 3 show pooled ESs of study subsamples broken down by subject, grade level, and tutor-student ratio and additionally separate each into the four tutor type categories. Tables S2A and S2B (available online) show the same categories separated out by grade level instead of tutor type. ESs and standard errors given below for specific studies are averages of the ESs presented in the study, and these are included for all studies in Table S1 (available online).

Table 2

Pooled Effect Sizes: Program Characteristics by Tutor Type

	Coefficient	(SE)	n	[k]	Prediction Interval	τ²	Q
Panel A: All
All	0.288***	(0.029)	89	[553]	(–0.085, 0.662)	0.050	361.376
Literacy	0.290***	(0.036)	74	[494]	(–0.155, 0.735)	0.070	300.405
Math	0.268***	(0.040)	21	[59]	(–0.028, 0.564)	0.028	71.332
Preschool/kindergarten	0.412***	(0.059)	17	[91]	(–0.070, 0.894)	0.072	63.722
Grade 1	0.376***	(0.040)	42	[276]	(0.001, 0.752)	0.048	150.670
Grades 2–5	0.196***	(0.045)	45	[257]	(–0.208, 0.601)	0.056	142.592
Grades 6–11	0.128***	(0.040)	10	[36]	(–0.076, 0.332)	0.010	15.260
1:1 ratio	0.319***	(0.042)	60	[366]	(–0.122, 0.760)	0.068	267.861
1:2 ratio	0.260***	(0.059)	15	[64]	(–0.056, 0.576)	0.028	46.247
1:3+ ratio	0.255***	(0.045)	23	[159]	(–0.184, 0.693)	0.063	78.567
Panel B: Teacher
All	0.385***	(0.071)	19	[137]	(–0.019, 0.788)	0.049	68.598
Literacy	0.382***	(0.078)	18	[128]	(–0.069, 0.833)	0.061	62.330
Math	0.308*	(0.158)	3	[9]	(–1.711, 2.327)	0.077	4.779
Preschool/kindergarten
Grade 1	0.484***	(0.082)	10	[65]	(0.095, 0.874)	0.037	34.273
Grades 2–5	0.352***	(0.136)	10	[74]	(–0.412, 1.116)	0.150	42.459
Grades 6–11
1:1 ratio	0.478***	(0.089)	13	[68]	(0.051, 0.904)	0.048	46.641
1:2 ratio	0.608	(0.383)	2	[9]		0.275	10.866
1:3+ ratio	0.189*	(0.105)	5	[62]	(–0.409, 0.788)	0.054	10.752
Panel C: Paraprofessional
All	0.304***	(0.031)	43	[261]	(–0.039, 0.647)	0.041	130.198
Literacy	0.323***	(0.047)	29	[214]	(–0.166, 0.812)	0.080	75.274
Math	0.287***	(0.039)	15	[47]	(–0.015, 0.589)	0.028	53.993
Preschool/kindergarten	0.363***	(0.055)	10	[49]	(–0.125, 0.852)	0.066	39.980
Grade 1	0.380***	(0.055)	19	[133]	(–0.141, 0.900)	0.086	68.945
Grades 2–5	0.249***	(0.041)	17	[111]	(–0.129, 0.626)	0.045	34.727
Grades 6–11	0.158***	(0.015)	5	[19]	(0.040, 0.275)	0.002	5.025
1:1 ratio	0.382***	(0.054)	20	[147]	(–0.088, 0.851)	0.070	50.338
1:2 ratio	0.215***	(0.034)	13	[53]	(0.018, 0.412)	0.011	23.152
1:3+ ratio	0.273***	(0.053)	17	[93]	(–0.209, 0.755)	0.073	66.147
Panel D: Nonprofessional
All	0.173***	(0.066)	21	[94]	(–0.190, 0.535)	0.040	61.257
Literacy	0.172***	(0.066)	21	[92]	(–0.189, 0.533)	0.039	60.835
Math
Preschool/kindergarten	0.377**	(0.166)	4	[25]	(–0.545, 1.298)	0.072	6.269
Grade 1	0.259***	(0.081)	11	[53]	(–0.101, 0.618)	0.032	19.338
Grades 2–5	0.079	(0.057)	15	[46]	(–0.228, 0.385)	0.027	40.006
Grades 6–11	0.119	(0.267)	3	[9]	(–2.947, 3.184)	0.164	7.393
1:1 ratio	0.168**	(0.069)	20	[90]	(–0.200, 0.537)	0.040	59.989
1:2 ratio
1:3+ ratio
Panel E: Parent
All	0.233*	(0.131)	9	[62]	(–0.411, 0.877)	0.098	21.175
Literacy	0.227*	(0.131)	9	[61]	(–0.415, 0.869)	0.098	21.081
Math
Preschool/kindergarten	0.398***	(0.036)		[15]		0.011	1.409
Grade 1	0.201**	(0.084)		[25]		0.000	1.055
Grades 2–5	0.184	(0.209)		[27]	(–0.796, 1.164)	0.168	16.181
Grades 6–11
1:1 ratio	0.233*	(0.131)	9	[62]	(–0.411, 0.877)	0.098	21.175
1:2 ratio
1:3+ ratio

Note. Coefficients represent pooled effect sizes with standard errors in parentheses, calculated using robust variance estimation; n and k, respectively, indicate number of studies and estimates. Prediction intervals are calculated at the 95% level. τ² and Q, respectively, show the distribution of true and observed effects.

*p < .10. **p < .05. ***p < .01.

Table 3

Pooled Effect Sizes: Program Characteristics by Tutor Type

	Coefficient	(SE)	n	[k]	Prediction Interval	τ²	Q
Panel A: All
All	0.288***	(0.029)	89	[553]	(–0.085, 0.662)	0.050	361.376
During school	0.307***	(0.033)	74	[455]	(–0.088, 0.701)	0.055	334.830
After school	0.206***	(0.050)	14	[95]	(0.035, 0.377)	0.007	15.145
1–2 days/week	0.102*	(0.060)	17	[65]	(–0.226, 0.429)	0.031	44.107
3 days/week	0.288***	(0.050)	30	[161]	(–0.099, 0.676)	0.050	86.379
4–5 days/week	0.348***	(0.040)	44	[327]	(–0.023, 0.719)	0.047	178.813
≤ 30 minutes/session	0.329***	(0.038)	58	[363]	(–0.105, 0.764)	0.066	235.484
>30 minutes/session	0.208***	(0.039)	31	[190]	(–0.061, 0.476)	0.024	81.318
Short duration	0.298***	(0.033)	62	[383]	(–0.075, 0.672)	0.049	213.498
Long duration	0.263***	(0.059)	27	[170]	(–0.111, 0.636)	0.044	105.049
Panel B: Teacher
All	0.385***	(0.071)	19	[137]	(–0.019, 0.788)	0.049	68.598
During school	0.385***	(0.071)	19	[137]	(–0.019, 0.788)	0.049	68.598
After school
1–2 days/week
3 days/week	0.540*	(0.327)	3	[10]	(–3.992, 5.072)	0.408	22.995
4–5 days/week	0.375***	(0.057)	15	[125]	(0.042, 0.708)	0.032	42.393
≤30 minutes/session	0.401***	(0.089)	14	[84]	(–0.038, 0.839)	0.053	54.695
>30 minutes/session	0.350***	(0.122)	5	[53]	(–0.327, 1.027)	0.068	11.905
Short duration	0.397***	(0.080)	15	[101]	(–0.019, 0.813)	0.049	58.735
Long duration	0.300	(0.198)	4	[36]	(–0.902, 1.502)	0.130	9.759
Panel C: Paraprofessional
All	0.304***	(0.031)	43	[261]	(–0.039, 0.647)	0.041	130.198
During school	0.314***	(0.032)	41	[250]	(–0.045, 0.673)	0.044	127.250
After school	0.148***	(0.023)	2	[11]		0.002	1.166
1–2 days/week	0.146***	(0.035)	4	[23]	(0.043, 0.248)	0.000	2.086
3 days/week	0.300***	(0.045)	21	[118]	(–0.032, 0.632)	0.035	44.551
4–5 days/week	0.308***	(0.066)	19	[120]	(–0.200, 0.816)	0.081	94.505
≤30 minutes/session	0.366***	(0.036)	27	[157]	(–0.067, 0.800)	0.063	75.471
>30 minutes/session	0.211***	(0.044)	16	[104]	(–0.061, 0.483)	0.022	40.471
Short duration	0.307***	(0.033)	36	[213]	(–0.078, 0.692)	0.051	93.665
Long duration	0.295***	(0.087)	7	[48]	(–0.135, 0.726)	0.038	29.518
Panel D: Nonprofessional
All	0.173***	(0.066)	21	[94]	(–0.190, 0.535)	0.040	61.257
During school	0.171*	(0.091)	15	[68]	(–0.282, 0.624)	0.057	53.742
After school	0.265***	(0.093)	5	[23]	(0.046, 0.483)	0.000	3.224
1–2 days/week	0.115	(0.087)	12	[38]	(–0.292, 0.523)	0.043	38.944
3 days/week	0.070***	(0.010)	4	[15]	(0.039, 0.100)	0.000	2.347
4–5 days/week	0.599***	(0.078)	6	[41]	(0.433, 0.765)	0.000	3.731
≤30 minutes/session	0.216*	(0.111)	11	[61]	(–0.336, 0.768)	0.078	39.373
>30 minutes/session	0.144**	(0.072)	10	[33]	(–0.169, 0.457)	0.023	21.070
Short duration	0.060	(0.072)	6	[21]	(–0.234, 0.354)	0.014	8.097
Long duration	0.231**	(0.091)	15	[73]	(–0.227, 0.688)	0.058	52.853
Panel E: Parent
All	0.233*	(0.131)	9	[62]	(–0.411, 0.877)	0.098	21.175
During school
After school	0.147	(0.100)	8	[61]	(–0.266, 0.561)	0.035	11.483
1–2 days/week
3 days/week	0.358***	(0.096)	2	[18]	(0.000, 0.000)	0.042	1.907
4–5 days/week	0.135	(0.151)	5	[41]	(–0.413, 0.683)	0.031	5.732
≤30 minutes/session	0.147	(0.100)	8	[61]	(–0.266, 0.561)	0.035	11.483
>30 minutes/session
Short duration	0.104	(0.112)	7	[48]	(–0.287, 0.496)	0.025	8.712
Long duration	0.835**	(0.416)	2	[14]	(0.000, 0.000)	0.346	5.095

*p < .10. **p < .05. ***p < .01.

The central lesson is that tutoring interventions exert meaningful effects on learning across a wide range of program characteristics. ESs are positive and significant for the vast majority of subgroups. The top left cell of Table 2 reveals that across all included estimates and studies, tutoring interventions show a statistically significant and substantively large ES of 0.288 SD (SE = 0.029). Table 2 also reveals that teacher tutoring programs yield the largest impacts, followed closely by paraprofessional tutoring programs, with nonprofessional and parent tutoring accounting for the lower end of the impact distribution. Despite generally large coefficients and high significance, the estimates show substantial heterogeneity. Prediction intervals for each category extend from well below 0 to nearly a full standard deviation in the full and teacher tutoring samples. Paraprofessional tutoring shows less heterogeneity than teacher tutoring with a prediction interval that barely crosses the zero line. Nonprofessional tutoring has even less heterogeneity but also a smaller coefficient.

The advantages of teacher tutoring over paraprofessional tutoring arise most strongly in first-grade interventions, although all occur in grades 1 through 5. Prominent within this category is Reading Recovery, which has been subjected to RCTs spanning 3 decades (Center et al., 1995; Pinnell et al., 1988, 1994; Schwartz, 2005; Sirinides et al., 2018). ESs are substantial, ranging from 0.434 SD (SE = 0.024) in Sirinides et al.’s (2018) large-scale U.S. evaluation to 0.975 SD (SE = 0.294) in Australia (Center et al., 1995). Effects from other teacher tutoring interventions tend to be high as well, ranging from 0.437 SD (SE = 0.158; Mathes et al., 2005) to 0.992 SD (SE = 0.169; Bøg et al., 2021). Two ELL programs showed ESs of 0.355 SD (SE = 0.178; Borman et al., 2020) and 0.491 SD (SE = 0.325; Vaughn et al., 2006), respectively. Evaluations of literacy programs for later grades also show promise, but results are more mixed (O’Connor et al., 2002; Vaughn et al., 2019; Wanzek & Roberts, 2012). Only four teacher tutoring studies focused on math. Smith et al. (2013) tested Math Recovery, a program inspired by Reading Recovery, with a sample of more than 700 students across two states and found an ES of 0.242 SD (SE = 0.072). L. S. Fuchs et al. (2008) observed large ESs in a third-grade math program, although with a small sample and a focus on story problems.

Paraprofessional and nonprofessional tutoring programs also exerted highly significant impact. The most common types of paraprofessional tutors in our sample were interventionists employed by the school (Clarke et al., 2016, 2017; Doabler et al., 2016; Gersten et al., 2015; Jenkins et al., 2004; K. L. Lane et al., 2007; Mattera et al., 2018; O’Connor et al., 2010; Vadasy et al., 2006a, 2006b, 2007; Vadasy & Sanders, 2008a, 2008b, 2008c, 2009, 2010, 2011), education students and trainees (Allor & McCathren, 2004; Case et al., 2014; Denton et al., 2004; L. S. Fuchs et al., 2005, 2013; Jung, 2015; H. B. Lane et al., 2009; Mayfield, 2000; Swanson et al., 2014; Young et al., 2018), postgraduate or civic service fellows (Guryan et al., 2023; Markovitz et al., 2014; Parker et al., 2019), and research team members (Bryant et al., 2011; D. Fuchs et al., 2019; L. S. Fuchs et al., 2009; Gilbert et al., 2013; Toste et al., 2017, 2019). Nonprofessional tutoring programs employed community volunteers (Al Otaiba et al., 2005; Benner, 2004; Jacob et al., 2016; Loenen, 1989; Mooney, 2004; Vadasy et al., 2000; Vadasy, Jenkins, Antil, & Wayne, 1997; Vadasy, Jenkins, Antil, Wayne, & O’Connor, 1997), businesspeople (Baker et al., 2000; S. Miller & Connolly, 2013; S. Miller et al., 2012), older adults (Fives et al., 2013; Lee et al., 2011; Rebok et al., 2004), and undergraduates (Lachney, 2002; Lindo et al., 2018). Finally, parent tutoring programs accounted for the fewest studies. The pooled ES is large but only marginally significant (p < .10). The largest sample parent tutoring study (Lam et al., 2013) evaluated a preschool parent reading tutoring program in Hong Kong with around 200 preschoolers and showed an ES of 0.373 SD (SE = 0.144).

Moving to tutor-student ratio, one-to-one tutoring shows the highest overall effect size, followed by one-to-two. All categories show substantial heterogeneity. Most nonprofessional and parent tutoring programs were one-to-one. For teacher and paraprofessional programs, one-to-one showed the largest ESs, with one-to-two and small-group interventions statistically similar.

The grade-level rows in Panel A of Table 2 reveal that at least up until middle school, ESs decline with grade level. PreK-kindergarten interventions have the highest overall ESs. All categories exhibit significant heterogeneity, but heterogeneity is highest for first grade. These differences in pooled ES could at least partially be explained by differences in the sensitivity of outcome tests used in different grades rather than real differences in impact.

Literacy and math program impacts are similar, with literacy showing greater heterogeneity. However, Figure S3 and Table S2A (available online) reveal that declining impact over grade levels occurs most strongly in literacy programs. The disaggregation of ESs by subject in Table 2 shows that the smaller pooled ESs for nonprofessional and parent tutoring are driven by reading programs because most math tutoring programs utilize paraprofessional tutors. There are too few math tutoring studies to compare the effects of different math tutor types.

For reading, Vadasy and collaborators evaluated elementary literacy interventions using tutors hired by districts. Students were pulled out of different classes depending on the school. Average ESs for kindergarten range from 0.359 SD (SE = 0.263; Vadasy & Sanders, 2008b) to 0.616 SD (SE = 0.239; Vadasy & Sanders, 2010), with Vadasy et al. (2006a) in the middle at 0.438 SD (SE = 0.246). Programs involving explicit instruction in first grade (Vadasy & Sanders, 2011) and grades 2 and 3 (Vadasy et al., 2006b, 2007) generated average ESs of 0.314 SD (SE = 0.207), 0.343 SD (SE = 0.427), and 0.406 SD (SE = 0.304), respectively. However, the program Quick Reads showed smaller ESs in grades 2 through 4 (Vadasy & Sanders, 2008a, 2008c, 2009) of 0.078 SD (SE = 0.157), 0.174 SD (SE = 0.184), and 0.185 SD (SE = 0.173), respectively.

Run by the AARP Foundation,⁶ Experience Corps (EC) is a nonprofessional tutoring program that uses “older adults” as reading tutors. At present, operations include around 2,000 tutors and 20,000 students across 23 cities. Lee et al. (2011) evaluated EC with a sample of nearly 900 students across 23 schools in Boston, New York City, and Port Arthur, Texas. Tutors received 15 to 32 hours of training. Different sites selected their own curricula. The program ran for a full academic year, with two to four 30- to 40-minute sessions per week. We calculated the overall average ES as 0.075 SD (SE = 0.067), although the authors reported greater impact among students who received at least 35 sessions (p. 110).

Turning to math, ROOTS constituted the most noteworthy program at the preK-kindergarten level. Here, the school district hired paraprofessional “instructional assistants” as tutors. ESs range from 0.101 SD (SE = 0.177; Clarke et al., 2016) to 0.427 SD (SE = 0.130; Doabler et al., 2016), with Clarke et al. (2017) in the middle at 0.247 SD (SE = 0.112), and are impressive given the lack of other impactful elementary math programs. Mattera et al. (2018) evaluated High 5s, an after-school small-group kindergarten math program using tutors from a nearby teaching college. The effect of High 5s was only 0.140 SD (SE = 0.078), but it was part of a sweeping curriculum change that generated no significant impact.

Moving to elementary math, the first-grade program Number Rockets was evaluated at a small scale (L. S. Fuchs et al., 2005), at a larger scale (L. S. Fuchs et al., 2013), and in a full-scale multistate impact evaluation with a sample of nearly 1,000 students in 76 schools across four urban districts (Gersten et al., 2015). These showed consistently strong ESs of 0.334 SD (SE = 0.179), 0.243 SD (SE = 0.100), and 0.337 SD (SE = 0.066), respectively. Parker et al. (2019) evaluated a grades 4 through 8 math intervention. The study included about 500 students in 13 of the more than 150 schools in which the program is available across Minnesota. Tutors were “community members” who had made a year-long commitment to tutoring as part of AmeriCorps. The tutors received 4 days of training and two monthly 2-hour follow-up sessions along with monthly coaching. Tutoring was given for 90 minutes per week across 2 or 3 days for one semester. The study found an ES of 0.173 SD (SE = 0.094) on a standardized math assessment, although the authors reported greater impact for students who attended an hour or more per week for at least 12 weeks.

Guryan et al. (2023) reported on the only major high school program in our sample, Saga Education, with more than 2,700 boys in Grades 9 and 10 across 12 Chicago public schools. Students were overwhelmingly Black or Hispanic (95%), struggling academically, and eligible for free or reduced-price lunches (90%). Tutors were recent college graduates who were not certified teachers but committed to tutoring for a year with a small stipend. Tutoring occurred in 55-minute sessions daily for a full school year, with one tutor and two students in each session. Control group students were assigned to alternative elective courses. The program yielded an impact of 0.168 SD (SE = 0.032) on standardized math scores, which is noteworthy given the rarity of strong impacts in high schools. Treatment-on-the-treated estimates were substantially higher.

We turn next to program delivery characteristics as shown in Table 3 and Table S2B (available online), starting with a comparison of during- versus after-school programs. The pooled ES for during-school programs is 0.307 SD (SE = 0.033), substantially larger than that for after-school tutoring, although the latter is also high in magnitude and significance. The impacts of after-school programs were also less heterogeneous, with a prediction interval entirely above 0. During-school versus after-school variation occurs entirely within paraprofessional and nonprofessional tutoring programs because all teacher programs are during school and all parent programs are after school. Table S2B (available online) indicates that pooled estimates for during-school interventions are higher than those for after-school interventions in all grade categories except for 6 through 11, for which we do not have enough degrees of freedom to interpret.

Counterintuitively, programs lasting over 20 weeks show a pooled ES smaller than programs under 20 weeks, and those under the median length per session (30 minutes) outperformed those above the median. On the other hand, Table 3 shows that ESs generally increase with the number of tutoring sessions per week. However, Table S2B (available online) shows that differences between 3 and 4 to 5 days per week are driven by preschool through grade 1 estimates, whereas grades 2 through 5 show stronger impacts for 3 days than for 4 to 5 days per week. There is little evidence of once-weekly tutoring sessions generating significant effects. In one noteworthy progression, S. Miller and Connolly (2013) found no significant effects from a weekly reading program for 8- and 9-year-olds in Northern Ireland with nonprofessional tutors recruited through a business network. However, S. Miller et al. (2012) found significant effects from the same program administered twice weekly. Ritter and Maynard's (2008) lack of significant findings may stem from the program's reliance on weekly tutoring sessions. Figure S4 (available online) plots ES by sessions per week. For all three dosage variables, heterogeneity varies positively with ES.

Next, we discuss a series of analyses that tested the sensitivity of the results to study characteristics, including potential bias and other research artifacts. Table 4 presents pooled ESs across groups defined by student and school sample size, publication year, publication type, and risk of bias. ESs have declined slightly over the past decade on average but are nearly identical regardless of bias risk. Impact remains broadly consistent for studies with sample sizes up to around 400. The pooled ES for studies with samples greater than 400 is 0.203 SD (SE = 0.045). After this threshold, ESs plateau. Large ESs within small-sample studies are associated primarily with literacy tutoring, whereas math program effects remain more consistent. The pattern for school sample size is similar but with less decline in ES at larger school sample sizes.

Table 4

Pooled Effect Sizes by Study Characteristics

	Effect Size	SE	n	[k]
All	0.288***	(0.029)	89	[553]
Student sample size
<51	0.441***	(0.092)	18	[100]
51–100	0.364***	(0.071)	21	[165]
101–200	0.337***	(0.057)	23	[169]
201–400	0.176**	(0.070)	12	[78]
>400	0.203***	(0.045)	15	[41]
School sample size
<11	0.334***	(0.050)	37	[264]
11–50	0.301***	(0.044)	39	[219]
>50	0.301***	(0.044)	39	[219]
Publication year
1985–1999	0.349**	(0.159)	10	[49]
2000–2009	0.354***	(0.050)	34	[270]
2010–2019	0.252***	(0.035)	45	[234]
Publication status
Published	0.278***	(0.031)	70	[458]
Professional report	0.279*	(0.144)	6	[18]
Dissertation	0.395***	(0.138)	10	[72]
Other unpublished	0.410**	(0.165)	3	[5]
Bias index
Low risk	0.283***	(0.029)	80	[488]
High risk	0.375**	(0.163)	9	[65]

Note. Coefficients represent pooled effect sizes with standard errors in parentheses, calculated using robust variance estimation; n and k, respectively, indicate number of studies and estimates.

*p < .10. **p < .05. ***p < .01.

We unpack the connection between student and school sample sizes on one hand and ESs on the other in Tables S3A through S3D (available online). Nonprofessional tutoring programs are almost entirely responsible for weighing down average ESs among larger samples. Teacher tutoring programs remain remarkably consistent across sample sizes. Paraprofessional programs show some decline with sample size, but the pooled ES even at the largest samples remains nearly identical to that of the full study sample. There is an insufficient range of parent tutoring evaluations to identify trends. Figure S5 (available online) shows results graphically via scatter plots with overlaid fractional polynomial regression lines for each tutor type. Tables S3B and S3C (available online) show that sample size trends are similar across the other program characteristics considered in our main ES estimates.

Table S4 (available online) presents multivariate regressions exploring the sensitivity of several of our main findings to the inclusion of variables relating to study characteristics (log sample size, publication status, and bias risk) and dosage (sessions per week). The other dosage variables—minutes per session and duration in weeks—are omitted because they did not reach significance or affect other variables in any specification. The specification in the first column includes only the study characteristics and dosage variables. Columns 2 and 4, respectively, include tutor type (with paraprofessional as the reference category) and curriculum variables (subject with literacy as the reference category and grade level with first grade as the reference category), and columns 3 and 5 add study characteristics and dosage. Column 6 includes all variables. Taken together, the multivariate regressions lend weight to the robustness of our results.

Finally, we turn to a discussion of publication bias. As shown in Table 4, published studies show smaller effect sizes than unpublished studies, making it unlikely that publication bias affected our overall results. To explore the possibility that publication bias is concentrated within smaller sample studies (Fernández-Castilla et al., 2021, p. 125), Table S3D (available online) breaks down ES by publication type across different sample sizes. Although the ES for published studies is higher than that for unpublished studies within a few sample ranges, the table does not show evidence of a systematic relationship between publication type, sample size, and ES with sufficient magnitude to influence our results. Turning to selection bias more generally, Tables S5A and S5B (available online) replicate the main results in Tables 2 and 3 using a modified Egger's test. The coefficient for precision is significant in several cells, but impact changes little. Figure S6 (available online) shows this graphically, with a mostly symmetrical funnel plot. Especially given the likelihood that these tests are also picking up treatment heterogeneity and small study effects unrelated to bias, they suggest that our overall results are unlikely to be meaningfully affected by selection bias.
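
For readers unfamiliar with Egger-type tests, the following is a minimal sketch of one common variant: an inverse-variance weighted regression of effect sizes on their standard errors, where a slope far from zero signals funnel plot asymmetry. This is an assumption for illustration only. The exact specification behind Tables S5A and S5B is not given in this section, and the function name is hypothetical.

```python
def egger_regression(effect_sizes, standard_errors):
    """Weighted least squares of ES on SE with weights 1 / SE^2.

    Returns (intercept, slope). The slope captures small-study asymmetry;
    the intercept is the ES projected for a study with SE = 0.
    Illustrative variant only, not the specification used in the article.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, standard_errors)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, effect_sizes)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, standard_errors))
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, standard_errors, effect_sizes)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# The three Number Rockets estimates quoted earlier, used only to show the call.
number_rockets_es = [0.334, 0.243, 0.337]
number_rockets_se = [0.179, 0.100, 0.066]
intercept, slope = egger_regression(number_rockets_es, number_rockets_se)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Three estimates are far too few to support any inference. A real application would use all estimates in the sample and, as the paragraph above notes, would still confound bias with treatment heterogeneity and other small study effects.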

Discussion

Among the most widely relied on and dynamic educational tools available for educators and policymakers today, tutoring has been promoted as an effective method for improving education. In this review, we found that tutoring programs yield consistently substantial positive impacts on learning, with an overall pooled ES of 0.288 SD (SE = 0.029). Impacts are strongest for programs that use teacher or paraprofessional tutors, occur in earlier grades, are held 3 or more days per week, and are conducted during school. In the remainder of this section, we review limitations faced by the study, position our results within the broader preK–12 evaluation meta-analysis literature, and discuss implications for policy and research.

Limitations

This review faced several limitations. First, as for all reviews, our findings are limited to programs that have been evaluated, in our case, through RCTs. Second, pedagogical characteristics of tutoring interventions remain mostly black-boxed within our analysis. Curriculum attributes were too subtle and multifaceted for us to reliably code, and we felt that an examination at that level of nuance would be better approached by an education psychology review. Third, in many cases, control group students may have received tutoring or other remedial activities, which would downwardly bias results. Fourth, because of resource limitations, screening and coding were conducted exclusively by one author, albeit with random spot checks from research staff. These limitations also precluded classification of excluded articles according to reasons for exclusion and a more detailed risk of bias measure.

Findings in Context

To contextualize our findings, we compare them to other meta-analytic estimates of preK–12 interventions. Following Kraft (2020), we used ES as a metric for comparison with the following general thresholds: “Less than 0.05 is small, 0.05 to less than 0.20 is medium, and 0.20 or greater is large . . . based on the distribution of 1,942 effect sizes from 747 RCTs evaluating education interventions with standardized test outcomes” (p. 247). Because ESs are likely to differ across contextual domains such as grade level, subject, and sample size, ESs are primarily useful for comparison among closely related potential policy alternatives, which we have attempted to facilitate in our disaggregated tables. As a broadly intuitive benchmark for considering the results, a 0.40 SD achievement gain is expected during the fifth grade, including school and all other education inputs (Bloom et al., 2008; Kraft, 2020, p. 247).
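
The thresholds quoted from Kraft (2020) can be expressed as a small helper, shown here as an illustrative sketch. The function names are hypothetical, and the second helper simply divides an ES by the 0.40 SD fifth-grade annual gain cited above, which is a rough heuristic and not a calculation reported in the article.

```python
def kraft_category(es):
    """Classify a positive effect size (in SD units) using Kraft's (2020)
    thresholds: < 0.05 small, 0.05 to < 0.20 medium, >= 0.20 large."""
    if es < 0.05:
        return "small"
    if es < 0.20:
        return "medium"
    return "large"


def share_of_fifth_grade_gain(es, annual_gain=0.40):
    """Express an ES as a fraction of the 0.40 SD annual fifth-grade gain."""
    return es / annual_gain


# Estimates quoted in this article: overall pooled ES, Saga Education,
# and Experience Corps.
for label, es in [("overall", 0.288), ("Saga", 0.168), ("EC", 0.075)]:
    print(label, kraft_category(es), round(share_of_fifth_grade_gain(es), 2))
```

By this scheme the overall pooled ES of 0.288 SD is large, whereas the Saga Education and Experience Corps estimates both fall in the medium range.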

Our overall pooled ES estimate of 0.288 SD approaches that of Ritter et al. (2009), the last meta-analysis of tutoring RCTs, which found an ES of 0.30 SD (SE = 0.061) among K–8 volunteer tutoring programs. When reducing our sample to paraprofessional and nonprofessional tutoring in grades K through 8 to more closely match Ritter et al.’s scope, we found 0.266 SD (SE = 0.033).
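
As a rough illustration of how close the two estimates are, the sketch below computes a z statistic for the difference between two pooled ESs. It treats the estimates as independent and normally distributed, which is an assumption: the two reviews may share some underlying studies, so the result is only indicative. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def compare_estimates(es_a, se_a, es_b, se_b):
    """Two-sided z test for the difference between two estimates.

    Assumes independence and normality; returns (z, p_value).
    """
    z = (es_a - es_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Ritter et al. (2009): 0.30 (SE = 0.061); matched subsample here: 0.266 (SE = 0.033).
z, p = compare_estimates(0.30, 0.061, 0.266, 0.033)
print(f"z={z:.2f}, p={p:.2f}")
```

Under these assumptions the difference of 0.034 SD is well within sampling error, consistent with the description of the two estimates as close.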

A series of recent reviews explored the impacts of alternative intervention types, including tutoring, on learning outcomes. Of these, Dietrichson et al.’s (2017) meta-analysis of program impacts for low-SES elementary and middle schoolers covers the range closest to ours, including both literacy and math at the K–8 grade levels. Their study yielded a pooled ES for tutoring programs of 0.36 SD. Meanwhile, they estimated the next highest ESs at 0.32 SD, 0.24 SD, and 0.22 SD, respectively, for “feedback and progress monitoring,” “small-group instruction,” and “cooperative learning.” In addition to showing stronger impacts, the tutoring research base is more robust: Although 36 studies are included in the tutoring estimate, the other three estimates are based on findings from only five, four, and ten studies, respectively. The remaining ten intervention components show ESs below 0.20 SD (Dietrichson et al., 2017, p. 268).

Turning to narrower subject and age coverage, Pellegrini et al. (2021), in their meta-analysis of 66 K–5 math programs, found a pooled ES of 0.20 SD from 22 tutoring studies—similar to our finding of 0.262 SD (SE = 0.043) for the subsample of studies focusing on K–5 math programs. Seven studies on professional development training for classroom management had a similar ES of 0.19 SD, but the six other program categories included in the review showed lower or nonsignificant ESs. Meanwhile, Neitzel et al. (2022) reviewed reading tutoring programs alongside multitiered whole-class/whole-school and whole-class approaches and technology-supported adaptive instruction. They found a pooled ES of 0.26 SD for elementary reading tutoring programs (p. 172), similar to our finding of 0.294 SD (SE = 0.038). Multitiered and whole-class approaches performed comparably with tutoring at 0.27 SD and 0.31 SD, respectively (pp. 160–161), whereas the technology interventions only yielded a nonsignificant 0.09 SD (p. 162; although for promising findings on technology interventions, see Escueta et al., 2020). Baye et al. (2019) reported similar results for secondary reading tutoring programs, finding a pooled ES of 0.24 SD for tutoring—nearly twice that of the program categories with the next largest impacts, “writing focused” and “personalization” programs, both with ESs of 0.13 SD (p. 142).

We next compare our findings on the relative impact of alternative tutoring program types to those from other recent reviews. Our finding that teacher and paraprofessional tutoring programs yield nearly identical impact matches results from other recent analyses. Dietrichson et al. (2017, p. 269) found a small positive but nonsignificant advantage of 0.06 SD for professional over nonprofessional tutors (the former category corresponds to our “teacher tutoring” category, and the latter includes the categories that we label “paraprofessional” and “nonprofessional”). Similarly, Pellegrini et al. (2021, p. 22) found similar impacts whether the tutoring was given by teachers (0.24 SD) or teaching assistants (0.18 SD) for K–5 math programs. Neitzel et al. (2022, p. 172) found similar ESs whether tutors were teachers or teaching assistants: 0.34 SD for teachers, 0.29 SD for teaching assistants, 0.36 SD for paid volunteers, and 0.04 SD for unpaid volunteers. The differences between teachers, on one hand, and teaching assistants and paid volunteers, on the other hand, were not statistically significant; the differences between teachers and unpaid volunteers were statistically significant. Neitzel et al. considered this to be their “most practically important finding” (p. 172). On the other hand, Slavin et al. (2011) found stronger impacts from teachers. Nonetheless, the evidence on balance is consistent with tutoring skills being distinct from the skills required for effective classroom teaching (Guryan et al., 2023).

Another key finding from our analysis was the decline in impact with more students per tutor. This finding is also in line with previous reviews. Neitzel et al. (2022, p. 170) found an effect size of 0.41 SD for one-to-one tutoring and 0.24 SD for small-group tutoring. Gersten et al. (2020) also estimated a larger effect size for one-to-one tutoring relative to small group, but they found it statistically nonsignificant. Dietrichson et al. (2017) also found a positive, nonsignificant coefficient for group relative to one-to-one tutoring, but with sufficient magnitude to signify likely importance. Baye et al. (2019, p. 142), for secondary school reading programs, found that small-group tutoring has a magnitude only half that of one-to-one tutoring (0.14 SD vs. 0.28 SD), and the impact is nonsignificant. On the other hand, Wanzek et al. (2016) found no advantage for one-to-one over small-group tutoring, and Pellegrini et al. (2021) estimated a higher ES for small-group than for one-to-one tutoring: 0.30 SD and 0.19 SD, respectively.

A fourth important finding from our review is that impact tends to decline across grade levels, with literacy programs seeing the sharpest decline. This is consistent with previous findings that “the critical period for language development occurs early in life, while the critical period for developing higher cognitive functions extends into adolescence” (Fryer, 2017, p. 104). More broadly, ESs for math programs in our sample are less variable than for literacy.

With regard to program dosage, we found that more tutoring sessions per week are associated with stronger impact, but longer sessions and overall program duration are not. Although our study is the first to our knowledge to examine the correlations of session frequency and length with ES, others have found, like us, that longer duration may be associated with weaker impact (Dietrichson et al., 2017, p. 274; Wanzek et al., 2006, 2013).

Implications for Policy

In recent years, tutoring has gained interest among policymakers seeking solutions for students facing learning struggles. Some states and school districts are developing their own tutoring programs, whereas others are expanding existing operations.⁷ Although a great deal remains to be learned to optimize tutoring, the present article presents results that robustly support several propositions. The overall impact of tutoring programs across a variety of settings is encouragingly high. The consistency of ESs above 0.20 suggests tutoring is one of the most effective strategies for generating meaningful improvement in academic achievement. The key policy challenge, then, becomes optimizing program quality while keeping costs low.

As discussed above, tutor category represents a central determinant of tutoring program costs. Our review suggests that paraprofessional tutoring is likely to offer the most cost-effective option of the four categories reviewed in the widest range of contexts, given strong impacts alongside low hourly tutor wage requirements. Although some nonprofessional tutoring programs have shown promising results, pools of potential tutors may be limited, and these programs typically allow less scope for training and dedicated commitment. The experimental parent tutoring research is still too thin for policy lessons. However, implementers generally have less control over parent tutoring than other types, and it may be best to approach parent engagement from a broader family support perspective to ensure complementarity with parents’ other roles. Teacher tutoring is generally impactful, but it may be prohibitively expensive (Neitzel et al., 2022, p. 156). Moving toward teacher-led small-group tutoring could be an alternative approach to reducing costs, but our analysis and other recent reviews showed a greater differential between one-to-one and small-group tutoring than between teacher and paraprofessional tutoring. Although it is certainly plausible that certified teachers may be most efficient in some contexts, it seems best to default to paraprofessionals unless skills specific to certified teachers are required.

In particular, paraprofessional school staff members and recent graduates in professional fellowship programs represent promising bodies of potential tutors. Tutoring and other in-school intervention activities may represent a viable and fulfilling career path for individuals who might not otherwise enter the education sector. Programs that employ paraprofessional school staff members as tutors may save on administrative costs given their integration into the school and may allow for stability as the programs develop. Relatedly, education-oriented civic programs are becoming increasingly common within the career trajectory of recent college graduates.

Peer, cross-age, and computer-based tutoring represent potential alternative models that show promise in some contexts but come with limitations that make them unlikely to substitute for paraprofessional tutoring. Dietrichson et al. (2017) found an average of 0.22 SD from peer tutoring, whereas Slavin et al. (2009) found 0.26 SD impacts for both peer and cross-age tutoring. These programs may reduce costs relative to paid tutoring and could generate positive spillover effects if tutoring benefits the tutor as well as the tutee. However, it may be difficult to ensure consistently high-quality tutoring from children, and the ethical necessity of ensuring benefit to the tutor and the tutee may present challenges. Computer-assisted learning (CAL) programs are thought to emulate elements of tutoring programs at potentially much lower costs, so much so that adaptive learning programs have come to be known as “intelligent tutoring systems.” The lack of human engagement may remove some of the potential benefits of tutoring, including associating positive human interaction with the educational content (Neitzel et al., 2022, p. 174). However, the conditions under which CAL can be used effectively in conjunction with human tutors or replace them remain to be explored in future research.

In terms of program delivery, our meta-analysis suggests that the extent to which program implementers are able to ensure that tutoring actually occurs at sufficiently high doses may outweigh subtleties in the content being taught. As with many social programs, ensuring fidelity to the original model remains a challenge for many tutoring programs that aim to scale. Although our data do not allow us to address this topic statistically, our review suggests that the relatively lower effects found within after-school and parent tutoring may arise largely from difficulties in ensuring that tutoring occurs as planned.

Future Directions for Research and Policy

Our review highlights numerous areas for future research, but several stand out as particularly critical. For one, there is a large scope for experimental research on high school tutoring. While ESs tend to be highest at earlier grade levels, the most relevant point of comparison for policymakers is typically the opportunity cost of a program relative to other intervention opportunities for that grade level. The Saga Education model represents an especially promising model for expansion given its low-cost success in a secondary setting. Similarly, experimental evaluations of tutoring programs in subject areas other than reading and math, for instance, science or social studies, could open a new area for tutoring policy research.

Another vital research area is unpacking opportunity costs and spillover effects from tutoring. The majority of tutoring experiments so far have been conducted at relatively small scales and focus primarily on pedagogical rather than logistical and general equilibrium issues, but these considerations become increasingly important as programs move toward scale-up. One key issue here is the opportunity cost faced by students when they invest substantial quantities of time in tutoring. For example, what are the implications of students being pulled out of classes on the same topic as their tutoring versus classes on different topics or other activities? In our sample of studies, 39 reported avoiding pulling students out of the same subjects, whereas only two reported intentionally pulling students out of the same subjects; 10 reported a mix, 15 were held after school, and 20 did not report which subject tutoring was substituted for. Programs that explicitly avoided pulling children out of the same subject had a pooled ES of 0.376 SD (SE = 0.040), whereas those with a mix and those that did not report this showed 0.197 SD (SE = 0.033) and 0.210 SD (SE = 0.076), respectively. The studies that explicitly replaced the same subject as the tutoring did not provide sufficient degrees of freedom for a reliable significance test, but the pooled ES was 0.405 SD. Furthermore, with the exception of a handful of studies in the “mixed” category, studies reported only what was called for by the program model rather than measures of what occurred in practice from field research or monitoring data.

Future research should pay closer attention to measuring specifically what alternative activities tutoring replaces and estimate implications for impact accordingly. Researchers should also attempt to gain a more detailed picture of the specific tutoring or other supplemental programs that control group students received because, otherwise, these may downwardly bias estimates of tutoring program impact. Relatedly, future research should measure outcomes for domains outside of the main topic of tutoring. If students are taken out of classes or recreational activities that are unrelated to the topic of tutoring, do scores in other areas go down? Or, conversely, are there synergies by which tutoring on some topics helps students to perform better on other topics? It is also worth testing for spillover effects on nontutored students, particularly given recent findings from Berlinski et al. (2022) that nontutored students in schools where a tutoring intervention took place indirectly benefited.

Next, researchers and practitioners must pay close attention to impacts on equity as well as average impact. For the most part, it seems likely that tutoring programs following the models evaluated in this review would be on net equity-increasing. The vast majority of programs examined in the tutoring academic and policy literature are implicitly or explicitly conceptualized as remedial and aim to narrow education gaps. However, some types of tutoring may be more equity-enhancing than others, depending on their effectiveness for particular groups of students. Although the specific mechanisms explaining such divergences may not be obvious a priori, they may still exert powerful effects. For instance, despite Fryer and Howard-Noveck’s (2020) findings of a modest ES for middle school reading tutoring, effects for Black students were substantially larger, yielding one of the study’s most noteworthy findings.

Equity considerations may underscore the importance of free tutoring programs more broadly. Parents of wealthy students, like education researchers and practitioners, have noticed the potential effectiveness of tutoring and have attempted to leverage it. Over the past few decades, private tutoring that households pay for has grown increasingly popular. Increasing the presence and effectiveness of public tutoring systems may thus be important for less advantaged students to keep pace in this environment.

Both equity and efficiency considerations further point toward the importance of identifying the populations of students who could most benefit from tutoring. Students who have fallen behind as a result of structural barriers rather than specific learning disorders may especially benefit from tutoring programs that can set them on self-sustaining pathways toward rapid learning. A relatively distinct literature has already emerged on tutoring for foster children (Hickey & Flynn, 2019). Future studies could also test interventions for other marginalized populations of children whose circumstances may have precluded sufficient preparation for regular school, including incarcerated adolescents (Wexler et al., 2014) and refugees (Naidoo, 2009). And although a growing number of studies investigate the impact of tutoring on learning outcomes for ELL students, this area of research has enormous room for growth as well.

Tutoring programs rank among the most flexible and potentially transformative learning program types available at the preK–12 levels. This review has synthesized and quantitatively analyzed experimental evidence on all programs for which such evidence is available. With ESs averaging 0.288 SD (SE = 0.029) and impacts consistently significant across a wide range of program and study characteristics, our findings demonstrate not only the power of tutoring but also its versatility. As educators grapple with long-standing inequities and newer challenges like the COVID-19 pandemic, there is little doubt that tutoring programs will constitute a central workhorse educational tool.

Supplemental Material

Supplemental material, sj-pdf-1-aer-10.3102_00028312231208687 for The Promise of Tutoring for PreK–12 Learning: A Systematic Review and Meta-Analysis of the Experimental Evidence by Andre Nickow, Philip Oreopoulos and Vincent Quan in American Educational Research Journal

Notes

Andre Nickow is a research manager at Northwestern University's Global Poverty Research Lab. As a sociologist with an interdisciplinary orientation, his research focuses on evaluation methodology, socioeconomic development, political economy, and policy analysis across the quantitative-qualitative divide. Topics of special interest include social protection, agrarian development, and social movements.

Philip Oreopoulos is a distinguished professor of economics of education at the University of Toronto, a research associate of the National Bureau of Economic Research, and co-chair in education at the Jameel Poverty Action Lab. His current work focuses on education policy, especially the effort to utilize technology to facilitate more personalized learning. He often examines this field by initiating and implementing large-scale field experiments with the goal of producing convincing evidence for public policy decisions.

Vincent Quan is the co-executive director of J-PAL North America, a regional office of the Abdul Latif Jameel Poverty Action Lab at the Massachusetts Institute of Technology. His research focuses on randomized evaluations to understand the impact of different social policies and programs. He has conducted reviews of randomized evaluations of different educational interventions on student outcomes, including tutoring and technology.

References

*Allor & McCathren (2004). The efficacy of an early literacy tutoring program implemented by college students. Learning Disabilities Research & Practice, 19(2), 116–129. https://doi.org/10.1111/j.1540-5826.2004.00095.x

*Al Otaiba, Schatschneider, & Silverman (2005). Tutor-assisted intensive learning strategies in kindergarten. Exceptionality, 13(4), 195–208. https://doi.org/10.1207/s15327035ex1304_2

*Baker, Gersten, & Keating (2000). When less may be more. Reading Research Quarterly, 35(4), 494–519.

*Barnes, M. A., Klein, Swank, Starkey, McCandliss, Flynn, Zucker, Huang, C. W., Fall, A. M., & Roberts (2016). Effects of tutorial interventions in mathematics and attention for low-performing preschool children. Journal of Research on Educational Effectiveness, 9(4), 577–606. https://doi.org/10.1080/19345747.2016.1191575

Banerjee, Banerji, Berry, Duflo, Kannan, Mukherji, & Walton (2016). Mainstreaming an effective intervention: Evidence from randomized evaluations of “Teaching at the Right Level” in India (NBER Working Paper No. 22746). National Bureau of Economic Research. https://doi.org/10.3386/w22746

Baye, Inns, Lake, & Slavin, R. E. (2019). A synthesis of quantitative research on reading programs for secondary students. Reading Research Quarterly, 54(2), 133–166. https://doi.org/10.1002/rrq.229

*Benner, G. J. (2004). An investigation of the effects of an intensive early literacy support program on the phonological processing skills of kindergarten children at-risk of emotional and behavioral disorders [Unpublished doctoral dissertation]. University of Nebraska.

Berlinski, Busso, & Giannola (2022). Helping struggling students and benefiting all: Peer effects in primary education (IFS Working Paper No. W22/02). IFS. https://doi.org/10.1920/wp.ifs.2022.222

*Blachman, B. A., Schatschneider, Fletcher, J. M., Francis, D. J., Clonan, S. M., Shaywitz, B. A., & Shaywitz, S. E. (2004). Effects of intensive reading remediation for second and third graders and a 1-year follow-up. Journal of Educational Psychology, 96(3), 444–461. https://doi.org/10.1037/0022-0663.96.3.444

Bloom, H. S., Hill, C. J., Black, A. R., & Lipsey, M. W. (2008). Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. Journal of Research on Educational Effectiveness, 1(4), 289–328. https://doi.org/10.1080/19345740802400072

*Bøg, Dietrichson, & Isaksson, A. A. (2021). A multi-sensory tutoring program for students at risk of reading difficulties. Journal of Educational Research, 114(3), 233–251. https://doi.org/10.1080/00220671.2021.1902254

Borenstein, Higgins, J. P., Hedges, L. V., & Rothstein, H. R. (2017). Basics of meta-analysis: I² is not an absolute measure of heterogeneity. Research Synthesis Methods, 8(1), 5–18. https://doi.org/10.1002/jrsm.1230

*Borman, G. D., Borman, T. H., Park, S. J., & Houghton (2020). A multisite randomized controlled trial of the effectiveness of Descubriendo la Lectura. American Educational Research Journal, 57(5), 1995–2020. https://doi.org/10.3102/0002831219890612

Bruner (1985). Child’s talk: Learning to use language. W.W. Norton.

*Bryant, D. P., Bryant, B. R., Roberts, Vaughn, Pfannenstiel, K. H., Porterfield, & Gersten (2011). Early numeracy intervention program for first-grade students with mathematics difficulties. Exceptional Children, 78(1), 7–23. https://doi.org/10.1177/001440291107800101

*Case, Speece, Silverman, Schatschneider, Montanaro, & Ritchey (2014). Immediate and long-term effects of tier 2 reading instruction for first-grade students with a high probability of reading failure. Journal of Research on Educational Effectiveness, 7(1), 28–53. https://doi.org/10.1080/19345747.2013.786771

*Center, Wheldall, Freeman, Outhred, & McNaught (1995). An evaluation of reading recovery. Reading Research Quarterly, 30(2), 240–263. https://doi.org/10.2307/748034

*Clarke, Doabler, C. T., Kosty, Kurtz Nelson, Smolkowski, Fien, & Turtura (2017). Testing the efficacy of a kindergarten mathematics intervention by small group size. AERA Open, 3(2). https://doi.org/10.1177/2332858417706899

*Clarke, Doabler, C. T., Smolkowski, Baker, S. K., Fien, & Strand Cary (2016). Examining the efficacy of a Tier 2 kindergarten mathematics intervention. Journal of Learning Disabilities, 49(2), 152–165. https://doi.org/10.1177/0022219414538514

*Cook, J. A. (2002). Every moment counts [Unpublished doctoral dissertation]. Arizona State University.

Cunha, Heckman, J. J., Lochner, & Masterov, D. V. (2006). Interpreting the evidence on life cycle skill formation. Handbook of the Economics of Education, 1, 697–812. https://doi.org/10.1016/S1574-0692(06)01012-9

*Denton, C. A., Anthony, J. L., Parker, & Hasbrouck, J. E. (2004). Effects of two tutoring programs on the English reading development of Spanish-English bilingual students. Elementary School Journal, 104(4), 289–305. https://doi.org/10.1086/499754

Dietrichson, Bøg, Filges, & Klint Jørgensen, A. M. (2017). Academic interventions for elementary and middle school students with low socioeconomic status. Review of Educational Research, 87(2), 243–282. https://doi.org/10.3102/0034654316687036

Dietrichson, Filges, Klokker, R. H., Viinholt, B. C., Bøg, & Jensen, U. H. (2020). Targeted school-based interventions for improving reading and mathematics for students with, or at risk of, academic difficulties in Grades 7–12. Campbell Systematic Reviews, 16(2), 1–52. https://doi.org/10.1002/cl2.1081

Dietrichson, Filges, Seerup, J. K., Klokker, R. H., Viinholt, B. C., Bøg, & Eiberg (2021). Targeted school-based interventions for improving reading and mathematics for students with or at risk of academic difficulties in Grades K–6. Campbell Systematic Reviews, 17(2), 1–78. https://doi.org/10.1002/cl2.1152

*Doabler, C. T., Clarke, Kosty, D. B., Kurtz-Nelson, Fien, Smolkowski, & Baker, S. K. (2016). Testing the efficacy of a Tier 2 mathematics intervention. Exceptional Children, 83(1), 92–110. https://doi.org/10.1177/0014402916660084

Escueta, Nickow, A. J., Oreopoulos, & Quan (2020). Upgrading education with technology: Insights from experimental research. Journal of Economic Literature, 58(4), 897–996. https://doi.org/10.1257/jel.20191507

Fernández-Castilla, Declercq, Jamshidi, Beretvas, S. N., Onghena, & Van den Noortgate (2021). Detecting selection bias in meta-analyses with multiple outcomes. Journal of Experimental Education, 89(1), 125–144. https://doi.org/10.1080/00220973.2019.1582470

*Fives, Kearns, Devaney, Canavan, Russell, Lyons, Eaton, & O’Brien (2013). A one-to-one programme for at-risk readers delivered by older adult volunteers. Review of Education, 1(3), 254–280. https://doi.org/10.1002/rev3.3016

Fryer, R. G., Jr. (2017). The production of human capital in developed countries: Evidence from 196 randomized field experiments. In A. V. Banerjee & Duflo (Eds.), Handbook of economic field experiments (pp. 95–322). North-Holland. https://doi.org/10.1016/bs.hefe.2016.08.006

Fryer, R. G., Jr., & Howard-Noveck (2020). High-dosage tutoring and reading achievement. Journal of Labor Economics, 38(2), 421–452. http://dx.doi.org/10.1086/705882

*Fuchs, Kearns, D. M., Fuchs, L. S., Elleman, A. M., Gilbert, J. K., Patton, Peng, & Compton, D. L. (2019). Using moderator analysis to identify the first-grade children who benefit more and less from a reading comprehension program. Exceptional Children, 85(2), 229–247. https://doi.org/10.1177/0014402918802801

*Fuchs, L. S., Compton, D. L., Fuchs, Paulsen, Bryant, J. D., & Hamlett, C. L. (2005). The prevention, identification, and cognitive determinants of math difficulty. Journal of Educational Psychology, 97(3), 493–513. https://doi.org/10.1037/0022-0663.97.3.493

*Fuchs, L. S., Geary, D. C., Compton, D. L., Fuchs, Schatschneider, Hamlett, C. L., DeSelms, Seethaler, P. M., Wilson, Craddock, C. F., Bryant, J. D., Luther, & Changas (2013). Effects of first-grade number knowledge tutoring with contrasting forms of practice. Journal of Educational Psychology, 105(1), 58–77. https://doi.org/10.1037/a0030127

*Fuchs, L. S., Powell, S. R., Seethaler, P. M., Cirino, P. T., Fletcher, J. M., Fuchs, Hamlett, C. L., & Zumeta, R. O. (2009). Remediating number combination and word problem deficits among students with mathematics difficulties: A randomized control trial. Journal of Educational Psychology, 101(3), 561–576. https://doi.org/10.1037/a0014701

*Fuchs, L. S., Powell, S. R., Seethaler, P. M., Cirino, P. T., Fletcher, J. M., Fuchs, & Hamlett, C. L. (2010). The effects of strategic counting instruction, with and without deliberate practice, on number combination skill among students with mathematics difficulties. Learning and Individual Differences, 20(2), 89–100. https://doi.org/10.1016/j.lindif.2009.09.003

*Fuchs, L. S., Seethaler, P. M., Powell, S. R., Fuchs, Hamlett, C. L., & Fletcher, J. M. (2008). Effects of preventative tutoring on the mathematical problem solving of third-grade students with math and reading difficulties. Exceptional Children, 74(2), 155–173. https://doi.org/10.1177/001440290807400202

Gersten, Haymond, Newman-Gonchar, Dimino, & Jayanthi (2020). Meta-analysis of the impact of reading interventions for students in the primary grades. Journal of Research on Educational Effectiveness, 13(2), 401–427. https://doi.org/10.1080/19345747.2019.1689591

*Gersten, Rolfhus, Clarke, Decker, L. E., Wilkins, & Dimino (2015). Intervention for first graders with limited number knowledge. American Educational Research Journal, 52(3), 516–546. https://doi.org/10.3102/0002831214565787

Gest, S. D., & Gest, J. M. (2005). Reading tutoring for students at academic and behavioral risk: Effects on time-on-task in the classroom. Education and Treatment of Children, 28(1), 25–47. https://www.jstor.org/stable/42899826

*Gilbert, J. K., Compton, D. L., Fuchs, L. S., Bouton, Barquero, L. A., & Cho (2013). Efficacy of a first-grade responsiveness-to-intervention prevention model for struggling readers. Reading Research Quarterly, 48(2), 135–154. https://doi.org/10.1002/rrq.45

*Goudey (2009). A parent involvement intervention with elementary school students [Unpublished graduate thesis]. University of Alberta. https://doi.org/10.7939/R3Q12G

*Guryan, Ludwig, Bhatt, M. P., Cook, P. J., Davis, J. M., Dodge, Farkas, Fryer, R. G., Jr., Mayer, Pollack, Steinberg, & Stoddard (2023). Not too late: Improving academic outcomes among adolescents. American Economic Review, 113(3), 738–765. https://doi.org/10.1257/aer.20210434

*Harper & Schmidt (2016). Effectiveness of a group-based academic tutoring program for children in foster care. Children and Youth Services Review, 67, 238–246. https://doi.org/10.1016/j.childyouth.2016.06.009

*Hickey, A. J., & Flynn, R. J. (2019). Effects of the TutorBright tutoring programme on the reading and mathematics skills of children in foster care. Oxford Review of Education, 45(4), 519–537. https://doi.org/10.1080/03054985.2019.1607724

Higgins, J. P. T., Altman, D. G., Gøtzsche, P. C., Jüni, Moher, Oxman, A. D., Savović, Schulz, K. F., Weeks, & Sterne, J. A. C. (2011). The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ, 343, Article d5928. https://doi.org/10.1136/bmj.d5928

*Jacob, Armstrong, Bowden, A. B., & Pan (2016). Leveraging volunteers. Journal of Research on Educational Effectiveness, 9(Suppl. 1), 67–92. https://doi.org/10.1080/19345747.2016.1138560

*Jenkins, J. R., Peyton, J. A., Sanders, E. A., & Vadasy, P. F. (2004). Effects of reading decodable texts in supplemental first-grade tutoring. Scientific Studies of Reading, 8(1), 53–85. https://doi.org/10.1207/s1532799xssr0801_4

Juel (1996). What makes literacy tutoring effective? Reading Research Quarterly, 31(3), 268–289. https://doi.org/10.1598/RRQ.31.3.3

*Jung, P. G. (2015). Effects of data-based instruction for students with intensive early writing needs: A randomized control trial [Unpublished doctoral dissertation]. University of Minnesota.

Kraft, M. A. (2020). Interpreting effect sizes of education interventions. Educational Researcher, 49(4), 241–253. https://doi.org/10.3102/0013189X20912798

*Lachney, R. P. (2002). Adult-mediated reading instruction for third through fifth grade children with reading difficulties [Unpublished doctoral dissertation]. Louisiana State University.

*Lam, S. F., Chow-Yeung, Wong, B. P., Lau, K. K., & Tse, S. I. (2013). Involving parents in paired reading with preschoolers. Contemporary Educational Psychology, 38(2), 126–135. https://doi.org/10.1016/j.cedpsych.2012.12.003

*Lane, H. B., Pullen, P. C., Hudson, R. F., & Konold, T. R. (2009). Identifying essential instructional components of literacy tutoring for struggling beginning readers. Literacy Research and Instruction, 48(4), 277–297. https://doi.org/10.1080/19388070902875173

*Lane, K. L., Fletcher, Carter, E. W., Dejud, & Delorenzo (2007). Paraprofessional-led phonological awareness training with youngsters at risk for reading and behavioral concerns. Remedial and Special Education, 28(5), 266–276. https://doi.org/10.1177/07419325070280050201

*Lee, Y. S., Morrow-Howell, Jonson-Reid, & McCrary (2011). The effect of the Experience Corps® program on student reading outcomes. Education and Urban Society, 44(1), 97–118. https://doi.org/10.1177/0013124510381262

*Lindo, E. J., Weiser, Cheatham, J. P., & Allor, J. H. (2018). Benefits of structured after-school literacy tutoring by university students for struggling elementary readers. Reading & Writing Quarterly, 34(2), 117–131. https://doi.org/10.1080/10573569.2017.1357156

*Loenen (1989). The effectiveness of volunteer reading help and the nature of the reading help provided in practice. British Educational Research Journal, 15(3), 297–316. https://doi.org/10.1080/0141192890150306

*Lorenzo, S. L. (1993). Effects of an experimental mentoring program on measures of performance of at-risk elementary students [Unpublished doctoral dissertation]. University of South Florida.

*Markovitz, C. E., Hernandez, M. W., Hedberg, E. C., & Silberglitt (2014). Impact evaluation of the Minnesota Reading Corps K–3 program. Corporation for National and Community Service.

Marks-Anglin & Chen (2020). A historical review of publication bias. Research Synthesis Methods, 11(6), 725–742. https://doi.org/10.1002/jrsm.1452

*Marquis (2013). Gender effects of a foster parent-delivered tutoring program on foster children’s academic skills and mental health [Unpublished graduate thesis]. University of Ottawa.

Marston (1996). A comparison of inclusion only, pull-out only, and combined service models for students with mild disabilities. Journal of Special Education, 30(2), 121–132.

*Mathes, P. G., Denton, C. A., Fletcher, J. M., Anthony, J. L., Francis, D. J., & Schatschneider (2005). The effects of theoretically different instruction and student characteristics on the skills of struggling readers. Reading Research Quarterly, 40(2), 148–182. https://doi.org/10.1598/RRQ.40.2.2

*Mattera, Jacob, & Morris (2018, March). Strengthening children’s math skills with enhanced instruction. MDRC. https://www.mdrc.org/publication/strengthening-children-s-math-skills-enhanced-instruction

*Mayfield, L. G. (2000). The effects of structured one-on-one tutoring in sight word recognition of first-grade students at-risk for reading failure [Unpublished doctoral dissertation]. Louisiana Tech University.

McMaster, K. L., Fuchs, L. S., & Compton, D. L. (2005). Responding to nonresponders: An experimental field trial of identification and intervention methods. Exceptional Children, 71(4), 445–463. https://doi.org/10.1177/001440290507100404

*Mears, P. R. (2007). The effects of the fast start program on the reading achievement of emergent and beginning readers [Unpublished doctoral dissertation]. George Fox University.

*Mehran & White, K. R. (1988). Parent tutoring as a supplement to compensatory education for first-grade children. Remedial and Special Education, 9(3), 35–41. https://doi.org/10.1177/074193258800900307

*Miller, B. V., & Kratochwill, T. R. (1996). An evaluation of the paired reading program using competency-based training. School Psychology International, 17(3), 269–291. https://doi.org/10.1177/0143034396173003

*Miller & Connolly (2013). A randomized controlled trial evaluation of time to read, a volunteer tutoring program for 8- to 9-year-olds. Educational Evaluation and Policy Analysis, 35(1), 23–37. https://www.jstor.org/stable/23356968

*Miller, Connolly, & Maguire, L. K. (2012). The effects of a volunteer mentoring programme on reading outcomes among eight- to nine-year-old children. Journal of Early Childhood Research, 10(2), 134–144. https://doi.org/10.1177/1476718X11407989

*Mooney, P. J. (2004). An investigation of the effects of a comprehensive reading intervention on the beginning reading skills of first graders at risk for emotional and behavioral disorders [Unpublished doctoral dissertation]. University of Nebraska–Lincoln.

Naidoo (2009). Developing social inclusion through after-school homework tutoring. British Journal of Sociology of Education, 30(3), 261–273. https://doi.org/10.1080/01425690902812547

Neitzel, A. J., Lake, Pellegrini, & Slavin, R. E. (2022). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly, 57(1), 149–179. https://doi.org/10.1002/rrq.379

*Nielson, B. B. (1992). Effects of parent and volunteer tutoring on reading achievement of third grade at-risk students [Unpublished doctoral dissertation]. Brigham Young University.

*O’Connor, R. E., Bell, K. M., Harty, K. R., Larkin, L. K., Sackor, S. M., & Zigmond (2002). Teaching reading to poor readers in the intermediate grades. Journal of Educational Psychology, 94(3), 474–485. https://doi.org/10.1037/0022-0663.94.3.474

*O’Connor, R. E., Bocian, Beebe-Frankenberger, & Linklater, D. L. (2010). Responsiveness of students with language difficulties to early intervention in reading. The Journal of Special Education, 43(4), 220–235. https://doi.org/10.1177/0022466908317789

*Parker, D. C., Nelson, P. M., Zaslofsky, A. F., Kanive, Foegen, Kaiser, & Heisted (2019). Evaluation of a math intervention program implemented with community support. Journal of Research on Educational Effectiveness, 12(3), 391–412. https://doi.org/10.1080/19345747.2019.1571653

Pellegrini, Lake, Neitzel, & Slavin, R. E. (2021). Effective programs in elementary mathematics: A meta-analysis. AERA Open, 7. https://doi.org/10.1177/2332858420986211

Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., Rushton, & Moreno, S. G. (2010). Assessing publication bias in meta-analyses in the presence of between-study heterogeneity. Journal of the Royal Statistical Society, 173(3), 575–591. https://doi.org/10.1111/j.1467-985X.2009.00629.x

Pigott, T. D., & Polanin, J. R. (2019). High-quality meta-analysis in a systematic review. Review of Educational Research, 90(1), 24–46. https://doi.org/10.3102/0034654319877153

*Pinnell, G. S., DeFord, D. E., & Lyons, C. A. (1988). Reading Recovery: Early intervention for at-risk first graders. Educational Research Service.

*Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer (1994). Comparing instructional models for the literacy education of high-risk first graders. Reading Research Quarterly, 29(1), 8–39. https://doi.org/10.2307/747736

*Powell-Smith, K. A., Stoner, Shinn, M. R., & Good, R. H., III. (2000). Parent tutoring in reading using literature and curriculum materials. School Psychology Review, 29(1), 5–27. https://doi.org/10.1080/02796015.2000.12085995

*Pullen, P. C., Lane, H. B., & Monaghan, M. C. (2004). Effects of a volunteer tutoring model on the early literacy development of struggling first grade students. Literacy Research and Instruction, 43(4), 21–40. https://doi.org/10.1080/19388070409558415

Pustejovsky, J. E., & Rodgers, M. A. (2019). Testing for funnel plot asymmetry of standardized mean differences. Research Synthesis Methods, 10(1), 57–71. https://doi.org/10.1002/jrsm.1332

*Rebok, G. W., Carlson, M. C., Glass, T. A., McGill, Hill, Wasik, B. A., Ialongo, Frick, K. D., Fried, L. P., & Rasmussen, M. D. (2004). Short-term impact of Experience Corps® participation on children and schools: Results from a pilot randomized trial. Journal of Urban Health, 81(1), 79–93. https://doi.org/10.1093/jurban/jth095

*Ritter & Maynard (2008). Using the right design to get the ‘wrong’ answer? Journal of Children’s Services, 3(2), 4–16. https://doi.org/10.1108/17466660200800008

Ritter, G. W., Barnett, J. H., Denny, G. S., & Albin, G. R. (2009). The effectiveness of volunteer tutoring programs for elementary and middle school students. Review of Educational Research, 79(1), 3–38. https://doi.org/10.3102/0034654308325690

Rodgers, M. A., & Pustejovsky, J. E. (2021). Evaluating meta-analytic methods to detect selective reporting in the presence of dependent effect sizes. Psychological Methods, 26(2), 141–160. https://doi.org/10.1037/met0000300

*Schwartz, R. M. (2005). Literacy learning of at-risk first-grade students in the Reading Recovery early intervention. Journal of Educational Psychology, 97(2), 257–267. https://doi.org/10.1037/0022-0663.97.2.257

*Sibieta (2016). REACH: Evaluation report and executive summary. EEF. https://educationendowmentfoudation.org.uk/public/files/Projects/Evaluation_ Reports/EEF_Project_Report_REACH.pdf

*Sirinides, Gray, & May (2018). The impacts of Reading Recovery at scale: Results from the 4-year i3 external evaluation. Educational Evaluation and Policy Analysis, 40(3), 316–335. https://doi.org/10.3102/0162373718764828

Skeen, Laurenzi, C. A., Gordon, S. L., du Toit, Tomlinson, Dua, Fleischmann, Kohl, Ross, D. A., Servili, Brand, Dowdall, Lund, van der Westhuizen, Carvajal-Aguirre, Eriksson de Carvalho, & Melendez-Torres, G. J. (2019). Adolescent mental health program components and behavior risk reduction: A meta-analysis. Pediatrics, 144(2). https://doi.org/10.1542/peds.2018-3488

Slavin, Lake, Chambers, Cheung, & Davis (2009). Effective reading programs for the elementary grades: A best-evidence synthesis. Review of Educational Research, 79(4), 1391–1466. https://doi.org/10.3102/0034654309341374

Slavin, R. E., Lake, Davis, & Madden, N. A. (2011). Effective programs for struggling readers. Educational Research Review, 6(1), 1–26. https://doi.org/10.1016/j.edurev.2010.07.002

Slavin & Madden, N. A. (2011). Measures inherent to treatments in program effectiveness reviews. Journal of Research on Educational Effectiveness, 4(4), 370–380. https://doi.org/10.1016/j.edurev.2010.07.002

*Smith, T. M., Cobb, Farran, D. C., Cordray, D. S., & Munter (2013). Evaluating Math Recovery: Assessing the causal impact of a diagnostic tutoring program on student achievement. American Educational Research Journal, 50(2), 397–428. https://doi.org/10.3102/0002831212469045

*Swanson, H. L., Moran, Lussier, & Fung (2014). The effect of explicit and direct generative strategy training and working memory on word problem-solving accuracy in children at risk for math difficulties. Learning Disability Quarterly, 37(2), 111–123. https://doi.org/10.1177/0731948713507264

Tanner-Smith, E. E., & Tipton (2014). Robust variance estimation with dependent effect sizes. Research Synthesis Methods, 5(1), 13–30. https://doi.org/10.1002/jrsm.1091

*Toste, J. R., Capin, Vaughn, Roberts, G. J., & Kearns, D. M. (2017). Multisyllabic word-reading instruction with and without motivational beliefs training for struggling readers in the upper elementary grades: A pilot investigation. Elementary School Journal, 117(4), 593–615. https://doi.org/10.1086/691684

*Toste, J. R., Capin, Williams, K. J., Cho, & Vaughn (2019). Replication of an experimental study investigating the efficacy of a multisyllabic word reading intervention with and without motivational beliefs training for struggling readers. Journal of Learning Disabilities, 52(1), 45–58. https://doi.org/10.1177/0022219418775114

104.

*Vadasy

P. F.

Jenkins

J. R.

Antil

L. R.

Wayne

S. K.

(1997a). Community-based early reading intervention for at-risk first graders. Learning Disabilities Research & Practice, 12(1), 29–39.

105.

*Vadasy

P. F.

Jenkins

J. R.

Antil

L. R.

Wayne

S. K.

O’Connor

R. E

. (1997b). The effectiveness of one-to-one tutoring by community tutors for at-risk beginning readers. Learning Disability Quarterly, 20(2), 126–139. https://doi.org/10.2307/1511219

106.

*Vadasy

P. F.

Jenkins

J. R.

Pool

(2000). Effects of tutoring in phonological and early reading skills on students at risk for reading disabilities. Journal of Learning Disabilities, 33(6), 579–590. https://doi.org/10.1177/002221940003300606

107.

*Vadasy

P. F.

Sanders

E. A.

(2008a). Benefits of repeated reading intervention for low-achieving fourth- and fifth-grade students. Remedial and Special Education, 29(4), 235–249. https://doi.org/10.1177/0741932507312013

108.

*Vadasy

P. F.

Sanders

E. A.

(2008b). Code-oriented instruction for kindergarten students at risk for reading difficulties: A replication and comparison of instructional groupings. Reading and Writing, 21(9), 929–963.

109.

*Vadasy

P. F.

Sanders

E. A.

(2008c). Repeated reading intervention: Outcomes and interactions with readers’ skills and classroom instruction. Journal of Educational Psychology, 100(2), 272–290. https://doi.org/10.1037/0022-0663.100.2.272

110.

*Vadasy

P. F.

Sanders

E. A.

(2009). Supplemental fluency intervention and determinants of reading outcomes. Scientific Studies of Reading, 13(5), 383–425. https://doi.org/10.1080/10888430903162894

111.

*Vadasy

P. F.

Sanders

E. A.

(2010). Efficacy of supplemental phonics-based instruction for low-skilled kindergarteners in the context of language minority status and classroom phonics instruction. Journal of Educational Psychology, 102(4), 786–803. https://doi.org/10.1037/a0019639

112.

*Vadasy

P. F.

Sanders

E. A.

(2011). Efficacy of supplemental phonics-based instruction for low-skilled first graders. Scientific Studies of Reading, 15(6), 471–497. https://doi.org/10.1080/10888438.2010.501091

113.

*Vadasy

P. F.

Sanders

E. A.

Peyton

J. A.

(2006a). Code-oriented instruction for kindergarten students at risk for reading difficulties. Journal of Educational Psychology, 98(3), 508–528. https://doi.org/10.1037/0022-0663.98.3.508

114.

*Vadasy

P. F.

Sanders

E. A.

Peyton

J. A.

(2006b). Paraeducator-supplemented instruction in structural analysis with text reading practice for second and third graders at risk for reading problems. Remedial and Special Education, 27(6), 365–378. https://doi.org/10.1177/07419325060270060601

115. *Vadasy, P. F., Sanders, E. A., & Tudor (2007). Effectiveness of paraeducator-supplemented individual instruction: Beyond basic decoding skills. Journal of Learning Disabilities, 40(6), 508–525. https://doi.org/10.1177/00222194070400060301

116. *Vaughn, Mathes, Linan-Thompson, Cirino, Carlson, Pollard-Durodola, Cardenas-Hagan, & Francis (2006). Effectiveness of an English intervention for first-grade English language learners at risk for reading problems. The Elementary School Journal, 107(2), 153–180. https://doi.org/10.1086/510653

117. *Vaughn, Roberts, G. J., Miciak, Taylor, & Fletcher, J. M. (2019). Efficacy of a word- and text-based intervention for students with significant reading difficulties. Journal of Learning Disabilities, 52(1), 31–44. https://doi.org/10.1177/0022219418775113

118. *Villiger, Hauri, Tettenborn, Hartmann, Näpflin, Hugener, & Niggli (2019). Effectiveness of an extracurricular program for struggling readers. Learning and Instruction, 60, 54–65. https://doi.org/10.1016/j.learninstruc.2018.11.004

119. Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.

120. *Wanzek & Roberts (2012). Reading interventions with varying instructional emphases for fourth graders with reading difficulties. Learning Disability Quarterly, 35(2), 90–101. https://doi.org/10.1177/0731948711434047

121. Wanzek, Stevens, E. A., Williams, K. J., Scammacca, Vaughn, & Sargent (2018). Current evidence on the effects of intensive early reading interventions. Journal of Learning Disabilities, 51(6), 612–624. https://doi.org/10.1177/0022219418775

122. Wanzek, Vaughn, Scammacca, Gatlin, Walker, M. A., & Capin (2016). Meta-analyses of the effects of tier 2 type reading interventions in grades K–3. Educational Psychology Review, 28, 551–576. https://doi.org/10.1207/s15327035ex1304_2

123. Wanzek, Vaughn, Scammacca, N. K., Metz, Murray, C. S., Roberts, & Danielson (2013). Extensive reading interventions for students with reading difficulties after grade 3. Review of Educational Research, 83(2), 163–195. https://doi.org/10.3102/0034654313477212

124. Wanzek, Vaughn, Wexler, Swanson, E. A., Edmonds, & Kim, A. H. (2006). A synthesis of spelling and reading interventions and their effects on the spelling outcomes of students with LD. Journal of Learning Disabilities, 39(6), 528–543. https://doi.org/10.1177/002221940603900605

125. Wexler, Pyle, Flower, Williams, J. L., & Cole (2014). A synthesis of academic interventions for incarcerated adolescents. Review of Educational Research, 84(1), 3–46. https://doi.org/10.3102/0034654313499410

126. *Wolff (2011). Effects of a randomised reading intervention study: An application of structural equation modelling. Dyslexia, 17(4), 295–311. https://doi.org/10.1002/dys.438

127. *Young, Pearce, Gomez, Christensen, Pletcher, & Fleming (2018). Read Two Impress and the Neurological Impress Method: Effects on elementary students’ reading fluency, comprehension, and attitude. Journal of Educational Research, 111(6), 657–665. https://doi.org/10.1080/00220671.2017.1393650

