Abstract
Gender disparities in STEM persist despite girls performing as well as boys academically, suggesting girls may benefit from role models who shape their perceptions of STEM. We examine whether female math tutors influence girls’ STEM interest, attendance, and performance. We randomly assigned 422 ninth-grade students taking Algebra 1 to a same-gender or opposite-gender tutor. Girls assigned to female tutors reported higher STEM interest (.73 SD) and were more likely to pass the course with a C– or better (3.9 percentage points) than those with male tutors. We found no impact on attendance. Effects were stronger for students working with tutors in-person rather than virtually. We provide the first experimental evidence that female tutors can boost girls’ STEM self-concept and academic outcomes.
Introduction
Women are less likely to enter and persist in STEM majors and careers than men (Ceci et al., 2009; Cheryan et al., 2017). This gender disparity cannot be explained away by ability—females and males perform equally well on STEM achievement assessments (Coley, 2001; Hyde et al., 2008; Keller et al., 2022). Rather, the disengagement of women may derive from their internalizing persistent, widely held misbeliefs about women's math abilities (Frome et al., 2006; Speer, 2023) or from assumptions about the appeal of math-based careers (Diekman et al., 2010). As Dasgupta and Stout (2014) noted, “Gender gaps in science and math performance have been closing, but gaps in STEM self-concept and aspirations remain large.”
Doubts about their STEM abilities and an absence of relatable role models in STEM fields can lead girls to opt out of pursuing higher-level STEM courses and careers (Heybach & Pickup, 2017; Msambwa et al., 2024; M. T. Wang et al., 2013). Early experiences and cultural contexts often contribute to this phenomenon (Halpern et al., 2007; Makarova et al., 2019). For instance, social cognitive theory (Bandura, 1977) suggests that girls’ STEM engagement is shaped by the interaction of personal beliefs, behaviors, and environmental influences, highlighting the potential for role models and supportive learning contexts to shape self-concept and motivation. School learning environments can play a pivotal part in either supporting or hindering girls’ success and interest in STEM subjects (Msambwa et al., 2024).
Girls’ interest in STEM often starts to diminish in high school (Sadler et al., 2012; Veldman et al., 2021), making ninth grade a particularly important time to intervene. Promising interventions that might enhance high school girls’ STEM interest include those that provide exposure to STEM role models (González-Pérez et al., 2020; Kyte & Riegle-Crumb, 2017), opportunities to succeed in STEM (Luttenberger et al., 2019), or access to a caring educator who can help with motivation and navigating stereotypes (Fox-Turnbull et al., 2023; N. Wang et al., 2023). These targeted supports can have a lasting impact on girls’ engagement and persistence in STEM fields (Sadler et al., 2012).
Tutoring programs in STEM provide high school students with a unique opportunity to integrate exposure to role models, opportunities for success, and personalized support from a caring educator, making them a promising approach to sustaining and even reigniting high school girls’ interest in STEM. Moreover, as tutoring gains popularity as an intervention to improve math test scores and boost students’ Algebra 1 passing rates in high school (Guryan et al., 2023), deploying STEM tutors in schools, particularly those focused on math gains, is becoming increasingly feasible.
The growing interest in school-based high school math tutoring programs opens the door to exploring how these programs can be tailored to maximize their impact, particularly by considering the potential benefits of exposing students to demographically similar tutors. The existing literature exploring the impacts of educator-student demographic matches suggests that students can benefit academically and psychologically when they share racial or gender characteristics with their teachers, although the results are mixed (Cleveland & Scherer, 2024; Dee, 2005; Ouazad, 2014; Paredes, 2014; Wright et al., 2017). Yet, some research shows that female students with female STEM teachers had higher test scores (Paredes, 2014) and were more likely to pursue further STEM coursework (Lim & Meer, 2017).
However, changing the demographic composition of the teacher workforce is not easy, nor can it be rapid. Conversely, the tutor workforce is smaller and has a far larger proportion of new hires than teaching; it also does not require the intensity of preservice preparation that teaching does. As a result, students can gain access to demographically matched educators at a far greater scale and far more quickly with tutors than they would be able to with teachers. We do not yet know whether working with a female STEM tutor will have similar impacts on girls as working with a female math teacher. Teachers and tutors have different roles. However, because they are both adults working in STEM, they may both be effective role models.
This study examines the impact of assigning female high school students to same-gender math tutors on students’ STEM beliefs, attendance, and math performance. Our research is guided by social cognitive theory (Bandura, 1977), role model theory (Morgenroth et al., 2015), and stereotype threat theory (Spencer et al., 1999), which collectively suggest that exposure to demographically similar role models—in this case, female math tutors—may positively shape girls’ motivation and academic outcomes in STEM. We hypothesize that girls randomly assigned to female tutors would experience greater benefits compared to those paired with male tutors. We find that girls randomly assigned to female tutors report significantly higher interest in STEM (.73 SD) than girls assigned to male tutors. Although we find no evidence that the female tutor-student gender match improved students’ school attendance, girls assigned to female tutors performed better in their math course. Specifically, they were less likely to receive a grade of D or F (3.9 percentage points). As tutoring continues to gain traction as a strategy to accelerate student learning, these results highlight that program design features, such as exposing students to demographically similar tutors, can influence outcomes and that the benefits of tutoring may extend beyond academic achievement.
Background
The Importance of Secondary School for Decreasing Gender Disparities in STEM Fields
Girls and women continue to be underrepresented in STEM fields despite decades of efforts focused on understanding and changing this pattern (Charlesworth & Banaji, 2019). Gender disparities are commonly measured by comparing jobs obtained and college majors pursued, but the problem starts well before high school graduation. Scholars have identified a range of formative sociocultural, psychological, and structural barriers that contribute to this phenomenon (Halpern et al., 2007; Msambwa et al., 2024; M. T. Wang & Degol, 2017). Girls make consequential decisions about whether they intend to pursue careers in STEM fields by the time they enter the ninth grade (Kyte & Riegle-Crumb, 2017).
For students to pursue STEM fields in college and as careers, maintaining interest in math during high school is crucial, particularly for girls (Kyte & Riegle-Crumb, 2017). Yet, girls’ STEM interest is more likely to diminish over the course of high school than that of their male classmates (Sadler et al., 2012; Speer, 2023). Girls’ drop in STEM interest during high school may result from existing gender disparities in STEM participation in the workforce, leading female students to have less exposure to same-gender STEM workers (Cheryan et al., 2017). This lack of exposure may result in a feedback loop in which female students are less likely to pursue STEM majors and careers, resulting in comparably fewer female STEM role models for girls as they make critical decisions about their academic trajectory (Ceci et al., 2009).
Given these concerning trends, scholars have highlighted secondary school as a critical period for interventions to increase girls’ interest and performance in STEM subjects (Chavatzia, 2017). Educators and peers in schools are important socializing agents who can promote or undermine girls’ beliefs about STEM (N. Wang et al., 2023). P. A. Banerjee (2016) found that female college students’ decisions to pursue STEM courses are, at least in part, due to their secondary teachers and classroom experiences, as well as their interest in, and the perceived value of, STEM subjects. High school girls exposed to women in STEM might believe that pursuing STEM in their future studies and professional careers is more viable, increasing their interest in doing so and altering their career expectations (Lv et al., 2022; Riegle-Crumb & Morton, 2017).
Schools and other educational organizations can strategically introduce female STEM role models for girls. A brief role model–focused intervention, in which female role models described overcoming challenges and emphasized the importance of the subject, increased academic performance and persistence in one STEM course (Herrmann et al., 2016). Other interventions have emerged in recent years to increase the proportion of female students in STEM fields (González-Pérez et al., 2020; Prieto-Rodriguez et al., 2020). The Girls Who Code organization, for example, provides hands-on STEM learning experiences by pairing girls with female mentors in STEM fields to inspire girls to pursue STEM education (Piloto, 2023).
Background Literature on Teacher-Student Demographic Match
Interventions that focus on exposing female students to female role models in STEM as a way to enhance STEM interest and aspirations build on many of the same theoretical assumptions as the educator-student match literature—that demographically similar role models can be important for changing the beliefs and trajectory for those who feel traditionally marginalized in a field (Blake-Beard et al., 2011). A few studies leverage random variation in student assignments to teachers to estimate the effects of these matches (Blazar, 2024), while most rely on quasi-experimental methods to estimate the impact of being assigned to a same-race or same-gender teacher.
Much of the evidence base on student-teacher match focuses on racial matching (Redding, 2022), which likely stems from the historic and ongoing efforts to address racial inequality in education (Cabral-Gouveia et al., 2023; Coleman, 1968). Studies on teacher-student racial matching, such as those by Dee (2004, 2005) and Gershenson et al. (2021), demonstrate that students, particularly Black students, experience academic and social benefits when they are taught by teachers of the same race or ethnicity.
In contrast, the research on the effect of teacher-student gender matching on student outcomes is mixed (Cho, 2012; Dee, 2007; Ehrenberg et al., 1995; Sokal et al., 2007). One U.S.-based study found that female teachers were more effective, in general, and that they were particularly effective for increasing girls’ math performance (Hwang & Fitzpatrick, 2021), and several international studies show that girls exposed to female math teachers score higher on math assessments (Lim & Meer, 2017; Paredes, 2014). These effects can persist beyond a single year: Lim and Meer (2020) found that having a female math teacher in middle school increased the likelihood that girls took advanced STEM courses and aspired to do so in college. However, Cho (2012) found weak evidence for a student-teacher match effect in a study of 15 OECD countries. Additionally, other studies in the United States, Canada, and the UK show no significant impact of gender matching on student achievement (Ehrenberg et al., 1995; Francis et al., 2008; Sokal et al., 2007).
Most of the teacher-student demographic-matching literature focuses on academic achievement, such as grades or test scores, but a few studies assess how race and gender matches improve students’ perceptions of classroom relationships and aspirations. For instance, Scherer and Cleveland (2024) found that same-race, same-gender teachers can improve students’ attendance and interpersonal self-management. Additionally, Egalite et al. (2015) showed that same-race, same-gender teacher-student matches increased the likelihood that middle school students considered attending college. The gender gap in future STEM aspirations already exists by middle school (M. T. Wang & Degol, 2017), and the results from this study indicate that exposure to demographically similar teachers during secondary school matters.
Researchers have proposed several mechanisms through which having a demographically similar educator might positively impact student outcomes. These mechanisms include exposure to role models, which can inspire students by showcasing paths of seemingly similar people (Dee, 2005; Steele, 1997), and mitigation of stereotype threat, where negative preconceptions about students’ identity can hinder their confidence and performance (Beasley & Fischer, 2012). Additionally, shared experiences and cultural understanding between students and educators can foster a sense of belonging and support (Hart, 2020; Purdie-Vaughns et al., 2008). Finally, demographic similarity can contribute to the development of more positive and trusting relationships, which are often key to student engagement and success (Hughes et al., 2008).
Overall, the existing research highlights the potential for demographic alignment to shape not only academic outcomes but also students’ perceptions and aspirations. By extending the lens of research beyond the traditional classroom teacher, we can explore how similar dynamics might influence the effectiveness of other adults in students’ lives, such as tutors, in fostering student interest and success in STEM. The present study considers, specifically, the extent to which girls may benefit from being exposed to female math tutors in high school.
The Potential of Female Math Tutors to Increase Girls’ STEM Interest and Performance
The primary role of tutors is to deliver personalized, relationship-based instruction, uniquely positioning them as educators who may be able to inspire students and boost their interest and performance in STEM. Tutors meet with students one-on-one or in small groups and, as a result, may have more intensive, personalized time with students than teachers. This dual role of tutors as both academic and motivational role models raises the question: Could the benefits of demographic matching often observed with classroom teachers extend to tutors, who occupy a distinct instructional role in students’ educational experiences? To date, no research has estimated the effects of such demographic matching between tutors and students. In this section, we discuss why high school math tutoring is a promising intervention point, whether a demographic match with a tutor may be more or less impactful than that with a teacher, and why girls might benefit from having female math tutors.
The Promise of High-Impact Tutoring for Boosting High School Math Achievement
The vast research base documenting the effectiveness of tutoring in accelerating student achievement (Dietrichson et al., 2017; Nickow et al., 2024) has helped tutoring become a popular approach for districts to combat stagnating student growth and widening achievement gaps as a result of the pandemic (Biden Administration, 2024). Evidence for math tutoring in early high school is particularly robust, with large-scale randomized controlled trials demonstrating consistent gains for students (Bhatt et al., 2024; Guryan et al., 2023). While an early study focused solely on high-impact tutoring for boys (Guryan et al., 2023), additional research has shown that intensive tutoring is effective for girls as well (Bhatt et al., 2024).
The positive results from high school math tutoring initiatives come from studies evaluating programs that have features associated with “high-impact tutoring,” which include high dosage (three or more sessions per week), use of formative assessments and other data to monitor student learning and drive instruction, alignment of session materials with school curriculum, formal tutor training and ongoing supports, and a priority on working with a consistent tutor (Cortes et al., 2025; Robinson & Loeb, 2021). This last feature, students working with a consistent tutor, facilitates relationship-building between a student and their tutor. As the tutor becomes an additional caring adult in a student's schooling experience, the tutor-student demographic match may impact students. Districts may engage in other tutoring initiatives, like on-demand tutoring; however, less evidence supports the effectiveness of these approaches (Huffaker et al., 2025).
Considering the Influence of Demographically Similar Tutors
Like teachers, tutors take on an instructional role for students, but differences between tutors and teachers might influence whether a student's demographic match with a tutor is more or less impactful than with their teacher. On one hand, having a demographically similar tutor could be less impactful because students tend to spend more time with classroom teachers over the course of the year than with their tutor, changing the “strength” of the treatment (Wedenoja et al., 2022). On the other hand, teachers often teach 30 students in one class at a time, and over 100 students over the course of a school year. Tutoring is, by definition, conducted one-on-one or in small groups. Thus, students and tutors meet frequently and have personalized interactions that might make the demographic match more salient (Dietrichson et al., 2017). Students also have many experiences with teachers over the course of their K–12 education (Havik & Westergård, 2020), whereas tutors may serve as a nonteacher role model, making their impact more novel. In addition, educators’ perceptions matter, and their own beliefs toward demographically similar students might influence their expectations (Gershenson et al., 2016) or positively bias their grading (Muntoni & Retelsdorf, 2018). We have little evidence on the difference in perceptions and beliefs between teachers and tutors.
How Female Math Tutors Can Empower Girls in STEM
The theorized mechanisms from many of the racial and gender match studies could explain why a female student might benefit from having a math tutor who shares her gender. First, math tutors may act as behavioral role models, demonstrating what is possible for high school girls taking foundational math courses (Morgenroth et al., 2015). Female tutors may not only show what is possible for girls but also that pursuing math can be exciting (Riegle-Crumb & Morton, 2017). Second, girls in math may face stereotype threat, in which the stereotypes about gender and math ability can negatively impact girls’ performance, confidence, and interest in math-related fields (Spencer et al., 1999). Even if stereotype threat does not impact K–12 girls’ math achievement (Flore & Wicherts, 2015), girls may not pursue advanced math courses if they do not see themselves as “math people.” In particular, they may lose confidence or interest in math as they start high school, and, as a result, interventions that provide girls at the secondary level with positive math role models, such as a female tutor, can improve their math self-concept and performance (Good et al., 2003). Finally, girls may perceive that female tutors have higher expectations for them than male tutors (Gershenson et al., 2016), and they may develop more positive relationships with female tutors than with male tutors (Roorda & Jak, 2024), which may impact their subsequent STEM interest and engagement. These mechanisms provide a compelling rationale for investigating whether female math tutors can help mitigate barriers to STEM engagement and achievement for girls.
The Present Study
The present study examines how same-gender tutors influence students’ STEM beliefs, attendance, and math performance. Specifically, we focus on the implementation of a high school tutoring initiative targeting ninth-grade students taking Algebra 1. For many districts across the United States, students must pass Algebra 1 to graduate from high school (Dee & Huffaker, 2024; Reys et al., 2007). Moreover, students’ performance in Algebra 1 affects the STEM courses they can take in the future (Silver, 1997). Therefore, interventions that support girls’ Algebra 1 performance and encourage positive experiences may have outsized impacts on their ability and interest to pursue STEM careers (Dee & Huffaker, 2024).
To conduct this study, we partnered with a large New England school district and Saga Education (Saga), a tutoring provider, for the 2021–22 school year. Saga is one of the few secondary school tutoring providers with robust evidence of the program's success (Bhatt et al., 2024; Guryan et al., 2023). The district contracted with Saga to deliver tutoring to ninth-grade students during math lab, an in-school math intervention block for all ninth-grade students. The program was designed for students to receive tutoring in pairs; however, the group sizes varied based on the number of students in an intervention block and the number of tutors. Students were assigned to tutoring groups ranging from one to three students.
Saga’s tutoring program operated at five high school sites; school principals could opt in to having the program at their school. Although the initial plan was for tutoring to be offered in-person, during the school day, the tutoring provider had trouble hiring enough local tutors. As a result, tutoring was ultimately offered in-person at two schools and virtually at three schools.
We ask: Do female students who are matched with a same-gender math tutor have:
1) more positive STEM-related beliefs than those who do not have a same-gender tutor?
2) better attendance?
3) improved academic outcomes?
To answer these questions, we randomly assigned student tutoring groups to be matched with a female or male tutor. Given the variation in the program across schools, we also explore whether the effect of a same-gender math tutor on STEM interest varied by instructional modality. This analysis sheds light on whether demographic similarities hold the same weight in virtual classrooms as they do in brick-and-mortar classrooms. Similarities may be less obvious when people are not physically in the same place, though many of the hypothesized mechanisms for the impact of demographic matching (e.g., perceived shared experiences, role modeling) may still hold (Darling-Aduana, 2021).
Methods
We preregistered our study design and research questions on the Registry for Effectiveness in Education Studies (REES, see Appendix B). Implementation challenges led us to deviate slightly from our preregistration, which we outline in Appendix C.
Sample
Saga delivered math tutoring in five high schools during the 2021–22 school year. All ninth-grade students taking Algebra 1 were eligible to participate in tutoring during math lab, a math intervention block. To be eligible for the study (i.e., be randomized to a tutor), students had to be present in math lab and be assigned to a tutoring group by November 1, 2021. Ultimately, 606 out of 1,327 ninth-grade students enrolled at the five sites received tutoring. Of those 606 students, 146 (30.1%) started tutoring after November 1 and thus were not part of the initial randomization or had a nonbinary tutor (see Appendix C).
Our final analytic sample included 422 students and 23 tutors. In the first column of Table 1, we provide descriptive statistics for students (Panel A) and tutors (Panel B). Half of the students were girls and 39 percent of tutors were female. Forty-five percent of students identified their race as White and 33 percent identified as Black, while 76 percent identified their ethnicity as Hispanic/Latino. Among the tutors, approximately 34 percent identified as White, 43 percent identified as Black, and about 2 percent identified as Asian. The race and ethnicity variables for tutors were constructed from an item that allowed tutors to report multiple race/ethnicities and are therefore not mutually exclusive. Approximately 89 percent of students were eligible for free or reduced-price lunch (FRPL), 18 percent had individualized education plans (IEP), and 37 percent were classified as having Limited English Proficiency (LEP).
Table 1. Student and Tutor Characteristics
Note. Unit of analysis in Panel A is student and Panel B is tutor. In auxiliary regression, joint F-tests show that variation in the covariates is not explained by the independent variables (see first row of Table 1).
Data
The school district provided data on math grades, school attendance, student characteristics, and achievement data from the prior school year (i.e., 2020–21). The district did not administer an end-of-year exam in 2021–22 due to the pandemic. We received beginning- and end-of-year Foundational Skills Assessment (FSA) scores (Saga’s assessment), tutoring attendance, tutor characteristics, and an end-of-year student survey from the tutoring provider.
Student and Tutor Gender
We used data on student and tutor gender in the randomization procedure. We observe the gender of every student to be either male or female. 1 One tutor reported their gender as nonbinary; however, this tutor was excluded from the analytic sample because no sessions were observed with this tutor after November 1. To estimate our primary effect of interest, we compare female students with female tutors to female students with male tutors. The data also allow us to compare male students with male or female tutors to female students with male tutors, as well as male students with a male tutor to male students with a female tutor.
Student Outcomes
We observed five outcomes: STEM interest, tutoring session attendance, school absences, the end-of-year FSA administered by the tutoring provider, and earning a math grade of C– or above. The STEM interest outcome is the standardized average of three items on the end-of-year student survey about the usefulness of math, interest in pursuing STEM in college, and interest in pursuing a STEM career (α = .53). Table 2 includes the items and mean response for the sample. Appendix Table A1 provides more details on the response categories. Because the Cronbach's alpha for the three-item scale was relatively low, we also examined the effect on each item individually (see Appendix Table A6).
Table 2. STEM Interest Items
Note. The response options for each item varied from 1 to 5. See Appendix Table A1 for details. M = mean; SD = standard deviation.
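The scale construction described above can be illustrated with a short sketch. It assumes complete responses to the three items on the 1 to 5 scale and standardizes with the sample standard deviation; the function names and the toy responses are hypothetical and are not the study's data or code.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item responses (one inner list per student)."""
    k = len(rows[0])
    # Sample variance of each item across students
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each student's summed score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standardized_composite(rows):
    """Average the items for each student, then z-score the averages."""
    averages = [mean(row) for row in rows]
    center, spread = mean(averages), stdev(averages)
    return [(a - center) / spread for a in averages]


# Hypothetical responses to the three items (usefulness, college, career)
responses = [[4, 3, 2], [5, 4, 4], [2, 2, 1], [3, 5, 3], [4, 4, 5]]
alpha = cronbach_alpha(responses)
stem_interest = standardized_composite(responses)
```

Because the composite is a z-score, an effect expressed in SD units is read directly off this scale. A low alpha indicates that the three items do not move closely together, which is why item-level effects are also worth examining.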
We assess tutoring attendance and school absences after the official start of tutoring (November 1). Collecting attendance data was challenging for many districts during the 2021–22 school year, the second year of the pandemic. While we include the results from our analyses on school attendance, as it was one of our preregistered outcomes, we acknowledge that the quality of the data is questionable.
We also observe two measures of academic achievement. The tutoring provider assessed math competency using their FSA, which they administer several times over the course of the year. We use district grade data to construct a binary measure equal to one if a student’s ninth-grade math GPA in quarters 2, 3, and 4 is a C– or above (i.e., GPA > 1.7). 2 We use grades from quarters 2, 3, and 4 because students were selected for tutoring after the first-quarter grades were issued. Ninth-grade Algebra 1 is often a “gateway” course that has implications for students’ future academic trajectories and requires a minimum grade to demonstrate college and career readiness. Earning a C– or better in their ninth-grade math course is also practically important because it is an indicator of eligibility for progressing through the high school math curriculum.
Missing Outcome Data
We observe math grades and student attendance (i.e., school, tutoring) for all students. 3 However, only about half of the students in the analytic sample completed the final student survey that included the STEM interest items (204 students; 48.3 percent) or the end-of-year FSA (195 students; 46.2 percent). District administrators and Saga staff attributed low completion rates to high rates of chronic absenteeism during the pandemic, students not consistently attending their math intervention blocks, or students exiting the program early. Table A2 in the appendix describes the proportion of the 422 students missing outcome data overall and by condition.
We find no evidence that completion of the tutoring provider's end-of-year student survey and FSA differed across the treatment arms for this study. Moreover, we find little evidence that completion of the survey and FSA differed based on student background characteristics (see Tables A3 and A4 in the appendix). 4 The joint F-tests testing for systematic missingness based on observable characteristics are not statistically significant.
Student Characteristics
Our primary model includes controls for student characteristics, including prior academic achievement, observed in both school administrative and program data. Student characteristics include binary indicators for whether a student was White, eligible for FRPL, had an IEP, and was classified as LEP. We create the prior achievement factor using achievement data measured prior to random assignment: students’ math GPA in the first quarter of the 2021–22 school year, a binary indicator for whether a student received a low score on their eighth-grade summative math assessment (STAR; 2020–21), beginning-of-year FSA, and first diagnostic FSA (measured prior to November 1). 5 We create this factor because we do not observe eighth-grade summative assessments for about 25 percent of students in 2020–21. We use Q1 math GPA and beginning-of-year FSA to impute missing values.
Randomization
The district enrolled students in the tutoring program on a rolling basis from the beginning of the 2021–22 school year in September through the end of October. Once students were enrolled in the tutoring program and assigned to a tutoring group, we used data on gender (of both students and tutors), school, and math lab assignments (i.e., the class in which tutoring took place) to randomly assign students to tutors. Because the tutoring provider hired tutors to work at specific schools, the randomization was stratified within school (for tutors) and by math lab classroom (for students). We structured the randomization procedure so that girls had as close as possible to a 50% chance of being assigned to a female tutor, in order to maximize our power for estimating the same-gender effects for girls. We assigned students to their primary tutor by November 1, 2021, by which date almost all students who would be tutored had begun to participate in tutoring sessions. 6
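As a rough illustration of stratified assignment, the sketch below randomizes tutoring groups to a tutor gender within each math lab classroom so that the two arms are as balanced as possible. It is not the study's procedure: it ignores tutor supply at each school and the matching of groups to individual tutors, and all names (`assign_tutor_gender`, `arm_counts`, the group and classroom labels) are hypothetical.

```python
import random
from collections import defaultdict


def assign_tutor_gender(groups, seed=0):
    """Randomly assign each tutoring group to 'F' or 'M' within its classroom."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for g in groups:
        by_class[g["classroom"]].append(g["group_id"])
    assignment = {}
    for classroom in sorted(by_class):
        ids = sorted(by_class[classroom])
        rng.shuffle(ids)
        # Random starting arm so odd-sized classrooms do not always favor one arm
        first = rng.choice(["F", "M"])
        second = "M" if first == "F" else "F"
        for pos, gid in enumerate(ids):
            assignment[gid] = first if pos % 2 == 0 else second
    return assignment


def arm_counts(assignment, groups):
    """Return {classroom: (n_female_arm, n_male_arm)} to check balance."""
    counts = defaultdict(lambda: [0, 0])
    for g in groups:
        arm = assignment[g["group_id"]]
        counts[g["classroom"]][0 if arm == "F" else 1] += 1
    return {c: tuple(v) for c, v in counts.items()}


# Hypothetical tutoring groups spread over three math lab classrooms
groups = [{"group_id": f"g{i}", "classroom": f"c{i % 3}"} for i in range(10)]
assignment = assign_tutor_gender(groups, seed=1)
balance = arm_counts(assignment, groups)
```

Alternating arms after a shuffle keeps the within-classroom split within one group of even, which is one simple way to approximate a 50% assignment probability inside each stratum.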
Students in each condition—same gender or different gender—had similar characteristics and achievement prior to treatment. Columns 2–5 in Table 1 show the student characteristics and student achievement measured prior to the beginning of the program by condition. Students across conditions had statistically similar math grades in the first quarter, eighth-grade math assessment scores in spring 2021, prior achievement factor, and school attendance in 2020–21. The evidence of balanced prior achievement and attendance supports the validity of the randomization procedure.
Compliance with random assignment was high for students and tutors. Table 3 describes the proportion of sessions that students (i.e., boys, girls) had with tutors of their randomly assigned gender. Girls who were assigned a female tutor had 78.15 percent of their sessions with a female tutor, whereas girls who were assigned a male tutor had 22.22 percent of their sessions with a female tutor. Student compliance with assignments was similarly high across the conditions, indicating that any noncompliance is unlikely to bias our estimates. Based on administrative records, we can infer several reasons for the observed noncompliance, including tutors being absent, being reassigned to different schools, or leaving their positions. To adjust for the noncompliance that did occur, we run treatment-on-the-treated (TOT) analyses in which we instrument the proportion of sessions that students attended with a tutor of their randomly assigned gender, using students’ original condition assignment as the instrument.
Table 3. Compliance With Treatment
Note. Each row is the proportion of tutoring sessions that a girl or boy student had with their randomly assigned tutor gender. Standard deviations in parentheses.
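The compliance measure described above can be computed from session-level records along the following lines. This is a minimal sketch under assumed inputs: the record layout (an assigned tutor gender plus a list of the tutor gender at each attended session) and both function names are hypothetical, not the study's data structure.

```python
from statistics import mean


def share_with_female_tutor(session_tutor_genders):
    """Fraction of one student's sessions that were led by a female tutor."""
    return sum(g == "F" for g in session_tutor_genders) / len(session_tutor_genders)


def compliance_by_arm(records):
    """Average per-student share of female-tutor sessions, by assigned arm.

    records: iterable of (assigned_tutor_gender, [tutor gender per session]).
    Students with no recorded sessions are skipped.
    """
    shares_by_arm = {}
    for assigned, sessions in records:
        if sessions:
            shares_by_arm.setdefault(assigned, []).append(
                share_with_female_tutor(sessions)
            )
    return {arm: mean(shares) for arm, shares in shares_by_arm.items()}
```

Averaging per-student shares (rather than pooling all sessions) weights each student equally, which matches a table whose rows are student-level proportions with standard deviations.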
Estimation Strategy
We estimate the effects of same-gender matching in a linear regression framework following the equation:
where Y equals the outcome of interest for student i in class c in school s. GStu × Ftut is an indicator equal to one if a student is a girl and was randomly assigned a female tutor and zero otherwise. BStu × Mtut and BStu × Ftut are indicators equal to one if a boy was randomly assigned a male or female tutor, respectively. The reference category is girls who were assigned to male tutors. The primary effect of interest is β1, which describes the effect of a female tutor on a girl's outcomes compared to a girl with a male tutor. X’ is a vector of exogenous student characteristics, including binary indicators for White, FRPL, IEP, and LEP. The student characteristics control for elements of students’ identities that are correlated with systematic differences in STEM interest. A’ is a vector of prior achievement covariates, including math Q1 grade, an indicator for low 2021 STAR test, and the prior math achievement factor. The baseline student achievement covariates account for underlying academic proficiency and the other outcomes (e.g., STEM interest, changes in student outcomes). δ is a classroom fixed effect for each math lab classroom c, which accounts for time-invariant characteristics, including student sorting, curriculum, and teacher efficacy. u is the error term; standard errors are clustered by school. Our primary analysis takes an intent-to-treat (ITT) approach using students’ original tutor gender assignment, so that noncompliance with assignment cannot introduce selection bias into our estimates.
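Written out from the definitions above, the ITT specification takes roughly the following form. This is a reconstruction rather than the authors' typeset equation: the intercept β0, the coefficients β2 and β3 on the boy-by-tutor indicators, and the coefficient vectors γ and θ are assumed labels, and the subscripting of each term is a best guess.

```latex
\[
Y_{ics} = \beta_0
  + \beta_1 (\mathit{GStu} \times \mathit{Ftut})_{i}
  + \beta_2 (\mathit{BStu} \times \mathit{Mtut})_{i}
  + \beta_3 (\mathit{BStu} \times \mathit{Ftut})_{i}
  + X'_{i}\gamma + A'_{i}\theta + \delta_{c} + u_{ics}
\]
```

With girls assigned to male tutors as the omitted group, β1 is the adjusted difference in outcomes between girls assigned to female tutors and girls assigned to male tutors within the same math lab classroom.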
We preregistered our model to focus specifically on whether female students benefit more from working with female tutors than with male tutors. This specification does not test whether students in general perform better with same-gender tutors, though by changing the reference category, we can estimate outcomes for boy students working with male versus female tutors (see Appendix Table A5). Nor does it address the related but distinct question of whether girl student–female tutor matches produce greater benefits than boy student–male tutor matches, which would require an interaction term between student gender and tutor gender. Our modeling choices reflect our primary interest in how female tutors affect girls’ math outcomes.
In addition to the ITT analysis, we conduct a TOT analysis in which we instrument the proportion of tutoring sessions spent working with a female tutor, using random assignment to a female tutor as the instrument. We estimate two-stage least squares (2SLS) models following:
In the first stage, the endogenous variable is the proportion of sessions that girls who were assigned to female tutors actually had with female tutors.
The second stage estimate (β1) adjusts for imperfect compliance with treatment, that is, the extent to which girls who were assigned female tutors instead participated in sessions with male tutors.
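One way to write the two stages, consistent with the description above, is sketched below. The symbols are assumptions introduced for illustration: PropF denotes the endogenous proportion of sessions with a female tutor, W' collects the remaining regressors from the ITT model (the boy-by-tutor indicators and the covariate vectors X' and A'), and π, λ, ρ, and v are placeholder coefficient and error labels. Whether the authors' specification includes exactly these controls in both stages is not stated in the text.

```latex
\begin{align}
\mathit{PropF}_{ics} &= \pi_0 + \pi_1 (\mathit{GStu} \times \mathit{Ftut})_{i}
  + W'_{ics}\lambda + \delta_{c} + v_{ics} \\
Y_{ics} &= \beta_0 + \beta_1 \widehat{\mathit{PropF}}_{ics}
  + W'_{ics}\rho + \delta_{c} + u_{ics}
\end{align}
```

Under this setup, β1 in the second equation rescales the ITT effect by the first-stage coefficient π1, so it can be read as the effect of having all sessions with a female tutor rather than the effect of being assigned to one.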
Finally, to test differences in the effects of same-gender tutors between in-person and virtual tutoring, we estimate the impacts of tutor–student gender matches separately by tutoring modality. The tutoring program delivery differed by school site so that some sites had in-person tutoring while others had virtual tutoring.
Results
Table 4 displays the results from our ITT analysis, where the reference category is female students assigned to a male tutor. We find that girls assigned to female tutors developed an interest in STEM that was .73 standard deviations (SD) higher than girls assigned to male tutors. The results are consistent across the scale and each individual item (see Appendix Table A2). 7 Figure 1 depicts the raw mean of students’ STEM interest on the original scale by condition. As the figure illustrates, we do not find evidence of a general gender match effect on interest in STEM: male students do not report increased STEM interest when matched with a male tutor versus a female tutor (see Appendix Table A5). We also see an increase in math achievement as a result of the female gender match, where 51.8 percent of female students working with male tutors earned an average math grade of C– or better compared to 55.6 percent of girls working with female tutors (i.e., an increase of roughly four percentage points, or .08 SD). 8 We do not find evidence that the girl student–female tutor gender match increased end-of-year FSA scores, school attendance, or tutoring session attendance. The attendance estimates are imprecise, potentially due to low-quality data.
Table 4. Intent-to-Treat Estimates
Note. The reference category for each estimate is girl student–male tutor. Sample size and average outcomes for girls with a male tutor are the same for Panels A and B. Independent and control variables are observed for all cases. Student characteristics include binary indicators for FRPL, IEP, LEP, and White. Prior achievement includes math Q1 grade, indicator for low 2021 STAR test, and prior math achievement factor.
Standard errors are clustered by school. +p < .10, *p < .05, **p < .01, ***p < .001.
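The standard-deviation equivalents reported alongside the pass-rate effects can be approximated by dividing the percentage point effect by the standard deviation of a binary outcome. The sketch below assumes that the scaling uses the comparison-group pass rate (51.8 percent for girls with male tutors); the text does not say which standard deviation the authors used, and the function name is hypothetical.

```python
import math


def pp_to_sd(effect_pp, baseline_rate):
    """Convert a percentage point effect on a 0/1 outcome into SD units.

    Assumes the outcome's SD is sqrt(p * (1 - p)) at the baseline rate p.
    """
    sd = math.sqrt(baseline_rate * (1 - baseline_rate))
    return (effect_pp / 100) / sd


# ITT effect on earning a C- or better: 3.9 percentage points.
print(round(pp_to_sd(3.9, 0.518), 2))  # 0.08
```

Because the baseline pass rate is close to 50 percent, the outcome's standard deviation is close to .5, so each percentage point corresponds to about .02 SD.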

Figure 1. STEM interest by condition.
Table 5 presents the results of the TOT analysis, which estimates the impact on student outcomes if students had received all sessions with tutors of their randomly assigned gender. The first stage F-statistic (i.e., Kleibergen-Paap) is statistically significant and greater than 37 for each outcome, and the Cragg-Donald Wald F statistics all exceed 100 (Stock & Yogo, 2002), indicating, not surprisingly, that randomization sufficiently affected gender matching for the two-stage approach to effectively estimate the effects. The estimated effects for female students working with female tutors, relative to those working with male tutors, are a .98 SD increase in interest in STEM and a 5 percentage point increase (.10 SD) in the likelihood of earning an average of a C– or better in math for the rest of the school year.
Table 5. Instrumental Variables: Estimates Adjusted for Compliance With Treatment
Note. The endogenous variable is the proportion of sessions that girls who were assigned to female tutors had with female tutors, and the instrument is a binary indicator for whether a student was a girl assigned to a female tutor. Independent and control variables are observed for all cases.
Standard errors are clustered by school. +p < .10, *p < .05, **p < .01, ***p < .001.
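As a back-of-the-envelope check on the TOT estimates, the ITT effect can be divided by the first-stage compliance rate (a simple Wald ratio). The sketch below is an approximation, not the authors' 2SLS model: it ignores covariates and fixed effects, and it takes the first stage to be the 78.15 percent of sessions that girls assigned to female tutors had with female tutors, since the endogenous variable as defined is zero for all other students. The function name is hypothetical.

```python
def wald_tot(itt_effect, first_stage):
    """Scale an ITT effect by the first-stage compliance rate."""
    if first_stage <= 0:
        raise ValueError("first stage must be positive")
    return itt_effect / first_stage


FIRST_STAGE = 0.7815  # share of sessions with a female tutor, girls assigned to one

print(round(wald_tot(3.9, FIRST_STAGE), 1))   # 5.0 percentage points
print(round(wald_tot(0.73, FIRST_STAGE), 2))  # 0.93 SD
```

The pass-rate figure lines up with the reported 5 percentage points. The STEM interest ratio (.93 SD) falls somewhat below the reported .98 SD, which may reflect covariate adjustment or a different compliance rate among students who completed the survey.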
Instructional Modality: In-Person vs. Virtual
Table 6 presents the results separately by whether tutoring sessions were held in-person or virtually. Panel A restricts the sample to in-person programs, and Panel B restricts the sample to virtual programs. The positive impact of female tutors on girls’ STEM interest is largely driven by students in schools with in-person tutoring (.62 SD) compared to those in schools with virtual tutoring (.31 SD, ns). The same pattern holds for the impact of the female gender match on student math course grades: girls who worked with female tutors in-person were 2.7 percentage points (.05 SD) more likely to earn at least a C– in math than girls who worked with male tutors, while girls who worked with female tutors virtually were only 1.4 percentage points (.03 SD) more likely to earn at least a C– in math than girls who worked with male tutors. 9 Although the female gender match effects appear to be driven by students in schools offering in-person tutoring, these results may be a function of unobserved sorting into schools. Students were not randomly assigned to in-person or virtual tutoring; instead, instructional modality was determined at the school level, and any differences in effects may reflect other school-level factors.
Table 6. The Effect by Instructional Modality: In-Person vs. Virtual
Note. Virtual = virtual tutoring; in-person = in-person tutoring. All models include student characteristics, prior achievement, and classroom fixed effects.
Standard errors are clustered by school. +p < .10, *p < .05, **p < .01, ***p < .001.
Discussion
Through a randomized controlled trial, we tested the impact of tutor gender match on female students’ STEM beliefs, achievement, and attendance outcomes. We find that girls benefit from being assigned to a female tutor. Female tutors substantially increased girls’ interest in STEM relative to male tutors (.73 SD). Additionally, we find evidence that the female tutor–tutee gender match confers academic benefits. Girls assigned to work with female tutors were 3.9 percentage points more likely to earn at least a C– in math compared to girls assigned to male tutors. We do not find statistically significant effects on the tutoring provider assessment or attendance outcomes, though attendance was likely poorly measured. Overall, our results show that high school girls can benefit from working with female math tutors.
This research contributes to the literature on the benefits of an additional female STEM role model in girls’ lives. Much research has focused on STEM mentors in college (M. Banerjee et al., 2018) or gender match between students and teachers (Blake-Beard et al., 2011; González-Pérez et al., 2020; Prieto-Rodriguez et al., 2020); however, as far as we know, this is the first study to assess the impact of demographically similar tutors on student outcomes. Moreover, we focus on a critical time period: the outset of high school, when girls may fall off track to pursuing STEM careers (Sadler et al., 2012; M. T. Wang & Degol, 2017). Our survey outcomes also suggest one mechanism through which a female STEM role model might increase girls’ persistence in STEM: an increased valuing of and interest in the subject.
Female tutors may be more readily able to increase girls’ interest in STEM and their subsequent math performance when they interact in person. One potential reason for this difference is that the salience of the tutors’ gender may be stronger in person. Another explanation might be that tutors are better able to connect with students and communicate in person. When it comes to tutoring in virtual settings, some hypothesize that virtual environments can increase stress for participants by increasing “close” eye contact and constant viewing of a video reflection (Bailenson, 2021), perhaps making connection or relationship-building more difficult.
Limitations
Limitations of the data and context presented a few challenges for this project. First, we only observe the STEM interest outcome for students who took the end-of-year survey. Although we do not find consistent differences between students for whom we observe and do not observe the STEM interest outcome on observable pre-intervention covariates, unobservable differences between these groups of students are possible. Thus, our STEM interest findings apply to a group of students engaged enough to complete an end-of-year survey, and we do not know if they would extend to all students who received tutoring.
Second, our capacity to examine effects on achievement beyond grades is also limited because ninth graders in the district do not take a standardized assessment. We find trending positive effects of the student-tutor gender match for girls on tutoring-provided assessments and math grades; however, a standardized assessment could provide more precise and valid results. Third, we are also not able to estimate the long-term effects of the female student-tutor gender match. A next step for research on this topic might focus on showing how the female student-tutor gender match impacts course-taking behavior and labor market decisions.
Fourth, our results suggest that the positive effects of female tutors on female students are driven by students attending the schools where instruction occurred in person. Ultimately, however, we cannot disentangle whether features of specific schools (e.g., the context or quality) or the modality drive the effects. Future research can formally test this by randomly assigning students to receive tutoring in-person versus virtually.
Fifth, our data allow us to focus only narrowly on students’ school-based math-related experiences when considering interest in STEM. While mathematics instruction and support from teachers and tutors are undoubtedly important, students’ interest in STEM may also be shaped by their experiences in science classrooms. We find that the impact on STEM interest holds when accounting for students’ science teachers (see Appendix Table A7), which we would expect given the randomization. However, future studies can broaden the scope to include the contributions of science instruction.
Finally, we acknowledge the limitations of quantitative research for operationalizing gender. The limitations of our data necessitate the description of gender as a binary. Although tutors had the opportunity to report a gender other than male or female, student administrative data is limited to male or female. As a result, our conceptualization of how gender perceptions impact STEM beliefs and achievement is limited, and we hope future research can explore how tutoring influences the STEM beliefs of students who are nonbinary or gender nonconforming, as well as those who belong to the LGBTQIA+ communities.
Implications for Policy
Educators and researchers have long recognized the benefits of role models for student development, particularly among student subgroups that experience persistent opportunity gaps (e.g., Black students and females in STEM). Yet, teacher labor markets are difficult to change. Teachers often stay in their jobs for many years—which is a benefit for students—and, as a result, only a small fraction of the workforce changes each year. Moreover, teachers need not only college degrees but substantial additional preparation to obtain the certifications needed for teaching.
The tutor workforce is far more flexible. By finding that gender-matched tutors can have positive effects on female students’ interests and achievement, this study offers a faster, easier way to access the benefits of exposure to demographically similar educators. Education policies and practices can leverage a broader group of educators as role models—and as effective instructors—for students.
At the same time, we do not believe simply matching students and educators—including tutors—based on demographics is a viable policy solution. Although we observe that male student outcomes were similar no matter their tutors’ gender, there are potential negative externalities of students only being exposed to demographically similar educators. Instead, we propose that tutoring providers consider the communities they work in and strive to recruit a diverse set of tutors that reflect the backgrounds of the students they serve.
Our findings are consistent with social cognitive theory (Bandura, 1977), role model theory (Morgenroth et al., 2015), and stereotype threat theory (Spencer et al., 1999), which together suggest that exposure to female educators can strengthen self-efficacy, counter negative stereotypes, and enhance STEM motivation and achievement for girls. At the same time, future research should explore how to ensure that girls feel empowered to persist in STEM regardless of the gender of their tutor, so that supportive contexts and instructional practices—not only demographic match—drive equitable outcomes.
Putting our findings in the context of the gender imbalance in STEM fields, in particular, policies and practices could address this need by ensuring that enough female tutors are engaging in math or other STEM subjects. And if this type of high-impact tutoring is embedded into the school day in high school, girls would be exposed to the mentors they need to develop and sustain interest and achievement in STEM courses.
Our results are consistent with the idea that tutoring has beneficial effects beyond just achievement and that tutors can serve as an additional caring educator in a student's learning trajectory. However, the scope of tutoring programs can vary greatly. This program prioritized students meeting frequently with a consistent tutor, which likely enhanced relationship development. For tutoring programs to produce these benefits, they likely need comparably strong design and implementation.
No single solution will erase the gender gaps in STEM perpetuated by societal structures. Doing so will necessitate many strategies at different levels; we find that exposing female students at the outset of their high school career to female math tutors may be one promising practice for mitigating gender inequality in our education system.
Supplemental Material
Supplemental material, sj-pdf-1-aer-10.3102_00028312261417332 for The Impact of Tutor Gender Match on Girls’ STEM Interest, Engagement, and Performance by Joshua Bleiberg, Carly D. Robinson, Evan Bennett and Susanna Loeb in American Educational Research Journal
Footnotes
Acknowledgements
We are grateful to our partners in this research, including Saga Education. We especially thank Cathryn Cook for implementation and data support. We received insightful feedback and support from the National Student Support Accelerator team—in particular, Nancy Waymack.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We thank the Overdeck Family Foundation, Smith Richardson Foundation, Walton Family Foundation, and Arnold Ventures/Accelerate for their support of our full research program.
Notes
