Abstract
This article presents evidence on linear, nonlinear, and cumulative gender peer effects on test scores. The study utilizes exceptional Swedish data containing the history of the gender composition of students’ classrooms from first to ninth grades. The analysis builds on school fixed effects with the added advantage of observing within-school variation in gender composition across actual classrooms. In contrast to what is often suggested in the literature, results show that gender composition does not uniformly affect boys and girls. More female-dominated classrooms slightly increase girls’ and decrease boys’ test scores, but these effects mainly concern students in classrooms with very skewed gender distributions. Moreover, effect sizes are very small, suggesting that classroom gender composition should not be a primary policy concern except when imbalances are large and cumulatively sustained. Findings also underscore the importance of accounting for nonlinearities and cumulative effects in research on gender peer effects.
Peer interactions represent a crucial element of the school environment, and their influence on students’ academic outcomes has been well documented since the Coleman Report (Coleman et al. 1966). Most of the peer effects literature in education focuses on the effects of peer quality (proxied mostly by ability/performance) on students’ own school performance (Paloyo 2020; Sacerdote 2011). The effect of peers’ gender composition has received considerably less attention despite gender being associated with not only school performance but also school-related attitudes and behavior (OECD 2013, 2019). Through these differences, classrooms’ gender composition may shape the learning climate students experience (Lavy and Schlosser 2011) and the context for social comparison of one's own abilities and goals (Jonsson and Mood 2008; Modena, Rettore, and Tanzi 2022) while also possibly influencing the prevalence of gender stereotypes in the school context (Schneeweis and Zweimüller 2012). Students may thus perform differently in classrooms with different gender compositions, making gender peer effects on educational outcomes compelling to study.
The second motivation for estimating gender peer effects is that gender is a predetermined variable that cannot be affected by peers. This helps us circumvent the simultaneity bias (Manski 1993) present in most peer effects research. For this reason, some prior work has used gender to capture peer quality (Hoxby 2000). Third, the question of gender peer effects might have important policy implications in terms of optimal allocation of female and male students in schools or classrooms. Such research might also provide insight into the consequences of the disruption of gender balance in coeducational classes.
Hoxby’s (2000) and Lavy and Schlosser's (2011) seminal gender peer effect studies led to the narrative that a higher proportion of girls within the compulsory school context affects individual school performance positively for boys and girls (see Sacerdote 2011). However, both studies measure zero effects in many specifications. Subsequent articles have also produced findings inconsistent with this narrative (Briole 2021; Gottfried and Graves 2014; Hu 2015; Proud 2014). Precisely and consistently estimated gender peer effects are lacking in the Western context for two reasons: Previous studies either used survey data with limited sample sizes (making the estimation of nonlinearities and subgroup effects difficult) or relied on cohort-to-cohort variation in gender composition within schools. Between-cohort variation might be a good proxy for between-classroom variation in gender composition, but in practice, classroom composition arguably matters more because peer interactions are stronger at this level.
In this article, I estimate the effect of peer gender composition on test scores in compulsory education (sixth and ninth grades) using full population data from Sweden, where the gender composition of classrooms is registered throughout students’ compulsory school career. I link the gender composition of classrooms to standardized test scores in three core subjects (mathematics, Swedish, English) at two time points (age ≈12/grade six and age ≈15/grade nine). I estimate gender peer effects separately for boys and girls and for each of the three subjects.
The comprehensive nature of the data allows for several contributions to the literature. First, I measure the gender composition of classrooms precisely, without having to use the gender composition of school cohorts. This allows the empirical analysis to use between-classroom, within-school variation in gender composition (using school fixed effects), relying on the assumption that classroom assignment of students based on gender composition is conditionally random.
Second, I contribute by exploring nonlinear effects. This perspective has been overlooked in the literature, although methodology-focused studies point out that linear-in-means models might not be the most suitable to capture peer effects in the educational context (Burke and Sass 2013; Sacerdote 2011). I ask if there are thresholds over which gender peer composition matters more or less beyond the average effect of the percentage of female students in class. Full-population data have enough statistical power to measure even small effects with high precision and are likely to contain detailed information on even very small subgroups (e.g., students exposed to very skewed gender contexts).
Third, I introduce a new approach to the literature by analyzing cumulative effects of peer gender composition. Previous work assumes that peer gender composition at one point in time should affect test scores contemporaneously. Learning, however, is a cumulative process, and to the extent that gender composition affects learning, the history of exposure to same- and opposite-gender peers might matter. The administrative data sets I use contain the classroom gender composition students experienced from preschool class to grade six or grade nine, so I am able to study if the entire history of exposure to a certain classroom composition affects test scores. The possibility of this identification is an important motivation behind looking at the Swedish case.
The study is exploratory in nature. The combination of three subjects, two grades, and two genders, each studied in terms of contemporaneous and cumulative effects, means that I estimate a large set of different coefficients. By juxtaposing these results against each other, we can learn about whether gender peer effects vary along meaningful dimensions or reflect inconsistent and unsystematic differences in the Swedish, and in a broader sense, the Nordic, context. Peer effects are likely to be context dependent (Paloyo 2020), so external validity might be limited to other Western countries with high gender equality and with an education system that features comprehensive primary and lower secondary education without tracking. However, the study still offers conclusions that may be valuable in other contexts.
My results point toward more varied gender peer effects than what is often suggested in the literature: On average, in classrooms with more girls, girls’ test scores tend to slightly increase, and boys’ test scores tend to slightly decrease. However, these effects are driven by a few students in classes with very skewed gender distribution. This article also shows that gender peer composition affects test scores cumulatively, although by a small magnitude. This highlights that research on peer effects in other contexts should incorporate cumulative in addition to contemporaneous effects more rigorously. Finally, the estimates are not just academically interesting: The rather small effect sizes I find suggest that in a gender-equal and comprehensive school context, classroom gender composition ought not to be a primary parameter informing policy except, perhaps, when imbalances are large and sustained over time.
Why should we expect and focus on gender peer effects?
During adolescence, peer influence gains in importance, shaping student attitudes and behavior (Laursen and Veenstra 2021). Gender also becomes a prominent layer of identity during adolescence (Galambos 2004; Lobel et al. 2004). Students at this age are most susceptible to gender stereotypes, and peer influence peaks at around age 14 (Laursen and Veenstra 2021; Steinberg and Monahan 2007). Adolescence therefore seems the most fitting period in which to explore gender peer effects, because they might have the strongest effect on individual outcomes during this time.
Multiple mechanisms are possible, and most of them relate to differences in cognitive and noncognitive skills between genders and their effects on classroom learning environments. These differences are partly the product of developmental differences between genders in adolescence (Perry and Pauletti 2011), underscoring the importance of studying peer effects for girls and boys separately.
Ability Effects
On average, in most Western countries, girls perform better than boys in most school subjects (Buchmann, DiPrete, and McDaniel 2008; Voyer and Voyer 2014), meaning that one potential mechanism behind gender peer effects is positive ability peer effects, whereby students benefit from being in classes with higher ability peers (Burke and Sass 2013; Hanushek et al. 2003; Hoxby and Weingarth 2005; for Sweden, see Diemer 2022; Sund 2009). High-performing peers might act as role models: Students may learn from them, be stimulated to compete with them (Sacerdote 2011), or be motivated to study toward certain educational goals/routes (Fletcher 2012; Rosenqvist 2018). There is some evidence that positive ability/role model effects are stronger from peers of the same gender because students identify and spend more time with same-gender peers (Modena et al. 2022; Pagani and Pica 2021). Stronger same-gender ability effects are corroborated by the findings of Rosenqvist (2018) in the Swedish context.
Negative ability peer effects are also possible. At any given achievement level, a student surrounded by higher achieving peers will be among the worst performing students in class. Through social contrast mechanisms, students judge their own achievement and academic capacity in contrast to their peers’ achievement, akin to the big-fish-little-pond effect (Jonsson and Mood 2008). Consequently, having more female, on average higher achieving peers can decrease students’ academic self-concept (Fang et al. 2018), their school performance (Attewell 2001; Dobbie and Fryer 2014), and as evidence from the Swedish context suggests, their likelihood to make high-aspiring educational choices (Jonsson and Mood 2008).
However, as Hoxby’s (2000) and Whitmore’s (2005) results suggest, the percentage of girls in a classroom also matters after taking the average achievement of all peers into account. Mouganie and Wang (2020) show that peer gender composition has an effect on educational outcomes beyond peer abilities. Hence, ability peer effects cannot fully account for gender peer effects, which is a good reason for not interpreting gender composition merely as a proxy for ability composition, as is customary in many studies (Sacerdote 2011). Gender peer effects should be studied in their own right.
Learning Climate
More female-dominated classrooms might have a better learning climate. Girls tend to demonstrate higher average levels of noncognitive skills that are important facilitators of educational success: attentiveness, self-discipline, and interest in learning (Buchmann et al. 2008). These may help those who possess them (Buchmann et al. 2008; DiPrete and Jennings 2012) but can also create a better learning climate and less disruption for peers in the classroom (Lavy and Schlosser 2011). Male students, on the other hand, are more disruptive than girls, on average (Zill and West 2001), and have more exposure to friends who demonstrate resistance to schooling even in the Swedish context (Geven, Jonsson, and van Tubergen 2017). Disruptive behavior can negatively affect both own and peers’ performance by slowing the pace of instruction (Figlio 2007; Lazear 2001). Borgen, Borgen, and Birkelund (2023) corroborate these mechanisms in the Nordic context.
Better classroom behavior can also lead to positively biased teacher assessments of girls’ performance (DiPrete and Jennings 2012). In a female-dominated learning environment, teachers may favor practices that benefit girls, including relying on homework, independent learning, and effort-based feedback (Briole 2021). These practices might affect girls’ and boys’ school progress differently, beyond teacher-assigned grades.
Stereotypes
Stereotypes are partly related to the learning climate because students behave or change their orientation toward school according to socially reinforced stereotypes (e.g., boys are more disruptive, competitive, and naturally gifted; girls are less easily distracted, cooperative, and have to work hard to learn; see Legewie and DiPrete 2012).
Other stereotype-related mechanisms are possible, too. Studies report gender-stereotypical subject-specific orientations in which boys have more positive attitudes toward math and sciences and girls have deeper engagement in reading-related tasks internationally and in Sweden (OECD 2013, 2019). These phenomena are likely reinforced in the classroom, and students may have better performance in gender-typical subjects when their class is dominated by their own gender.
Studies from other European contexts have also found stereotype mechanisms that contradict the aforementioned mechanisms. Girls might be less likely to behave according to gender stereotypes if they are surrounded by more female peers. In such contexts, girls are less risk-averse and more competitive (Booth and Nolen 2012a, 2012b; Niederle, Segal, and Vesterlund 2013), and based on findings from Austria, where gender equality is comparable to Sweden (UNDP 2024), they are also less likely to choose female-dominated vocational schools (Schneeweis and Zweimüller 2012). No such outcome has been reported so far for boys. Schneeweis and Zweimüller (2012) suggest this is because traditional female stereotypes are dampened in settings with a high share of female peers, which could potentially mean more competition and better performance in math among girls in more female-dominated classrooms.
Summary of Mechanisms
Specific hypotheses regarding the size and direction of gender peer effects cannot be formulated based on these theoretical mechanisms. We can only expect the different channels to push the gender peer effects in different directions (see Table 1), although these mechanisms cannot be tested in my data. It seems plausible that both girls’ and boys’ test scores are affected positively by a greater share of girls in the classroom through the positive ability channel and negatively by the negative ability channel. Learning climate in female-dominated classes might affect test scores positively for all via less disruption and positively for girls and negatively for boys via teacher bias. Finally, if traditional stereotypes are reinforced, girls’ literacy scores might increase and boys’ math scores might decrease in a more female-dominated classroom (and boys would do better in math in male-dominated classes). But if stereotypical behavior is dampened, it might lead to better math performance among girls. We do not know if the stereotype dampening effect for boys could also prevail in male-dominated settings. The latter would mean a more male-dominated classroom would increase boys’ literacy test scores (thus, a more female-dominated one would decrease them).
Summary of Different Mechanisms behind Gender Peer Effects.
Evidence On Gender Peer Effects
Hoxby’s (2000) pioneering work on the United States and Lavy and Schlosser's (2011) on Israel document mixed results regarding gender peer effects. Hoxby (2000) focuses on Grades 3 to 6 and Lavy and Schlosser (2011) on Grades 5, 8, and 10. Depending on the specification, they find either null or positive effects of a greater female proportion in school cohorts on test scores for both genders. Importantly, however, the majority of estimates they report, albeit positive, are imprecise and nonsignificant. Furthermore, the statistically significant effects are small to moderate: For every 10 percentage-point increase in the proportion of girls in the school cohort, Hoxby (2000) finds 2 percent of a standard deviation increase in test scores, and Lavy and Schlosser (2011) find a corresponding increase of 3 percent to 8 percent of a standard deviation.
The narrative that a higher share of female students generally leads to better test scores remains prevalent, although subsequent articles from different countries find even more mixed results (for a detailed overview of results from studies directly comparable to the present one, see Tables A1 and A2 in the online supplement). Studies show zero or positive effects of more girls in a class on girls’ test scores across various cultural contexts (United States, Western Europe, China) and more consistently so for math than for English (Briole 2021; Fang et al. 2018; Gong, Lu, and Song 2021; Gottfried and Graves 2014; Proud 2014). The effects on boys’ performance are highly variable, with some studies showing no effects for the United States (Gottfried and Graves 2014), negative effects in Western Europe (Briole 2021; Proud 2014), and positive effects in China (Gong et al. 2021; Hu 2015). The high variability in effects across countries may be due to a general sensitivity to specification but might also have to do with the different country settings. Masculinity or femininity norms and cultural valuing of different subjects might vary between the United States, Europe, and China. Additionally, most estimations relying on between-classroom variation in gender composition do not cover Western Europe, only China (Gong et al. 2021; Hu 2015) and the United States (Gottfried and Graves 2014).
Gender peer effect studies from the Nordic countries are scarce. Black, Devereux, and Salvanes (2013) find that more female peers lead to more years of schooling for girls and fewer years for boys in Norway. Borgen et al. (2023) estimate gender composition effects on student outcomes using a pooled sample by gender, also using Norwegian data. They find an overall negative average effect of a more female-dominated school cohort, decreasing student performance and the likelihood of attending an academic upper secondary track (with estimates being robust to using a value-added specification).
Results on nonlinear gender compositional effects are limited. Using a U.S. sample, Hoxby (2000) finds that gender balance or heavy female domination (at least 66 percent female peers) is beneficial in most gender–school subject combinations, but Proud’s (2014) and Briole’s (2021) results for Western Europe suggest that heavy female domination might hurt boys’ test scores. These estimates come from cohort-variation studies; we have no estimates using between-classroom variation in gender composition.
Given the evidence from the Nordics and Western Europe, we should not expect positive effects from more female-dominated classes for both genders because negative effects are also likely, especially for boys’ outcomes.
To date, all prior work in this vein tests whether gender peer composition has a concurrent effect on test scores. However, analyzing the cumulative effects of exposure to a certain peer composition in the classroom seems important. For instance, the effects of cumulative exposure to peer stress (Agoston and Rudolph 2016), bullying by peers (Evans et al. 2019), or older peers (Lam, Marteleto, and Ranchhod 2013) have been shown to influence students’ behavior. Furthermore, Modena et al. (2022) suggest that the exposure to high-performing female peers has long-term effects on educational attainment several years later due to higher human capital accumulation. Briole (2021) also finds long-term gender peer effects on educational attainment, suggesting the persistence of improved learning-related noncognitive skills as a mechanism. Although these studies do not test cumulative peer effects directly, the idea of peer gender composition affecting test scores in a persistent and accumulative manner is in line with the suggested mechanisms and with the cumulative nature of the educational production function (Hanushek 1979).
The Swedish Educational Setting
Sweden is one of the most gender-equal countries (UNDP 2024). It is a welfare state with a standardized and nonstratified educational system (Hällsten and Yaish 2022), emphasizing equality and inclusivity. Compulsory school consists of grades one to nine. Most children start grade one in August of the year they turn 7 and finish grade nine in June of the year they turn 16. All schools are coeducational, and all students follow the same national curriculum. Students receive grades from sixth to ninth grade. Tracking in primary and lower secondary school is not allowed. This is an important system feature to consider because any within-school tracking or grouping based on achievement could affect gender composition. Classes are often regrouped at grade four and at grade six or seven, but a class can also remain intact for a student's full school career. In the spring of sixth and ninth grades, every student takes national tests. The tests are a tool to ensure equality in grading across schools, and they are used as part of the individual-level grading.
Students attend most lessons together with the same classmates. Families can choose among municipal schools or “free schools” (independent schools), and if the demand for a school exceeds supply, different rules of priority apply (e.g., geographic closeness in the case of municipal schools; queue time in the case of free schools). Both municipal and independent schools are tuition-free and financed by municipal and compensatory state funding (schools are less dependent on the tax base in the local area; OECD 2017).
Data And Descriptive Statistics
The data are drawn from Swedish administrative registers and cover students who participated in the Swedish national tests (Nationella prov) in sixth or ninth grade from 2013 to 2019, comprising seven cohorts in each grade. Swedish, English, and mathematics test scores come from the test score registers. Test scores originally reflected six performance categories (A–F). Because the interval between the two lowest categories (F and E, i.e., fail and pass) is larger than the other intervals, I converted the categories to test scores from 0 to 20 in a way that reflects the actual differences (see the distributions in Table A3 in the online supplement). In the analysis, I use all test scores z-standardized (M = 0, SD = 1) per grade cohort and school subject.
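The conversion and standardization steps can be sketched in Python. This is a minimal illustration, not the article's code: the letter-to-point mapping below is an assumption borrowed from the Swedish merit-point scale (F = 0, E = 10, D = 12.5, C = 15, B = 17.5, A = 20), which has the wider F/E gap the text describes, and the record keys and the function name standardize_scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# ASSUMPTION: mapping modeled on the Swedish merit-point scale; the article
# only states that the F/E gap is wider than the gaps between passing grades.
POINTS = {"F": 0.0, "E": 10.0, "D": 12.5, "C": 15.0, "B": 17.5, "A": 20.0}


def standardize_scores(records):
    """Map letter results to 0-20 points, then z-standardize within each
    (grade, cohort, subject) cell so that each cell has mean 0 and SD 1.

    records: list of dicts with the hypothetical keys 'grade', 'cohort',
    'subject', and 'letter'. Returns copies of the dicts with an added 'z'.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["grade"], r["cohort"], r["subject"])].append(POINTS[r["letter"]])

    # Population mean and standard deviation of points in each cell
    stats = {key: (mean(v), pstdev(v)) for key, v in cells.items()}

    standardized = []
    for r in records:
        m, sd = stats[(r["grade"], r["cohort"], r["subject"])]
        # A cell in which everyone has the same score gets z = 0
        z = (POINTS[r["letter"]] - m) / sd if sd > 0 else 0.0
        standardized.append({**r, "z": z})
    return standardized
```

Standardizing within each grade-cohort-subject cell means a z-score of 0.1 always refers to one tenth of a standard deviation relative to the same test taken by the same cohort.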
I used the compulsory school registers to gather compositional data for each student's classroom throughout their compulsory school career. Classroom IDs are provided for each year, but these IDs are not consistent across years, which complicates identifying a student's classroom membership across years (i.e., class 8F in a school may not be the same as next year's 9F in the same school). I identified if a student was in the same classroom between years based on the classroom composition, and I regard two classrooms as the same in two consecutive years if at least 70 percent of students overlap in them. Students who repeated grades or are in all-male/female classes are excluded.
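The 70 percent overlap rule for linking classroom IDs across years can be sketched as follows. The article does not say which class size the overlap is divided by; this sketch assumes the larger of the two classes. Because classes within a year are disjoint, that choice also ensures that no two classes in year t can be linked to the same class in year t+1. The function name link_classrooms and the input layout are hypothetical.

```python
def link_classrooms(classes_t, classes_next, threshold=0.70):
    """Link each class in year t to the class in year t+1 that continues it.

    classes_t, classes_next: dicts mapping a year-specific class ID (e.g. '8F')
    to the set of student IDs in that class.
    Returns {class_id_t: class_id_next} for pairs whose overlap >= threshold.

    ASSUMPTION: overlap = shared students / size of the larger class.
    """
    links = {}
    for cid_t, students_t in classes_t.items():
        if not students_t:
            continue
        best_id, best_share = None, 0.0
        for cid_next, students_next in classes_next.items():
            if not students_next:
                continue
            shared = len(students_t & students_next)
            share = shared / max(len(students_t), len(students_next))
            if share > best_share:
                best_id, best_share = cid_next, share
        if best_share >= threshold:
            links[cid_t] = best_id
    return links


# Toy example: 8 of 10 students in 8F continue together in 9F (80 percent),
# while only 6 of 10 students in 8G end up together in 9X (60 percent).
year8 = {"8F": set(range(1, 11)), "8G": set(range(11, 21))}
year9 = {"9F": set(range(1, 9)) | {21, 22}, "9X": set(range(11, 17)) | {23, 24, 25, 26}}
print(link_classrooms(year8, year9))  # {'8F': '9F'}
```

The example mirrors the problem named in the text: the label 9F carries no information by itself, and only the overlap in students identifies it as the continuation of 8F.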
Classroom sizes originally ranged from 1 to 154 students in sixth grade and from 1 to 144 students in ninth grade, with an average class size of 25 and 26 students, respectively (see Figure A1 in the online supplement). The largest classrooms were in schools where grade cohorts are not divided into fixed classes, so the number of students in the class equals the number in the grade cohort. I restricted my sample to students in classrooms with 10 to 40 people, excluding 9 percent and 6 percent of students in sixth and ninth grades, respectively. The final contemporaneous sample consists of 622,000 and 619,000 observations in sixth and ninth grades, respectively, with roughly 80,000 to 100,000 students per grade cohort at both levels. Table 2 shows descriptive statistics for test scores in the contemporaneous sample by grade level and gender, pooled across cohorts.
Descriptive Statistics on Test Scores in the Main Contemporaneous Sample (Students in 10- to 40-Person Classes), by Grade Level and Gender.
Part two of the online supplement contains further details on sample restrictions and descriptive statistics on test scores by cohort and on students’ social background. I also present descriptive statistics on samples with different class sizes to show that restricting the database to 10- to 40-person classes does not change the descriptives for any of the presented variables compared to the sample with all class sizes.
The main explanatory variable of my study is the share of female students in a class. In the first part of the analysis, I examine contemporaneous effects using the percentage of female classmates (excluding ego) in sixth or ninth grade as the main explanatory variable. Figure 1 shows the distribution of this variable in the sample with 10- to 40-person classes (see the averages by grade level and gender in Table 2). Approximately 62 percent to 64 percent of students have 40 percent to 60 percent female classmates (the distribution is similar even when including extremely small and big classes). Almost 99 percent of students in classes with 10 to 40 people have between 20 percent and 80 percent female classmates, which leaves us with a narrow 1 percent of students who experience an almost all-male or all-female classroom.
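The leave-one-out definition of the main explanatory variable has one consequence worth making explicit: within the same classroom, girls and boys face slightly different shares, because ego is removed from both the numerator and the denominator. A minimal sketch (the function name is hypothetical):

```python
def share_female_classmates(is_female):
    """Percentage of female classmates for each student in one classroom,
    excluding the student (ego) from both numerator and denominator.

    is_female: list of booleans, one per student (True = girl).
    """
    n = len(is_female)
    if n < 2:
        raise ValueError("a classroom needs at least two students")
    n_girls = sum(is_female)
    return [100 * (n_girls - girl) / (n - 1) for girl in is_female]


# A class of 10 with 6 girls: girls see 5/9, boys see 6/9 female classmates
shares = share_female_classmates([True] * 6 + [False] * 4)
print(round(shares[0], 1), round(shares[-1], 1))  # 55.6 66.7
```

The gap between the two values shrinks as class size grows, which is one reason the measure behaves more predictably once very small classes are excluded.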

Distribution of students in classes with 10 to 40 students by percent female classmates.
It is important to note how much of the variation in peer gender composition and test scores lies between schools, between grade cohorts within schools, and between classrooms. Intraclass correlations show that the class level explains a higher share of the variation in all outcome variables than does the grade cohort level, which accounts for around 10 percent in all cases (see Table A8 in the online supplement). Moreover, classroom gender composition varies more than cohort gender composition at both grade levels (see Table A9).
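To illustrate what it means for variation to sit at the school, cohort, or classroom level, the sketch below splits the total variance of a variable into nested components. It is a descriptive, individual-weighted decomposition based on the law of total variance, not necessarily the intraclass correlation model behind Table A8; the input layout and the function name variance_shares are assumptions.

```python
from collections import defaultdict


def variance_shares(rows):
    """Share of the total variance of y at each nested level.

    rows: list of (school, cohort, class_id, y) tuples. Each student's deviation
    from the grand mean is split into school, cohort-within-school,
    class-within-cohort, and within-class parts; summed over students, the
    squared parts add up exactly to the total sum of squares.
    """
    def means_by(key):
        sums, counts = defaultdict(float), defaultdict(int)
        for row in rows:
            sums[key(row)] += row[3]
            counts[key(row)] += 1
        return {k: sums[k] / counts[k] for k in sums}

    grand = sum(row[3] for row in rows) / len(rows)
    school_m = means_by(lambda r: r[0])
    cohort_m = means_by(lambda r: (r[0], r[1]))
    class_m = means_by(lambda r: (r[0], r[1], r[2]))

    parts = {"school": 0.0, "cohort": 0.0, "class": 0.0, "within": 0.0}
    for s, c, k, y in rows:
        parts["school"] += (school_m[s] - grand) ** 2
        parts["cohort"] += (cohort_m[(s, c)] - school_m[s]) ** 2
        parts["class"] += (class_m[(s, c, k)] - cohort_m[(s, c)]) ** 2
        parts["within"] += (y - class_m[(s, c, k)]) ** 2
    total = sum(parts.values())
    if total == 0:
        raise ValueError("y does not vary")
    return {level: v / total for level, v in parts.items()}


# One school, one cohort, two classes: half the variance lies between classes
rows = [("A", 2015, "6a", 2.0), ("A", 2015, "6a", 0.0),
        ("A", 2015, "6b", -2.0), ("A", 2015, "6b", 0.0)]
print(variance_shares(rows))
# {'school': 0.0, 'cohort': 0.0, 'class': 0.5, 'within': 0.5}
```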
In the cumulative part of the analysis, the main independent variable expresses the number of years a student has spent in a female- or male-majority classroom up until sixth or ninth grade. I use two cutoffs for majority, 60 percent and 70 percent, and I count all years the student has spent with at least 60 percent or 70 percent girls (or boys) in the classroom up until sixth and ninth grades.
To get a sample where students are reliably comparable to each other based on their cumulative exposure variable, I created the cumulative sample from students who can be observed for at least six years in the sixth-grade analysis and nine years in the ninth-grade analysis. This restriction constrains the number of cohorts we can use because classroom composition data are only available from the 2008–2009 school year onwards (cohorts for sixth grade: 2014 to 2019; for ninth grade: 2017 to 2019). For consistency with the contemporaneous analyses, the cumulative sample includes only students who spent all six or all nine school years in a class with 10 to 40 people (keeping around two thirds of students from the relevant cohorts in sixth and ninth grades). Tables 3 and 4 show the distribution of students in the cumulative sample by the number of years they spent in a classroom with at least 60 percent or 70 percent female or male students.
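The construction of the cumulative exposure variable and the cumulative-sample restrictions can be sketched as follows. Assumptions: each year is recorded as a (class size, number of girls) pair; the 60 or 70 percent threshold applies to the whole class including the student, which the text does not specify; thresholds are compared in integer arithmetic to avoid floating-point edge cases at exactly 60 or 70 percent; and the optional top code (e.g., 7 in ninth grade, our reading of the table notes) is a parameter of this hypothetical function.

```python
def cumulative_exposure(history, required_years, threshold_pct=60,
                        female=True, top_code=None):
    """Years a student spent in a class with at least `threshold_pct` percent
    girls (female=True) or boys (female=False), up to the test grade.

    history: list of (class_size, n_girls) pairs, one per school year.
    Returns None if the student falls outside the cumulative sample, i.e. is
    observed for fewer than `required_years` years (6 or 9) or was in a class
    with fewer than 10 or more than 40 students in any year.
    """
    if len(history) < required_years:
        return None
    if any(not 10 <= size <= 40 for size, _ in history):
        return None

    years = 0
    for size, n_girls in history:
        n_group = n_girls if female else size - n_girls
        # Integer comparison: n_group / size >= threshold_pct / 100
        if 100 * n_group >= threshold_pct * size:
            years += 1
    # Optional top code for sparsely populated high values
    return years if top_code is None else min(years, top_code)


# Six years up to grade six: female shares 60, 60, 40, 72, 50, 70 percent
history = [(20, 12), (20, 12), (25, 10), (25, 18), (30, 15), (30, 21)]
print(cumulative_exposure(history, 6))                    # 4
print(cumulative_exposure(history, 6, threshold_pct=70))  # 2
print(cumulative_exposure(history, 6, female=False))      # 1
```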
Distribution of Students by Number of Years Spent in a Female-Dominated Classroom.
Note: Number of years of exposure is top coded in ninth grade due to a very low number of observations with a value above seven years. Cumulative sample: observed for six/nine years and in classes with 10 to 40 students every year until grade six or nine.
Distribution of Students by the Number of Years They Spent in a Male-Dominated Classroom.
Note: Number of years of exposure is top coded in ninth grade due to a very low number of observations with a value above seven years. Cumulative sample: observed for six/nine years and in classes with 10 to 40 students every year until grade six or nine.
Because classes are often regrouped in fourth grade and in sixth or seventh grade, classroom stability varies in the cumulative sample. As shown in Table 5, most sixth graders have been in a stable class since first or fourth grade (or have not had a stable class at all), and most ninth graders have been in a stable class since seventh or sixth grade. Among ninth graders, then, regrouping started in sixth grade for only a small share of students, and most were regrouped in seventh grade. Hence, the classes observed in grade six are, for the most part, not newly formed. I use the class stability measure in the cumulative models to control for differences in the timing of regrouping among students.
Distribution of Students by the Number of Years of Classroom Stability.
Note: N = 1 in sixth (ninth) grade means stable class since grade five (eight). Cumulative sample: observed for six/nine years and in classes with 10 to 40 students every year until grade six or nine.
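One way to compute the class stability measure from linked classroom histories, consistent with the table note (N = 1 in sixth grade means a stable class since grade five), is to count how many consecutive preceding grades share the test-grade class. The input, a sequence of persistent class identifiers built from the year-to-year links, is an assumption, as is the function name class_stability.

```python
def class_stability(class_lineage):
    """Number of consecutive earlier grades spent in the test-grade class.

    class_lineage: persistent class identifiers in grade order, ending with the
    test grade; consecutive grades share an identifier when the classes were
    linked as the same class. 0 means the class was newly formed in the test
    grade; 1 (in sixth grade) means a stable class since grade five.
    """
    years = 0
    i = len(class_lineage) - 1
    while i > 0 and class_lineage[i - 1] == class_lineage[i]:
        years += 1
        i -= 1
    return years


# Grades 1-6: regrouped in grade four, then stable through grade six
print(class_stability(["a", "a", "a", "b", "b", "b"]))  # 2
```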
Empirical Strategy
Identification in the Contemporaneous Analysis
According to Manski (1993), peer outcomes can be positively correlated with each other due to the effect of peers’ characteristics on ego's behavior (exogenous/contextual peer effects), the effect of peers’ behavior on ego's behavior (endogenous peer effects), and the unobserved correlated characteristics of peers (correlated effects).
Identifying causal peer effects in observational studies presents some challenges that are well known in the literature. The first is the reflection or simultaneity problem, that is, the difficulty of disentangling the effect peers have on ego from the effect ego has on peers (Manski 1993). This is particularly problematic if peer effects are proxied by peer ability because in that case, not only is individual ability affected by average peer ability, but at the same time, the peer average is also affected by individual ability. This bias is not inherently present if peer effects are measured by a preassigned student characteristic, such as gender, because individuals’ and peers’ gender cannot affect each other simultaneously. But controlling for peer ability would introduce this bias back into the analysis. A peer ability control would not only potentially filter out parts of the gender peer effect running through the performance channel, but it would do so in a biased way because of the reflection problem. For this reason, I decided to capture the full extent of the gender compositional effect in one parameter, acknowledging that besides the contextual effect, it may incorporate an endogenous part running through the performance of peers of a particular gender. In effect, this estimates a “reduced form” parameter, collapsing peer effects into one estimate (cf. Sacerdote 2011).
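A stylized linear-in-means model in the spirit of Manski (1993) makes the “reduced form” interpretation concrete. This is not the article's estimating equation: it pools boys and girls, treats classes as large enough to ignore the leave-one-out distinction, and assumes no correlated effects. Let x_i = 1 if student i is female, x̄_c be the share of girls in class c, ȳ_c the class mean test score, β the endogenous effect, and γ the contextual effect. Averaging the first line over the class and solving for ȳ_c gives the second; substituting back gives the third.

```latex
\begin{aligned}
y_{ic} &= \alpha + \beta\,\bar{y}_{c} + \gamma\,\bar{x}_{c} + \delta\,x_i + \varepsilon_{ic} \\
\bar{y}_{c} &= \frac{\alpha + (\gamma + \delta)\,\bar{x}_{c}}{1 - \beta} \\
y_{ic} &= \frac{\alpha}{1 - \beta}
  + \underbrace{\frac{\gamma + \beta\delta}{1 - \beta}}_{\pi}\,\bar{x}_{c}
  + \delta\,x_i + \varepsilon_{ic}
\end{aligned}
```

The single coefficient π on the share of girls thus mixes the contextual effect γ with the endogenous channel running through peers' performance (β), and it reduces to γ only when β = 0.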
The second challenge in analyzing peer effects is the issue of selection: Peers might endogenously sort into peer groups based on shared characteristics that also determine the outcome. For instance, higher-performing girls might be more likely to end up in female-majority classes. This issue might be handled by showing that assignment into peer groups is as good as random. To this end, some gender peer effect studies utilize cohort-to-cohort variation in gender composition (Black et al. 2013; Borgen et al. 2023; Briole 2021; Hoxby 2000; Lavy and Schlosser 2011; Proud 2014), and others use random classroom assignment (Gong et al. 2021; Gottfried and Graves 2014; Hu 2015). Like the latter studies, I rely on between-classroom variation, and I show that the gender composition of classes within schools is as good as conditionally random.
I check the within-school gender composition between classes and cohorts for randomization because it is variation at this level that I rely on throughout the analysis. It seems plausible that this variation is exogenous because, in the Swedish school system, families can rarely influence which class their children are assigned to at school starting age. Even if parents did influence classroom assignment, any sorting is unlikely to be correlated with a classroom’s gender composition because this composition is not known to parents beforehand. Nonrandom sorting can occur, however, if the school assigns students to classes based on some characteristic correlated with gender or if students change classes between grades. Another potential source of bias would arise if more competent teachers prefer (and are able) to be allocated to female-majority classes. This possibility, however, cannot be tested with the data.
To test whether sorting into classes is random based on gender, I use sorting and balancing tests suggested by de Gendre and Salamanca (2020) on my main contemporaneous sample. In sorting tests, I regress students’ gender on the share of female classmates controlling for the school-level leave-out mean of the share of girls (as suggested by Guryan, Kroft, and Notowidigdo 2009). Statistically significant coefficients indicate gender-based sorting between classes. In balancing tests, I regress the share of female classmates on observable student characteristics. Significant results indicate that gender composition is correlated with characteristics that might also affect test scores. School fixed effects and class size are controlled for throughout (for more details on the tests, see part three of the online supplement).
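To make the logic of the sorting test concrete, the sketch below simulates students in schools and regresses own gender on the leave-out share of female classmates while controlling for the school-level leave-out share of girls, in the spirit of Guryan, Kroft, and Notowidigdo (2009). It is a minimal illustration under simplifying assumptions, not the author’s code: the data are simulated, the function names (`leave_out_share`, `sorting_coefficient`, `simulate`) are hypothetical, and it omits the school fixed effects and class-size controls used in the article. Under random assignment, the coefficient on the classmate share should be close to zero; when girls are deliberately packed into the same classes, it becomes clearly positive.

```python
import numpy as np


def leave_out_share(female, group):
    """Share of girls among the other members of each student's group (ego excluded)."""
    female = np.asarray(female, dtype=float)
    # Map arbitrary group labels to 0..G-1 so bincount can aggregate by group.
    _, idx = np.unique(np.asarray(group), return_inverse=True)
    girls = np.bincount(idx, weights=female)
    size = np.bincount(idx)
    return (girls[idx] - female) / (size[idx] - 1)


def sorting_coefficient(female, cls, school):
    """OLS coefficient of own gender on the leave-out classmate share,
    controlling for the school-level leave-out share (no fixed effects here)."""
    y = np.asarray(female, dtype=float)
    peer = leave_out_share(y, cls)
    school_lo = leave_out_share(y, school)
    X = np.column_stack([np.ones_like(peer), peer, school_lo])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def simulate(n_schools=2000, n_classes=3, class_size=25, packed=False, seed=0):
    """Hypothetical data: each school draws genders at random, then splits
    students into equal-sized classes, either by shuffling (random assignment)
    or by placing girls first (gender-based sorting)."""
    rng = np.random.default_rng(seed)
    n = n_classes * class_size
    female, cls, school = [], [], []
    for s in range(n_schools):
        g = rng.integers(0, 2, size=n)
        g = np.sort(g)[::-1] if packed else rng.permutation(g)
        female.append(g)
        cls.append(s * n_classes + np.arange(n) // class_size)  # unique class ids
        school.append(np.full(n, s))
    return np.concatenate(female), np.concatenate(cls), np.concatenate(school)


for packed in (False, True):
    f, c, s = simulate(packed=packed, seed=1)
    print("packed" if packed else "random", round(sorting_coefficient(f, c, s), 3))
```

The school-level leave-out control matters most when comparisons are made within schools of fixed composition, where sampling without replacement mechanically makes own and classmates’ gender negatively correlated; in this simplified simulation, genders are drawn independently, so the random-assignment coefficient is centered on zero either way.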
The sorting tests are consistent with random assignment into classes in ninth grade but indicate some sorting in sixth grade. The balancing tests show slight imbalance on observable characteristics (smaller in ninth grade). Given the small point estimates, however, I argue that the imbalance is unlikely to be large enough to confound my estimates. Still, I treat assignment as conditionally random and control for social background in my models, both to account for any imbalance and to gain precision through smaller standard errors. Causal interpretation relies on the restrictive assumption that the background characteristics I control for also account for all unobservable factors determining assignment. If there is sorting on unobservables that these controls do not fully capture, it would bias my results upward, meaning my coefficients may slightly overstate the gender peer effect.
In my analysis, I test two contemporaneous specifications, a linear and a categorical one:
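Written in standard school fixed-effects form and using the variable definitions that follow, the two specifications can plausibly be sketched as below. This is a hedged reconstruction rather than the author’s exact notation: the coefficient names, the vector of social background controls X, the school fixed effect μ_j, and the error term ε are assumptions.

```latex
\begin{aligned}
Y_{ijk} &= \beta_0 + \beta_1\,\mathit{femperc}_{ijk} + \gamma\,\mathit{Nclass}_{ijk}
          + \mathbf{X}_{ijk}'\boldsymbol{\delta} + \mu_j + \varepsilon_{ijk},\\
Y_{ijk} &= \beta_0 + \beta_2\,\mathbb{1}[\mathit{Cat\_femperc}_{ijk} = {<}30\%]
          + \beta_3\,\mathbb{1}[\mathit{Cat\_femperc}_{ijk} = {>}70\%]
          + \gamma\,\mathit{Nclass}_{ijk} + \mathbf{X}_{ijk}'\boldsymbol{\delta} + \mu_j + \varepsilon_{ijk},
\end{aligned}
```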
where the indices i, j, and k stand for student i in school j in grade k (sixth or ninth). Y denotes standardized test scores in mathematics, Swedish, or English. The variable femperc is the percentage of female students among one’s classmates, calculated individually for each student (excluding ego). It is a continuous variable, expressed in units of 10 percentage points for readability. Cat_femperc is its categorical version, with three categories for the percentage of female classmates (<30 percent, 30 percent to 70 percent, and >70 percent). The baseline category throughout is 30 percent to 70 percent.
Controls include Nclass, meaning the number of students in one's class (including ego). I control for class size because it proxies important compositional factors that relate to the share of girls in the classroom (see part two of the online supplement).
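The leave-out share and its categorical version can be computed per classroom as in the following minimal sketch. The function name `classroom_measures` is hypothetical, and assigning exactly 30 and 70 percent to the middle category is an assumption, since the text only lists <30, 30 to 70, and >70 percent.

```python
def classroom_measures(female):
    """For one classroom, return (femperc, cat_femperc) for every student.

    female: list of 0/1 flags (1 = girl), one per student in the class.
    femperc is the share of female classmates excluding ego, in units of
    10 percentage points; cat_femperc is its three-category version.
    """
    n = len(female)
    if n < 2:
        raise ValueError("a leave-out share needs at least two students")
    n_girls = sum(female)
    out = []
    for f in female:
        # Percent female among the other n - 1 students (ego removed).
        pct = 100 * (n_girls - f) / (n - 1)
        if pct < 30:
            cat = "<30"
        elif pct > 70:
            cat = ">70"
        else:
            cat = "30-70"  # baseline category (boundaries assumed inclusive)
        out.append((pct / 10, cat))
    return out


# Hypothetical class of 11: 3 girls, 8 boys.
print(classroom_measures([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])[:4])
```

This prints `[(2.0, '<30'), (2.0, '<30'), (2.0, '<30'), (3.0, '30-70')]`: excluding ego means that girls and boys in the same class face different shares, so here the girls fall below 30 percent while the boys sit exactly at the 30 percent boundary.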
Because the analysis is exploratory, I have no specific hypotheses about differences across subjects, grades, or genders beyond the expected directions in which the different mechanisms might push the effects (see Table 1). The analysis produces many estimates, which I do not adjust for multiple comparisons; I therefore interpret the results in terms of tendencies and patterns rather than treating only statistically significant estimates as evidence of effects. Standard errors nevertheless remain useful indicators of uncertainty.
Robustness checks
I test the sensitivity of the results to dropping social background controls, and I also show results for the unrestricted sample, which includes classes with fewer than 10 and more than 40 students. Finally, I test a linear specification using the within-school, between-cohort variation in gender composition as the explanatory variable, which, as Hoxby (2000) suggests, largely circumvents the issue of sorting into classes at the expense of a less precise definition of the peer group. In this test, the size of the school grade is controlled for, and standard errors are clustered at the grade level.
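The between-cohort variation can be illustrated with a minimal sketch: compute each school cohort’s share of girls and take its deviation from the school’s share of girls pooled over all cohorts, which is the variation in this regressor that remains once school fixed effects are included in a student-level regression. The function name and input layout are hypothetical, and this is not presented as the author’s procedure.

```python
from collections import defaultdict


def cohort_shares_and_deviations(students):
    """students: iterable of (school, cohort, female) rows, female in {0, 1}.

    Returns {(school, cohort): (cohort_share, deviation)}, where deviation is
    the cohort's share of girls minus the school's share of girls pooled over
    all cohorts (the student-weighted school mean that fixed effects remove).
    """
    cohort = defaultdict(lambda: [0, 0])  # (school, cohort) -> [girls, students]
    school = defaultdict(lambda: [0, 0])  # school -> [girls, students]
    for s, c, f in students:
        cohort[(s, c)][0] += f
        cohort[(s, c)][1] += 1
        school[s][0] += f
        school[s][1] += 1
    result = {}
    for (s, c), (girls, n) in cohort.items():
        share = girls / n
        school_share = school[s][0] / school[s][1]
        result[(s, c)] = (share, share - school_share)
    return result


# Hypothetical school "A": 1 girl of 4 in one cohort, 3 girls of 4 in the next.
rows = [("A", 2010, 1)] + [("A", 2010, 0)] * 3 + [("A", 2011, 1)] * 3 + [("A", 2011, 0)]
print(cohort_shares_and_deviations(rows))
```

Because every student in a cohort shares the same value, this measure cannot reflect sorting between classes within a grade, which is why the approach sidesteps that problem at the cost of a coarser peer group.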
Identification in the Cumulative Analysis
I rely on the student-level, within-school, between-class variation in cumulative exposure to female- or male-dominated classrooms. I use the following school fixed-effect model:
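A plausible reconstruction of this model, using the variable names defined next, is sketched below. The notation is assumed rather than the author’s own: β, γ, and δ are coefficients, X is the vector of social background controls, μ_j is the school fixed effect, and ε is the error term; class size (Nclass) is included on the reading that the cumulative models use the same controls as the contemporaneous ones. Models of male-majority exposure would replace cumul_femperc with the analogous count of years in male-supermajority classrooms.

```latex
Y_{ijk} = \beta_0 + \beta_1\,\mathit{cumul\_femperc}_{ijk} + \beta_2\,\mathit{cumul\_cmstab}_{ijk}
        + \gamma\,\mathit{Nclass}_{ijk} + \mathbf{X}_{ijk}'\boldsymbol{\delta} + \mu_j + \varepsilon_{ijk}
```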
Here, cumul_femperc is specified as the unbroken number of years a student has spent in classrooms with >60 percent or >70 percent female (or male) classmates until grade six or nine. I use the number of years of exposure in linear form, assuming that longer exposure to a supermajority of girls (boys) has a linearly accumulating effect on school performance. Besides using the same social background controls as in the contemporaneous models, I control for class stability (cumul_cmstab), which is the unbroken number of years a student has spent in the same class.
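The exposure and stability counts can be sketched as follows. The sketch rests on assumptions not spelled out in the text: an “unbroken” spell is read as the run of consecutive qualifying years ending in the grade of observation (grade six or nine), and a male supermajority is taken to mean that the male share of classmates (one minus the female share) exceeds the cutoff. The function names are hypothetical. The “broken” count used in the robustness checks is simply the total number of qualifying years.

```python
def exposure_years(female_shares, cutoff=0.70, majority="female"):
    """female_shares: share of female classmates in grades 1..k, last entry = grade k.

    Returns (unbroken, broken) years spent in classrooms where the chosen
    gender's share of classmates exceeds the cutoff.
    """
    if majority == "female":
        hits = [share > cutoff for share in female_shares]
    elif majority == "male":
        hits = [(1 - share) > cutoff for share in female_shares]
    else:
        raise ValueError("majority must be 'female' or 'male'")
    broken = sum(hits)  # total qualifying years, gaps allowed
    unbroken = 0
    for hit in reversed(hits):  # count back from grade k until the first gap
        if not hit:
            break
        unbroken += 1
    return unbroken, broken


def class_stability(class_ids):
    """Unbroken number of years in the current class, counting back from grade k."""
    years = 0
    for cid in reversed(class_ids):
        if cid != class_ids[-1]:
            break
        years += 1
    return years


# Hypothetical sixth-grader: one balanced year (grade 3) interrupts the spell.
print(exposure_years([0.8, 0.75, 0.5, 0.72, 0.9, 0.8], cutoff=0.70))
print(class_stability(["1A", "1A", "2B", "2B", "2B", "2B"]))
```

If “unbroken” instead refers to the longest consecutive spell at any point before the grade of observation, only the counting loop in `exposure_years` would need to change.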
Causality in this analysis relies on multiple assumptions ensuring that the variation in students’ cumulative history of exposure to different peer compositions is as good as (conditionally) random. First, all observable and unobservable variables that could determine selection into peer groups with a certain exposure history should be controlled for. I use the social background variables to capture observable factors, assuming that concurrent social background is a good proxy for students’ past social background and that these variables are the only observable factors determining selection. I also use school fixed effects to control for unobservable factors related to selection into schools. Regrouping students into new classes is possible within schools, but I assume that such regrouping is not correlated with the experience of female- or male-dominated classes through mechanisms other than those that could produce sorting on contemporaneous gender composition. I apply the same controls as in the contemporaneous part of the analysis. A potential threat to this identification is the possibility that exposure to a certain share of girls (boys) in earlier grades influences the likelihood of moving between classes or affects later regrouping processes within schools.
Robustness checks
I run two sets of sensitivity tests. I check whether dropping the social background controls alters the results for both cutoffs. I also test robustness by using the broken number of exposure years.
Results
Contemporaneous Effects of Classroom Gender Composition
The linear model
Figure 2 shows that a larger share of female classmates tends to have a positive linear effect on girls’ test scores and a negative effect on boys’ test scores. However, the effects are not entirely consistent and are small: 10 percentage points more girls in a class corresponds to a difference of 0.4 percent to 1.3 percent of a standard deviation in test scores.

Linear effects of the share of female classmates (in units of 10 percentage points) on girls’ and boys’ test scores in sixth and ninth grades.
The robustness test using within-school, between-cohort variation in gender composition shows similar, although less precisely estimated, effects (see Figure A3 in the online supplement). These estimates point in the same direction and have similar relative magnitudes as the main results, which suggests that any bias related to sorting into classes based on gender composition is negligible in the main results.
The categorical model
Relaxing the functional form assumption of linearity, the categorical models compare the effects of being in a class with more than 70 percent male or female classmates to a reference category with 30 percent to 70 percent female classmates. The results for girls are the most consistent (see Figure 3). Girls tend to benefit in all subjects from having more than 70 percent same-gender classmates compared to being in a more balanced classroom with 30 percent to 70 percent female students, with effect sizes of around 3 percent to 6 percent of a standard deviation. Being in a heavily male-dominated classroom, however, does not have any meaningful effect on girls’ test scores. Results for boys are of the same magnitude as those for girls but are inconsistent. There is a weak tendency for a male supermajority to affect boys’ test scores positively, but there are several exceptions.

Nonlinear (categorical) effect of classroom gender composition on girls’ and boys’ test scores in sixth and ninth grades.
In robustness tests, I estimate the categorical model without social background controls, and I also test whether including classes with <10 or >40 students changes the original estimates. In general, results are more sensitive to dropping social background variables than to including all class sizes, especially for girls. Including all class sizes largely replicates the results, whereas excluding social background increases effect sizes, although the general tendencies remain. Detailed output tables from the contemporaneous analysis and robustness tests are in part four of the online supplement.
Cumulative Effects of Classroom Gender Composition
Exposure to cumulative female-majority classrooms
Figure 4 shows that sixth-grade girls tend to benefit from spending an additional year in a classroom with a supermajority of girls, whereas boys tend to get lower test scores by doing so, which is in line with the contemporaneous results. However, these estimates have substantial uncertainty due to limited power when using 70 percent female classmates as a cutoff. Effect sizes are similar in magnitude to the linear estimates (0.5 percent to 1.2 percent of a standard deviation).

Cumulative effects of a female-majority classroom on test scores, sixth grade.
Figure 5 shows that in ninth grade, an extra year of cumulative exposure to a female supermajority classroom increases girls’ test scores in Swedish and English. The effects are bigger if the cutoff for female majority is higher (70 percent instead of 60 percent), and the coefficients are also larger than the corresponding ones from sixth grade (0.7 percent to 2.2 percent of a standard deviation). For boys, however, ninth-grade results show no systematic tendencies.

Cumulative effects of a female-majority classroom on test scores, ninth grade.
Based on robustness checks testing sensitivity to social background, the cumulative results seem to be less sensitive to sorting than the contemporaneous ones. Further robustness tests using the broken number of years of exposure to female-majority classrooms instead of the unbroken measure replicate the main results for sixth grade (see Figure A4 in the online supplement). In ninth grade, the robustness tests reproduce positive tendencies for girls, although with less power. Boys’ ninth-grade results are not robust to specification change (see Figure A5).
Exposure to cumulative male-majority classrooms
Figure 6 shows that exposure to cumulative male-majority classrooms tends to increase Swedish test scores for both genders in sixth grade. In ninth grade, this exposure benefits male students’ test scores in all subjects, with bigger effects for the 70 percent cutoff, although there is no effect for female students (Figure 7). Effect sizes range from 0.6 percent to 2.7 percent of a standard deviation.

Cumulative effects of a male-majority classroom on test scores, sixth grade.

Cumulative effects of a male-majority classroom on test scores, ninth grade.
Similar to the results from the analysis of cumulative female majority, these results are also robust to sorting based on social background in both grades. For sixth grade, results are robust to using the broken number of years of exposure (see Figure A6 in the online supplement). For ninth grade, however, results only tend to be robust to this specification change for boys—and only in terms of tendencies because estimates lose their power (see Figure A7). Detailed output tables from the cumulative analysis and its robustness tests can be found in part five of the online supplement.
Summary and Discussion of Results
My results from the Swedish context show that, in contrast to what prior work suggests, peer gender composition does not have uniform effects on boys and girls (see Tables 6 and 7 for an easier comparison). Girls tend to benefit from more female peers (especially in mathematics and English), whereas boys’ test scores tend to decline in classrooms with more girls (especially in Swedish). Although precisely estimated, the effects are very small: linear effects correspond to a difference of 0.4 percent to 1.3 percent of a standard deviation in test scores for a 10 percentage-point increase in the share of female classmates. Expressed as an extreme switch from a classroom with 10 percent girls to one with 90 percent girls, the estimated effects correspond to a difference of 3 percent to 10 percent of a standard deviation in test scores. Given that the standard deviation of most raw test scores is around 5 points, this extreme switch would change scores by no more than about 0.5 points. By comparison, the difference between two passing grades is 2.5 points.
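The back-of-the-envelope conversion in this paragraph can be reproduced with a few lines of arithmetic, assuming, as the text does, that the linear estimate applies across the whole 10 to 90 percent range and that one standard deviation is about 5 raw points. The function name is hypothetical, and the linear extrapolation is a simplification given that the categorical results concentrate the effects at the extremes.

```python
def extreme_switch(effect_per_10pp, from_pct=10, to_pct=90, sd_points=5.0):
    """Scale a per-10-percentage-point effect (in SD units) to a larger change
    in the share of girls; returns (effect in SD units, effect in raw points)."""
    steps = (to_pct - from_pct) / 10  # 10% -> 90% girls = 8 steps of 10 pp
    effect_sd = effect_per_10pp * steps
    return effect_sd, effect_sd * sd_points


for estimate in (0.004, 0.013):  # smallest and largest reported linear estimates
    sd, points = extreme_switch(estimate)
    print(f"{estimate:.3f} SD per 10 pp -> {sd:.3f} SD, about {points:.2f} points")
```

The upper bound works out to roughly 0.10 standard deviations, or about 0.52 raw points, which is around a fifth of the 2.5-point gap between two passing grades.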
Summary of the Effects of a More Female-Dominated Classroom from Different Specifications.
Note: Nonzero effects are indicated if the point estimate is significant (at the 5 percent level). For the categorical model, the plus (minus) sign indicates that either the effect of a female supermajority is positive (negative) or the effect of a male supermajority is negative (positive). For the cumulative model, the plus (minus) sign means that the model with the 60 percent or 70 percent threshold for female majority (or both) shows a positive (negative) effect on test scores. Cumulative effects in parentheses are not robust to different specifications.
Summary of the Effects of Cumulative Exposure to Male-Majority Classrooms.
Note: The plus (minus) sign means the model with either the 60 percent or 70 percent threshold for male majority (or both) shows a positive (negative) effect on test scores. Effects in parentheses are not robust to different specifications.
The nonlinear model revealed that the contemporaneous effects are concentrated primarily in classrooms with very skewed gender distributions (more than 70 percent of either boys or girls), thus affecting few students. In practice, very few school classes have such skewed gender distributions, so these effects will have a negligible overall impact on a school cohort in a coeducational and comprehensive school system. However, evidence suggests larger peer ability spillovers when students are grouped by ability (Sacerdote 2011), and some prior work finds stronger class composition effects in tracked educational systems (Dollmann and Rudolphi 2020; Dronkers, Van Der Velden, and Dunne 2012). Consequently, the small effects I find in classes with skewed gender distributions might be more pronounced in a tracked school system (or might even appear in classes with nonskewed compositions).
The cumulative analysis shows largely the same tendencies as the contemporaneous one for both genders: Girls tend to benefit from longer exposure to a female supermajority, whereas the effects for boys are rather negative. Longer exposure to male-majority classrooms showed less robust results, with a weak tendency toward positive effects on boys’ test scores. Although effect sizes are small in the cumulative analysis too, the results suggest that cumulative exposure to a certain class gender composition might matter for school performance. Given that estimates from the Swedish context are likely lower bounds of the effects in less gender-equal and tracked systems, research in other contexts might benefit from focusing not only on the test-taking environment but also on the effects of the longer term learning environment.
In terms of direction, most of the contemporaneous effects corroborate findings from the Western European context on positive effects of same-gender dominance (Briole 2021; Proud 2014), but they provide a more comprehensive picture by showing estimates by grade, subject, gender, and functional form (linear or nonlinear). Positive same-gender effects are in line with the theory that peer influence in adolescence promotes similarity and conformity (Laursen and Veenstra 2021), which implies greater influence from same-gender peers. The largely negative effects of a more female-dominated classroom on boys align with Modena et al. (2022) and Pagani and Pica (2021), who find that a larger share of same-gender high-achieving peers improves educational outcomes, whereas a larger share of opposite-gender high-achieving peers worsens them. My findings also align with Black et al. (2013) and Hill (2015), who document negative opposite-gender peer effects in education.
In terms of effect sizes, my results are on the lower side among results from Western countries, close to the European estimates (Briole 2021; Hoxby 2000; Lavy and Schlosser 2011; Proud 2014), and much smaller than estimates from the Chinese context (Gong et al. 2021; Hu 2015). Given these comparisons (and the previous discussion about tracked systems), external validity might be extended to other Western countries but arguably with higher confidence to those with comprehensive, nontracked educational systems. A limitation of the study (similar to most studies in the gender peer effects literature), however, is that contextual differences cannot be tested in a one-country setup.
The second limitation is that the theoretical mechanisms of peer ability, learning climate, and stereotypes cannot be tested. Although most channels predicted positive effects for girls from a more female-dominated classroom, the negative effects on boys are potentially explained by only a few mechanisms. For instance, ability contrast mechanisms or biased assessment due to worse behavior could affect boys’ learning negatively. Reinforced stereotypes can also decrease boys’ progress in the verbal domain—a potential mechanism behind the negative effects on Swedish test scores. Given the high level of gender equality in Sweden, it is possible that stereotypes have an even more pronounced effect in other, less gender-equal countries.
Another limitation of the article is that any bias from endogenous teacher allocation or from students’ endogenous moves between classes could not be tested. First, bias from teacher allocation would be unlikely to change the conclusions about the small effect sizes and the absence of overall effects for most students in relatively balanced classrooms, given that such bias would result in overestimated effects (and true effects would be even smaller). Second, endogenous student moves between classes might not affect my results, as suggested by the robustness tests using cohort-to-cohort variation in peer composition. Moreover, assuming that the cohort-based approach handles selection well, as suggested by the literature (Briole 2021; Hoxby 2000; Lavy and Schlosser 2011), the similarity in the direction and relative magnitude of its estimates to those relying on classroom composition strengthens the causal claims of this article, suggesting that selection into classes is not a significant issue for the main, more precisely estimated results.
Conclusions
This study explored the effects of peer gender composition on students’ test scores in compulsory education in Sweden. Previous research suggests that a higher share of female peers at school affects girls’ and boys’ school performance positively. Many studies underlying this narrative, however, either relied on survey data with limited sample sizes or failed to capture the most relevant level of variation in peer gender composition: variation between classrooms. Using full population registers from Sweden, where students’ classroom composition is observed throughout their entire school career, this study uses an unusually reliable measure of exposure to gender composition in classrooms and gives precise estimates for gender peer effects in linear, nonlinear, contemporaneous, and cumulative forms in a gender-equal context characterized by nontracked schools up until ninth grade.
Relying on these precisely estimated effects, I conclude that classroom gender composition appears to matter for test scores in a nonhomogeneous and nonlinear way for girls and boys. However, the effects are very small, particularly for the range of gender distributions the vast majority of students experience. This suggests that the role of gender composition in student performance has potentially been overstated and that it should not be a primary policy concern except, perhaps, when imbalances are large and sustained over time. Of course, gender composition may still have an effect in noncomprehensive school systems or less gender-equal contexts, or it may affect other outcomes, such as school engagement or study choices. Given the multifaceted potential impact of classroom gender composition, it would be ill-advised to give detailed policy recommendations here, especially because I did not test mechanisms. Regardless, researchers should consider that we might have been thinking about gender peer effects in a limited way: by assuming linearity and only contemporaneous effects rather than also analyzing them in a cumulative manner. Similarly, education policy might benefit from a broader perspective that focuses on the cumulative learning environment along with the contemporaneous test-taking environment.
Supplemental Material
sj-docx-1-soe-10.1177_00380407251391761 – Supplemental material for Gender Peer Effects on Educational Achievement in Swedish Compulsory Schools: A Study of Contemporaneous and Cumulative Effects
Supplemental material, sj-docx-1-soe-10.1177_00380407251391761 for Gender Peer Effects on Educational Achievement in Swedish Compulsory Schools: A Study of Contemporaneous and Cumulative Effects by Tünde Lénárd in Sociology of Education
Footnotes
Acknowledgements
I am grateful to Carina Mood, Andreas Diemer, Stephanie Plenty, Erik Bihagen, Martin Hällsten, Amanda Almstedt Valldor, and Are Skeie Hermansen for their valuable feedback on various versions of this article. Comments made by participants of the LNU seminar at SOFI and the RC28 Spring Conference at Sciences Po are also gratefully acknowledged.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project received funding from the Swedish Research Council for Health, Working Life and Welfare (Forte, Grant No. 2016-07099) and the Swedish Research Council (Vetenskapsrådet – VR, Grant No.: 2022-02036).
Research Ethics
The research is covered by ethical approval from the Swedish Ethical Review Authority.
Supplemental Material
Supplemental material for this article is available online.
Notes
Author Biography
References
