Abstract
This study analyzes the causal effect of positive feedback on students’ task-specific math self-concept using data from a randomized field experiment conducted among rural Hungarian primary school students. It examines how academic self-concept (ASC) responds to the smallest possible dose of positive feedback (a single instance) and explores treatment heterogeneity by gender. The results show that, on average, students who received randomized positive performance feedback experienced a statistically significant (albeit small) improvement in task-specific math self-concept. The positive treatment effect was primarily driven by girls, who experienced a large and statistically significant effect, over 50% greater than the non-significant treatment effect observed among boys. However, the difference in treatment effects between girls and boys, as well as the corresponding decrease in the gender gap between treated and control students, was not statistically significant. Thus, the results suggest that, while a single instance of positive feedback can temporarily boost students’ ASC, it is not a panacea for reducing gender inequalities in ASC. Nevertheless, because girls were particularly responsive to positive feedback treatment and boys were not harmed by it, the results suggest that positive feedback interventions may act as a policy lever for improving girls’ self-concept if the intensity of the treatment is enhanced.
Introduction
Girls perform at least as well as boys in mathematics (Guiso et al., 2008; Meinck and Brese, 2019; Neuschmidt et al., 2008; Robinson and Lubienski, 2011). However, in math, girls tend to have a more negative self-concept (Goldman and Penner, 2016; Mejía-Rodríguez et al., 2021; Wilkins, 2004), self-assessment (Mann and DiPrete, 2016), and self-evaluation (Exley and Kessler, 2022) than boys. 1
Girls’ lower self-perception in mathematics compared to boys is a global phenomenon. For instance, based on data from the Trends in International Mathematics and Science Study (TIMSS)—an assessment of fourth-grade students—Mejía-Rodríguez et al. (2021) found that in 2015, girls’ self-concept in mathematics was lower than boys’ in 25 out of the 32 examined countries. This significant gender disparity may play a role in the underrepresentation of girls in STEM 2 fields (OECD, 2019, p. 171), as individuals often choose to invest their efforts in areas where they feel confident and positive (Correll, 2001; Nagy et al., 2006; Oakes, 1990; Ridgeway, 2011; Sax et al., 2015; Seymour, 1995; Vinni-Laakso et al., 2019). Therefore, gender disparity in schoolchildren's academic self-concept (ASC) may have far-reaching consequences (Barone, 2011; Kriesi and Imdorf, 2019), potentially even contributing to the later gender pay gap (Michelmore and Sassler, 2016; Sterling et al., 2020).
Teachers’ feedback practices, in particular, can shape gendered perceptions of ability. When boys receive negative feedback, it tends to focus more on behavior and non-intellectual aspects, with teachers often attributing their failures to a lack of motivation and effort. In contrast, negative feedback for girls primarily targets intellectual inadequacies. As teachers perceive girls as motivated and diligent, they rarely attribute girls’ failures to a lack of effort, which reinforces the tendency for girls to attribute their failures to a lack of ability (Dweck et al., 1978). This tendency can ultimately contribute to a gender gap in self-concept, favoring boys. Furthermore, teachers tend to dedicate more instructional time to girls in reading and boys in math (Leinhardt et al., 1979), which may also reinforce gendered perceptions of abilities. For these reasons, girls’ ASC in math—their perception of their math ability in school (Marsh and Shavelson, 1985; Shavelson et al., 1976)—needs to be developed using easily accessible forms of leverage (DiPrete and Fox-Williams, 2021). Feedback might be one such solution.
Self-determination theory in psychology suggests that people experience intrinsic motivation when they feel a sense of autonomy and competence (Deci, 1998; Deci et al., 1991; Ryan and Deci, 2000). Positive feedback can significantly impact intrinsic motivation by influencing individuals’ perceptions of competence and self-determination (Deci et al., 1975, 2017; Ryan and Deci, 2000). Empirical evidence shows that positive feedback improves achievement. Behncke (2012) describes how students whose teacher read aloud a standard positive affirmation message before their exam scored higher on tests than those who did not receive the positive affirmation. Similarly, highly test-anxious students who read their Facebook friends’ affirmation messages before an exam had similar achievements to their peers with low test anxiety (Deloatch et al., 2017). Furthermore, experimental research in psychology (Katz et al., 2006; Kluger and DeNisi, 1996), economics (Lovász et al., 2022), and sociology (Keller and Szakál, 2021) reveals that positive feedback in the form of encouraging messages boosts students’ motivation and increases their persistence. For these reasons, it is important to ask how positive feedback impacts students’ ASC.
In observational studies, establishing the causal effect of positive feedback is challenging. Positive feedback typically serves as a reward for good performance, making it difficult to distinguish whether individuals have developed a particular self-concept based on their prior achievements or the feedback they have recently received (Hattie and Timperley, 2007). However, randomized experiments can separate the influence of prior performance from the independent causal effect of feedback through the random allocation of feedback.
To investigate how students’ ASC responds to the smallest possible dose of positive feedback—an essential precursor to more extensive interventions—I conducted a randomized field experiment with 1253 Hungarian primary school students from grades 5 to 8, spanning 80 classrooms across 19 schools. The experiment was embedded in a classroom-based, computer-assisted survey. Students first answered an initial question about their math self-concept and then completed a grade-specific math test. Afterward, a randomly selected half of the students received a single instance of positive automated feedback acknowledging their math performance, regardless of how well they did on the test. The other half of the students received no feedback after the math test. Following this treatment, all students were asked to reassess their math self-concept. This design allowed for the measurement of the short-term causal effect of the light-touch positive feedback treatment on students’ self-concept by creating ideal conditions in two key ways. Due to randomization, the feedback was independent of actual performance, allowing for a clear assessment of the treatment's causal effect. Furthermore, repeated self-concept measurement enabled the control of confounding factors related to baseline self-concept.
This study addresses two primary research questions. First, it investigates the causal effect of positive feedback on task-specific math self-concept by analyzing how students’ self-concept changes after receiving randomized positive feedback. Second, it examines how the treatment effect varies between girls and boys and, as a consequence of potential treatment heterogeneity, how the initial gender gap in the control group evolves in the treated group.
Concerning the first research question, the preferred specification indicates that the randomized positive feedback treatment led to a 0.15 standard deviation (SD) unit improvement in task-specific math self-concept for students in the treatment group compared to those in the control group. While modest, this effect is noteworthy considering the light-touch nature of the intervention.
Regarding the second research question, the results show that the overall positive treatment effect primarily stemmed from the improvement in girls’ task-specific math self-concept. Girls showed a sizable and statistically significant treatment effect of 0.18 SD units, while boys showed a statistically non-significant treatment effect of about 0.12 SD units. However, the gender difference in treatment effects did not reach statistical significance, suggesting that the treatment might have been too light-touch or the sample size insufficient to detect differences of this magnitude between boys and girls. Accordingly, the gender gap in self-concept did not differ significantly between the treated and control groups, although it was smaller and statistically non-significant in the treatment group (0.15 SD units) and larger and statistically significant in the control group (0.2 SD units). In sum, regarding the second research question, the findings are not conclusive. A single instance of positive feedback does not work as a panacea and does not reduce gender inequalities in ASC. However, as girls were particularly responsive to positive feedback and boys were not harmed by it, positive feedback could be an effective lever for improving girls’ self-concept in educational practice if the intensity of the one-shot positive feedback is enhanced.
This study is well connected to recent debates and discourses in sociology and economics but is also distinct from prior empirical studies. On the one hand, the research findings indicate that self-related beliefs are malleable to information provision. Thus, the results extend the earlier findings of belief-updating literature (Buser et al., 2018; Coutts, 2019; Eil and Rao, 2011; Ertac, 2011; Möbius et al., 2022) and contribute to the sociological literature on the potential of attitude changes (Broćić and Miles, 2021; Kiley and Vaisey, 2020). Furthermore, the study follows the path of “applied” social research (instead of following the path of “basic” social research) inasmuch as it moves the focus from merely understanding gender inequality to tackling it and promoting social equity (DiPrete and Fox-Williams, 2021).
On the other hand, the study builds on the narrow gender focus of previous experimental research on performance feedback (Lovász et al., 2022; Németh, 1999) but expands the scope by focusing on a younger age group. Unlike prior studies, which examined university students (Buser et al., 2018; Coutts, 2019; Eil and Rao, 2011; Ertac, 2011; Möbius et al., 2022) and young and middle-aged adults (Lovász et al., 2022), this research focuses on primary school students—an age group in which personal traits have not yet crystallized, and self-concept may therefore be more malleable.
The article proceeds as follows: the second section reviews past research, outlines the study's expectations, and establishes its theoretical and empirical context, highlighting the relevance of the study. The third section describes the institutional setting, particularly concerning the gender differences within Hungarian schools. The fourth section details the experimental design and provides an overview of the sample. The fifth section outlines the empirical strategy used for data analysis. The sixth section presents the results, and the last section concludes with a discussion of the study's limitations and implications.
Review of past research
Positive feedback and self-concept
Positive feedback can take many forms (Hattie and Timperley, 2007), including praise (positive evaluation, see Henderlong and Lepper, 2002), performance feedback (information about the correctness of solutions, see Katz et al., 2006), and encouragement (positive expectations about future performance, see Lovász et al., 2022). Meta-analyses suggest that positive feedback improves performance, but its efficacy hinges on several factors, including the type of feedback and the recipient (Kluger and DeNisi, 1996; Smither et al., 2005). Psychological work further suggests that feedback about the task (or processing of the task) has a greater impact than feedback about the quality of the person (Hattie and Timperley, 2007; Kluger and DeNisi, 1996).
Feedback, both positive and negative, plays a significant role in altering situational self-concept, which is more susceptible to change (Demo, 1992). In this way, interventions that empower students with positive feedback might have the potential to improve situational self-concept (Hattie and Timperley, 2007), such as task-specific self-concept. By contrast, people's general self-concept is known to be stable, as individuals selectively pay attention to feedback that confirms their initial self-image and reinterpret, diminish, or disregard conflicting feedback (Swann et al., 2003). Therefore, general self-concept may be less malleable through feedback.
For these reasons, I hypothesize that receiving positive performance feedback improves task-specific self-concept (H1).
Gender differences in self-concept and the role of positive feedback
There has been extensive sociological research into persistent gender segregation in education, which is highly resistant to change (Barone, 2011; Barone and Assirelli, 2020; Charles and Bradley, 2009; DiPrete and Buchmann, 2013). This body of research highlights the enduring gender gap observed in fields of study such as mathematics and engineering, which have low female representation.
Two main explanations have been proposed for gender differences in education: a cultural explanation and a rational choice explanation (Kriesi and Imdorf, 2019). The cultural explanation highlights the influence of internalized gender stereotypes, whereby girls may develop beliefs that they are less capable at math than boys (Correll, 2001; Mann and DiPrete, 2016), while the rational choice explanation posits the influence of the gender-specific cost-benefit considerations behind utility maximization (Jonsson, 1999). Current scholarly discourse tends to lean toward the cultural explanation (Gabay-Egozi et al., 2015; van de Werfhorst, 2017). Recent field experiments have yielded further evidence that aligns with the cultural explanation (Finger et al., 2020). Notably, the cultural explanation implies that gender differences may be malleable if girls’ negative beliefs about their math abilities can be changed. One possible means of achieving this goal is the provision of feedback (Behncke, 2012; Deloatch et al., 2017; Keller and Szakál, 2021; Lovász et al., 2022).
Females might react more intensively to feedback than males. A small-scale (n = 80) psychological study at Stanford University indicated that female undergraduate students’ self-evaluations were influenced by both positive and negative feedback, while male students were more affected by positive feedback and less impacted by negative feedback (Roberts and Nolen-Hoeksema, 1989). Furthermore, a larger study in the field of economics (n = 397) indicated that Eastern European women between the ages of 18 and 45 exhibited greater persistence in a 2-minute online game than their male counterparts when they received regular encouragement messages, such as “You can do it!” (Lovász et al., 2022).
There are multiple reasons why females may have a greater need for positive feedback than males. Studies have shown that women often perform worse than men in competitive settings (Gërxhani et al., 2023), leading them to avoid such situations (Niederle and Vesterlund, 2007; Van Veldhuizen, 2022). This avoidance of competition may indicate a heightened need for empowering positive feedback among females. Furthermore, females are typically more interpersonally sensitive and concerned with others’ evaluations (Deci et al., 1975; Katz et al., 2006), which could make them more receptive to feedback. Last, females may have weaker stress-coping abilities (Graves et al., 2021; Matud, 2004), potentially increasing their reliance on positive feedback for support. Therefore, offering positive feedback to females could help promote gender equality in persistence and performance (Lovász et al., 2022), competitiveness (Wozniak et al., 2014), and self-efficacy (Roberts and Nolen-Hoeksema, 1989).
For these reasons, I hypothesize that the effect of positive feedback treatment will be larger for girls than for boys (H2).
Literature on belief updating
The literature on belief updating in economics examines how individuals adjust their performance beliefs after receiving feedback (Buser et al., 2018; Coutts, 2019; Eil and Rao, 2011; Ertac, 2011; Möbius et al., 2022). This research examines scenarios in which students receive relative performance feedback after completing ability-demanding tasks, such as being informed of their rank within an ability distribution. The central question is how students adjust their initial beliefs in response to this feedback. The feedback they receive can be positive (telling students that their score is higher than a certain percentage of their peers) or negative (telling students that their score is lower than a certain percentage of their peers) and is accurate with a probability of p and inaccurate with a probability of 1 − p.
These studies show that individuals tend to update their performance beliefs conservatively, deviating much less from their initial beliefs than the Bayesian updating rule would predict. However, there is a notable gender difference, with women demonstrating greater conservatism in belief updating compared to men. The literature is less clear about whether individuals react more strongly to positive or negative feedback. Some studies suggest an asymmetry in responsiveness, with individuals showing greater sensitivity to positive than negative feedback (Eil and Rao, 2011; Möbius et al., 2022), while others find the opposite asymmetry, indicating that people respond more to negative feedback than positive feedback (Coutts, 2019; Ertac, 2011) or even identify little evidence for asymmetry (Buser et al., 2018).
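The Bayesian benchmark mentioned above can be made concrete with a minimal sketch. It computes the posterior belief after one binary signal that is accurate with probability p, and contrasts it with a conservative update. The function names and the dampening weight are illustrative assumptions, not the specification used in the cited studies; conservatism is modeled here simply as applying only a fraction of the Bayesian log-odds shift.

```python
import math


def bayes_posterior(prior: float, p: float, positive: bool) -> float:
    """Posterior belief of being a high performer after one noisy signal.

    prior    -- initial belief of being a high performer (0 < prior < 1)
    p        -- probability that the signal is accurate (as in the text)
    positive -- True for a positive signal, False for a negative one
    """
    like_high = p if positive else 1 - p   # P(signal | high performer)
    like_low = 1 - p if positive else p    # P(signal | low performer)
    return prior * like_high / (prior * like_high + (1 - prior) * like_low)


def _logit(x: float) -> float:
    """Log-odds of a probability."""
    return math.log(x / (1 - x))


def conservative_posterior(prior: float, p: float, positive: bool,
                           weight: float = 0.5) -> float:
    """Hypothetical conservative updater (an assumption for illustration).

    Applies only `weight` of the Bayesian log-odds shift;
    weight = 1 reproduces Bayes' rule exactly.
    """
    shift = _logit(bayes_posterior(prior, p, positive)) - _logit(prior)
    updated = _logit(prior) + weight * shift
    return 1 / (1 + math.exp(-updated))


prior, p = 0.5, 0.75
bayes = bayes_posterior(prior, p, positive=True)
conservative = conservative_posterior(prior, p, positive=True)
print(f"Bayesian: {bayes:.3f}, conservative: {conservative:.3f}")
```

With a prior of 0.5 and p = 0.75, Bayes' rule moves the belief to 0.75 after a positive signal, whereas the dampened updater stays closer to the prior. A smaller weight would correspond to the greater conservatism reported for women.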
Expanding on the conditions of prior literature
This study expands on the results of previous studies on belief updating in three key aspects: the exclusive focus on relative comparison, the gender difference in the treatment effect, and the effect of feedback on gender inequality.
Relative comparison (Suls et al., 2002) is a widely recognized strategy for evaluating and assessing one's performance. Its role in shaping academic self-concept has been well-documented (Marsh, 1987; Marsh and Parker, 1984). As objective criteria for self-evaluation are often not available, people tend to rely on comparisons with others to estimate their own outcomes (Festinger, 1954).
However, in many everyday situations, individuals assess their abilities in absolute terms, focusing solely on their own performance without a clear comparative benchmark (Exley and Kessler, 2022; Haaland et al., 2023; Moore and Klein, 2008). Notably, most prior studies on belief updating have relied on relative comparison by providing performance feedback framed in relation to others. As a result, there is limited understanding of how individuals revise their performance beliefs when feedback is given without reference to others' performance (Moore and Klein, 2008).
Another aspect of the literature on belief updating that warrants deeper investigation is the gender difference in the treatment effect. Research has indicated that females tend to update their beliefs less than males, leading to a more conservative approach to updating. However, this evidence of females’ rigidity in updating self-related beliefs contrasts with other research suggesting that females are more responsive to positive feedback than males (Lovász et al., 2022; Németh, 1999; Roberts and Nolen-Hoeksema, 1989; Wozniak et al., 2014). Therefore, further research is needed to better understand gender differences in self-concept following positive feedback.
Finally, studies on belief updating have consistently demonstrated that positive feedback enhances self-concept while negative feedback diminishes it. Considering that the initial gender gap favors males’ self-concept (Goldman and Penner, 2016; Mejía-Rodríguez et al., 2021; Wilkins, 2004), and assuming that females may respond more strongly to feedback than males (Deci et al., 1975; Katz et al., 2006; Lovász et al., 2022), gender inequality in self-concept could decrease if females improve their self-concept more than males after receiving positive feedback. Conversely, gender inequality in self-concept may be exacerbated if females experience a greater decline in their self-concept than males following negative feedback. The inequality-exacerbating aspect of negative feedback could have adverse societal implications, as it worsens rather than alleviates existing gender disparities. This potential adverse effect can be avoided by exclusively providing positive feedback.
Setting: gender differences in Hungarian schools
This study explores the impact of positive feedback on students’ task-specific math self-concept in rural Hungarian primary schools within a country where gender disparities in the labor market and education favor males, though not significantly more than in other European countries (Horn and Keller, 2015).
In Hungary, primary education lasts for 8 years, starting at age 6 and covering both primary and lower secondary levels—ISCED 1 and ISCED 2—according to the International Standard Classification of Education. In rural Hungarian primary schools, the predominant teaching style is frontal, characterized by students following teachers’ instructions and primarily receiving explanations from them. Collaborative activities and group work are comparatively less emphasized in the daily routine of these schools. This educational setup places a high value on teacher feedback; with limited opportunities for peer collaboration, teachers become the primary source from which students receive qualitative assessments of their performance.
Gender differences are evident in Hungarian schools. Females are more altruistic than males but also show lower risk tolerance, lower levels of trust, lower trustworthiness, and lower competitiveness (Horn et al., 2022). Gender disparities in fourth-grade Hungarian students’ math performance have escalated over the past two decades, as indicated by TIMSS. Between 2003 and 2015, the gender gap in math performance was statistically insignificant. However, by 2019, it had notably widened to the equivalent of 11% of a standard deviation in favor of boys (Mullis et al., 2020; Neuschmidt et al., 2008).
Similarly, the gender gap in fourth-grade students’ math self-concept has widened, as TIMSS data show. Figure A1 in the Appendix illustrates this trend by depicting the proportion of girls and boys in Hungary who “strongly agree” with the statement “I usually do well in mathematics.” Over the past 20 years, the gender gap has almost tripled, from a difference of 5% points (p = 0.01) to a difference of 13% points (p < 0.01) in favor of boys. Despite this trend, Hungary ranks in the middle among European countries regarding gender differences in math self-concept (Mejía-Rodríguez et al., 2021). In Germany, the Netherlands, and England, the gender gap in math self-concept is twice as large as in Hungary, at around 20% points. Conversely, in Sweden and Cyprus, the gender gap is half as large as in Hungary, at around 6% points (see Figure A2 in the Appendix).
Furthermore, according to data from the Program for International Student Assessment (PISA), among the 15-year-old students who performed best in mathematics, girls are about 10% points less likely than boys to expect a career in science and engineering. Despite this notable gender gap, Hungary remains positioned near the middle among European countries (see Figure A3 in the Appendix).
In summary, Hungary is an example of a country where the gender gap in mathematics, both in terms of achievement and self-concept, is widening but remains moderate compared to the broader European context.
Study design and sample description
Sample
Participating schools in this study had previously been contacted for my prior field experiments. I recruited schools by contacting all primary schools in seven contiguous counties of central Hungary in 2017 and used the data to conduct a field experiment in 2018 (Keller and Elwert, 2023). I obtained initial participation agreements from 55 schools. I then refreshed the initial sample to conduct another field experiment in 2020 (Keller, 2020). Out of the schools in the 2018 experiment, 13 agreed to join the new study, and 16 additional schools were newly recruited in 2020, resulting in 29 schools in the second field experiment.
The sample used in this study comes from the voluntary follow-up survey of the 2020 experiment. Out of the 29 schools, 19 schools participated in the recent survey, representing 80 classrooms. Among the participating classrooms, the median classroom size was 16 students, with the maximum and minimum classroom sizes being 24 and 7 students. 3
Students’ participation in the survey was contingent upon written parental consent obtained from previous experiments and new consent regarding participation in the recent follow-up survey. Thus, students’ non-participation was due to the lack of parental consent or random absenteeism from school on the survey day. Based on verbal communication from teachers, most students in the involved classrooms participated in the survey.
The 19 participating schools in the sample are not representative of Hungarian primary schools. School-level comparison based on administrative data suggests that the participating schools are more likely to be small-sized rural schools with below-average performing students than non-participating schools. As Table A1 in the Appendix shows, there are notable differences between participating and non-participating schools. These differences can be substantial, with disparities reaching up to half a standard deviation in math test scores, reading test scores, and students’ socioeconomic status.
Experiment
The experiment was embedded in a computer-assisted online student survey, which students filled out in the school's computer lab. The survey was conducted between 20 November 2020 and 19 February 2021 and involved 1253 students from grades 5 through 8. Students participated in the survey in school during a regular school day and were supervised by their teachers. They had 45 minutes to complete the survey.
Supplementary materials, data, and all analytical scripts are archived on the project page at the Open Science Framework: https://osf.io/3ry8b/. The study underwent ethical review and received approval from the Institutional Review Board at the HUN-REN Center for Social Sciences, Budapest.
Experimental procedure
Figure 1 shows the experimental procedure. First, students answered a baseline question about their a priori task-specific math self-concept. They then solved the grade-specific math test, followed by the treatment. After the treatment, students evaluated their task-specific math self-concept for the second time. The questionnaire ended with placebo outcome questions to check whether the treatment effect targeted task-specific self-concept (as intended) or had a broader impact on other outcomes.

Figure 1. The experimental procedure.
Treatment
The treatment was integrated into the survey and provided participants with positive feedback. After students had solved the math test, the positive feedback appeared on the computer screen as an automated message. The translated English version of the Hungarian treatment message was: “You did an outstanding job on the math test! As your test score reflects your ability, you should be really proud of yourself as you are a bright and intelligent student.” 4 Control group students did not receive positive feedback but were directed to proceed to the next question instead.
The treatment combined two types of feedback, in line with the distinction made by Hattie and Timperley (2007). On the one hand, the treatment provided task-specific feedback telling students how well they did on a particular task (“You did an outstanding job on the math test”). Task-specific feedback is known to be effective in improving strategies and enhancing self-regulation. On the other hand, the feedback included information about the student as a person (“You are a bright and intelligent student”). Self-specific feedback is considered less effective than task-specific feedback. The two types of feedback were mixed in the treatment message to reflect how teachers typically provide feedback in their daily school routines, as teachers often mix task-specific and self-specific feedback (Airasian, 1997; Bennett and Kell, 1989).
Responding to the limitations of the belief updating literature (Buser et al., 2018; Coutts, 2019; Eil and Rao, 2011; Ertac, 2011; Möbius et al., 2022), the treatment offered absolute criteria for self-evaluation. It also eliminated the potential inequality-exacerbating effect of negative feedback since only positive feedback was provided as the treatment.
Randomization and balance
The randomization of the treatment occurred at the individual level and was based on the value of a randomly generated number. Approximately half of the students were randomly assigned to the treated group, while the other half were assigned to the control group.
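As a minimal sketch of individual-level assignment driven by a random number, consider the following. The function name, the seed, and the 0.5 cutoff are assumptions introduced for illustration; the survey software's actual mechanism is not described beyond the use of a randomly generated number.

```python
import random


def assign_treatment(student_ids, seed=2020):
    """Assign each student to treatment (1) or control (0) independently.

    One uniform random number is drawn per student; a value below 0.5
    means 'treated'. The seed and the cutoff are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return {sid: int(rng.random() < 0.5) for sid in student_ids}


students = range(1, 1254)  # 1253 hypothetical student IDs
assignment = assign_treatment(students)
share_treated = sum(assignment.values()) / len(assignment)
print(f"Share treated: {share_treated:.3f}")
```

Because each student is assigned independently, the two groups are approximately, not exactly, equal in size, and treated and control students can sit in the same classroom.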
Individual-level randomization had two significant implications for the study design. First, it led to heterogeneity within the same classroom, meaning that treated and control students could potentially be classmates. This setup provides greater statistical power than randomization at the classroom level. Second, students received the treatment regardless of their actual performance on the test. This approach allows for the establishment of the net treatment effect without the potential contamination of prior performance.
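The power advantage over classroom-level randomization can be illustrated with the standard design-effect formula for clustered designs, 1 + (m − 1) × ICC, where m is the cluster size. The sketch below is a rough illustration only: the intraclass correlation (ICC) is a hypothetical value not reported in the text, and the formula assumes equally sized classrooms and no covariate adjustment.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for cluster-level assignment: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


n_students = 1253        # sample size reported in the text
median_class_size = 16   # median classroom size reported in the text
assumed_icc = 0.10       # ASSUMPTION: hypothetical intraclass correlation

deff = design_effect(median_class_size, assumed_icc)
effective_n = n_students / deff  # sample size after the clustering penalty
print(f"Design effect: {deff:.2f}, effective sample size: {effective_n:.0f}")
```

Under these assumed values, classroom-level randomization would carry roughly the information of a much smaller individually randomized sample, which is the sense in which assigning treatment within classrooms preserves power.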
The randomization resulted in a good balance between students assigned to the treated and control groups. As Table A2 in the Appendix summarizes, differences between the treated and control groups were substantively small and statistically not significant at the 5% significance level.
Measurement of variables
Baseline variables
Survey questions asked about students’ academic self-concept (Eccles, 1983; Eccles et al., 1989; Musu-Gillette et al., 2015) and their task-specific math self-concept, followed by a grade-specific math test. Because both the self-concept questions and the math test were administered before the treatment, these measures serve as baseline variables.
Students’ academic self-concept in math was measured by the standardly used and validated survey question by Eccles: “In your opinion, how good are you at math?” (Eccles, 1983; Eccles et al., 1989; Musu-Gillette et al., 2015). Answer categories ranged from 1 (“I am very bad at math”) via 4 (“I am average at math”) to 7 (“I am very good at math”).
Students assessed their task-specific math self-concept twice in the questionnaire: once before the grade-specific math test and again after the treatment. The initial survey question was as follows: “You will shortly solve a short math test. Before you start, please let us know how good you are at math tests.” To respond, students used a scale from 0 to 10, where 0 indicated “I am not good at all” and 10 indicated “I am excellent.” Students could choose any number between 0 and 10 to express their opinion more accurately. 5
Students’ baseline task-specific math self-concept exhibited a significant correlation with the commonly used Eccles question for academic math self-concept, with a correlation coefficient of 0.8 (p < 0.01). This high correlation coefficient suggests that students’ general math self-concept is highly related to their task-specific (math-test-related) self-concept.
The grade-specific math test used in this study was developed by the Hungarian Educational Authority, drawing on questions from the test banks of the PISA-like National Assessment of Basic Competencies. The test consisted of six problems where students had to employ their math knowledge to solve practical exercises. Test scores refer to the proportion of correct answers, so the variable ranges between 0 and 1.
The homeroom teacher reported other baseline variables. Students’ math grade refers to the end-of-term school mark from the second (Spring) semester of the 2019/20 academic year, the school year before the experiment. Math grades are assigned as integers ranging from 1 to 5. The grading scale is defined as follows: excellent (5), good (4), average (3), satisfactory (2), and unsatisfactory (1). Teachers also reported students’ binary gender.
Descriptive statistics about the baseline variables are summarized in Table 1. Participating students were 12.95 years old on average (SD = 1.21), and 45% were girls. Students had medium-level math performance: their average score on the grade-specific math test was 51%, and their average teacher-awarded math grade was 3.5 on a scale of 1 to 5. The average student rated themselves slightly above average: their ASC in math (Eccles question) was 4.45 (SD = 1.63) on the seven-point scale, whose theoretical midpoint is 4. Similarly, the average student rated their task-specific math performance above average, scoring 5.29 on a scale from 0 to 10 with a theoretical midpoint of 5. These figures are consistent with the well-documented above-average effect (Kruger and Dunning, 1999).
Descriptive statistics of baseline variables.
ASC: academic self-concept.
The variables are defined as follows:
Girl: A dummy variable (0/1), where 1 refers to girls and 0 refers to boys.
Age: the difference between the date of the actual survey and the student’s birthday divided by 365. This ranges from 9.55 to 15.84 years.
Math grades: Integers from 1 to 5, defined as 5 (excellent), 4 (good), 3 (average), 2 (satisfactory), and 1 (unsatisfactory).
Math test scores: Range between 0 and 1, based on the percentage of correct answers in the test.
ASC in math (Eccles-question): Scaled from 1 to 7, where 1 means “I am very bad at math,” 4 means “I am average at math,” and 7 means “I am very good at math.”
A priori task-specific math self-concept: Scaled from 0 to 10, where 0 means “I am not good at all,” and 10 means “I am excellent.”
Treated: A dummy variable (0/1), where 1 indicates treated students.
Figure 2 presents descriptive statistics about the initial gender gap in baseline task-specific math self-concept. The left panel of Figure 2 shows that girls outperformed boys in teacher-awarded math grades by 0.14 SD units (p = 0.052). However, their performance on math tests did not differ significantly from boys, with a standardized mean difference of −0.05 SD units (p = 0.25).

Initial gender gap in math performance and self-concept.
The right panel of Figure 2 shows the gender difference after adjusting for math grades. The negative gap means that, on average, boys score higher than girls. The gender difference in math ASC (Eccles question) is 0.11 SD units (p = 0.03). An even larger gender gap can be seen in students’ task-specific math self-concept, with girls’ estimates being 0.18 SD units lower than boys’ (p = 0.01). These significant gender differences in self-concept highlight the need for strategies to enhance girls’ self-concept, especially given that girls’ math grades are, if anything, higher than boys’.
Outcome variables
Following the treatment, all students completed a set of outcome questions using a 0–10 scale. Initially, they responded to the same task-specific self-concept question for the second time: “Please let us know how good you are at math tests?” This question is referred to as the endline task-specific math self-concept to distinguish it from the baseline task-specific math self-concept asked before the math test and treatment.
Additionally, all students answered placebo outcome questions regarding their current mood, including the following questions: “How happy do you feel?”, “How inspired do you feel?”, “How much do you feel that people acknowledge you?”, and “How much do you feel that people respect you?” These questions were posed only once, as they had not been asked prior to the treatment. The purpose of including these placebo questions was to discern whether the treatment effect targeted task-specific self-concept (as intended) or had a broader, less specific impact.
Descriptive statistics concerning the outcome variables are provided in Table 2.
Descriptive statistics of the outcome variables assessed after the treatment.
Responses to each variable could be given on a scale ranging from 0 to 10. Endline task-specific math self-concept: Scaled from 0 to 10, where 0 means “I am not good at all,” and 10 means “I am excellent.” Concerning the other outcome variables, 0 means “Not at all,” and 10 means “To a very great extent.”
Empirical strategy
To estimate the treatment effect and test H1, I used equations (1) and (2). Both are classroom fixed-effect ordinary least squares (OLS) linear regression models. Fixed-effects regressions are preferred to control for unobserved heterogeneity at the classroom level in the form of teacher effects that could bias the results. For example, classrooms might differ in the positive feedback they received initially from the teacher, which could lead to differences between classrooms in how students respond to the treatment. Since the classroom fixed-effect regression compares treated and control students within the same classrooms, all unobserved differences between classrooms are controlled for.
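The within-classroom comparison can be sketched in a few lines. This is a minimal illustration of a fixed-effects (within) estimator for a single regressor, not the study's estimation code; the function name `within_classroom_effect` and the toy data are hypothetical.

```python
from collections import defaultdict

def within_classroom_effect(rows):
    """Classroom fixed-effects OLS slope for a single regressor.

    rows: iterable of (classroom_id, treated, outcome) tuples.
    Demeaning both variables within classroom removes every
    classroom-level constant, whether observed or not.
    """
    groups = defaultdict(list)
    for classroom, treated, outcome in rows:
        groups[classroom].append((treated, outcome))

    num = den = 0.0
    for members in groups.values():
        mean_t = sum(t for t, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for t, y in members:
            num += (t - mean_t) * (y - mean_y)
            den += (t - mean_t) ** 2
    return num / den

# Made-up data: classroom B answers 3 points higher across the board,
# yet the treated-control difference within each classroom is 1.
toy = [("A", 1, 5.0), ("A", 0, 4.0), ("A", 1, 5.0), ("A", 0, 4.0),
       ("B", 1, 8.0), ("B", 0, 7.0), ("B", 0, 7.0)]
effect = within_classroom_effect(toy)
```

In this toy example, a pooled comparison of treated and control means would give 0.5, because the treated share differs between the two classrooms. The within-classroom estimator recovers the common difference of 1.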
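In equation (1), the dependent variable is students’ endline task-specific math self-concept, and the explanatory variable is the treatment indicator.
The coefficient of interest is the coefficient of the treatment indicator.
In estimating gender heterogeneity in the treatment, the parameters of interest are the coefficients of the treatment indicator and of its interaction with the girl dummy.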
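As a hedged sketch, the four specifications referred to in this paper can be written as follows. All symbols are assumed notation introduced for illustration: $Y_{ic}$ is the endline task-specific math self-concept of student $i$ in classroom $c$, $T_{ic}$ the treatment dummy, $G_{ic}$ the girl dummy, $X_{ic}$ a vector of baseline controls, and $\mu_c$ the classroom fixed effect. The exact content of $X_{ic}$ is an assumption, beyond the a priori task-specific math self-concept that the text names for equation (4).

```latex
\begin{align}
Y_{ic} &= \beta_1 T_{ic} + \mu_c + \varepsilon_{ic} \tag{1}\\
Y_{ic} &= \beta_2 T_{ic} + X_{ic}'\gamma_2 + \mu_c + \varepsilon_{ic} \tag{2}\\
Y_{ic} &= \beta_3 T_{ic} + \delta_3 G_{ic} + \theta_3 (T_{ic} \times G_{ic}) + \mu_c + \varepsilon_{ic} \tag{3}\\
Y_{ic} &= \beta_4 T_{ic} + \delta_4 G_{ic} + \theta_4 (T_{ic} \times G_{ic}) + X_{ic}'\gamma_4 + \mu_c + \varepsilon_{ic} \tag{4}
\end{align}
```

Under this notation, $\beta_3$ is the treatment effect for boys and $\beta_3 + \theta_3$ the treatment effect for girls. The gender gap is $\delta_3$ in the control group and $\delta_3 + \theta_3$ in the treated group, so $\theta_3$ measures both the gender difference in the treatment effect and the change in the gender gap.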
Results
Descriptive results
Figure 3 shows the raw mean difference in task-specific math self-concept, calculated as the change from baseline to endline. The data is categorized by treatment status (control/treated) and gender (boys/girls).

Change in task-specific math self-concept from baseline to endline, categorized by treatment status (control/treated) and gender (boys/girls).
In the treatment group (represented by black bars), students experienced a positive update in their self-concept. In the control group (represented by white bars), where students did not receive positive feedback, both girls’ and boys’ self-concept decreased after completing the math test. The negative shift in the control group suggests that engaging in a demanding task can reduce self-concept. Such negative updates underscore the importance of a surplus in self-concept when undertaking challenging tasks. The decrease in self-concept after completing ability-demanding tasks also emphasizes the importance of replenishing reduced self-concept through positive feedback.
The figure also illustrates the treatment effect, which is represented by the difference between the white and black bars. Girls showed a more pronounced treatment effect compared to boys, as girls in the control group experienced a sizable decline in their task-specific math self-concept after completing the math test (leading to a larger difference between the white and black bars). In contrast, boys in the control group showed a smaller decline in their task-specific self-concept (thus, the difference between the white and black bars is smaller for boys than for girls).
Main treatment effect: the test of H1
Table 3 presents the estimations using both equations (1) and (2). The intervention resulted in a significant improvement in students’ task-specific math self-concept. The treatment effect is equal to a 0.3–0.4 unit improvement on the natural scale of the dependent variable, where the average of students’ task-specific math self-concept is 5.25. Expressing this treatment effect in SD units yields a small effect size between 0.12 and 0.15 SD units. Thus, the results provide causal support for H1, which posits that receiving positive feedback increases students’ task-specific math self-concept.
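The conversion between the natural scale and SD units can be checked with a short calculation. The standard deviation below is implied by the rounded figures reported above rather than quoted from Table 2, so it should be read as approximate.

```latex
\[
d = \frac{\hat{\beta}}{SD_Y}
\quad\Longrightarrow\quad
SD_Y \approx \frac{0.3}{0.12} = 2.5
\quad\text{and}\quad
SD_Y \approx \frac{0.4}{0.15} \approx 2.67
\]
```

Both bounds imply an outcome standard deviation of roughly 2.5 to 2.7 points on the 0–10 scale.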
Treatment effect on endline task-specific math self-concept—unstandardized OLS regression coefficients.
Robust standard errors (clustered at the school level) are in parentheses. ASC: academic self-concept.
** p < 0.01, * p < 0.05, +p < 0.1.
All models include classroom fixed effects. Missing values in the baseline variables are replaced by zero, and a separate dummy variable controls for missing status (these variables are not included in the table).
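The handling of missing baseline values described in the table note can be sketched as follows. This is a minimal illustration of the missing-indicator approach; the function name `add_missing_indicator` and the example grades are hypothetical.

```python
def add_missing_indicator(values):
    """Return (filled, missing_flag) lists for one baseline variable.

    Missing entries (None) are replaced by zero, and a 0/1 dummy records
    which entries were missing, so no student is dropped from the model.
    """
    filled = [0.0 if v is None else float(v) for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return filled, missing_flag

# Hypothetical math grades with one missing teacher report.
grades = [4, None, 5, 2]
filled, flag = add_missing_indicator(grades)
```

Both the filled variable and its dummy enter the regression. The dummy absorbs the mean difference for students with missing values, so the arbitrary zero does not distort the coefficient of the filled variable.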
Treatment heterogeneity by gender: the test of H2
Figure 4 visualizes the heterogeneity in the treatment effect using both equation (3) (point estimates depicted with black circles) and equation (4) (point estimates depicted with gray diamonds), with the full regression models presented in Table A4 in the Appendix. Unlike the main treatment effect, which is causal, the analysis of treatment heterogeneity is exploratory and not causal, as students’ gender could not be randomized in the same way as the treatment.

Treatment effect by gender.
The treatment effect appears statistically insignificant for boys (
A similar pattern is observed in gender-specific treatment effect when controlling for students’ a priori task-specific math self-concept (equation (4)). The treatment effect primarily concentrates on girls, yielding a statistically significant treatment effect among them (
Related to the insignificant gender difference in the treatment effect, the gender gap also does not differ between treated and control groups. Relying on the results estimated by equation (3), the initial gender gap in the control group (
Robustness checks
Several robustness tests were conducted to validate the results. Figure 5 presents a sensitivity analysis of treatment heterogeneity based on students’ initial math performance (as measured by math grades and recent test scores) and academic self-concept (measured by the Eccles question and task-specific math self-concept). The findings indicate that the positive feedback had a substantially larger impact on students with higher initial math grades and stronger ASCs. However, the treatment also had a positive impact on students with average or low math test scores and self-concept, though it did not influence those with average or low math grades.

Treatment heterogeneity according to (level of) math performance and baseline math self-concept.
Treatment heterogeneity based on students’ recent math test scores is particularly relevant. Students with high test scores received (honest) positive feedback as a reward for their excellent performance, while those with average or low scores received biased positive feedback that inaccurately acknowledged their performance as excellent. Notably, no treatment heterogeneity was observed based on recent math test performance. Students scoring above 70% experienced similar positive changes in their task-specific self-concept as those scoring below 70%. This suggests that the honesty of positive feedback does not generate treatment heterogeneity.
Treatment heterogeneity based on prior math performance was further analyzed with a focus on gender differences, comparing boys and girls with either high or average/low math performance. When defining math performance according to prior math grades, the treatment effect did not differ significantly between boys and girls, regardless of whether their prior grades were average/low or high (see the left panel of Figure A5 in the Appendix). However, when math performance was defined by recent test results, girls who performed well on the math test showed a greater treatment effect than similarly achieving boys (see the right panel of Figure A5 in the Appendix). This suggests that girls’ self-concept is more responsive to honest, positive feedback about their recent performance than that of boys.
Further, robustness checks suggest that the treatment exerted its effect as intended and targeted only students’ task-specific math self-concept. The positive feedback intervention did not affect students’ feelings of happiness, inspiration, acknowledgment, or respect, as the treatment effect was statistically insignificant and substantively small across all models (Table A6 in the Appendix). 7 Although there were notable gender differences in each of these variables, with girls indicating lower values than boys, the treatment effect did not substantially differ between boys and girls.
Discussion and conclusion
There has been extensive research into the gender gap in education (Barone, 2011; Barone and Assirelli, 2020; Charles and Bradley, 2009; DiPrete and Buchmann, 2013; Kriesi and Imdorf, 2019), though less attention has been paid to the potential tools for mitigating this gap (DiPrete and Fox-Williams, 2021; Legewie and DiPrete, 2012). This study focused on girls’ biased self-concepts that may hinder their engagement in math-intensive educational choices (Correll, 2001; Nagy et al., 2006; Oakes, 1990; Ridgeway, 2011; Sax et al., 2015; Seymour, 1995; Vinni-Laakso et al., 2019) but which can be changed through feedback (Buser et al., 2018; Coutts, 2019; Eil and Rao, 2011; Ertac, 2011; Möbius et al., 2022). The study analyzed how positive performance feedback (as a scalable tool in education) affects students’ task-specific math self-concept and how it mitigates the initial gender gap in self-concept (which typically favors boys).
A randomized field experiment was conducted with rural Hungarian primary school students to test how their task-specific math self-concept responds to a single instance of randomized positive feedback, representing the smallest possible dose of positive feedback intervention. After answering a baseline question about their task-specific math self-concept and completing a short grade-specific competency-based math test, students received either positive absolute performance feedback or no feedback, determined by a random algorithm and independent of actual performance on the math test. Following the treatment, students were asked about their task-specific math self-concept for a second time.
The design extends prior literature on belief updating in several key ways (Buser et al., 2018; Coutts, 2019; Eil and Rao, 2011; Ertac, 2011; Möbius et al., 2022). First, it targeted primary school-aged students with positive feedback, as their academic self-concepts (ASC) may be more malleable than university students’ ASC; prior experiments have primarily focused on university students with better-established, less malleable ASCs. Second, students received absolute performance feedback rather than relative feedback. This means that instead of making relative comparisons to peers, students evaluated their ability in general, which aligns more closely with everyday settings (Moore and Klein, 2008). Third, unlike prior research that provided indications about the likelihood of feedback being true or false (Buser et al., 2018; Coutts, 2019; Eil and Rao, 2011; Ertac, 2011; Möbius et al., 2022), students in this study received feedback without any indication of its accuracy. 8 This approach was chosen to better reflect real-world settings, where individuals often receive feedback without being informed of its truthfulness and must judge its validity themselves. Last, students received only positive feedback to avoid any potential decline in self-concept, which could exacerbate the gender gap. 9
The results showed that, on average, treated students experienced an improvement in their self-concept compared to those in the control group, who did not receive positive feedback. Since the treatment was randomized, the treatment effect is causal. The treatment effect was particularly pronounced for girls, with a significant positive effect that was over 50% larger than the treatment effect for boys, which was positive but not statistically significant. However, the non-causal gender difference in the treatment effect and the associated reduction in the gender gap between treated and control students were not statistically significant. Therefore, while a single instance of positive feedback improves students’ ASC, it does not reduce the gender gap in self-concept. Nevertheless, because girls’ self-concept was particularly responsive to positive feedback, while boys’ self-concept was not harmed by it, the results suggest that increasing the frequency of this one-shot positive feedback could serve as a policy lever to improve girls’ self-concept.
The results might have implications for status inequality in self-concept. Given that low-status students tend to make lower assessments of their abilities and high-status students tend to make higher ones (Sullivan, 2006), providing targeted positive feedback exclusively to low-status students might help mitigate existing status inequality in self-concept. However, status inequality in self-concept might increase if all students receive positive feedback.
Future research could employ two major strategies to follow up on the insignificant estimates of gender differences in the treatment effect. First, the intensity of the positive feedback could be enhanced by increasing the frequency of the one-shot positive feedback. Delivering feedback orally, especially from a socially significant individual, may also enhance its impact compared to the automated written feedback used in this study. Second, statistical power could be increased by enlarging the sample, which consisted of over 1200 students in this study. While this sample size is not small, it may not be sufficiently large to identify treatment heterogeneity of this magnitude.
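The point about statistical power can be illustrated with a back-of-the-envelope calculation. This is a rough sketch under strong simplifying assumptions that are not taken from the study: a standardized outcome, a 50/50 treated-control split within each gender, no covariates, no clustering, and the conventional multiplier of 2.8 (5% two-sided significance, 80% power). The function names and the round sample numbers are hypothetical.

```python
import math

def mde_main(n_total, multiplier=2.8):
    """Minimum detectable effect (SD units) for a 50/50 treated-control split."""
    return multiplier * math.sqrt(4 / n_total)

def mde_interaction(n_girls, n_boys, multiplier=2.8):
    """MDE (SD units) for the girl-boy difference in treatment effects.

    The difference of two independent subgroup effects has variance
    4/n_girls + 4/n_boys when each subgroup is split 50/50.
    """
    return multiplier * math.sqrt(4 / n_girls + 4 / n_boys)

# Assumed round numbers: 1200 students, 45% girls (540 girls, 660 boys).
main = mde_main(1200)
interaction = mde_interaction(540, 660)
```

Under these assumptions, the minimum detectable main effect is roughly 0.16 SD units, whereas the minimum detectable gender difference in treatment effects is roughly 0.32 SD units, about twice as large. Baseline controls would lower both figures, but the ratio shows why a sample adequate for the main effect can be too small for heterogeneity.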
In terms of study design, some specific issues warrant discussion. Concerns may arise that self-concept enhancement is not always beneficial. The positive effect of empowerment could lead to inflated self-concepts that are not grounded in reality and could contribute to overconfidence (Kruger and Dunning, 1999; Moore and Healy, 2008).
In contrast, prior research has suggested that positive self-perceptions serve important purposes, such as assisting in the pursuit of goals, influencing others, and tackling complex and challenging tasks (Schwardmann and van der Weele, 2019). This study further expanded on this reasoning by suggesting that individuals might need to have a certain level of overconfidence to engage in tasks that require their abilities. This is because engagement in ability-demanding tasks can potentially lower students’ self-concept, as observed in the control group (Figure 3). Consequently, some students may refrain from challenging themselves in demanding situations due to a lack of positive self-concept (Epstein, 1973). 10 This might be particularly evident in educational choices such as applying to knowledge-intensive educational programs or pursuing STEM fields where the possibility of rejection/failure is high (which would send negative signals about students’ abilities and decrease self-concept). To engage in these risky choices, students may need a degree of overconfidence.
A related critique concerns the relevance of feedback that was not directly related to students’ actual performance and, at least for some students, provided false information. Consequently, the result might also be interpreted as girls being particularly susceptible to disinformation induced by biased feedback.
However, girls were not more susceptible to potential misinformation than boys as the treatment effect does not differ between high/low performing boys and girls if performance is defined based on students’ prior math grades (see the left panel of Figure A5 in the Appendix for a reference). Indeed, girls were more receptive to honest and positive feedback about their recent math performance than boys (see the right panel of Figure A5 in the Appendix). Consequently, girls can translate positive feedback about their recent performance into a larger self-concept improvement than boys—an important message that educational practitioners should consider.
A further potential concern is that providing students with positive signals about their abilities without a direct connection to actual performance is artificial. This argument can be further expanded to raise questions about the study's external validity and whether schools would actually implement this type of feedback.
However, students shape their understanding of their abilities through the feedback they receive, making the development of self-concept an ongoing learning process. Throughout this process, students internalize feedback and cultivate a positive self-image in areas where they have been recognized and acknowledged by their environment (Hattie and Timperley, 2007). Research has also highlighted how social status differences in parenting styles can contribute to the achievement gap observed among students (Bradley and Corwyn, 2002; Kalil and Ryan, 2020). Differences in children's empowerment may be one aspect of differences in parenting styles (Gunderson et al., 2013; Hoff et al., 2002). Furthermore, pedagogical research has underscored the importance of praising students, even for small accomplishments that can be observed among all students regardless of their overall academic performance (Burnett, 2002; Floress and Jenkins, 2015). As a result, the treatment is not far removed from real-life practices since parents (and sometimes also teachers) deliberately create opportunities to praise and support students.
Last, concerns could be raised about the small effect size, and skepticism expressed about the design, which rendered the outcome of the intervention immediate, both temporally (the two self-concept questions were separated only by the math test and the treatment) and in content (the same question was used for the baseline and endline task-specific self-concept). Consequently, it may seem surprising that, despite the favorable experimental design, only small treatment effects were found.
However, the modest treatment effect should be interpreted in light of the anchoring effect, which describes people's tendency to maintain consistency in their responses (Furnham and Boo, 2011; Tversky and Kahneman, 1974). Therefore, in this study, students might have recalled their initial answers and provided a similar response to the second task-specific self-concept question. Thus, a significant shock may be required to disrupt the status quo and produce a substantial improvement in self-concept. Furthermore, the favorable experimental design employed in this study was deliberately crafted to maximize the chance of detecting a treatment effect, laying the foundation for more targeted future research that might be able to investigate less proximate outcomes, perhaps even outcomes associated with math achievement.
In conclusion, this study showed that even a single instance of positive feedback can temporarily improve students’ self-concept. However, this approach cannot work as a panacea to reduce gender inequalities. Nevertheless, as girls were particularly responsive to positive feedback while boys were not harmed by it, positive feedback has the potential to serve as an effective policy lever for enhancing girls’ self-concept if the frequency of the one-shot treatment is increased. Therefore, more intensive positive-feedback treatments may hold promise for mitigating or closing the gender gap in educational decisions related to math, such as selecting classes with specialized math courses or pursuing STEM fields at the college level—decisions that, to some extent, all rely on math self-concept (Correll, 2001; Nagy et al., 2006; Oakes, 1990; Ridgeway, 2011; Sax et al., 2015; Seymour, 1995; Vinni-Laakso et al., 2019).
Supplemental Material
sj-docx-1-asj-10.1177_00016993241309552 - Supplemental material for The effect of positive feedback on primary school students’ academic self-concept: Gender heterogeneity in a light-touch randomized intervention
Supplemental material, sj-docx-1-asj-10.1177_00016993241309552 for The effect of positive feedback on primary school students’ academic self-concept: Gender heterogeneity in a light-touch randomized intervention by Tamás Keller in Acta Sociologica
Footnotes
Acknowledgment
The author thanks Carlo Barone and Nevena Kulic and the audiences at the ISA RC28 Spring Meeting 2024 in Shanghai and the Meeting of the Economics of Education Association in 2023 (AEDE) in Santiago de Compostella.
Funding
The research was supported by grants from the Hungarian National Research, Development and Innovation Office (NKFIH), Grant number K-135766; the János Bolyai Research Scholarship of the Hungarian Academy of Sciences (BO/ 00569/21/9) and the New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund (Grant Number: ÚNKP-23-5-CORVINUS-149). Funding from the Horizon Europe project ‘EFFEct’ grant (no. 101129146) is also gratefully acknowledged.