Abstract
Generative artificial intelligence (GenAI) has the potential to change student learning. Despite the popularity of integrating this novel technology into teaching and learning practices, few meta-analyses have synthesised its effect in the education context with K-12 and college students. This review examined the effects of GenAI interventions on student academic performance. A total of 19 studies with 24 effect sizes were included. These studies either compared the GenAI group with control groups (n = 17, k = 22) or applied a repeated-measure design (n = 2, k = 2). The results revealed an overall large effect size (g = 0.683), supporting the arguments that GenAI can positively affect student academic achievement. Students with teacher support in the student-GenAI interaction have significantly larger gains (g = 1.426) than those without teacher support (g = 0.077). No other significant moderators were identified. We concluded by discussing the implications for policy and practice and provided suggestions for future research.
The development of artificial intelligence (AI) and its integration into teaching and learning practices are transforming education. For instance, AI can facilitate student learning by providing adaptive feedback (Liang et al., 2024) and visual performance reports (Liao et al., 2024). The launch of OpenAI’s ChatGPT in November 2022 attracted public attention to one subset of AI, i.e., generative artificial intelligence (GenAI). GenAI can create new content based on generative models, making it distinctive from traditional non-generative models that focus on prediction, classification, or optimisation (Rashidi et al., 2024; Saish et al., 2025). Such a generative ability of GenAI tools resulted in a surge of global interest, with ChatGPT reaching more than one million users in just five days (Yu, 2023). Now, it has over 400 million weekly active users, according to OpenAI’s spokesperson in February 2025.
In educational contexts, GenAI tools hold the potential to facilitate student learning. By engaging in realistic interactions with learners, GenAI tools not only provide answers from various disciplines, but also generate examples, recognise errors, and remember the context of the dialogue (Imran & Almusharraf, 2023; OpenAI, 2022). However, concerns have been raised about the over-reliance on GenAI, as it may lead to plagiarism and a decline in critical thinking ability (e.g., Lo et al., 2024). Extant empirical studies have also revealed conflicting results regarding the influence of GenAI on student academic achievement, with some studies revealing positive effects (e.g., Wu et al., 2024) and others reporting no effects (e.g., Escalante et al., 2023) or even negative effects (e.g., Niloy et al., 2024).
Given the rapid development of GenAI and mixed findings in existing research, there is a need to systematically synthesise the effects of GenAI interventions on student academic performance and explore potential moderating factors. As one of the first few meta-analytical reviews that narrowed the research scope from AI to GenAI, the current study aimed to (a) estimate the overall effects of GenAI interventions on student academic performance and (b) investigate the potential moderators that influence their effectiveness in educational settings. Following PRISMA guidelines, we searched four databases to identify eligible studies and estimated the overall effect size. We also conducted sensitivity and moderator analyses to assess the robustness of the results and identify influential factors. The results can deepen the understanding of the effects of GenAI interventions on student learning outcomes and guide the effective use of GenAI tools in future educational practices.
What is GenAI?
GenAI is a type of AI technology that uses machine learning models “to learn the patterns and relationships in a dataset of human-created content” and “use the learned patterns to generate new content” (Google, 2023, How does generative AI work? Section, para. 1). More specifically, GenAI mainly uses a subset of machine learning, namely deep learning (Kalota, 2024; Strobel et al., 2024), to produce “previously unseen synthetic content, in any form and to support any task, through generative modelling” (Peñalvo & Ingelmo, 2023, p. 14).
Based on different generative models and techniques, GenAI can create new content in different forms (e.g., texts, images, and audio files) (Bengesi et al., 2024). For instance, large language models (LLMs), which emerged in 2017, are primarily designed to process and generate texts (e.g., OpenAI’s ChatGPT and Google’s Bard) (Law, 2024). Unlike previous AI chatbots (e.g., Apple’s Siri) (Kietzmann & Park, 2024), LLMs can analyse and summarise online content, generating new responses in a conversational format across various fields. This process resembles how humans produce novel texts from learned knowledge (Barrett & Pack, 2023). Beyond text generation, other GenAI tools like Midjourney for image creation and Sora for video production can also significantly impact education (Chiu, 2024; Liu et al., 2024).
Effects of GenAI Interventions on Student Academic Performance
GenAI interventions in this review refer to the integration of GenAI tools into teaching or learning practice with an aim to influence students’ learning outcomes. The extant literature has summarised various tasks that GenAI could help with in educational settings. GenAI could serve multiple roles in enhancing teaching effectiveness, working as a “guide on the side” for content generation, a “co-designer” for curriculum development, and an “exploratorium” for assessment analysis and learning recommendations (Sabzalieva & Valentini, 2023). Beyond these functions, it could assist in lesson planning, student record tracking, course material translation, and student engagement (Alshraah et al., 2024). The integration of GenAI in teaching can increase teachers’ working efficiency (Law, 2024), build teaching confidence (Cheah & Kim, 2025), enhance pedagogical competency (Alshraah et al., 2024), and motivate teachers to adopt innovative teaching and assessment methods (Bower et al., 2024). Such improvements in teaching practice could positively impact students’ academic performance.
While the incorporation of GenAI into teaching practices plays an important role in students’ learning, students’ direct use of GenAI tools could exert a more straightforward influence on their academic performance. Grounded in constructivist theory and Vygotsky’s concept of the Zone of Proximal Development (ZPD), GenAI tools can facilitate student learning by providing personalised feedback that adapts to individual needs, guiding learners progressively through their development (Coenen & Pfenninger, 2024; Zhang et al., 2024). GenAI-generated immediate and diverse feedback helps students understand their current performance, identify learning gaps, and formulate future goals (Xia et al., 2024; Yan, 2022). GenAI tools can work as a “study buddy” to facilitate student self-reflection, a “Socratic opponent” to develop argumentation skills, or a “collaboration coach” to facilitate group work (Sabzalieva & Valentini, 2023). Studies also revealed that GenAI interventions can increase students’ motivation (e.g., Li, 2023; Song & Song, 2023) and self-efficacy (Teng, 2024) and decrease students’ anxiety and embarrassment (Hsu et al., 2023). However, there are concerns about potential risks for academic development, such as the negative impact on critical thinking skills and “metacognitive laziness” (Fan et al., 2025; Susnjak & McIntosh, 2024). Students’ use of AI to complete assignments may also diminish their motivation to develop skills, resulting in an educational crisis beyond academic dishonesty (Adeshola & Adepoju, 2023). Considering these impacts, this meta-analysis specifically examines students’ direct use of GenAI tools in their learning process, as understanding these effects is essential for informing educational practices.
A substantial body of empirical research has examined varied GenAI interventions among students, for instance, students receiving GenAI-generated feedback (Escalante et al., 2023) and engaging in formative assessment with GenAI-generated questions (Bachiri et al., 2023). However, the effectiveness of GenAI interventions on student academic performance varied across studies. Some studies found significant improvement in student academic achievement in the GenAI group (e.g., English writing, Liu & Xiao, 2024; Song & Song, 2023); some studies showed no significant differences between the GenAI and control groups (e.g., language learning, Escalante et al., 2023; and theoretical medical knowledge, Ba et al., 2024); and some studies revealed that students who used ChatGPT alone performed worse than those who received teacher instruction (e.g., mathematics, Dasari et al., 2024). Such inconclusive results reveal the need for a meta-analysis to systematically synthesise the overall effectiveness of GenAI interventions on student academic performance and the factors that moderate the effects.
Previous Reviews
The extant reviews about GenAI have mainly explored this new technology through systematic review (e.g., Lo et al., 2024), scoping review (e.g., Preiksaitis & Rose, 2023), and bibliometric analysis (e.g., Bahroun et al., 2023). As for meta-analyses, scholars investigated users’ perceptions of GenAI (e.g., Leiter et al., 2024) and the accuracy of GenAI’s outputs in exams (e.g., the diagnostic performance of GenAI compared with physicians, Takita et al., 2024; medical responses, Wei et al., 2024). Although these reviews provide insights into the trends, benefits, challenges, and performance of GenAI in educational settings, a comprehensive synthesis of the impact of GenAI interventions on academic performance has received less attention.
Only Sun and Zhou’s (2024) meta-analysis has specifically explored the effectiveness of GenAI interventions on student academic performance. They found that GenAI can enhance student academic performance with a medium effect size (g = 0.533). They also found that students significantly improved their academic performance when GenAI generated texts, when it was used in a collaborative learning approach, and when the study sample size was 21–40. However, they only focused on a particular group of students (i.e., college students). They did not rigorously assess the quality of the included studies and did not specifically focus on peer-reviewed journal articles, which is vital in this fast-moving field.
Since GenAI is a subset of AI technology, meta-analytical reviews on the effects of broader AI technology could provide some valuable insights. Previous meta-analyses about AI interventions on student learning generally show a positive influence. Zheng et al. (2023) selected articles from 2001 to 2020 and found that AI technologies (e.g., expert systems or agent systems, natural language processing, and mixed technology) had a high effect size on learning achievement (g = 0.812). The moderator analysis revealed a significantly large effect size with a substantial sample size (more than 300), junior and senior high school students, engineering and technological science students, AI being utilised in group settings, serving as policy-making advisors, and incorporating mixed hardware.
Wu and Yu (2024) performed a meta-analysis of 24 randomised studies to determine the effects of various types of AI chatbots on student learning outcomes, including machine learning-based chatbots, natural language processing-based chatbots, and hybrid chatbots. They also revealed that AI chatbots could significantly improve students’ learning performances (d = 1.028). Students who received shorter interventions (i.e., lasting less than ten weeks) at the higher education level experienced a greater effect.
Apart from the studies that comprehensively analysed the effects of AI in education, there are studies mainly focused on one learning domain. For instance, Wang et al. (2024a) found that AI chatbots produced an overall positive effect on language learning performance (g = 0.484) compared to students who did not use chatbots. Four significant moderators were identified: educational level, learners’ language levels, interface design (i.e., mobile-based interface vs. web-based interface), and interaction capability (i.e., chatbot-driven capability vs. user-driven capability). Other meta-analyses also revealed a positive effect of AI technology on student subject-specific learning (e.g., g = 0.351 for elementary students’ mathematical learning, Hwang, 2022; d = 1.18 based on within-group samples; d = 0.39 based on 35 between-group samples for language learning, Lee & Lee, 2024; g = 0.343 for K-12 students’ mathematical learning, Yi et al., 2024).
While Sun and Zhou’s (2024) meta-analysis and previous reviews about AI identified several factors that may moderate the effects of GenAI interventions, other influencing factors, specifically regarding GenAI and research methodology, may also affect the impact. Instructors can directly apply general GenAI tools, designed by big companies, such as OpenAI's ChatGPT, or use course-specific GenAI tools tailored for their students. More specifically, course-specific GenAI tools could be developed through two methods: 1) fine-tuned text-to-text models designed for a specific context, which is based on a selected knowledge base (e.g., Bachiri et al., 2023), and 2) models based on existing LLMs like ChatGPT but modified by research teams to include additional learning functions, for instance, constructing assignment databases and learning profile databases within the ChatGPT-based learning system (e.g., Li, 2023). The direct application of the publicly accessible tool is convenient; however, the generated feedback is not content- or course-specific (Xia et al., 2024). Feedback research shows that students may experience greater learning benefits when they receive feedback that is concrete and specific to the content or course they are studying (Olivera-Aguilar et al., 2022; Shute, 2008). Hence, we examined the different effects between general and course-specific GenAI tools.
Apart from the technical component, the human element, specifically teacher support in classrooms where GenAI tools are used, is less explored in the existing research (Kizilcec et al., 2024). According to Tardy’s (1985) social support framework, teacher support includes giving students informational, instrumental, emotional, and appraisal support (Malecki & Demaray, 2002). When students use GenAI, teachers can provide ongoing support by giving advice, resources, and feedback. For instance, teachers share learning resources through GenAI-based learning systems (e.g., Baba et al., 2024) and give feedback on GenAI-generated content (e.g., Wu et al., 2024). However, in some studies, teachers let students use GenAI without instruction or only offer initial training rather than continuous support throughout the learning process. Teacher support can help facilitate the incorporation of appropriate GenAI outputs into students' work (Su et al., 2023); otherwise, students may lack the capacity to critically assess the quality of GenAI outputs. Therefore, we considered teacher support a potential moderator that can enhance the positive effects of using GenAI tools in education.
Regarding research design, the control groups used in the study for comparison matter. In some studies, students in the control group received teacher instruction or feedback; however, in other studies, students in the control group used other resources, such as Google search, online databases, or textbooks. Feedback information from different agents (e.g., human teachers and technology) may affect student learning differently (Panadero & Lipnevich, 2022). Thus, it is important to test the effects of the comparison groups—those receiving no feedback, teacher feedback, or feedback from other resources—on the effects of GenAI interventions.
The research method can affect the influence of educational interventions (McMillan et al., 2013). Hence, the methodological characteristics of the selected studies were carefully assessed through the study quality (i.e., study design, sampling method, random assignment, confounder report and control, data source, instrument source, and withdrawals and attrition rate) and included in this study as a potential moderator.
In sum, while previous meta-reviews revealed an overall positive effect of AI on academic performance, a comprehensive synthesis exclusively on GenAI remains scarce. Considering variant effect sizes in empirical studies, some factors may influence the effects of GenAI on academic performance. Based on both the data-driven and theory-driven evidence, the potential moderators of GenAI interventions were grouped into three categories: 1) implementation of GenAI, including GenAI tool (general GenAI tools or course-specific GenAI tools), teacher support (with or without support) and intervention duration (equal to or less than one month, or longer than one month); 2) the context of the study, including educational level (K-12 or higher education) and discipline (natural and applied science, or humanities and social science); and 3) research design, including control group (no feedback, teacher feedback, or feedback from other resources), sample size (small or large), and study quality (strong, moderate or weak).
Contributions of the Present Study
The current review aimed to provide a comprehensive review of the effects of GenAI interventions on academic performance. First, rather than including broad AI technology, we specifically explored the effects of GenAI on student academic achievement. After the release of ChatGPT sparked a wave of interest in GenAI among educational researchers and practitioners, GenAI has now been widely discussed and used in the educational field. Therefore, it is essential to scrutinise the effects of this latest generation of AI technology on student academic achievement.
Second, we attempted to provide a comprehensive synthesis by including studies from different educational levels (i.e., K-12 and higher education) with different research designs (i.e., experimental studies, quasi-experimental studies, and studies with a repeated measures design). In this way, we could compare the influence of GenAI on students from different backgrounds and consider the influence of research design on the GenAI effects.
Third, our meta-analytical review carefully considered the quality of the included studies by only selecting peer-reviewed journal articles and critically assessing the quality of each selected study. Empirical studies with rigorous methodology are more likely to present robust evidence for the effectiveness of GenAI in this rapidly evolving field. The research questions (RQ) are listed below:
RQ1. What is the overall effect of GenAI interventions on student academic performance?
RQ2. What factors moderate the effects of GenAI interventions on student academic performance?
Method
Search Strategies and Databases
The literature search was conducted across four databases: ERIC, PsycINFO, Web of Science, and Scopus. These four databases were selected because they comprehensively cover journal articles in the field of AI in education. Three groups of keywords about GenAI, feedback, and education were combined to form the search string: (“GAI” OR “Generative AI” OR “Generative Artificial Intelligence” OR “ChatGPT” OR “GPT” OR “Large Language Models” OR “LLM” OR “AlphaCode” OR “GitHub Copilot” OR “Bard”) AND (“feedback” OR “assessment” OR “instruction” OR “scaffolding” OR “training”) AND (“school” OR “education” OR “colleague*” OR “Tertiary Education” OR “higher education” OR “teacher*” OR “student*”). The synonyms for GenAI were adapted from Bahroun et al.’s (2023) study, a bibliometric and content analysis on GenAI in education whose search string included both GenAI and specific GenAI tools, such as ChatGPT and Bard. The search included papers published after 2017, considering that the transformer (the ‘T’ in GPT, generative pre-trained transformer) was first announced that year (Law, 2024).
Selection of Studies
The literature search was performed on 20 May 2024 on the Title or Abstract fields, yielding 1938 results from the four databases. A total of 1310 studies remained for screening after removing duplicates. An eligible paper had to meet the following criteria: (a) it measured the effects of students using GenAI on student academic performance, (b) it adopted an experimental/quasi-experimental design comparing the GenAI group with a comparison group (no GenAI interventions, or an experimental group of non-AI feedback types, such as teacher feedback and peer feedback), or adopted a pre-post comparison design without a control group, (c) it provided effect size or sufficient information to calculate effect size (e.g., means, standard deviations, and sample sizes), and (d) it was written in English and published in a peer-reviewed journal. This review excluded other types of literature, such as book chapters and dissertations.
Studies were excluded if they (a) investigated the effects of GenAI on non-academic outcomes, such as student perception (Kelly et al., 2023), critical thinking skills (Guo & Lee, 2023), and creativity (Habib et al., 2024); (b) only compared the quality of GenAI-generated feedback with other feedback types but did not investigate the comparative effectiveness on student academic performance (Almasre, 2024; Banihashem et al., 2024); (c) explored the effects of teachers using GenAI during the teaching practice (e.g., preparing course materials) on student academic achievement (Ghafouri et al., 2024); or (d) were theoretical papers (Bearman et al., 2024), review papers (Baber et al., 2023; Zirar, 2023), editorial opinions (Crawford et al., 2023), or personal reflections (Keath et al., 2024).
The selection process was guided by Page et al.’s (2021) PRISMA 2020 flow diagram (see Figure 1). The identified articles were closely examined through title and abstract in the first round of screening and full texts in the second round of screening based on the selection criteria. Around 10% of the initially identified articles were examined by two coders to ensure the reliability of the screening process. Inter-rater reliability (kappa) between the two coders was 0.48, indicating a moderate agreement (Fleiss, 1971). The disagreements were resolved by discussing and consulting with an expert before proceeding to the next step. Finally, 19 studies with 24 effect sizes were selected for the meta-analysis.
Figure 1. PRISMA Flow Diagram for the Study Selection Process.
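The agreement statistic reported above can be illustrated with a minimal sketch. It assumes Cohen's kappa for two coders making include/exclude decisions; the source does not state the exact kappa variant, and the screening decisions below are hypothetical, not the review's data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who rated the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of items.")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if p_chance == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical screening decisions for 100 abstracts (illustration only).
coder_a = ["include"] * 10 + ["exclude"] * 90
coder_b = ["include"] * 6 + ["exclude"] * 4 + ["include"] * 6 + ["exclude"] * 84
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the coders agree on 90% of items, yet kappa is only moderate because most abstracts are excluded and chance agreement is therefore high. This is one reason a moderate kappa is common in title and abstract screening.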
Data Extraction
To enhance the consistency and reliability of the data extraction process, we developed a data extraction form tailored to the research questions. The form comprised five sections: (1) basic information about the studies, (2) implementation of GenAI, (3) context of the study, (4) research design, and (5) outcomes. An item was coded as missing data if the information was not reported in the selected studies.
The first section contained the basic information about the studies, such as title, author, journal, and publication year. The second section covered the implementation of GenAI tools, including GenAI tool, teacher support, and intervention duration. The whole research period was coded as intervention duration because the specific time of using GenAI was seldom reported in the studies. The third section related to the context of the study, including educational level and discipline. The fourth section covered the characteristics of the research design, including control group, sample size, and study quality. More specifically, the quality of the included studies was assessed against the modified Effective Public Health Practice Project (EPHPP) instrument (Thomas et al., 2004), including five dimensions: study design, participant selection bias, confounder report and control, data collection methods, and withdrawals and dropouts. The “blinding” dimension in the original instrument was excluded in this meta-analysis because of the difficulty of blinding participants or researchers in educational research (Noetel et al., 2021). The detailed checklist for assigning a score (i.e., strong, moderate, or weak) to each dimension of the included studies is presented in the Supplemental Material. The scores from the five dimensions were combined into the overall quality rating of each included study. The fifth section covered outcome variables and the data needed to estimate effect sizes (i.e., sample size, mean, and standard deviation).
Statistical Analyses
Effect size calculation was conducted separately for studies with and without control groups. For studies with a control or comparison group, Cohen’s d (Cohen, 1988) was used to calculate effect size. More specifically, for the studies with a control group and only post-test scores available, post-test scores were used in the formula; for those with a control group and both pre- and post-test scores, the change scores between the pre- and post-test results were used. For repeated-measure design studies without control groups, the effect sizes were calculated using Becker’s (1988) formula. As few correlations have been reported in published articles, drawing from related studies (Borenstein et al., 2021), this study adopted a pre-post correlation of 0.5, a conventional practice commonly used in many meta-analyses (e.g., Yan et al., 2022; Zhan et al., 2023). All Cohen’s d values were converted to Hedges’ g (Hedges, 1981) to correct for small-sample bias. A positive effect size suggests better learning gains for the GenAI group compared with the control group, or a positive effect after GenAI use in the single-group studies.
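The calculations described above can be sketched as follows. This is a minimal illustration using standard textbook formulas, not the authors' code: the between-group variant uses the pooled standard deviation, the single-group variant standardises the mean change by the pre-test standard deviation (a common form attributed to Becker, 1988), and the sampling-variance expressions are the usual large-sample approximations. All function names and input numbers are hypothetical.

```python
import math


def hedges_correction(df):
    """Small-sample correction factor J (approximation from Hedges, 1981)."""
    return 1 - 3 / (4 * df - 1)


def hedges_g_between(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Hedges' g and its variance for a treatment vs. control comparison."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (m_t - m_c) / sd_pooled  # Cohen's d
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    j = hedges_correction(df)
    return j * d, j ** 2 * var_d


def hedges_g_prepost(m_pre, m_post, sd_pre, n, r=0.5):
    """Standardised mean change for a single group, with an assumed
    pre-post correlation r (0.5 is the value adopted in the text)."""
    d = (m_post - m_pre) / sd_pre
    var_d = 2 * (1 - r) / n + d ** 2 / (2 * n)
    j = hedges_correction(n - 1)
    return j * d, j ** 2 * var_d


# Hypothetical summary statistics (illustration only).
g_between, var_between = hedges_g_between(78, 10, 30, 72, 10, 30)
g_prepost, var_prepost = hedges_g_prepost(60, 66, 12, 25)
print(f"between-group g = {g_between:.3f}, variance = {var_between:.4f}")
print(f"pre-post g = {g_prepost:.3f}, variance = {var_prepost:.4f}")
```

For controlled studies that report both pre- and post-test scores, the same between-group function could be applied to change scores, provided the standard deviations of the change scores are available or can be derived.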
A comparison between two- and three-level models was conducted because 15.79% of the selected studies contributed multiple effect sizes. Heterogeneity was assessed with the I² statistic, which quantifies the proportion of variance in effect sizes due to between-study differences: 0%–40%, not important; 30%–60%, moderate; 50%–90%, substantial; and 75%–100%, considerable (Shamseer et al., 2015). If heterogeneity was high, moderator analyses with a mixed-effects model would be conducted through meta-regression to identify the sources of variance. Three categories of potential moderators identified in the literature were tested, including the implementation of GenAI tools, the context of the study, and the research design.
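To make the heterogeneity statistics concrete, the sketch below computes Cochran's Q, I², and a between-study variance estimate from a list of effect sizes and sampling variances. It assumes the DerSimonian-Laird estimator for τ², chosen here only because it has a closed form; the source does not state which estimator was used, and the input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, I^2 (in percent), and DerSimonian-Laird tau^2."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("Need at least two effect sizes with matching variances.")
    weights = [1 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    pooled_fixed = sum(w * g for w, g in zip(weights, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effect pooled estimate.
    q = sum(w * (g - pooled_fixed) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variability beyond what sampling error would produce.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2


# Hypothetical effect sizes and sampling variances (illustration only).
effects = [0.1, 0.5, 1.2, -0.3]
variances = [0.04, 0.05, 0.08, 0.06]
q, i2, tau2 = heterogeneity(effects, variances)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%, tau^2 = {tau2:.3f}")
```

An I² in the upper ranges listed above would be the trigger for the moderator analyses described in the text.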
To ensure the robustness of the findings of this review, outliers were detected by checking whether effect sizes fell outside the range (
Results
Descriptive Statistics
Study Summary.
The Overall Effects of GenAI Interventions
Overall, 19 studies with 24 effect sizes reported the comparative effectiveness of GenAI by conducting either a quasi-/experimental study (n = 17, k = 22) or a repeated-measure study (n = 2, k = 2). The forest plot (Figure 2) shows that 14 studies revealed a positive effect size, ranging from 0.08 to 4.39. Five studies reported a negative effect size, ranging from −1.86 to −0.18. No outliers were detected in these studies (Hedges’ g > 4.47 or g < −3.03). The overall effect of GenAI interventions on student academic performance was g = 0.683, significantly different from zero (95% CI: 0.17–1.19; t = 2.76, p = .01, k = 24 in 19 studies). The between-study heterogeneity variance was estimated at τ² = 1.27 (95% CI: 0.75–2.88) with I² = 94.4% (95% CI: 92.8%–95.7%), indicating substantial inconsistency between studies. The following moderator analyses reported “teacher support” as a significant moderator. The prediction interval ranged from g = −1.71 to 3.08, indicating possible negative intervention effects in future studies. The ANOVA test of the model comparison between the two-level and three-level models (Table 2) showed that the two-level model had a better model fit, with lower Akaike (AIC) and Bayesian Information Criterion (BIC) values. The likelihood ratio test (LRT) result was also not statistically significant (χ² = 1.13, p = .29), indicating that the three-level model is unsuitable for the current review. By choosing the two-level model, we did not model the dependence among effect sizes from the same study. Because only a very small number of studies included more than one effect size, the result of this meta-analysis may not be substantially influenced by treating these effect sizes as independent (van den Noortgate et al., 2013).
Figure 2. Forest Plots of Effect Sizes.
Table 2. Model Comparison.
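The relation between the pooled estimate, its confidence interval, and the much wider prediction interval can be checked with a short calculation from the reported summary values. The sketch rests on assumptions the source does not state: that the confidence interval is t-based with k − 1 degrees of freedom, and that the prediction interval follows the common form g ± t(k − 2) · sqrt(τ² + SE²). The standard error is back-calculated from the rounded t value, and the two critical values are hard-coded from standard t tables.

```python
import math

# Summary values reported in the text.
g = 0.683       # pooled Hedges' g
t_stat = 2.76   # reported test statistic (rounded)
tau2 = 1.27     # between-study variance
k = 24          # number of effect sizes

# Two-sided 95% critical values of Student's t (standard table values).
T_CRIT_DF23 = 2.0687  # df = k - 1, assumed for the confidence interval
T_CRIT_DF22 = 2.0739  # df = k - 2, assumed for the prediction interval

# Standard error recovered from the reported estimate and t statistic.
se = g / t_stat

# Confidence interval: uncertainty about the average effect.
ci_low = g - T_CRIT_DF23 * se
ci_high = g + T_CRIT_DF23 * se

# Prediction interval: plausible range for the true effect in a new study,
# which adds the between-study variance to the uncertainty of the mean.
half_width = T_CRIT_DF22 * math.sqrt(tau2 + se ** 2)
pi_low = g - half_width
pi_high = g + half_width

print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
print(f"95% PI: {pi_low:.2f} to {pi_high:.2f}")
```

Under these assumptions the calculation agrees with the reported intervals to two decimals. It also shows why the prediction interval crosses zero even though the confidence interval does not: the between-study variance dominates the squared standard error.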
Factors Moderating the Effect Size of GenAI on Student Academic Performance
Differences in Effect Sizes for Moderators.
Note. *p < .05.
Summary: • Significant variation was revealed between studies with and without teacher support. • Other characteristics did not show significant variations.
No other significant moderators were identified, but there are some observable differences in the effect sizes of different categories. For example, GenAI used in natural and applied science courses (g = 0.958, p < .01) had a larger effect size than that used in humanities and social sciences courses (g = 0.017, p > .05). The mean effect size was larger when using course-specific GenAI tools (g = 0.848, p < .1) than when using existing GenAI tools (g = 0.600, p < .1). When GenAI was used in short time durations, the mean effect size (g = 0.833, p < .01) was larger than when it was used for more than one month (g = 0.609, p > .05). In terms of research design, the mean effect size of studies with teacher feedback groups (g = 0.279, p > .05) was smaller than those with no feedback groups (g = 0.687, p > .05) and other resources groups (g = 0.601, p > .05). The mean effect size was bigger in studies with small samples (g = 0.804, p < .05) than with large samples (g = 0.560, p > .05). In terms of study quality, GenAI has a larger effect size in studies with high quality (g = 1.150, p > .05) than in studies with moderate quality (g = 0.482, p > .05) or weak quality (g = 0.695, p > .05). However, none of these comparisons revealed a statistically significant difference.
Sensitivity Analyses
A sensitivity analysis was performed using the leave-one-out method, sorted by the pooled effect size (see Figure 3). The results show that the original pooled effect size is not substantially influenced when leaving out each study, as the changed effect sizes still fall within the 95% confidence interval of the original pooled effect size (0.17–1.19).
Figure 3. Sensitivity Analysis Leaving Out Each Study.
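The leave-one-out procedure can be sketched as below: each study is dropped in turn, the remaining effect sizes are re-pooled, and the results are sorted by pooled effect size. The random-effects pooling uses the DerSimonian-Laird estimator as a simplifying assumption, the source does not specify the estimator, and the study labels and values are hypothetical.

```python
def pool_random_effects(effects, variances):
    """Random-effects pooled estimate using DerSimonian-Laird tau^2."""
    if len(effects) < 2:
        raise ValueError("Need at least two effect sizes to pool.")
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * g for w, g in zip(weights, effects)) / sum_w
    q = sum(w * (g - fixed) ** 2 for w, g in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Random-effects weights add tau^2 to each sampling variance.
    re_weights = [1 / (v + tau2) for v in variances]
    return sum(w * g for w, g in zip(re_weights, effects)) / sum(re_weights)


def leave_one_out(labels, effects, variances):
    """Re-pool after omitting each study; return results sorted by estimate."""
    results = []
    for i, label in enumerate(labels):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results.append((label, pool_random_effects(rest_effects, rest_variances)))
    return sorted(results, key=lambda item: item[1])


# Hypothetical studies (illustration only).
labels = ["Study A", "Study B", "Study C", "Study D", "Study E"]
effects = [0.9, 0.2, 1.4, -0.3, 0.6]
variances = [0.05, 0.04, 0.09, 0.06, 0.05]

loo_results = leave_one_out(labels, effects, variances)
for label, pooled in loo_results:
    print(f"omitting {label}: pooled g = {pooled:.3f}")
```

Each re-pooled estimate can then be compared with the confidence interval of the full-sample estimate, which is the criterion applied in the text.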
Sensitivity Analysis Excluding Weak Studies.
Note. *Removed as weak studies: Alneyadi & Wardat (2023), Alneyadi & Wardat (2024), Bachiri et al. (2023), Escalante et al. (2023), Mahapatra (2024), Shi et al. (2024), Sun et al. (2024), Wu et al. (2024), Zhou and Kim (2024).
Publication Bias
The funnel plot (see Figure 4) and the statistical data from Egger’s regression test (β = 4.03, t = 3.65, p = .014) showed that the data were asymmetrical. However, Vevea and Woods’ (2005) selection model revealed minimal adjustment from 0.683 to 0.681. This result showed that the observed asymmetry may not be caused by publication bias.
Figure 4. Funnel Plot.
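For readers unfamiliar with Egger's test, the sketch below shows its classical form: the standardised effect (g divided by its standard error) is regressed on precision (1 divided by the standard error), and an intercept far from zero signals funnel-plot asymmetry. This is an illustrative ordinary-least-squares version; the authors' software may implement a weighted or random-effects variant, and the data below are hypothetical.

```python
import math


def egger_test(effects, std_errors):
    """Classical Egger regression: returns (intercept, t value, df)."""
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("Need at least three effect sizes with standard errors.")
    x = [1 / se for se in std_errors]                  # precision
    y = [g / se for g, se in zip(effects, std_errors)] # standardised effect
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance and the standard error of the intercept.
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1 / k + mean_x ** 2 / sxx))
    return intercept, intercept / se_intercept, k - 2


# Hypothetical data in which smaller studies report larger effects.
effects = [1.5, 1.1, 0.8, 0.5, 0.3]
std_errors = [0.5, 0.4, 0.3, 0.2, 0.1]

intercept, t_value, df = egger_test(effects, std_errors)
print(f"intercept = {intercept:.2f}, t = {t_value:.2f}, df = {df}")
```

A significant intercept indicates small-study effects, which can arise from publication bias but also from genuine heterogeneity. That is consistent with the text's reading that the asymmetry may not reflect publication bias once the selection model is considered.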
Discussion
The relatively new AI technology, GenAI, has the potential to facilitate student learning by providing timely and personalised feedback (Stojanov, 2023); however, overreliance on GenAI tools could exert harmful effects (Susnjak & McIntosh, 2024). This meta-analysis is among the first few attempts to investigate exclusively the effects of GenAI interventions on student academic performance. After examining 24 effect sizes from 19 empirical studies across various disciplines with either experimental-control design or pre-post comparison design, key meta-analytic results were presented as follows: (1) An overall large effect size (g = 0.683) was found on the effects of GenAI interventions on student academic performance. (2) GenAI interventions had a more pronounced effect on students receiving teacher support (g = 1.426) than on those without teacher support in the student-GenAI interaction (g = 0.077). (3) No statistically significant differences were found in effect sizes across different GenAI tools, intervention duration, educational level, discipline, control group, sample size, and study quality.
Overall Effects of GenAI Interventions
The mean effect size of the GenAI interventions in this synthesis is 0.683, suggesting a large effect size in educational interventions (Hattie, 2008). This result supports the claims that GenAI has the potential to facilitate student learning by working as more knowledgeable others and providing learners with personalised scaffolding, real-time feedback, and interactions (Darvishi et al., 2024; Stojanov, 2023). Students may also have more positive psychological reactions (e.g., higher motivation, Song & Song, 2023; higher self-efficacy, Teng, 2024; and less nervousness, Hsu et al., 2023) when interacting with GenAI tools.
The mean effect size of the current study was larger than that revealed in Sun and Zhou’s (2024) meta-analysis (g = 0.533). The higher effect size may be attributed to the broader educational scope of the current review, including both K-12 and college students, while Sun and Zhou (2024) only focused on higher education. Zheng et al. (2023) found that high school students benefited more than post-secondary students; hence, including K-12 students is likely to increase the mean effect size. Although our findings showed that the educational level did not significantly moderate the effects of GenAI interventions on student academic achievement, such a result needs to be interpreted with caution because of the limited number of primary studies in the K-12 context. This synthesis also showed a larger mean effect size than that in Wang et al.'s (2024a) meta-analysis (g = 0.484). This is probably because Wang et al. (2024a) only considered the language learning domain, while this synthesis included studies across different disciplines. Compared with the effect size in the meta-analyses of Wu and Yu (2024) (d = 1.028) and Zheng et al. (2023) (g = 0.812), which investigated AI technology in general, this study revealed a smaller mean effect size. GenAI is a relatively new technology under the umbrella of AI technology. The related innovations may not have been well implemented and validated in naturalistic educational settings (Yan et al., 2024). Researchers, teachers, and students need time to explore and adjust its optimal use in educational practice.
Despite the overall positive effects of GenAI interventions on student academic performance, five out of 19 studies (26.32%) reported a negative effect size, showing that the positive effects of GenAI interventions are not guaranteed and that the use of this tool needs careful design and implementation. Across all five studies, teachers were not involved in students’ GenAI use, suggesting that this lack of teacher support may have contributed to the negative effects. While GenAI can help automate some educational tasks (e.g., providing feedback and generating questions), teachers’ emotional support, moral guidance, and expertise in specific domains can all be valuable in scaffolding students in the process of GenAI interventions (Tam, 2024). Moreover, these five studies had relatively short intervention durations: three used GenAI tools for less than one month, one had a six-week intervention, and one did not provide relevant information. This result suggests that short GenAI interventions may yield negative learning gains, contradicting Wu and Yu’s (2024) finding that students gained more learning benefits in shorter interventions (less than ten weeks). Such a difference probably stems from the different definitions of a short intervention (i.e., one month or ten weeks as the threshold) and from students’ lower familiarity with GenAI tools than with general AI chatbots. Although all five studies were conducted in the tertiary context using general GenAI tools, these findings warrant cautious interpretation because of the limited number of comparative studies: more than ten studies were conducted in higher education and more than ten used general GenAI tools, but only four were conducted in K-12 education and only six used course-specific GenAI tools.
Moderators of GenAI Interventions
Three groups of moderators were examined in this meta-analysis: implementation of GenAI, context of the study, and research design. A key contribution of our study is to identify “teacher support” that could significantly moderate the effects of GenAI interventions, which was not considered in the study of Sun and Zhou (2024). While Sun and Zhou (2024) explored the moderators from the perspective of designable pedagogy, teachers also play an important role in students’ learning using GenAI tools. We found that students could gain more learning benefits after using GenAI tools with teacher support than those who solely rely on GenAI in the learning process. This finding supported the claim that teachers should provide students with scaffolding and supplementary feedback so that students could better incorporate GenAI feedback into their work; otherwise, students may struggle to critically evaluate the GenAI outputs (Su et al., 2023). Han and Li (2024) also emphasised the role of teachers by proposing an “AI + Teacher” model, which argued for making the best use of both the analytical strengths of AI and the pedagogical expertise of instructors. In this way, students can optimise the use of GenAI outputs in their study while maintaining teacher-student interactions. The importance of teachers’ roles has not been diminished; instead, teachers’ pedagogical decisions are vital regarding using GenAI tools (Jeon & Lee, 2023).
Apart from teacher support, no other significant moderators were detected in this synthesis, although the importance of these hypothesised moderators was theoretically supported. These results are inconsistent with previous reviews that identified other significant moderators (e.g., educational level and intervention duration, Wu & Yu, 2024; sample size, Sun & Zhou, 2024; Zheng et al., 2023). One reason may be that the number of included studies and effect sizes in some categories is too small to detect significant moderators. Fu et al. (2011) suggested that at least four studies in each subgroup be the lower bound for categorical moderator analysis. Therefore, in the “educational level” dimension, due to limited evidence from primary education, this study combined K-12 studies rather than analysing GenAI interventions separately for primary and secondary students. This aggregation may have obscured intervention effects, as primary and secondary students differ in language proficiency, digital literacy, self-regulated learning abilities, and academic pressures (Jeon, 2024; Tang et al., 2020). In addition, the nonsignificant moderator effect of “intervention duration” may stem from using “one month” as the dividing threshold. While Lo et al. (2024) recommended a whole semester of implementation to mitigate the novelty effect of ChatGPT, few studies in this synthesis maintained such long durations. The one-month threshold may be insufficient to detect significant differences, as students may still get familiar with this new technology over a five- or six-week implementation. Another reason could be the collinearity of moderators (Murano et al., 2020). For instance, one study sampled university students who had a two-month intervention (Hsu, 2024), while one study selected K-12 students and implemented only a 90-minute session (Meyer et al., 2024). Such a diversity of the characteristics of selected studies may cause inaccurate estimates of individual moderators.
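The minimum-subgroup-size rule mentioned above can be checked mechanically before running a categorical moderator analysis. The sketch below flags moderator levels with fewer than four studies. The helper name and the study labels are hypothetical and serve only to show why sparse levels such as primary and secondary education might be merged.

```python
from collections import Counter


def subgroups_below_minimum(labels, minimum=4):
    """Return the moderator levels (and their counts) with fewer than `minimum` studies."""
    counts = Counter(labels)
    return {level: n for level, n in counts.items() if n < minimum}


# Hypothetical coding of educational level for 19 studies.
separate = ["primary"] * 1 + ["secondary"] * 3 + ["tertiary"] * 15
merged = ["K-12" if level != "tertiary" else level for level in separate]

too_small_separate = subgroups_below_minimum(separate)
too_small_merged = subgroups_below_minimum(merged)
```

With these illustrative labels, both K-12 levels fall below the threshold when coded separately, while the merged coding satisfies it.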
Implications
Researchers argued that GenAI could promote student learning by providing students with timely feedback from various perspectives (Xia et al., 2024) and facilitating student self-directed learning (Yu, 2024). This meta-analysis supports the argument that GenAI could positively affect student academic achievement across contexts. Learners can use GenAI tools for various purposes, such as a virtual intelligent assistant to get instant feedback, a writing assistant to enhance writing skills, or an aiding tool to gain a personalised learning experience (Albadarin et al., 2024). Hence, rather than restricting the use of GenAI, educational institutions should implement guidelines encouraging its integration into teaching and learning processes. These policies should emphasise the distinct value of human instructors and the limitations of GenAI tools (An et al., 2025). Additionally, institutional policies should support students in developing prompt engineering skills to gain high-quality outputs, avoiding the misuse of GenAI tools (Knoth et al., 2024).
Despite the overall positive effects of GenAI on student academic performance, negative impacts were also observed in some cases, as indicated in this meta-analysis (26.32% of the included studies). Hence, careful design and implementation are needed to avoid the harmful effects. Considering that the “teacher support” factor significantly moderated the GenAI effects, teachers are encouraged to proactively participate in students’ dialogue with GenAI tools. Previous research showed that teachers could support students by specifying learning objects before the integration of GenAI tools (Su & Yang, 2023), enhancing students’ prompting strategies and meta-cognitive skills (Zhan & Yan, 2025), and demonstrating the use of GenAI tools and discussing with students the ethical issues of using them (Moorhouse et al., 2024). This meta-analysis enriched current understanding by identifying additional effective strategies from the selected studies, including discussing GenAI-suggested content with students (Wu et al., 2024), providing ongoing support during students’ interactions with GenAI instead of merely offering training before the intervention (Uddin et al., 2023), identifying and explaining GenAI feedback errors (Zhou & Kim, 2024), and acting as part of the GenAI-based learning system by uploading teaching materials (Baba et al., 2024), designing learning sheets and assessments, and monitoring the learning process (Li, 2023).
Since teachers are not born with the capacity to use GenAI tools properly, professional training is needed to equip teachers with the skills to integrate GenAI tools into classroom activities according to instructional goals (Liu & Xiao, 2024). The effective teacher support strategies identified in this meta-analysis align with Kong et al.’s (2024) teacher professional development framework, which emphasises two components: developing teachers’ AI literacy and fostering their ability to implement student-centred pedagogy when incorporating GenAI in teaching. Teachers need to develop AI literacy to understand GenAI’s capabilities and limitations (e.g., errors in responses and prompt engineering techniques) and become proficient with GenAI teaching tools (e.g., GenAI-based learning platforms). Moreover, with student-centred pedagogical skills, teachers would know how to guide students in effective AI use through modelling and discussion. Although only one significant moderator was detected in this synthesis, teachers are still encouraged to consider and reflect on when and how to use GenAI tools based on students’ needs to optimise the positive influence on student academic achievement.
Findings from this study can also inform the development of GenAI-based educational tools. While most of the current GenAI tools are not primarily designed for education, as found in most of the included studies (14 out of 19, 73.68%), their effective implementation requires consistent teacher support and guidance. Hence, curriculum designers should strategically align GenAI technology and effective teaching strategies with curriculum standards, classroom content, and teaching objectives (Wang et al., 2024b). As GenAI development and deployment become more cost-effective (Ferrara, 2024), there is also an opportunity for researchers or technicians to develop course-specific GenAI systems. These systems should incorporate teacher roles to facilitate meaningful teacher involvement, such as the platform in Baba et al.’s (2024) study. Such embedded teacher support allows teachers to effectively guide and support student learning while maximising the benefits of GenAI technology.
Limitations and Future Studies
The current meta-analysis has several limitations. First, only 19 studies, including a very small number of high-quality studies (15.79%), were included in this synthesis. Although the sensitivity analysis excluding weak studies showed only a slight difference, such a small sample size with quite a high proportion of weak studies may limit generalizability. Considering the rapid emergence of empirical studies about GenAI interventions, future reviews can include more eligible and high-quality studies to update our understanding in this area. Second, this study focused only on the effects of GenAI interventions on student academic performance. Apart from cognitive outcomes, GenAI may also influence self-regulated learning (Lee et al., 2024) and social-emotional outcomes (e.g., motivation and attitudes toward GenAI) (Salas-Pilco, 2020). Future studies could investigate non-academic learning outcomes, such as motivation, self-efficacy, self-regulated learning, and higher-order thinking skills. Third, because of the missing information and limited variability in this group of studies, this synthesis did not investigate several personal and contextual factors that might contribute to the high heterogeneity, potentially limiting the generalizability of the findings. For instance, training students before using GenAI tools (Abdelhalim, 2024), along with their familiarity (Wood & Moss, 2024) and self-efficacy (Tantivejakul et al., 2024) in using these technologies, can influence their acceptance and effective use of this novel technology. Pedagogical methods in which GenAI is integrated can also vary across different studies (e.g., ChatGPT-based flipped learning, Li, 2023; using ChatGPT for self- and peer assessment, Mahapatra, 2024). Future research could explore these important factors that may influence the effects of GenAI interventions.
Additionally, we imputed a correlation of 0.5 when using Becker’s (1988) formula to calculate Cohen’s d for studies with a repeated-measures design. Because we used a single fixed correlation rather than conducting a sensitivity analysis across a wide range of correlations, the estimated effect sizes may be biased relative to the true treatment effect, as one fixed value cannot capture the range of plausible pre-post correlations in the selected studies (Cuijpers et al., 2017).
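To make the role of the imputed correlation concrete, the sketch below uses one common form of the standardised mean change attributed to Becker (1988), in which the pre-post difference is divided by the pre-test standard deviation and the sampling variance is approximated as 2(1 − r)/n + d²/(2n). Whether this exact variant was applied in this review is an assumption, and the function names and input values are hypothetical. In this form the correlation r affects only the sampling variance, and therefore the weight a study receives, not the point estimate.

```python
def standardized_mean_change(mean_pre, mean_post, sd_pre, n, r=0.5):
    """Standardised mean change (raw-score standardisation) and its variance.

    r is the assumed pre-post correlation; 0.5 mirrors the imputed value.
    """
    d = (mean_post - mean_pre) / sd_pre
    variance = 2.0 * (1.0 - r) / n + d ** 2 / (2.0 * n)
    return d, variance


def correlation_sensitivity(mean_pre, mean_post, sd_pre, n,
                            correlations=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Sampling variance of the effect size under a range of assumed correlations."""
    return {
        r: standardized_mean_change(mean_pre, mean_post, sd_pre, n, r)[1]
        for r in correlations
    }


# Hypothetical pre-post study: n = 25, pre-test M = 60 (SD = 10), post-test M = 70.
d, variance = standardized_mean_change(60.0, 70.0, 10.0, 25, r=0.5)
variances_by_r = correlation_sensitivity(60.0, 70.0, 10.0, 25)
```

Re-running the pooled analysis with the variances obtained under each assumed correlation is one way to carry out the sensitivity analysis that this limitation refers to.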
Conclusion
This meta-analysis investigated the effects of GenAI interventions on student academic performance. The results showed that overall, GenAI interventions positively affected student academic performance. This finding supported the theoretical arguments that GenAI has the potential to promote student learning. However, negative effects were observed in some studies, encouraging teachers and students to implement this novel AI technology in practice with careful design. This meta-analysis also revealed that students with teacher support in GenAI interventions gained significantly larger learning benefits in the use of GenAI tools than students solely dependent on GenAI tools, suggesting an indispensable role of teachers in students’ interaction with GenAI tools. Alongside the rapid development of GenAI technology, more studies are needed to scrutinise how to use GenAI tools effectively and optimise their impact on learning.
Supplemental Material
Supplemental Material for Effects of GenAI Interventions on Student Academic Performance: A Meta-Analysis by Jiahe Gu and Zi Yan in Journal of Educational Computing Research
Footnotes
Acknowledgements
The authors acknowledge and thank Peiyao Zhang’s careful screening and coding.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical Statement
Data Availability Statement
Data will be available on request.
Supplemental Material
Supplemental material for this article is available online.
