Abstract

Despite the growing potential of Generative Artificial Intelligence (GenAI) to enhance learning, particularly in transforming traditional English as a Foreign Language (EFL) teaching and learning practices, there is still limited research available to guide educators and practitioners in understanding its role in pedagogical contexts. This systematic literature review (SLR) explores GenAI’s roles in EFL instruction by examining application contexts, research methods employed, and key issues identified. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), 51 articles from an initial pool of 284 studies published between 2020 and 2024 (based on early access year) were selected from WoS and Scopus. Findings on application contexts revealed a marked preference for higher education settings, particularly in East Asia and the Middle East, with an overwhelming focus on writing instruction. Methodologically, mixed and qualitative methods, large sample sizes, and subjective data prevailed. Furthermore, research issues demonstrated GenAI’s versatile yet double-edged role in specific courses, including writing, reading, speaking, grammar, and general EFL instruction, particularly its role as an assessor. Notably, despite divergent findings on the effectiveness of GenAI in writing instruction, students consistently preferred teacher feedback over GenAI feedback. Moreover, a teacher-centered perspective remained dominant in studies of general EFL instruction. Therefore, future research is encouraged to broaden application contexts, strengthen quantitative approaches with varied sample sizes and objective data, and deepen exploration of GenAI’s roles in the full teaching cycle, addressing existing divergences and incorporating more perspectives.

Keywords

GenAI; EFL instruction; language instruction; systematic literature review; AI in education

Introduction

Generative Artificial Intelligence (GenAI, used herein as an umbrella term including tools like ChatGPT and DALL-E), defined as algorithms capable of producing audio, visual, and written content (McKinsey & Company, 2023), is reshaping how English is taught and learned (Yeh, 2025). While traditional English as a foreign language (EFL) instruction is confronted with persistent challenges regarding personalization, student engagement, and assessment burdens (Kawinkoonlasate, 2020; Q. Li et al., 2023; Michel-Villarreal et al., 2023; Yavuz et al., 2025), GenAI offers potential solutions by enabling dynamic teaching, interactive learning, and automated assessment to enhance learner engagement and improve outcomes (Hieu & Thao, 2024; Mohamed, 2024; Sayed et al., 2024; Topsakal & Topsakal, 2022).

Despite its potential, GenAI has yet to be fully embraced by educators, and its implementation in EFL classrooms remains limited (Gao et al., 2024; X. Liu & Xiao, 2025). This hesitation is rooted in teachers’ lack of professional training, ethical concerns regarding academic integrity, and the scarcity of practical guidance for effective integration (Arefian et al., 2024; Kohnke et al., 2023b; Zaiarna et al., 2024). Furthermore, the existing literature presents conflicting perspectives. While some researchers criticized GenAI for producing shallow or generic content (X. Liu & Xiao, 2025), others highlighted its ability to generate structured and useful teaching materials (Kusuma et al., 2024). Similarly, Zou et al. (2025) found that teacher feedback led to better engagement than GenAI feedback, whereas Guo and Wang (2024) suggested that GenAI’s feedback might better boost students’ engagement in revision.

In view of the aforementioned challenges and contradictions, a systematic literature review (SLR) is needed to further understand GenAI’s roles in EFL instruction. Existing literature reviews either explore GenAI in broader settings, specifically higher education or general education contexts (Batista et al., 2024; Hwang & Chang, 2023; L. Yan et al., 2024), or focus only on AI more generally or are restricted to ChatGPT (Liang et al., 2023; Lo et al., 2024; Meniado, 2023), with limited emphasis on the EFL context. Specifically, this study aims to explore the current literature by analyzing: (a) application contexts, (b) research methods employed, and (c) key issues identified.

Literature Review

GenAI is transforming education amid rapid technological advancements (Bahroun et al., 2023). Bahroun et al. (2023) asserted that research on GenAI in education has surged since 2018, partly due to the innovation and popularity of GenAI technologies. Supported by Large Language Models (LLMs), many GenAI applications are adept at text processing and generation (R. Lee, 2025). Consequently, recent studies have demonstrated its significant potential in various educational fields, spanning computer science, engineering education, higher education, medical education, nursing education, and academia (Abdulai & Hung, 2023; Cain et al., 2023; Crompton & Burke, 2023; Denny et al., 2023; Dergaa et al., 2023; Relmasira et al., 2023).

In the context of language instruction, GenAI serves as a resourceful, accessible counselor for both students and teachers, drawing increasing attention (Fryer et al., 2020; Kohnke et al., 2023a). Notably, it holds considerable promise for enhancing the language teaching cycle: preparation, implementation, and evaluation (Kovačević, 2023). In preparation, GenAI can help predict students’ performance, find effective teaching methods based on learning data, create discussion questions, adjust materials for students with different skill levels, provide example writing for reference, and make teaching handouts with explanations or practice tasks (Bahroun et al., 2023; Chaudhry & Kazim, 2021; Pack & Maloney, 2023). During implementation, GenAI tools can function as intelligent tutoring systems and support learning management (Bahroun et al., 2023). In particular, GenAI helps non-native speakers with their writing, making it more effective and easier to manage, while also increasing students’ interest, learning skills, and English abilities (Chan & Lee, 2023; Tang & Deng, 2022; D. Yan, 2023). For evaluation, GenAI can automatically generate assessments, including grades and feedback (Bucol & Sangkawong, 2025; J. Li et al., 2024; Shin & Lee, 2024).

Methodology

This research utilized a systematic literature review process based on the PRISMA 2020 paradigm (Page et al., 2021) with the Modified Technology-based Learning Model proposed by Lo et al. (2024) as its theoretical framework. A literature search was performed in two primary academic databases, Scopus and Web of Science (WoS), concentrating on the incorporation of GenAI in EFL education. The review procedure adhered to the four major steps of PRISMA: identification, screening, eligibility, and inclusion, following the steps highlighted by Page et al. (2021).

Search Strategy

In the initial phase (identification), two major databases (Scopus and WoS) were searched using the following search string: TITLE-ABS-KEY ((“Generative Artificial Intelligence” OR “GenAI” OR “GenAI” OR “GAI” OR “LLMs” OR “Large Language Models” OR “ChatGPT”) AND (“English as a Foreign Language” OR “EFL” OR “Foreign Language Teaching”)).
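As a minimal sketch of how a boolean search string of this shape can be assembled from two concept groups, the Python below joins the listed terms with OR inside each group and AND between groups. The function and variable names are hypothetical and are not taken from the study. The field prefix is left as a parameter because `TITLE-ABS-KEY` is Scopus syntax, and other databases may expect a different field tag.

```python
# Illustrative sketch: assemble the boolean search string from two term groups.
# All names here are hypothetical; only the search terms come from the text.

GENAI_TERMS = [
    "Generative Artificial Intelligence", "GenAI", "GAI",
    "LLMs", "Large Language Models", "ChatGPT",
]
EFL_TERMS = [
    "English as a Foreign Language", "EFL", "Foreign Language Teaching",
]


def or_group(terms):
    """Join quoted terms with OR, dropping duplicates while keeping order."""
    unique_terms = list(dict.fromkeys(terms))
    return "(" + " OR ".join(f'"{t}"' for t in unique_terms) + ")"


def build_query(field_prefix, concept_a, concept_b):
    """Combine two OR-groups with AND and wrap them in a field restriction."""
    return f"{field_prefix}({or_group(concept_a)} AND {or_group(concept_b)})"


scopus_query = build_query("TITLE-ABS-KEY", GENAI_TERMS, EFL_TERMS)
print(scopus_query)
```

Keeping the term lists separate from the query syntax makes it easier to reuse the same concepts across databases and to report the exact terms in the review.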

Inclusion and Exclusion Criteria

During this process, the search was limited to articles written in English and published from 2020 to 2024 (based on early access year). The year 2020 was selected as the starting date as it marked a critical turning point. In 2020, GPT-3 was introduced as the first GenAI model with near-human coherence (with a human detection accuracy of only 52%; Brown et al., 2020). Unlike pre-2020 models requiring fine-tuning, GPT-3 lowered barriers for educators by enabling operation via natural language prompts (Brown et al., 2020). Consequently, since 2020, GenAI has been capable of generating high-quality outputs comparable to human performance (Brown et al., 2020; Trigka & Dritsas, 2025). Furthermore, research on GenAI in education was already being published in 2020 (Mittal et al., 2024; Zhu et al., 2020).

In addition, during the screening stage, titles, abstracts, and research questions were examined to identify studies specifically focused on the use of GenAI in EFL instruction. Importantly, studies addressing GenAI-assisted evaluation were also included, as evaluation is considered a core component of instructional design (Tyler, 1975).

To refine the dataset further, only empirical studies were selected. Following the definition used by Batista et al. (2024), empirical studies were those that involved the collection and analysis of data to generate objective, evidence-based findings—excluding theoretical, opinion-based, or speculative works. Table 1 provides a summary of the inclusion and exclusion criteria used in this stage.

Table 1.

Inclusion and Exclusion Criteria.

Criterion	Inclusion	Exclusion
Topic	Studies focusing on the role of GenAI in EFL instruction	Studies not centered on the role of GenAI in EFL instruction
Study type	Empirical studies	Non-empirical studies (e.g., reviews, position papers)
Document type	Articles	Other than articles (e.g., conference papers, book chapters)
Source type	Journals	Source other than journals
Time period	2020–2024	Not within the 2020–2024 time period
Language	English	Non-English

Study Selection

As depicted in Figure 1, a total of 284 articles were initially identified from Scopus (n = 159) and WoS (n = 125). After removing duplicate articles (n = 94) and one article without an English version, 189 records remained for screening. Subsequently, articles unrelated to the use of GenAI (n = 5) and not themed on EFL instruction (n = 127) were excluded. Finally, after removing six non-empirical studies, 51 studies were retained.
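The selection counts above can be checked with a short running tally. This is only an arithmetic sketch: the stage labels are paraphrased and the variable names are hypothetical, while the counts are those reported in the text.

```python
# Sketch of the record counts reported for the PRISMA flow.
# Stage labels are paraphrased; the counts are those stated in the text.

identified = {"Scopus": 159, "WoS": 125}
exclusions = [
    ("duplicates", 94),
    ("no English version", 1),
    ("unrelated to GenAI", 5),
    ("not themed on EFL instruction", 127),
    ("non-empirical", 6),
]

remaining = sum(identified.values())
trail = [("identified", remaining)]
for reason, count in exclusions:
    remaining -= count
    trail.append((f"after removing: {reason}", remaining))

# Print the number of records left after each step.
for label, n in trail:
    print(f"{label:45s} {n}")
```

The tally passes through 189 records at the screening stage and ends at the 51 retained studies, matching the figures in the paragraph.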

Figure 1.

A PRISMA 2020 flow diagram illustrating the study selection process. Adapted from Page et al. (2021).

Data Extraction and Analysis

In line with Lo et al.’s (2024) Modified Technology-based Learning Model, the data extraction and analysis focused on: (1) application contexts, (2) research methods employed, and (3) key issues identified. For application contexts, the following information was collected: (1a) study locations, (1b) educational contexts, and (1c) learning domains. For research methods, the articles were categorized by (2a) research approaches (i.e., quantitative, qualitative, or mixed methods), (2b) research sample sizes (i.e., large, medium, or small), and (2c) research topics and data sources (e.g., surveys and interviews). Additionally, thematic analysis was conducted on the selected articles to further examine the key issues.

Findings and Discussion

Findings

Findings of the systematic review were structured around the application contexts, research methods, and research issues concerning the roles of GenAI in EFL instruction.

(1) Application Contexts.

For application contexts, three dimensions were examined, including (1a) study locations, (1b) educational contexts, and (1c) learning domains. Table 2 summarizes the major findings of application contexts.

Table 2.

Major Findings of Application Contexts.

Application contexts	Predominant focus	Underrepresented areas
Study locations	East Asia, the Middle East	Europe, Africa, South Asia
Educational contexts	Higher education	Secondary education, adult education, elementary education, special education, and primary to upper secondary education
Learning domains	Writing	Speaking, reading, and grammar

Overall, GenAI in EFL instruction was most extensively explored in East Asia and the Middle East, with limited research in Europe, Africa, and South Asia. Similarly, higher education was the most frequently studied educational context, while other levels such as elementary, secondary, and special education remained underrepresented. In terms of learning domains, there was a strong focus on writing, whereas speaking, grammar, reading, and especially listening received comparatively little attention.

(1a) Study locations.

Figure 2 presents the distribution of the study locations across all 51 selected articles.

Figure 2.

Distribution of study locations.

As depicted in Figure 2, a total of 51 studies on the role of GenAI in EFL instruction were conducted across 22 countries and administrative regions, categorized into six geographical areas, including East Asia (n = 19), the Middle East (n = 17), Southeast Asia (n = 8), Europe (n = 5), Africa (n = 1), and South Asia (n = 1). Research was predominantly conducted in East Asia (n = 19, 37.3%) and the Middle East (n = 17, 33.3%), whereas studies in Africa and South Asia remained scarce, with only one conducted by Sayed et al. (2024) in Ethiopia and one by Almashy et al. (2024) in India. Of all the 22 countries and administrative regions, the mainland of China (n = 11, 22%) was the primary contributor. Interestingly, there were two studies under special circumstances. One study was conducted by ElEbyary and Shabara (2024) in the UK, where English is a mother tongue, but this study involved participants who are native Arabic speakers. Another study by Zaiarna et al. (2024) was conducted in Ukraine, but involved participants from Ukraine, the EU, and the USA.
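The regional shares quoted above follow directly from the counts. As a small worked check, the snippet below recomputes each region's percentage of the 51 studies. The dictionary name is hypothetical, and the rounding to one decimal place is an assumption chosen to match the figures in the text.

```python
# Recompute regional shares from the counts reported in the text.
# Rounding to one decimal place is an assumption chosen to match the text.

region_counts = {
    "East Asia": 19,
    "Middle East": 17,
    "Southeast Asia": 8,
    "Europe": 5,
    "Africa": 1,
    "South Asia": 1,
}

total = sum(region_counts.values())
shares = {
    region: round(100 * n / total, 1)
    for region, n in region_counts.items()
}

for region, pct in shares.items():
    print(f"{region:15s} n = {region_counts[region]:2d}  {pct:4.1f}%")
```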

(1b) Educational contexts.

To examine the educational contexts represented in the literature, all 51 articles were analyzed based on their focal teaching settings, as depicted in Figure 3.

Figure 3.

Distribution of educational contexts.

Figure 3 demonstrates the role of GenAI in EFL instruction across six educational contexts: higher education, secondary education, adult education, elementary education, special education, and primary to upper secondary education. Additionally, 6% of the studies did not specify a fixed educational context, including those by Mena Octavio et al. (2024), Stewart and Zheng (2024), and Parviz (2024). Higher education (66%) was the primary focus, followed by secondary education (16%). Studies covering primary to upper secondary education (6%) were more frequent than those focused solely on elementary education (2%). Research on adult education, elementary education, and special education was the least prevalent, each making up 2%. Notably, most studies on GenAI in EFL instruction focused on a general EFL context, with one exception: Fan et al. (2024) specifically concentrated on an English for Academic Purposes (EAP) context in higher education.

(1c) Learning domains.

As presented in Figure 4, the studies fell into five learning-domain categories: writing (n = 28, 55%), speaking (n = 2, 4%), grammar (n = 1, 2%), reading (n = 1, 2%), and unspecified domains (n = 19, 37%). Writing dominated the field; one of these studies, by Sapan and Uzun (2024), investigated the role of GenAI in both writing and vocabulary instruction. A substantial proportion of studies (37%) did not focus on a specific language skill but examined GenAI’s role in lesson preparation (e.g., Milad & Fayez, 2024; Yeh, 2025), instructional implementation (e.g., Mena Octavio et al., 2024), and teacher competence development (e.g., Kartal, 2024; Korucu-Kış, 2024). Only a few studies examined GenAI in speaking, grammar, and reading. No studies explored GenAI’s role in listening skills, despite its importance as a core EFL competency as emphasized by Lo et al. (2024).

Figure 4.

Distribution of learning domains.

(2) Research methods.

Regarding research methods, the review examines (2a) the research approaches employed, (2b) the sample sizes involved, and (2c) the research topics and data sources used. Table 3 presents a summary of the key findings.

Table 3.

Key Findings and Implications of Research Methods.

Research methods	Predominant focus	Underrepresented areas
Research approaches	Mixed methods and qualitative methods	Quantitative studies
Sample sizes	Large-scale samples	Medium- and small-scale samples
Topics and data sources	Relying on subjective data	Objective performance

Methodologically, mixed-methods approaches were most common, followed by qualitative approaches, while quantitative studies were comparatively limited. Moreover, most studies involved large sample sizes, with fewer small and medium sample sizes. In terms of topics and data collection, the selected studies tended to rely on subjective data from perceptions and self-report instruments, such as questionnaires and interviews, to examine educational outcomes, pedagogical benefits, teacher perceptions, and assessment-related issues. Topics and data sources based on objective performance were underrepresented.

(2a) Research approaches.

As shown in Figure 5, research approaches were categorized into three types: quantitative methods, qualitative methods and mixed methods. The majority of studies adopted mixed methods, comprising 43% of the total, followed by qualitative research, representing 37%, while quantitative research was the least common, at 20%.

Figure 5.

Distribution of research approaches.

Table 4 details the distribution of research approaches by their early access year. The findings revealed a distinct turning point for the field: no relevant studies (n = 0) were found prior to 2023. The field emerged with a small number of publications in 2023 (n = 4) and then surged in 2024 (n = 47).

Table 4.

Research Approaches by Early Access Year (n = 51).

Research approaches	2023 (n)	2024 (n)	Total (n)
Qualitative methods	1	18	19
Quantitative methods	1	9	10
Mixed methods	2	20	22
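The row and column totals of Table 4 can be cross-checked against the yearly counts and the approach percentages reported above. The sketch below uses a hypothetical nested dictionary for the table, and rounding to whole percentages is an assumption that matches the reported 43%, 37%, and 20%.

```python
# Cross-check the margins of Table 4 (counts copied from the table).
# The data structure and names are illustrative only.

table4 = {
    "Qualitative methods": {2023: 1, 2024: 18},
    "Quantitative methods": {2023: 1, 2024: 9},
    "Mixed methods": {2023: 2, 2024: 20},
}

# Row totals: studies per research approach.
row_totals = {approach: sum(by_year.values()) for approach, by_year in table4.items()}

# Column totals: studies per early access year.
col_totals = {}
for by_year in table4.values():
    for year, n in by_year.items():
        col_totals[year] = col_totals.get(year, 0) + n

grand_total = sum(row_totals.values())

# Share of each approach, rounded to whole percentages (an assumption).
approach_share = {a: round(100 * n / grand_total) for a, n in row_totals.items()}

print(row_totals)
print(col_totals)
print(approach_share)
```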

(2b) Research sample sizes.

In line with the criteria of Liang et al. (2023), a sample size no more than 10 was classified as small, a sample size between 11 and 30 as medium, and a sample size no less than 30 as large. As shown in Figure 6, large sample sizes were most frequently used (n = 26; 51%), followed by medium sample sizes (n = 14; 27%), while small sample sizes were the least common (n = 11; 22%).
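The size bands can be expressed as a small classification function. This is a hedged sketch, not the procedure of the study or of Liang et al. (2023): the function name is hypothetical, and because the medium and large bands as stated both touch 30, the code assumes that a sample of exactly 30 counts as medium. That boundary is exposed as a parameter so it can be changed.

```python
# Hypothetical helper for the small / medium / large sample-size bands.
# ASSUMPTION: a sample of exactly 30 is treated as "medium"; adjust
# medium_upper if the intended boundary differs.

def classify_sample_size(n, small_upper=10, medium_upper=30):
    """Return 'small', 'medium', or 'large' for a sample of n participants."""
    if n < 1:
        raise ValueError("sample size must be a positive integer")
    if n <= small_upper:
        return "small"
    if n <= medium_upper:
        return "medium"
    return "large"


for n in (3, 10, 11, 30, 31, 250):
    print(n, classify_sample_size(n))
```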

Figure 6.

Distribution of research sample sizes.

(2c) Topics and data sources.

Figure 7 presents the methodological preferences in the research on the role of GenAI in EFL instruction from 2020 to 2024. Research on GenAI in EFL instruction was generally categorized into four main topics, including educational effects, pedagogical strengths, teacher perspectives, and evaluation, utilizing 11 data sources, including questionnaires/surveys, interviews, tests/assessments, reflective records, artifacts, technology-based records, observation records, lesson plan templates, feedback/revision records, scales, and rating criteria/rubrics.

Figure 7.

Topics and data sources.

Notably, most studies incorporated multiple data sources. Interviews (n = 25; 25.8%) and questionnaires (n = 20; 20.6%) were the primary data collection methods for studying the roles of GenAI in EFL instruction and were widely applied across all four research topics. This indicated that subjective data played a significant role in investigating the role of GenAI in EFL instruction. Studies on EFL assessment tended to collect data from multiple dimensions and perspectives, employing the widest range of data sources (all except lesson plan templates), with questionnaires/surveys (n = 11; 22.9%) being the most frequently used. Studies on teacher perspectives mainly relied on interviews (n = 7; 53.8%), followed by questionnaires/surveys (n = 4; 30.8%), whereas artifacts and technology-based records were the least utilized. Studies on pedagogical strengths utilized seven data sources, with interviews (n = 6; 33.3%) being the most common, followed by observations (n = 5; 27.8%). However, technology-based records, feedback/revision records, scales, and rating criteria/rubrics were not employed. Studies on educational effects utilized scales, technology-based records, reflective records, tests/assessments, interviews, and questionnaires/surveys as data sources. Among these, interviews were the most frequently used (n = 5; 27.8%), followed by scales and questionnaires/surveys (22.2%).

(3) Research issues.

The key findings related to these research issues are summarized in Table 5.

Table 5.

Key Findings Related to Research Issues.

Application areas	Research issues	Key findings	Underrepresented areas
Writing instruction (n = 28)	• GenAI’s efficacy as a feedback provider (n = 14) • GenAI’s efficacy as an automatic rater (n = 4) • GenAI’s efficacy as a teaching and learning assistant (n = 10)	• Strengths and weaknesses of GenAI • Divergent findings of GenAI feedback, rating, and impact on learning motivation. • Students’ preference for teacher feedback	• Other roles of GenAI during and before instruction • Resolving divergent findings in feedback, rating, and motivation.
Oral instruction (n = 2)	• GenAI’s efficacy as a study buddy (n = 1) • GenAI’s efficacy as a feedback provider and rater (n = 1)	• Positive psychological and emotional effects but limitations on language level	• Other roles of GenAI during and before instruction • Objective effects
Grammar instruction (n = 1)	• GenAI’s efficacy as a feedback provider (n = 1)	• Superiority to teacher-led instruction. • Though with concerns about reduction of critical thinking skills, over-reliance and irrational use, its merits outweigh its disadvantages.	• Other roles of GenAI during and before instruction • Replication studies to validate this single finding
Reading instruction (n = 1)	• GenAI’s efficacy as a teaching assistant (for material development; n = 1)	• GenAI is efficient for novel tasks but unable to interpret multimodal texts (e.g., charts) and prone to unreliable suggestions.	• Other roles of GenAI during and after instruction • Replication studies to validate this single finding
General EFL instruction (n = 19)	• Teachers’ perspectives on GenAI as a teaching assistant (n = 11) • GenAI’s efficacy as a psychological and cognitive facilitator (n = 4) • GenAI’s efficacy as an instructional designer and facilitator (n = 4)	• Focusing on teachers’ perspectives, which shows positive yet cautious views • Double-edged impacts on psychological and cognitive development • Multifaceted yet structurally inferior to teachers’ designs.	• Students’ perspectives • Objective effects • Optimization strategies

As illustrated in Table 5, four prominent findings are summarized. First and most notably, the reviewed studies demonstrate a versatile yet double-edged role of GenAI in the instruction of writing, reading, speaking, grammar, and general EFL courses. In particular, it mainly functions as a feedback provider, automatic rater, teaching assistant, study buddy, psychological and cognitive facilitator, and instructional designer. It shows potential in assisting lesson planning, implementation, and assessment, and in facilitating the psychological and cognitive development of teachers and learners. However, challenges remain, such as rigid lesson planning and overreliance, along with concerns about feedback accuracy and academic integrity. Secondly, research issues are markedly uneven, focusing primarily on GenAI’s role as an assessor (feedback provider and rater) across writing, speaking, grammar, and general EFL instruction, while other roles are underrepresented. Thirdly, while studies comparing GenAI feedback and ratings with those of teachers, and studies of its effects on motivation, report inconsistent findings, researchers have commonly found that students prefer teacher feedback (Zeevy-Solovey, 2024; Zou et al., 2025). Finally, studies on general EFL instruction predominantly focus on teachers’ perspectives but pay limited attention to other perspectives.

(3a) Research issues in writing instruction (n = 28).

Writing instruction constituted the primary research focus within the literature (n = 28; 54.9%), centering on three key issues concerning the efficacy of feedback (n = 14; 50%), grading (n = 4; 14.3%), and broader pedagogical application and effects (n = 10; 35.7%), demonstrating the positive and negative roles of GenAI.

GenAI’s Efficacy as a Feedback Provider

As a feedback provider, GenAI played a double-edged role. Positively, it delivered immediate, direct, comprehensive, abundant feedback to students and corrected language errors (Almashy et al., 2024; Gozali et al., 2024; Guo et al., 2024; Guo & Wang, 2024; Polakova & Ivenz, 2024; Teng, 2024). Ultimately, it enhanced feedback quality and promoted students’ motivation, collaboration, and writing skills (Gozali et al., 2024; Guo et al., 2024; Polakova & Ivenz, 2024; Teng, 2024). Compared to teacher feedback, GenAI feedback was generally more detailed, direct, and structured (Guo & Wang, 2024). Moreover, GenAI’s comments on essay content contained more praise and were therefore more favored by students (Guo & Wang, 2024; Zou et al., 2025). Negatively, GenAI feedback was found to include irrelevant comments, excessive length, inaccessibility, incorrect and confusing feedback, and ethical and contextual insensitivity (Bucol & Sangkawong, 2025; Gozali et al., 2024; Guo & Wang, 2024; H. S. Long, 2024). Compared to teacher feedback, GenAI feedback focused on more superficial aspects and demonstrated lower quality (Fan et al., 2024; Zou et al., 2025), resulting in lower student acceptance and preference and less progress (H. S. Long, 2024; D. Yan, 2024; Zeevy-Solovey, 2024; Zou et al., 2025).

Notably, findings on this point diverged. Some studies suggested that GenAI outperforms teachers by offering more praise in feedback, which is supposed to encourage students’ engagement in revision (Guo & Wang, 2024), whereas others argued that GenAI underperforms teachers in accuracy and in motivating students’ engagement in revision (Fan et al., 2024; Zou et al., 2025). Still others found GenAI feedback as effective as teacher feedback (Alsofyani & Barzanji, 2025).

GenAI’s Efficacy as an Automated Rater

As a rater, GenAI demonstrated human-like grading with greater consistency, validity, objectivity, and systematicity than human raters (Bucol & Sangkawong, 2025; J. Li et al., 2024; Shin & Lee, 2024; Yavuz et al., 2025). However, it tended to be more lenient, assigning higher scores and sometimes overlooking specific criteria or exhibiting comprehension biases when evaluating complex or creative writing (Bucol & Sangkawong, 2025; Shin & Lee, 2024).

Nevertheless, findings also diverged. Yavuz et al. (2025) observed similar scores between GenAI and human teachers on content and organization, Shin and Lee (2024) reported more lenient and higher scores from GenAI, and J. Li et al. (2024) found that it outperformed human raters overall.

GenAI’s Efficacy as a Teaching and Learning Assistant

Studies examining the broader pedagogical application and effects revealed the roles of GenAI in assisting teaching and learning. For learners, it enhanced learning experience (Hieu & Thao, 2024; Huang & Mizumoto, 2025), engagement (Hieu & Thao, 2024; Polakova & Ivenz, 2024; Teng, 2024; Woo et al., 2024), motivation and self-efficacy (Huang & Mizumoto, 2025; Z.-M. Liu et al., 2024; Song & Song, 2023; Teng, 2024), and feedback literacy (Gozali et al., 2024), and thereby promoted writing performance and ability (Ghafouri et al., 2024; Guo et al., 2024; Z.-M. Liu et al., 2024; Polakova & Ivenz, 2024; Woo et al., 2024). For teachers, it supported professional development, self-efficacy, workload reduction, and broader technological adoption while fostering critical thinking and creativity (Ghafouri et al., 2024; Hieu & Thao, 2024). However, limitations in contextual alignment, integration with existing methods, language quality, and technical resources may lead to over-reliance, ethical risks, and reduced student creativity (Hieu & Thao, 2024; Song & Song, 2023; Stewart & Zheng, 2024), resulting in smaller gains in vocabulary, writing, and learner satisfaction compared to teacher-led instruction (Ahmed, 2023; Sapan & Uzun, 2024; Sawangwan, 2024).

Nonetheless, some studies reported that GenAI enhanced student motivation (Huang & Mizumoto, 2025; Z.-M. Liu et al., 2024; Song & Song, 2023), whereas Woo et al. (2024) observed only a marginal, non-significant increase in motivation.

(3b) Research issues in oral instruction (n = 2).

Only two studies investigated the role of GenAI in EFL oral instruction, focusing on the psychological and emotional impacts of GenAI in communicative practice and speaking assessment within higher education.

Yıldız (2024) investigated the role of GenAI in communicative activities as a study buddy, revealing that GenAI could boost students’ speaking self-efficacy, confidence, and enjoyment while reducing stress by creating a supportive and non-judgmental environment for oral practice. Nevertheless, with a majority of feedback focusing on grammar and vocabulary, there was limited feedback on pronunciation, intonation, and stress patterns.

Sayed et al. (2024) delved into GenAI’s role as a rater and feedback provider in oral tests and reported that GenAI enhanced oral skills, autonomy, and academic resilience by providing instant, personalized feedback in a non-judgmental setting, reducing anxiety and increasing motivation. Additionally, GenAI eased teachers’ workloads and made curricula more dynamic, suggesting that curriculum designers might benefit from incorporating GenAI to promote mental health, autonomy, and GenAI-assisted testing (Sayed et al., 2024).

(3c) Research issues in grammar instruction (n = 1).

A single experimental study by Kucuk (2024) focused on the role of GenAI as a learning assistant and feedback provider in grammar instruction by comparing GenAI-led instruction with teacher-led instruction.

The findings also revealed a double-edged role for GenAI. Positively, this study found that GenAI-assisted grammar instruction resulted in more significant improvement in university students’ grammar proficiency than teacher-led grammar instruction. Moreover, GenAI obtained high satisfaction due to its interactive, personalized, and constant support. However, concerns were raised about ambiguous responses and insufficient feedback, which may reduce students’ critical thinking skills, increase their dependence on technology, and lead to irrational use. Despite these challenges, Kucuk’s (2024) study ultimately concluded that the benefits of GenAI-assisted grammar instruction outweighed its disadvantages.

(3d) Research issues in reading instruction (n = 1).

A single qualitative study conducted by Xin (2024) focused on the role of GenAI as a teaching assistant for teaching material development in reading instruction.

Based on three EFL teachers’ usage of GenAI in EFL reading instruction, Xin’s (2024) study identified both the advantages and limitations of using GenAI in developing instructional materials, particularly for text modification, task design, and acquiring instructional suggestions. Results revealed that GenAI was conducive to increasing efficiency and the ability to generate novel and learner-centered tasks. Nevertheless, teachers found that the tool could not interpret multimodal elements (like posters or charts) within a PDF and sometimes provided unreliable answers or suggestions that were not pedagogically sound.

Therefore, Xin (2024) also suggested that teachers must rely on their pedagogical expertise, understanding of students, and linguistic awareness to make informed judgments when using GenAI for instructional materials development. The study also proposed a D-R-E-A-M model (Determine, Render, Evaluate, Adjust, Make decision) to guide teachers in developing instructional materials with GenAI.

(3e) Research issues in general EFL instruction (n = 19).

Studies related to the role of GenAI in general EFL instruction mainly focused on three issues: teachers’ perspectives on GenAI roles (n = 11), psychological and cognitive impacts (n = 4), and GenAI’s efficacy in lesson planning and implementation (n = 4).

Teachers’ Perspectives on GenAI as a Teaching Assistant (n = 11)

Eleven studies examined perspectives on GenAI in EFL instruction from general teachers (n = 6), novice teachers (n = 4), and special education teachers (n = 1).

Overall, EFL teachers generally held a positive yet cautious view of GenAI (Zaiarna et al., 2024). They recognized its value particularly in teaching preparation and assessment (Derakhshan & Ghiasvand, 2024; Mohamed, 2024; Parviz, 2024; Ulla et al., 2023; Zaiarna et al., 2024). However, significant concerns were raised, primarily regarding over-dependence, trustworthiness, academic integrity and teacher-student interaction (Derakhshan & Ghiasvand, 2024; Gao et al., 2024; Ulla et al., 2023; Zaiarna et al., 2024).

Preservice teachers acknowledged GenAI’s role in professional growth despite its limitations (Kartal, 2024; Kusuma et al., 2024; Mustroph & Steinbock, 2024; Wulandari & Purnamaningwulan, 2024). However, similar concerns were also mentioned, including information quality, accuracy, over-reliance, ethical risks, and misinformation (Kusuma et al., 2024; Wulandari & Purnamaningwulan, 2024). Therefore, they proposed that effective integration required human-GenAI collaboration, critical analysis, and creativity (Kartal, 2024; Mustroph & Steinbock, 2024).

In special education, Alenezi et al. (2023) found that teachers’ attitudes toward GenAI were moderate, with female teachers showing greater willingness for future use.

GenAI’s Efficacy as a Psychological and Cognitive Facilitator (n = 4)

Existing literature demonstrated that GenAI exerted multiple psychological and cognitive effects in EFL instruction. Y. J. Lee and Davis (2024) highlighted its role in boosting learners’ motivation, interest, and confidence. Ghafouri et al. (2024) found that structured GenAI teaching models helped build emotionally supportive learning environments and students’ psychological grit. Korucu-Kış (2024) noted its potential to enhance teacher creativity, though effectiveness depended on expertise and faced challenges like input accuracy and content repetition. Hınız (2024) argued that while GenAI provided diverse materials and promoted inclusiveness, it also raised concerns about plagiarism and cognitive skill development.

GenAI’s Efficacy as an Instructional Designer and Facilitator (n = 4)

Studies also indicated GenAI’s multifaceted role in EFL lesson planning and implementation (Mena Octavio et al., 2024; Williyan et al., 2024; Yeh, 2025). Williyan et al. (2024) revealed that GenAI could assist teachers in lesson design, classroom introduction, content presentation, practice activities, immediate feedback, and assessment, fostering adaptability and creativity. Mena Octavio et al. (2024) confirmed its positive impact on planning, implementation, and assessment. Yeh (2025) noted its role in personalizing instruction, increasing interactivity, and making classrooms more student-centered. However, Milad and Fayez (2024) compared GenAI-generated lesson plans with those of student teachers and found that the GenAI-generated plans followed a rigid, linear structure, relied heavily on vague teacher instructions, and lacked detailed interaction design.

Discussion

Imbalanced Distribution of Application Contexts

The imbalanced distribution of study locations, with East Asia and the Middle East most common and limited research in Europe, Africa, and South Asia, may be interpreted through the Technology Acceptance Model (TAM), which holds that perceived usefulness drives the willingness to adopt technology (Davis, 1989). In East Asia, the Middle East, and Southeast Asia, AI is seen as more beneficial than risky, whereas perceived risks dominate in Europe and South Asia, and infrastructural limitations constrain access in Africa (Maslej, 2025; Neudert et al., 2020).

The dominance of higher education is associated with classical cognitive development theories which suggest university students embrace more advanced abstract thinking and critical reasoning abilities (Piaget & Duckworth, 1970). Consequently, their understanding of artificial intelligence becomes more comprehensive and in-depth than at earlier stages (Staikova et al., 2024), thereby enhancing the feasibility of applying GenAI in EFL instruction.

The strong focus on writing, with little or no attention to other skills, aligns closely with second language acquisition (SLA) theories. According to the Output Hypothesis, language internalization occurs through active language production (Swain, 1995). Powered by LLMs, many GenAI applications excel in text processing and generation (R. Lee, 2025), thereby reinforcing the output–feedback–revision cycle that underpins writing development (Gozali et al., 2024; Song & Song, 2023). In contrast, speaking depends on authentic, two-way interaction, as emphasized in M. H. Long’s (1996) Interaction Hypothesis, which remains challenging for GenAI to replicate (Michel-Villarreal et al., 2023). For receptive skills such as reading and listening, GenAI can generate abundant comprehensible input consistent with Krashen’s (1985) Input Hypothesis. However, its role remains largely supportive rather than essential. Similarly, grammatical acquisition, which relies on contextualized language production, is more effectively fostered through writing than through isolated grammar exercises.

Preference for Research Methods

The analysis of research approaches indicated the prevalence of mixed-methods approaches, followed by qualitative and quantitative approaches. This finding contrasts with previous reviews on similar but broader topics, which predominantly featured quantitative studies, such as Liang et al.’s (2023) study on the role of AI in language education, Hwang and Chang’s (2023) study on chatbots in education, and Batista et al.’s (2024) study on GenAI in education. This indicates the need for further support from quantitative empirical research. Moreover, research began in 2023 and surged within a year, suggesting that the field may still be in an exploratory stage. This phenomenon may be related to the public release of ChatGPT on November 30, 2022 (Lo, 2023).

The high number of large-sample studies seems to contradict the lack of quantitative research. However, this can be explained by the prevalence of mixed-methods research, which tends to involve larger sample sizes (e.g., Ghafouri, 2024; Sawangwan, 2024; D. Yan, 2024). This suggests that existing research prioritizes statistical validity and tends to overlook the in-depth analysis that small-sample studies afford.

In terms of focus and data collection, the selected studies tend to center on educational outcomes, pedagogical benefits, teacher perceptions, and assessment-related issues, predominantly using questionnaires and interviews as data sources. This indicates a focus on subjective perceptions, lacking objective quantification. This finding aligns with S. Lee et al. (2025), who noted that research on GenAI in language classrooms is still at an early stage and largely relies on subjective data.

Discussion of Research Issues Identified

The double-edged role of GenAI in EFL instruction accords with the constructivist view of learning as a process of active knowledge construction (Vygotsky et al., 1978). On the one hand, it can function as a powerful scaffold, accelerating skill acquisition in diverse ways, such as quickly correcting errors and enhancing feedback quality (Gozali et al., 2024; Polakova & Ivenz, 2024). On the other hand, it enables students to evade necessary cognitive challenges through shortcuts such as plagiarism (Bucol & Sangkawong, 2025). This leads to significant risks of over-reliance and a decline in creativity, as noted in several reviewed studies (Hieu & Thao, 2024; Song & Song, 2023).

Findings suggest a predominant concentration on applying GenAI for assessment, including feedback and grading, especially in writing instruction, whereas research on its role in teaching implementation and preparation remains limited. This trend can be interpreted through SLA theories. Assessment aligns closely with the Output Hypothesis (Swain, 1995), which posits that learners internalize language knowledge through a cyclical output–feedback–revision process. Language output generally encompasses both speaking and writing (Zhang et al., 2024). GenAI facilitates this process by providing instant, personalized, and non-judgmental feedback, which enables learners to identify and correct linguistic errors, thereby promoting rapid internalization of language knowledge (Almashy et al., 2024; Sayed et al., 2024). The directness and measurability of this feedback render such studies easier to design and quantify, thereby contributing to their prevalence in literature. Conversely, teaching implementation and preparation pertain to the Input Hypothesis (Krashen, 1985) and the Interaction Hypothesis (M. H. Long, 1996), both of which emphasize comprehensible input and meaning negotiation. Research in these domains is more complex owing to contextual factors, technological constraints, learner variability, and instructional design (Zhang & Dong, 2024), leading to a comparatively smaller body of empirical studies that focus on authentic classroom applications (S. Lee et al., 2025).

The divergence in findings regarding feedback, scoring, and effectiveness in writing instruction may stem from both technical variations and differences in research participants. For instance, J. Li et al. (2024) employed ChatGPT-4, whereas Yavuz et al. (2025) compared ChatGPT and Google’s Bard. In terms of participants, Woo et al. (2024) examined secondary school students, while Huang and Mizumoto (2025) focused on university students.

Learners consistently demonstrate a lower preference for GenAI feedback than for teacher feedback (Zeevy-Solovey, 2024; Zou et al., 2025). This result can be interpreted through Vygotsky et al.’s (1978) Sociocultural Theory, in which learning is socially and emotionally mediated. Despite its multiple strengths, GenAI lacks the empathetic and emotional dimensions inherent in human instruction, prompting learners to show a clear preference for teacher-led guidance (Michel-Villarreal et al., 2023).

The predominance of the teacher perspective in studies of general EFL instruction can also be explained from a constructivist standpoint, in which teachers play a crucial role in designing learning environments and guiding the learning process (Vygotsky et al., 1978). Therefore, they serve as “gatekeepers” in facilitating the effective integration of GenAI into education (Yue et al., 2025). However, since learning involves learners’ active construction of knowledge (Piaget & Duckworth, 1970), the student perspective is equally essential and warrants greater attention.

Significantly, the findings are not merely transient technological effects but emerge from the interaction of relatively stable factors, including technology acceptance, students’ cognitive maturity, the inherent characteristics of GenAI, socio-cultural contexts, and teachers’ mediating roles. Given the stability of these factors, the conclusions of this study carry enduring pedagogical implications.

Conclusion and Limitations

To conclude, this systematic review analyzed 51 studies related to the role of GenAI in EFL instruction, guided by Page et al.’s (2021) PRISMA protocols. To capture the most recent publications, the timespan (2020–2024) was applied to the early access year. The findings reveal three key features of the current research landscape: (1) contextual imbalance, with a heavy focus on higher education, East Asia and the Middle East, and writing instruction; (2) methodological preferences for mixed and qualitative methods, large sample sizes, and subjective data; and (3) research issues exhibiting GenAI’s versatile but double-edged role, an emphasis on assessment, divergent results on the effectiveness of GenAI in writing instruction, students’ preference for teacher feedback, and teacher-centeredness in the studies of general EFL instruction.

This study provides significant implications for both teaching and policy. For teachers, given the double-edged role of GenAI, it is recommended to critically integrate the technology rather than simply adopt it (Xin, 2024). Teachers should employ GenAI as a scaffolding tool (Guo & Wang, 2024; Zou et al., 2025), while guiding students on its limitations, including bias, overreliance, misinformation, and risks to academic integrity (Bucol & Sangkawong, 2025; Kusuma et al., 2024; Wulandari & Purnamaningwulan, 2024). For policymakers, in view of potential cultural biases and ethical concerns (Song & Song, 2023), institutions should establish clear ethical guidelines and safeguards and provide professional development programs that emphasize the pedagogical use of GenAI, rather than treating it as a mere technological tool.

Future research on GenAI in EFL instruction should focus on three priorities: first, expanding contextual diversity by including underrepresented regions (e.g., Europe, Africa, South Asia), diverse educational levels (e.g., K-12, adult, and special education), and multiple language skills beyond writing (e.g., speaking, reading, and listening); second, optimizing research methods by increasing quantitative studies, incorporating small- and medium-sized samples, and integrating objective performance data rather than relying solely on self-reports; and third, deepening research issues by examining GenAI’s multifaceted roles throughout the full instructional cycle (S. Lee et al., 2025), conducting replication studies to address inconsistencies in feedback and grading, and exploring perspectives beyond teachers, such as those of students and administrators.

This study has two main limitations. On the one hand, in view of the rapid advancement of GenAI, the specific publication period (2020–2024) may limit the inclusion of the most recent technological developments. To address this limitation, this review incorporated early access articles from early 2025 and integrated the latest literature from 2025 into the analysis of the research background, research issues, and Discussion sections to ensure the currency of the study. Moreover, because the findings rest on stable sociocultural contexts, intrinsic technological mechanisms, and teachers’ pivotal roles rather than on specific software versions, they possess lasting explanatory significance. On the other hand, restricting the literature search to two databases (Scopus and WoS) may result in imbalance and bias. For instance, the focus on higher education limits generalizability to K-12 and other contexts, and the reliance on subjective data sources like questionnaires and interviews may introduce bias. Future studies should broaden the publication range and databases to include diverse educational settings and adopt robust quantitative methods to better understand GenAI’s role in EFL instruction.

Footnotes

Acknowledgements

We thank the anonymous reviewers for their constructive feedback and insightful comments, which significantly improved the quality of this manuscript.

ORCID iDs

Luying Deng

Khairul Azhar Jamaludin

Ethical Considerations

This study is a systematic literature review of previously published studies. No human participants were directly recruited or involved in this research. Therefore, ethical approval was not required.

Consent to Participate

Since this study relies exclusively on secondary data from publicly available academic publications, informed consent was not applicable.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

References

Abdulai A. F., Hung (2023). Will ChatGPT undermine ethical values in nursing education, research, and practice? Nursing Inquiry, 30(3), e12556. https://doi.org/10.1111/nin.12556

Ahmed M. A. (2023). ChatGPT and the EFL classroom: Supplement or substitute in Saudi Arabia’s Eastern region. Information Sciences Letters, 12(7), 2727–2734. https://doi.org/10.18576/isl/120704

Alenezi M. A. K., Mohamed A. M., Shaaban T. S. (2023). Revolutionizing EFL special education: How ChatGPT is transforming the way teachers approach language learning. Innoeduca International Journal of Technology and Educational Innovation, 9(2), 5–23. https://doi.org/10.24310/innoeduca.2023.v9i2.16774

Almashy, Ahmed A. S. M. M., Jamshed, Ansari M. S., Banu, Warda W. U. (2024). Analyzing the impact of CALL tools on English learners’ writing skills: A comparative study of errors correction. World Journal of English Language, 14(6), 657–667. https://doi.org/10.5430/wjel.v14n6p657

Alsofyani A. H., Barzanji A. M. (2025). The effects of ChatGPT-generated feedback on Saudi EFL learners’ writing skills and perception at the tertiary level: A mixed-methods study. Journal of Educational Computing Research, 63(2), 431–463. https://doi.org/10.1177/07356331241307297

Arefian M. H., Çomoğlu, Dikilitaş (2024). Understanding EFL teachers’ experiences of ChatGPT-driven collaborative reflective practice through a community of practice lens. Innovation in Language Learning and Teaching, 1–16. https://doi.org/10.1080/17501229.2024.2412769

Bahroun, Anane, Ahmed, Zacca (2023). Transforming education: A comprehensive review of generative artificial intelligence in educational settings through bibliometric and content analysis. Sustainability, 15(17), 12983. https://doi.org/10.3390/su151712983

Batista, Mesquita, Carnaz (2024). Generative AI and higher education: Trends, challenges, and future directions from a systematic literature review. Information, 15(11), 676. https://doi.org/10.3390/info15110676

Brown, Mann, Ryder, Subbiah, Kaplan J. D., Dhariwal, Neelakantan, Shyam, Sastry, Askell, Agarwal, Herbert-Voss, Krueger, Henighan, Child, Ramesh, Ziegler, Winter, . . . Amodei (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://doi.org/10.48550/arXiv.2005.14165

Bucol J. L., Sangkawong (2025). Exploring ChatGPT as a writing assessment tool. Innovations in Education and Teaching International, 62(3), 867–882. https://doi.org/10.1080/14703297.2024.2363901

Cain, Malcom D. R., Aungst T. D. (2023). The role of artificial intelligence in the future of pharmacy education. American Journal of Pharmaceutical Education, 87, 100135. https://doi.org/10.1016/j.ajpe.2023.100135

Chan C. K. Y., Lee K. K. W. (2023). The AI generation gap: Are gen Z students more interested in adopting generative AI such as ChatGPT in teaching and learning than their gen X and millennial generation teachers? Smart Learning Environments, 10(1), 60. https://doi.org/10.1186/s40561-023-00269-3

Chaudhry M. A., Kazim (2021). Artificial Intelligence in Education (AIEd): A high-level academic and industry note 2021. AI and Ethics, 2(1), 157–165. https://doi.org/10.1007/s43681-021-00074-z

Crompton, Burke (2023). Artificial intelligence in higher education: The state of the field. International Journal of Educational Technology in Higher Education, 20(1), 22. https://doi.org/10.1186/s41239-023-00392-8

Davis F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319. https://doi.org/10.2307/249008

Denny, Kumar, Giacaman (2023). Conversing with Copilot: Exploring prompt engineering for solving CS1 problems using natural language [Conference session]. Proceedings of the 54th ACM Technical Symposium on Computer Science Education V, 1. https://doi.org/10.1145/3545945.3569823

Derakhshan, Ghiasvand (2024). Is ChatGPT an evil or an angel for second language education and research? A phenomenographic study of research-active EFL teachers’ perceptions. International Journal of Applied Linguistics, 34(4), 1246–1264. https://doi.org/10.1111/ijal.12561

Dergaa, Chamari, Zmijewski, Ben Saad (2023). From human writing to artificial intelligence generated text: Examining the prospects and potential threats of ChatGPT in academic writing. Biology of Sport, 40(2), 615–622. https://doi.org/10.5114/biolsport.2023.125623

ElEbyary, Shabara (2024). ChatGPT-generated corrective feedback: Does it do what it says on the tin? Teaching English With Technology, 2024(3), 68–89. https://doi.org/10.56297/vaca6841//bffo7057/myeh4562

Fan, Tan, Lim G. Y. W. (2024). EAP teacher feedback in the age of AI: Supporting first-year students in EFL disciplinary writing. Australian Journal of Applied Linguistics, 7(3), 1943. https://doi.org/10.29140/ajal.v7n3.1943

Fryer, Coniam, Carpenter, Lăpuşneanu (2020). Bots for language learning now: Current and future directions. Language Learning and Technology, 24, 8–22. https://doi.org/10.64152/10125/44719

Gao, Wang (2024). Exploring EFL university teachers’ beliefs in integrating ChatGPT and other large language models in language education: A study in China. Asia Pacific Journal of Education, 44(1), 29–44. https://doi.org/10.1080/02188791.2024.2305173

Ghafouri (2024). ChatGPT: The catalyst for teacher-student rapport and grit development in L2 class. System, 120, 103209. https://doi.org/10.1016/j.system.2023.103209

Ghafouri, Hassaskhah, Mahdavi-Zafarghandi (2024). From virtual assistant to writing mentor: Exploring the impact of a ChatGPT-based writing instruction protocol on EFL teachers’ self-efficacy and learners’ writing skill. Language Teaching Research, 1–23. https://doi.org/10.1177/13621688241239764

Gozali, Wijaya A. R. T., Lie, Cahyono B. Y., Suryati (2024). Leveraging the potential of ChatGPT as an automated writing evaluation (AWE) tool: Students’ feedback literacy development and AWE tools integration framework. The JALT CALL Journal, 20(1), 1–22. https://doi.org/10.29140/jaltcall.v20n1.1200

Guo, Pan, Lai (2024). Effects of an AI-supported approach to peer feedback on university EFL students’ feedback quality and writing ability. Internet and Higher Education, 63, 100962. https://doi.org/10.1016/j.iheduc.2024.100962

Guo, Wang (2024). To resist it or to embrace it? Examining ChatGPT’s potential to support teacher feedback in EFL writing. Education and Information Technologies, 29(7), 8435–8463. https://doi.org/10.1007/s10639-023-12146-0

Hieu H. H., Thao L. T. (2024). Exploring the impact of AI in language education: Vietnamese EFL teachers’ views on using ChatGPT for fairy tale retelling tasks. International Journal of Learning, Teaching and Educational Research, 23(3), 486–503. https://doi.org/10.26803/ijlter.23.3.24

Hınız (2024). A year of generative AI in English language teaching and learning - A case study. Journal of Research on Technology in Education, 1–21. https://doi.org/10.1080/15391523.2024.2404132

Huang, Mizumoto (2025). The effects of generative AI usage in EFL classrooms on the L2 motivational self system. Education and Information Technologies, 30, 6435–6454. https://doi.org/10.1007/s10639-024-13071-6

Hwang G.-J., Chang C.-Y. (2023). A review of opportunities and challenges of chatbots in education. Interactive Learning Environments, 31(7), 4099–4112. https://doi.org/10.1080/10494820.2021.1952615

Kartal (2024). The influence of ChatGPT on thinking skills and creativity of EFL student teachers: A narrative inquiry. Journal of Education for Teaching, 50(4), 627–642. https://doi.org/10.1080/02607476.2024.2326502

Kawinkoonlasate (2020). Online language learning for Thai EFL learners: An analysis of effective alternative learning methods in response to the covid-19 outbreak. English Language Teaching, 13(12), 15. https://doi.org/10.5539/elt.v13n12p15

Kohnke, Moorhouse B. L., Zou (2023a). ChatGPT for language teaching and learning. RELC Journal, 54(2), 537–550. https://doi.org/10.1177/00336882231162868

Kohnke, Moorhouse B. L., Zou (2023b). Exploring generative artificial intelligence preparedness among university language instructors: A case study. Computers and Education: Artificial Intelligence, 5, 100156. https://doi.org/10.1016/j.caeai.2023.100156

Korucu-Kış (2024). Zone of proximal creativity: An empirical study on EFL teachers’ use of ChatGPT for enhanced practice. Thinking Skills and Creativity, 54, 101639. https://doi.org/10.1016/j.tsc.2024.101639

Kovačević (2023). Use of ChatGPT in ESP teaching process [Conference session]. 2023 22nd International Symposium INFOTEH-JAHORINA (INFOTEH). https://doi.org/10.1109/INFOTEH57020.2023.10094133

Krashen S. D. (1985). The Input hypothesis: Issues and implications (1st ed., pp. 1–64). Longman.

Kucuk (2024). ChatGPT integrated grammar teaching and learning in EFL classes: A study on Tishk International University students in Erbil, Iraq. Arab World English Journal, 1(1), 100–111. https://doi.org/10.24093/awej/chatgpt.6

Kusuma I. P. I., Roni, Dewi K. S., Mahendrayana (2024). Revealing the potential of ChatGPT for English language teaching: EFL preservice teachers’ teaching practicum experience. Studies in English Language and Education, 11(2), 650–670. https://doi.org/10.24815/siele.v11i2.34748

Lee (2025). Large language models (LLMs) and Generative Artificial Intelligence (GenAI). In Lee (Ed.), Natural language processing (pp. 241–273). Springer Nature Singapore.

Lee, Choe, Zou, Jeon (2025). Generative AI (GenAI) in the language classroom: A systematic review. Interactive Learning Environments, 0(0), 1–25. https://doi.org/10.1080/10494820.2025.2498537

Lee Y. J., Davis R. O. (2024). A case study of implementing generative AI in university’s general English courses. Contemporary Educational Technology, 16(4), ep533. https://doi.org/10.30935/cedtech/15218

Liang J.-C., Hwang G.-J., Chen M.-R. A., Darmawansah (2023). Roles and research foci of artificial intelligence in language education: An integrated bibliographic analysis and systematic review approach. Interactive Learning Environments, 31(7), 4270–4296. https://doi.org/10.1080/10494820.2021.1958348

Li, Huang, Whipple P. B. (2024). Evaluating the role of ChatGPT in enhancing EFL writing assessments in classroom settings: A preliminary investigation. Humanities and Social Sciences Communications, 11(1), 1–9. https://doi.org/10.1057/s41599-024-03755-2

Li, Xie, Zeng (2023). The influence of teaching practicum on foreign language teaching anxiety among pre-service EFL teachers. Sage Open, 13(1), 21582440221149005. https://doi.org/10.1177/21582440221149005

Liu, Xiao (2025). Chinese university teachers’ engagement with generative AI in different stages of foreign language teaching: A qualitative enquiry through the prism of ADDIE. Education and Information Technologies, 30(1), 485–508. https://doi.org/10.1007/s10639-024-13117-9

Liu Z.-M., Hwang G.-J., Chen C.-Q., Chen X.-D. (2024). Integrating large language models into EFL writing instruction: Effects on performance, self-regulated learning strategies, and motivation. Computer Assisted Language Learning, 1–25. https://doi.org/10.1080/09588221.2024.2389923

Lo C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13(4), 410. https://doi.org/10.3390/educsci13040410

Lo C. K., P. L. H., D. T. K., Jong M. S. Y. (2024). Exploring the application of ChatGPT in ESL/EFL education and related research issues: A systematic review of empirical studies. Smart Learning Environments, 11(1), 50. https://doi.org/10.1186/s40561-024-00342-5

Long H. S. (2024). Exploring the use of ChatGPT as a tool for written corrective feedback in an EFL classroom. Journal of AsiaTEFL, 21(2), 397–412. https://doi.org/10.18823/asiatefl.2024.21.2.8.397

Long M. H. (1996). The role of the linguistic environment in second language acquisition. In Ritchie W. C., Bhatia T. K. (Eds.), Handbook of second language acquisition (pp. 413–468). Academic Press, Inc.

Maslej (2025). Artificial Intelligence Index Report 2025 (p. 457). Stanford’s Human-Centered AI Institute. https://hai.stanford.edu/ai-index/2025-ai-index-report?utm

McKinsey & Company. (2023, January 19). What is ChatGPT, DALL-E, and generative AI? What is Generative AI? https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-generative-ai#/

Mena Octavio, González Argüello M. V., Pujolà J.-T. (2024). ChatGPT as an AI L2 teaching support: A case study of an EFL teacher. Technology in Language Teaching and Learning, 6(1), 1142. https://doi.org/10.29140/tltl.v6n1.1142

Meniado J. C. (2023). The impact of ChatGPT on English language teaching, learning, and assessment: A rapid review of literature. Arab World English Journal, 14(4), 3–18. https://doi.org/10.24093/awej/vol14no4.1

Michel-Villarreal, Vilalta-Perdomo, Salinas-Navarro D. E., Thierry-Aguilera, Gerardou F. S. (2023). Challenges and opportunities of Generative AI for Higher Education as explained by ChatGPT. Education Sciences, 13(9), 856. https://doi.org/10.3390/educsci13090856

Milad, Fayez (2024). Your way AI way: Let’s meet half-way: Paradigm shift in TEFL. World Journal of English Language, 14(5), 468–481. https://doi.org/10.5430/wjel.v14n5p468

Mittal, Sai, Chamola, Sangwan (2024). A comprehensive review on generative AI for education. IEEE Access, 12, 142733–142759. https://doi.org/10.1109/access.2024.3468368

Mohamed A. M. (2024). Exploring the potential of an AI-based chatbot (ChatGPT) in enhancing English as a foreign language (EFL) teaching: Perceptions of EFL faculty members. Education and Information Technologies, 29(3), 3195–3217. https://doi.org/10.1007/s10639-023-11917-z

Mustroph, Steinbock (2024). ChatGPT in foreign language education - friend or foe? A quantitative study on pre-service teachers’ beliefs. Technology in Language Teaching and Learning, 6(1), 1133. https://doi.org/10.29140/tltl.v6n1.1133

Neudert L. M., Knuutila, Howard P. N. (2020). Global attitudes towards AI, machine learning & automated decision making. University of Oxford. https://oxcaigg.oii.ox.ac.uk

Pack, Maloney (2023). Potential affordances of generative AI in language education: Demonstrations and an evaluative framework. Teaching English With Technology, 23(2), 4. https://doi.org/10.56297/buka4060/vrro1747

Page M. J., McKenzie J. E., Bossuyt P. M., Boutron, Hoffmann T. C., Mulrow C. D., Shamseer, Tetzlaff J. M., Akl E. A., Brennan S. E., Chou, Glanville, Grimshaw J. M., Hróbjartsson, Lalu M. M., Loder E. W., Mayo-Wilson, McDonald, . . . Moher (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. International Journal of Surgery, 88, 105906. https://doi.org/10.1016/j.ijsu.2021.105906

65.

Parviz

(2024). The double-edged sword: AI integration in English language education from the perspectives of Iranian EFL instructors. Complutense Journal of English Studies, 32, e97261. https://doi.org/10.5209/cjes.97261

66.

Piaget

Duckworth

(1970). Genetic epistemology (Vol. 13). Columbia University Press.

67.

Polakova

Ivenz

(2024). The impact of ChatGPT feedback on the development of EFL students’ writing skills. Cogent Education, 11(1), 2410101. https://doi.org/10.1080/2331186x.2024.2410101

68.

Relmasira

S. C.

Lai

Y. C.

Donaldson

J. P.

(2023). Fostering AI literacy in elementary science, technology, engineering, art, and mathematics (STEAM) education in the age of generative AI. Sustainability, 15(18), 13595. https://doi.org/10.3390/su151813595 Article 18.

69. Sapan, Uzun (2024). The effect of ChatGPT-integrated English teaching on high school EFL learners’ writing skills and vocabulary development. International Journal of Education in Mathematics Science and Technology, 12(6), 1679–1699. https://doi.org/10.46328/ijemst.4655

70. Sawangwan (2024). ChatGPT vs teacher roles in developing EFL writing. International Journal of Computer-Assisted Language Learning and Teaching, 14(1), 1–21. https://doi.org/10.4018/ijcallt.361235

71. Sayed B. T., Bani Younes Z. B., Alkhayyat, Adhamova, Teferi (2024). To be with artificial intelligence in oral test or not to be: A probe into the traces of success in speaking skill, psychological well-being, autonomy, and academic buoyancy. Language Testing in Asia, 14(1), 49. https://doi.org/10.1186/s40468-024-00321-0

72. Shin, Lee J. H. (2024). Exploratory study on the potential of ChatGPT as a rater of second language writing. Education and Information Technologies, 29(18), 24735–24757. https://doi.org/10.1007/s10639-024-12817-6

73. Song (2023). Enhancing academic writing skills and motivation: Assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students. Frontiers in Psychology, 14, 1–14. https://doi.org/10.3389/fpsyg.2023.1260843

74. Staikova, Ivanova, Chivarov (2024). Students understanding for AI in different educational levels. IFAC-PapersOnLine, 58(3), 182–186. https://doi.org/10.1016/j.ifacol.2024.07.147

75. Stewart, Zheng (2024). Generative AI’s recolonization of EFL classrooms: The case of continuation writing. Australian Review of Applied Linguistics, 47(3), 383–409. https://doi.org/10.1075/aral.24091.ste

76. Swain (1995). Three functions of output in second language learning. In Cook, Seidlhofer (Eds.), Principles and practice in applied linguistics: Studies in honor of HG Widdowson (pp. 125–144). Oxford University Press.

77. Tang, Deng (2022). The design model of English graded teaching assistant expert system based on improved B/S three-tier structure system. Mobile Information Systems, 2022, 1–9. https://doi.org/10.1155/2022/4167760

78. Teng M. F. (2024). “ChatGPT is the companion, not enemies”: EFL learners’ perceptions and experiences in using ChatGPT for feedback in writing. Computers and Education: Artificial Intelligence, 7, 100270. https://doi.org/10.1016/j.caeai.2024.100270

79. Topsakal (2022). Framework for a foreign language teaching software for children utilizing AR, voicebots and ChatGPT (large language models). The Journal of Cognitive Systems, 7(2), 33–38. https://doi.org/10.52876/jcs.1227392

80. Trigka, Dritsas (2025). The evolution of generative AI: Trends and applications. IEEE Access, 13, 98504–98529. https://doi.org/10.1109/access.2025.3574660

81. Tyler R. W. (1975). Specific approaches to curriculum development. In Schaffarzick, Hampson (Eds.), Strategies for curriculum development (pp. 17–33). McCutchan Publishing Corporation.

82. Ulla M. B., Perales W. F., Busbus S. O. (2023). ‘To generate or stop generating response’: Exploring EFL teachers’ perspectives on ChatGPT in English language teaching in Thailand. Learning Research and Practice, 9(2), 168–182. https://doi.org/10.1080/23735082.2023.2257252

83. Vygotsky L. S., Cole, John-Steiner, Scribner, Souberman (1978). Mind in society: The development of higher psychological processes. Harvard University Press.

84. Williyan, Fitriati S. W., Pratama, Sakhiyya (2024). AI as co-creator: Exploring Indonesian EFL teachers’ collaboration with AI in content development. Teaching English With Technology, 24(2), 5–21. https://doi.org/10.56297/vaca6841/lrdx3699/rzoh5366

85. Woo D. J., Wang, Guo, Susanto (2024). Teaching EFL students to write with ChatGPT: Students’ motivation to learn, cognitive load, and satisfaction with the learning process. Education and Information Technologies, 29(18), 24963–24990. https://doi.org/10.1007/s10639-024-12819-4

86. Wulandari, Purnamaningwulan R. A. (2024). AI as co-creator: Exploring Indonesian EFL teachers’ collaboration with AI in content development. LLT Journal: A Journal on Language and Language Teaching, 27(2), 878–894. https://doi.org/10.24071/llt.v27i2.8690

87. Xin J. J. (2024). Investigating EFL teachers’ use of generative AI to develop reading materials: A practice and perception study. Language Teaching Research. Advance online publication. https://doi.org/10.1177/13621688241303321

88. Yan (2023). Impact of ChatGPT on learners in a L2 writing practicum: An exploratory investigation. Education and Information Technologies, 28(11), 13943–13967. https://doi.org/10.1007/s10639-023-11742-4

89. Yan (2024). Comparing individual vs. collaborative processing of ChatGPT-generated feedback: Effects on L2 writing task improvement and learning. Language Learning and Technology, 28(1), 1–19. https://doi.org/10.64152/10125/73597

90. Yan, Sha, Zhao, Martinez-Maldonado, Chen, Jin, Gašević (2024). Practical and ethical challenges of large language models in education: A systematic scoping review. British Journal of Educational Technology, 55(1), 90–112. https://doi.org/10.1111/bjet.13370

91. Yavuz, Çelik Ö., Yavaş Çelik (2025). Utilizing large language models for EFL essay grading: An examination of reliability and validity in rubric-based assessments. British Journal of Educational Technology, 56(1), 150–166. https://doi.org/10.1111/bjet.13494

92. Yeh H.-C. (2025). The synergy of generative AI and inquiry-based learning: Transforming the landscape of English teaching and learning. Interactive Learning Environments, 33(1), 88–102. https://doi.org/10.1080/10494820.2024.2335491

93. Yıldız (2024). ChatGPT integration in EFL education: A path to enhanced speaking self-efficacy. Novitas-ROYAL, 18(2), 167–182. https://doi.org/10.5281/ZENODO.13861137

94. Yue, Jong M. S. Y., Chen, Jiang M. Y. C. (2025). Obstacles or opportunities: Teachers’ concerns about adopting generative AI in learning and teaching [Conference session]. 2025 IEEE International Conference on Advanced Learning Technologies (ICALT). https://doi.org/10.1109/ICALT64023.2025.00084

95. Zaiarna, Zhyhadlo, Dunaievska (2024). ChatGPT in foreign language teaching and assessment: Exploring EFL instructors’ experience. Information Technologies and Learning Tools, 102(4), 176–191. https://doi.org/10.33407/itlt.v102i4.5716

96. Zeevy-Solovey (2024). Comparing peer, ChatGPT, and teacher corrective feedback in EFL writing: Students’ perceptions and preferences. Technology in Language Teaching and Learning, 6(3), 1482. https://doi.org/10.29140/tltl.v6n3.1482

97. Zhang, Dong (2024). Unveiling the dynamic mechanisms of Generative AI in English language learning: A hybrid study based on fsQCA and system dynamics. Behavioral Sciences, 14(11), 1015. https://doi.org/10.3390/bs14111015

98. Zhang, Sun (2024). Lingua franca proficiency and cross-border mergers and acquisitions: Language matters. Finance Research Letters, 66, 105667. https://doi.org/10.1016/j.frl.2024.105667

99. Zhu, Liu O. L., Lee H.-S. (2020). The effect of automated feedback on revision behavior and learning gains in formative assessment of scientific argument writing. Computers & Education, 143, 103668. https://doi.org/10.1016/j.compedu.2019.103668

100. Zou, Guo, Wang, Liu (2025). Investigating students’ uptake of teacher- and ChatGPT-generated feedback in EFL writing: A comparison study. Computer Assisted Language Learning, 1–30. https://doi.org/10.1080/09588221.2024.2447279
