Sage Journals: Discover world-class research

Abstract

This pilot study investigates the potential of a voice-based chatbot (EnMIA) to support speaking fluency, motivation, and engagement among undergraduate Korean language learners at a single U.S. Midwestern university. The chatbot aims to provide interactive, real-world speaking tasks accessible across multiple platforms, supporting seamless learning. Data were collected over one academic semester through pre- and post-speaking assessments and surveys in a pre-post design without a control group. Speaking performance data indicated an improvement in fluency, whereas accuracy and complexity remained unchanged, suggesting that short-term practice may strengthen learners’ ability to speak more smoothly and confidently. However, without a control group, gains cannot be attributed solely to the tool. Survey results showed high perceptions of support, design, and usability, with interactive tasks correlated with motivation. The findings suggest that chatbot-supported interaction may enhance motivation and fluency, while pointing to the need for extended practice and targeted design to affect accuracy and complexity.

Keywords

voice-based technology, interactive tools, chatbot, language learning, class participation

The primary goal of second language (L2) programs in higher education is to foster students’ communicative competence and linguistic proficiency, enabling effective real-world interactions (Littlemore & Low, 2006; Schulz, 2006). This communicative emphasis, encompassing fluent meaning conveyance and interpretation, aligns with broader second language acquisition (SLA) goals that prioritize meaningful language use in diverse contexts alongside grammatical accuracy (Kessler & Bikowski, 2010). As globalized societies demand interculturally adaptable multilingual communicators (Jackson, 2020), instructors must provide ample authentic practice opportunities to build fluency and confidence. While the broader literature on L2 communicative competence applies globally, this pilot study is scoped to undergraduate Korean language learners in a U.S. university context using EnMIA over one semester.

Theoretical frameworks underscore interaction and production as key acquisition drivers: negotiating meaning in conversations refines output and comprehension (Loewen & Sato, 2018), while meaningful contexts strengthen fluency and internalization of structures (Shehadeh, 2022). Thus, curricula should prioritize real-world tasks like dialogues and role-plays (Ellis et al., 2020). Still, traditional L2 classroom settings often constrain opportunities for oral practice due to limited instructional time, large class sizes, and diverse learner proficiency levels (Blake, 2016; Zhao & Lai, 2023). Consequently, educators are increasingly turning to technology to create scalable, interactive environments that support communicative practice.

Educational technology advancements have transformed L2 pedagogy by offering less-intimidating environments that reduce anxiety and encourage risk-taking (Dickinson et al., 2008). Computer-assisted language learning (CALL) and mobile-assisted language learning (MALL) have introduced dynamic platforms, such as virtual reality (VR) applications (Nicolaidou et al., 2023), artificial intelligence (AI) dialogue systems (Zhai & Wibowo, 2023), and AI-generated courseware (Schroeder et al., 2022), which facilitate personalized and immersive learning experiences. These tools align with contemporary CALL frameworks, which advocate for technology that supports task-appropriate language use and feedback (Ziegler & González-Lloret, 2022). For instance, VR enables contextualized practice, AI simulates real-time conversation (Godwin-Jones, 2019; F. Li & Li, 2023), and MALL leverages mobile devices for flexible, autonomous learning (Kukulska-Hulme, 2018; F. Li & Li, 2023).

Among these technologies, voice-based AI chatbots have emerged as particularly promising for L2 speaking practice. Unlike text-based systems, voice-based chatbots support oral proficiency by enabling learners to engage in spoken dialogues, receive pronunciation feedback, and practice conversational turn-taking (Jeon, 2023). These capabilities address a critical gap in traditional L2 instruction, where opportunities for oral practice are often limited (Bibauw et al., 2022). Chatbots also align with motivational theories, as interactive tasks foster engagement and intrinsic motivation (Boo et al., 2015; Oh & Song, 2021). The effectiveness of chatbots, however, depends on their design, requiring authentic tasks, intuitive usability, and accessibility to prevent the cognitive overload that complex interfaces can cause (Kalyuga & Singh, 2016; Wang et al., 2025).

This study investigates the role of a voice-based chatbot in providing L2 conversation practice within an interactive learning environment. It aims to develop a framework for designing communicative tasks that leverage technology to potentially enhance speaking proficiency. Specifically, the study addresses two objectives: (1) to explore how communicative tasks suitable for an interactive learning environment can be crafted, focusing on task types such as interactive speaking and assessments, and (2) to evaluate the impact of these tasks on students’ perceived motivation, engagement, and language proficiency. By examining the voice-based chatbot's support for learning, design principles, and usability, the study contributes to the CALL and MALL literature, addressing gaps in voice-based chatbot applications and task-specific effects (Plonsky & Ziegler, 2024). The findings aim to inform L2 educators on integrating technology to foster communicative competence, preparing students for effective L2 communication in diverse, intercultural contexts.

Recent debates in CALL/MALL (e.g., Burston et al., 2024; R. Li, 2024) also highlight challenges in attributing gains to AI tools in quasi-experimental designs (e.g., pre-post without controls), where maturation or instructional effects may confound results. Moreover, while meta-analyses show moderate effects of chatbots on speaking skills (Lyu et al., 2025), voice-based systems remain underexplored in formal settings.

Literature Review

Second language acquisition (SLA) research has increasingly focused on leveraging technology to enhance learner motivation, interaction, and usability in language education. The advent of computer-assisted language learning (CALL) and mobile-assisted language learning (MALL) has introduced innovative tools, such as conversational agents or chatbots, to support scalable, interactive practice in various language learning tasks (Bibauw et al., 2022; Jeon, 2023; Song et al., 2017).

Task-Based Language Teaching

Task-based language teaching (TBLT) is a learner-centered approach emphasizing authentic, real-world tasks to foster meaningful use in second language (L2) acquisition. Tasks, designed by instructors and often negotiated with learners, engage students in activities that prioritize communication, pragmatic processing, and non-linguistic outcomes, such as problem-solving or collaboration (F. Li & Li, 2023). TBLT has gained widespread adoption in L2 education globally, valued for its ability to integrate language skills with practical application (Ellis et al., 2020). Learning outcomes in TBLT emerge from dynamic interactions among tasks, learners, and contexts, making task design critical to effective language development (Shehadeh, 2022).

TBLT's efficacy lies in its focus on tasks that mirror real-life communication, encouraging meaning negotiation and contextual language production (González-Lloret, 2022). For instance, tasks such as role-plays or dialogues prompt learners to use language purposefully, enhancing fluency and engagement. Recent studies underscore TBLT's compatibility with technology-mediated environments, where digital tools can simulate authentic scenarios and provide immediate feedback (Ziegler & González-Lloret, 2022). Voice-based chatbots could align with TBLT by offering interactive platforms for learners to practice conversational tasks, receive pronunciation corrections, and engage in meaning-focused exchanges, addressing the need for scalable oral practice (Shehadeh, 2022).

Technology-enhanced TBLT also supports sociocultural learning outcomes, fostering multiliteracies that connect individual language use with broader social contexts (Plonsky & Ziegler, 2024). Chatbots facilitate this by providing platforms for collaborative, task-based interactions, where learners can negotiate meaning with the system or peers, enhancing both linguistic and intercultural competence. However, effective task design in digital environments requires balancing authenticity with usability, ensuring tasks are engaging yet accessible (F. Li & Li, 2023). This study leverages TBLT principles to design conversational tasks, investigating their impact on motivation, engagement, and proficiency in L2 speaking practice.

Interaction in Second Language Learning

Interaction is a cornerstone of TBLT, facilitating meaningful language use and learner engagement in second language (L2) acquisition (Robinson, 2011). TBLT emphasizes three key principles in fostering interaction.

First, meaning-focused tasks are critical for promoting interaction, as they engage learners in authentic, communicative activities. Tasks such as collaborative dialogues or problem-solving scenarios encourage learners to produce language in context, enhancing fluency and comprehension (Shehadeh, 2022). For example, collaborative tasks foster negotiation of meaning, improving the quality and complexity of learners’ speech (Loewen & Sato, 2018). Instructional design should prioritize tasks that integrate authentic materials and real-world contexts to maximize meaningful interaction, supporting L2 development (González-Lloret, 2022).

Second, interaction opportunities in TBLT enable active participation, often scaffolded by technology. Loewen and Sato (2018) identify two interaction types: (1) problem-solving interactions, where learners address communication breakdowns, and (2) form-focused interactions, where tasks prompt attention to linguistic accuracy alongside meaning. Technology, such as voice-based chatbots, enhances these interactions by simulating conversational partners, offering real-time feedback, and reducing speaking anxiety, which can hinder performance (Gregersen et al., 2014).

Last, TBLT's learner-centered approach shifts instructors’ roles to facilitators, empowering learners to take active roles in task engagement. Learners collaborate, evaluate resources, and discuss outcomes, fostering autonomy and interaction (Shehadeh, 2022). Technology-mediated tasks support this by offering flexible, accessible platforms for collaborative practice (Plonsky & Ziegler, 2024). However, research on interactive chatbots in TBLT remains limited, particularly regarding how task design leverages mobile affordances to enhance interaction.

Despite TBLT's proven effectiveness, gaps persist in understanding technology's role in supporting interaction, especially for speaking proficiency. Many L2 learners face limited exposure to oral practice, and traditional settings often exacerbate anxiety (Gregersen et al., 2014). Richards (2006) highlights the need for innovative interventions, such as using emerging technology, to provide ample speaking opportunities.

Technology-Enabled Language Learning

Technology-enabled language learning has transformed language education by facilitating flexible, context-integrated learning experiences. It supports structured content delivery and assessment in formal settings and informal learning activities that extend beyond the classroom, promoting seamless learning across diverse contexts (Kukulska-Hulme, 2018).

Seamless learning integrates formal and informal L2 experiences, connecting classroom instruction with real-world practice through physical and digital environments (Burston, 2014; Wong & Looi, 2011). It leverages mobile devices to support synchronous and asynchronous activities, fostering learner communities and enabling continuous engagement (F. Li & Li, 2023). For instance, Nicolaidou et al. (2023) compared traditional classroom instruction, mobile app use in contextual settings (e.g., a zoo), and extended mobile use at home, finding that extended mobile access significantly improved vocabulary acquisition. This suggests mobile technologies extend learning opportunities, allowing learners to practice L2 skills in varied contexts without relying solely on classroom time (Kukulska-Hulme, 2018).

Despite its potential, the use of technology in L2 education faces challenges that impact effectiveness. Technical issues, such as connectivity problems or unintuitive interfaces, can disrupt learning, particularly for diverse learner populations (Plonsky & Ziegler, 2024). González-Lloret (2022) notes that poorly designed technology-mediated tasks may increase cognitive load, deterring engagement. Usability, defined as the extent to which a tool enables effective, efficient, and satisfactory use in specific contexts (Nielsen, 2012), is thus critical. Usability evaluations must consider instructional design, learner interactions, and accessibility to accommodate diverse needs (Fennelly-Atkinson et al., 2023; Kalyuga & Singh, 2016). For example, complex interfaces requiring technical expertise can hinder L2 learners, emphasizing the need for intuitive, user-friendly designs (F. Li & Li, 2023).

Effective technology tools require design principles that balance pedagogical goals with technological affordances. Seamless learning designs should incorporate personalized tasks, support transitions between learning activities, and align with sociocultural models of L2 acquisition, which emphasize integrative motivation and community engagement (Burston, 2014). For voice-based tools, usability principles are paramount to ensure learners can focus on speaking practice without technical barriers (Kalyuga & Singh, 2016). This study also addresses these needs by developing design principles for interactive chatbots, informed by usability frameworks, to support interactive, accessible L2 speaking tasks.

The evolution of technology-based L2 education has shifted from basic digital drills to sophisticated, AI-driven platforms that support communicative and personalized learning (Bui et al., 2025; Ziegler & González-Lloret, 2022). Recent CALL research emphasizes adaptive systems that adjust content to learners’ proficiency levels, enhancing engagement and efficacy (Godwin-Jones, 2019). Plonsky and Ziegler (2024) highlight that CALL's strength lies in its ability to provide interactive, feedback-rich environments, enabling learners to practice language skills in context. For instance, AI-driven tools can simulate conversational scenarios, offering opportunities for speaking and listening practice that align with SLA goals (Levy & Stockwell, 2013). These advancements position CALL as a robust framework for integrating chatbots, which can deliver real-time, adaptive interactions to support L2 development (Stockwell, 2013).

Chatbots have gained prominence in SLA for their ability to simulate human-like dialogue, offering scalable speaking practice (Bibauw et al., 2022). Voice-based chatbots, in particular, enable learners to develop oral proficiency by engaging in conversational exchanges and receiving pronunciation feedback (Jeon, 2023). For example, Oh and Song (2021) developed and tested a voice-based mobile application for Korean EFL learners, reporting improved speaking confidence and engagement in a university setting, which is similar to the context of the present EnMIA pilot. Fryer et al. (2019) found that chatbots increase engagement through immediate, personalized responses, supporting learners’ communicative needs. Bibauw et al.'s (2022) meta-analysis reported moderate to large effect sizes for chatbots on vocabulary and speaking skills, though motivational impacts vary. Shehadeh (2022) notes that chatbot efficacy depends on task design, with interactive, authentic tasks (e.g., role-plays) yielding stronger outcomes than repetitive exercises. Voice-based chatbots can address the challenge of providing oral practice in settings with limited access to instructors or native speakers (Blake, 2013), but their success hinges on aligning tasks with learners’ proficiency and goals.

Research Gap

Despite the promise of chatbots, several research gaps persist. First, while Bibauw et al. (2022) and Jeon (2023) demonstrate chatbots’ efficacy for speaking skills, studies on voice-based chatbots in formal classroom settings are limited. Second, the role of task type (e.g., interactive vs. static speaking) in driving motivation and outcomes remains underexplored (Stockwell, 2013). Third, usability challenges, such as learning curves and technical errors, are frequently reported but rarely addressed through iterative design (Fryer et al., 2019). Fourth, the interplay of accessibility, motivation, and usability in chatbot-assisted learning requires further investigation, particularly in diverse learner populations (Kukulska-Hulme, 2018).

This study addresses these gaps by examining the impact of the proposed app, EnMIA (Enhanced Mobile Interactive Application), on L2 learning, focusing on its support for motivation, design principles, usability, and task-specific effects in L2 classes. It draws on recent SLA, CALL/MALL, and usability research to hypothesize that EnMIA enhances engagement, motivation, and proficiency through interactive, accessible, and user-friendly features.

In summary, recent literature underscores the potential of voice-based chatbots to enhance L2 learning by providing motivating, accessible, and interactive practice opportunities. Contemporary SLA motivation theories highlight the importance of autonomy and engagement, which CALL and MALL platforms can operationalize (Boo et al., 2015; Ziegler & González-Lloret, 2022). Chatbots, particularly voice-based ones, offer scalable solutions for speaking practice, aligning with these principles (Bibauw et al., 2022; Jeon, 2023; Wang et al., 2025). However, their success depends on effective task design, usability, and accessibility (Kalyuga & Singh, 2016; Kukulska-Hulme, 2018). This study, as a pilot, builds on these insights to investigate EnMIA's role in L2 classes, contributing to the CALL and MALL literature by addressing gaps in task-specific effects and usability considerations.

This Study

This study investigates the efficacy of EnMIA, a voice-based chatbot designed to enhance L2 speaking proficiency, motivation, and engagement in Korean language courses in the U.S. EnMIA leverages TBLT principles, mobile technology, and usability-focused design to support L2 speaking practice, particularly fluency. These design principles are proposed based on literature and iterative development but require empirical validation through comparative studies. By simulating real-world conversations and offering immediate feedback, EnMIA aims to foster seamless learning across formal and informal settings, reducing barriers such as speaking anxiety and logistical constraints. Before reporting the study, we describe the technology design principles and task design of EnMIA (see Figures 1 and 2). This pilot is limited to Korean language courses at one U.S. institution.

Figure 1.
Screenshots from EnMIA: instructors’ activity management view (L) and assessment view (R).

Figure 2.
Screenshots from EnMIA: students’ activity view.

Language Learning Tool Development—A Voice-Based Chatbot

The development of language learning tools like voice-based chatbots requires a robust theoretical foundation to guide effective instructional design. Theory-driven design provides structured heuristics that enhance tool functionality and learner engagement (Kalyuga & Singh, 2016). This study adopts a motivation-based framework, drawing on contemporary L2 motivation research to address engagement challenges in language courses (Boo et al., 2015). Additionally, usability principles are integrated to optimize the learner experience, ensuring the chatbot supports accessible, interactive speaking practice.

Tool Design. In this study, the primary design principle for the chatbot is to facilitate goal-directed, contextually relevant language learning activities. This principle is supported by five sub-principles: (1) fostering interactive tasks that encourage conversation and collaboration, (2) enabling real-world language use, (3) supporting learner autonomy, (4) providing immediate feedback, and (5) ensuring cross-context accessibility. Boo et al. (2015) highlight that limited opportunities for authentic practice and non-interactive tasks reduce learner motivation, issues EnMIA addresses through voice-based, mobile-accessible activities. MALL technologies enable learners to engage in diverse, real-world contexts, enhancing integrative motivation and skill development (F. Li & Li, 2023).

Usability is critical to technology-supported learning environments, ensuring intuitive interfaces that minimize technical barriers (Kukulska-Hulme, 2018). Complex or unresponsive designs can disengage learners, particularly in mobile settings where flexibility is paramount (Kalyuga & Singh, 2016). Our tool's voice-based interface supports synchronous and asynchronous practice, accommodating varied schedules and reducing logistical constraints. By aligning with motivation theories and MALL principles, our design promotes active participation and seamless learning across classroom and informal settings (F. Li & Li, 2023). This study investigates these principles, aiming to establish a framework for voice-based tools that enhance L2 speaking proficiency, particularly for learners with limited access to model speakers or practice opportunities.

Task Design. Motivation is a pivotal factor in SLA, influencing learners’ persistence and engagement. Boo et al. (2015) provide a comprehensive review of L2 motivation, emphasizing the role of dynamic, context-sensitive motivational strategies in sustaining learner effort. Their framework suggests that technology-mediated tasks, which offer autonomy and relevance, enhance intrinsic motivation, aligning with the needs of diverse L2 learners. Ryan and Deci's (2016) self-determination theory further posits that tasks supporting autonomy, competence, and relatedness foster intrinsic motivation. For example, technology that allows learners to choose tasks or receive tailored feedback can enhance their sense of competence (Fryer et al., 2019). These theories underscore the potential of chatbots to deliver engaging, motivating tasks, such as interactive dialogues, that sustain learner investment in L2 practice (Richards, 2006). However, poorly designed tasks, such as those lacking clear goals, can diminish motivation, necessitating careful design considerations (Dörnyei, 2007).

Research Questions

This study aims to contribute to language teaching and learning literature by providing empirical insights into technology-mediated speaking practice and informing the development of accessible language learning tools. The goal of this research is to evaluate EnMIA's impact on language learners’ speaking outcomes and to establish design principles for voice-based language learning tools. Conducted in a university setting, the study examines how EnMIA's interactive tasks influence learners’ communicative abilities and engagement, particularly for those with limited access to model speakers. The research is guided by the following questions:
RQ 1. To what extent does EnMIA use correlate with changes in second language learners’ speaking fluency, perceived motivation, and engagement?

RQ 2. What design principles are critical for developing effective voice-based language learning tools?

RQ 3. How do learners perceive EnMIA's usability and its role in supporting speaking practice across diverse contexts?

Method

This study evaluated the EnMIA voice-based chatbot's impact on second language (L2) speaking proficiency, motivation, engagement, and usability in two undergraduate Korean language courses. The study addressed an instructor's concerns about students’ low participation and limited opportunities for oral practice in non-native environments, aiming to enhance speaking skills through interactive, technology-mediated tasks.

Participants

Participants included one primary instructor, two associate instructors, and 38 undergraduate students enrolled in four sections of two Korean language courses (two sections per course) at a Midwestern U.S. university. This single-group pre-post design, while practical for pilot evaluation, limits causal attribution. The courses, representing different proficiency levels, were selected based on the instructor's willingness to participate and the research team's availability, constituting a convenience sample at a single institution. All students had access to mobile devices, enabling EnMIA use. Students were informed about the study, and ethical consent was obtained following institutional review board approval. One student's survey response was excluded due to incomplete data, resulting in 37 valid responses for analysis.

Procedures

The study was guided by five design principles derived from L2 motivation frameworks and technology-assisted language learning principles, emphasizing interactive, contextually relevant tasks. EnMIA's development integrated rapid prototyping, aligning system design with these principles through iterative collaboration among the software development team, language instructors, and educational technology experts. The system architecture utilized cross-platform software development tools, audio processing modules, user behavior controllers, and server-side data processing engines. EnMIA was deployed as a web application for desktop/laptop access and converted to native mobile apps (iOS and Android), ensuring cross-platform accessibility.

Pilot Testing and Revision

Initial usability testing focused on suitability, learnability, and error rates, conducted early to refine the design. Walkthrough testing with an undergraduate student and feedback from instructors led to iterative refinements. EnMIA's final specifications included: (1) course registration, allowing instructors to create courses and students to enroll, (2) activity management, enabling instructors to assign speaking tasks and students to participate, and (3) review and assessment, facilitating instructor feedback on student performance.

Data Collection

Data were collected over one semester via pre- and post-speaking assessments and intervention surveys.

Learners completed a pre- and post-test speaking task. Each task consisted of a three-minute role-play conversation with prompts designed to elicit spontaneous speech. Performances were audio-recorded and rated by two trained raters on three dimensions: Fluency, Accuracy, and Complexity, each scored on a 1–5 scale. A composite score was also computed as the mean of the three subscales. Inter-rater reliability was strong (Cohen's κ = .81). Scores from both raters were averaged for analysis.
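The scoring steps described above (agreement between two raters, averaging their scores, and a composite across the three dimensions) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names and the rating lists are hypothetical, and the kappa shown is the unweighted form, which is an assumption because the text does not say whether a weighted variant was used.

```python
from collections import Counter
from statistics import mean


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    n = len(rater_a)
    # Proportion of recordings on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def composite_score(fluency, accuracy, complexity):
    """Composite as the mean of the three subscale scores."""
    return mean([fluency, accuracy, complexity])


# Hypothetical 1-5 ratings for six recordings (not data from the study).
rater_a = [3, 4, 2, 5, 3, 4]
rater_b = [3, 4, 2, 4, 3, 4]

kappa = cohens_kappa(rater_a, rater_b)
# Scores from both raters are averaged before analysis.
averaged = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
```

Because the rating scale is ordinal, a weighted kappa that gives partial credit for near-misses would be a reasonable alternative; the unweighted form is used here only to keep the sketch short.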

Surveys, administered online during the final class, used a 5-point Likert scale (1 = Strongly Disagree, 5 = Strongly Agree) to evaluate (1) Learning Support, (2) Tool Design Principles, (3) Usability, (4) Motivation, and (5) Perceived Learning Improvement. All participants consented to data collection.

Data Analysis

Paired-samples t-tests were used to compare pre- and post-test performance scores for the single EnMIA group. Effect sizes were reported using Cohen's d. Survey data, comprising Likert-scale ratings (1–5) on learning support, design principles, and usability, were analyzed using descriptive statistics to summarize learner perceptions. Independent samples t-tests compared mean ratings between courses to assess EnMIA's impact on motivation and engagement, with a significance level of α = .05. Effect sizes (Cohen's d) were calculated to quantify differences, ensuring robust interpretation of EnMIA's effectiveness. In addition, correlations were calculated between the number of tasks per type (weighted by participant counts) and mean survey ratings to explore task influence.
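As an illustration of the paired comparison, the sketch below computes a paired-samples t statistic and the paired Cohen's d (mean difference divided by the SD of the differences) using only the Python standard library. The function name and the scores are hypothetical and are not taken from the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_d(pre, post):
    """Return (t, df, d) for paired pre/post scores.

    d is the paired effect size: mean difference / SD of the differences.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD (n - 1 in the denominator)
    if sd_diff == 0:
        raise ValueError("t and d are undefined when all differences are equal")
    t_stat = mean_diff / (sd_diff / sqrt(n))
    d_paired = mean_diff / sd_diff
    return t_stat, n - 1, d_paired


# Hypothetical averaged rater scores for five learners.
pre_scores = [2.5, 3.0, 2.0, 3.5, 3.0]
post_scores = [3.0, 3.5, 3.0, 3.5, 4.0]

t_stat, df, d_paired = paired_t_and_d(pre_scores, post_scores)
```

The p value is omitted because it requires the t distribution; a statistics package such as SciPy (`scipy.stats.ttest_rel`) would normally supply it.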

Results

Participation

In the implementation, the 23 tasks that the instructor created and assigned to students using EnMIA were categorized into five types: (1) Reading: to provide opportunities to read sentences, conversations, or paragraphs; (2) Static Speaking: to provide opportunities to use the language in a situated context without interactive activities; (3) Interactive Speaking: to provide interactive opportunities to use the language in the situated context; (4) Mission: to provide opportunities to use the language in a situated context; and (5) Assessment: to assess and evaluate students’ knowledge, skills, and performance in the language.

The total number of activities was five for Course 1 and seven for Course 2, and the total number of tasks was 13 for Course 1 and 19 for Course 2. There were 3 graded activities (9 tasks) in Course 1 and 4 (14 tasks) in Course 2; there were 2 non-graded activities (4 tasks) in Course 1 and 3 (5 tasks) in Course 2. Participation was high on graded activities (100%) and intermediate on non-graded activities (90.0% for Course 1 and 71.8% for Course 2).

Students participated in the activities described above, and the instructor assessed how much they learned through the tool. Participation in the graded tasks was 100%, as expected. The instructor judged that most instructional goals had been accomplished, with the exception of the Mission task type, which had a 23.0% participation rate. There was one Mission activity with 2 tasks (e.g., “Find a native speaker of the language and have a chat with them for 2 min about their traditional food.”).

Speaking Performance

Participants in the EnMIA condition (n = 37) completed a 3-min oral task at the beginning (pre) and end (post) of the semester. Each recording was rated by two independent raters on Fluency, Accuracy, and Complexity (1–5 scale). Inter-rater agreement was high (Cohen's κ = .81). A composite score (mean of the three) was also computed. Paired-samples t-tests examined changes from pre to post. Results are summarized in Table 1.

Table 1.
Pre- and Post-Test Scores for Speaking Proficiency (n = 37).

| Measure | Pre, M (SD) | Post, M (SD) | Mean change | t (df = 36) | p | Cohen's d (paired) |
|---|---|---|---|---|---|---|
| Fluency | 2.84 (.62) | 3.42 (.66) | .58 | 4.21 | <.001 | .69 |
| Accuracy | 2.91 (.55) | 2.98 (.57) | .07 | .82 | .42 | .13 |
| Complexity | 2.76 (.58) | 2.83 (.59) | .07 | .89 | .38 | .15 |
| Total | 2.84 (.56) | 3.08 (.58) | .24 | 1.65 | .11 | .27 |

Note: Paired-samples t-tests, df = 36, Cohen's d = mean difference/SD of difference.

Students showed a statistically significant gain in fluency (p < .001, medium effect size, d = .69), but not in accuracy, complexity, or composite scores. This suggests that short-term practice with EnMIA may have primarily enhanced learners’ ability to speak more smoothly and confidently, while structural accuracy and syntactic complexity did not change significantly over a single semester.
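The effect sizes in Table 1 can be cross-checked against the reported t values. For a paired design, t = mean difference / (SD of difference / √n), so the paired d defined in the table note equals t / √n. The short sketch below applies this identity to the reported values with n = 37; the variable and function names are illustrative only.

```python
from math import sqrt

N = 37  # number of learners with paired pre/post scores (Table 1)

# measure -> (reported t, reported d), copied from Table 1
reported = {
    "Fluency": (4.21, 0.69),
    "Accuracy": (0.82, 0.13),
    "Complexity": (0.89, 0.15),
    "Total": (1.65, 0.27),
}


def d_from_t(t_value, n):
    """Paired Cohen's d recovered from a paired-samples t statistic."""
    return t_value / sqrt(n)


recomputed = {
    measure: round(d_from_t(t_value, N), 2)
    for measure, (t_value, _) in reported.items()
}

for measure, (_, d_reported) in reported.items():
    print(f"{measure}: reported d = {d_reported:.2f}, t/sqrt(n) = {recomputed[measure]:.2f}")
```

Under this identity, all four recomputed values match the reported effect sizes to two decimal places.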

Students’ Perceived Levels

The evaluation questions for the perceived aspects (i.e., learning support, design principles, and usability) are shown in Table 2.

Table 2.
Survey Questions.

| Category | # | Description |
|---|---|---|
| Learning Support | L1 | EnMIA makes my course activities easier and quicker than they would be without the system |
| Learning Support | L2 | I practiced more effectively for my language learning with EnMIA than I would have without it |
| Learning Support | L3 | EnMIA assisted my overall learning process for this course |
| Learning Support | L4 | Using EnMIA increased my motivation to participate in course activities |
| Learning Support | L5 | Using EnMIA increased my motivation for this course in general |
| Learning Support | L6 | Using EnMIA increased my motivation to learn the language |
| Learning Support | L7 | Using EnMIA increased my enjoyment of the course |
| Learning Support | L8 | Using EnMIA increased my learning performance in this course |
| Learning Support | L9 | I felt connected to the course and its overall learning process at all times |
| Learning Support | L10 | I intend to continue using EnMIA in the future |
| Design Principles | D1 | I like that EnMIA supports voice-based activities |
| Design Principles | D2 | I like EnMIA's planned interactive tasks |
| Design Principles | D3 | I like that EnMIA supports activities that can be done both inside and outside the classroom |
| Design Principles | D4 | I like that EnMIA supports most computing devices (laptops, iPhones, iPads, Android) |
| Design Principles | D5 | I like that EnMIA can be used as an assessment tool |
| Usability | U1 | The terms and functions on the system are easy to understand |
| Usability | U2 | The EnMIA procedure is clear and simple to me |
| Usability | U3 | I felt that EnMIA was an easy tool to use |
| Usability | U4 | I did not need to learn a lot about EnMIA before I could effectively use it |
| Usability | U5 | Once I learned to use a function of EnMIA, I can accomplish tasks using that function quickly |
| Usability | U6 | After learning to use part of EnMIA, I easily learned to use another part |
| Usability | U7 | I had a pleasant experience with EnMIA |
| Usability | U8 | I was satisfied with the way EnMIA supported language learning activities |
| Usability | U9 | Overall, I am satisfied with EnMIA |
| Usability | U10 | I did not encounter errors when using EnMIA |
| Usability | U11 | The interface assisted me when I made an error |

Note: U4 and U10 were originally negatively worded items. For analysis purposes, their wording and scoring were reversed.

Descriptive Statistics

As shown in Table 3, mean ratings and standard deviations (SD) were calculated for each survey category and question, aggregated by course. The overall mean ratings across categories were: (1) Learning Support: Course 1: 4.28 (SD = .81) and Course 2: 4.33 (SD = .79), (2) Design Principles: Course 1: 4.47 (SD = .75) and Course 2: 4.28 (SD = .82), and (3) Usability: Course 1: 4.10 (SD = .95) and Course 2: 4.05 (SD = .99).

Table 3.
Summary of Mean Ratings by Category and Course.

Category	Course 1 Mean (SD)	Course 2 Mean (SD)
Learning Support	4.28 (.81)	4.33 (.79)
Design Principles	4.47 (.75)	4.28 (.82)
Usability	4.10 (.95)	4.05 (.99)

The following are some notable descriptive statistics, which can be seen in Figures 3–5.
Learning Support: Both courses rated EnMIA highly, with means above 4.0 for most questions. Course 1's highest-rated item was L2 (“I practiced more effectively for my language learning with EnMIA than I would have without it,” M = 4.67, SD = .56), reflecting enhanced practice quality. Course 2's highest was L4 (“Using EnMIA increased my motivation to participate in course activities,” M = 4.69, SD = .48), suggesting strong engagement. The lowest-rated item was L7 (“Using EnMIA increased my enjoyment of the course”) in both courses (Course 1: M = 3.71, SD = .95; Course 2: M = 3.62, SD = 1.12), indicating moderate enjoyment relative to other benefits.

Design Principles: Students valued EnMIA's design, particularly D4 (“I like that EnMIA supports most computing devices”), with Course 1 at 4.58 (SD = .58) and Course 2 at 4.69 (SD = .48). Course 1 rated D1 (“I like that EnMIA supports voice-based activities”) higher (M = 4.54, SD = .66) than Course 2 (M = 4.38, SD = .77), possibly because Course 1 included more Static Speaking tasks. The lowest-rated item was D5 (“I like that EnMIA can be used as an assessment tool”) in Course 2 (M = 4.15, SD = .90).

Usability: Usability ratings were positive but more variable. U8 (“I was satisfied with the way EnMIA supported language learning activities”) scored highest (Course 1: M = 4.58, SD = .58; Course 2: M = 4.54, SD = .66). U4 (“I did not need to learn a lot about EnMIA before I could effectively use it”) was the lowest (Course 1: M = 3.38, SD = 1.24; Course 2: M = 3.54, SD = 1.13), indicating a perceived learning curve. U10 (“I did not encounter errors when using EnMIA”) also scored lower (Course 1: M = 3.58, SD = 1.44; Course 2: M = 3.92, SD = 1.32), suggesting occasional errors.

Figure 3.
Learning support ratings by question.

Figure 4.
Design principles ratings by question.

Figure 5.
Usability ratings by question.

Comparisons between Courses

Independent-samples t-tests were conducted to compare ratings between courses for each question, using a Bonferroni-corrected alpha of .002 (.05/26) to account for multiple comparisons. No significant differences were found (all p > .002), indicating similar perceptions across courses. Three descriptive differences, none of which approached significance, are noted below.

First, U4 (“I did not need to learn a lot about EnMIA before I could effectively use it”): Course 1 (M = 3.38, SD = 1.24) vs. Course 2 (M = 3.54, SD = 1.13), t(35) = −.39, p = .70. Descriptively, Course 1 students reported a slightly steeper learning curve, possibly due to diverse task types, although the difference was not significant.

Second, L2 (“I practiced more effectively for my language learning with EnMIA than I would have without it”): Course 1 (M = 4.67, SD = .56) vs. Course 2 (M = 4.46, SD = .52), t(35) = 1.12, p = .27. Course 1 students rated EnMIA's contribution to practice slightly higher, but the difference was not significant.

Last, U10 (“I did not encounter errors when using EnMIA”): Course 1 (M = 3.58, SD = 1.44) vs. Course 2 (M = 3.92, SD = 1.32), t(35) = −.71, p = .48. Course 1 students' lower ratings point to somewhat more reported errors, though again the difference was not significant.
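The t statistics for these three items can be reproduced from the reported means and SDs with a pooled-variance (equal variances assumed) independent-samples formula. The following Python sketch is a cross-check under stated assumptions rather than the authors' analysis script: the group sizes of 24 and 13 are not given in this section and are inferred from df = 35 together with the 13 Course 2 participants mentioned in the task-type results, and the helper name `pooled_t` is illustrative.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed),
    computed from summary statistics. Returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# ASSUMPTION: group sizes are inferred (24 + 13 - 2 = 35), not reported here.
N_COURSE1, N_COURSE2 = 24, 13

# Bonferroni-corrected alpha for 26 item-level comparisons
alpha = 0.05 / 26

# (Course 1 M, Course 1 SD, Course 2 M, Course 2 SD) as reported above
items = {
    "U4": (3.38, 1.24, 3.54, 1.13),
    "L2": (4.67, 0.56, 4.46, 0.52),
    "U10": (3.58, 1.44, 3.92, 1.32),
}

for name, (m1, sd1, m2, sd2) in items.items():
    t, df = pooled_t(m1, sd1, N_COURSE1, m2, sd2, N_COURSE2)
    print(f"{name}: t({df}) = {t:.2f}")

print(f"Bonferroni alpha = {alpha:.4f}")
```

Under these assumed group sizes, the computed statistics round to the reported values (−.39, 1.12, and −.71), and .05/26 rounds to the stated alpha of .002.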

Task Types

Correlation

Pearson correlations were calculated between the number of tasks per type (weighted by participant counts) and mean survey ratings to explore task influence. Here are the key findings.
Interactive Speaking: In Course 1 (5 tasks, 23–25 participants), Interactive Speaking correlated strongly with L4 (“Using EnMIA increased my motivation to participate in course activities,” r = .62, p < .05) and U8 (“I was satisfied with the way EnMIA supported language learning activities,” r = .58, p < .05). In Course 2 (8 tasks, 13 participants), the correlation with L4 (“Using EnMIA increased my motivation to participate in course activities”) was stronger (r = .71, p < .01), suggesting a potential association between Interactive Speaking tasks and higher motivation ratings, though based on small subsamples and requiring caution.

Assessment: Course 1 (5 tasks) and Course 2 (9 tasks) showed a positive correlation with D5 (“I like that EnMIA can be used as an assessment tool,” r = .65, p < .05), indicating students valued EnMIA's assessment capabilities in courses with more assessment tasks.

Static Speaking and Reading: No significant correlations were found, likely due to fewer tasks (Course 1: 5 Static Speaking, 2 Reading; Course 2: 2 Static Speaking, 1 Reading).

Mission: In Course 2 (2 tasks, 3 participants), Mission tasks negatively correlated with U10 (“I did not encounter errors when using EnMIA,” r = −.55, p < .05), suggesting potential errors in less-structured tasks.
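For readers unfamiliar with the statistic, the Pearson coefficient used in these analyses can be computed as in the minimal Python sketch below. The underlying task counts and ratings are not reported in this section, so the two lists are hypothetical placeholder values chosen only to make the example runnable; they are not data from the study, and the weighting by participant counts described above is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# HYPOTHETICAL placeholder values for illustration only (not study data)
tasks_completed = [1, 2, 3, 4, 5]
item_rating = [2, 4, 5, 4, 5]

print(f"r = {pearson_r(tasks_completed, item_rating):.2f}")
```

With samples as small as those reported here, a single respondent can shift r substantially, which is one reason the correlations above are best read as exploratory.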

Usability

An unexpected finding was the low rating for U4 (“I did not need to learn a lot about EnMIA before I could effectively use it”) in both courses (Course 1: M = 3.38; Course 2: M = 3.54), indicating a moderate learning curve. This was more pronounced in Course 1, likely due to its diverse task types (i.e., Reading, Static Speaking, Interactive Speaking, Assessment) as shown in Figure 6, which may have required more effort to master EnMIA's features. This contrasts with high ratings for U3 (“I felt that EnMIA was an easy tool to use,” Course 1: M = 4.04; Course 2: M = 4.08), suggesting that while EnMIA was user-friendly overall, initial learning posed a challenge, particularly for Course 1 students.

Figure 6.
Task type distribution by course.

Motivation

Course 2 students reported slightly higher motivation (L4, “Using EnMIA increased my motivation to participate in course activities,” M = 4.69) and connection to the course (L9, “I felt connected to the course and its overall learning process at all times,” M = 4.38) than Course 1 students, possibly because of the higher proportion of Interactive Speaking tasks (8 vs. 5 in Course 1), which correlated strongly with engagement (r = .71, p < .01).

Design Principles

Design principles were rated highly, particularly device compatibility (D4, “I like that EnMIA supports most computing devices”), though Course 1 students valued voice-based activities (D1, “I like that EnMIA supports voice-based activities”) more, possibly due to more Static Speaking tasks. In addition, the correlation between Assessment tasks and D5 (“I like that EnMIA can be used as an assessment tool,” r = .65, p < .05) underscores EnMIA's value as an assessment tool. The task distribution (see Figure 6) highlights Course 2's emphasis on Interactive Speaking and Assessment.

Discussion

The present study illustrates one way that a voice-based activity tool can be employed to support second language (L2) teaching and learning. L2 learners juggle a variety of commitments and responsibilities, such as acquiring and practicing language content and skills, developing fluency, and, most importantly, passing their courses. Prior L2 research has sought to understand these demands and to ground support for language learning in theory. To encourage learners’ motivation, a voice-based language learning activity tool was developed and implemented in this study, with the aim of promoting active participation in language learning activities both in class and outside the classroom. The study focused on how such a tool can be harnessed to provide theory-based support to L2 learners; a further purpose was to contribute new knowledge on both design methods and theory. The findings provide preliminary evidence that EnMIA shows promise as a supplementary tool for L2 teaching and learning, as reflected in the mean ratings across learning support, design principles, and usability. These results align with prior research on conversational agents in language education (e.g., Bibauw et al., 2022; Jeon, 2023) and have the following implications.

First, the performance findings refine the perception data. While learners reported feeling motivated and supported, measurable gains were observed only in fluency. This pattern suggests that EnMIA's voice-based interactive tasks may promote spontaneous production and conversational ease, consistent with interactionist SLA theories emphasizing practice in real-time speech (Long, 1983). However, accuracy and complexity, which are often linked to longer-term instruction and feedback cycles, may require extended use or more explicit form-focused tasks to improve. Thus, EnMIA's immediate value may lie in boosting fluency and lowering barriers to oral participation, with accuracy and complexity gains potentially emerging in longitudinal applications. Still, these fluency-focused findings also touch on an ongoing debate in educational technology: pre-post designs often overestimate intervention effects because of confounding factors (e.g., semester-long instruction). Recent studies on voice-AI chatbots report similar fluency gains but stress the need for controls to isolate tool impacts (e.g., Lyu et al., 2025).

Second, EnMIA's high ratings for items of learning effectiveness and motivation reflect its possible effectiveness in fostering skill development and engagement, aligning with second language acquisition (SLA) principles articulated by Krashen's (1982) input hypothesis and Long's (1981, 1983) interaction hypothesis. These theories emphasize comprehensible input and meaningful output, which EnMIA facilitates through its voice-based, interactive tasks. This study extends these frameworks by showing how technology-mediated interaction can replicate communicative dynamics traditionally found in face-to-face settings.

Third, the significant correlation between interactive speaking tasks and perceived motivation highlights the role of task design in sustaining engagement. The course with more interactive speaking tasks reported higher motivation. This aligns with Dörnyei's (2001) framework, which emphasizes authentic, interactive tasks for intrinsic motivation. Interactive speaking tasks, which involve real-time dialogues, are consistent with VanPatten's (2017) focus on meaningful communication for proficiency. The higher motivation in Course 2 may also stem from graded tasks, which introduce performance incentives in line with Ryan and Deci's (2016) self-determination theory. In addition, the correlation with satisfaction with EnMIA's support suggests that interactive speaking tasks may boost both motivation and perceived efficacy, as noted by Stockwell (2013). Still, correlations with task types are exploratory and limited by uneven participation, and self-reports may reflect novelty effects.

Fourth, the high ratings for design principles, particularly support for most computing devices, underscore the role of accessibility; Kukulska-Hulme (2018) notes that device compatibility reduces barriers in mobile learning environments. In addition, the higher rating of the voice-based feature in Course 1 may reflect its larger number of static speaking tasks, in line with Chapelle's (1998) task-specific computer-supported language learning principles. The correlation between assessment tasks and the design-principle item on assessment indicates that students saw value in formative feedback (Godwin-Jones, 2019). EnMIA's dual role as a practice and assessment tool could support integrated learning (Levy & Stockwell, 2013). These contributions to design principles are descriptive; future work should test alternatives empirically.

Fifth, despite overall positive usability ratings, lower scores for the items on prior learning needed (U4) and tool errors (U10) suggest a moderate learning curve and potential technical issues. This contrasts with the high ratings for ease of use (U3) and relates to Nielsen's (2012) usability heuristics on minimizing learning effort. Course 1's lower scores may reflect its diverse task types, which may increase cognitive load (Sweller, 2011). The negative correlation between the tool-error item and Mission tasks suggests errors arising from unclear instructions or glitches, which are common in chatbots (Fryer et al., 2019).

Last, the overall findings regarding design principles and their relationship with task type suggest that EnMIA's interactive and voice-based features might enhance motivation and learning outcomes, with task type and frequency shaping student experiences. Future improvements could focus on reducing the initial learning curve and error rates to optimize usability.

Limitations and Further Research

Several limitations should be noted. First, speaking assessments showed significant fluency gains but no changes in accuracy or complexity, which may reflect the short timeframe and the tool's focus on interactive rather than form-focused practice. The speaking task was brief and rubric-based, and the coarse 1–5 scale may lack the sensitivity needed to detect changes in accuracy and complexity, leaving open questions about longer-term or authentic communicative gains. Second, learners’ motivation and engagement were evaluated through questionnaires. Questionnaires have been criticized as static instruments that capture only perceived levels rather than more transient respondent characteristics, and self-reports may differ from actual behavior. Third, this study was conducted in specific second language courses at a single university. Course settings vary across contextual aspects such as institution size, geographic location, and educational level (K-12 versus higher education). Although there is little reason to think the participants are highly exceptional, the small sample size makes it impossible to know how, and to what extent, the results generalize to other second language courses; the findings are therefore specific to this university Korean language context and are not intended for broad generalization without replication. Fourth, because the study was part of the tool's design and development process, a strict comparison could not be conducted. The absence of a control group is a major limitation: fluency gains cannot be attributed solely to EnMIA, given possible confounds such as regular coursework and maturation. Last, the lack of significant differences between courses may itself reflect the small sample size and the similar task designs across courses.

Future studies can be designed to address the issues and overcome the limitations described above. The results of the present study also suggest several areas for future research. Since this study did not show the direct effectiveness of the tool or real enhancement of students’ performance, further research is needed. Future research efforts can employ experimental designs (e.g., EnMIA group with a control group) to gather information about the effectiveness of the tool in terms of attitude, integrativeness, motivation, and learning performance (i.e., through L2 performance comparison). The total complex of integrativeness, attitudes toward the learning situation, and motivation can be seen as integrative motivation (Masgoret & Gardner, 2003), which is referred to as “a complex of attitudinal, goal-directed, and motivational attributes” (Gardner, 2001, p. 9). More empirical studies are needed to extend the findings of this research to other settings, courses, and subjects. Further, future research could use qualitative methods to explore interactive speaking tasks (Dörnyei, 2007), longitudinal studies for proficiency gains (Jeon, 2023), extensive usability testing (Nielsen, 2012), comparative analyses (Bibauw et al., 2022), individual differences (Benson, 2011), and technology integration (Godwin-Jones, 2019).

Conclusion

This pilot and feasibility study examined the design and implementation of EnMIA, a voice-based chatbot for second language learners. Results showed high student satisfaction and motivation, with interactive tasks strongly linked to engagement. Importantly, pre- and post-speaking assessments revealed preliminary fluency gains, potentially indicating that chatbot-supported practice can improve oral performance in the short term. However, accuracy and complexity did not change, suggesting that while EnMIA may lower barriers to speaking, targeted design and extended use are required for more comprehensive proficiency development. As Gardner (2001) argued, language learning theories do not directly ensure success in second language acquisition but indirectly support learners and instructors by exploring relationships among key variables. Thus, we argue that instructional designers, educational technologists, instructors, and learners should collaborate to optimize the use of voice-based technologies for learning. In conclusion, this feasibility study illustrates the potential of voice-based tools such as EnMIA to supplement fluency practice and warrants rigorous controlled research.

Studies in Humans and Animals

The Institutional Review Board approved this research as exempt.

Footnotes

ORCID iDs

Eun Young Oh

Donggil Song

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability

Data sharing is not applicable to this article.

Author Biographies

Eun Young Oh is a Lecturer in Transnational Asian Studies in the School of Humanities at Rice University, USA. She earned her doctorate in Educational Technology from Seoul National University, South Korea. Her research focuses on computer-supported language learning, leveraging virtual reality and speech recognition to enhance second-language acquisition.

Jaesu Choi is a Lecturer of Transnational Asian Studies in the School of Humanities at Rice University, USA. He received his PhD in Second Language Studies from Indiana University, Bloomington, USA. His research interests include instructional design, with a focus on L2 learning mechanisms and task-based pedagogy.

Donggil Song is an Associate Professor of Engineering Technology and Industrial Distribution, College of Engineering, Texas A&M University, USA. His research lab, Einbrain Lab (www.einbrain.com), focuses on human-AI collaboration, developing AI-driven Extended Reality (XR) systems to enhance human performance and learning.

References

Benson (2011). Teaching and researching autonomy in language learning (2nd ed.). Routledge.

Bibauw, Noortgate, V. d., & Desmet (2022). Dialogue systems for language learning: A meta-analysis. Language Learning & Technology, 26(1), 1–24. https://doi.org/10.64152/10125/73488

Blake, R. J. (2013). Brave new digital classroom: Technology and foreign language learning (2nd ed.). Georgetown University Press.

Blake, R. J. (2016). Technology and the four skills. Language Learning & Technology, 20(2), 129–142. https://doi.org/10.64152/10125/44465

Boo, Dörnyei, & Ryan (2015). L2 motivation research 2005–2014: Understanding a publication surge and a changing landscape. System, 55, 145–157. https://doi.org/10.1016/j.system.2015.10.006

Bui, Collier, Ozturk, Y. E., & Song (2025). The effects of conversational agents on human learning and how we used them: A systematic review of studies conducted before generative AI. TechTrends, 69(3), 628–644. https://doi.org/10.1007/s11528-025-01066-0

Burston (2014). MALL: The pedagogical challenges. Computer Assisted Language Learning, 27(4), 344–357. https://doi.org/10.1080/09588221.2014.914539

Burston, Athanasiou, & Giannakou (2024). Quantitative experimental L2 acquisition MALL studies: A critical evaluation of research quality. ReCALL, 36(1), 22–39. https://doi.org/10.1017/S0958344023000149

Chapelle, C. A. (1998). Analysis of interaction sequences in computer-assisted language learning. TESOL Quarterly, 32(4), 753–757. https://doi.org/10.2307/3588009

Dickinson, Eom, Kang, Lee, C. M., & Sachs (2008). A balancing act: How can intelligent computer-generated feedback be provided in learner-to-learner interactions? Computer Assisted Language Learning, 21(5), 369–382. https://doi.org/10.1080/09588220802343702

Dörnyei (2001). Teaching and researching: motivation. Longman.

Dörnyei (2007). Research methods in applied linguistics: Quantitative, qualitative, and mixed methodologies. Oxford University Press.

Ellis, Skehan, Shintani, & Lambert (2020). Task-based language teaching: Theory and practice. Cambridge University Press. https://doi.org/10.1017/9781108643689

Fennelly-Atkinson, LaPrairie, K. N., & Song (2023). Identifying accessibility factors affecting learner inclusion in online university programs. Distance Education, 43(4), 556–573. https://doi.org/10.1080/01587919.2022.2141607

Fryer, L. K., Nakao, & Thompson (2019). Chatbot learning partners: Connecting learning experiences, interest and competence. Computers in Human Behavior, 93, 279–289. https://doi.org/10.1016/j.chb.2018.12.023

Gardner, R. C. (2001). Language learning motivation: The student, the teacher, and the researcher. Texas Papers in Foreign Language Education, 6(1), 1–18. https://eric.ed.gov/?id=ED464495

Godwin-Jones (2019). In a world of smart technology, why learn another language? Educational Technology & Society, 22(2), 1–14. https://www.jstor.org/stable/26819613

González-Lloret (2022). Technology-mediated tasks in the young learner classroom. In Ziegler & González-Lloret (Eds.), The Routledge handbook of second language acquisition and technology (pp. 171–185). Routledge. https://doi.org/10.4324/9781351117586-13

Gregersen, MacIntyre, P. D., & Meza (2014). The motion of emotion: Idiodynamic case studies of learners’ foreign language anxiety. The Modern Language Journal, 98(2), 574–588. https://doi.org/10.1111/modl.12084

Jackson (2020). Introducing language and intercultural communication (2nd ed.). Routledge. https://doi.org/10.4324/9781351059275

Jeon (2023). Exploring conversational AI affordances in the EFL classroom: A focus on L2 speaking practice. Computer Assisted Language Learning, 36(4), 1–26. https://doi.org/10.1080/09588221.2023.2190266

Kalyuga & Singh, A.-M. (2016). Rethinking the boundaries of cognitive load theory in complex learning. Educational Psychology Review, 28(4), 831–852. https://doi.org/10.1007/s10648-015-9352-0

Kessler & Bikowski (2010). Developing collaborative autonomous learning abilities in computer mediated language learning: Attention to meaning-making among students in wiki space. Computer Assisted Language Learning, 23(1), 41–58. https://doi.org/10.1080/09588220903467335

Krashen (1982). Principles and practice in second language acquisition. Pergamon Press.

Kukulska-Hulme (2018). Mobile-assisted language learning. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 1–9). Wiley-Blackwell.

Levy & Stockwell (2013). CALL dimensions: Options and issues in computer-assisted language learning. Routledge.

(2023). The Routledge handbook of second language acquisition and technology. Frontiers in Psychology, 14, 1224826. https://doi.org/10.3389/fpsyg.2023.1224826

(2024). Effects of mobile-assisted language learning on foreign language learners’ speaking skill development. Language Learning & Technology, 28(1), 1–26. https://eric.ed.gov/?id=EJ1422607

Littlemore & Low (2006). Metaphoric competence, second language learning, and communicative language ability. Applied Linguistics, 27(2), 268–294. https://doi.org/10.1093/applin/aml004

Loewen & Sato (2018). Interaction and instructed second language acquisition. Language Teaching, 51(3), 285–329. https://doi.org/10.1017/S0261444818000125

Long, M. H. (1981). Input, interaction, and second-language acquisition. Annals of the New York Academy of Sciences, 379(1), 259–278. https://doi.org/10.1111/j.1749-6632.1981.tb42014.x

Long, M. H. (1983). Native speaker/non-native speaker conversation and the negotiation of comprehensible input. Applied Linguistics, 4(2), 126–141. https://doi.org/10.1093/applin/4.2.126

Lyu, Lai, & Guo (2025). Effectiveness of chatbots in improving language learning: A meta-analysis of comparative studies. International Journal of Applied Linguistics, 35(2), 834–851. https://doi.org/10.1111/ijal.12668

Masgoret, A. M., & Gardner, R. C. (2003). Attitudes, motivation, and second language learning: A meta-analysis of studies conducted by Gardner and associates. Language Learning, 53(1), 123–163. https://doi.org/10.1111/1467-9922.00212

Nicolaidou, Pissas, & Boglou (2023). Comparing immersive virtual reality to mobile applications in foreign language learning in higher education: A quasi-experiment. Interactive Learning Environments, 31(6), 3791–3805. https://doi.org/10.1080/10494820.2021.1928237

Nielsen (2012). Usability 101: Introduction to usability. Nielsen Norman Group. https://www.nngroup.com/articles/usability-101-introduction-to-usability/

Song (2021). Developmental research on an interactive application for language speaking practice using speech recognition technology. Educational Technology Research and Development, 69(2), 861–884. https://doi.org/10.1007/s11423-020-09910-1

Plonsky & Ziegler (2024). Advancing quantitative methods in second language research. Studies in Second Language Acquisition, 46(2), 301–325. https://doi.org/10.1017/S0272263123000547

Richards, J. C. (2006). Communicative language teaching today. Cambridge University Press.

Robinson (2011). Task-based language learning: A review of issues. Language Learning, 61(s1), 1–36. https://doi.org/10.1111/j.1467-9922.2011.00641.x

Ryan, R. M., & Deci, E. L. (2016). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68–78. https://doi.org/10.1037/0003-066X.55.1.68

Schroeder, K. T., Hubertz, Van Campenhout, & Johnson, B. G. (2022). Teaching and learning with AI-generated courseware: Lessons from the classroom. Online Learning, 26(3), 73–86. https://doi.org/10.24059/olj.v26i3.3370

Schulz, R. A. (2006). Reevaluating communicative competence as a major goal in postsecondary language requirement courses. Modern Language Journal, 90(2), 252–255. https://doi.org/10.1111/j.1540-4781.2006.00395.x

Shehadeh (2022). Task-based language teaching and learning: Theories and applications. In Ziegler & González-Lloret (Eds.), The Routledge handbook of second language acquisition and technology (pp. 156–170). Routledge. https://doi.org/10.4324/9781351117586-12

Song & Glazewski (2017). Student-generated questioning activity in second language courses using a customized personal response system: A case study. Educational Technology Research and Development, 65(6), 1425–1449. https://doi.org/10.1007/s11423-017-9520-7

Stockwell (2013). Technology and motivation in English-language teaching and learning. In Ushioda (Ed.), International perspectives on motivation (pp. 156–175). Palgrave Macmillan.

Sweller (2011). Cognitive load theory. In J. P. Mestre & B. H. Ross (Eds.), The psychology of learning and motivation (Vol. 55, pp. 37–76). Academic Press. https://doi.org/10.1016/B978-0-12-387691-1.00002-8

VanPatten (2017). Processing instruction. In Loewen & Sato (Eds.), The Routledge handbook of second language acquisition (pp. 166–180). Routledge. https://doi.org/10.4324/9781315676968-10

Wang, Cheung, A. C., Neitzel, A. J., & Chai, C. S. (2025). Does chatting with chatbots improve language learning performance? A meta-analysis of chatbot-assisted language learning. Review of Educational Research, 95(4), 623–660. https://doi.org/10.3102/00346543241255621

Wong, L. H., & Looi, C. K. (2011). What seams do we remove in mobile-assisted seamless learning? A critical review of the literature. Computers & Education, 57(4), 2364–2381. https://doi.org/10.1016/j.compedu.2011.06.007

Zhai & Wibowo (2023). A systematic review on artificial intelligence dialogue systems for enhancing English as foreign language students’ interactional competence in the university. Computers and Education: Artificial Intelligence, 4, 100134. https://doi.org/10.1080/10494820.2023.2192796

Zhao & Lai (2023). Technology and second language learning: Promises and problems. In Technology-mediated learning environments for young English learners (pp. 167–206). Routledge.

Ziegler & González-Lloret (2022). The Routledge handbook of second language acquisition and technology. Routledge. https://doi.org/10.4324/9781351117586

Designing and Evaluating a Voice-Based Chatbot System for Second Language Speaking Fluency and Learner Engagement
