Abstract
Purpose
This study investigates the impact of large language models (LLMs), for example, ChatGPT, in inclusive education.
Design/Approach/Methods
We develop a ChatGPT-assisted writing system, namely CHATTING, to support students with dyslexia in learning to write. Then, we investigate the impact of CHATTING on students’ writing performance (vocabulary size) and their engagement in learning. The study involved 101 students who were divided into an experimental group (learn with CHATTING) and a control group (learn in traditional classes). The effectiveness of CHATTING in assisting students’ learning is evaluated by measuring students’ learning engagement with open-ended interviews.
Findings
The results indicate that, with the assistance of CHATTING, students with dyslexia demonstrate positive improvement in behavioral, emotional engagement, and intrinsic motivation, but a minor improvement in cognitive engagement. The study also discusses several opportunities, challenges, and insights for utilizing ChatGPT-assisted learning to enhance students’ learning motivation and provide personalized support for students with dyslexia.
Originality/Value
This work contributes to inclusive education by developing and evaluating the effectiveness of CHATTING, a ChatGPT-assisted writing system specifically designed for students with dyslexia. The limitations of using generative models in supporting students’ learning in writing are also highlighted.
Keywords
Introduction
Many students, especially those with dyslexia, struggle to write compositions (Carter & Sellman, 2013). As a result, they experience despondency and inadequacy due to deficient writing skills and difficulty constructing written works (Moojen et al., 2020). Teachers have endeavored to help students by employing diverse pedagogical techniques and technology, including pre-writing, lexical expansion, mind-mapping, and computer-aided tools (Xie, 2021). However, a viable solution to foster the writing skills and learning engagement of students with dyslexia supported by the self-determination theory (SDT) is not yet available (Wehmeyer & Shogren, 2020).
Traditional modes of education
Traditional modes of education could not meet the diverse needs of students (Benmarrakchi et al., 2017; Fung, Lee, Hui et al., 2023), especially those with special learning disabilities (SpLD), such as dyslexia (Fung, Sin et al., 2022; Fung, Tang et al., 2024) and autism (Fung, Fung, Lui, Pang et al., 2024). For example, in most schools in Hong Kong, a single instructor teaches a class of thirty students within a 45-min timeframe, making it impractical to provide individualized guidance (Fung, Perrault et al., 2022). Without immediate and adequate support, students are less motivated to learn and even give up studying (Fung, Fung, Lui, Mow et al., 2025; Schleicher, 2018).
Learning motivation
Learning motivation is crucial as it drives students to learn actively (Fung, Fung, Lui, Sin et al., 2025; Yan & Yang, 2021). The advantage of intrinsic motivation lies in its accessibility and portability. When one's drive to succeed is rooted in one's values, ethics, aspirations, and objectives, the motivators are instantly available and not contingent upon the availability or assistance of external entities (Bandhu et al., 2024). However, many students with low intrinsic motivation are discouraged due to learning inequalities, lack of instant responses, inability to express their needs, or unwillingness to disclose their needs in front of others (Gwernan-Jones & Burden, 2010). To this end, technology can assist students’ learning and solve the negative impact of resource limitations on students’ learning.
Revolution of writing education with artificial intelligence (AI)
Technology has revolutionized education, and AI has emerged as a powerful tool to assist students in learning to write. For example, Chen et al. (2013) developed GRASP, a device that helped English language learners improve their use of verbs and prepositions. Due to the importance of timely and effective feedback on writing, Liu et al. (2016) proposed an approach to generate input automatically. Students’ feedback was positive and their learning engagement was promoted. Moreover, Wu et al. (2019) developed Additional Writing Help, a writing assistance tool tailored for users with dyslexia, to proofread their text and ensure error-free posts on Facebook. However, the current one-size-fits-all solutions cannot meet students’ diverse learning needs and there is a lack of research on effectively enhancing the learning motivation of students with dyslexia.
We developed CHATTING, an inclusive writing system that utilizes ChatGPT to address the research gap. The main objective is to arouse students’ learning engagement, especially those with dyslexia, by providing instant and personalized responses to their writing. With the developed tool, we performed a study to evaluate the effectiveness of AI models in enhancing students’ learning engagement and writing performance. As students with dyslexia face difficulties in vocabulary size, spelling, and dictation (Galuschka et al., 2020), this work focuses on vocabulary in the writing performance. This work aims to answer two research questions (RQs): (1) How do generative AI models like CHATTING influence students’ learning engagement? (2) How effectively can CHATTING improve students’ writing performance?
Related work
Educational technology, particularly conversational chatbots like ChatGPT, has shown promise in improving learning outcomes in various educational domains, such as writing. Recent studies have highlighted the potential of AI technology in enhancing vocabulary size, learning competency, and conversation skills.
Using ChatGPT in education
In recent years, educational technology has been identified as a promising strategy to improve the quality of learning (Fung & Fung, 2020; Yan et al., 2021). Conversational chatbots have been increasingly used in various educational domains, including special education, peer dialogues, digital tutors, language learning, and programming (Benotti et al., 2017; Su & Yang, 2023; Yang et al., 2024). Numerous studies have confirmed the effectiveness of chatbot-based systems, such as their positive impact on learners’ affective status and ability to improve specific aspects of writing performance (vocabulary size) (Adiguzel et al., 2023; Han et al., 2023).
ChatGPT is a new generative chatbot built based on large-scale language models released by OpenAI in November 2022 (N/A, 2022). Its ability to respond rapidly, in multiple languages, smartly (functionally), and in a human-like style (Ray, 2023) has attracted the attention of researchers and the public. Recent studies have explored the potential and strength of using ChatGPT in education, such as enhancing creativity and critical thinking (Shidiq, 2023; Yang et al., 2024), empowering learners in education (Kasneci et al., 2023), fostering student engagement through authentic writing (Adiguzel et al., 2023) and gradually building competency until the student achieves an acceptable level of mastery (Kim et al., 2022). This study developed a ChatGPT-enabled writing system to assist secondary school students in learning to write Chinese and English compositions.
Language learning tools
Table 1 shows the popular language learning tools. The Grammar and Syntax-Based Pattern Finder, GRASP (Chen et al., 2013), is a tool that analyzes learners’ test scores and compares their progress in four lexical categories. It improved learners’ proficiency and encouraged integrating varied lexical forms, such as verbs and prepositions. Duolingo (Duolingo, 2023; Naismith et al., 2023) helps learners practice and improve their real-world conversation skills with characters (i.e., virtual tutors). Quizlet's AI-powered Learning Assistant (Quizlet, 2020) is designed to make the study more streamlined and assist learners in promoting better memory recall. However, these tools only focus on one learning aspect, that is, vocabulary, conversation skills, or memory.
The comparisons of different learning tools for writing.
Note. “v” denotes the elements included in the tools, while “x” denotes those that were excluded. The acronyms “Gram,” “Enga,” “EFL,” “EN,” “CN,” “Grad,” “Any,” “SpLD,” “AW,” “SS,” “MS,” “Psy,” “MMD,” “ID,” and “SF” denote “Grammar,” “Engagement,” “English as a Foreign Language students,” “English,” “Chinese,” “Graduate,” “Anyone,” “Students with SpLD,” “Academic writing,” “Secondary students,” “Multi subjects,” “Psychology,” “Multimedia design,” “Inclusive design,” and “Student feedback.”
Other applications address this issue by facilitating students’ writing tasks: GoatChat (GoatChat, n.d.) can compose essays, write poetry, and respond to questions. Poe (POE, n.d.) consists of various algorithms, such as Claude2-100k GPT4 and ChatGPT. Unfortunately, none of the existing systems supported individualized learning specifically for students with dyslexia. Recognizing this gap, we developed CHATTING to assist students with dyslexia in their learning-to-write process.
System design
CHATTING was developed using Android Studio and OpenAI algorithm, which supports Traditional Chinese, Cantonese, and English writing. The design included the front-end and backend, which are discussed below.
Front-end design
As shown in Figure 1, students could ask any questions regarding their writing topics. CHATTING can provide students with instant responses based on the depth of the questions. For example, a student asked the system to teach him how to construct a writing framework with the topic of “Travel Blog” and “Please give me the framework” (Figure 1, #8). Then, the system replied, “Intro (hook) Begin with a captivating introduction” (Figure 1, #9). Furthermore, CHATTING provides different functions to support an inclusive learning environment. The UI incorporates several components designed to enhance the user experience.
A speaker button delivers the question to the user audibly. A display box is designated for presenting the textual content of the question. A timer visually indicates the time remaining for essay completion. An exit button permits students to depart from the application conveniently. A settings button provides students with the ability to adjust various parameters, including (a) language (Traditional Chinese, Cantonese, or English), (b) speech rate (spanning from −2 to +2), (c) tutorial (an instructional video), (d) voice type (male or female), and (e) volume intensity (ranging from 0 to 1). A hint button is designed to provide guidance and cues for the students. A recorder enables the Speech-to-Text (STT) functionality, allowing students to pose questions through spoken language. The text is readable using Text-to-Speech (TTS) functionality, allowing students to listen to the AI-generated content. A screen allows students to view their questions and responses from CHATTING.

The user interface (UI) of the ChatGPT-assisted writing system, CHATTING.
The system also implemented a variety of inclusive design elements. First, students could adjust the delivery speed of the STT and TTS functions to suit their needs. Second, when speech functions were deemed unnecessary, students could mute them thrice by pressing the hint button (Figure 1, #6). Third, students could choose either male or female voice channels to read the content. Fourth, several input methods were implemented to cater to students’ needs, such as the speed input method, pinyin input method, handwriting recognition, and voice input method.
Backend design
ChatGPT can provide instant responses to an extensive range of questions. However, ChatGPT needs much information to understand the questions thoroughly. In other words, when talking with ChatGPT, asking specific questions is critical. Unfortunately, students with dyslexia face significant challenges in asking appropriate questions. They constantly struggle to process large amounts of information. As a result, if ChatGPT provides too much information at once, students cannot comprehend it, leading to ineffective learning outcomes.
To reduce the randomness of ChatGPT's responses and provide more appropriate answers to students, CHATTING provides guidance on ChatGPT's answers, which helps students with dyslexia better understand the writing process and ultimately improve their writing skills. In the following, we explain how this is achieved.
Figure 2 shows the architecture of CHATTING. First, according to students’ requirements, a role for ChatGPT is set up to generate proper responses based on input. The default role of ChatGPT is an assistant with a default prompt: “You are a helpful assistant that answers questions as simply as possible.” CHATTING can also set the role of ChatGPT as an English teacher with the specific prompt: “You are an English teacher, focusing on grammar, writing logic, system, intonation, and language use. You will focus on the quality of your writing, such as structure, ideas, and correctness. You need to guide students’ writing content, layout, organization, vocabulary, and writing style.”

The architecture of CHATTING.
Second, we pre-set typical prompts in the database. When students ask questions, the system would map students’ questions with proper prompts via STT conversation. CHATTING would then send the selected prompts to ChatGPT. With tailored prompts, CHATGPT provides detailed writing guidance to the learners. For example, it breaks down the essay into different elements, that is, from audience identification to conclusion. This approach enables students to receive suitable guidance to develop their writing skills. If CHATTING cannot map students’ questions with prompts in the database, it would send the questions to ChatGPT with the default prompts, for example, “The students are writing an article about a day trip to Hong Kong. Please guide them.”
Using ChatGPT with customized prompts to teach writing can be considered a form of scaffolded instruction, which partially follows the traditional scaffold method. The traditional method offers structured support and gradually reduces the support as students become knowledgeable. Similarly, CHATTING can provide support and guidance, especially in the beginning stages, using prompts to stimulate ideas, provide starting points, suggest structures, and offer instant replies. However, it is essential to note that the traditional scaffold method provides more explicit and structured support. At the same time, CHATTING offers more open-ended and instant AI-generated responses.
Evaluation study
This section covers participants, types of questionnaires and interviews undertaken with students, and the interviews conducted with teachers.
Questionnaires based on the self-determination theory
The SDT proposed by Deci and Ryan (2012) posits that learning motivation is comprised of four distinct elements: (a) behavioral, (b) emotional, (c) cognitive engagement, and (d) intrinsic motivation.
The first session of the questionnaire was designed to evaluate students’ engagement. Pre- and post-questionnaires were conducted with both groups of students before and after the learning session. The questionnaire was developed based on SDT and contained seven variables in two categories: need satisfaction and engagement. Each variable was evaluated using the 5-point Likert scale (with positive statements on the right), based on items validated in a previous study conducted in the same region (Fung, Lee, Sin et al., 2024). An experienced teacher reviewed the questionnaire items to ensure they were written in understandable language. The questionnaire is shown in Table 2 with four dimensions of questions as follows:
Behavioral engagement: observable on-task actions like participation and effort. Emotional engagement: an individual's positive or negative feelings toward a task. Cognitive engagement: the mental effort invested in understanding a task. Intrinsic motivation: the internal drive to engage in an activity for personal satisfaction.
Questions of SDT for pre-/post-questionnaire.
Questionnaire regarding CHATTING UIs and functions
The second session of the questionnaire evaluated the effectiveness, UIs, and functions of CHATTING, which was adopted from Yan (2023). Each variable was evaluated using the 5-point Likert scale, with the positive statements on the left and the negative statements on the right. The questions are shown in Supplementary Information: Questionnaire regarding CHATTING UIs and Functions, Figure 14.
Chinese- and English-writing topics
The Chinese and English teachers determined the writing topics and questions for Chinese and English compositions. The level of difficulties depended on the student's grade level, that is, Secondary 1–3. Each grade level includes five different essay topics, covering narrative, descriptive, lyrical, argumentative, and open-ended essay prompts. The types of essay prompts for pre-/post-test were the same, but the specific topics differed. Students could choose freely.
Rubrics of writing Chinese and English compositions
The school teacher provided us with the school-based Chinese and English writing scoring guides (Supplementary Information: Figures 15 and 16). The teachers graded the writings following the rubrics in the guide. The items in the rubrics consisted of content (40 marks), language (30 marks), organization (20 marks), and handwriting (10 marks) in Traditional Chinese writing. As CHATTING’s instruction does not affect the neatness of handwriting, our grading did not take handwriting into account. The items include content (eight points), language (eight points), organization (four points), and features (four points) in English writing.
Plagiarism detection
Copyleaks (Copyleaks, n.d.) was used to detect plagiarized content, which includes the features of identical (1-to-1 exact word matches), minor change (words hold nearly the same meaning but have a change to their form), paraphrased (different words that contain the same meaning replace the original content), omitted words (the portion of text not being scanned for plagiarism based on the scan settings), and match (overall similarity). A sample report is shown in Figure 3.

A sample report of Copyleaks.
Students’ interview
Students in the experimental group were interviewed about their experience after using CHATTING. The interview questions were adopted from Yan (2023), whose study applied ChatGPT's text generation functions in writing practice. An experienced teacher reviewed the interview questions to ensure they were written in understandable language. The questions are shown in Supplementary Information: Figure 17.
Participants
The participants in this study were students from a local secondary school in Hong Kong, where Traditional Chinese and Cantonese were used as the medium of instruction. They were from three levels: Secondary 1–3, comprising students with/without dyslexia.
Initially, 74 students were in the experimental group, 35 in the English writing group, and 39 in the Chinese writing group. However, two students in the Chinese writing group wrote their assignments independently without using CHATTING. As a result, the final number of participants in the study was 72, with 35 in the English writing group and 37 in the Chinese writing group, as indicated in Supplementary Information: Figure 18.
Informed consent for participation was obtained from all students before the commencement of the study. The Institutional Review Board approved the research, ensuring that ethical considerations were thoroughly addressed throughout the study's design, execution, and reporting.
Control group
Twenty-nine students (students with dyslexia: N = 12, M̄ = 14.87-year-old, SD = 1.29; students without dyslexia: N = 17, M̄ = 13.87-year-old, SD = 0.83) were instructed by Chinese and English teachers. The students conducted pre-/post-questionnaires of SDT (Section: “Questionnaires based on Self-Determination Theory”), administered before and after the learning session with teachers.
Experimental group
As shown in Figure 4 and Figure 5, 72 students were instructed by Chinese and English teachers in the pre-test and learned with CHATTING in the post-test. The students (students with dyslexia: N = 13, M̄ = 14.31-year-old, SD = 0.81; students without dyslexia: N = 13, M̄ = 14.26-year-old, SD = 1.49) conducted pre-/post-questionnaires of SDT (Sections: “Questionnaires based on Self-Determination Theory” and “Questionnaire regarding CHATTING UIs and functions”), which were administered before and after the learning session with CHATTING. Thirty-seven students (students with dyslexia: N = 18, M̄ = 14.13-year-old, SD = 0.86; students without dyslexia: N = 19, M̄ = 14.05-year-old, SD = 1.23) learnt to write Chinese composition with CHATTING, while 35 students (students with dyslexia: N = 18, M̄ = 14.13-year-old, SD = 0.86; students without dyslexia: N = 17, M̄ = 14.07-year-old, SD = 1.30) learnt to write English composition with CHATTING.

The overview of pilot test implementation in the experimental group. Students had 90 min for Chinese (Days 1 and 3) and 60 min for English (Days 2 and 4) writing.

A student was learning writing with CHATTING in a classroom setting.
Then, the students conducted an interview (students with dyslexia: N = 18, M̄ = 14.13-year-old, SD = 0.86; students without dyslexia: N = 19, M̄ = 14.05-year-old, SD = 1.23) and functional feedback (students with dyslexia: N = 16, M̄ = 14.11-year-old, SD = 0.85; students without dyslexia: N = 17, M̄ = 14.25-year-old, SD = 1.34) individually after a 2-day learning session with CHATTING.
The whole pilot test was held in four days. Pre-writing tests were conducted on Day 1 (Chinese) and Day 2 (English), while post-writing tests were administered on Day 3 (Chinese) and Day 4 (English). Pre-/post-writing tests allowed for a comparative analysis regarding the effects of CHATTING on students with/without dyslexia. The pre-writing/post-writing tests employed the same teachers, standardized measures, and scoring rubrics to enhance the validity and reliability of the tests and allow for fair comparisons between the two groups. All participants were allotted equal time to write the compositions with the questions at the same learning level, which included 90 min for Chinese (Days 1 and 3) and 60 min for English (Days 2 and 4) writing.
Results
RQ1: How do generative AI models like CHATTING influence students’ learning engagement?
We analyzed RQ1 from two perspectives: (a) changes in engagements and motivation and (b) student interviews.
Changes in engagements and motivation
In this section, we discussed the changes in three engagements and motivation. The analysis of covariance (ANCOVA) was utilized to conduct data analysis. The independent variable (IV), dependent variable (DV), and covariate are the groups, post-questionnaire, and pre-questionnaire, respectively.
The ANCOVA result revealed that learning with CHATTING could positively arouse students’ interest in the behavior, emotional, cognitive, and intrinsic motivation compared with traditional learning. The performance of students with dyslexia is shown in Figure 6(a). For behavioral engagement, the experimental group (M̄ = 85.20, SD = 0.63) outperformed the control group (M̄ = 76.60, SD = 0.72) by 11.23%, p = .08. For emotional engagement, the experimental group (M̄ = 81.60, SD = 0.64) was better than the control group (M̄ = 70.00, SD = 0.67) by 16.57%, p < .05, which demonstrated a significant change. For cognitive engagement, the experimental group (M̄ = 80.00, SD = 0.71) surpassed the control group (M̄ = 73.40, SD = 0.65) by 8.99%, p = .21. For intrinsic motivation, the experimental group (M̄ = 72.40, SD = 0.87) exceeded the control group (M̄ = 66.60, SD = 0.65) by 8.71%, p = .05.

Overview of analysis of covariance (ANCOVA) for three engagements. (a) The performance of students with dyslexia. (b) The performance of students without dyslexia.
The performance of students without dyslexia is shown in Figure 6(b). For behavioral engagement, the experimental group (M̄ = 80.00, SD = 0.82) outperformed the control group (M̄ = 78.80, SD = 1.14) by 1.52%, p = .39. For emotional engagement, the experimental group (M̄ = 75.40, SD = 0.93) was better than the control group (M̄ = 69.40, SD = 1.07) by 8.65%, p = .33. For cognitive engagement, the experimental group (M̄ = 77.00, SD = 0.69) surpassed the control group (M̄ = 69.40, SD = 1.13) by 10.95%, p = .31. For intrinsic motivation, the experimental group (M̄ = 75.40, SD = 0.73) exceeded the control group (M̄ = 69.40, SD = 1.18) by 8.65%, p = .31.
In the experimental group, students with/without dyslexia exhibited improvement in behavioral, emotional, cognitive, and intrinsic motivation (Figure 7[a]). Specifically, students with dyslexia had a more positive improvement in behavioral (+9.95%), emotional engagement (+8.22%), intrinsic motivation (+9.34%), and minor improvement in cognitive engagement (+2.04%). While students without dyslexia exhibited superior advancement in behavioral engagement (+6.10%) and intrinsic motivation (+25.67%), cognitive and emotional engagement performance remained the same.

The findings from the questionnaire on the SDT. (a) The findings in the experimental group. (b) The findings in the control group.
However, students in the control group deteriorated behavioral, emotional, and cognitive engagement and intrinsic motivation (Supplementary Information: Figure 7[b]). Students with dyslexia had a regression in emotional (−8.62%), cognitive engagement (−6.38%), and intrinsic motivation (−9.26%). There was no change in behavioral engagement. Interestingly, students without dyslexia exhibited a dramatic decline in emotional (−11.93%) and cognitive engagement (−14.53%) and slightly deteriorated in behavioral engagement (−5.74%) and intrinsic motivation (−3.34%).
Students with dyslexia were more positive than students without dyslexia on the CHATTING's UIs and functions, especially on the helpfulness, relevance, readability, consistency, and conciseness, as shown in Figure 8(a) and 8(b). For the open-ended interview, two raters, one with a psychology major and the other in inclusive education, independently analyzed the replies received from the students. To determine the consistency of the feedback, Cohen Kappa was used and the inter-rater reliability was found to be substantial (k = .780, p < .001).

Questionnaires of CHATTING UIs and functions. (a) The feedback from students with dyslexia. (b) The feedback from students without dyslexia.
Students’ interviews
In this section, we discussed the feedback from students’ interviews. As mentioned in Section “Students’ interview,” the interview questions were categorized into three sections, including Section 1: CHATTING's impact on writing, Section 2: CHATTING's UIs, functions, and impact beyond writing, and Section 3: The design and function of CHATTING.
Section 1: CHATTING's impact on writing
CHATTING could (a) provide detailed explanations and examples, facilitating understanding when questions are asked, and (b) assist in generating writing ideas. However, explanations of CHATTING may lack clarity and require additional follow-up. As a result, the impact of CHATTING on completing writing tasks is insignificant. In general, one-third of students (35.29% of students without dyslexia and 31.25% of students) agreed that CHATTING impacts the completion of writing tasks.
Advantages
The benefits of using CHATTING included improved writing ability and enhanced interaction. Students reported feeling more confident in their understanding of the topic regarding writing ability when they received accurate and helpful responses from CHATTING. For example, one student who expressed having poor English proficiency found CHATTING helpful in learning vocabulary and writing skills and believed it could help cultivate his interest in writing.
Regarding interaction, CHATTING could simulate human-like interactions, creating a sense of connection and relatedness in digital support. For example, a student mentioned that CHATTING reduced the time spent on brainstorming by providing multiple examples, which helped clarify his writing direction.
Disadvantages
The drawbacks of using CHATTING included a lack of guidance, insufficient explanation, and the risk of over-reliance. While CHATTING provided support, it could not replace the personalized guidance that a human teacher can offer. For example, a student expressed that although CHATTING could express human emotions, they still desired a teacher's presence to provide instruction and guidance in their writing process.
Regarding explanation, CHATTING did not always provide detailed explanations without specific inputs. A student mentioned that CHATTING's answers may not always align with their desired response, requiring them to provide detailed instructions to obtain the desired answers.
Furthermore, students expressed concerns about over-reliance on CHATTING. One student highlighted their fear of becoming too dependent on CHATTING and allowing it to influence their thinking. They noted that sometimes the information provided by CHATTING may not align with their initial intentions, leading to potential deviations in their thoughts and potential errors.
Section 2: CHATTING's UIs, functions, and impact beyond writing
A higher number of students without dyslexia expressed appreciation for CHATTING's UIs and functions compared to those with dyslexia. Overall, they agreed that CHATTING effectively provided them with various information and examples, and they found the responses prompt. Specifically, they agreed that CHATTING was relevant (58.82%), readable (70.59%), coherent (70.59%), concise (76.47%), simple (70.59%), and original (58.82%).
Students with dyslexia acknowledged the personalized support of CHATTING. They agreed that CHATTING was relevant (56.25%), readable (62.50%), coherent (62.50%), concise (50.00%), simple (56.25%), and original (37.50%). Many said that CHATTING engaged in interactive question-and-answer sessions. For example, if they did not articulate their question clearly, CHATTING would ask them for further clarification, prompting them to rephrase or provide additional details.
The feedback on using CHATTING in writing classes was quite different in the impact beyond writing. More students agreed to use CHATTING in the class. Their feedback included efficiency, workload, and dependency. Students mentioned that CHATTING could provide detailed explanations and summaries to save time for both students and teachers. Also, using CHATTING in writing classes was supported because it helped alleviate the workload of teachers who have to attend to many students individually. Furthermore, some students emphasized the importance of not relying too heavily on CHATTING to avoid dependency and maintain the development of their writing skills.
Section 3: The design and function of CHATTING
The design
Students generally agreed that the UIs of CHATTING was good. More than 70% of students without dyslexia and 50% of those with dyslexia found the UIs of the CHATTING system easy to use. Also, nearly half of the students without dyslexia and 37.5% of those with dyslexia found that the voice functionality of the CHATTING system was helpful.
Satisfaction
The majority of users do not perceive any areas for improvement in the UIs of CHATTING, indicating a high level of satisfaction.
Improvement
Some students noted that the handwriting input method was not responsive, suggesting a potential area for improvement. Also, ND8 suggested removing the auditory function as it was noisy.
The function
Students generally agreed that the functionality of CHATTING is sufficient. Nearly 60% of students without dyslexia and more than half of those with dyslexia agreed that the function of the CHATTING system was sufficient.
Satisfaction
The majority of users do not perceive any areas where the functionality of CHATTING can be improved, indicating a high level of satisfaction.
Improvement
The enhancements focused on character display, voice, and additional functions. Regarding character display, since the students read and wrote in Traditional Chinese, ND2 suggested that CHATTING could better understand local dialects and spoken languages. Additionally, a student pointed out the need for improvements in the voice input feature, as CHATTING occasionally produced strange or incorrect words. In terms of additional functions, students proposed incorporating a feature within CHATTING to facilitate essay assessment, encompassing aspects such as language structure, spelling accuracy, and lexical selection.
RQ2: How effectively can CHATTING improve students’ writing performance?
We analyzed RQ2 from three perspectives, including (1) writing performance, (2) plagiarism detection, and (3) language barrier. From the data analysis in Sections “Writing performance” and “Plagiarism detection,” CHATTING did not effectively improve students’ writing performance. Students’ writing scores were reduced after using CHATTING. Also, the plagiarism was severe.
Writing performance
In this section, we discussed the performance of Traditional Chinese and English writing. Both groups of students showed a slight decline in writing performance in both languages (Supplementary Information: Figure 19). In Traditional Chinese writing, the performance of students with dyslexia declined by 3.32% overall [F(1, 17) = .132, p = .719], with each item decreasing by 14.57% [F(1, 12) = 1.592, p = 1.592], 1.54% [F(1, 12) = .018, p = .895), 13.14% [F(1, 12) = 1.418, p = .245], and 23.54% [F(1, 17) = 7.231, p = .011] in content, language, organization, and word count, respectively.
The performance of students without dyslexia also declined by 6.77% [F(1, 18) = .676, p = .416], with each item decreasing by 11.68% (F(1, 17) = 1.349, p = .254), 3.44% [F(1, 17) = .169, p = .684], and 2.62% [F(1, 17) = .081, p = .777] in content, language, and organization, respectively, but increasing by 6.42% [F(1, 18) = .324, p = .573] in word count.
In English writing, the performance of students with dyslexia declined 29.5% overall [F(1, 17) = 4.991, p < .05], with each item decreasing by 33.33% [F(1, 17) = 5.453, p < .05], 15.28% [F(1, 17) = 1.055, p = .312], 36.4% [F(1, 17) = 7.351, p < .05], and 38.64% [F(1, 17) = 8.820, p < .01] in content, language, organization, and feature/appropriacy, respectively. However, the word count increased by 2.48% [F(1, 17) = .019, p = .891].
Students without dyslexia also dropped by 9.71% in total, with 8.86% [F(1, 16) = .530, p = .472], 21.94% [F(1, 16) = 3.834, p = .059], and 12.82% [F(1, 16) = 1.093, p = .304] regressed in content, organization, and feature/appropriacy, respectively. However, there was no change in language and the word count increased by 23.89% [F(1, 16) = 2.357, p = .135].
Plagiarism detection
In this section, we discussed the plagiarism issues in Traditional Chinese and English writing. Both groups of students had plagiarism issues (Figure 9). More students with dyslexia copied in English writing, while more students without dyslexia plagiarized in Chinese.

Plagiarism detection of students’ writing after using CHATTING. (a) Plagiarism detection of students with dyslexia in Chinese writing. (b) Plagiarism detection of students with dyslexia in English writing. (c) Plagiarism detection of students without dyslexia in Chinese writing. ND7 and ND8 did not use CHATTING in Chinese writing. (d) Plagiarism detection of students without dyslexia in English writing.
For students with dyslexia, the average percentage of plagiarism in the factors of identical, minor change, paraphrased, omitted words, and match were 18% (SD = 0.20), 12% (SD = 0.19), 0% (SD = 0), 0% (SD = 0) and 30% (SD = 0.35) respectively in Traditional Chinese writing, and 35% (SD = 0.26), 33% (SD = 0.32), 6% (SD = 0.12), 0% (SD = 0), and 74% (SD = 0.33) in English. Students with dyslexia had a higher probability and a higher portion of copying the generated content in English (Figure 9[a] and Figure 9[b]). Also, they tended to copy the content directly or make minor changes. One possible reason was that they could not comprehend the generated content. Therefore, they copied all content directly. For example, the teacher told D20 it should be a personal letter instead of a conversation with AI.
For students without dyslexia, the average percentage of plagiarism in the factors of identical, minor change, paraphrased, omitted words, and match were 31% (SD = 0.24), 17% (SD = 0.17), 0% (SD = 0), 0% (SD = 0), and 48% (SD = 0.37), respectively, in Traditional Chinese writing, and 23% (SD = 0.25), 22% (SD = 0.22), 6% (SD = 0.13), 0% (SD = 0), and 52% (SD = 0.44) respectively in English. The plagiarism detection of students without dyslexia in Chinese and English writing was similar (Figure 9[c] and Figure 9[d]).
In Traditional Chinese writing, teachers most commonly commented that students hastily ended their essays, merely pieced together data, and lacked organization. However, the teachers could not identify students who plagiarized the generated content. Further research may be needed to explore why students are more likely to plagiarize Chinese content. The study noted that students had difficulty comprehending the generated content in English writing, resulting in instances of plagiarism. For example, the teacher commented to ND13 that they had copied an AI statement and to ND15 that their writing was unrelated to the question.
Language barrier
In this section, we analyzed language barriers from different perspectives, including (a) writing independently without using CHATTING, (b) asking questions, and (c) learning writing with first/second language and copying.
Write independently without using CHATTING. As mentioned above, two students without dyslexia requested to write independently without using CHATTING in the post-test. The reason was that they knew how to write and did not need extra help from CHATTING. When comparing the performance between the pre- and post-test, one showed no change, while the other increased by 5.88%. The exceptional cases gave us insight into how self-learning ability can affect writing ability.
Ask questions. As shown in Figure 10, D12 asked five questions, while D3 asked four. When comparing the questions, D12 focused on open-ended questions, such as what, how, and what kind. Also, D12 kept the questions informative and stuck to key points, such as “teaching principles in schools,” “experience can be unforgettable,” and “can help people learn a truth.” However, D3's questions were less informative. The skill of asking questions is crucial to learning with ChatGPT. Therefore, training students’ critical thinking skills is essential.

Chinese writing samples from ND12 and D3.
Learn writing with a first/second language and through copying. As shown in Figure 11, ND12 asked many questions using CHATTING. ND12 could understand the generated information and filtered suitable sentences to incorporate into ND12's writing.

Chinese writing sample from ND12.
As shown in Figure 12, D3 directly copied almost entire paragraphs and only added two phrases in the writing. D3's Chinese teacher indicated that D3 is a student with dyslexia and it is difficult for D3 to learn writing. From a teacher's perspective, students who demonstrate proficiency in copying the content of ChatGPT have shown positive improvement. However, a Chinese teacher told us that some students struggle to write more than eight Chinese characters within a 45-min exam. The process of learning to write compositions often begins with copying. The more they practice copying, the better they can remember and eventually apply what they have learned to their writing. It is already commendable if students can copy effectively. Furthermore, by gradually incorporating their ideas and content, they can continue to learn and effectively apply this method.

Chinese writing sample from D3.
Furthermore, as depicted in Figure 13, ND12 merely copied some content to meet the basic writing requirements. The teacher remarked, “This is not a complete blog entry. It is more of a guideline for writing a blog entry.” When we asked about ND12's performance in English composition, ND12 mentioned, “I excel in Chinese writing and always achieve the highest scores in the class. However, my English is weak as I struggle to comprehend its meaning.” By collecting the conversations between ND12 and CHATTING, along with the teacher's comments and ND12's situation, it is evident that technology can support students’ learning, provided they possess basic language acquisition skills.

English writing sample from ND12.
Discussion
This section delves into the opportunities, challenges, and insights associated with utilizing CHATTING, along with incorporating ChatGPT functions.
Opportunities
There is no doubt that ChatGPT-assisted learning can enhance students’ learning interests (Ali et al., 2023). The findings demonstrated the positive impact of CHATTING on the learning motivation of students with dyslexia, particularly in terms of emotional engagement (+16.57%, p < .05) and intrinsic motivation (+8.71%, p = .05). Similarly, those without dyslexia showed a better performance regarding emotional (+8.65%, p = .33) and cognitive engagement (+10.95%, p = .31).
Engagement
Students with dyslexia always encounter the disadvantages of weak vocabulary and memory retrieval. With the help of CHATTING, they can brainstorm many new ideas and enrich their vocabulary size (Chukwuere, 2024). Even though the students cannot think of the topics or outlines, they can get help from CHATTING by generating content to guide their thinking. Therefore, they are more eager to learn.
Digital support
ChatGPT-assisted learning can provide personalized teaching, boost self-perception of competence, and simulate human-like interactions, creating a sense of social presence and engagement in digital support (Abdullah et al., 2022; Kasneci et al., 2023). Students have reported that CHATTING reduces brainstorming time by providing multiple examples, helping them clarify their writing direction. As a result, students perceive CHATTING as an interactive companion in their learning journey, creating a sense of social presence and engagement.
Inclusive design
When developing a tool to support students’ language learning, it is crucial to consider their literacy levels that may hinder their access and use of ChatGPT-assisted tools (Murgia et al., 2023). For instance, in the case of English as a second language, it is vital to provide functions that assist students in comprehending the content, such as translation capabilities. Additionally, personalized features are necessary, such as automatically adjusting the difficulty level of the output to cater to individual needs.
Challenges
ChatGPT-assisted learning can offer students more support than traditional learning (Lo et al., 2024). However, students encounter different difficulties when utilizing CHATTING. This section discusses the challenges from different perspectives, including the ability to ask questions, students’ dilemmas, and less controllable output.
Ability to ask questions
Based on our analysis, the writing performance of both groups in Traditional Chinese and English declined. First, students struggled to identify the article genre or lacked depth in their writing, resulting in reduced scores. For example, a teacher commented to ND3 that the format was incorrect and the language was not concise.
Second, most students are not good at asking questions, resulting in not getting what they want from CHATTING. Most students asked “Yes/No” questions, questions less informative or too broad, such as “Can I write the e-mail?”, “I have no idea,” and “story.” Only small portions of students were able to ask open-ended, informative, and specific questions, such as “What kind of experience can be unforgettable?” and “Help me translate 互相聊天 (Literally: Chat with each other) into English.” Developing the ability to ask questions plays a vital role in leveraging ChatGPT for learning purposes. Consequently, it is of utmost significance to prioritize cultivating students’ critical thinking skills (Qawqzeh, 2024).
Students’ dilemma
Students faced challenges in the learning process, including resistance to certain functions, difficulties composing English compositions, and the need for further support and guidance. For example, students with dyslexia felt embarrassed when using STT to ask questions. They were willing to compose Traditional Chinese writing. However, a few resisted asking questions using the STT functions (Herreid & Schiller, 2013). In some cases, they could not input any text through typing or lacked knowledge of how to write certain words. Consequently, they would remain idle, awaiting assistance from the instructor.
Teachers act as facilitators and guides in promoting behavioral engagement in the classroom, which cannot be underestimated (Mystkowska-Wiertelak, 2022). Although ChatGPT-assisted learning cannot substitute the personalized guidance that a human instructor can provide, students with dyslexia are always reluctant to seek help from teachers in class. Therefore, a more personalized ChatGPT-assisted tool can complement the learning experience and help overcome challenges in the classroom.
Less controllable output
ChatGPT often generates lengthy paragraphs of information in response to students’ keyword input (Lingard, 2023). Although this does not pose a significant challenge for Chinese (their mother tongue), it can be difficult for students learning English as a second language. Particularly, expressions like samurai armor, ancient pottery, panoramic views, and meticulously manicured gardens may prove challenging for students with lower proficiency levels. Therefore, it is crucial to iterate and provide students with a broader range of learning options that cater to their needs and abilities.
Insights
Through our analysis and exploration, several key insights have emerged about using CHATTING, including plagiarism concerns, inclusive design elements, and alternative open-source models.
Plagiarism concerns
From the plagiarism detection analysis in Section “Plagiarism detection,” students committed plagiarism, which aligned with the findings of Bašić et al. (2023). However, how do educators define plagiarism? When we learn to write, teachers often ask us to quote idioms, famous sentences, and maxims or refer to model articles. Learning from copying is a process for students to grow, like planting a tree with air, sunlight, and water as the essential elements. If we give the tree nutrients, it will grow better. Similarly, students need the foundation of languages, article writing formats, and rhetorical skills. Giving them digital tools, dictionaries, books, or other teachers’ support can help them learn faster. As mentioned by a teacher, some students can merely write less than ten Chinese characters in the Chinese exam. The initial step in learning composition writing involves copying. The more the students practice copying, the better they can remember and eventually apply what they have learned to their writing. If students can copy effectively, it is already praiseworthy. Whether imitation or referencing exemplary works should be considered plagiarism at the early learning stage is worth reconsidering.
Open-source models
By exploring alternative models, we can better understand their potential benefits and make informed decisions regarding the best approach. Some open-source models, such as Llama2 and GPT-Neo, can provide similar functions as ChatGPT. Llama2 is a language model designed for generating coherent and fluent text. It has been trained on a large corpus of text and can generate text in various languages. It can be applied in a range of tasks, such as chatbots and text summarization.
Llama2 can be particularly useful for students with dyslexia who struggle with organizing their thoughts and expressing them coherently in writing. However, a challenge to using Liama2 in the concerned study is that it may not fully understand the local context or idioms used in a particular language. This could result in unintentional plagiarism or inappropriate suggestions. Therefore, it is essential to carefully evaluate the suggestions generated by the model and ensure they are appropriate for the target audience.
GPT-Neo is a set of powerful language models trained on the Pile, which can perform a wide range of natural language processing (NLP) tasks, including chatbots. Its ability to comprehend complex linguistic patterns enables it to generate coherent and contextually relevant outputs, which can be especially beneficial for students with dyslexia to improve their writing skills. However, GPT-Neo is trained on a vast corpus of languages that may contain uncensored content or biases, potentially affecting the quality of the generated text. Therefore, conducting rigorous evaluations through comprehensive testing and user feedback is crucial to ensure that the model's outputs align with the educational goals and are appropriate for our target audience.
Word count
In this work, students demonstrated an improvement in the word count (Section “Writing performance”), which meant that they could write more after using CHATTING. However, word count does not necessarily reflect the depth and breadth of vocabulary use (Schmitt, 2014).
Rubrics, such as content, language, organization, and features, can be used to evaluate the writing quality (Shabani & Panahi, 2020).
Limitations
CHATTING system
It is crucial to consider the limitations of current natural language processing models. These models may struggle to understand certain idioms and local contexts (D11). They are trained on large datasets and rely on statistical patterns rather than profoundly understanding students’ questions. As a result, there can be instances where the explanations provided by CHATTING could be more explicit and require further clarification (ND2). The answers provided may sometimes align with students’ expectations (ND9). The responses can be inaccurate or incomplete, hindering students’ cognitive engagement, particularly when faced with complex or nuanced questions.
Sample size and diversity
This study was constrained by sample size and diversity. A larger population would provide additional insights into the learning outcomes and engagement levels, thereby offering more comprehensive implications. A larger number of participants may enhance the sample's representativeness, reflect a broader population, or account for individual differences.
A longitudinal test
The study consisted of a two-day learning session, but its effectiveness was limited. Nonetheless, it provided valuable insights into how ChatGPT can be integrated into inclusive education. To enhance our approach, we plan to conduct a longitudinal test over the course of a semester, incorporating inputs from teachers’ instructional modules. This will help us optimize learning efficacy and yield more comprehensive results.
Conclusion and future works
In this work, we have two major contributions. First, we developed a ChatGPT-assisted system, CHATTING, to motivate students to learn writing. CHATTING incorporates inclusive design functions, such as adjustable speech rate and STT functions. Second, we conducted an evaluation study with CHATTING to investigate how generative models influence students’ learning engagement and facilitate students’ writing process. The results showed that students with dyslexia expressed more positivity toward CHATTING's user interfaces (UIs) and functions than students without dyslexia, particularly regarding helpfulness, relevance, readability, consistency, and conciseness. However, CHATTING did not improve students’ writing performance and led to unintentional plagiarism. Further research is imperative to comprehend the utilization of generative models for inclusive language education.
Footnotes
Contributorship
Ka Yan Fung was responsible for the study conception and design, data collection, analysis and interpretation of results, and draft manuscript preparation. Kwong Chiu Fung contributed to the study conception and design, analysis and interpretation of results, and draft manuscript preparation. Rick Tze Leung Lui assisted with data collection and contributed to the draft manuscript preparation. Lik Hang Lee participated in the draft manuscript preparation, while Shenghui Song also contributed to the draft manuscript preparation. Kuen Fung Sin and Huamin Qu contributed to the study conception and design. All authors reviewed the results and approved the final version of the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical statement
Informed consent for participation was obtained from all students before the commencement of the study. The Institutional Review Board of Hong Kong University of Science and Technology (HREP-2023-0217) approved the research, ensuring that ethical considerations were thoroughly addressed throughout the study's design, execution, and reporting.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
