
Abstract

Although AI-generated texts are increasingly attractive to EFL/ESL researchers and teachers, few studies have examined those texts from logical and rhetorical perspectives. This study explored whether an AI model can argue more sufficiently and tactfully than human writers. The data source consisted of 60 argumentative essays drawn from a written corpus of EFL learners and a parallel ChatGPT-written corpus. Content analysis and quantitative analysis were employed for the comparisons. The results showed that ChatGPT could argue more sufficiently than the EFL students by providing more grounds and warrants to justify its claims, but that it relied on a monotonous type of warrant to gain argument sufficiency, although it shared the human writers’ preference for argument width, that is, for using parallel arguments to support the claims. Moreover, the machine showed less tactfulness by expressing claims with more certainty and by using fewer concessions before refuting. Implications for writing instruction are discussed, with emphasis on the culturally and emotionally embedded nature of human communication.

Keywords

AI, EFL argumentation, argument sufficiency, tactfulness, ChatGPT

Introduction

Toulmin model of argumentation (Toulmin, 1958, 1999) is increasingly used to examine the argumentation written in English as a Foreign/Second Language (EFL/ESL). The model consists of six elements: claim, ground, warrant, backing, rebuttal, and qualifier. The four elements—claim, ground, warrant and backing—expound and justify one’s view. The rebuttal entails opposition and refutation. The qualifier expresses the degree of force or certainty on the claim. Some related studies mainly concern the relationship between the Toulmin elements and argumentative writing quality (D. Liu et al., 2024; F. Liu & Stapleton, 2020; R. Yang, 2022). Others focus on the efficient ways of doing comparisons among different kinds of student essays (Crammond, 1998; D. Liu & Wan, 2020; Qin & Karabacak, 2010).

However, in all the related studies, argument sufficiency is unavoidably the major concern whether they focus on the assessment of argumentation quality or the teaching pedagogy. Johnson and Blair (1994) classify argument sufficiency into local and global. Local sufficiency refers to only working at justifying a claim and global sufficiency includes both justification of a claim and refutation of opposite views. In this sense, Toulmin model reveals global sufficiency. Using Toulmin model as theoretical basis, this paper focuses on global sufficiency. Moreover, tactfulness in argumentation originally included in Toulmin’s argumentation theory was mainly embodied in the use of qualifier to express one’s claim in an appropriate way (e.g., to moderate the tone, to be more objective and unbiased). Tactful language in argumentation dispels aggressiveness in refutation, adds prudence to the expressing of claims, and hence improves persuasiveness. Although tactfulness in argumentation is gaining attention by some teachers and textbooks for writing, unfortunately only a few studies explore the tactfulness from Toulminian perspectives (Berger, 2014; D. Liu et al., 2024; Majidi et al., 2021; R. Yang, 2022). In this study tactful language is not only revealed by the use of qualifier but also concession.

The integration of artificial intelligence (AI) in EFL writing instruction, particularly in the context of argumentative writing, has garnered significant attention. AI tools, such as ChatGPT, offer innovative approaches to improve writing proficiency among learners (Al Fraidan, 2025), to improve students’ writing through personalized feedback mechanisms (Guo et al., 2024; Li et al., 2023; Nazari et al., 2021; S. Yang et al., 2024), and to foster greater student motivation and engagement (He, 2024; Song & Song, 2023; Wei, 2023). However, its stunning ability to produce coherent and informative texts has caused concerns. Some researchers complain of the difficulty to differentiate a text generated by AI from the one written by a human, and worry about the possible increase in plagiarism and the laziness of the students (Chatterjee & Dethlefs, 2023; S. Yang et al., 2024). Although OpenAI has launched a classifier to distinguish between AI-written and human-written texts, this tool still has a few limitations. For example, it can be rather unreliable on short texts (below 1,000 characters).

To make better use of AI in language teaching, more comparisons and analyses of AI-generated and human-written texts are needed, and investigation at the rhetorical and logical level, such as argument sufficiency and tactfulness in argumentation, may be especially helpful. This paper aims to conduct such an exploration and to contribute to the identification of shorter AI-generated texts. The whole study was based on Toulmin model. By synthesizing several Toulminian analytical frameworks (D. Liu et al., 2024; D. Liu & Xiong, 2024) to measure argument sufficiency and tactfulness, this study compared the essays produced by ChatGPT and human writers. It emphasizes the irreplaceable nature of human work, especially the culturally and emotionally embedded nature of human communication and reasoning in writing instruction.

Argument Sufficiency and Tactfulness in Argumentation

Argument Sufficiency

Argument sufficiency, an important factor in assessing argumentation, was introduced by Johnson and Blair (1994). Argumentation consists of an inference core and a dialectical tier. To be specific, arguers not only have to justify but also defend their arguments, that is, to consider and respond to the potential opposing views and critical questions raised by rational opponents. Justifying the arguments entails inference, and defending the arguments involves both inference and the dialectical tier. Thus, argument sufficiency is further divided into local and global sufficiency, according to Johnson and Blair (1994). Local sufficiency, which refers to how arguers justify their arguments (i.e., to explicate and support the claims), reveals the essence of argumentation less fully than global sufficiency. In order to achieve global sufficiency, the arguers must respond to the competing claims and grounds which their opponents hold, point out the mistakes of the claims and grounds, or even prove their inferiority or weakness.

The exploration of sufficiency in argumentative writing from Toulminian perspectives has become an attractive approach in recent decades. A variety of measures based on Toulmin model have been proposed to assess argument sufficiency in school writing. For example, Crammond (1998) and Qin and Karabacak (2010) used their modified Toulmin models as analytical frameworks to examine L1 and L2 students’ argumentative essays. Most of the subsequent studies applied their frameworks to analyze the essays written by students of different backgrounds and at different proficiency levels (Abdollahzadeh et al., 2017; Cheng & Chen, 2009; F. Liu & Stapleton, 2014; Stapleton & Wu, 2015). Other studies made further modifications to those revised versions of Toulmin model to improve the accuracy of measurement, or to meet the needs of students and teachers. D. Liu and Wan (2020) created a framework with seven elements: claim, subclaim, ground, warrant-backing, counterargument-claim, counterargument-ground, and rebuttal. They compared the English argumentative essays written by first- and fourth-year EFL students, focusing on both global and local sufficiency revealed by the frequencies of Toulmin elements and argument complexity. Majidi et al. (2021) further modified Toulmin model and introduced several structural elaborations. These adaptations include broadening the backing element to support warrants, ground, qualifiers, alternative solutions, counterarguments, and rebuttals. Unlike in Toulmin’s original model, rebuttals are defined as responses to opposing views, and “alternative solutions” are added as potential responses to the issue. R. Yang (2022) divided the element ground into reason and evidence to investigate sufficiency in argumentation, and used the revised Toulmin model as a scaffold in teaching argumentation.

Admittedly, the modifications of Toulmin model applied to teaching or analyzing different genres have certain facilitating effects. However, the deficiencies in those modified versions and the discrepancies of views on certain Toulmin elements cause the modifications to stray from the original Toulmin model. The deviation is caused by excluding or confusing such key elements as qualifier and warrant in Toulmin model. The warrant is an indispensable component which, together with the claim and data, makes up the defining feature of an argument structure. The exclusion of warrant alters the Toulmin model because the reasoning from the data to the claim is missing. The confusion of data with warrant undermines the basic triangle of Toulmin model as well. In general, those modifications endow the components with different meanings to adapt the model to second language writing, and consequently simplify and even destroy it.

Keeping the original idea of Toulmin model and adopting the concept of interaction in Pragma-dialectics (Van Eemeren & Henkemans, 2017), D. Liu and Xiong (2024) propose a modified version of the Toulmin model. The argument structure is termed justification-qualifier to remind the researchers and teachers of qualifier. Warrant is treated as critical in this model and can embody argument sufficiency together with ground. This model includes eight elements: Claim, Justification-ground (J-ground), Justification-warrant (J-warrant), Justification-rebuttal (J-rebuttal), Opposition-ground (O-ground), Opposition-warrant (O-warrant), Opposition-rebuttal (O-rebuttal), and Qualifier. The components in justification and opposition reflect the dialogical relationship between the writer and the reader, and the inclusion of warrant and qualifier preserves the integrity of the original Toulmin model. This Toulminian model is quite feasible for evaluating global sufficiency in argumentation.

Tactfulness

Tactfulness in argumentation refers to the moderate, polite, or acceptable way to express claims and refutation. Qualifier in Toulmin model of argumentation is treated as a tactful way of persuasion. Qualifier can “put them (claims) into debates in an uncommitted way, merely for purposes of discussion” (Toulmin et al., 1978, p. 90). Toulmin et al. (1978) put more weight on its function than its linguistic form. However, only a few studies focus on the tactfulness of expressing claims in argumentation. Berger (2014) categorized qualifier into three types according to linguistic form: qualifier in word form, qualifier in participle form, and qualifier in sentence form. Based on Berger’s (2014) classification, R. Yang (2022) investigated the relationship between the holistic writing quality of the argumentative essays and the three types of qualifiers, and found no relation between writing quality and the use of qualifier. Majidi et al. (2021) split the qualifier into reservation in claim and constraints of claim. However, it is contended that the various divisions of qualifier in the previous studies go against Toulmin’s definition (D. Liu et al., 2024). Adhering to Toulmin’s definition, D. Liu et al. (2024) classified claims with/without qualifier into five ranks and applied this classification to argumentation teaching. This classification is conducive to writing instruction and assessment, as well as to this study.

Tactfulness in argumentation can also be revealed in other ways besides the use of qualifier. D. Liu et al. (2024) propose “concession” as an indicator of tactfulness in EFL argumentation. For one reason, aggressive refutation is highly probable to incur a quarrel and contrarily concession can moderate the rebutting language. The way to express disagreement or rejection determines the persuasive effect of the argument. For another, concession is usually included in the teaching materials or textbooks, even in the teaching syllabus of argumentation in college. D. Liu et al.’s (2024) experiment demonstrates that concession strategies can be taught and tactfulness can be improved. It is reasonable and feasible to use this skill as an indicator of tactfulness from rhetorical perspective. Based on this result, our study takes concession into consideration in the measurement of tactfulness.

Comparative Studies on AI-Generated and Human-Written Texts

Abundant studies on the comparison of AI-generated and human-written texts are mostly done from the angle of Natural Language Processing (S. Yang et al., 2024). In recent years the studies on AI-generated texts are increasingly diverse in many areas, with the studies on AI-generated research papers constituting the majority. Only a few studies focus on argumentative essays and produce controversial results. For example, Dilai and Dilai (2025) conducted a corpus-based study examining the essays produced by EFL students, native English speakers and ChatGPT. The results demonstrated that essays generated by ChatGPT exhibited greater analytical style, cognitive complexity, and lexical diversity. ChatGPT also tended to use certain low-frequency words while the human writers preferred light verbs and high-frequency words such as very, much, thus. The human-written essays are also characterized by subjectivity realized by I-words, tentative words and certitude words.

While AI-generated argumentative essays outperform human-written texts in language complexity, accuracy and lexical richness, human writers maintain a distinct advantage in conveying personal perspective, engaging readers, and organizing content in a more coherent and interactive manner (Herbold et al., 2023; Jiang & Hyland, 2025; Mizumoto et al., 2024). A recent study by Jiang and Hyland (2025) discussed the causes of differences of engagement markers in the essays generated by ChatGPT and those written by British student writers. The ChatGPT-generated essays exhibited the limitations in building interactional arguments. The differences were attributed to the statistical algorithms underlying ChatGPT and the language data used to train the model. ChatGPT’s reliance on probabilistic patterns, rather than an understanding of reader expectations, resulted in texts lacking interactive qualities that could help make human writing more persuasive and engaging. The findings might serve as a counterargument to any tendency to personify AI model. Unfortunately, the focus of the study on interactional elements offers inadequate evidence since the discourse markers cannot reveal the major characteristics at logic level but only a part of the interactional feature of argumentation.

At discourse level, S. Yang et al. (2024) examined the thematic choices and progression patterns in IELTS teacher-written and AI-generated argumentative texts within the framework of systemic functional grammar. The machine was found to use more concession to repeat information, but fewer modal adjuncts to interact with the readers. The frequent use of the constant thematic progression pattern in AI-generated texts “prevents the text from development and makes the text redundant and simplistic like a list of ideas” (P1). The results suggest that AI-generated texts are inferior to human-written texts in text organization and development.

Most of the studies on AI mentioned above focus on the lexical or syntactic level, with little attention paid to the discourse level or logical reasoning. Although AI-generated discourse may surpass human output in linguistic accuracy and lexical diversity, there is still no consensus on whether AI-generated discourse is deficient in deep thinking, comprehensibility, and interactivity. Moreover, although Hyland’s engagement model is a good yardstick for assessing the interactivity of ChatGPT, argumentative features at logical and rhetorical levels can be better parameters and should be considered as measures to assess argumentation. Just as Jiang and Hyland (2025, p. 28) propose, “it may be necessary to modify the categories when seeking to characterize AI-generated content.” To meet those needs, this study explores from logical and rhetorical perspectives whether an AI model can argue more sufficiently and tactfully than human writers.

Methods

Research Questions

Given the scarcity of the studies comparing human-written argumentation with AI-generated essays from a Toulminian perspective, this study aims to identify the argumentation features exhibited by ChatGPT 3.5. We raise the following research questions:

(1) Can ChatGPT argue more sufficiently than the EFL students? Does it use the same way to gain argument sufficiency as the EFL students?

(2) Does ChatGPT show more tactfulness than the EFL students?

Data Source

The dataset consists of 60 argumentative essays on the same topic: 30 essays authored by Chinese EFL students and 30 essays generated by ChatGPT. The essays written by EFL students were sourced from the Ten-thousand English Compositions of Chinese Learners (TECCL Corpus) (Xue, 2015), while the AI-generated essays were selected from the aiTen-thousand English Compositions of Chinese Learners (aiTECCL Corpus) (Xu & Sun, 2023).

The TECCL corpus contains approximately 10,000 essays written by the Chinese EFL learners. All essay contributors agreed to share their essays for academic purposes at the time of submission to the online system. Additional anonymization measures were implemented to minimize the risk of disclosing writers’ identities. The corpus features a wide range of topics and prompts, with an estimated 1,000 distinct essay topics. The writers represented in the corpus range from elementary school to postgraduate students, with the majority being undergraduates.

The aiTECCL corpus, created by Xu and Sun (2023), serves as a parallel corpus of TECCL. The corpus features 10,000 essays generated by the GPT-3.5 model prompted with the identical topics and directions in the TECCL corpus.

The topic “online shopping” was chosen because it is familiar to university students, allowing them to express their views more easily. Topic familiarity is an important factor influencing idea generation, content organization and language fluency. AI excels in collecting information, whether familiar or unfamiliar, whether fresh or outdated. To guarantee fairness in data collection, we therefore gave first consideration to the human students. The writing prompt is as follows:

In today’s world, online shopping has become an essential part of many people’s lives. Some believe it offers unparalleled benefits to our life, while others argue that it can be harmful. Write an argumentative essay on online shopping.

Both corpora contain 128 essays on the topic of online shopping. In selecting the 30 student essays, factors such as proficiency levels, testing conditions, and demographic characteristics (e.g., age and gender) were not taken into consideration, just because these variables were not directly relevant to the present study. The primary aim was to compare the argumentative features of human-written and ChatGPT-generated essays. However, to ensure that the two sets of essays were comparable in terms of overall writing quality, all the essays were evaluated using a five-scale scoring rubric, developed by Qin and Karabacak (2010). The selection of this rubric is based on two primary considerations: (1) it incorporates McCann’s (1989) clear descriptions of organizational structure and language use, alongside Nussbaum and Kardash’s (2005) comprehensive delineation of argument effectiveness; (2) it does not prioritize the Toulmin model, focusing instead on three general criteria for assessing effective argumentative writing. Specifically, this holistic rubric evaluates three key dimensions: overall effectiveness of the argument (including acknowledgment of potential counterarguments), overall organization, and general language use (see Appendix 1). Finally, 30 essays of each group were selected, with the score above 4. Table 1 shows the average scores of the two groups. An independent samples t-test reveals no significant difference between the two groups (t = 0.78, p = .44), indicating that the essays were produced at the same language proficiency level and could be used for comparative research.

Table 1.

Basic Information of the Two Groups of Essays.

Measure	EFL learners Mean	EFL learners SD	ChatGPT Mean	ChatGPT SD	t	p
Overall quality	4.03	0.41	4.13	0.57	0.78	.44
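The t value in Table 1 can be checked from the summary statistics alone. The following is a minimal sketch, assuming a pooled-variance (Student's) independent-samples t-test and 30 essays per group; the function name is illustrative and is not taken from the study, whose analyses were run in SPSS. Because the tabled means and SDs are rounded, the result is approximate.

```python
import math


def independent_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a pooled-variance independent-samples t-test.

    Assumption: equal-variance Student's t; inputs are group summary
    statistics rather than raw scores.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error, df


# Values from Table 1: EFL learners (4.03, 0.41), ChatGPT (4.13, 0.57), n = 30 each.
t_value, df = independent_t_from_summary(4.03, 0.41, 30, 4.13, 0.57, 30)
print(round(t_value, 2), df)  # 0.78 58
```

With equal group sizes, the pooled-variance and unequal-variance (Welch) forms give the same t statistic, so the check does not depend on which variant was used.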

The selected essays ranged from 250 to 350 words in length, with a total of 8,564 words in the Chinese EFL learners’ essays and 9,799 words in the ChatGPT group. The essays written by ChatGPT are slightly longer than those written by the human writers, since ChatGPT has “a tendency to generate essays that are longer than the desired length” (Herbold et al., 2023, p. 3). However, this is not a problem when length-normalized metrics are used in data analyses.

Data Collection and Analyses

This study adopts D. Liu and Xiong’s (2024) model in argumentation feature investigation. To answer the first research question, argument sufficiency was examined by comparing the use of the following elements: J-ground, J-warrant, J-rebuttal, O-ground, O-warrant, O-rebuttal. Ground and warrant can indicate argument sufficiency. The definitions are displayed in Table 2, with the examples selected from our data source.

Table 2.

Definitions of Elements to Measure Sufficiency (D. Liu & Xiong, 2024, p. 5, revised).

Element	Definition	Example
J-ground	Statements or evidence offered in support of a claim. It can take various forms, such as facts, statistics, experience and so on.	[claim] Online shopping has become increasingly popular, revolutionizing the way we purchase goods and services. [J-ground] One of the main benefits of online shopping is convenience.
J-warrant	Explanation or reasons offered to guide the readers to get the writer’s viewpoint correctly and accurately from the given evidence. It can be a sentence, sentence cluster or paragraph.	[J-ground] One of the main benefits of online shopping is convenience. [J-warrant] With just a few clicks, we can effortlessly compare prices and products from the comfort of our own homes. No longer do we have to battle crowds or wait in long queues.
J-rebuttal	Statements in which the writer responds to opposition by pointing out the possible weakness.	[O-ground] Issues like delivery times or online frauds cause warranted hesitation for a fraction of users. [J-rebuttal] However, most reputable companies invest heavily in ensuring buyer protection, implementing secure payment mechanisms, and reliable shipping services.
O-ground	Evidence offered in support of a counter-claim. It can take various forms, such as facts, statistics, experience and so on.	[O-ground] Issues like delivery times or online frauds cause warranted hesitation for a fraction of users. [J-rebuttal] However, …
O-warrant	Explanation or reasons offered to guide the readers to get the viewpoint opposite to the writer’s correctly and accurately from the given evidence. It can be a sentence, sentence cluster or paragraph.	[O-ground] there may be risks associated with the security of personal and financial information, [O-warrant] as online shopping requires divulging sensitive details over the internet.
O-rebuttal	Statements in which the writer expects the opponent’s response to Justification.	[concession] While security measures have improved significantly over the years, [O-rebuttal] there still exists a small degree of risk for fraudulent activities.

Warrant type is an important element to reveal the way to achieve argument sufficiency. This study not only examined its frequency but also types. Five warrant types were identified: a priori warrant, empirical warrant, institutional warrant, evaluative warrant, and descriptive warrant. The definitions are shown in Table 3.

Table 3.

Definitions of Warrants.

Warrant type	Definition	Example
A priori warrant	involves common sense or world knowledge	Since it is common sense that no one would spend more money if he/she can buy the same goods in a lower price.
Empirical warrant	depends on previous experience	Most of us have heard or even experienced the situation when you are shopping online, which can lead to troubles in personal property. Definitely, the Internet fraud put much burden not only on our shoulders but also on our mentality. The more terrible thing is that we are likely to lose our money without knowing it when we are shopping.
Institutional warrant	relies on rules, principles, policies, or regulations in an institution or discipline	Due to loose government and legal supervision, there may be fake and inferior goods
Evaluative warrant	entails morality, values or social convention to make comments or explanation	As we all know nowadays, many ways of fraud have developed on the web. Maybe your money will go in a liar’s pocket after you have agreed to the deal on a similar but fake website. People’s hatred of fraud further worsens the situation.
Descriptive warrant	depends on describing, narrating, analyzing and even commenting	Because of phrases like “eco-friendly” that are shown on the product packaging, many consumers want to satisfy a sense of environmental responsibility. They tend to ignore the product’s quality, even if its real quality is not as good as is displayed online. But when they later realize that the “eco-friendly” label is nothing but a lie, they feel cheated and betrayed and perhaps will never shop online.

Another way to gain argument sufficiency is realized by argument width and depth. J-width refers to the number of parallel J-ground while J-depth means the number of J-ground at different vertical levels. For example, if there are 3 parallel J-grounds in an essay, its J-width is 3; if there are 2 J-grounds at two vertical levels, the J-depth is 2. It is the same case with O-width and O-depth. Argument width or depth reflects a writer’s preference of structure in proceeding argumentation. Moreover, argument width or depth does not indicate argumentation proficiency or writing proficiency, but the writer’s preference of the way to develop argumentation.
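The two measures can be illustrated with a small sketch. It assumes, purely for illustration, that the annotated grounds of an essay are stored as a tree in which each ground may have subordinate grounds; the `Ground` class and both function names are hypothetical and do not come from the coding scheme itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ground:
    """One annotated ground; sub_grounds holds grounds one vertical level below."""
    label: str
    sub_grounds: List["Ground"] = field(default_factory=list)


def argument_width(top_level_grounds):
    """Width: the number of parallel grounds directly supporting the claim."""
    return len(top_level_grounds)


def argument_depth(grounds):
    """Depth: the number of vertical levels of grounds (0 if there are none)."""
    if not grounds:
        return 0
    return 1 + max(argument_depth(g.sub_grounds) for g in grounds)


# Hypothetical essay: three parallel J-grounds, one of which has a subordinate ground.
essay_grounds = [
    Ground("J-ground 1", [Ground("J-ground 1.1")]),
    Ground("J-ground 2"),
    Ground("J-ground 3"),
]
print(argument_width(essay_grounds), argument_depth(essay_grounds))  # 3 2
```

The same functions would apply to O-grounds when computing O-width and O-depth.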

To address the second research question, we focused on qualifier and concession to measure tactfulness. D. Liu and Xiong (2024) expand the denotation of qualifier to cover both the expressions to hedge the claim and the concession to acknowledge the opponents. Considering the EFL writing in this study, we adopted D. Liu et al.’s (2024) concept of concession which could reveal finer features of L2 writing, and separated qualifier in D. Liu and Xiong (2024) into two parts: qualifier and concession. The definitions of argument elements were taken from the two studies, as is shown in Table 4.

Table 4.

Definitions of Elements to Measure Tactfulness.

Element	Definition	Example
Qualifier	Words, phrases, or clauses which can hedge the claim	Online shopping may be an ideal and perfect way of shopping for people in the future.
Concession	Granting the merits of opposing grounds before refuting them	[concession] While security measures have improved significantly over the years, [O-rebuttal] there still exists a small degree of risk for fraudulent activities.

In addition, to measure the quality of qualifier, “qualifier rank” proposed by D. Liu et al. (2024) was also utilized in this study, since qualifier ranks can reveal the degree of tactfulness in expressing claims. Five ranks from 1 to 5 were assigned to the qualifiers. The higher the score, the stronger and tougher the claim sounds, and the less tactful the writer shows.

Rank 1: acknowledging the opposite view + claiming one’s own view+using hedges to moderate the tone (e.g., although this view is reasonable, I think shopping online might have some possible weakness.)

Rank 2: acknowledging the opposite view + claiming one’s own view; or using hedges to moderate the tone (e.g., shopping online might have some possible weakness.)

Rank 3: expressing one’s view in a flat way without any emphatic words or moderate words (e.g., shopping online is dangerous)

Rank 4: making salient the writer’s discourse responsibility (e.g., in my opinion, I hold the view)

Rank 5: adding emphatic words, absolutes and superlatives to the claim (e.g., shopping online is totally a disaster.)

The identification of each Toulmin element was primarily guided by its definition and semantic relations. Linguistic devices indicative of specific components served as a supplement. For example, concessions were recognized through linguistic cues that acknowledge opposing perspectives, such as “Admittedly, there are a few concerns regarding online shopping…” The elements were identified and annotated in the following order: claim > J-ground > J-warrant > J-rebuttal > O-ground > O-warrant > O-rebuttal > concession > qualifier. Once an element was identified, an annotation with its name and number in brackets was written before it, such as [J-ground 1.1], [J-warrant 2.1]. The second author and a Ph.D. candidate in linguistics independently scored all essays, and any discrepancies were resolved through discussion to ensure consistency and accuracy.

To ensure the validity of the identification of all the Toulmin elements, the two coders first discussed the definitions and some typical examples to get familiar with these elements, and then each coded a sample separately. The two coders discussed the discrepancies to clarify the interpretation of the coding rubric. Once a consensus was reached, the remaining essays were coded independently. The scoring of the elements to measure sufficiency and to identify the warrant types depended on their presence or absence. If an element was found to appear twice in an essay, the frequency or the score was 2. The frequency of each Toulmin element was recorded in Microsoft Excel, with column headings representing the six elements and the far-left column indicating the essay group. The inter-coder reliability was measured by Cohen’s Kappa, and the result for each element is as follows: claim (0.96), J-ground (0.94), J-warrant (0.96), J-rebuttal (0.96), O-ground (0.94), O-warrant (0.95), O-rebuttal (1.00), qualifier (0.88), and concession (0.96). The process of identifying warrant types was conducted in a similar way to the Toulmin elements. The two coders independently reviewed all essays to identify the instances of warrants. Each identified warrant was annotated with its corresponding type and a numerical designation in brackets (e.g., [Evaluative Warrant 1], [Descriptive Warrant 2]). The inter-rater reliability before negotiation was 0.89. Any disagreements were systematically reviewed and resolved through discussion until full agreement was achieved. Finally, the total number of each element in each essay was collected as the frequency of this element, as was the frequency of each type of warrant. Because AI-generated essays were systematically longer, all counts of Toulmin elements were normalized for essay length. For each essay, the frequency of each element was divided by the total word count of the essay and multiplied by 100 to yield a rate per 100 words.
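The length normalization and the agreement statistic described above can be sketched as follows. This is an illustrative sketch rather than the procedure actually run in the study: the function names and the example labels are hypothetical, and the unit on which the two coders' decisions were compared is an assumption.

```python
from collections import Counter


def rate_per_100_words(frequency, word_count):
    """Normalize a raw element count to a rate per 100 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return frequency / word_count * 100


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels on the same sequence of units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(coder_a)
    # Proportion of units on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's label distribution.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical essay: 3 J-warrants in a 300-word essay.
print(rate_per_100_words(3, 300))

# Hypothetical coding decisions for four text segments.
coder_1 = ["present", "present", "absent", "absent"]
coder_2 = ["present", "absent", "absent", "absent"]
print(cohen_kappa(coder_1, coder_2))  # 0.5
```

Dividing by each essay's own word count, rather than by a group total, keeps the essay as the unit of analysis for the later rank-based tests.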

After the data collection, Mann-Whitney U tests and Wilcoxon signed-rank tests were employed for the comparisons because our sample size was not large and the data were continuous. Another reason is that the two types of tests do not require a normal distribution of data (it is almost impossible for studies focused on logical and rhetorical levels to produce normally distributed data). The Mann-Whitney U tests were conducted for inter-group comparisons in the use of Toulmin elements and warrant types, whereas the Wilcoxon signed-rank tests were applied to intra-group comparisons between argument width and depth. Although the Mann-Whitney U and Wilcoxon signed-rank tests operate on ranks, descriptive statistics such as the mean (M) and standard deviation (SD) of the raw frequency scores were also reported to provide a clearer summary of central tendency and dispersion within each group. Spearman correlation analyses were used to find out the relations between the Toulmin elements and argument width and depth, and hence to explore the contributing factors to argument width and depth. All analyses were performed via SPSS version 27.
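To make the rank-based inter-group comparison concrete, the sketch below computes a Mann-Whitney U statistic and its normal-approximation Z value in plain Python. It is an illustration under stated assumptions (tie-corrected variance, no continuity correction) and not a reproduction of the SPSS procedure; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, |Z|) using the tie-corrected normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    n = n_a + n_b
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical; Z is undefined")
    z = (u - n_a * n_b / 2) / math.sqrt(variance)
    return u, abs(z)


# Hypothetical per-100-word rates for two small groups of essays.
u_value, z_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_value, round(z_value, 3))  # 0.0 1.964
```

For samples as small as this hypothetical one, an exact test would normally be preferred; the normal approximation is shown only because Z values are what the results tables report.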

Research Results

By comparing the argumentative essays written by the EFL learners with those generated by ChatGPT, the study investigates whether ChatGPT can argue more sufficiently and tactfully than the EFL learners, and whether the two achieve argument sufficiency in the same way.

Argument Elements to Gain Sufficiency

As can be seen in Table 5, J-warrant and O-ground are the only elements that display significant differences between the two groups of essays. Although the writing features of both groups exhibit similarities, ChatGPT demonstrates greater proficiency in constructing O-grounds and in linking J-grounds to claims. In contrast, EFL students tend to leave a significantly higher proportion of J-grounds unlinked.

Table 5.

Comparison of Sufficiency Elements Between the Two Groups.

	EFL essays		ChatGPT essays
Components	Mean	SD	Mean	SD	Z-value	r
J-ground	1.03	0.39	1.16	0.29	1.34	.43
J-warrant	0.55	0.33	1.06	0.42	4.30***	.56
J-rebuttal	0.07	0.20	0.09	0.18	0.78	.10
O-ground	0.69	0.26	0.49	0.26	3.40***	.44
O-warrant	0.25	0.32	0.28	0.27	0.53	.07
O-rebuttal	0.04	0.12	0.04	0.11	0.01	.00

***p < .001.

When it comes to warrant types, Table 6 reveals that among the five types, both EFL learners and ChatGPT predominantly used empirical warrants. Aside from this preference, both used only a limited number of warrants of other types, particularly a priori warrants. However, the differences in the use of evaluative and descriptive warrants between the two groups were statistically significant, showing that EFL learners were better at employing evaluative and descriptive warrants to connect grounds to claims.

Table 6.

Comparison of Warrant Types Between EFL Essays and ChatGPT Essays.

		A priori	Empirical	Institutional	Evaluative	Descriptive
EFL essays	Frequency	1	42	2	11	8
EFL essays	Mean	0.03	1.40	0.07	0.37	0.27
ChatGPT essays	Frequency	0	114	2	1	1
ChatGPT essays	Mean	0.00	3.80	0.07	0.03	0.03
Z-value		1.00	5.51***	0.55	2.98**	2.51*
r		.13	.71	.07	.38	.32

*p < .05. **p < .01. ***p < .001.

Table 7 demonstrates that, for both EFL students and ChatGPT, there were statistically significant differences between J-width and J-depth as well as between O-width and O-depth. Both groups show a preference for J-width and O-width, relying more on J-grounds and O-grounds to support claims.

Table 7.

Comparisons Between Argument Width and Depth.

	EFL essays		ChatGPT essays
Components	Mean	SD	Mean	SD
J-width	1.09	0.40	1.08	0.39
J-depth	0.41	0.08	0.49	0.21
Z-value	4.46***		4.07***
r	0.81		0.74
O-width	0.67	0.27	0.49	0.28
O-depth	0.38	0.13	0.29	0.15
Z-value	3.97***		3.52***
r	0.72		0.64

***p < .001.
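The effect sizes in Table 7 can be cross-checked against the Z-values with a short sketch. It assumes the common convention r = Z / sqrt(N), with N = 30 essays per group for these within-group comparisons; both the formula and the value of N are assumptions inferred from the reported figures rather than statements made in the text, and the function name is hypothetical.

```python
import math


def effect_size_r(z, n):
    """Effect size r = |Z| / sqrt(N) for a rank-based test (assumed convention)."""
    return abs(z) / math.sqrt(n)


N_PER_GROUP = 30  # assumption: 60 essays split evenly between the two groups

# Z-values as reported in Table 7.
table7_z = {
    "EFL J-width vs J-depth": 4.46,
    "ChatGPT J-width vs J-depth": 4.07,
    "EFL O-width vs O-depth": 3.97,
    "ChatGPT O-width vs O-depth": 3.52,
}

for label, z in table7_z.items():
    print(label, round(effect_size_r(z, N_PER_GROUP), 2))
# Prints 0.81, 0.74, 0.72, and 0.64, matching the r row of Table 7.
```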

To investigate the relationship between Toulmin elements and argument width and depth in the ChatGPT-generated and EFL student essays, Spearman correlation analyses were conducted. As presented in Table 8, the EFL students’ J-ground exhibits the strongest positive correlation with J-width (r_s = 0.88), and O-ground also correlates positively with O-width (r_s = 0.83) and O-depth (r_s = 0.78). In the ChatGPT essays, J-ground and J-rebuttal exhibit significant positive correlations with J-width (r_s = 0.48, r_s = 0.60). J-warrant is significantly correlated with J-depth (r_s = 0.52). Moreover, O-ground and O-warrant show strong positive correlations with O-width (r_s = 0.91, r_s = 0.65) and O-depth (r_s = 0.78, r_s = 0.52). Qualifier correlates positively with O-width (r_s = 0.52) and, more weakly, with O-depth (r_s = 0.44), but shows virtually no correlation with J-depth (r_s = −0.06).

Table 8.

Correlation Coefficients Between Toulmin Elements and Argument Width and Depth.

	EFL essays				ChatGPT essays
Toulmin components	J-width	J-depth	O-width	O-depth	J-width	J-depth	O-width	O-depth
J-ground	0.88**	0.36*	0.30	−0.26	0.48**	0.19	−0.31	−0.41*
J-warrant	−0.33	0.06	0.12	0.18	−0.20	0.52**	−0.28	−0.40*
J-rebuttal	0.20	−0.20	−0.08	−0.04	0.60**	−0.01	0.15	0.01
O-ground	0.05	−0.30	0.83**	0.78**	−0.60	0.04	0.91**	0.78**
O-warrant	−0.38*	−0.24	−0.06	0.13	−0.05	0.10	0.65**	0.52**
O-rebuttal	0.10	−0.11	0.00	0.00	−0.31	0.09	−0.07	0.10
Qualifier	0.17	0.31	−0.38*	−0.30	0.09	−0.06	0.52**	0.44*
Concession	0.09	−0.13	−0.01	0.09	−0.34	0.02	−0.09	0.22

*p < .05. **p < .01.
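A Spearman coefficient such as those in Table 8 is the Pearson correlation computed on ranks. The sketch below illustrates the calculation with hypothetical per-essay values; it is not the authors’ SPSS procedure, and the function and variable names are assumptions made for the example.

```python
import math


def average_ranks(values):
    """Rank values 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical per-essay rates of J-ground and the matching J-width scores.
j_ground = [0.8, 1.0, 1.2, 1.5, 0.9]
j_width = [0.7, 1.1, 1.0, 1.6, 0.9]
print(round(spearman_rho(j_ground, j_width), 2))  # 0.9
```

Only the ordering of the essays matters here, so a coefficient near 1 means that essays with more J-grounds also tend to rank higher on J-width, whatever the raw magnitudes.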

Tactfulness

Concession and qualifier are treated as two indicators of tactfulness in this study. Table 9 shows that more concessions were found in the EFL essays than in the ChatGPT-generated ones, whereas the opposite held for qualifiers; both inter-group differences were statistically significant (p < .01 for concession and p < .05 for qualifier). Although these results appear contradictory, a further comparison of the qualifier ranks in Table 10 helps to clarify the picture. The mean rank of qualifier in the EFL essays is significantly lower than that in the ChatGPT-generated ones (3.07 < 3.53). Taken together, the two tables give a complete picture of the use of qualifier: ChatGPT used more qualifiers to restrict the claims, but the quality of its qualifiers was not high. It tended to use more expressions that highlight the writer’s discourse responsibility. The proportion of the first two ranks in the ChatGPT essays is lower than in the EFL ones. In addition, most of the EFL students tended to use the first two ranks of qualifier, as the much lower standard deviation for this group in Table 10 indicates.

Table 9.

Comparison of Tactfulness Elements Between the Two Groups.

	EFL essays		ChatGPT essays
Components	Mean	SD	Mean	SD	Z-value	r
Qualifier	0.06	0.15	0.16	0.17	2.06*	.27
Concession	0.23	0.20	0.11	0.16	2.71**	.35

*p < .05. **p < .01.

Table 10.

Comparison of Qualifier Ranks Between the Two Groups.

	EFL essays		ChatGPT essays
Indicator	Mean	SD	Mean	SD	Z-value	r
Qualifier ranks	3.07	0.37	3.53	0.82	2.92**	.38

**p < .01.

Discussion

Argument Sufficiency and the Way to Gain Sufficiency

As is shown in the research results, ChatGPT can justify the claim more sufficiently than the EFL learners by providing significantly more O-grounds and J-warrants. In Herbold et al.’s (2023) study, ChatGPT performed better in writing argumentative essays than EFL/ESL high school students, especially in scientific language realized by nominalizations and sentence complexity. However, it remains unknown how ChatGPT outperforms human writers beyond lexical and syntactic features. Our study supports and extends their finding by analyzing the argument elements. Higher frequencies of J-warrants and O-grounds indicate more argument sufficiency in justification and opposition, and hence more persuasiveness, as is suggested or demonstrated in related studies such as Crammond (1998), D. Liu and Xiong (2024), and D. Liu and Wan (2020). In those studies, the expert or proficient writers used more J-warrants than the younger or less proficient students. In addition, the definition of J-ground in our study is faithful to the Toulmin model and thus differs from that of previous studies, which counted only examples and evidence. To put it another way, the concept of “ground” in Toulmin’s theory is broader, so more J-grounds have been found in our study.

Both groups of essays show greater J-width than J-depth, and greater O-width than O-depth. As S. Yang et al. (2024, p. 1) comment, AI-generated essays are simplistic in organization and merely present “a list of ideas.” Our findings partly accord with theirs regarding AI’s preference for argument type. Unlike them, however, we do not view this negatively but treat it as a preference at the logical level. Moreover, this preference is shared by the EFL students in our study. This divergence may be ascribed to the different proficiencies of the human writers and the different data sources. The human writers in the study of S. Yang et al. (2024) were three professional IELTS teachers experienced in teaching IELTS writing, which centers on argumentation; in contrast, the human writers in our study were EFL students far less proficient in English argumentation. Moreover, the 50 essays written by only three teachers might bear more obvious idiosyncratic features than essays written by a larger number of students. Thus, it might be hard to generalize the features of human-written essays, as the expert-written essays surpass not only those produced by AI and EFL students, but even those of some other teachers.

The quantity of ground (including J-ground and O-ground) in both groups contributes far more to argument width than to depth. As is shown in the results section of this study, J-ground in both groups correlates with J-width, and O-ground is highly correlated with O-width. However, the two groups differ in the elements affecting argument width. For the ChatGPT essays, J-rebuttal also correlates with J-width, and O-warrant and qualifier are closely related to O-width as well. Argument width is thus influenced by many elements, which suggests orderliness and regularity in text organization. ChatGPT might have been trained on a considerable amount of material on English rhetoric and argumentation, through which it has acquired the conventions and can accordingly produce English-style essays mechanically. In contrast, the argument width of the EFL students is affected mainly by the element of ground. The Chinese EFL students have learned to provide abundant examples, allusions, anecdotes, and evidence to support and illustrate their claims, whether in Chinese or English writing classes and textbooks. Sufficient grounds are deemed an important criterion of good argumentation by college students (D. Liu & Huang, 2021) and textbook writers (D. Liu & Xiong, 2024). It is reasonable in this context that the element of ground acts as the major contributing factor to argument width in the EFL essays.

With regard to the other way to gain argument sufficiency, EFL learners employed many more descriptive warrants than ChatGPT. The descriptive warrant, widely used by Chinese writers and treated as a particular feature of Chinese argumentation, reveals homiletic reasoning (D. Liu & Huang, 2021; D. Liu & Xiong, 2024). The writers subjectively, personally, and even emotionally explain their statements or evidence (J-ground), relying on imagination and association. Argumentation mixed with narration and description (jia xu jia yi) is considered more persuasive, lively, and interesting. For example,

Because of phrases like “eco-friendly” that are shown on the product packaging, many consumers want to satisfy a sense of environmental responsibility. They tend to ignore the product’s quality, even if its real quality is not as good as is displayed online. But when they later realize that the “eco-friendly” label is nothing but a lie, they feel cheated and betrayed and perhaps will never shop online.

The writer vividly described the process of the consumers’ psychological change by depicting the details and using parallelism and emphatic expressions (e.g., nothing but, even if, will never). We cannot assert that the EFL students have transferred Chinese writing strategies. However, they have stored both Chinese and English writing conventions in their minds, and they are at the “rhetorical borderlands” (Mao, 2006, p. 79) of Chinese and English. It is highly possible for them to resort to Chinese rhetorical strategies.

ChatGPT did not bridge the grounds to the claims in the way the EFL students did. Far more empirical warrants, but far fewer evaluative and descriptive warrants, were presented in the ChatGPT essays than in the EFL essays (Table 6). Our research results also lend support to the findings of Jiang and Hyland (2025) from the perspective of logical reasoning. They found that student writers’ essays have a more interactive and personal touch, while the ChatGPT essays show limited use of the personal asides and questions that Hyland identifies as important for personal involvement. S. Yang et al. (2024) similarly found that AI used fewer modal adjuncts to interact with the readers. The fact that AI can generate coherent text but fails to understand and adapt to the context in the way human writers do is attributed to the way AI is trained: AI produces text based on statistical patterns drawn from large training data (Byrd, 2023; Jiang & Hyland, 2025). To summarize, if writer-reader interaction, personal involvement, and subjectivity are viewed as efficient strategies of pathos appeal, AI-produced essays are deficient at both the logical and lexical levels.

Tactfulness in Argumentation

Both qualifier and concession have a weakening effect. A qualifier makes a claim less subjective, and a concession renders a refutation less straightforward. More concessions are found in the EFL essays than in the ChatGPT-generated ones, although ChatGPT uses more qualifiers to restrict the claims. Moreover, concession shows no significant correlation with argument width or depth, which indicates the independence of this element. In contrast to our finding, S. Yang et al. (2024, p. 1) contend that “the machine tends to use more concession signals to repeat information.” The incongruous results might be attributed to a divergence in the definition of concession. S. Yang et al. (2024) focused on the lexical level and identified concession by the word “while.” We define concession in this study as acknowledging opposing grounds. Concession can take various forms besides a clause beginning with “while,” for example, “that sounds reasonable,” “there is something in it,” and “I agree with them on this point.” These expressions convey new information with a positive attitude, instead of repeating the given information. When writers show identification with readers who hold different views, rather than refuting them immediately, they are more likely to persuade those readers to accept their claims.

Human writers in this study are more apt to consider the readers’ emotional needs and adopt tactful strategies. D. Liu and Xiong (2024) find that Chinese writers who are more cognitively mature and socially experienced are more skillful at tempering straightforwardness with propriety, since straightforwardness in communication is likely to be viewed as rudeness and hence to cause interpersonal conflict. The EFL students have acquired this social convention through socialization and could apply it to argumentative writing. In contrast, ChatGPT might not have been trained adequately in the social aspect, so it is not adept at employing the tactful concession strategy.

The other factor that may increase tactfulness is the use of qualifiers to moderate the tone of the claim. The ChatGPT essays display more qualifiers to make the claims objective and rigorous. AI-generated language has been found in previous studies to be more accurate in grammar (Markey et al., 2024; Mizumoto et al., 2024), rich in vocabulary (Herbold et al., 2023), and even objective and formal in style (Dilai & Dilai, 2025; Jiang & Hyland, 2025; S. Yang et al., 2024). Objectivity is viewed negatively in those studies as impersonal and less readable. However, our finding differs from theirs in that we consider only the Toulmin qualifier in the claim, not tentative or modal words in the whole essay. Objectivity revealed by qualifiers in argumentation is regarded as a style encouraged in argumentative writing or discussion (Toulmin et al., 1978). The use of a qualifier indicates that the writers have already examined or at least considered different views, rather than concentrating only on the my-side view, and accordingly draw a conclusion with probability instead of certainty. A claim without a qualifier sounds arbitrary and subjective, and might undermine the justification.

Tactfulness is complicated in this study. Neither group is perfectly tactful; both are at least less tactful than the participants in D. Liu et al.’s (2024) study. This could be attributed to the fact that their experiment was designed to teach tactful strategies and to activate and reinforce the schema. In our study, AI is even less tactful than the EFL students despite its higher frequency of qualifiers. We speculate that AI is more obedient to the instructions in its training, so that the ChatGPT essays are more complete in form (e.g., more uses of qualifiers). In addition, human children usually take years to acquire a certain culture, including its rhetoric, conventions, and pragmatic use. During this process they are gradually socialized. Once they have acquired the culture, it can be deeply rooted and can even affect foreign language writing, for example through rhetorical transfer, translingual writing, and hybrid rhetoric. AI, however, needs a great amount of input and training before showing any culture-specific features. At present, the input and training imposed on AI fall within the Western rhetorical framework. This differs from human learners, who begin with their mother tongue and then learn the foreign language. When human learners write in the foreign language, they can employ the rhetorical strategies acquired in their mother tongue and even adjust the strategies to the context. This might constitute a weakness of AI at present.

Conclusion

Focusing on logical reasoning and rhetorical strategy, this study compared argumentative essays written by EFL students and by ChatGPT. The following conclusions can be drawn tentatively. Firstly, ChatGPT outperformed the EFL students in argument sufficiency. It did not achieve argument sufficiency in exactly the same way as the human writers: ChatGPT relied almost solely on empirical warrants. Secondly, ChatGPT in general showed less tactfulness than the EFL students, despite its higher frequency of qualifiers.

Some previous studies have found that although AI-generated texts are useful for producing structured and coherent content, they fall short in areas requiring human creativity, originality, ethical accountability, and nuanced voice. Our study provides further supporting evidence: AI can imitate human writers’ preference in argument style and excels in logical reasoning, but it is weaker in adapting rhetorical strategy to the context and the readers. Human writers have emotions and can hardly refrain from revealing them in communication. In certain cultures especially, writers are encouraged to appeal to emotion to enhance persuasiveness. Modern English writing conventions place more weight on reason and logical reasoning than on emotional appeal, as seen, for example, in the writing textbooks compiled by Langan (2007), Kanar (2011), and Johnson-Sheehan and Paine (2013). When given enough input, AI can perform mechanically according to the instructions, regardless of the rhetorical aspect.

This study has significant pedagogical implications, particularly for teachers of writing. As AI tools become more integrated into educational environments, instructors face the challenge of determining how to incorporate AI effectively into their teaching practices. This study offers practical insights into how AI can be used to complement traditional teaching methods, helping students develop and enhance their writing skills. Specifically, as AI excels in gaining argument sufficiency by generating more grounds and warrants, teachers may consider using AI to open a topic, broaden the students’ view, and enrich their ideas in argumentation training and practice. However, students should be dissuaded, and even restricted, from relying on AI so heavily that they produce stiff texts lacking persuasiveness, since AI shows less tactfulness in argumentation. Another implication for teaching lies in the identification of plagiarism. Tactfulness in argumentation may make it easier for teachers to differentiate AI-generated texts from human-written ones. In summary, teachers are encouraged to reflect on how best to integrate AI in their classrooms, ensuring that their students not only benefit from technological advancements but also continue to cultivate their communicative abilities.

The major limitation of this study is the small sample size. Only 60 essays were selected from the corpora, although this sample size aligns with those used in previous corpus-based and discourse-analytic studies of argumentative writing. To guarantee the validity of the group comparisons, all essays were strictly evaluated under a standardized scoring rubric, and detailed quantitative analyses were conducted in the hope of providing a robust basis for the findings reported. However, a larger sample would be more statistically powerful. It is recommended that more corpora be built and used for such comparisons between AI and human writers.

Footnotes

Appendix 1

Acknowledgements

Thanks go to Miss Yuzhu Wei, a PhD student at Southeast University, who helped with our data collection.

ORCID iD

Donghong Liu

Ethical Considerations

This study obtained ethical approval from the Research Ethics Committee of the School of Foreign Languages in Southeast University. All methods performed in the study were carried out in accordance with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This study used two parallel open corpora and involved no human or animal studies.

Author Contribution

Donghong Liu: Conceptualization, Methodology, Writing—Original Draft, Writing—Review & Editing, Supervision, Funding acquisition. Pengfei Wu: Methodology, Validation, Formal analysis, Investigation, Writing—Original Draft. All authors have read and agreed to the revised version of the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by the National Social Science Fund of China (Grant No. 21FYYB016).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data source used in this study can be accessed at http://114.251.154.212/cqp/teccl/ and . Raw data supporting this study’s findings are available from the corresponding author, upon reasonable request.

References

Abdollahzadeh, Amini Farsani, & Beikmohammadi (2017). Argumentative writing behavior of graduate EFL learners. Argumentation, 31(4), 641–661. https://doi.org/10.1007/s10503-016-9415-5

Al Fraidan (2025). AI and uncertain motivation: Hidden allies that impact EFL argumentative essays using the Toulmin Model. Acta Psychologica, 252, 104684. https://doi.org/10.1016/j.actpsy.2024.104684

Berger, R. E. (2014). A scientific approach to writing for engineers and scientists. Wiley.

Byrd (2023). Truth-telling: Critical inquiries on LLMs and the corpus texts that train them. Composition Studies, 51(1), 135–142.

Chatterjee & Dethlefs (2023). This new conversational AI model can be your friend, philosopher, and guide … and even your worst enemy. Patterns, 4(1), 100676. https://doi.org/10.1016/j.patter.2022.100676

Cheng & Chen (2009). Taiwanese argumentation skills: Contrastive rhetoric perspective. Taiwan International ESP Journal, 1(1), 23–50.

Crammond, J. G. (1998). The uses and complexity of argument structures in expert and student persuasive writing. Written Communication, 15(2), 230–268. https://doi.org/10.1177/0741088398015002004

Dilai (2025). AI vs humans: A corpus-based study of English academic writing. In Banas & Wlazlak (Eds.), Language, Technology, humanities in Society 5.0 (Vol. 11, pp. 145–160). V & R Unipress.

Guo, Pan, & Lai (2024). Effects of an AI-supported approach to peer feedback on university EFL students’ feedback quality and writing ability. Internet and Higher Education, 63, 100962. https://doi.org/10.1016/j.iheduc.2024.100962

Herbold, Hautli-Janisz, Heuer, Kikteva, & Trautsch (2023). A large-scale comparison of human-written versus ChatGPT-generated essays. Scientific Reports, 13(1), 18617. https://doi.org/10.1038/s41598-023-45644-9

(2024). The metaphor of AI in writing in English: A reflection on EFL learners’ motivation to write, enjoyment of writing, academic buoyancy, and academic success in writing. International Review of Research in Open and Distributed Learning, 25(3), 271–286. https://doi.org/10.19173/irrodl.v25i3.7769

Jiang & Hyland (2025). Does ChatGPT write like a student? Engagement markers in argumentative essays. Written Communication, 42(3), 463–492. https://doi.org/10.1177/07410883251328311

Johnson, R. H., & Blair, J. A. (1994). Logical Self-defense. McGraw Hill.

Johnson-Sheehan & Paine (2013). Writing today. Pearson Education Inc.

Kanar, C. C. (2011). The college writer. Beijing University Press.

Langan (2007). College writing skills with readings. Foreign Language Teaching and Research Press.

Liu & Gan (2024). Activating argumentation schema to write argumentatively and tactfully. Acta Psychologica, 246, 1–10. https://doi.org/10.1016/j.actpsy.2024.104256

Liu & Huang (2021). Rhetoric construction of Chinese expository essays: Implications for EFL composition instruction. Sage Open, 11(1), 1–10. https://doi.org/10.1177/2158244020988518

Liu & Wan (2020). What makes proficient writers’ essays more persuasive? A Toulmin perspective. International Journal of TESOL Studies, 2(1), 1–13. https://doi.org/10.46451/ijts.2020.06.01

Liu & Xiong (2024). Keeping balance between loyalty and modification: A Toulminian model as analytical framework. Humanities and Social Sciences Communications, 11(1), 1–11. https://doi.org/10.1057/s41599-024-03151-w

Liu & Stapleton (2014). Counterargumentation and the cultivation of critical thinking in argumentative writing: Investigating washback from a high-stakes test. System, 45, 117–128. https://doi.org/10.1016/j.system.2014.05.005

Liu & Stapleton (2020). Counterargumentation at the primary level: An intervention study investigating the argumentative writing of second language learners. System, 89, 1–15. https://doi.org/10.1016/j.system.2019.102198

Cho, S. J. (2023). Empowering Chinese language learners from low-income families to improve their Chinese writing with ChatGPT’s assistance afterschool. Languages, 8(4), 238. https://doi.org/10.3390/languages8040238

McCann (1989). Student argumentative writing knowledge and ability at three grade levels. Research in the Teaching of English, 23, 62–76.

Majidi, A. E., Janssen, & de Graaff (2021). The effects of in-class debates on argumentation skills in second language education. System, 101, 1–15. https://doi.org/10.1016/j.system.2021.102576

Mao (2006). Reading Chinese Fortune Cookie: The making of Chinese American rhetoric. Utah State University Press.

Markey, Brown, D. W., Laudenbach, & Kohler (2024). Dense and disconnected: Analyzing the sedimented style of ChatGPT-generated text at scale. Written Communication, 41, 571–600. https://doi.org/10.1177/07410883241263528

Mizumoto, Yasuda, & Tamura (2024). Identifying ChatGPT-generated texts in EFL students’ writing: Through comparative analysis of linguistic fingerprints. Applied Corpus Linguistics, 4(3), 100106. https://doi.org/10.1016/j.acorp.2024.100106

Nazari, Shabbir, M. S., & Setiawan (2021). Application of Artificial Intelligence powered digital writing assistant in higher education: Randomized controlled trial. Heliyon, 7(5), e07014. https://doi.org/10.1016/j.heliyon.2021.e07014

Nussbaum, E. M., & Kardash, C. A. M. (2005). The effects of goal instructions and text on the generation of counterarguments during writing. Journal of Educational Psychology, 97, 157–169.

Qin & Karabacak (2010). The analysis of Toulmin elements in Chinese EFL university argumentative writing. System, 38(3), 444–456. https://doi.org/10.1016/j.system.2010.06.012

Song (2023). Enhancing academic writing skills and motivation: Assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students. Frontiers in Psychology, 14, 1–14. https://doi.org/10.3389/fpsyg.2023.1260843

Stapleton (2015). Assessing the quality of arguments in students’ persuasive writing: A case study analyzing the relationship between surface structure and substance. Journal of English for Academic Purposes, 17, 12–23. https://doi.org/10.1016/j.jeap.2014.11.006

Toulmin (1958). The uses of argument. Cambridge University Press.

Toulmin (Ed.). (1999). The uses of argument (updated ed.). Cambridge University Press.

Toulmin, Rieke, & Janik (1978). An Introduction to reasoning. Macmillan Publishing Co, Inc.

Van Eemeren, F. H., & Henkemans, A. F. S. (2017). Argumentation: Analysis and evaluation. Routledge Taylor and Francis.

Wei (2023). Artificial intelligence in language instruction: Impact on English learning achievement, L2 motivation, and self-regulated learning. Frontiers in Psychology, 14, 1–14. https://doi.org/10.3389/fpsyg.2023.1261955

Xue (2015). Ten-thousand English compositions of Chinese learners (The TECCL corpus), version 1.1. The National Research Centre for Foreign Language Education, Beijing Foreign Studies University.

Sun (2023). aiTECCL: An AIGC English Essay Corpus. National Research Centre for Foreign Language Education, Beijing Foreign Studies University.

Yang (2022). An empirical study of claims and qualifiers in ESL students’ argumentative writing based on Toulmin model. Asian-Pacific Journal of Second and Foreign Language Education, 7(1), 1–17. https://doi.org/10.1186/s40862-022-00133-w

Yang, Chen, Zhu, Lin, & Wang (2024). A comparative study of thematic choices and thematic progression patterns in human-written and AI-generated texts. System, 126, 103494. https://doi.org/10.1016/j.system.2024.103494

Can AI Argue More Sufficiently and Tactfully?
