Abstract

AI technologies play a transformative role, making the clear communication of related advancements increasingly critical. Yet, the use of metadiscourse in technology news remains underexplored, particularly in how linguistic strategies are tailored to diverse audiences. This study investigates metadiscourse features in AI-related news articles, analyzing their interactive and interactional dimensions across headlines, subheadings, and body text. Using a corpus of 1,084 MIT News articles from 2015 to 2024, and applying Hyland’s metadiscourse framework, the study reveals distinct patterns across sections. Interactional features, such as hedges, boosters, and engagement markers, are most prevalent in the body, where they balance authority and audience connection, while the sub-header exhibits the highest relative density of such features. Interactive markers are most concentrated in sub-headers, with transitions dominating across all sections to aid coherence and reader orientation. Correlation analysis shows the strongest co-occurrence of both dimensions in the body, supporting complex argumentation, while headers and sub-headers emphasize clarity over the author’s stance. Temporal analysis indicates a marked decline in both metadiscourse dimensions during the 10-year period, reflecting a shift toward more neutral, concise, and standardized reporting shaped by digital reading habits and AI-assisted writing tools. These findings highlight the adaptive rhetorical strategies in AI journalism and demonstrate the importance of metadiscourse in framing information for effective public understanding.

Keywords

metadiscourse analysis, discourse analysis, AI news, AI technology communication, linguistic strategies

Introduction

In an era of rapid technological advancement, journalism covering AI (artificial intelligence) and emerging technologies plays a vital role in shaping public understanding of complex scientific developments. This news not only reports innovations but also interprets technical content for non-specialist readers, making rhetorical clarity and audience engagement essential. Within this context, metadiscourse, the linguistic devices used to organize discourse and engage readers, serves as a critical tool for managing complexity and projecting stance (Hyland, 2005; Hyland & Tse, 2004).

One of the most applied models of metadiscourse is Hyland’s (2005) interpersonal framework, which distinguishes between two complementary dimensions: interactive and interactional resources. Interactive metadiscourse helps writers organize information and guide readers through the text using transitions, frame markers, and evidentials. Interactional metadiscourse, in contrast, allows writers to express stance and engage readers through features such as hedges, boosters, self-mentions, and engagement markers. This model provides a comprehensive means of analyzing how writers manage both textual organization and reader interaction.

Building on Hyland’s (2005) model, recent surveys have mapped the expansion of metadiscourse research across genres, disciplines, and languages. Hyland et al. (2022), using a large-scale bibliometric analysis of 431 studies published between 1983 and 2020, show that the concept has become a central tool in understanding written communication, particularly in academic and business contexts. They also highlight a marked increase in studies focusing on interactional features, such as stance and engagement, alongside growing attention to cross-disciplinary and corpus-based approaches. Complementing this, S. Wang (2025) provides a cross-cultural survey of metadiscourse in academic, media, and business discourse. Her review emphasizes that while academic texts remain the most studied genre, research on media discourse, including editorials and commentaries, has begun to reveal important cultural and rhetorical variation, particularly in how metadiscourse is used to construct identity and manage audience engagement. Despite these developments, the analysis of metadiscourse in journalism, particularly in news reporting, remains underrepresented compared with other genres such as academic writing. Moreover, although metadiscourse has been examined in a variety of media genres, including commentaries and editorials (e.g., Al-Subhi, 2023; Chen & Li, 2023), journalistic discourse in high-technology domains such as AI remains underexplored.

News reporting on artificial intelligence presents distinctive communicative challenges, as it seeks to translate complex and often abstract scientific concepts into accessible narratives while addressing ethical, social, and epistemic uncertainties. Ouchchy et al. (2020) note that media representations of AI often foreground ethical issues such as accountability, transparency, and societal impact. Similarly, Nguyen and Hekman (2024) emphasize that journalistic portrayals of AI are shaped by framing practices that influence how automation and intelligent systems are understood by the public.

Moreover, previous research has rarely investigated metadiscourse variation across the internal structure of news articles. Headlines, sub-headlines, and article bodies perform distinct rhetorical functions—capturing attention, bridging or summarizing key ideas, and elaborating content respectively—but most studies have treated news texts as homogeneous wholes. For instance, Aszeli et al. (2021) applied Hyland’s model to news articles on the impact of COVID-19 on education without differentiating among textual sections, while Finkbeiner (2024) highlights the distinct pragmatic and rhetorical constraints that characterize headlines. These observations emphasize that the linguistic mechanisms through which metadiscourse operates across sections, headline, sub-headline, and body, remain insufficiently explored in current research.

Recent studies have also highlighted the importance of diachronic perspectives in understanding shifts in communicative norms. For instance, Aghdam et al. (2025) documented temporal variation in metadiscourse markers across research articles, while H. Wang and Hu (2023) examined changes in self-mention and stance in academic prose. However, such longitudinal approaches have yet to be applied to technology journalism. Given AI’s evolution from a niche scientific topic to a mainstream subject of public discourse, exploring metadiscourse diachronically can reveal how journalistic strategies have adapted to changing norms, audience expectations, and societal debates. Collectively, these topical, structural, and temporal gaps highlight the need for a systematic investigation of metadiscourse in AI news reporting.

This study therefore investigates how metadiscourse devices are used in news articles about AI technologies, focusing exclusively on this genre without comparisons to other types of reporting. AI news articles must balance technical complexity with accessibility, making metadiscourse a key resource for effective communication with diverse audiences. The central objective of this study is to analyze how interactional and interactive metadiscourse features are employed across different sections of news articles, namely, the headline, sub-headline, and body. Each section performs a distinct communicative function: headlines capture attention, sub-headlines bridge or summarize key ideas, and article bodies deliver the full content. Examining metadiscourse across these sections reveals how writers structure information, express stance, and engage readers in distinctive ways. Additionally, by examining changes in metadiscourse use over a 10-year period (2015–2024), this study traces how journalistic strategies have evolved in response to shifting norms in science communication and broader trends in public discourse.

To fulfill these aims, the study addresses the following four research questions:

- Interactional Dimension (RQ1): How do AI-related articles use interactional metadiscourse strategies, across different sections of news articles, to convey authorial stance and promote audience engagement?

- Interactive Dimension (RQ2): How are interactive metadiscourse features used in AI-related articles, across different sections of news articles, to organize information and aid reader comprehension?

- Correlated Relation (RQ3): How do interactional and interactive metadiscourse dimensions correlate across different sections of news articles about AI technologies?

- Temporal Changes (RQ4): How does the use of metadiscourse devices in both dimensions, interactive and interactional, evolve in news articles on AI technologies from 2015 until 2024?

In light of the research aim and questions, this study makes four main contributions. It examines the use of metadiscourse in AI-related news, an underexplored genre within media discourse research. It represents the first section-based application of Hyland’s (2005) framework, analyzing metadiscourse patterns across headlines, sub-headlines, and article bodies. It also provides a diachronic perspective (2015–2024) on how metadiscourse practices in AI reporting have evolved over time. Finally, it introduces a validated, tool-assisted methodological approach that enhances the transparency and replicability of large-scale discourse analysis.

The findings of this study are expected to benefit both academic and professional communities. For scholars in linguistics, media studies, and communication, the results offer deeper insights into rhetorical practices in technology-oriented journalism. For journalists and science communicators, the study provides practical guidance for refining reporting strategies and enhancing audience engagement when presenting complex AI topics.

Literature Review

Metadiscourse Framework

Metadiscourse is a concept that has attracted considerable attention across a range of disciplines. Since its emergence in the early 1980s, metadiscourse has gradually evolved into a prominent approach for examining discourse, with a particular emphasis on the analysis of written texts (Hyland et al., 2022). It refers to the different linguistic devices writers employ to organize their texts, engage with readers, and express their attitudes and opinions (Hyland, 2005). As Ädel (2006) points out, metadiscourse is “text about the evolving text, or the writer’s explicit commentary on her own ongoing discourse” (p. 20). The study of metadiscourse has become increasingly important in understanding how writers guide readers through their texts, establish credibility, and create a sense of interaction and engagement (Hyland & Tse, 2004).

Within linguistics, metadiscourse has been examined from several perspectives, including discourse analysis, pragmatics, and applied linguistics (Flowerdew, 2008). One of the earliest systematic frameworks was proposed by Kopple (1985), who identified seven categories of metadiscourse: text connectives, code glosses, illocution markers, validity markers, narrators, attitude markers, and commentary. This model was later refined by Crismore et al. (1993), who simplified the system into two overarching categories: textual and interpersonal metadiscourse.

Over the past four decades, the study of metadiscourse has evolved from structural classifications toward socially oriented models that foreground the relationship between writer and reader. The term was introduced by Harris (1959) and operationalized by scholars such as Crismore (1983) and Kopple (1985), whose early frameworks distinguished between propositional content and discourse-level commentary. Later, Mauranen (1993) and Ädel (2008) developed narrower, text-reflexive definitions focusing primarily on organizational cues. In contrast, Hyland’s (2005) interpersonal model reconceptualized metadiscourse as a means of managing both information and interaction, emphasizing its social, audience-aware nature. This model remains one of the most influential and widely applied frameworks in metadiscourse research.

According to Hyland’s (2005) interpersonal model, metadiscourse comprises two complementary dimensions: interactive and interactional. Interactive metadiscourse assists readers in processing the text by guiding them through its structure and argumentation. It includes transitions, frame markers, endophoric markers, evidentials, and code glosses. Interactional metadiscourse, on the other hand, expresses the writer’s stance and engagement with the audience through devices such as hedges, boosters, attitude markers, self-mentions, and engagement markers. Together, these categories capture how writers organize their discourse while simultaneously negotiating their relationship with readers.

The metadiscourse framework has been applied across a wide range of genres, including academic writing (Hyland & Tse, 2004), newspaper discourse (Dafouz-Milne, 2008), social media (Feng et al., 2024), and language teaching (Pérez-Llantada, 2003). Research has also explored disciplinary variation (Hyland, 2005) and cross-cultural differences in metadiscourse use (Dahl, 2004; Mauranen, 1993). Collectively, these studies demonstrate that metadiscourse plays a key role in organizing information, constructing stance, and facilitating interaction between writers and readers. A clear understanding of these features helps explain how writers create coherent, persuasive, and reader-friendly texts across diverse communicative contexts.

A Glance at Metadiscourse Studies on Journalism

Recent years have witnessed a growing linguistic and discourse-analytic interest in journalism and news reporting, reflecting how language both constructs and negotiates social realities. One strand of research focuses on the rhetorical and stylistic strategies employed in news discourse to build authority and engage audiences. For instance, Ben Moussa (2025) examined journalistic metadiscourse in the United Arab Emirates, showing how rhetorical practices are used to construct professional identity and credibility. Similarly, Chen and Li (2023) conducted a corpus-based study of China Daily and The New York Times, analyzing interactional metadiscourse in news commentaries. Their findings reveal that journalists use stance and engagement markers not only to evaluate events but also to guide readers’ interpretations and align them with institutional viewpoints. Following Hyland’s (2005) model, Al-Subhi (2023) investigated metadiscourse and phraseology in newspaper editorials covering the Russia–Ukraine War, demonstrating how interactional resources are strategically employed to express evaluation and to construct ideological positioning. Collectively, these studies highlight the expanding application of metadiscourse analysis to journalism, while also pointing to the need for further investigation of how such features operate across different news genres and textual sections.

Beyond specific case studies, recent reviews have synthesized broader developments in metadiscourse research. S. Wang (2025) provided a comprehensive cross-cultural overview of metadiscourse across academic, media, and business genres, emphasizing that while academic texts remain the most extensively studied, media discourse is gaining increasing attention as a site for examining audience engagement and identity construction. This shift reflects a widening understanding of metadiscourse as a flexible, socially grounded resource that operates across genres and communicative contexts.

In addition, scholars have begun to explore metadiscourse in relation to large-scale digital communication. Aghdam et al. (2025) examined diachronic variations in metadiscourse markers in academic discourse, offering methodological insights for longitudinal analyses of linguistic change, an approach relevant to examining the evolution of journalistic discourse. Similarly, Yang (2025) investigated interactive and interactional metadiscourse in computational linguistics abstracts, illustrating how discourse practices vary across disciplinary and technological domains.

In addition, Chang (2025) examined interactional metadiscourse use across media agency types in online content farms, revealing variation in how stance and engagement are adapted to institutional and audience expectations. Similarly, Hastomo and Aminatun (2023) analyzed metadiscourse markers in online news media, showing that digital journalism favors concise yet rhetorically dense language to maintain coherence and reader engagement.

Taken together, these studies reflect an expanding scholarly interest in the linguistic dimensions of journalism and underscore the central role of metadiscourse in shaping news credibility, stance, and reader engagement. News discourse draws on specific rhetorical resources, such as stance-taking, evidentiality, and engagement markers, to construct authority and guide interpretation. Nevertheless, metadiscourse in specialized domains such as AI journalism remains almost entirely unexplored. To date, no studies have examined how journalists linguistically negotiate the balance between technical complexity and accessibility, or how metadiscourse operates across different textual sections of AI-related news articles. A systematic investigation of these patterns can yield valuable insights into how language mediates between scientific discourse and public understanding in contemporary technology reporting.

Research Design

The study applied a systematic mixed-method research design to identify and analyze metadiscourse features in AI-focused news articles. The process combined corpus compilation, computational annotation, expert validation, and statistical analysis to ensure both linguistic accuracy and methodological rigor. The overall procedures of the study are illustrated in Figure 1, which outlines the three main phases of the research design. The following subsections describe each phase in detail, highlighting how the dataset was constructed, annotated, and quantitatively examined.

Figure 1.

Overview of the research design illustrating the three main phases of the study.

Step 1: Dataset Sampling

The first step involved identifying the criteria for selecting a corpus suitable for metadiscourse analysis in technology-oriented news reporting. As the study examines how artificial intelligence is represented linguistically, the focus was placed on technology-based and institutionally curated news platforms that provide consistent and credible coverage of emerging technologies. Selecting such sources ensures textual uniformity, topical relevance, and methodological reliability in analyzing discourse practices (Bednarek & Caple, 2017; Hunston, 2022). Articles were included if they explicitly discussed AI-related topics, were written in English, and contained clearly defined textual sections—headlines, sub-headlines, and body text—that could be annotated for metadiscourse features. These inclusion criteria establish a transparent and replicable procedure for constructing datasets in similar studies of technology-related journalism.

Step 2: Identification of Metadiscourse Features

Hyland’s (2005) metadiscourse framework was used to categorize metadiscourse features into interactive and interactional dimensions. This framework was selected because it provides a systematic and widely recognized model for examining writer–reader interaction and rhetorical structure in discourse, making it well suited for analyzing journalistic texts (Hyland et al., 2022). Interactive features are used to analyze how authors structure and organize content, and interactional features are used to explore how writers engage readers and express their stance. See Table 1 for the interactive and interactional types of metadiscourse adopted in this study.

A text processing script was developed in Python as a replication of an existing metadiscourse annotation tool, MetaPak, originally developed by Berberich and Kleiber (2025). The script used the predefined keyword list provided by the original tool to automatically identify and annotate interactive and interactional metadiscourse features within the MIT-News corpus. These features and their associated markers are summarized in Table 1. The script functions as a keyword-based recognizer, scanning the text for linguistic items associated with each metadiscourse category. Following automatic annotation, a manual verification step was conducted to ensure the accuracy and consistency of the annotations across the dataset.
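The keyword-based recognition described above can be illustrated with a minimal sketch. It is not the study's script or MetaPak itself: the mini-lexicon below is a hypothetical subset drawn from the examples in Hyland's framework, and the function name `annotate` is an assumption made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; the study relied on the
# full keyword list distributed with the original annotation tool.
LEXICON = {
    ("interactional", "hedges"): ["might", "perhaps", "possibly", "suggests"],
    ("interactional", "boosters"): ["clearly", "obviously", "indeed"],
    ("interactive", "transitions"): ["however", "therefore", "in addition"],
    ("interactive", "code glosses"): ["namely", "in other words", "such as"],
}


def annotate(text):
    """Return (dimension, feature, marker, start offset) for each lexicon match."""
    hits = []
    for (dimension, feature), markers in LEXICON.items():
        for marker in markers:
            # Word boundaries prevent matches inside longer words.
            pattern = r"\b" + re.escape(marker) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((dimension, feature, marker, match.start()))
    # Sort by position so the annotations follow the reading order of the text.
    return sorted(hits, key=lambda hit: hit[3])


sample = "However, the model might fail on rare inputs, such as noisy images."
annotations = annotate(sample)
counts = Counter(dimension for dimension, _, _, _ in annotations)
```

A purely lexical matcher of this kind cannot tell whether an item is functioning as metadiscourse in context, which is why a manual verification step is a natural complement.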

The results obtained from the annotation tool were further reviewed and evaluated by two linguistic experts who are familiar with Hyland’s framework. In the manual verification step, the annotators were provided with Hyland’s (2005) metadiscourse framework, including its main categories and feature descriptions (as summarized in Table 1). A random sample comprising nearly 5% of the total articles was selected for expert validation. The sample size was chosen to provide a manageable yet representative subset for verification: because manual inspection of an entire 10-year corpus of articles would be infeasible, a random subset was employed to ensure both feasibility and representativeness, following methodological recommendations in corpus linguistics (Hunston, 2022; McEnery & Hardie, 2011).
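One way to draw such a validation sample reproducibly is sketched below. The per-year stratification, the article identifiers, and the function name `sample_for_validation` are assumptions for illustration; the source states only that the sample was random.

```python
import random


def sample_for_validation(ids_by_year, per_year, seed=0):
    """Draw up to `per_year` article IDs from each year with a fixed seed."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    sample = {}
    for year, ids in sorted(ids_by_year.items()):
        k = min(per_year, len(ids))  # never request more than a year contains
        sample[year] = sorted(rng.sample(ids, k))
    return sample


# Hypothetical article identifiers for two years of a corpus.
ids_by_year = {
    2015: [f"2015-{i:03d}" for i in range(29)],
    2016: [f"2016-{i:03d}" for i in range(35)],
}
picked = sample_for_validation(ids_by_year, per_year=6, seed=42)
```

Fixing the seed means the same subset can be regenerated later, which supports the replicability of the validation step.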

Table 1.

Overview of Hyland’s (2005) Metadiscourse Framework, Categorizing Interactional and Interactive Features, With Descriptions and Examples of Their Use in Text.

Category/feature	Description	Examples
Interactional
Attitude markers	Express the writer’s attitudes, feelings, or evaluations toward a proposition.	Unfortunately, surprisingly, importantly, I agree, it is clear that …
Boosters	Indicate certainty or emphasize the writer’s confidence in a statement.	Clearly, obviously, indeed, undoubtedly, of course …
Engagement markers	Directly address or involve readers, promoting a relationship between writer and reader.	Consider, note that, you can see, we should …
Hedges	Mitigate certainty, soften claims, or express caution to allow for alternative viewpoints.	Might, perhaps, possibly, suggests, it seems …
Self-mention	Explicit reference to the author(s) to highlight their presence in the text.	I, we, my, our …
Interactive
Code glosses	Provide additional information or clarification to ensure understanding.	Namely, in other words, such as, that is …
Endophoric markers	Refer to other parts of the text to guide readers to relevant information.	As noted above, see Fig. 2, in the next section …
Evidentials	Indicate the source of information or evidence for a claim.	According to X, as stated by Y, Z argues that …
Frame markers: Announce goals	Indicate the purpose or objective of a section or text.	The aim of this study is, in this paper we argue that …
Frame markers: Label stages	Clarify the structure and stages of the discourse.	First, next, finally, in conclusion …
Frame markers: Sequencing	Indicate the order of points or arguments within the text.	Firstly, secondly, thirdly …
Frame markers: Shift topic	Signal a change in topic or focus within the text.	Turning to, moving on, with respect to …
Transition markers	Help connect ideas by indicating relationships between clauses or sentences.	However, therefore, in addition, consequently, on the other hand …

Each annotation record contained the article ID, the associated text (headline, sub-headline, or body), the feature category (interactional or interactive), the type of identified marker, and the contextual excerpt where the feature occurred. The experts independently verified each annotation by indicating “Yes” (correct) or “No” (incorrect) for each instance and provided comments where needed. Their responses were then compared to calculate inter-rater agreement using Cohen’s Kappa (κ). Any cases of disagreement were subsequently reviewed and discussed to refine category interpretation and ensure the validity of the final dataset.
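For readers unfamiliar with the statistic, Cohen’s Kappa corrects the observed agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it from two lists of Yes/No judgments; the judgments are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n**2
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented judgments on ten annotations ("Yes" = correct, "No" = incorrect).
expert_1 = ["Yes"] * 7 + ["No"] * 3
expert_2 = ["Yes"] * 6 + ["No"] * 3 + ["Yes"]
kappa = cohens_kappa(expert_1, expert_2)  # observed 0.80, expected 0.58
```

In this toy case the raters agree on 8 of 10 items, yet κ is only about .52, because two raters who both say “Yes” most of the time would often agree by chance alone.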

Step 3: Statistical Analysis Plan

To determine whether the differences in the frequency of metadiscourse features across sections (headers, sub-headers, and body) were statistically significant, Chi-Square tests of independence were conducted. This test is needed to confirm the feature-frequency results obtained for RQ1 and RQ2. It assessed the association between two categorical variables: the section of the article (header, sub-header, or body) and the type of metadiscourse feature (e.g., hedges, boosters, engagement markers, attitude markers, transitions). The analysis was performed for all metadiscourse features combined, as well as separately for the features of each dimension (interactional and interactive). This made it possible to determine whether the observed variations in the use of these features across sections were due to chance or represented meaningful differences. The Chi-Square test was chosen because it is suitable for evaluating associations between categorical variables (McEnery & Hardie, 2011).
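The logic of the test can be sketched in a few lines. The contingency table below uses invented counts (rows are sections, columns are feature types), and the function name is an assumption; the sketch returns only the test statistic and degrees of freedom, leaving the p-value to a statistics library such as SciPy (`scipy.stats.chi2_contingency`) if one is available.

```python
def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if section and feature type were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows = header, sub-header, body;
# columns = hedges, boosters, transitions.
counts = [
    [12, 5, 20],
    [30, 14, 61],
    [900, 410, 2100],
]
statistic, dof = chi_square_independence(counts)
```

A large statistic relative to its degrees of freedom indicates that the distribution of feature types differs across sections more than chance would predict.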

To address RQ3, which examines the relationship between interactional and interactive metadiscourse dimensions, a Pearson correlation analysis was conducted. This method was used to determine whether a statistically significant linear association exists between the frequencies of features from the two dimensions. The analysis was calculated separately for each structural section (i.e., headers, sub-headers, and bodies), allowing for a more fine-grained understanding of how the relationship between these dimensions may vary across different rhetorical zones of the article. Pearson’s correlation coefficient was chosen due to its suitability for evaluating the strength and direction of association between two continuous variables, such as feature frequencies (Gries, 2010; McEnery & Hardie, 2011).
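As an illustration of the computation, the sketch below derives Pearson’s r from paired per-article frequencies within a single section. The frequencies are invented and the function name is an assumption; it shows the coefficient only, not the significance test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented per-article counts in body texts (one pair per article).
interactional = [14, 22, 9, 30, 18, 25]
interactive = [40, 61, 33, 78, 52, 66]
r_body = pearson_r(interactional, interactive)
```

Running the same function on the header and sub-header frequencies yields one coefficient per section, which is what allows the sections to be compared.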

To analyze temporal trends and address RQ4, a linear regression (LR) analysis was conducted to examine the relationship between publication year and the frequency of metadiscourse features. This statistical method was selected because it enables the detection of directional trends over time (e.g., increases or decreases), while also quantifying the strength and statistical significance of those trends. Linear regression is particularly effective for modeling continuous predictors such as time and is widely used in linguistic studies for exploring change across corpora (Speelman, 2014). In this study, the slope coefficients of the regression lines indicate whether the use of metadiscourse features increased, decreased, or remained stable over the 10-year period (2015–2024). By applying this method to both interactional and interactive dimensions of metadiscourse, the analysis provides a comprehensive overview of how rhetorical strategies in AI and technology journalism have evolved over time.
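The role of the slope coefficient can be shown with a short ordinary least squares sketch. The yearly values are invented (a hypothetical normalized frequency per 1,000 tokens), and the function name is an assumption; a library routine such as `scipy.stats.linregress` would additionally report the p-value.

```python
def fit_line(years, values):
    """Ordinary least squares fit of values on years; returns (slope, intercept)."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (year, value) pairs")
    mean_x, mean_y = sum(years) / n, sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = list(range(2015, 2025))
# Invented normalized frequencies (per 1,000 tokens), for illustration only.
frequency = [52.0, 50.5, 49.0, 47.8, 46.1, 44.9, 43.0, 41.6, 40.2, 38.5]
slope, intercept = fit_line(years, frequency)
# A negative slope means the feature became less frequent over the period.
```

The sign of the slope gives the direction of change and its magnitude the average change per year, in the units of the dependent variable.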

Dataset Presentation

To examine trends in the reporting of AI technologies, news articles were collected from the MIT News website (cf. https://news.mit.edu/), focusing specifically on those in the “Artificial Intelligence” category. The dataset spans from the website’s inception in 1994 through 2024. However, the primary emphasis was placed on articles published between 2015 and 2024.

This 10-year window was chosen because it represents the most dynamic and transformative phase in the evolution of AI and related technologies. In contrast, articles from 1994 to 2014 were excluded due to their limited number, fewer than 30 in total. Including such a small subset would not only provide inadequate representation of the earlier years but could also distort the diachronic analysis by introducing imbalance and noise, thereby reducing the reliability of temporal comparisons.

Web scraping tools implemented in Python, specifically using the Beautiful Soup package (Richardson, 2007), were employed to collect news articles from the MIT News website. Articles were filtered by publication date (2015–2024), and key metadata, such as article titles (with main headers separated from sub-headers), publication dates, authors, and article content (as body), was systematically extracted. The resulting dataset is referred to as the MIT-News corpus, and it serves as the primary resource in the current study for analyzing how communication of AI technologies has evolved over the past decade.

Figure 2 presents the yearly distribution of articles in the MIT-News corpus from 2015 to 2024, offering an overview of publishing frequency and helping to contextualize temporal patterns in metadiscourse usage.

Figure 2.

Number of published articles per year in the MIT-News corpus (2015−2024).

Following data collection, lexical profiling was performed automatically using the NLTK package in Python (Bird, 2006) to analyze core linguistic properties of the corpus. This profiling included counts of tokens, sentences, and average sentence length. In this context, a token refers to any discrete unit identified during tokenization, including words, punctuation marks, numbers, or symbols.
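The three measures can be sketched as follows. To stay dependency-free, the sketch swaps in simple regular expressions for NLTK's tokenizers (`nltk.word_tokenize` and `nltk.sent_tokenize`), so its counts will differ from NLTK's on abbreviations, contractions, and similar cases; the function name `profile` is an assumption.

```python
import re


def profile(text):
    """Count tokens and sentences and compute the average sentence length."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Tokens are runs of word characters or single punctuation marks/symbols.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    average = len(tokens) / len(sentences) if sentences else 0.0
    return {
        "tokens": len(tokens),
        "sentences": len(sentences),
        "avg_sentence_length": average,
    }


stats = profile("AI helps doctors. It learns fast!")
# Tokens: AI, helps, doctors, ., It, learns, fast, !
```

Counting punctuation marks as tokens, as the definition above does, makes token totals somewhat higher than word counts.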

Table 2 presents descriptive statistics for the MIT-News corpus covering the period 2015 to 2024, focusing on three structural components of each article: headers, sub-headers, and bodies. The table includes the number of articles published each year, along with total counts of tokens and sentences, and the average sentence length for each section. These statistics help characterize the textual composition of each element and offer insights into their functional roles in news reporting on AI technologies as represented in the sampled dataset (MIT-News corpus).

Table 2.

Linguistic Statistics of the MIT-News Corpus (2015–2024), Showing the Count of Headers, Sub-Headers, and Articles, Along With Tokens, Sentences, and Average Sentence Lengths for Each Section Across the Years.

Year	Header: Count	Header: Tokens	Header: Sent.	Header: Avg.	Sub-header: Count	Sub-header: Tokens	Sub-header: Sent.	Sub-header: Avg.	Body: Count	Body: Tokens	Body: Sent.	Body: Avg.
2015	29	204	29	7.0	29	459	30	15.8	29	23,231	689	801.1
2016	35	235	35	6.7	34	541	35	15.9	35	25,789	819	736.8
2017	60	477	61	8.0	60	1,050	60	17.5	60	52,519	1,633	875.3
2018	98	845	98	8.6	93	1,720	98	18.5	98	93,904	3,025	958.2
2019	130	1,172	133	9.0	128	2,378	132	18.6	130	144,879	4,458	1,114.5
2020	115	1,014	118	8.8	115	2,225	118	19.3	115	129,844	3,982	1,129.1
2021	130	1,157	130	8.9	130	2,711	131	20.9	130	139,213	4,575	1,070.9
2022	145	1,262	147	8.7	145	3,177	149	21.9	145	167,072	5,322	1,152.2
2023	164	1,543	166	9.4	164	3,657	167	22.3	164	171,625	5,880	1,046.5
2024	178	1,744	181	9.8	178	4,217	191	23.7	178	199,420	6,224	1,120.3
Total	1,084	9,653	1,098	8.9	1,076	22,135	1,111	20.6	1,084	1,147,496	36,607	1,058.6

Across the 10-year dataset, the corpus comprises 1,084 articles, including a total of 9,653 tokens in headers, 22,135 tokens in sub-headers, and 1,147,496 tokens in article bodies. Sentence-level counts include 1,098 sentences in headers, 1,111 in sub-headers, and 36,607 in bodies. The corresponding average lengths (i.e., average number of tokens per section, as reported in Table 2) are approximately 8.9 tokens for headers, 20.6 tokens for sub-headers, and 1,058.6 tokens for body texts.

The dataset also reveals a general upward trend in the number and length of articles, particularly after 2018. The number of published articles rose from just 29 in 2015 to 178 in 2024, with a dip only in 2020 (115 articles, down from 130 in 2019). Correspondingly, the body token count grew from 23,231 tokens in 2015 to 199,420 tokens in 2024, indicating both increased reporting volume and longer articles. These trends reflect the growing prominence of communication about AI technologies in institutional media and suggest an evolution toward greater rhetorical and informational complexity in public science discourse.

Before analyzing the interactional and interactive features, the dataset was first annotated using the replicated tool described in the Research Design section (cf. Step 2) and then manually validated by two linguists familiar with metadiscourse analysis and Hyland’s (2005) framework. The first expert devoted approximately 9 days (around 1.5 hr per day, alongside her teaching duties), while the second spent 4 days (about 2 hr per day) reviewing all randomly sampled articles. In total, 66 articles, representing roughly 5% of the dataset (approximately 6−7 articles per year), were validated. These 66 articles contained 2,598 annotated tokens, distributed as 2,480 in the body sections, 88 in sub-headers, and 30 in headers.

Inter-rater reliability, assessed using Cohen’s Kappa (κ), showed a substantial level of agreement between the two experts, with an overall κ = .74. When examined by section, the agreement rates were κ = .70 for body texts, κ = .79 for headers, and κ = .78 for sub-headers. These results indicate a consistently high degree of coding reliability across all sections. Instances of disagreement were examined collaboratively in a joint meeting. Through discussion and cross-checking against Hyland’s framework, the two experts reached consensus and affirmed that the tool’s automated classifications were consistent with the theoretical definitions.
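For readers unfamiliar with the statistic, Cohen’s Kappa compares the observed agreement between two annotators with the agreement expected by chance. The following is a minimal sketch in plain Python; the function name and the six example labels are hypothetical and are not taken from the validated sample.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations for six tokens (illustration only).
expert_1 = ["hedge", "hedge", "booster", "booster", "hedge", "booster"]
expert_2 = ["hedge", "hedge", "booster", "hedge", "hedge", "booster"]
kappa = cohens_kappa(expert_1, expert_2)
print(round(kappa, 2))
```

In this toy case the annotators agree on five of six tokens (observed agreement .83) against a chance agreement of .50, which yields κ ≈ .67.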

The validation process also provided an opportunity to reflect on the advantages and limitations of applying Hyland’s (2005) metadiscourse framework to journalistic texts. The framework proved highly effective for revealing how writers construct interaction with readers through stance and engagement markers, highlighting authorial voice and uncovering subtle persuasive cues in ostensibly objective reporting. Its clear categorical structure enhanced analytical transparency and replicability, while the dual interactive–interactional dimensions helped to clarify textual organization and rhetorical purpose. Moreover, its flexibility allowed adaptation to different journalistic genres and supported both quantitative coding and qualitative interpretation. However, several limitations were noted. The framework is primarily designed for English discourse and remains descriptive rather than critical, focusing on textual functions rather than ideological or audience effects. It also excludes multimodal elements such as visuals and layout, which can influence meaning in news discourse. Despite these constraints, the framework offered a systematic and insightful means of examining how AI-related news reporting constructs interpersonal relationships and reader engagement.

Results and Discussion

RQ1: Analysis of Features in the Interactional Dimension

As presented in Table 3, the distribution of interactional metadiscourse features across the headers, sub-headers, and body sections of the MIT-News Corpus exhibits distinct patterns in raw occurrences, normalized frequency per 1,000 words, and percentage rates for all features. The body section consistently demonstrates the highest contribution to interactional metadiscourse, while the header and sub-header sections display varying degrees of emphasis depending on the feature.

Table 3.

Distribution of Interactional Metadiscourse Features Across Sections of MIT-News Corpus (Headers, Sub-Headers, and Body), Including Raw Occurrences, Normalized Frequency per 1,000 Tokens, and Percentage Rates.

Category/feature	Header (9,653 tokens)			Sub-header (22,135 tokens)			Body (1,147,496 tokens)
Category/feature	Raw occur.	Per 1,000	Rate (%)	Raw occur.	Per 1,000	Rate (%)	Raw occur.	Per 1,000	Rate (%)
Attitude Markers	7	0.79	1.6	26	1.36	1.8	1,547	1.65	3.8
Boosters	42	4.76	9.7	102	5.32	7.2	5,058	5.39	12.5
Engagement Markers	121	13.71	27.9	240	12.53	17.0	9,106	9.70	22.5
Hedges	55	6.23	12.7	234	12.22	16.6	6,577	7.00	16.3
Self-Mention	31	3.51	7.2	27	1.41	1.9	2,756	2.94	6.8
Interactional (total)	256	29.01	59.1	629	32.84	44.5	25,044	26.67	62.0

For attitude markers, the body contains the highest raw occurrences (1,547), with a normalized frequency of 1.65 per 1,000 words and a percentage rate of 3.8%. In comparison, the sub-header section shows 26 occurrences, a frequency of 1.36 per 1,000 words, and a rate of 1.8%, while the header section records just 7 occurrences, equivalent to a frequency of 0.79 per 1,000 words and a rate of 1.6%. This indicates that attitude markers are more prevalent in the body, where they likely serve to convey evaluations or stances more extensively.

In the case of boosters, the body again dominates with 5,058 occurrences, a frequency of 5.39 per 1,000 words, and a percentage rate of 12.5%. The sub-header section follows with 102 occurrences, a frequency of 5.32 per 1,000 words, and a percentage rate of 7.2%, while the header section has 42 occurrences, a frequency of 4.76 per 1,000 words, and a rate of 9.7%. This trend suggests that boosters are used extensively in the body to emphasize certainty or reinforce arguments.

Engagement markers also show the highest raw count in the body, with 9,106 occurrences, a frequency of 9.70 per 1,000 words, and a percentage rate of 22.5%. The sub-header section follows with 240 occurrences, a frequency of 12.53 per 1,000 words, and a rate of 17.0%, while the header section records 121 occurrences, equivalent to 13.71 per 1,000 words and a rate of 27.9%. Interestingly, the header section has both the highest normalized frequency and the highest percentage rate, indicating a focus on directly engaging the reader in introductory contexts.

For hedges, the body once again leads in raw count with 6,577 occurrences, a normalized frequency of 7.00 per 1,000 words, and a percentage rate of 16.3%. The sub-header section has 234 occurrences but the highest normalized frequency, 12.22 per 1,000 words, and a rate of 16.6%, while the header section shows 55 occurrences, equivalent to a frequency of 6.23 per 1,000 words and a rate of 12.7%. These results reflect a more cautious tone in the body and sub-header sections, where hedges are likely used to qualify claims or introduce uncertainty.

When examining self-mention, the body continues to dominate in raw count, with 2,756 occurrences, a frequency of 2.94 per 1,000 words, and a percentage rate of 6.8%. The sub-header section shows a much lower frequency of 1.41 per 1,000 words, with 27 occurrences and a rate of 1.9%, while the header section records 31 occurrences, a frequency of 3.51 per 1,000 words, and a rate of 7.2%. This suggests that self-mention is concentrated in the body in absolute terms, likely reflecting the author’s role in presenting arguments or findings, while the header, despite its few occurrences, uses self-mention at the highest relative density to introduce the authorial voice.

The overall interactional metadiscourse totals reveal that the body section accounts for the majority of interactional features, with 25,044 occurrences, a normalized frequency of 26.67 per 1,000 words, and a percentage rate of 62.0%. The sub-header section follows with 629 occurrences, a frequency of 32.84 per 1,000 words, and a rate of 44.5%, while the header section shows the lowest raw total, with 256 occurrences, a frequency of 29.01 per 1,000 words, and a percentage rate of 59.1%. These results emphasize the body section’s critical role in hosting interactional features, reflecting its function as the main site for engaging the audience, presenting arguments, and reinforcing claims. Meanwhile, the header and sub-header sections, while contributing far fewer occurrences, are denser in interactional features per 1,000 words and still contribute significantly to setting the tone and engaging readers at the outset.

These findings suggest that while the body of the articles contains the majority of interactional features in absolute terms, the sub-header and header make more concentrated use of these features relative to their word counts. The distribution reflects the functional roles of these sections, with the body focusing on detailed argumentation balanced by confidence (boosters) and caution (hedges), while the header and sub-header prioritize engaging readers and presenting key ideas concisely.

These findings align with previous research emphasizing the rhetorical differentiation of news sections and the communicative functions they serve. As Finkbeiner (2024) notes, headlines operate under distinct pragmatic constraints, designed to capture attention and orient readers toward the central theme of a story. Similarly, Bednarek and Caple (2017) demonstrate that headlines and sub-headlines are central to shaping news values and engagement, while body sections elaborate and contextualize information through extended argumentation. The prevalence of interactional features in the body section of the present corpus supports this view, reflecting the journalist’s role in presenting evaluations, managing stance, and engaging readers throughout the narrative. At the same time, the higher proportional density of these features in the headline and sub-headline sections corresponds with Nguyen and Hekman’s (2024) observation that AI-related reporting relies heavily on framing and evaluative cues to construct interpretive alignment with audiences. In combination, these patterns suggest that metadiscourse resources are strategically distributed across sections to balance informational content with interpersonal engagement, a tendency that resonates with broader findings in media discourse research (Ben Moussa, 2025; Chen & Li, 2023).

RQ2: Analysis of Features in the Interactive Dimension

As presented in Table 4, the distribution of interactive metadiscourse features across the headers, sub-headers, and body sections of the MIT-News Corpus demonstrates significant variation in raw occurrences, normalized frequency per 1,000 words, and percentage rates. The body section consistently exhibits the highest raw occurrences for most features, reflecting its role as the primary site for structuring and organizing discourse.

Table 4.

Distribution of Interactive Metadiscourse Features Across Sections of MIT-News Corpus (Headers, Sub-Headers, and Body), Including Raw Occurrences, Normalized Frequency per 1,000 Words, and Percentage Rates.

Category/feature	Header (9,653 words)			Sub-header (22,135 words)			Body (1,147,496 words)
Category/feature	Raw occur.	Per 1,000	Rate (%)	Raw occur.	Per 1,000	Rate (%)	Raw occur.	Per 1,000	Rate (%)
Code Glosses	3	0.34	0.7	48	2.51	3.4	1,694	1.80	4.2
Endophoric Markers	4	0.45	0.9	19	0.99	1.3	1,722	1.83	4.3
Evidentials	0	0.00	0.0	0	0.00	0.0	34	0.04	0.1
Frame Markers (announce goals)	0	0.00	0.0	23	1.20	1.6	650	0.69	1.6
Frame Markers (label stages)	1	0.11	0.2	7	0.37	0.5	565	0.60	1.4
Frame Markers (sequencing)	12	1.36	2.8	38	1.98	2.7	2,534	2.70	6.3
Frame Markers (shift topic)	7	0.79	1.6	20	1.04	1.4	1,635	1.74	4.0
Transition Markers	150	17.00	34.6	629	32.84	44.5	6,535	6.96	16.2
Interactive (Total)	177	20.06	40.9	784	40.93	55.5	15,369	16.37	38.0

For instance, code glosses appear 1,694 times in the body with a normalized frequency of 1.80 per 1,000 words and a percentage rate of 4.2%. In comparison, the sub-header section records 48 occurrences with a frequency of 2.51 per 1,000 words and a percentage rate of 3.4%, while the header section contains only 3 occurrences with a frequency of 0.34 per 1,000 words and a rate of 0.7%. These results highlight the body section’s central role in elaborating and clarifying information.

For endophoric markers, the body again leads, with 1,722 occurrences, a frequency of 1.83 per 1,000 words, and a percentage rate of 4.3%. The sub-header section follows with 19 occurrences and a frequency of 0.99 per 1,000 words (1.3%), while the header section shows only 4 occurrences, corresponding to a frequency of 0.45 per 1,000 words and a rate of 0.9%. Similarly, evidentials are almost exclusively found in the body, with 34 raw occurrences, a normalized frequency of 0.04 per 1,000 words, and a percentage rate of 0.1%, while both the header and sub-header sections show no occurrences of these features. These results highlight the body section’s essential role in referencing and supporting information, where elaboration and attribution are critical to building coherent and trustworthy narratives.

In terms of frame markers that announce goals, the body contains 650 occurrences, with a normalized frequency of 0.69 per 1,000 words and a percentage rate of 1.6%. The sub-header section records 23 occurrences, a frequency of 1.20 per 1,000 words, and the same percentage rate of 1.6%, while the header section shows no occurrences. For frame markers that label stages, the body again leads with 565 occurrences, a frequency of 0.60 per 1,000 words, and a rate of 1.4%. The sub-header section records lower values, with 7 occurrences and a frequency of 0.37 per 1,000 words (0.5%), while the header section contains only 1 occurrence, corresponding to a frequency of 0.11 per 1,000 words (0.2%). These results highlight the body’s key function in mapping the structure of arguments or narratives, where readers are guided through goals and stages of discussion in a detailed and linear fashion.

The body section also exhibits the highest frequency of frame markers used for sequencing, with 2,534 occurrences, a normalized frequency of 2.70 per 1,000 words, and a percentage rate of 6.3%. The sub-header section follows with 38 occurrences, a frequency of 1.98 per 1,000 words, and a rate of 2.7%, while the header section shows 12 occurrences, corresponding to a frequency of 1.36 per 1,000 words and a rate of 2.8%. Similarly, for frame markers used to shift topics, the body records 1,635 occurrences, a frequency of 1.74 per 1,000 words, and a percentage rate of 4.0%. The sub-header section follows with 20 occurrences and a frequency of 1.04 per 1,000 words (1.4%), while the header section contains 7 occurrences, a frequency of 0.79 per 1,000 words, and a rate of 1.6%. These results highlight the body’s role in maintaining coherence across extended discourse, using sequencing and topic shifts to navigate complex information flows.

Transition markers are the most frequently occurring interactive feature in every section. The body section demonstrates the highest count, with 6,535 occurrences, a normalized frequency of 6.96 per 1,000 words, and a percentage rate of 16.2%. The sub-header section follows with 629 occurrences, a frequency of 32.84 per 1,000 words, and a percentage rate of 44.5%. The header section contains 150 occurrences with a frequency of 17.00 per 1,000 words and a rate of 34.6%. These results suggest that while transition markers are prevalent across all sections, their relative importance is higher in the shorter header and sub-header sections, where they play a significant role in guiding readers.

Overall, the total number of interactive metadiscourse features is highest in the body section, with 15,369 occurrences. However, when normalized per 1,000 words, the sub-header section demonstrates the highest concentration at 40.93, followed by the header at 20.06, and the body at 16.37. In terms of percentage distribution within each section, the sub-header again leads with 55.5%, followed by the header (40.9%) and the body (38.0%). These findings suggest that although the body encompasses more interactive features overall due to its length, interactive elements are more densely and strategically employed in the shorter sub-header and header sections to enhance structural clarity.

This aligns with the observation that interactive features are most concentrated in the sub-header, where they serve to guide readers through the structure and intent of the text. The body, while containing the majority of features in absolute terms, reflects a broader use of transitions, sequencing markers, and topic shifts to maintain coherence and clarity in detailed arguments. In contrast, the header draws on a narrower range of interactive features, relying mainly on transition markers and sequencing to briefly orient the reader to the content. The distribution of these features across sections emphasizes their functional roles in organizing and clarifying the text for the audience.

These patterns correspond with earlier studies on the rhetorical organization of news discourse. Bednarek and Caple (2017) emphasize that textual components such as headlines and bodies perform distinct structural roles that guide readers through news narratives. Similarly, Finkbeiner (2024) highlights the pragmatic function of headlines and sub-headlines as orientation cues that frame interpretation and facilitate coherence. The present distribution of interactive markers reflects these functional distinctions: sub-headlines employ a higher density of transition markers and goal-announcing frame markers to signal structure and thematic focus, while the body relies more on sequencing and topic-shift markers to maintain textual flow. These findings also resonate with Aszeli et al. (2021), who found that interactive resources enhance clarity and reader comprehension in news discourse. Taken together, the results reinforce the view that interactive metadiscourse functions as an organizing mechanism that enables journalists to present complex AI-related information in a coherent and accessible way.
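The percentage rates in Tables 3 and 4 can be read as each dimension’s share of all metadiscourse features found within a section. The short sketch below recomputes the section-level rates from the reported totals; the variable and function names are illustrative, and the per-1,000 frequencies are deliberately not recomputed because the word counts used as their denominators are not restated here.

```python
# Raw totals per section, copied from the total rows of Tables 3 and 4.
TOTALS = {
    "header": {"interactional": 256, "interactive": 177},
    "sub-header": {"interactional": 629, "interactive": 784},
    "body": {"interactional": 25_044, "interactive": 15_369},
}


def section_rates(counts):
    """Share (%) of each dimension among all features in one section."""
    total = sum(counts.values())
    return {dim: round(100 * n / total, 1) for dim, n in counts.items()}


rates = {section: section_rates(counts) for section, counts in TOTALS.items()}
for section, shares in rates.items():
    print(section, shares)
```

The computed shares match the reported rates: 59.1% and 40.9% for the header, 44.5% and 55.5% for the sub-header, and 62.0% and 38.0% for the body.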

Statistical Test Results for RQ1 and RQ2

The chi-square test results presented in Table 5 highlight statistically significant differences in the distribution of metadiscourse features across the three sections (header, sub-header, and body) of the MIT-News Corpus. The results are based on both raw occurrences and normalized frequencies, providing insights into the overall distribution as well as the specific contributions of interactional and interactive features.

Table 5.

Chi-Square Test Results, Based on Raw Occurrences and Normalized Frequencies, for Overall Feature Distribution and Subcategories (Interactional and Interactive Features) Across Three Sections (Header, Sub-Header, and Body).

Category	Chi-square statistic (χ²; raw occurrences)	Chi-square statistic (χ²; normalized)	Degree of freedom (df)	p-Value
All	1,114,998.202	1,698.519512	24	0.00E+00
Interactional	686,563.8356	871.0755306	8	6.78E-177
Interactive	428,434.3665	827.4439818	14	2.50E-173

For the overall feature distribution, the chi-square statistic based on raw occurrences is extremely high (χ² = 1,114,998.202) with a degree of freedom (df) of 24, and the corresponding p-value is effectively zero (p < .0001). This indicates highly significant differences in the overall distribution of metadiscourse features across the sections. Similarly, the normalized frequencies yield a chi-square statistic of χ² = 1,698.519, which also reflects a significant variation (p < .0001). These findings confirm that the distribution of features is not uniform across the sections and that the observed differences are unlikely to have arisen by chance.

For the interactional features, the chi-square statistic for raw occurrences is χ² = 686,563.836 with df = 8, and the p-value is remarkably small (p = 6.78 × 10⁻¹⁷⁷). The normalized frequencies also reveal significant variation, with a chi-square statistic of χ² = 871.076 and a similarly small p-value. These results suggest that the interactional features are distributed unevenly among the header, sub-header, and body sections of the corpus. This uneven distribution reflects the distinct rhetorical functions of these sections, with the body section typically hosting the majority of interactional elements to engage readers and present arguments.

For the interactive features, the chi-square test yields a statistic of χ² = 428,434.367 for raw occurrences with df = 14, and the p-value is again effectively zero (p = 2.50 × 10⁻¹⁷³). The normalized frequencies yield a similarly significant result, with χ² = 827.444 and a negligible p-value. These findings indicate significant differences in the distribution of interactive features, such as code glosses, frame markers, and transition markers, across the sections. The body section likely contributes disproportionately to these features, given its role in elaborating and structuring information, while the header and sub-header sections rely on such features to frame and orient the content in a more concise manner.

The chi-square results strongly support the conclusion that both interactional and interactive metadiscourse features are distributed unevenly across the sections of the MIT-News Corpus. The observed differences are statistically significant, with p-values far below conventional thresholds of significance. These findings highlight the distinct communicative and rhetorical functions of the header, sub-header, and body sections, which are reflected in their varying use of metadiscourse features.
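As an illustration of the kind of test involved, the sketch below runs a standard Pearson chi-square test of independence on the raw interactional counts from Table 3 (five features by three sections). It is written under the assumption of a plain contingency-table design and is not claimed to reproduce the statistics in Table 5, since the exact computation behind those values is not spelled out here; the function name is illustrative.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and section.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: attitude markers, boosters, engagement markers, hedges, self-mention.
# Columns: header, sub-header, body (raw occurrences from Table 3).
interactional = [
    [7, 26, 1_547],
    [42, 102, 5_058],
    [121, 240, 9_106],
    [55, 234, 6_577],
    [31, 27, 2_756],
]
statistic, df = chi_square_independence(interactional)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

The degrees of freedom follow from the table shape, (5 − 1) × (3 − 1) = 8, which agrees with the df reported for the interactional features; the same formula gives 14 for the eight interactive features and 24 for all thirteen features combined.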

RQ3: Correlation Analysis Across Metadiscourse Dimensions

The relationship between interactive and interactional dimensions can vary significantly across the sections of news articles in the MIT-News Corpus, reflecting the distinct communicative purposes of each section. To better understand these patterns, the correlation between these two dimensions was analyzed in the header, sub-header, and body sections. This analysis provides insights into how writers balance clarity, structure, and reader engagement in news articles about AI technologies. The following subsections present the results and interpretations for each section, highlighting the unique trends observed. The Pearson correlation results for all three sections are presented in Table 6.

Table 6.

Pearson Correlation Coefficients (r) and Associated p-Values for the Relationship Between Interactive and Interactional Metadiscourse Dimensions Across Article Sections (Header, Sub-Header, and Body) in the MIT-News Corpus (2015–2024).

Section	Correlation results (r)	p-Value
Header	−.674	1.58 × 10⁻⁵²
Sub-header	−.302	1.37 × 10⁻²⁰
Body	.775	1.11 × 10⁻²⁵⁶

As depicted in Figure 3 and reported in Table 6, the Pearson correlation analysis for the header section revealed a statistically significant negative correlation between interactive and interactional features (r = −.67, p < .001). This indicates that as the use of interactive features increases, the use of interactional features tends to decrease, and vice versa.

Figure 3.

Pearson correlation matrix for interactive and interactional features in header sections: Strong negative correlation (r = −.67).

This pattern suggests that headers in news articles about AI technologies are strategically designed to prioritize clarity and conciseness. Interactive features are more prominent, as they help structure information succinctly and introduce the article’s topic with coherence. In contrast, interactional features appear less frequently, likely because they introduce subjectivity or interpersonal tone that may detract from the professional and objective style typical of news headlines. The inverse relationship highlights a rhetorical trade-off: headers favor structural guidance over reader engagement, focusing on efficient delivery of key information rather than facilitating interaction.
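The section-level coefficients in Table 6 are Pearson correlations between the two dimensions. A minimal sketch of that computation follows, assuming the unit of observation is the article and that each article contributes one interactive and one interactional count per section; the counts below are invented for illustration and do not come from the corpus.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical per-article counts in header sections (illustration only).
interactive_counts = [0, 1, 1, 2, 3]
interactional_counts = [3, 2, 3, 1, 0]
r_header = pearson_r(interactive_counts, interactional_counts)
print(round(r_header, 2))
```

A negative coefficient, as in this toy example, means that headers with more interactive markers tend to contain fewer interactional ones.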

As shown in Figure 4 and presented in Table 6, the correlation analysis for the sub-header section revealed a weak but statistically significant negative correlation between interactive and interactional features (r = −.30, p < .001). While the strength of this correlation is limited, its significance suggests a slight tendency for one metadiscourse dimension to increase as the other decreases within sub-headers.

Figure 4.

Pearson correlation matrix for interactive and interactional features in sub-header sections: Weak negative correlation (r = −.30).

Sub-headers serve a transitional role, linking the concise structure of the header with the elaboration found in the article body. The weak inverse correlation suggests that sub-headers maintain a mild preference for interactive features, which provide structure and cohesion, while allowing limited space for interactional features. Elements such as attitude markers (e.g., “important,” “innovative”) appear slightly more frequently here than in headers, offering subtle cues about the significance or evaluative stance of the content.

This result reflects the hybrid rhetorical function of sub-headers: they must remain clear and organizationally effective, yet they also serve to signal emphasis or guide reader attention. The weak correlation suggests that while both dimensions may co-occur to some extent, sub-headers still lean toward informational clarity over overt engagement.

As presented in Table 6 and illustrated in Figure 5, the body section exhibited a strong positive correlation between interactive and interactional features (r = .77, p < .001). This highly significant result indicates that the two metadiscourse dimensions tend to co-occur, increasing in tandem to support both structural clarity and audience engagement.

Figure 5.

Pearson correlation matrix for interactive and interactional features in article body sections: Strong positive correlation (r = .77).

In the article body, where core content is developed and elaborated, this correlation reflects a rhetorical strategy that seeks to balance organization with reader interaction. Writers rely on interactive features to guide readers through complex content, ensuring coherence and flow. At the same time, interactional features are employed to personalize the narrative, emphasize relevance, and convey authorial perspective. For instance, while interactive features help the reader navigate technical explanations, interactional features invite reflection on broader societal, ethical, or practical implications (Hyland, 2005). This synergistic use of both metadiscourse dimensions in the body section is essential for making AI technology reporting both accessible and compelling. It allows writers to maintain informational precision while encouraging a sense of dialogue or shared interest, a key consideration when communicating specialized content to a broad public audience.

The observed variation in correlations between interactive and interactional metadiscourse features across the structural sections of news articles about AI technologies reveals a differentiated rhetorical strategy shaped by genre, format, and evolving media practices.

These associations suggest that metadiscourse is not uniformly distributed but strategically adjusted according to the communicative demands of each section. This aligns with research showing that digital media environments increasingly reward clarity, modularity, and efficiency, particularly in high-visibility textual zones such as headlines and sub-headlines (Boczkowski, 2009; Liu, 2005). In such sections, the reduced co-occurrence of interactional and interactive features may reflect constraints on space and reader attention, reinforcing the tendency to prioritize organization over engagement (De Maeyer, 2015).

Conversely, the body section’s strong co-occurrence of both metadiscourse dimensions suggests a deliberate rhetorical strategy aimed at balancing reader orientation with authorial stance. This aligns with Hyland’s (2005) interpersonal model of metadiscourse, which highlights the dual role of writers in organizing information and engaging readers, particularly in specialized genres. It is also consistent with Biber and Conrad’s (2019) observation that different sections within a genre serve distinct communicative purposes and often require greater linguistic flexibility to accommodate the complexity and density of content in technical or informative discourse. In science and technology reporting, where both credibility and reader accessibility are critical, this dual deployment serves to enhance coherence while framing the content in socially and ethically relevant terms (Crawford, 2021).

Furthermore, these section-based distinctions may reflect the broader influence of AI-assisted writing tools, which often impose structural regularity and minimize subjective language (Thäsler-Kordonouri et al., 2024; S. Wang et al., 2025). Such tools may reinforce the standardization of headers and sub-headers while allowing greater stylistic range in the body, where human editing and narrative depth are more pronounced.

Recent research further supports these observations. Chen and Li (2023) demonstrate that interactional markers in news commentaries are strategically distributed to balance stance and reader orientation, a tendency that aligns with the section-based distinctions observed in the present analysis. Similarly, Al-Subhi (2023) shows that the use of metadiscourse in war-related editorials is contextually adapted to rhetorical intent, revealing how interactional and interactive dimensions interweave to manage persuasion and coherence. Complementary findings by Chang (2025) and Hastomo and Aminatun (2023) highlight that online and hybrid news environments increasingly demand concise yet rhetorically dense language, where organizational markers and engagement cues are deployed selectively to maintain audience attention. These recent studies collectively affirm that the interaction between metadiscourse dimensions is both context-sensitive and functionally motivated, reinforcing the broader trend of strategic linguistic adaptation in contemporary media discourse.

Taken together, the correlation patterns across sections emphasize the need to approach metadiscourse analysis not only at the document level but also at the intra-textual level, where rhetorical function is contextually embedded. This supports a growing view in discourse studies that effective communication, particularly in specialized domains such as AI technology reporting, depends on section-sensitive adaptation of metadiscourse strategies (Hyland & Tse, 2004).

RQ4: Temporal Changes Across Metadiscourse Dimensions

Examining temporal trends reveals shifts in writing practices and communicative strategies as AI technologies have evolved. The analysis provides insights into how writers adapt their use of metadiscourse devices to align with changes in audience expectations, technological advancements, and the broader discourse surrounding AI technologies.

Figure 6 illustrates the trends in normalized frequencies of interactive and interactional metadiscourse features in the MIT-News Corpus from 2015 to 2024. Both features show a general decline over time, with interactive features (green line) consistently occurring less frequently compared to interactional features (orange line). The decline in interactive features is steeper, particularly in the earlier years, stabilizing after 2018. In contrast, interactional features demonstrate a more gradual decrease, with fluctuations but an overall downward trajectory. These trends suggest a reduced emphasis on both types of metadiscourse features in recent years, potentially reflecting shifts in writing practices or audience expectations in this domain.

Figure 6.

Trends of normalized interactive (green) and interactional (orange) metadiscourse features in MIT-News Corpus (2015–2024), showing a consistent decline in both over time.

To statistically verify the declining trends depicted in Figure 6, linear regression (LR) was used. This approach enabled examination of patterns and assessment of trend significance over the analyzed period. The results of the LR modeling are presented in Table 7 for both interactional and interactive metadiscourse dimensions. The table reveals significant patterns in the trends of normalized frequencies for each dimension over the examined period.

Table 7.

Linear Regression Results for Normalized Interactional and Interactive Metadiscourse Dimensions.

Metric	Interactional dimension	Interactive dimension
Intercept (const)	730.57 (p = .028)	896.39 (p = .004)
Slope (x1) “Year”	−0.3482 (p = .033)	−0.4351 (p = .004)
R²	.452	.661
Adjusted R²	.384	.618
Standard error of slope	0.136	0.11
95% CI for slope	[−0.661, −0.036]	[−0.689, −0.181]

For the interactional dimension, the intercept is 730.57 (p = .028). Because the predictor appears to be the calendar year, this value is the extrapolated frequency at year 0 rather than the level at the beginning of the timeline, so it has no substantive interpretation on its own. The slope, −0.3482 (p = .033), demonstrates a statistically significant decline in the frequency of interactional features over time, with a 95% confidence interval for the slope ranging from −0.661 to −0.036. The R² value of .452 indicates that 45.2% of the variability in the interactional dimension is explained by the year variable, while the adjusted R² of .384 provides a more conservative estimate of approximately 38.4%. The adjusted R² is emphasized here because it accounts for model complexity and provides a better estimate of how the model would perform on new data. This means that the year (time) is an important predictor of the decline in interactional features, although other factors, such as changes in writing style, audience preferences, or editorial policies, account for the remaining variability.

For the interactive dimension, the intercept is 896.39 (p = .004), and the slope is −0.4351 (p = .004), which indicates a statistically significant decline over time. The 95% confidence interval for the slope is [−0.689, −0.181]. The R² value of .661 and the adjusted R² value of .618 indicate that the year variable explains approximately 61.8% of the variability in the interactive dimension. This relationship is stronger than for the interactional dimension, indicating that time accounts for more of the decline in interactive features, although some variability remains unexplained.
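The confidence intervals and adjusted R² values in Table 7 can be cross-checked from the reported slopes, standard errors, and R² values. The sketch below assumes one observation per year for 2015–2024 (n = 10, hence 8 residual degrees of freedom), which this section implies but does not state; the function names are illustrative. The critical value 2.306 is the standard two-sided 95% value of Student's t with 8 degrees of freedom.

```python
# Two-sided 95% critical value of Student's t with 8 degrees of freedom.
# ASSUMPTION: n = 10 yearly observations, so df = n - 2 = 8.
T_CRIT_DF8 = 2.306


def slope_ci(slope, se, t_crit=T_CRIT_DF8):
    """Confidence interval for a regression slope: slope ± t * SE."""
    half_width = t_crit * se
    return slope - half_width, slope + half_width


def adjusted_r2(r2, n, k=1):
    """Adjusted R² for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values as reported in Table 7.
reported = {
    "interactional": {"slope": -0.3482, "se": 0.136, "r2": 0.452},
    "interactive": {"slope": -0.4351, "se": 0.11, "r2": 0.661},
}

for name, row in reported.items():
    low, high = slope_ci(row["slope"], row["se"])
    adj = adjusted_r2(row["r2"], n=10)
    print(f"{name}: CI = [{low:.3f}, {high:.3f}], adjusted R² = {adj:.3f}")
```

Under these assumptions the recomputed intervals and adjusted R² values should agree with Table 7 up to small differences, which are expected because the table reports rounded standard errors.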

Figure 7 illustrates these trends using regression lines. The blue line represents the decline in interactive features, while the orange line shows the decline in interactional features. The figure is consistent with Table 7: both types of features decrease steadily over time in the MIT-News Corpus, with a steeper, more consistent drop in interactive features.
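To make the size of the decline concrete, the regression lines can be evaluated from the Table 7 coefficients. This is a rough sketch, not the procedure used to draw Figure 7. The total change over the period depends only on the slope, given that the slope is expressed per year. The fitted values at 2015 and 2024 additionally assume that the predictor was entered as the calendar year, which this section does not state explicitly. Results are in whatever units the normalized frequencies are expressed.

```python
# Coefficients as reported in Table 7.
TABLE_7 = {
    "interactional": {"intercept": 730.57, "slope": -0.3482},
    "interactive": {"intercept": 896.39, "slope": -0.4351},
}


def total_change(slope, first_year=2015, last_year=2024):
    """Change implied by the fitted line between two years."""
    return slope * (last_year - first_year)


def fitted_value(intercept, slope, year):
    """Point on the regression line.

    ASSUMPTION: the predictor is the calendar year (e.g. 2015), not an index.
    """
    return intercept + slope * year


for name, coef in TABLE_7.items():
    start = fitted_value(coef["intercept"], coef["slope"], 2015)
    end = fitted_value(coef["intercept"], coef["slope"], 2024)
    change = total_change(coef["slope"])
    print(f"{name}: {start:.2f} -> {end:.2f} (change {change:.2f})")
```

If the year was instead coded as an index (for example 0 to 9), the `total_change` figures still hold, but `fitted_value` would need the index rather than the calendar year.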

Figure 7.

Linear regression trends of normalized interactive (blue) and interactional (orange) features in MIT-News Corpus (2015–2024), showing a steady decline in both over time.

Shifts in the use of metadiscourse in AI news articles from 2015 to 2024 reflect broader changes in how these technologies are communicated to the public. The observed decline in both interactional and interactive features points to significant adaptations in writing styles and audience engagement strategies, influenced by societal, technological, and cultural transformations.

The reduction in interactional features suggests a trend toward more neutral, objective, and impersonal reporting, in line with findings in science journalism that emphasize neutrality and credibility (Bucchi & Trench, 2021; Hyland, 2005). This shift aligns with the increasing demand for credibility and factual accuracy, particularly in science and technology reporting, where audiences expect reliable and evidence-based content. The decline in language that directly engages the reader could also reflect a preference for professional and authoritative tones, especially as these articles often target a global and highly diverse audience (Boczkowski, 2009; Weaver et al., 2009). By minimizing subjective expressions, writers may aim to avoid controversy and maintain neutrality in a field that is often under intense public and ethical scrutiny (Crawford, 2021).

Similarly, the decline in interactive features points to a trend toward simplification and streamlining of content. This change may reflect the influence of digital reading behaviors, where audiences often skim rather than deeply engage with lengthy or elaborately structured texts (Liu, 2005). In fast-paced digital environments, reducing dense organizational cues can help make articles more digestible and accessible, ensuring that key points are delivered efficiently to readers with limited attention spans (Boczkowski, 2009). The observed reduction in endophoric markers, which signal intra-textual cohesion, further suggests that articles are increasingly designed as self-contained pieces, rather than components of serialized or thematically linked news packages. This pattern aligns with current web publishing practices, where modular, standalone texts are favored for their adaptability and shareability in fragmented online contexts (De Maeyer, 2015).

In addition, technological advancements, especially the increasing use of AI-powered writing and editing tools, appear to influence stylistic conventions in journalism. Studies show that automated content generation, while efficient, may compromise word choice and narrative quality, reducing the presence of interactional features such as engagement markers and hedges (Thäsler-Kordonouri et al., 2024). In a newsroom case study, S. Wang et al. (2025) found that AI-generated drafts were more formulaic and required substantial human editing for clarity, reflecting a shift toward a standardized and minimalistic writing style. Furthermore, research on audience perceptions of AI-generated news indicates that reduced rhetorical richness and perceived authorial impersonality may impact how such content is received, especially when the AI’s identity is made explicit (Lee et al., 2025).

These shifts also reflect broader societal changes in how technology and AI are discussed. Over the past decade, technology reporting has moved from being niche and exploratory to being mainstream and central to public discourse. This shift may have driven a focus on clear, concise, and objective communication to address a wider audience, including both experts and non-specialists. Additionally, as public awareness of ethical concerns and biases in AI technologies has grown, writers may increasingly strive for neutrality to avoid polarizing their audience.

While these changes enhance clarity and professionalism, they may also reduce the sense of engagement and connection with readers. The decline in rhetorical strategies that personalize or engage the audience suggests a move toward a more transactional style of writing, where the primary goal is to convey information rather than promote dialogue or emotional resonance. This shift reflects the challenges of communicating effectively in a fast-paced, information-heavy digital world, where attention spans are limited and the demand for accuracy and efficiency is high.

These diachronic tendencies are consistent with broader linguistic trends observed in recent studies. H. Wang and Hu (2023) report a steady decline in explicit self-mention and stance markers across academic writing, suggesting a shift toward greater impersonality and objectivity, patterns mirrored in the present corpus of AI news reporting. Similarly, Aghdam et al. (2025) found diachronic reductions in metadiscourse markers within research articles, reflecting a general movement toward concision and formal neutrality in knowledge-oriented genres. In journalism, comparable observations have been made by Nguyen and Hekman (2024), who note that digital media practices and algorithmic constraints increasingly favor streamlined, modular text structures. Together, these studies support the view that the decline in both interactive and interactional metadiscourse in AI-related news reflects not a rhetorical loss but an adaptation to evolving communicative norms shaped by digital environments and automated production processes.

Conclusion and Future Work

This study investigated the distribution and use of metadiscourse features in AI news articles, focusing on interactional and interactive dimensions across three key sections: header, sub-header, and body. The findings show that these features are strategically distributed to fulfill each section’s communicative role. The body is the main site for detailed argumentation, using interactional features to balance confidence with caution and interactive features for coherence and clarity. The sub-header serves as a transition, using concentrated metadiscourse to guide readers and clarify structure and intent. The header focuses on engaging readers and orienting them to the content, relying on engagement and transition markers.

These results emphasize the functional differentiation of metadiscourse strategies in AI news articles, providing valuable insights into how authors engage readers, present arguments, and structure technical content. Beyond theoretical insights, the findings also carry practical implications for journalists, science communicators, linguists and educators working in media and communication studies. They can use these results to enhance clarity, coherence, and reader engagement when reporting complex technological topics such as AI. Furthermore, the study offers a reference model for academic and professional writers, particularly non-native English speakers and novice researchers, seeking to produce linguistically effective and reader-oriented texts in technical and scientific domains.

While this study provides a comprehensive analysis of metadiscourse features in news articles about AI technologies, several areas remain open for exploration. The main limitation concerns the scope of the dataset, which focuses on a single institutional source (MIT News). Including multiple outlets and cross-cultural datasets could provide a broader perspective on how metadiscourse practices vary across media contexts and cultural settings. Future research could also examine metadiscourse in a broader range of disciplines, drawing on additional multilingual resources, to identify domain-specific patterns and practices. Additionally, longitudinal studies could explore how metadiscourse usage evolves over time in response to changes in academic writing conventions and technological advancements in publishing. Finally, employing qualitative methods, such as interviews with authors and readers, could provide deeper insights into the perceived effectiveness of metadiscourse strategies and their impact on reader comprehension and engagement. By addressing these areas, future research can build on the present findings to deepen our understanding of how metadiscourse operates across journalistic and technical genres, and to develop pedagogical and practical applications for improving writing in both academic and media contexts.

Footnotes

ORCID iD

Rashed Saad Alsharif

Ethical Considerations

This study involved no human or animal testing and required no ethical approval.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data are available from the author upon reasonable request.

References

Ädel (2008). Metadiscourse in L1 and L2 English. John Benjamins Publishing Company. https://doi.org/10.1075/scl.24

Aghdam, A. A., Mahdavirad, & Rezai, M. J. (2025). Diachronic variations of metadiscourse markers in research articles. Social Sciences & Humanities Open, 11, Article 101363.

Al-Subhi, A. S. (2023). Interactional meta-discourse and phraseology in newspaper editorials during the Russia-Ukraine War. Online Journal of Communication and Media Technologies, 13(3), Article e202331.

Aszeli, N. A., Jamil, D. A., & Rahmat, N. H. (2021). A study of interactional metadiscourse on news article on the impact of COVID-19 on education. European Journal of Literature, Language and Linguistics Studies, 4(4), 88–100.

Bednarek & Caple (2017). The discourse of news values: How news organizations create newsworthiness. Oxford University Press.

Ben Moussa (2025). Journalism metadiscourse on professional identity and verification in UAE. Journalism Practice, 19(2), 263–281. https://doi.org/10.1080/17512786.2023.2187859

Berberich & Kleiber (2025). MetaPak: A tool to assist metadiscourse analysis based on Hyland’s framework. https://corpus-analysis.com/tag/metadiscourse

Biber & Conrad (2019). Register, genre, and style (2nd ed.). Cambridge University Press. https://doi.org/10.1017/9781108686136

Bird (2006). NLTK: The natural language toolkit. Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions (pp. 69–72), Sydney, Australia. https://doi.org/10.3115/1225403.1225421

Boczkowski, P. J. (2009). Technology, monitoring, and imitation in contemporary news work. Communication, Culture & Critique, 2(1), 39–59. https://doi.org/10.1111/j.1753-9137.2008.01028.x

Bucchi & Trench (2021). Routledge handbook of public communication of science and technology (3rd ed.). Routledge. https://doi.org/10.4324/9781003039242

Chang, Y.-Y. (2025). Exploring interactional metadiscourse in content farms and formal news agencies. International Journal of Applied Linguistics, 35(2), 852–862.

Chen (2023). Interactional metadiscourse in news commentaries: A corpus-based study of China Daily and The New York Times. Journal of Pragmatics, 212, 29–40.

Crawford (2021). The atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press. https://doi.org/10.2307/j.ctv1ghv45t

Crismore (1983). Metadiscourse: What it is and how it is used in school and non-school social science texts (p. 103). National Institute of Education. https://eric.ed.gov/?id=ED229720

Crismore, Markkanen, & Steffensen, M. S. (1993). Metadiscourse in persuasive writing: A study of texts written by American and Finnish University students. Written Communication, 10(1), 39–71. https://doi.org/10.1177/0741088393010001002

Dafouz-Milne (2008). The pragmatic role of textual and interpersonal metadiscourse markers in the construction and attainment of persuasion: A cross-linguistic study of newspaper discourse. Journal of Pragmatics, 40(1), 95–113. https://doi.org/10.1016/j.pragma.2007.10.003

Dahl (2004). Textual metadiscourse in research articles: A marker of national culture or of academic discipline? Journal of Pragmatics, 36(10), 1807–1825. https://doi.org/10.1016/j.pragma.2004.05.004

De Maeyer (2015). The journalistic hyperlink: Prescriptive discourses about linking in online news. In B. Franklin (Ed.), The future of journalism: Developments and debates (1st ed., pp. 238–247). Routledge. https://doi.org/10.4324/9781315678962-26

Feng, Tang, Xiao, & Zhu (2024). Persuasion strategies of the major powers on social media: An analysis of the metadiscourse from the Chinese and American spokespersons’ tweets. Emerging Media, 2(4), 698–723. https://doi.org/10.1177/27523543241283645

Finkbeiner (2024). The pragmatics of headlines. Central issues and future research avenues. Elsevier.

Flowerdew (2008). Scholarly writers who use English as an additional language: What can Goffman’s “Stigma” tell us? Journal of English for Academic Purposes, 7(2), 77–86.

Gries, S. T. (2010). Useful statistics for corpus linguistics. In Sánchez & Almela (Eds.), A mosaic of corpus linguistics: Selected approaches (1st ed., pp. 269–280). Peter Lang. https://www.peterlang.com/document/1136097

Harris, Z. S. (1959). The transformational model of language structure. Anthropological Linguistics, 1(1), 27–29.

Hastomo & Aminatun (2023). An analysis of metadiscourse markers in online news media: Qualitative research. LEXEME: Journal of Linguistics and Applied Linguistics, 5(1), 95–103.

Hunston (2022). Corpora in applied linguistics. Cambridge University Press.

Hyland (2005). Metadiscourse: Exploring interaction in writing (1st ed.). Continuum.

Hyland & Tse (2004). Metadiscourse in academic writing: A reappraisal. Applied Linguistics, 25(2), 156–177. https://doi.org/10.1093/applin/25.2.156

Hyland, Wang, & Jiang, F. K. (2022). Metadiscourse across languages and genres: An overview. Lingua, 265, Article 103205. https://doi.org/10.1016/j.lingua.2021.103205

Kopple, W. V. (1985). Some exploratory discourse on metadiscourse. College Composition & Communication, 36(1), 82–93. https://doi.org/10.58680/ccc198511781

Lee, D. C., Jhang, & Baek, T. H. (2025). AI-generated news content: The impact of AI writer identity and perceived AI human-likeness. International Journal of Human–Computer Interaction, 41(21), 1–13. https://doi.org/10.1080/10447318.2025.2477739

Liu (2005). Reading behavior in the digital environment: Changes in reading behavior over the past ten years. Journal of Documentation, 61(6), 700–712. https://doi.org/10.1108/00220410510632040

Mauranen (1993). Cultural differences in academic rhetoric: A textlinguistic study (Nordeuropäische Beiträge aus den Human-und Gesellschaftswissenschaften 4) [Nordic Contributions to the Humanities and Social Sciences 4]. Peter Lang.

McEnery & Hardie (2011). Corpus linguistics: Method, theory and practice. Cambridge University Press. https://doi.org/10.1017/CBO9780511981395

Nguyen & Hekman (2024). The news framing of artificial intelligence: A critical exploration of how media discourses make sense of automation. AI & Society, 39(2), 437–451. https://doi.org/10.1007/s00146-022-01511-1

Ouchchy, Coin, & Dubljević (2020). AI in the headlines: The portrayal of the ethical issues of artificial intelligence in the media. AI & Society, 35(4), 927–936. https://doi.org/10.1007/s00146-020-00965-5

Pérez-Llantada (2003). Communication skills in academic monologic discourse. Empirical and applied perspectives. Círculo de Lingüística Aplicada a La Comunicación, 3(15), 1–14.

Richardson (2007). Beautiful soup documentation. https://ucilnica.fri.uni-lj.si/pluginfile.php/217774/mod_resource/content/1/beautiful-soup-4-readthedocs-io-en-latest.pdf

Speelman (2014). Logistic regression: A confirmatory technique for comparisons in corpus linguistics. In Glynn & Robinson, J. A. (Eds.), Human cognitive processing (Vol. 43, pp. 487–533). John Benjamins Publishing Company. https://doi.org/10.1075/hcp.43.18spe

Thäsler-Kordonouri, Thurman, Schwertberger, & Stalph (2024). Too many numbers and worse word choice: Why readers find data-driven news articles produced with automation harder to understand. Journalism, 26(9), 1878–1898. https://doi.org/10.1177/14648849241262204

Wang (2023). Changes in the ways authors refer to themselves: A diachronic study of self-mention in English research articles. Humanities and Social Sciences Communications, 10(1), 1–10.

Wang (2025). Metadiscourse across discourses: A cross-cultural review of current trends and future directions. International Journal of Linguistics, Literature and Translation, 8(3), 93–96. https://doi.org/10.32996/ijllt.2025.8.3.12

Wang, McKinnon-Crowley, Long, Lua, K. L., Henderson, Crowston, Nickerson, J. V., Hansen, & Chilton, L. B. (2025, February 7). The role of human creativity in the presence of AI creativity tools at work: A case study on AI-driven content transformation in journalism. arXiv. https://doi.org/10.48550/arXiv.2502.05347

Weaver, D. A., Lively, & Bimber (2009). Searching for a frame: News media tell the story of technological progress, risk, and regulation. Science Communication, 31(2), 139–166. https://doi.org/10.1177/1075547009340345

Yang (2025). Metadiscourse in the research abstracts of an interdisciplinary field: A case study of computational linguistics. Humanities and Social Sciences Communications, 12(1), 1–10.
