Abstract
An increasing number of studies in pragmatics, second language acquisition, and related fields have opted to use sitcom conversations as a substitute for natural conversations in their analyses. However, few studies have critically examined the validity of this substitution. The present study therefore examines the lexical similarities and differences between sitcom and natural conversations, using the Friends Corpus and the Santa Barbara Corpus of Spoken American English as samples under a synthesized analytic framework of 10 lexical categories that have been frequently examined in previous research. The findings indicate that there are significant differences between sitcom and natural conversations at the lexical level, particularly in terms of word lengths, keywords, and the use of discourse markers, personal pronouns, vocatives, and religious words. While sitcom conversations tend to be more concise, interactive, evaluative, and involving, natural conversations provide more explanations for speech acts and refer more often to third parties who are not present in the conversations. Additionally, sitcom conversations use more intensifiers and vocatives while using fewer tentative modals, expletives, and religious words. Based on these results, it can be concluded that sitcom conversations do not fully depict conversations in natural settings, and thus substituting natural conversations with those in sitcoms should be approached with caution in language teaching and research. This study provides insight into the differences in lexical patterns between the two types of conversations, and highlights the importance of using natural conversations as a basis for language teaching and research.
Research Background
Corpus linguistics has emerged as a significant research domain in linguistics, and is now considered a mainstream methodology in linguistic analyses (Clancy, 2011; O’Keeffe & Walsh, 2012; Taljard, 2014; Upton & Cohen, 2009). In particular, studies in pragmatics have increasingly turned to corpus data as a means of exploring various linguistic phenomena such as discourse markers, stance markers, conversational patterns and structures, and (im)politeness. These studies are often based on data from a range of corpora, ready-made (e.g., BNC, COCA, NOW, MICASE) (e.g., C. Y. Lin, 2017; Skalicky et al., 2015), self-developed (e.g., Fuentes-Rodríguez et al., 2016; Polat, 2011), or both (e.g., Yeung, 2009). While most ready-made corpora are highly representative of the genres they aim to represent, some may suffer from a lack of representativeness, particularly in areas outside of corpus linguistics such as pragmatics. As a result, many researchers in this field have turned to using conversations from popular media such as sitcoms (e.g., Forcadell, 2016; Kim, 2014; Su, 2017), movies (e.g., Ionescu, 2020), and television (e.g., Sinkeviciute & Rodriguez, 2021) as data sources, due to the convenience of data collection and processing, as well as their flexibility in meeting various research objectives. However, it is important to examine whether conversations in popular media can adequately substitute for natural conversations in linguistic analyses. Specifically, it is necessary to consider whether conversations in sitcoms and other media differ substantially from those that occur in natural settings. This question warrants further investigation, as the use of popular media as data sources in linguistic analyses may introduce biases or limitations in the findings.
In addition to their usefulness in linguistic analysis, data from popular media have also been widely utilized as a resource for language learning and teaching. Researchers have examined the efficacy of using films in language teaching (e.g., Allan, 1985; Rose, 2001; Stempleski & Tomalin, 1990), and explored how movies, television, and sitcoms can be used more effectively to teach a second language (e.g., Allan, 1985; Candlin et al., 1982; Eisenstein et al., 1987; Grant & Starks, 2001; P. M. S. Lin, 2014; Pattemore & Muñoz, 2020; Ruck, 2022). Studies have shown that popular media can help develop learners’ vocabulary competence (e.g., Bisson et al., 2013; Csomay & Petrović, 2012; MacFadden et al., 2009; Neuman & Koskinen, 1992; Rodgers & Webb, 2011; Webb, 2010, 2011), pragmatic competence (e.g., Alerwi & Alzahrani, 2020; Barón & Celaya, 2022; Derakhshan & Eslami, 2020; Omar & Razı, 2022), and discoursal and stylistic competence (e.g., Marshall & Werndly, 2002; Meinhoe, 1998), as well as facilitate acculturation to foreign cultures (e.g., Meinhoe, 1998; Tolson, 2001). Several studies have found that learners with extensive exposure to popular media perform better in learning vocabulary (e.g., Bisson et al., 2013; Verspoor et al., 2011) and in developing writing competence (e.g., Verspoor et al., 2011).
The use of conversations in popular media (e.g., sitcoms, movies, television) as a substitute for natural conversations, either for research or pedagogical purposes, rests on the assumption that they are not significantly different from conversations in natural settings. Empirical studies have shown that the language used in films is “most representative of naturally-occurring data” (Rose, 2001, p. 309), and similar findings have been reported for internet television (e.g., P. M. S. Lin, 2014) and other settings (e.g., Abrams, 2014; Dynel, 2011; Quaglio, 2009; Richardson, 2010). However, other studies have found that the language in popular media differs to varying degrees from that in natural settings (Dose, 2013; Kozloff, 2000; Marshall & Werndly, 2002; Rose, 2001; Rossi, 2011). For instance, P. M. S. Lin (2014) has shown that different television genres resemble natural conversations to various extents, with entertainment and music programs having the least resemblance, and religion, news, and comedy programs having the most. Consequently, some scholars argue that the language in television, sitcoms, and movies can never be a viable substitute for natural conversations (Emmison, 1993; Schegloff, 1988), and that it is preferable to adopt natural conversations in compiling teaching materials and conducting research (Abrams, 2014; Gilmore, 2004; Huth & Taleghani-Nikazm, 2006; Wong, 2002). The controversies over the authenticity of conversations in popular media have led to the present study, which aims to compare the lexical features of conversations in sitcoms and natural conversations, and to reveal to what extent they resemble each other. It should be noted that this study focuses only on the lexical differences between the two genres, while acknowledging that there are other perspectives from which their differences could be examined.
Analytic Framework
This study aims to analyze and contrast the lexical features of the conversations in sitcoms and natural settings. Specifically, eight categories will be compared: discourse markers, modal verbs, intensifiers, downtoners, personal pronouns, vocatives, expletives, and religious words. These categories were selected because they are frequently discussed in the field of pragmatics. Furthermore, the study will also compare the average word lengths (i.e., the average number of letters in a word) and keywords (excluding proper nouns) used in sitcoms and natural conversations, which are two of the key analyses in corpus linguistics. In total, therefore, the study will compare 10 categorical features, as depicted in Figure 1.

Figure 1. Lexical features to be compared in the study.
Word Length
Word length refers to the average number of letters comprising the words in a corpus. The inclusion of word length as a measure of lexical features is due to the findings from previous research that have demonstrated significant differences in word length between natural conversations and prepared speech, such as writing (Fan, 2011; Shi & Lei, 2021). Sitcoms employ a language that is “prepared” for speaking, as it is first scripted and then spoken by the actors or actresses. This differs from natural conversations, which are mostly impromptu and unprepared. Therefore, the observation of word length in sitcoms and natural conversations can serve as an indicator to determine the authenticity of sitcom conversations in portraying those in natural settings.
Keywords
Keywords are words that appear “in a text or corpus statistically significantly more frequent than would be expected by chance when compared to a corpus” (Baker et al., 2006, p. 97). The comparison of keywords between sitcoms and natural settings can provide valuable insights into whether these two genres utilize similar lexical choices. In order to accurately assess the lexical features of each genre, proper names such as character, place, and institution names should be excluded from analysis, as these are often topic-specific and may not accurately reflect the generic features of the genres being compared.
Other Categories
The study will also evaluate other lexical features, such as discourse markers, modal verbs, intensifiers, downtoners, and so on, as indicated in Figure 1. The specific words under scrutiny are presented in Table 1, primarily based on previous research by Biber et al. (1999, pp. 554–556, 564–569, 1086–1088, 1110–1111), Traugot (2020), Halliday and Matthiessen (2004, p. 116), Richard and Schmidt (2010, p. 184), and Martínez (2018). Note that expressions such as you know, you know what, you see, and I mean will be treated as lexical rather than phrasal features, since each conveys a coherent, indispensable meaning as a whole and is categorized as a typical discourse marker.
Table 1. Words Selected to be Searched in Different Lexical Categories.
Methodology
Data Source
The data utilized in the study was derived from two primary sources. The first source was obtained from the classic sitcom Friends, which serves as a representative sample for sitcom conversations. The second source of data was drawn from the Santa Barbara Corpus of Spoken American English (SBC for short hereafter), which was used as a representative sample for natural conversations.
Friends Corpus
The Friends Corpus (FC for short hereafter) was obtained from the website Crazy for Friends (http://www.livesinabox.com/friends/scripts.shtml), a fan club that provides free transcripts of the sitcom for educational and entertainment purposes. According to Quaglio (2009, p. 30), the transcripts are “fairly accurate” and “extremely detailed”, which supports the reliability of using these data as samples of sitcom conversations. The choice of Friends as a sample was mainly based on its widespread popularity in America (and many other English and non-English speaking countries), and its significant impact on American culture, including language use among speakers of diverse groups (Quaglio, 2009, p. 12).
The investigation of the linguistic features requires manual tagging or refinement due to the potential for words to be used in both pragmatic and non-pragmatic ways. For instance, the word well is a common discourse marker in native conversations, but its use cannot be reliably identified through automatic corpus tools alone, as it can also function as a noun (e.g., He dug a
To ensure feasibility and comparability, the study analyzed the first three episodes of each of the 10 seasons, resulting in a total of 30 episodes. The FC consists of 77,199 word tokens, which fall into 4,649 word types, as indicated in Table 2.
Table 2. Basic Information About the Two Corpora in the Study.
SBC Corpus
SBC is a representative sample of contemporary spoken American English, consisting of recordings of naturally occurring interactions among speakers of different ages, genders, occupations, social backgrounds, etc. It mainly includes face-to-face conversations, as well as other types of language use such as telephone communications, pub talks, classroom lectures, meetings, etc. (Du Bois et al., 2000–2005). This makes it comparable to the language used in FC, in which face-to-face conversations are also the predominant form. SBC has 60 files, and this study chose the first 21 as a sample for natural conversations. This sample consists of 91,233 word tokens, which fall into 6,570 word types, as shown in Table 2. The selection of a reference corpus (i.e., SBC) that is larger than the observed corpus (i.e., FC) is in line with the established practices in corpus linguistics (Liang et al., 2010, p. 86).
Instruments
The present study utilized two software programs as instruments, namely WordSmith Tools 6.0 and Loglikelihood and Chi-square Calculator 1.0. WordSmith Tools is a computer program designed to analyze how words behave in texts. It consists of three distinct packages, including WordList, Concord, and KeyWords. WordList provides users with the frequency of words or word clusters within a corpus, while Concord enables users to observe the co-occurrence of a word or word cluster in context. KeyWords facilitates the identification of key words within a given corpus in comparison with others. By implementing WordSmith Tools, the present study compared the lexical similarities and differences between the two corpora.
Furthermore, the Loglikelihood and Chi-square Calculator 1.0, developed by Maocheng Liang at Beihang University, China, was used to calculate the degrees of difference between the two corpora in the 10 categories mentioned in Section 2. The chi-square test utilized in the program is the classic chi-square test with Yates’ correction for 2 × 2 tables (Liang, 2010).
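To make the statistic concrete, the following is a minimal sketch of a Yates-corrected chi-square test for a 2 × 2 table (item vs. all other tokens, in two corpora). It is not the calculator's actual code: the function names are hypothetical, and the sign convention (negative when the item is relatively less frequent in the first corpus) is an assumption based on how negative values are reported alongside underuse in this study.

```python
import math


def yates_chi_square(freq_a, size_a, freq_b, size_b):
    """Signed Yates-corrected chi-square for a 2 x 2 table.

    Rows are corpus A and corpus B; columns are the target item and all
    other tokens. The sign is an assumed convention: negative when the
    item is relatively less frequent in corpus A than in corpus B.
    """
    a, b = freq_a, size_a - freq_a
    c, d = freq_b, size_b - freq_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    # Yates' continuity correction: subtract n/2 from |ad - bc|, floored at 0.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * diff ** 2 / denom
    return chi2 if a / size_a >= c / size_b else -chi2


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(abs(chi2) / 2))


# Hypothetical word counts; only the two corpus sizes come from the study.
chi2 = yates_chi_square(freq_a=300, size_a=77_199, freq_b=200, size_b=91_233)
print(round(chi2, 2), round(p_value_df1(chi2), 4))
```

With one degree of freedom, an absolute value above 3.84 corresponds to p < .05. A p-value reported as .00 is presumably a value below .005 rounded to two decimals.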
Procedures of Data Collection and Analysis
The study employed several methodological procedures to analyze the lexical features of sitcom and natural conversations. First, WordSmith Tools 6.0 was utilized to obtain the general information of the two corpora, including type, token, TTR, STTR, and word length. Then, the software was utilized to extract the raw frequencies of the words listed in Table 1, and irrelevant instances, such as well used as a noun instead of a discourse marker, were eliminated manually. Finally, the raw frequencies were input into the Loglikelihood and Chi-square Calculator (Liang, 2010) to measure the degrees of difference in the lexical features between sitcom and natural conversations.
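The general corpus statistics named in the first step can be illustrated with a short sketch. This is an approximation under stated assumptions, not a reimplementation of WordSmith Tools: the tokenizer is a deliberately simple regular expression, the function names are hypothetical, and STTR is taken here to be the mean TTR over consecutive full chunks of 1,000 tokens.

```python
import re


def tokenize(text):
    """Lowercase word tokens (assumed, simplified rule; keeps e.g. don't whole)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Types divided by tokens, as a percentage."""
    return 100 * len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk=1000):
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    Falls back to the plain TTR when the text is shorter than one chunk.
    """
    ratios = [
        type_token_ratio(tokens[i:i + chunk])
        for i in range(0, len(tokens) - chunk + 1, chunk)
    ]
    return sum(ratios) / len(ratios) if ratios else type_token_ratio(tokens)


def mean_word_length(tokens):
    """Average number of letters per token (apostrophes are not counted)."""
    if not tokens:
        return 0.0
    return sum(len(t.replace("'", "")) for t in tokens) / len(tokens)


# Invented sample utterance, for illustration only.
sample = tokenize("Oh well, I don't know. Well, you know what?")
print(len(sample), len(set(sample)))
print(round(type_token_ratio(sample), 2), round(mean_word_length(sample), 2))
```

By simple division, the token and type counts reported for the two samples give plain TTRs of roughly 6.02% for FC (4,649 / 77,199) and 7.20% for SBC (6,570 / 91,233). Because plain TTR falls as a text grows, a chunk-based measure such as STTR is the more comparable figure for corpora of different sizes.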
Research Findings
Word Lengths
In the field of corpus linguistics, word length refers to the average number of letters in a word. It has been observed that different genres of language use tend to have varying word lengths. For example, Shi and Lei (2021) report that the average word length in spoken language is approximately 3.7 letters, while that in written language is around 4.4 letters (Fan, 2011). Therefore, it is necessary to investigate and compare the word lengths in FC and SBC to determine whether there are significant differences in language use between the two corpora.
As presented in Table 2, this study found that the average word length in FC is approximately 3.70 letters, which is quite similar to that of SBC (3.63 letters). This suggests that there are no significant differences in word lengths between the conversations in sitcoms and natural settings. This finding is consistent with the observation by Shi and Lei (2021) that spoken language tends to employ shorter, simpler words, typically consisting of about four letters.
Moreover, both FC and SBC exhibit a similar pattern in the frequency rankings of words of different lengths. Specifically, 4-letter words occupy the top rank, followed by 3-letter and 2-letter words, as shown in Table 3.
Table 3. Frequency of Words of Different Lengths in FC and SBC.
Note. RF = raw frequency; SF = standardized frequency (per 100,000).
Regarding general frequencies, the analysis shows that FC exhibits significant overuse of 1- and 3-letter words, whereas it significantly underuses 2-, 5-, 6-, 7-, 9-, and 10-letter words and words of more than 10 letters. This suggests that the language used in sitcoms tends to rely on shorter words than found in SBC, and thus may oversimplify the language used in conversations. Such findings may indicate that scriptwriters and/or playwrights of sitcoms hold inaccurate assumptions regarding the language used in everyday conversations, as words used in natural conversations are likely not as short or as simple as those in sitcoms.
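The raw and standardized frequencies by word length can be derived as in the sketch below. The function name and the pooling of words longer than 10 letters into a single bin are illustrative assumptions; the normalization base of 100,000 follows the table note.

```python
from collections import Counter


def length_distribution(tokens, per=100_000, cap=10):
    """Map each word length to (raw frequency, frequency per `per` tokens).

    Lengths above `cap` are pooled under the key cap + 1.
    """
    raw = Counter(min(len(t), cap + 1) for t in tokens)
    total = len(tokens)
    return {n: (rf, rf * per / total) for n, rf in sorted(raw.items())}


# Invented tokens, for illustration only.
tokens = ["i", "oh", "no", "you", "well", "know", "extraordinarily"]
for length, (rf, sf) in length_distribution(tokens).items():
    print(length, rf, round(sf, 1))
```

Standardizing to a common base is what allows the two corpora, which differ in size, to be compared length by length; the significance of each difference would still be tested on the raw frequencies.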
Keywords
Keywords in Friends
According to Chen (2012, p. 213), a keyword is defined as a word that occurs significantly more frequently in a given corpus than in a reference corpus. The Keyword method is a tool that can be used to measure whether there are statistically significant differences in word frequencies between two corpora. However, keywords are highly sensitive to topics, participants, and other contextual factors of the conversations. Therefore, proper nouns such as character names, place names, and institution names were excluded from the analysis to ensure that the identified keywords are relevant to the general language use in the corpora.
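A keyword extraction of this kind can be sketched as follows. The sketch assumes a log-likelihood keyness statistic with a cut-off of 3.84 (p < .05, one degree of freedom); the source does not state which statistic or threshold was configured in WordSmith's KeyWords, and the function names and the exclusion list are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, size_a, size_b):
    """Log-likelihood keyness of one word observed a and b times in two corpora."""
    e1 = size_a * (a + b) / (size_a + size_b)  # expected frequency in corpus A
    e2 = size_b * (a + b) / (size_a + size_b)  # expected frequency in corpus B
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, ref_tokens, exclude=frozenset(), threshold=3.84):
    """Words overused in the study corpus relative to the reference corpus."""
    n1, n2 = len(study_tokens), len(ref_tokens)
    if not n1 or not n2:
        return []
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    result = []
    for word, a in study.items():
        if word in exclude:  # e.g., character, place, and institution names
            continue
        b = ref.get(word, 0)
        if a / n1 <= b / n2:  # keep positive keywords only
            continue
        ll = log_likelihood(a, b, n1, n2)
        if ll >= threshold:
            result.append((word, ll))
    return sorted(result, key=lambda item: -item[1])


# Invented toy corpora and exclusion list, for illustration only.
study = ["oh"] * 30 + ["ross"] * 20 + ["the"] * 50
reference = ["oh"] * 10 + ["the"] * 90
print(keywords(study, reference, exclude={"ross"}))
```

Swapping the two arguments yields the words overused in the reference corpus, which is how a second keyword list for the other corpus could be obtained.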
The analysis of keywords revealed that FC significantly overuses various lexical items related to dating and weddings (e.g., married, wedding, dating), contractions (e.g., ‘t, ‘s, don), deixis (e.g., I, you, my), discourse markers (e.g., oh, y’know, well), phatic expressions (e.g., hey, guys, dude), honorifics (e.g., honey, sweetie, please), evaluation markers (e.g., great, good, weird), negation terms (e.g., no, not), prototypical speech act markers (e.g., sorry, thank, thanks), wh-words (e.g., what, how, why), attention raisers (e.g., look, listen, wait), psychological words (e.g., believe, want, feel), modal verbs (e.g., can, should, maybe), time markers (e.g., now, minute, Monday), sex-related terms (e.g., sex, breast), food terms (e.g., coffee, chip), religious words (e.g., God), etc., as presented in Table 4.
Table 4. Words That Are Significantly Overused in FC.
The excessive use of dating, wedding, and sex-related words in FC can be attributed to the show’s central focus on the lives of young adults. Nonetheless, the remaining differences between sitcom and natural conversations suggest that sitcom conversations tend to be more concise, as indicated by their frequent use of contractions. Contractions are strongly associated with conversations (Biber et al., 1999, p. 1129), which means that sitcom conversations are a reasonable reflection of natural conversations in this regard. However, the overuse of contractions implies that sitcom conversations may over-represent certain features of natural conversations.
Moreover, the overuse of discourse markers and deixis suggests that sitcom conversations are more interactive than those in natural settings, since one of the two major roles of discourse markers is to indicate interactive relationships between speakers, hearers, and messages (Biber et al., 1999, p. 1086), and deictic expressions in conversations are mainly used to build and maintain social relations (Kretzenbacher et al., 2020).
Furthermore, sitcom conversations tend to be more focused on participants’ involvement, as shown by their frequent use of phatic expressions, honorifics, and attention raisers. These linguistic devices overtly invite the addressees’ participation. Additionally, sitcom conversations tend to be more negative, as evidenced by their excessive use of negation markers, and more evaluative, as indicated by their heavy use of evaluation markers.
Keywords in SBC
The study found that the number of words that were significantly overused in SBC was relatively small. Specifically, the results indicated that SBC primarily overused 2 contractions (i.e., cause, ‘re), 7 third-party deixes (e.g., these, their, he), 1 sequence marker (i.e., then), and 1 religious word (i.e., Jesus), as is shown in Table 5.
Table 5. Words That Are Significantly Overused in SBC.
It is interesting to note the difference in the use of religious words in natural conversations and in sitcoms. While both Farr and Murphy (2009) and the present study indicate that religious words are common in natural conversations, FC and SBC show different preferences for the use of specific religious words. The overuse of Jesus in SBC suggests that it may be more commonly used in everyday conversations compared to God (with a frequency of 205 over 1), while FC seems to prefer the use of God over Jesus.
The difference in the use of cause between FC and SBC also highlights a potential difference in the way interpersonal relations are handled in natural conversations versus in sitcoms. The frequent use of cause in SBC suggests a greater emphasis on providing reasons or explanations for speech acts, which can help to avoid misunderstandings and maintain positive social relations.
In general, compared with the keywords for FC in Table 4, it can be inferred that SBC pays more attention to explanations (e.g., cause) and to people who are not present in the current turn of talk or places not adjacent to the speakers (e.g., their, he, there). Another interesting finding is the overuse of Jesus in SBC, because under similar settings FC prefers God in conversations.
Discourse Markers
Discourse markers are words or expressions that facilitate ongoing interactions and are loosely attached to clauses (Biber et al., 1999, p. 140). They have been a central issue in pragmatics for decades and are typically associated with spoken language. This study analyzed the occurrence of discourse markers listed in Table 1 and found that there are significantly more discourse markers in FC than in SBC (χ2 = 59.75, p = .00), as illustrated in Table 6. This suggests that sitcom conversations use more discourse markers than natural conversations.
Table 6. Comparisons of Discourse Markers Between Sitcom and Natural Conversations.
Table 6 reveals that FC employs significantly more discourse markers such as ah, all right, by the way, (right) now, oh, ok/okay, well, and you know what, while significantly underusing the markers of right, you know, and yeah. These results suggest that sitcoms tend to overuse discourse markers in general, and the number of overused markers in sitcoms is higher than that in natural conversations. Moreover, natural conversations tend to be more affirmative, overusing markers such as right and yeah, which mainly indicate positive evaluation of previous utterances. This finding is consistent with Table 4, where negative evaluations are more prevalent in sitcom conversations. Therefore, the present study suggests that natural conversations use fewer negations or negative evaluations than those in sitcoms.
Modal Verbs
The results from the corpus (Table 7) suggest that FC significantly overuses modal verbs in general (χ2 = 9.30, p = .00), and the modals can, could, and should in particular, while significantly underusing would in comparison with SBC. Only a small proportion of the modal verbs (4 out of 12) thus differ significantly between sitcom and natural conversations, although the overall frequency differs significantly. It is noteworthy that the modals overused by FC are primarily of low (i.e., can, could) and median (i.e., should) values (Halliday & Matthiessen, 2004, p. 116), and there is no discernible difference in the use of high-valued modals (e.g., must, have to) between FC and SBC. Both FC and SBC employ high-valued modals at a lower frequency, which may make the conversations less compelling or less offensive. However, FC differs significantly from SBC in its overuse of can, could, and should and in its underuse of would. According to Biber et al.’s (1999, pp. 491–496) classifications, can and could are markers of permission, possibility, or ability, and should is a marker of obligation and necessity, whereas would is a marker of volition or prediction. Thus, it could be posited that natural conversations, as exemplified by SBC, are more tentative or less compelling than those in sitcoms.
Table 7. Comparisons of Modal Verbs Between Sitcom and Natural Conversations.
Intensifiers
Intensifiers are a subset of words, typically adverbs, that modify gradable adjectives, adverbs, and verbs to augment the degree or intensity (Biber et al., 1999, pp. 554–555; Richard & Schmidt, 2010, p. 184). Notably, the use of intensifiers varies across different genres of English; for instance, intensifiers are rarely employed in academic prose (Biber et al., 1999, p. 564), and different varieties of English exhibit a preference for different types of intensifiers (Biber et al., 1999, p. 564).
The results of our study show that although the frequency of intensifiers in FC is significantly higher in total than in SBC (χ2 = 17.75, p = .00), this difference is primarily attributed to the significant overuse of so (χ2 = 42.52, p = .00) and significant underuse of very (χ2 = −5.05, p = .02) (Table 8). Notably, only 2 out of 18 intensifiers exhibited statistically significant differences between the corpora of FC and SBC. It is therefore inferred that while the choice of specific intensifiers may not differ drastically between the two corpora, sitcom conversations tend to incorporate more intensifiers, which could be attributed to their aim to create dramatic effects in the plot.
Table 8. Comparisons of Intensifiers Between Sitcom and Natural Conversations.
Regarding the significant differences in intensifiers between FC and SBC, it is noteworthy that the frequency distributions of the intensifiers so and very in SBC bear a closer resemblance to the findings in Biber et al. (1999, p. 565), which report that the frequencies of these two intensifiers are not substantially different. This observation underscores the fact that the lexical patterns of conversations in sitcoms such as FC diverge, to varying degrees, from those in natural settings.
Downtoners
Downtoners are a lexical category of words that function to indicate a reduction in the degree or intensity of a particular aspect of meaning (Richard & Schmidt, 2010, p. 184). These words, such as fairly, almost, somewhat, and partially, serve to mitigate the impact of the modified item (Biber et al., 1999, pp. 555–556). A quantitative analysis of the data presented in Table 9 reveals that the sitcom corpus (FC) exhibits a statistically significant overuse of the downtoner just (χ2 = 31.34, p = .00), while underusing other downtoners, such as less, probably, kind of, and sort of. These findings suggest that there are notable differences between the use of downtoners in sitcom conversations and natural conversations, with a fair proportion (5 out of 18) of the specific downtoners analyzed differing significantly. However, it should be noted that the overall frequency differences between the two corpora did not reach statistical significance.
Table 9. Comparisons of Downtoners Between Sitcom and Natural Conversations.
When considered in conjunction with the findings regarding the usage of intensifiers as discussed in the preceding section, it can be inferred that, on the whole, the conversations depicted in sitcoms (FC) have a tendency to amplify the degree of emphasis being conveyed (e.g., you’ve done
Personal Pronouns
Personal pronouns are a key aspect of deixis, and central to pragmatics research. As shown in Table 10, the present study reveals that the sitcom corpus (FC) employs a significantly greater number of personal pronouns in total than the natural conversation corpus (SBC) (χ2 = −384.84, p = .00), with a notable overuse of such pronouns as I, we, me, you, her, my, mine, your and her, and significant underuse of such pronouns as he, it, they, them, his, their, and its in comparison to SBC. These results suggest that sitcom conversations tend to favor the use of personal pronouns that are directly relevant to the participants engaged in the immediate conversation (e.g., I, we, you), while making infrequent references to third parties (e.g., he, they, them) who are not physically present during concurrent conversational exchanges.
Table 10. Comparisons of Personal Pronouns Between Sitcom and Natural Conversations.
Regarding the frequency distribution of pronouns, Table 10 demonstrates that both FC and SBC exhibit a preference for the use of first and second person pronouns, as the frequencies of pronouns belonging to these categories are generally higher than those of third person pronouns. This finding suggests that both sitcom and natural conversations tend to prioritize the reference of individuals who are “in immediate contact” (Biber et al., 1999, p. 333) in the ongoing exchange, as opposed to those who are not present during the conversation.
Nonetheless, it is worth noting that there are observable differences between FC and SBC with regard to pronoun usage. Specifically, FC displays a tendency to overuse first and second person pronouns in comparison to SBC, while also underutilizing third person pronouns. This finding highlights the importance of making reference to third parties during natural conversations, which appears to be a less prominent feature in sitcom discourse.
Vocatives
Vocatives are a class of words used to draw the addressee’s attention and/or maintain and reinforce social relationships in conversation (Biber et al., 1999, p. 1112). Examples include boys, buddy and sweetie. Analysis of the data (Table 11) reveals that FC significantly overuses vocatives in total compared to SBC (χ2 = −106.98, p = .00). Further examination shows that FC contains significantly more instances of vocatives such as baby, dude, (you) guys, honey, man/men, and sweetie, while exhibiting fewer occurrences of dear than SBC. These findings suggest that sitcom conversations tend to employ a higher frequency of vocatives to attract the attention of the individuals being addressed during conversation.
Table 11. Comparisons of Vocatives Between Sitcom and Natural Conversations.
It is worth considering that this difference in vocative usage may also be influenced by the specific registers employed in sitcom discourse. For example, Friends, the sitcom analyzed in this study, predominantly focuses on the lives of young individuals living in close proximity to one another, and as such, the characters in the show often address one another using terms such as dude, guys, and sweetie in order to capture their attention or elicit a response. Moreover, vocatives can serve to emphasize the interpersonal relationships between conversationalists, as demonstrated in the following example: “Well, you got here just in time. I really have to go,
Expletives
Expletives, which are considered taboo expressions or swearwords not typically employed or encouraged in everyday conversations (Biber et al., 1999, pp. 1094–1095), are also examined in this study. The results presented in Table 12 suggest that there is no significant difference in the overall frequency of expletive usage between FC and SBC (χ2 = −1.68, p = .20). The exceptions are the individual terms hell and shit: FC employs hell significantly more frequently (χ2 = 9.28, p = .00), whereas it uses shit significantly less frequently than SBC (χ2 = −22.13, p = .00).
Table 12. Comparisons of Expletives Between Sitcom and Natural Conversations.
These results suggest that the use of expletives is generally avoided in both sitcom and natural conversations, as reflected by their comparatively lower overall frequency in relation to other linguistic features such as discourse markers, intensifiers, downtoners, and vocatives in similar conversational settings. However, the data also reveal that expletives are more diverse and more frequently used in natural conversations than in sitcoms. This suggests that natural conversations, as illustrated by SBC, tend to exhibit a higher degree of “vulgarity” than sitcoms, as these words have the potential to offend to varying degrees (Biber et al., 1999, pp. 1094–1095), although, in rare or highly contextualized instances, they may also function as a marker of solidarity or politeness (Daly et al., 2004).
Religious Words
Religious words are frequently employed in conversations to convey intense emotions, particularly in response to highly negative experiences (Biber et al., 1999, p. 1094). Table 13 shows that, compared with SBC, FC employs a significantly greater number of religious words in total (χ² = 3.72, p = .00), particularly the phrase (my) God (χ² = 77.34, p = .00). However, FC underuses certain religious words such as Christ, heaven(s), Lord, and Jesus in comparison to SBC. This finding is partially consistent with Farr and Murphy (2009), who found that native English speakers commonly use God, Jesus, and Christ in natural conversations.
Table 13. Comparisons of Religious Words Between Sitcom and Natural Conversations.
The disparity in the usage of religious words between sitcom and natural conversations is evident: the former rely more heavily on the word God and make less use of other religious words such as Christ, heaven(s), Lord, and Jesus. This finding suggests that sitcoms tend to limit the range of religious words used in conversations, possibly due to linguistic and cultural constraints. In contrast, the more diverse usage of religious words in natural conversations implies that speakers draw on them as a means to convey intense negative emotions. The difference in the use of religious words between the two settings may therefore be attributed to the restrictions and norms that govern language use in each.
Conclusion
The present study investigated the lexical differences between conversations in sitcoms and natural settings. The findings suggest that sitcoms tend to employ shorter words, particularly three-letter words, although the average word length in the two settings is similar, at around 3.7 letters per word. This preference for shorter words in sitcom conversations contrasts with the comparatively longer words used in natural conversations.
An analysis of keywords further reveals that sitcom conversations, as exemplified by FC, exhibit more economy, interaction, involvement, and evaluation, whereas natural conversations tend to provide more explanations and refer to third parties. Additionally, the study identifies specific types of words that are overused in each setting, with discourse markers, modal verbs, intensifiers, personal pronouns, vocatives, and expletives being more prevalent in sitcom conversations, and downtoners and religious words being more frequently employed in natural conversations.
The findings of this study suggest that significant lexical differences exist between conversations in sitcoms and natural settings. Language researchers should therefore exercise great caution when using sitcom conversations as data sources, given their limited ability to depict the lexical features of natural conversations. Although sitcom conversations offer a more convenient and efficient means of collecting and analyzing data, language teachers and teaching material developers should be aware of these differences and adapt their use of sitcom conversations so that it reflects natural conversations more closely. However, it may not be feasible to adopt natural conversations directly for teaching purposes owing to imperfections such as false starts, overlaps, and repairs; natural conversations would therefore also require modification before being used for educational purposes. It should be noted that this study examines only lexical differences between sitcom and natural conversations; further research is needed to investigate other linguistic features (e.g., sentence types, syntactic features, speech acts) and to present a more thorough picture of specific phenomena (e.g., expletives, the discourse marker well, tag questions).
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Social Science Foundation (grant no. 19BYY225), Humanities and Social Science Fund of MOE (grant no. 22JJD740011), and Beijing Foreign Studies University (grant no. 2020SYLZDXM011; ZGWYJYJJ11Z002).
Ethics Statement
Not applicable.
Data Availability Statement
Data supporting the findings of this study are available from the corresponding author upon reasonable request.
