Abstract
This article investigates the interplay of lexical competition and socio-historical events through a close examination of the use of gambling and gaming based on large-scale synchronic and diachronic corpora. We first set the background for comparison through a synchronic study of the collocational patterns and grammatical relations of the two words using Sketch Engine. We show that gambling tends to be associated with negatively perceived activities and strong disapproval, whereas gaming tends to collocate with recreational activities, business, and technology. Using Google Books Ngram Viewer, we focus on the drastic diachronic changes in use of the two words, from competition to co-development. Based on corpora trends, we correlate the rise and fall of the two words and the change in their competition relation to particular socio-historical events: gold rushes, sports betting, the popularity of video games, and the gaming industry boom. The classical competition model of near synonyms remained valid until recent socio-economic events introduced additional and unique meanings for both words. The article thus shows that linguistic variations as collective human behavior changes can be leveraged to evidence other collective human behavior changes.
Keywords
Introduction
This study aims to show that important changes in patterns of collective human behavior can be corroborated and/or discovered by leveraging mature tools to analyze linguistic big data. In particular, we demonstrate that the largest accessible English corpora—enTenTen13 (19 billion words) and Google Books Corpus (361 billion words) (Michel et al., 2011)—are not unyielding but are instead powerful tools for knowledge discovery and linguistic analysis. With the right tools, the linguistic big data will yield information not readily accessible through other approaches. The approach we propose could be characterized as “Big Data Aided Armchair Linguistics” in the spirit of Fillmore (1992).
Gambling and gaming, two words with a close connection and rich implications for socio-economic life and human behavior, have long been the focus of social science research. In addition to the study of gambling- and gaming-related human behaviors, especially in terms of addiction (e.g., Potenza et al., 2019), gambling and gaming expressions are also studied in discourse studies (Catenaccio, 2015; Haakana & Sorjonen, 2011; Möring, 2013), conceptual metaphor theory (Lopez-Gonzalez et al., 2017), sociolinguistics (Yan, 2019), translation studies (Pan & Zhang, 2016, 2017), localization (Dong & Mangiron, 2018; Mangiron, 2017; Strong, 2018), and language sources in different discourse types (Ensslin, 2012). The lexical choices in relation to gambling and gaming activities are among the foci of studies. McGowan et al. (2000) compiled an annotated bibliography of the literature in the socio-cultural domain of gaming and gambling, bringing behavioral, text-based, and policy-driven research together to underline the complex links between gaming and gambling. Intriguing questions thus arise: what is the nature of the competition between these two concepts, and are such competition and usage changes reflected in and evidenced by linguistic usage?
The Oxford English Dictionary (OED, http://www.oed.com/) defines gaming and gambling with the same sense of “the action or practice of playing games, as cards, dice, etc., for stakes.” Yet, gaming has some other senses that gambling does not have, such as “the action of engaging in games or entertainments; merrymaking; sport” and “the playing of computer (video, etc.) games.” This “partial synonyms” account that restricts the similarity of the two words to one of the senses of gaming seems straightforward, but does not predict the difference in their usage, such as the use of gaming to convey positive polarity. Yoong et al. (2013) conducted critical discourse analysis using Fairclough’s three-dimensional framework to deconstruct a Malaysian lottery company’s strategy of promoting lottery activities as “not gambling but gaming.” Pan and Zhang’s (2016, 2017) diachronic studies discovered that the Macao government reframed the gambling business from social problem to entertainment industry through discursive processes in both Chinese and English. Dale (2018) studied the ambiguity of gambling and gaming in legal discourses using corpus linguistics methods to help lawyers improve their awareness of term ambiguity vis-à-vis their audiences to avoid miscommunication. However, these studies were rather restricted in terms of scope of data and time span. To the best of our knowledge, Li and Huang (2018) is the only large-scale quantitative study on the use of gambling and gaming, but it relies only on the single synchronic enTenTen13 corpus and its conclusion does not go beyond the distinction of semantic prosody, that is, gambling being more negative and gaming more neutral.
In this article, we will investigate three research questions based on the synchronic and diachronic distributional patterns of gambling and gaming extracted from linguistic big data:
Method
Our methodology generally belongs to the field of corpus linguistics as an empirical and quantitative approach to studying language in real life (Teubert & Krishnamurthy, 2007). In particular, we focus on identifying widespread patterns of naturally occurring language that may be overlooked by a small-scale analysis (Baker & McEnery, 2005). It is in this spirit that Huang and Yao (2015) viewed corpus linguistics as the precursor to big data linguistics. Corpus linguistic studies rely crucially on both corpus data and analytical tools. In this study, we adopted Sketch Engine (SkE) developed by Kilgarriff et al. (2004). SkE offers a range of functions, such as Concordancing, Thesaurus, Word Sketch, and Sketch Difference, and has been widely adopted in lexicography, discourse studies, translation studies, language teaching and learning, and so on. SkE can contrast grammatical and collocational relations between near synonyms by processing a huge amount of authentic data (Wang & Huang, 2017) and triangulate the findings obtained through its different functions (Li et al., 2018, 2020). Near synonym-driven research is one of the most productive approaches in corpus linguistics, pioneered by Atkins and Levin (1995) for English and Tsai et al. (1998) for Chinese. By minimizing the lexical contrast, the collocational differences extracted from corpora lead to pinpointed linguistic accounts. In an earlier study using SkE, Li et al. (2018) summarized three groups of grammatical relations (GramRels) from a total of 36 GramRels in SkE (that is, possessive relation, verb–noun relation, and modifying relation) to provide relevant comparisons of three Chinese synonyms. In this study, we selected nine GramRels and summarized them into three groups: coordination relation, verb–noun relation, and modifying relation.
We chose the English web corpus enTenTen13, which contains more than 19 billion words with rich metadata, as our main corpus. The gargantuan corpus size provides comprehensive coverage of a larger variety of linguistic properties. The data were downloaded from the internet in 2013, cleaned, deduplicated, tagged, and spam-filtered (Jakubíček et al., 2013). The last step is vital to our study as gambling and gaming are among the most popular spam words. The corpus is further divided into sub-corpora according to the top-level domains (e.g., .org, .edu, .com, .uk, and .us) from which the texts are retrieved. This allows users to differentiate between text sources to make comparison across regions.
We also studied the Google Books Corpus to add a historical dimension. The early version of the Google Books Corpus contains 5,195,769 digitized books and more than 500 billion words, with 361 billion in English (Michel et al., 2011). On the free, web-based platform, users can query word usage in one of the languages, or in one of the varieties of English for the period they are interested in. Although the corpus has been criticized, in particular for the quality of the metadata and OCR errors, scholars are still quite optimistic about its future applications (e.g., Nunberg, 2010). This corpus also helps to address Schützler’s (2018) lament about the problem of a lack of diachronic corpora large enough to enable analyses of less-frequent items in diachronic lexical studies. Michel et al. (2011) demonstrated a range of innovative studies in what they called “culturomics.” Twenge et al. (2017) examined trends in the use of seven taboo words in the Google Books Corpus from 1950 to 2008 and found that American culture has become increasingly accepting of the use of taboo words, consistent with higher cultural individualism. Drawing insights from the previous methodologies, our study focuses on the interplay between lexical competition and related socio-historical events.
To study patterns of meaning changes in relation to Research Question 1, we used the Thesaurus function to produce words with a close relation to the keywords. The distributional thesaurus lists words with similar collocational behaviors to the keyword and may include a set of synonyms, antonyms, hypernyms, and hyponyms (Kilgarriff et al., 2014). We hypothesized that these closely related “neighbors” would enable us to identify new meanings and meaning variations. We also examined the extent to which gambling and gaming are similar to each other with this function and generated common and only patterns through Sketch Difference.
To detect patterns of correlational changes in collective human behavior, as in Research Question 2, we selected three GramRels—that is, “definitions,” “X is a . . .,” and “. . . is a X”—to provide rich descriptive information about the status quo of gambling and gaming in context, with which updated definitions of the terms can be formulated and the latest trends in gambling and gaming activities can be observed. Sentences 1 and 2 are examples of how gaming is represented in the web corpus:
Alternate reality
For the diachronic variations, we utilized three complementary sources, namely the Online Etymology Dictionary, the online Oxford English Dictionary, and the Google Ngram Viewer (GNV), to track the historical changes in the use of gambling and gaming in terms of meaning and frequency. The Online Etymology Dictionary (https://www.etymonline.com/) is a map of the wheel-ruts of modern English, providing explanations of what words meant and how they sounded 600 or 2,000 years ago. The Oxford English Dictionary, containing both synchronic and diachronic information of the English language, provides information on the meaning, history, and pronunciation of 600,000 words from across the English-speaking world. The GNV (http://books.google.com/ngrams) provides time series of the frequency of use of words and phrases (i.e., n-grams) over a period of five centuries in eight languages, covering 6% of all books ever published (Lin et al., 2012). Based on the diachronic patterns extracted from the Google Books Corpus, we attempted to map the changes in important historical events to identify the significant social and cultural changes underlying lexical competition and variations. The Gricean Cooperative Principle (Grice, 1975) and Relevance Theory (Sperber & Wilson, 1986) were drawn on for predictions about the use of gambling and/or gaming in specific socio-historical contexts. We further mapped gambling and gaming to the Suggested Upper Merged Ontology (SUMO; Niles & Pease, 2001) through WordNet (Pease & Fellbaum, 2010) to understand the conceptual basis for the contrast.
Finally, Research Question 3 regarding regional variation was examined with the Sketch Diff function to sketch and contrast the variations in use of each keyword in the two sub-corpora of enTenTen13—that is, the American English corpus and the British English corpus. The differences were also mapped to possible relevant events in each society.
Results
The results of our analysis are presented here, based on the data and methodology described in the previous section.
Words in the Thesaurus
The 32 words most closely related to gambling and gaming extracted by SkE thesauri from the enTenTen13 corpus are listed in Table 1. Each of these close “neighbors” can be considered to represent a particular semantic dimension of the keywords. Gambling and gaming are found to be the most closely related word to each other lexically and grammatically (similarity score = 0.376), which reflects their close ties and implies both the difficulty and the necessity of distinguishing one from the other. Closer observation reveals that the thesaurus of gambling can be classified into three major thematic groups: business and industry (
Thesaurus Lists of Gambling and Gaming by Descending Similarity Score.
Sketch Diff: Common and Only Patterns
The Sketch Diff function compares two words at a time, generating common patterns of the pair and only patterns of each word. For this study, we compared gambling and gaming as nouns. We opted for the default setting of a minimal frequency of 10 for both words and a maximum number of items in a GramRel of the common block of 30. The three GramRels examined are as follows: coordination relation, verb–noun relation, and modifying relation.
First, a coordination relation is defined by the context-free pattern of “X and/or . . . .” As shown in Table 2, the colors green and red indicate the tendency of listed words to collocate with gambling and gaming, respectively, in that particular relation. The deeper green a word is, the more likely it is to co-occur with gambling; the deeper red a word is, the more likely it is to co-occur with gaming. When a word, such as computing, does not have an attested “and/or relation” with gambling in the corpus, it is considered to be an only pattern for gaming. The common patterns and only patterns in coordination relation provide evidence of semantic differences based on the clustering of similar words. Gambling tends to be used in juxtaposition with words associated with an unhealthy lifestyle or even crime (e.g., prostitution, smoking, pornography, porn, addiction), whereas gaming is juxtaposed with those related to technology and recreational activities (e.g., browsing, multimedia, anime, playback, computing). Words that frequently collocate with both gambling and gaming (e.g., poker, casino, lottery, betting, racing, entertainment) in the white area of Table 2 are their common patterns. This indicates the situations in which the two words are to some extent interchangeable. It is worth noting that words in the common patterns still exhibit different degrees of tendency to collocate with the two terms. For example, gambling collocates much more strongly with casino, whereas gaming collocates relatively more with entertainment.
Coordination Relation of Gambling and Gaming.
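The partition into common and only patterns described above can be illustrated with a minimal Python sketch. This is a simplified illustration rather than Sketch Engine's own implementation: the function name `sketch_diff`, the treatment of the minimum frequency as the criterion for a collocate being "attested," and all counts are assumptions introduced here for exposition.

```python
from typing import Dict, Set, Tuple


def sketch_diff(
    counts_a: Dict[str, int],
    counts_b: Dict[str, int],
    min_freq: int = 10,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (common, only_a, only_b) collocate sets for two keywords."""
    # Assumption: a collocate counts as "attested" for a keyword only if it
    # co-occurs with that keyword at least min_freq times in the relation.
    attested_a = {w for w, n in counts_a.items() if n >= min_freq}
    attested_b = {w for w, n in counts_b.items() if n >= min_freq}
    common = attested_a & attested_b
    return common, attested_a - attested_b, attested_b - attested_a


# Hypothetical co-occurrence counts, invented for illustration only.
gambling_counts = {"casino": 300, "poker": 250, "prostitution": 120, "computing": 0}
gaming_counts = {"casino": 80, "poker": 200, "prostitution": 2, "computing": 90}

common, only_gambling, only_gaming = sketch_diff(gambling_counts, gaming_counts)
print(sorted(common), sorted(only_gambling), sorted(only_gaming))
```

Under these toy counts, a collocate such as computing falls into the only pattern of gaming because it is not attested with gambling, mirroring the example given in the text. The graded color coding in Table 2 would additionally require a salience score, which this sketch does not attempt to model.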
Second, we examined verbs that collocate with gambling and gaming in verb + noun or noun + verb relations, in which gambling and gaming are either the object or the subject. The verb collocates reveal information about how gambling/gaming activities are carried out, perceived, or regulated by relevant parties. The results automatically generated by SkE inevitably contain occasional noise. Hence, all results were manually checked to avoid inadvertent errors. We found that while gambling noticeably tends to collocate with verbs related to regulation, policing, or control (e.g., legalize, counsel, violate, state-sponsor, deduct, criminalize, prohibit, ban, curb, outlaw, and oppose), gaming tends to co-occur with verbs pertaining to favorable changes or technological advances (e.g., redefine, console, revolutionise [ze], innovate, develop, and experience).
Third, in terms of modifying relation, we focused on both adjectives and nouns that have a modifying relation with gambling and gaming, including the GramRels of “adjective predicates,” “modifiers of gambling/gaming,” and “nouns modified by gambling/gaming.” The results of the three GramRels are shown in Tables 3 to 5. Again, on the one hand, gambling is found in collocation with adjectives with a strong negative meaning (e.g., immoral, sinful, unlawful, risky, regressive, unregulated, rampant, evil, illegal, pathological). On the other hand, gaming collocates with adjectives with a neutral or positive meaning, indicating its pervasiveness and predominance (e.g., massive, public, booming, populous, mainstream, ubiquitous) as well as its technological (e.g., real time, PC, 3D, etc.) and recreational (e.g., interactive, pleasurable, sensible, casual) nature. The nouns modifying and being modified by gambling can be classified into three main themes, that is, venues or facilities (e.g., den, roulette, mecca, establishment, sites), gambling-related actions or activities (e.g., drinking, bankroll, bet, game, bonus), and (mostly negative) consequences (e.g., debt, hell, winning, monopoly, addict, addiction). By contrast, nouns strongly associated with gaming tend to be either technical/technological terms (e.g., console, 3D, retro, PC, laptop, video, software, platform, headset, tabletop) or the proper names of large gaming companies or organizations (e.g., Boyd, Cantor, SK, WMS).
Adjective Predicates of Gambling and Gaming.
Modifiers of Gambling and Gaming.
Nouns Modified by “Gambling/Gaming.”
Representation of Gambling and Gaming in Activities
Tables 6 and 7 present information about how gambling and gaming activities are represented through the three GramRels. To focus on the distinctness of each activity, we introduce the only patterns first in Table 6, summarizing the only patterns of gambling and gaming for the three GramRels, with the words sorted in descending order of salience. Table 7 illustrates the common patterns with the frequency of collocation in three GramRels for each keyword. Yet again, gambling occurs with nouns that denote very negative evaluations—for example, addiction, risk, disorder, problem, and sin—evoking clearly negative representations of the activity involved, although it also collocates with a range of generic umbrella terms such as behavior, act, phenomenon, practice, and activity. By contrast, the collocates of gaming mainly pertain to the participants and stakeholders in the gaming industry—for example, developer, provider, market, company, iNetBet—or the platforms, tools, and types of gaming—for example, software, medium, genre, category, live, console, site—conveying a neutral image.
Only Patterns of Gambling and Gaming in Three GramRels.
Frequency of Common Patterns of Gambling and Gaming in Three GramRels.
The common patterns of gambling and gaming summarized in Table 7 show that both words are most frequently represented as a game, activity, business, industry, hobby, or pastime. They also have a similar number of instances of being described as a sector or an issue. However, notable differences in representation are still evident through these common patterns. For example, gambling is much more strongly represented as an addiction (n = 128) than gaming is (n = 17), whereas gaming is far more frequently portrayed as a hobby (n = 210) than gambling is (n = 59). The fine-grained differences match and support our earlier generalization regarding the polarity differences of these two near synonyms.
Regional Variation
In this section, we focus on regional variations in the use of gambling and gaming in the two major varieties of English: American English and British English. Gambling shows notably different distributions in coordination relation, noun–verb relation, and, in particular, modifying relation between American English and British English. Because the relative sizes of the U.K. (roughly 1,182 million words) and the U.S. (164 million words) corpora strongly favor U.K. usages, which could lead to “over-magnified” U.K.-only patterns, our discussion here focuses on the U.S.-only patterns.
One salient group identified as U.S.-specific modifiers of gambling is the names of American football teams, for example, Seahawks, Texans, and Ravens (Table 8). While this is not a surprising finding, it is remarkable that no U.K.-specific team names (e.g., Manchester United) occur in the only patterns. This indicates the popularity of betting on football games in the U.S. (cf. sports betting in the “Discussion” section). Nouns modified by gambling reveal similar differences between British and American English.
Only Patterns of “Modifiers of Gambling/Gaming” in the U.S. Sub-Corpus.
There are also clear differences between British and American English in the modifiers of gaming (Table 8). The predominant modifiers of gaming in the United States tend to be sports-related, such as football, rugby, hockey, alternate, preseason, playoff, and NFC (National Football Conference). This again consolidates the above findings in relation to the modifiers of gambling, that sports betting tends to be a common practice in the United States. Apart from the differences in collocations, there are regional differences in terms of the two words’ frequency of occurrence (see Table 9).
Frequency of Occurrence of Gambling and Gaming in enTenTen13.
Taking the entire enTenTen13 corpus as a reference, the normalized frequency of both words in the U.K. sub-corpus is very close to that in the corpus as a whole, whereas that of gambling and gaming in the U.S. sub-corpus is more than 3.1 and 1.5 times higher, respectively, than in the enTenTen13 corpus as a whole (Figure 1). This indicates that both gambling and gaming are used much more frequently in the United States, most probably reflecting the prevalence of gambling and gaming activities there. We used the UCREL log-likelihood wizard (Rayson & Garside, 2000) to test for a significant difference in the frequency of each word between the U.K. and the U.S. sub-corpora. A log-likelihood of 6.6 or above indicates that the difference is significant at the p < .01 level. The log-likelihood scores show that the frequency differences between the U.K. corpus and the U.S. corpus for both gambling and gaming are significant. In addition, gambling is more frequently used than gaming in the United States, whereas gaming is more commonly used in the United Kingdom and the enTenTen13 corpus as a whole.

Normalized frequency of gambling and gaming in enTenTen13.
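For readers who wish to reproduce this kind of comparison, the following minimal Python sketch shows how a per-million normalized frequency and a two-corpus log-likelihood score of the type described by Rayson and Garside (2000) can be computed. The sub-corpus sizes are the approximate figures reported above; the raw word counts are hypothetical placeholders, not the study's data, and the function names are our own.

```python
import math


def per_million(count: int, corpus_size: int) -> float:
    """Normalized frequency: occurrences per million words."""
    return count / corpus_size * 1_000_000


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-corpus log-likelihood (G2).

    a, b: observed frequency of the word in corpus 1 and corpus 2
    c, d: total size (in words) of corpus 1 and corpus 2
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    total = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            total += observed * math.log(observed / expected)
    return 2 * total


UK_SIZE = 1_182_000_000  # approximate size reported in the text
US_SIZE = 164_000_000    # approximate size reported in the text

# Hypothetical raw counts, invented for illustration only.
uk_count, us_count = 20_000, 9_000

print(per_million(uk_count, UK_SIZE), per_million(us_count, US_SIZE))
print(log_likelihood(uk_count, us_count, UK_SIZE, US_SIZE))
```

A score above the critical value cited in the text would then be read as a significant frequency difference between the two sub-corpora. Normalizing before comparison matters here because the U.K. sub-corpus is roughly seven times larger than the U.S. one.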
Diachronic Change
We then used dictionaries and GNV to track diachronic changes in the use of gambling and gaming in terms of meaning and frequency. The Online Etymology Dictionary provides a brief historical account:
gambling: (n.) 1784, “habitual indulgence in gambling,” verbal noun from gamble (v.). Gambling-house attested by 1794.
gaming: (n.) 1500, “gambling,” verbal noun from game (v.). From 1980s in reference to video and computer games. Gaming-house is from 1620s; gaming-table from 1590s.
OED traced gambling a little further back to 1700 in the quotation “The Room where it stood was an old gambling Cock-loft” from a translation related to the Spanish novel Don Quixote. OED’s earliest record of gaming with the meaning “the action or practice of playing games, as cards, dice, etc., for stakes” is in 1501, consistent with the finding in the Etymology Dictionary, although the word is spelled as gamyng. In addition, OED reveals that gaming and gambling are etymologically linked by the obsolete word gameling (earliest instance 1594), which means “the playing of games; (perhaps) gambling.” Gamble, the verb form of gambling (1700), first occurred in the 1750s. Based on this information, we can confirm that both gambling and gaming are derived from “game,” following the sequence below:
game, v. → gaming, n. → gameling, n. → gambling, n. → gamble, v.
We queried GNV for the occurrence of gambling and gaming in the Google Books English corpus (Figure 2). Before the 1750s, gaming had achieved sizable usage while gambling was rarely used. The frequency of use of both words fluctuated noticeably in the early stages due to the scarcity and uneven distribution of historical data, so we focus on trends after the 1750s, and divide them into three periods. In the first period (1750–1900), the use of gaming increased rapidly and remained at a high level until the 1820s, but steadily declined toward the end of the 19th century. By contrast, the use of gambling quickly rose from a low level, surpassed gaming in 1847, and continued to increase (cf. the Gold Rush era in the “Discussion” section). In the second period (1900–1980s), gambling maintained a frequency around three times as high as that of gaming, whereas the First World War (WWI) and the Second World War (WWII) saw temporary downturns in the use of both words. Finally, from the early 1990s, use of the two words rose sharply in parallel, although gambling was always more frequently used than gaming by a wide margin (cf. elaboration on socio-historical factors in the “Discussion” section).

Google Ngram Viewer of Gambling and Gaming (1750–2008).
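The point at which one word overtakes the other can be located mechanically once yearly relative frequencies have been exported. The sketch below is a minimal illustration in Python, assuming the two series are available as year-to-frequency mappings; the function name `first_crossover` and all numeric values are invented for exposition and are not GNV output.

```python
from typing import Dict, Optional


def first_crossover(
    series_a: Dict[int, float], series_b: Dict[int, float]
) -> Optional[int]:
    """Return the first year in which series_a exceeds series_b
    after having been at or below it, or None if that never happens."""
    previously_at_or_below = False
    # Only compare years for which both series have a value.
    for year in sorted(set(series_a) & set(series_b)):
        if series_a[year] > series_b[year]:
            if previously_at_or_below:
                return year
        else:
            previously_at_or_below = True
    return None


# Invented toy values (arbitrary units); not taken from the Ngram Viewer.
gambling = {1845: 1.0, 1846: 1.2, 1847: 1.6, 1848: 1.8}
gaming = {1845: 1.5, 1846: 1.4, 1847: 1.5, 1848: 1.3}

print(first_crossover(gambling, gaming))
```

In practice, raw yearly values are noisy, so a smoothed series (for example, a moving average) would usually be preferable before searching for a crossover.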
The trends exhibited in the overall English corpus closely match the trends for both the American and the British English sub-corpora in GNV, although gambling occurred at markedly higher frequencies in American English than in British English over the past 150 years, particularly from the 1990s onwards.
Discussion
Based on the above results, we now discuss some key socio-historical events and factors that have contributed to the rise and fall in usage of the two words over the last two centuries. We draw on the Gricean Cooperative Principle and Relevance Theory to pinpoint the pragmatic motivations for preferring one word over the other, and relate the word usage to the relevant conceptual frameworks, based on the Ontology–Lexicon Interface (OntoLex, Huang et al., 2010). We argue that the dominant model that captures the relations of the two words shifts from the competition model in the 19th century to the co-development model from the 1990s.
The Gold Rush Era: Rise of Gambling and Decline of Gaming
The increase in the use of gambling and the decline in use of gaming largely coincided with the “gold rush” period in the English-speaking world, peaking around the years of the California Gold Rush (1848–1855). Gambling in gold-mining areas has been recorded in English-speaking countries such as Australia, New Zealand, South Africa, the United States, and Canada (cf. Courtwright, 1996; Fetherling & Fetherling, 1997). In the United States, the years following the discovery of gold in California (1848) witnessed the influx of gold miners and the sprouting of saloons and gambling halls “everywhere” (Courtwright, 1996, p. 71). Courtwright noted that gambling was part of the gold rusher’s life—“[i]n such a risk-all atmosphere, with death and illness daily prospects, gambling was a natural pastime” (Courtwright, 1996, p. 72). In the eyes of one San Franciscan, “[e]verybody gambled [. . .] that was the excuse for everybody else” (Courtwright, 1996, p. 73).
GNV captures moderate use of the “gold rush” lemma in American English around the U.S. gold rush era, along with much more frequently occurring alternative expressions such as “gold discovery” and “gold mining.” The continued increase in the occurrence of the “betting_NOUN” lemma was also noticeable in this period. Gold rushes emerged as a socio-historical event in which the practice of gambling and (money) betting reached a considerable volume, and the word gambling gained the momentum for a steady rise in usage. The frequency of occurrence of gambling quadrupled in both American and British English in the 19th century, whereas the use of gaming declined by more than two thirds in both varieties of English (cf. Figure 2). Based on the trends and timing, the gold rush is likely to have been the catalyst among other events favoring the popularity of gambling.
The dominance of gambling at this stage can be predicted by both the Gricean sub-maxim of quantity—that is, “make your contribution as informative as is required (for the current purposes of the exchange)” (Grice, 1975, p. 45)—and the principle of maximizing the relevance to cognition in Relevance Theory (Wilson & Sperber, 2006). Gambling entails sufficient and precise enough information to denote betting-for-money activities, whereas gaming sounds vaguer and is less effective in achieving the communicative purpose. The former therefore eventually overtook the latter around the mid-19th century.
The Surge of Gaming: The 1980s Onwards
Since the late 1980s, both gambling and gaming have shown a sharp and continuous rise in frequency count (cf. Results). We discuss the reasons for the sharp rise in the use of gaming in this section and that of gambling in the section “Gambling and Sports in the United States.” We found two factors underlying the recent surge in the use of gaming: (a) the desire of the industry and government to switch from gambling to gaming in their discourses and (b) the substantial spread of (computer-based) video games from the late 1980s.
First, recent studies (e.g., Pan & Zhang, 2016; Yoong et al., 2013) have revealed that the gambling industry and governments tend to shift their usage from gambling to gaming when they refer to the industry in an attempt to change the public’s perception and reframe the socially problematic business as an important source of revenue. Gaming is more favorable than gambling in terms of its semantic and conceptual properties. Gaming tends to be represented as a market, a company, a medium, or a genre, whereas gambling often collocates with words with a negative meaning—for example, addiction, risk, disorder, problem, or sin (cf. Results). The SUMO network reveals more fine-grained differences. SUMO (Niles & Pease, 2001; Pease, 2011), the only formal ontology that has been mapped to all of the approximately 117,000 word senses in WordNet lexicons (Fellbaum, 1998), identifies the lexicalized conceptual differences between gaming and gambling, although both words are considered to be complex events. According to SUMO, gaming is both a Contest and a Recreation or Exercise, which is defined as “A Contest whose purpose is the enjoyment/stimulation of the participants or spectators of the Game.” By contrast, gambling is a Game and Betting at the same time, and Betting is defined as “A Financial Transaction where an instance of Currency Measure is exchanged for the possibility of winning a larger instance of Currency Measure within the context of some sort of Game.” Since gambling is a sub-type of gaming, we can observe that shifting from using gambling to gaming serves to de-emphasize the characteristics inherent in Betting while embracing the characteristics that are conceptually closer to gaming. In terms of communication, this preference is an attempt to “optimize” the meaning by strengthening the positive meaning and dissociating from the negative meaning (Sperber & Wilson, 1986).
Second, the emergence of (computer-based) video games created strong demand for the use of gaming. Video games debuted in the mid-20th century and were commercialized in the early 1970s, but also experienced several crashes and recessions (Wolf, 2012), for example, the great crash in North America in 1983. It is generally believed that the industry was almost single-handedly revitalized by the Japanese corporation Nintendo in 1985, and video gaming culture has become well established since the widespread success of the Nintendo Entertainment System in the United States, followed by Europe, Australia, and other regions (Consalvo, 2006). Since the 1980s, video gaming has become a major form of entertainment, especially for the young. Figure 3 shows that the “gaming_NOUN” lemma increased in frequency from the late 1980s at a speed similar to that of “video games” and “computer games,” suggesting a strong positive correlation between the spread of video games and the increase in the use of the word gaming. Gaming provides sufficient and precise information about the playing of video or computer games, satisfying both the Gricean maxim of quantity and the Relevance Theory principle of maximizing relevance to cognition. In addition, based on SUMO conceptual ontology, gaming is not only a synonymous alternative and euphemism for gambling but also a hypernym of video gaming and wagering activities, suited to a much wider range of usage domains.

Figure 3. “Gaming,” “video games,” and “computer games” in English (1900–2008, GNV).
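The positive correlation suggested by the frequency curves for gaming and video games could be quantified with a Pearson coefficient over yearly frequency series. The following is a minimal sketch rather than the procedure used in this study; the function name pearson and all series values are illustrative placeholders and are not data drawn from GNV.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two series around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder relative frequencies for consecutive years (NOT real GNV values)
gaming_freq = [1.0, 1.2, 1.5, 2.1, 2.8, 3.9]
video_games_freq = [0.5, 0.7, 1.1, 1.6, 2.4, 3.3]

r = pearson(gaming_freq, video_games_freq)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 would be consistent with the parallel rise described above, although a high correlation between two upward trends does not by itself establish a causal link.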
Gambling and Sports in the United States
Gambling tended to be used more frequently in American English than in British English over the past 150 years in the sub-corpora in GNV and also more recently in enTenTen13 (cf. Results). We would argue that this reflects the practice of sports betting in the United States (cf. Figure 4). The popularity of betting on sports in the United States has been clearly documented in gambling studies. For example, Thompson (2015) observed that “making wagers on the results of games may be the most popular form of gambling in North America and it is certainly the most popular form of illegal betting in the US” (p. 387). We also found that the most salient usage of gambling in American English occurs in the context of (professional) sports (cf. the “Regional Variation” section).

Figure 4. “Gambling” and “sports betting” words in American English (1930–2008, GNV).
Interestingly, sports betting was actually prohibited in the United States by the Professional and Amateur Sports Protection Act of 1992, except in four states that had already legalized it: Delaware, Montana, Nevada, and Oregon (Dayanim et al., 2018). The use of gambling in relation to sports increased at the same time as sports betting became illegal in most states because gambling activities still took place outside of the four states, and the U.S. government and the press wanted to underline the illegality of these activities and differentiate them from gaming. In other words, the use of gambling to refer specifically to sports betting satisfied the need to maximize meaning by underlining its (in most cases) illegal status, while satisfying the communication need to optimize meaning by allowing gaming to refer to sports that did not involve betting. In terms of Gricean maxims, this is similarly a competition between the maxim of relevance and the supermaxim of “being perspicuous” (that is, avoiding ambiguity; Grice, 1975, p. 46). This accounts for the rise in use of both words in parallel from the late 1980s in American English. Here we must emphasize that such correlations were surmised and corroborated in our research, but not proven. For instance, we suspect that the formation and growing power of governing bodies of sports in the mid-1990s (Forster, 2016) may also have contributed to the increased usage of gaming, but our data simply do not indicate a strong enough pattern to support this hypothesis.
In terms of semantic prosody, we observed that the widely used sports-related collocations of gambling in American English largely have the effect of mitigating the harms and problems associated with gambling activities while highlighting the interactional and recreational features of sports as an integral element of gambling (cf. Results). By contrast, the largely negative terms collocated with gambling in the U.K. corpus probably reflect its narrower usage in British English.
An explanation for the greater frequency of occurrence of gambling in the United States than in the United Kingdom can also be constructed based on ontology. Sports-related terms, as featured collocations of gambling in the U.S. corpus, tend to meet the desirable conceptual associations of participants in or spectators of games, including enjoyment and stimulation as well as recreation or exercise. In this favorable view of sports, gambling on professional sports tends to be considered more socially acceptable. In fact, sports betting continued to grow regardless of the legal constraints in the United States, and the federal prohibition was struck down by the U.S. Supreme Court in 2018, which opened the way for individual states to legalize it.
The Competition Model and the Co-Development Model
The chronological data from this study provide evidence not only for the competition between gambling and gaming in earlier times but also for their rapid rise in parallel in recent decades. This evokes two different models.
The fact that gambling and gaming are near synonyms, albeit in a hyponymic relation, suggests that they should be in competition. The earlier trend in usage bore out this prediction: gambling exhibited a steady rise in usage from the late 18th century into the early 20th century, whereas gaming was in continuing decline. The sum of their frequencies was roughly stable, suggesting that gambling, the newcomer, was to a large extent replacing gaming in usage. We can observe bi-grams in which gaming was gradually replaced by gambling, leading to the shift, for example, from “gaming debts” to “gambling debts” and from “gaming houses” to “gambling houses” around the 1840s. The competition model prevailed in this period.
However, from the 1990s, both gambling and gaming increased exponentially in frequency of use. A new model emerges, which we call the “co-development model.” The co-development period of the two words corresponds to a time that saw the opening up and expansion of gaming businesses on a transcontinental scale, from North America to Europe and the Asia-Pacific region (Thompson, 2009). In the United States, given the staggering “underground” sports gambling market, there has been extensive debate, in official settings and on the web, surrounding the voter referenda and the legalization of sports betting in 2018. The booming industry across the world, including in Macao, Singapore, Australia, Canada, South Korea, and the United Kingdom, created an unprecedented need to use the words gambling and gaming in both professional and mass communication.
Apart from the (external) social factors, we hold that, linguistically, the co-development of gambling and gaming is possible only if the two words stand as distinct lexical options, each uniquely suitable for certain communicative settings and purposes. These words are therefore not fully interchangeable and are both needed to communicate gambling/gaming matters. We have observed that gambling and gaming are complementary in terms of their functions in the context of sports betting in the United States. Therefore, with the recent boom in the gaming industry, gambling and gaming co-existed and greatly increased in their frequency of occurrence, exhibiting a shift from the competition model to a co-development model, each with different additional semantic functions—that is, video games for gaming and sports betting for gambling.
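The distinction between the two models can be made concrete by comparing the direction of the two frequency trends within a period. The sketch below is only an illustration under simplifying assumptions: the helper names slope and classify_period are hypothetical, the series are placeholder values rather than GNV data, and the check uses trend direction alone without testing whether the summed frequency stays stable.

```python
def slope(values):
    """Least-squares slope of a series against its index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_period(series_a, series_b):
    """Label a period by the direction of the two usage trends.

    'competition'    : one word rises while the other falls
    'co-development' : both words rise
    'other'          : anything else (both fall, or a flat trend)
    """
    slope_a, slope_b = slope(series_a), slope(series_b)
    if slope_a > 0 and slope_b > 0:
        return "co-development"
    if slope_a * slope_b < 0:
        return "competition"
    return "other"


# Placeholder series (NOT real GNV values)
early_gambling = [1.0, 1.5, 2.0, 2.5]   # rising
early_gaming = [3.0, 2.5, 2.0, 1.5]     # falling
recent_gambling = [2.0, 2.6, 3.5, 4.8]  # rising
recent_gaming = [1.0, 1.8, 3.0, 4.9]    # rising

print(classify_period(early_gambling, early_gaming))    # competition
print(classify_period(recent_gambling, recent_gaming))  # co-development
```

A fuller treatment would also have to decide how long a period must be and how much noise to tolerate before a slope counts as a genuine rise or fall; those choices are left open here.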
Conclusion
This study has used two of the largest language data sets (enTenTen13 and Google Books Corpus) with SkE and GNV to examine gambling and gaming. We found that gambling tends to have a more negative semantic prosody, whereas gaming is more neutral. The OntoLex approach further casts light on their conceptual differences. Diachronically, three periods emerged: two early periods of competition that resulted in a shift of dominance from gaming to gambling, followed by recent decades in which usage of both terms rose in parallel. The competition of cognitive and communication needs in Relevance Theory (Sperber & Wilson, 1986; Wilson & Sperber, 2006) nicely accounts for the dynamic co-variation of the two terms, as does the application of different Gricean maxims. The changes in the pattern of co-variations occurred in the context of possible socio-economic catalysts such as the mid-19th century gold rushes, and the late-1990s success of video-game products, as well as the popularity and semi-legalization of sports betting in the United States. The shift from the competition model of near synonyms to the unusual co-development model seems to be the result of a coincidental simultaneous surge in the popularity of video games and sports betting. That is, these surges created semantic space for gaming and gambling, respectively, freeing them from the paired competition relation of the past 300 years.
To conclude, we wish to draw attention with this study not only to the potential of linguistic big data–driven research but also to the simple fact that any linguistic change is a change in collective human behavior. Observation, explanation, and prediction of collective human behavior changes might be the single most impactful research issue in the humanities and social sciences. Yet historical changes in collective human behavior are extremely difficult to pin down. Based on the self-evident truth that changes are either a reaction to or instigated by other changes, we were nonetheless able to use the historical usage record of a pair of near synonyms that represent a significant human behavior to surmise and later corroborate potential major changes. The co-dependency of gambling, sports, and gaming is in fact well documented (Macey & Hamari, 2018). This ability to provide evidence for changes in collective human behavior may well be the most valuable contribution of linguistic big data.
Footnotes
Acknowledgements
Longxing Li and Vincent X. Wang would like to acknowledge the research project MYRG2019-00162-FAH of the University of Macau, which provided the PhD scholarship for Longxing Li.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article was supported by the MYRG2019-00162-FAH of the University of Macau and the Joint Supervision Scheme with the Chinese Mainland, Taiwan, and Macao Universities sponsored by the Hong Kong Polytechnic University (Project No. G-SB97).
