Abstract
The paper explores the potential of big data analytics for researching anti-immigrant discourse. We emphasize contextualization as an essential element of research and follow a hybrid approach inspired by best practices of computational content analysis, combining human hermeneutic expertise with supervised machine learning to classify a corpus of comments in online news communities in Singapore over 6 months (N = 399,225). The paper highlights how big data analytics can provide a nuanced and critical apprehension of immigrant-related discourse in large social media datasets.
Keywords
Introduction
Anti-immigrant attitudes are said to have become increasingly negative worldwide, and hostile coverage of immigrant issues “has been a staple for most mainstream media” (c p.2). It is often through incendiary political debates (Hovden and Mjelde, 2019), prejudices, xenophobia, or hate speech that rhetoric about immigrants comes to be depicted. The communication scholarship often associates the rhetoric about anti-immigrant attitudes with ethnonationalism and racism (Akerlund, 2021; Daniels, 2018; Ekman, 2019; Farkas et al., 2018; Laaksonen et al., 2020; Matamoros-Fernandez, 2018; Merrill and Åkerlund, 2018; Poole et al., 2019; Siapera et al., 2018). This lets us presume that the anti-immigrant vote and discourse are mainly driven by identity-linked resentment.
Some scholars have recently pointed to inconsistencies in such characterization. They argue that the media or the political elites inaccurately convey an impression of increased identity-driven hostility toward immigrants in public communication. Dennison and Geddes (2019) maintain that attitudes towards immigrants are more favorable in Europe than the political discourse has suggested since the migration crisis in 2015; they lament “several common assumptions by policymakers about popular attitudes to immigration: that they are negative and increasingly so; that this negativity fuels populist movements; that they are linked to racism and xenophobia; and that they are malleable and should be altered accordingly” (Dennison and Geddes, 2019, p.106). According to the authors, voting for anti-immigration parties correlates more with the salience of immigration in political discourse than citizens’ attitudes toward immigrants. Similarly, the latest General Social Survey in the US shows that Americans view immigration far more positively than the political discourse has suggested over the last ten years (Hout and Maggio, 2021); while two-thirds of adults surveyed in the 1990s called to reduce immigration, only one-third asked for it in 2018. For Bonikowski (2017), there is no surge in ethnonationalism but an increased resonance of the discourse, which channels the expression of other pre-existing attitudes. He argues that ethno-nationalist discourses may enable aggrieved citizens to connect negative social experiences and undesirable political changes with an apparent deleterious effect of immigration. Bonikowski argues that the perceived deterioration of collective status is a latent frame, and discourse against immigrants may be misrepresented as an issue uniquely related to intergroup conflict. Those positions raise a counterintuitive ontological problem: does anti-immigrant rhetoric solely express resentment towards immigrants? The question challenges current assumptions and research practices in public discourse about immigrants.
The current scholarly work on immigrants also says little about how anti-immigrant discourse is “enclaved” in the broader public sphere. The data used for studies on anti-immigrant attitudes often examine radical actors located at the periphery of the public communication space, inadvertently focusing on a particularly antagonistic public. This rich literature attached to reactionary sites remains confined to a “dominant focus on the extreme” (Titley, 2014a: 51) – i.e., the German Politically Incorrect News (PI-News) (Lee, 2015), the Austrian Unzensuriert (Krzyżanowski and Ledin, 2017), the Finnish MV-Lehti (Ylä-Anttila et al., 2019), and the Swedish Avpixlat (Ekman, 2018)—and limit our understanding of the wider landscape in which such discourse operates.
Scholarly work about attitudes towards immigrants in the public sphere also displays an unevenly spread research agenda. Studies tend to scrutinize the influence of Western media or politicians (Eberl et al., 2018; Heidenreich et al., 2022; Nortio et al., 2021). Despite attention from scholars in the Global South, notably in Southeast Asia (Jun, 2019; Liu, 2014, 2019; Yi and Jung, 2015), there is still a relative lack of consideration for diverse and alternative configurations related to immigrants (Vertovec, 2019), restricting opportunities for an exhaustive episteme.
Alternatively, social media has offered a prodigious venue for observing how social issues such as anti-immigrant attitudes circulate (Rogers, 2009). Online platforms have created a discursive space where individuals outside the traditional centers of power can voice their concerns, allowing investigation of alternative interpretations rooted in street vernacular (Dahlberg, 2011; Hauser, 2022). However, even though the progress of social computing and the abundance of digital data help us analyze social phenomena, it is essential to recognize that data-intensive research poses challenges.
The set of practices associated with big data often embodies a positivistic approach that overlooks the inherently political nature of data (Boyd and Crawford, 2012; Gitelman, 2013; Iliadis and Russo, 2016). A paramount concern with big data studies in immigrant-related literature centers around ensuring that the assemblage and analysis of the data do not forgo the influence of identities and the multiplicity of viewpoints (i.e., color blindness) (Nikunen, 2021). Scholars have observed that big data studies frequently neglect important concepts such as situated knowledge and positionalities, failing to evaluate whether data collected to understand specific social mechanisms does not convey forms of subjugated knowledge or ideological biases (Cooky et al., 2018). Those issues are fundamental when researching issues related to minorities (Matamoros-Fernandez and Farkas, 2021).
Additionally, the advent of automated software tools and algorithms, or computer-based text analysis methods (CTAMs), has brought about significant advancements in detecting and measuring latent social science constructs within vast datasets. They offer welcome opportunities for scalable and replicable results. The problems associated with CTAM's ability to decipher complex textual meanings or operationalize accurate measures for social constructs (Van Atteveldt et al., 2021) are increasingly mitigated by more robust methodological and validation frameworks (Baden et al., 2022; Birkenmaier et al., 2023). Recent studies have shown that combining algorithmic extraction of coherent and recurrent patterns with an inductive exploration using expert human knowledge permits a more rigorous and valid analysis of complex natural discourse (Baden et al., 2020; Nelson, 2020).
This study aims to address the knowledge gaps identified above. We use big data analytics and computational text analysis method to refine our understanding of narratives about immigrants and evaluate their incursion into public discourse while addressing the risks of potentially unproblematized, biased, and decontextualized data.
Study context: Singapore
Singapore offers an interesting case study for anti-immigrant discourse in a non-Western context. The case study is an opportunity to contrast perspectives on the rhetoric by offering a different socio-political configuration and a unique immigration profile.
Unlike many other immigration-prone nations, Singapore is, at its core, distinctively multicultural and multiracial. Since its heyday as a trading post in the nineteenth century, the city-state has sustained an open-door policy for immigrants and, in the past two decades, has pursued one of the world's most significant experiments in mass immigration. The total population has grown by over 40% through immigration; London and New York grew by 25% and 13%, respectively (World Economic Forum, 2017). The government also controls immigration flows, mainly low to mid-skill migrant workers (80% of the foreign workforce) (Ministry of Manpower, 2021).
The political regime in Singapore has left little room for political entrepreneurs to articulate alternative discourses and has likely impacted how citizens talk about immigrants in public. Public speech about racial Others comes with strong injunctions. The People's Action Party, which has governed the city-state since 1959, has kept a stronghold on political activities through a sophisticated legal infrastructure, and harmful speech related to races and ethnicities has been restricted by the Maintenance of Religious Harmony Act since 1990. Besides, through intricate socialization mechanisms, commonly called OB markers or out-of-bound markers (Hallin and Mancini, 2011), the people of Singapore have learned to adeptly navigate discussions concerning the delicate subjects linked to race or religion; this internalization of self-censorship has served to foster a culture of political discipline within the populace, as observed by Rodan in 2003. Emerging evidence suggests that the Singaporean government has also ingeniously maintained public compliance through a light-touch approach toward Internet censorship (Seng, 2008; Soon and Soh, 2014), allegedly engaging in guerrilla-type activism (Tan, 2020) and “counter-insurgency” against critics on social media (Li, 2007). An intriguing aspect of this strategy is the reported involvement of the “Internet Brigade,” a team of digital media consultants who employ subtle tactics to create a favorable public opinion on controversial policies while co-opting dissenting voices (Tan, 2020). In April 2019, the Online Falsehoods and Manipulation Bill (POFMA) was introduced as the first-ever effort garnered by a national government to regulate social media, with a focus on curtailing the proliferation of false or misleading information online (Vaswani, 2019). The landmark effort is seen by proponents of freedom of speech as inadvertently limiting public expression in the digital sphere
The restraints on public discourse have allegedly not reined in outbursts against immigrants online. The issue of anti-immigrant discourse has gained prominence since the general election (GE) in 2011; after the GE in 2016, it was designated as the government's most perilous concern (Soon and Samsudin, 2016). Singaporeans have reportedly and increasingly voiced anger against immigration on social media, protesting inequitable policies or demanding more measures to differentiate between citizens and noncitizens (Matthews and Jiayi, 2016). Much of this discontent is directed toward the recruitment policies in place for Foreign Talents (FTs), a commonly used term to refer to bright students, young professionals, and wealthy individuals, primarily of Chinese or Western origin, whose migration has played a crucial role in fueling Singapore's thriving economy (Yang, 2014). In 2020, journalists in the mainstream media expressed worries about the direction of the rhetoric surrounding immigration, cautioning against the emergence of xenophobic sentiments and framing the issue in terms of ethnocentrism and nativism (Han, 2020; Tai, 2014, 2020).
Online comment sections
Comment sections on news websites have become the most popular and dynamic spaces for readers to engage with news stories and express their opinions, offering a good source of participatory data (Tenenboim and Cohen, 2015; Weber, 2014). These platforms offer an outlet for individuals living under authoritarian-prone regimes to vent their frustrations “before they take their gripes to the streets” (MacKinnon, 2011: 33), thus acting as a safety valve (Hassid, 2012).
Comment sections have often been criticized as anti-discursive cesspools, where constructive conversation is drowned out by negativity and hostility (Peacock and Van Duyn, 2021). However, commenters, often driven by social interaction motives, can engage in quality discussions with other users (Springer et al., 2015). In specific settings, such as commenting on the Facebook platform, visibility to friends and relatives is a potent deterrent to disruptive behavior and improves the quality of commenting (Rowe, 2015) while possibly affecting the proclivity to be outspoken (Hille and Bakker, 2014).
Additionally, counterpublics, traditionally characterized by their struggle for visibility, have found in comment sections a discursive sphere that interacts with the dominant editorial lines (Toepfl and Piwoni, 2015). Comment sections thus allow us to cast a broader net and extract data in secondary communication threads, shedding light on less visible, but not necessarily less meaningful, narratives developed among a subset of a public that struggles to impregnate their alternative interpretations in the mainstream arena (Hauser, 2022). Access to large threads of data in online comment sections gives renewed opportunities to analyze counter-arguments in the immediate spatial vicinity of mainstream opinions (Toefl and Piwoni, 2015, 2018), offering a wider instantiation of public views and valuable data points for analyzing the structure of the debate about immigrants among citizens.
Discursive frames about immigrants
This section reviews the vast communication scholarship on anti-immigrant attitudes and categorizes the identified interpretative frames, which will be used as a gateway into interpreting data.
Conceptually, frames are synthetic latent dimensions of a text that help confer coherence and meaning to any type of discourse. In practice, communication scholars have resorted to frames to identify salient elements in news stories and used them as valuable apparatus for analyzing the potential influence of news media and the political elite on the public (De Vreese, 2012).
Identifying frames can be done inductively, but the difficulty in replicating the outcome (Hertog and McLeod, 2001) has made pre-defined operationalizations of standardized frames (deductive-led approach) a common choice for scholars studying immigration-related content (Lecheler et al., 2019). For instance, “issue frames,” one such standardized framing analytical device, has offered a well-tested investigative path that involves looking in a text for concrete issues linked to immigrants—i.e., attached to the economy, security, public welfare, or powerlessness (Greussig and Boomgaarden, 2017). Such frames are traditionally identified by indicator questions in manual content analysis (Semetko and Valkenburg, 2000) or search strings in automatic content analysis (Burscher et al., 2016).
However, frames are abstract notions (Matthes and Kohring, 2008), and the effort to identify or measure them may disregard ambiguities and critical viewpoints (Gamson et al., 1992; D'Angelo et al., 2019). In immigrant-related literature, frames are criticized for producing analytical outcomes with a distinctly descriptive emphasis (Lecheler et al., 2019).
Therefore, we summarize the literature on anti-immigrant sentiments while reflecting on political or ideological nuances and grouping the diverse narrations into theoretically simple and consistent categories. We aim for parsimony to facilitate our comprehension of the multiple dynamics underlying discourse by capturing the smallest possible number of frames that reflect the most significant number of accounts about immigrants.
Boundary-making frame
Boundary-making frames about immigrants highlight otherness and rigid dichotomic representations (“them” against the legitimate “us”). Such a frame belongs to the theoretical field of intergroup conflicts (Ceobanu and Escandel, 2010) and is evident in xenophobia, nationalism, and racism. These notions have different cultural content (based on the concepts of foreigner, nation, or race), but they imply positive differentiation of the dominant group and drive prejudicial and discriminatory attitudes (Tajfel, 1982). Immigrants are seen as threats to the nationals’ economic well-being (jobs, housing, or social welfare benefits) or the cultural system of meaning underlying religious beliefs, values, or traditions (Atwell Seate and Mastro, 2016).
The boundary-making frame is highly salient in the scholarship on media coverage of immigrants (Philo et al., 2013). It draws attention to immigrants’ legal status or their inclination to abuse welfare systems (Lawlor and Tolley, 2017), their negative economic impact, such as increasing unemployment (McLaren et al., 2018), their disruptive influence on the public order and criminality (Greussig and Boomgarden, 2017), their negative effect on national cohesion, and national or local identity (Abrajano et al., 2017).
New media scholars have also shown the prevalence of the boundary-making frame in politicians and citizens-generated content, which often portrays immigrants via alarmist or prejudicial speech and imageries (Hakoköngäs et al., 2020; McSwinney et al., 2021). Online communication is known for accentuating the negativity and toxicity of such discourses toward immigrants (Devlin and Grant, 2017; Rettberg and Gajjala, 2016).
Blame frame
Blame rhetoric against those who control much of society's economic, political, and social domains is also a common form of negative discourse against immigrants (Waisbord, 2018). It is a shift from a binary representation (“us” against “them”) to a triangular consciousness (“us” against “the elite” who are perceived to support “them”). Scholars have documented how talks about immigrants target the political elite (Heiss and Matthes, 2020). According to Mouffe (2005), the anti-immigrant discourse has a “strongly xenophobic character” while emphasizing that “[….] multiculturalism is perceived as being imposed by the elites against the popular will” (Mouffe, 2005: 69). Studies of the anti-immigrant discourse among far-right groups on Facebook in Europe have shown that conversations about immigration regularly incriminate the political authorities, who are perceived as becoming deaf to the plight of those who feel threatened by immigrants and, thus, fail to protect the people; this may lead to blaming political leaders, more often than the immigrants themselves (Klein and Muis, 2018).
Self-victimization frame
Victimhood is a critical frame related to immigrants in media studies; it emphasizes humanitarian points of view and often labels immigrants as victims of prior abuse or the host country's inability to provide them with the conditions needed to achieve their dreams (Benson, 2013; Chioularaki and Stolic, 2017).
However, in various contexts, scholars have reported instances where citizens who, despite belonging to the national majority group, claim they are victimized by immigration to the point of claiming that nationals are the object of discrimination. Bloch et al. (2020) showed how anti-illegal immigration advocates in the US converge on social media around the belief in “White injury” and allege that immigrants play the “race card” to channel more help at the expense of the citizens. Such claims of injustice and “reverse racism” are increasingly salient worldwide (Atkins, 2019; Pantti et al., 2019), especially among extreme right-wing parties, as illustrated by Paula Hanson's ‘It's OK to be White” campaign in Australia (Sengul, 2022).
Self-victimization discourse highlights a seemingly normative incoherence (i.e., reverse discrimination) and a different repertoire. The rhetoric turns the social hierarchy upside down as minorities are alleged to be better supported than the self-proclaimed majority. Some scholars view such attitudes as acts that silence, demean, and de-historicize painful racialized experiences (Song, 2014). Others, notably historians, see in such claims a re-actualization of specific groups’ past experiences with immigration and link sentiments toward immigrants with a process of compensation for the lower-class members of society. This position focuses on the fact that the sense of belonging to a homogeneous national group may enable the less endowed (i.e., blue-collar workers) to substitute perceived social exclusion with national inclusion (Rosanvallon, 2021) and increase their self-perceived sense of dignity (Lamont, 2009; Sullivan, 2006).
The self-victimization frame, thus, does not primarily suggest intergroup conflicts but a deleterious collateral effect of the social hierarchy. By claiming victimhood, people appear to root the discourse toward immigrants in their anxieties about status; they point to the insecurity of declassification or collective fears of status loss, as suggested by Bonikowski (2017) or Norris and Inglehar's work on cultural backlash Norris and Inglehar's (2019).
Methods
We followed a hybrid method for content analysis (see Baden et al., 2020; Nelson, 2020) and combined different approaches to ensure a theoretically sensitive capture of the meaning in social media comments.
We first referred to the results of an adjacent exploratory qualitative analysis of interview data from Singaporean citizens (Emes and Chib, 2022), which provided a preliminary categorization scheme in coherence with our theoretical framework developed in the preceding section. This also helped to identify unusual textual signifiers (i.e., specific jargon, local vernacular, and imageries) that may be overlooked in systematic computational output.
The deductive analysis of the large-scale corpus followed several steps as we combined supervised algorithmic extraction of patterns with human-based classification (Baden et al., 2020; Nelson, 2020). The first analytical step involved human coders identifying patterns in a subset of the corpus. This interpretive stage allowed us to validate and refine the preliminary categorization scheme by checking for new possible representations. The human-annotated dataset was then used to train the classification algorithm following a supervised classification algorithmic strategy (Eisele et al., 2023). To minimize noise and ambiguities (Nicholls and Culpepper, 2021), we adopted a “dictionary-plus-supervised learning” that combines supervised learning with a dictionary approach (Dun et al., 2021; Dobbrick et al., 2022).
Sampling
Most user-generated comments are extracted from the Facebook sites of ten of the most popular online news communities (see Table 1). The rest were retrieved from Reddit's r/Singapore site, where community-oriented users share experiences, opinions, and insights on the city-state and its culture. Reddit is known for its laissez-faire content policy and for spreading hate speech and toxic cultures (Gaudette et al., 2021). Table 1 displays all the details of the 11 community sites included in the sample.
Online community sites included in the sample.
English is the language for national identification in Singapore (see Note 1); we thus restricted our corpus to English-speaking comments. To retrieve comments on Facebook, we used a social media data-extracting tool, Netvizz (Rieder, 2013). Reddit discussion threads were scraped through an Application Programming Interface using the Python package Praw. The data structure in the comments sections of both platforms allows for the extraction of similar metadata (e.g., upvotes/likings and number of replies) to support our analysis.
Research on publicly available data is not considered human subjects research and does not require an IRB review according to AoIR guidance (Fiesler et al., 2020); we, however, sought IRB approval given the socio-political setting of our study. Social media users do not expect their activity to be used for research and are not offered opportunities for informed consent (Zimmer 2020). We thus reflected on the sensitivity of the content, preserving users’ privacy by adopting anonymization measures (i.e., deleting profile names and pictures). We also adhered to the platforms’ terms of service that were active during data collection.
Comments, along with metadata on likes/upvotes, comment counts, and post IDs, were collected from 1 June, 2018, to 1 December, 2018, before the enactment of the POFMA bill (see Note 2). Data collection yielded 4973 posts and 480,676 comments. We filtered all comments containing the Roman Alphabet (dropping emojis, hypertext links, and Chinese characters), leaving N = 399,210 comments in our corpus (our Broad-Scope Corpus).
We adopted a dictionary-based approach by keyword searching. The dictionary-building procedure proceeds as follows: (a) read a set of literature on immigrants, with pre-selected thematically coherent words attached to the target concept (immigrants) (Boomgaarden and Vliegenhart. 2007; Lind et al., 2019) (b) augment that list using the local literature on immigration in Singapore and involve “human experts” (e.g., Singaporean interviewees). The details and final list of keywords are available in Note 3.
The keyword-restricted or Narrow-Scope Corpus consists of n = 31,855 comments, of which 12% (n = 4020) were selected for manual coding via a stratified proportionate random sampling strategy. Comments contain an average of 67 words, which is sufficient for meaningful coding; most commenters online adopt strategies, often referred to as textism, to reduce typing efforts without impeding readability or the quality of the meaning conveyed (Boot et al., 2019).
Manual content analysis
Four coders, including the author, were separated into two groups, independently coding half of the corpus each (over 2000 comments each). Codes are mutually exclusive. A stepwise method was adopted throughout the coding process; after every 300 comments, inter-rater agreement was calculated between the pairs, discussed, and refined if necessary. If unsatisfactory, coders went through repeated rounds of coding until good reliability was achieved (Hruschka et al., 2004). Although resource-intensive, this method clarified the definitions of codes and ensured good reliability (see Table 2). Examples of annotated comments are available in Table 3.
Codes and intercoder reliability (Krippendorff's alpha coefficient).
Example of human-annotated comments.
PAP: People's Action Party; FT: Foreign Talent
Computational automated text analysis
We chose Bert Tokenizer as a natural language processing algorithm whose vector embedding representations can recognize same-meaning words (but not homonyms) and facilitate contextual disambiguation (Naseem and Musial, 2019).
Google's Bidirectional Encoder Representations from Transformers (BERT)—i.e., the case-sensitive base model used by the Devlin et al. (2019) seminal paper—was used as the classifying algorithm. BERT has shown promising results for tasks such as Semantic Textual Similarity or Natural Language Inference (Zhang et al., 2019). The model can be generative and recognize gaps in a sentence (Masked Language Model) or predict sentences in a text (Raff, 2022). Trained models on BERT have performed well on short Tweets, which are likely to contain misspellings, sarcasm, textisms, and slang terms (Ilic et al., 2018).
Sampling adjustment techniques (RandomOversampling) were used to address issues of unbalanced data that could create biases for the majority class prediction (Mohammed et al., 2020). After rebalancing the dataset, we had a training set of 11,705 cases, of which 15% (1990) were used as a test set. After iteratively adjusting the parameters, we reached a highly satisfactory performance of the prediction model for each frame (see Table 4).
Performance of the prediction classifier for each frame.
Level of insertion of immigrant-related discursive frames
The level of immigrant-related discourse insertion depends on the frequency at which the issue is mentioned (Boomgaarden and Vliegenthart, 2007). It is defined as the chance of reading a comment about immigrants drawn from the pool of comments read by an average social media user. Figure 1 maps the communication sphere; comments posted on a site in the upper right corner have likely higher visibility and interactivity levels.

Average visibility and interaction level per site in the sample.
The score for each comment is operationalized as follows:
Visibility score
Interaction score (M = 0.72, SD = 0.95) assesses the level of the public response to a comment, measured by the product of
Interaction score of
Results
Distribution of immigrant-related discursive frames
The distributions of the codes in the human-annotated sample and the Narrow-Scope Corpus automatically classified using BERT are displayed in Table 5 below. “Unrelated” comments contain selected keywords but are not related to immigrants; commenters usually discuss issues of home (about Singaporeans) or world politics. During our study, the neighboring Malaysian General Election attracted significant attention in the Singaporean public sphere following the unprecedented victory of the 93-year-old Mahathir's coalition. The discussion added substantial noise to our analysis.
Frames frequency in the different corpora.
The automated textual analysis of the Narrow-Scope Corpus shows that a major part of the comments is not related to immigrants (up to 75%), suggesting that our model may be able to discriminate between foreign or home policy matters and specific immigration-related issues.
We assessed the reliability of our classification algorithm by randomly selecting 50 human-annotated comments and 50 classified by supervised machine learning (a total of 100) in our Narrow-Scope Corpus to assess possible inconsistencies (Pilny et al., 2019). We found an agreement ranging from 87.75% to 100% (see Note 4).
The results suggest that immigrant-related issues are little discussed in online public communication. The proportion of comments about immigrants is low in all spheres of communication (state-owned, independent, and alternative) (see Figure 2). Overall, about a third (an estimated 35% in the Narrow-Scope Corpus) of the negative discourse related to immigrants directly targets immigrants. About a fifth of the comments are linked to the blame discursive frame (about 15%). Self-victimization is detectable in about 37% of the comments.

Proportion of immigrants-related comments in each sphere of communication. Note: The proportion of frames is measured as a percentage of the total number of comments in each of the three spheres.
Insertion of immigrant-related discursive frames
The visibility and interaction scores for the identified frames were computed in the Narrow-Scope Corpus. The difference of means using ANOVA statistics highlighted that frames in comments do not statically significantly vary in their visibility level F(4) = 1,323, MSE = 1.196, p = 0.259 or their interaction level F(4) = 1.134, MSE = 1.102, p = 0.338. Self-victimization or boundary-making discursive frames appear to be the most frequently used anti-immigration discursive frames, but they do not appear to be significantly more visible or attract significantly more reactions. Figures 3 and 4 show the relatively similar distribution of immigrants-related comments’ visibility and interaction level, whatever their frame in the communication sphere.

Distribution of the interaction score for immigrants-related frame (n = 31.857).

Distribution of the visibility score for immigrant-related frame (n = 31.857).
Limitations
The analysis shows that content about immigrants likely consists of about 2% of the comments in our corpus, but this result can only be indicative. Contrasting the salience of the topic with other politically charged matters is difficult, the literature offers little material for an informed comparison in different contexts. The data was also extracted during a period devoid of national elections, and electoral cycles are known for increasing hostile rhetoric against migrants (Dekeyser and Freedman, 2023).
We did not control for the topics of the posts under which comments were retrieved to consider how previous (or lack thereof) conversations set an agenda triggering reactions among the commenters, but it is fair to assume that posts in the news forum are unlikely to diverge from the pro-immigrants political line of the government as journalists do not see political advocacy and dissent a part of their responsibility (George and Venkiteswaran, 2019; Hao and George, 2012). Alternative sites, which are more likely to attract counter-discourses, do not appear to display significantly more hostile comments about immigrants.
Former studies have shown that in authoritarian-prone regimes, users see social media as affording surveillance (Marwick, 2012; Pearce, 2015), possibly thwarting their urges to deviate from internalized behavioral codes like the “OB markers” in Singapore. Such considerations emphasize online commenters’ psychological and social restrictions and the behavioral calculus that disclosure implies, possibly inciting users to withdraw from sensitive conversations such as those about immigrants in Singapore. The results must be analyzed within the broader contextual framework of Singapore's controlled media space, and we can speculate that anti-immigrant discourse is expressed in ways that evade sanctions.
Last but not least, the abstract nature and complexity of the text analyzed (textism and local vernacular) require us to be cautious. Synthetic constructs with high abstraction levels, such as frames, can decrease the measure's efficiency when classified by computer-assisted content analysis (up to 24%) (Baden et al., 2022).
Discussion
This article points to flaws and lacunae in our reading of anti-immigrant discourse and suggests a need to refine our understanding. This section links the findings with extant literature and reflects on the substantial implications.
We aim to make the anti-immigrant discourse's semiotic and political complexities evident while leveraging the analytical power of the best practices of computer-assisted classification. People's experiences with immigrants were analyzed through their production of digital data, and user-generated comments in online communities were taken as naturalistic data points. This came with material (i.e., curation system), political (i.e., controlled media space), and cultural (i.e., self-censorship) structures that were deliberate input in the investigation.
Our observations highlight a multi-faceted public representation of immigrants and perplexing themes. The content related to immigrants appears restricted (estimated at 25% of the Narrow-Scope Corpus), suggesting that immigration issues may not be publicly overly discussed in this specific context. As expected, immigrants are portrayed as foreign Others via negative identity-driven, sometimes discriminatory language, but such discourse merely amounts to half of the content about immigrants. A significant part of the anti-immigrant discourse is related to political issues about leadership or the status of citizens. Immigrants are thus more commonly represented as a collective political corporeal that intensifies polarization in the national political community (blame frame) or weakens the perceived self-worth of citizens (self-victimization frame). Our findings suggest that along with exclusionary identarian-driven claims, anti-immigrant discourse exposes anxieties of “declassification” as factors fueling negativity toward immigrants.
As mentioned earlier, the influence of Singapore's firmly calibrated political environment is fundamental to our interpretation. Participation in a tightly controlled media space has psychological limits that affect the level of spontaneous public self-expression; the public usage of harmful, extremist, and emotional language is likely reduced (Marwick, 2012; Pearce, 2015), possibly driving more substantive and elaborated claims.
Still, studies on anti-immigrant attitudes using big data in different contexts have led to observations consistent with this study's result. Thomas Piketty and Julia Cagé (2023) assembled a remarkably rich dataset, reporting over 200 years of voting data in France. Their analysis suggests that traditional political conflict may account for more than nationalist or communitarian influences in anti-immigrant leanings, although both go hand in hand. In this case, socio-economic variables, like rurality or belonging to the lower social classes, appear to provide more compelling explanations for anti-immigrant votes. These findings convey the importance of social inequalities and hierarchies in motivating positions against immigrants.
By highlighting a strained association between anti-immigrant discourse and xenophobia, our findings add a critical charge to our analysis (Nikunen, 2021) and question the terms of the problematization of public discourse about immigrants. Those who share their insights via social media (i.e., “data points”) may be subjugated to normative constraints (i.e., “OB markers”); they also may face issues of de-legitimization and discourse appropriation. Their message may be ignored, discounted, disparaged, or used for the benefit of other political stakeholders. Such considerations invite caution when researchers use big social media data (Cooky et al., 2018).
We argued earlier that a sizeable amount of the scholarship on anti-immigrant attitudes online, whose objective often involves measuring the effect of new media on attitudes, tends to examine data retrieved from sites at the periphery of the communication sphere where conflictual language against minorities is ripe. The penchant for investigating the effect of exclusionary language in such sites may have reinforced the depiction of anti-immigrant rhetoric as a vehicle for the reprehensible expression of differentiation and domination against minorities. The liberal elite has been reported to condemn anti-immigrant publics (Golberg, 2015; Reilly, 2016) and demean it “as being constitutionally ‘bad’” (Valluvan, 2016: 1); “Being xenophobic is demonized” (Neuman, 2016: 189). Our study thus underlines that anti-immigrant advocates on social media should not be assumed to be de facto actors in powerful positions.
Our findings convey another aspect of anti-immigrant discourse: the fears linked to declassifications and inequalities. Extreme right parties worldwide are reported to clean their repertoire of discriminatory markers like racism-related terms and inject liberal vocabulary, such as “social justice” in their discourse (Alduy, 2016; Moffit, 2017). This liberal stylization of anti-immigrant language may entice aggrieved citizens struggling with social changes (Bonikovsky, 2017). It also questions the pertinence of excessively associating anti-immigrant discourse with a deficit narrative that reduces it to problematic identity and communitarian-related tropes.
Conclusion
We conducted close-up and immersive research exploring the vast potential of online participatory content while using the best practices of computer-assisted text analysis. We combined different methods (human hermeneutics and supervised text analysis) for a contextually rich analysis. We assessed the validity of our analysis at several stages to minimize the distance between the frames systematically identified by automated analysis and the personal narratives of the citizens. Our approach allowed us to leverage big data analytics and refine our understanding of citizen-led narratives about immigrants and their incursion into the public sphere. It also showed that hybrid computational classification systems and big data can help critically examine a discourse and discern diffused power dynamics.
Overall, our work challenges assumptions about anti-immigrant discourse and takes a critical perspective to improve our reading. We found the emergence of a self-victimization discursive frame, suggesting the existence of latent frames that are not connected to direct resentments towards immigrants but to citizens’ experiences with their social position. Our results point to an aspect of anti-immigrant rhetoric rooted in established hierarchies and likely mobilizing citizens’ fear of declassification.
Footnotes
Acknowledgments
The author would like to thank Arul Chib, Kokil Jaidka, Saiffudin Ahmed, Anfan Chen, and Poong Oh for their helpful comments on drafts of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Education in Singapore (grant number MOE2017-T2-2-145).
