Abstract
Human trafficking, a grave human rights violation with far-reaching global consequences, serves as a compelling case study for analyzing multifaceted polarization dynamics in online discourse and the influence of social media on public perceptions and responses. Drawing on social identity theory and self-categorization theory, this article aims to elucidate both group polarization and opinion polarization surrounding human trafficking on social media. Through an integrated approach that combines clustering, social network analysis, text mining, and topic modeling, this study examines community formation, influential actor identification, topic classification, and semantic analysis. To quantify the degree of multifaceted polarization, we calculate the similarity between user-generated content from clustered groups and the identified topics. The findings reveal a robust community structure within the network and uncover divisions and structural characteristics across subgroups. Using the BERTopic model, we identify thematic clusters such as vulnerable groups, persecution experiences, incident areas, law and politics, public awareness, contraband, and case events, which reflect the primary public concerns regarding human trafficking. This research enhances our understanding of multifaceted polarization shaped by social identity in digital conversations about critical social issues and holds significant implications for policymakers, advocacy groups, and practitioners navigating public opinion regarding human trafficking in the digital realm.
Introduction
Human trafficking is a modern-day manifestation of enslavement, involving the coercive, deceptive, or forcible exploitation of individuals for purposes such as forced labor, sexual exploitation, or financial gain (Van Buren et al., 2019). As a transnational criminal enterprise, it imposes profound and multifaceted consequences on societies, economies, and global governance systems. Economically, trafficking distorts labor markets, depresses wages, and infiltrates supply chains, exacerbating inequality and undermining sustainable development initiatives (Zimmerman et al., 2011). In addition, the severe psychological trauma and health risks endured by victims constitute pressing public health challenges (Mak et al., 2023). Moreover, trafficking is inextricably linked to gender-based violence and systemic discrimination, disproportionately affecting women and children.
The academic discourse on human trafficking is broad and highly interdisciplinary, encompassing criminology, public health, gender studies, and information systems. Prior studies have addressed a range of focal areas, including sexual violence and exploitation (Kejriwal & Szekely, 2022), psychosocial and public health interventions (Mak et al., 2023; Such et al., 2024), public awareness and prevention strategies (Savoia et al., 2023; Zimmerman et al., 2011), gender-sensitive frameworks and social equity goals (Cho, 2013), and legal and regulatory frameworks (Simmons et al., 2018). Research has also examined structural vulnerabilities, government responsiveness, and network-based models of trafficking relationships (Ali et al., 2021; Malik et al., 2018).
The rise of social media platforms—such as Facebook, Instagram, and X—has profoundly reshaped how trafficking is discussed, understood, and politicized (Cao & Yu, 2019). These platforms facilitate large-scale user-generated discourse, enabling high levels of interconnectivity (Lane et al., 2023). Human trafficking-related content online spans themes such as gender vulnerability, labor rights, international collaboration, victim experiences, and human rights advocacy (Moran & Prochaska, 2023). However, online discourse is increasingly fragmented by algorithmic curation, identity politics, and ideological biases, distorting public understanding and reinforcing oppositional narratives (Guess et al., 2023a; Xing et al., 2022).
Recent scholarship has increasingly focused on public discourse surrounding human trafficking, particularly in digital spaces (Keighley & Sanders, 2023; Wang, 2023). Studies have examined sex trafficking (Pfeffer et al., 2024), online child exploitation (Kranrattanasuit, 2024), and migrant victimization linked to deceptive employment promises (Stollwerck et al., 2024). In addition, researchers have assessed the design and targeting of awareness campaigns (Savoia et al., 2023) and explored the evolving concerns of younger demographics regarding trafficking and sexual harassment (Ardabili et al., 2024). One study also emphasized the critical needs of survivors within healthcare, criminal justice, and social services (Preble et al., 2022). Collectively, these studies reveal a highly fragmented landscape of public attention and institutional responses.
Public discourse on trafficking is often shaped by structurally distinct communities and ideologically entrenched groups (Boler et al., 2024). Polarization in this context refers to the divergence of public opinion into distinct and often conflicting factions, which leads to reduced mutual understanding and limited engagement across ideological boundaries. It can manifest in unipolar (e.g. ideological extremity), bipolar (e.g. binary opposition), or multipolar (e.g. competing ideological clusters) forms (Wakefield & Wakefield, 2023). In multipolar polarization, distinct ideological factions coexist and compete, resulting in a fragmented discursive landscape (Phillips & Carley, 2024).
Polarization has been extensively examined in domains such as political partisanship and misinformation (Moran & Prochaska, 2023). However, its impact on complex humanitarian crises—such as human trafficking—remains insufficiently explored. Trafficking, which intersects gender, labor, migration, human rights, and criminal justice, is particularly vulnerable to discursive segmentation. On social media, user communities gravitate toward frames that align with their values or advocacy goals, resulting in cognitive and thematic silos (Lee et al., 2014). These parallel discourses coexist in relative isolation, limiting opportunities for consensus and shared action.
In this context, polarization becomes more than a social media phenomenon—it constitutes a structural barrier to collective action. Advocacy groups, law enforcement, legal scholars, public health officials, and survivor communities often emphasize divergent aspects of trafficking—such as criminal justice, harm reduction, labor rights, or immigration control. These competing frames reinforce thematic silos, hindering interdisciplinary efforts and undermining systemic responses. Understanding how such discursive divides emerge and persist is crucial for improving policy coordination, resource allocation, and stakeholder collaboration.
In this study, we propose the concept of “multifaceted polarization,” which captures the simultaneous presence of structural fragmentation (i.e. the formation of distinct user communities) and semantic fragmentation (i.e. the identification of various topics) within online discussions. Grounded in social identity theory and self-categorization theory, this concept synthesizes two interrelated dimensions:
Group polarization: the tendency of individuals to self-organize into ideologically or affectively aligned communities, reinforced by in-group interactions and shared identity cues;
Opinion polarization: the emergence of divergent narratives around a common issue, as distinct communities interpret and prioritize the topic through different cognitive, normative, or affective lenses.
Unlike prior approaches that treat ideological and affective divides in isolation (Menezes et al., 2024; Renstrom et al., 2023), our approach foregrounds the interplay between network structure and discursive content, offering a more integrated analytical lens. Accordingly, we pose the following three research questions:
RQ1. How can social identity and self-categorization theories be used to construct a framework for analyzing multifaceted polarization?
RQ2. How can group and opinion polarization be synthesized to investigate multifaceted polarization in human trafficking discourse?
RQ3. How does multifaceted polarization in human trafficking discourse inform public engagement patterns and guide targeted interventions?
Methodologically, we integrate clustering, social network analysis (SNA), text mining, and topic modeling to detect user communities and discourse themes on the social media platform X (formerly Twitter). To assess multifaceted polarization, we analyze the degree of alignment between structural clusters and thematic segmentation using similarity analysis, revealing the contours of both structural and semantic polarization. The findings provide theoretical contributions to polarization research as well as practical implications for promoting more unified and effective anti-trafficking efforts.
Literature review
Online polarization in social media
The rise of social media platforms has fundamentally reshaped the landscape of public discourse, empowering users to generate, disseminate, and interpret content at an unprecedented scale (McNally & Bastos, 2025; Xing & Zhang, 2025). These platforms facilitate community formation by connecting individuals with shared interests or beliefs, thereby reinforcing social cohesion and enabling the expression of group identities (Guess et al., 2023b; Khawar & Boukes, 2024). However, the same affordances that foster collective engagement may simultaneously exacerbate polarization (Bail et al., 2018). Algorithmically curated content, selective exposure, and identity-driven interactions contribute to fragmented information environments and diminished exposure to dissenting views (Powell, 2024; Robertson et al., 2023).
Polarization in digital spaces is broadly characterized by the intensification of attitudinal, ideological, or interpretive divisions between groups, manifesting in both network structures and discursive content (Lee et al., 2014). Scholars have examined this phenomenon from diverse disciplinary perspectives, analyzing factors such as algorithmic bias (Guess et al., 2023a), cognitive heuristics (Buder et al., 2023), and the role of feedback mechanisms—such as likes, shares, and social endorsements—in reinforcing entrenched viewpoints. For example, Messing and Westwood (2012) demonstrated that prominent social endorsements increase users’ likelihood of selecting particular content while simultaneously reducing partisan-selective exposure, thus complicating assumptions about strictly ideological behavior. Complementing this, Nyhan et al. (2023) found that reducing exposure to like-minded sources can enhance openness to diverse perspectives and decrease impolite discourse, even if it does not significantly alter core attitudes. These findings suggest that exposure patterns and social signals interact in complex ways to shape online polarization.
Current studies further reveal that social media-induced polarization is a multifaceted phenomenon characterized by the exacerbation of ideological divisions and the reinforcement of pre-existing beliefs among digital users (Lee et al., 2014). Recent research has expanded the analytical lens to include affective polarization and ideological extremity (Risius et al., 2024), political divisiveness (Masullo, 2023), and even depolarization interventions (Combs et al., 2023), reflecting a growing concern with both the causes and mitigation of polarization in digital environments.
In synthesizing this broad literature, two influential lines of inquiry have emerged. The first focuses on group-level structural dynamics, examining how individuals cluster with like-minded others and form ideologically or emotionally cohesive communities—commonly termed group polarization (Wuestenenk et al., 2025). This research stream investigates the formation of homophilic clusters, the roles of influential users in shaping group norms, and the feedback mechanisms that reinforce alignment within communities over time (Haq & Kwok, 2024; Marchetti et al., 2024).
The second line of research examines content-level divergence, analyzing how communities engaged with the same issue construct increasingly distinct narratives (Overgaard, 2024). This phenomenon, variably termed opinion polarization, topic polarization, or semantic polarization, highlights the fragmentation of interpretive frames, emotional tones, and normative reasoning across groups (Masullo, 2023; Renstrom et al., 2023). Computational methods such as topic modeling, sentiment analysis, and moral framing analysis are widely employed to detect these divergences.
Although these two strands provide complementary insights, they are often examined in isolation. Recent studies increasingly emphasize their interdependence, recognizing that the structural formation of ideologically coherent communities and the semantic divergence of discourse frequently unfold simultaneously (Cinelli et al., 2024; Yang et al., 2021). This integrated perspective conceptualizes polarization as a multifaceted process, encompassing both the social dynamics of group formation and the fragmentation of public discourse. Such a framework advances a more nuanced understanding of how polarization emerges and stabilizes within complex online ecosystems, with significant implications for public opinion, collective decision-making, and the governance of digital communication.
Social identity and self-categorization
Online discourse on human trafficking illustrates how social identities and categorizations shape ideological divides and opinion formation in digital environments. Social identity theory and self-categorization theory provide foundational frameworks for understanding these dynamics (Moran & Prochaska, 2023), clarifying how group affiliations influence individual perceptions, behaviors, and the emergence of polarization.
Social identity theory, introduced by Tajfel and Turner (1979), explains how individuals construct their self-concept based on their affiliations with social groups. The theory posits that people categorize themselves into distinct social groups—based on factors such as race, ethnicity, religion, or shared interests—and integrate these memberships into their overall identity (Li, 2022; Velasquez & Montgomery, 2020). This process establishes psychological boundaries between “in-groups” and “out-groups,” shaping how individuals perceive themselves and others.
Recent studies have illuminated the application of social identity theory in social media research, demonstrating its explanatory power across diverse phenomena. For example, it has been employed to analyze behavioral intentions, social sorting, and the role of identity in influencing user engagement and platform participation (Lane et al., 2023; Qian & Seifried, 2023). Furthermore, research on intergroup dynamics in digital contexts has shown how social identity processes enhance group cohesion while simultaneously fueling intergroup conflict (Burgers et al., 2023). In addition, Ashforth and Mael (1989) emphasized that social identity theory explains how individuals develop a strong sense of belonging to their in-group, often accompanied by preferential treatment of in-group members and antagonism toward out-groups—processes that intensify polarization (Dutot, 2020).
Self-categorization theory, introduced by Hogg and Turner (1987), extends social identity theory by detailing how individuals categorize themselves across varying levels of abstraction—from specific personal identities to broader social identities. This theory emphasizes the situational prominence of particular identities and explains how such prominence shapes cognition and behavior within specific contexts (Koschate et al., 2021). In social media environments, users often conform to the norms and values of their online communities, as deviation may provoke ideological conflict and social sanctions.
Empirical studies applying self-categorization theory have examined phenomena such as information sharing, impression formation, evaluation of social identities, and adherence to social norms in influencer marketing contexts (Qian & Seifried, 2023). These findings demonstrate how salient self-categorizations foster alignment with group norms and strengthen collective identity, thereby influencing how individuals interpret and disseminate information online (Koschate et al., 2021).
Group polarization describes the tendency for group discussions to shift members’ attitudes further toward the group’s prevailing viewpoint. Social identity theory explains this phenomenon as individuals’ motivation to maintain and enhance a positive image of their in-group, reinforcing adherence to group norms and amplifying distinctions between in-group and out-group attitudes (Fritsche et al., 2018; Wakefield & Wakefield, 2023). In contrast, opinion polarization refers to the increasing divergence of perspectives between groups. Self-categorization theory clarifies this process by showing how individuals adopt the dominant narratives of their in-group while dismissing or rejecting out-group views (Flanagin et al., 2014).
In online discussions about human trafficking, identity-driven groups interpret the issue through distinct thematic frames that align with their collective narratives. For example, law enforcement communities predominantly frame trafficking as a criminal justice challenge, advocacy groups emphasize its human rights dimensions, and political factions situate it within broader ideological debates (Tsai & Bagozzi, 2014). Individuals selectively expose themselves to information that resonates with their in-group’s framing, thereby reinforcing both group cohesion and ideological divisions (Chen et al., 2021). Consequently, group polarization (the intensification of attitudes within groups) and opinion polarization (the widening of differences between groups) emerge as interrelated phenomena shaped by the interplay of social identity and self-categorization processes.
Social identity and self-categorization theories account for how in-group affiliation and identity salience shape online polarization. In-group favoritism and norm conformity strengthen internal consensus (group polarization), while intergroup differentiation exacerbates ideological divides (opinion polarization). These mechanisms demonstrate how identity-driven dynamics structure digital discourse, offering insights into managing polarization and fostering more integrative, cross-group communication in online environments.
Methodology
Data collection and processing
Rebranded as “X” in 2023, Twitter functions as an online social media and networking platform that facilitates user connectivity, communication, and interaction. With features such as likes, retweets, and replies, X promotes user engagement, fostering seamless content interaction and amplification. Consequently, it has become a crucial source for news updates, coverage of significant events, and discussions on trending topics, establishing itself as a pivotal element in contemporary information consumption. Notably, a substantial number of users on X engage with the issue of human trafficking, with thousands of daily tweets dedicated to this topic.
A crawler built on the X API was used to collect posts containing the keyword “human trafficking.” The API provides programmatic access to the relevant data components, including tweets, retweets, timestamps, text content, and user profiles, and user anonymity was maintained throughout. Data were collected from 1 January 2025 to 7 January 2025.
During this period, we collected user data, including user IDs, tweets, retweets, comments, interaction dates, and profile information such as occupation, city, place of residence, links to social media profiles, and account creation date. To maintain data integrity, we excluded comments from users with incomplete profile information. We also screened user profiles to identify and eliminate bot accounts, thereby enhancing the accuracy of our findings. The examination of user profiles was conducted solely for bot detection and not for any third-party use.
In the data processing phase, we used Excel functions to remove noise, including hyperlinks, null values, emoticons, and punctuation marks and symbols (e.g. $, &, and []). We also conducted a manual review to ensure spelling accuracy. Following this cleaning process, we retained a total of 8138 tweets for further analysis.
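The cleaning itself was done with Excel functions; purely as an illustration, comparable filtering can be expressed as a small Python function. The regular expressions and the function names `clean_tweet` and `clean_corpus` are hypothetical choices of ours, not the study's reported rules: hyperlinks are stripped first, then every character that is not a word character, whitespace, or an apostrophe (which covers $, &, [], ASCII emoticons, and emoji), and tweets that end up empty are treated like null values and dropped.

```python
import re

# Illustrative patterns only; the study's exact Excel cleaning rules are not reported.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a word character, whitespace, or an apostrophe:
# punctuation such as $, &, [], ASCII emoticons, and emoji (not word characters).
SYMBOL_RE = re.compile(r"[^\w\s']")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Return the cleaned tweet text, or None if nothing usable remains."""
    if text is None:  # null values
        return None
    text = URL_RE.sub(" ", text)      # hyperlinks
    text = SYMBOL_RE.sub(" ", text)   # emoticons/emoji, punctuation, symbols
    text = SPACE_RE.sub(" ", text).strip()
    return text or None


def clean_corpus(tweets):
    """Clean every tweet and drop those that end up empty."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [t for t in cleaned if t]
```

For example, `clean_tweet("Stop #HumanTrafficking now! 😡 https://t.co/abc $$ [link]")` yields `"Stop HumanTrafficking now link"`. Spelling correction, which the authors did manually, is not attempted here.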
Modeling multifaceted polarization
The multifaceted polarization framework (Figure 1) applies social identity theory and self-categorization theory to uncover how group dynamics and ideological divides shape online discourse on human trafficking. This framework aims to clarify: (1) how social identity and self-categorization processes drive multifaceted polarization, (2) how multifaceted polarization can be quantified by combining group polarization and opinion polarization, and (3) how the empirical results align with and contribute to these theoretical perspectives. It evaluates two dimensions:
Group polarization, encompassing group clustering, social network analysis, and influential user identification.
Opinion polarization, involving thematic segmentation, topic distribution, and keyword identification.
The mechanism of multifaceted polarization is examined through a two-step analytical process. First, the overlap between group structures and thematic divisions is quantified using similarity metrics. Second, the extent to which each group’s discourse concentrates on specific topics is assessed using the Herfindahl Topic Concentration Index (HTCI). This approach not only demonstrates the applicability of the theoretical frameworks but also enhances our understanding of their explanatory power in the context of online discourse polarization.

Figure 1. Theoretical framework and analytical process of multifaceted polarization.
Group clustering
This study uses NodeXL, a social network analysis add-in for Microsoft Excel, to cluster users into groups based on their interactions in online discussions related to human trafficking. We examine network structures using NodeXL’s Clauset-Newman-Moore algorithm (Clauset et al., 2004), which efficiently identifies communities in large networks by initially treating each vertex as its own cluster and then repeatedly merging the pair of clusters that produces the largest increase in modularity. Following the clustering process, the Harel-Koren Fast Multiscale layout algorithm is employed to generate a visual representation of the graph. In addition, the Fruchterman-Reingold algorithm, a force-directed layout method, is used to arrange the communities.
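The greedy merging idea can be sketched in plain Python. This is a naive, unoptimized illustration, not NodeXL's implementation: it omits the heap-based bookkeeping that makes the Clauset-Newman-Moore algorithm fast, and it assumes an undirected, unweighted graph with self-loops and duplicate edges ignored. Merging communities i and j changes modularity by ΔQ = l_ij/m − 2·a_i·a_j, where l_ij is the number of edges between them, m the total number of edges, and a_i the share of edge endpoints in community i; the loop merges the best pair until no merge gives a positive gain. The names `greedy_modularity` and `modularity` are hypothetical.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Naive greedy modularity agglomeration (the idea behind CNM).

    edges: iterable of (u, v) pairs of an undirected, unweighted graph.
    Self-loops and duplicate edges are ignored; vertices without any
    non-loop edge are omitted. Returns a list of communities (sets).
    """
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    m = len(edge_set)
    degree = defaultdict(int)
    for e in edge_set:
        for node in e:
            degree[node] += 1
    # Every vertex starts as its own community.
    comm_of = {node: node for node in degree}
    members = {node: {node} for node in degree}
    if m == 0:
        return list(members.values())

    while True:
        # Count edges between each pair of distinct communities.
        between = defaultdict(int)
        for e in edge_set:
            u, v = tuple(e)
            cu, cv = comm_of[u], comm_of[v]
            if cu != cv:
                between[frozenset((cu, cv))] += 1
        # Share of edge endpoints (a_i) held by each community.
        share = {c: sum(degree[n] for n in nodes) / (2 * m)
                 for c, nodes in members.items()}
        best_gain, best_pair = 0.0, None
        for pair, l_ij in between.items():
            ci, cj = tuple(pair)
            gain = l_ij / m - 2 * share[ci] * share[cj]
            if gain > best_gain:
                best_gain, best_pair = gain, (ci, cj)
        if best_pair is None:  # no merge increases modularity
            break
        keep, absorb = best_pair
        for node in members.pop(absorb):
            comm_of[node] = keep
            members[keep].add(node)
    return list(members.values())


def modularity(edges, communities):
    """Q = sum over communities of [L_c/m - (d_c/(2m))^2] (undirected)."""
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    m = len(edge_set)
    label = {n: i for i, c in enumerate(communities) for n in c}
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for e in edge_set:
        u, v = tuple(e)
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

On two triangles joined by a single edge, the sketch recovers the two triangles, whose modularity is 5/14 ≈ 0.357.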
In the clustering graph, vertices represent X users, while edges capture user interactions such as retweets, replies, and mentions, which reflect engagement between users (Colleoni et al., 2014). The network’s size, connectivity, and cohesiveness are assessed through graph density (GD) and average path length (APL). In addition, the prestige, prominence, importance, and authority of users are evaluated through degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and PageRank.
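NodeXL computes these metrics internally. The stdlib sketch below only illustrates common textbook definitions, which may differ in detail from NodeXL's conventions (for example, in how self-loops, duplicate edges, and unreachable pairs are handled). It assumes directed edges from the acting user to the target user, directed density E / (N(N − 1)), an APL averaged over reachable ordered pairs only, and PageRank by power iteration with a damping factor of 0.85; all function names are hypothetical.

```python
from collections import defaultdict, deque


def build_digraph(edges):
    """Adjacency sets for a directed graph; self-loops are dropped."""
    out = defaultdict(set)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        if u != v:
            out[u].add(v)
    return nodes, out


def graph_density(nodes, out):
    """Directed density: existing edges / possible edges N(N - 1)."""
    n = len(nodes)
    e = sum(len(targets) for targets in out.values())
    return e / (n * (n - 1)) if n > 1 else 0.0


def average_path_length(nodes, out):
    """Mean shortest-path length over ordered pairs that are reachable."""
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in out[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def pagerank(nodes, out, damping=0.85, iters=100):
    """Power-iteration PageRank; dangling nodes spread rank uniformly."""
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        rank = new
    return rank
```

For the toy graph a→b, b→c, a→c (plus a self-loop on c), the density is 3 / (3 × 2) = 0.5, every reachable pair is one hop apart so the APL is 1.0, and c, which receives the most incoming links, has the highest PageRank.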
Topic identification
Furthermore, we employ BERTopic, an advanced topic modeling technique, to uncover online public opinion on human trafficking across X. The BERTopic model utilizes BERT (Bidirectional Encoder Representations from Transformers) embeddings and class-based TF-IDF to generate concise and meaningful topic clusters (Xing et al., 2024). Traditional topic modeling approaches, such as Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF), face challenges in analyzing short and noisy texts (Menezes et al., 2024). These methods rely on word co-occurrence patterns and bag-of-words representations, which often fail to capture contextual nuances, leading to less coherent and more overlapping topics in datasets with sparse textual content (Celikten & Onan, 2025). In our previous experiments, BERTopic consistently demonstrated superior performance compared to LDA and NMF, producing more interpretable and well-separated topics. Therefore, BERTopic was adopted for topic clustering in this study. The workflow is illustrated as follows:
Document embedding
Within the BERTopic framework, we transform documents into vector representations within a semantic space, based on the assumption that documents sharing the same topic exhibit significant semantic similarity. The embedding process underlying BERTopic relies on the Sentence-BERT (SBERT) framework, which converts sentences and paragraphs into dense vector representations using pre-trained language models. This approach has demonstrated robust performance across various sentence-embedding tasks.
Document clustering
As data dimensionality increases, the difference between the distances to the nearest and farthest data points diminishes, which makes clustering directly in the raw embedding space difficult. We therefore employ UMAP (Uniform Manifold Approximation and Projection) to reduce the dimensionality of the document embeddings; because UMAP accepts embeddings of any dimensionality, it can be paired with language models that produce vectors of different sizes. The reduced embeddings are then clustered using hierarchical density-based spatial clustering of applications with noise (HDBSCAN), an extension of density-based spatial clustering of applications with noise (DBSCAN) that supports datasets with varying density structures. HDBSCAN uses a soft-clustering approach in which noise is modeled as outliers, reducing the likelihood of assigning unrelated documents to clusters and thereby improving topic representation quality.
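The distance-concentration effect that motivates the dimensionality reduction step can be checked with a small simulation. This sketch is our own illustration with uniformly random points rather than real SBERT embeddings, and `relative_contrast` is a hypothetical helper: it measures (farthest − nearest) / nearest distance from a random query point, a ratio that shrinks as the dimension grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In two dimensions the farthest point is many times farther away than the nearest one; in 1000 dimensions all points sit at nearly the same distance, leaving density-based methods such as HDBSCAN little structure to work with.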
Topic representation
Topics are represented with a modified Term Frequency-Inverse Document Frequency (TF-IDF) scheme that identifies topics from the word distribution within each document cluster and assesses the importance of each word to a specific topic (Ao et al., 2023). Building on the core principles of TF-IDF, all documents within a cluster are concatenated and treated as a single unit, yielding a class-based TF-IDF (c-TF-IDF) that describes each topic.
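A minimal c-TF-IDF sketch follows, based on the weighting given in the BERTopic documentation as we understand it: W(x, c) = tf(x, c) · log(1 + A / f(x)), where tf(x, c) is the L1-normalized frequency of term x in class c, f(x) is the frequency of x across all classes, and A is the average number of words per class. BERTopic's own transformer may differ in details such as normalization options; the function names, tokenization, and toy data below are hypothetical, and every class is assumed to contain at least one word.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_class):
    """Class-based TF-IDF: one concatenated 'document' per topic/class.

    docs_by_class: dict mapping class label -> list of tokenized documents.
    Weight of term x in class c: tf_norm(x, c) * log(1 + A / f_x).
    """
    class_counts = {label: Counter(tok for doc in docs for tok in doc)
                    for label, docs in docs_by_class.items()}
    total_freq = Counter()  # f_x: frequency of each term across all classes
    for counts in class_counts.values():
        total_freq.update(counts)
    # A: average number of words per class.
    avg_words = sum(sum(c.values()) for c in class_counts.values()) / len(class_counts)
    weights = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[label] = {
            term: (freq / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, freq in counts.items()
        }
    return weights


def top_terms(weights, label, k=5):
    """Return the k highest-weighted terms for one class (ties by name)."""
    ranked = sorted(weights[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

With two toy classes whose documents all contain "trafficking", that shared term is pushed below class-specific words such as "children" or "border", which is exactly the property that makes c-TF-IDF keywords useful topic labels.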
Similarity/dissimilarity
To uncover the mechanisms underlying multifaceted polarization in human trafficking discourse by integrating group polarization and opinion polarization, we calculate textual similarity between user-generated content from clustered groups and the topics identified by the BERTopic model. We represent the textual content tweeted by individuals in each clustered group as TF-IDF vectors, where each term’s weight reflects its significance within the respective discourse. These vectors are then compared with the keyword representation of each topic, that is, the keyword rankings generated by the c-TF-IDF algorithm, which assigns greater weight to words that distinguish one topic from the others.
Subsequently, cosine similarity is calculated to quantify the degree of alignment between each group’s discourse and the identified topics. Cosine similarity is a well-established metric in information retrieval, natural language processing, and the social sciences (Agrawal et al., 2021; Kasban & Nassar, 2020). Because TF-IDF and c-TF-IDF weights are non-negative, the similarity scores range from 0 (no shared terms) to 1 (identical relative term weights), allowing us to systematically evaluate textual similarity and thematic alignment in the discourse on human trafficking.
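The group-to-topic comparison can be sketched as follows, under stated assumptions: each group's tweets are pooled into one token list, group vectors use a simple TF-IDF variant (idf = log(N / df) + 1, chosen here for illustration, where N is the number of groups), and each topic is a sparse vector of its top keywords with their c-TF-IDF weights. The function names and toy data are hypothetical. Because all weights are non-negative, every score falls between 0 and 1.

```python
import math
from collections import Counter


def tfidf_by_group(tokens_by_group):
    """TF-IDF vector per group, treating each group's pooled tweets as one document.

    idf = log(N / df) + 1, so terms used by every group keep a small positive weight.
    """
    n_groups = len(tokens_by_group)
    counts = {g: Counter(tokens) for g, tokens in tokens_by_group.items()}
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for g, c in counts.items():
        total = sum(c.values())
        vectors[g] = {t: (f / total) * (math.log(n_groups / df[t]) + 1)
                      for t, f in c.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse non-negative vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(group_vectors, topic_keywords):
    """Cosine similarity between every group and every topic keyword vector."""
    return {g: {k: cosine(vec, kw) for k, kw in topic_keywords.items()}
            for g, vec in group_vectors.items()}


groups = {"G1": ["children", "women", "children", "rescue"],
          "G2": ["border", "drug", "border", "smuggling"]}
topics = {"vulnerable": {"children": 0.5, "women": 0.3},
          "contraband": {"border": 0.6, "drug": 0.2}}
sims = similarity_matrix(tfidf_by_group(groups), topics)
```

In this toy case G1 aligns only with the "vulnerable" topic and G2 only with "contraband"; in real data each group typically has non-zero similarity to several topics, which is what the concentration index below summarizes.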
Multifaceted polarization
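The HTCI is an adaptation of the Herfindahl-Hirschman Index (HHI) (Song et al., 2013) that quantifies how narrowly a group’s discussion focuses on specific topics. For a given group G with similarity scores $S_{G,1}, \dots, S_{G,n}$ across n topics, the HTCI is calculated as:
$$\mathrm{HTCI}_G = \sum_{i=1}^{n} p_{G,i}^{2}, \qquad p_{G,i} = \frac{S_{G,i}}{\sum_{j=1}^{n} S_{G,j}}$$
where $S_{G,i}$ is the cosine similarity between the discourse of group G and topic i, and $p_{G,i}$ is the share of group G’s total topic similarity attributable to topic i. The HTCI ranges from $1/n$, when similarity is spread evenly across all topics, to 1, when a group’s discourse aligns with a single topic.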
Finally, this study quantifies overall multifaceted polarization with the Weighted Mean Polarization Index (WMPI), the size-weighted average of group HTCI scores across the entire dataset:
$$\mathrm{WMPI} = \frac{\sum_{G} N_G \, \mathrm{HTCI}_G}{\sum_{G} N_G}$$
where $N_G$ is the size of group G.
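Both indices reduce to a few lines of code. This sketch assumes that the HTCI takes the standard share-based HHI form (each group's similarity scores are normalized to sum to 1, then squared and summed) and that the WMPI weights each group's HTCI by the group's size; if the original formulation normalizes differently, the functions would need adjusting. The names `htci` and `wmpi` are hypothetical.

```python
def htci(similarities):
    """Herfindahl Topic Concentration Index of one group (assumed HHI form).

    similarities: non-negative similarity scores of the group to each topic.
    Ranges from 1/n (even spread over n topics) to 1 (one topic only).
    """
    total = sum(similarities)
    if total == 0:
        raise ValueError("similarity scores must not all be zero")
    return sum((s / total) ** 2 for s in similarities)


def wmpi(groups):
    """Size-weighted mean of group HTCI scores.

    groups: iterable of (group_size, similarity_scores) pairs.
    """
    weighted, total_size = 0.0, 0
    for size, sims in groups:
        weighted += size * htci(sims)
        total_size += size
    return weighted / total_size
```

For instance, a group of 100 users spread evenly over four topics has an HTCI of 0.25, a group of 300 users aligned with a single topic has an HTCI of 1, and the WMPI of the two is (100 × 0.25 + 300 × 1) / 400 = 0.8125.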
Results
Clustering network and group polarization
The clustered network depicted in Figure 2 is a directed graph comprising 3846 vertices and 4150 edges. It contains 356 distinct groups, each representing a unique community structure. Interactions within the network are diverse, comprising 2398 retweets, 315 original tweets, 435 replies, and 99 mentions. In addition, there are 395 self-loops, indicating some level of self-referential behavior. The reciprocated vertex pair ratio is low at 0.0084, indicating limited mutual engagement, and the reciprocated edge ratio is 0.0166. The network consists of 507 connected components, including 125 single-vertex components, revealing the presence of isolated users or sub-networks.

Figure 2. Clustering network of group polarization on human trafficking.
The APL of 4.9467 indicates a moderately connected network structure, suggesting that information can propagate through the network with reasonable efficiency despite its size; because the network has 507 components, this value applies only to pairs of users connected by a path. The GD of 0.0004 reveals a sparsely connected network. In addition, the high modularity score of 0.8478 signifies a well-defined community structure, in which nodes within the same cluster are far more densely interconnected than nodes in different clusters.
Table 1 summarizes structural metrics for the top 10 clusters based on modularity-driven community detection. G1, the largest cluster (223 users), shows low GD (0.004) but a short APL (1.98), suggesting a centralized network with efficient information flow. This structure likely reflects the presence of key influencers who drive widespread dissemination.
Table 1. Statistics in the Top 10 Clusters.
In contrast, G2 has a high number of self-loops (n = 18) and a comparatively long APL (2.78), indicating insular discourse with limited external connectivity, features consistent with the dynamics of echo chambers.
Clusters such as G4 and G8 display higher GD (0.019 and 0.017, respectively), yet diverge in APL. G4 (APL = 1.95) reflects tightly interconnected users, possibly enabling rapid coordination, whereas G8 (APL = 4.58) may contain dispersed or weakly linked subgroups that impede efficient exchange.
Other clusters (e.g. G9, G10) exhibit dense, compact structures, marked by repetitive internal interactions and limited structural reach. These variations highlight differences in group cohesion, communication efficiency, and potential influence across the broader network.
Table 2 presents the top 15 users ranked by betweenness centrality, a key indicator of brokerage and gatekeeping in information flow. User IDs have been replaced with generic labels to anonymize individual accounts while preserving the network structure. User1 ranks highest across multiple centrality metrics, including in-degree (226), closeness (0.071), eigenvector centrality (0.718), and PageRank (0.012), positioning this account as a dominant opinion leader. Its exceptionally high betweenness score (278,454) suggests a unique role in linking otherwise disconnected communities, potentially facilitating the spread of polarized content across network clusters.
Table 2. Top Influential Users Based on Betweenness Centrality.
User2 and User3 also occupy structurally influential positions. Although User2 has a moderate in-degree (87), its high betweenness (189,787) and closeness (0.061) indicate extensive reach across diverse subgroups. User3, despite having no inbound ties, ranks high in closeness (0.065) and eigenvector centrality (0.047), implying influence through proximity to prominent users rather than through direct popularity.
Accounts such as User12, User14, and User15 appear structurally peripheral, each with only one or two connections. However, their disproportionately high betweenness scores reveal their function as strategic bridges between otherwise unconnected regions of the network, amplifying their structural importance despite minimal direct interaction.
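The bridging effect can be reproduced on a toy graph. The sketch below implements Brandes' algorithm for unnormalized betweenness centrality on an undirected, unweighted graph; NodeXL's exact settings (such as edge direction) are not reported here, so treating edges as undirected is an assumption, and `betweenness` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    Treats the graph as undirected and unweighted (a simplifying
    assumption); each unordered pair of endpoints is counted once.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors.
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a1", "bridge"), ("bridge", "b1")]
bc = betweenness(edges)
```

Here "bridge" has only two ties, yet every shortest path between the two triangles passes through it, so it obtains the highest score (9, one for each of the 3 × 3 cross-triangle pairs), ahead of the better-connected a1 and b1 (8 each).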
From an influence diffusion perspective, users like User6, User9, and User13 exhibit high in-degree and PageRank values, reflecting their roles as attention magnets and local information hubs. For example, User9 (in-degree = 114; PageRank = 0.006) may exert concentrated influence within a specific ideological cluster, even if its influence across the broader network is comparatively constrained.
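A ranking like Table 2 can be approximated with networkx, as in the minimal sketch below. Several choices are assumptions: the hypothetical function name `centrality_table`, and computing eigenvector centrality on the undirected view to sidestep the poorly behaved directed case with many source nodes, such as accounts with no inbound ties. The tool behind Table 2 may also follow different conventions. For example, networkx computes closeness from incoming distances in directed graphs, so absolute values will not necessarily match.

```python
import networkx as nx


def centrality_table(graph, top_k=15):
    """Rank nodes by raw betweenness and report other centrality scores alongside."""
    # normalized=False keeps raw shortest-path counts, which appears to be the
    # scale of the betweenness values in Table 2 (e.g. 278,454)
    betweenness = nx.betweenness_centrality(graph, normalized=False)
    # for directed graphs networkx computes closeness from incoming distances
    closeness = nx.closeness_centrality(graph)
    # eigenvector centrality on the undirected view (assumption, see lead-in)
    eigenvector = nx.eigenvector_centrality(graph.to_undirected(), max_iter=1000)
    pagerank = nx.pagerank(graph)
    ranked = sorted(graph.nodes, key=betweenness.get, reverse=True)
    return [
        {
            "user": node,
            "in_degree": graph.in_degree(node),
            "betweenness": betweenness[node],
            "closeness": closeness[node],
            "eigenvector": eigenvector[node],
            "pagerank": pagerank[node],
        }
        for node in ranked[:top_k]
    ]
```

A node with few direct ties can still top this ranking if it sits on many shortest paths between otherwise separate regions. This is the bridging pattern described for User12, User14, and User15.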
Topic identification and opinion polarization
The BERTopic algorithm identifies 29 topics within our dataset. Figure 3 illustrates the distribution of these topics related to human trafficking on X, using dimensionality reduction to project topics and documents into a two-dimensional space. The distance between points reflects their semantic similarity, and point colors indicate topic assignments. Few documents fall outside the primary topic clusters, as most lie close to the core of their assigned cluster. This indicates that discussions related to human trafficking remain within the core semantic domain of the primary topic clusters, reflecting a high degree of semantic concentration.

Figure 3. Distribution of integrated topics on human trafficking.
To ensure reliable theme identification, two experts in deep learning and sociology conducted a systematic, computer-assisted content analysis of topic labels and top-ranked keywords, resulting in the inductive categorization of seven coherent themes: “vulnerable groups,” “persecution experience,” “incident area,” “law and politics,” “public consciousness,” “contraband,” and “case events.”
Table 3 presents the top five keywords ranked by c-TF-IDF scores in each topic. The specific c-TF-IDF values of each keyword are provided in Appendix 1. The theme “vulnerable groups” comprises topics 0, 9, 17, and 18, with representative keywords including “children, women, parents, girls, kids,” and so on. The theme “persecution experience” encompasses topics 4, 6, 13, and 27, featuring prominent keywords such as “pedophilia, rape, harvesting, slavery, endworkness,” and so on. The theme “incident area” is composed of topics 1, 7, 8, and 14, primarily containing location-related keywords like “Israel, Ukraine, Rwanda, Africa, American,” and so on. The theme “law and politics” includes topics 5, 22, 23, and 26, focusing on terms such as “crimes, republicans, federal, illegals, court,” and so on. The theme “public consciousness” involves topics 3, 10, 11, and 19, containing keywords such as “community, awareness, WalkForFreedom, preventing, disgraceful, extremism,” and so on. The theme “contraband” consists of topics 2, 16, 21, and 25, emphasizing “border, drug, earninday, combat, smuggling, migrant,” and so on. Finally, the theme “case events” is composed of topics 12, 15, 20, 24, and 28, represented by keywords such as “Jadi, POTUS, Biden, Joe,” and so on. The identified topics achieve a topic coherence score of 0.236 and a topic diversity score of 0.864, indicating robust model performance and the emergence of semantically distinct and well-structured topics.
Table 3. Keywords and Distributed Topics in the Themes.
Similarity analysis and multifaceted polarization
Figure 4 presents the outcomes of the similarity analysis between clustered groups and identified topics. The findings reveal that users within the same cluster tend to align most strongly with a small set of semantically related topics, and that different groups focus intensely on specific thematic areas while paying comparatively less attention to others. For instance, users in clusters G228 to G232 exhibit a pronounced focus on Topic 1, characterized by keywords such as “America, Israel, boundaries,” which aligns with the “incident area” theme. Similarly, users in clusters G157 to G161 pay particular attention to Topic 0 and Topic 9, both of which relate to “vulnerable groups,” as evidenced by keywords such as “children, women, people,” and “parents.”

Figure 4. Similarities between clustered groups and identified topics.
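The group-topic similarity scores can be computed by pairing TF-IDF vectors with cosine similarity, the combination this study reports using. A minimal sketch with scikit-learn follows. Several aspects of the procedure are assumptions: each group is represented by the concatenation of its members' posts, each topic is represented by its top keywords, and a single shared vocabulary is fitted for both. `group_topic_similarity` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def group_topic_similarity(group_texts, topic_keywords):
    """Rows = groups, columns = topics, entries = cosine similarity of TF-IDF vectors.

    group_texts: {group_id: [post, ...]}; topic_keywords: {topic_id: [keyword, ...]}
    (both hypothetical input formats).
    """
    group_ids = list(group_texts)
    topic_ids = list(topic_keywords)
    group_docs = [" ".join(group_texts[g]) for g in group_ids]
    topic_docs = [" ".join(topic_keywords[t]) for t in topic_ids]
    vectorizer = TfidfVectorizer()
    # one shared vocabulary so group and topic vectors live in the same space
    vectorizer.fit(group_docs + topic_docs)
    sim = cosine_similarity(vectorizer.transform(group_docs), vectorizer.transform(topic_docs))
    return group_ids, topic_ids, sim
```

Each row of `sim` corresponds to one row of a heat map like Figure 4. A group whose posts share no vocabulary with a topic's keywords scores exactly zero for that topic.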
The identified topics are further aggregated into the seven themes to characterize the dominant focus of each clustered group. For example, Groups 8, 16, 93, 124, and 269 predominantly address issues concerning vulnerable populations, whereas Groups 4, 21, 53, 174, and 287 focus on experiences of persecution. Groups 44, 68, 249, 251, and 301 emphasize the geographical distribution of trafficking incidents, while Groups 6, 28, 39, 273, and 345 engage in discussions centered on the legal and political dimensions of human trafficking. Groups 2, 83, 125, 318, and 321 prioritize public awareness and consciousness, whereas Groups 7, 46, 75, 164, and 193 focus on contraband and illicit trade. Finally, Groups 24, 33, 157, 183, and 253 primarily discuss specific human trafficking cases.
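Aggregating topic-level similarities into themes can be sketched as follows. The topic-to-theme mapping is taken from the theme assignments given in the description of Table 3. Averaging within each theme, rather than summing or taking the maximum, is an assumption. It is chosen so that the five-topic “case events” theme is not favored over the four-topic themes. `dominant_theme` is a hypothetical helper.

```python
import numpy as np

# Topic-to-theme assignments as described for Table 3
THEMES = {
    "vulnerable groups": [0, 9, 17, 18],
    "persecution experience": [4, 6, 13, 27],
    "incident area": [1, 7, 8, 14],
    "law and politics": [5, 22, 23, 26],
    "public consciousness": [3, 10, 11, 19],
    "contraband": [2, 16, 21, 25],
    "case events": [12, 15, 20, 24, 28],
}


def dominant_theme(sim_row, topic_ids):
    """Return the theme with the highest mean similarity, plus all theme scores."""
    column = {topic: i for i, topic in enumerate(topic_ids)}
    scores = {}
    for theme, topics in THEMES.items():
        present = [sim_row[column[t]] for t in topics if t in column]
        if present:  # skip themes whose topics are absent from this matrix
            scores[theme] = float(np.mean(present))
    return max(scores, key=scores.get), scores
```

Under mean aggregation, a group that is moderately similar to several topics in one theme can outrank a group whose similarity is concentrated on a single topic from a larger theme.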
Table 4 reveals a strong inverse relationship between group size and thematic coherence, as measured by the HTCI. Larger groups exhibit lower topic concentration and greater internal variability, as reflected in their higher standard deviations. Group size and topic concentration are strongly negatively correlated (Pearson’s r = −.72, p < .001): for every 10-user increase in group size, HTCI declines by 0.05 (SE = 0.01), consistent with larger groups engaging in more diverse discursive content. At the dataset level, the WMPI of 0.75 indicates a moderate yet notable level of polarization.
Table 4. Group-Level HTCI Scores and Polarization Levels.
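The sketch below does not reproduce the study's HTCI and WMPI formulas. Instead, it substitutes hypothetical stand-ins to show how the reported quantities relate. HTCI is taken as the share of a group's total topic similarity captured by its single most similar topic, and WMPI as the group-size-weighted mean of HTCI. The correlation and slope use SciPy's `pearsonr` and `linregress`. Multiplying the per-user slope and its standard error by 10 expresses the effect in the same units as the reported 10-user decline.

```python
import numpy as np
from scipy import stats


def htci(sim_row):
    """Hypothetical HTCI: share of a group's total topic similarity held by its top topic."""
    row = np.asarray(sim_row, dtype=float)
    total = row.sum()
    return float(row.max() / total) if total > 0 else 0.0


def wmpi(htci_scores, group_sizes):
    """Hypothetical WMPI: HTCI averaged across groups, weighted by group size."""
    return float(np.average(htci_scores, weights=group_sizes))


def size_effect(htci_scores, group_sizes):
    """Pearson correlation and OLS slope of HTCI on group size."""
    r, p = stats.pearsonr(group_sizes, htci_scores)
    fit = stats.linregress(group_sizes, htci_scores)
    return {
        "r": float(r),
        "p": float(p),
        # slope and SE are per user; multiply by 10 to express them per 10 users
        "per_10_users": 10 * fit.slope,
        "se_per_10_users": 10 * fit.stderr,
    }
```

Because the stand-in WMPI weights each group's HTCI by its size, a negative size-concentration relationship pulls WMPI below the unweighted mean of HTCI.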
Discussion
Online discussions on X regarding human trafficking are marked by multifaceted polarization, evident in both group polarization—the formation of distinct user clusters—and opinion polarization—the thematic segmentation of discourse. To comprehend these dynamics, social identity theory and self-categorization theory offer critical insights into how individuals affiliate with particular discussion groups, reinforce their collective identity, and selectively interact with information that aligns with their pre-existing beliefs. The case of human trafficking exemplifies how these theories elucidate not only the underlying mechanisms of polarization but also its impact on shaping public discourse and fostering collective action within digital environments.
According to social identity theory, individuals derive a sense of self from their membership in social groups, fostering ingroup cohesion and outgroup differentiation (Dutot, 2020). In online discussions on human trafficking, social media users tend to self-organize into communities that reflect their interests, values, and perspectives on the issue. For instance, groups focused on victim advocacy primarily engage with narratives centered on survivors’ experiences, whereas those emphasizing law enforcement concentrate on legal frameworks and criminal networks. Although these groups may share a common concern regarding human trafficking, their distinct identity-based framing results in minimal interaction across group boundaries, thereby reinforcing discursive segregation. As self-categorization theory suggests, this process enhances within-group consensus, making individuals more resistant to alternative perspectives (Flanagin et al., 2014).
This dynamic is particularly pronounced in opinion leader-driven clusters, where high-profile figures (e.g. activists, policymakers, journalists) shape the discourse within their communities, further solidifying ideological boundaries. Influential users—identified through measures such as centrality and PageRank—play a pivotal role in structuring discourse by amplifying ingroup perspectives while filtering or countering outgroup narratives. This leads to varying degrees of polarization, where some groups remain encapsulated within ideological silos, while others serve as bridges between differing viewpoints. Ultimately, these findings emphasize how social identity processes influence online discussions about human trafficking, reinforcing group-based divisions that shape the spread of information, engagement patterns, and the overall structure of online discourse.
While group polarization fosters social clustering, opinion polarization reflects cognitive segmentation, where users selectively engage with specific aspects of human trafficking while disregarding others. Self-categorization theory explains that individuals categorize not only themselves but also information, leading to thematic silos in online discussions (Hogg & Terry, 2000). For instance, some online communities prioritize discussions about forced labor, whereas others focus on sex trafficking, legal frameworks, or human rights advocacy. This thematic division reflects structural polarization, which, unlike affective or ideological polarization, highlights the fragmentation of discourse along issue-specific lines without necessarily implying emotional hostility or opposing ideologies. For example, a group advocating strict anti-trafficking laws and another emphasizing harm reduction approaches for vulnerable populations both seek to address the same problem, yet their discourse remains parallel rather than intersecting, limiting opportunities for collaborative solutions.
Furthermore, our results suggest that users tend to engage with content aligned with their pre-existing identity, consistent with selective exposure and confirmation bias. For instance, a group emphasizing survivor narratives may disregard discussions centered on criminal justice perspectives, perceiving them as overly punitive. Conversely, groups focused on law enforcement solutions may dismiss advocacy-driven narratives as overly emotionally charged or politically biased. This mutual disengagement diminishes the likelihood of constructive dialogue between perspectives that could, in fact, be complementary rather than adversarial. Moreover, the presence of high-engagement influencers within each community amplifies identity alignment, as users rely on opinion leaders to establish in-group norms and define acceptable discursive boundaries. Over time, these boundaries become entrenched, reinforcing discursive silos that impede a holistic understanding of human trafficking.
The resulting similarity matrix provides a detailed overview of the content overlap between clustered groups and identified topics, offering insight into how users’ engagement with human trafficking discussions may be shaped by their social affiliations and cognitive biases. In highly polarized environments, similarity values tend to be sparse, with high values concentrated within specific topics, whereas more fluid discussions exhibit moderate similarities distributed across multiple topics.
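The contrast between sparse, concentrated similarity rows and moderate, distributed ones can be quantified, for example, with the normalized Shannon entropy of each group’s row in the similarity matrix. This measure is an illustrative assumption, not part of the study’s reported method. Values near 0 indicate that a group’s similarity is concentrated on few topics, and values near 1 indicate that it is spread evenly across topics. `normalized_entropy` is a hypothetical helper.

```python
import numpy as np


def normalized_entropy(sim_row):
    """Normalized Shannon entropy of one group's similarity row.

    0.0 -> all similarity mass on a single topic (concentrated focus)
    1.0 -> similarity spread evenly over all topics (dispersed focus)
    """
    row = np.asarray(sim_row, dtype=float)
    total = row.sum()
    if len(row) < 2 or total <= 0:
        return 0.0  # undefined for one topic or an all-zero row; treated as concentrated
    p = row / total  # turn similarities into a probability distribution
    nz = p[p > 0]    # 0 * log(0) is taken as 0
    return float(-(nz * np.log(nz)).sum() / np.log(len(row)))
```

Applied row by row, this yields one dispersion score per group that could be compared against group size, in the same spirit as the size comparison reported for Table 4.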
Notably, larger groups exhibit a more dispersed focus across various topics, whereas smaller groups demonstrate more concentrated engagement with specific themes. This pattern suggests that as the number of users and interactions within a cluster increases, topic diversity increases and the range of perspectives expressed broadens. In contrast, smaller clusters with fewer interactions tend to concentrate intensely on specific topics, potentially reinforcing ideological homogeneity and intensifying polarization. The analysis of the similarity scores reveals that certain groups exhibit a high degree of alignment with particular narratives, reinforcing shared identities and collective beliefs, while others display considerable divergence in focus. This pattern aligns with the proposition that social identity fosters group cohesion, strengthening users’ commitment to specific thematic frames while limiting engagement with broader, integrative perspectives on the issue (Fritsche et al., 2018).
Implications
Theoretical implications
This study contributes to online polarization research by integrating social identity theory and self-categorization theory to examine public perceptions of human trafficking on social media platforms. We propose a conceptual framework that analyzes polarization at two levels: group polarization, reflected in the formation of distinct user clusters, and opinion polarization, reflected in the thematic segmentation of discourse.
Clustering-based modeling of group polarization provides valuable insights into group dynamics, enabling the identification of subgroups with shared interests and potentially revealing echo chambers within social networks. Network metrics further underscore the role of influential nodes in shaping information diffusion, exerting structural control, and maintaining network cohesion.
Our study advances the literature by applying BERTopic, a transformer-based topic modeling method, to identify opinion polarization patterns related to human trafficking. Although previous studies have used BERTopic to detect signatures of problem gambling in online communication data (Smith et al., 2023) and to characterize themes (Jeon et al., 2023), its potential for examining opinion polarization on social media remains largely unexplored.
Finally, this study contributes to a deeper understanding of multifaceted polarization in online discourse on human trafficking by analyzing both group and opinion polarization. Using TF-IDF vectors and cosine similarity, we assess the alignment between user-generated content within clustered groups and identified topics, revealing patterns of thematic coherence and overlap. The HTCI captures topic concentration within each group, while the WMPI assesses overall polarization on the issue by accounting for both group size and thematic alignment. Collectively, these methods offer a rigorous analytical framework for examining the dynamics of online polarization and its broader implications for public discourse.
Practical implications
This research advances scholarly understanding in the fields of international affairs and crisis management by addressing the complex and pervasive issue of human trafficking, which entails profound societal consequences. Understanding and combating human trafficking is crucial for protecting human rights, strengthening law enforcement and victim support systems, raising public awareness, informing policy development, and addressing the root causes of exploitation.
Our group clustering approach facilitates the identification of influential individuals, groups, or organizations within the network, which can help stakeholders amplify anti-trafficking messages and mobilize resources. Furthermore, examining group polarization allows an assessment of the direction and strength of information flow within the network, which is strategically valuable for designing and implementing targeted interventions.
Our BERTopic-based approach to topic identification in the context of human trafficking represents a substantial methodological advancement in automated topic classification. This method enables the detection of emerging trends and salient issues associated with this pressing social problem. By assessing the prevalence of specific topics or terms, researchers can evaluate public awareness and the depth of understanding across multiple dimensions of the issue. Consequently, government agencies and advocacy organizations can allocate resources more effectively by prioritizing areas associated with heightened public engagement.
Finally, the HTCI and WMPI provide actionable metrics to assess the concentration and polarization of discourse, allowing stakeholders to design targeted interventions that foster balanced dialogue and mitigate ideological fragmentation. In addition, these tools can be applied to other contentious social issues, offering a transferable framework for understanding and addressing the dynamics of online polarization in public discourse.
Conclusion
Human trafficking, as a heinous and illicit transnational activity, raises substantial public concern and demands a comprehensive approach to ensure global public security. This study addresses a research gap concerning the influence of social media platforms on the multifaceted polarization of discourse on global affairs, offering new insights and establishing a basis for continued research in this dynamic field. Rather than being a panacea, social media introduces new layers of complexity that warrant systematic examination to foster a more informed and balanced discussion of international issues.
Limitations and future research
This study has several limitations. First, the data were drawn solely from the social media platform X, which may limit the generalizability of the clustering and topic modeling results. Other platforms, such as Facebook, YouTube, and Instagram, exhibit distinct patterns of information dissemination, particularly in user interactions and the formation of emergent groups; future research should therefore incorporate additional platforms to assess online polarization in a broader context. Second, the BERTopic model used for topic classification has been validated only on our English-language dataset; future studies should test it on non-English texts to evaluate its general applicability. Third, HDBSCAN assigns each document to a single cluster. Although its probability matrix allows BERTopic to approximate a document’s topic distribution, this only partially addresses documents that contain multiple topics; future work should aim to develop an adapted version of HDBSCAN that better handles multi-topic documents. Finally, although our interpretation draws on social identity theory to explain patterns of group polarization and discourse alignment, identity processes are inferred from observed group structures and communication patterns rather than directly measured. Future research should incorporate direct measures of identity salience and group affiliation to validate and extend these findings.
Appendix 1
| Themes | Topics | Top keywords (c-TF-IDF score) | Proportion |
|---|---|---|---|
| Vulnerable groups | 0 | (“children,” 0.1053), (“child,” 0.05990), (“women,” 0.0470), (“parents,” 0.03644), (“families,” 0.02915) | 6.98% |
| | 9 | (“women,” 0.0237), (“girls,” 0.0186), (“rights,” 0.0164), (“people,” 0.0122), (“humantrafficking,” 0.0108) | 4.49% |
| | 17 | (“ohio,” 0.0427), (“teacher,” 0.0285), (“kids,” 0.0271), (“victims,” 0.0244), (“vulnerable,” 0.0226) | 3.11% |
| | 18 | (“tate,” 0.0237), (“andrew,” 0.0186), (“brother,” 0.0164), (“slaves,” 0.0645), (“survivors,” 0.0501) | 2.98% |
| Persecution experience | 4 | (“porn,” 0.106), (“pedophilia,” 0.104), (“rape,” 0.0912), (“harvesting,” 0.0868), (“sextrafficking,” 0.03789) | 4.95% |
| | 6 | (“slavery,” 0.0125), (“endworkness,” 0.0112), (“salve,” 0.0009), (“remorse,” 0.0008), (“satanism,” 0.0008) | 4.88% |
| | 13 | (“podcast,” 0.1012), (“suffering,” 0.0998), (“tour,” 0.0973), (“torture,” 0.0885), (“trauma,” 0.0699) | 3.30% |
| | 27 | (“interview,” 0.1146), (“trafficking,” 0.0667), (“abuse,” 0.0498), (“pedophilia,” 0.0469), (“organ,” 0.0354) | 0.84% |
| Incident area | 1 | (“Israel,” 0.0806), (“Israeli,” 0.0655), (“Hamas,” 0.0453), (“boundaries,” 0.0409), (“America,” 0.0403) | 6.37% |
| | 7 | (“Ukraine,” 0.1053), (“bio,” 0.0599), (“labs,” 0.0470), (“Kenya,” 0.0365), (“Rwanda,” 0.0292) | 4.62% |
| | 8 | (“organ,” 0.0868), (“harvesting,” 0.861), (“China,” 0.0523), (“Arizona,” 0.0488), (“Ukr,” 0.0407) | 4.50% |
| | 14 | (“Rwanda,” 0.0139), (“refugees,” 0.0132), (“African,” 0.0116), (“headmen,” 0.0095), (“NYC,” 0.0090) | 3.29% |
| Law and politics | 5 | (“charges,” 0.0511), (“crime,” 0.0428), (“law,” 0.0406), (“commissioner,” 0.0404), (“unvetted,” 0.0371) | 4.90% |
| | 22 | (“power,” 0.1542), (“executive,” 0.1316), (“ballot,” 0.1310), (“police,” 0.0739), (“operating,” 0.0547) | 1.96% |
| | 23 | (“republications,” 0.1141), (“voted,” 0.0647), (“Gaetz,” 0.0542), (“federal,” 0.0406), (“illegals,” 0.0327) | 1.89% |
| | 26 | (“challenge,” 0.0971), (“supreme,” 0.0450), (“court,” 0.0363), (“trade,” 0.0345), (“empire,” 0.0299) | 0.96% |
| Public consciousness | 3 | (“claytonconway65,” 0.0385), (“elonmusk,” 0.0254), (“Raph,” 0.0238), (“accountability,” 0.0231), (“contract,” 0.0211) | 5.54% |
| | 10 | (“community,” 0.0965), (“awareness,” 0.0867), (“join,” 0.0821), (“WalkForFreedom,” 0.0456), (“preventing,” 0.435) | 3.45% |
| | 11 | (“GregAbbott,” 0.1806), (“TX,” 0.1149), (“Texas,” 0.1034), (“disgraceful,” 0.0823), (“truth,” 0.0823) | 3.40% |
| | 19 | (“Fentanyl,” 0.1931), (“Jordan,” 0.1580), (“Denomi,” 0.1538), (“extremism,” 0.1531), (“strength,” 0.1465) | 2.97% |
| Contraband | 2 | (“border,” 0.2653), (“cartels,” 0.0860), (“drug,” 0.0673), (“videos,” 0.0583), (“indoctrination,” 0.4904) | 5.85% |
| | 16 | (“EarnInDay,” 0.3244), (“EarnIn,” 0.2884), (“combat,” 0.2714), (“drug,” 0.2489), (“operation,” 0.2308) | 3.13% |
| | 21 | (“smuggling,” 0.1589), (“migrant,” 0.1456), (“trafficking,” 0.1279), (“money,” 0.1272), (“fraud,” 0.1125) | 2.52% |
| | 25 | (“arrested,” 0.0949), (“biolabs,” 0.0471), (“death,” 0.0471), (“slaves,” 0.0639), (“import,” 0.0500) | 1.24% |
| Case events | 12 | (“Jadi,” 0.8789), (“NG,” 0.7802), (“korban,” 0.7654), (“Tristan,” 0.7226), (“MikeGil,” 0.7223) | 3.40% |
| | 15 | (“POTUS,” 0.1706), (“Trump,” 0.0856), (“Donald,” 0.0611), (“Travis,” 0.0602), (“Diana,” 0.0524) | 3.22% |
| | 20 | (“biden,” 0.3433), (“cartels,” 0.0652), (“border,” 0.0624), (“boundaries,” 0.0621), (“Hillary,” 0.0592) | 2.67% |
| | 24 | (“Peru,” 0.3475), (“43,” 0.2285), (“syndicate,” 0.1168), (“YouTube,” 0.1036), (“released,” 0.0926) | 1.75% |
| | 28 | (“biden,” 0.1992), (“Joe,” 0.1613), (“oversees,” 0.1295), (“Hollywood,” 0.0949), (“accused,” 0.0942) | 0.84% |
Acknowledgements
There are no acknowledgments to declare.
Ethical Considerations
No human participants or animals were involved in the experimental procedures conducted for this research.
Consent to Participate
All respondents provided informed consent before enrollment in the study.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Department of Education of Jilin Province (JJKH20241362SK).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data will be made available on request.
