Abstract
The Rohingya refugee crisis, a humanitarian emergency rooted in the persecution of the Rohingya Muslim ethnic minority in Myanmar, has led to a massive exodus of refugees, primarily women and children, to neighboring Bangladesh. Analyzing public opinion toward the crisis is challenging because manually assessing individual expressions across the vast amount of text on online platforms is prohibitively time-consuming. This research focuses on identifying hidden patterns in online discussions surrounding the Rohingya refugee crisis, employing topic modeling and a thematic sentiment analysis-based approach. It represents the first comprehensive exploration of public views expressed in online spaces in support of this community. In the experiment, we identified 15 coherent topics from 6,840 unique documents with a high coherence score of about 0.60. The key themes explored encompass familial resilience, the urgency of addressing the refugee crisis, complexities within the Rohingya situation, religious and cultural elements, and geopolitical considerations. Sentiment analysis revealed nuanced emotional tones, with positive sentiments in discussions about refugee support and international aid and mixed or negative sentiments in topics concerning religious dynamics and women’s protection. The implications of this research extend to guiding policymakers, humanitarian organizations, and advocacy groups in developing targeted interventions, communication strategies, and informed policy initiatives. In addition, the findings emphasize the importance of understanding and responding effectively to the multifaceted challenges faced by the Rohingya community.
Keywords
Introduction
The Rohingya refugee crisis is a humanitarian tragedy stemming from the persecution of the Rohingya Muslim minority in Myanmar (Ware & Laoutides, 2019; Zahed, 2021). This ethnic group, residing primarily in the western state of Rakhine, has endured systemic discrimination, violence, and denial of citizenship rights for decades (Uddin, 2022). The crisis reached a critical point in August 2017 when the Myanmar military launched a brutal crackdown in response to attacks by Rohingya insurgents on security forces (Islam, 2020). Atrocities such as killings, rapes, and the burning of Rohingya villages ensued, prompting a mass exodus of refugees fleeing the violence (Uddin, 2022). Hundreds of thousands sought refuge in neighboring Bangladesh, leading to the creation of one of the world’s largest refugee camps in Cox’s Bazar district (Hoque et al., 2023). The majority of the refugees in the camps are women and children (Karin et al., 2020).
This refugee crisis is a pressing issue that demands immediate attention because of its impact on human rights, malnutrition and poor health (Davidson et al., 2022), environmental degradation, and regional security. The ongoing crisis has led to a significant increase in the Rohingya refugee population, resulting in economic, social, and political challenges (Ullah & Chattoraj, 2023). The influx of Rohingya refugees has also been associated with forest degradation and loss of ecosystem function, necessitating the development of conservation and protection strategies to address the environmental impact (Hasan et al., 2020). Furthermore, the crisis has raised concerns about statelessness, displacement, and grave human rights violations, including genocide and crimes against humanity, emphasizing the urgency of resolving the situation (Mutaqin, 2018). The crisis therefore requires a comprehensive and coordinated response from regional and international actors to address these interrelated issues of human rights, environmental impact, and regional security (Hidayat et al., 2018; Shukri, 2021).
Understanding the media dynamics, educational needs, societal implications, and political ramifications of the crisis is crucial for developing effective responses and policies to support Rohingya refugees (Heidenreich et al., 2024). However, a significant challenge in dealing with the vast amount of text data on Rohingya issues from online sources is the difficulty people face in identifying insights and hidden themes. The volume of information can be overwhelming, making it challenging to discern meaningful patterns or extract relevant insights. The complexity is exacerbated by the diversity of perspectives, languages, and nuances within the online discourse. Manual analysis becomes impractical at this scale, leading to the need for sophisticated text analysis tools and techniques. In addition, misinformation and biased narratives further complicate the process, requiring careful consideration to ensure the accuracy and reliability of the insights obtained. Effectively addressing this challenge involves developing advanced text analytics capabilities for unstructured big data, which enable the automated extraction of useful information from vast amounts of textual data (Abbe et al., 2016; Adnan et al., 2021).
This study presents the first comprehensive exploration of public discussion on online platforms to support the Rohingya community. It leverages cutting-edge methods of text processing, including text data scraping, language detection and translation, text normalization, topic modeling, topic labeling and interpretation, and thematic sentiment analysis. The proposed approach encompasses topic modeling and thematic sentiment analysis to uncover hidden patterns within large text collections. Employing the latent Dirichlet allocation (LDA) generative model, this research identified topics based on prevalent terms extracted from a collection of 6,840 unique documents or textual facts. We achieved 15 coherent topics with a coherence score of about 0.60 using LDA, which performed better than non-negative matrix factorization (NMF) and latent semantic indexing (LSI). In our analysis, key themes discussed encompass family and hope, religious issues, geopolitical concerns, humanitarian issues, and societal and cultural aspects. The sentiment analysis revealed complex and multifaceted aspects. Positive sentiments were found in themes like refugee support, international aid, and illegal jobs, while neutral sentiments were found in topics like the Rohingya situation, the Muslim community, and government actions. Negative sentiments were found in discussions on religious dynamics, responsibilities, support, understanding, and women’s protection. Consequently, these findings offer crucial guidance for policymakers and humanitarian organizations, facilitating targeted interventions, effective communication strategies, and informed policy initiatives to address the complex challenges confronting the Rohingya community.
This article makes significant contributions to the understanding of the Rohingya refugee crisis, delving into its multifaceted impact on human rights, health, environmental degradation, and regional security. The main contributions are summarized as follows:
The organization of this article is as follows: section “Related Works” summarizes existing research on Rohingya issues, while section “Methodology” describes the proposed methodology used in this study. The experiments and results analysis are presented in section “Experiments and Result Analysis,” and the discussion follows in section “Discussion.” Finally, the article concludes with future directions in section “Conclusion.”
Related Works
To conduct topic modeling on refugee information, it is essential to consider the challenges faced by refugees in accessing relevant and accurate information. Wall et al. (2016) highlighted the concept of “information precarity” experienced by Syrian refugees, which refers to their instability in accessing news and personal information, leaving them vulnerable to misinformation and rumors. This aligns with the findings of Hassan and Wolfram (2020), who identified specific information needs of refugees related to housing, health care, employment, and education. In addition, Oduntan and Ruthven (2017) emphasized the importance of understanding the contextual and individual variables, including sociological processes, in refugee integration to identify their information needs. Moreover, the role of information in shaping attitudes toward refugees was highlighted by Hayo and Neumeier (2020), indicating that information provision can influence public attitudes toward the right to asylum. This underscores the significance of understanding the information needs and behaviors of refugees, as emphasized by Mansour (2018) in profiling the information needs of Syrian refugees displaced to Egypt.
The sentiment analysis of the refugee crisis has been a topic of extensive research, particularly in the context of its impact on public attitudes, policy preferences, and media discourse. Several studies have highlighted the association between exposure to the refugee crisis and the rise in anti-refugee sentiments, support for anti-immigration policies, and decreased willingness to support refugees (Hangartner et al., 2018; Schaub et al., 2020; Stockemer et al., 2019). Furthermore, the refugee crisis has been linked to an increase in anti-refugee sentiments globally, with specific attention to its impact on European countries (Bohnet & Rüegger, 2021; Kopacheva & Yantseva, 2022; Yantseva, 2020). The media’s role in shaping public perceptions and sentiments toward refugees has also been emphasized, with media coverage contributing significantly to the construction of dominant representations of refugees and influencing attitudes and behaviors toward them (Akimova et al., 2020; Nerghes & Lee, 2019).
Moreover, the sentiment dynamics during the crisis have been examined, with studies employing sentiment analysis to understand the evolution of sentiments and framing in media discourse (Chung et al., 2019; Greussing & Boomgaarden, 2017; Yantseva, 2020). The impact of the crisis on nationalism, immigration attitudes, and support for extreme-right parties has also been a subject of investigation, highlighting the complex interplay between the refugee crisis and political attitudes (Brug & Harteveld, 2021; Dinas et al., 2019). In addition, the crisis has been found to influence public opinion dynamics in a manner similar to responses to other large-scale crises, indicating the need for a comprehensive understanding of its societal implications (Nordø & Ivarsflaten, 2021).
The refugee crisis has also been associated with heightened xenophobia, religious tensions, and political turmoil, with implications for global refugee policies and interventions (Bemak & Chung, 2021; Pratisti et al., 2019). The short-term exposure to refugees during the crisis has been linked to anti-refugee voting and sentiments, underscoring the immediate impact of the crisis on political behavior (Gessler et al., 2022). In addition, the framing of the crisis in media discourse has been analyzed, shedding light on the construction of narratives and representations that shape public perceptions and responses (Daalmans et al., 2019; Holmes & Castañeda, 2016).
A comprehensive study (Ansar & Maitra, 2024) provides valuable insights into user engagement with the Rohingya crisis on social media by categorizing engagement metrics such as likes, shares, and reactions (positive and negative). Its strength lies in the clear quantification of user interactions, offering a straightforward metric-based analysis of public sentiment. However, the study’s reliance on manual categorization and surface-level engagement metrics is a significant limitation, as it overlooks the deeper context and content of user comments. This approach provides only a superficial understanding of public opinion, missing out on the nuanced themes and sentiments that can be revealed through more advanced text mining techniques. In addition, automated text processing requires far less time than manual categorization when dealing with large amounts of text, making it a more efficient method for analyzing extensive data sets.
Furthermore, several articles collectively offer significant insights into how the Rohingya diaspora engages with transnational and online spaces, shedding light on digital and social dynamics. Aziz (2022) investigates how polymedia affordances foster affective networked spaces among the Rohingya diaspora, emphasizing the role of digital platforms in maintaining cultural ties and emotional support networks. In a complementary study, Aziz (2024) maps the online spaces where Rohingya identity, visibility, and resistance are constructed, focusing on visibility and identity rather than direct user interactions or comments. Ansar and Khaled (2022, 2023) delve into civic engagement and political mobilization within the Rohingya digital diaspora, exploring virtual togetherness and collective identities for activism. Abraham and Jaehn (2020) examine the Rohingya’s quest for international recognition within the framework of nationhood and international relations. However, there remains a research gap in understanding how user-generated content on social media platforms shapes public opinion and influences advocacy efforts among the Rohingya community. Future research should include direct analyses of user interactions to better grasp the nuances and impacts of digital diaspora activism using automated text processing approaches.
Existing research is constrained because online text processing on the Rohingya crisis is scarce, with most available studies relying on manual categorization and analysis. These studies lack in-depth analyses that focus on the unique challenges and dynamics of the crisis. While some research addresses sentiment analysis and media discourse, there is a pressing need for more detailed examinations of online discussions related to the Rohingya crisis. Specifically, extracting and analyzing hidden patterns in user comments would provide a more comprehensive and detailed understanding of public discourse surrounding the crisis. Moreover, integrating identified information needs of refugees with sentiment analysis within the Rohingya crisis context is crucial for a holistic view of the challenges faced by the Rohingya community. Addressing these gaps would enhance our understanding of the online discourse on the Rohingya refugee crisis and provide nuanced insights for policymakers, humanitarian organizations, and researchers.
Methodology
This study is intended to contribute to a better understanding of how humanitarian crises and conflicts affect refugee dynamics. Our methodology, depicted in Figure 1, uses advanced text processing techniques to analyze the online discourse surrounding Rohingya refugee issues. The method’s key components include text data scraping, language detection and translation, text normalization, topic modeling, topic labeling and interpretation, and thematic sentiment analysis.

Architectural diagram illustrating the text analysis process for exploring public views on the Rohingya crisis.
The initial phase, Text Data Scraping, focuses on systematically extracting online discussions related to Rohingya from diverse sources, including social media platforms such as Twitter and Facebook as well as traditional news outlets. This extraction is performed using predefined keywords that are relevant to Rohingya refugee issues. By leveraging web scraping techniques, we aim to compile a comprehensive data set that reflects the varied perspectives and discussions surrounding the Rohingya crisis. The resultant extracted texts are considered raw text data or unprocessed data, as they are in their original, unstructured form. This raw data encapsulates the richness of online conversations but is characterized by its unclean nature and the inherent heterogeneity of languages and formats.
In the subsequent phase, Language Detection and Translation, our objective is to identify the languages present in the extracted texts and, if necessary, translate them for uniformity. Initially, each text or document is broken down into smaller units, known as tokens. This tokenization facilitates efficient language detection and subsequent translation. We detect the language of each token and, if it is not English, translate it into English. This systematic approach ensures that the entire corpus undergoes a consistent language transformation, as described in Algorithm 1. By processing each text through this method, we aim to establish a standardized language across the data set, facilitating coherent and comprehensive analysis in subsequent phases.
Language Detection and Translation.
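The token-level procedure described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function names, the whitespace tokenization, and the `detect` and `translate` callables are assumptions, and the toy lexicon only stands in for a real language detector and translation service so that the sketch runs on its own.

```python
from typing import Callable, Iterable, List


def to_english(text: str,
               detect: Callable[[str], str],
               translate: Callable[[str, str], str]) -> str:
    """Return `text` with every non-English token translated into English."""
    translated: List[str] = []
    for token in text.split():      # whitespace tokenization (assumption)
        lang = detect(token)        # language code such as "en" or "es"
        translated.append(token if lang == "en" else translate(token, lang))
    return " ".join(translated)


def corpus_to_english(texts: Iterable[str], detect, translate) -> List[str]:
    """Apply the same transformation to every document in the corpus."""
    return [to_english(t, detect, translate) for t in texts]


# Hypothetical toy stand-ins so the sketch runs without an external service.
TOY_LEXICON = {"hola": "hello", "mundo": "world"}


def toy_detect(token: str) -> str:
    return "es" if token.lower() in TOY_LEXICON else "en"


def toy_translate(token: str, source_lang: str) -> str:
    return TOY_LEXICON.get(token.lower(), token)


print(corpus_to_english(["hola world", "refugee camp"], toy_detect, toy_translate))
```

Passing the detector and translator as arguments keeps the loop independent of any particular library, so a real detection or translation backend could be substituted without changing the control flow.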
Text normalization refines the extracted and translated texts, transforming them into a standardized and consistent format to facilitate comprehensive analysis. This multifaceted normalization process involves removing hashtags, symbols, mentions, URLs, special characters, numbers, and punctuation. It also encompasses expanding contractions, converting text to lowercase, and applying stemming or lemmatization to words. Moreover, the process aims to reduce the impact of case variations, maintain uniformity in word representation, ensure a unified vocabulary, and improve the overall cohesiveness of the text data set. This refined and standardized text data set serves as the foundation for subsequent analysis tasks. An example of the cleaned text, after removing the # symbol, numbers, punctuation, and stop words and converting letters to lowercase, is shown in Example 1.
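A minimal sketch of such a normalization step is given below, using only the Python standard library. The contraction map and stop-word list are deliberately tiny, hypothetical samples, the choice to strip only the "#" symbol while dropping whole mentions is an assumption, and stemming or lemmatization is omitted because it would require an external NLP library.

```python
import re

# Tiny illustrative samples; a real pipeline would use fuller resources.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "don't": "do not"}
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "for"}


def normalize(text: str) -> str:
    """Lowercase and clean one document, returning space-joined tokens."""
    text = text.lower().replace("’", "'")
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)                   # drop mentions
    for short, full in CONTRACTIONS.items():            # expand contractions
        text = text.replace(short, full)
    # Drop "#", digits, punctuation, and other symbols; hashtag words remain.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


print(normalize("Can't ignore the #Rohingya crisis! 700,000 fled: https://example.org @UN"))
```

The order of the steps matters in this sketch: URLs and mentions are removed before the general symbol filter, since otherwise their fragments would survive as ordinary words.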
Topic Modeling (TM) is an unsupervised machine-learning method that identifies hidden patterns in large text collections, helping to identify themes and issues related to the Rohingya refugee crisis. It aids in understanding how information and misinformation are spread, creating counter-information measures, and providing a comprehensive understanding of the Rohingya refugees’ experiences. In our methodology, we leverage the Latent Dirichlet Allocation (LDA) generative model as part of the TM process. This model assumes that documents are mixtures of topics, and topics are mixtures of words. It offers a natural way to model a corpus’s underlying structure. The mathematical structure of the LDA model is defined by plate notation, which includes elements such as the Dirichlet priors on the per-document topic distributions (
Each identified topic is assigned a label based on prevalent terms extracted from corresponding documents. The algorithmic procedures presented in this phase collectively aim to assign a meaningful topic name based on sentiment scores associated with the terms extracted from a given input topic. We introduce Algorithm 2, which orchestrates this process by leveraging the capabilities of the VADER sentiment intensity analyzer (sid). The input topic is initially split into individual terms, and an empty dictionary, sentiment_scores, is established to store the sentiment scores for each term. Subsequently, sentiment scores are computed for each term by invoking the GetSentimentScores procedure (Algorithm 3), utilizing the VADER analyzer (sid). The net sentiment score for each term is then determined by subtracting the negative sentiment score from the positive sentiment score. The algorithm proceeds to identify the term with the highest net sentiment score using the argmax operator, ultimately assigning this term as the topic name.
AssignTopicName.
GetSentimentScores.
Algorithm 3 encapsulates the process of calculating sentiment scores for a given term using the VADER sentiment intensity analyzer (sid). The sentiment scores, comprising positive (“pos”) and negative (“neg”) values, are obtained through the polarity_scores method of the VADER analyzer. These scores are then encapsulated in a tuple, which is subsequently returned as the output.
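The two procedures described above can be sketched in Python as follows. The sketch assumes only that `sid` exposes a `polarity_scores` method returning a dictionary with "pos" and "neg" keys, as the VADER analyzer does; the `ToyAnalyzer` class and its scores are hypothetical stand-ins so the example runs without the VADER lexicon, and splitting the topic on whitespace is also an assumption.

```python
from typing import Dict, Tuple


def get_sentiment_scores(term: str, sid) -> Tuple[float, float]:
    """Return the (positive, negative) scores of a single term."""
    scores = sid.polarity_scores(term)
    return scores["pos"], scores["neg"]


def assign_topic_name(topic: str, sid) -> str:
    """Name a topic after its term with the highest net sentiment score."""
    sentiment_scores: Dict[str, float] = {}
    for term in topic.split():                    # one entry per topic term
        pos, neg = get_sentiment_scores(term, sid)
        sentiment_scores[term] = pos - neg        # net sentiment
    # argmax over terms; ties resolve to the earliest term in the topic
    return max(sentiment_scores, key=sentiment_scores.get)


class ToyAnalyzer:
    """Hypothetical stand-in for a VADER analyzer, with made-up scores."""
    LEXICON = {"hope": (0.9, 0.0), "crisis": (0.0, 0.8)}

    def polarity_scores(self, text: str) -> Dict[str, float]:
        pos, neg = self.LEXICON.get(text.lower(), (0.0, 0.0))
        return {"pos": pos, "neg": neg}


print(assign_topic_name("refugee crisis hope family", ToyAnalyzer()))
```

With a real analyzer, for example `SentimentIntensityAnalyzer` from `nltk.sentiment.vader`, the two functions would be called the same way. Note that when no term carries sentiment, every net score is zero and the first term is returned.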
In the final phase of our research methodology, we employ Thematic Sentiment Analysis (TSA) to extract the sentiments from the topics previously identified and labeled. In other words, it involves performing sentiment analysis on the content related to each identified topic. We examine the distribution of sentiments within each topic to gain insights into the emotional tones associated with various aspects of global refugee issues. Furthermore, we investigate the correlation between specific topics and sentiment scores, aiming to determine whether certain topics are more inclined to elicit positive or negative sentiments. In addition, we explore potential changes in sentiment dynamics over time for specific topics, assessing whether there are temporal variations in emotional responses to the identified themes. We also place emphasis on the terms used within the identified topics, the statements made, and the expressions found in online discussions, an approach known as “augmented topic analysis.” This specialized analysis enhances our exploration, offering a detailed and contextually rich understanding of sentiments within the identified topics.
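One way to picture the per-topic aggregation is the sketch below. It is an illustration under stated assumptions, not the procedure used in the study: it averages a VADER-style "compound" score over the documents assigned to each topic and applies a ±0.05 cutoff, which is a common convention for VADER but is not specified in the text. The `ToyAnalyzer` class, topic keys, and documents are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def topic_sentiment(docs_by_topic: Dict[str, List[str]], sid,
                    threshold: float = 0.05) -> Dict[str, Tuple[str, float]]:
    """Label each topic by the mean compound score of its documents."""
    result = {}
    for topic, docs in docs_by_topic.items():
        avg = mean([sid.polarity_scores(d)["compound"] for d in docs])
        if avg >= threshold:
            label = "positive"
        elif avg <= -threshold:
            label = "negative"
        else:
            label = "neutral"
        result[topic] = (label, avg)
    return result


class ToyAnalyzer:
    """Hypothetical keyword scorer standing in for a real analyzer."""

    def polarity_scores(self, text: str) -> Dict[str, float]:
        words = text.lower().split()
        score = 0.6 * words.count("aid") - 0.7 * words.count("shame")
        return {"compound": max(-1.0, min(1.0, score))}


toy_docs = {
    "topic_a": ["food aid arrives", "more aid pledged"],
    "topic_b": ["shame on them"],
    "topic_c": ["the meeting was held"],
}
for name, (label, avg) in topic_sentiment(toy_docs, ToyAnalyzer()).items():
    print(name, label, round(avg, 2))
```

Averaging is only one possible aggregation; counting the share of positive, negative, and neutral documents per topic would expose the within-topic distribution more directly.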
Experiments and Result Analysis
This section describes the experiments conducted in this study and the results obtained from them. We present the outcomes of these experiments along with the insights, trends, and interpretations that contribute to a deeper understanding of the subject matter under investigation.
Data Collection and Preparation
We collected the data set systematically by extracting online discourse related to Rohingya from diverse sources. Our focus was on public discussions about Rohingya refugees on Facebook, Twitter, YouTube, Newsfeed, newspaper headlines, and research article abstracts. This cross-platform approach was adopted to ensure a comprehensive representation of perspectives, ranging from local activists to international voices. In addition, utilizing multiple platforms enabled us to gather a large data set for analysis, enhancing the robustness and generalizability of our findings. By targeting a wide range of social media platforms and sources, we aimed to capture diverse viewpoints and narratives surrounding the Rohingya crisis. Utilizing API-based tools such as Facepager, Twitter API, News API, and Google Scholar, we extracted public texts, each method tailored with specific keywords and extraction techniques. This approach not only enhances the reliability and validity of our findings by mitigating platform-specific biases but also enriches our analysis by incorporating a wide array of perspectives. The detailed procedures for data collection are outlined in Table 1.
Data Collection Sources and Methods for the Rohingya Crisis.
After systematically collecting the data set through the outlined sources and methods in Table 1, the initial data set comprised a total of 7,038 records or texts. To ensure data integrity and eliminate redundancy, a meticulous process of duplicate removal was undertaken, resulting in a refined data set of 6,840 unique records. These records were initially referred to as raw texts or unprocessed texts. Subsequently, we implemented a language translation process, as outlined in Section “Methodology”, to transform the data set into unilingual texts. Finally, the data set underwent a cleaning phase utilizing normalization techniques, ensuring its preparedness for subsequent processing stages.
We have incorporated two quotes from our original data set exemplifying both positive and negative thematic sentiments toward the Rohingya crisis. One quote highlights negative sentiments toward individuals in Indonesia: “Shame on Indonesian. U are fake muslim. God will do same to u in coming days.” This underscores the intensity of negative thematic sentiments expressed toward certain groups, enriching our analysis of online discourse, particularly within discussions on religious dynamics and responsibilities. In addition, a quote reflects positive sentiments toward international efforts: “World News in Brief: Food aid relief for Rohingya, 5 new Security Council members, jailing of Russian poets draws rebuke from rights expert.” This highlights positive aspects of the international response to the Rohingya crisis, specifically efforts to provide humanitarian aid, aligning with themes of refugee support and international aid.
Topic Extraction and Analysis
The primary aim of this research is to reveal latent themes within online discussions related to the Rohingya refugee crisis. To accomplish this, we utilized the LDA model, adjusting its key parameters to optimize effectiveness. The configuration of the LDA model underwent fine-tuning by defining ranges for alpha, eta, and the number of topics, with the goal of obtaining optimal settings for our analysis, as detailed in Table 2. Subsequently, optimal configurations for different numbers of topics were identified and summarized in Table 3. Each row in this table represents a specific configuration, outlining the number of topics, the best alpha value, the best eta value, and the corresponding best coherence score. The coherence score serves as a measure of the identified topics’ coherence within the model. This table acts as a reference for the selected hyperparameters that produced the highest coherence scores during the LDA topic modeling process.
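The search over alpha, eta, and the number of topics can be pictured as a plain grid search that records the best setting for each topic count. In the sketch below, `coherence_fn` is a hypothetical callable that would train a model for one configuration and return its coherence score (for instance, with a library such as gensim, by fitting an `LdaModel` and scoring it with a `CoherenceModel`). The toy scoring function and grid values are invented purely so the example runs and do not reproduce the study's settings or results.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def best_configs(topic_counts: Sequence[int],
                 alphas: Sequence[float],
                 etas: Sequence[float],
                 coherence_fn: Callable[[int, float, float], float]
                 ) -> Dict[int, Tuple[float, float, float]]:
    """Map each topic count to its best (alpha, eta, coherence) triple."""
    best: Dict[int, Tuple[float, float, float]] = {}
    for k, alpha, eta in product(topic_counts, alphas, etas):
        score = coherence_fn(k, alpha, eta)      # train and evaluate once
        if k not in best or score > best[k][2]:
            best[k] = (alpha, eta, score)
    return best


def toy_coherence(k: int, alpha: float, eta: float) -> float:
    """Made-up score surface that peaks at k=10, alpha=0.1, eta=0.01."""
    return 0.5 - 0.01 * abs(k - 10) - abs(alpha - 0.1) - abs(eta - 0.01)


table = best_configs([5, 10, 20], [0.01, 0.1, 1.0], [0.01, 0.1], toy_coherence)
for k, (alpha, eta, score) in sorted(table.items()):
    print(k, alpha, eta, round(score, 3))
```

The resulting dictionary has one row per topic count, which mirrors the layout described for the summary of best hyperparameter values.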
Parameter Values for LDA Model.
Best Hyperparameter Values for LDA Topic Modeling.
The extraction of the top terms for each topic, coupled with relevance scores, reveals the most pertinent and distinctive terms that contribute meaningfully to the characterization of that topic. These terms enhance the interpretability of the topic model’s results and provide a concise summary of the content encapsulated by each topic. In other words, the salient terms and their associated probability scores provide overall insights into the thematic content of each topic. In Topic 2, for instance, terms like “refugee” and “rohingya” dominate with high probabilities, emphasizing a focus on discussions related to Rohingya refugees. Similarly, Topic 6 is characterized by terms such as “country” and “enter,” suggesting discussions about entering countries, as depicted in Figure 2. This visual representation offers a nuanced understanding of the prevailing themes within each topic, showcasing the diverse array of subjects covered in the data set’s content.

The most salient terms in each topic, showing the probability of each term across topics.
Gaining insights into the distribution of topics across our corpus, which is characterized by 15 distinct topics, is crucial for a thorough understanding of the data set. The topic distribution, depicted in Figure 3, serves as a quantitative representation of the prevalence of each topic, offering a comprehensive overview of their impact on the corpus. To obtain this distribution, we calculated the topic distribution for each document in the corpus, identifying the maximum document probability for each topic. Subsequently, we counted the number of documents for which each topic had the highest document probability. For instance, Topics 6 and 12 exhibit higher document counts, indicating their prevalence across a larger number of documents. Conversely, Topics 5 and 15 have notably lower document counts, suggesting a more limited representation within the corpus. Leveraging this quantitative data enables us not only to validate the efficacy of our topic modeling algorithm but also to perform targeted analyses, such as identifying topics that may require deeper exploration or scrutiny.

Quantitative representation of topic distribution across the data set, indicating the number of documents for each topic.
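The counting procedure behind this distribution can be sketched as follows: each document is assigned to the topic with the highest probability in its topic distribution, and the assignments are tallied. The probability rows are invented toy values, and the 1-based topic numbering is an assumption made to match how topics are numbered in the text.

```python
from collections import Counter
from typing import List, Sequence


def dominant_topic_counts(doc_topic_probs: Sequence[Sequence[float]]) -> Counter:
    """Count documents per dominant topic (topics numbered from 1)."""
    counts: Counter = Counter()
    for probs in doc_topic_probs:
        # index of the largest probability; ties go to the lowest index
        dominant = max(range(len(probs)), key=probs.__getitem__)
        counts[dominant + 1] += 1
    return counts


# Toy per-document topic distributions over three topics.
toy_probs: List[List[float]] = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
print(sorted(dominant_topic_counts(toy_probs).items()))
```

Because each document contributes to exactly one topic, the counts sum to the number of documents, which gives a quick sanity check on the tally.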
The topics extracted from the data set, presented in Table 4, cover a diverse range of subjects related to the Rohingya refugee crisis. Each topic consists of its top ten terms and is labeled both manually and by the sentiment-based method (Algorithm 2) to provide insights into the online content. For instance, Topic 2, labeled “Refugee Crisis,” encompasses terms like “refugee,” “rohingya,” and “care,” indicating discussions focused on the crisis and the care required. Similarly, Topic 14, labeled “Food and Aid,” includes terms such as “food,” “aid,” and “provide,” shedding light on discussions related to humanitarian assistance. The sentiment-based labels, such as “hope,” “care,” and “support,” further categorize the topics based on the emotional tones associated with the discussed content. This classification facilitates a deep understanding of public attitudes and sentiments surrounding global refugee issues on online platforms or public forums.
Extracted Topics and Labels With Manual and Sentiment-Based Categorizations for Discussions on the Rohingya Refugee Crisis.
Table 4 presents the 15 topics obtained along with their labels, derived using both manual and sentiment-based methods. The two labeling methods are consistent with each other, reflecting the similar emotional context each assigns to a topic. To explore the content of these topics in more depth, we examined the sentiment distribution within each topic, utilizing the TSA method as outlined in Section “Methodology”. Figure 4 illustrates the sentiment polarity of each topic. The sentiment analysis reveals distinct emotional tones associated with various aspects of global refugee issues related to the Rohingya crisis. Of the 15 topics, seven were positive, five negative, and three neutral.

Thematic sentiment analysis of extracted topics.
We also observe in Figure 4 that topics linked with positive sentiments, such as hope, care, assistance, support, and positive actions, reflect optimistic and affirming discussions. Conversely, topics characterized by negative sentiments may involve critical or adverse conversations, potentially highlighting challenges, conflicts, or emotionally charged aspects of the Rohingya crisis. Furthermore, topics with neutral sentiments likely encompass factual information or discussions where sentiment polarity is not strongly expressed. This nuanced categorization of sentiments enriches our understanding of the emotional landscape surrounding the Rohingya crisis, offering insights into public attitudes and perceptions on online platforms or public forums.
A similarity analysis performed after topic modeling improves the interpretability and dependability of the results by showing how distinct the topics are from one another. Higher scores reflect semantic consistency and possible topic connections, whereas lower values indicate clearly defined, non-overlapping thematic classifications. The analysis also serves as a quality check, exposing weaknesses in the topic modeling approach and flagging possibly duplicated topics, and it supports incremental refinements that improve the robustness and interpretability of the results. The cosine similarity matrix, illustrated in Figure 5, serves as a critical tool for unraveling the semantic relationships among the 15 distinct topics derived from the topic modeling process. Each cell in the matrix denotes the similarity between a pair of topics, with values from 0.0 to 1.0. The diagonal elements consistently register a perfect similarity of 1.0, as expected, because each topic is identical to itself. The off-diagonal elements reveal the similarity and dissimilarity between topics. For instance, the similarity score of 0.0 between Topic 1 and Topic 2 suggests a stark contrast in the terms and content characterizing these topics, whereas the moderate score of 0.4 between Topic 2 and Topic 3 indicates a partly shared thematic context. Throughout the matrix, scores close to 1.0 signify high similarity, while scores closer to 0.0 represent low similarity. Notably, the matrix assists in identifying distinct topics with dissimilar terms and elucidates the nuanced relationships between topics that share common conceptual ground.

Figure 5. Exploring semantic relationships among extracted topics using a cosine similarity matrix.
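As a rough illustration of how such a matrix can be computed, the sketch below builds pairwise cosine similarities from topic-term weight vectors. It assumes each topic is represented as a vector of non-negative weights over a shared vocabulary, which keeps every score between 0.0 and 1.0. The three toy vectors and the function names are hypothetical and do not come from the study's model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


def similarity_matrix(topic_vectors):
    """Return the full pairwise cosine similarity matrix as nested lists."""
    n = len(topic_vectors)
    return [
        [cosine_similarity(topic_vectors[i], topic_vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy topic-term weights over a four-word vocabulary (illustrative only).
toy_topics = [
    [0.7, 0.3, 0.0, 0.0],  # topic A
    [0.0, 0.0, 0.6, 0.4],  # topic B: shares no terms with topic A
    [0.0, 0.5, 0.5, 0.0],  # topic C: overlaps partly with A and with B
]
matrix = similarity_matrix(toy_topics)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

In this toy case, topics A and B share no weighted terms and therefore score 0.0, mirroring the kind of contrast reported between Topic 1 and Topic 2, while topic C scores strictly between 0 and 1 against both.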
Comparative Analysis of Topic Models
In this research, our primary focus was to computationally analyze online content related to Rohingya issues. Given that no similar work exists in the literature, we compared our approach with state-of-the-art topic models. Specifically, we aim to demonstrate the rationale behind our selection of the LDA model over other topic models found in the literature. Our comparative analysis assessed LDA against NMF and LSI.
As depicted in Figure 6, comparing the coherence scores of the three topic models (LDA, LSI, and NMF) across varying numbers of topics reveals clear patterns. LDA consistently demonstrates competitive coherence scores, showing an upward trend as the number of topics increases. Its highest coherence score occurs at 15 topics, reaching a peak of 0.60, which suggests its proficiency in capturing intricate patterns within the data set. In contrast, LSI exhibits a decline in coherence as the number of topics rises, indicating a potential limitation in maintaining interpretability with a higher topic count. NMF, while demonstrating competitive performance, tends to yield lower coherence scores compared to LDA and LSI across various topic counts. Overall, our findings suggest that, based on coherence scores, LDA with 15 topics emerges as a promising choice for computationally analyzing online content related to Rohingya issues in this specific context.

Figure 6. Comparison of topic modeling approaches.
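The selection logic described above, scoring each candidate model and keeping the one with the highest coherence, can be sketched in a few lines. This section does not state which coherence measure produced the 0.60 score, so the sketch substitutes the UMass measure with add-one smoothing as a stand-in; its values are sums of log ratios and are not comparable to the scores in Figure 6. The corpus, the candidate names `model_a` and `model_b`, and their topics are invented for illustration.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic with add-one smoothing.

    `top_words` is ordered from most to least probable and `documents`
    is a list of token sets. Higher values indicate that the top words
    co-occur more often in the reference documents.
    """
    def doc_freq(*words):
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # skip words that never occur in the reference corpus
        score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


def mean_coherence(topics, documents):
    """Average the per-topic coherence over all topics of one model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


def best_candidate(candidates, documents):
    """Return (name of the most coherent candidate, all scores)."""
    scores = {
        name: mean_coherence(topics, documents)
        for name, topics in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy reference corpus and candidate topic sets (illustrative only).
toy_docs = [
    {"refugee", "camp", "aid"},
    {"refugee", "camp", "border"},
    {"aid", "support", "refugee"},
    {"border", "government", "policy"},
    {"government", "policy", "border"},
]
toy_candidates = {
    "model_a": [["refugee", "camp", "aid"], ["government", "policy", "border"]],
    "model_b": [["refugee", "policy", "aid"], ["government", "camp", "support"]],
}
best, scores = best_candidate(toy_candidates, toy_docs)
print(best, scores)
```

In practice the same loop would run over each model type and each candidate number of topics, with the topics taken from fitted models rather than written by hand.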
Discussion
Our study took a comprehensive approach to examining how humanitarian crises and conflicts shape refugee dynamics, focusing specifically on the Rohingya crisis. Through advanced text processing techniques and a well-defined methodology (depicted in Figure 1), we were able to extract, translate, and normalize online discourse and then analyze it thoroughly using TM and TSA.
Insights From Topic Modeling
LDA, which outperformed LSI and NMF, yielded 15 topics with a coherence score of approximately 0.60, indicating a reasonable level of topic coherence. These topics encapsulate the diverse facets of discussions surrounding the Rohingya refugee crisis. Table 5 provides an overview of key themes and insights derived from the identified topics. The topic labeled “Family and Hope” underscores the resilience of the Rohingya community in the face of adversity. By emphasizing positive family values and the anticipation of new life, this topic suggests that, despite the challenges of displacement, individuals within the Rohingya community maintain a sense of hope and commitment to familial bonds. Humanitarian efforts should recognize and support these emotional dimensions, considering the importance of family-centered approaches in providing assistance and fostering resilience.
Table 5. Exploration of Diverse Topics With In-Depth Insights.
The topic “Refugee Crisis” highlights the urgency of addressing critical issues surrounding refugees, particularly the Rohingya. The emphasis on care and response to the claims of refugees signals a collective awareness of the severity of the crisis. This insight reinforces the need for swift and comprehensive humanitarian responses, emphasizing the protection of refugee rights and fostering a supportive environment for those affected. Similarly, the topic “Rohingya Situation” delves into the complexities of the crisis, touching upon staying, islands, and academic perspectives. This multifaceted understanding reflects the diverse viewpoints and analytical depth present in the discussions. Policymakers and humanitarian organizations can benefit from considering these academic perspectives, ensuring a well-informed approach to addressing the nuanced challenges faced by the Rohingya population.
Furthermore, “Religious Discussions” illuminates the intersection of religious and cultural elements within the Rohingya crisis, covering Islamic and Buddhist perspectives. This topic emphasizes the importance of recognizing and respecting diverse religious beliefs to promote understanding and inclusivity. Addressing issues related to religious persecution requires an approach that acknowledges and accommodates the cultural and religious diversity within the Rohingya community. Finally, the topic “Government and Borders” delves into discussions related to government actions, borders, and involvement with Malaysia. Insights from this topic highlight the need for informed policymaking as well as the exploration of new perspectives in the context of government and borders. Inclusive approaches that involve collaboration with governments, international organizations, and local communities are essential for addressing the complex geopolitical dimensions of the Rohingya crisis.
Key Themes Discussed in Online Forums
Analysis of the identified topics revealed key themes discussed by individuals in online forums. These themes reflect a broad spectrum of concerns, ranging from geopolitical aspects such as political instability in Myanmar and international aid efforts to more personal and emotional topics like cultural identity and public perception. The discourse on political instability in Myanmar signifies global awareness and engagement with the ongoing challenges faced by the nation, particularly the Rohingya crisis and the situation of refugees. Concurrently, discussions on international aid efforts underscore the collective interest in addressing humanitarian issues and providing support to affected regions.
The key themes extracted from the identified topics span humanitarian, societal, cultural, and responsibility-oriented subjects. Humanitarian themes cover the importance of family bonds and hope, the challenges faced by refugees in the Rohingya crisis, and the overall situation affecting the Rohingya population. They also include respecting and welcoming homes, providing assistance in Bangladesh, and addressing issues within the Muslim community and illegal job scenarios. Societal and cultural discussions include exploring religious beliefs, learning about different countries, government roles in managing borders, and the dynamics of belief systems and rejection. Finally, the theme of responsibility and action emphasizes the need to address societal issues, such as supporting and understanding victims, protecting women, and providing aid and sustenance. Together, these themes offer a comprehensive view of the social, cultural, and humanitarian aspects of the discourse.
Sentiments and Expressions in the Topics
TSA allowed us to unravel the emotional tones associated with each identified topic. Our investigation into the sentiments expressed in online discussions shed light on the complex and multifaceted nature of the Rohingya crisis. Some topics invoked predominantly positive sentiments, such as discussions around international aid and assistance, while others, like human rights violations and religious persecution, elicited more negative sentiments.
The sentiment analysis of the identified topics in online discussions on the Rohingya crisis reveals a nuanced and emotionally diverse landscape. Positive sentiments are evident in themes related to refugee support, international aid, assistance from Bangladesh, and discussions on illegal jobs and riches. These topics resonate with empathy, hope, and a willingness to contribute positively to addressing the crisis. However, neutral sentiments characterize topics like the Rohingya situation, the Muslim community, and government actions, reflecting the varied perspectives and complexities inherent in these subjects. Meanwhile, discussions on religious dynamics, responsibilities, support and understanding, and women’s protection evoke mixed or negative sentiments, highlighting the challenges, critical perspectives, and emotional nuances associated with these sensitive issues. Overall, the sentiment analysis not only provides a glimpse into the emotional tones of online conversations but also underscores the multifaceted nature of discussions surrounding the Rohingya crisis and related themes.
Implications of the Research
The implications of our research extend beyond understanding online discourse to informing actionable recommendations for policymakers, humanitarian organizations, and advocacy groups. Our study provides specific recommendations based on empirical findings:
Enhancing international aid and assistance: Prioritize and strengthen support for international aid programs aimed at Rohingya refugees, ensuring equitable distribution of resources and coordinated efforts among stakeholders.
Addressing human rights violations and religious persecution: Advocate for accountability for human rights abuses and promote religious tolerance within affected communities.
Developing targeted communication strategies: Tailor communication strategies to effectively engage with stakeholders and address the diverse perspectives and sentiments surrounding the Rohingya crisis.
Promoting sustainable solutions and long-term support: Focus on sustainable initiatives that address root causes of the crisis and ensure long-term support for Rohingya refugees’ well-being and integration.
Furthermore, our research findings can have practical implications for addressing the Rohingya refugee crisis and countering misinformation effectively, including the following:
Media ethics and responsible reporting: Our findings can inform media professionals and journalists about the prevalent themes and sentiments within online discourse related to the Rohingya refugee crisis. By understanding the nuances of public perceptions, media practitioners can adopt more ethical and responsible reporting practices, avoiding the perpetuation of misinformation and promoting factual accuracy.
Advocacy and awareness campaigns: Humanitarian organizations and advocacy groups can utilize our insights to tailor their messaging and awareness campaigns effectively. For instance, understanding the positive sentiments surrounding discussions about refugee support and international aid can guide the development of advocacy strategies aimed at mobilizing support and resources for Rohingya refugees.
Policy formulation and decision-making: Policymakers and governmental agencies can leverage our research findings to inform evidence-based policy formulation and decision-making processes. Insights into public sentiments regarding the urgency of addressing the refugee crisis and the complexities within the Rohingya situation can guide policymakers in developing targeted interventions and allocating resources appropriately.
Conclusion
This study represents a comprehensive exploration of the Rohingya refugee crisis within the domain of online public discourse, employing advanced text processing techniques, specifically topic modeling and sentiment analysis. We employed LDA for TM, which outperformed the NMF and LSI models and yielded 15 coherent topics with a coherence score of approximately 0.60. Consequently, this research has deepened our understanding of the multifaceted challenges confronting the Rohingya community.
The identified key themes included familial resilience, the pressing need to address the refugee crisis, complexities within the Rohingya situation, religious and cultural elements, and geopolitical considerations. The sentiment analysis uncovered intricate emotional expressions in online discussions, revealing positive sentiments concerning refugee support and international aid, while discussions on religious dynamics and women’s protection evoked mixed or negative sentiments.
The implications of our research extend to informing the decision-making processes of policymakers, humanitarian organizations, and advocacy groups. Positive sentiments point to opportunities for collaborative efforts, while negative sentiments underscore the urgency of addressing specific issues. Overall, our findings contribute to the broader discourse on humanitarian crises, emphasizing the crucial role of advanced text processing techniques in facilitating a more effective and empathetic response to the Rohingya crisis. In future work, we plan to expand this research and develop an application aimed at extracting insights from online platforms within refugee communities worldwide.
Footnotes
Acknowledgements
We would like to express our gratitude to the Department of Computer Science and Engineering at Jagannath University for providing the necessary laboratory facilities to conduct the research presented in this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We would like to acknowledge the UGC-supported Jagannath University Fund for providing partial financial support for this research.
