Abstract
Objective
The use of social media during the COVID-19 pandemic has been researched extensively since the outbreak. Sina Weibo, as one of the most commonly used social media platforms in China, played an important role in public expression for the duration of COVID-19. We investigated the themes that emerged from the posts and examined the sentiments associated with each theme.
Methods
For this study, we collected 72,084 Weibo posts related to the 2022 Shanghai public health event to present a thematic and sentiment analysis of posts by the public.
Results
The findings showed that, on social media, the public was more inclined to express concerns about the impact of the outbreak and of outbreak containment measures on their personal lives, and to exhibit negative attitudes and opinions, than to discuss the impact of COVID-19 on human life and health. This suggests that the impact of the outbreak on people's daily lives was greater than was the impact on their livelihoods and health risks.
Conclusions
This research highlights the importance of understanding the role of social media in times of crisis and the potential insights that can be gained from analyzing online public discourse. Our empirical findings provide insights for future public health communication strategies and crisis management plans in China in the information age.
Introduction
Over the past three years, the novel coronavirus (COVID-19) has been a major global public health emergency with a long duration due to its rapid spread, widespread infection rate, and the tremendous difficulty involved in preventing and controlling the virus. This public health event was a crisis and a major challenge for every country. From 2020 to 2023, the daily life of the public underwent remarkable changes, particularly as cities managed scattered outbreaks: the pandemic first caused severe and sudden damage to normal global operations, then gave way to the normalization of epidemic prevention and control, and finally to the complete relaxation and lifting of prevention and control measures by the end of 2022. The confusion, panic, and uncertainty surrounding public health events appeared to have shifted to how the pandemic was disrupting routines and encouraging people to rebuild their daily lives. This shift will affect how the public responds to public health events at present and in the future.
During the pandemic, social media platforms facilitated the rapid dissemination of relevant information at regional, national, and international levels, and served as a robust tool for obtaining public opinion and sentiment information in the context of disasters and emergencies. 1 Social media platforms and the widespread dissemination of information about COVID-19 on them helped to increase knowledge about personal protective measures to control and prevent COVID-19 in communities, and enabled communication with friends and family to decrease the loneliness and tedium associated with anxiety and prolonged distress. 2 Several social media companies reported a spike in social media use after the stay-at-home policy was implemented. For example, Facebook's analytics department reported that overall messaging increased by more than 50% in March 2020; WhatsApp usage also increased by 40%. On the one hand, social media is characterized by immediacy, many-to-many communication, and a huge information flow, which encouraged the public to take the initiative to obtain information about the epidemic via social media. On the other hand, home isolation made the public more dependent on social media, thus causing social media to become one of the important information channels for obtaining news about the epidemic.
The use of the internet and social networks during Chinese public health events has been of significance in recent years. According to the 2020 User Development Report, Sina Weibo had 511 million active users in the month of September 2020. Immediately after the outbreak of the COVID-19 public health emergency, Weibo launched a special section on “Fighting Pneumonia,” which assisted governments and state media to express their authoritative voices, provide information about prevention and control measures, and to dispel rumors and misinformation through more than 70,000 videos and 30,000 live broadcasts. More than 200 million users tracked the epidemic on Weibo every day; in addition, more than 70 million users visited the “Fighting pneumonia” section every day. Weibo also launched a “Help Message for Pneumonia Patients” by working closely with the Wuhan government, the media, and public welfare organizations to allow people's queries to be responded to quickly. Such real-time and rapid connectivity, and the tremendous flow of information, supported the public focus on prevention and control measures and knowledge about health.
The COVID-19 outbreak has introduced a new era of social media use and practice, thus stimulating a new vision of interdependence and resource interaction in the information age and forging a new path for information consumption and dissemination between the government and the public. This has resulted in knowledge and experience quietly changing from what it was prior to the epidemic.
During public health crises, understanding the nuances of public sentiment is crucial for formulating effective responses. The detailed sentiment scale used in this study goes beyond the simplistic positive/negative/neutral grading by identifying specific emotions such as happiness, anger, fear, and sadness. This granularity enables authorities to tailor their communication and intervention strategies more precisely. For instance, if a significant portion of posts expresses anger or fear, authorities can address the specific causes of these emotions, whether they stem from misinformation, lack of resources, or perceived inefficacies in public health measures. By monitoring shifts in sentiment, authorities can dynamically adjust their strategies to reduce negative emotions and foster a more positive public outlook.
In this study, we treat the pandemic outbreak in Shanghai as a particular case. Zhou et al. found that during the 2022 Shanghai public health event, access to abundant medical resources and public services might have been impeded in districts with more rigorous lockdown policies. 3 Moreover, there is an obvious causal relationship between government behavior and public mood change, and the impact on negative mood is greater than that on positive mood. 4 The 2022 Shanghai public health event is remarkable since it highlights the side effects of the outbreak and of outbreak containment measures on people's personal lives. This means that policymakers need to better anticipate and address public needs that may arise from future lockdowns by assessing public opinion and sentiment.
Our analysis sheds light on the ways in which the public used social media during this outbreak. By examining the themes and sentiments that emerged and the correlation between the major themes and sentiment scores, we argue that the impact of the outbreak on people's daily lives was greater than was the impact on their livelihoods and health risks. This research highlights the importance of understanding the role of social media in times of crisis and the potential insights that can be gained from analyzing online public discourse. Our empirical findings provide insights for future public health communication strategies and crisis management plans in China in the information age.
Literature review
Crisis communication through social media: government–public interactions during COVID-19 in China
The COVID-19 pandemic brought unprecedented challenges to public health, necessitating efficient crisis management and communication. Social media emerged as a crucial platform for the government to communicate with the public, disseminating information, shaping public opinion, and coordinating responses to the pandemic.
Many studies have explored the government's use of social media as a tool for managing the public health crisis. For instance, Xue et al. emphasized the importance of rapid governmental response and effective communication on social media to mitigate fear, uncertainty, and anxiety among citizens during public health emergencies. 5 Similarly, Zheng and Vicari highlighted the role of Weibo, controlled by government media, in promoting public policies during the COVID-19 outbreak, illustrating how social media could rally public support for health initiatives. 6 Wu and Feng demonstrated how state media used platforms like Douyin to disseminate positive narratives that fostered social unity and reinforced community values in the face of the pandemic. 7
During the COVID-19 pandemic, crisis communication through social media became a pivotal strategy for the Chinese government to manage public health responses and foster interaction with citizens. Shao et al. illustrated how WeChat official accounts served as essential platforms for the government to disseminate real-time information, addressing public fears and uncertainties during the crisis. 8 These accounts allowed for the rapid transmission of accurate health information, helping to reduce misinformation and build public trust. Similarly, Liao et al. studied the early stages of the pandemic, revealing how the government engaged with public concerns through social media, responding promptly to questions and offering updates. 9 Their findings emphasized the role of government responsiveness in stabilizing the public's emotional state and encouraging compliance with health directives.
Despite these efforts, challenges remain in improving the interaction between the government and the public on social media. Zeng et al. noted that governmental microblogs, while becoming more interactive post-COVID-19, still lack sufficient two-way communication. 10 Effective communication remains critical for fostering trust and legitimacy during crises, as seen in Meadows et al.'s analysis of Weibo posts, which revealed the government's strategic use of themes related to investigations, policies, case updates, and prevention efforts to maintain authority and credibility. 11 Chen et al. further expanded on this by examining dialogic communication on local government social media accounts. 12 Their research found that while some efforts were made to foster two-way communication, many local governments struggled to maintain meaningful dialogue with the public. This gap in interaction highlighted a need for improved communication strategies that not only inform but also engage citizens more actively.
On the other hand, the public's engagement with social media during the pandemic reflected their diverse needs and emotional states. Yang et al. explored how social media served as a platform for public voices, influencing policy evolution and public health innovations. 13 Posts about symptoms and diagnoses on platforms like Weibo also provided valuable predictive data for public health authorities, as Shen et al. demonstrated, allowing for timely responses to the pandemic. 14 Zhou et al. explored the emotional dynamics of social media content, showing that posts from both the government and the public played a significant role in shaping collective sentiment. 15 Emotional appeals, such as messages of solidarity and hope, were found to be particularly effective in fostering a sense of community and social unity during the pandemic.
Meanwhile, Luo et al. conducted a content analysis of vaccination promotion on social media, revealing that Chinese public health authorities effectively used persuasive messaging to drive vaccination campaigns. 16 Clear, informative posts that highlighted the benefits of vaccination and provided direct calls to action proved successful in encouraging public participation in the vaccination effort.
In summary, the interaction between the government and the public on social media during COVID-19 has evolved but remains an area requiring further improvement, particularly in enhancing dialogue and addressing public concerns. Effective use of social media by both the government and the public is essential for managing public health crises, mitigating fear and anxiety, and fostering a cooperative relationship that supports timely responses and crisis management efforts. This body of work underscores the importance of a well-coordinated crisis communication strategy that leverages social media to engage the public, provide timely information, and foster a collaborative response to public health emergencies.
Information related to public health and sentiment on social media
Types of information related to public health
The lockdown in Wuhan in response to the COVID-19 pandemic contained the infection, but it also created a tremendous information challenge for people living in social isolation and led to a surge in online health information seeking (OHIS) behavior by patients and their families. Based on a total of 10,908 “#COVID-19 patients seeking help” posts published on Weibo within the first 20 days of the lockdown, Zhao and Basnyat summarized the following themes: (a) accessing medical services through OHIS, (b) managing self-isolation through OHIS, (c) accessing practical support, and (d) addressing information disparities through OHIS. After the first 20 days, quarantine became a more common and strictly enforced epidemic prevention policy across the country until the policy was removed at the end of 2022. 17 In response to the practical need to control the risk of online public opinion arising from localized COVID-19, some researchers observed a relationship between the volume of social media postings and the severity of the outbreak in the real world. For example, Liu et al. observed the state of the epidemic as it entered its normalization phase; they divided the period from May 7, 2020, to September 3, 2021, into seven phases, and the results revealed that localized outbreaks of COVID-19 increased the risk of negative public opinion being disseminated online, while the associated analysis confirmed that the level of public opinion risk was correlated positively with the severity of the outbreak in the real world. 18
Extensive studies have identified different categories of information pertaining to different topics that the public consumes in the early stages of a public health event. For example, Zhao et al.'s study classified the topics in COVID-19 discourse in which the public was most interested in the early stages of the outbreak into the following categories: the situation regarding new cases of COVID-19 and its impact, frontline reports about the outbreak and prevention and control measures, expert interpretations and discussions of the source of the infection, frontline medical care during the outbreak, concern about the global outbreak, and the search for suspected cases. 19 The rapid, global spread of this dreaded disease resulted in the mass media becoming active in educating the community about the COVID-19 outbreak by conveying health information about it. Lu et al.'s research on Weibo posts during the pandemic showed that, since the beginning of the outbreak, the discussions mainly consisted of personal reflections, opinions, updates, and appeals. 20 The largest increase in the discussions was a simultaneous spike in criticism and support directed at the Chinese government when Wuhan was placed in lockdown on January 23. Criticism was directed at the government for its lack of action, incompetence, and wrongdoing, particularly in terms of censoring information related to public welfare. Expressions of support were directed at the government's proactive actions and positive results. As the crisis developed, the public's focus changed, with critics focusing on the government's shortcomings and supporters on its actions.
In general, during COVID-19, social media provided public spaces for self-dissociation, public communication, emotionally driven and meaningful connections, and identity construction, thus contributing to building solidarity and a sense of belonging, and facilitating the psychological reconstruction of people who were vulnerable to potential mental health crises.
Public sentiment about information related to COVID-19
Analyses of public sentiment during disease outbreaks provide insightful information to inform appropriate public health responses. Wang et al. argued that Weibo posts expressing negative emotions were valuable for analyzing public concern. 21 Their study revealed that people were concerned about three aspects of COVID-19: the origin of COVID-19 symptoms, production activities, and public health control. Luo et al. aimed to understand the content published by the government and to identify how the citizens’ engagement related to the type of content and emotional valence. 22 By analyzing more than 45 million Weibo posts from December 1, 2019 to April 30, 2020, Shi et al. found that the outbreak and rapid spread of COVID-19 triggered a sharp increase in fear within a relatively short period. 23 This phenomenon was particularly evident in Wuhan and its surrounding areas, with central cities responding more strongly to COVID-19 than their neighboring cities did. The topics, corresponding emotions, and analyses of the conclusions of posts can provide auxiliary reference materials for the monitoring of online public opinion during major public health events. Zhao et al. employed the “hot search list function” on Sina Weibo, and used the content mining method to analyze the word segmentation, word frequency, and emotions in the collected texts, and constructed a corpus consisting of public opinions that were expressed on social networking sites. 24 A more targeted analysis focused on the Chinese public's reactions to COVID-19 in the early stage of the epidemic, and described the different stages of the public's attention to topics pertaining to COVID-19. The keywords in the hot topics were slightly different in each stage, and the emotional tendency changed from negative to neutral. The research of Enander et al. shows that public online expression can highlight the motivation for public participation in crises. 25 Negative emotions decreased overall, while positive emotions increased overall. Therefore, social media, such as Sina Weibo, can be used to measure the public's concern about public health emergencies and changes in the citizens’ moods.
How public health reconstructed the public's daily life during COVID-19
Wu et al. provided empirical evidence and a conceptual framework with eight dimensions (physical symptoms, anxiety, trauma, economic loss, place-based identity, self-stigma, health self-interventions, and changing lifestyles) for understanding the physiological, psychological, and socioeconomic effects, and aspects of health behavior, pertaining to people's daily lives during COVID-19. 26 The results suggested that local and global governments, in addition to providing comprehensive health care, should focus on the impact that people experience on their lives by providing social and digital infrastructures to alleviate the disruption and trauma that citizens encounter in their daily lives during public health crises. Ren et al. investigated the prevalence and determinants of depression and anxiety amongst the general population in the context of COVID-19 in China. 27 The results showed that the general population's levels of depression and anxiety were closely related to COVID-19. Researchers have investigated fear, anxiety, trauma, social problems, and health-related consequences following the COVID-19 epidemic. 28,29 Pedraza et al. explored COVID-19-related determinants of life dissatisfaction and feelings of anxiety using data collected from March 23 to April 30, 2020, in 25 advanced and developing countries on four continents. 30 The decrease in life satisfaction was likely not only due to exposure to negative emotions on a daily basis, but also due to hopelessness, fear, and the avoidance of social interactions. Blasco-Belled et al. 31 used the example of South Africa to highlight the privileged nature of the ability to transform one's life in response to COVID-19 and argued that the virus both highlighted and exacerbated existing inequalities in access to infrastructures.
The COVID-19 pandemic has had direct and indirect effects on the global population. These effects not only disrupted people's daily lives, but also raised issues related to physical and mental health. Such phenomena differ from the direct impact of a pandemic on human health, but they are nonetheless the result of a pandemic that lasted for three years.
Method
In this study, we used a two-step quantitative method, (1) thematic analysis and (2) sentiment analysis, to analyze a unique data set and integrate the findings from the two methods in order to explore the major themes and public sentiments that appeared in Weibo content during the 2022 Shanghai public health event.
The rationality of case selection
The 2022 Shanghai public health event as a case study
A new wave of COVID-19 outbreaks in the city of Shanghai began on February 28, 2022. On February 28, a 56-year-old vaccinated woman arrived at Tongji Hospital with a fever and was confirmed positive on March 1, thus becoming patient zero in this Shanghai outbreak. Various types of lockdowns were implemented across the city, and people were at great risk of being prevented from carrying out their normal lives and jobs. The Shanghai authorities had previously adopted a more relaxed approach to the pandemic compared to other Chinese cities. However, by March 27, just before Shanghai started area-separated testing, a total of 16,013 people had tested positive for COVID-19. Shanghai decided to adopt area-separated and batch-separated control (分区分批防控) starting on March 28. Shanghai officials said that Shanghai had needed to follow the static management (静态管理) policy since April 22. On May 6, multiple areas (including prevention areas) were put into a so-called “silent period” (静默期), with all entries and exits, including shipments of deliveries, being banned. The stated rationale was that some positive cases had appeared in prevention areas, and it was necessary to make a “final assault” (攻坚) on the number of cases. On May 17, another wave of “silencing” began, this time spanning numerous entire districts, again under the name of a final assault. On May 6, when Shanghai was entering its second month of lockdown, city officials said that cases had been declining since April 22 and that the outbreak was under control. From mid-May, the city began to gradually relax its control measures in response to pressure, and resumed work and production from June 1; normal production and living conditions began to be reinstated. In late June, restaurant takeaways and haircuts were still strictly controlled and the economy had not yet recovered. The severity and consequences of this outbreak sounded the alarm for the prevention of the epidemic in every city in China.
Therefore, we chose the 2022 Shanghai COVID-19 outbreak as the case study to analyze the themes pertaining to public health events as expressed on Weibo.
Accordingly, we conducted a thematic and sentiment analysis of 72,084 public posts on social media related to the 2022 Shanghai COVID-19 outbreak. We employed a research framework based on the latent Dirichlet allocation (LDA) model and the sentiment scores from the Baidu CNSENTI model, a Python package for data analysis. In this study, we sought to answer two main research questions (RQs): RQ1: What major themes appeared in Weibo content during the 2022 Shanghai public health event? RQ2: How were public sentiments expressed during the lockdown following the 2022 Shanghai COVID-19 outbreak?
Significance of choosing Sina Weibo as data source
In this study, we focus on data from Sina Weibo because the characteristics of the platform and its users provide a general space for exploring the themes and sentiment of the public during the public health event.
Sina Weibo, a popular social media platform in China, predominantly attracts young users, with a significant portion under the age of 30 and a higher concentration in urban areas. These users exhibit diverse interaction behaviors such as liking, sharing, and commenting on posts, and actively engage with trending topics and public discussions. Content on Sina Weibo is largely user-generated, encompassing personal updates to professional and commercial posts, with users consuming a wide range of media including news articles, videos, and multimedia posts. Technologically, most users access Sina Weibo through mobile devices, highlighting their engagement with mobile technology. The integration with other social media platforms and applications further enhances user experience and engagement.
The variety in user-generated content, ranging from personal updates to professional posts, means that topic analysis can uncover a wide range of subjects being discussed. 32 Young, urban users contribute to a diverse range of topics, necessitating categorization of content into distinct themes such as entertainment, technology, personal life, and professional interests. 21 Active engagement with multimedia posts, including videos and news articles, means that topic analysis will often need to handle diverse content types, and sentiment analysis might need to account for the emotional impact of different media formats. 33
The high level of mobile engagement implies that users may post more spontaneously, leading to real-time topic analysis capturing immediate trends and discussions. 33 Integration with other social media platforms can result in cross-platform influences on topics, necessitating a broader context for accurate analysis. Trends in topics can shift rapidly due to the dynamic nature of interactions on Sina Weibo, making continuous monitoring essential to stay updated on emerging topics. 29 The integration with other social media means that topic analysis should consider the broader social media ecosystem to understand the full context of discussions on Sina Weibo. 32 By understanding these user characteristics, researchers can fine-tune the topic analysis approaches to more accurately reflect the unique dynamics of Sina Weibo's user base. Therefore, the characteristics of Sina Weibo users play a crucial role in promoting the thematic analysis.
The predominance of young users often results in trends and topics aligned with youth interests and concerns, leading to a sentiment that reveals higher degrees of emotional expression, enthusiasm, or volatility typical of younger demographics. 33 Urban users might focus on topics related to city life, modern challenges, and technological advancements, contributing to a sentiment that could be more progressive or forward-looking. 30 Frequent interaction behaviors, such as liking, sharing, and commenting, amplify the reach and impact of posts, potentially skewing sentiment analysis towards more popular or viral topics. 34 Additionally, active engagement with trending topics and public discussions means sentiment analysis will often capture the immediate public reactions and opinions. 33
Since user posts can reflect subjective well-being and emotional states, sentiment analysis can uncover prevalent moods and psychological conditions within the user base. For instance, during stressful events, a higher prevalence of negative sentiment may be observed. 33 The use of the platform to build and maintain social networks means sentiment analysis might frequently capture sentiments related to social interactions, relationships, and community-building efforts. 21 Given the emotional expressiveness of a younger demographic, sentiment analysis tools should be sensitive to nuances in language, including slang and emojis commonly used by younger users. 33 The real-time nature of posts from mobile devices requires sentiment analysis to process large volumes of data quickly to capture current moods and opinions accurately. 21 Thus, data from Weibo will show the sentiment of the public during the public health event.
Data collection
This study is part of a sub-theme of a large research project that belongs to a research institute in Shenzhen, China, and therefore the whole process of this study was conducted in Shenzhen, China.
This study was conducted on the one-year anniversary of the 2022 Shanghai public health event. In March 2023, the researchers of this study saw a large number of users on Weibo posting to commemorate the one-year anniversary of the event and express their thoughts about their experiences during the event, which aroused the researchers’ interest in retracing what the focus of the public opinion was and how the public's sentiment had changed. This study was then officially started in March 2023, and in December 2023 the first draft of the manuscript was completed.
As we focused on Weibo, we chose posts on Sina Weibo as the essential data source. How this research addressed the proposed research questions was determined by the data collection 35 ; specifically, the type, depth, and scope of the data influenced the quality of the research, and the richness, comprehensiveness, representativeness, and relevance of the collected data were crucial for the quality of the outcome. We crawled data from the time that Shanghai was entering its lockdown (March 28) to the time of the announcement of the lifting of the lockdown (June 1). We collected the data via a keyword strategy using a crawler written in Python. To collect closely related posts, we chose two main keyword phrases for the search: “Shanghai (shi) Yiqing” (上海(市)疫情, Shanghai outbreak) and “Shanghai yi qing fang kong” (上海(市)疫情防控, Epidemic prevention and control in Shanghai). The fields collected for each post included the user ID, publishing time, the content, the URL, and the location if the user provided it when posting. In total, the entire quantitative dataset contained 72,084 posts.
According to Weibo's verification mechanism, user types can be divided into individuals and organizations. Specifically, the individual user type includes unverified users (ordinary users) and verified individual users (celebrities, actors, and professional people); the organizational user type includes governmental sectors, companies, the media, universities, and other organizations. Since this study focused on the public's perspective, we differentiated between individuals’ posts and organizations’ posts. Finally, we selected posts from unverified users to explore the themes and the public sentiment regarding the 2022 Shanghai COVID-19 outbreak. Subsequently, duplications and irrelevant information were removed via information cleansing. After the information cleansing of the data, a total of 29,625 posts were selected for this study.
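The selection steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the `Post` fields, the `account_type` labels, and de-duplication by exact text are assumptions made for the example, and the removal of irrelevant posts, whose criteria the text does not specify, is left out.

```python
from dataclasses import dataclass
from datetime import date

# Collection window stated in the text: lockdown start to lifting announcement.
WINDOW_START = date(2022, 3, 28)
WINDOW_END = date(2022, 6, 1)


@dataclass(frozen=True)
class Post:
    user_id: str
    published: date
    content: str
    # Hypothetical labels: "unverified", "verified_individual", "organization"
    account_type: str


def select_public_posts(posts):
    """Keep in-window posts from unverified users, dropping exact duplicates."""
    seen = set()
    kept = []
    for post in posts:
        if post.account_type != "unverified":
            continue  # verified individuals and organizations are excluded
        if not (WINDOW_START <= post.published <= WINDOW_END):
            continue  # outside the March 28 to June 1 window
        text = post.content.strip()
        if not text or text in seen:
            continue  # empty or already-seen content (assumed duplicate rule)
        seen.add(text)
        kept.append(post)
    return kept


sample = [
    Post("u1", date(2022, 4, 2), "上海疫情 什么时候结束", "unverified"),
    Post("u2", date(2022, 4, 3), "上海疫情 什么时候结束", "unverified"),
    Post("u3", date(2022, 4, 5), "上海市疫情防控发布会", "organization"),
    Post("u4", date(2022, 7, 1), "上海疫情 回顾", "unverified"),
]
selected = select_public_posts(sample)
```

In this toy sample only the first post survives: the second repeats its text, the third comes from an organizational account, and the fourth falls outside the collection window.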
Characteristics of participating users
Gender distribution
The gender distribution of participating Weibo users (Figure 1) reveals a slightly higher representation of females compared to males. Specifically, out of the total participants, females comprised 52.7% (15,605 individuals) and males accounted for 42.2% (12,488 individuals). The Not Available category included 5.2% (1533 individuals), indicating either non-disclosure or unavailability of gender information. These data suggest that female users were more active or more represented in Weibo discussions of the 2022 Shanghai COVID-19 outbreak. The slight dominance of female participants may reflect broader trends in social media usage, where females tend to engage more in online discussions on Weibo (Table 1).

Figure 1. Gender distribution of participating users.
Table 1. Basic information about the database.
Location distribution
The location data (Table 2) provide a detailed breakdown of the geographic distribution of participating users. Shanghai had the highest representation with 35.1% (10,421 individuals), reflecting the focus of the outbreak in this region. Jiangsu (6.2%, 1827 individuals) and Beijing (5.5%, 1624 individuals) also had notable participation, indicating significant attention from nearby and prominent provinces. A diverse range of other provinces contributed smaller percentages, such as Guangdong (4.8%), Zhejiang (4.6%), and Shandong (2.9%). Overseas participants accounted for 9.5% (2837 individuals), showing international interest or engagement. Not Available data constituted 5.2% (1533 individuals), similar to the gender distribution, reflecting either non-disclosure or unavailability of location information.
Location distribution of participated users.
The geographic distribution highlights that the majority of participants were from regions directly affected by the outbreak or from major urban areas. The significant proportion of users from Shanghai, given that the crisis took place in Shanghai.
Data preparation
For the data preparation, we first manually deleted posts that contained invalid Weibo content, and then removed URLs, numbers, and special characters from the remaining content. Finally, we used the Jieba package to segment the content of each post into words and deleted meaningless words using the Jieba word list, which can be edited manually to add or delete words. Jieba is widely regarded as one of the best Chinese word segmentation modules in Python. Because our data were obtained from Weibo and the content was in Chinese, this study used a database of Chinese words for the data analysis.
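The cleaning and segmentation steps can be illustrated with a minimal Python sketch. It is not the study's actual script: the stop-word set, the function names `clean_post` and `tokenize`, and the regular expressions are assumptions made for illustration. To stay runnable without third-party packages, segmentation defaults to whitespace splitting; with Jieba installed, `jieba.lcut` could be passed as the `segment` argument instead.

```python
import re

# Hypothetical stop-word set; the study's actual word list is not reproduced here.
STOPWORDS = {"的", "了", "是"}


def clean_post(text):
    """Remove URLs, numbers, and special characters from one post."""
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"\d+", " ", text)           # strip numbers
    # Keep only Chinese characters and Latin letters; collapse everything else.
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z]+", " ", text)
    return text.strip()


def tokenize(text, segment=str.split, stopwords=STOPWORDS):
    """Clean a post, segment it into words, and drop stop words.

    `segment` is any callable mapping a string to a list of words.
    The whitespace default is a stand-in; `jieba.lcut` is the assumed
    real-world choice for unsegmented Chinese text.
    """
    return [w for w in segment(clean_post(text)) if w and w not in stopwords]
```

With the whitespace default, `tokenize("上海 的 封控 2022 http://t.cn/abc !!!")` keeps only the two content words; real Weibo text has no spaces between words, which is why a dedicated segmenter is needed in practice.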
Latent Dirichlet allocation (LDA) and the thematic analysis
The LDA model was first used to extract themes from our dataset of sampled Weibo posts. LDA is one of the most common probabilistic models for identifying the themes that are present in content. 36 Using LDA for theme modelling allows for the automatic identification of underlying themes in a large amount of unstructured text data. We were thus able to rapidly identify themes, such as aspects of public concern during public health events, in a large amount of content (Weibo posts). In this study, we adopted LDA to treat each post as a mixture of topics and each topic as a mixture of words. LDA estimates the topic distribution of a document; that is, it expresses the topics of each document in the document set in the form of a probability distribution. Thus, by analyzing documents to extract their topic distributions, subject clustering or text classification can be applied based on those distributions. Previous research has shown that LDA is widely used for identifying themes in vast amounts of content. 37 This study is one of the first to apply the LDA model to extract themes from a dataset collected from Sina Weibo during COVID-19. The dataset that we used for the analysis included a large amount of unstructured text. Accordingly, we employed the LDA model to divide the content in our dataset according to themes via subject clustering based on the topics we identified.
To determine the optimal number of topics, we evaluated models with 10, 15, 20, 25, and 30 topics by calculating the perplexity of a held-out set of documents. Perplexity measures how well a probability model predicts a sample, with lower perplexity indicating a better fit. After comparing the perplexity scores, we selected the model with 20 topics as it had the lowest perplexity (5.082682), suggesting it was the best fit for our data.
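As an illustration of the perplexity criterion, the following Python sketch computes perplexity from the log-probabilities a model assigns to held-out tokens and then picks the topic count with the lowest value. The function names and the input format are assumptions for illustration only; the study itself relied on the Gensim package, whose internal calculation may differ in detail (for example, in the logarithm base).

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability of held-out tokens)."""
    if not token_log_probs:
        raise ValueError("need at least one held-out token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def pick_lowest(perplexity_by_k):
    """Return the number of topics whose held-out perplexity is lowest.

    `perplexity_by_k` maps a candidate topic count to its perplexity.
    """
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

A model that assigns every held-out token a probability of 0.25 has a perplexity of 4, which can be read as being as uncertain as a uniform choice among four words; lower values therefore indicate better predictions.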
To explain our choice of the optimal number of topics more thoroughly, we performed the following steps:
Model Training: We trained multiple LDA models with different numbers of topics (10, 15, 20, 25, and 30) using the Gensim package.
Perplexity Calculation: For each model, we calculated the perplexity on a held-out set of documents. Perplexity is a standard measure for evaluating the performance of probabilistic models, with lower values indicating better predictive performance.
Coherence Scores: In addition to perplexity, we calculated coherence scores for each model. Coherence measures the semantic similarity of words within a topic, providing a complementary metric to perplexity.
Model Selection: Based on the comparison of perplexity and coherence scores, we selected the 20-topic model as it provided the best trade-off between model fit and interpretability.
We implemented the LDA model in Python using the relevant functions in the Gensim package. Finally, the results of the LDA model were visualized using the pyLDAvis package to display the distribution of topics and the top 30 most relevant terms in each topic with their respective weights.
Sentiment analysis
We conducted a sentiment analysis to further investigate the public's attitudes towards the 2022 Shanghai COVID-19 outbreak as expressed in Weibo posts. Sentiment in textual data mainly refers to the sentiments that are implicit in a text. Sentiment analysis has been widely used in opinion mining studies because valuable information can be found when the sentiment contained in a text is analyzed. 38 Dictionary-based sentiment tagging analyses the words in a sentence and obtains a total score by adding the scores for each word using a sentiment dictionary. 39 The next step was to select a sentiment analysis tool to perform the analysis.
We used Baidu 1 CNSenti to perform the sentiment analysis. Baidu CNSenti is a sentiment analysis tool developed by Baidu. It is part of Baidu's Natural Language Processing (NLP) capabilities and is designed to analyze and determine the sentiment of text data. This tool can process large volumes of text data to identify positive, negative, or neutral sentiments, which is useful in applications such as market research, customer feedback analysis, and social media monitoring. In this study, we selected Baidu CNSenti because the data collected from Weibo were in Chinese. Moreover, Baidu CNSenti can report seven specific categories of emotion (Good, Happy, Mourn, Angry, Fear, Evil, Shock) rather than only positive, negative, or neutral sentiment. These specific categories were created by Ekman 40 and enriched by Xu et al. 41 CNSenti has been widely used in analyzing Chinese text, including studies of text style transfer in online entertainment and social media, 42 and of the relationship between Internet nationalism and Internet news, user engagement and emotion in China. 43 Therefore, Baidu CNSenti is appropriate for this study as a sentiment analysis tool for analyzing Chinese text.
We first processed the data by using a dictionary of positive words, a dictionary 2 of negative words, and degree words to match the sentiment scores for each form of content. This step will present the general trend of sentiment (positive, negative, and neutral sentiment) of the database. The result of the calculated value is positive if it is greater than 0, negative if it is less than 0, and neutral if it is equal to 0. The process of calculating the sentiment score for a post consists of seven main steps:
(1) find the emotion word in the clause, record whether it is positive or negative, and note its position;
(2) find the degree word before the sentiment word and stop searching when you find it. Assign a weight to the degree word and multiply it by the sentiment value;
(3) look for negatives before the emotive word and find all the negatives, multiplying by −1 if the number is odd and by 1 if it is even;
(4) look for important punctuation such as exclamation marks and greetings that match;
(5) calculate the sentiment values (positive and negative) for all the clauses in a comment and record them in an array (list);
(6) calculate the mean positive sentiment scores and the mean negative sentiment scores for each clause in each comment, then compare the sum of positive sentiments and the sum of negative sentiments; and
(7) plot the distribution of the sentiment scores for each post according to the timing of each item in the content.
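The core of this procedure (sentiment word, degree word, negation count, and the comparison of mean clause scores) can be sketched in Python as follows. The tiny lexicons, the weights, and the function names are hypothetical and serve only to make the logic concrete; the study's dictionaries are far larger. The punctuation and plotting steps are left out of this sketch.

```python
# Toy lexicons (hypothetical values; not the dictionaries used in the study).
POSITIVE = {"好": 1.0, "开心": 1.0}
NEGATIVE = {"害怕": 1.0, "生气": 1.0}
DEGREE = {"很": 1.5, "非常": 2.0}
NEGATION = {"不", "没"}


def score_clause(tokens):
    """Return (positive_score, negative_score) for one segmented clause."""
    pos = neg = 0.0
    start = 0  # index just after the previous sentiment word
    for i, word in enumerate(tokens):
        if word in POSITIVE:
            value, polarity = POSITIVE[word], 1
        elif word in NEGATIVE:
            value, polarity = NEGATIVE[word], -1
        else:
            continue
        window = tokens[start:i]  # words between the last sentiment word and this one
        # Nearest preceding degree word scales the sentiment value.
        for w in reversed(window):
            if w in DEGREE:
                value *= DEGREE[w]
                break
        # An odd number of negations flips the polarity (multiply by -1).
        if sum(w in NEGATION for w in window) % 2 == 1:
            polarity = -polarity
        if polarity > 0:
            pos += value
        else:
            neg += value
        start = i + 1
    return pos, neg


def label_post(clauses):
    """Compare mean positive and mean negative clause scores for one post."""
    scores = [score_clause(c) for c in clauses]
    if not scores:
        return "neutral"
    net = sum(p for p, _ in scores) / len(scores) - sum(n for _, n in scores) / len(scores)
    return "positive" if net > 0 else "negative" if net < 0 else "neutral"
```

Here a net score above 0 is labelled positive, below 0 negative, and exactly 0 neutral, matching the cut-off described in the text.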
After processing the data, we used CNSenti to conduct a sentiment analysis exploring the distribution of the specific categories of public sentiment during the public health event. CNSenti is an open-source Python project, and this study therefore used Python. The sentiment analysis began with pre-training on a randomly sampled dataset (8888 posts, 30% of the database), in which each text example was labelled with the appropriate sentiment (Table 3) so that the model could learn the patterns and features associated with each sentiment category. Then, we set aside a validation set (2963 posts, 10% of the database) to adjust the hyperparameters and evaluate the model during training in order to improve the accuracy of the results. Subsequently, the trained model was used to predict sentiment on a wider range of data from the database, classifying each text into one of the predefined sentiment categories based on the learnt patterns. We validated the model by performing human evaluations to qualitatively assess the predictions. The final step was to interpret the results and visualize them as a line chart showing the distribution of sentiments across the database.
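The sampling of the training and validation subsets can be sketched as below. This is an assumption-laden illustration: the text does not state whether the two subsets were disjoint or how fractional counts were rounded, so the sketch assumes disjoint subsets and rounds up, which happens to reproduce the reported sizes for 29,625 posts. The function name `split_posts` and the fixed seed are hypothetical.

```python
import math
import random


def split_posts(posts, train_frac=0.30, val_frac=0.10, seed=42):
    """Randomly split posts into training, validation, and remaining subsets.

    Assumes the subsets are disjoint and that counts are rounded up.
    """
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = math.ceil(len(shuffled) * train_frac)
    n_val = math.ceil(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    rest = shuffled[n_train + n_val:]
    return train, val, rest
```

Applied to 29,625 items, this yields 8888 training posts, 2963 validation posts, and 17,774 remaining posts for prediction.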
Example of sentiment scores for a sentence.
To obtain a deeper understanding of public sentiment, positive and negative semantic networks were constructed to identify the important roles and characteristics in the respective networks, and the networks were visualized and analyzed statistically.
The average accuracy of sentiment analysis models generally ranges from 70% to 90%, with more advanced and tailored models achieving higher accuracies. 44,45 Analyzing sentiment on social media platforms can be more challenging due to informal language, slang, and brevity, and accuracies in this domain might range from 75% to 85%. 46 One of the more interpretable and accessible textual methods is the lexicon-based approach, in which the algorithmic logic is clear; the quality of the lexicon therefore determines the quality of the sentiment analysis. The lack of a suitable lexicon limits the calculation of sentiment in a text. Most researchers currently use adjective sentiment dictionaries. CNSenti uses the Dalian University of Technology Sentiment Ontology Library, as well as Hownet and CNKI. The advantage is that CNSenti provides a relatively complete lexicon. The distribution of sentiment scores in the seven categories is plotted according to the time of each post.
Research design
To extract related information and common indexes to represent the patterns of public attitude, attention, and activities on Weibo during the 2022 Shanghai COVID-19 outbreak, we adopted a research framework that combined data preparation, data mining, and analyses based on the LDA model and statistical analyses (Figure 2).

Research framework.
Findings
Major themes in Sina Weibo texts
We trained the LDA model on the Weibo text corpus to identify the themes in the public posts during the 2022 Shanghai COVID-19 outbreak (Table 4). For each theme, we selected the words with the greatest descriptive value as keywords. After screening, the keywords for each theme were confirmed, as shown in Table 5. The bubble distribution depicts the different themes; the top 30 feature words within the themes are located on the right. The proximity of the topics to each other is expressed by the distance between their locations. The bubble distance is based on the Jensen-Shannon divergence (JSD), which can be thought of as the degree of difference between topics, and overlapping bubbles indicate that the two topics share feature words.
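To make the distance measure concrete, the following Python sketch computes the Jensen-Shannon divergence between two topic-word distributions. It uses base-2 logarithms so that the value lies between 0 (identical topics) and 1 (no shared words); this choice of base and the function names are assumptions for illustration, and the implementation inside pyLDAvis may differ in such details.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two distributions over the same vocabulary."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Two topics that place all of their probability on different words give a divergence of 1, whereas two identical topics give 0, which corresponds to fully separated and fully overlapping bubbles, respectively.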
Themes and keywords.
The LDA model was used to classify the posts according to 20 categories of topics; the distribution in the inter-topic distance graph shows that there were about 15 categories of topics that were close to each other and had a high degree of overlap. Further classifications were based on the keywords in each topic.
The inter-topic distance map (Figure 3) generated by the LDA model provides a visual representation of the relationships between the identified topics. The map reveals clusters of closely related themes, indicating areas where public concerns overlap. For example, the proximity of the themes related to life during the outbreak and policies for outbreak prevention suggests a strong interconnection between these issues in public discourse. The overlapping bubbles in the map highlight the complexity of the themes and the multifaceted nature of the discussions on Weibo during the outbreak.

The resulting theme model (produced using pyLDAvis).
During the 2022 Shanghai COVID-19 outbreak, Weibo users paid attention to the themes of disadvantaged groups, life during the outbreak, policies for outbreak prevention and expressions of sentiment.
In particular, Theme 1 included the most frequent topics and described the general concerns of Weibo users during the 2022 Shanghai COVID-19 outbreak. The first topic identified by the LDA model prominently features terms such as “elderly,” “subsidies,” and “children.” This theme reflects widespread concern for vulnerable populations, particularly the elderly, children, and people with disabilities, who were disproportionately affected by the quarantine measures. An important part of China's prevention and control approach to COVID-19 was the home quarantine policy, whereby a larger quarantine area was defined based on the confirmed extent of infection, and within which residents were quarantined on a household basis. Once the quarantine area was defined, the inhabitants of each household were not allowed, or had extremely limited permission, to leave their homes. Because such a quarantine policy was in place only on certain days, many families were unable to prepare adequate household supplies and food. This meant that many families, particularly elderly people living alone, children, and people with disabilities, lacked the ability to cope with the unexpected quarantine policy. When examining some specific Weibo posts, we found that many people expressed concerns and anger about how such vulnerable groups were coping during the 2022 Shanghai outbreak, particularly with regard to the brutal attitudes and actions of the epidemic prevention officers, which resulted in vulnerable groups being unable to meet their basic living needs. In addition, a number of people whose family members included such groups confided that they were unable to understand or address the livelihood issues that their families experienced as a result of the quarantine policy due to separation and the digital divide.
Many users expressed concern and anger over the insufficient support provided to these individuals, particularly criticizing the harsh enforcement of quarantine policies that left many vulnerable people in distress. The discourse emphasizes the need for improved social support systems and greater consideration of these populations in public health strategies.
Theme 2 describes how the public worked and lived during this municipal-level outbreak, with key terms including “renting,” “starting a business,” “buying groceries,” “commuting,” “restricting traffic,” and “hiring.” The public's living concerns included renting apartments, buying groceries, limited transport, and commuting, while the work concerns included starting a business and hiring staff. Posts in this theme discussed the difficulties in securing essential goods, navigating restricted mobility, and coping with economic challenges, including job losses and business closures. The lockdown that was triggered by the 2022 Shanghai COVID-19 outbreak significantly affected people's basic work and life, disrupting daily routines and urban transport. Work-related keywords showed that some people may have lost their jobs during the outbreak and had to start their own businesses or apply for other jobs. Hence, work and living conditions constituted another theme of major concern. The discussions within this theme reflect the widespread impact of the lockdown on the daily routines and economic stability of Shanghai residents. Many users shared their struggles to adapt to the rapidly changing circumstances, highlighting the precarious balance between maintaining public health and sustaining livelihoods.
Theme 3 involves a critical evaluation of the epidemic prevention measures implemented during the outbreak. The LDA model identified terms such as “expectation,” “optimal,” “maintenance,” “initiatives,” “balance,” and “monitoring.” These terms show that the public was also concerned about the impact of the measures taken during the epidemic and evaluated whether the initiatives met expectations. Posts reflected a range of opinions, with some users supporting the stringent measures as necessary for controlling the virus, while others questioned their effectiveness and fairness. The discussions reveal a tension between compliance with public health directives and skepticism about the measures’ proportionality and impact.
Theme 4 is the emotional response to the outbreak and its associated policies. The LDA model highlighted keywords such as “missing,” “farewell,” “support,” “rescue,” and “escape.” The related posts reflected the public's emotions during the 2022 Shanghai COVID-19 outbreak. Posts in this theme conveyed the emotional toll of the lockdown, including feelings of isolation, anxiety, and frustration. The implementation of the quarantine policy may have prevented people from seeing their friends and families, which triggered feelings of loneliness. Keywords such as “support” and “rescue” also reflected public support for the medical and control teams on the frontline of the outbreak. Some posts expressed solidarity and support for frontline workers, reflecting the complex emotional landscape during the outbreak. Another section of the public expressed a desire to flee Shanghai, which was no longer an ideal city in which to live, thus demonstrating the disappointment, sadness, and helplessness of local residents during the 2022 Shanghai COVID-19 outbreak. This theme underscores the psychological impact of the prolonged lockdown on the population, with many users grappling with the uncertainty and stress caused by the crisis. The discussions within this theme provide insight into the collective emotional experience of the Shanghai public during the outbreak.
The main themes above are consistent with the characteristics of Weibo's users: the majority of users who post are young, and they engaged with this public health event in large numbers and at the same time. The themes demonstrate the wide range of issues that were of public concern at this stage, and help to capture immediate trends and discussions as well as public reactions and opinions.
The major themes identified by the LDA model provide a detailed and nuanced understanding of the public discourse on Weibo during the 2022 Shanghai COVID-19 outbreak. These themes reflect the diverse concerns of users, ranging from the welfare of vulnerable groups to the broader societal impact of the lockdown and the emotional responses to the crisis. The analysis of these themes offers valuable insights into the key issues that shaped public opinion and online discussions during this critical period.
Public sentiment during the 2022 Shanghai COVID-19 outbreak
In this section, we scored the public posts using a sentiment dictionary, a link to which is provided in the methodology section. We calculated the mean positive and the mean negative sentiment scores for each clause in each post, and then compared the total positive sentiments to the total negative sentiments, with the larger number being the resulting sentiment tendency. The distribution of the sentiment scores for each post was then plotted according to the timing of each post's content. A score of 0 was the cut-off point, with positive numbers indicating positive sentiments and negative numbers indicating negative sentiments.
According to the results in Figure 4, the posts by unauthenticated users (ordinary people) during the Shanghai outbreak showed an overall negative sentiment, as there were more posts expressing negative sentiments and a larger number of extreme negative sentiment scores. This is consistent with the results presented in the previous section for the themes on which the public focused, whose keywords indicated negative sentiments such as pessimism, anger, and sadness. In particular, the negative sentiment scores peaked between April and May, which was the period of strictest urban lockdown and control in Shanghai. This also shows that the content that the public posted on Weibo corresponded to the development of the epidemic in real life and to epidemic prevention measures. The prevailing emotional and psychological conditions in the user base reflect the close relationship between the characteristics of Weibo users and the sentiment results; that is, during stressful events, a higher incidence of negative sentiment can be observed. Thus, government departments could use the content posted by the public on Weibo to judge public opinion and to conduct public opinion surveys. Moreover, the number of strongly positive posts did not increase significantly over time. Overall, during the 2022 epidemic in Shanghai, the content of public postings was dominated by negative sentiments, with extreme sentiments appearing at isolated times, thus reflecting the public's negative attitude towards the epidemic and towards responses to the epidemic, including epidemic prevention and home quarantine.

Sentiment score for public posts during the 2022 Shanghai COVID-19 outbreak.
In addition to the above scoring of sentiment words, we used the Baidu CNSenti model to further analyze the distribution of the seven sentiments in the content of our collected posts. Figure 5 shows the distribution of the seven types of emotions over time during this outbreak. In general, the positive sentiments in posts tended to increase over time; in particular, “good” and “happy” rose relative to the other sentiments and peaked in late May to early June, which may have been caused by the decrease in cases. This trend, together with the relaxation of strict epidemic prevention policies by the epidemiological authorities, may have signalled to the public that home quarantine was coming to an end.

CNSENTI model of the sentiment results for public posts during the 2022 Shanghai COVID-19 outbreak.
Nevertheless, the level of negative emotions, such as sorrow, anger, and fear, was somewhat higher in public posts, and did not tend to decrease significantly over time. For example, the distribution of fear exhibited several small peaks in early April, just after the home quarantine policy began, which was probably due to public fears about the development of the epidemic on the one hand, and the impact of home quarantine on their lives and work on the other. Overall, the public's emotional keywords during the 2022 Shanghai COVID-19 outbreak were negative, and this negative sentiment did not shift significantly towards the positive over time.
Analysis of theme-sentiment correlations
To further facilitate the understanding of public participation during the 2022 Shanghai COVID-19 outbreak, the heatmap (Figure 6) generated in this study visually represents the correlations between the identified themes and the sentiment scores derived from the sentiment analysis of social media posts. The themes analyzed include Theme 1, Theme 2, Theme 3, and Theme 4.

(a) Correlation between themes and sentiment scores. (b) Correlation between specific terms of themes and sentiment scores.
The heatmap and associated sentiment scores provide a detailed understanding of public sentiment across different themes during the 2022 Shanghai COVID-19 outbreak. Themes associated with policy and life disruptions are more likely to evoke strong negative emotions such as fear and anger, as indicated by the high sentiment scores in these categories. In contrast, themes related to disadvantaged groups and emotional expression are more nuanced, eliciting a mix of positive and negative sentiments. This analysis highlights the importance of addressing public concerns with sensitivity and the need for balanced communication strategies that acknowledge both the difficulties and the resilience of the population.
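The text does not specify how the theme-sentiment correlations were computed. One plausible reading, sketched below in Python, is a Pearson correlation between each theme's per-post weight and each sentiment category's per-post score. The function names, the input format, and the choice of Pearson's coefficient are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant series; report 0
    return cov / (sd_x * sd_y)


def correlation_table(theme_weights, sentiment_scores):
    """Build a theme-by-sentiment table of Pearson correlations.

    Both arguments map a name to a list with one value per post,
    in the same post order.
    """
    return {
        theme: {name: pearson(weights, scores) for name, scores in sentiment_scores.items()}
        for theme, weights in theme_weights.items()
    }
```

Each cell of the resulting table corresponds to one cell of a heatmap: values near 1 mean that posts weighted towards a theme tend to score high on that sentiment, and values near −1 mean the opposite.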
Discussion
This study explored the themes and sentiments in the public's social media use during the 2022 Shanghai COVID-19 outbreak by analyzing 29,625 Weibo posts from the public to observe public attitudes and opinions during this public health event. The findings show that the public was more inclined to express concerns about the impact of the outbreak and of outbreak containment measures on their personal lives on social media and exhibited negative attitudes and opinions rather than discussing the impact of COVID-19 on human life and health, suggesting that the impact of the outbreak on people's daily lives was greater than was the impact on their livelihoods and health risks.
Implications to practices
The findings reveal the relationship between the public's use of social media and governmental information release during a health crisis. This provides practical references, especially for governments at different administrative levels in China, for improving the efficiency of crisis communication with the public. On the one hand, this study demonstrated the public's use of social media during public health events, particularly with regard to topics of public concern. These findings can assist government agencies and public health emergency response departments to access public views, to understand public opinion, and to address public needs in a timely and effective manner at all stages of COVID-19.
On the other hand, the mental and physical effects of the three-year COVID-19 pandemic 11,22,23 were significant. Thus, an analysis of posts about COVID-19 on social media can also help health care providers and mental health services to provide timely and effective medical help and support to the public, and can mitigate the long-term effects of public health events. In particular, the inability of some vulnerable groups, such as the elderly and children, to cope with public health events on their own also requires governments and other public health authorities to be attentive to the needs of such populations and to use the information from the grassroots grid management established by such events to plan and prepare in advance, rather than simply responding to a pandemic itself.
Implications to literature
The theoretical implications of these findings are multifaceted. Firstly, in the case of China, the findings underscore the importance of social media as a tool for gauging public sentiment and concerns during public health crises. Previous studies have extensively shown that social media is a robust tool for obtaining public opinion and sentiment information in the context of disasters and emergencies. 1 In this study, we not only observed the positive and negative feelings of the public regarding the COVID-19 quarantine, but also demonstrated the trends in these feelings and how they changed in response to the quarantine regulations, echoing previous investigations into the side effects of COVID-19 and how the lockdown policy affected the public's daily life. 3,4
Secondly, the literature we reviewed showed that, during the crisis, the Chinese government has been trying to shift from its single role of information disseminator to that of a participant that takes part in and encourages public online engagement. 12 Apart from harnessing public data for governance, governments respond to the sentiments that the public expresses online. Government agencies and public health emergency response departments can leverage social media data to understand public opinion and address public needs more effectively. This can enhance the timeliness and relevance of public health communications and interventions.
Thirdly, existing investigations of online expression highlight the motivations for public participation in a crisis, 25 whereas this study underscored that the degree to which social issues are discussed on social media dynamically parallels the emotional flow of the public. Compared with evidence from Western contexts, public perception of the health crisis in the Shanghai case was expressed largely through sentiment rather than through arguments for regulatory change. For instance, the data above showed that the negative feelings about vulnerable groups did not change for the better, which is a longitudinal observation over the Shanghai crisis. Such negative sentiments were derived from the fear of daily life being disturbed. The same trend can be identified among unauthenticated users as well. The negative feelings of these two groups dominate their voices online, rather than concern about the health issues generated by the pandemic.
Conclusion
Using thematic research and sentiment analysis as the main research methods, this study found that, during the outbreak, the public tended to focus on the responses of vulnerable groups and the public's life and work during the lockdown, and displayed negative emotions and attitudes that did not change significantly to positive attitudes over time. This indicates that the outbreak had a significant impact on the daily lives of the public, who thus displayed a pessimistic attitude.
Our research also showed that, in 2022, in the third year of COVID-19, the public was far more concerned about the impact of the outbreak and preparedness measures on individuals’ working lives than they were about the threat to life posed by COVID-19, as supported by the four main topics of public concern presented in the findings. This suggests that, during this regional outbreak, the impact of the outbreak on people's lives was greater than was the impact on their livelihoods and health from the public's perspective. This is different from the initial public health prevention and control aims and objectives, from fear of death and illness to fear of isolation at home and the inability to live a normal life as a result of epidemic prevention measures, which is perhaps something that current epidemic prevention policies should consider. In responding to public health events in the future, emergency responders—government departments, public health authorities and other public sectors—will need to not only respond quickly to the onset of a pandemic, but should also take the structure of the public's daily life into account in their contingency plans to minimize the impact of public health events and response mechanisms on people's lives.
The sentiment analysis conducted in this study offers practical insights for authorities managing public health crises. This research revealed that the emotional online expressions of the public can influence the trajectory of decision-making among health authorities. The identification of specific emotions allows for targeted interventions. For example, high levels of anger may indicate dissatisfaction with current policies, prompting a review and potential revision of these measures. On the other hand, increasing trends in positive emotions, such as happiness, could reflect public approval of certain actions, suggesting that similar strategies should be continued or expanded. By continuously monitoring and analyzing public sentiment, authorities can proactively address emerging concerns, enhance communication, and ultimately improve public trust and compliance.
There are two limitations to be clarified. The first limitation of this study was the data collection process, as the Weibo data set that was used in this case study was incomplete. The data set was collected using a Python crawler via the advanced search function on Sina Weibo, in which the search limit is only 50 pages. We used Sina Weibo's advanced search function to capture as much relevant data as possible. Although most of the data could be captured completely, there were still some missing data related to the research periods in this study. Data incompleteness had a limited impact on our thematic analysis; however, most of the trends could still be observed in the results. Furthermore, as our research was based on Sina Weibo data, it may not adequately reflect the public opinion of vulnerable groups since many social media platforms are developing along with the progress of digital communication technology. How to combine online and offline data from multiple aspects and channels to accurately reflect the concerns of the public, particularly those of vulnerable groups, is a direction for future research.
The second limitation concerns biases in the data and the analysis. Bias arises in sentiment analysis when the data fail to accurately represent the overall distribution of sentiment within a given population. As this study focused exclusively on Weibo data, the characteristics of Weibo users significantly affect the analytical outcomes. Imbalance within the user group may favor majority sentiments at the expense of accurate prediction of minority sentiments; for instance, since most Weibo users belong to younger age groups, measuring sentiment among middle-aged or elderly individuals is challenging. Furthermore, user bias emerges when a model is trained on highly active but non-representative users rather than on the broader, less active population, which limits both applicability and accuracy across demographics.
Mitigating these biases in future research is pivotal for enhancing the generalizability and equity of sentiment analysis models. Improving generalizability requires diverse, representative data sets, balanced training across demographic segments, and attention to cultural sensitivity. Continuously updating models with fresh data helps capture evolving sentiments and trends, while robust tools for identifying and rectifying biases further improve precision and relevance.
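One simple way to illustrate how demographic imbalance can distort an aggregate sentiment estimate is post-stratification weighting, sketched below in Python. This technique was not applied in this study and is shown only as an example of bias correction. The age groups, population shares, and records are hypothetical, and the sketch assumes that a demographic label is available for each post, which is often not the case for Weibo data.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each group by (population share) / (share in the sample)."""
    n = len(sample_groups)
    sample_counts = Counter(sample_groups)
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }


def weighted_negative_share(records, weights):
    """Share of negative posts after applying per-group weights."""
    total = sum(weights[group] for group, _ in records)
    negative = sum(weights[group] for group, is_negative in records if is_negative)
    return negative / total


# Hypothetical sample: 8 posts by younger users (6 negative) and
# 2 posts by older users (none negative).
records = (
    [("young", True)] * 6 + [("young", False)] * 2 + [("older", False)] * 2
)
# Hypothetical population shares, invented for illustration only.
population_shares = {"young": 0.5, "older": 0.5}

weights = poststratification_weights([g for g, _ in records], population_shares)
raw = sum(is_negative for _, is_negative in records) / len(records)
adjusted = weighted_negative_share(records, weights)
print(raw, adjusted)
```

In this toy sample the unweighted negative share is 0.6, whereas the weighted share is lower because the over-represented younger group is weighted down. Weighting cannot recover the views of groups that are absent from the data altogether, which is why combining online and offline sources remains important.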
Footnotes
Acknowledgements
We would like to express our sincere gratitude to Dr Zhang for his invaluable assistance in proofreading and refining the language throughout this article. His meticulous attention to detail and insightful feedback have significantly enhanced the clarity and quality of our manuscript.
Contributorship
The first and second authors conceived the study, developed the research idea, and reviewed the literature. The first author performed the data analysis. The third author was involved in revision, proofreading, and data analysis. The first and second authors wrote the first draft of the manuscript. All authors reviewed and edited the manuscript and approved the final version.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Office for Philosophy and Social Sciences, National Social Science Fund of China (Project ID: 20&ZD152).
