Abstract
Introduction
Any global public health emergency has the potential to go beyond medical boundaries to impact society and people on many other levels and interrupt normal life.1–3 The COVID-19 pandemic has been linked to an increase in psychosocial and mental health disruptions worldwide.4–7 With the emergence of COVID variants and the prolongation of the pandemic, its mental health implications are receiving more attention. 8 According to the WHO, pandemic-related guidelines, such as social distancing, self-isolation, and quarantine,9–11 may lead to loneliness and, in turn, serious psychosocial consequences.1,2,12–17 Recent empirical research has also demonstrated an increasing number of cases of anxiety, panic, fear, suicidal behavior, and other mental disorders that result from social distancing, employment termination, loss of loved ones, and other emerging uncertainties.18–21 Despite the global reach of the COVID crisis, the psychosocial problems caused by the pandemic varied widely and took different forms in different individuals. 22 Therefore, it is necessary to approach this problem from an individualistic perspective and account for the heterogeneity of survivors.23,24
The pandemic has dramatically changed people’s lives. Particularly for those who have survived infection with SARS-CoV-2, life has been fundamentally different as they face a series of post-infection physical, psychological, and social challenges. 25 The severity and extent of COVID-19 can result in various psychological disorders, such as insomnia, depression, post-traumatic stress disorder, and obsessive-compulsive disorder. 26 Considering the increasing number of recoveries, long-term psychosocial and emotional sequelae are a growing concern. Therefore, understanding COVID-19 survivors’ emotions, mental health status, and the prevalence and severity of psychosocial challenges that they encounter is critical to developing improved metrics of mental health and strategies for psychosocial rehabilitation.
During quarantine, social media plays a significant role in everyday interpersonal communication, information sharing, and social support exchanges.27,28 Among the most popular platforms, Reddit configures a unique virtual community in which members can freely share their experiences and feelings regarding the pandemic without restrictions such as character limits. Thus, it provides profound opportunities for researchers to examine COVID-related public opinion and sentiments. 29 The content generated from users’ interactions on social media platforms during a social crisis such as COVID is a trove of first-person experiences and an insightful resource for studying society’s reaction to crises.30,31 One study that used topic modeling and a sentiment lexicon approach 32 revealed COVID-19 symptoms and relevant sentiments by analyzing personal narratives posted on Reddit about members’ experiences during the first 14 days after infection. Likewise, Jelodar et al. 33 analyzed COVID-related discussions posted on Reddit in the early stage of the pandemic and revealed significant topics and sentiments as well as the polarity of comments. Kimiafar et al. 34 conducted a similar study of posts and comments on Reddit and obtained comparable findings. Their study yielded 12 general topics that were most frequently discussed by Reddit users and that shared similarities with those in existing studies. Sarker and Ge 35 conducted computational mining on the content posted within the ‘r/covidlonghaulers’ subreddit community and identified long-COVID symptoms from members’ self-reports.
In another study, Zhang et al. 36 used the contents of two primary subreddits (i.e., /r/China flu and /r/coronavirus) to investigate the trajectories of users from the beginning of the pandemic until September 2020. In their study, they quantified the differences between users’ profiles and their language use as well as their migration between the two subreddits. Their findings demonstrated the patterns with which social media users of different backgrounds engage in online conversations. Their study also validated the importance of Reddit as a means of disseminating COVID-related information and facilitating conversations among people. A platform-wide study of popular posts and comments on Reddit regarding COVID from March 2020 until the official administration of vaccines displays a dynamic trend of social inquiries about COVID. The top 10 topics identified by this study comprise a range of concerns from conspiracy theories to inquiries about vaccines and people’s reluctance or enthusiasm to receive them. 37
Despite burgeoning COVID-related research, little scholarly attention has been devoted to COVID-19 survivors’ experiences, opinions, sentiments, and their corresponding mental health status. A thorough study on 15 large mental health support groups and 11 non-mental health forums on Reddit specifies the spike in conversations regarding mental health issues compared to the pre-COVID period. 31 In a narrower study, researchers examined the longitudinal physiological and psychological symptoms that were mentioned by COVID survivors on Reddit during the pre-vaccine period. They identified common symptoms reported by users and the time lags between contracting the virus and the emergence of symptoms. 38 However, no study has specifically focused on COVID survivors and the trajectory of their inquiries and the emotional state of users over an extended period before and after the administration of the COVID vaccine. The official administration of COVID vaccines marked an important milestone in combating the unprecedented pandemic. Beyond its immediate medical impact, vaccination has a significant impact on the emotional state of societies, particularly COVID survivors. Therefore, a longitudinal study of contextual and sentimental inquiries about the virus pre- and post-vaccination can provide invaluable insights, yet the literature lacks this study. Our present study, therefore, seeks to fill this research gap by examining COVID-19 survivors’ online interactions and the mental health status that their interactions reflect. Specifically, we ask the following research questions: 1) What are the major topics and sentiments that COVID-19 survivors demonstrate in their interactions within the virtual community on Reddit and 2) How do these topics and sentiments interact and evolve over time? 
Using data collected from the “r/COVID-19Positive” subreddit community, we integrated topic modeling and sentiment analysis methods to unpack meaningful themes, salient sentiments, as well as the evolving semantic patterns displayed by COVID-19 survivors’ online interactions. Findings from this study contribute to the scholarship on COVID-19 survivors’ mental health outcomes and provide insights that can serve to improve the design of mental health support and rehabilitation services for COVID-19 survivors. This is the first longitudinal study to assess the evolution of the textual and sentimental state of the social media content generated by COVID survivors in the pre- and post-vaccination periods. Our findings shed light on the social and emotional impacts of nationwide vaccination endeavors and changes in the social media behavior of Reddit users.
Materials and methods
Data collection and pre-processing
The dataset used in this study was collected from the Reddit platform. The freedom of users to post lengthy conversations and engage in different layers of a conversation thread, together with the existence of specialized subreddits, adds to the value of Reddit and distinguishes it from most social media platforms. Reddit posts offer several advantages over traditional mental health clinical datasets. The immediate and public accessibility of data, the presence of historical data for comparisons of multiple time frames, and the anonymity of users provide an ecological setting in which users can freely express their concerns, thoughts, and opinions. 31 In this study, the data were drawn from the “r/COVID-19Positive” subreddit, which was created in March 2020 and has more than 125,000 members. This is a specialized forum for people who have tested positive for COVID-19 to share their experiences, express their concerns, and interact with others with similar experiences. The data were collected using the official APIs of the platform and include a set of features related to the posts, comments, sub-comments, and users.
Our dataset includes more than 43,700 posts, comments, and sub-comments posted by more than 14,540 users over an 18-month period spanning April 2020 to September 2021. Contents were grouped based on their creation date (monthly basis), which allowed us to observe the evolving patterns of topics and their corresponding emotional states over time. The distribution of data across the specified 18-month period was not uniform, with May 2020 having the highest number of observations (3900 posts and comments) and May 2021 having the lowest observations (1493 posts and comments). Despite the unbalanced number of observations per unit of time, there were enough observations in each month (on average 2430 posts and comments) for the analysis to be carried out. While the geographic distribution of Reddit users is not disclosed, statistics indicate that the United States, the United Kingdom, Canada, and Australia form the majority of users; 3 therefore, the findings may not be generalizable to countries with different political regimes.
Topic modeling
To capture the major themes of discussions and important topics embedded in the conversation threads, we used a topic modeling technique known as latent Dirichlet allocation (LDA), introduced by Blei et al. 39 LDA is a probabilistic model that extracts cohesive clusters of content (topics) from a collection of documents and assigns each document a membership probability for every topic. LDA assumes that each corpus is composed of several clusters (topics), wherein each topic is a probability distribution over words. It can infer topics from a given content collection without any supervision or prior knowledge. However, the quality of the outputs is affected by the input data, which emphasizes the importance of data cleaning and text preprocessing. Therefore, we applied a series of text preprocessing steps, including lowercase transformation, punctuation removal, tokenization, stop-word removal, stemming, lemmatization, and n-gram procedures, using Python packages such as the Natural Language Toolkit (NLTK).40,41 Stop words are common words in every language, such as articles and prepositions, that do not add much information to the text. Stemming and lemmatization convert words into meaningful base forms. Tokenization is used to split phrases, sentences, and texts into smaller units, individual words, or terms.
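The preprocessing steps described above can be illustrated with a minimal, dependency-free sketch. It is not the pipeline used in this study: the stop-word list is a tiny illustrative assumption, the function names are hypothetical, and stemming and lemmatization are omitted because they require a library such as NLTK.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real pipeline would use
# a fuller list such as the one shipped with NLTK).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to",
              "for", "is", "my", "i", "with"}


def tokenize(text):
    """Lowercase the text, drop punctuation, and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Discard tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def add_bigrams(tokens):
    """Append adjacent-token bigrams (joined by '_') to the unigram list."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def preprocess(text):
    """Run the toy pipeline: tokenize, filter stop words, add bigrams."""
    return add_bigrams(remove_stop_words(tokenize(text)))


example = preprocess("I lost my sense of smell, and the fever returned!")
```

Here `example` holds the five remaining unigrams followed by the four bigrams built from them; the resulting token lists are the kind of input an LDA implementation expects.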
The preprocessing steps were supported by the NLTK library. 42 We then ran the LDA model on the final content to find the optimal number of topics using the Topic Coherence (TC) metric. The TC score measures the degree of semantic similarity between high-scoring words in a topic. 43 These measurements help distinguish between topics that are semantically interpretable and those that are artifacts of statistical inference. The interpretation of generated topics requires human judgment. Each topic commonly comprises multiple correlated and unique keywords, along with several repetitive words. The words with the highest membership scores are the main indicators of a topic’s specific theme, whereas repetitive keywords are indicators of the general theme of the corpus.
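To make the model-selection step concrete, the sketch below scores candidate topic sets with a UMass-style coherence and keeps the candidate with the highest mean score. The text does not state which coherence variant was used, so UMass is an assumption chosen for its simplicity; the documents, candidate topics, and function names are all hypothetical, and the top words of each topic would normally come from fitted LDA models.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs where w_j is ranked
    above w_i. Assumes every top word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        w_j, w_i = top_words[j], top_words[i]
        score += math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j))
    return score


def best_topic_count(candidates, documents):
    """Return the topic count with the highest mean coherence, plus all scores."""
    mean_scores = {
        k: sum(umass_coherence(t, documents) for t in topics) / len(topics)
        for k, topics in candidates.items()
    }
    return max(mean_scores, key=mean_scores.get), mean_scores


# Hypothetical tokenized documents and top words per candidate model.
documents = [
    ["cough", "fever", "headache"],
    ["cough", "fever"],
    ["smell", "taste", "loss"],
    ["smell", "taste"],
    ["fever", "headache"],
]
candidates = {
    2: [["cough", "fever", "headache"], ["smell", "taste", "loss"]],
    3: [["cough", "smell"], ["fever", "taste"], ["headache", "loss"]],
}
best_k, scores = best_topic_count(candidates, documents)
```

In this toy corpus the two-topic candidate groups words that actually co-occur, so it scores higher than the three-topic candidate, which mixes unrelated words.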
Sentiment analysis
While topic modeling techniques detect underlying themes in content, sentiment analysis extracts the polarity and emotional states embedded in discussions. Polarity captures the overall valence of the content’s emotional underpinnings, whereas the individual sentiments capture specific dimensions of human feelings such as happiness or trust. We used the “Syuzhet” sentiment analysis toolkit to extract the overall polarity of the content, along with a set of eight major sentiments: anger, fear, anticipation, trust, surprise, sadness, joy, and disgust. In this study, for the purpose of interpretability, we focus on three dominant sentiments: “Fear,” “Sadness,” and “Trust.”
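Lexicon-based emotion scoring of this kind can be sketched as a simple word lookup. The example below is a Python analogue for illustration only, not the Syuzhet toolkit itself (which is an R package); the lexicon entries and the function name are hypothetical and far smaller than any real emotion lexicon.

```python
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")
POLARITIES = ("negative", "positive")

# Hypothetical toy lexicon mapping a word to the labels it evokes.
TOY_LEXICON = {
    "scared": {"fear", "negative"},
    "worried": {"fear", "sadness", "negative"},
    "lost": {"sadness", "negative"},
    "hope": {"anticipation", "joy", "trust", "positive"},
    "doctor": {"trust", "positive"},
}


def score_tokens(tokens):
    """Count, for each emotion and polarity label, the tokens that evoke it."""
    counts = {label: 0 for label in EMOTIONS + POLARITIES}
    for token in tokens:
        for label in TOY_LEXICON.get(token, ()):
            counts[label] += 1
    return counts


post = ["i", "am", "scared", "but", "my", "doctor", "gives", "me", "hope"]
post_scores = score_tokens(post)
```

Each post thus receives one count per emotion plus separate positive and negative counts; words absent from the lexicon contribute nothing.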
After the topics were identified, we conducted a sentiment analysis of the content belonging to each topic for each month. Breaking up the sentiment analysis based on month-long time spans and mapping them to each category revealed how the emotional perceptions of users regarding the same topic evolved over time. It also allowed us to compare the dynamics of users’ emotional states across different topics and map them to extraneous factors and events, such as the administration of COVID vaccines.
In the sentiment analysis, every data instance (i.e., post or comment) receives a non-negative integer score for each sentiment as well as two separate scores for the positivity and negativity of the content. To provide a meaningful means of comparing the emotional state of the topics across different time periods and to control for the unbalanced number of instances within each month, we use the average sentiment per topic per month as an indicator of the emotional state of a particular topic in a given time unit. A higher average value of a sentiment indicates that the intensity of the corresponding sentiment is higher. To operationalize the valence of topics, we calculated the overall valence of each conversation by subtracting the negative valence score from the positive valence score. The average valence of a topic for a given month was then calculated.
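The aggregation described above (valence as positive minus negative, then a mean per topic and month) can be sketched as follows. The records, field names, and function names are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scored instances: one record per post or comment.
records = [
    {"topic": 1, "month": "2020-05", "fear": 3, "positive": 1, "negative": 4},
    {"topic": 1, "month": "2020-05", "fear": 1, "positive": 2, "negative": 3},
    {"topic": 6, "month": "2020-05", "fear": 0, "positive": 5, "negative": 1},
    {"topic": 1, "month": "2020-06", "fear": 2, "positive": 0, "negative": 2},
]


def valence(record):
    """Overall valence of one instance: positive score minus negative score."""
    return record["positive"] - record["negative"]


def monthly_topic_average(records, value_of):
    """Average a per-instance value within each (topic, month) group."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["topic"], record["month"])].append(value_of(record))
    return {key: mean(values) for key, values in groups.items()}


avg_valence = monthly_topic_average(records, valence)
avg_fear = monthly_topic_average(records, lambda r: r["fear"])
```

Averaging within each group, rather than summing, is what keeps months with many posts comparable to months with few.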
Results
Applying the topic modeling method, the coherence metric indicated that the optimal number of topics was seven (Figure 1). In other words, by clustering the conversations into seven classes, we achieved the highest degree of semantic similarity among the most relevant keywords for each topic. It is also worth mentioning that each instance (post or comment) can belong to multiple topics with different membership scores. That is, a post can simultaneously discuss more than one topic. However, for interpretability and to address the complexity of multiple memberships, we assigned each instance to one topic by selecting the cluster with the highest probability. Thus, every post or comment was allocated to only one topic. This setting improves the interpretability of the findings and lowers the inclusion of noisy observations in each topic, as the majority of content has high membership in only one topic.
Figure 1. Coherence values of LDA models with different numbers of topics.
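The single-topic assignment described above amounts to taking the most probable topic for each instance. A minimal sketch, with hypothetical membership vectors and a hypothetical function name:

```python
def dominant_topic(topic_probs):
    """Return the 1-based index of the highest-probability topic.

    Ties are resolved in favor of the lowest topic index.
    """
    best_index = max(range(len(topic_probs)), key=topic_probs.__getitem__)
    return best_index + 1


# Hypothetical membership probabilities for three posts.
memberships = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.40, 0.40, 0.20],
]
assigned = [dominant_topic(probs) for probs in memberships]
```

The third post shows the cost of this simplification: two topics are equally likely, yet only the first is kept.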
List of topics and their corresponding keywords.
Summary statistics of the data.
Figure 2 plots the relative frequency of topics within each month. As can be seen, the popularity of topics follows dynamic patterns, some of which map to different extraneous events throughout the COVID-19 era. Some of the easily discernible patterns were as follows: (1) A decreasing trend in conversations regarding general symptoms of the virus (Topic 1). As time passed, the relative frequency of conversations about general symptoms such as cough, headache, and fever declined. (2) A bell-shaped trend in conversations about “loss of senses” (Topic 2). This topic attracted an increasing percentage of monthly conversations, followed by a decreasing trend starting in May 2021. (3) An oscillating pattern in conversations revolving around pneumonia and respiratory conditions, which repeated three times, in May 2020, December 2020, and May 2021. (4) A steady pattern in “post recovery/vaccination” related conversations across 2020; from February 2021, however, this topic gained popularity and started occupying a larger share of conversations. (5) An increasing pattern in conversations regarding “COVID tests” starting in July 2020, followed by a drop in March 2021 and another jump starting in August 2021. (6) Finally, a cyclic pattern in conversations concerning users’ “general inquiries, personal experiences,” which began as one of the most frequently discussed topics and gradually decreased over time. The second surge of related discussions started in February 2021.
Figure 2. Distribution of the topics over time. (a) Distribution of topics per month. (b) “General symptoms.” (c) “Loss of senses.” (d) “Pneumonia/respiratory conditions.” (e) “Post recovery/vaccination.” (f) “COVID tests.” (g) “General inquiries/personal experiences.”
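The relative frequency plotted for each month is the share of that month's instances assigned to each topic. A minimal sketch of that computation, using hypothetical (month, topic) assignments and a hypothetical function name:

```python
from collections import Counter, defaultdict


def relative_topic_frequency(assignments):
    """Map each month to the share of its instances assigned to each topic.

    `assignments` is an iterable of (month, topic) pairs, one per instance.
    """
    per_month = defaultdict(Counter)
    for month, topic in assignments:
        per_month[month][topic] += 1

    shares = {}
    for month, counts in per_month.items():
        total = sum(counts.values())
        shares[month] = {topic: n / total for topic, n in counts.items()}
    return shares


# Hypothetical assignments: four instances in one month, two in another.
assignments = [
    ("2020-05", 1), ("2020-05", 1), ("2020-05", 6), ("2020-05", 2),
    ("2021-03", 4), ("2021-03", 4),
]
shares = relative_topic_frequency(assignments)
```

Normalizing within each month makes the shares comparable across months with different numbers of posts and comments.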
Sentiments
First, we present the polarity of conversations across different categories over time. Figure 3 plots the overall valence of conversations for individual topics. The results indicate that the conversations about “general symptoms,” “pneumonia and respiratory conditions,” “post recovery and vaccination,” and “COVID tests” were dominated by negative sentiments. On the other hand, the contents related to “loss of senses” and “general inquiries/personal experiences” demonstrate mixed trends with positive emotional states at certain periods and negative valence at others.
Figure 3. Valence of the topics over time.
Contents revolving around the “General Symptoms” had the most negative polarity compared with other topics. This topic experienced a relatively constant and highly negative valence from June 2020 to December 2020, followed by an upward trend in January 2021. This timeframe is consistent with the beginning of widespread COVID-19 cases across the U.S. until the official administration of vaccines. The negativity of this topic’s conversations increased following the official announcement by the Texas Southwestern Medical Center about the first two cases of the Delta variant in May 2021.
“Loss of senses,” as one of the most salient, unique, and initially unknown symptoms of COVID-19, showed a relatively positive valence at the beginning of the pandemic. Soon after, it fell and stayed in the negative regime until late 2020. With nationwide administration of the vaccine, it returned to the positive valence regime and remained there, except for May 2021 (emergence of the Delta variant). The stagnancy of this topic with respect to the widespread Delta variant may be related to the fact that loss of senses is not among the main symptoms of the Delta variant, 4 unlike the original variant.
Conversations related to “respiratory conditions and pneumonia” consistently had negative polarity, with a peak in April 2021 approaching the neutral regime. However, with the emergence of the Delta variant, a steep decreasing trend was noted, likely driven by respiratory symptoms associated with new mutations. The fourth topic, focusing on post-recovery and vaccination conversations, exhibited an oscillating trend in late 2020. By the start of nationwide vaccination efforts, the polarity of conversations related to this topic had slowly trended upward toward a positive regime. Contents related to patients’ COVID-19 tests (Topic 5) also displayed an oscillating polarity pattern, with the lowest value in November 2020 and the highest in May 2021. Finally, the last topic focused on general inquiries and users’ experiences throughout the pandemic and was characterized by mostly positive sentiments, except for September 2020 and January 2021. Starting in January 2021, the trend displayed a slow yet constant increasing slope, consistent with the beginning of public vaccination.
In addition to the polarity of conversations, the associated sentiments reveal invaluable information about users’ emotional states at different stages of the COVID-19 era. Our sentiment analysis tool measures eight different sentiments, but for the sake of interpretability, we herein focus on three dominant sentiments: “Fear,” “Sadness,” and “Trust.” Figure 4 presents the intensities of these sentiments for the different topics.
Figure 4. Emotional states of the contents across different topics. (a) The average intensity of “fear” sentiment across different topics. (b) The average intensity of “sadness” sentiment across different topics. (c) The average intensity of “trust” sentiment across different topics.
Fear
Figure 4(a) indicates that discussions about the “general symptoms” and “respiratory effects” of the virus have consistently been associated with a high degree of anxiety. In late 2020, the fear of respiratory symptoms surpassed that of general symptoms, likely due to announcements about the emergence of the Delta variant overseas. In addition, discussions about “post-recovery and vaccination” started as one of the topics with the lowest intensity of fear sentiment, yet with the administration of vaccines at the beginning of 2021, it displayed an increasing slope and shifted to rank among topics with a high level of fear sentiment.
Sadness
Figure 4(b) shows the average level of sadness embedded in various conversations. The highest level of this sentiment was tied to contents relating to “general symptoms” of COVID-19, followed by “respiratory effects” and “COVID tests.” Notably, while the public availability of vaccines coincided with lowering the intensity of sadness in users’ discussions, their concerns about new variations of the virus intensified the sentiment for some topics related to the Delta variant, such as respiratory effects, COVID-19 testing, vaccine efficacy, and patients’ personal encounters with the new mutation.
Trust
Figure 4(c) indicates that throughout the specified period, users found discussions about the personal experiences of other users, generally discussed inquiries, and Q&A content on the subreddit highly trustworthy. Following this category, conversations about COVID-related tests and respiratory effects of the virus harbored a high level of trust. An exception was late May 2021, when the Delta variant started spreading in the U.S. Users’ lack of information about the new variant, accuracy of the testing kits, and potential symptoms significantly affected users’ trust levels.
Descriptive statistics of the sentiments across different topics.
Topics at a glance
In the previous section, we provided an overview of the topics and temporal dynamics of their emotional states. In this section, we summarize the overall emotional state of each topic in a static manner independent of the time dimension. To this end, we aggregate all content and conversations belonging to a particular topic and measure the average level of each sentiment, as shown in Figure 5. It is worth noting that in this figure, we plotted the average level of both the negative and positive polarity instead of their difference (as in the previous section).
Figure 5. Average level of the sentiments for each topic. (a) Topic 1 (general symptoms). (b) Topic 2 (loss of senses). (c) Topic 3 (pneumonia/respiratory). (d) Topic 4 (post recovery/vaccination). (e) Topic 5 (COVID tests). (f) Topic 6 (general inquiries/personal experiences).
As can be seen from Figure 5, the overall valence of conversations about COVID-19’s general symptoms throughout the specified period was negative, with sadness and fear being the most dominant emotions embedded in the content. The overall polarity of discussions about the loss of senses (Topic 2) has been nearly neutral and characterized by anticipation and trust. The set of posts and content discussing the respiratory effects of the virus (Topic 3) had been relatively negative in polarity and accompanied by fear, sadness, and trust as dominating emotions. A combination of sadness, fear, and anticipatory emotions formed the majority of post-recovery content sentiments, with negativity as the dominant valence. This category discusses topics such as vaccination experiences, expectations from anti-COVID medications, and the post-recovery lives of patients. The major sentiments of the conversations regarding the COVID-related tests are sadness, fear, anticipation, and trust, which capture the emotional status of patients waiting or discussing their experiences of being tested for COVID. The dominant valence of this topic was negative. Finally, contents related to general conversations and discussions about users’ experiences and personal beliefs regarding COVID are characterized by trust and fear as their dominant sentiments and positive valence.
Discussion
The engagement of COVID survivors in detailed conversations on social media platforms about their experiences creates a unique opportunity to elicit knowledge about their health status, especially their mental well-being. Numerous studies have shown that user-generated content on social media can serve as a reliable proxy for measuring users’ mental and physical well-being.44–46 Social media platforms are accessible venues for people to share their opinions, experiences, and concerns. 47 The importance of these platforms in forming people’s behavior in response to COVID is indisputable.37,48 Reddit provides a unique environment for users to engage in lengthy and thorough discussions containing invaluable information about their emotional, intellectual, and social stances. 29 Mining longitudinal discussions of COVID-positive patients over an 18-month period can shed light on the physical, psychological, and social challenges facing COVID-19 survivors. It also provides invaluable insight into the critical turning points throughout the pandemic and their impact on the mental state of survivors. We applied topic modeling and sentiment analysis techniques to a dataset collected from a forum dedicated to people with first-hand experience of contracting the virus. We extracted the most salient conversational themes and emotional states of the content within each topic. Our study provided a longitudinal overview of the textual and emotional states of survivors’ conversations. Our analysis led to the detection of six topics: “general symptoms” of the virus; specific symptoms such as “pneumonia and respiratory infections” and “loss of senses”; “post-recovery and vaccination” experiences; COVID-19 “test results”; and users’ “general inquiries and personal experiences.”
This study reflects on the learning effect and accumulated knowledge of the pandemic population. Patients demonstrated less interest in discussing general symptoms of the virus over time. This relates to how users process previously unknown phenomena and the associated waning in uncertainty regarding the nature of the disease over time. As a unique natural experiment in the era of big data, we were able to capture this phenomenon, showing how people discuss an unknown emerging global crisis. At the outset of the pandemic, people demonstrated an increasing tendency to share their opinions, experiences, and concerns with others on the forum. However, this tendency has declined over time. With the emergence of new variations of the virus, as well as the public administration of vaccines, users started posing inquiries and relied on popular wisdom. Our study is unique and different from existing studies in the sense that it sheds light on the effects of critical events such as curfews, initial news about vaccination, and nation-wide administration of vaccines.
Our main finding was that while people tended to discuss their post-COVID experiences with others, this tendency increased after the public administration of vaccines. People find forums such as Reddit to be accessible venues to discuss their concerns and speculations about the vaccine, the post-COVID era, and their visions of the new norms. These discussions and speculations provide a resource for researchers to collect honest perceptions of the pandemic. The availability of vaccines seems to have infused hope into society and fostered discussions about the post-pandemic era. Our study sheds light on the research about COVID survivors, providing guidance for public health practitioners and policy makers in designing and implementing strategies and policies that help improve COVID survivors’ health status and quality of life. The diversity of findings in terms of emotional states and experiences over the COVID period, once again emphasized the importance of incorporating the heterogeneity of crisis survivors in the assessment of their psychosocial problems. 22 This individualistic approach adds to the validity and generalizability of the findings and development of proper counter measures for future crises.
The overall polarity of individuals’ experiences is predominantly negative, indicating that users have a higher tendency to discuss unfavorable emotional experiences with others. The highest negative emotional polarity was observed in discussions regarding the general symptoms of the virus. The discussions were also characterized by three major emotions: sadness, fear, and trust. These emotional states reflect the different phases of the COVID-19 era: the initial fear of the pandemic; sadness over the virus’ adverse effects, such as the death of loved ones, increased unemployment, long lockdown periods, and nationwide quarantines; and trust in emerging healthcare solutions and in the cumulative knowledge about the virus that resulted in the development of COVID-19 vaccines and boosters.
Conclusion
In conclusion, our study, like many other studies on devastating events, emphasizes the validity of reliance on social media for quick and real-time analysis and assessment of human behavior in society. To obtain a deeper and more prescriptive understanding, more direct and vigorous local measures should be adopted. This study is only a first step toward the future work needed for psychosocial rehabilitation and highlights expressions that need further support during the pandemic. That is, the mental-health-related traits that have seen dramatic changes are associated with the spread of COVID-19 in different geographical locations. Such a geo-specific study will further our understanding of the correlation between the spread of COVID-19 and the associated distress, since the disease did not spread at the same rate in different locations. This effort will offer healthcare administrators and policymakers a reliable decision-making tool to detect, prevent, and prescribe well-suited solutions to mitigate the psychosocial effects of the pandemic.
The COVID-19 pandemic, one of the most unprecedented global crises of the 21st century, has exposed the vulnerabilities in the healthcare, social, and economic systems of nations worldwide. While its devastating impact cannot be overstated, it has also presented a unique opportunity for medical professionals, policymakers, and global leaders to develop strategic action plans and adapt current strategies to enhance preparedness for future unpredictable catastrophes. Consequently, experts in various fields such as sociology, medicine, information technology, and politics can gain valuable insights from these findings and explore potential countermeasures for the future.
First and foremost, the role of social media in real-time information dissemination amidst uncertainty has emerged as a crucial area for future study. It is imperative to address the challenge of combating misinformation during crises, as this can significantly impact public perception and response. Understanding how social media can be effectively utilized to transmit accurate information and counter false narratives is paramount.
Secondly, the development of swift and proactive policies to mitigate such crises holds immense importance within the field of law. Examining successful strategies in responding to the COVID-19 pandemic can provide valuable lessons for creating effective frameworks that enable prompt actions, thereby minimizing the impact of future crises. Moreover, it is crucial to investigate the experiences of COVID-19 survivors who have faced harassment or targeted attacks on social media for sharing their personal accounts and discussing their psychosocial challenges. Understanding the extent of this issue and exploring ways to address and support survivors in the face of such negativity can contribute to the overall well-being and resilience of individuals affected by future crises.
By exploring these areas and conducting interdisciplinary research, we can gain insights into the complexities of crisis management, develop effective communication strategies, and establish proactive policies that ensure a more robust response to future global challenges.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research project was funded by the University of South Florida through the Interdisciplinary Research Grant Program.
