Abstract
Social media has acted as a primary source of information, opinion, and news for millions of individuals every day for over a decade. This has never been as apparent as during the global COVID-19 pandemic, when, in a span of little more than a year, the emotions of social media users shifted more swiftly and fickly than ever before. This study extracts and classifies 60,370 Reddit posts made between February 11th, 2020 and January 26th, 2021 on the two major subreddits r/COVID and r/COVID19. Using the lexicon approach, the posts are classified into positive, negative, and neutral sentiment polarities, and distributive frequencies and valence scores are then computed to measure emotional contagion. The findings reveal a high presence of negative sentiment in the posts, with sentimental extremities increasing in three time frames: the initial pandemic stage; the implementation of massive lockdowns; and the approval and administration of vaccines. They also show a linear relationship between the valence of exposed stimuli and their responses. Emotional contagion is present in both positive and negative posts. Important implications can be drawn for the emotional wellbeing, perspective, and contagion of the users of Reddit.
Introduction
The outbreak of Novel Coronavirus (2021) has had a tremendous impact on the physiological as well as mental health of numerous individuals. The physical symptoms are, by and large, visible and tangibly diagnosable, but the mental symptoms are largely overlooked (WHO, 2021a, 2021b). Individuals have suffered heavy losses of family, fortune, mobility, income, livelihood, and social engagement, along with stress, anxiety, isolation, depressive symptoms, emotional uncertainty, and diminished mental strength (Low et al., 2020). A few support groups on Reddit have allowed users to share, discuss, and help those who need support in dealing with the experience of this pandemic. This study uses natural language processing tools to analyze the emotions invoked in the COVID-19 discourse on an online platform by evaluating its sentiments and identifying the presence of emotional contagion.
Early findings showed that the public’s fear was reflected in their online discourse, which made it important to study the patterns in the emotional content of these discussions (Shankar & Tewari, 2021a). There has been, therefore, an increase in the literature studying the sentiments of the online discourse related to COVID-19 (Alamoodi et al., 2021; Crocamo et al., 2021; Valenzano et al., 2020). These studies also found a pattern of intense emotional discussions about the norms and outcomes of COVID-19, which could result in the development of emotional contagion. Continuous bombardment by online notifications has a long-term effect on an individual’s emotional sentiments and contagion, leading people to experience intense emotions without being aware of it (Ferrara & Yang, 2015; Rubin & Wessely, 2020; Shankar & Breithaupt, 2019).
Severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and the common cold are among the primary illnesses caused by a family of viruses known as coronaviruses. SARS-CoV-2, or severe acute respiratory syndrome coronavirus 2, originated in China in 2019, and the disease it causes was named COVID-19 by the World Health Organization (WHO, 2020a). It was declared a pandemic by WHO in March 2020 (WHO, 2020a). As of 26th June, 2021, over 181 million positive COVID-19 cases had been reported, with over 165 million recoveries and nearly 4 million deaths.
All of this carries its own baggage of fear and social stigma toward people, places, and things associated with COVID-19. The pandemic has also led to an “infodemic” (WHO, 2020a), an overwhelming flood of responses, opinions, emotions, and conspiracies surrounding the COVID-19 discourse. When uncertain emotions are involved, the result is an emotionally charged crowd that is easily swayed by unreliable dogma. Human interactions are built upon recognizing, acknowledging, and embracing emotions and sentiments (Cowen et al., 2019), but when these are not properly expressed or addressed, they can result in dire mental health issues (Wells, 2006). Fear mongering has been weaponized for political, sociological, and psychological attraction and warfare since the days of yore (Aslam et al., 2020). The media has been misused for hundreds of years to sensationalize half-truths and fake news and thereby propagate false information (Friedman et al., 1999).
Earlier coronavirus outbreaks were first reported in Guangdong province, along with Hong Kong, Toronto, Singapore, and Hanoi (Hsu et al., 2003; Lee et al., 2003; Tsang et al., 2003), resulting in 8,439 infections and 812 deaths (Liang et al., 2004). The virus behind that outbreak was named SARS-CoV-1, which bears 70% similarity with SARS-CoV-2, the virus that causes COVID-19 (Rat et al., 2020). Another outbreak, of Middle East respiratory syndrome coronavirus [MERS-CoV], was first reported in Saudi Arabia in 2012, resulting in 2,400 infections and 850 deaths (Killerby et al., 2020; Wang et al., 2020; Zaki et al., 2012).
The coverage of this new strain of the virus generated fear amongst the public, which, in turn, sparked a fearful discourse on all social media platforms. To this day, COVID-19 remains among the top five most discussed topics on the biggest platforms across the globe, including Google, Twitter, Reddit, and Facebook (Volkmer, 2021). In the early days of the pandemic, lockdowns, and quarantines, one study (Barkur & Vibha, 2020) found that the overall sentiment of social media users was largely positive, but later studies (Aslam et al., 2020; S. Das & Dutta, 2021) found a shift toward negative sentiment in the social media discourse on COVID-19, much of it spearheaded by negative coverage of the virus in the mainstream media. Natural Language Processing (NLP) offers a prominent method for classifying the sentiments of texts, known as Sentiment Analysis (B. Liu, 2012), which has been increasingly used in understanding medical emergencies and research (Sokolova & Bobicev, 2013; Zeng-Treitler et al., 2008). Sentiment analysis can use lexicons, which match text units (words, sentences, etc.) with emotions and sentiments (Mohammad & Turney, 2013; Taboada et al., 2011), and then segment these into positive and negative sentiments by assessing the overall polarity of the text (Pang & Lee, 2008; Strapparava & Mihalcea, 2008; Strapparava et al., 2006; Turney & Littman, 2003; Wilson et al., 2005). Positive sentiment denotes a polarity of favorable sentiment, and negative sentiment denotes a polarity of unfavorable sentiment (Bermingham & Smeaton, 2010).
Even though sentiment analysis of big data in real time poses obvious challenges (Turney, 2002), the organized use of classifiers to continually evolve the lexicon-based classification approach is important (Pang et al., 2002).
In this study, we use social media to decipher the communication and discourse amongst users talking about COVID-19. The focus is on discussions on the Reddit platform, concentrating on the topics, posts, and news related to COVID-19. Sentiment analysis is used to evaluate the emotional valence and emotional contagion of the outbreak. Recent studies have intensively examined the content shared by social media users on Facebook (Kim et al., 2021; Kramer et al., 2014; Sturm Wilkerson et al., 2021), Twitter (Arora et al., 2021; Ferrara & Yang, 2015; Park et al., 2021; Shankar & Tewari, 2021b), as well as traditional news headlines (Aslam et al., 2020; Srivastava & Deepak, 2021); but studies dealing with the conversations and discourse on Reddit in the context of COVID-19, especially as they pertain to emotional regulation and contagion, are scarce, even though there are two big subreddits dedicated to COVID-19. The studies of other social media and traditional media platforms have used sentiment analysis and deep learning techniques to identify and evaluate their users’ emotional and behavioral valence. The limited literature on Reddit discourse, however, has primarily focused on understanding medical uncertainties or relatedness in responses (Bunting et al., 2021; Thompson et al., 2022), and was linguistically (Low et al., 2020) or geographically (Zhang et al., 2020) constrained. We try to establish the consequences that Reddit discourse has for emotional wellbeing, and urge interventions on this front. The questions answered here are: (1) What is the overall sentiment polarity of the Reddit posts on COVID-19? (2) What are the top conversation topics in posts expressing positive and negative sentiments about COVID-19? And (3) Is there an effect of emotional contagion on the users of Reddit discussing COVID-19?
Based on the textual analysis, the study proposes intervention strategies that can be employed to identify and convert negative sentiments into positive ones. This can help in identifying the most credible sources of the user pools that can act as information disseminators for the discourse of COVID-19 information.
Review of Literature
NLP techniques have been used in social media analytics across varied domains of research. Among the most popular of these techniques is sentiment analysis, which has been used in multiple ways for emotion mining (Cambria et al., 2013; Feldman, 2013; B. Liu, 2012; Medhat et al., 2014; Pang & Lee, 2008). Since a wide range of domains have implemented sentiment analysis, this section discusses the use of natural language processing in the domain of pandemic-related social media analytics.
With the advent of COVID-19 and subsequent online interactions, there was an immediate need to assess the emotions of the users interacting on these platforms. Thus, numerous researchers began analyzing the sentiments of these social media interactions. Studies have covered platforms like Facebook and Twitter (Domalewska, 2021; Rianto & Pratama, 2021), YouTube, Instagram, etc. (de Las Heras-Pedrosa et al., 2020; Shukla, 2021), and Twitter alongside other online platforms (Chakraborty et al., 2020; Dubey, 2020). A systematic literature review suggested that sentiment analysis played a substantial role in figuring out the emotional intelligence as well as emotional contagion of online users (Alamoodi et al., 2021).
This review study also provided a stark understanding which formed the initial premise of this study. It found that the impact of social media discourse and its sentiment analysis was prominent in figuring out the presence of emotional contagion, especially when the contents shared or discussed were tangential to the mainstream discourse, or worse, were misinformation.
Sharma et al. (2017) analyzed the impact social media has as an information source that might help in decreasing the pandemic information spread. This study used the Zika virus pandemic as the source to discuss the misleading posts that gain traction and popularity in comparison to accurate posts. Similarly, B. F. Liu and Kim (2011) evaluated the organizational dissemination and subsequent social media responses to the 2009 H1N1 flu virus, showing that legitimate organizations did not use this opportunity to properly discuss the epidemic crisis. Similarly, the 2015 India H1N1 flu epidemic and its related issues were discussed showing Twitter acted as a better platform for effective information transmission, as compared to similar traditional media and social media platforms (Jain & Kumar, 2015). Similar studies have discussed the misinformation regarding the Ebola virus (Apuke & Omar, 2020). There has been an epidemic of false information spread in the field of health (Apuke & Omar, 2021; Pulido et al., 2020).
Social media has been used in similar ways during COVID-19. Studies range from investigating how China and its organizations used social media during this time (Q. Chen et al., 2020), discussing citizen engagement and its impact on misinformation, to examining how India and its organizations used social media (S. Das & Dutta, 2021), discussing the government’s handling of events during the pandemic on Twitter. Some studies have found an obvious connection between social media and misinformation in the pandemic era (Hou et al., 2020; Huynh, 2020; Pennycook et al., 2020). People look to social media for information (Huynh, 2020), but the spread of fake news and fake posts has been especially prominent there (Frenkel et al., 2020; Pano & Kashef, 2020; Russonello, 2020). Fabricated information about the origins of the virus and preventive home cures, the propagation of racial division and hate, and denials of scientific methods and mental health combined to undermine the efforts of front-line workers, scientists, and governments (Lampos et al., 2021).
Similar studies discussed the impact that social media and misinformation have on emotional risk in Vietnam (Huynh, 2020), Taiwan (Frenkel et al., 2020), Nigeria (Alpert, 2020), India (S. Das & Dutta, 2021), Africa (Lampos et al., 2021), and the United States (Pennycook et al., 2020), among others. These papers consistently found that social media has been inefficient in combating medically unproven “cures,” as well as incompetent predictions of multiple millions of deaths in each country due to this pandemic, resulting in unnecessary fear and uncertainty (Aslam et al., 2020; Hassan, 2021; Sahu et al., 2020).
There have also been empirical studies and literature surveys analyzing COVID-19-related opinions, commentaries, discussions, posts, etc. These studies have focused on understanding COVID-19 (Sohrabi et al., 2020), tackling COVID-19 (Lampos et al., 2021), documenting comprehensive reports of COVID-19 (Sahu et al., 2020), studying mental health related to COVID-19 (Rajkumar, 2020), and compiling media reports of COVID-19 (Zhou et al., 2020), along with assessing emotions related to the news headlines from major news outlets regarding COVID-19 (Aslam et al., 2020).
These studies showed that the use of sentiment analysis with social media is a continually evolving topic (L. Das & Dutta, 2020), and this study tries to build upon it. With the help of sentiment analysis and VADER, we aim to explain the emotional contagion and emotional valence of users engaged in online discourse.
When unpopular or tangential COVID-19-related opinions are expressed during online discourse, they can lead to prompt as well as unnecessary social and cultural discrimination (Akroyd et al., 2020), and could escalate and devolve into civil and societal unrest (Bloem & Salemi, 2021), as was seen during the peak isolation periods. The increase in mental health issues, including stress, anxiety, and problems with motivation and emotional intelligence (Shankar & Tewari, 2021b), leads to further deterioration of society. This has real, severe, and long-lasting implications for the weaker sections of society (Shankar & Tewari, 2021b), who could easily be coerced into receiving improper medical attention, leading to counterfeit emotions as well as high emotional contagion (Rincón-Aznar et al., 2020).
Interplay Between Online Discourse and Emotional Contagion
Modern English as well as Medical dictionaries define Contagion as the “transference of disease by contact” (R. P. Das, 2017; Merriam-Webster, 2022). Speaking in explicitly medical terms, the presence of contagion is seen when the transference of an infectious disease happens through the mediums or carriers of the pathogenic microorganisms, usually through the air, water, or other contaminable sources (Valenzano et al., 2020). In our study, we are centrally focused on the theory of emotional contagion (Hatfield et al., 1993).
Emotional contagion, as defined by Hatfield et al. (1992), provides an underlying understanding of the theoretical foundations of collectivist behavior, afferent behavioral mimicry, and behavioral transmission, along with human cognition, behavior, and emotion as well as other neurophysiological and psychological outcomes. Multiple pieces of research, including a 20-year longitudinal study, found that intense emotions have a way of finding a path into a person’s psyche through social media and other online interactions (Fowler & Christakis, 2008). Other studies found that emotional contagion occurred even when the interactions were non-verbal, online, and manipulatively controlled (Kramer et al., 2014; Sasaki et al., 2021; Steinert, 2021). These studies suggest that getting emotional cues from social media platforms can have long-term negative effects (Shankar & Tewari, 2021a).
Social media platforms are not only well-suited to manipulating which information users see or are denied, but are actively involved in doing so (L. Chen et al., 2022). The ethical concerns are obvious (Ferrara & Yang, 2015), but what individuals usually gloss over are the long-lasting consequences of these manipulative behaviors.
As fear, uncertainty, anxiety, and stress set in with the advent of COVID-19, social media presented us with emotionally charged messaging and subsequent radical emotional responses. This leads to a toxic emotional environment, which is a breeding ground for emotional contagion. Users were less worried about the tangible consequences of COVID-19 and its policies and were more concerned with perceptual alignment with the narratives (Altamura et al., 2019).
Although many studies have examined social media interactions regarding COVID-19 and the possible psychological and emotional triggers and stress, there is no clear understanding of why some triggers affected users more than others, or of why some users were affected more than others. In light of this, this study tries to identify the presence of emotional contagion in Reddit interactions regarding COVID-19, and how this contagion has impacted the sentiments and responses of these users.
Method
Data
This study collected 60,370 posts from two subreddits, r/covid and r/covid19, which are dedicated to discussing the coronavirus pandemic and together have roughly 500k members. Reddit is a social media platform that covers social news and media aggregation as well as content aggregation through posts, comments, replies, images, videos, links, etc. The popularity of a post is measured by the engagement of other users through reply threads and through upvotes/downvotes, which give the post a score.
Unlike other social media platforms, Reddit allows for the extraction of contents through metadata on its website. This study was done using Python to extract all the posts from February 11th, 2020 to January 26th, 2021 for analysis. The summary of the dataset is given in Table 1.
Summary of the Dataset.
The threads and posts discussed every aspect of users’ lives that has been affected by the COVID-19 pandemic. Sentiment analysis was conducted on the posts extracted from these two subreddits.
Data Pre-Processing
The extracted contents of the users usually contain textual as well as non-textual information that warrants cleaning before any kind of NLP analysis. Text analytics (Angiani et al., 2016; Kharde & Sonawane, 2016) is used to look preliminarily into the raw extracted data. The following steps were taken for data processing and data cleaning:
Conversion of posts into a text file.
The texts were converted into a corpus on which the analysis was to be done.
Conversion of all texts of the entire document into lower case.
Removal of punctuation marks, such as commas, hyphens, periods, and other line and page breaks.
Removal of stopwords, such as “like,” “and,” “or,” “in,” “is,” “there,” “were,” “for,” “isn’t,” “couldn’t,” etc.
Removal of URLs, mentions, emoticons, and other non-ASCII characters.
Removal of numbers, digits, and numerals.
Removal of unnecessary white spaces, tabs, and other spaces.
Stemming: removal of commonly occurring word endings such as “es,” “ed,” and “s.”
Lemmatizing: reducing the different inflected forms of a word to a common base form to get a better understanding of the text.
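The cleaning steps listed above can be sketched in a few lines of standard-library Python. This is an illustrative sketch and not the study's actual pipeline: the stopword set, the regular expressions, and the helper names `crude_stem` and `preprocess` are assumptions made here, and lemmatization is left out because it needs a dictionary-backed tool that the study does not name.

```python
import re
import string

# Tiny illustrative stopword set (assumption: the study's list is not given).
STOPWORDS = {"like", "and", "or", "in", "is", "there", "were", "for",
             "isn't", "couldn't"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"(?<!\S)(?:u/|@)\w+")  # Reddit u/name or @name
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def crude_stem(token):
    """Strip a trailing "es", "ed", or "s"; a stand-in for a real stemmer."""
    for suffix in ("es", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(post):
    """Return the cleaned, stemmed tokens of one post."""
    text = post.lower().replace("\u2019", "'")             # lower case
    text = URL_RE.sub(" ", text)                           # drop URLs
    text = MENTION_RE.sub(" ", text)                       # drop mentions
    text = text.encode("ascii", "ignore").decode("ascii")  # drop emoji, non-ASCII
    # Stopwords go first so contractions such as "isn't" still match.
    kept = [t for t in text.split()
            if t.strip(string.punctuation) not in STOPWORDS]
    text = " ".join(kept).translate(PUNCT_TABLE)           # drop punctuation
    text = re.sub(r"\d+", " ", text)                       # drop digits
    return [crude_stem(t) for t in text.split()]           # split() trims spaces
```

The order matters in this sketch: URLs are removed before punctuation so that they are dropped whole, and stopwords are removed before punctuation so that apostrophes in contractions are still intact when they are matched.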
Sentiment Analysis
Sentiment analysis is usually defined as a classification task where each category is characterized by a sentiment (Prabowo & Thelwall, 2009). It is usually approached in two ways: the lexicon-based approach and the machine learning approach (Kharde & Sonawane, 2016; Piryani et al., 2017; Taboada et al., 2011). The more popular of the two is the lexicon-based approach, as it relies on a set of predefined “lexicons,” lists of predefined words that help in identifying the polarities of the texts to be analyzed. These lexicons have a built-in repository of words, along with their sentiment polarities (Mohammad, 2015).
The NRC Emotion Lexicon is the most popular lexicon used in such research (S. Das & Dutta, 2021); it uses a rule-based approach to associate words with eight basic emotions (Ekman, 1992; Plutchik, 1994). This corpus-level mining approach allows for a simplified understanding of the complex linguistic structure of the language, especially around emotionally charged topics like COVID-19.
Valence Aware Dictionary and Sentiment Reasoner (VADER)
VADER is among the most commonly used lexicon-based and rule-based sentiment analysis models, and is predominantly used to analyze the words, texts, and emoticons from social media platforms (Hutto & Gilbert, 2021). Since it draws on a predefined repository for its analyses, it is usually found to be more effective and efficient in terms of speed, time, and accuracy than its machine learning counterparts (Hutto, 2014; Hutto & Gilbert, 2014; Shankar et al., 2021).
The text of each post is first converted into a vector of sentiment scores, which is divided into positive, negative, neutral, and compound polarities. The negative, positive, and neutral scores are normalized to range from 0 to 1, and the compound score is normalized to range from −1 to +1 [negative to positive] (Mäntylä et al., 2018; Pano & Kashef, 2020). These scores are referred to as VADER scores, and they are used to measure the emotional trend of the COVID-19 texts.
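To make the compound score concrete, the sketch below reproduces the normalization used in VADER's open-source reference implementation, where a summed valence x is mapped to x / sqrt(x² + α) with α = 15. The polarity cutoffs of ±0.05 are the ones conventionally suggested in the VADER documentation; the study does not report its own cutoffs, so they are an assumption here, as are the function names.

```python
import math


def normalize_compound(valence_sum, alpha=15.0):
    """Map an unbounded summed valence onto the interval [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    return max(-1.0, min(1.0, score))


def polarity_label(compound, threshold=0.05):
    """Label a compound score (threshold is an assumed, conventional cutoff)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# A summed valence of 1.0 gives 1 / sqrt(1 + 15) = 0.25, i.e. "positive".
print(polarity_label(normalize_compound(1.0)))
```

In practice the scores would come from the library itself: with the `vaderSentiment` package installed, `SentimentIntensityAnalyzer().polarity_scores(text)` returns a dictionary with `neg`, `neu`, `pos`, and `compound` entries.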
VADER has consistently performed better on human as well as Twitter data (Hutto & Gilbert, 2014), and has outperformed seven other popular lexicons (Elbagir & Yang, 2020). There have been several linguistic adaptations of VADER too (Amin et al., 2019; Las Johansen, 2018; Oyewusi et al., 2020; Tymann et al., 2019), which further reinforce the understanding that VADER is the most widely used lexicon for collaborative and ensemble understanding of the emotions in textual data (Bonta & Janardhan, 2019; Borg & Boldt, 2020).
Results
Sentiment Analysis
Sentiment analysis was done using the Python package VADER (Hutto, 2014). The sentiments were divided into two polarities, Negative and Positive, and the emotions were classified into eight emotions (Mohammad, 2015; Plutchik, 1994). To analyze the word distribution visually, a word cloud was created. Figure 1 shows the word cloud of the words used in the posts based on their frequency of occurrence. The larger the word, the higher its frequency of occurrence.

Word cloud of the sentiment polarities based on COVID-19 posts.
It can be clearly seen from the word cloud that in the positive word corpus, the most commonly occurring words were thank, fun, help, please, hope, happy, good, well, etc., whereas in the negative word corpus, the most commonly occurring words were covid, death, people, struggling, feeling, throat, pain, infected, loneliness, etc. For further understanding, a group of sample Reddit posts by the Redditors explaining the positive and negative sentiment polarities is presented in Table 2.
Sample Redditors’ Posts with Emotional Classification and Sentiment Polarity.
The posts show that when users discuss the illness and its ramifications, they usually do so with a negative sentiment polarity, clearly talking about their struggles with the disease and how they have tried to deal with it. When they talk about vaccines and uplifting news and stories, the posts carry a positive sentiment polarity. It is also evident that positive posts have an air of caution in them, and negative posts have a trickle of hope.
Previous studies (Ferrara & Yang, 2015; Hutto & Gilbert, 2014; Thelwall et al., 2010) have discussed the choice of also measuring the intensity of sentiment polarities in longer texts, noting that sentiment intensity analysis is limited when the text is restricted by a character or word count. With such limitations absent on Reddit, the efficiency of sentiment intensity analysis increases. Figure 2 presents the sentiment distribution of the top 10 most discussed topics or organizations.

Average sentiment intensity in the top 10 most discussed topics or organizations.
It can be clearly seen that when posts pertain to organizations (business, medical, or otherwise), the sentiment is mostly positive; but when posts are about the disease or its effects on users’ lives or livelihoods, the positive and negative polarities are similar, suggesting a static or neutral emotional presence in these topics.
Effect of Emotional Contagion
There is a long-standing idea that emotions can be transferred through interactions on the online media, especially social media, even when there is an objective absence of non-verbal communications, interactions, and cues, which are considered the core ingredient for any emotional interactions (Fowler et al., 2008; Hatfield et al., 1992).
To examine this, the posts were vectorized and the occurrences of “covid” were counted through a frequency distribution across the time frame of data collection. This allowed us to identify the periods with a significant rise in discussions related to COVID-19. The results are shown in Figure 3.
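A minimal sketch of this counting step is given below. It assumes that "average daily occurrence" means the mean number of occurrences of the substring "covid" per post on each day, which the text does not spell out; the function name and the `(date, text)` input shape are likewise illustrative.

```python
from collections import Counter
from datetime import date


def daily_term_average(posts, term="covid"):
    """posts: iterable of (date, text). Returns {date: mean hits per post}."""
    hits = Counter()
    totals = Counter()
    for day, text in posts:
        hits[day] += text.lower().count(term)  # also counts "covid19", "covid-19"
        totals[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


sample = [
    (date(2020, 3, 1), "COVID is here. covid-19 spreads"),
    (date(2020, 3, 1), "stay safe"),
    (date(2020, 3, 2), "Covid19 update"),
]
print(daily_term_average(sample))
```

Plotting the returned values against their dates would give a curve of the kind described for Figure 3.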

Average daily occurrence of the word “covid.”
This shows that from early March to early April, there was significant use of the word “covid” in the posts, which declined and then returned during late April; this was the period when COVID-19 was first declared a pandemic and the nomenclature was derived (WHO, 2020b). The spike resumed around early June, when most nations, including the UK and India, went into severe lockdowns and there was a serious spike in the death toll (Venkata-Subramani & Roman, 2020). The occurrence of the word spiked again around early December and mid-January, when the first vaccines were finalized (BBC, 2020) and approved (WHO, 2020b) by governing health authorities across the globe, including WHO, and began to be administered to the public (WHO, 2020b).
The posts were then divided into their sentiment polarities based on their VADER scores. For each polarity, a distribution was then created by observing the average daily change. The results are shown in Figures 4 and 5. Figure 4 shows the daily average distribution of the positive and negative sentiment polarities of the texts posted on the subreddits. Figure 5 shows the distribution of the standard deviation of the sentiment polarities of those posts.
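The daily aggregation described here can be sketched as follows. The sketch assumes each post already carries a VADER compound score, splits posts into polarities with an assumed cutoff of ±0.05 (the study does not state its cutoff), and reports the mean and population standard deviation per day and polarity; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_polarity_stats(scored_posts, threshold=0.05):
    """scored_posts: iterable of (day, compound).

    Returns {(day, polarity): (mean, std)} for positive and negative posts;
    posts between the two cutoffs are treated as neutral and skipped.
    """
    groups = defaultdict(list)
    for day, compound in scored_posts:
        if compound >= threshold:
            groups[(day, "positive")].append(compound)
        elif compound <= -threshold:
            groups[(day, "negative")].append(compound)
    return {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}


scored = [("2020-06-01", 0.5), ("2020-06-01", 0.7),
          ("2020-06-01", -0.4), ("2020-06-01", 0.0)]
print(daily_polarity_stats(scored))
```

The per-day means correspond to the kind of series described for Figure 4 and the per-day standard deviations to Figure 5.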

Sentiment average change with time—mean change.

Sentiment deviation with time—standard deviation.
The figures show that there are overall more positive sentiment posts being posted on the two subreddits, but there were higher average negative sentiments on those posts. Similarly, there was a higher deviation of positive sentiment posts, but there was a consistent presence of negative sentiment posts throughout the time frame. The possible spike in the negative sentiments during the period of June 2020 could be attributed to the severe lockdown restrictions implemented across the globe. A similar spike in the standard deviation of positive sentiment can be attributed to the release and administration of vaccines and the related hope with it. These spikes in negative and positive sentiments have been attributed to immediate mental health issues, that lead to a sustained feeling of negativity or positivity in the continued threads of the post, which has also been reported in similar studies (Low et al., 2020).
These results show the presence of emotional contagion in both negative and positive sentiments, which is also found in previous studies (Ferrara & Yang, 2015). To further solidify the results, anomalies and valence were also measured. For anomaly detection, the texts were vectorized according to their VADER scores, and the anomalies were then clustered using the DBSCAN algorithm in Python. Figure 6 presents the distribution of sentiments for the anomalies.
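DBSCAN treats as noise any point that is neither a core point (one with at least `min_samples` points, itself included, within distance `eps`) nor within `eps` of a core point. The brute-force sketch below applies that definition to one-dimensional sentiment scores. The values of `eps` and `min_samples` and the function name are assumptions, since the study reports neither its parameters nor its implementation; scikit-learn's `sklearn.cluster.DBSCAN`, for example, marks such noise points with the label -1.

```python
def dbscan_noise_1d(values, eps=0.1, min_samples=3):
    """Return the indices of the points DBSCAN would label as noise."""
    n = len(values)
    # Neighbourhood of each point, including the point itself.
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_samples}
    # Noise: not a core point and not reachable from any core point.
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbours[i])
    ]


scores = [0.50, 0.52, 0.55, 0.51, -0.90, 0.95]
print(dbscan_noise_1d(scores))  # indices of the isolated scores
```

This quadratic-time version is only meant to show the rule; a library implementation would be the sensible choice for tens of thousands of posts.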

Distribution of sentiments across observed anomalies.
The figure clearly shows a bimodal distribution, which is visible in all the categories used. It also shows that when the sentiment strength is low to medium, anomalies in positive and negative sentiments are more prevalent, but as the strength of the sentiments increases, the anomalies in neutral-sentiment topics increase, and those in positive and negative sentiments become almost negligible. This shows that multiple sub-groups are present in the overall anomaly distribution (Aslam et al., 2020).
The valence method measures the sentiment ranges, where a lower score shows a greater disproportion towards negative sentiment, and a higher score shows a greater disproportion towards positive sentiment. The texts are divided into bins of the same size (here: size as 1), each containing a set of posts made by the users with their corresponding sentiments, which are then used to find the valence in the input bins. The results of the method are shown in Figure 7.

Valence relationship in the posts of the subreddits.
The findings show a very strong and positive linear relationship between stimulus and response valence. The result is also statistically significant (p < .001, SE = 0.026). This shows the presence of strong positive and negative emotional contagion in the contents, with a clear indication that a strong negative stimulus triggers a strong negative response, and vice versa, with neutral stimuli likewise drawing neutral responses.
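One way to picture the stimulus-response analysis is to bin posts by stimulus valence, average the response valence within each bin, and fit a least-squares line through the bin averages. The sketch below does this under several assumptions that are not taken from the study: how stimulus and response pairs are formed on Reddit, the valence scale of the sample numbers, and the helper names `bin_means` and `ols_fit`.

```python
import math
from collections import defaultdict


def bin_means(pairs, width=1.0):
    """Average response valence per stimulus bin.

    pairs: iterable of (stimulus_valence, response_valence).
    Returns [(bin_center, mean_response)] sorted by bin.
    """
    bins = defaultdict(list)
    for stimulus, response in pairs:
        bins[math.floor(stimulus / width)].append(response)
    return [((k + 0.5) * width, sum(v) / len(v)) for k, v in sorted(bins.items())]


def ols_fit(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative numbers only; not data from the study.
pairs = [(-1.5, -2.0), (-1.2, -1.0), (0.3, 0.5),
         (0.6, 0.5), (1.4, 1.0), (1.9, 2.0)]
slope, intercept = ols_fit(bin_means(pairs, width=1.0))
```

A positive slope in such a fit is what a linear stimulus-response relationship of the kind reported above would look like: more negative stimuli are followed by more negative responses, and more positive stimuli by more positive ones.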
Discussion
The study aimed to evaluate the emotional perspective, emotional wellbeing, and emotional contagion of the users of Reddit, especially on the popular subreddits of COVID-19. Different from the studies carried out on Facebook (Kim et al., 2021; Kramer et al., 2014; Sturm Wilkerson et al., 2021), Twitter (Arora et al., 2021; Ferrara & Yang, 2015; Park et al., 2021), News Headlines (Aslam et al., 2020; Srivastava & Deepak, 2021), etc. where the control variables included only the content shared by users on their social media, this study tried to evaluate the sentiment and emotional content of the posts shared on Reddit.
The findings of the sentiment analysis showed a strong connection between posts on Reddit and emotional and sentiment polarity. In particular, negative posts shared on Reddit showed a high emotional score. The outbreak of the pandemic and the ineptitude of governments in preparing for treatment were not lost on Redditors, and this has created a sense of fear, uncertainty, and anxiety that does not help the mental and emotional wellbeing of netizens.
The edge of negative words over positive words became apparent when the words were clustered according to their sentiments. The initial word cloud and the emotional intensity graph showed a slight edge towards negativity in the sentiments of the discourse, which was accentuated when the discussion concerned the medical authority and the allied institutions, and when it concerned the disease itself. This finding is in line with recent studies of other platforms (Domalewska, 2021; Gulati et al., 2022), which have also found an edge of negative over positive sentiments in social network interactions.
An important finding was that there was an undertone of negative sentiment in almost all of the posts on Reddit. The analysis also suggested that rising death numbers and the loss of loved ones have been leading to chronic mental disorders. This echoes the findings of S. Das and Dutta (2021), who showed that mass quarantines and lockdowns have led to overall anxiety in online communities, and that isolation is not helping.
It is also important to understand that misleading information, misinformation, or conspiracies surrounding the disease are bound to escalate fear and anxiety levels amongst the public. The previous literature on the SARS/MERS virus outbreaks has shown that people's emotional wellbeing can be improved by keeping them well tested, medicated, and informed, which prevents incidents from escalating into mass hysteria. Such hysteria might lead people who have tested negative, who have low to mild symptoms, or who have symptoms of similar viral infections to feel high levels of uncertainty, anxiety, and concern, thereby further increasing their stress levels and those of the frontline workers serving and treating them.
This kind of mass hysteria can prompt unnecessary social and cultural discrimination, which was seen across the globe with the rise in civil unrest during the peak isolation periods. This was visible when the fluctuations in the presence of sentiments were measured, which showed clear spikes around three important time frames of COVID-19: the first was when the WHO declared COVID-19 a global pandemic, the second was when massive lockdowns were implemented around the globe, and the third was when the COVID-19 vaccines were first approved and administered. This is important to understand because if medical and administrative leaders want to be fair in finding solutions to catastrophic problems, they need to learn about the sentiment intensity of their constituents, especially those from marginalized groups. The weaker sections of society could be coerced into forgoing proper medical care, leading to a further breakdown of societies and economic development. The implications are quite severe and explicitly harmful (Akroyd et al., 2020), and could cause counterfeit emotions, or emotional contagion (Rincón-Aznar et al., 2020).
An interesting outcome of these findings was that the sentiment spikes were similar for both positive and negative sentiments around the same time, which is inconsistent with previous findings of sentiment analysis (Crocamo et al., 2021; Srivastava & Deepak, 2021). The difference can be understood in light of the findings of Eghtesadi and Florea (2020), who argued that Reddit is comparatively less polarized than counterparts such as Facebook, Twitter, and Instagram; and, as Allgaier (2016) notes, Reddit is more of an online forum, whereas the others are social network sites. This might be a reason why the spikes were consistent in their time frames and in the sentiments themselves.
The findings on emotional contagion showed a clear presence of contagion in all the posts shared on Reddit, irrespective of the sentiment polarity of the text. It was observed that, on average, over-exposure to negative or positive posts generates a similar response. There was a weak presence of anomalies in negative or positive posts, but a very strong presence of anomalies in neutral posts.
As there is now an increasingly substantial understanding that non-verbal cues and the absence of interactions can also lead to emotional contagion (Ferrara & Yang, 2015), it was important to see whether even mentions of “covid” triggered valence. The analysis confirmed that mentions of the disease drew an intense emotional reaction, which supports the idea that mentions of “covid” are an emotionally charged and behaviorally replicable phenomenon.
There was also a strong positive relationship between stimulus and response valence, suggesting a strong presence of emotional contagion in the data, which has also been reported in previous studies (S. Das & Dutta, 2021; Pano & Kashef, 2020). The relationship was also linear, which suggests that similar contagion governs both negative and positive emotions. The stimulus-response valence also indicates that high-end, high-interaction users (Valenzano et al., 2020) are more prone to feeling the emotion that is highly present in a Reddit post, regardless of its informative value.
The study was not immune to shortcomings. One clear shortcoming was the data itself. The data covered the head posts shared by the users, but did not include the comments and response threads of these posts. Despite the long time frame of the posts collected (almost a year), the absence of comment threads meant a loss of nuance in the interactions between the users. Including them would allow measuring additional context for users’ experiences, as well as the pushback against misinformation in incorrect posts or news. Future studies might examine the posts and the associated comments in their full context to identify homophily among the users and map the community structure. It must be said that these limitations are common to studies using big data, NLP, and sentiment analysis as tools of analysis, and do not affect the overall validity and soundness of this study.
Conclusion
In order to understand the impact that online discourse has on the emotional wellbeing, emotional perspective, and emotional contagion of the users, it is imperative to recognize that the average person has made social media their primary source of information, be it for news, community, advice, or any other purpose. There have been numerous studies in recent times that have tried to evaluate its impact on people’s mental health. With this study, we have tried to make the following additions to the literature. First, we have tried to find the sentiment trend of Redditors in the COVID-19 discourse on their subreddits, by using the largest extraction of post data for almost a year. Second, we have tried to determine whether emotional contagion plays a role in the posts and their corresponding responses.
By evaluating the sentiments through the word clouds and variations, this study has tried to trace the evolution of the attitudes of the people connecting on these subreddits, which adds to the growing body of work using Natural Language Processing to study COVID-19 and its impact on public life. By evaluating the anomalies and valence of the posts, the study argues for the presence of emotional contagion in all occurrences of emotional stimuli, which trigger an emotional response of similar strength. This finding is also corroborated by the previous literature.
The study was not without its limitations. The major limitation was the evaluation of just the head posts and not the corresponding replies or comments from other users. This robs the analyses of necessary context, especially when the posts are partially misleading, or otherwise. Another limitation was language, as all the posts were in English, and texts in other languages were removed during data pre-processing. Future research can focus on translating these posts in other languages to find a more comprehensive meaning in the posts shared, along with combining the posts with their responses and replies to assess the context of the content associated with the head posts. This would also help us understand how community understanding has been helping people deal with COVID-19 or navigate life in its aftermath. This would allow us to create an optimal processing strategy as the opinions and emotions surrounding the pandemic and infodemic change and evolve.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
