Abstract
This study aimed to identify and assess the prevalence of vaccine-hesitancy-related topics on Twitter in the periods before and after the Coronavirus Disease 2019 (COVID-19) outbreak. Using a search query, 272,780 tweets associated with anti-vaccine topics and posted between 1 January 2011, and 15 January 2021, were collected. The tweets were classified into a list of 11 topics and analyzed for trends during the periods before and after the onset of COVID-19. Since the beginning of COVID-19, the percentage of anti-vaccine tweets has increased for two topics, “government and politics” and “conspiracy theories,” and decreased for “developmental disabilities.” Compared to tweets regarding flu and measles, mumps, and rubella vaccines, those concerning COVID-19 vaccines showed larger percentages for the topics of conspiracy theories and alternative treatments, and a lower percentage for developmental disabilities. The results support existing anti-vaccine literature and the assertion that anti-vaccine sentiments are an important public-health issue.
Introduction
The Coronavirus Disease 2019 (COVID-19) Pandemic has dramatically influenced many people’s lives. Globally, social guidelines and restrictions have been implemented to contain the virus’ spread, and several vaccines have been authorized. However, despite increasing evidence of the vaccines’ efficacy and safety, many people remain reluctant to receive vaccination. Members of this group are known as “vaccine hesitant” or “anti-vaccinationists” (anti-vaxxers). The World Health Organization recognizes vaccine hesitancy as a major threat to global health. 1
Understanding the threat posed by anti-vaxxers on social media is critical to any vaccination program 2 as the uptake of many vaccines continues to be suboptimal. 3 With social media platforms playing a growing role in health communities, 4 health informaticians have shown growing interest in these platforms 3 to understand how they reflect anti-vaxxers’ beliefs and practices. 4 In fact, health informatics, in conjunction with machine learning and data science, has been applied in various real-life applications, including healthcare analytics. 5 The text analytic framework proposed in this study sheds new light on social media data and its potential in public health surveillance.
Social-media platforms afford information-sharing regarding vaccines; 6 however, such information can increase vaccine-hesitant behaviors and anti-vaccine sentiments. 7 Anti-vaxxers utilize social media to manipulate public emotions, promote conspiracy theories and misinformation, and create divisions among the public. 8
To obtain a holistic view of prevailing vaccine-related topics and issues, it is necessary to analyze anti-vaxxers’ discussions on social media. Such insights could inform public awareness campaigns aimed at reducing the impact of social-media-based anti-vaccine movements. Few studies have considered COVID-19-vaccine hesitancy in the context of social media, and those that have are limited in sample size and data-analysis methods.
Accordingly, this study aimed to leverage text analytics (specifically, topic-mining) to analyze negative discourse regarding COVID-19 vaccines in the US. In contrast to prior research, the current study aims to analyze shifts in the relative prevalence of various topics related to vaccine hesitancy. Specifically, we examine a relatively large dataset extracted from Twitter and seek to identify, track, and analyze topics associated with COVID-19-vaccine hesitancy and rejection. We compare the popularity of such topics before and after the onset of COVID-19, and against sentiments towards the well-known vaccines for influenza (flu) and measles, mumps, and rubella (MMR).
Background and significance
Sharing concerns about vaccination on social-media platforms could negatively affect the vaccination process. 9 Vaccine scares have been considered a major health issue over the past decade. 10 Such scares are a recent phenomenon characterized by mass-media posts that generate panic about health interventions, such as vaccines. 11 Accordingly, several studies have aimed to analyze the social-media-based vaccine movement and identify best practices and guidelines for increasing public trust in vaccines.1,9,12–14 Dhaliwal and Mannion 1 explored public perception of vaccination through analysis of social-media platforms, categorizing the obtained data into truths, consequences, and myths. The analysis showed that claims about vaccines ranged from questioning the ethics of vaccination to questioning vaccines’ benefits, that truths consisted of information supported by scientific evidence, and that autism was the main concern when it comes to vaccination. Meanwhile, Tara and Rubinstein 15 examined the major themes present in a popular anti-vaccine broadcast, identifying “they are lying to you,” “civil liberties,” “everyone is an expert,” “science will not save us,” “skew the science,” and “they are out to harm you.”
Gunaratne, Coomes, and Haghbayan 16 analyzed trends in pro-vaccine and anti-vaccine discussions on Twitter, finding a lower number of anti-vaccine than pro-vaccine tweets, and despite an increase in anti-vaccine Twitter users, no increase in anti-vaccine tweets since 2014. Massey et al. 17 characterized pro-human-papilloma-virus (HPV) and anti-HPV vaccine networks on Instagram, finding that, in contrast to pro-HPV-vaccine posts, anti-HPV-vaccine posts originated from individuals and included personal narratives. Topics among anti-HPV-vaccine posts included misinformation, vaccine debate, evidence base, and health beliefs.
Kang et al., 14 analyzing vaccine-related data on Twitter using semantic networks, found that the positive-sentiment network centered on parents and emphasized communicating health risks and benefits, while the negative-sentiment network centered on children and emphasized organizational bodies. Ruiz, Featherstone, and Barnett 7 analyzed Twitter data regarding three vaccines and identified vaccine influencers and their online communities. Sentiment analysis revealed 3 influencer communities: focusing on the dangers of childhood vaccines and showing negative sentiments; focusing on promoting vaccines and showing a neutral sentiment; and focusing on increasing and encouraging vaccination rates and showing positive sentiments.
Social media and COVID-19 vaccines
Lyu et al., 9 using Twitter data, analyzed public opinions concerning the potential of COVID-19 vaccines. They found that socioeconomically disadvantaged users held polarized opinions on vaccines, and that anti-vaccine opinions were strongest among users with the worst pandemic experience (e.g. sickness in one’s family). The major topics identified were safety, effectiveness, and politics.
Wu et al. 12 examined COVID-19-vaccine concerns by analyzing active users on Reddit, finding that the top-10 topics were skeptical/aggressive remarks, clinical trials/research/testing, life/family/kids, people/vaccine efficacy/risks, governments/big companies, symptoms/immune systems, time/long-term effects, stock market/sports, politics/news sources, and lockdown/spread/cases. Jamison et al. 18 analyzed 2000 Twitter accounts, of which 45% opposed vaccinations, finding that most of the vaccine opponents’ tweets concerned public-health topics, news topics, discussion topics, conspiracy theories, insinuation/rumors, and scams.
Bonnevie et al. 19 evaluated shifts in vaccine opposition by comparing online conversations during the 4 months before and after the COVID-19 outbreak, respectively, finding that vaccine opposition on Twitter increased by 80%. Eleven themes were identified: negative health impacts, pharmaceutical industry, policies and politics, vaccine ingredients, federal health authorities, research and clinical trials, religion, vaccine safety, disease prevalence, school, and family. Similarly, Quintana et al. 20 analyzed Twitter data for the 75 days preceding and succeeding the declaration of the COVID-19 Pandemic, respectively, finding that vaccine-related discussions increased during the pandemic, that a small community of unorthodox users was ambivalent regarding vaccines, and that the moral and non-moral language used by a number of communities suggested a trust-first model of political engagement.
Overall, few existing studies on vaccine hesitancy and anti-vaxxers have explored the changes, from the pre-COVID-19 period to the post-COVID-19 period, in the respective popularities of associated topics, or differences in discourse for different vaccines. Bonnevie et al. 19 reported only increased vaccine opposition on Twitter and more tweets concerning certain topics associated with vaccine hesitancy. Other studies only identified topics associated with vaccine hesitancy in the COVID-19 era.9,12–14,20,21 No studies have investigated changes in anti-vaccine topics during the pandemic and compared such changes across different vaccines; only one study has compared the shift in vaccine-hesitancy topics before and after the COVID-19 outbreak. 20 The present study used manual content analysis to identify the main anti-vaccine topics. The current study extends the literature by analyzing COVID-19-related anti-vaccine discourse on social media over an extended period of time, and by comparing anti-vaccine topics and trends over time across different vaccines.
Methodology
The present study’s methodology comprised three activities: collecting relevant social-media data, identifying anti-vaccine-related topics using extant literature and topic-modeling, and analyzing the identified topics (Figure 1).
Figure 1. Research methodology.
Data collection and preprocessing
Social media platforms, such as Twitter, have been widely used in health-related crises 22 and research, 23 and are considered to be the fastest and most convenient source of information24,25 for addressing these crises and understanding how populations respond to them. 22
Twitter was selected as a data source because it is commonly used by anti-vaxxers. 4 To identify relevant anti-vaccine tweets, a search query (Appendix A) was developed by reviewing the relevant literature and identifying a list of search terms that reflect the negative sentiments on vaccines that are commonly presented by anti-vaxxers. Using Brandwatch, a social-media data collection and analytics tool, we collected tweets matching the search query that were posted between 1 January 2011, and 15 January 2021 (excluding retweets and tweets with URLs). Twitter content is available from January 2011 in Brandwatch.
Next, collected tweets were processed by removing stop words, user identifiers, and hashtags. The tweets were then represented using word-level n-grams; 26 for example, “autism,” “conspiracy theories,” and “vaccines contain mercury”.
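The preprocessing and n-gram representation described above can be illustrated with a minimal sketch. It is not the study's pipeline: the stop-word list, the tokenization rule, and the function names `preprocess` and `word_ngrams` are assumptions introduced here for illustration only.

```python
import re

# Tiny illustrative stop-word list (an assumption; the study's list is not given).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "that", "this"}

def preprocess(tweet):
    """Lowercase a tweet and drop user identifiers, hashtags, and stop words."""
    text = re.sub(r"[@#]\w+", " ", tweet.lower())  # remove @user and #hashtag tokens
    tokens = re.findall(r"[a-z']+", text)          # keep alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]

def word_ngrams(tokens, max_n=3):
    """Return all word-level n-grams for n = 1..max_n, shortest first."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

tokens = preprocess("@user The vaccines contain mercury #novax")
print(tokens)               # unigram tokens left after cleaning
print(word_ngrams(tokens))  # unigrams, bigrams, and the trigram
```

With `max_n=3`, the example tweet yields the trigram “vaccines contain mercury” alongside its unigrams and bigrams, which mirrors the kind of features listed in the text.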
Topic identification and validation
We identified relevant topics by screening literature concerning the anti-vaccine movement on social media, performing topic-modeling using the latent Dirichlet allocation (LDA) algorithm,27,28 visualizing and labeling the topics from LDA using PyLDAVis 29 and t-distributed stochastic neighbor embedding (t-SNE), 30 and combining the resultant topics into one list.
Topic models are statistical models for uncovering themes from a large unstructured collection of documents.27,31 A topic model can help automatically summarize textual data and simplify manual content analysis. We optimized the LDA model using the coherence score measure. 32
Latent Dirichlet allocation requires specifying the number of topics. According to the literature, the number of topics can be determined using measures such as perplexity and coherence. 33 For LDA applications in which end-users will interact with the generated topics, coherence is considered the best measure 34 because it leads to better human interpretability of topics. 33 Perplexity, by contrast, is not stable, and LDA results selected using the perplexity measure can vary with the random seed for the same dataset. 35
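To make coherence-based selection of the number of topics concrete, the sketch below scores candidate models with the UMass coherence measure and keeps the candidate with the highest mean score. The text does not state which coherence variant was used, so UMass is an assumption chosen for its simplicity. The toy documents and the hand-written top-word lists stand in for fitted LDA models and are hypothetical.

```python
import math
from itertools import combinations

def umass_coherence(top_words, documents):
    """UMass coherence of one topic: sum of log((D(wi, wj) + 1) / D(wi)) over
    word pairs, where wi is ranked above wj and D counts containing documents."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        higher, lower = top_words[i], top_words[j]
        denom = doc_freq(higher)
        if denom == 0:
            continue  # skip words that never occur in the reference documents
        score += math.log((doc_freq(higher, lower) + 1) / denom)
    return score

def mean_coherence(topics, documents):
    """Average coherence over all topics of one candidate model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)

def pick_num_topics(candidates, documents):
    """candidates maps a topic count k to the top-word lists of a k-topic model."""
    return max(candidates, key=lambda k: mean_coherence(candidates[k], documents))

# Hypothetical toy corpus and hand-written "model outputs" for illustration.
docs = [["vaccine", "autism", "mmr"], ["vaccine", "autism"],
        ["government", "mandate"], ["government", "mandate", "freedom"]]
candidates = {
    2: [["autism", "vaccine"], ["government", "mandate"]],
    3: [["autism", "mandate"], ["government", "vaccine"], ["freedom", "mmr"]],
}
print(pick_num_topics(candidates, docs))
```

In practice each entry of `candidates` would come from an LDA model fitted with that number of topics; the selection step itself stays the same.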
To label the topics, the LDA results were visualized using PyLDAVis and t-SNE. The labeling process was based on the 30 most relevant terms returned in the visualization and their estimated overall term frequency within each topic. To ensure the validity and consistency of the topic labels, two independent researchers labeled the topics. Inter-rater reliability (kappa statistic) 36 was evaluated to ensure that the researchers assigning topic labels would eventually obtain similar evaluations.
The final list of topics was generated by merging the list of topics from the literature review and the results of the topic-modeling. This merge involved comparing the listed topics and their meanings and synthesizing the topics into a final list of high-level topics. This process was conducted by one researcher and validated by another.
Topic analysis
The final list of high-level topics was used for analyzing the collected tweets; this was performed using the ReadMe algorithm. 37 The ReadMe algorithm is a supervised learning algorithm that requires sample tweets (training data) to be manually labeled into a list of predefined topics. ReadMe is an automated nonparametric content analysis method 37 that is widely used in social science applications where the interest is to determine the aggregate proportion of all documents that belong to predefined categories. 38 ReadMe estimates the “aggregated distribution of opinions” instead of focusing on individual classification of each single text. 37 ReadMe builds a “word-profile of each category” based on the training data set, then the text of the training data set is compared to these profiles, and then a fit estimation for each category is generated for test data. 38
The algorithm is practical for analysis aiming to show how tweets spread across different topics, and provides an unbiased text classification when compared to traditional supervised learning techniques. 37 We trained the ReadMe algorithm by manually labeling a sample set of tweets from each predefined topic, and then used the trained model to analyze the entire collection of tweets.
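The idea of estimating aggregate topic proportions directly, rather than classifying each tweet, can be sketched as follows. This is a simplified illustration and not the ReadMe package: it uses one fixed set of feature words and a plain exponentiated-gradient solver, both of which are assumptions made here. The sketch estimates the distribution of word profiles within each labeled category, P(S|D), measures the profile distribution in the unlabeled set, P(S), and solves P(S) = P(S|D) P(D) for the category shares P(D).

```python
import math
from collections import Counter

def profile(tokens, features):
    """Binary word profile: which feature words occur in the document."""
    present = set(tokens)
    return tuple(int(f in present) for f in features)

def profile_dist(docs, features, profiles):
    """Relative frequency of each profile among the given documents."""
    counts = Counter(profile(d, features) for d in docs)
    return [counts[p] / len(docs) for p in profiles]

def estimate_proportions(train, test, features, steps=500, lr=0.5):
    """Estimate category shares in `test` by fitting P(S) = P(S|D) P(D)."""
    categories = sorted({label for _, label in train})
    profiles = sorted({profile(d, features) for d, _ in train}
                      | {profile(d, features) for d in test})
    # P(S|D): profile distribution within each hand-labeled category.
    cond = [profile_dist([d for d, lab in train if lab == c], features, profiles)
            for c in categories]
    target = profile_dist(test, features, profiles)  # P(S) in the unlabeled set
    pi = [1.0 / len(categories)] * len(categories)   # start from equal shares
    for _ in range(steps):
        fitted = [sum(pi[k] * cond[k][s] for k in range(len(pi)))
                  for s in range(len(profiles))]
        resid = [f - t for f, t in zip(fitted, target)]
        grad = [2 * sum(r * c for r, c in zip(resid, cond[k]))
                for k in range(len(pi))]
        # Exponentiated-gradient step keeps shares non-negative and summing to one.
        pi = [p * math.exp(-lr * g) for p, g in zip(pi, grad)]
        total = sum(pi)
        pi = [p / total for p in pi]
    return dict(zip(categories, pi))

# Hypothetical labeled and unlabeled examples for illustration.
train = [(["autism", "vaccine"], "developmental disabilities"),
         (["autism", "mmr"], "developmental disabilities"),
         (["government", "mandate"], "government and politics"),
         (["government", "control"], "government and politics")]
test = [["autism", "kids"], ["autism", "shot"], ["autism"], ["government", "lies"]]
shares = estimate_proportions(train, test, ["autism", "government"])
print(shares)
```

Because the output is a vector of shares over the whole collection, no individual tweet receives a label, which matches the aggregate focus described in the text.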
A sample of 110 tweets was used to assess the manual-labeling process and ensure the reliability and consistency of the manual training process for the ReadMe algorithm. Two researchers independently assigned labels to each tweet based on the obtained topics from the “topic identification and validation” step. The kappa statistic was again used as a measure of inter-rater reliability. 36
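For readers unfamiliar with the kappa statistic, the following minimal sketch computes Cohen's kappa for two raters from first principles: observed agreement corrected for the agreement expected by chance. The label lists are hypothetical and are not the study's annotations.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels for ten tweets from two raters.
rater_1 = ["disabilities"] * 5 + ["politics"] * 5
rater_2 = ["disabilities"] * 4 + ["politics"] * 6
print(round(cohen_kappa(rater_1, rater_2), 2))
```

Here the raters agree on 9 of 10 tweets and chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.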
Based on the results from the ReadMe algorithm, we completed the following analyses: First, we analyzed the distribution of tweets over topics and time. Second, we analyzed the distribution of tweets across different topics by considering tweets before and after 1 February 2020 (February 2020 was chosen because the US Centers for Disease Control confirmed the first US COVID-19 case on 21 January 2020). 21 Third, we analyzed the distribution of tweets across different topics for seasonal influenza, MMR, and COVID-19 vaccines, respectively. The tweets concerning each vaccine were identified by filtering the data based on vaccine-name-related keywords. The keywords for seasonal influenza were (flu OR Influenza); those for measles, mumps, and rubella were (MMR OR MPR OR MMRV OR measle* OR mump* OR rubella); and those for COVID-19 were (coron* OR covid* OR “chinesevirus” OR “china virus” OR “wuhanvirus” OR “SARS-CoV-2”).
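The keyword filters and the before/after split can be approximated with regular expressions, as in the sketch below. Two points are assumptions rather than details from the study: the `*` wildcard is treated as "any trailing word characters", and a tweet posted on 1 February 2020 itself is counted as "after". The names `VACCINE_PATTERNS`, `vaccines_mentioned`, and `period` are hypothetical.

```python
import re
from datetime import date

# Regex translations of the keyword queries quoted in the text.
VACCINE_PATTERNS = {
    "flu": re.compile(r"\b(?:flu|influenza)\b", re.IGNORECASE),
    "MMR": re.compile(r"\b(?:mmr|mpr|mmrv|measle\w*|mump\w*|rubella)\b", re.IGNORECASE),
    "COVID-19": re.compile(
        r"\b(?:coron\w*|covid\w*|chinesevirus|china virus|wuhanvirus|sars-cov-2)\b",
        re.IGNORECASE),
}
CUTOFF = date(2020, 2, 1)  # split point used for the before/after comparison

def vaccines_mentioned(text):
    """Return the vaccine groups whose keywords appear in the tweet text."""
    return [name for name, pattern in VACCINE_PATTERNS.items() if pattern.search(text)]

def period(posted):
    """Assign a posting date to the 'before' or 'after' period (cutoff day is 'after')."""
    return "before" if posted < CUTOFF else "after"

# Hypothetical tweets for illustration.
tweets = [(date(2019, 11, 3), "The flu shot never works"),
          (date(2020, 12, 20), "covid19 vaccine is a government plot"),
          (date(2015, 2, 1), "Measles outbreak blamed on the unvaccinated")]
for posted, text in tweets:
    print(period(posted), vaccines_mentioned(text))
```

A tweet can match more than one vaccine group, so the function returns a list rather than a single label.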
Results
The search query returned 272,780 tweets posted by 125,461 Twitter users.
Topic identification and validation
Anti-vaccine topics from the literature.
Latent Dirichlet allocation optimization yielded, based on the coherence score, optimal parameter values for 48 topics (Figure 2).
Figure 2. Optimal number of topics based on coherence score.
The LDA model results were visualized using PyLDAVis and t-SNE (Figure 3) and analyzed by two independent researchers.
Figure 3. Topic visualization and analysis through PyLDAVis using t-distributed stochastic neighbor embedding.
Anti-vaccine topics identified through topic-mining.
Combination of anti-vaccine topics identified through literature and topic-mining.
Topic analysis
When labeling the tweets for training the ReadMe algorithm, after several iterations and enhancements in the assigned labels, we achieved a kappa statistic of 0.80, representing substantial agreement among the raters. 36
The trends in the volume of tweets for each topic are shown in Figure 4. Overall, 35% of the total tweets referenced “developmental disabilities,” followed by “government and politics” (21%) and “conspiracy theories” (13%). The remaining topics each represented less than 10% of the total tweets. “Developmental disabilities” has been consistently discussed over the years; however, from March 2020 there was a decrease in the number of such tweets and an increase in the number of tweets discussing “government and politics” and “conspiracy theories.” Appendix B shows a list of categories and example tweets classified by the ReadMe algorithm.
Figure 4. Volume of tweets across topics between 1 January 2011 and 15 January 2021.
Figure 5 shows a comparison of the percentages of tweets for each topic before and after 1 February 2020. There were 196,200 anti-vaccine tweets before 1 February 2020, and 90,092 tweets afterwards. All but three topics (“government and politics,” “conspiracy theories,” and “developmental disabilities”) showed similar percentages of tweets before and after 1 February 2020. After 1 February 2020, “government and politics” and “conspiracy theories” showed increased percentages, while “developmental disabilities” showed a decreased percentage. Appendix C shows the changes in the percentages of tweets concerning flu, MMR, and all vaccines, respectively, by topic over time. Most topics show, for all three vaccine types, a similar pattern; this indicates that the COVID-19 outbreak did not significantly impact overall trends in flu and MMR discourse.
Figure 5. Percentages of tweets for each topic before/after 1 February 2020.
Filtering by vaccine, we identified 13,189, 7,225, and 4,371 tweets challenging COVID-19, MMR, and flu vaccines, respectively. “Side-effects,” “pharma industry,” and “civil rights/freedom” showed similar percentages of tweets across the three vaccines (Figure 6). However, for “nature is better,” “government and politics,” “conspiracy theories,” and “alternative treatments” the highest percentages were for COVID-19, followed by flu and MMR, respectively. Additionally, for “effectiveness and efficiency” and “chemical/non-natural” the highest percentages were for flu, followed by MMR and COVID-19, respectively. Finally, for “developmental disabilities” the highest percentage was for MMR, followed by flu and COVID-19, respectively.
Figure 6. Percentages of tweets regarding flu, MMR, and COVID-19 vaccines between 1 January 2011 and 15 January 2021, across different topics. MMR: measles, mumps, and rubella.
Appendix D shows the distribution of tweets for each year across different topics. The distribution was similar across all vaccines except for flu in 2013, when the percentage of flu-related tweets regarding “effectiveness and efficacy” increased significantly and the percentage regarding “government and politics” decreased. The percentages of tweets concerning “developmental disabilities” remained generally consistent from 2011 to 2019. From 2020 (marking the introduction of COVID-19), the percentages of tweets regarding “government and politics” and “conspiracy theories” increased.
Anti-vaccine tweets regarding COVID-19 and flu from Jan 2020 to Jan 2021.
Discussion
This study’s findings support the literature45–49 on vaccine hesitancy and the assertion that, despite advancements in vaccine development and the demonstrated efficacy and safety of vaccines, anti-vaccine sentiment remains an important issue. We identified several topics associated with vaccine hesitancy. These topics generally accord with those mentioned in prior research, such as safety/risk and politics,9,20,21 efficacy/effectiveness,20,21 governments/big companies,9,21 immune system, 21 vaccine ingredients and religion, 9 conspiracy theories, 12 side-effects,6,41 developmental disabilities,1,43 nature is better and alternative treatments,5,19 religion/ethics,6,9 and civil liberties/freedom. 50
During the COVID-19 Pandemic, of the anti-vaccine tweets concerning the topics “government and politics,” “conspiracy theories,” and “nature is better,” the highest percentages related to COVID-19 vaccines when compared to flu and MMR vaccines. This is not surprising. In the US, the debate on vaccinations has long been highly politicized. 51 In 2015, health-care specialists condemned Republican candidates for advancing inaccurate perceptions of vaccines. 52 Furthermore, the present results support existing findings that attitudes toward vaccines are influenced by political beliefs 45 and public mistrust of governments’ pandemic responses.53,54
Tweets promoting conspiracy theories39,55 and misinformation 56 are considered a threat to public vaccine acceptance. 55 Many vaccine-related conspiracy theories have emerged during the COVID-19 Pandemic. For example, some believe that COVID-19 vaccines were developed to implant nano-chips in people’s bodies so that people can be controlled through 5G technology, 46 that vaccines are a tool by which governments can gain political control, 54 and that mRNA-based COVID-19 vaccines can permanently change human DNA. 47
Our results also revealed relatively high percentages of “alternative treatments” and “nature is better” tweets regarding COVID-19 vaccines when compared to flu and MMR vaccines. Several factors may explain this. For example, until December 2020 there were no approved COVID-19 vaccines or treatments, and people began considering natural and traditional medicines with known safety profiles. 48 Regarding the increase in “nature is better” tweets, some anti-vaxxers have disseminated a “natural immunity” theory 49 that suggests that the human body can treat itself without vaccines.
The percentage of tweets on developmental disabilities fell after February 2020. This is consistent with the analysis of the distribution of tweets across the different vaccines, which showed that the percentage of tweets on developmental disabilities was highest for MMR when compared to flu and COVID-19. The high level of discussion on developmental disabilities before February 2020 also aligns with the literature. Such popularity is mainly due to discussions on the side-effects of MMR vaccines and the risk of developmental disabilities such as autism1,49 and epilepsy. 1 A link between MMR and disabilities was considered the main social-media topic for anti-vaxxers prior to COVID-19. 42
The present results highlight the need for government and health-care agencies to increase transparency in policy development and decision-making before and after the introduction of COVID-19 vaccines. There is also a need to provide updated information to the public on how vaccines are developed and tested before they are administered to the general population. 57 According to Kennedy, 58 vaccine hesitancy and political populism are driven by similar dynamics, characterized by public mistrust of politicians and experts. Thus, it is necessary to build trust between these parties and anti-vaxxers and address the issues underlying vaccine hesitancy.
Limitations and future work
While this study emphasized changes in percentages of tweets on certain topics over time and across vaccines, there is a need for a separate analysis of the number of unique Twitter users who post across different topics. Such analysis could clarify whether changes in volume are related to changes in the number of active users or in the number of tweets per user. Further, analyzing Twitter threads that comprise a main tweet and replies could clarify the structure of vaccine-related discourse on Twitter. This study examined tweets from US users; future studies could compare our findings with those for users of other nationalities. Further research could also investigate the impact that the 2020 US Presidential Election and the associated increase in political discourse on Twitter had on COVID-19-vaccine skepticism, as well as the impact of other controversial issues. Such controversial issues must be identified, and the understanding of their impact on tweeting activity must be improved.
Conclusion
In this study, we sought to understand the drivers of hesitancy towards COVID-19 vaccines in the US. We compared tweet-based topics along two dimensions: before and after the onset of COVID-19, and across 3 vaccines (flu, MMR, and COVID-19). Using text analytics, we identified trending themes and topics of public concern regarding vaccine hesitancy in general and COVID-19 vaccines in particular.
Overall, the threat of the COVID-19 Pandemic did not cause the anti-vaccine discussion to shrink, but to shift, with the prevalence of “conspiracy theories,” “government and politics,” and “alternative treatments”/“nature is better” tweets increasing. Discussion of vaccines’ effect on developmental disabilities declined after the outbreak. The variations between before and after the COVID-19 outbreak regarding the main reasons people oppose vaccines or become vaccine-hesitant show that vaccine hesitancy is context-, time-, place-, and vaccine-specific, 54 and that it rises in prevalence when a pandemic occurs. 59 Such variation could be attributed to pivotal events such as pandemic outbreaks, government responses, and related discourse on social-media platforms. Moreover, the present findings support the theory that vaccine hesitancy has various causes based on several factors. These factors can be grouped into environmental/external, agent/vaccine-specific, and host-specific. 57
The significance and implications of this research transcend the COVID-19 Pandemic by demonstrating the importance of social-media mining and its potential for supporting public-health-related policies and decisions. Government officials and decision-makers could tailor and fine-tune public awareness campaigns and prioritize policy interventions to increase vaccine acceptance.
Supplemental Material
Supplemental Material for A comparative analysis of anti-vax discourse on twitter before and after COVID-19 onset by Tareq Nasralah, Ahmed Elnoshokaty, Omar El-Gayar, Mohammad Al-Ramahi and Abdullah Wahbeh in Concurrent Engineering.
Footnotes
Acknowledgements
The authors would like to thank Dakota State University for generously supporting this research by providing access to the BrandWatch platform.
Author Contributions
All authors have contributed to the writing of this research paper.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
Supplemental Material
Supplemental material for this article is available online.
