Abstract
Objective
The present study aims to examine the threshold of coronavirus disease 2019 (COVID-19) vaccine hesitancy over time and public discourse around COVID-19 vaccination hesitancy.
Methods
We collected 3,952 questions and 66,820 answers regarding COVID-19 vaccination posted on the social question-and-answer website Quora between June 2020 and June 2021 and employed Word2Vec and Sentiment Analysis to analyze the data. To examine changes in the perceptions and hesitancy about the COVID-19 vaccine, we segmented the data into 25 bi-weekly sections.
Results
As positive sentiment about vaccination increased, the number of new vaccinations in the United States also increased until it reached a ceiling point. The vaccine hesitancy phase was identified by the decrease in positive sentiment from its highest peak. Words that occurred only when the positive answer rate peaked (e.g., safe, plan, best, able, help) helped explain factors associated with positive perceptions toward vaccines, and the words that occurred only when the negative answer rate peaked (e.g., early, variant, scientists, mutations, effectiveness) suggested factors associated with vaccine hesitancy. We also identified a period of vaccine resistance, where people who decided not to be vaccinated were unlikely to be vaccinated without further enforcement or incentive.
Conclusions
Findings suggest that vaccine hesitancy occurred because concerns about vaccine safety were high due to a perceived lack of scientific evidence and because public trust in healthcare authorities had been seriously undermined. Considering that vaccine-related conspiracy theories and fake news prevailed in the absence of reliable information sources, restoring public trust in healthcare leaders will be critical for future vaccination efforts.
Introduction
Since its identification in December 2019, coronavirus disease 2019 (COVID-19) has had a detrimental effect on many aspects of human life around the world. More than 6 million people have died of the disease, and the number of confirmed cases exceeded 540 million as of June 2022. 1 The healthcare system has been overloaded,2,3 and many people have experienced financial hardship4,5 and depression. 6
Thirteen COVID-19 vaccines have been developed, 7 and some (i.e., those manufactured by Pfizer-BioNTech, Moderna, and Janssen) have been approved by the U.S. Food and Drug Administration (FDA) as safe and highly effective in preventing serious illness and death from severe acute respiratory syndrome coronavirus 2. 8 Enough COVID-19 vaccine has been produced to provide adequate distribution in most developed countries. Although the COVID-19 vaccination rate has been gradually increasing globally, 9 more than one-quarter of American adults (26.3%), 10 and about one-fifth of people in France and Germany 11 report COVID-19 vaccine hesitancy. A cross-country comparison study revealed that slightly more than one-third of people in India, Bangladesh, Nigeria, and Sudan reported COVID-19 vaccine hesitancy. 12 Women,11,12 people with lower incomes and less education11–13 and individuals who identify as racial/ethnic minorities10,14,15 report higher COVID-19 vaccine hesitancy than their counterparts who are male, have higher incomes and education, and do not identify as racial/ethnic minorities. Additionally, those who mistrust the medical system and government, and have insufficient information are more likely to be hesitant about receiving the COVID-19 vaccine.16–19
As such, most previous studies on COVID-19 vaccination have focused on explaining differences in vaccine hesitancy based on individual characteristics, including socioeconomic status, race/ethnicity, and perception of the medical system. However, it could also be helpful to compare public sentiment toward COVID-19 vaccination with the actual vaccination rate to better understand uptake of the COVID-19 vaccine. This knowledge could be used to help guide attempts to increase COVID-19 vaccination rates.
Social media can be an appropriate source for identifying the public's perceptions and attitudes about COVID-19 vaccines.20,21 Therefore, by analyzing big data from Quora, one of the most frequently used global social media platforms, 22 we aimed to (a) discern the evolution of COVID-19 vaccine hesitancy over time and (b) examine discourse around COVID-19 vaccination hesitancy in terms of public attention.
Methods
The present study attempted to identify points in time at which public sentiment toward COVID-19 vaccine switched from more negative to more positive (or vice versa) and what kind of issues influenced those changes of public sentiment. We employed Word2Vec, a machine learning technique, and sentiment analysis (SA), an artificial intelligence technique for natural language processing (NLP), to analyze big data generated by online discourse among the public.
Data
We collected data from Quora, a community-driven question-and-answer (Q&A) website launched in 2010, which provides an excellent outlet for self-expression and sharing “the wisdom of crowds”. 23 To compare user sentiments 6 months before and after the first COVID-19 vaccine became available on 11 December 2020, we collected all data regarding the COVID-19 vaccine posted between June 2020 and June 2021. We used Selenium WebDriver, an automated testing framework, to crawl the data 24 with the query term “vaccine”. We then pre-processed the crawled data: the text was split into tokens (tokenization) and normalized through lemmatization and lower-casing. Questions and answers that did not contain “COVID” or “COVID-19” were removed. After pre-processing, the data set comprised 3,952 questions and 66,820 answers. To examine changes in Quora user perceptions and hesitancy about the COVID-19 vaccine, we segmented the data into 25 bi-weekly sections.
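The filtering and segmentation steps described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the sample posts, the start date of 1 June 2020, and the helper names are assumptions, and lemmatization is omitted to keep the sketch dependency-free.

```python
from datetime import date

# Hypothetical sample posts as (posting date, text); real data would come from the crawl.
posts = [
    (date(2020, 6, 3), "Will a COVID-19 vaccine be ready this year?"),
    (date(2020, 6, 20), "Is the flu vaccine safe for children?"),
    (date(2020, 12, 15), "I got my covid vaccine today."),
]

START = date(2020, 6, 1)  # assumed first day of the observation window
PERIOD_DAYS = 14          # length of one bi-weekly section


def mentions_covid(text):
    """Keep only posts that mention COVID (case-insensitive; also matches COVID-19)."""
    return "covid" in text.lower()


def tokenize(text):
    """Lower-case the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


def section_index(day):
    """Zero-based index of the bi-weekly section that contains `day`."""
    return (day - START).days // PERIOD_DAYS


# Map each section index to the tokenized posts that fall inside it.
sections = {}
for day, text in posts:
    if mentions_covid(text):
        sections.setdefault(section_index(day), []).append(tokenize(text))
```

In this sketch the flu-vaccine post is dropped by the filter, and the two remaining posts fall into different sections. Note that this simple tokenizer splits “COVID-19” into two tokens; a production pipeline would likely handle such terms explicitly.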
To compare trends in vaccine sentiment with COVID-19 vaccination uptake, we retrieved COVID-19 vaccination data from Our World in Data (https://ourworldindata.org/). Rates of COVID-19 vaccination, defined as people who received at least one COVID-19 vaccine dose in the United States and worldwide, were obtained for the dates of 11 December 2020, to 27 June 2021.
Analysis
The analysis involved two major approaches. First, we used the Word2Vec model, implemented in Python version 3.8.5, to find words semantically close to the keyword “vaccine” in the collected data. This model provides an efficient method for analyzing word associations and semantic meanings from a large amount of unstructured text. 25 To identify temporal changes in the patterns of words semantically similar to the keyword (i.e., vaccine), we adopted the skip-gram model for Word2Vec because it is useful for predicting the words that surround a keyword in diverse contexts. 25 Additionally, we employed negative sampling, which learns a representation of each word from the contexts in which it occurs. Words that share many contexts receive similar representations; negative sampling is therefore effective in accurately representing the similarity between words. 26
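For readers unfamiliar with the technique, the skip-gram objective with negative sampling is commonly written as below. This is the standard formulation from the Word2Vec literature, not a formula taken from the present study; the symbols are introduced here for illustration. For a keyword $w_I$ and an observed context word $w_O$, with input vectors $v$, output vectors $v'$, the logistic function $\sigma$, and $k$ negative words drawn from a noise distribution $P_n(w)$, the model maximizes:

```latex
\log \sigma\!\left( {v'_{w_O}}^{\top} v_{w_I} \right)
+ \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
  \left[ \log \sigma\!\left( -{v'_{w_i}}^{\top} v_{w_I} \right) \right]
```

The first term pulls the vectors of words that co-occur closer together, and the second pushes the keyword away from randomly sampled words, which is why words that share many contexts end up with similar vectors.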
Second, we used SA, which allows big data analysis of people's attitudes, opinions, and emotions toward events, individuals, or topics on social media. 27 Through SA, we measured people's perceptions and emotions regarding the COVID-19 vaccine and vaccination from the big data provided by Quora. We employed a transformer-based model for SA because such models are state-of-the-art NLP methods for classification and generation tasks. 28 The present study used BERT (Bidirectional Encoder Representations from Transformers). The model's performance can vary depending on the pre-training data because embedding in vector spaces is sensitive to the data on which the model was trained. Therefore, the pre-training data should match the purpose and area of research. Because our data are from Quora, a social Q&A platform, we needed a model pre-trained on social media-related data. Since “Bert-tweet” was pre-trained on Twitter data and outperformed several previous models, 29 we based our model on the pre-trained “Bert-tweet.”
Results
This section reports our findings as follows: (a) proportion of positive, negative, and neutral answers regarding COVID-19 vaccine questions posted on Quora before and after vaccine availability, (b) vaccine hesitancy and acceptance phases (i.e., peaks and troughs in positive sentiment toward COVID-19 vaccination) during the 1-year observation period and how these phases correspond with vaccine uptake, and (c) results from the Word2Vec analysis that identified words associated with the two phases of vaccine hesitancy and acceptance.
Figure 1 presents the positivity, negativity, and neutrality of the 66,820 answers to questions about the COVID-19 vaccine posted on Quora from June 2020 to June 2021. Before the availability of a COVID-19 vaccine (indicated by the vertical yellow line), the proportions of positive and negative answers fluctuated. However, after the vaccine release, the overall rate of negative answers decreased, while the rate of positive answers increased. Rates of neutral answers remained stable throughout the period. Note that positive sentiment outweighed negative sentiment after 23 April 2021.

Figure 1. Prevalence of positive, negative, and neutral answers about coronavirus disease 2019 (COVID-19) vaccine (June 2020 to June 2021).
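The proportions plotted in Figure 1, and the point after which positive sentiment outweighs negative sentiment, can be computed along the following lines. This is a sketch under stated assumptions: the labels are made-up examples rather than the study's data, and the function names and the "sustained crossover" rule are illustrative choices, not the authors' documented procedure.

```python
from collections import Counter

# Hypothetical sentiment labels for four consecutive bi-weekly sections.
labels_by_section = [
    ["positive", "positive", "negative", "neutral"],
    ["negative", "negative", "positive", "neutral"],
    ["positive", "negative", "neutral", "neutral"],
    ["positive", "positive", "negative", "neutral"],
]


def sentiment_shares(labels):
    """Share of positive, negative, and neutral labels in one section."""
    counts = Counter(labels)
    total = len(labels)
    return {name: counts[name] / total for name in ("positive", "negative", "neutral")}


def sustained_crossover(shares):
    """Index of the first section from which positive exceeds negative
    in every remaining section, or None if there is no such section."""
    start = None
    for i, share in enumerate(shares):
        if share["positive"] > share["negative"]:
            if start is None:
                start = i
        else:
            start = None  # the lead was lost, so reset the candidate
    return start


shares = [sentiment_shares(labels) for labels in labels_by_section]
threshold = sustained_crossover(shares)
```

With these example labels, positive sentiment leads in the first section, loses the lead in the next two, and regains it for good in the last one, so only the last section counts as the threshold.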
Figure 2 depicts trends in positive sentiment toward COVID-19 vaccination over the same 1-year period and how such trends related to COVID-19 vaccination rates both in the United States and globally. Positive percent represents the rate of positive sentiment expressed in Quora answers as a percentage. The US new vaccination rate indicates the number of new COVID-19 vaccinations administered every 2 weeks (per million Americans). The US vaccination rate depicts the cumulative total US vaccination rate as a percentage. The world new vaccination rate indicates the number of new COVID-19 vaccinations administered every 2 weeks (per million people worldwide). Plotting these rates over time highlights several peaks and troughs in vaccine sentiment that directly correspond with the rate of new COVID-19 vaccination in the United States. As seen in Figure 2, there were four positive peak points, which indicate an increase in the public's positive perception of the vaccine, and four troughs, which indicate a decrease in the public's positive perception of the vaccine, before the new US COVID-19 vaccination rate reached its initial ceiling point.

Figure 2. Trends of positive sentiment and coronavirus disease 2019 (COVID-19) vaccination rates (June 2020 to June 2021).
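One simple way to locate peaks and troughs in a series such as the positive percent is to look for local maxima and minima. The text does not state how the peak and trough points were identified, so the rule below (strictly higher or lower than both neighbours) and the sample values are assumptions made for illustration.

```python
def peaks_and_troughs(series):
    """Return (peaks, troughs) as lists of indices of strict local maxima and minima."""
    peaks, troughs = [], []
    for i in range(1, len(series) - 1):
        previous, current, following = series[i - 1], series[i], series[i + 1]
        if current > previous and current > following:
            peaks.append(i)
        elif current < previous and current < following:
            troughs.append(i)
    return peaks, troughs


# Hypothetical positive-sentiment percentages for seven bi-weekly sections.
positive_percent = [30.0, 35.0, 31.0, 33.0, 29.0, 36.0, 34.0]
peaks, troughs = peaks_and_troughs(positive_percent)
```

The first and last sections are never classified because they lack a neighbour on one side; noisy series may also call for smoothing before applying such a rule.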
Additionally, we were able to demarcate a vaccine hesitancy phase and a vaccine resistance phase in the United States. The vaccine hesitancy phase ran from 11 December 2020 to 23 April 2021 and can be identified by the decrease in positive sentiment from its highest peak. After new vaccinations peaked, we refer to the trend as vaccine resistance rather than vaccine hesitancy, because those who had not been vaccinated by the ceiling point are likely to have decided against vaccination and are unlikely to be vaccinated without further enforcement or incentive. The vaccine resistance phase began when new US vaccinations reached their ceiling point on 24 April 2021 (i.e., the point at which the number of new vaccinations per million stopped increasing). Note that the ceiling point of new COVID-19 vaccinations occurred when 27.46% of the US population had been vaccinated. Interestingly, the hesitancy threshold, the point at which positive sentiment outweighed negative sentiment (23 April 2021), preceded the ceiling point of new vaccinations (9397.9 per million). In other words, as positive sentiment about vaccination increased, the number of new vaccinations also increased until it reached the ceiling point.
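The ceiling point and the two phases can be expressed compactly. In this sketch the ceiling is taken to be the section with the highest number of new vaccinations per million, which is one reading of "stopped increasing"; the numbers are hypothetical and the variable names are not from the study.

```python
# Hypothetical new vaccinations per million for six bi-weekly sections
# following vaccine availability.
new_vaccinations_per_million = [1200.0, 3400.0, 6100.0, 9000.0, 7200.0, 5100.0]

# The ceiling point is assumed to be the section with the maximum value.
ceiling_index = max(
    range(len(new_vaccinations_per_million)),
    key=lambda i: new_vaccinations_per_million[i],
)

# Sections before the ceiling belong to the hesitancy phase;
# the ceiling section and everything after it belong to the resistance phase.
phases = [
    "resistance" if i >= ceiling_index else "hesitancy"
    for i in range(len(new_vaccinations_per_million))
]
```

If the series has several equal maxima, `max` returns the first one, so the resistance phase would start at the earliest such section.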
Table 1 presents the results of the Word2Vec analysis. It lists the 15 words semantically closest to “vaccine” that appeared only at the peaks or only at the troughs of positive answers. In other words, the present study excluded words that appeared at both peaks and troughs. The rank of words is determined by cosine similarity, which represents semantic closeness to the word “vaccine”.
Table 1. Top 15 semantically close words with “vaccine” in peaks and troughs of positive answers.
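The ranking and exclusion logic behind Table 1 can be illustrated with a small, self-contained example. The two-dimensional vectors below are invented for readability (real Word2Vec vectors have many more dimensions), and the order of operations, ranking first and then dropping shared words, is an assumption, since the text does not specify it.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_neighbors(embeddings, keyword, n):
    """The n words most similar to `keyword`, most similar first."""
    target = embeddings[keyword]
    scored = [
        (word, cosine_similarity(target, vector))
        for word, vector in embeddings.items()
        if word != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:n]]


# Hypothetical embeddings trained separately on peak and trough sections.
peak_embeddings = {
    "vaccine": [1.0, 0.0], "safe": [0.9, 0.1], "dose": [0.8, 0.3], "plan": [0.6, 0.6],
}
trough_embeddings = {
    "vaccine": [1.0, 0.0], "variant": [0.9, 0.2], "dose": [0.7, 0.2], "early": [0.5, 0.6],
}

peak_words = top_neighbors(peak_embeddings, "vaccine", 3)
trough_words = top_neighbors(trough_embeddings, "vaccine", 3)

# Drop words that are close to the keyword in both peaks and troughs.
shared = set(peak_words) & set(trough_words)
peak_only = [word for word in peak_words if word not in shared]
trough_only = [word for word in trough_words if word not in shared]
```

Here “dose” is close to “vaccine” in both settings and is therefore excluded, leaving only the words that distinguish peaks from troughs.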
The 15 words semantically closest to “vaccine” that occurred only when the positive answer rate peaked (e.g., safe, plan, able, choice, death, new) are presented in column one of Table 1. Representative “quotes” (paraphrased due to copyright protection) that clarify how these 15 individual words were used in context are provided in Table 2.
Table 2. Example “quotes” containing words observed at peaks of positive answers.
COVID-19: coronavirus disease 2019; FDA: Food and Drug Administration.
These comments expressing positive sentiment dispel distrust of the vaccines and argue that vaccination is safe, without serious side effects, helpful, and the best choice to defeat COVID-19. They emphasize that, if the vaccination process is successful and herd immunity is achieved, COVID-related deaths can be greatly reduced, and people can return to normal daily life.
In contrast, the 15 words semantically closest to “vaccine” that occurred only during the troughs (e.g., early, variant, scientists, booster, cost, information, evidence) are presented in column two of Table 1. Representative “quotes” (paraphrased due to copyright protection) that clarify how these 15 individual words were used in context are provided in Table 3.
Table 3. Example “quotes” containing words observed at troughs of positive answers.
COVID-19: coronavirus disease 2019; FDA: Food and Drug Administration.
These comments expressing negative sentiment highlight the confusion and concerns caused by less reliable and early data on COVID-19 vaccination. Some comments revealed their information needs, acknowledgment of fake news, and recognition of the need to stop the spread of fake news. They also shed light on doubts about whether vaccination can defeat the variants, distrust of vaccine experts, concerns about the side effects of vaccination, and the negative consequences of former President Trump's COVID-19 vaccine policy.
Discussion
The present study attempts to identify what influenced COVID-19 vaccine hesitancy and acceptance by analyzing public sentiment toward COVID-19 vaccination on social media.
By analyzing 66,820 answers to questions about the COVID-19 vaccine asked between June 2020 and June 2021 on Quora, we found that positive sentiment increased and negative sentiment decreased after COVID-19 vaccines became available. As public opinion toward vaccination grew more positive, the number of people (per million) who were newly vaccinated increased as well until the ceiling was reached. The fall in positive sentiment from its high point marks the vaccine hesitancy phase. Continued negative sentiment in discourse about the COVID-19 vaccine, even after the vaccine was widely available, suggests that additional support or policy by governments is needed to promote continued uptake of COVID-19 vaccination.
Compared with previous studies of social media data that dealt with short texts in microblogs such as Twitter, we were able to observe more specific and detailed public perceptions regarding vaccination by analyzing long arguments and discourse data on Quora. Positive sentiment adjectives such as safe, best, able, plan, and clear indicate that people who accepted the vaccines’ safety desired to improve their daily lives through immunization. However, the words associated with negative sentiments (e.g., variant, mutations) imply that one of the main concerns contributing to vaccine hesitancy is the lack of efficacy against variants. Further, the ceiling point of new COVID-19 vaccinations, which occurred when 27.46% of the US population had been vaccinated, offers practical insights for increasing the vaccination rate. The ceiling point indicates that the pool of people willing to be vaccinated voluntarily has been exhausted and suggests that governments should use new strategies, such as financial reward programs, 30 conditional cash transfers 31 or conditional cash lottery programs,32,33 to increase the vaccination rate, rather than just recommending that people get vaccinated, as previous studies have done. Prior research has demonstrated that COVID-19 vaccination incentive programs are especially beneficial for racial/ethnic minorities or those with lower levels of education, who are more likely to exhibit vaccine hesitancy.10,14,15,30,31
We observed the public's perceptions and attitudes about the COVID-19 vaccine and vaccination as they were shared in a social Q&A community and documented changes in those perceptions and attitudes. Analyzing the discourse around vaccine hesitancy could help to identify the public's major concerns about vaccination. The division between positive and negative sentiments is largely based on vaccine safety and effectiveness. Positive sentiment increased when people were optimistic about the safety of the vaccines, COVID-related mortality was lowered, other nations approved the vaccines, and no apparent causal relationship could be drawn between vaccination and serious health problems. Conversely, negative sentiment reflected a belief that reliable data on the safety of vaccines were lacking, doubts about the effectiveness of vaccines, particularly with respect to new variants, an unwillingness to trust healthcare authorities or experts to provide credible guidelines for vaccines, and skepticism due to negative results (e.g., increased side effects during the second vaccination, vaccination rate that did not reach herd immunity).
Unlike previous studies, 34 our analysis excluded words that appeared in both peaks and troughs. We extracted words semantically close to the keyword (COVID-19 vaccine or vaccination) that appeared only in peaks and those that occurred only in troughs. The words that appeared only in peaks helped explain the factors associated with positive perceptions and attitudes toward vaccines, and the words that appeared only in troughs suggested the factors associated with vaccine hesitancy. The factors that influence positive perceptions or attitudes have practical implications for vaccination promotion. Likewise, addressing the specific concerns driving negative perceptions could be an effective way to deal with vaccine hesitancy. For example, the use of words such as “variant” and “mutation” implies that the public is skeptical about booster shots not only because people fear variants, but also because they doubt the efficacy of the vaccines against variants. Furthermore, vaccine hesitancy occurred because uncertainty about vaccine safety was high due to a perceived lack of scientific evidence or data. Above all, as indicated in previous studies35,36 and this study, public trust in healthcare authorities and experts, such as the World Health Organization (WHO) and scientists, has been seriously undermined by the COVID-19 pandemic. The WHO's erroneous instructions on wearing masks and issues over its political neutrality during the fight over the epicenter of the pandemic have brought about international disappointment and damaged its authority, 37 which left people in a state of panic about whom to trust. Considering that vaccine-related conspiracy theories and fake news prevailed in the absence of reliable information sources, future policy efforts should be aimed at reducing the spread of misinformation about COVID-19 vaccination and limiting anti-vaccine public advertising. 38
It will also be important to restore public trust in authorities, health care professionals, and the healthcare system.39–42 To further build and maintain positive public perception of COVID-19 vaccines, it is important for governments and healthcare authorities to provide continuously updated, evidence-based information through popular social and traditional media sources. Such an approach should also help reduce people's reliance on unverified information, which can cause confusion and anxiety about vaccines and vaccination. Given that vaccine safety is of utmost concern to the public, providing up-to-date safety information will facilitate effective health communication. Additionally, rather than reporting the daily number of confirmed cases and deaths caused by the coronavirus, the media should focus on timely updates on the number of confirmed cases, severe cases, and deaths according to vaccination status. It may also be useful to provide information about how to alleviate the side effects of the COVID-19 vaccinations. That type of communication could help the public make better decisions about vaccination.
The current study has several limitations that provide useful opportunities for future research. First, although slightly more than one-third of Americans use Quora, 43 we were unable to determine whether the Q&A included in this study came from users in the United States. Second, we were unable to identify Quora user characteristics such as age or chronic illness status. Research has shown that people with chronic ailments are more vulnerable to COVID-19 than others, which could affect their receptivity to vaccination. 44 Also, younger people tend to have higher COVID-19 vaccine hesitancy than older people;45,46 thus, future research should investigate whether perceptions of the vaccines differ by age. Third, as previously mentioned, Quora is one of the most frequently used social Q&A websites, but such websites generally have a young user base. 47 The perceptions of Quora users might, thus, overrepresent the attitudes of younger people toward COVID-19 vaccines. Finally, we used “vaccine” as a keyword to retrieve the relevant questions. Although we attempted to collect as many questions about the COVID-19 vaccine as possible, our sample is not necessarily complete.
Conclusion
The current findings have important implications for future communication interventions aimed at promoting COVID-19 vaccine uptake. Before the ceiling point, the new vaccination rate per million continued to increase despite ups and downs in positive sentiments; after the ceiling point, new vaccination decreased drastically despite the increase in positive sentiment, indicating the presence of a vaccine-resistant population. Vaccine resistance must be mitigated to increase the COVID-19 vaccination rate and achieve herd immunity. Thus, one direction for future study is to identify the characteristics and major concerns of those who have not been vaccinated and provide tailored information to counter vaccine hesitancy and resistance.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Ethical approval was not required for this study since we used a publicly available dataset.
Contributorship
SJ and YY researched literature and conceived the study. SP was involved in data analysis. SJ, SP, and YY wrote the first draft of the manuscript and MG further edited the manuscript. All authors reviewed and edited the manuscript and approved the final version of the manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
