Abstract
The current study aimed to explore the public understanding of COVID-19 vaccines and the social representations emerging from a corpus of user-generated comments on YouTube videos posted during the year following the World Health Organization's declaration of the novel coronavirus as pandemic. We used Structural Topic Modelling to process the text and identified a 10-topic solution as the best to represent the corpus of text data. The exploration of the topics showed a complex landscape of social representations underlying a plurality of perspectives, which we interpreted as reflecting different users’ needs to make sense of the unprecedented events. Implications for theory, future research, and intervention for health psychology and policy are discussed.
Introduction
The COVID-19 pandemic has marked a significant global rise in medical disinformation. A Pew Research Center survey of 9,654 US adults conducted in June 2020 (Mitchell et al., 2020) reported that around one-third of Americans who had heard about it saw some truth in the conspiracy theory according to which the pandemic had been planned by “people in power” (para. 4). Romer and Jameison (2020) found that 14.8% of a sample of 840 US adults endorsed the conspiracy belief that the virus had been created and introduced by the pharmaceutical industry, whereas 28.3% believed that the “coronavirus was created by the Chinese government as a biological weapon” (p. 4). Similarly, around 24% of respondents to a survey of UK residents (N = 949) reported believing that the virus had been created in a laboratory (Allington et al., 2021a). In the same vein, Romer and Jameison (2020) found in the same sample of US adults a common belief that pharmaceutical companies were pushing worldwide vaccination to increase their sales, whereas another UK-based study found a positive relationship between holding conspiracy beliefs and using social media over traditional media as a source of information (Allington et al., 2021a).
The dissemination of fake news and so-called conspiracy theories on the novel coronavirus (SARS-CoV-2) and its associated disease (COVID-19) has given rise to a trend described as an “infodemic” (Lovari, 2020), a term indicating the pervasive, transmissible, and harmful nature of scientific and medical disinformation, reportedly spread through social media user-generated content (Allington et al., 2021a, 2021b; Romer and Jameison, 2020; Wilson and Wiysonge, 2020). From the perspective of the social sciences, social media represent a fertile ground for research on the public's co-construction and negotiation of a range of views and understandings of the social world (Moscovici, 1988), an elective online arena to engage in everyday conversations and mutual support (de Rosa et al., 2013). In fact, the literature has reported that online science-related fora and social media pages are followed on a daily basis by millions of users worldwide (e.g., Hitlin & Olmestead, 2018). Moreover, user-generated content through online conversations within such interactive virtual arenas seems to play a role in supporting the need for individuals and groups within contemporary societies to make sense of the world they live in, providing the ideal context for social interaction, co-construction, and dissemination of meaning (de Rosa, 2013; de Rosa et al., 2016).
Marchal and Au (2020) have recently commented on social media discourse, highlighting a “tension between the need for authoritative medical information and the socio-technical mediation that enables multiple, competing voices to lay claim to such authority …; a tension exacerbated by the current pandemic” (p. 1). In this respect, Omoregie (2021) argued that online information might fundamentally be driven by a logic of quantity rather than quality, and other authors hypothesized that it might even contribute to diminishing the persuasive effect of scientific authority on public health understanding and behavior (Marchal & Au, 2020).
Prior research highlighted the association between conspiracy beliefs and anti-vaccination attitudes (Goertzel, 2010), while studies focusing on COVID-19 vaccines have found that conspiracy theories tend to have a negative effect on individuals’ intentions to take up the available vaccines (Romer & Jameison, 2020). Moreover, social media use has been found to be associated with COVID-19 vaccine hesitancy, conspiracy beliefs, and low trust in scientists and health professionals, especially when compared to print and broadcast media (Allington et al., 2021a). These results are consistent with other findings indicating the detrimental social and public health effects of conspiracy theories during the pandemic (e.g., Bruder & Kunert, 2021; Jovančević & Milićević, 2020; Maftei & Holman, 2021).
Social media have presented an ideal environment for the dissemination of medical information and guidance since the early onset of the COVID-19 pandemic. In this regard, some major social media companies had already made considerable efforts to develop and implement strategies to target and combat medical disinformation on vaccines, even before the start of the novel coronavirus pandemic (Kim et al., 2020; Pham et al., 2018). For example, Kim et al. (2020) reported that in 2019 YouTube added a link to the Wikipedia “Vaccine Hesitancy” (2019) entry to videos dealing with the topic of vaccines, whereas some anti-vaccine channels were demonetized. In addition, a study by Pew Research Center (2021) indicated that YouTube has seen statistically significant growth among American users since 2019, from around 73% of US adults in 2019 to 81% in 2021.
Commenting on videos posted on online social media and video-sharing services has been considered an essential form of participation and engagement in contemporary public discourse (Dubovi & Tabak, 2021), functional to the co-construction of everyday common-sense knowledge (see de Rosa et al., 2016). Chau (2010) considered social media like YouTube, which combine multimedia production and social networking, as fostering a participatory culture (Burgess & Green, 2018; Chau, 2010), which other authors have associated with the co-construction, dissemination, and evolution of social representations (de Rosa, 2013; de Rosa et al., 2022).
Research in social representations highlighted the psychosocial dimensions and communicative processes that contribute to shaping the relations between expert knowledge, everyday common sense, media, and society (de Rosa, 2013); in other words, the dynamic interplay between the “reified” and the “consensual” nature of scientific information and public understanding of science, respectively (Moscovici, 1961/1976, 2000). Social representations can be defined as common-sense theories that are co-constructed through social interaction and everyday practices, providing individuals and groups with a shared and flexible socio-cognitive infrastructure for meaning and purpose, ultimately enabling them to interpret objects and phenomena in the social world they inhabit, and to communicate around them (Moscovici, 1988).
Social Representations Theory (Moscovici, 1961/1976, 2000) has long investigated the dichotomy between a consensual domain, characterized by representations that are spontaneous, creative, and unpredictable (Purkhardt, 1993), and a reified domain, characterized instead by structure, rigor, and control. The consensual is considered as dominating thinking within people's everyday experience, whereas the reified is considered a general framework for science. However, the two are understood to be intertwined. According to Moscovici (1961/1976, 2000), scientific principles and concepts become progressively more accessible to the wider public through a process of transformation into images, metaphors, and everyday practices shared within and between groups (Bauer & Gaskell, 1999), turning the unknown and unfamiliar into known and familiar objects for individuals and groups (Moscovici, 1984a). This transformative process ultimately allows individuals and groups to “orientate themselves and master the material and social world they live in” and to “enable communication” among them by means of a common jargon (Moscovici, 1972, p. xiii).
Such a theoretical approach has underpinned recent investigations into the way both the scientific information on the spread and characteristics of the novel coronavirus and the process of vaccine development have been globally understood, interpreted, and reacted to. A study based on media communication in the COVID-19 pandemic across five geo-cultural areas/continents has shown the co-existence of diverse representations of the virus (de Rosa et al., 2022). The authors defined the co-existence of such diverse and antagonistic views in terms of a “multi-vocality” of polemic social representations (de Rosa & Mannarini, 2021; de Rosa et al., 2022), i.e., collective representations originating from social tension and controversy. Moscovici (1987/2020) considered conspiracy theories as polemic social representations, collective “mentalities” underlying a social and ideological nature and playing a key role in minority influence (p. 168). Polemic representations, in turn, are characterized by a set of specific themata, consisting of collections of primary and foundational notions, images, and principles organized in the form of oppositional concepts (Marková, 2000). In this regard, Moscovici (1987/2020) identified four main themata at the basis of conspiracy theories: (i) a core of knowledge and beliefs that is proposed as simultaneously mysterious and ubiquitous, rather than rational and specific; (ii) a false dichotomization of the world into a majority of passive masses and a minority of individuals and groups supposedly called to reveal an alternative reality to those masses; (iii) the search for the foundational or primal circumstances that marked the origins of a problem rather than a reliable reconstruction of facts, providing a sense of continuity to the leading narrative; (iv) the superiority of tradition vs. modernity in the interpretation of reality.
For example, the widespread and unsubstantiated narratives on microchips for mass surveillance throughout the recent pandemic, as well as on the DNA-altering properties attributed to the novel mRNA technology, can be interpreted through the lens of Social Representations Theory as conspiracy mentalities underlying increasing concerns among the public over the fast-tracked development of COVID-19 vaccines between 2020 and 2021, with some authors arguing that they might have even exacerbated vaccine hesitancy (de Rosa et al., 2022). In other words, these representations can be understood as polarized expressions of disagreement and contention, derived from conflicting relationships and competing worldviews expressed by different groups within society. They refer to objects of common knowledge that assume a key role for various groups at a certain moment in time, antagonistic to homogeneous and hegemonic views (Moscovici, 1988).
Harrison and Wu (2020) argued that dissent might originate within communities that feel under- or misrepresented by authorities, reflecting a disrupted sense of belonging among the public, as well as a demand for greater participation in policy- and decision-making. Similarly, Metcalfe et al. (2020) analyzed the public debate around COVID-19 across several countries and concluded that “media reporting often highlighted disagreement among experts and along with various conspiracy theories, may have served further to reduce trust in science” (p. 17). Research has also shown that trust in scientific research at the individual level is associated with greater vaccination adherence. In addition, Palamenghi et al. (2020) suggested that “a climate of respectful mutual trust between science and society, where scientific knowledge is not only preached but also cultivated and sustained” (p. 785) is required to reduce hesitancy and foster adherence to vaccination campaigns, with potentially dramatic consequences for public health systems.
For all these reasons, exploring the public discourse around vaccines against COVID-19 in online user-generated content is key to improving our current understanding of the public's social representations and beliefs associated with vaccine development, considering social media as an elective virtual outlet for everyday interactions and the coproduction, negotiation, and dissemination of social representations. In this regard, Social Representations Theory might serve as an ideal theoretical framework and conceptual map for research and intervention in the community. In fact, this theory provides a unique insight into the social and communicative processes that shape the public's understanding of health, specifically, in this case, the COVID-19 pandemic and the relevant vaccines. By using this theoretical framework, we aimed to shed light on the conspiracy mentalities and polemic representations that might act as possible psychosocial mechanisms explaining variations in the public's trust and engagement in health measures, globally. Consistently, the aim of the present study was to explore the social representations underlying the public's understanding of COVID-19 vaccines in a corpus of YouTube user-generated comments on videos posted in the year following the World Health Organization's (WHO) declaration of COVID-19 as a pandemic (2020).
Materials and methods
We used the YouTube Data Application Programming Interface (API) to retrieve information on a set of videos posted from March 11, 2020 (WHO, 2020; declaration of pandemic) to March 10, 2021 via R and the tuber package (Sood, 2020). Although this timeline had a relatively limited scope, it was selected to monitor trends in the development and dissemination of social representations, in parallel and intertwined with trends in the development and dissemination of social representations of the novel coronavirus, up until the early days following the first deployment of the approved vaccines (UK Government, 2021). Clearly, we cannot consider this timeline exhaustive, and future research might benefit from further investigation covering wider time periods. The following keywords were used in the search: “covid|coronavirus|covid-19 + vaccine|vaccines|vaccination|vaccinations|vaccinate|vaccinated”. Keyword selection was informed by looking at the top-10 searched terms globally, according to Google Trends (2020, 2021), specifically relevant to the novel coronavirus (i.e., ‘coronavirus’, ‘covid’, and ‘covid vaccine’), within the News category. In addition, we included all the English terms based on the stem ‘vaccin-’ and listed in the Cambridge English dictionary (Cambridge University Press, 2021), aiming to retrieve a broad range of videos pertaining to the novel coronavirus and semantically associated with the theme of vaccines. Nottingham Trent University's Schools of Business, Law and Social Sciences Research Ethics Committee reviewed the study procedure and expressed a favourable opinion (2021/100).
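As a rough illustration of the search design described above, the keyword string can be read as two groups of terms combined pairwise, with results restricted to the study window. The sketch below is in Python rather than the R used in the study; the pairing of terms into individual queries and the helper names (`build_queries`, `in_window`) are assumptions made for illustration, since the text does not specify how the keyword string was passed to the API.

```python
from datetime import date
from itertools import product

# Keyword groups as quoted in the text; combining them pairwise is an assumption.
VIRUS_TERMS = ["covid", "coronavirus", "covid-19"]
VACCINE_TERMS = ["vaccine", "vaccines", "vaccination",
                 "vaccinations", "vaccinate", "vaccinated"]

# Study window: WHO declaration of the pandemic to one year later.
WINDOW_START = date(2020, 3, 11)
WINDOW_END = date(2021, 3, 10)


def build_queries(virus_terms, vaccine_terms):
    """Return every 'virus term + vaccine term' combination as a query string."""
    return [f"{v} {x}" for v, x in product(virus_terms, vaccine_terms)]


def in_window(published):
    """True if a publication date falls inside the study window (inclusive)."""
    return WINDOW_START <= published <= WINDOW_END


queries = build_queries(VIRUS_TERMS, VACCINE_TERMS)
```

Under this reading, the three virus terms and six vaccine terms yield 18 candidate queries, and `in_window` can be applied to any retrieved publication date before analysis.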
The text corpus underwent a series of pre-processing steps consistent with recommendations from recent literature (see Bickel, 2019). We used Structural Topic Modelling (STM) to analyze the data (Roberts et al., 2014). STM is an unsupervised, natural-language classification and data reduction method that allows a researcher to explore the semantic field underlying the text in the form of a small set of interpretable topics. We used STM because this method provides a greater degree of flexibility and accuracy in the analyses compared with other clustering algorithms. Specifically, a topic model is based on word probability distributions over topics, allowing for documents and words to be shared across topics, rather than be uniquely clustered into specific categories (Roberts et al., 2014). In fact, STMs are mixture models, and documents are assigned to hypothesized latent topics probabilistically, rather than on the basis of word similarities. Such flexibility, combined with the possibility to include document-level metadata, particularly time elapsed since the WHO's (2020) declaration of the pandemic, makes STM a suitable methodology for research in social representations, beyond the traditional clustering approach proposed by Reinert (1990) and widespread in this field. The STM algorithm relies on topic prevalence (i.e., the frequency of a topic within the text corpus), content (i.e., the most frequent words used to discuss a topic), and metadata (i.e., covariates; Roberts et al., 2014). In particular, in the present study we used the number of days elapsed since the WHO's declaration of the pandemic as a covariate.
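The mixed-membership idea can be made concrete with a toy calculation. In a generic topic model, the probability of a word in a document is the sum, over topics, of the document's proportion for that topic times the topic's probability for that word. The Python sketch below uses invented values and two hypothetical topics; it deliberately leaves out the covariate part that distinguishes STM, and the names `theta`, `beta`, and `word_probability` are illustrative only.

```python
# Toy values, invented for illustration only (not taken from the study).
# theta: document-topic proportions; each row sums to 1.
theta = {
    "doc1": [0.7, 0.3],
    "doc2": [0.2, 0.8],
}

# beta: one word distribution per topic; each distribution sums to 1.
beta = [
    {"vaccine": 0.5, "trial": 0.4, "mask": 0.1},  # hypothetical topic 0
    {"vaccine": 0.2, "trial": 0.1, "mask": 0.7},  # hypothetical topic 1
]


def word_probability(doc, word):
    """p(word | doc) = sum over topics of theta[doc][k] * beta[k][word]."""
    return sum(t * b[word] for t, b in zip(theta[doc], beta))


p_vaccine_doc1 = word_probability("doc1", "vaccine")
```

Here the word "vaccine" contributes to both topics, and "doc1" draws on both topics in different proportions, which is the sense in which words and documents are shared across topics rather than assigned to a single class.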
Text data reduction methods are extensively utilized in social representations research. According to Reinert (1990), similarities and differences in subjective and collective perceptions of the world can be identified and summarized through the analysis of word co-occurrences within a text, aiming to reduce the corpus to a smaller number of latent themes or topics that underlie repertoires of socially constructed and symbolic representations (Silva Souza, 2020), whereas the latter are considered as “backgrounds of perceptions, cognition, emotions, and experiences from which objects and their meanings emerge, as well as the activity of enunciation with a corresponding specific vocabulary” (p. 11.14). However, we made a significant conceptual and methodological adjustment to this approach by using STM. In fact, although both methods aim to reduce the documents to a smaller set of classes, topic modelling does so by estimating conditional probabilities of words under a given topic. In practice, this means that words can be shared across topics, thus favoring a less constrained, more nuanced, and more realistic interpretation of the corpus.
To establish the optimal number of topics and select the best model to retain and interpret, we ran 20 models, accounting for 5 to 100 topics, respectively, in multiples of 5. We used semantic coherence and topical exclusivity (Roberts et al., 2014) to inform the choice of the model to be retained, although as per recommendations from the literature, a broader manual inspection and interpretation of the alternative solutions was also used, considering key aspects such as the parsimony and interpretability of the solution in final decision-making (Bickel, 2019). We then interpreted the final model based on the relevant document–topic probability distribution (θ) and the word–topic probability distribution (β), respectively representing “the proportion of words in a given document about each topic” and “the probability of observing each word in the vocabulary under a given topic” (Roberts et al., 2014, p. 6). Additionally, we examined the proportion of videos associated with the top 35 comments that were produced after November 18, 2020, that is, the date when Pfizer (2020) announced the results from the Phase III BNT162b2 mRNA vaccine clinical trial data. We selected this specific date and event because this was the first vaccine that received emergency validation from the WHO (2021), and that gave us the opportunity to investigate whether the emerging social representations would differ before and after the official release of vaccines.
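The before/after comparison amounts to a simple proportion over the 35 top comments of a topic. A minimal Python sketch follows, assuming that "after" means strictly later than November 18, 2020 (the text does not say whether the announcement date itself is included) and using invented publication dates; `share_after_cutoff` is a hypothetical helper name.

```python
from datetime import date

# Date of the trial-data announcement cited in the text.
CUTOFF = date(2020, 11, 18)


def share_after_cutoff(video_dates, cutoff=CUTOFF):
    """Percentage of publication dates strictly later than the cutoff."""
    after = sum(1 for d in video_dates if d > cutoff)
    return round(100 * after / len(video_dates), 2)


# Invented example: 35 top comments, 13 of them on videos published later.
example_dates = [date(2020, 12, 1)] * 13 + [date(2020, 6, 1)] * 22
example_share = share_after_cutoff(example_dates)
```

With 35 comments per topic, each comment is worth roughly 2.86 percentage points, so 13 of 35 corresponds to 37.14%.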
Data-sharing statement
The retained STM, including the specific dictionary and relevant information, is provided and available at https://figshare.com/s/9b66277fed2b1af90cd5, along with all the R syntax files used to analyze the data. To protect the users’ full anonymity, in line with ethical recommendations for the study of online comments (Reilly, 2014), the original data set and user-generated comments/quotes are not shared.
Results
Descriptive statistics
We retrieved 1,472,174 comments. The initial screening of the data showed that these comments had been authored by 730,748 unique users and were associated with 2,869 videos published between March 11, 2020 and March 10, 2021 that included at least one of the study's search keywords. We detected 999,737 (67.91%) English comments from the original data set by means of automated language identification, posted by 485,901 (66.49%) unique users and relevant to 2,700 (94.11%) videos. Subsequently, we filtered the comments and discarded all those that had been posted later than March 10, 2021 and had been erroneously retrieved, eventually gathering a data set of 841,257 (57.14%) comments, posted by 425,790 (58.27%) unique users, relevant to 2,667 (92.96%) videos, and including a dictionary of 512,174 words, which were used in the analyses. Figure 1 presents the timeline of videos and comments.
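The percentages in this paragraph are all expressed relative to the originally retrieved totals, which can be checked with a few lines of arithmetic. The Python sketch below only reuses the counts reported above; the helper name `pct` is illustrative.

```python
# Totals originally retrieved, as reported in the text.
TOTAL_COMMENTS = 1_472_174
TOTAL_USERS = 730_748
TOTAL_VIDEOS = 2_869


def pct(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# English-language subset: comments, unique users, videos.
english = (pct(999_737, TOTAL_COMMENTS),
           pct(485_901, TOTAL_USERS),
           pct(2_700, TOTAL_VIDEOS))

# Final data set after discarding comments posted after March 10, 2021.
final = (pct(841_257, TOTAL_COMMENTS),
         pct(425_790, TOTAL_USERS),
         pct(2_667, TOTAL_VIDEOS))
```

Both tuples reproduce the figures given in the text, which confirms that the final data set percentages refer to the full retrieval rather than to the English subset.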

Figure 1. Videos (1a) and English comments (1b) by date (locally weighted smoothing).
As shown in Figure 1a, the number of videos peaked between approximately the end of November and early December 2020. Speculatively, the announcement of vaccine candidates meeting primary efficacy endpoints in mid-November 2020 might have triggered this trend. However, we did not explore the content of the videos, as this was not an objective of the current study, so this information should be interpreted with caution.
Structural Topic Modelling
We used four established stopword lists from the stopwords package (Benoit et al., 2021) to filter out from the corpus any terms considered as unnecessary or noise. Then, we examined the number of retained words, aiming to identify an optimal threshold for the minimum number of documents in which a word needed to occur to be considered as informative and meaningful for STM, which was established to be equal to 150 (see Roberts et al., 2014). Consistently, we removed all the words not appearing in at least 150 documents. The procedure led us to ultimately retain a corpus of 4,915 words from 822,374 documents, which were used in STM. The results from the analysis are reported in the form of graphs, in Figure 2.
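The document-frequency threshold can be sketched as follows. This Python example is an illustration under stated assumptions, not the authors' R procedure: it uses a toy corpus and a threshold of 2 instead of 150, and it assumes that documents left with no terms after pruning are dropped, which would be consistent with the fall in document count reported above but is not stated in the text.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def prune_vocabulary(docs, min_docs):
    """Keep terms found in at least `min_docs` documents.

    Returns the pruned documents (empty ones removed, by assumption)
    and the retained vocabulary.
    """
    df = document_frequency(docs)
    keep = {term for term, n in df.items() if n >= min_docs}
    pruned = [[t for t in tokens if t in keep] for tokens in docs]
    return [d for d in pruned if d], keep


# Toy corpus of tokenized comments, invented for illustration.
toy_docs = [
    ["vaccine", "trial"],
    ["vaccine", "mask"],
    ["mask", "vaccine", "vaccine"],
    ["lol"],
]
pruned_docs, vocab = prune_vocabulary(toy_docs, min_docs=2)
```

In the toy corpus, "trial" and "lol" each occur in a single document and are removed, and the last document disappears because nothing remains in it.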

Figure 2. Number of words (a), documents (b), and tokens (c) that would be removed as a function of a range of threshold values.
Subsequently, we ran 20 STMs, from 5 to 100 topics in multiples of 5, each including time (i.e., number of days) from the WHO's (2020) declaration of pandemic as a topical prevalence covariate. For each model we estimated, plotted, and comparatively examined the average semantic coherence and exclusivity, alongside interpreting the alternative solutions via manual inspection. We found that the models with between 10 and 25 topics were the most informative, and the 25-topic model demonstrated the best metrics (standardized mean coherence = .852; standardized mean exclusivity = .741). However, our manual inspection suggested that, among the alternative solutions, the 10-topic model presented overall greater parsimony and interpretability, and for these reasons the latter was eventually retained. The estimated topic–document proportions for each topic are presented in Figure 3.
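The metric-based part of this comparison can be sketched as a ranking over the candidate grid. In the Python example below, the grid of 20 topic numbers follows the text, but everything else is an assumption: the text does not say how coherence and exclusivity were standardized, so min-max scaling is used as a stand-in, the two scaled metrics are simply added, and the metric values are invented for four candidate models only.

```python
# Candidate numbers of topics: 5, 10, ..., 100 (20 models, as in the text).
CANDIDATE_K = list(range(5, 101, 5))


def min_max(values):
    """Rescale values to the 0-1 range (assumed form of standardization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_models(ks, coherence, exclusivity):
    """Order candidate models by the sum of their two rescaled metrics."""
    scaled_c = min_max(coherence)
    scaled_e = min_max(exclusivity)
    scored = [(c + e, k) for k, c, e in zip(ks, scaled_c, scaled_e)]
    return [k for _, k in sorted(scored, reverse=True)]


# Invented metric values for a handful of candidates, for illustration only.
toy_ks = [5, 10, 25, 50]
toy_coherence = [-60.0, -70.0, -75.0, -110.0]
toy_exclusivity = [8.9, 9.4, 9.8, 9.7]
ranking = rank_models(toy_ks, toy_coherence, toy_exclusivity)
```

A ranking of this kind is only one input to the decision: as reported above, the model favoured by the metrics was not the one retained, because manual inspection weighed parsimony and interpretability as well.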

Figure 3. Final model, topic–document proportions for each topic (%).
Lastly, we plotted a network representing the community structure underlying the final model. The network showed that six of the ten topics clustered into two main communities, while the remaining four topics emerged as relatively unrelated (Figure 4).

Figure 4. Network graph of topic communities (clusters) based on inter-topic relations. T1: Making sense of social restrictions within everyday life; T2: Negotiating collective representations and mutual support in the “social arena”; T3: The pandemic and the US political scenario; T4: Conspiracy mentalities: vaccine development as a large-scale experiment; T5: Conspiracy mentalities: vaccines and their side effects; T6: The “othering” process: alternative explanations for the virus’ origins; T7: Religion and faith in response to the pandemic; T8: “Natural” remedies; T9: Conspiracy mentalities: misunderstanding and controversy over the mRNA technology; T10: Mistrust and social discontent.
In the following, the topics are presented and described based on their top 25 words (i.e., the words showing the highest probabilities, β, for each topic; Figure 5).

Figure 5. Highest word probabilities for each topic.
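Selecting the top words for a topic is a matter of sorting its word probabilities in descending order. The short Python sketch below shows the idea for a single topic; the words are borrowed from the terms reported for one of the topics, but the probabilities are invented, and `top_words` is a hypothetical helper.

```python
def top_words(word_probs, n=25):
    """Return the n words with the highest probability under one topic."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Invented probabilities for a handful of words under one topic.
toy_word_probs = {"read": 0.02, "god": 0.09, "video": 0.05,
                  "jesus": 0.04, "love": 0.07}
toy_top3 = top_words(toy_word_probs, n=3)
```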
We must note first that a major, apparently controversial finding lies in the overall content of the 10-topic solution. The present study aimed to investigate the social representations of anti-COVID-19 vaccines, and yet several topics obtained through STM did not directly relate to vaccines but rather to the pandemic and the origins of the novel coronavirus. However, we must also acknowledge that this is not surprising given the very epistemology of Social Representations Theory (Moscovici, 1961/1976). In fact, according to the ego–alter–object triadic model described by Bauer and Gaskell (1999; see also Moscovici, 1984b), groups and communities define and represent objects of their physical and social world based on a “variety of coexisting triangular dynamic structures competing, cooperating, or being in conflict with one another” (Marková, 2017, p. 369). As a result, different types of common sense and perspectives are likely to influence the agenda and focus of competing groups, and in some cases they might even highlight different responses to the same semantic stimuli within the same groups. In other words, the ego and the alter are supposed to interact and communicate around objects of the physical and social world, and by doing so they end up continuously redefining and transforming those objects. All components exert “a mutual influence on one another, and they jointly generated new patterns of knowledge, beliefs, and images” (Marková, 2017, p. 370), which do not necessarily match the baseline object of inquiry. Consistently, our analyses returned so-called lexical worlds (Reinert, 1990) that only apparently drifted away from the original object of inquiry (i.e., the social representations of vaccines); rather, they can be interpreted as the outcome of the process of generation and transformation of social representations (Moscovici, 1984b).
Independent topics
Four topics emerged as being relatively independent from the others, in total accounting for 47.68% of the total. Three of them showed a high prevalence across the user-generated comments: Topic 2 (T2; 18.31%), T3 (11.05%), and T7 (10.72%). One additional independent topic, T8, showed a much lower prevalence (7.6%). Only 37.14% of the top 35 comments characterizing T2 were associated with videos published after the announcement of vaccine trial data, compared to T3 (54.29%) and T7 (54.29%). Topic 8 was entirely based on comments derived from videos published after that date.
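The prevalence figures for the independent topics can be cross-checked by simple addition. The Python sketch below reuses only the percentages reported above; the variable names are illustrative.

```python
# Prevalence (%) of the four independent topics, as reported in the text.
INDEPENDENT = {"T2": 18.31, "T3": 11.05, "T7": 10.72, "T8": 7.6}

# Sum of the three most prevalent independent topics.
three_largest = round(
    INDEPENDENT["T2"] + INDEPENDENT["T3"] + INDEPENDENT["T7"], 2
)

# Sum of all four independent topics.
all_four = round(sum(INDEPENDENT.values()), 2)
```

The three most prevalent topics add up to 40.08%, whereas all four add up to 47.68%, so the 47.68% figure corresponds to the full set of independent topics, T8 included.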
Topic 2. Negotiating collective representations and mutual support in the “social arena”
T2 was the most prevalent topic in the corpus, showing a positive trend along the considered timeline. Among the terms with highest probabilities for this topic, we found: video, comment, guy, news, conspiracy, watch, talk, question, lol, theory. We interpreted the user-generated comments and most frequently occurring terms within this topic as highlighting the role of social media as virtual arenas for users to co-construct and negotiate meaning, engage in mutual support, and share coping strategies throughout the pandemic. Users seemed to collectively process and transform scientific and medical information into common-sense knowledge (e.g., see the frequency of the initialism “lol”, a popular element of social media slang), seek out mutual support, and debate information relevant to best practices on public and personal health.
Topic 3. The pandemic and the US political scenario
T3 was the second most prevalent topic. This was mainly characterized by terms relevant to the 2020 US presidential elections and the role that vaccine development, as well as the future governance and management of the pandemic in the USA, played in the political agenda of presidential candidates. Among the words with highest probabilities for this topic, we found: trump, govern, american, presid, biden, america, support, elect, power, polit, vote, pay, democrat, nation, freedom. Topic prevalence showed a negative trend along the considered timeline, with posting being concentrated around the time of presidential elections, in November 2020. We interpreted the user-generated discourse on US elections emerging from this topic as possibly functional to affirming and debating competing social and political interests, rather than reflecting an interest in science, medicine, or vaccine development, often mapping out diverse sociopolitical perspectives and views on society, democracy, and freedom.
Topic 7. Religion and faith in response to the pandemic
T7 mainly included direct quotations from sacred texts, and among the words with highest probabilities for this topic, we found: god, love, video, jesus, read, mark, life, bless, lord, day, christ, save, word, earth, pray, showing a positive trend along the considered timeline. We interpreted the user-generated comments associated with this topic as centered around the perceived threat represented by the virus and the urgency experienced by a large part of the public to find reassurance, comfort, and relief. Comments were here utilized for offering, requesting, and receiving pastoral and religious support to cope with the uncertainty, suffering, and anxiety associated with the pandemic.
Topic 8. “Natural” remedies
T8 was associated with terms such as virus, real, spread, creat, control, corona, fake, fear, kill, eat, human, food, natur, anim, cold. We interpreted this topic as reflecting the core conceptual contrast between “natural” immunity, which some users associated with a healthy lifestyle, and a form of “techno-mediated” immunity, which they associated with the technologies for vaccine development. Some user-generated comments within this topic were based on pseudo-scientific knowledge and incorporated incorrect beliefs on the mechanisms underlying vaccine efficacy and disease prevention, for example, the belief that a lifestyle based on healthy eating and the use of supplements and vitamins could help prevent the infection. It is interesting to note that most of the user-generated comments were posted after the announcement of the outcomes of vaccine trials, possibly highlighting the trend of disinformation that might have characterized the social representations of anti-COVID-19 vaccines and the process of scientific discovery around them since, and notwithstanding, the release of vaccine clinical trial data. The reified universe of science became the target of polarized “polemic” representations within user-generated content, which were often unrelated to the informative elements of the commented videos and instead expressed a plurality of users’ views. The core themata of these representations seemingly dichotomized nature and science in the attempt to make sense of a complex scientific process by simplifying it, i.e., reducing it to a consensual and easily interpretable dichotomy, making it more accessible and understandable.
Cluster 1. Controversial public understanding of vaccines
Cluster 1 included four topics, namely T5 (11.94%), T1 (11.19%), T4 (8.1%), and T9 (7.28%), overall accounting for 38.51% of the total. In general, we interpreted these topics as possibly reflecting the urgency, for some part of the public, to make sense of important scientific and public health challenges associated with the pandemic and the process of vaccine development. T4 and T5 were characterized by a positive trend across the considered time frame, unlike T1 and T9. Whereas T5 (91.43%), T1 (82.86%), and T9 (88.57%) mainly derived from comments associated with videos published after the release of vaccine clinical trial data in November 2020, the corresponding figure for T4 was only 40%.
Topic 1. Making sense of social restrictions within everyday life
T1 was mainly associated with terms and narratives functional to making sense of the everyday experience of the pandemic. Terms like people, mask, live, wear, life, day, happen, person, bad, feel, wait, stay, doctor, force, safe seem to refer to the altered social interactions and everyday practices that, globally, were significantly influenced by the threat of the novel coronavirus, and to the adjustments that individuals and groups had to adhere to and adapt to during the unprecedented scenario. In particular, we interpreted the use of the term “force” as indicating an ongoing debate on mandatory public health measures to contain the spread of the infection across several countries (e.g., wearing masks), and on the social and financial implications of those restrictions for individuals’ sense of agency.
Topic 4. Conspiracy mentalities: vaccine development as a large-scale experiment
T4 was associated with terms such as medic, vaccine, health, trial, doctor, effect, test, study, pfizer, drug, company, data, covid, death, dose. The discourse emerging from this topic mainly targeted vaccine development and clinical trials, and we interpreted it as highlighting some of the public's concerns around the possibility that the race for a COVID-19 vaccine might have undermined the safety and efficacy profiles of vaccines, both in the short and long term. Considering that about 60% of videos characterizing T4 were published during a time frame when evidence on the efficacy of vaccines was not available or still partial, we considered this topic to also reflect a social representation of vaccines based on the widespread feeling of uncertainty among the public and their need to make sense of the vaccine development process, which was seemingly processed collectively into a polemic representation of the very principles and methods of science. The reified knowledge around clinical trials seemed to be processed into a lay view of vaccine development, possibly based on a shared and misunderstood collective representation of this scientific process.
Topic 5. Conspiracy mentalities: vaccines and their side effects
T5 summarized some common misconceptions about the potential side effects of the vaccines under development. Among the words with the highest probabilities for this topic, we found covid, die, death, flu, shot, kill, rate, people, test, beast, care, risk, chance, govern. The most prevalent user-generated comments in this topic expressed concerns over the safety profiles of the novel vaccines, which we interpreted as highlighting a link between the overall public understanding of COVID-19 vaccines and the emergence of a negative sentiment associated with institutional and authoritative scientific communication.
Topic 9. Conspiracy mentalities: misunderstanding and controversy over the mRNA technology
T9 mainly regarded the mRNA vaccine technology. The topic was identified by terms such as vaccine, immune, body, effect, cell, infect, virus, mrna, disease, dna, human, inject, term, product, response. In particular, we noticed that some user-generated comments within this topic relayed a common conspiracy belief, namely that mRNA vaccines allegedly target human DNA with the aim of modifying it. Conversely, some other users attempted to debunk this belief, highlighting a complex landscape of conflicting polemic representations: on the one hand, conspiracy mentalities on the virus and the vaccines, and on the other hand, voices informed by scientific knowledge and ethical principles, challenging the former and alerting other users to their potential risks.
Cluster 2. The “othering” process
Cluster 2 included T6 (9.44%) and T10 (4.38%), accounting for 13.82% of the total. Overall, we interpreted these topics as covering conspiracy mentalities and beliefs around the origin of the virus, the evolution of the pandemic, and vaccine development, which we considered in terms of an underlying “othering” process (de Rosa & Mannarini, 2021, p. 370). This can be conceptualized as a process of antagonistic, collective representation of the pandemic and the vaccines that was “emotionally driven by fear” and which mirrored “the vulnerable self,” prompting a polemic “categorization, identification, and differentiation of self and others” (p. 377). These topics showed a negative trend along the considered timeline; moreover, only 34.29% and 42.86% of the content characterizing T6 and T10, respectively, originated from comments posted after the announcement of vaccine clinical trial data in November 2020. In the following, each topic's content is presented and interpreted in further detail.
Topic 6. The “othering” process: alternative explanations for the virus’s origins
T6 was associated with terms such as gate, vaccine, country, china, pandem, money, trust, india, time, develop, fauci, lie, plan, start, govern. The origins of the virus, its global diffusion, and the development of vaccines represented the main discussion points. Several user-generated comments debated the hypothesis of a spillover of the virus, which we interpreted in some cases as reflecting speculative and ideological views over the geopolitical aspects associated with the first outbreak in Wuhan, China.
Topic 10. Mistrust and social discontent
T10 was associated with the following top terms: black, people, africa, time, business, african, continue, false, south, drop, manipulate, public, community, record, week. We considered this topic to reflect a common perception manifested by a part of the public around the poor governance of the pandemic across multiple geographical and institutional contexts, in political and financial terms. Words like “business” and “manipulate” seemed to allude to the perception, in some parts of the public, of possible speculations in the management of the pandemic.
Discussion
The aim of the current study was to analyze the content of a set of user-generated comments on YouTube videos related to COVID-19 vaccines posted during the first year of the pandemic. Past research highlighted informational and social interaction motives as key drivers in the use of online video-sharing platforms (Khan, 2017; Rosenthal, 2018), which we assumed to function as the ideal global setting through which public discourse is co-constructed and social representations are generated and shared. Our findings highlighted ten major topics as the best to represent the corpus of user-generated comments.
We interpreted the most prevalent topics as reflecting the users’ need for understanding and sharing a consensual explanation for the origins and characteristics of the novel coronavirus and the public health measures implemented globally in response to it, particularly vaccine development. This revealed a need for making sense of the pandemic through the lenses of consensual norms, beliefs, and values, reflecting a plurality of worldviews and standpoints. Social Representations Theory (Moscovici, 1961/1976) offered an ideal theoretical framework for the analysis and interpretation of data, considering that one of the functions of social representations is the provision to individuals and groups of a conceptual grid for understanding, interpreting, and navigating the physical and social world they live in, and in addition, to participate in public discourse and communication (Apostolidis et al., 2020; Fasanelli et al., 2020; Páez & Pérez, 2020).
Furthermore, these results are particularly interesting for a comparison between the periods before and after the announcement of vaccine clinical trial data in November 2020. For example, Topic 8 was based on user-generated comments posted after the announcement of the early results of vaccine trials, in that case with content mainly centered around a sense of uncertainty towards the novel vaccines. We consider these results a reflection of the socially constructed discourse about the novel coronavirus and the relevant vaccines, specifically the dynamic tension between the reified universe of science and the consensual universe of everyday life. The wider multi-vocal conversations among users thus seemed to serve the function of supporting others in making sense of the unprecedented pandemic events (de Rosa & Mannarini, 2021; de Rosa et al., 2022), in some cases activating representations of vaccines reproducing the “nature” vs. “science” dichotomy at the basis of what Moscovici (1987/2020) considered a core thema of conspiracy mentalities.
The first cluster of topics discussed was characterized by user-generated comments mainly reflecting the public's attempt to understand the process of vaccine development and to interpret the scientific and medical innovations carried by novel technologies, such as mRNA. However, some of the user-generated content seemed to be characterized by a negative sentiment towards science and authority. In this case, conspiracy mentalities (Moscovici, 1987/2020) seemed to underlie a struggle for public involvement and participation in public health decision-making, confirming findings from prior studies on user-generated content in social media (de Rosa et al., 2022) and survey-based research on the public perception of the COVID-19 pandemic (e.g., Melotti et al., 2022).
The second cluster of topics included user-generated content reflecting concerns over the possible safety and efficacy profiles of vaccines. This is not surprising, in that these findings might reflect conspiracy mentalities in the wider communities, with such mentalities possibly representing common conceptual maps in the public's understanding of and debate on COVID-19 and vaccine development and uptake. In addition, it is plausible to hypothesize that the analyzed content, both in terms of the associated video (i.e., the materials that users commented on) and textual information, has a larger audience besides that represented by those actively participating in the comments section, thus potentially extending its impact, whether positive or negative, far beyond the community of users engaged in the considered online conversations (Khan, 2017). Interestingly, however, this was not a homogeneous trend, with some part of the content seemingly trying to alert other users to the fallacy and potential risks of conspiracy mentalities, outlining a complex landscape of social representations across individuals and groups.
The fragmentation and diversity of views and mentalities around vaccines found in the text corpus might thus reveal the essentially polemic nature of the social representations of COVID-19 vaccines across multiple countries and cultural contexts, as evidenced by the presence of topics related to the public health, social, and political scenarios of several English-speaking countries (e.g., see the reference to the US elections and the co-occurrence of the word “Africa” across two different clusters, respectively). Such evidence carries significant implications for health psychologists and policymakers globally. On the one hand, it suggests that some conspiracy mentalities and polemic social representations of COVID-19 and the relevant vaccines might have originated and spread across several contexts, perhaps as a consequence of the global transformations brought about by the diffusion of digital technologies and their relative affordability and availability in those contexts. On the other hand, it suggests that some elements and characteristics of those mentalities might be the product of locally specific social dynamics, facts, events, and public figures playing a key role within the public agenda of those contexts. For this reason, in-depth knowledge and awareness of both the global scenario and specific local features are warranted to provide more effective prevention and intervention in those communities, aiming to disentangle and debunk dysfunctional narratives, for example by means of traditional and new media campaigns targeting the core themata of the social representations that dominate those contexts.
Additionally, differences in the public's understanding and social representations of COVID-19 and the relevant vaccines might also be associated with either a lower degree of availability and digitalization of local communities or a more pronounced use of alternative information media and outlets (e.g., DiBisceglie & Arigo, 2021; Ren et al., 2021; Wu & Shen, 2021). For this reason, research focusing on user-generated content within multiple media outlets is warranted, aiming to promote a positive impact on public health prevention and intervention.
These results have important implications. First, from a theoretical point of view, the results of the present study highlight the relationships between social representations co-constructed and shared through user-generated content and the norms, values, and beliefs to which they pertain. Specifically, the topics discussed showed a shift in those representations along the considered timeline, with some topics illustrating the need for making sense of the unprecedented public health crisis and vaccine development under conditions of uncertainty, whereas other topics reflected alternative views over the definition of key issues that have characterized the course of the pandemic, such as social restrictions and other events. On a practical note, future public health campaigns would benefit from taking into consideration the contextual profile of the social representations of vaccines that emerge from user-generated content and that might be disseminated among local communities, aiming to effectively target their core themata, as discussed. Likewise, scientific and institutional actors would benefit from taking into account the expectations, concerns, and uncertainty expressed by the general public through online user-generated content, aiming to engage in a necessary dialogue and possibly widen participation in health- and policy-related debates.
The study presented here was exploratory in nature and thus does not allow us to further speculate on the interpretation of the data and the analyses. One of the main limits of our research was the restricted time frame for data collection, and we recommend that future research extend this interval and widen the scope of the analyses by using supplementary keywords that could provide further insight on other key aspects of the pandemic and its societal effects. Examples include, but are not limited to, the introduction of lockdowns and, more recently, of vaccine passports in several countries. Also, in our analyses, we did not control for the information derived from the inspection of video content or the written description of the videos. Previous investigations found that these elements play an important role in improving the understanding of user-generated comments, not only due to their explicit multimedia content, but also for the symbolic and ideological frames and positions that they might implicitly convey (de Rosa & Farr, 2001; de Rosa et al., 2022; Tian, 2010). Further research should therefore examine whether these variables facilitate a more comprehensive interpretation of the public discourse emerging from comments on videos about COVID-19 vaccines. Future studies could also investigate whether these comments differ across the demographic and psycho-social profiles of content creators, as highlighted by previous investigations (Khan, 2017). Lastly, but of foremost importance, the corpus was limited to English comments, which limits the generalizability of the results to other linguistic and cultural contexts. Furthermore, several other online video-sharing platforms might be equally or even more relevant than YouTube in many countries and cultural contexts. For this reason, the results presented here must be considered with caution.
We invite researchers to use the evidence presented here to further contribute to research in the field, perhaps looking at other social media and video-sharing platforms, particularly in those countries and contexts where YouTube might play a marginal role in public health information. This would foster a greater comprehension of the phenomenon under investigation from a global perspective, which our study could not fully address.
Considering the characteristics of the present study, and in light of previously covered considerations, a useful follow-up might consist of a content analysis of the videos retrieved, complementing the work we have undertaken on user-generated text material. In fact, this could shed light on whether different types of videos induce different views and representations of the pandemic and the vaccines, potentially helping to outline some practical recommendations to prevent common misconceptions and conspiracy mentalities, which could also help inform the agenda of online broadcasting agencies during future healthcare crises. However, we must acknowledge that such a study would require a considerable extension to the present work, as well as careful reflection on the most appropriate methodology to analyze the multimedia content of the videos alongside user-generated content, which goes beyond the specific objectives of our research. As such, we take this opportunity to invite other researchers to plan and develop future investigations into these important aspects, drawing upon the findings presented in the current manuscript.
In conclusion, our findings highlighted a set of topics that allowed us to map the social representations regarding COVID-19 vaccines, which might be primarily indicative of the attempt to make sense of and cope with the unprecedented global public health crisis and relevant scientific progress. Although the data do not allow us to speculate on the links between such representations, vaccine hesitancy, and health preventative behavior, we consider the narrative underlying some of these representations of great value for public health. We recommend that future research explore whether and which of the specific core contents of such representations might be associated with a reduced endorsement of health preventative behavior and vaccination uptake at the individual and community levels, and further address the question as to how governments, public health systems, and policymakers could improve their communication through traditional and social media channels.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
