Abstract
Research has shown that social media data can help to monitor many health threats and to understand how people respond to them. Because the outbreak of the novel coronavirus disease (COVID-19) is a global pandemic, real-time social media monitoring is needed to gauge the scale of this phenomenon. We report the frequency, reach and impact of online mentions of COVID-19 taken from Facebook, Instagram, Twitter, blogs, forums, and news portals, to highlight and better understand the scope of the coronavirus discussion in Poland. We used the SentiOne social listening tool to gather the data and monitor mentions between 24 February 2020 and 25 March 2020. We found a total of 1,415,750 mentions related to COVID-19, an average of 47,192 mentions per day. Of these, 95.36% (1,350,059) were people’s updates and expressions and 4.64% (65,691) were articles from news portals and social media. Men dominated the online conversation about COVID-19 (65.32% vs 34.68% for women). At the same time, women were more likely than men to discuss the topic on the social media platforms Facebook, Twitter, and Instagram. We conclude with theoretical and practical implications.
Introduction
A novel coronavirus illness, known as COVID-19, was declared a global pandemic by the World Health Organization on Wednesday 11 March 2020. 1 By then, other countries had already reported their new cases of the disease. By 25 March, a total of 441,187 cases had been confirmed in at least 177 countries and territories, with 1031 cases in Poland, 2 and the situation is still dynamic. On 4 March 2020, the first case of SARS-CoV-2 coronavirus infection in Poland was confirmed, and by 11 March, when the Polish government decided to close educational and cultural facilities and cancel all public events, there were 31 cases. The day after, the first death from COVID-19 was recorded. The state of epidemic was announced on 20 March, with 425 cases confirmed, and continues until further notice. On 25 March the number of cases of SARS-CoV-2 coronavirus infection exceeded 1000. 3 The crisis associated with SARS-CoV-2 has made societies worldwide discuss the new phenomenon, and this conversation takes place mostly online. In many cases this is a result of the radical countermeasures adopted in many countries. The requirements of self-quarantine, social distancing, lockdowns and working from home mean that citizens are forced to rapidly change their everyday routines and interact electronically, using e-mail, Facebook, WhatsApp, Messenger, and Twitter more than ever before. As a result, internet traffic has increased so much that Netflix and YouTube have decided to reduce the quality of streaming in Europe to prevent the internet collapsing under the strain of unprecedented usage. 4 The novel coronavirus is also a challenge for the social media industry. Mark Zuckerberg, Facebook Chief Executive Officer (CEO), admitted that the platform is facing “big surges” in usage, and that voice and video calls on WhatsApp and Facebook Messenger, in particular, are at more than double their usual levels. 4
The global pandemic is also forcing changes to the way social media platforms and search engines operate in providing accurate public health information. While social media might be a promising tool for real-time updates from national health authorities, scientists and public health experts, for supporting global and local agencies, and for spreading useful information that helps people protect themselves or recognize symptoms, there is still a problem of COVID-19 misinformation, sensationalism and false claims spreading across social media platforms. The UK’s National Health Service (NHS) is working with Google, Twitter, Instagram, and Facebook to provide the public with reliable information about COVID-19. 5 Google is also working on displaying verified, easily accessible NHS guidance when someone searches for coronavirus online. 5 An unprecedented step in preventing the spread of “fake news” and supporting governments and the public health sector during the pandemic was a joint statement from Facebook, Google, LinkedIn, Microsoft, Reddit, Twitter, and YouTube, cosigned on March 16, 2020. 6 Within it, a number of initiatives have been created, such as: providing the WHO with free ads and a “Coronavirus Information Center” run by Facebook 7 ; “WHO Health Alert” on WhatsApp 8 ; blocking ads that capitalize on the pandemic on Google and showing an “SOS Alert” banner, followed by news and information from recognized health organizations and governments 9 ; and info panels from the WHO or national health organizations, introduced by YouTube in a similar way to Facebook and Twitter, which appear when users search for coronavirus. 10 Social media and IT giants are thus already involved in tackling the new coronavirus, and they have responded quickly.
Despite these actions, the volume of social media activity associated with the new coronavirus disease (COVID-19) remains unknown, and people continue to post about it and search for new information. According to the latest Edelman Trust Barometer Special Report, seven in 10 respondents follow coronavirus news once a day or several times a day, and 38% get it from social media. 11 We herein report and document the frequency, reach and impact of posts about COVID-19 taken from Facebook, Instagram, Twitter, blogs, forums, and news portals to highlight and better understand the scope of the coronavirus-related conversation on the Polish Internet. We believe our approach sheds new light on social media data and its potential in public health surveillance. To our knowledge, this is the first study that uses social media monitoring data to report on the COVID-19 outbreak in Poland.
Methods
Data source
We used the SentiOne social listening tool to capture online mentions of COVID-19 from 24 February 2020, 12:01 am, to 25 March 2020, 11:59 pm. SentiOne is an AI automation platform based on algorithms for natural language processing, web-scale crawling and data analytics; it processes all types of statements, comments and articles that have been posted publicly on the Internet, including social networks (e.g. Facebook, Instagram, Twitter, YouTube), blogs, forums, news portals, and review sites. 12 The system allows researchers to retrieve relevant online content together with information about its authors and publication, such as username, time of publication, and gender. Online listening was performed for a single month to capture the real-time online conversation about a new illness. The search query comprised two keywords, “COVID-19” and “SARS-CoV-2,” and only mentions posted in Polish were collected. All data collected were publicly available and obtained legally.
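The collection rule described above (two keywords, a fixed time window, Polish-language content) can be pictured with a short sketch. It is an illustration only: the `matches_query` helper, its arguments, and the case-insensitive "either keyword" matching are assumptions made here, since the paper does not describe how SentiOne matches queries internally.

```python
from datetime import datetime

# Keywords and monitoring window as stated in the text.
KEYWORDS = ("COVID-19", "SARS-CoV-2")
WINDOW_START = datetime(2020, 2, 24, 0, 1)    # 24 February 2020, 12:01 am
WINDOW_END = datetime(2020, 3, 25, 23, 59)    # 25 March 2020, 11:59 pm


def matches_query(text: str, published: datetime, language: str) -> bool:
    """Hypothetical filter: Polish text, inside the window, containing a keyword.

    Assumption: a mention qualifies if it contains either keyword,
    compared case-insensitively.
    """
    if language != "pl":
        return False
    if not (WINDOW_START <= published <= WINDOW_END):
        return False
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in KEYWORDS)


# Invented sample mentions: (text, publication time, language code).
sample = [
    ("Nowe przypadki COVID-19 w Polsce", datetime(2020, 3, 10, 9, 30), "pl"),
    ("Koronawirus: co wiemy?", datetime(2020, 3, 10, 9, 45), "pl"),
    ("SARS-CoV-2 update", datetime(2020, 3, 10, 10, 0), "en"),
    ("Badania nad sars-cov-2", datetime(2020, 3, 26, 8, 0), "pl"),
]
selected = [row for row in sample if matches_query(*row)]
print(len(selected))
```

In this invented sample only the first mention is selected: the second uses the word "koronawirus" without either keyword, the third is not in Polish, and the fourth falls outside the window.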
Data measurement
We analyzed six categories of data gathered by the SentiOne system:
number of mentions: all online statements, opinions, and comments that contain the keywords “COVID-19” and “SARS-CoV-2,” with the exact time of publication, source, and sentiment;
sources: we monitored Facebook, Twitter, Instagram, blogs, forums, and news portals (with forums and social media widgets). The language of the monitored mentions was Polish;
gender: the gender of the author of a phrase or mention, inferred from a library of first names as well as other linguistic traits;
sentiment: online statements rated as positive, negative or neutral. This feature is based on natural language processing (NLP) algorithms, which are designed to process, analyze and interpret natural human language;
reach: the estimated number of views of a phrase in general (posts, comments, shares, reactions, articles) appearing on the Internet and social media. It may include the same content on several different channels, which can thus be received several times by the same user. Reach is also estimated by observing how widely posts spread;
engagement: metrics that express how many people have interacted with the monitored content on a specific social media platform. In this study we used three engagement measures: the number of likes, comments, and shares.
The summary is provided in the figures below.
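As a reading aid, the six categories can be pictured as the fields of a single record per mention. The sketch below is hypothetical: the class name, field names, allowed values, and the choice to sum the three engagement measures are assumptions for illustration and do not reflect SentiOne's actual export schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional


@dataclass
class MentionRecord:
    """Hypothetical container for the six categories; names are illustrative."""
    text: str                                     # the mention itself
    published: datetime                           # exact time of publication
    source: Literal["facebook", "twitter", "instagram", "blog", "forum", "portal"]
    gender: Optional[Literal["male", "female"]]   # assumed None when it cannot be inferred
    sentiment: Literal["positive", "negative", "neutral"]
    reach: int                                    # estimated number of views
    likes: int = 0
    comments: int = 0
    shares: int = 0

    @property
    def engagement(self) -> int:
        """Sum of the three engagement measures (summing is an assumption)."""
        return self.likes + self.comments + self.shares


# Invented example record.
example = MentionRecord(
    text="COVID-19: nowe zalecenia",
    published=datetime(2020, 3, 12, 21, 0),
    source="facebook",
    gender=None,
    sentiment="neutral",
    reach=1500,
    likes=10,
    comments=4,
    shares=3,
)
print(example.engagement)
```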
Results
A total of 1,415,750 mentions were found on Facebook, Twitter, Instagram, blogs, forums, and news portals from 24 February 2020, 12:01 am, to 25 March 2020, 11:59 pm, an average of 47,192 mentions per day. On the first day of monitoring (24 February) we captured 6490 mentions containing the phrases “COVID-19” and “SARS-CoV-2,” and on the last day (25 March) 87,839 mentions. The highest number of mentions, 92,201, was noted on March 24, when 152 cases of COVID-19 were recorded in Poland and two patients died (see Figure 1). Of the mentions found during this project, 95.36% (1,350,059) were people’s updates and expressions and 4.64% (65,691) were articles from news portals and social media.

Figure 1. Mentions’ distribution and number of cases and deaths recorded in time.
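The headline figures above can be re-derived with a few lines of arithmetic. This is a sketch using only the reported totals. The paper does not state which divisor was used for the daily average; the reported 47,192 is consistent with dividing by 30, whereas counting both end dates gives 31 calendar days.

```python
from datetime import date

# Totals as reported in the text.
total_mentions = 1_415_750
user_generated = 1_350_059
articles = 65_691

# Calendar days covered by the monitoring window, counting both end dates.
first_day, last_day = date(2020, 2, 24), date(2020, 3, 25)
days_inclusive = (last_day - first_day).days + 1   # 2020 is a leap year

avg_30 = total_mentions / 30                       # divisor matching the reported average
avg_inclusive = total_mentions / days_inclusive

share_user = 100 * user_generated / total_mentions
share_articles = 100 * articles / total_mentions

print(f"days (inclusive): {days_inclusive}")
print(f"average over 30 days: {avg_30:,.0f}")
print(f"average over {days_inclusive} days: {avg_inclusive:,.0f}")
print(f"user-generated: {share_user:.2f}%  articles: {share_articles:.2f}%")
```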
A total of 1,027,268 (72.56%) mentions were found on news portals and 388,482 (27.44%) in social media: Facebook—275,737 mentions, Twitter—99,957 mentions, forums—6149 mentions, blogs—4280, and Instagram—2359 mentions (see Figure 2).

Figure 2. Mentions’ distribution in time by source.
According to the analysis, the topic of COVID-19 generated the greatest interest on Wednesdays (average number of mentions: 283,597) and around 9 pm (average number of mentions: 92,624) (see Figure 3).

Figure 3. Hourly and daily distribution of mentions.
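A distribution by weekday and hour of this kind can be obtained by counting publication timestamps. The sketch below uses a handful of invented timestamps, not the study data, and shows only one plausible way to do the tally; how SentiOne aggregates mentions is not described in the paper.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical publication timestamps; the real data set is not reproduced here.
timestamps = [
    datetime(2020, 3, 11, 21, 5),    # Wednesday
    datetime(2020, 3, 11, 21, 40),   # Wednesday
    datetime(2020, 3, 12, 9, 15),    # Thursday
    datetime(2020, 3, 18, 21, 10),   # Wednesday
    datetime(2020, 3, 20, 14, 0),    # Friday
]

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
by_hour = Counter(ts.hour for ts in timestamps)

busiest_day, day_count = by_weekday.most_common(1)[0]
busiest_hour, hour_count = by_hour.most_common(1)[0]
print(busiest_day, day_count, busiest_hour, hour_count)
```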
Males dominated the online discussion about COVID-19 – 250,077 (65.32%) mentions were created by men compared to 132,776 (34.68%) by women (see Figure 4).

Figure 4. Mentions in time by gender.
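When reading the gender split, it helps to see which denominator the percentages use. The arithmetic below relies only on the reported counts and suggests that the 65.32% and 34.68% are shares of mentions with an attributed gender, which make up roughly a quarter of all mentions; the paper does not state this explicitly, so it is an inference.

```python
# Counts as reported in the text.
total_mentions = 1_415_750
male_mentions = 250_077
female_mentions = 132_776

# Mentions for which a gender was attributed.
with_gender = male_mentions + female_mentions

male_share = 100 * male_mentions / with_gender
female_share = 100 * female_mentions / with_gender
coverage = 100 * with_gender / total_mentions   # part of all mentions with a gender label

print(f"mentions with gender: {with_gender:,} ({coverage:.2f}% of all mentions)")
print(f"male: {male_share:.2f}%  female: {female_share:.2f}%")
```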
However, when we analyzed the sources where the conversation took place, women participated more than men on the social media platforms Facebook, Twitter, and Instagram. Discussion on news portals, blogs, and forums was dominated by men (see Figure 5).

Figure 5. Sources of mentions by gender.
From the very beginning, the sentiment of the discussion was mostly neutral: 1,192,203 (84.21%) mentions were neutral, 176,827 (12.49%) negative, and 46,720 (3.30%) positive. However, the dynamics of sentiment over time also showed an increase in the number of negative mentions (see Figure 6).

Figure 6. Mentions in time by sentiment.
We also found that the total reach of the mentions collected between 24 February 2020, 12:01 am, and 25 March 2020, 11:59 pm, was 3771 mln, the overwhelming majority of it neutral: about 96 mln was attributed to positive mentions and 467 mln to negative ones. Again, the peak of involvement around the topic was on March 24 (see Figure 7).

Figure 7. Mentions in time by reach.
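The sentiment split of reach can be set beside the sentiment split of mentions with simple arithmetic. In the sketch below the neutral reach is not a reported number: it is derived as the remainder, on the assumption that the positive and negative figures are parts of the 3771 mln total. The positive figure is itself reported only approximately.

```python
# Reach figures as reported, in millions of estimated views.
total_reach = 3771
positive_reach = 96       # reported as "about 96 mln"
negative_reach = 467

# Assumption: neutral reach is whatever remains of the total.
neutral_reach = total_reach - positive_reach - negative_reach

reach_share = {
    "neutral": 100 * neutral_reach / total_reach,
    "negative": 100 * negative_reach / total_reach,
    "positive": 100 * positive_reach / total_reach,
}

# Mention counts by sentiment, as reported in the text.
mention_counts = {"neutral": 1_192_203, "negative": 176_827, "positive": 46_720}
total_mentions = sum(mention_counts.values())
mention_share = {
    label: 100 * count / total_mentions
    for label, count in mention_counts.items()
}

for label in ("neutral", "negative", "positive"):
    print(f"{label:>8}: reach {reach_share[label]:.2f}%  "
          f"mentions {mention_share[label]:.2f}%")
```

Under these assumptions the two splits are close to each other, with neutral content accounting for roughly 85% of reach and 84% of mentions.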
Having found that 95.36% of the recorded content was generated by users (posts, comments and shares with a comment), we then analyzed the types of reactions that spread engagement across social media and the Internet. We found 2,787,788 reactions to COVID-19-related content on Facebook, 686,143 on Twitter, and 515,568 on Instagram. There were 427,290 comments, 660,222 shares, and 2,901,990 likes. Figure 8 gives a more detailed breakdown of the reactions that made up the reach of the COVID-19 topic on the selected social media platforms.

Figure 8. Summary of engagement in social media platforms: Facebook, Twitter, and Instagram.
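The engagement figures are reported along two dimensions, by platform and by type of reaction, and can be totalled either way. The sketch below uses only the reported counts; treating the two breakdowns as describing the same set of reactions is an assumption.

```python
# Reactions as reported in the text, by platform and by type.
by_platform = {"Facebook": 2_787_788, "Twitter": 686_143, "Instagram": 515_568}
by_type = {"likes": 2_901_990, "shares": 660_222, "comments": 427_290}

platform_total = sum(by_platform.values())
type_total = sum(by_type.values())
difference = type_total - platform_total

platform_share = {
    name: 100 * count / platform_total for name, count in by_platform.items()
}
type_share = {
    name: 100 * count / type_total for name, count in by_type.items()
}

print(f"total by platform: {platform_total:,}")
print(f"total by type:     {type_total:,}  (difference: {difference})")
for name, share in platform_share.items():
    print(f"{name:>9}: {share:.2f}%")
for name, share in type_share.items():
    print(f"{name:>9}: {share:.2f}%")
```

The two totals come to about 3.99 million reactions each and differ by 3 as reported; the paper does not explain the small gap.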
Discussion
Social media big data can be used to assess perceptions about public health threats and issues.13,14 The main function of social platforms is to give people opportunities to create, discuss and share topics of interest to them, including health issues. Our study demonstrated the scale of the current social media conversation in Poland about the novel coronavirus illness COVID-19, which is a global pandemic.
From 24 February 2020 to 25 March 2020 we recorded a total of 1,415,750 mentions with the phrases “COVID-19” and “SARS-CoV-2.” During this period the number of cases reported in Poland increased rapidly: by the last day of this study there were 1051 confirmed cases of COVID-19. 3 Li et al., over a similar period of time (39 days), found 115,299 COVID-19 posts on the Weibo platform. 15 We found that the overwhelming majority of the content collected was user-generated. People talk about COVID-19 and use social media to share their views, look for new information, and comment on news and media reports. Although the very first sources of mentions with the phrases “COVID-19” and “SARS-CoV-2” were news portals, it was people themselves who generated the reach through comments, shares, likes and other social widgets that spread the content online. This is reflected in the reach of the COVID-19-related content, which amounted to 3.77 billion for the monitored phrases. We were at the very beginning of the pandemic in Poland, but interest in the coronavirus was already very strong, and the situation was dynamic when all the government restrictions introduced at that time are taken into account.
In our study, we found that the social platforms Facebook, Twitter, and Instagram were more popular among women than men as a source of information about the novel coronavirus disease. Many studies and reports have found that women are more likely to use social media in general.16–18 In 2019, Facebook use remained considerably higher among women (75%) than men (63%). 19 Men were more likely to use news portals, blogs and forums. One explanation might be that global internet usage is dominated by men 20 and that these sources carry the type of content that men in Poland are more likely to comment on. 21
Social media play a critical role in transmitting reliable information about disease risks and interventions from government and from local and global health agencies, as is currently happening in the context of COVID-19. It also seems that the analysis of social media data can be used both in health promotion and in disease prevention, to allow more precise identification of at-risk populations and to coordinate better-targeted strategies and interventions. 22 It has also been found that people use social media not only to seek information about COVID-19 but also to seek help, and that most of those seeking help were elderly. 23 According to our findings, the spread of information about COVID-19 shapes a huge online discussion about the novel coronavirus and can make meaningful contributions in the current crisis. Zhao and Cheng et al. found that the public’s emotional tendency toward topics related to the COVID-19 epidemic changed from negative to neutral across their study period. 24 In our study, in turn, we observed a tendency for negative emotions to increase over time.
Numerous studies have shown an increasing interest in tracking the spread of disease through social media, but such surveillance systems are not always accurate. One example is Google Flu Trends, an algorithm that estimated real infections on the basis of online search terms related to flu. It was shown that the system overestimated the actual number of cases by almost a half. 25 Despite many limitations, online data may still help researchers to predict future health risks and as such should not be ignored.
Overall, the nature of the online conversation about COVID-19 is changing rapidly, giving us many signals of what people’s perceived threats, thoughts and needs are. We have learned that COVID-19 is a matter of public concern and that this conversation could provide insightful knowledge to both the government and health agencies during the pandemic. Future studies could explore the problem of potential disinformation during the pandemic; applying Natural Language Processing (NLP) methods to social media discussions is one way to do so. There is still room for public health policy makers to develop solutions such as classifying issues based on reported symptoms gathered from social media posts over time 26 or prioritizing posts for potential interventions based on topics, sentiment and the overall conversation threads. 27
Limitations
There are some limitations to this study. First, we used data limited to 1 month. We chose this time period to better demonstrate the dynamics of the online discussion about COVID-19 in real time, but we are aware that longer-term analyses should be carried out in the near future. Second, we limited our study to local data referring to Poland only, while the coronavirus pandemic is a global phenomenon. Third, we monitored only two phrases and omitted the word “coronavirus.” In a trial study we noticed that this keyword returned mentions that were not thematically related to COVID-19, so we excluded it from the research. We realize this may be a serious limitation and a source of bias, but because the disease was officially named on 11 February 2020, we decided to leave the keyword strategy unchanged. Moreover, privacy settings on social media might obscure the actual results, which should also be taken into consideration. Finally, we did not perform a sophisticated, in-depth analysis of the material collected, and we recognize that our findings are descriptive in nature. We believe that further work is needed to validate the results, and that more qualitative analysis could establish the role of social media in the new coronavirus pandemic. We treat our findings as a first step toward understanding it. We hope that real-time social media data can shed new and interesting light on the COVID-19 context, but they should not replace government or official statistics.
Footnotes
Acknowledgements
The authors thank the SentiOne team for their help and cooperation during this research.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
