Sage Journals: Discover world-class research

Abstract

Social media has become a platform for information diffusion, voicing concerns of existing inequalities and raising public awareness of various social and societal issues. Despite the social good, social media has become a fertile ground for spreading misinformation, hate speech and conspiracy theories. The death of George Floyd in May 2020 triggered a series of protests worldwide in support of the Black Lives Matter (BLM) movement and triggered a debate about equity, inclusion and social justice. The purpose of this study is to examine the impact of misinformation and social noise on the original intended message of BLM using data from the Twitter hashtag ‘BLM’. Results from topic modelling have shown the strong presence of misinformation and social noise. Such information was most probably intended to influence, mislead and dilute the original intended message. However, despite the effort to distort the original message of BLM, results from sentiment analysis show that users’ opinions of the BLM movement remained positive.

Keywords

Black Lives Matter, equity, inclusion, misinformation, sentiment analysis, social noise, topic modelling

1. Introduction

Advances in information science technologies, such as mobile devices, learning technologies, publishing tools, communication and socialisation hubs, live streaming and interactive multimedia, have made it possible for people to access a wealth of information and knowledge that was never available before. While some consider this a blessing, for most people it is a double-edged sword. The rise of misinformation, hate speech, conspiracy theories, cyberbullying, fake news and the increased use of unethical artificial intelligence are some of the challenges that we must deal with [1]. The recent events surrounding the US elections, Black Lives Matter issues and COVID-19 have highlighted the danger that misinformation poses to democratic institutions, science and society [2,3]. Undermining faith in science and democratic institutions has a long-term effect on human development and progress.

When inaccurate information is spread either intentionally or unintentionally, it is referred to as misinformation [4]. The difficult part is dealing with misinformation when those who are participating in its diffusion are not aware of its existence. The influence of a user’s personal and relational characteristics on information received through social media that might confuse, distort or even modify the original message is referred to as social noise [5]. A study of social noise in misinformation cases is needed, emphasising how it affects disseminating false and harmful information. BLM stands for ‘Black Lives Matter’ and is a decentralised political and social movement that protests against police brutality and racially motivated violence against black citizens. After George Zimmerman’s acquittal in the shooting leading to the death of African American teen Trayvon Martin, the campaign started with the hashtags ‘#BlackLivesMatter’, ‘#BLM’ and so on, on social media in July 2013 [6]. During the George Floyd demonstrations in 2020, the campaign resurfaced in the national spotlight and drew even more international attention.

While certain percentages of the people participating in social media discussions can be considered advocates of social justice causes, many others are engaged in misinformation activities and social noise. The openness of social media platforms allows external parties to infiltrate the conversation and exploit social noise to influence opinion and distort the original message for political or ideological reasons. The external influence on the American elections in 2016 and 2020 is a good example of the role external players can have on social media users. This study examines how misinformation and conspiracy theories in the form of social noise impact the original intended message. We hypothesised that the existence of social noise dilutes, alters and distorts the original intended message. Expanding our understanding of social noise in the way it is generated, manipulated and spread could help identify and minimise the impact of intended and harmful misinformation generated by outside players. The study also examines the effects of misinformation and social noise on social issues, such as equity, inclusion and social justice.

In this article, data related to the BLM movement are collected from Twitter, and sentiment analysis is performed to understand the sentiments of social media users towards the BLM movement. Latent Dirichlet Allocation (LDA) and a biterm model are built and executed, and the keywords related to social noise, misinformation, equity, inclusion and many others are identified and extracted. Additionally, the associations among the words are explored in this study.

2. Literature review

The first realisation of the power of social media as a platform for protest and social justice came in 2010 with the Arab Spring and the uprising in Tunisia. People discovered that they had a bigger circle of influence and that their ideas, messages and posts had a ripple effect on others [7–9]. During the uprising, people used social media to organise protests, create awareness, share stories and, most importantly, document violence and violations of people’s right to exercise freedom of speech and civic engagement. The demonstrated role of social media in equity and inclusion led to the US Supreme Court’s decision to legalise marriage between same-sex couples in 2015 [10]. The Human Rights Campaign (HRC) work around the #LoveWins hashtag was an effective use of social media, with more than 7 million tweets and 1.4 million photos on Instagram [11]. The Me Too movement (or #MeToo movement), which originated in 2006, gained momentum in 2017 when famous actresses and high-profile female musicians decided to share their experiences of sexual harassment using the hashtag #MeToo [12]. This led to more awareness of sexual harassment issues and the conviction of several well-known figures in Hollywood and the film industry.

BLM started in 2013 with the use of hashtags, such as #BlackLivesMatter, #BLM and others, following the murder of Trayvon Martin. It aimed at protesting and combating anti-black racism and police brutality towards African Americans [6]. The movement and the hashtags got magnified after the murder of George Floyd when protests and demonstrations spread around the world calling for equity and social justice. The BLM movement, born in 2013, was indirectly created out of decades of frustration within the African American community over the legal system’s continual exoneration of those who had taken black lives [13]. Ince et al. [14] studied the social media presence in BLM. In their article, the authors analysed different hashtags related to BLM and found that the ‘hashtags mention solidarity or approval of the movement, refer to police violence, mention movement tactics, mention Ferguson or express counter-movement sentiments’. Francis [15], in his paper, mentions that the global rallying cry in the BLM protests alerts us to the urgency for transformative change in all spheres. As Brock [16] mentioned in his paper ‘twitter’s combination of brevity, multi-platform access and feedback mechanisms has enabled it to gain mindshare far out of proportion to its actual user base, including an extraordinary number of black users’.

Recent events in the United States and around the world have highlighted the danger of spreading misinformation and fake news on the Internet and social media [17]. The seriousness of the problem has prompted calls from politicians and legislators for technology companies to identify and stop the spread of misinformation. Efforts are being made by social media companies to counter the effect of misinformation by trying to introduce a combination of technical and non-technical measures. For example, Google and Facebook introduced a community-driven approach that allows users to flag false content to correct the newsfeed [18]. However, the effectiveness of such measures will depend largely on the ability to distinguish disinformation and misinformation created with the intent to harm from noise created and proliferated by users for different reasons. Misinformation in the form of social noise could occur for several reasons, including the selective and partial representation of the original information without the intent to harm [5,19]. Social media users tend to operate in groups that could foster confirmation bias, segregation and polarisation. This can also lead to the proliferation of biased narratives and unsubstantiated rumours, mistrust and paranoia that impact the quality of the information [20]. Not everyone in the group is aware of the impact of their action or the extent by which their participation is contributing to the problem of misinformation and social noise.

Social noise occurs when people interact differently with information on social media than they would if they encountered it privately, because they are aware of being observed by peers, colleagues, family and other members of their social network. Under the influence of social noise, a user may temper their communication based on external cues from their social network regarding what behaviour is acceptable or desirable, consciously or unconsciously attempting to present themselves more desirably and increase their social capital within the network [19,21]. For example, if a well-respected friend posts a news article supporting a social issue that is generally in line with the user’s belief system, the user might indicate support by ‘liking’ the post without fully understanding the issues being discussed. Similarly, in an effort to maintain important relationships and avoid controversy, social media users might not voice their true opposition to an issue and instead pretend that they agree [22–24]. Social noise can influence users’ behaviour and push them to participate in conversations without knowing or understanding the issues at stake.

While misinformation in the form of doctrine, denial of the truth, manipulation of historical information and propaganda is considered a problem that needs to be dealt with, social media’s role in the spread of misinformation is evident [25–27]. Recent remarks by President Joe Biden about ‘misinformation killing people’, referring to misinformation about the COVID-19 vaccine, indicate the seriousness of the problem and the need for social media companies to take steps to control misinformation on their platforms [28]. Unfortunately, this is a complex problem given the openness of social media platforms and the challenges in distinguishing misinformation from social noise. To deal with the problem of misinformation, there is a need to understand why ordinary people might participate in the spread of misinformation. A study by the MIT lab found that false and manipulated information spread 70% faster on social media than authentic information [29]. Most people forward social media messages and retweet misinformation as a social event without realising that the information being sent might be manipulated or fake. People engaged in social noise as part of image curation, relationship management, conflict engagement or cultural agency are less likely to question the accuracy or the validity of the information they are dealing with [19].

Zimmerman [19] introduced the concept of social noise in an attempt to understand user information behaviour in social media and the type of content posted under certain circumstances that constitute social noise. She defines social noise as being made up of four constructs that could help in explaining the spread of misinformation. The four constructs are image curation, relationship management, conflict engagement and cultural agency. Image curation is defined as an attempt by social media users to knowingly or unknowingly craft their online identity and create a personal exhibition that satisfies them [30]. Relationship management refers to a user’s desire to build a community with individuals or groups with profound importance or high social value to them. This can be driven by a desire to be included as a member of a particular group (whether formal or informal) or to connect with and maintain good relationships with other people [21]. There are various reasons why people might participate in certain discussions on social media or retweet certain messages or content without understanding the broader implication of their actions. The social noise categories identified by Zimmerman are described in Table 1.

Table 1.

Social noise constructs identified by Zimmerman.

Construct	Definition
Image curation	Is the effort by a social media user, consciously or unconsciously, to craft their online identities
Relationship management	Refers to a user’s understanding of their roles and responsibilities within social institutions and the level of confidence in their personal beliefs
Conflict engagement	Is the level of social conflict with which a user is comfortable
Cultural agency	Is characterised by civic participation in social issues and is exhibited by individuals who believe in their own power to be heard and to shape culture and beliefs

3. Methodology

To perform content analysis on data extracted from social media in an unstructured format, we use two methods of data analysis: sentiment analysis and topic modelling. Sentiment analysis is a popular text analysis technique that detects polarity (e.g. a positive, neutral or negative opinion) within the text, a whole document, paragraph, sentence or clause. Sometimes, it is combined with the content analysis for topic discovery and opinion mining [31]. Emotion detection is another type of sentiment analysis aimed at detecting emotions, such as happiness, sadness, anger and so on [32]. Sentiment analysis can also indicate whether the response is fact-based and the degree to which the opinion reflects the respondent’s personal opinion. Routray et al. [33] reviewed different aspects of sentiment analysis for text documents and highlighted four different research challenges: subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction.

Topic modelling is one of the methods used to discover topics across a collection of text documents. These topics are abstract in nature, that is, words related to each other form a topic, and an individual document can contain multiple topics. Topic modelling helps explore large amounts of text data, find clusters of words, measure the similarity between documents and discover abstract topics [34]. LDA is a topic modelling method used to identify the topics and topic keywords in a document. The basic concept is that documents are interpreted as random mixtures of latent topics, each of which is described by a word distribution [35]. The biterm model is another topic modelling technique. It is a word co-occurrence-based topic model that learns topics by modelling patterns of word-to-word co-occurrence [36].
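
The description of documents as random mixtures of latent topics corresponds to the standard LDA generative process, which can be sketched as follows. The notation (K topics, D documents, Dirichlet hyperparameters α and β, per-document topic mixture θ, per-topic word distribution φ) is the conventional one and is an assumption here; it is not taken from this article.

```latex
\begin{align*}
\phi_k   &\sim \mathrm{Dirichlet}(\beta),  & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n}  &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n}  &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{align*}
```

In this setting each tweet is a document d, z_{d,n} is the latent topic of its n-th word and w_{d,n} is the observed word. Fitting the model means inferring θ and φ from the observed words alone.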

For this study, the data related to the BLM movement was collected from Twitter. Although the BLM movement has been in existence since 2013, it was the death of George Floyd on 25 May 2020 that ignited a global protest and gave momentum for the BLM discussion. Since George Floyd’s death, BLM has become one of the most discussed topics on social media platforms, such as Twitter and Facebook. For this study, the hashtag ‘BLM’ was used to collect data from 25 May 2020 to 10 June 2020. The time frame was chosen because the hashtag was trending during that time, following George Floyd’s death and the renewed interest in inequality and social justice.

Google Cloud Compute Engine was used to build a virtual machine instance, which was then configured with a Python environment. This environment was used to run the Python code that extracted the data from Twitter using the Twitter APIs. A total of 104,546 records were extracted for the hashtag ‘BLM’. The data extracted included tweet content (referred to as tweets), tweet id, user id, language, tweet source, created date, retweet count, reply count, like count and quote count. Although multiple data points about tweets were collected, this study only looked at the tweet content and the tweet creation date. The rest of the data will be used in future studies. The extracted data were imported into a local machine, where the analysis was performed. At first, Tableau Software was used to build visualisations for the data.

Figure 1 depicts the frequency of tweets related to BLM from 25 May to 10 June. Although George Floyd died on 25 May 2020, Figure 1 shows that tweets related to BLM gained traction only slowly after his death. There was a peak on 31 May, which was the day when George Floyd’s second autopsy was performed. The number of tweets posted grew once more on 3 June because the charge against the officers responsible for George Floyd’s death was upgraded on that day. In addition, more tweets were posted on 8 June because George Floyd’s body was on public display in his hometown. It is evident from Figure 1 that whenever there was news about George Floyd’s death, users tweeted about BLM, linking George Floyd’s tragedy to the BLM movement.

Figure 1.

BLM daily tweet frequency chart.

Figure 2 represents the number of tweets posted between 25 May and 10 June on an hourly basis. The majority of the tweets were posted between 3 p.m. and 12 a.m., as shown in Figure 2.

Figure 2.

BLM hourly tweet frequency.
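
The daily and hourly counts behind Figures 1 and 2 were produced in Tableau. As a rough illustration only, the same aggregation can be sketched in Python from the tweet creation dates. The sample timestamps and variable names below are hypothetical and are not drawn from the study's dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of tweet creation timestamps (ISO 8601 strings).
created_at = [
    "2020-05-31T15:04:00",
    "2020-05-31T21:30:00",
    "2020-06-03T16:45:00",
    "2020-06-08T09:10:00",
]

timestamps = [datetime.fromisoformat(ts) for ts in created_at]

# Tweets per calendar day (the quantity plotted in a daily frequency chart).
daily_counts = Counter(ts.date().isoformat() for ts in timestamps)

# Tweets per hour of day, 0-23 (the quantity plotted in an hourly chart).
hourly_counts = Counter(ts.hour for ts in timestamps)

print(sorted(daily_counts.items()))
print(sorted(hourly_counts.items()))
```

The sketch takes timestamps as given and ignores time zones, which would matter when interpreting an hourly pattern.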

After creating the visualisations in Tableau, the data was cleaned by loading the data file into a Python data frame and removing all the unnecessary text and symbols from the tweet content column. Exploratory data analysis was performed to analyse the dataset and summarise its main characteristics using visual methods. Furthermore, sentiment analysis was performed to identify the sentiment that the user expressed in each tweet. Sentiment analysis reveals whether the social media user expresses a positive, negative or neutral feeling in the tweet and whether the tweet is fact-based or influenced by the writer’s personal opinion. To perform sentiment analysis, the TextBlob Python library was used. TextBlob is a library for analysing textual data and provides an API that can be used to perform sentiment analysis. TextBlob was considered a reliable choice because of its extensive sentiment analysis capabilities. It was also chosen for reasons such as accessibility, its lightweight design, ease of use and a gentler learning curve [37]. Moreover, TextBlob produces two outputs, subjectivity and polarity, both of which are appropriate for this study.
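
The article does not list the exact cleaning rules, so the following is only a minimal sketch of what removing unnecessary text and symbols from tweet content might look like. The function name `clean_tweet` and the specific rules (dropping links, @mentions, a leading retweet marker and punctuation) are assumptions made for illustration.

```python
import re

def clean_tweet(text: str) -> str:
    """Strip links, @mentions, a leading retweet marker and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"^\s*RT\b", " ", text)      # remove a leading retweet marker
    text = re.sub(r"[^\w\s]", " ", text)       # drop punctuation, '#' and emoji
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace

print(clean_tweet("RT @user: Justice for George! https://t.co/abc #BLM"))
# prints: Justice for George BLM
```

The cleaned string is what would then be passed to a sentiment library or to the preprocessing steps described next.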

Furthermore, the data was preprocessed using techniques such as tokenisation, lemmatisation, n-gram extraction and part-of-speech tag selection.

Tokenisation refers to the process of splitting sentences into words while lowercasing the terms, removing punctuation, ignoring tokens that are too short and stripping letter accents.

Lemmatisation removes inflectional endings and returns the base or dictionary form of a word, known as the lemma. For example, terms such as ‘used’, ‘using’ and ‘uses’ are all converted to the base word ‘use’.

N-gram extraction identifies sequences of ‘n’ words that frequently occur in the corpus. Bigrams and trigrams were created for this study, representing sequences of two and three words, respectively.

Part-of-speech tag selection is the process in which a part-of-speech tag is assigned to each token. For this study, only words tagged as nouns, adjectives, verbs or adverbs were kept for further analysis.
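
Three of the preprocessing steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the study's code: the function names, the minimum token length of 3 and the tag names are hypothetical, the tags are supplied by hand rather than by a tagger, and lemmatisation is omitted because it needs a dictionary-backed library.

```python
import re
import unicodedata

def tokenise(text: str, min_len: int = 3) -> list[str]:
    """Lowercase, strip accents and punctuation, and drop very short tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [tok for tok in re.findall(r"[a-z]+", no_accents) if len(tok) >= min_len]

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens (no frequency filtering)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical tag names; a real pipeline would obtain tags from a POS tagger.
KEEP_TAGS = {"NOUN", "ADJ", "VERB", "ADV"}

def keep_content_words(tagged: list[tuple[str, str]]) -> list[str]:
    """Keep only tokens tagged as noun, adjective, verb or adverb."""
    return [word for word, tag in tagged if tag in KEEP_TAGS]

tokens = tokenise("We demand an end to Police Brutality, señor!")
print(tokens)             # ['demand', 'end', 'police', 'brutality', 'senor']
print(ngrams(tokens, 2))  # adjacent pairs such as ('police', 'brutality')
print(keep_content_words([("we", "PRON"), ("demand", "VERB"), ("justice", "NOUN")]))
```

The `ngrams` helper lists every run of consecutive tokens; selecting only the frequently occurring ones would require an additional counting step.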

For topic modelling, the Gensim Python library and the MALLET toolkit were used to build the LDA model and extract the topics. Gensim is a Python library used for topic modelling, document indexing and similarity retrieval with large text collections. MALLET is a Java-based package for statistical natural language processing, document classification, topic modelling and other machine learning applications. The LDA model was run on the tweet data, and a total of 500 topics were extracted from 104,546 tweets. Each topic had 20 keywords related to the topic. The keywords were analysed to identify the presence of misinformation and social noise. The selected keywords were then assigned to different social noise constructs.

Another topic modelling technique, biterm topic modelling (BTM), was also applied for richer results. The Biterm library was used to build a biterm model. To handle the problem of sparse word co-occurrence at the document level, the library explicitly models word co-occurrence patterns across the whole corpus [38]. The biterm model was run on the 104,546 tweets, and topics were extracted and analysed. The selected keywords were then combined with the LDA keywords and assigned to different social noise constructs.
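
To make the idea of corpus-level word co-occurrence concrete, the sketch below extracts biterms (unordered word pairs that appear together in the same short text) and pools them over a toy corpus. It covers only the extraction step, not the topic inference performed by the Biterm library, and the corpus and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def biterms(tokens):
    """Return every unordered pair of tokens drawn from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

# Hypothetical pre-tokenised tweets.
corpus = [
    ["police", "brutality", "protest"],
    ["peaceful", "protest", "police"],
]

# Pool biterms over the whole corpus rather than per document.
biterm_counts = Counter(b for doc in corpus for b in biterms(doc))
print(biterm_counts.most_common(1))
# prints: [(('police', 'protest'), 2)]
```

Because the counts are pooled across all tweets, a pair such as ‘police’ and ‘protest’ accumulates evidence even though each individual tweet is very short.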

Furthermore, bigrams were built and analysed, and interesting word associations were discovered from the tweet data. Figure 3 illustrates the research workflow chart and processes used to complete this research work. The highlighted boxes represent the stages that led to the production of the results.

Figure 3.

Research flow diagram.

4. Results

4.1. Contextual analysis of tweets

Sentiment analysis was carried out on 104,546 tweets. For each tweet, the subjectivity and polarity scores were calculated. A text can have a subjectivity score between 0 and 1. A text with a subjectivity score of 0 is categorised as objective or based on facts. A text with a subjectivity score of 1 is classified as subjective, that is, based on personal opinion or motivated by emotion. The sentiment analysis classified 76.12% (79,580 tweets) of the tweets as subjective and 23.88% (24,966 tweets) as objective or focused on facts. This shows that most of the tweets were based on the personal opinion of the Twitter user. Figure 4 represents the sentiment analysis subjectivity results.

Figure 4.

Sentiment analysis subjectivity score chart.

Sentiment analysis was also used to calculate the polarity score of each tweet. The polarity score helps to determine whether the sentiment of the tweet is positive, negative or neutral. For a tweet, the polarity score ranges from −1 to 1. A score of −1 indicates the most negative sentiment, a score of 0 indicates neutral sentiment and a score of 1 indicates the most positive sentiment. Sentiment analysis on 104,546 tweets resulted in 42.16% (44,076 tweets) having positive sentiment, 29.60% (30,946 tweets) having neutral sentiment and 28.24% (29,524 tweets) having negative sentiment. Based on the polarity scores, positive tweets formed the largest group in this dataset, suggesting that sentiment towards the BLM movement was predominantly positive. Figure 5 represents the polarity score results.

Figure 5.

Sentiment analysis polarity score chart.
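
The text does not state the cut-offs used to turn a polarity score into a label. The sketch below assumes the simplest sign-based rule (above 0 is positive, below 0 is negative, exactly 0 is neutral) and then recomputes the reported percentages from the reported tweet counts. The function name and the rule are assumptions; the counts are those given in the text.

```python
def polarity_label(score: float) -> str:
    """Map a polarity score in [-1, 1] to a label (sign-based rule is assumed)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Tweet counts per label as reported in the text.
reported = {"positive": 44_076, "neutral": 30_946, "negative": 29_524}
total = sum(reported.values())

# Percentage share of each label, rounded to two decimal places.
shares = {label: round(100 * n / total, 2) for label, n in reported.items()}
print(total, shares)
```

The three counts sum to 104,546, and the recomputed shares match the reported 42.16%, 29.60% and 28.24%.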

While Figure 4 shows that most of the tweets posted are based on the personal opinion of the Twitter user, Figure 5 shows that positive tweets formed the largest share. Figure 6 depicts the positive, negative and neutral tweet rates over the data collection period. The chart shows that the number of positive tweets about BLM was consistently greater than the number of negative or neutral tweets throughout the data collection period. At the beginning of the data collection period, the number of negative tweets about BLM was greater than the number of neutral tweets. However, between 3 June and 6 June, the number of negative tweets fell below the number of neutral tweets, before increasing again during the remainder of the data collection period.

Figure 6.

BLM tweet polarity score chart on a daily basis.

4.2. LDA topic modelling

LDA is a generative statistical model that uses an unsupervised machine learning technique. It is used to identify topics or clusters of keywords. LDA first assumes that there are ‘k’ topics across the documents. Documents, in this case, refer to the extracted tweets containing the BLM hashtag. LDA assigns keywords to different topics by estimating, for each keyword in a document/tweet, the probability that it belongs to a specific topic. This process is repeated until all the keywords are assigned to their respective topics. An LDA model was developed for this study and was used to analyse the 104,546 tweets containing the BLM hashtag that were extracted from 25 May 2020 to 10 June 2020. The LDA model identified a total of 500 topics. This number was chosen because the dataset is large and because the LDA model’s coherence score for 500 topics is around 0.6, indicating that the topics are of good quality. Each topic consisted of 20 keywords associated with the topic. The topics and their keywords were then analysed further to determine the dominant keyword or construct representing a specific area of interest. Some of the areas identified include equity, inclusion, justice, racism, riots and violence. Figure 7 shows the dominant keywords that characterised the discussion on the BLM hashtag.

Figure 7.

Word Cloud showing dominant keywords.

4.3. Biterm topic modelling

BTM is a topic modelling technique that is used to locate topics in collections of short texts. A BTM identifies topics by modelling word-to-word co-occurrence patterns, commonly referred to as biterms [36]. From these patterns, it learns how to model a topic and its keywords. LDA and BTM differ in their approach and capabilities. While LDA employs word-to-document co-occurrences, BTM uses word-to-word co-occurrences. Both LDA and BTM are used to identify topics and their associated keywords that represent a specific area of interest or a category. Some of the areas identified reflect or create a positive image of ‘BLM’ as a social justice movement; these are listed in Table 2.

Table 2.

Categories that create a positive image about BLM.

Categories/keyword	Related terms
Equity	equitable, equivalent, identity, humanity, equality, equally, equal, respect, uniting
Inclusion	inclusive, integrity, include, involve, involvement, recognise, supporting, accept, nurtured, extend
Justice	social justice, injustice, court justice, reform
Victim	victimhood, suffer, suffering, loss, accused
Peace	peaceful, peacefully, peaceful protest
Inequality	indifferent, inequality, inequity, wrongly, mistreat, misconduct, leftist, humiliation, biased, bias, disorder, discrediting, inhuman, privilege

The existence of certain keywords or categories as listed in Table 3 reflects negatively and creates a negative image of BLM as a violent movement.

Table 3.

Categories that create a negative image about BLM.

Categories/keyword	Related terms
Violence	violently, suppress, suppression, uncomfortable, police violence, brutality, cyberbullying, ruthless, vandalism, attack, abuses, attacked, upheaval, ignorant, oppressor
Protest	protester, protesting, demonstration, activists, movement, demands, support, antifa
Exclusion	unacceptable, offensive, revolution, revolutionary, exclude, divide, discriminate, discrimination, discriminatory, different, isolate
Racism	supremacist, supremacy, race, racial, racism, racist, racistinchief, radical
Riots	riotusa, revolt, outrage, disrupt, destroy, riot, rioter, rioting, revenge, loot, looters, escalate

Figure 8 shows the number of tweets corresponding to each of these categories or areas. The study results show that people who were tweeting about #BLM refer to being treated unequally or differently and are asking for equity, peace and justice. However, strong terms, such as violence, riots and protest, could be linked to misinformation and social noise. Certain tweets could indicate people opposing the discussion, whose role might be to spread misinformation and cause trouble. A large number of other participants going along might have other motives for being there, which can amount to social noise.

Figure 8.

The number of tweets related to each of the categories.
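
How tweets were matched to categories for Figure 8 is not described, so the following is only one plausible sketch: a tweet counts towards a category if it contains the category keyword or any of its related terms. The dictionary holds a small subset of the terms in Tables 2 and 3, and the sample tweets and function name are hypothetical.

```python
# A subset of categories and related terms from Tables 2 and 3.
CATEGORY_TERMS = {
    "Justice": {"justice", "injustice", "reform"},
    "Peace": {"peace", "peaceful", "peacefully"},
    "Riots": {"riot", "rioter", "rioting", "loot", "looters"},
}

def categories_for(tokens):
    """Return the categories whose terms appear among a tweet's tokens."""
    found = set(tokens)
    return {cat for cat, terms in CATEGORY_TERMS.items() if terms & found}

# Hypothetical pre-tokenised tweets.
tweets = [
    ["peaceful", "protest", "justice"],
    ["riot", "police"],
    ["demand", "justice"],
]

# Count each tweet once per category it matches.
tweets_per_category = {cat: 0 for cat in CATEGORY_TERMS}
for tokens in tweets:
    for cat in categories_for(tokens):
        tweets_per_category[cat] += 1

print(tweets_per_category)
# prints: {'Justice': 2, 'Peace': 1, 'Riots': 1}
```

Under this rule a single tweet can contribute to several categories, so the category totals need not sum to the number of tweets.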

4.4. Word associations

Word associations aid comprehension of how words are linked together. Figure 9 is an example of the type of network diagram created to better understand the relationships between the dominant keywords in the BLM data. From Figure 9, it can be inferred that the word ‘blm’ is at the centre of all the tweet data, because the data used in this research are about the BLM movement. The keyword ‘black’ forms the other centre of the network graph, meaning that the hashtag BLM was mainly used in reference to black people. It is notable that the ‘black’ keyword is directly connected to other keywords, such as justice, white, racism, violence, police and support. The network graph in Figure 9 also shows that the terms ‘police’ and ‘white’ seem to be more associated with negative terms, such as racist, supremacy and brutality. Figure 9 is an interactive graph built using the pyvis library. The link to the interactive graph is https://drive.google.com/file/d/13LK4iRJKk_td7T5wSKydzkadX4X9N33H/view?usp=sharing (Note: The file needs to be downloaded to access the graph).

Figure 9.

Network graph representing term association between dominantly used terms.
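
The data structure behind a term-association network can be sketched without a plotting library. The example below stores an undirected weighted graph as a dictionary of dictionaries and lists a term's neighbours by association strength. It assumes that edges are weighted by co-occurrence counts, which the text does not specify, and it borrows a few pair counts from Table 4 purely as sample data. It is not the pyvis graph shown in Figure 9.

```python
from collections import defaultdict

# A few term pairs and counts taken from Table 4, used here as edge weights.
pairs = {
    ("police", "brutality"): 1529,
    ("police", "violence"): 321,
    ("racism", "police"): 189,
    ("systemic", "racism"): 526,
}

# Undirected weighted graph stored as a dict of dicts.
graph = defaultdict(dict)
for (a, b), weight in pairs.items():
    graph[a][b] = weight
    graph[b][a] = weight

def neighbours(term):
    """Terms linked to `term`, strongest association first."""
    return sorted(graph[term].items(), key=lambda item: -item[1])

print(neighbours("police"))
# prints: [('brutality', 1529), ('violence', 321), ('racism', 189)]
```

A structure of this kind could then be handed to a visualisation library to draw nodes and weighted edges.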

4.5. Word associations using Bigrams

A bigram is a sequence of two adjacent words. As part of further data analysis, bigrams were constructed for the entire tweet data. The frequently occurring bigrams were then analysed, and interesting word associations were identified in the tweet data. Table 4 shows the top 36 bigrams in the data.

Table 4.

Top 36 bigrams in the data.

Bigrams/words	Association Count	Bigrams/words	Association Count
(‘blm’, ‘protests’)	1601	(‘racial’, ‘inequality’)	55
(‘police’, ‘brutality’)	1529	(‘hate’, ‘white’)	50
(‘systemic’, ‘racism’)	526	(‘black’, ‘culture’)	49
(‘peaceful’, ‘protest’)	441	(‘victims’, ‘police’)	43
(‘justice’, ‘george’)	410	(‘racial’, ‘discrimination’)	39
(‘police’, ‘violence’)	321	(‘racism’, ‘discrimination’)	39
(‘justice’, ‘blm’)	251	(‘blm’, ‘political’)	38
(‘blm’, ‘policebrutality’)	246	(‘racial’, ‘bias’)	38
(‘justice’, ‘peace’)	209	(‘revolution’, ‘blm’)	37
(‘racism’, ‘police’)	189	(‘police’, ‘misconduct’)	35
(‘racial’, ‘injustice’)	172	(‘blm’, ‘socialjustice’)	32
(‘demand’, ‘justice’)	153	(‘equity’, ‘fairness’)	32
(‘end’, ‘racism’)	121	(‘diversity’, ‘inclusion’)	30
(‘equality’, ‘blm’)	90	(‘white’, ‘leftists’)	28
(‘blm’, ‘antiracism’)	82	(‘cultural’, ‘revolution’)	17
(‘blm’, ‘endracism’)	76	(‘identity’, ‘politics’)	17
(‘hate’, ‘blm’)	63	(‘humiliate’, ‘police’)	15
(‘equal’, ‘rights’)	55	(‘racial’, ‘division’)	13
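
A frequency table of the kind shown in Table 4 can be approximated by counting adjacent word pairs across all tweets. The sketch below uses a hypothetical toy corpus and is not the procedure used to produce the reported counts.

```python
from collections import Counter

# Hypothetical pre-tokenised tweets.
corpus = [
    ["blm", "protests", "police", "brutality"],
    ["end", "police", "brutality"],
    ["blm", "protests"],
]

bigram_counts = Counter()
for tokens in corpus:
    # zip pairs each token with its successor, giving adjacent word pairs.
    bigram_counts.update(zip(tokens, tokens[1:]))

# Show the most frequent bigrams first.
for bigram, count in bigram_counts.most_common(2):
    print(bigram, count)
```

Sorting the counter by frequency and keeping the top rows gives a ranked list of bigrams and association counts.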

5. Discussion

The results have shown that the existence of social noise can affect the original message by diluting the focus or changing the subject to distort the original intended message through the spread of misinformation. From the topic modelling and data analysis techniques used in this study, we were able to identify a set of keywords that fits the four social noise constructs/categories identified originally by Zimmerman [19]. The study also identified sets of keywords that did not match the original four constructs and were initially classified as ‘other’. After a close examination of the list of keywords generated, we used the term association method to better understand the original purpose of the tweets and the context in which these tweets were generated. This led to the creation of two new social noise constructs.

Based on the analysis of the data and the relationship between various tweets, the researchers decided to create two additional social noise constructs/categories. The two additional constructs are ‘affiliation and politics’ and ‘norms and beliefs’. It was observed that most of the social noise generated around these two categories is tribal in nature. People tend to agree with and forward the original tweets without much thought or further examination of the facts. Both affiliation and politics, and norms and beliefs, are deeply rooted in culture. Table 5 shows the expanded constructs and categories representing social noise. The table provides a definition of each construct and sample keywords used to indicate or describe each construct. Further studies are needed to understand the factors that motivate people to participate in misinformation or social noise.

Table 5.

Social noise constructs with two additional constructs and their keywords.

Construct	Definition	Sample keywords
Image curation	Is the effort by a social media user, consciously or unconsciously, to craft their online identities	Please, willing, identity, recommend, report, thank, determination, express
Relationship management	Refers to a user’s understanding of their roles and responsibilities within social institutions and the level of confidence in their personal beliefs	Love, want, help, please, equivalent, humanity, stoptheviolence, awareness, defence, influence, mutual aid, absurdity, ordinated, advising
Conflict engagement	Is the level of social conflict with which a user is comfortable	Report, stop, humiliation, mistreat, misconduct, controversial, discord
Cultural agency	Is characterised by civic participation in social issues and is exhibited by individuals who believe in their own power to be heard and to shape culture and beliefs	radical, recommend, wedemandjustice, ourlivesmatter, revolution, leadership, diversity, community, social, systemic
Affiliation and politics	Is characterised by loyalty to a political party, religion or an organisation. This could include people paid to carry out certain activities or advertise certain products	foxnewsisracist, systemicracism, supremacist, supremacy, racistwhiteleft, fascist, abiding
Norms and beliefs	Is characterised by deep beliefs, culture, ideology, religion, a cause and so on.	changeculture, wedemandjustice, equity, inclusion, fightracism, socialists, communists, anarchists, conservative, extremist

On 26 May, a Twitter user tweeted, ‘How is it we are still dealing with this? How long would it take to solve the problem if a Black man did this to a cop? STOP IT! The 4 cops involved were fired. They should all be jailed for life for murder. #BLM’ This is an example of image curation and cultural agency, in which a Twitter user is constructing their online persona while also participating in social concerns to get their opinions heard. Another tweet read, ‘Joe Biden has been part of the system for 40 years. The system is broken. Which is exactly why I vote for Donald Trump. Criminal reform. More black jobs than any point in history. Record funding for black colleges. #GeorgeFloydWasMurdered #blm.’

This tweet is an example of political affiliation, in which the Twitter user declares their support for a political party. The Twitter user, however, uses the BLM movement to promote a political party, which is strong evidence of social noise.
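As a rough illustration of how the sample keywords in Table 5 could be used to flag candidate constructs in a tweet, the following minimal Python sketch matches tweet tokens against abridged keyword sets. The exact-match rule, the `match_constructs` helper and the example tweet are assumptions made for illustration; in this study, constructs were derived from topic modelling and a close examination of the keywords, not from this rule.

```python
import re

# Abridged sample keywords copied from Table 5; the matching rule is an assumption.
CONSTRUCT_KEYWORDS = {
    "Image curation": {"identity", "recommend", "thank", "express"},
    "Conflict engagement": {"stop", "humiliation", "mistreat", "misconduct", "discord"},
    "Cultural agency": {"radical", "recommend", "wedemandjustice", "ourlivesmatter", "revolution"},
    "Affiliation and politics": {"foxnewsisracist", "supremacist", "fascist"},
    "Norms and beliefs": {"changeculture", "wedemandjustice", "equity", "inclusion", "fightracism"},
}


def match_constructs(tweet):
    """Return the constructs whose sample keywords appear in the tweet."""
    # Lower-case the text and keep alphabetic tokens, so '#WeDemandJustice'
    # becomes the token 'wedemandjustice'.
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    return sorted(
        name for name, keywords in CONSTRUCT_KEYWORDS.items() if tokens & keywords
    )


# Hypothetical tweet, not taken from the dataset
example = "We demand equity and inclusion #WeDemandJustice"
print(match_constructs(example))
```

A single tweet can match several constructs, which mirrors the overlap in Table 5, where some keywords are listed under more than one construct.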

Term association provides a better understanding of the context in which terms within the tweets are used. Based on the repeated occurrences of terms in the tweets, term association visually represents the distance between certain concepts. As illustrated in Figure 9, the terms ‘blm’ and ‘black’ are located at the centre of the graph, indicating that these are the two terms with the highest occurrence in the dataset. Using bigrams to create word associations (Table 4) helps in arriving at the observations listed below.

From the term associations, it is evident that the term ‘BLM’ was most commonly used with the word ‘protests’. From this, it can be understood that people generally meant BLM protest when they tweeted about it. Justice, police brutality, equality, antiracism, end racism, hate, political, revolution and social justice were also frequently associated with the word ‘BLM’. These associations indicate that the term ‘BLM’ was primarily used to refer to and identify with police brutality and inequality towards black people.

The words brutality, violence, racism, victims, misconduct and humiliate were frequently associated with the word ‘police’. It is evident from this that individuals had a negative perception of the word ‘police’ and that the term ‘police’ was mainly used to refer to police aggression against black people.

The words systemic, injustice, end, inequality, discrimination, bias and division were mainly associated with words having the base word ‘race’. This demonstrates that the terms ‘racism’ and ‘racial’ were primarily used to refer to racism towards black people in this BLM dataset.

The word ‘protest’ was mainly connected with the word ‘peaceful’, suggesting that many people used the term BLM to refer to a peaceful protest.

The words George, peace and demand were usually connected with the word ‘justice’. It can be concluded that the word ‘justice’ was employed to seek/demand justice for George Floyd.

Equal and equity were linked to the words ‘rights’ and ‘fairness’, respectively, implying that BLM was employed to promote equality.

The word ‘white’ was mainly associated with the words ‘hate’ and ‘leftists’, suggesting that many people referred to white people as ‘leftists’ and ‘haters’. The word ‘culture’ was associated with the word ‘black’, suggesting that ‘black’ was used to refer to culture.

It is also apparent from the bigrams table that the concepts of diversity and culture were linked to inclusion and revolution, respectively.

Another noteworthy finding is that the term ‘identity’ was commonly used in conjunction with the term ‘politics’, which forms the basis for the ‘affiliation and politics’ construct.
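The observations above amount to reading Table 4 as a weighted graph in which each bigram links two terms. As a minimal Python sketch of that reading, the code below builds a symmetric association map from a subset of the Table 4 counts and lists the terms linked to a given word. The `associations` and `neighbours` helpers are hypothetical names, and treating each bigram as an undirected edge is an assumption, not necessarily how the network graph in Figure 9 was produced.

```python
from collections import defaultdict

# Subset of Table 4: (word, word) -> association count
BIGRAMS = {
    ("blm", "protests"): 1601,
    ("police", "brutality"): 1529,
    ("police", "violence"): 321,
    ("racism", "police"): 189,
    ("victims", "police"): 43,
    ("police", "misconduct"): 35,
    ("humiliate", "police"): 15,
}


def associations(bigrams):
    """Build a symmetric map: term -> {associated term: count}."""
    assoc = defaultdict(dict)
    for (first, second), count in bigrams.items():
        # Assumption: word order within the bigram is ignored
        assoc[first][second] = count
        assoc[second][first] = count
    return assoc


def neighbours(assoc, term):
    """Terms associated with `term`, strongest association first."""
    linked = assoc.get(term, {})
    return sorted(linked, key=linked.get, reverse=True)


assoc = associations(BIGRAMS)
print(neighbours(assoc, "police"))
```

For the word ‘police’, the resulting list follows the order of the association counts in Table 4, from ‘brutality’ down to ‘humiliate’.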

Sentiment analysis is one of the most well-studied methods for determining the underlying sentiment, views, attitudes or opinions expressed about a situation, including agreement or disagreement. As an example, a sentiment study of the 2016 US presidential election using TextBlob revealed that Hillary Clinton received the most significant number of positive tweets, whereas tweets about Bernie Sanders were primarily based on the Twitter users’ personal opinions [39]. In this study, sentiment analysis helps us to better understand Twitter users’ views, attitudes and opinions about the BLM movement. Based on the subjectivity and polarity scores shown in the results section, we can see that most of the tweets connected to the BLM data were based on the Twitter users’ personal opinions, and many Twitter users had positive sentiments about the BLM movement. However, the existence of social noise and misinformation on social media does affect the overall sentiment. The problem with social noise is that most people are not aware of their role as participants in spreading misinformation for one or more of the reasons listed in the social noise constructs in Table 5.
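To make the use of polarity and subjectivity scores concrete, the following minimal Python sketch turns a pair of scores into labels. It assumes polarity lies in [-1, 1] and subjectivity in [0, 1], which is the range TextBlob reports; the cut-off values (zero for polarity, 0.5 for subjectivity) and the function names are assumptions for illustration and are not necessarily the thresholds used in this study.

```python
def polarity_label(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def subjectivity_label(subjectivity, threshold=0.5):
    """Map a subjectivity score in [0, 1] to a label.

    The 0.5 threshold is an assumed cut-off, exposed as a parameter.
    """
    return "subjective" if subjectivity > threshold else "objective"


# Hypothetical (polarity, subjectivity) scores for three tweets
scores = [(0.6, 0.9), (0.0, 0.1), (-0.4, 0.7)]
labels = [(polarity_label(p), subjectivity_label(s)) for p, s in scores]
print(labels)
```

Keeping the threshold as a parameter makes it easy to check how sensitive the subjective/objective split is to the chosen cut-off.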

5. Conclusion

Sentiment analysis and topic modelling are data analytics techniques used to process and mine large amounts of data. In this article, we examined the impact of social noise on the original intended message in the context of BLM movement data downloaded from Twitter. The study examined the notion of social noise, in which a participant engaged in social noise might unintentionally alter or dilute the original intended message of the movement. Sentiment analysis and topic modelling were performed on the 104,546 tweets extracted for the ‘BLM’ hashtag from 25 May 2020 to 10 June 2020. The sentiment analysis results show that 76.12% (79,580 tweets) of the tweets were subjective, while 23.88% (24,966 tweets) were objective or focused on facts. Polarity scores further revealed that 42.16% (44,076 tweets) were positive statements, 29.60% (30,946 tweets) were neutral statements and 28.24% (29,524 tweets) were negative statements.
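The percentages reported above follow directly from the tweet counts. As a quick arithmetic check, the short Python sketch below recomputes each share from the counts and the total of 104,546 tweets; the `share` helper is a hypothetical name introduced for illustration.

```python
TOTAL_TWEETS = 104_546

# Tweet counts as reported in the text
counts = {
    "subjective": 79_580,
    "objective": 24_966,
    "positive": 44_076,
    "neutral": 30_946,
    "negative": 29_524,
}


def share(count, total=TOTAL_TWEETS):
    """Percentage of the total, rounded to two decimal places."""
    return round(100 * count / total, 2)


percentages = {label: share(count) for label, count in counts.items()}
for label, value in percentages.items():
    print(f"{label}: {value:.2f}%")
```

Both the subjectivity split and the polarity split sum to the full set of 104,546 tweets.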

The sentiment analysis results showed that most of the tweets were based on Twitter users’ personal opinions and that the largest share of Twitter users had a favourable opinion of the BLM movement. Although at the beginning of the data collection period the number of negative tweets about BLM exceeded the number of neutral tweets, from 3 June to 6 June the number of negative tweets fell below the number of neutral tweets, and it subsequently increased during the remainder of the data collection period. This can possibly be attributed to a decline in social noise and misinformation over time.

LDA and BTM techniques were used to identify keywords and terms that indicate the existence of social noise. The results show that both methods produced topics that are helpful in identifying keywords and terms that indicate the existence of social noise constructs. However, the biterm method, which is normally used for short messages and is based on two co-occurring words, produced better results than LDA. The analysis of the identified topics, together with the results from term association using the bigram network graph, resulted in identifying two additional constructs, namely ‘affiliation and politics’ and ‘norms and beliefs’. Such constructs could be helpful in explaining social noise and the role it might play in diluting or altering the original intended message. This is necessary in dealing with misinformation that is generated by users who did not have the intent to create or spread misinformation. However, their unintentional participation and action in forwarding and spreading misinformation can contribute greatly to the social noise problem.
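The biterm method mentioned above works on pairs of words that co-occur in the same short text, regardless of whether they are adjacent. The following minimal Python sketch shows only this pair-extraction step, not the topic model itself; the `biterms` helper and the sample tweet are hypothetical.

```python
from itertools import combinations


def biterms(tokens):
    """Return all unordered word pairs co-occurring in one short text."""
    # Sorting each pair makes ('police', 'blm') and ('blm', 'police') identical
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical, already-preprocessed tweet
tweet = ["blm", "police", "brutality"]
pairs = biterms(tweet)
print(pairs)
```

Unlike a bigram, which pairs only neighbouring words, a biterm pairs every two words in the tweet, so a three-word tweet yields three biterms but only two bigrams.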

6. Limitations and further studies

While this study analysed the data related to the BLM hashtag to identify misinformation and social noise, it has some limitations. For instance, images, videos and emojis were removed from the tweets during the data preprocessing stage. Furthermore, the data for this study were collected from 25 May 2020 to 10 June 2020. A more comprehensive study over a more extended period would create a richer picture and enable a deeper examination of the impact of misinformation and social noise on the intended message of movements, such as BLM.

Further studies using more advanced methods, such as natural language processing or deep learning models, and data from other social media platforms, such as Facebook, will help better understand the impact of social noise and misinformation. The topic modelling techniques used in this article are also limited to the LDA and biterm methods. Different techniques, such as latent semantic analysis (LSA) and k-means clustering, can be used for further study. Furthermore, emotion analysis can be applied to identify emotions, such as anger, sadness and happiness, for each tweet. It is also important to note that this study is part of an ongoing research project to understand social noise and its impact on social media users.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Nayana Pampapura Madali

Manar Alsaid

Suliman Hawamdeh

References

1. Chander. The racist algorithm? Mich Law Rev 2017; 115: 1023–1045.

2. Chambers. Truth, deliberative democracy, and the virtues of accuracy: is fake news destroying the public sphere? Polit Stud 2021; 69: 147–163.

3. Schiffrin. Disinformation and democracy: the internet transformed protest but did not improve democracy. J Int Aff 2017; 71: 117–126.

4. Wardle, Derakhshan. Information disorder: toward an interdisciplinary framework for research and policy making. Council of Europe, https://edoc.coe.int/en/media/7495-information-disorder-toward-an-interdisciplinary-framework-for-research-and-policy-making.html

5. Zimmerman, Behpour, Hawamdeh. Factors impacting social media users’ information behavior: the concept of social noise. In: iConference 2020 proceedings, 2020, https://www.ideals.illinois.edu/handle/2142/106527

6. Black Lives Matter, https://en.wikipedia.org/wiki/Black_Lives_Matter (accessed 18 April 2021).

7. Anderson. Demystifying the Arab spring: parsing the differences between Tunisia, Egypt, and Libya. Foreign Aff 2011; 90: 2.

8. Howard, Duffy, Freelon, et al. Opening closed regimes: what was the role of social media during the Arab spring? https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2595096

9. Bellin. Reconsidering the robustness of authoritarianism in the middle east: lessons from the Arab spring. Comp Polit 2012; 44: 127–149.

10. Zhang, Counts. Modeling ideology and predicting policy change with social media: case of same-sex marriage. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, Seoul, Republic of Korea, 18–23 April 2015, pp. 2603–2612. New York: Association for Computing Machinery.

11. Gibson. Same-sex marriage and social media: How online networks accelerated the marriage equality movement. London: Routledge, 2018.

12. Oleszczuk. #hashtag: how selected texts of popular culture engaged with sexual assault in the context of the Me Too Movement in 2019. New Horiz Engl Stud 2020; 4: 208–217.

13. Clare. Black lives matter: the Black lives matter movement in the national museum of African American history and culture. Transfers 2016; 6: 122–125.

14. Ince, Rojas, Davis. The social media response to Black Lives Matter: how Twitter users interact with Black Lives Matter through hashtag use. Ethn Racial Stud 2017; 40: 1814–1830.

15. Francis JNP. A macromarketing call to action – because black lives matter! J Macromarket 2021; 41: 132–145.

16. Brock. From the Blackhand side: Twitter as a cultural conversation. J Broadcast Electron Media 2012; 56: 529–549.

17. Wardle. Fake news. It’s complicated. First Draft, https://medium.com/1st-draft/fake-news-its-complicated-d0f773766c79 (2017, accessed 5 February 2022).

18. Del Vicario, Bessi, Zollo, et al. Echo chambers in the age of misinformation. arXiv [cs.cy], 2015, http://arxiv.org/abs/1509.00189

19. Zimmerman. Examining human information behavior on social media: Introducing the concept of social noise. PhD Dissertation, University of North Texas, Denton, TX, 2020.

20. Del Vicario, Bessi, Zollo, et al. The spreading of misinformation online. Proc Natl Acad Sci USA 2016; 113: 554–559.

21. Lin K-Y, H-P. Why people use social networking sites: an empirical study integrating network externalities and motivation theory. Comput Human Behav 2011; 27: 1152–1161.

22. Shao. Understanding the appeal of user-generated media: a uses and gratification perspective. Internet Res 2009; 19: 7–25.

23. Kietzmann, Hermkens, McCarthy, et al. Social media? Get serious! Understanding the functional building blocks of social media. Bus Horiz 2011; 54: 241–251.

24. Scissors, Burke, Wengrovitz. What’s in a Like? Attitudes and behaviors around receiving Likes on Facebook. In: Proceedings of the 19th ACM conference on computer-supported cooperative work & social computing, San Francisco, CA, 27 February–2 March 2016, pp. 1501–1510. New York: Association for Computing Machinery.

25. Lance Bennett, Livingston. The disinformation age: Politics, technology, and disruptive communication in the United States. Cambridge: Cambridge University Press, 2020.

26. Luengo, García-Marín. The performance of truth: politicians, fact-checking journalism, and the struggle to tackle COVID-19 misinformation. Am J Cult Sociol 2020; 8: 405–427.

27. Nunziato. Misinformation Mayhem: social media platforms’ efforts to combat medical and political misinformation. First Amend L Rev 2020; 19: 32.

28. Kanno-Youngs, Kang. ‘They’re Killing People’: Biden denounces social media for virus disinformation. The New York Times, 16 July 2021, https://www.nytimes.com/2021/07/16/us/politics/biden-facebook-social-media-covid.html (accessed 18 July 2021).

29. Vosoughi, Roy, Aral. The spread of true and false news online. Science 2018; 359: 1146–1151.

30. Hogan. The presentation of self in the age of social media: distinguishing performances and exhibitions online. Bull Sci Technol Soc 2010; 30: 377–386.

31. Kim EH-J, Jeong, Kim, et al. Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news. J Inf Sci Eng 2016; 42: 763–781.

32. Chatzakou, Vakali. Harvesting opinions and emotions from social media textual resources. IEEE Internet Comput 2015; 19: 46–50.

33. Routray, Swain, Mishra. A survey on sentiment analysis. Int J Comput Appl Technol 2013; 76: 1–8.

34. Chen, Lyu, Yang, et al. In the eyes of the beholder: sentiment and topic analyses on social media use of neutral and controversial terms for Covid-19. arXiv 2004, https://arxiv.org/abs/2004.10225

35. Blei, Jordan. Latent Dirichlet allocation. J Mach Learn Res 2003; 3: 993–1022.

36. Wijffels. Biterm topic models for short text [R package BTM version 0.3.6], https://cran.r-project.org/web/packages/BTM (2021, accessed 5 November 2021).

37.