Abstract
After a few years focusing on issues such as electoral prediction through social media data, many analysts turned their attention toward fake news spreading and misinformation. A coherent next step in elections research through social media data would be identifying what makes communities and individuals less open to manipulation. Misinformation is not simply bad or false information but selective information circulated among isolated and unconnected groups. Here, I will discuss common cognitive biases in link sharing behavior and its effects on politically shaped communities in the Twitter public debate on the 2019 Spanish general election campaign. Finally, I will present and discuss some data-driven mechanisms that may contribute to the mitigation of mass manipulation.
How Social Media Shaped Politics (and Vice Versa)
The 2008 US Presidential election is often considered the first state-level campaign in which social media acquired a central role (Metzgar & Maruggi, 2009). At the beginning of the 2010s, electoral prediction through social media data became the holy grail of social big data analysts, who attempted to move from offline survey designs to online data mining through Social Network Analysis (SNA) and Natural Language Processing (NLP) methods (Gayo-Avello, 2013). As more policy makers and marketers began to use and misuse social media—intervening in conversations and making vote forecasting even more complicated and elusive for analysts—more researchers started seeing that grail as a jar full of sour grapes and turned their attention toward fake news spreading and misinformation (Grinberg et al., 2019).
After a very intense decade in which both the benefits and the hazards of big data have been exposed, experts—and hopefully, society as well—have shifted from a kind of ingenuous positivism to a sort of skeptical utilitarianism in their relationship with social media. Nowadays, we are all quite aware that social media is not an unaltered and sterile laboratory in which we can measure social reality without acting upon it, but quite the opposite: a heavily coveted and disputed social and political arena with idiosyncratic forms of reciprocal influence. In my view, a coherent next step in elections research through social media data would be identifying what makes communities and individuals less open to manipulation. This brings us to the heart of the matter, which is not only political-communicational but also psycho-sociological.
Here, I will discuss the link sharing behavior of politically shaped communities in the Twitter public debate on the 2019 Spanish general election campaign through SNA techniques. I will explore the extent to which specific links are unevenly spread within groups insofar as they confirm established beliefs (i.e., confirmation bias) and benefit the in-group (i.e., in-group bias). Finally, I will present and discuss some data-driven mechanisms that may contribute to the mitigation of mass manipulation.
Cognitive Shortcuts (se non è vero, è ben trovato)
The famous Italian aphorism above—which can be translated into English as “Even if it is not true, it is well conceived”—eloquently captures one of the central issues of our times: the relations that humans tend to establish with information are not principally mediated by truth and factuality, but by belief confirmation and convenience. One may say that cognitive biases are not a new phenomenon, and that misinformation is anything but new. That is plainly true. However, it is also true that on 21st-century social media, citizens are expected to be not passive information consumers but active producers, and that fact gives “old” misinformation diffusion patterns a totally different feel, especially when it comes to election campaigns, which usually call upon the digital mobilization of voters.
Cognitive biases have very particular consequences on the Internet and social media. In recent years, expressions such as “filter bubbles” (Pariser, 2011) or “echo chambers” (DiFonzo, 2011) have been quite prominent in both academic and non-academic circles. These expressions emphasize the collective effects of the cognitive biases that orient the actions individuals take on websites and social media: actions such as the selection of contacts or the cherry-picking of content, which can convert the Internet into a custom menu for your eyes only. These actions lead to the creation and maintenance of closed spaces in which belief systems are reinforced and reproduced. Paradoxically, one can say that the democratization of mass communication is contributing to the atomization of the public sphere.
The aforementioned expressions are metaphors that point to a very simple idea: the lack of diversity in information consumption and social relations is a problematic issue that makes our democracies more vulnerable. I am aware that selective information circulated among isolated and unconnected groups is not often viewed as a form of misinformation; however, given its effects, I will treat it as such here. Seen from this perspective, it seems more than reasonable to argue that submitting information to review and dissemination by multiple, heterogeneous audiences may contribute to misinformation mitigation. That may happen in terms of both how audiences shape information (i.e., different social groups can detect and flag more problems, errors, and biases in the information) and how information shapes audiences (i.e., having access to multiple opinions and worldviews improves coexistence in complex societies).
Most analysts and observers are rather pessimistic, arguing that these cognitive mechanisms are part of human psychological nature and that the room for action is very limited (Self, 2016). However, as I see it, the chance for psychosocial transformation might not be so small: information and knowledge are historically proven tools with the capacity to correct and upgrade human cognitive limitations (e.g., although we perceive the Earth as flat, we positively know it is a globe). In that sense, I would argue that knowing to what extent a piece of information is being shared by like-minded people—that is, people who share worldviews and beliefs, and who tend to cognitively favor the same social groups—can effectively mediate information credibility-attribution processes.
Link Sharing in Social Media (Let Me Show You How Right I Am)
Cognitive biases on social media seem particularly relevant when it comes to link sharing practices. Link sharing is a very common practice on the social media platforms that allow it, as it permits users to share native and external content that may add great value and enhance the user experience. In the specific case of political debates on platforms such as Facebook or Twitter, link sharing can be used as a deliberative technique in itself: external news, opinion pieces, or others’ tweets can serve as arguments and even as evidence.
In the campaign for the Spanish general election in April 2019, 77.58% of relevant tweets—identified through SNA techniques 1 —contained at least one link, either to a tweet or to an external site. The total number of shared links reached 8.72 million, and 83.01% of them were spread within only one of the six detected communities, which correspond to the principal Spanish parties. As the figures suggest, link sharing was a very group-oriented practice in the analyzed conversation. The chance of sharing information that disfavors the in-group was minimal, as was the chance of being exposed to such content in echo chamber–shaped social media.
At this point, to what extent can we affirm that the truth or falsity of a particular piece of information is a relevant factor in ongoing misinformation processes? It goes without saying that the civic values and professional ethics of those committed to journalism and information spreading are opposed to falsity and manipulation. But most troubling of all is that, even assuming that 100% of the links shared in the Spanish election campaign were true and peer-verified pieces of information, the overall result would still imply massive misinformation as a consequence of how people pre-process and pre-select pieces of information based on biased criteria.
Audience Analytics as a Means of Misinformation Prevention
Knowing the audience—who they are, what they care about, what makes them angry, and so on—is today a central aspect of the news business. The great majority of media companies have already embraced different kinds of data-driven models, which have affected several aspects of the industry. Social audience analytics have been largely exploited for value creation and business boosting, but their role in political misinformation prevention remains largely unattended. Here, I would like to draw attention to the following relationships established between the users who share the same link on Twitter. 2
A very simple indicator in relational datasets (i.e., when data reflect not individual properties such as “gender,” “age,” “education level,” or “bounce rate” but node links that form networks) is the Clustering Coefficient. The measure indicates how well connected each node’s neighbors are within a network, taking values between 1 (i.e., all the neighbors are connected to each other) and 0 (i.e., none of the neighbors are connected to each other). While the Clustering Coefficient is a nodal measure, the average over all individual nodes’ Clustering Coefficients indicates how strongly nodes in a network tend to cluster (Watts & Strogatz, 1998).
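The measure described above is straightforward to compute. As a minimal sketch in pure Python (the four-user network and all names here are hypothetical, not taken from the actual dataset), treating ties as undirected:

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than 2 neighbors: no pairs to check
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean of the local coefficients over all nodes (Watts & Strogatz, 1998)."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical four-user network: a, b, and c are all tied to each other,
# while d is tied only to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(local_clustering(adj, "a"))  # 1.0: a's neighbors (b and c) are connected
print(local_clustering(adj, "c"))  # 1/3: only 1 of c's 3 neighbor pairs is linked
print(average_clustering(adj))     # (1 + 1 + 1/3 + 0) / 4 ≈ 0.583
```

Real Twitter follow relationships are directed; this sketch assumes they have been symmetrized into undirected ties before the computation.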
Let’s now consider two different networks (Figure 1). Both have a very similar number of nodes but differ significantly in structure. Network A contains 980 following relationships among 411 users who shared a particular link. 3 The users who shared the link are distributed across the six clusters of the network. In contrast, Network B contains 3,183 following relationships among 370 users who shared another link. 4 All the users who shared that link belong to the same cluster. In other words, the nodes of Network A are politically diverse and mixed, whereas the nodes of Network B are very much alike.

Network A (411 nodes and 980 links) versus Network B (370 nodes and 3,183 links).
The average Clustering Coefficient of a network is higher when its nodes are strongly bonded and lower when they are weakly related. In our case, the low average Clustering Coefficient of Network A may well derive from the fact that the users who shared that link belong to different clusters; that is to say, they probably support different parties and defend different ideas. In contrast, all the nodes that make up Network B belong to the same cluster, which directly results in greater relationship density and a higher average Clustering Coefficient.
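The contrast between the two kinds of audiences can be illustrated with a deterministic toy example (the sizes and structures below are illustrative assumptions, not the actual Networks A and B): eight sharers who all follow one another, versus eight sharers from different communities linked only by a chain of follows, again treating follow ties as undirected.

```python
from itertools import combinations

def avg_clustering(nodes, edges):
    """Average clustering coefficient of an undirected graph given as an edge list."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for n in nodes:
        k = len(adj[n])
        if k < 2:
            continue  # nodes with fewer than 2 neighbors contribute 0
        triangles = sum(1 for u, v in combinations(adj[n], 2) if v in adj[u])
        total += triangles / (k * (k - 1) / 2)
    return total / len(nodes)

sharers = list(range(8))
# Network B-like case: all eight sharers follow one another (a single clique).
clique = list(combinations(sharers, 2))
# Network A-like case: sharers from different communities, tied only in a chain.
chain = [(i, i + 1) for i in range(7)]

print(avg_clustering(sharers, clique))  # 1.0: maximally homogeneous audience
print(avg_clustering(sharers, chain))   # 0.0: no two neighbors of any sharer are tied
```

Real audiences fall between these two extremes, but the direction of the contrast is the same: the more a link circulates within a single tightly knit community, the higher the average Clustering Coefficient of its sharers.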
One may argue that a low average Clustering Coefficient does not ensure the truth or quality of a link’s content at all. While this is true and should be acknowledged, it also has to be considered that the measure may prove a quite successful predictor of heterogeneous audiences, which are a necessary—though not sufficient—condition for global misinformation mitigation, especially in political or election conversations on social media.
Conclusion
User similarity is a key aspect of social networking sites: it guides self-learning algorithms just as past click behavior and search history do on regular websites. Content is usually spread within closed social circles with very tall and tough entry barriers (i.e., algorithmically reinforced cognitive biases) that prevent diverse opinions from spreading. Thereby, I suggest that knowing whether the audience of a particular piece of information is diverse or homogeneous can help detect misinformation processes from a global perspective.
The average Clustering Coefficient is a very simple measure that can be calculated for every set of user relationships on Twitter—and also on Instagram or Facebook, if the companies were willing to authorize it—and provides an easily understandable 0-to-1 index of how similar the users who boost a particular link on social media are. In aggregate, it could also provide useful insights into how like-minded the active audience of any given media outlet is.
The two analyzed cases look promising insofar as average Clustering Coefficients seem to be good candidates for audience heterogeneity prediction. Of course, sample broadening will be an indispensable condition for proper hypothesis testing. In addition, systematic and cautious evidence-based research will also be required to clarify some blind spots, such as the effect of network size on the measure and different sorts of contextual dependencies.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
