Abstract
This article seeks to address current debates comparing polls and opinion mining as empirically based figuration models of public opinion in the light of in-depth intellectual debates on the role and nature of public opinion that began after the French Revolution and the controversy over public opinion spurred by the invention of polls. Issues of historical quantification and re-conceptualisation of public opinion are addressed in four parts. The first summarises the history of the rise and fall of the concept of public opinion. The second re-examines the key controversies in the debates on the theoretical, empirical and social implications and consequences of the invention of polling. The third part scrutinises the datafication of public opinion that started with the polling industry and continues in the age of big data and data mining. The final section discusses the controversial potentials of opinion-mining technology and suggests ways in which social scientists could critically respond to the big data and opinion-mining challenges in order to reintegrate the ideas of publicness, the public and public sphere into public opinion research.
Introduction
Less than a century ago, the founders of opinion polling proudly proclaimed that a new method of data gathering had solved the age-old problem of reliable empirical public opinion research, a claim that also triggered an avalanche of critical concerns. Today, the primacy held by opinion polls for decades seems to be challenged by opinion mining or sentiment analysis based on machine learning and big data analytics. The social sciences, according to Boullier, are now moving away from the second generation, in which their key concept was
Opinion mining, as one of the most advanced areas of natural language processing designed to explore subjective opinions gathered from various sources on a particular topic, is believed to have ‘overcome the limits of traditional polls’ (Zhou et al., 2021) while ‘[p]olling was, and remains, in disarray. Now, it's being supplanted by data science: why bother telephoning someone to ask her opinion when you can find out by tracking her online?’ (Lepore, 2020). The data needed for analysis are harvested from social media, which journalists often use in the news to ‘infer and report public opinion’ and complement survey polling (Dubois et al., 2020: 57; McGregor, 2019: 1070). However, demonstrations of the potential advantages of opinion mining over polls overlook the fact that polls have not solved most of the problems of public opinion research at all, so it is not justified to set them as the gold standard for ‘testing’ the effectiveness and validity of opinion mining. The basic postulates of public opinion as a complex social phenomenon or figuration, laid down in a long and intellectually exciting history of public opinion theory and research, are largely forgotten in these debates.
This article seeks to address current debates comparing polls and opinion mining as empirically based figuration models of public opinion in the light of in-depth intellectual debates on the role and nature of public opinion that began after the French Revolution and ended with the rise of opinion polls, and the controversy over public opinion spurred by the invention of polls. Addressing issues of historical re-figurations and re-conceptualisations of public opinion could add depth to the current discussions about the effectiveness of public opinion research. Specifically, the article addresses the tendency to reconceptualise public opinion by extending the concept to new empirical settings, such as new technologies, business models and political regimes, rather than by introducing genuine normative or theoretical innovation.
The article is composed of four parts. The first summarises the history of the rise and fall of the concept of public opinion. The second re-examines the key controversies in the debates on the theoretical, empirical and social consequences of the invention of polling. The third part scrutinises the datafication of public opinion that started with the polling industry and continues in the age of big data and data mining. The final part suggests ways in which social scientists could critically respond to the big data and opinion mining challenges in order to reintegrate the ideas of publicness, the public and public sphere into public opinion research. The ideas outlined in the article are discussed in more detail in a forthcoming book on the impact of quantification and datafication on the critical conceptualisation of publicness (Splichal, 2022b).
The enlightenment legacy of the public and public opinion
The ideas of ‘the public’ and ‘public opinion’ were undoubtedly revolutionary when they were first conceptualised and popularised in the early 16th century. Machiavelli was probably the first to heretically trust the virtues of ‘general opinion’ more than his Prince. He praised ‘ordinary people’ as the most reliable guardians of human freedom and social prosperity, arguing that, ‘the multitude is wiser and more constant than a Prince’ and linking ‘
When the concept of public opinion was later adopted by the scholarly literature of the Enlightenment movement, it became the ‘magic key’ to the development of popular governance based on public opinion that provides
Rousseau attributed a similar role of the ‘censorial tribunal’ to public opinion, while he criticised ‘our political theorists’ for not recognising its great significance, ‘though success in everything else depends on it’. He added public opinion, together with morals and customs, as the fourth and the most important kind of law to the three conventional kinds of law – political, civil and criminal law regulating relations between the individual and society (Rousseau, 1762/2017: 28). For Rousseau, public opinion ‘forms the real constitution of the state; takes on new powers every day; restores or replaces other laws when they decay or die out, keeps a people in the spirit in which it was established, and gradually replaces authority by the force of habit’. Nevertheless, he also regarded public opinion as ‘the grave of a man's virtue and the throne of a woman's’, and characterised it as poison, slave, tyranny and yoke (Rousseau, 1762/1921).
Kant did not address ‘public opinion’ in his writings, but he should nevertheless be recognised as a founder of the modern conceptualisation of publicness and public opinion. His discussions of the supreme ethical
Kant's principle of publicness reconciled
At least since Hegel, public opinion has often been seen as of little importance, irrelevant or even socially harmful. Hegel acknowledged the importance of distinguishing between public and private reasoning by arguing that ‘[w]hat someone fancies when he is at home with his wife and friends is one thing, and quite another thing what happens in a great gathering where one cleverness annihilates the other’. He praised publicity as ‘the remedy against the arrogance of the individual and the multitude, and one of the greatest means of education for them’ (Hegel, 1821/1986: 482), but he also agreed with Ariosto, who compared public opinion with an ignorant vulgar person who reproves everyone and speaks most of what he understands the least. As Hegel suggested, in public opinion ‘the way is open to everyone to express and assert his subjective opinion about the general’ (ibid.: 477), which is why one can find as much truth as error in it. The idea that public opinion should influence the affairs of the state seemed absurd to Hegel, as it wrongly suggests that everyone understands state affairs. In reality, the task of transcending the contradiction between subjectivity and objectivity in public opinion cannot be performed by public opinion itself, but only by the self-dependent state and its organs. It is wise to know and understand public opinion, but not to follow it; on the contrary, ‘the great man of the time’ is, for Hegel, the one who can ‘learn to despise public opinion’ (Hegel, 1821/1986: 486).
Despite concerns about the subjectivity and fallacy of public opinion, Hegel did not doubt that public opinion existed and was in some way relevant to governance and to ‘the great men’ in power. This general view of the contradictory nature of public opinion shaped the scholarly debates about public opinion for most of the next century, with one side emphasising the rational-democratic potential of public opinion and the other its gullibility and manipulability, or majority intolerance of dissenters. This controversy marked the subsequent decline of the critical momentum of
Figuration of public opinion in the social sciences
Public opinion remained a central issue in political-philosophical and sociological debates in the second half of the 19th and the beginning of the 20th century. J. S. Mill, Tocqueville, Bryce and other prominent 19th-century public opinion theorists offered many different accounts of the nature and role of public opinion. Although no agreement was reached on whether public opinion increases the resilience of a democratic social order or threatens it, the debates unequivocally confirmed the belief that it has a significant impact on society and the functioning of a democratic government. Yet the growing number of discussions on public opinion in the 20th century intensified controversies over what exactly constitutes the object of discussion. Critical sociologists offered a new perspective: the earlier democratic control over rulers as the main social function of public opinion was increasingly substituted by
The individualisation of public opinion in the surveillance paradigm also revised the democratic political-philosophical tradition by considering public opinion merely the sum of individual opinions. The former ‘substantive’ conceptualisation of public opinion that conceived of public opinion as an
Nevertheless, the early 20th century was still a period of exciting in-depth intellectual debates about the complex nature of public opinion in the context of efforts to establish a general theory of society. They were focussed on public opinion as a dynamic social network of interdependent individuals or a ‘figuration’, to use Elias’ generic term for ‘a structure of mutually oriented and dependent people’ (Elias, 1968/2000: 482). At that time, seminal works theorising public opinion were published, including Gabriel Tarde's
Tarde and Tönnies sought to construct a sociological theory of public opinion as a socio-cultural phenomenon formed and expressed by
At the height of theoretical sociological debates on public opinion, psychologists trivialised the concept of public opinion by reducing it to a simple aggregate of individual opinions (Bernays, 1923/1961: 61). Dissatisfaction with conflicting definitions of public opinion led a group of American political scientists to declare the term public opinion useless and to suggest withdrawing it as a subject of scientific inquiry (Binkley, 1928: 389). However, this empiricist pessimism was an exaggerated reaction to the misunderstanding of the concept of publicness and the erroneous belief that the absence of a generally accepted definition absolutely undermines the validity of the concept. It was soon overcome by the substantial social and technological changes that occurred shortly after these debates and unexpectedly marginalised the social theories of public opinion that had reached their historical climax in the 1920s. The reason was rather trivial. The birth and rise of opinion polls in the 1930s severely weakened the socially critical vigour that had been embedded in theories of public opinion since the 16th century and then intellectually anchored in the Enlightenment.
In 1932, the US electorate was surveyed in a poll by Houser Associates on voter choices and attitudes towards political issues (Norpoth, 2019), and Gallup conducted several polls for his mother-in-law's candidacy for the Iowa Secretary of State (Hawbaker, 1993: 107). Shortly afterwards, Gallup used the 1934 congressional election to test the reliability of his polling procedures, and in 1935 he founded the American Institute of Public Opinion. The invention of opinion polls rejuvenated public opinion debates, albeit mostly in an administrative direction, quite different from those that took place in the previous period. Over the next 10 years, the epistemic status of public opinion in the social sciences changed dramatically. The previously predominant reason for pessimism, the lack of reliable empirical procedures, was overshadowed by the absence of conceptual discussions of public opinion.
Polling: The advent of empirically based figuration of public opinion
The invention of polling was part of the pursuit of objectivity and social and institutional recognition, which led the social sciences to follow the path of quantification that has long prevailed in the natural sciences. Social scientists have gradually begun to ‘express in numbers what was previously expressed in words’ (Desrosières, 2016: 184) in order to overcome the limitations of qualitative analysis, such as inaccuracy and arbitrariness (Lasswell, 1949), and to supplement their (subjective) judgement and partly replace it with at least seemingly more objective quantitative data and statistical procedures. Opinion polls, which made it possible to obtain quantitative data on the opinions and behaviour of the people relevant to commercial interests and the political process, were a distinctive case of the quantification of the social sciences.
Polling reconfigured public opinion and strongly influenced the conceptualisation and, in particular, the popular perception of public opinion and its role in the democratic political process. Prior to opinion polls, the social sciences were rather unsuccessful in attempts to operationalise the normative concept of public opinion. With new methodological procedures of sampling, attitude measuring and scaling, polling seemed to provide a satisfactory degree of operational validity. It was soon domesticated in social scientific research, and in 1937 got its first ‘own’ journal,
The emergence of opinion polls was regarded as an important contribution to ‘solving the problem of quickly, economically and accurately determining the state and trends of public opinion on a large scale’, so that public opinion was no longer considered ‘a mysterious force’ (Childs, 1965: 45). Dewey's robust theoretical definition of the public was side-lined to promote public opinion as ‘the aggregate of the views men hold regarding matters that affect or interest the community’ (Bryce, 1888/1995; Gallup, 1957: 23) as the basis of the new operational concept. Polls provided information on the attitudes of individuals relevant to the political process and predictions of their electoral behaviour, as well as commercially relevant information on consumer preferences, habits and purchasing power, which became a key area of market research. Yet they conceptually replaced the public as the subject of public opinion (e.g. in Bentham's ‘tribunal of public opinion’ composed of politically reasoning individuals) with a dispersed mass or even any aggregate of individuals. Allport (1937: 9) excluded the public from the definition of public opinion as ‘superfluous for the purpose of research’ and reduced public opinion to a multi-individual situation, as earlier suggested by Bryce (1888/1995) and later adopted by Gallup (1957).
Discussions of the reliability and validity of opinion polls were so prevalent in public opinion debates that polling soon became the ‘dominant paradigm’, marginalising earlier theoretical and normative conceptualisations, which began to disappear in the last quarter of the century. ‘The firm establishment of a public opinion polling industry’ was believed to have ‘homogenized the definition [of public opinion] and stabilized it for foreseeable future’ (Converse, 1987: S13), but it also divided the academic community into those who saw opinion polls as a means to democratise society and those who criticised them for undermining democratic life. Just as quantification made an important contribution to the emergence of the gap between ‘academics’ and ‘technicians’ in the social sciences (Lynd, 1939/1970: 1), polling contributed to the gap between ‘critical’ and ‘administrative’ social research (Lazarsfeld, 1941).
Polling is a clear example of the imitative effectiveness of ‘the quantitative technologies used to investigate social and economic life [which] work best if the world they aim to describe can be remade in their image’ (Porter, 1995: 43). Numerous studies, historical evidence, and experiments show that random sampling makes the boundary between the research process and the political process very porous. Ancient Athens is a prime historical example of the political use of
The very similarity with the electoral process as a research subject was claimed to be the main advantage of polls compared to other figurations of public opinion (Gallup, 1971). Pre-election and particularly exit polls and elections measure the same phenomenon, citizens’ preferences for political parties and candidates for political positions, with the same instrument (secret ballot) but with different methodological procedures, degrees of reliability and different consequences. Polls are also similar to political processes in that they are in part legally regulated. For example, in many countries, the publication of exit poll results is not allowed on election day before polls close and the vote count begins; in some countries, an embargo is even imposed on poll results in the last 24 hours before elections.
The main theoretical critique of polls emphasised the lack of efforts to define public opinion as a generic concept on which (empirical) research would be focussed, thus reducing the concept to a particular procedure to measure an unidentified phenomenon. Reducing public opinion to an aggregate of individual opinions identified in a random sample of the population but not publicly expressed independently of the polling procedure also opens the door to abuses by manufacturing and manipulating results. Even if all the conditions of methodological rigour are met in data collection and analysis, the artefact produced by polls is not ‘public opinion’ because it is produced in a purely artificial situation.
Adorno criticised the German translation of ‘public opinion polls’ as
While pollsters praised polling as a solution to the growing democratic deficit in mass democracies, its critics saw it as a misconception of public opinion. Due to private communication with respondents and the anonymity of the response data collection process, polling was criticised as the opposite of what was essential for public opinion in the ‘pre-polling’ figuration:
Critics have also reproached polls for not encouraging public debate, but replacing and even preventing it, thus representing ‘phantom opinions’ (Fishkin, 2009: 33). As there is no public debate involved in polling, relatively poorly informed and unmotivated respondents anonymously answer the questions with improvised and unreliable answers and rarely admit that they do not know the matter in question or do not have an opinion on it. Polls, by their very nature, cannot meet the standard of validity of public opinion research demanded by Adorno, Blumer or Bourdieu. Critical theory had to come to terms with the realisation that public opinion does not exist ‘in the form attributed to it by those who have an interest in asserting its existence’ (Bourdieu, 1973: 1309).
Because of the polls, public opinion was increasingly considered an individual and behavioural phenomenon of little significance for the institutional and action-oriented research topics prioritised in social research. While polls reached the position of the predominant conceptualisation of public opinion, they were mostly used outside of social research to measure ‘mass attitudes’ and their potential impact on commercial and political outcomes. It is not surprising then that in the 1970s, much like the 1920 round table of American political scientists, which concluded that the concept of public opinion would be best abandoned, critical theory largely abandoned public opinion as a subject worth exploring in sociological studies on issues such as collective action and democratic governance, and/or questioned its legitimacy and efficacy as a national and transnational phenomenon (Fraser, 2007). The tacit consensus on the irrelevance of public opinion in (critical) social research grew despite the fact that it was a time of ubiquitous popular social rallies and movements that could undoubtedly be labelled and studied as public opinion phenomena. The sudden emergence and rapid popularisation of a new critical concept,
Digitisation of communication and the rise of opinion mining
Just as theoretical critiques of opinion polls have never threatened their social dominance or the apparent validity of this approach to measuring public opinion, the reasons for the recent decline in opinion polls are not scientific or epistemological but are technological and commercial in nature. In the crisis of opinion polling, which followed the long-standing conceptual polarisation associated with it, data and opinion mining and analysing large data sets from social media offered an opportunity to reflect on old problems of conceptualising and researching public opinion while looking for new approaches to solving them. Whereas polling was the result of the quantification of the social sciences, opinion mining is closely linked to the possibilities of datafication brought about by the Internet and digitisation of communication.
With the digitisation of communication and the Internet, complex integrated public-private communication networks have emerged, generating a large amount of private and public information, structured and unstructured (textual) data and digital traces that can be collected, stored and analysed in real time. Data and opinion mining in digital networks enable continuous automated recognition and prediction of individuals’ opinions and behavioural patterns from (meta-)data and digital traces that users partly unintentionally and unknowingly generate and leave behind in online networks. The physical non-invasiveness, availability of big data on all aspects of human behaviour, the reduced costs for researchers and the elimination of time-consuming respondent participation have made big data analytics an important complement and even alternative to some of the existing methods of data production, gathering, analysis, and management, including polling.
With digital information and communication technologies and the Internet, quantification has grown exponentially and reached a new level with
Opinion
Praising the correlational conceptualisation of problems and solutions is a further step in the direction of the ‘totalitarian Enlightenment’ criticised by Horkheimer and Adorno (1944/2002: 3). It is true that for practical purposes it may be more important to find ‘correlational’ solutions to a problem than to identify its causes. It is certainly more important for a person who falls ill to know that, for example, ‘certain combinations of aspirin and orange juice cause remission of a deadly disease’ (Mayer-Schönberger and Cukier, 2013: 19) than to know what actually
Prior to the advent of digitalisation and computer-mediated communication, the data gathering process (e.g. survey and text analysis) was, for the most part, completely detached from the process of communication as the object of study, with the exception of the participant observation typically used in qualitative research. In digital communication, the relationship is reversed: extensive data sets are created in the online communication process continuously and independently of the research process, before it or in parallel with it. While communicating, Internet users leave quantifiable fingerprints of their personal information online and (often unwillingly) make them available to others. This information is inherent to online communication because it is automatically generated by the software that enables communication, and its generation cannot be prevented. Data generated in socio-digital networks independently of researchers’ efforts to collect data to study phenomena can generally increase the risks traditionally associated with the use of archival data, collected independently of the research design. The main danger of the use of pre-existing data sets in inductive empirical research lies in the fact that an interpretation that ‘fits the facts’ can always be found in this process, which often becomes the sole aim and determines the course of data analysis and research (Merton, 1945: 468).
Compared to data obtained from opinion polls, pre-existing data is a more cost-effective solution for clients funding research. As data are extracted from open-access online communication, commercial data providers often have little or no regard for the compensation of those who created them, even without consciously participating in the research. Achieving high accuracy of representative opinion poll data is associated with high research costs; it also requires respondents to contribute their leisure time, usually free or for a minimal reward. The pressures to reduce ‘unproductive’ spending and ‘survey fatigue’ of respondents, which reduces their willingness to respond, seriously jeopardise the survival of opinion polls when more cost-effective data mining emerges as an alternative.
Alternative empirical models of public opinion
When it comes to comparisons between two empirically based figuration models of public opinion, opinion mining and polling, the first question that is usually asked is whether opinion mining and opinion polls gauge the same ‘public opinion’ or whether opinion mining is really an alternative or a potential substitute for polling. The proponents of data mining argue that opinion mining in online networks is no less accurate than opinion polls (Schober et al., 2016). This belief is supported by fairly convincing empirical evidence. Several electoral studies suggest that polls, with their largely proven high reliability that was once the main argument for their validity, may be losing their predictive power compared to opinion mining. Brexit and the 2016 U.S. presidential election results indicated, for example, that data from social media conversations can capture a more realistic picture of ‘public opinion’ than poll data. Just a few days before the election, Trump garnered 12 million total Facebook likes, 4.1 million more than Hillary Clinton, while national polls erroneously predicted a more than 65% chance of Clinton winning. Similarly, Brexit, which was poorly predicted by polls, was a preferred option in the Facebook posts (Pettigrew, 2016). As early as the 2009 German federal election, the relative shares of parliamentary political parties mentioned in tweets during the campaign matched their election results quite well (Jungherr, 2015: 1–2). The above comparisons demonstrate that no major differences (need to) exist between the results of data mining and polling, or that new data mining procedures can reduce the differences, if they exist. The role of socio-digital networks, such as Twitter, in the political process also speaks in favour of opinion mining as they enable tracking of ‘public opinion’ in real time. These networks seem to have changed the (semi-)public expression of candidate support and voter decision-making much more than polling and television changed political campaigns and voting behaviour in the 20th century, or other new technologies before that.
Nevertheless, the key methodological and epistemic differences between polling and data mining of already existing or ‘found’ social media content still seem to be in favour of polling, which is claimed to be epistemically superior for three main reasons. (1) Data in polls are obtained through random sampling, whereas social media data are not representative of any ‘offline’ population and ‘cannot replace data obtained through scientific sampling’. (2) Survey response data are considered ‘independent observations’ (although lured by interviewers) while the ‘found data’ are often reactions to what has been previously posted by other Internet users, which ‘may result in skewed data distributions where a few pieces of information catch most attention’. (3) All opinions are considered equal in polls, while social media data are derived from algorithmic manipulation that makes the most prominent information appear even more prominent (JRC, 2018: 9–10).
If we look at these three differences from a critical perspective, it turns out that they are not ‘weaknesses’ of opinion mining, which are supposed to cause the ‘unreliability’ of data extracted from social media platforms, but in fact
Random sampling was criticised by Blumer as ‘the inherent deficiency of public opinion polling’ (Blumer, 1948: 546). Due to the assumption of the independence of each sample unit, it promotes the notion of society as a mere aggregate of disparate individuals and ignores ‘the framework and the functional operation of the public opinion’ (ibid.: 547). Not all respondents are equal in shaping public opinion, while individuals who have the greatest influence on public opinion may not even be included in a random sample. In contrast, opinion mining reflects, at least in part, ‘the functional composition and organization of society’ by analysing content produced by both individuals and collective entities independently of research aims and procedures.
In addition, random sampling does not guarantee the general validity of polls. They are considered inherently accurate due to their rather spectacular success in predicting election results, but predictive validity is only one dimension of validity and cannot be generalised from a special case (election polls) to any use of polls. The problem of the operational validity of opinion polls is exacerbated by growing non-response rates, which lead to higher survey costs and might distort results.
The interaction between the researcher and the respondent is not ‘neutral’ but often manipulative, for example, in choosing questions, question placing and wording. In contrast, the data and traces found in opinion mining are not contaminated with a research instrument as in polling, but may be an algorithmically stimulated reaction of individuals to what other Internet users have previously posted. Similar to the ‘standard practice of open discussion’, algorithms can give ‘too much weight to the opinions of those who speak early and assertively, causing others to line up behind them’ (Kahneman, 2011: 85). When trying to get the most useful information from multiple sources of evidence, the sources should be made independent of each other, as Kahneman argues, but a public discussion of opinions does not follow this rule, as it is precisely aimed at not only allowing but stimulating participants to influence each other.
In short, there is no justification to set polls as the gold standard for assessing the accuracy of opinion mining. Comparing the performance of opinion mining and polling is hardly relevant insofar as it is reduced to the question of the contestable superiority of reliability and predictive validity of polls as the basis for setting them as the gold standard of measuring public opinion. Taking into account historical critiques of opinion polls, opinion mining, in addition to very detailed presentations of the opinions and behaviours of individuals and groups, can better present genuine public opinion than polls. But first and foremost, the question of (the loss of) the bearer of public opinion, the public, needs to be addressed in relation to contemporary data mining technology. If we ignore this fundamental question, the ‘bearer of public opinion’ can become the entire social aggregate of Internet users, reduced by any operational criteria to a statistically manageable size.
Opinion mining as a ‘window into the public’
Opinion mining impressed a blogger on the MonkeyLearn platform with its capacity to ‘offer a window into the thoughts and feelings of the public, allowing businesses to improve the customer experience, perform competitive research, and understand opinions. … It allows you to get inside your customers’ heads and find out what they like and dislike, and why’ (Wolf, 2020). The potential of opinion mining technology is undoubtedly great, but breaking into the heads of customers means invading their privacy, which should be more a matter of concern than enthusiasm.
In principle, opinion mining can be seen as an AI development of the Evaluative Assertion Analysis, a (quantitative) content analysis method developed by Osgood et al. (1956), in which all the evaluative assertions are extracted from the text and converted into ‘subject-verb-complements’ form to assess ‘attitude objects’ in three dimensions (good–bad, strong–weak, active–passive) and thus identify them as unique entities. Traditional opinion mining approaches based on dictionaries and machine learning also require a lot of human operations, which makes high-quality data integration expensive and time-consuming. The introduction of deep learning methods that can automatically learn multi-level features is likely to solve the problem of the tedious manual extraction of properties associated with or dissociated from attitude objects, but not necessarily the reliability and validity of automated coding.
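To make the dictionary-based approach mentioned above more concrete, the following is a minimal sketch, not Osgood et al.'s procedure. It swaps the human extraction of ‘subject-verb-complements’ assertions for a crude same-sentence co-occurrence rule, and the lexicon, the function name `score_attitude_objects` and the example sentences are all hypothetical. It only illustrates how attitude objects could be placed on the three dimensions (good–bad, strong–weak, active–passive).

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption, not a published dictionary):
# word -> (good-bad, strong-weak, active-passive), each value in [-1, 1].
LEXICON = {
    "good": (1.0, 0.0, 0.0),
    "bad": (-1.0, 0.0, 0.0),
    "strong": (0.0, 1.0, 0.0),
    "weak": (0.0, -1.0, 0.0),
    "active": (0.0, 0.0, 1.0),
    "passive": (0.0, 0.0, -1.0),
}


def score_attitude_objects(sentences, objects, lexicon=LEXICON):
    """Average the lexicon scores of words sharing a sentence with each object.

    Co-occurrence within a sentence is a crude stand-in for the manual
    extraction of evaluative assertions; objects must be single words.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for sentence in sentences:
        # Naive tokenisation: split on whitespace, strip punctuation, lowercase.
        words = [w.strip(".,;:!?").lower() for w in sentence.split()]
        for obj in objects:
            if obj.lower() not in words:
                continue
            for word in words:
                if word in lexicon:
                    for i, value in enumerate(lexicon[word]):
                        totals[obj][i] += value
                    counts[obj] += 1
    # Objects with no scored words are left out rather than given a zero profile.
    return {
        obj: tuple(t / counts[obj] for t in totals[obj])
        for obj in objects
        if counts[obj]
    }


sentences = [
    "The government is weak and passive.",
    "The government is bad.",
    "The union is strong.",
]
profiles = score_attitude_objects(sentences, ["government", "union", "parliament"])
```

Even in this toy form, the limits noted above are visible: the lexicon must be built by hand, negation and irony are ignored, and nothing in the procedure establishes that the resulting scores are reliable or valid measures of opinion.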
The transformation of public and private (communicative) actions into numerical data that allows tracking and forecasting seems to be a fairly recent phenomenon related to the digitisation of communication and the Internet. However, the latest digital forms of datafication represent a continuation and intensification of quantification in governance and research, exemplified by the quantification of public opinion with opinion polling in the 1930s and its important societal consequences. Earlier debates on the ambivalence of opinion polls – that they have important but controversial implications for politics and governance and scientific research, and that they can promote and hinder the development of democracy – could also help identify opportunities and threats of datafication in its modern digital formats.
Data mining raises several questions about the nature and future of online communication that are essential to the conceptualisation of public opinion. Such questions have always been asked throughout history in periods of communication revolutions and when important inventions occurred. Initial Enlightenment conceptualisations of public opinion and publicity were inextricably linked to the press. In fact, they were founded on it. With the mass press, Bentham's ‘subversive’ idea of publicity as a means of controlling the authorities reached a critical turning point: instead of controlling the powerful, publicity became a means of the representation and promotion of the powerful and the control of the powerless. In a similar vein, the inventions of broadcasting, polling and the Internet have had a significant impact on the processes of shaping and expressing public opinion, and have sparked heated debates and reconceptualisations of public opinion.
Publicity, and in particular Kant's idea of the public use of reason, although fundamental to the original conceptualisation of public opinion, played no role in the polling age in public opinion studies. In contrast, the nature and quality of ‘making things visible’ in online communication is the focus of opinion mining, which ‘calculates’ (public) opinion based on the characteristics extracted from online content. By extracting data from online content that is intentionally made public and data that Internet users would prefer (though unsuccessfully) to keep private, and combining them in analysis, data mining reinforces the blurring of the boundary between public and private that escalates with social media. Opinion mining generates data that can be used to effectively influence the behaviours and opinions of Internet users by creating or enforcing visibility or invisibility through various forms of algorithmically geared publicity. It is much more difficult to imagine how the publication of high-quality social information could facilitate the formation of public opinion, although the far-reaching consequences of the increasing permeability of the public-private boundary are visible in current political developments.
Following the argument of Adorno's critique of the (in)adequate translation of
The commercial feasibility of data and opinion mining is widely acknowledged and appreciated, but its use to censor people's opinions and control their behaviours is often not (made) visible and impossible to prove through simple statistical correlation. While opinion mining is a non-invasive method in terms of data
As Gupta and Agrawal (2020: 1) admit, the mining of huge amounts of data can be instrumentalised to discover ‘hidden and useful information with high potential’, which can help ‘both the organization and the individual to get the right opinion about the ongoing trends or unfamiliar things’. If the criticism of the administrative, authoritarian and commercial instrumentalisation of opinion polls in the last century seemed excessive, at least for some, it has become more convincing with the digitisation of communication in the 21st century. Quantitative analyses of big data found in socio-digital networks are not limited to measuring what surveys and polls (can) measure, but also include data beyond survey response data. In addition to the opinions expressed in polls, all human online activities have become quantitatively identifiable and thus potentially instrumentalised through big data analytics. By systematically collecting and analysing data on individuals’ online communication in conjunction with data on offline activities, both can be influenced quite effectively. The recent remarkable rise of algorithmically modulated decision making and its pervasive social implications call for increased efforts to ‘clarify the theoretical foundations of critical algorithm studies and highlight the importance of engaged scholarship’ (Brevini and Pasquale, 2020).
Algorithms in the service of the public and public opinion
With the recent algorithmisation of communication, the questions raised by critical sociology about the role of the press a hundred years ago must be raised again: Does new technology promote the formation and expression of public opinion and its control over government and corporate decision-making, or vice versa? Does it support public use of reason and critical publicity or disinformation and computational propaganda? Kennedy and Moss (2015) are optimistic about ‘conditions in which data mining is not just used as a way to know publics, but can become a means for publics to know themselves’. I am less optimistic, but I suggest that opinion mining should help the public not only to know, but to
Data mining and big data analytics have been developed as an effective way to extract opinions expressed online, to identify their direction, topicality and quality, and use extracted data for curation algorithms to arrange content supply by prioritising, classifying and filtering information. Automated recognition, extraction, summarisation, simulation, prediction and promotion of opinions and behavioural patterns are very effective for adapting the content supply to the interests and needs of the users, and for influencing users’ decisions and manipulating them. Due to the impact, for better or worse, of algorithmic systems on people's everyday life, information access and agency, awareness of algorithms is a critical issue and a prerequisite for greater involvement of people in critical use, if not co-creation of algorithms. This requires studying how algorithms are part of broader rationalities, programmes of social change and development, and part of power dynamics (Beer, 2017: 9).
The idea of a public-worthiness algorithm is intended to support such efforts. At first glance, it is reminiscent of newsworthiness studies, but in fact, it is completely different from them. In their seminal study of ‘how “events” become “news”’, Galtung and Ruge (1965: 65) addressed the question of why the media feature particular news stories prominently and others receive little, if any, coverage, and tried to explain it with ‘news factors’. In contrast to newsworthiness, the idea of
By offering Internet users news stories on events with objectively assessed, potentially important long-term consequences, such an algorithm could radically change the way news media operate and the way we act as news consumers. Personalised search and recommendation apps reduce the opportunities to test the objective validity of our knowledge and opinions, which is a prerequisite for reflexive reasoning. As Kant's maxim of judgement demands, one must ‘think from the standpoint of everyone else’, detached from ‘the subjective personal conditions of his judgement, which cramp the minds of so many others, and reflects upon his own judgement from a universal standpoint’ (Kant, 1790/1952: 519). The norm that opinions (and reasons thereof) should be publicly available and comprehensible is a constitutive feature of deliberative democracy (Gutmann and Thompson, 2004: 4). Without the ability to take ‘everyone else's point of view’, a person is disconnected from relevant information, dissenting opinions, and their rationales, and thus unable to make informed judgements. The opportunity of ‘private publics’ emerging in social media and generated by recommendation algorithms to identify events and processes, which link them together even on a global scale, could be a major breakthrough for world-wide communication and defragmentation of the public sphere(s).
Conclusion
Critical examinations of public opinion and the public sphere so far have highlighted underdeveloped and inappropriate political, economic and cultural circumstances as the main reasons preventing the realisation of enlightenment ideas of publicness, publicity and public opinion. Perhaps in the digital age, for the first time ever, we are in a situation where we can actively help
The concept of public-worthiness valorises the idea of active (prod)users using recommendation algorithms to define their interests so that they can identify what potential consequences of current events and transactions they are (likely) exposed to and thus
Unlike opinion polls, such a facilitating algorithm should not be designed to detect and predict the dynamics of public opinion, allowing policymakers – as a kind of Hegelian ‘great men’ – to gauge the will of their constituencies. Rather, it would allow the ‘Deweyan public’ to be constituted, enabling its members to articulate their opinions on important long-term consequences of transactions that link them together, motivating their public engagement from the local to the global level.
It would be a fallacy of technological determinism to claim that such changes can be initiated solely by the technological capabilities of the Internet or even an algorithm. The seemingly emancipatory technological development triggered by the Internet and artificial intelligence has proven to be controversial, to say the least. Despite the greater permeability between different modes and areas of social communication, there can be little talk of democratisation of communication, media and politics. The internetisation and globalisation of the economy are leading to an even greater media concentration than the traditional print and broadcast media have ever experienced, while data mining and algorithms enable hitherto unimaginable surveillance of physical and symbolic communication. Thus, if there is indeed such a thing as an ‘emancipatory technological potential of the Internet’, only its users as publics can use the new socio-digital networks and big data tools to cultivate reflexive publicity and create an effective public opinion. Without civil society actors struggling for the public use of reason and democratic communication, challenging the strategies of corporations to dominate the Internet and its users, and encouraging research to support and facilitate their actions, democratic changes in power relations will remain utopian, as has often been the case in history.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Slovenian Research Agency (ARRS) under Grants P5-0051 and N5-0086.
