Abstract

This article seeks to address current debates comparing polls and opinion mining as empirically based figuration models of public opinion in the light of in-depth intellectual debates on the role and nature of public opinion that began after the French Revolution and the controversy over public opinion spurred by the invention of polls. Issues of historical quantification and re-conceptualisation of public opinion are addressed in four parts. The first summarises the history of the rise and fall of the concept of public opinion. The second re-examines the key controversies in the debates on the theoretical, empirical and social implications and consequences of the invention of polling. The third part scrutinises the datafication of public opinion that started with the polling industry and continues in the age of big data and data mining. The final section discusses the controversial potentials of opinion-mining technology and suggests ways in which social scientists could critically respond to the big data and opinion-mining challenges in order to reintegrate the ideas of publicness, the public and public sphere into public opinion research.

Keywords

Public opinion, datafication, opinion polling, opinion mining, publicworthiness, the public

Introduction

Less than a century ago, the founders of opinion polling proudly proclaimed that a new method of data gathering had solved the age-old problem of reliable empirical public opinion research, a claim that also triggered an avalanche of critical concerns. Today, the primacy held by opinion polls for decades seems to be challenged by opinion mining or sentiment analysis based on machine learning and big data analytics. The social sciences, according to Boullier, are now moving away from the second generation, in which their key concept was opinion, and the main means of data collection was opinion polls; they replaced society and the census, respectively, which prevailed in the first generation. A third generation is now emerging that is formed around vibrations and traces (big data); it is based on the development of computing, which allows ‘calculating and modeling the social as if the traces gathered provided access to the “real” social better than any poll, survey, or census’ (Boullier, 2015: 72).

Opinion mining, as one of the most advanced areas of natural language processing designed to explore subjective opinions gathered from various sources on a particular topic, is believed to have ‘overcome the limits of traditional polls’ (Zhou et al., 2021) while ‘[p]olling was, and remains, in disarray. Now, it's being supplanted by data science: why bother telephoning someone to ask her opinion when you can find out by tracking her online?’ (Lepore, 2020). The data needed for analysis are harvested from social media, which journalists often use in the news to ‘infer and report public opinion’ and complement survey polling (Dubois et al., 2020: 57; McGregor, 2019: 1070). However, in demonstrating the potential advantages of opinion mining over polls, it is overlooked that polls have in fact not solved most of the problems of public opinion research at all, so it is not justified to set them as the gold standard for ‘testing’ the effectiveness and validity of opinion mining. The basic postulates of public opinion as a complex social phenomenon or figuration, laid down in a long and intellectually exciting history of public opinion theory and research, are largely forgotten in these debates.

This article seeks to address current debates comparing polls and opinion mining as empirically based figuration models of public opinion in the light of in-depth intellectual debates on the role and nature of public opinion that began after the French Revolution and ended with the rise of opinion polls, and the controversy over public opinion spurred by the invention of polls. Addressing issues of historical re-figurations and re-conceptualisations of public opinion could add depth to the current discussions about the effectiveness of public opinion research. Specifically, the article addresses the tendency to reconceptualise public opinion by extending the concept to new empirical settings, such as new technologies, business models and political regimes, rather than by introducing genuine normative or theoretical innovation.

The article is composed of four parts. The first summarises the history of the rise and fall of the concept of public opinion. The second re-examines the key controversies in the debates on the theoretical, empirical and social consequences of the invention of polling. The third part scrutinises the datafication of public opinion that started with the polling industry and continues in the age of big data and data mining. The final part suggests ways in which social scientists could critically respond to the big data and opinion mining challenges in order to reintegrate the ideas of publicness, the public and public sphere into public opinion research. The ideas outlined in the article are discussed in more detail in a forthcoming book on the impact of quantification and datafication on the critical conceptualisation of publicness (Splichal, 2022b).

The enlightenment legacy of the public and public opinion

The ideas of ‘the public’ and ‘public opinion’ were undoubtedly revolutionary when they were first conceptualised and popularised in the early 16th century. Machiavelli was probably the first to heretically trust the virtues of ‘general opinion’ more than his Prince. He praised ‘ordinary people’ as the most reliable guardians of human freedom and social prosperity, arguing that, ‘the multitude is wiser and more constant than a Prince’ and linking ‘the voice of a people to that of God’ (Machiavelli, 1517/1996: 115, 118). The revolutionary character of Machiavelli's conceptualisation of public opinion was later demonstrated in Déclaration au peuple français of April 19, 1871, in which the protagonists of the Paris Commune relied on the voice of the people, emphasising that public opinion should not be divided.

When the concept of public opinion was later adopted by the scholarly literature of the Enlightenment movement, it became the ‘magic key’ to the development of popular governance based on public opinion that provides control of power and derives from public use of reason. This critical social perspective has long prevailed in normative political-philosophical conceptualisations of public opinion initiated by Rousseau, Kant and Bentham. Bentham was the first to conceptualise the rule of publicness as the foundation of the doctrine of the sovereignty of the people and public opinion, against the doctrine of separation of powers. He considered public opinion a form of social control and the press the ‘instrument of publicity and public instruction’, which ought to make primarily political actions and events available to the public, thus preventing the abuse of power by the legislators and maximising happiness of the greatest number of people (Bentham, 1791/1994; 1822/1990).

Rousseau attributed a similar role of the ‘censorial tribunal’ to public opinion, while he criticised ‘our political theorists’ for not recognising its great significance, ‘though success in everything else depends on it’. He added public opinion, together with morals and customs, as the fourth and the most important kind of law to the three conventional kinds of law – political, civil and criminal law regulating relations between the individual and society (Rousseau, 1762/2017: 28). For Rousseau, public opinion ‘forms the real constitution of the state; takes on new powers every day; restores or replaces other laws when they decay or die out, keeps a people in the spirit in which it was established, and gradually replaces authority by the force of habit’. Nevertheless, he also regarded public opinion as ‘the grave of a man's virtue and the throne of a woman's’, and characterised it as poison, slave, tyranny and yoke (Rousseau, 1762/1921).

Kant did not address ‘public opinion’ in his writings, but he should nevertheless be recognised as a founder of the modern conceptualisation of publicness and public opinion. His discussions of the supreme ethical principle of publicness (then called publicity), the universal right of public use of reason, and opinion as an insufficient mode of ‘holding-for-true’ are inscribed in the foundations of later critical theories of public opinion and the public sphere. For Kant, ‘public use of reason’ ideally brings about ‘the agreement of all judgements with each other, in spite of the different characters of individuals’ (Kant, 1781/1952: 240). Publicness is a necessary condition of right, because what is rational is always publicly communicable, and a criterion of rationality, because what is right should not contradict reason.

Kant's principle of publicness reconciled politics as coercive actions with the moral basis of democratic association. The two spheres can only be reconciled by fundamental human rights embodied in the principle of publicness – freedom of thought and freedom of public expression; neither can exist without the other. Kant's differentiation between politics (power) and morals (justice) is particularly important from the perspective of the 20th-century figuration of public opinion as a form of social control and surveillance that ensures rather than controls the functioning of power, that is, its transformation from an associative into a coercive phenomenon, which strongly contradicted Kant's and the Enlightenment conceptualisations of publicness. The period of normative trust in public opinion, based on the belief in the moral judgement of the ‘common man’ proclaimed in the Enlightenment, was followed by a period of distrust in his capabilities and competence. Doubts were first clearly expressed by Hegel; in the second half of the 19th century, they were followed by political-philosophical critiques of the ‘tyranny of public opinion’.

At least since Hegel, public opinion has often been seen as of little importance, irrelevant, or even socially harmful. Hegel acknowledged the importance of distinguishing between public and private reasoning by arguing that ‘[w]hat someone fancies when he is at home with his wife and friends is one thing, and quite another thing what happens in a great gathering where one cleverness annihilates the other’. He praised publicity as ‘the remedy against the arrogance of the individual and the multitude, and one of the greatest means of education for them’ (Hegel, 1821/1986: 482), but he also agreed with Ariosto, who compared public opinion with an ignorant vulgar person who reproves everyone and speaks most of what he understands the least. As Hegel suggested, in public opinion ‘the way is open to everyone to express and assert his subjective opinion about the general’ (ibid.: 477), which is why one can find as much truth as error in it. The idea that public opinion should influence the affairs of the state seemed absurd to Hegel, as it wrongly suggests that everyone understands state affairs. In reality, the task of transcending the contradiction between subjectivity and objectivity in public opinion cannot be performed by public opinion itself, but only by the self-dependent state and its organs. It is wise to know and understand public opinion, but not to follow it; on the contrary, ‘the great man of the time’ is, for Hegel, the one who can ‘learn to despise public opinion’ (Hegel, 1821/1986: 486).

Despite concerns about the subjectivity and fallacy of public opinion, Hegel did not doubt that public opinion existed and was in some way relevant to governance and to ‘the great men’ in power. This general view of the contradictory nature of public opinion shaped the scholarly debates about public opinion for most of the next century, with one side emphasising the rational-democratic potential of public opinion and the other its gullibility and manipulability, or majority intolerance of dissenters. This controversy marked the subsequent decline of the critical momentum of publicness and publicity, which had previously triggered a fascinating rise of public opinion in the Enlightenment.

Figuration of public opinion in the social sciences

Public opinion remained a central issue in political-philosophical and sociological debates in the second half of the 19th and the beginning of the 20th century. J. S. Mill, Tocqueville, Bryce and other prominent 19th-century public opinion theorists offered many different accounts of the nature and role of public opinion. Although no agreement was reached on whether public opinion increases the resilience of a democratic social order or threatens it, the debates unequivocally confirmed the belief that it has a significant impact on society and the functioning of a democratic government. Yet the increase in the number of discussions on public opinion in the 20th century has intensified controversies over what exactly constitutes the object of discussion. Critical sociologists have offered a new perspective: the earlier democratic control over rulers as the main social function of public opinion has been increasingly substituted by control over the behaviour of individuals.

The individualisation of public opinion in the surveillance paradigm also revised the democratic political-philosophical tradition by considering public opinion merely the sum of individual opinions. The former ‘substantive’ conceptualisation of public opinion that conceived of public opinion as an opinion of (or opining by) the public – ‘no mere aggregate of individual opinions, but a genuine social product’ (Cooley, 1909: 121) – was largely replaced by an ‘adjective’ conceptualisation of public opinion as ‘private opinions made public’ (Harrisson, 1940). The departure from the traditional normative concept of public opinion was exacerbated by the blurring of the distinction between public and private opinion(s) brought about by Lippmann's concept of ‘public opinions’ as individuals’ perceptions of ‘the world outside’ (Lippmann, 1922/1998: 3). Above all, the decline in critical understanding of public opinion was revealed by the de-rationalisation of publicity. Originally, publicity as a ‘natural instrument of justice’ was essential ‘for putting the tribunal of the public in a condition for forming an enlightened judgment’ (Bentham, 1791). However, rationally critically conceptualised publicity soon turned into ‘the activity of making certain that someone or something attracts a lot of interest or attention from many people, or the attention received as a result of this activity’ (Cambridge Dictionary¹).

Nevertheless, the early 20th century was still a period of exciting in-depth intellectual debates about the complex nature of public opinion in the context of efforts to establish a general theory of society. They were focussed on public opinion as a dynamic social network of interdependent individuals or a ‘figuration’, to use Elias’ generic term for ‘a structure of mutually oriented and dependent people’ (Elias, 1968/2000: 482). At that time, seminal works theorising public opinion were published, including Gabriel Tarde's L’Opinion et la foule (1901), Walter Lippmann’s Public Opinion (1922), Ferdinand Tönnies’ Kritik der öffentlichen Meinung (1922) and John Dewey’s The Public and Its Problems (1927).

Tarde and Tönnies sought to construct a sociological theory of public opinion as a socio-cultural phenomenon formed and expressed by the public, as part of pure or general sociology whose laws would be free of the contingencies of space and time. The starting point of Tarde's analysis of public opinion was his identification of a universal phenomenon and general law of imitation later applied to the phenomenon of public opinion (Tarde, 1901/1989). In Tönnies’ Gemeinschaft – Gesellschaft model, public opinion was one of the three complex forms of social will in society, in addition to ‘convention’ and ‘legislation’, and in contrast to religion in community. The Dewey-Lippmann debate in the 1920s was also of undeniable significance in theorising ‘the public and its problems’, as Dewey titled his book (1927), or disqualifying it as ‘the mystical fallacy of democracy’ – as ‘the phantom public’ (Lippmann, 1925). Of particular importance was Dewey's concrete definition of the public as the bearer of public opinion, consisting of ‘all those who are affected by the indirect consequences of transactions to such an extent that it is deemed necessary to have those consequences systematically cared for’ (1927/1946: 15–16).

At the height of theoretical sociological debates on public opinion, psychologists trivialised the concept of public opinion by reducing it to a simple aggregate of individual opinions (Bernays, 1923/1961: 61). Dissatisfaction with conflicting definitions of public opinion led a group of American political scientists to find the term public opinion useless and to suggest withdrawing it as a subject of scientific inquiry (Binkley, 1928: 389). However, empiricist pessimism was an exaggerated reaction to the misunderstanding of the concept of publicness and the erroneous belief that the absence of a generally accepted definition absolutely undermines the validity of the concept. It was soon overcome by the substantial social and technological changes that occurred shortly after these debates and unexpectedly marginalised the social theories of public opinion that reached their historical climax in the 1920s. The reason was rather trivial. The birth and rise of opinion polls in the 1930s severely weakened the socially critical vigour that had been embedded in theories of public opinion since the 16th century and then intellectually anchored in the Enlightenment.

In 1932, the US electorate was surveyed in a poll by Houser Associates on voter choices and attitudes towards political issues (Norpoth, 2019), and Gallup conducted several polls for his mother-in-law's candidacy for the Iowa Secretary of State (Hawbaker, 1993: 107). Shortly afterwards, Gallup used the congressional election to test the reliability of his polling processes (1934) and in 1935 founded the American Institute of Public Opinion. The invention of opinion polls has rejuvenated public opinion debates, albeit mostly in an administrative direction, quite different from those that took place in the previous period. Over the next 10 years, the epistemic status of public opinion in the social sciences changed dramatically. The previously predominant reason for pessimism, the lack of reliable empirical procedures, was overshadowed by the absence of conceptual discussions of public opinion.

Polling: The advent of empirically based figuration of public opinion

The invention of polling was part of the pursuit of objectivity and social and institutional recognition, which led the social sciences to follow the path of quantification that has long prevailed in the natural sciences. Social scientists have gradually begun to ‘express in numbers what was previously expressed in words’ (Desrosières, 2016: 184) in order to overcome the limitations of qualitative analysis, such as inaccuracy and arbitrariness (Lasswell, 1949), and to supplement their (subjective) judgement and partly replace it with at least seemingly more objective quantitative data and statistical procedures. Opinion polls, which made it possible to obtain quantitative data on the opinions and behaviour of the people relevant to commercial interests and the political process, were a distinctive case of the quantification of the social sciences.

Polling reconfigured public opinion and strongly influenced the conceptualisation and, in particular, the popular perception of public opinion and its role in the democratic political process. Prior to opinion polls, the social sciences were rather unsuccessful in attempts to operationalise the normative concept of public opinion. With new methodological procedures of sampling, attitude measuring and scaling, polling seemed to provide a satisfactory degree of operational validity. It was soon domesticated in social scientific research, and in 1937 got its first ‘own’ journal, Public Opinion Quarterly.

The emergence of opinion polls was regarded as an important contribution to ‘solving the problem of quickly, economically and accurately determining the state and trends of public opinion on a large scale’, which made public opinion no longer considered ‘a mysterious force’ (Childs, 1965: 45). Dewey's robust theoretical definition of the public has been side-lined to promote public opinion as ‘the aggregate of the views men hold regarding matters that affect or interest the community’ (Bryce, 1888/1995; Gallup, 1957: 23) as the basis of the new operational concept. Polls provided information on the attitudes of individuals relevant to the political process and predictions of their electoral behaviour, as well as commercially relevant information on consumer preferences, habits and purchasing power, which became a key area of market research. Yet they have conceptually replaced the public as the subject of public opinion (e.g. in Bentham's “tribunal of public opinion” composed of politically reasoning individuals) with a dispersed mass or even any aggregate of individuals. Allport (1937: 9) excluded the public from the definition of public opinion as ‘superfluous for the purpose of research’ and reduced public opinion to a multi-individual situation, as earlier suggested by Bryce (1888/1995) and later adopted by Gallup (1957).

Discussions of reliability and validity of opinion polls were so prevalent in public opinion debates that polling soon became the ‘dominant paradigm’ marginalising earlier theoretical and normative conceptualisations that began to disappear in the last quarter of the century. ‘The firm establishment of a public opinion polling industry’ was believed to have ‘homogenized the definition [of public opinion] and stabilized it for foreseeable future’ (Converse, 1987: S13), but it also divided the academic community into those who saw opinion polls as means to democratise society and those who criticised them for undermining democratic life. Just as quantification made an important contribution to the emergence of the gap between ‘academics’ and ‘technicians’ in the social sciences (Lynd, 1939/1970: 1), polling contributed to the gap between ‘critical’ and ‘administrative’ social research (Lazarsfeld, 1941).

Polling is a clear example of the imitative effectiveness of ‘the quantitative technologies used to investigate social and economic life [which] work best if the world they aim to describe can be remade in their image’ (Porter, 1995: 43). Numerous studies, historical evidence, and experiments show that random sampling makes the boundary between the research process and the political process very porous. Ancient Athens is a prime historical example of the political use of random selection by sortition (lot) to appoint city-state administration (Aristotle 350BC/1952: 29). A lottocratic system or demarchy is considered to have many advantages but also disadvantages over the model of parliamentary elections (Guerrero, 2014). Fishkin's renowned deliberative opinion poll experiments – a representative random sample of the people brought together at a place to deliberate about an issue – have been used to recommend policy decisions (e.g. in Texas and China) and even for random-sample voting in primaries in Greece in 2006 (Fishkin et al., 2008). From September 2021 to January 2022, three European political institutions – the European Parliament, the Council of the EU, and the European Commission – organised a four-panel conference with 800 randomly selected citizens from across the EU to discuss the future of Europe.²

The very similarity with the electoral process as a research subject was claimed to be the main advantage of polls compared to other figurations of public opinion (Gallup, 1971). Pre-election and particularly exit polls and elections measure the same phenomenon, citizens’ preferences for political parties and candidates for political positions, with the same instrument (secret ballot) but with different methodological procedures, degrees of reliability and different consequences. Polls are also similar to political processes in that they are in part legally regulated. For example, in many countries, the publication of exit poll results is not allowed on election day before polls close and the vote count begins; in some countries, an embargo is even imposed on poll results in the last 24 hours before elections.

The main theoretical critique of polls emphasised the lack of efforts to define public opinion as a generic concept on which (empirical) research would be focussed, thus reducing the concept to a particular procedure to measure an unidentified phenomenon. Reducing public opinion to an aggregate of individual opinions identified in a random sample of the population but not publicly expressed independently of the polling procedure also opens the door to abuses by manufacturing and manipulating results. Even if all the conditions of methodological rigour are met in data collection and analysis, the artefact produced by polls is not ‘public opinion’ because it is produced in a purely artificial situation.

Adorno criticised the German translation of ‘public opinion polls’ as Meinungsforschung (opinion research) for the absence of the adjective ‘public’ (Adorno, 1964/2005: 120). In fact, the German term is more accurate, as there is no element of publicness in the poll, but that was not the point of Adorno's critique. He demanded that ‘opinion research’ be transformed into actual research into public opinion: ‘not be a mere technique, but just as much an object of sociology as a science that inquiries into the objective structural laws of society’ (ibid.: 534). Similar to Blumer's (1948: 543) critique that the study of public opinion must reflect the functional composition and organisation of society, especially the interaction of social groups, which is completely ignored in polling, Adorno argued that public opinion research should capture the complexity of social conditions.

While pollsters praised polling as a solution to the growing democratic deficit in mass democracies, its critics saw it as a misconception of public opinion. Due to private communication with respondents and the anonymity of the response data collection process, polling was criticised as the opposite of what was essential for public opinion in the ‘pre-polling’ figuration: free and public use of reason. Private and anonymous communication may occasionally be associated with public reasoning, but it is unlikely to become the site of social criticism because it lacks the self-referential features of public discourse promoted by democratic publicity.

Critics have also reproached polls for replacing and even preventing public debate rather than encouraging it, thus representing ‘phantom opinions’ (Fishkin, 2009: 33). As there is no public debate involved in polling, relatively poorly informed and unmotivated respondents anonymously answer the questions with improvised and unreliable answers and rarely admit that they do not know the matter in question or do not have an opinion on it. Polls, by their very nature, cannot meet the standard of validity of public opinion research demanded by Adorno, Blumer or Bourdieu. Critical theory had to come to terms with the realisation that public opinion does not exist ‘in the form attributed to it by those who have an interest in asserting its existence’ (Bourdieu, 1973: 1309).

Because of the polls, public opinion was increasingly considered an individual and a behavioural phenomenon of little significance for institutional and action-oriented research topics prioritised in social research. While polls have reached the position of predominant conceptualisation of public opinion, they have mostly been used outside of social research to measure ‘mass attitudes’ and their potential impact on commercial and political outcomes. It is not surprising then that in the 1970s, much like the conclusion of the 1920 round table of American political scientists that the concept of public opinion would best be abandoned, critical theory largely abandoned public opinion as a subject worth exploring in sociological studies on issues such as collective action and democratic governance, and/or questioned its legitimacy and efficacy as a national and transnational phenomenon (Fraser, 2007). The tacit consensus on the irrelevance of public opinion in (critical) social research grew despite the fact that it was a time of ubiquitous popular social rallies and movements that could undoubtedly be labelled and studied as public opinion phenomena. The sudden emergence and rapid popularisation of a new critical concept, the public sphere, further reduced interest in critical theory and public opinion research in the 1990s (Splichal, 2022a, 2022b).

Digitisation of communication and the rise of opinion mining

Just as theoretical critiques of opinion polls have never threatened their social dominance or the apparent validity of this approach to measuring public opinion, the reasons for the recent decline in opinion polls are not scientific or epistemological but are technological and commercial in nature. In the crisis of opinion polling, which followed the long-standing conceptual polarisation associated with it, data and opinion mining and analysing large data sets from social media offered an opportunity to reflect on old problems of conceptualising and researching public opinion while looking for new approaches to solving them. Whereas polling was the result of the quantification of the social sciences, opinion mining is closely linked to the possibilities of datafication brought about by the Internet and digitisation of communication.

With the digitisation of communication and the Internet, complex integrated public-private communication networks have emerged, generating a large amount of private and public information, structured and unstructured (textual) data and digital traces that can be collected, stored and analysed in real time. Data and opinion mining in digital networks enable continuous automated recognition and prediction of individuals’ opinions and behavioural patterns from (meta-)data and digital traces that users partly unintentionally and unknowingly generate and leave behind in online networks. The physical non-invasiveness, availability of big data on all aspects of human behaviour, the reduced costs for researchers and the elimination of time-consuming respondent participation have made big data analytics an important complement and even alternative to some of the existing methods of data production, gathering, analysis, and management, including polling.

With digital information and communication technologies and the Internet, quantification has grown exponentially and reached a new level with datafication, a process of ‘taking information about all things under the sun – including ones we never used to think of as information at all, such as a person's location, the vibrations of an engine, or the stress on a bridge – and transforming it into a data format to make it quantified’ (Mayer-Schönberger and Cukier, 2013: 19). Moreover, datafication transforms social action into a data format to make it quantified, thus allowing for real-time tracking and predictive analysis. The emergence of big data and datafication has radically changed the types and amounts of data collected on individuals, organisations and society, and the ways and purposes of how they are handled, but they are still part of the age-old efforts to quantify behaviour. Nevertheless, the nature and principles of quantification are the same today as they were two centuries ago: quantification (and its most recent variation, datafication) is a social technology that is ‘a crucial agency for managing people and nature’ (Porter, 1995: 50).

Opinion mining is based, even more directly than opinion polling, on the principle ‘correlation instead of causality’ codified by the rise of quantification. Faced with a hitherto unimaginable amount of data provided by the universal digitalisation and datafication of communication and society, Mayer-Schönberger and Cukier (2013: 19) suggest that the era of big data overturns centuries of established practices and challenges our most basic understanding of how to make decisions and comprehend reality, ‘the way we live and interact with the world. Most strikingly, society will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what. … There is a treasure hunt under way, driven by the insights to be extracted from data and the dormant value that can be unleashed by a shift from causation to correlation’ (emphasis added).

Praising the correlational conceptualisation of problems and solutions is a further step in the direction of the ‘totalitarian Enlightenment’ criticised by Horkheimer and Adorno (1944/2002: 3). It is true that for practical purposes it may be more important to find ‘correlational’ solutions to a problem than to identify its causes. It is certainly more important for a person who falls ill to know that, for example, ‘certain combinations of aspirin and orange juice cause remission of a deadly disease’ (Mayer-Schönberger and Cukier, 2013: 19) than to know what actually caused the disease. But ‘correlational knowledge’ of technical or instrumental solutions to problems alone is insufficient to ensure an understanding of the problems. Lack of understanding has at least two important consequences: first, it does not allow identifying (and eliminating) the causes of problems, and second, it does not allow predicting possible latent and unintended consequences (side effects) of technical solutions. Mass access to all kinds of data allows anyone to determine and interpret correlations between data events and data sets. Without understanding the relationship between causes and effects, people nevertheless draw conclusions about causes and effects based on arbitrary and spurious correlations. The correlational perplexity is an important component of the processes that caused the epistemic crisis of modern digital times, which became most apparent during the coronavirus disease 2019 (COVID-19) crisis.
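How easily unrelated data produce ‘arbitrary and spurious correlations’ can be illustrated with a small simulation. The following Python sketch is an illustration added here, not part of the cited literature: it generates pairs of random walks that are independent by construction and counts how often their Pearson correlation is nonetheless strong. The threshold of 0.5, the series length and the number of trials are arbitrary assumptions.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng: random.Random, steps: int) -> list[float]:
    """Cumulative sum of independent Gaussian steps (a drifting series)."""
    position, path = 0.0, []
    for _ in range(steps):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path


def share_strongly_correlated(trials: int = 200, steps: int = 200, seed: int = 0) -> float:
    """Fraction of pairs of unrelated random walks with |r| > 0.5 (assumed threshold)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(random_walk(rng, steps), random_walk(rng, steps))
        if abs(r) > 0.5:
            hits += 1
    return hits / trials
```

Calling `share_strongly_correlated()` returns the share of pairs that look strongly related although no causal link exists between them. Drifting series of this kind can yield such correlations far more often than intuition suggests, which is why a correlation found in ‘found’ data says little without an account of causes.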

Prior to the advent of digitalisation and computer-mediated communication, the data gathering process (e.g. survey and text analysis) was largely detached from the process of communication as the object of study, with the exception of the participant observation typically used in qualitative research. In digital communication, the relationship is reversed: extensive data sets are created in the online communication process continuously and independently of the research process, before it or in parallel with it. While communicating, Internet users leave quantifiable fingerprints of their personal information online and (often unwillingly) make them available to others. This information is inherent to online communication because it is automatically generated by the software that enables communication and cannot be prevented. Data generated in socio-digital networks independently of researchers’ efforts to collect data to study phenomena can generally increase the risks traditionally associated with the use of archival data, collected independently of the research design. The main danger of the use of pre-existing data sets in inductive empirical research lies in the fact that an interpretation that ‘fits the facts’ can always be found in this process, which often becomes the sole aim and determines the course of data analysis and research (Merton, 1945: 468).

Compared to data obtained from opinion polls, pre-existing data is a more cost-effective solution for clients funding research. As data are extracted from open-access online communication, commercial data providers often have little or no regard for the compensation of those who created them, even without consciously participating in the research. Achieving high accuracy of representative opinion poll data is associated with high research costs; it also requires respondents to contribute their leisure time, usually free or for a minimal reward. The pressures to reduce ‘unproductive’ spending and ‘survey fatigue’ of respondents, which reduces their willingness to respond, seriously jeopardise the survival of opinion polls when more cost-effective data mining emerges as an alternative.

Alternative empirical models of public opinion

When it comes to comparisons between two empirically-based figuration models of public opinion, opinion mining and polling, the first question that is usually asked is whether opinion mining and opinion polls gauge the same ‘public opinion’ or whether opinion mining is really an alternative or a potential substitute for polling. The proponents of data mining argue that opinion mining in online networks is no less accurate than opinion polls (Schober et al., 2016). This belief is supported by fairly convincing empirical evidence. Several electoral studies suggest that polls, with their largely proven high reliability that was once the main argument for their validity, may be losing their predictive power compared to opinion mining. Brexit and the 2016 U.S. presidential election results indicated, for example, that data from social media conversations can capture a more realistic picture of ‘public opinion’ than poll data. Just a few days before the election, Trump garnered 12 million total Facebook likes, 4.1 million more than Hillary Clinton, while national polls erroneously predicted a more than 65% chance of Clinton winning. Similarly, Brexit, which was poorly predicted by polls, was a preferred option in the Facebook posts (Pettigrew, 2016). Even as early as 2009 in the German federal election, the relative shares of parliamentary political parties mentioned in tweets during the campaign matched their election results quite well (Jungherr, 2015: 1–2). The above comparisons demonstrate that no major differences (need to) exist between the results of data mining and polling, or that new data mining procedures can reduce the differences, if they exist. The role of socio-digital networks, such as Twitter, in the political process also speaks in favour of opinion mining as they enable tracking of ‘public opinion’ in real time. They seem to have changed the (semi-)public expression of candidate support and voter decision-making much more than polling and television have changed political campaigns and voting behaviour in the 20th century, or other new technologies before that.

Nevertheless, the key methodological and epistemic differences between polling and data mining of already existing or ‘found’ social media content still seem to be in favour of polling, which is claimed to be epistemically superior for three main reasons. (1) Data in polls are obtained through random sampling, whereas social media data are not representative of any ‘offline’ population and ‘cannot replace data obtained through scientific sampling’. (2) Survey response data are considered ‘independent observations’ (although lured by interviewers) while the ‘found data’ are often reactions to what has been previously posted by other Internet users, which ‘may result in skewed data distributions where a few pieces of information catch most attention’. (3) All opinions are considered equal in polls, while social media data are derived from algorithmic manipulation that makes the most prominent information appear even more prominent (JRC, 2018: 9–10).

If we look at these three differences from a critical perspective, it turns out that they are not ‘weaknesses’ of opinion mining, which are supposed to cause the ‘unreliability’ of data extracted from social media platforms, but in fact solutions to the problems caused by conceptualisation of public opinion through opinion polls.

Random sampling was criticised by Blumer as ‘the inherent deficiency of public opinion polling’ (Blumer, 1948: 546). Due to the assumption of the independence of each sample unit, it promotes the notion of society as a mere aggregate of disparate individuals and ignores ‘the framework and the functional operation of the public opinion’ (ibid.: 547). Not all respondents are equal in shaping public opinion, while individuals who have the greatest influence on public opinion may not even be included in a random sample. In contrast, opinion mining reflects, at least in part, ‘the functional composition and organization of society’ by analysing content produced by both individuals and collective entities independently of research aims and procedures.

In addition, random sampling does not guarantee the general validity of polls. They are considered inherently accurate due to their rather spectacular success in predicting election results, but predictive validity is only one dimension of validity and cannot be generalised from a special case (election polls) to any use of polls. The problem of the operational validity of opinion polls is exacerbated by the growing non-response rates, which lead to higher survey costs and might distort results.

The interaction between the researcher and the respondent is not ‘neutral’ but often manipulative, for example, in choosing questions, question placing and wording. In contrast, the data and traces found in opinion mining are not contaminated with a research instrument as in polling, but may be an algorithmically stimulated reaction of individuals to what other Internet users have previously posted. Similar to the ‘standard practice of open discussion’, algorithms can give ‘too much weight to the opinions of those who speak early and assertively, causing others to line up behind them’ (Kahneman, 2011: 85). When trying to get the most useful information from multiple sources of evidence, the sources should be made independent of each other, as Kahneman argues, but a public discussion of opinions does not follow this rule, as it is precisely aimed at not only allowing but stimulating participants to influence each other.

In short, there is no justification to set polls as the gold standard for assessing the accuracy of opinion mining. Comparing the performance of opinion mining and polling is hardly relevant insofar as it is reduced to the question of the contestable superiority of reliability and predictive validity of polls as the basis for setting them as the gold standard of measuring public opinion. Taking into account historical critiques of opinion polls, opinion mining, in addition to very detailed presentations of the opinions and behaviours of individuals and groups, can better present genuine public opinion than polls. But first and foremost, the question of (the loss of) the bearer of public opinion, the public, needs to be addressed in relation to contemporary data mining technology. If we ignore this fundamental question, the ‘bearer of public opinion’ can become the entire social aggregate of Internet users, reduced by any operational criteria to a statistically manageable size.

Opinion mining as a ‘window into the public’

Opinion mining impressed a blogger on the MonkeyLearn platform with its capacity to ‘offer a window into the thoughts and feelings of the public, allowing businesses to improve the customer experience, perform competitive research, and understand opinions. … It allows you to get inside your customers’ heads and find out what they like and dislike, and why’ (Wolf, 2020). The potential of opinion mining technology is undoubtedly great, but breaking into the heads of customers means invading their privacy, which should be more a matter of concern than enthusiasm.

In principle, opinion mining can be seen as an AI development of the Evaluative Assertion Analysis, a (quantitative) content analysis method developed by Osgood et al. (1956), in which all the evaluative assertions are extracted from the text and converted into ‘subject-verb-complements’ form to assess ‘attitude objects’ in three dimensions (good–bad, strong–weak, active–passive) and thus identify them as unique entities. Traditional opinion mining approaches based on dictionaries and machine learning also require a lot of human operations, which makes high-quality data integration expensive and time-consuming. The introduction of deep learning methods that can automatically learn multi-level features is likely to solve the problem of the tedious manual extraction of properties associated with or dissociated from attitude objects, but not necessarily the reliability and validity of automated coding.
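To make the dictionary-based approach concrete, a minimal Python sketch follows. It is an illustration under stated assumptions, not Osgood et al.'s procedure or the method of any particular tool: the tiny lexicon, the negator list, the tokenisation rule and the averaging are all hypothetical. It covers only the good–bad dimension, and a real lexicon would be far larger and hand-curated, which is where the human effort mentioned above arises.

```python
import re

# Hypothetical hand-built lexicon: word -> polarity on a good-bad scale.
LEXICON = {
    "good": 1.0, "great": 1.0, "reliable": 0.5,
    "bad": -1.0, "awful": -1.0, "unreliable": -0.5,
}
# Hypothetical negators; each one flips only the word directly after it.
NEGATORS = {"not", "never", "no"}


def polarity(text: str) -> float:
    """Mean polarity of the lexicon words found in text (0.0 if none are found)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        if token in LEXICON:
            value = LEXICON[token]
            scores.append(-value if negate else value)
        # Any non-negator token ends the scope of a preceding negator.
        negate = False
    return sum(scores) / len(scores) if scores else 0.0
```

With these assumptions, `polarity("not good")` is negative, whereas `polarity("not very good")` comes out positive because the intervening word ends the negation. The second result is a coding error of exactly the kind that makes the reliability and validity of automated coding an open question.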

The transformation of public and private (communicative) actions into numerical data that allows tracking and forecasting seems to be a fairly recent phenomenon related to the digitisation of communication and the Internet. However, the latest digital forms of datafication represent a continuation and intensification of quantification in governance and research, exemplified by the quantification of public opinion with opinion polling in the 1930s and its important societal consequences. Earlier debates on the ambivalence of opinion polls – that they have important but controversial implications for politics and governance and scientific research, and that they can promote and hinder the development of democracy – could also help identify opportunities and threats of datafication in its modern digital formats.

Data mining raises several questions about the nature and future of online communication that are essential to the conceptualisation of public opinion. Such questions have always been asked throughout history in periods of communication revolutions and when important inventions occurred. Initial Enlightenment conceptualisations of public opinion and publicity were inextricably linked to the press. In fact, they were founded on it. With the mass press, Bentham's ‘subversive’ idea of publicity as a means of controlling the authorities reached a critical turning point: instead of controlling the powerful, publicity became a means of the representation and promotion of the powerful and the control of the powerless. In a similar vein, the inventions of broadcasting, polling and the Internet have had a significant impact on the processes of shaping and expressing public opinion, and have sparked heated debates and reconceptualisations of public opinion.

Publicity, and in particular Kant's idea of the public use of reason, although fundamental to the original conceptualisation of public opinion, played no role in the polling age in public opinion studies. In contrast, the nature and quality of ‘making things visible’ in online communication is the focus of opinion mining, which ‘calculates’ (public) opinion based on the characteristics extracted from online content. By extracting data from online content that is intentionally made public and data that Internet users would prefer (though unsuccessfully) to keep private, and combining them in analysis, data mining reinforces the blurring of the boundary between public and private that escalates with social media. Opinion mining generates data that can be used to effectively influence the behaviours and opinions of Internet users by creating or enforcing visibility or invisibility through various forms of algorithmically geared publicity. It is much more difficult to imagine how the publication of high-quality social information could facilitate the formation of public opinion, although the far-reaching consequences of the increasing permeability of the public-private boundary are visible in current political developments.

Following the argument of Adorno's critique of the (in)adequate translation of public opinion polling into German as opinion research, the adjective public is justifiably not (yet) attributed to ‘opinion mining’ because publicness is not its concern (while with the term ‘sentiment analysis’, it is excessively narrowed down in the opposite direction). Instead, opinions in opinion mining are private opinions that analysis can make profitable for commercial clients in developing their products. The importance of the commercial component of opinion mining is clearly demonstrated in the attempts to define opinion mining as ‘a set of tools to identify and extract opinions and use them for the benefit of the business operation’ (emphasis added).³

The commercial feasibility of data and opinion mining is widely acknowledged and appreciated, but its use to censor people's opinions and control their behaviours is often not (made) visible and impossible to prove through simple statistical correlation. While opinion mining is a non-invasive method in terms of data gathering, it can prove to be a very invasive method in terms of data use. Opinion mining is indeed not in itself surveillance or profiling, but the data extracted from online posts and user metadata can be used precisely for these purposes, too. It is no secret that large online corporations and government intelligence agencies around the world conduct mass data collection and data mining that is usually covert, unregulated, inaccessible to the public, and often misused for surveillance for commercial and political purposes (Snowden, 2019). Coordinated actions by ISPs and content providers to prioritise, store, block, slow down, or charge extra for access to certain content suggest that ‘90% or more of contrary information could soon be vulnerable to the censorship algorithms that can quickly detect and stamp out divergent points of view’ (Parry, 2017). During the COVID-19 pandemic, the worrying effectiveness and extent of datafication of personal lives and the potential misuse of personal data were well demonstrated by otherwise well-intentioned smartphone contact-tracking applications that allow authorities to identify those who have been in contact with people infected with the coronavirus (Naughton, 2020).

As Gupta and Agrawal (2020: 1) admit, the mining of huge amounts of data can be instrumentalised to discover ‘hidden and useful information with high potential’, which can help ‘both the organization and the individual to get the right opinion about the ongoing trends or unfamiliar things’. If the criticism of the administrative, authoritarian and commercial instrumentalisation of opinion polls in the last century seemed excessive, at least for some, it has become more convincing with the digitisation of communication in the 21st century. Quantitative analyses of big data found in socio-digital networks are not limited to measuring what surveys and polls (can) measure, but also include data beyond survey response data. In addition to the opinions expressed in polls, all human online activities have become quantitatively identifiable and thus potentially instrumentalised through big data analytics. By systematically collecting and analysing data on individuals’ online communication in conjunction with data on offline activities, both can be influenced quite effectively. The recent remarkable rise of algorithmically modulated decision making and its pervasive social implications call for increased efforts to ‘clarify the theoretical foundations of critical algorithm studies and highlight the importance of engaged scholarship’ (Brevini and Pasquale, 2020).

Algorithms in the service of the public and public opinion

With the recent algorithmisation of communication, the questions raised by critical sociology about the role of the press a hundred years ago must be raised again: Does new technology promote the formation and expression of public opinion and its control over government and corporate decision-making, or vice versa? Does it support public use of reason and critical publicity or disinformation and computational propaganda? Kennedy and Moss (2015) are optimistic about ‘conditions in which data mining is not just used as a way to know publics, but can become a means for publics to know themselves’. I am less optimistic, but I suggest that opinion mining should help the public not only to know, but to constitute themselves. Opinion mining and analytics could provide an important source for articulating opinions by enabling people to think in community with others in order to make personal opinion objectively more certain and subjectively more sufficient, as Kant suggested. To make this possible, data mining and analytics need to be changed and democratised so that all available information and opinions can be put to a publicness test and their public-worthiness assessed. So far, such ideas have been utopian, but opinion mining and communication algorithms have provided realistic possibilities that all issues of public concern can be addressed in the public sphere, and algorithms have to provide reasons for their public-worthiness.

Data mining and big data analytics have been developed as an effective way to extract opinions expressed online, to identify their direction, topicality and quality, and use extracted data for curation algorithms to arrange content supply by prioritising, classifying and filtering information. Automated recognition, extraction, summarisation, simulation, prediction and promotion of opinions and behavioural patterns are very effective for adapting the content supply to the interests and needs of the users, and for influencing users’ decisions and manipulating them. Due to the impact, for better or worse, of algorithmic systems on people's everyday life, information access and agency, awareness of algorithms is a critical issue and a prerequisite for greater involvement of people in critical use, if not co-creation of algorithms. This requires studying how algorithms are part of broader rationalities, programmes of social change and development, and part of power dynamics (Beer, 2017: 9).

The idea of a public-worthiness algorithm is intended to support such efforts. At first glance, it is reminiscent of newsworthiness studies, but in fact, it is completely different from them. In their seminal study of ‘how “events” become “news”’, Galtung and Ruge (1965: 65) addressed the question of why the media feature particular news stories prominently and others receive little, if any, coverage, and tried to explain it with ‘news factors’. In contrast to newsworthiness, the idea of public-worthiness is based on Dewey's conceptualisation of the public as consisting of all those who are affected by extensive and enduring indirect consequences of transactions between persons to such an extent that it is deemed necessary to have those consequences systematically cared for (Dewey, 1927/1946: 16). It requires the development of computer algorithms to help identify public-worthiness of events and bring that information to potentially emerging publics protected from threats to privacy and autonomy. It goes beyond the descriptive concept of newsworthiness, traditionally applied in the studies of news selection in the media, and includes four dimensions. Three dimensions of public-worthiness could be directly measured by big data analytics: (1) news values correlated with the prominence of news or events on a global, regional, national or local scale in the media, (2) the attention given to news by the Internet users (popularity) globally, regionally, nationally or locally, and (3) the reliability and trustworthiness of news sources. (4) The fourth dimension includes the development of specific indicators of ‘long-term significant consequences’ of the reported events/processes/transactions, which is the key variable to determine the relevance of a process or event needed to initiate public articulation of opinions and thus public opining. An algorithm should provide internet users with information about the public-worthiness of the news based on their issue-related interests in the common good, rather than on what recommendation algorithms decide they would ‘like’ to see based on the characteristics of news items that a user reads and/or on the likes and dislikes of other users.
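The four dimensions can be read as inputs to a composite score. The following Python sketch is purely illustrative: the text does not specify how the dimensions should be measured or combined, so the field names, the scaling of each dimension to the interval from 0 to 1, the weights and the weighted-mean aggregation are all assumptions introduced here. For simplicity, the sketch also leaves out the matching of news to users' issue-related interests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    """Hypothetical record; every dimension is assumed to be pre-scaled to [0, 1]."""
    title: str
    prominence: float          # (1) news values / prominence in the media
    attention: float           # (2) attention given by Internet users (popularity)
    source_reliability: float  # (3) reliability and trustworthiness of the source
    consequences: float        # (4) indicator of long-term significant consequences


# Assumed weights, not taken from the article. The fourth dimension is weighted
# most heavily because the text describes it as the key variable.
WEIGHTS = {
    "prominence": 0.15,
    "attention": 0.15,
    "source_reliability": 0.20,
    "consequences": 0.50,
}


def public_worthiness(item: NewsItem) -> float:
    """Weighted mean of the four dimensions; the result lies in [0, 1]."""
    for name in WEIGHTS:
        value = getattr(item, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return sum(weight * getattr(item, name) for name, weight in WEIGHTS.items())


def rank_by_public_worthiness(items: list[NewsItem]) -> list[NewsItem]:
    """Order items by the composite score, highest first."""
    return sorted(items, key=public_worthiness, reverse=True)
```

Under these assumed weights, a widely shared item with negligible long-term consequences ranks below a little-noticed item from a reliable source with far-reaching consequences, which is the reverse of what a popularity-driven recommendation would produce. The hard part, which the sketch takes as given, is the fourth input: constructing valid indicators of long-term significant consequences.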

By offering Internet users news stories on events with objectively assessed, potentially important long-term consequences, such an algorithm could radically change the way news media operate and the way we act as news consumers. Personalised search and recommendation apps reduce the opportunities to test the objective validity of our knowledge and opinions, which is a prerequisite for reflexive reasoning. As Kant's maxim of judgement demands, one must ‘think from the standpoint of everyone else’, detached from ‘the subjective personal conditions of his judgement, which cramp the minds of so many others, and reflects upon his own judgement from a universal standpoint’ (Kant, 1790/1952: 519). The norm that opinions (and reasons thereof) should be publicly available and comprehensible is a constitutive feature of deliberative democracy (Gutmann and Thompson, 2004: 4). Without the ability to take ‘everyone else's point of view’, a person is disconnected from relevant information, dissenting opinions, and their rationales, and thus unable to make informed judgements. The opportunity of ‘private publics’ emerging in social media and generated by recommendation algorithms to identify events and processes, which link them together even on a global scale, could be a major breakthrough for world-wide communication and defragmentation of the public sphere(s).

Conclusion

Critical examinations of public opinion and the public sphere so far have highlighted underdeveloped and inappropriate political, economic and cultural circumstances as the main reasons preventing the realisation of enlightenment ideas of publicness, publicity and public opinion. Perhaps in the digital age, for the first time ever, we are in a situation where we can actively help create and/or expand publics through technological innovation without having to change social circumstances first. So far, the public has never looked for new communication technology, but new technology looked for the public, to paraphrase Brecht's idea of the ‘radio situation’ (Brecht, 1932/1979: 24). The situation now needs to be turned upside down, and the algorithmisation of communication makes it possible.

The concept of public-worthiness valorises the idea of active (prod)users using recommendation algorithms to define their interests so that they can identify what potential consequences of current events and transactions they are (likely) exposed to and thus constitute themselves as the public. The public-worthiness algorithm would have a positive impact on the rights and interests of Internet users by providing them with information about events and transactions with potentially important long-term consequences. Such an algorithm would be the exact opposite of the emerging robotic news production and dissemination software that targets users with tailored news or generates automated news about niche or local issues that is then disseminated to many small audiences.

Unlike opinion polls, such a facilitating algorithm should not be designed to detect and predict the dynamics of public opinion, allowing policymakers – as a kind of Hegelian ‘great men’ – to gauge the will of their constituencies. Rather, it would allow the ‘Deweyan public’ to be constituted, enabling its members to articulate their opinions on important long-term consequences of transactions that link them together, motivating their public engagement from the local to the global level.

It would be a fallacy of technological determinism to claim that such changes can be initiated solely by the technological capabilities of the Internet or even an algorithm. The seemingly emancipatory technological development triggered by the Internet and artificial intelligence has proven to be controversial, to say the least. Despite the greater permeability between different modes and areas of social communication, there can be little talk of democratisation of communication, media and politics. The internetisation and globalisation of the economy are leading to an even greater media concentration than the traditional print and broadcast media have ever experienced, while data mining and algorithms enable hitherto unimaginable surveillance of physical and symbolic communication. Thus, if there is indeed such a thing as an ‘emancipatory technological potential of the Internet’, only its users as publics can use the new socio-digital networks and big data tools to cultivate reflexive publicity and create an effective public opinion. Without civil society actors struggling for the public use of reason and democratic communication, challenging the strategies of corporations to dominate the Internet and its users, and encouraging research to support and facilitate their actions, democratic changes in power relations will remain utopian, as has often been the case in history.

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Slovenian Research Agency (ARRS) under Grants P5-0051 and N5-0086.

ORCID iD

Slavko Splichal

Notes

References

Adorno (1964/2005) Opinion research and publicness (Meinungsforschung und Öffentlichkeit). Sociological Theory 23(1): 116–123.

Allport (1937) Toward a science of public opinion. Public Opinion Quarterly 1(1): 7–23.

Aristotle (350 BCE/1952) The Athenian Constitution. Cambridge: Harvard University Press.

Beer (2017) The social power of algorithms. Information, Communication & Society 20(1): 1–13.

Bentham (1791/1994) Of publicity. Public Culture 6(3): 581–595.

Bernays (1923/1961) Crystallizing Public Opinion. New York: Liveright.

Binkley (1928) The concept of public opinion in the social sciences. Social Forces 6: 389–396.

Blumer (1948) Public opinion and public opinion polling. American Sociological Review 13(4): 542–554.

Boullier (2015) The social sciences and the traces of big data: Society, opinion, or vibrations? Revue Française de Science Politique 65(5–6): 71–93.

10.

Bourdieu

(1973) L’opinion publique n’existe pas. Les Temps Modernes 318: 1292–1309.

11.

Brecht

(1932/1979) Radio as a means of communication: A talk on the function of radio. Screen 20(3–4): 24–28.

12.

Brevini

Pasquale

(2020) Revisiting the black box society by rethinking the political economy of big data. Big Data & Society 7(2): 10. DOI: 10.1177/2053951720935146.

13.

Bryce

(1888/1995) The American Commonwealth. Indianapolis: Liberty Fund.

14.

Cooley

(1909) Social Organization. New York: Charles Scribner’s Sons.

15.

Childs

(1965) Public Opinion: Nature, Formation, and Role. Princeton: Van Nostrand.

16.

Converse

(1987) Changing conceptions of public opinion in the political process. Public Opinion Quarterly 51, (4, pt. 2): S12–S24. DOI: 10.1093/poq/51.4_PART_2.S12

17.

Déclaration au peuple français (1871) https://fr.wikipedia.org/wiki/D%C3%A9claration_au_peuple_fran%C3%A7ais (accessed 20 April 2022).

18.

Desrosières

(2016) The quantification of the social sciences: A historical comparison. In Bruno

(ed) The Social Sciences of Quantification. From Politics of Large Numbers to Target-Driven Policies. Cham: Springer International, 183–204.

19.

Dewey

(1927/1946) The Public and its Problems. Chicago: Gateway.

20.

Dubois

Gruzd

Jacobson

(2020) Journalists’ use of social media to infer public opinion: The citizens’ perspective. Social Science Computer Review 38(1): 57–74.

21.

Elias

(1968/2000). Postscript. In The Civilizing Process: Sociogenetic and Psychogenetic Investigations. Oxford: Blackwell, 449–484.

22. Fishkin (2009) Virtual public consultation: Prospects for internet deliberative democracy. In Davies and Gangadharan (eds) Online Deliberation: Design, Research, and Practice. Stanford: CSLI Publications, 23–36.

23. Fishkin, Luskin, Panaretos, et al. (2008) Returning deliberative democracy to Athens: Deliberative polling for candidate selection. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1142842.

24. Fraser (2007) Transnationalizing the public sphere. Theory, Culture & Society 24(4): 7–30.

25. Galtung and Holmboe Ruge (1965) The structure of foreign news. The presentation of the Congo, Cuba and Cyprus crises in four Norwegian newspapers. Journal of Peace Research 2: 64–91.

26. Gallup (1957) The changing climate for public opinion research. The Public Opinion Quarterly 21(1): 23–27.

27. Gallup (1971) The public opinion referendum. The Public Opinion Quarterly 35(2): 220–227.

28. Guerrero (2014) The lottocracy. https://aeon.co/essays/forget-voting-it-s-time-to-start-choosing-our-leaders-by-lottery.

29. Gupta and Agrawal (2020) Application and techniques of opinion mining. In Bhattacharyya et al. (eds), Hybrid Computational Intelligence: Challenges and Utilities. London: Academic Press, 1–23.

30. Gutmann and Thompson (2004) Why Deliberative Democracy? Princeton: Princeton University Press.

31. Harrisson (1940) What is public opinion? Political Quarterly 11(4): 368–383.

32. Hawbaker (1993) Taking “the Pulse of Democracy”: George Gallup, Iowa, and the origin of the Gallup poll. The Palimpsest 74(3): 98–113.

33. Hegel GWF (1821/1986) Grundlinien der Philosophie des Rechts. Frankfurt/Main: Suhrkamp.

34. Horkheimer and Adorno (1944/2002) Dialectic of Enlightenment. Stanford: Stanford University Press.

35. JRC - Joint Research Centre (2018) The Governance of Data in a Digitally Transformed European Society. Second Workshop of the DigiTranScope Project. Luxembourg: Publications Office of the European Union.

36. Jungherr (2015) Analyzing Political Communication with Digital Trace Data: The Role of Twitter Messages in Social Science Research. Cham: Springer.

37. Kahneman (2011) Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.

38. Kant (1781/1952) The Critique of Pure Reason. Chicago: Encyclopaedia Britannica.

39. Kant (1790/1952) The Critique of Judgment. Chicago: Encyclopaedia Britannica.

40. Kennedy and Moss (2015) Known or knowing publics? Social media data mining and the question of public agency. Big Data & Society 2(2): 1–11. DOI: 10.1177/205395171561114.

41. Lasswell (1949) Why be quantitative? In Lasswell, Leites et al. (eds), Language of Politics: Studies in Quantitative Semantics. New York: George W. Stewart, 40–54.

42. Lazarsfeld (1941) Remarks on administrative and critical communications research. Studies in Philosophy and Social Science 9: 2–16.

43. Lepore (2020) Scientists use big data to sway elections and predict riots — welcome to the 1960s. Nature, September 16. https://www.nature.com/articles/d41586-020-02607-8.

44. Lippmann (1922/1998) Public Opinion. New Brunswick: Transaction.

45. Lippmann (1925) The Phantom Public. New York: Harcourt, Brace and Co.

46. Lynd (1939/1970) Knowledge for What? The Place of Social Science in American Culture. Princeton: Princeton University Press.

47. Mayer-Schoenberger and Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think. London: John Murray.

48. McGregor (2019) Social media as public opinion: How journalists use social media to represent public opinion. Journalism 20(8): 1070–1086.

49. Merton (1945) Sociological theory. American Journal of Sociology 50(6): 462–473.

50. Naughton (2020) Smartphones could help us track the coronavirus – but at what cost? The Guardian, March 21.

51. Norpoth (2019) The American voter in 1932: Evidence from a confidential survey. Political Science & Politics 52(1): 14–19.

52. Osgood, Saporta and Nunnally (1956) Evaluative assertion analysis. Litera 0(3): 47–102.

53. Parry (2017) The rise of censorship algorithms. The Greanville Post, 27 July. https://www.greanvillepost.com/2017/07/27/robert-parry-nyt-cheers-the-rise-of-censorship-algorithms/

54. Pettigrew (2016) How Facebook saw Trump coming when no one else did. Medium, November 9. https://medium.com/@erinpettigrew/how-facebook-saw-trump-coming-when-no-one-else-did-84cd6b4e0d8e

55. Porter (1995) Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton: Princeton University Press.

56. Rousseau JJ (1762/2017) The social contract or principles of political right. www.constitution.org/jjr/socon.txt

57. Rousseau (1762/1921) Emile, or education. The Online Library of Liberty. https://oll-resources.s3.us-east-2.amazonaws.com/oll3/store/titles/2256/Rousseau_1499_EBk_v6.0.pdf

58. Snowden (2019) If I happen to fall out of a window, you can be sure I was pushed. Spiegel Online, September 13.

59. Schober, Pasek, Guggenheim, et al. (2016) Social media analyses for social measurement. Public Opinion Quarterly 80(1): 180–211.

60. Splichal (2022a) The public sphere in the twilight zone of publicness. European Journal of Communication 37(2): 198–215.

61. Splichal (2022b) Datafication of Public Opinion and the Public Sphere: How Extraction Replaced Expression of Opinion. London: Anthem Press.

62. Tarde (1901/1989) L’Opinion et la Foule. Paris: Les Presses universitaires de France.

63. Tönnies (1922) Kritik der öffentlichen Meinung. Berlin: Julius Springer.

64. Wolf (2020) What is opinion mining & why is it essential? MonkeyLearn Blog. https://monkeylearn.com/blog/opinion-mining/

65. Zhou, Serafino, Cohan, et al. (2021) Why polls fail to predict elections. Journal of Big Data 8: 137. DOI: 10.1186/s40537-021-00525-8.