Sage Journals: Discover world-class research

Abstract

Intelligence on mass media audiences was founded on representative statistical samples, analysed by statisticians at the market departments of media corporations. The techniques for aggregating user data in the age of pervasive and ubiquitous personal media (e.g. laptops, smartphones, credit cards/swipe cards and radio-frequency identification) build on large aggregates of information (Big Data) analysed by algorithms that transform data into commodities. While the former technologies were built on socio-economic variables such as age, gender, ethnicity, education, media preferences (i.e. categories recognisable to media users and industry representatives alike), Big Data technologies register consumer choice, geographical position, web movement, and behavioural information in technologically complex ways that for most lay people are too abstract to appreciate the full consequences of. The data mined for pattern recognition privileges relational rather than demographic qualities. We argue that the agency of interpretation at the bottom of market decisions within media companies nevertheless introduces a ‘heuristics of the algorithm’, where the data inevitably becomes translated into social categories. In the paper we argue that although the promise of algorithmically generated data is often implemented in automated systems where human agency gets increasingly distanced from the data collected (it is our technological gadgets that are being surveyed, rather than us as social beings), one can observe a felt need among media users and among industry actors to ‘translate back’ the algorithmically produced relational statistics into ‘traditional’ social parameters. The tenacious social structures within the advertising industries work against the techno-economically driven tendencies within the Big Data economy.

Keywords

Big Data algorithms surveillance media industries media use digital media

Introduction

In the age of the mass media, intelligence on audiences was mainly founded on representative statistical samples, analysed by statisticians at the market departments of media corporations. The techniques for aggregating user data in the age of pervasive and ubiquitous personal and mobile media (e.g. laptops, smartphones, credit cards/swipe cards and radio-frequency identification (RFID)) build on large, algorithmically produced aggregates of information that are the basis for the construction of the audience commodity. While mass media audience intelligence was premised on socio-economic census data variables such as age, gender, ethnicity, education, and media preferences (i.e. categories recognizable to media users and industry representatives alike), Big Data technologies register consumer choice, geographical position, web movement, and behavioural information in technologically complex ways that are too abstract for most laypeople to appreciate the full consequences of. Much of the current discussion on Big Data – in academia and in business discourse – results from this shift in the business models of the media industry, and the resulting changes in the ways media users are addressed.

The debate around the societal and cultural effects of Big Data is extensive, and one of the pressing questions has been whether to see the turn to Big Data as merely a shift in scale, reach, and intensity (a quantitative shift) or as a more profound, truly qualitative shift – implying both a shift in being (ontology) and meaning (epistemology). As Couldry (2014), Kitchin (2014) and Mosco (2014) have recently remarked, it is important to clarify whether what we are observing is an academic or a governmental turn to actually taking up Big Data as a heuristic – or if it is more a case of powerful institutions using the discourse or ‘myth’ of ‘Big Data’ (Boyd and Crawford, 2012; cf. Boellstorff, 2013) to foster an almost religiously tainted ‘dataism’, as van Dijck (2014) calls this structure of belief (cf. Mager, 2012). Are we dealing with powerful agents increasingly privileging ‘correlation over causation, predictability over referentiality’ (Andrejevic, 2013: 40) whereby certain forms of institutional intelligence would be favoured at the expense of interpretation, giving way to confrontations between anti-hermeneutical impulses (naïve empiricism) and hermeneutical traditions, or are we merely witnessing old wine being poured into new bottles?

Big Data – however defined – deals with the employment of statistics. There are various ways in which the field of statistics can be conceptualized, and traditionally a distinction tends to also be asserted between descriptive and inferential statistics, often perceived of in the debates as statistical ‘laws’. Belief in such statistical ‘laws’, like all kinds of belief, is socially constructed (Hacking, 1975/2006, 1990). The 19th century saw an erosion of determinism and an increasing belief in ‘objective chance’. These two developments were interrelated and very much followed from the emergence of statistical bureaus, a torrent of statistics, and a subsequent popular recognition of widespread statistical regularities. In his historiography, Hacking (1990) shows how the growing belief and trust in data engendered new normative categories, whereby social norms came to hinge upon a conception of the average type.

So, against these belief structures, one should ask whether the idea of a shift has come to be so incessantly invoked by various institutions, regulators, governments and individual media users that it has become ‘real enough’, meaning that people have begun acting as if this was the new administrative reality. We suggest that this is actually the case, and that the ways in which the Big Data discourse has been interpreted have had a manifest social impact on how both executives in the media industries and ordinary media users act, much along the same lines as the Thomas theorem ascribes, that if ‘men define situations as real, they are real in their consequences’ (Merton, 1995; cf. Jensen, 2013).

The aim of the paper is to discuss some principles of algorithmic data capture technologies and the ways these principles are interpreted, understood and acted upon by media producers and media users, that is, those people who are involved in the production and consumption of media. In order to understand the roots of the discourse on Big Data and, following from that, apprehensions of ‘what data can do’ to the perception of media users, we will first distinguish between two key modes of statistical inquiry – a statistics of discrete data points, and a statistics of interconnected data points. Since what is seized upon in data mining operations is a statistics of pure relation, the latter mode of statistical imagination becomes central in the analysis of the audience generated in database economies.

We then discuss the panspectric (ubiquitously tracking) nature of data collection arising as both a result of and a boon to post-war (cybernetic) modes of governing, and the ways the images of the media user have shifted during the same post-war period. This is followed by a discussion of avoidance strategies (by media users) and translation practices (within the media industries).

Lastly, we use these backgrounds to discuss three types of ‘translations’ that arise as a result of the new data capture technologies and the surrounding discourse.

The argument is mainly theoretical, but will be supported with examples from our own as well as others’ previous research. Regarding the self-perception of media users, we draw on examples from an on-going interview study (focus groups and individual) of media use as value-generating labour.

Two statistical principles

To begin with, it is important to differentiate between accumulations that have arisen out of randomness falling along a continuous probability distribution (‘normal curve’ accumulations), or out of a functional relationship between two (or more) quantities, whereby one quantity varies as a power of another (‘power-law’ accumulations). In recent years, various scholars have noted that a world premised on networked complexity begets a flurry of phenomena characterized by such ‘power-law’ distributions (Hart, 2004; 2010; Lehmann et al., 2007; Urry, 2004). A market or a population characterized by this form of accumulation has somewhat different characteristics than populations characterized by ‘normal-curve’ distributions (Barabási, 2002).

A central aspect of the work of Carl Friedrich Gauss (1777–1855) concerns calculations of probability. A Gaussian distribution – often called a ‘normal’ or ‘bell-curve’ distribution – describes any variable that clusters around its central tendency along a span of probable outcomes (Kitchin, 2014). It describes parameters that are causally unrelated to one another and tend to fall into statistical patterns based on random variation (one example being human height or shoe size).

Gaussian models, however, fail to account for the emergent properties of systems that derive from connectivity within or among systems. This, on the other hand, can be anticipated through a fundamentally different probability distribution. While Vilfredo Pareto (1848–1923) is most famous for the ‘80/20 rule’ (the idea that 20 per cent of a population tends to stand for 80 per cent of a parameter, e.g. output), he is also credited for observing the tendency of systems comprising positive-feedback processes to generate aggregations that veer towards highly biased distributions. The dynamics underlying such aggregations tend to be called ‘power laws’ since they express processes in which one quantity varies with the power of another.

Interestingly, if we are said to live in a ‘network society’ (Castells, 2000), a logical upshot of this would be that one would expect to see a greater prevalence of such accumulations than before:

[M]ost quantitative research involves the use of statistical methods presuming independence among data points and Gaussian ‘normal’ distributions. The many findings of natural and social [power-law] phenomena, however, indicate that interdependence is far more prevalent than ‘normal’ statistics assume and the consequent extremes have far greater consequences than the ‘averages’ in between. (Andriani and McKelvey, 2011: 96)

In a network of interconnected, interdependent nodes, the shape of the network tends to follow a ‘scale-free’ dynamic, resulting in aggregations of a ‘power-law’ or ‘long-tail’ kind (Anderson, 2006) that can be said to be typical of ‘networked accumulation’ (Andersson Schwarz, 2013: 46ff). Such aggregations are hard to describe through applying traditional Gaussian statistical concepts such as mean, variance and standard deviation (see Table 1).

Table 1.

Differences between Gaussian and Paretian statistical principles.

Gaussian	Paretian
• Normal-curve function (‘bell curve’)	• Power-law function (‘long tail’)
• Based on independent variables	• Based on interdependent variables
• Describes randomly occurring discrete frequencies	• Describes networked accumulation
• Instantiations (discrete nodes)	• Relations (interconnected nodes)
• Often associated with classical representational sociological variables (age, gender, ethnicity, education, income, preferences)	• Often associated with non-representational correlations (network analysis, aleatory connections, ‘spurious correlations’)

Interestingly, the turn to Big Data in many ways implements a return to descriptive statistics, in that it often entails a record of entire populations, leading observers to assume a mirroring of reality – which should be contrasted with the classical approach of inferential statistics: eliciting samples or panels that are assumed to be indicative of the greater whole. Much of the ‘novelty’ of Big Data appears to stem not only from perceived distinctions between Gaussian and Paretian principles for accumulation, but also from mistaking this more direct and complete form of data collection for a mirror of reality. We argue that such (mis)understandings of the abstract principles of data generation are translated into heuristic and pragmatic knowledge which, in turn, produces new types of social relations. Let us continue unpacking some of these understandings below, before outlining the translational manoeuvre.

Ubiquitous tracking and prediction of the media audience

In line with such statistical understandings of the workings of large, interconnected, digital platforms, contemporary technologically advanced societies are increasingly characterized by ‘the use and convergence of the web, mobile phones, electronic financial systems, biometric identification systems, RFIDs, global positioning systems (GPS), ambient intelligence and so forth’ (Gutwirth and Hildebrandt, 2010: 32), generating a glut of sensory and behavioural data. Much of this is not even observational – no human actor ever ‘sees’ the data points; they are generated by means of an automated, ubiquitous quantitative tracking system built into the very infrastructure itself. Andrejevic (2013: 12ff) has argued that this new mode of data collection constitutes a kind of registering rather than description – an automated collection of an allegedly more ‘direct’ reality, preceding interpretation.

The increased use of various media via digital platforms, which – by their very nature – are premised on this automated, continual tracking, brings with it that media audiences participate in the metrification of their habits either wittingly or largely unwittingly (e.g. music consumption being tracked on streaming platforms; search behaviours being tracked for the purpose of providing customized ads; etc.). This development is intrinsically connected to a broader development of algorithmic power and new forms of economic value, meaning that the political economy of the media is shifting, for example through the introduction of new business models (Bolin, 2011; Fuchs et al., 2012).

The management of industrialized mass media has long involved an estimation of the nature and inclinations of audiences (Bjur, 2009), which have thus been central to the strategic management of broadcasting and the press. However, with the emergence of new modes of social control, based on technological aids such as data mining, pattern recognition and predictive algorithms, an alternative to the traditional notion of audience prediction is emerging.

The new mode of control should be contrasted with panoptic surveillance (Gandy, 1993) in that it is premised on ‘panspectric’ vision (Palmås, 2011). The nature of the panspectric gaze (DeLanda, 1991) differs from the panoptic gaze (Foucault, 1977), in that

instead of positioning some human bodies around a central sensor, a multiplicity of sensors is deployed around all bodies […]. The Panspectron does not merely select certain bodies and certain (visual) data about them. Rather, it compiles information about all at the same time, using computers to select the segments of data relevant to its surveillance tasks. (DeLanda, 1991: 206)

The ‘panspectric diagram’ involves not only the panspectric apparatus sketched above but, equally, ‘a set of knowledges and perspectives on the nature of human nature’ (Palmås, 2011: 339). Importantly, the ubiquitous nature of the panspectric gaze should not lead us to believe that it implies a situation of total supervision, nor horizontal interveillance (everyone watching one another). On the contrary, the actor holding the access to the database remains firmly at the centre of the panspectric diagram. Panspectrism is a mode of surveillance highly biased in favour of those in control of the monitoring infrastructures, compared to those lacking access or control. Moreover, it thrives on the real-time processing of entire audience populations, rather than on sample-based, inferential estimations of the whole.

What is ‘seen’, then, by panspectric algorithms? In short: detections, rather than measurements. Signifiers whose referent is no longer a human subject but a cluster of correlations – be it a topological mapping of relations (sociometry), or a topical mapping of preferences or even personality types (psychometry). This can be argued to constitute a qualitative shift compared to more classical statistical approaches aiming at validating or invalidating already proposed correlations. In this new mode, the detection of the correlation is the information:

Detections, however, are much wider than measurements; they don’t have a specific meaning, but they will have an impact if used or applied, and their meaning is produced by their application. In other words, the qualitative shift lies in the fact that correlations and profiles get generated before any preceding interest or question. This is why it can be said that humans have become detectable far beyond their control: their actions have become the resources of an extensive, if not unlimited, network of possible profiling devices generating knowledge affecting and impacting upon them. (Gutwirth and Hildebrandt, 2010: 33, authors’ emphasis)

When it comes to data generated from social networking sites and search, these detections denote traffic rather than usage; traces left by the machines rather than the human users. In the next section we will discuss how this shift in measurement has affected institutional views on audiences and media users over the years.

Institutional perceptions of the media user

So, how does such a shift in technologies for tracking media use and producing the audience commodity affect the relations between the producers within the media and culture industries, and ordinary media users? How are the roles of production and consumption affected, and how are they discursively constructed within the industry and the academy? Historically and from the perspective of the traditional (mass) media industry, media audiences were constructed from sociologically based Gaussian variables such as age, gender, socio-economic status, and education (Bjur, 2009; Napoli, 2011; Webster and Phalen, 1997). Initially, media users were targeted as consumers of media products, and with growing reliance on advertising for their revenues, the press, radio and television developed a ‘need’ to control – and thus survey – its potential customers (Bermejo, 2009). However, in the process of increased surveillance of these users, their status was shifted from that of the end consumer to the commodity itself. It was rather the advertising industry that was the commercial media producers’ prime customer. Thus, the first shift in the institutional perception of the audience was instigated within the media industries, where the media users turned from being customers to being the commodity.

The media user, or correspondingly the audience member, was no longer the customer. But how was he or she to be considered? In the process of increased measurement or surveillance of the audience, notions were born – not within the media and culture industries themselves but within Marxist political-economic perspectives in the academy – of seeing audience members as a labour force working in the service of media producers. Viewers were contributing to media companies’ profits through, for example, watching the commercials on television. Such ideas were first formulated in the 1970s by Smythe (1977), but were later further developed by Jhally and Livant (1986) and others.

Now, as already noticed at the time by Meehan (1984), there are problems connected to the imaginary of audiences as workers in the service of broadcasters. It is, for example, not entirely clear what kind of compensation the audience worker receives for his or her labour. It is clear that what the audience member receives is, at best, access to television shows, but it is also clear that this ‘salary’ cannot be further converted into, for example, food or housing for the worker. A possible way to solve this would be to make the analogy with (unpaid) household work, in the same way as feminists have argued that household work actually contributes to the value within production-consumption circuits (Folbre, 1982). However, (feminized) household work more obviously contributes to value generation by offloading (masculinized) workers from some of the labour needed to reproduce their labour capacity. This is not as obvious when it comes to the ‘work’ of watching television. As Bolin (2009, 2011: 32ff) has pointed out, drawing on theories of sign value by Baudrillard (1972/1981; 1973/1975), the statistical audience is an intangible commodity, crafted into being by the sampling procedures directed by the statisticians of the marketing departments of large-scale media corporations (or their subcontractors). Viewed in this perspective, the ‘labour’ of ‘active audiences’ is but an abstract estimation by statisticians; it is their information about the audience that becomes the prime raw material in the production process (cf. Bolin, 2012). Such a perspective is also more congruent with how the media industries relate to the media user (albeit not always framed within the same terminology).

However, the idea of the audience, the referent if you will, was still modelled on the social: the audience commodity was constructed out of Gaussian statistical categories like age, gender, income, education, etc. Admittedly, over the years, increasingly more fine-tuned technologies have been developed for constructing the audience commodity, for example through an increased regionalization of advertising, aiming for increasingly delimited local target audiences and the avoidance of ‘waste’ (Bolin, 2002). The principles for audience targeting (and hence measurement), however, were still founded on this Gaussian conception of the social.

With algorithmically based marketing, however, these ideas of the audience have changed in fundamental ways (Bermejo, 2009; Napoli, 2014). Data mining and especially algorithmic targeting can be seen as a move away from the social towards the relational. As Andrejevic (2013) has pointed out, there are a number of fundamental differences between the ‘old’ statistical, representational model of media user surveillance and the new, population-based models of algorithmic (panspectric) data mining, changing the very nature of the conception of social agents. The most fundamental of these is the divergence from the principles of representation. When audience information is produced through ‘population-level data capture’ (Andrejevic, 2013: 35), the idea that data correspond to the social through its representation becomes lost. In this ‘post-referential’ situation, correlation becomes the main focus. The explanatory dimension of representational statistics (e.g. ‘this group of people behave like this due to their social composition and their habitus privileging certain kinds of action over others’) becomes less important than the establishment of correlations between (probable) behavioural patterns. The socially explainable ‘who’ behind this pattern is less important than the algorithmically predictable behavioural ‘how’. From a pragmatic viewpoint, if prediction of behaviour is possible, the need on the part of the media industries is nullified (Andrejevic, 2013: 40): The key concern is the frequent detection of ‘sociograms’ or ‘psychograms’ (always ad hoc, never a priori) that match particular queries (e.g. ‘every user watching streamed episodes of TV show X’) – and how well the groupings generated can be matched with potential target groups (e.g. for advertising). ‘We only see aggregated data, never at the individual level, as we do not want to ruin the trust that our users have in Google’, says Jonas Wallentin, Nordic Sales Operations Manager, Google Sweden (in Karlsson, 2014).

In sum, the image of the media user as seen from the commercial side of the industry has been tightly connected to the commodity form, and has also shifted over the past century, from ‘the reader/listener/viewer’, over ‘the commodity’ and ‘the raw material’, to increasingly ‘the predictive behavioural pattern’ of today.¹ Seeing this non-representational, non-anthropomorphic character of the accumulated traces or ‘surveillant assemblages’ (Haggerty and Ericson, 2000: 606) of media users that are captured by the digital back-end infrastructure is, in our view, an apt corrective to the more anthropomorphized imagery of ‘data doubles’ (Haggerty and Ericson, 2000) or ‘digital persons’ (Solove, 2004). As Ruppert (2011: 218) puts it, by recourse to Foucauldian theory: ‘People are not governed in relation to their individuality but as members of populations.’ This is not to downplay the surveillant nature of these infrastructures but rather to de-anthropomorphize it.

Users’ perception of their changing roles

While the views upon audiences have changed over the years, media users’ self-perception has been dealt with much less. This is not strange, given that a great deal of audience research is descriptive and asks the same questions as the media industries (‘Who are the audience members? Why do they watch/read/listen?’), even in Big Data analysis (e.g. Bruns, 2012; cf. Lohmeier, 2014: 80). These questions are very seldom, if ever, asked by media users themselves. Media users are less interested in ‘who they are’, and more concerned with how to orient themselves in relation to the growing capacities of data collection of their everyday behaviour. Even if the perspective from the media industries, as Deuze (2011: 143) illustratively argues, is that ‘you are not special’, the ways data are collected provoke unease and a specific kind of self-reflexivity in relation to one’s own media habits. As Hacking (2002) has noted, the reflexive constitution of personhood is tightly interconnected with the means of classification of personhood available; historically, such classifications are malleable and prone to change, effectuating new possibilities for action and new kinds of persons.

Arguably, the present information society prompts users to become something akin to ‘everyday intelligence analysts’, united by a common logic: the need to make sense of a welter of information (Andrejevic, 2013: 4). This includes a need to consider how other actors classify you. As the social costs of staying outside social networking platforms are palpable, there is also an increased need to handle the fact that you are constantly ‘detected’, leaving bundles of digital traces when navigating in panspectric digital space. In the wake of this, a dialectic of the digital presents itself: as an opportunity structure (or informational possibility), and as a burden (or restraint). Opportunity structures typically manifest themselves in the form of greater access, for example to ‘locative news’ (Goggin et al., 2015) or to more convenient ways of finding nearby friends through Foursquare, Facebook, Tinder or other applications, while the burdens of sharing too much, or being unwillingly tagged in photos on social networking sites and commented on by known or unknown others, are revealed in the ways avoidance tactics are developed (cf. Hjorth, 2014).

In our own research, building on focus group and individual interviews with media users in Sweden, we have seen that social networking sites are used by all ages (at least in countries with high Internet penetration such as Sweden).² However, the extent to which older people engage with Facebook, Twitter, etc. is indeed limited compared to those who are younger. Older users spend considerably less time on social network services (SNSs), which can be partly explained by the fact that smartphone penetration is considerably higher among people under the age of 50 (Bolin, 2014). The differences in access (in Sweden) are illustrated in Figure 1.

Figure 1.

Access to smartphone and ordinary mobile phone among mobile phone users in Sweden by year of birth. Sliding mean, Q4 2013 (Bolin, 2014: 230).

Access to smartphones brings with it dramatically different usages of mobile telephony, and dramatically increases the amount of data each user produces. Not surprisingly, smartphone owners are considerably more active on SNSs, and also routinely use geo-local positioning functions. For example, while around 66 per cent of users born in the 1980s access SNSs via mobile on a daily basis, the corresponding figure for those born in the 1950s is 15 per cent (Bolin, 2014). Although the penetration of smartphones is higher among those born in the 1980s, access among those born in the 1950s is clearly above 50 per cent, and the difference in use is wider than what would be explained by difference in access. Our interviews with media users of various ages also show that the most active users are those who are the most concerned about both the gains and the perceived threats of digital, personal and locative media.

While few of our interviewees had chosen to drop out of SNSs, there were a few examples among them. The motivation behind this is often not only a mix of unease regarding the surveillant element, but also the feeling that SNSs waste valuable time. In two focus groups, high-school students related how friends had opted out because of painful experiences, with too much private information being distributed to too many irrelevant others. Several informants not only expressed resignation over the massive surveillance operations of Internet-based media, but also regarding swipe cards and other pervasive and ubiquitous media. One of our interviewees, who was working with network engineering, was explicitly concerned over the tracking procedures of the communications industries. Three other interviewees, also professional media producers in the early stages of their careers (two photographers and one filmmaker), had elaborate principles regarding whether or not to post on open network systems. On the other hand, one musician, equally in an early career stage, regarded social networking sites as an opportunity to reach out (cf. Baym, 2013). Some users had clearly developed avoidance tactics, including the systematic avoidance of certain services perceived to be based on algorithmic marketing (e.g. Gmail) or for security reasons (Internet banking), or careful privacy administration on Facebook. Overt techno-scepticism appeared more habitual among slightly older informants (i.e. born between the early 1960s and late 1970s), while ambivalence seemed to be a more apt characteristic among those introduced to the digital world during the formative years of their late teens and early twenties. Informants born into the digital world are more clearly living a ‘media life’ (Deuze, 2011) in the sense that their digital media use is less actively reflected on, is less controversial, and seems to produce less anxiety.

Based on our interviews with media users, we would also emphasize the tenacious structures we revealed. The shifts we describe in this article are not universal, and nor do they occur overnight. Since the knowledge among media users is varied, and few have explicit knowledge about the exact operations of algorithms and the surveillant assemblages, the possibility for agency (in terms of access, oversight, and command) is also unevenly distributed. The technological systems are far too complex for most laypeople to understand, although there is partial knowledge about the discourse on algorithms, digital tracking and Big Data. In the next section, we will discuss various translational strategies that are adopted in negotiations with technology – both among representatives of the media industries and among everyday users.

Big Data and operations of translation

Despite the promises the Big Data discourse offers, and the arguably technologically advanced modes of capturing the audience commodity, broad modes of address (often implying average types, i.e. Gaussian modes of audience interpretation) are still efficacious for mass appeal. Broadcasters and advertisers often remain faithful to the well-worn, scattershot broadcasting heuristic rather than taking the risk of relying on a highly tailored, convoluted process of identifying ‘relevant’ patterns and then tailoring communication to those fractions of user profiles that emerge.

There is clearly some tenacity inherent in the Gaussian mode of imagining the audience. According to representatives of Google’s Swedish head office (unnamed YouTube executive, personal communication, 20 March 2014), the company has been engaged in this debate for several years, in their dealings with media buyers. Still, as the cost of using the Big Data approach gradually decreases, norms and practices might change. What is more, the two need not cancel each other out. What YouTube tends to do is offer advertisers a parallel strategy alongside the conventional one, allowing customers to try the correlational mode of targeting as a complement to their usual strategy. This has been standard practice among traditional media industries in the past as well, when seeking to optimize advertising placement strategies; for example, already in the early 1990s Swedish commercial broadcaster TV4 developed sophisticated regionalized advertising strategies, building on refined (although still Gaussian) intelligence about niche audiences. However, 80 per cent of TV4’s advertising sales were still administered in the traditional way (Bolin, 2011: 53ff).

In the everyday appropriation of Big Data that we see taking place within institutional settings – i.e. the media industry, the advertising industry, editorial settings, and governmental intelligence – it is apparent that different institutional actors resort to different manoeuvres of ‘translating back’ into conventional doxa and praxis the novel insights garnered from the correlational modes of statistical reasoning found within database economies.

There is a recursive logic to the media and culture industries, in that we are all – to a lesser or greater degree – media users. This involves a somewhat circular game: (a) In their daily operation, professionals have to anticipate what the end-user will think and feel; the designers of user experiences relate back to what it is like to be a user (‘model users’, as idealistic constructs, often result in caricatures of the user; see Ivory and Alderman, 2009). (b) As we saw above, many everyday users try to anticipate what the surveillant media designs will do to them; these users, in a sense, try to anticipate the producers’ intentions – which involves a recourse back to (a). Put in more philosophical terms, there seems to be a ‘common logic’ of usage: one and the same relation, running through media users and data analysts – professional and non-professional alike.

The non-representational and correlational nature of Big Data, by definition, is highly abstract. In the daily transactions with such data or, equally, in the anticipations of what the data will register, ordinary actors in the life world will have to translate this abstraction back into more easily comprehensive categories. Figure 2 illustrates three such operations of translation that appear within the media and communications industries as well as among ‘ordinary’ media users.

Figure 2.

Ways Big Data correlations tend to be translated back into more manageable, familiar categories.

Referring to the first type of translation, it could be argued that the commercial media industries are characterized by institutional inertia; algorithmically based Internet technologies develop faster than the speed at which the conceptual principles they effectuate are established among the customers, i.e. those media agencies that act as brokers of advertising inventory. This mismatch is not exclusive to algorithmic surveillance, but can also be traced back to the efficacy of Gaussian models of advertising administration, as seen in the TV4 example above.

In the context of Big Data, to solve the ‘problem’ with varied understandings of how the algorithms function, those who work with predictive, algorithmically based targeting have a need to ‘translate back’ the correlational data into normal-type, normal-curve terminology in order to make the information intelligible for the buyer, since most buyers are used to thinking in terms of age, gender, income, education, etc. (unnamed YouTube executive, personal communication, 20 March 2014). In reference to the famous Peter Steiner cartoon from The New Yorker (5 July 1993), one could say that in the age of algorithmic surveillance, ‘on the Internet, nobody cares that you are a dog’, as long as your entry point into the system produces data relevant enough for you to be targeted with commercial messages. This is especially true when it comes to what we have elsewhere called the ‘general traffic commodity’ (Bolin, 2011: 122) – that is, the commodity consisting of volumes of bytes carried over infrastructure, i.e. the basis for telecommunications industry revenue. However, although the technological system as such might care very little about special sociological traits (i.e. whether or not you are a dog), the market for advertising does. Here, what is sought is the ‘specific traffic commodity’ – that is, the combination of traffic information that is algorithmically computable and possible to identify as a target point for advertising. This means that advertisers would be inclined to seek to translate algorithmically produced information into the language of traditional market segmentation.

Regarding the resilience of this conventional type of segmentation, algorithm-based companies such as Google and Spotify are developing technologies for structuring ‘user clusters’ through manual editing that greatly resembles earlier audience approaches (Gustavsson, 2014), since it ‘is far more efficient for Google to target relevant market segments than individuals’, and thus ‘Google has operationalized the techniques developed by its researchers for correlating real-world user attributes with features extracted from their online actions’ (Gould, 2014). Thus, rather than anticipating an either-or scenario for algorithmic surveillance for commercial purposes, one should most probably expect the approach in the future to be both-and: in some instances it will be unimportant whether or not you are a dog; in others, it will be of vital importance.

Secondly, if the advertising industry and advertising-based tech companies such as Google have an ambivalent and complex relationship to Gaussian and Paretian categories, government security agencies definitely do care about the specific individual. While in many cases a tenable percentage of predictability is acceptable for the commercial industry – like a CEO of a Big Data-based sentiment analysis company, as stated in Andrejevic (2013: 56), ‘If we’re right 75 to 80% of the time, we don’t care about any single story’ – for government surveillance the equation would be the inverse: It is rather the remaining 20 to 25 per cent of deviance that is of interest, if the objective is to track down potential terrorists or other individual threats to society (Gates, 2011; cf. Parks, 2014). The objective is still to make sociograms, but for the purpose of narrowing down one’s focus to a manageable number of particular individuals. In its search for outliers and ‘black swans’ (Taleb, 2007/2010) – that is, that which is beyond expectations – one could therefore say that government surveillance works in reverse compared to market surveillance. As Andrejevic (2013: 20) argues, there is a central difference between predicting everyday, recurrent habits and rare events; it is easier to predict sales of soap than mass murders. Whereas market surveillance strives for the former, governments are more interested in the latter.

Thirdly, in the advertising industry – and within journalism, one might add – we are seeing a mode of algorithmic thinking seeping into the editorial process, anticipating ‘viral’ popularity, as it seems, and responding to apparent patterns, e.g. in the Twittersphere or on YouTube, or through the active production of ‘click-bait’ journalism. Infrastructural actors such as Facebook are actively encouraging advertising agencies to operate through strategies of ‘content marketing’ and ‘social media marketing’, whereby the content is designated to have native appeal among consumers (unnamed Swedish Facebook executive, personal communication, 8 April 2013) and has to conform to Facebook’s own internal filtering and ranking of content. Here, there is both continuity with the standard mode of operation for content producers, anticipating audience preferences and popular appeal – but also a break, since this anticipation is modulated through networked accumulation coupled with a notion of purely relational measures of preference, disregarding standard categories of audience segmentation. We could ask ourselves if we are witnessing how the emergence of predictive algorithms affects sociocultural editorial norms, meaning that we are seeing an increased prevalence of cybernetic thinking, anticipating algorithmic success; a tendency that is arguably exacerbated – not dampened – by the editorial second-guessing provoked by the opacity of existing platform logics (cf. Gillespie, 2010; van Dijck and Poell, 2013).

This mindset can also be found among ‘ordinary’ media users, and is revealed in the actions of users of social networking sites such as Facebook, where the logics metrification steers user behaviour. As part of his artistic practice Benjamin Grosser (2014) launched Facebook Demetricator, an app that removes all the metrics from the social networking site so that the user does not see how many ‘likes’, friends or other metrics relate to postings. Although this has not been followed up by generalizable data, Grosser gives examples of how users have reacted to the application, and of how a ‘demetricated’ Facebook has changed their ways of using the site:

Demetricator reveals how Facebook draws on our deeply ingrained ‘desire for more’, compelling us to reimagine friendship as a quantitative space, and pushing us to watch the metric as our guide. But the metric is an agent of the system, a thing with intention that adheres to various powers, be they designers, programmers, Facebook the corporation, or the system itself. It places us within a graphopticon, asking us to evaluate the metrics of our friends while at the same time internalizing our need to excel quantitatively. (Grosser, 2014)

The metric of social networking sites, then, is both the driver and the expression of what we would like to call the metricated mindset whereby the quantities presented by the metrics – and the anticipation of popularity expressed – privilege certain types of social action and guide behaviour in digital space.

Conclusions

There is, in one sense, continuity between the statistical inference of yesterday and the predictive statistics of today, since the media and communications industries so heavily rely on the quantitative prediction of audiences and uses. However, one should heed the risk of confusing a shift in network architecture with the onset of networked knowledge per se, not least because what is technologically possible does not always correspond to practices developed either among the media industries or with ordinary media users.

The pressing issue at hand for media researchers is to better understand the ways industry and audiences alike relate to each one of these topics. For ‘exceedingly complex systems’, with their own inner dynamics (Pickering, 2010: 381), prediction is never about determinism; it can only address probabilities. Hence, it is important to discover the ways audience members think of prediction – do they conceive of it as determinism (predicting singular events, individuals, behaviours) or as a field of latent probabilities?

Further, one should ponder media users’ self-reflective abilities to understand themselves metaphysically or ontologically (Hacking, 2002) given that, in a historical perspective, there has been a move in the perception of audiences – from commodities to ultimately becoming data resources to be mined (Prodnik, 2012). Arguably, the nature of the mining in question has gradually shifted to highlight abstract bundles of behavioural patterns (correlations and sociograms), downplaying the referential subject. We ask future research to heed this shift to de-subjectification, and to even employ models of what could be called ‘post-referential’ agency.

Moreover, media users’ own relations to the networked accumulations generated should also be pondered – that is, the superabundance of data (surpassing time and attention), arguably surpassing narrative and hermeneutic intelligibility (Couldry, 2014; Mosco, 2014), and the manifest inequalities arising (in access as well as in reach), alongside the civic potentials for tactical anticipations (‘outsmarting’ the data glut, as it were).

As we have discussed above, there is no doubt that the media industries and the media users will develop their own heuristic understanding of algorithmic surveillance. Whether their respective heuristics will adjust to the technological possibilities and the de-subjectified logics of the algorithm, or align with principles from earlier modes of surveillance (e.g. re-subjectification), is an open question that remains for future empirical research to answer.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

This article is part of a special theme on Data and Agency. To see a full list of all articles in this special theme, please click here: http://bds.sagepub.com/content/data-agency.

References

Anderson

(2006) The Long Tail: Why the Future of Business is Selling Less of More, New York, NY: Hyperion.

Andersson Schwarz

(2013) Online File Sharing: Innovations in Media Consumption, London: Routledge.

Andersson Schwarz J (forthcoming) Public service broadcasting and data-driven personalization: A view from Sweden. Television and New Media (Special issue: ‘Rearticulating audience engagement: Social media and television’).

Andrejevic

(2013) Infoglut. How too Much Information is Changing the Way We Think and Know, New York, NY: Routledge.

Andriani

McKelvey

(2011) Managing in a Pareto world calls for new thinking. Management 14(2): 89–118.

Barabási

(2002) Linked: The New Science of Networks, Cambridge, MA: Perseus.

Baudrillard

(1972/1981) For a Critique of the Political Economy of the Sign, St. Louis, MO: Telos.

Baudrillard

(1973/1975) The Mirror of Production, St. Louis, MO: Telos.

Baym

(2013) Data not seen: The uses and shortcomings of social media metrics. First Monday 18(10): . Available at: http://firstmonday.org/ojs/index.php/fm/article/view/4873/3752 (accessed 21 September 2015).

10.

Bermejo

(2009) Audience manufacture in historical perspective: From broadcasting to Google. New Media & Society 11(1&2): 133–154.

11.

Bjur

(2009) Transforming Audiences: Patterns of Individualization in Television Viewing, Göteborg: JMG.

12.

Boellstorff

(2013) Making big data, in theory. First Monday 18(10): . Available at: http://firstmonday.org/ojs/index.php/fm/article/view/4869/3750 (accessed 21 September 2015).

13.

Bolin

(2002) In the market for symbolic commodities. Swedish lottery game show ‘Bingolotto’ and the marketing of social and cultural values. Nordicom Review 23(1–2): 177–204.

14.

Bolin

(2004) The value of being public service. The shifting of power relations in Swedish television production. Media, Culture & Society 26(2): 277–287.

15.

Bolin

(2009) Symbolic production and value in media industries. Journal of Cultural Economy 2(3): 345–361.

16.

Bolin

(2011) Value and the Media. Cultural Production and Consumption in Digital Markets, Farnham, UK: Ashgate.

17.

Bolin

(2012) The labour of media use: The two active audiences. Information, Communication and Society 15(6): 796–814.

18.

Bolin

(2014) Skiftningar i mobillandskapet. In: Bergström

Oscarsson

(eds) Mittfåra och marginal, Göteborg: SOM-Institutet, pp. 229–237.

19.

Boyd

Crawford

(2012) Critical questions for big data. Information, Communication & Society 15(5): 662–679.

20.

Bruns

(2012) How long is a tweet? Mapping dynamic conversation networks on Twitter using Gawk and Gephi. Information, Communication and Society 15(9): 1323–1351.

21.

Castells M (2000) The Rise of the Network Society, 2 ed. Cambridge, MA, & Oxford: Blackwell.

22.

Couldry N (2014) Time and digital media: Time-deepening, time-deficits and narrative configuration. Draft paper to Time, memory and representation group, Södertörn University, 14 March.

23.

DeLanda

(1991) War in the Age of Intelligent Machines, New York, NY: Zone.

24.

Deuze

(2011) Media Life, Cambridge: Polity.

25.

Folbre

(1982) Exploitation comes home: A critique of the Marxian theory of family labour. Cambridge Journal of Economics 6(4): 317–329.

26.

Foucault

(1977) Discipline and Punish: The Birth of the Prison, New York, NY: Knopf Doubleday.

27.

Fuchs C, Boersma K, Albrechtslund A, et al. (eds) (2012) Internet and Surveillance: The Challenges of Web 2.0 and Social Media. London: Routledge.

28.

Gandy

Jr (1993) The Panoptic Sort: A Political Economy of Personal Information, New York, NY: Westview Press.

29.

Gates

(2011) Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance, New York: New York University Press.

30.

Gillespie

(2010) The politics of ‘platforms’. New Media & Society 12(3): 347–364.

31.

Goggin

Martin

Dwyer

(2015) Locative news: Mobile media, place informatics, and digital news. Journalism Studies 16(1): 41–59.

32.

Gould J (2014) The natural history of Gmail data mining. Gmail isn’t really about email – it’s a gigantic profiling machine. Medium, 24 June. Available at: https://medium.com/@jeffgould/the-natural-history-of-gmail-data-mining-be115d196b10 (accessed 21 January 2015).

33.

Grosser B (2014) What do metrics want? How quantification prescribes social interaction on Facebook. Computational Culture: A Journal of Software Studies. http://computationalculture.net/article/what-do-metrics-want (accessed 15 June 2015).

34.

Gustavsson M (2014) Google utmanar Spotify i Sverige. Dagens Nyheter, 18 March.

35.

Gutwirth

Hildebrandt

(2010) Some caveats on profiling. In: Gutwirth

(eds). Data Protection in a Profiled World, Dordrecht: Springer.

36.

Hacking

(1990) The Taming of Chance, Cambridge: Cambridge University Press.

37.

Hacking

(2002) Historical Ontology, Cambridge, MA: Harvard University Press.

38.

Hacking

(1975/2006) The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference, 2nd ed. Cambridge: Cambridge University Press.

39.

Haggerty

Ericson

(2000) The surveillant assemblage. British Journal of Sociology 51(4): 605–622.

40.

Hart

(2004) From bell curve to power law: Distributional models between national and world society. Social Analysis 48(3): 220–224.

41.

Hart

(2010) Models of statistical distribution: A window on social history. Anthropological Theory 10: 67–74.

42.

Hjorth

(2014) Locating the social and mobile: A case study of women’s use of Kakao social media in Seoul. Asiascape: Digital Asia 1–2(2014): 39–53.

43.

Ivory

Alderman

(2009) The imagined user in projects: Articulating competing discourses of space and knowledge work. Ephemera: Theory & Politics in Organization 9(2): 131–148.

44.

Jensen

(2013) How to do things with data: Meta-data, meta-media, and meta-communication. First Monday 18(10): . Available at: http://firstmonday.org/ojs/index.php/fm/article/view/4870/3751 (accessed 21 September 2015).

45.

Jhally

Livant

(1986) Watching as working: The valorization of audience consciousness. Journal of Communication 36(3): 124–143.

46.

Karlsson

(2014) Vi träffar Jonas Wallentin som är operativ chef på Google och diskuterar Analys. Dagensanalys.se. 9 June.

47.

Kitchin

(2014) The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences, London: Sage.

48.

Lehmann N, Qvortrup L and Kampmann Walther B (eds) (2007) The Concept of the Network Society: Post-ontological Reflections. Frederiksberg: Samfundslitteratur.

49.

Lohmeier

(2014) The researcher and the never-ending field: Reconsidering big data and digital ethnography. Studies in Qualitative Methodology 13: 75–89.

50.

Mager

(2012) Algorithmic ideology. Information, Communication & Society 15(5): 769–787.

51.

Meehan

(1984) Ratings and the institutional approach: A third answer to the commodity question. Critical Studies in Mass Communication 1(2): 216–225.

52.

Merton

(1995) The Thomas theorem and the Matthew effect. Social Forces 74(2): 379–424.

53.

Mosco

(2014) To the Cloud: Big Data in a Turbulent World, Boulder, CO: Paradigm Publishers.

54.

Napoli

(2011) Audience Evolution: New Technologies and the Transformation of Media Audiences, New York, NY: Columbia University Press.

55.

Napoli

(2014) Automated media: An institutional theory perspective on algorithmic media production and consumption. Communication Theory 24(3): 340–360.

56.

Palmås

(2011) Predicting what you’ll do tomorrow: Panspectric surveillance and the contemporary corporation. Surveillance & Society 8(3): 338–354.

57.

Parks

(2014) Drones, infrared imagery, and body heat. International Journal of Communication 8: 2518–2521.

58.

Pickering

(2010) The Cybernetic Brain: Sketches of Another Future, Chicago, IL: University of Chicago Press.

59.

Prodnik

(2012) A note on the ongoing processes of commodification: From the audience commodity to the social factory. tripleC 10(2): 274–301.

60.

Ruppert

(2011) Population objects: Interpassive subjects. Sociology 45(2): 218–233.

61.

Smythe

(1977) Communications: Blindspot of Western Marxism. Canadian Journal of Political and Social Theory 1(3): 1–27.

62.

Solove

(2004) The Digital Person: Technology and Privacy in the Information Age, New York, NY: NYU Press.

63.

Taleb

(2007/2010) The Black Swan: The Impact of the Highly Improbable, 2nd ed. New York, NY: Random House.

64.

Urry

(2004) Small worlds and the new ‘social physics’. Global Networks 4(2): 109–130.

65.

van Dijck

(2014) Datafication, dataism and dataveillance: Big data between scientific paradigm and ideology. Surveillance & Society 12(2): 197–208.

66.

van Dijck

Poell

(2013) Social media logic. Media and Communication 1(1): 2–14.

67.

Webster

Phalen

(1997) The Mass Audience. Rediscovering The Dominant Model, Mahwah, NJ: Lawrence Erlbaum.

Heuristics of the algorithm: Big Data,user interpretation and institutional translation

Abstract

Keywords

Introduction

Two statistical principles

Ubiquitous tracking and prediction of the media audience

Institutional perceptions of the media user

Users’ perception of their changing roles

Big Data and operations of translation

Conclusions

Footnotes

Declaration of conflicting interests

Funding

Notes

References