Abstract

The article examines the construction of “Big Data” in media discourse. Rather than asking what Big Data really is or is not, it deals with the discursive work that goes into making Big Data a socially relevant phenomenon and problem in the first place. It starts from the idea that in modern societies the public understanding of technology is largely driven by a media-based discourse, which is a key arena for circulating collectively shared meanings. This largely ignored dimension invites us to appreciate what matters to journalists and the wider public when discussing the collection and use of data. To this end, our study looks at how Big Data is framed in terms of the governmental use of large datasets as a contentious area of data application. It reconstructs the perspectives surrounding the so-called “Handygate” affair in Germany based on broadcast news and social media conversations. In this incident, state authorities collected and analyzed mobile phone data through a radio cell query during events to commemorate the Dresden bombing in February 2011. We employ a qualitative discourse analysis that allows us to reconstruct the conceptualizations of Big Data as a proper instrument for criminal prosecution or an unjustified infringement of constitutional rights.

Keywords

Media discourse, media framing, public understanding of Big Data, radio cell query, Handygate, German media

Speaking about “Big Data” brings up a range of concerns about its technological intricacies, political significance, or cultural impact. The increasing engagement with structured sets of data held by actors ranging from telecommunication providers, retailers, and banks to health institutions and state agencies has been accompanied by debates about its character, conditions, and consequences. Unsurprisingly, given such broad interest, the ambiguous term “Big Data” points to a range of issues. For one, it is seen as a technological innovation in the collection and use of large datasets, whose sheer volume demands extensive computing capacities and technologies to convert empirical circumstances into data that can be aggregated and analyzed to generate probabilistic conjectures. Yet Big Data can also be viewed as a transformative ambition to make more accurate and reliable predictions in order to solve complex problems, from climate change to terrorism. Moreover, Big Data can be understood as representing a regulatory challenge due to the extensive data accumulation and control undertaken by state agencies or corporate ventures.

Looking more closely at one field of data application and the associated slice of the overall spectrum of contention, this study examines the communicative constitution of “Big Data” in media discourse. It focuses on the sector of state surveillance as an area where “dataveillance” through the exploitation of aggregate data has stirred extensive debates about security, privacy, information control, and freedom (Lyon, 2014; van Dijck, 2014). In the study, we sampled and examined material from broadcast news reports and social media sources relating to the so-called Handygate affair. “Handygate” is a neologism using the German word for mobile phone, “Handy,” and the suffix “gate” to invoke a sense of scandal. In the incident, government authorities inspected mobile phone data obtained through a radio cell query during the events to commemorate the Dresden bombing in February 2011. To identify the circulating media frames, we employ a qualitative discourse analysis that allows us to reconstruct semantic patterns.

Our investigation ties in with debates in cultural analysis and critical data studies that question the ontology and epistemology of data (Andrejevic, 2014; Boellstorff, 2015; Halavais, 2015; Star and Bowker, 1999). Challenging their objective facticity and abstract neutrality, they hold that data presuppose interpretation. “Data need to be imagined as data to exist and to function as such,” Gitelman and Jackson (2013) thus argue, “and the imagination of data entails an interpretative base” (p. 3). It is in this regard that Big Data give rise to their own mythology, as boyd and Crawford (2012) point out, “the widespread belief that large data sets offer a higher form of intelligence and knowledge … with the aura of truth, objectivity, and accuracy” (p. 663). The article builds on this line of scholarship and looks at the “interpretative work” (Bowker, 2013: 170) involved in making sense of data and the sociotechnological assemblage that shapes its production and understanding (Kitchin, 2014: 24). It follows Michael and Lupton’s (2015) call for a public understanding of Big Data that is communicated in contemporary media environments. “Publics are,” they say, “in varying degrees, both the subjects and objects of knowledge, both authors and texts, simultaneously informants, information and informed” (p. 105).

By taking the understanding of Big Data as a matter of contingent articulation, the study helps to dismantle claims about the given and irrevocable facticity of data formats and data analytics so as to explore ways of reimagining their status and implications. By doing so, we seek to gain leverage in critically examining how its social imagination is shaped and also to enable alternative readings. Approaching Big Data through discourse resonates with other recent attempts to establish qualitative analyses in the field (Couldry and Powell, 2014; Ribes and Jackson, 2013). Problematizing the apparent implicitness of data and developing agendas for further engagement is, it seems, all the more necessary because considerable numbers of citizens have only a scant grasp of the multifaceted character of Big Data and have little capacity to address the potential impact on their everyday lives. For instance, a survey among US residents by the Pew Research Center found that although many held strong views about privacy and had little confidence that their data would remain secure, only about 10 percent actually changed their behavior to avoid being tracked (Madden and Rainie, 2015). Likewise, in the Eurobarometer survey on data protection conducted among people living in the EU, fewer than two out of 10 respondents thought they had complete control over their personal data and only one-fifth said they felt informed about the conditions of data collection. Yet only just over four in 10 had ever tried to change their privacy settings (European Commission, 2015).

Problematizing Big Data in discourse

For sure, the epistemological shift that comes with using Big Data to make sense of the world itself raises many questions. Yet studying the way people in turn reason about the nature and impact of Big Data is a task in its own right, too. As Ricoeur (1984) reminds us, it is essentially through cultural sensemaking that people come to be able to place new technologies in social life (Couldry and Hepp, 2016: 115). Terminology and rhetoric enable individuals to gain awareness of and think intelligibly about data and data-related practices.

With this in mind, Portmess and Tower (2015) claim that the concept is “a trove of suggested meanings for semantic exploration” (p. 3). The metaphorical discourses around Big Data are therefore practical laboratories that “carry suggestive implications for exploring different ways of envisioning our relationship to emerging information technologies and embody forms of thought and practice” (p. 3). These become manifest in representations of a “dataverse,” “data deluge,” or “data explosion,” as well as in colloquial phrases of “mining,” “dredging,” or “harvesting” data. In the same vein, Puschmann and Burgess (2014) point to the interpretative flexibility of data analytics, which gives way to an “ongoing contestation over their exact meanings and values” (p. 1691). Looking at such emergent understandings, they analyze news items from US business magazines, technology journals, and reports issued by nonprofit organizations and consulting firms. On that basis, they describe the prevalent metaphors used to make sense of Big Data as either a force of nature to be controlled or as a form of nourishment to be consumed. So while the genesis of the word “Big Data” might be traced back to commercial contexts that favor particular kinds of data and data-based procedures, rhetorical instruments are, they argue, indispensable devices for configuring and disputing Big Data’s character and status and its corollaries, which arise when practices and discourses expand to other fields of application. Considering Big Data in historical, political, and sociological terms, Beer (2016) asserts that “in many ways the power dynamics of Big Data are to be found just as much in the way that those data are labelled and described as it is in the actual data themselves” (p. 2). 
Indeed, he adds, “given the difficulties of data access and the technical and computing skills required to analyse Big Data, it might even be argued that the concept has far greater reach than the material phenomenon” (p. 2).

It is important to note that the turn to the semantic layer of Big Data does not mean to detract from its concrete technological force. Quite the contrary, from a social constructivist perspective this move appreciates the eminent reality-making power of discourses, whose programs of thought actively shape the social constitution of Big Data and translate into peculiar practices, organizational forms, and institutions (Berger and Luckmann, 1967: 172; Keller, 2013). In modern societies, media-based discourses are thus key arenas for circulating cultural understandings and hence for articulating versions of social realities (Foucault, 1972). So discourses are not some supplementary interpretation of existing brute facts; instead their frameworks of intelligibility represent an integral dimension of how we engage with Big Data as a phenomenon and a problem. In this respect, a field of research has emerged that looks at the public understanding of technoscientific innovations more generally, such as genetics, bioengineering, or nanomechanics. As these are a matter of discursive contestation rather than neutral information, they are conceptualized with regard to different areas of application and reasoning that prioritize certain ways of knowing and acting (Burgers, 2016; Druckman and Bolsen, 2011). Discourses are “representing aspects of the world,” Fairclough (2003) writes, so that “different discourses are different perspectives on the world” (p. 124). Unsurprisingly, Big Data also form the subject of such strategies, Kitchin (2014) diagnoses, which render it necessary to look at “how a powerful set of rationalities is being developed to support the roll-out and adoption of Big Data technologies and solutions” (p. 126). In this respect Beer (2016) suggests seeing Big Data as “an interweaving of a material phenomenon and circulating concept” (p. 4).
Its basic rationality legitimizes the aggregation and computation of calculative and numerical knowledge by invoking notions of unprecedented scale, eradication of error, and increased analytical efficiency (Kitchin, 2014: 126). That way the discursive appropriation of Big Data is informed by a suggestive idea that the notion of “data” itself seems to evoke. It comes, Markham (2013) states, with the misleading assumption that data are raw, pre-existing entities that can be gathered and that prefigure any kind of analysis.

The Handygate Affair 2011

A useful way of dealing with both the forms that discourses take and their formative social power is offered by the study of frames. Frames are, according to Goffman (1974), “principles of organization” (p. 10) that allow people “to locate, perceive, identify, and label” (p. 21) phenomena. Capturing at least parts of their broad capacities, Reese (2001) holds that frames found in media texts represent “organizing principles that are socially shared and persistent over time, that work symbolically to meaningfully structure the social world” (p. 11). Media frames can be understood as strategic tools employed in debates on the contentious interpretation of issues so as to “mobilize potential adherents and constituents, to garner bystander support, and to demobilize antagonists,” as Snow and Benford (1988: 198) put it. To accomplish such tasks, frames define a phenomenon, condition, or process as problematic and in need of change. They attribute blame, illustrate alternatives, and urge people to act (Entman, 1993). In doing so, they give rise to the incorporation of privileged understandings in policy-making processes that reflect competing viewpoints of different advocacy coalitions (Schön and Rein, 1995).

Building on this line of inquiry, we assume that in the current broadcast media and online “mass self-communication” (Castells, 2009: 70), frames around Big Data have emerged that associate the term and the issues it purportedly refers to with different rationales, assessments, and effects. Instead of reconstructing some authoritative statement of what Big Data is or should be, an analysis of its social meanings should consequently look at fragments from a kaleidoscope of semantic perspectives. Rather than addressing its essence, the study of a particular discourse can assess the usage of the plastic notion in a field of analysis and concern. In broad terms, Kitchin (2014: 126) identifies four such major areas, which center around one of the pivotal tasks of “governing people,” “managing organizations,” “leverage value,” or “producing capital.” Focusing on the first of these, the present study looks at how Big Data is framed in terms of the governmental use of data as a contentious area of application (for a study on the use of data by businesses, see Puschmann and Burgess, 2014). In this field, “Big Data is seen as a troubling manifestation of Big Brother, enabling invasions of privacy, decreased civil freedoms, and increased state and corporate control,” boyd and Crawford (2012: 664) summarize.

The study reconstructs the discursive perspectives surrounding the so-called Handygate affair in Germany from broadcast news reports and social media conversations. The Handygate affair refers to the aggregation and analysis of mobile phone data through a radio cell query executed by the police of the German state of Saxony during the February 2011 commemorations of the Dresden bombing by Allied Forces in the final months of World War II. The traditional ceremonies, which include vigils, readings, and bell-ringings, have been disrupted in recent years by marches of right-wing and neo-Nazi groups and counter-demonstrations from civil society organizations, opposition political parties, and left-wing organizations, all of whom are competing over the appropriate interpretation of the historic events (for background, see Joel, 2013). Radio cell query is a method of investigation used by the German police to capture data on all individuals who were using a telecommunications service close to an alleged crime scene. By obtaining a warrant under the German Code of Criminal Procedure (§100g StPO), law enforcement authorities can gain access to the traffic data for all Global System for Mobile Communications (GSM) standard subscribers who were in the range of the radio cell tower closest to the crime scene at the time a serious crime was committed. In the operation, investigators kept track of cell phones (Cell-ID to establish spatial linkability) within the Dresden city center area during the protests and gathered about one million data records for almost 60,000 individual mobile devices (by means of the Mobile Subscriber Integrated Services Digital Network Number; MSISDN). The customer data available upon subscription makes it possible to link the MSISDN to an identified person. The police did not inform the individuals affected, although the law requires such notification.
Revelations about the incident led to a political crisis, in which the allowability and utility of the action were scrutinized in parliament, by journalists, and among the wider public (Freedom House, 2014).
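In data terms, the radio cell query described above can be pictured as a simple filter over traffic records, followed by a count of the distinct subscriber numbers that remain. The following Python sketch is only an illustration under stated assumptions: the record layout, the cell identifiers, the phone numbers, and the timestamps are all invented, and nothing here reproduces the actual procedures or systems used by the Saxon police or the network operators.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TrafficRecord:
    """One hypothetical traffic-data entry (no call or message content)."""
    msisdn: str          # subscriber number, linkable to a person via customer data
    cell_id: str         # radio cell the device was connected to
    timestamp: datetime  # time of the connection event


def radio_cell_query(records, cell_ids, start, end):
    """Return every record logged in the given cells within the time window."""
    return [
        r for r in records
        if r.cell_id in cell_ids and start <= r.timestamp <= end
    ]


# Invented toy data; identifiers and times are arbitrary placeholders.
records = [
    TrafficRecord("491510000001", "DD-001", datetime(2011, 2, 19, 12, 5)),
    TrafficRecord("491510000001", "DD-001", datetime(2011, 2, 19, 13, 40)),
    TrafficRecord("491510000002", "DD-002", datetime(2011, 2, 19, 14, 10)),
    TrafficRecord("491510000003", "L-100", datetime(2011, 2, 19, 12, 30)),
    TrafficRecord("491510000002", "DD-002", datetime(2011, 2, 20, 9, 0)),
]

hits = radio_cell_query(
    records,
    cell_ids={"DD-001", "DD-002"},
    start=datetime(2011, 2, 19, 0, 0),
    end=datetime(2011, 2, 19, 23, 59),
)

# Several records can belong to one device, so records and devices differ.
devices = {r.msisdn for r in hits}
print(len(hits), "records for", len(devices), "devices")
```

The sketch makes one point visible that matters for the debate reconstructed below: the filter selects by place and time alone, so every device in range is captured regardless of whether its owner is suspected of anything, and the number of records will normally exceed the number of devices.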

Undoubtedly, given the prevalence of certain forms of Big Data analytics such as machine learning, personalization, or nudging, the Handygate affair seems to be quite trivial and perhaps of limited interest and information value. Yet notwithstanding its somewhat anecdotal character, it served as a point of reference for the heated public debates that followed the disclosures by Edward Snowden as well as the outcry caused by several other information leaks on global surveillance programs. In the wake of these revelations, the Handygate affair provided a blueprint that helped to capture the social and technological complexity of such issues. It mobilized concerns about privacy, public free speech, and information control, which are often treated as a distinct feature of the intense German discourse on mass surveillance that in part arose due to a troubled history of extensive state intelligence (Krieger, 2004). The neologism “Handygate” became synonymous with data breaches more generally and was, for instance, taken up again in reference to the alleged monitoring of German Chancellor Angela Merkel’s cellphone by the US National Security Agency (Sanger and Mazzetti, 2013).

Compared to other data sets, the records gathered through the radio cell query were probably small in size, even though the data volume was open to debate. So despite official statements on the correct count of data records, the range of numbers that were circulated called the precision of the information provided into question. It was not clear if this lack of certainty regarding the quantity of data was due to the inherent technical difficulties of doing an exact count or if the correct figure was available but not shared. Unsurprisingly, the latter assumption gave way to guesswork about the real dimensions of the data collected and the covert agendas for not disclosing them.

Moreover, definitions of data as being more or less “big” should not only be based on abstract measures; instead, it is important to consider the social relevance and normative valence ascribed to them in their original context of collection and analysis. So in 2011, at a time when the Dresden police along with the local and national public had just started to come to terms with such forms of analytics, it was their assumed transformative import, not the actual volume, that fueled the debate. “Big Data is less about data that is big,” boyd and Crawford (2012) conclude, “than it is about a capacity to search, aggregate, and cross-reference large data sets” (p. 663). So the narrow contours of the case allow us to reconstruct field-specific framings of Big Data in close detail. In its apparent insignificance, it can speak to those mundane or marginal areas where Big Data increasingly plays a role, too.

In contrast to other studies, we did not focus on the phrase “Big Data” itself (see Diebold, 2012; Puschmann and Burgess, 2014). Actually, the notion never surfaced in the discourse, as journalists and social media users typically used descriptions that helped them to illustrate, describe, or challenge the material and the incidents in question. This inconsistent use of terminology is not surprising given the then-evolving state of data practice. “Indeed, relying purely on the terminology of ‘Big Data’ is problematic,” Taylor et al. (2014) explain, “as economists and others working in this area may or may not actually use the term even if they are clearly operating within our definition or within a broader conception of computational methods” (p. 2).

Alongside the political, judicial, and administrative investigations that arose as part of the Handygate affair, a cross-media debate unfolded, involving broadcasting outlets as well as social media. The discourse encompassed different speakers, communicative forms, and messages and opened up diverse configurations for discussion and interaction beyond and in close relation to more traditional mass media. The analytical units collected for the study were all articles published on the incident by two national German daily newspapers (Süddeutsche Zeitung, Frankfurter Allgemeine Zeitung), four major local daily newspapers in the state of Saxony (Sächsische Zeitung, Dresdner Neueste Nachrichten, Freie Presse, Leipziger Volkszeitung), and their respective web editions. In addition, the online editions of the German weeklies Die Zeit and Der Spiegel were included. Websites and weblogs of the involved governmental agencies, political representatives, and civil society actors were also sampled as well as Facebook posts, Facebook comments, tweets, forum entries, and videos and comments from YouTube.

The sampling was done manually following a list of 35 keywords related to the topic. This register was established on the basis of a sub-sample of all collected newspaper articles published two weeks after the incident. The keywords, for example, “IMSI-Catcher,” “Handygate,” “Paragraph 100 g,” or “Soko 19/2,” served as search terms to trace further documents. In addition, snowball sampling allowed us to follow hypertext links to additional discursive fragments. In total, we collected 361 documents published between 23 February 2011 and 21 November 2011. From this corpus, we derived 3,031 single coding units as separate articles, website pages, text documents, blog entries, blog comments, forum or website posts, Facebook posts, Facebook comments, tweets, images, comments on images, transcribed video shots, and comments on videos.

Framing the Dresden radio cell query

We employed a qualitative discourse analysis that allowed us to reconstruct verbally constituted frames. The cumulative analytical process followed the open, axial, and selective phases of coding and conceptualizing taken from Grounded Theory (Glaser and Strauss, 1967). The material was primarily interpreted with regard to the following sensitizing aspects: the sequences of events (What events became topics of discourse, e.g., the neo-Nazi protests, counter-demonstrations, or cell-site retrieval?); the areas of problematization (How were data collection and analysis turned into issues and which aspects were taken up in which regard?); the ascribed responsibilities (Who was held accountable in what kind of function, e.g., as rioter, neo-Nazi, citizen, police officer, political opposition member, or public official?); the suggested types of relations (What relations were established between agents, data records, data-related actions, and consequences/demands, e.g., the success of the operation, the deletion of data, government declaration, or political resignations?); the challenges for legitimacy (In what regard were actors and actions granted or stripped of legitimacy, e.g., by deeming the actions/actors judicially reviewed, in need of procedural improvement, or antidemocratic?).

The first analytical phase focused on the formulation of initial codes associated with suitable semantic units from the material. The second phase consisted of developing, comparing, and integrating these codes into more comprehensive concepts, which constituted the provisional components of the frames. The final phase centered on the validation of meaningful relations between core concepts so as to reconstruct four coherent topical frames, that is, the justification frame, the criticism frame, the resignation frame, and the self-responsibility frame. The procedures were done in tandem by separately performing the different tasks and comparing their outcomes at the end of every step (for an overview of the method, see Pentzold et al., 2016). Note that in line with the interpretative process it is not possible to calculate the distribution of frames but only to evaluate their salience.

Justification frame

The justification frame understood the police measures as lawful activities that were fundamentally in need of better presentation. In light of a presumed criminal offense on the part of the violent protesters, this perspective argued that the collection and analysis of data by the police should be viewed as a proportionate and correctly executed instrument. Ultimately, according to this perspective, the inquiry was put in place to safeguard public security as a core value of civic life and to protect the rights of the demonstrators regardless of their political views, rights that were violated by the counter-protests. In the face of these imperatives, the flaws in procedural implementation and public communication should be regarded as negligible and would be remedied. This reasoning was expressed in a passage taken from a local newspaper:

“The collected data should only be used in trials investigating the breach of the peace and not against protesters of the deployment. Only connection data were collected, not the content of conversations or short messages. Moreover, the data of innocent bystanders were deleted immediately. The justification for the action comes from a ruling of the Dresden district court.” (Freie Presse, online edition, 21 June 2011; our translation)

According to this perspective, the inquiry was justified due to a legally approved and operationally necessary investigation that followed the outbreak of riots. Thus, the procedures were warranted because the investigation had a rightful purpose, that is, to prosecute the criminal actions that had presumably been committed, and because the investigation made progress by exploiting the accumulated data. Only the allocation of competences and the public presentation were considered to be in need of improvement. Hence, the shortcomings were considered to be grounds for better investigations, but not for their prohibition (cf. Boellstorff, 2015: 104).

The affirmative frame focused on violent protesters and their criminal actions on the one hand and the appropriate reactions of executive and judicial institutions on the other. This bipolar classification of agents helped to weaken other positions, for instance, that of nonviolent protesters and civil rights groups, because it offered the choice between successful prosecutions through data analytics or the failure to impose responsibility under criminal law. As a result, the critics of the incident were marginalized either due to their supposed indifference towards the excesses or their lack of involvement—assuming they had not taken part in the violent offenses themselves, they need not fear the investigation but should support its cause.

The justification frame was especially taken up in newspaper reports when these discussed governmental and police perspectives. It was also found in official statements and interviews, on Facebook and Twitter accounts of political actors or declared supporters of the then ruling Christian Democratic Union (CDU) as well as on related blogs, websites, and in affirmative comments posted on these sites.

Criticism frame

The criticism frame was used to disapprove of the radio cell query because it constituted, according to this line of thought, an unjustified and disproportionate abuse of power by the law enforcement authorities in the state of Saxony. In this perspective, the data collection and analysis were evidence of governmental surveillance that contravened the counter-demonstrators’ constitutional right of freedom of assembly, the sanctity of telecommunications, and the protection of privacy. According to this perspective, the investigation’s generalized coverage of nonspecific people in the vicinity of the commemorative events, for whom there were no initial suspicions, was not in proportion to the intelligence gained. This view is, for example, voiced in a Facebook post which also draws a parallel that is often referenced in this frame to the Ministry of State Security (“Stasi”) of the former German Democratic Republic:

“… putting a whole area under general suspicion is quite serious. Reminds me strongly of Stasi intrigues. Names and sources must be disclosed very quickly: who was the investigator in charge and what judge approved this.” (Source: Facebook profile, 21 June 2011; our translation)

The dismissive frame saw the collection of traffic data and metadata among counter-demonstration participants, who ultimately stood up against right-wing ideologies, as improper. The reasons provided in this regard are the lack of success of previous inquiries and the sheer technological feasibility of gathering plenty of data. The gathered data was challenged for its ability to provide individual profiles, and the measures were declared to be out of proportion to the insignificance of the crimes. Moreover, this frame holds the police and government responsible for covering up the incident. Such collusion reflected, so the critical argument went, the democratic deficits among the state authorities, somewhat ironically named the “Saxon state of democracy.” Collusion would lead to the debasement of civil cooperation, not its vindication. In consequence, this frame called for an official inspection of the incident and the formulation of principles for future large-scale data analyses. In addition, this perspective demanded transparency regarding screening operations and administrative decision-making processes. It also asked for a more widespread public outrage and, in a more tangible form, demanded the resignations of the responsible office-holders. The frame concentrated on activists involved in the counter-demonstrations and the nonparticipating citizens who were inadvertently involved in the screening.

The criticism frame was mainly employed in newspapers, including in critical journalistic reports, and in statements from oppositional politicians, data protection officials, and representatives of civil rights organizations. Accordingly, this frame was prominent on the websites of those actors and on their profiles on social media platforms. It also manifested itself in a number of lawsuits and requests for information.

Resignation frame

The resignation frame was mobilized to declare the matter an uncontrollable and inevitable condition of modern societies, which did not come as a surprise but attested to the reality of all-encompassing state monitoring. The data collection and analysis were hence neither justified nor rejected but presented as a historically continuous and omnipresent aspect of everyday contemporary life that was promoted by state interests. In this regard, one reader commented on the site of the national newspaper Süddeutsche Zeitung (SZ):

“The problem is this: tomorrow, the news will be forgotten and without the SZ we would not have known of this anyway. Who knows how often a complete radio cell query happens? Nothing changes …” (Source: online edition, 20 June 2011; our translation)

Those who were disillusioned as a result of the incident and its wider political and societal context saw the reasons for the collection of sensitive data as related to technological progress on the one hand and the continuous efforts to increase surveillance on the other. The frame envisaged a coalition of unaccountable police, administration, and party politicians as standing against powerless citizens. It thus established a sort of hierarchy between those engaged in data practices and those without the ability to elude such operations. This reasoning diagnosed a data abuse without consequences. It bemoaned the fruitless protests in the face of an unimpressed surveillance apparatus and sought to scrutinize the antidemocratic public conditions. In effect, because it saw no effective remedies, the perspective suggested adopting a wait-and-see approach regarding things to come instead of advocating concrete legal or political measures. Overall, it was characterized by an ironic and somewhat distant attitude towards the incident. In line with this, it featured a number of incoherent references to the historic Stasi, Orwell’s novel 1984, and the United States as supposed vanguard of citizen surveillance. It thus constructed a kind of cumulative argument about the persistence of such data practices regardless of their dubious character.

The pessimistic view was particularly evident in posts and personal comments on social media platforms as well as in the readers’ comments sections of online newspapers, but it was virtually absent from press reports and official statements. So, while it seemed to have helped individual citizens to voice their inability to address the actions of the state, journalistic media outlets avoided this discourse, perhaps because such defeatism and lack of critical vision would not be compatible with their self-image as the fourth estate.

Self-responsibility frame

The self-responsibility frame deemed the radio cell query to be a self-inflicted failure of the people in question who ultimately made it possible through their imprudent use of mobile phones. Therefore, avoiding such data collection and analysis was treated as a personal responsibility of the users themselves, who, it was thought, should take care of their technological devices, for instance, by managing the subscriber identity module (SIM) integrated in their cell phones. In a post, a Facebook user expressed such an idea:

“Things like this often happen, only this time it was in the media. We say this: Leave your cell phones at home or take your SIM cards out …” (Source: Facebook post, 20 June 2011; our translation)

This viewpoint argued that it was possible to scrape sensitive information because of the careless exposure of personal data. It addressed itself to self-responsible citizens and naïve users of technology, who, it claimed, should seek to acquire capabilities to obfuscate their data and thus repel surveillance by state institutions. As a consequence, this perspective called on people to engage in counteractions like swapping SIM cards, encryption, and responsible data use, which would allow the citizens now exposed to the gaze of the police to be incommunicado or to disguise their personal identities (see Dencik et al., 2016). In its more extreme form, it called for technological abstinence, such as not having a cell phone at all.

Like the resignation frame, the self-responsibility frame mainly arose in posts or comments on social media platforms as well as in the readers’ comments sections of online newspapers. Journalistic media outlets, by contrast, steered away from speculation about DIY hacks. They did not offer practical help or ponder possible loopholes but relied on the corrective force of political and legal processes.

Data discourse and data practice

In the Handygate affair, the aggregation, tabulation, and consolidation of data records affected the police’s way of knowing and doing its inquiry. In addition, these emerging data practices stimulated a discourse about the necessity, justification, proportionality, and effectiveness of a data-driven criminal justice system.

The incident and its discourse might already be called historical or dated given the swift transformations of datafication. Yet despite all the progress made, computing and quantification are still in a state of exploration, in which we find a vital entanglement of data discourse and practice. So today, law enforcement agencies in different countries are seeking to make sense of and utilize advanced technologies, such as predictive policing and crime prevention software (Mantello, 2016). These scenarios for computational methods can either be full of potential or fraught with pitfalls depending on how they are imagined and implemented. In this respect, the four topic frames found in the Handygate discourse presumably also give voice to more general strands of reasoning that semantically structure other debates as well. “There is undoubtedly more to Big Data than its discursive framing, it has material properties that make it Big Data,” Beer (2016) thus recaps, “yet the particularities of that discursive framing shape those material presences and the integration of Big Data into broader social structures and orders” (p. 7).

Against this backdrop, the discourse around the Handygate affair can help us to reflect on public opinion regarding Big Data issues more broadly. Hence, the insights gained from studying these collective social understandings can ground consultations and inform policy processes that seek to take into account what matters to journalists and the wider (social) media audiences when these address the collection and application of personal information. Moreover, considering the aspects the press and media users discuss can also improve the communication of Big Data-related operations and regulatory decisions to stakeholders and social groups.

One theme ran through all four frames and, we assume, also surfaces in other situations of public contestation around Big Data: the possibility of additional investigations. In this respect, van Dijck (2014) already stated that “whereas surveillance presumes monitoring for specific purposes, dataveillance entails the continuous tracking of (meta)data for unstated preset purposes” (p. 205). In order to cover a whole population of people and to maximize future analytical insights, the data recorded in the radio cell query were supplemented with other datasets that the police collected, for example, through oral interrogations and participant observation. This quest for completeness was coupled with a belief in granularity, that is, as Ruppert et al. (2013) explain, “the way that amalgamations of databases can allow ever more granular, unique, specification” (p. 38).

The attempt to gain such a thorough overview, however, meant accepting that the data were heterogeneous, not uniform. The discourse consequently revolved around the correctness of the data and the way the expansive accumulation actually multiplied rather than reduced the “data doubles” of offenders. These problems not only related to noise in sorting relevant from irrelevant data. The inclusion of people regardless of their participation (or noninvolvement) in the local protests also raised a question of legitimacy. In consequence, critical commentators stressed the need to establish clear links between the data and individual suspects, which would justify the measure. This would mean recontextualizing the abstract data in line with the gravity of offenses (blockade, riot, assault, material damage, etc.) and the status of the people within the scope of the radio cell query (hooligans, protesters, residents, passersby, etc.). In this setting, the question of harm became relevant as well. On one level, the issue was the harm done to the inalienable rights of individual citizens, namely harms to their freedom of assembly and protection of privacy. On another level, it pointed to the violation of democratic norms. In the eyes of its critics, the measure was understood as an attack on a liberal political culture because it targeted those who mobilized against the revisionist usurpation of war memory.

Furthermore, the discourse was characterized not only by divergent perspectives on how to assess the legality and usefulness of the monitoring. The different speakers also possessed only partial knowledge about the events in the streets and the parallel police investigations. Interestingly, rather than having a privileged position as overseers of the operation, even the representatives of the police and state government claimed to have limited insight into the turbulent situation because of masked troublemakers and a lack of eyewitnesses; this, they stated, underscored the need for electronic surveillance. In turn, civil society groups and opposition politicians claimed to possess this local knowledge, which they used to challenge arguments that the gathering of mobile phone data was necessary. The disputes were thus also an attempt to shed light on the data and on the reasons for using them; however, the division of roles was not straightforward. There were disagreements about who was informed (the minister of justice, the ministry of the interior, the attorney general, etc.), who needed to be informed (the administrative tribunal, parliamentary committees, Saxon state minister, etc.), or who wanted to be informed (data security officer, journalists, advocacy groups, etc.) and about which kind of evidence. Unsurprisingly, the superintendent of the Dresden police, Dieter Hanitsch, had to leave office due to a “lack of information” (Informationsdefizit) according to a statement by Saxony’s minister of the interior, Markus Ulbig.

Conclusion

The study presented in this article started from the idea that large data sets and the associated analytical operations are related to public debates in intricate ways. In such debates, their cultural significance, social value, and political relevance are negotiated. Media discourses are thus not only spheres of communicative deliberation that are somewhat distant from actual engagement with aggregate data. Instead, they form an integral part of the current reality of data practices and they allow people to imagine different future scenarios that remain to be configured.

Investigating this articulatory dimension, which is deeply implicated in all forms of handling Big Data, the analysis examined one contentious area, namely the exploitation of large data sets by state authorities for surveillance and police action. It used material from the debate surrounding the 2011 Handygate affair in Germany to look at the collective, media-based meaning work accomplished to make sense of this particular data application and to question its rationale and implications. In order to examine the semantic variations in evaluations of Big Data, the qualitative analysis reconstructed media frames, that is, coherent patterns of social meaning that materialize in discursive utterances. In their basic design, the frames named events, identified occasions and reasons, inferred demands, and pointed to consequences as well as to involved agents and actions. Overall, the frames found in the broadcast news reports and social media communications revolved around a fundamental conflict: whether the data gathering in this incident is to be viewed as a proper instrument for criminal prosecution or as an unjustified impairment of constitutional rights. These contradictory perspectives translated into different frames that justified or criticized the activities. Other frames were employed either to express frustration or to hold the citizens responsible for countering the actions they criticized.

The study underlines the vital link between discourses and practices, which might be exploited for alternative ways of conceptualizing Big Data. One immediate task would be to look for common or divergent framings in other debates beyond the narrow case we analyzed. As of now, the frames surrounding the Handygate affair pertain to the subject matter, yet in their overall structure they could apply to other fields as well. So they may be more than just issue-specific frames that are exclusive to the details of the case, and they might transcend such thematic limitations and thus apply to different discourses, too. Most notably, the justification and criticism frames may also apply in other arenas because they involve the rhetoric of polarized expectations and anxieties accompanying technology diffusion more broadly (Feenberg, 2002). Comparisons of different framings will complement studies that account for the fields of Big Data analytics in general (Lupton, 2015). With respect to this challenge of seeing the similarities and differences across sectors, Ulbricht and Grafenstein (2016) acknowledge that “we cannot observe one phenomenon that we would call Big Data: there are many” (p. 5).

Capitalizing on the conjunction between discourses and action, social movements have been campaigning for their cause in media forums, and not only with respect to parliamentary policy-making or in the bargaining between interest groups. Research on, for example, the women’s rights movement or the antinuclear movement and their meaning work, that is, “the struggle over the production of mobilizing and countermobilizing ideas and meanings” (Benford and Snow, 2000: 613), has thus been interested in the conflict-laden production of public discourses. Such meaning work does not operate aside from political and societal fault-lines but is rather a catalyst in enacting different socio-political agendas. As for Big Data discourses, one task would then be to track the discursive coalitions involved in contentious disputes around Big Data applications. Instead of only looking at hegemonic worldviews, such an analysis could also be sensitive to marginal or neglected framings. With regard to the Handygate affair, the next step could therefore be to compare the professional news reporting with the amateur social media communication to examine whether these posts and comments still mostly mirror broadcast-media positions and are thus guided by core aspects of frames established in journalistic media or whether they allow users to develop divergent perspectives (e.g., Cacciatore et al., 2012; Gerhards and Schäfer, 2010). The insight that the resignation frame and the self-responsibility frame were particularly strongly expressed in text fragments derived from social media might be taken as a sign of such an extension of the discursive spectrum.

Yet to say that there are multiple framings does not necessarily mean that there are infinite ways of understanding what Big Data is or should be. Big Data, in this sense, is “more than one but less than many” (cf. Mol, 2002). More precisely, whereas Foucault (1972) highlights the productive force of discourses in generating knowledge and legitimating points of view, he is keen to stress that their politics of representation also work by way of exclusion. Hence, discursive “rules of formation” (p. 38) regulate the inventory of conceivable positions, that is, what can acceptably be said, thought, and known. In other words, while there may be multiple ways of viewing Big Data, these are not arbitrary, but embedded in social contexts in which diverse stakeholders operate. Their perspectives on Big Data translate into different framing perspectives, which should not only be analyzed for what is made salient but also for what is left out.

Footnotes

Declaration of conflicting interests

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

References

Andrejevic (2014) The Big Data divide. International Journal of Communication 8: 1673–1689.

Beer (2016) How should we do the history of Big Data? Big Data & Society 3(1). Available at: http://journals.sagepub.com/doi/abs/10.1177/2053951716646135 (accessed 23 November 2017).

Berger and Luckmann (1967) The Social Construction of Reality, Garden City, NY: Anchor Books.

Benford and Snow (2000) Framing processes and social movements. Annual Review of Sociology 26: 611–639.

Boellstorff (2015) Making Big Data, in theory. In: Boellstorff and Maurer (eds) Data. Now Bigger and Better, Chicago, IL: Prickly Paradigm Press, pp. 87–108.

Bowker (2013) Data flakes. In: Gitelman (ed.) “Raw Data” is an Oxymoron, Cambridge, MA: MIT Press, pp. 167–172.

boyd and Crawford (2013) Critical questions for Big Data. Information, Communication & Society 15(5): 662–679.

Burgers (2016) Conceptualizing change in communication through metaphor. Journal of Communication 66(2): 250–265.

Cacciatore, Anderson, Choi D-H, et al. (2012) Coverage of emerging technologies: A comparison between print and online media. New Media & Society 14(6): 1039–1059.

Castells (2009) Communication Power, Oxford: Oxford University Press.

Couldry and Hepp (2016) The Mediated Construction of Reality, Cambridge: Polity.

Couldry and Powell (2014) Big data from the bottom up. Big Data & Society 1(2). Available at: http://journals.sagepub.com/doi/abs/10.1177/2053951714539277 (accessed 23 November 2017).

Dencik, Hintz and Cable (2016) Towards data justice? The ambiguity of anti-surveillance resistance in political activism. Big Data & Society 3(2). Available at: http://journals.sagepub.com/doi/abs/10.1177/2053951716679678 (accessed 23 November 2017).

Diebold F (2012) On the origin(s) and development of the term “Big Data”. Penn Economics Working Paper No. 12-037. Available at: https://economics.sas.upenn.edu/pier/working-paper/2012/origins-and-development-term-“big-data (accessed 23 November 2017).

Druckman and Bolsen (2011) Framing, motivated reasoning, and opinions about emergent technologies. Journal of Communication 61(4): 659–688.

Entman (1993) Framing: Toward clarification of a fractured paradigm. Journal of Communication 43(4): 51–58.

European Commission (2015) Special Eurobarometer 431: Data Protection. Available at: http://ec.europa.eu/public_opinion/archives/ebs/ebs_431_en.pdf (accessed 23 November 2017).

Fairclough (2003) Analysing Discourse. Textual Analysis for Social Research, London: Routledge.

Feenberg (2002) Transforming Technology, Oxford: Oxford University Press.

Foucault (1972) The Archaeology of Knowledge, London: Tavistock.

Freedom House (2014) Freedom on the Internet: Germany. Available at: http://www.refworld.org/pdfid/549026060.pdf (accessed 23 November 2017).

Gerhards and Schäfer (2010) Is the internet a better public sphere? New Media & Society 12: 143–160.

Gitelman and Jackson (2013) Introduction. In: Gitelman (ed.) “Raw Data” is an Oxymoron, Cambridge, MA: MIT Press, pp. 1–14.

Glaser and Strauss (1967) The Discovery of Grounded Theory, Boston: Aldine.

Goffman (1974) Frame Analysis: An Essay on the Organization of Experience, Cambridge, MA: Harvard University Press.

Halavais (2015) Bigger sociological imaginations: Framing big social data theory and methods. Information, Communication & Society 18(5): 583–594.

Joel (2013) The Dresden Firebombing. Memory and the Politics of Commemorating Destruction, London: I.B. Tauris.

Keller (2013) Doing Discourse Research, London: Sage.

Kitchin (2014) The Data Revolution: Big Data, Open Data, Data Infrastructures & Their Consequences, London: Sage.

Krieger (2004) German intelligence history. Intelligence and National Security 19(2): 185–198.

Lupton D (2015) The Thirteen Ps of Big Data. Available at: https://simplysociology.wordpress.com/2015/05/11/the-thirteen-ps-of-big-data/ (accessed 23 November 2017).

Lyon (2014) Surveillance, Snowden, and Big Data: Capacities, consequences, critique. Big Data & Society 1(2). Available at: http://journals.sagepub.com/doi/abs/10.1177/2053951714541861 (accessed 23 November 2017).

Madden M and Rainie L (2015) Americans’ Attitudes about Privacy, Security and Surveillance. Available at: http://www.pewinternet.org/2015/05/20/americans-attitudes-about-privacy-security-and-surveillance/ (accessed 23 November 2017).

Mantello (2016) The machine that ate bad people: The ontopolitics of the precrime assemblage. Big Data & Society 2(3). Available at: http://journals.sagepub.com/doi/full/10.1177/2053951716682538 (accessed 23 November 2017).

Markham (2013) Undermining ‘data’: A critical examination of a core term in scientific inquiry. First Monday 10(18). Available at: http://firstmonday.org/article/view/4868/3749 (accessed 23 November 2017).

Michael M and Lupton D (2015) Toward a manifesto for the “public understanding of big data”. Public Understanding of Science. Epub ahead of print 2015. Available at: http://journals.sagepub.com/doi/abs/10.1177/0963662515609005 (accessed 23 November 2017).

Mol (2002) The Body Multiple, Durham, NC: Duke University Press.

Pentzold C, Sommer V, Meier S, et al. (2016) Reconstructing media frames in multimodal discourse: The John/Ivan Demjanjuk trial. Discourse, Context & Media 12: 32–39.

Portmess and Tower (2015) Data barns, ambient intelligence and cloud computing: The tacit epistemology and linguistic representation of Big Data. Ethics and Information Technology 17: 1–9.

Puschmann and Burgess (2014) Metaphors of Big Data. International Journal of Communication 8: 1690–1709.

Reese (2001) Framing public life: A bridging model for media research. In: Reese, Gandy and Grant (eds) Framing Public Life, Mahwah, NJ: Erlbaum, pp. 7–31.

Ribes and Jackson (2013) Data bite man: The work of sustaining a long-term study. In: Gitelman (ed.) “Raw Data” is an Oxymoron, Cambridge, MA: MIT Press, pp. 147–166.

Ricoeur (1984) Time and Narrative Vol. 2, Chicago, IL: University of Chicago Press.

Ruppert, Law and Savage (2013) Reassembling the social science methods. Theory, Culture & Society 30(4): 22–46.

Sanger DE and Mazzetti M (2013) Allegation of U.S. spying on Merkel puts Obama at crossroads. New York Times, 24 October.

Schön and Rein (1995) Frame Reflection: Toward the Resolution of Intractable Policy Controversies, New York, NY: Basic Books.

Snow and Benford (1988) Ideology, frame resonance and participant mobilization. International Social Movement Research 1: 197–219.

Star and Bowker (1999) Sorting Things Out. Classification and Its Consequences, Cambridge, MA: MIT Press.

Taylor, Schroeder and Meyer (2014) Emerging practices and perspectives on Big Data analysis in economics: Bigger and better or more of the same? Big Data & Society 1(2). Available at: http://journals.sagepub.com/doi/full/10.1177/2053951714536877 (accessed 23 November 2017).

Ulbricht and Grafenstein (2016) Big Data: Big power shifts? Internet Policy Review 5(1). Available at: http://policyreview.info/articles/analysis/big-data-big-power-shifts (accessed 23 November 2017).

van Dijck (2014) Datafication, dataism and dataveillance. Surveillance and Society 12(2): 201–208.