Abstract
It has become increasingly common to talk about “digital traces”. The idea that we leak, drop and leave traces wherever we go has given rise to a culture of traceability, and this culture of traceability, I argue, is intimately entangled with a socio-economics of data disposability and recycling. While the culture of traceability has often been theorised in terms of, and in relation to, privacy, I offer another approach, framing digital traces instead as a question of waste. This perspective, I argue, allows us to connect to, extend and nuance existing discussions of digital traces. It shows us that data traces raise questions about not only how data capitalism tracks individual and multiple data behaviours, but also how it links to social and environmental toxicities in the form of abuse and environmental pollution, which follow gendered and colonial structures of violence.
This article is part of the special theme on Knowledge Production. For a full list of all articles in this special theme, see: http://journals.sagepub.com/page/bds/collections/knowledge-production
Big Data: A matter of waste
Datafication is often framed as an innovative form of knowledge production that distils ‘old’ information into pure data points that can then be repurposed for new insights (Flyverbom and Murray, 2018; Kitchin, 2014; Ruppert et al., 2017). While the emphasis is often put on novelty in matters of digitisation, this article argues that the logic of datafication is premised on a logic of waste and recycling, with significant implications for how we consider datafication’s politics and ethics. To support my argument I draw on the insights of discard studies, first to unpack the concept of digital traces as a question of waste, second to link digital traces to an economic logic of extraction, and third to show how the data culled from these processes of extraction is never ‘pure’ but rather marked by the traces of the bodies from whence it came.
We arguably live in a new culture of traceability. Platforms accumulate and distribute subjects’ traces, using machine learning processes to construct histories – and ultimately patterns – of action (Chun, 2016b). The concept of digital traces thus signifies more than pure data points (Hepp et al., 2018: 2); rather, digital traces raise questions about how we make meaning with Big Data. Understanding datafication as a question of meaning-making attunes us to two major claims: that the knowledge produced by Big Data is also marked by relations of power; and that these relations constitute a system of representation, which consists of different ways of organising, clustering, arranging and classifying concepts, and establishing complex relations between them (Hall, 2013). Within this system, we find a constant recycling of digital traces that extracts and repurposes previous forms of meaning, embedding them in new representations and ‘dramas’ of Big Data (Chun, 2016a). Big Data knowledge production, as Lisa Blackman (2020) notes, is therefore a question of invention as much as discovery.
The discourse of digital traces plays a special role in today’s digital economy, which largely bases itself on the assumption that the knowledge gleaned from the digital traces left behind by users (intentionally or unintentionally) translates into power in the marketplace. Digital traces have become ‘a primary resource for value creation, influence, and knowledge production’ in Big Data (Flyverbom and Murray, 2018: 2), because they can help ‘internet companies know users, gain insights about customer preferences and design new products and markets’ (Flyverbom and Murray, 2018: 6). Investments are therefore made to collect and extract even more digital traces from users of platform-based online services. At the same time, however, the close monitoring, collection and analysis of digital traces has also given cause for concern, resulting in new user behaviours that seek to minimise users’ digital traces in online spaces – for instance, through ‘detoxing’ and other practices of disconnectivity (Hesselberth, 2018), as well as new legal measures such as the General Data Protection Regulation to counter predatory and exploitative behaviour by digital companies. Yet even these evasion strategies, which seek to provide some form of privacy, produce digital traces, as Mayer-Schönberger and Cukier (2014: 154) note: ‘if everyone’s information is in a dataset, even choosing to “opt out” may leave a trace’. (Hesselberth (2018) makes the same point, arguing that there is no disconnectivity without connectivity.)
As Sarah Myers West (2017: 2) points out, the commodification of digital traces introduces a logic of data capitalism that ‘places primacy on the power of networks by creating value out of the digital traces produced within them’. The logic of data capitalism thus shares clear affinities with Shoshana Zuboff’s (2019) concept of surveillance capitalism, which posits a new subspecies of capitalism that profits from the combined surveillance and modification of human behaviour.
While the issue of traceability and its implications for how we theorise knowledge production have been treated at length in the realm of privacy theory and dataveillance (Agre, 1994; Cohen, 2012; Mai, 2016; Nissenbaum, 2010), producing insightful work on the ways in which our actions are tracked, captured, transformed into (big) data and made actionable by companies and states (e.g. Lyon, 2008; Zuboff, 2019), this article offers a different perspective on digital traces: through cultural theories of waste and recycling.
The contribution of discard studies, as it is formulated by feminist and anti-colonial scholars such as Josh Lepawsky (2018), Max Liboiron (2016; Liboiron et al., 2018), Jennifer Gabrys (2010) and Jussi Parikka (2011), among many others, offers not only a theoretical grounding for the already pervasive metaphorical presence of waste in critical data studies (e.g. ‘garbage-in-garbage-out’, data exhaust, dirty data, bit rot, toxic data, data sweat and data trash), but also a way of approaching broader systems of power in Big Data knowledge production and their complex ecological systems.
Cultural theories on waste and discard, I suggest, allow us to connect to, extend and nuance existing discussions of digital traces. They show us that data traces raise questions not only about how data capitalism tracks individual and multiple data behaviours, but also about how it links to social and environmental toxicities in the form of abuse and environmental pollution, which follow gendered and colonial structures of violence. Rather than eliminating the displaced and suppressed narratives of these structures, waste theories show us that datafication is instead always haunted by these traces and the threat that they will resurface.
The concept of digital traces
The notion of digital traces has become a common way to describe what we leave behind online. As such, it constitutes both a cultural metaphor and a material externality. According to the Oxford English Dictionary, ‘trace’ can be taken to designate ‘a non-material indication or evidence of the presence or existence of something, or of a former event or condition; a sign, mark’ and also a very material ‘track made by the passage of any person or thing, whether beaten by feet or indicated in any other way’ (OED, 2019). The purpose of this section is to explore both the cultural work and the material externalities of this metaphor. In its attention to the materiality of the digital trace, this article joins current efforts in critical media studies to explore the environmental costs of digitisation and datafication (Cubitt, 2014; Gabrys, 2010, 2016; Hogan, 2015; Parikka, 2011).
If the concept of data appears to describe a wholly immaterial phenomenon that does not engage the senses, the concept of digital trace adds a more particular signification as something that is intentionally or unintentionally left behind. In digitisation, this signification has come to signal a way of viewing and treating data as a form of material that can be gleaned, mined and put to new uses, irrespective of the system to which that data previously belonged. Data appears as a processed good that may nevertheless be mined as raw material to make it give up its meaning (Gitelman, 2013; Räsänen and Nyce, 2013).
As one book on computational methods states, ‘unlike footprints in the sand, digital traces in silica are not wiped away by the tide; instead they accrete, leaving behind incredibly detailed records of social interaction’ (Welser et al., 2010: 117). Digital traces have become a coveted asset in many fields of cultural analysis, because they are framed as providing ‘unobtrusive measures of people’s thoughts at a given point of time’ (Alexander et al., 2018: 2). This perceived quality also makes them prized commodities. As Sarah Myers West (2017: 2) notes, the traces generated by our daily lives are ‘collected, aggregated, fed into algorithms, and used to predict our behavior for a variety of purposes: to sell advertisements, certainly, but also to calibrate technologies, improve search results, contribute to valuable research, and more nefariously, to feed intelligence agencies’ insatiable appetite for knowledge about our global communications’. While datafication has given rise to a new ‘science of traceability’ (Bigo, 2006: 60), researchers have ‘always relied on media inscriptions to investigate collective phenomena’ (Venturini et al., 2018). In that sense, digital traces belong to already existing dynamics of knowledge production (Boullier, 2017). Yet datafication has also accelerated and distributed the means of traceability exponentially, thereby amplifying and modulating processes of inscription and tracking (Venturini et al., 2017: 2).
Contemporary discourses on these new cultures of traceability articulate the phenomenon not only as a question of innovative methodologies, but also as an ethical concern. On the one hand, social scientists are understandably excited about the prospect of gaining more insights into the social dimension through access to the by-products of online human behaviour on a data set scale that matches that of the natural sciences (Venturini et al., 2015). On the other hand, historical experiences of surveillance and recent information scandals have also emphasised the volatile nature of digital traces and our vulnerability to them. Ironically, Cambridge Analytica’s Michal Kosinski warned about the dangers of digital traces as early as 2012: ‘importantly, given the ever-increasing amount of digital traces people leave behind, it becomes difficult for individuals to control which of their attributes are being revealed. For example, merely avoiding explicitly homosexual content may be insufficient to prevent others from discovering one’s sexual orientation’ (Kosinski et al., 2013: 4).
The Sciences Po Médialab ventures a more specific understanding of digital traces by distinguishing them from inscriptions and data: ‘by “digital traces”, we intend loosely all the inscriptions produced by digital devices in their mediation of collective actions – for instance, a post published on a blog, a hyperlink connecting two websites or the log of an e-commerce transaction’ (Venturini et al., 2017: 2). With the term ‘traces’, Venturini et al. (2017) thus refer to ‘inscriptions as originally produced by digital devices’ vis-à-vis data, which they use to refer to the ‘same inscriptions having undergone the cleaning and refining necessary to make them useful knowledge objects’ (3). While this distinction is more detailed than that provided by many other researchers, it is also more uncertain. Thus, as the authors concede themselves, their distinction ‘is somewhat artificial (there are no such things as “raw traces” and all inscription processes entail adjustments and correction)’ (Venturini et al., 2017: 3).
As the next section shows, this conceptual uncertainty about digital traces is not a bug, but a feature that emphasises the links between the politics of datafication and the politics of waste.
Digital traces as by-products
In an article on digital traces, Deborah Maron of the University of North Carolina and Erin Carter of Cisco Systems note: ‘digital traces are an unavoidable byproduct of computer-mediated human interaction’ (Maron and Carter, 2017: 7). Several other data discourses echo this framing of digital traces as by-products (Jungherr, 2017; see also Giles, 2012; Howison et al., 2011; and many more).
This, then, is essentially what the Big Data hype is about: the waste-related epiphany that seemingly useless data can be extracted, recycled and resold for large amounts of money. An article in Harvard Business Review offers a good example of this sentiment: ‘today, companies in almost every industry are generating another valuable byproduct: data. Seemingly mundane accounting systems and customer databases now yield the raw materials that can be transformed into lucrative new services’ (Lewis and McKone, 2016). Such discourses on digital traces echo slogans such as ‘from trash to treasure’, and draw on the coding of trash as a resource and a rich arsenal of invention histories, where by-products suddenly become the main product (ranging, for example, from urine in toothpaste whitener, cow intestines in tennis rackets, coal tar in saccharin and swim bladders for wine-making to less fanciful by-products, such as bran and spirulina).
Viktor Mayer-Schönberger and Kenneth Cukier (2014: 112) provide us with a useful example of how value is extracted from digital waste. By analysing its users’ failed search attempts and typos, Google developed a multilingual and constantly updating spellchecker. While spellcheckers abound online, Google’s novelty was that it relied not only on correct data, but also on ‘defective’ data to create value. In the words of Mayer-Schönberger and Cukier (2014: 113), ‘only Google recognized that the detritus of user interactions was actually gold dust that could be gathered up and forged into a shiny ingot’.
Without linking to waste theories, new media theorist José Van Dijck (2014) places this form of rhetoric squarely at the heart of digital capitalism, noting that tech companies turned social activities into algorithmic relations. The latter in turn were ‘made accessible to third parties’, thereby spawning an ‘industry that builds its prowess on the value of data and metadata […] not too long ago considered worthless byproducts of platform-mediated services’ but now ‘turned into treasured resources that can ostensibly be mined, enriched, and repurposed into precious products’ (Van Dijck, 2014: 199). Mayer-Schönberger and Cukier (2014: 113) echo this analysis and link it to the concept of data exhaust as ‘data that is shed as a byproduct of people’s actions and movements in the world’, which companies can ‘harvest’ and ‘recycle’ to improve and innovate their products. But what does it mean to describe data traces as by-products, and what political ecologies does such a discourse yield?
Pulverising and remoulding data traces
If memory traces are ‘signs in which remembering and forgetting are inextricably encoded’ (Assmann, 1996: 132), Big Data processes could be likened to the waste management processes that pulverise traces, turning them into a pulp from which new values can be sifted and mined (Bertolini, 1992). Such processes enable ‘a wide range of data about users and their digital traces to be folded back into data, in an endless cycle of “informating” happening inside digital infrastructures’ (Flyverbom and Murray, 2018).
A crucial component in this waste handling process is the divestment of identity from data. In order to divest the data they use of the clinging identities of the bodies from whence it came, Big Data companies need to subject it to a process ‘of pulverizing, dissolving and rotting’ to the point where ‘all identity is gone’, where ‘the origin of the various bits and pieces is lost’ and ‘they have entered into the mass of common rubbish’ (Douglas, 2001: 161). Removing identities from digital traces makes them less dangerous, since ‘where there is no differentiation there is no defilement’ (Douglas, 2001: 161). This process is less a question of finding fictitious ‘raw’ data, then, and more a question of ‘processing’ data in the right way so that it yields new insights without compromising the bodies from which it originated. As a report on open data and privacy states: ‘data needs treatment prior to publication. Often this involves “cleaning up” the data, removing egregious errors or inconsistencies, and generally improving quality. Preparation is sometimes needed to reduce the risk of the publication, by removing more sensitive aspects (from a privacy or commercial perspective) or by anonymising it such that it no longer constitutes personal data’ (Simperl et al., 2016: 16).
At this stage the next phase of data treatment begins. From their pulped state, digital traces are once again reinserted into the digital economy. As Walter Moser (2002: 96) notes in his article on the acculturation of waste, this reintegration ‘is possible only on the condition that things undergo a moment of negation as useful objects […] that can assume various concrete forms: an act of rejection, total devaluation, or material destruction’. This is why the Big Data treatment is necessary: if data is to be presented as valuable by-products, it first has to be framed as initially useless and granular objects. The data companies’ economic revaluation of waste therefore occurs ‘less in the form of recovered or salvaged objects than as a formless mass that must undergo a process of recycling in order to once again become material’ (Moser, 2002: 96). This negation is an important stage in the socio-economic process of data-structuring: if our activities are valuable as Big Data, they are only made so through the algorithmic resolution of digital traces into a data pulp that is then reassembled as valuable information in the form of new data, often going by the name ‘data exhaust’ or ‘tertiary data’ (Williams, 2013). Initially, these terms were used in the sense of ‘by-product’, signalling a new, revolutionary and potentially empowering use of digital traces.
In light of the frequent use of this notion of the by-product, it is relevant to take a closer look at its conceptual-legal framework and distinctive politics. According to the European Union’s (EU) Directive 2008/98/EC on waste, a by-product is ‘a substance or object, resulting from a production process, the primary aim of which is not the production of that item’, and it can ‘come from a wide range of business sectors, and can have very different environmental impacts’ (European Commission, 2016). Drawing a clear distinction between waste and by-products thus poses a semantic challenge. In a communication on how the EU’s Waste Framework Directive should be interpreted, the European Commission (2007) states: ‘in EU waste law, notions such as by-product or secondary raw material have no legal meaning – materials are simply waste or not’. In reality, however, there is ‘not a black and white distinction, but rather a wide variety of technical situations with widely differing environmental risks and impacts and a number of grey zones’ when one is deciding whether something is a by-product or pure waste, which is why definitions can only be made on a case-by-case basis (European Commission, 2007). This semantic instability echoes the classificatory issues raised by waste more broadly, and foregrounds the semiotic activity at work in waste (Douglas, 2001: 36). The relativity of the term ‘waste’ is dependent on a semiotic system of classification that has the power to determine one thing as waste, another as by-product, and yet another as main product. Hence, defining digital traces as by-products situates them in a legal grey zone, and also creates a cultural imaginary in which digital traces are accidental rather than purposeful products. This grey zone leaves room for the political decision-making power of whoever gets to determine the values and categories of waste, and to decide when something counts as waste and when as value.
The following sections explore the politics of this data-recycling, and the ways in which the power to determine whether something is waste or value is unevenly distributed.
Extracting value from toxic data sets
Machine learning relies on massive data sets where the data used for one purpose is repurposed for another. Facial recognition technologies pose particular analytical challenges, requiring enormous amounts of input to yield results. Offering facial recognition technologies the opportunity to validate their results, the United States’ National Institute of Standards and Technology (NIST) launched a new facial recognition-testing programme in 2017. The purpose is to ‘assess facial recognition systems on an on-going basis’, and the programme will focus on how the tested systems perform with respect to ‘accuracy, speed, storage and memory consumption, and resilience’ (NIST, 2019). The basis of these tests is a data set of millions of images, which were collected for a different purpose but are now being used to test the algorithms. NIST is thus a classic example of Big Data-recycling: using digital by-products to create new inventions. It is also, as I shall argue, a perfect example of why the aforementioned semantic and legal grey zone of by-products raises the question not only of value but also of politics.
Although the data sets used to train facial recognition technologies are often treated as unremarkable, Os Keyes et al. (2019) recently published an article in Slate showing that the training data used by NIST contained images of people in vulnerable situations. They revealed that the United States government’s Facial Recognition Verification Testing programme depends on images of children who have been exploited for child pornography; U.S. visa applicants, especially those from Mexico; and people who have been arrested and are now deceased. Additional images are drawn from the Department of Homeland Security documentation of travelers boarding aircraft in the U.S. and individuals booked on suspicion of criminal activity. (Keyes et al., 2019)
Indeed, what is most valuable to Big Data companies is often precisely the capacity to connect a trace to its origin, at least performatively. For example, the promotional material for FindFace, a product by nTech, reveals that the company leverages its archive of images to offer extreme precision in its analytical work: ‘FindFace Public Safety is able to simultaneously analyse data from hundreds of thousands of surveillance cameras, instantly distinguishing and storing the meaningful information (people’s faces) from video stream’ (nTech, n.d.). Of course, the linking of digital traces to identities relies more on a certain type of discourse than on an ontological reality, since the identity of a given object or subject is always already unstable. Hence, as Jacques Derrida (1976: 74) reminds us, ‘a meditation upon the trace should undoubtedly teach us that there is no origin, that is to say simple origin; that the questions of origin carry with them a metaphysics of presence’. Few data companies, however, draw on Derrida’s deconstructionist framework when proclaiming their own capabilities. Yet the performative effects of these companies’ knowledge production fold identities into complex fabrications of truth in which the link between data and bodies persists.
In this way, the ghostly presence of those marked by violence ends up haunting Big Data’s knowledge production processes through their digital traces, just as the knowledge produced by datafication in turn comes back to haunt those marked by structural inequalities (Blackman, 2020). This insight points to the need to reinforce public debates and establish legislative frameworks concerning the ethics of reuse. Researchers on colonial imagery have long raised concerns about the ethics of recycling, recirculating, and repurposing colonial images (Agostinho, 2019; Danbolt, 2017; Johnson, 2018; Meyer, 2016; Sutherland, 2020). These concerns can meaningfully be mobilised in conjunction with discard studies’ nuanced perspective on the material processes of recycling, to challenge the apparent truism that data is ‘just data’, and to amplify the assertions already made within computational and social justice communities that reuse practices such as those of NIST disregard the residual presence of humans in their data sets (Amoore, 2019; Keyes et al., 2019). Indeed, the NIST example shows that traces of vulnerable individuals haunt not only the standards developed by NIST, but also all the facial recognition technologies that are tested against those standards, which are thereby entangled within larger structural inequalities. As Jacqueline Wernimont noted in a recent conversation with me, the government’s handling of the NIST case resembled the greenwashing of toxic datasets as good and responsible ‘recycling’. At a deeper level, then, the NIST example also underlines how Big Data ecologies rely ‘on disjunctures and contradictions (for discursive/greenwashing purposes in particular)’ (Hogan, 2019): tech companies increasingly ‘partner with/enslave’ people ‘in order to maintain and grow [their own] operations’, while also demonstrating concern for those same people via large-scale infrastructural developments deployed to care for vulnerable subjects (Hogan, 2019).
Concluding remarks
Digital traces are at the crux of Big Data knowledge production. While often used to describe large amounts of data, the notion of the digital trace has also come to connote that these Big Data sets are haunted and can haunt us in return (Blackman, 2020). Moreover, the material quality of the trace reminds us of ‘the hazards posed by the massive computation of data on an increasingly fragile environment’ (Gregg, 2015: 46), and of how our mediated existence hovers in a complex and toxic techno-ecology underpinned by the polluted work of mining for minerals and ‘disposing of toxic creations of our own making’ (Hogan, 2015). The knowledge production of Big Data thus appears before our eyes not only as a computational phenomenon that can yield new insights, but also as a recent point in a much longer history of production and destruction, power and suppression. As Aleida Assmann (1996: 132) reminds us, Jonathan Swift already exposed the tendency towards restless and reckless innovation in relation to print media, describing it as a dialectical process of production and rubbishing. Digital traces belong to this dialectical process, making datafication as much a material problematic of waste management as an analytical problem of information. As the NIST example shows, the knowledge production processes of Big Data thus present us with a new form of waste colonialism that extends beyond the extraction of minerals and the disposal of mineral e-waste (Hogan, 2019; Parikka, 2015) and into racialised, gendered and classed data sets. This form of value extraction from data-recycling reproduces a biopolitics of disposability (Mbembe, 2016) where individuals become by-products of the data capitalist mode of knowledge production.
Acknowledgements
I gratefully acknowledge the support and generosity of the Danish Research Council’s YDUN programme, without which the present article could not have been completed, and Mer Storr’s wonderful and always attentive proof-reading, without which the article would not have been readable. I would also like to express my gratitude to Kristin Veel and Daniela Agostinho, who have read previous versions of this article and offered invaluable perspectives on them. Moreover, I thank my co-speakers on the panel on ‘Digital Excess’ at ICA 2019, including Fenwick McKelvey, Nora Draper, Elizabeth Wissinger and Stephanie Schulte, and our audience for enriching feedback. Finally, I extend a warm thank you to the anonymous peer reviewers, who have generously and constructively given their time and shared their knowledge to help me sharpen my arguments.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
