Abstract
Data analytics, particularly as framed by the current rhetoric around “Big Data”, tends to be presented as new and innovative, emerging ahistorically to revolutionize modern life. In this article, we situate one branch of Big Data analytics, spatial Big Data, through a historical predecessor, geodemographic analysis, to help develop a critical approach to current data analytics. Spatial Big Data promises an epistemic break in marketing, a leap from targeting geodemographic areas to targeting individuals. Yet it inherits characteristics and problems from geodemographics, including a justification through the market and a process of commodification through the black-boxing of technology. As researchers develop sustained critiques of data analytics and its effects on everyday life, we must do so with a grounding in the cultural and historical contexts from which data technologies emerged. This article and others (Barnes and Wilson, 2014) develop a historically situated, critical approach to spatial Big Data. This history illustrates connections to the critical issues of surveillance, redlining, and the production of consumer subjects and geographies. The shared histories and structural logics of spatial Big Data and geodemographics create the space for a continued critique of the role of data analysis in society.
Dear current resident, Congratulations! You’ve been pre-approved for our special offer …
Some marketing firm has decided that I, or at least a “current resident” of my neighborhood, am a good (but not great) credit risk. I toss the junk mail onto a growing pile and move on to a more important task: dinner. With many restaurants closed on Mondays, I turn to my smartphone to look for ones that are open. Based on my location, past searches, and other information, a targeted ad pops up for a new Indian restaurant a block away. It’s an easy choice. Using spatial “Big Data”, the advertisement has successfully triggered a craving for rogan josh.
On paper, delivered to every mailbox on my street, the credit promotion feels worlds away from the advertisements for restaurants and clothing brands on my smartphone, targeted at me using location-indexed Big Data.
The geographical Big Data targeting of ads on my phone may seem new and exceptional, but it has precursors, including neighborhood geodemographic targeting like the credit offer. Current technical definitions of Big Data tend toward data that pushes existing technology to its limits in three ways: volume, velocity, and variety (Horvath, 2012; Laney, 2001). Such data “forces us to look beyond the tried-and-true methods that are prevalent at that time” (Jacobs, 2009: 44) and has fostered a mythological belief that “large data sets offer a higher form of intelligence and knowledge” (Boyd and Crawford, 2012: 663). In a general sense,
Despite the concerns over Big Data’s influence on society raised by critical scholars (Boyd and Crawford, 2012; Crampton et al., 2013; Dalton and Thatcher, 2014; Wilson, 2012), for many, “[t]he big ethical issue … is that nobody thinks this is an ethical issue” (Paul, 2013). Although the literature on Big Data stretches back to at least 1997 (Cox and Ellsworth, 1997), Big Data advocates and practitioners can sidestep ethical and social considerations by framing the field as perpetually new and innovative, thereby legitimizing it as natural and inevitable (Leszczynski, 2014; Puschmann and Burgess, 2014).
We undermine this ahistorical myth by contextualizing spatial Big Data, charting a recent history through a confluence of geographic, capitalist, and technological interests and impetuses. Spatial Big Data did not emerge from a vacuum, but presenting it that way helps attract significant capital investment. Such an ahistorical approach facilitates a simplistic and self-interested account of spatial knowledge that obscures the asymmetric relations of power and profit it produces. In particular, we build on recent critical work on the geoweb and neo-geography (Barnes and Wilson, 2014; Kingsbury and Jones, 2009; Leszczynski, 2014; Wilson, 2012), and on older critical Geographic Information Systems (GIS) work on geodemographics (Goss, 1995a; Phillips and Curry, 2003; Graham, 2005), to begin to develop a historically grounded critical data studies approach (Dalton and Thatcher, 2014).
Spatial Big Data is big business. Consumers bought approximately 1.3 billion location-aware smartphones in 2014 alone, each of which collects large amounts of personal data, including location, purchases, status updates, social media connections, calendars, calls, and web browsing (Arthur, 2014). Companies that collect and process large consumer datasets with a geographic component are increasingly valuable. According to BIA/Kelsey senior analyst Michael Boland, location-aware applications offer the “holy grail of advertising”: the ability to tell whether a given ad resulted in the targeted consumer going to the advertiser’s store (Peterson, 2013). Location data is a “hot commodity” (Profitt, mobile technology expert, quoted in McBride and Oreskovic, 2013) because both the data and the companies built around its creation, analysis, and control are valued for the targeted advertising opportunities that the data make possible.
Spatial Big Data isn’t just business; it’s “big” science as well. Universities are forming partnerships with private industry and government agencies to develop algorithms for analyzing “big” spatial datasets. Graphics computer chip manufacturer NVIDIA recently launched the NVIDIA CUDA Center of Excellence Program, which “recognizes, rewards, and fosters collaborations” with research institutions such as UNC-Charlotte’s Center for Applied GIScience (NVIDIA Corporation, 2014a, 2014b). As Kitchin (2014b) suggests of Big Data in general, spatial Big Data is reshaping science (Goodchild, 2013; Gorman, 2013) and society as marketers (Ratner, 2004), urban planners (Townsend, 2013), political analysts (Ansolabehere and Hersh, 2012), and national security agencies (Crampton et al., 2014) use it to understand, model, and attempt to shape the world.
Such initiatives are administratively valorized and well funded, but they often give little consideration to the broader impacts of Big Data science. In this article, we contextualize recent spatial Big Data developments within the longer history of geographically targeted marketing. Historically situating spatial Big Data opens the possibility of learning from existing critical approaches and developing new ones, with the promise of better-informed research, critique, and resistance involving Big Data. To that end, the article proceeds in three sections. First, we detail how both geodemographic and spatial Big Data analyses attempt to quantify social identity, though spatial Big Data promises an epistemic break by focusing on an individual person, not a geodemographic area. Second, we show how this shared history illustrates three common logics that shape how spatial Big Data emerges from the milieu of geodemographic marketing: a market orientation, technological black-boxing, and the promise of ever finer, ever more relevant analysis. Third, with that foundation, we draw on both applied and theoretically oriented approaches to geodemographics to better understand spatial Big Data. Together, these sections situate spatial Big Data in terms of the past, highlighting its underlying structural logics, issues, and limits.
All in the family: Geodemographics’ and Big Data’s shared foundations
Data is getting big(ger) (Farmer and Pozdnoukhov, 2012). As data storage capacity grows and computation gets faster, the technical threshold of “big” has undergone a “relentless march from kilo to mega to giga to tera to peta to exa to zetta to yotta” and beyond (Doctorow, 2008). For this reason, while there are myriad definitions of Big Data (cf. Kitchin and Lauriault, 2014; Laney, 2001; Manyika et al., 2011; Mayer-Schonberger and Cukier, 2013, etc.; for a general review of 12 definitions, see Press, 2014), most emphasize data that stress existing technology, often in terms of the “three Vs”: data volume, velocity, and variety (Horvath, 2012; Laney, 2001). At the same time, Big Data promises more than simply large data sets. For its boosters, it has created a “breathtaking time in science” (Frankel and Reid, 2008: 30), in which the “enterprise become[s] a full-time laboratory” (Bughin et al., 2010). In a recent webinar, HP CEO Meg Whitman suggested that Big Data is going to quite literally transform “everything” (Whitman and Youngjohns, 2014). As recent critical scholarship shows, such views are modern myths (Boyd and Crawford, 2012) with their own set of epistemological commitments (Thatcher, 2014). Even though few practitioners focus on the ethics of spatial Big Data (Paul, 2013), popular (Marcus and Davis, 2014), academic (Kitchin, 2014a), and state (Executive Office of the President, 2014) sources are beginning to question both the efficacy and the morality of Big Data. These critiques begin to explore how data are always expressions of power (Wilson, 2014a) and are never ontologically prior to their interpretation (Boellstorff, 2013). Spatial Big Data has forerunners in other modern attempts to represent, model, and ultimately produce social geographies of consumption.
Situating its preconditions in terms of a history of geodemographics makes clear the shared structural logics, criticisms, and resulting basis for a promised epistemic break in spatial Big Data.
Proponents of geodemographics define it as “the analysis of socio-economic and behavioral data about people, to investigate the geographical patterns that structure and are structured by the forms and functions of settlements” (Harris et al., 2005: 225) or simply as the “analysis of people by where they live” (Sleight, 1997: 6). It can address a range of topics, including policing and urban planning, but in practice geodemographics is chiefly dedicated to consumer profiling and opinion polling based on residence (Burrows and Gane, 2006; Longley, 2005; Sleight, 1997). To develop a new geodemographic system, experts identify a number of statistical socio-economic clusters (profiles) based on dozens of indicators from demographic and/or consumer datasets. Similar clusters are lumped into labeled groups, which can range from the “Upper Crust” to “Affluent Achievers” to “Thriving Greys” to “Hard-Pressed Families” to “The ‘Have-Nots’” (Batey and Brown, 1995: 94; Harris et al., 2005: 13). The system then uses a proprietary algorithm to assign a cluster/group designation to each geographic area, such as a postal code, in a study region. The underlying geographic logic is that people tend to reside near similar people, making for socio-economically homogeneous geographical units. This is typically expressed in the literature through deterministic clichés such as “birds of a feather flock together” (Burrows and Gane, 2006; Flowerdew and Leventhal, 1998; Harris et al., 2005; Longley, 2012; Nelson and Wake, 2005) and “You are where you live” (Burrows and Gane, 2006; Leslie, 1999; Mitchell and McGoldrick, 1994; Phillips and Curry, 2003). Once applied, the classification serves as a social profile for the postal code, allowing companies and public agencies to better allocate their resources geographically.
For example, a company selling luxury cars can focus its advertising on postal codes identified as “Thriving Greys,” whereas sub-prime mortgage lenders can market to postal codes identified as “Hard-Pressed Families.”
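The classification procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any firm's proprietary algorithm: it swaps in plain k-means clustering, one widely used clustering technique, for the unspecified algorithm, and it uses six hypothetical postal codes, three invented indicators, and illustrative group labels (the middle label is made up). The final step hands every area, and by implication everyone living in it, a single label.

```python
from statistics import mean, pstdev

# Hypothetical postal-code areas with invented indicators:
# (median household income in $1,000s, share of residents aged 65+, share of households renting)
AREAS = {
    "A1": (95.0, 0.40, 0.10),
    "B1": (28.0, 0.08, 0.70),
    "C1": (55.0, 0.15, 0.35),
    "A2": (90.0, 0.38, 0.12),
    "B2": (31.0, 0.10, 0.65),
    "C2": (58.0, 0.14, 0.30),
}
# Illustrative group labels, ordered from lowest to highest mean income.
# The first and last echo labels quoted in the text; the middle one is invented.
NAMES = ["Hard-Pressed Families", "Comfortable Middle", "Thriving Greys"]


def standardize(rows):
    """Convert each indicator column to z-scores so no single unit dominates distances."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two indicator vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means, seeded deterministically with the first k points."""
    centroids = list(points[:k])
    labels = []
    for _ in range(iters):
        # Assign each area to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels


def classify(areas, k, names):
    """Assign every area (and, implicitly, everyone living in it) a named group."""
    codes = list(areas)
    raw = [areas[c] for c in codes]
    labels = kmeans(standardize(raw), k)
    # Name clusters by rank of mean income so labels do not depend on cluster numbering.
    income = {j: mean(r[0] for r, lab in zip(raw, labels) if lab == j) for j in set(labels)}
    name_of = {j: names[i] for i, j in enumerate(sorted(income, key=income.get))}
    return {c: name_of[lab] for c, lab in zip(codes, labels)}


print(classify(AREAS, 3, NAMES))
```

Here A1/A2, B1/B2, and C1/C2 end up sharing labels. Standardizing first is a deliberate choice: without it, income (measured in thousands) would swamp the proportion-based indicators. A real system would use dozens of indicators and far more areas.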
Geodemographics practitioners point to their own precursors in Charles Booth’s maps of poverty in London in the 1880s and 1890s, the later Chicago School of Sociology, and mid-century factorial ecology and social area analysis (Harris et al., 2005; Singleton and Spielman, 2014). Geodemographics began to develop as a field, and thereafter as an industry, in the mid-20th century amid geography’s quantitative revolution. Scholars, most notably William Warntz, developed forms of spatial analysis based on social physics, the monistic idea that social relations follow the laws of physics, to analyze geographic areas using contemporary computers (Barnes and Wilson, 2014; Warntz, 1964). In the early 1960s, newly granular data in the form of ZIP codes in the US and similar neighborhood data in the UK allowed academic social researchers to study areas of 15,000 people or fewer. Researchers in both countries, including Jonathan Robbins in the US and Richard Webber in the UK, developed algorithms using these demographic data to identify areas in need of public subsidies (Harris et al., 2005).
Direct marketers quickly took notice of how these methods could be used for targeted advertising. Both Robbins and Webber entered the private sector as geodemographics experts. Through the 1970s and 1980s, they and others built the geodemographic powerhouse firms
Technologically, the implementation was the same. This nascent geodemographics industry grew alongside contemporary developments in GIS. Connecting tabular and spatial data in a GIS facilitates assigning geodemographic profiles to areas and subsequent geographic analysis. However, geodemographics had little connection with academic geography research, a tendency apparent in the small number of publications about geodemographics in academic geography journals, particularly in the United States (Harris et al., 2005: 228, 241; Openshaw, 1989; Singleton and Spielman, 2014). By the early 1990s, the combination of unprecedented computing power in GIS, growing market demand, and readily available capital fed an explosion in the geodemographics industry. Geodemographic services included profiling existing customers to identify those most likely to make further purchases, generating new business through direct mail marketing, credit scoring, analyzing media markets for advertisers, survey design, and planning (Mitchell and McGoldrick, 1994; Phillips and Curry, 2003).
The networked consumer
In the late 1990s and early 2000s, web-based companies outside geodemographics, such as
Critical scholars today describe spatial Big Data as part of the shift in “production, dissemination, and institutionalization” of spatial media (Leszczynski, 2014: 62), occurring as mobile applications move from simply capturing consumption patterns to attempting to actively shape them (Wilson, 2012). The difference lies not in the intent, as shaping consumption patterns has long been the goal of geodemographic marketing (Goss, 1995a), but in the scale and methods through which spatial Big Data functions. Both processes are market driven and demand “a correspondence between the dispositions, attitudes, and socioeconomic characteristics of a significant majority of the data subjects
Spatial Big Data commodifies the individual. One’s personal locations, dispositions, attitudes, and socioeconomic characteristics are the object of analysis, rather than geographically located populations. Industry hype promises this shift will create the “killer application of the 21st Century”: individually targeted, location-specific, ubiquitous advertising (Krumm, 2011). Such applications use location data, combined with other information, to serve advertisements. For example, if an application on my phone knows that I’m at Lowe’s, it could remind me that my mother’s birthday is coming up and suggest I buy some tomato plants for her garden. If the application records me moving at jogging speed over long distances, it might advertise new running shoes or, in winter, a gym with a track. Mobile devices equipped with location tracking, accelerometers, pedometers, and even heart rate monitors transform individual people into both sensors of the surrounding world (Goodchild and Li, 2012) and sensors of themselves. Wolf, Kelly, and others in the Quantified Self movement engage this shift on a personal level (Wolf, 2011). On a wider level, those who purchase, analyze, and leverage these data collect an individual’s information to target ads at that individual, not at families or people residing in the same postal code. More flows of data, drawn not only from one’s purchase history but also from one’s locations, speed, and even heart rate, facilitate more personalized targeting.
On a fundamental level, both geodemographics and spatial Big Data assume that social identity can be reduced to measurable characteristics that can be algorithmically classified. Furthermore, as with social physics, this assemblage of personal data is treated as predictive, or as something that can be made so, which commodifies it as valuable for marketing and, ultimately, for making a sale. While geodemographics and spatial Big Data are hardly alone on this point, this commodification proceeds in specifically geographic ways. A neighborhood’s assigned geodemographic class or an individual’s assembled profile can become a self-fulfilling prophecy as people respond to advertising as consumers. Feedback loops may form in which consumers are presented with options based on available data. Their choices are then used by marketers to target subsequent advertisements, advancing some options and limiting the consumer’s perceived choices (Lohr, 2012a). In this way, geodemographics and spatial Big Data do not merely represent or target people; they produce social relations and geographic spaces of consumption (Burrows and Gane, 2006; Goss, 1995a, 1995b; Zook and Graham, 2007). Advertising, buying, and subsequently using running shoes may lead a consumer to walk more often, contributing to demand for tracks in parks and more athletic shoe production. That consumer may also be manipulated into spending too much of their income on shoes. These processes are already part of everyday life; for example, the online dating service Match.com aggregates and analyzes a variety of individual data points, including location, to determine who is romantically matched with whom (Lohr, 2012b). The stakes of both geodemographic and Big Data analyses are not merely about data; they are about who we are and how we live.
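The feedback loop described above can be illustrated with a deliberately simplified toy simulation. It is not drawn from any cited study: the option names, starting click counts, the consumer's preference ranking, and the rule of showing only the two most-clicked options are all hypothetical assumptions chosen to make the mechanism visible.

```python
from collections import Counter


def feedback_loop(options, prior_clicks, preference, rounds, shown_per_round=2):
    """Toy model of data-driven targeting (all parameters hypothetical).

    Each round the marketer shows only the options with the most past clicks;
    the consumer clicks whichever shown option they prefer most. Those clicks
    then feed back into the next round's ranking.
    """
    clicks = Counter(prior_clicks)
    ever_shown = set()
    for _ in range(rounds):
        # Rank options by past clicks (descending), breaking ties alphabetically.
        ranked = sorted(options, key=lambda o: (-clicks[o], o))
        shown = ranked[:shown_per_round]
        ever_shown.update(shown)
        # The consumer picks the shown option that ranks highest in their own preferences.
        choice = min(shown, key=preference.index)
        clicks[choice] += 1
    return clicks, ever_shown


OPTIONS = ["A", "B", "C", "D"]
PRIOR_CLICKS = {"A": 3, "B": 2, "C": 1, "D": 0}
PREFERENCE = ["D", "C", "B", "A"]  # this hypothetical consumer likes D best

clicks, shown = feedback_loop(OPTIONS, PRIOR_CLICKS, PREFERENCE, rounds=5)
print(sorted(shown), dict(clicks))
```

Because option D is never shown, the consumer's strongest preference never registers in the data, and each round's clicks only reinforce the ranking that produced them: B accumulates clicks while C and D stay frozen.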
Shared traits: Markets, black boxes, and epistemologies
Market orientation, market epistemology
Building from the shared foundational concept of a measurable, malleable social geographic identity, both geodemographics and spatial Big Data rely on exploratory correlations, rather than more rigorous geographic or sociological methods, to arrive at analytical outcomes. As far back as the 1970s, Richard Webber (1975) proposed that geodemographic classification was an inductive approach for identifying new insights. “In this regard, geodemographics is regarded as a data exploration tool, not a statistical method of hypothesis confirmation or rejection” (Harris et al., 2005: 15). Regardless of these limitations, proponents maintain that geodemographic methods provide sufficient grounds for corporate decision making:
[Geodemographics] has been used in the business sector for 25 years now … and it is still here, stronger than ever! Given the nature of business decisions, the cost of using geodemographics would not be borne if the technique could not prove its worth. (Harris et al., 2005: 225)
Spatial Big Data is similarly market oriented. Correlative algorithms geographically identify people who are likely to be interested in a given product; explanation is not the point. At an extreme, this view argues: “Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity” (Anderson, 2008). In this formulation,
The focus on data correlation for capital gain is apparent in
A similar impetus drove
“Big” black boxes
The most valuable assets of both IT companies like
The commodification of the analytic process itself, of digital information rather than physical goods, creates a monopoly on data output. This commodified, proprietary output data may then be used to target ads internally or be sold, in portions, to other firms. Those who buy ad services and data, but who do not know the algorithms used to parse it, cannot know the details of how it was created. However, following the market epistemology, so long as it correlates well enough with the real world, it holds value.
The high valuation of software and data in a market context reinforces companies’ drive to classify their analytical procedures and resulting data as secret intellectual property to protect their assets. After all, it is much easier for a competitor to steal and replicate an algorithm than an oil rig or silicon chip. Spatial Big Data and geodemographics share the market orientation and resulting black-boxing of analysis. The exact algorithms that firms such as
The black-boxing of analytics and data for the accumulation of capital has profound effects on who has access and thus on what can be known through that process. The market orientation of both geodemographics and spatial Big Data means that the accuracy of a company’s output data is verified in competitive marketplaces rather than through formal scientific or scholarly verification processes. As a result, data is “good enough” so long as it confers competitive advantage. On a deeper level, Burgess and Bruns (2012) show that the very structure of data, and how it is accessed through networked streams, shapes what can be known through it. Outside researchers may be unable to conduct research because a data source is cut off for market reasons. For example, once a market developed for
This market logic inhibits scientific work by preventing researchers from emulating an approach, much less replicating its results (Longley, 2012). Conversations about how, or to what standards, the knowledge is produced, much less about built-in problems or biases, are cut off by the trade secrets of production. As a result, the commodified trade secrets of geodemographics and spatial Big Data conceal some of their own analytical limits.
One foundational limit remains clear: both the market orientation and its resulting black-boxing rest upon a belief in the quantification of representation. For geodemographics, this quantification occurred at the level of an areal unit, while spatial Big Data promises to go a step further, to fully represent an individual—and allow for the market targeting thereof.
From quantified space to the quantified individual
The ties between geodemographics and spatial Big Data go beyond their market orientation and privatization of analyses and data. The structure and inherent limits of geodemographics laid the epistemological groundwork for spatial Big Data as it exists today. Geodemographics demonstrated the market value of geographically targeted advertising, but it suffered from two core epistemological uncertainties that undercut its promises: first, the lack of diverse data sources, and second, the heterogeneity of human populations. Spatial Big Data is the logical outcome of long-running attempts to resolve these two built-in uncertainties of geodemographics. It does so through the promise of representing a fully measured, quantified, geolocated individual, rather than the homogenized, quantified areal units of geodemographics.
For decades, geodemographic analyses in the US and Europe relied primarily on publicly generated data, chiefly census data, but also voting records, housing registries, and other similar sources (Batey and Brown, 1995). Over that time, geodemographic experts recognized that populations were increasingly diverse and that people had increasingly selective tastes as consumers. Modeling heterogeneous tastes required data beyond the census and its categorical limits (Longley and Harris, 1999; Phillips and Curry, 2003). Furthermore, relying on government data created a boom-and-bust cycle of geodemographic activity tied to government data releases (Mitchell and McGoldrick, 1994). Because their product was a commodity, geodemographic firms needed to produce a continual stream of sales (Leys, 2001) that accurately represented increasingly selective consumer tastes. To meet this need, geodemographics had to diversify its data sources and produce continually relevant-looking results. Accordingly, the 1990s saw geodemographic firms increasingly turning to “lifestyle” data from consumer surveys, retail purchases, and credit records for their analyses (Debenham et al., 2003; Longley and Harris, 1999; Phillips and Curry, 2003). These new data sources raised additional analytical issues. Unlike a census, lifestyle data necessarily represents an incomplete population. It also required other kinds of quantification. As opposed to age or median family income, these inputs involved criteria such as interest in “home baking” or “theatre” from consumer surveys. Practitioners quantified such interests as variables based on standardized check-box answers on the surveys (Longley and Harris, 1999; Phillips and Curry, 2003).
Beyond the push for more diverse, continually accessible data sources, practitioners recognized epistemological problems with the areal units at the heart of geodemographics. First, such areal units fall victim to the ecological fallacy, an “error of deduction that involves deriving conclusions about individuals solely on the basis of an analysis of group data” (O’Dowd, 2003). Geodemographics ascribes common, quantified characteristics to everyone in a given area, such as a postal code, based on its analysis. Few areas are actually that socially homogeneous. This problem was recognized as early as the 1890s by Charles Booth in his attempts to map economic classes by city block in London. His cartographic solution defined eight economic classes and represented each street as pure- or mixed-class using seven colors (Booth, 1902; Harris et al., 2005). Booth was prescient in recognizing the problem. However, neither his fix nor later, increasingly complex geodemographic models resolved the ecological fallacy that haunted geodemographics for the next 120 years. Second, geodemographic analyses are subject to the modifiable areal unit problem (MAUP). Because geodemographic areas are not naturally occurring, the geographic scale of, and boundaries between, studied areas can affect analytic results (Openshaw, 1984): the same underlying population can appear homogeneous or mixed depending on how it is divided.
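Both problems can be made concrete with a small numerical sketch. Everything in it is hypothetical: eight invented households on one street, invented incomes, and three invented ways of drawing zone boundaries. It summarizes each zone by its mean income, as an areal classification would, and then counts how many households that zone mean actually describes (here, within 25 percent of the zone mean, an arbitrary tolerance).

```python
from statistics import mean

# Hypothetical households on one street: (house position, annual income in $1,000s).
HOUSEHOLDS = [(0, 20), (1, 22), (2, 150), (3, 160), (4, 25), (5, 30), (6, 140), (7, 155)]


def zone_of(pos, boundaries):
    """Return the zone a house falls in; each boundary is the first position of a zone."""
    return max(b for b in boundaries if b <= pos)


def zone_means(households, boundaries):
    """Summarize each zone by its mean income, as an areal classification would."""
    zones = {}
    for pos, income in households:
        zones.setdefault(zone_of(pos, boundaries), []).append(income)
    return {zone: mean(incomes) for zone, incomes in sorted(zones.items())}


def share_well_described(households, boundaries, tolerance=0.25):
    """Share of households whose own income is within `tolerance` of their zone's mean."""
    means = zone_means(households, boundaries)
    hits = 0
    for pos, income in households:
        zone_mean = means[zone_of(pos, boundaries)]
        if abs(income - zone_mean) <= tolerance * zone_mean:
            hits += 1
    return hits / len(households)


# Two large zones, four small zones, and small zones with shifted boundaries.
for scheme in ([0, 4], [0, 2, 4, 6], [0, 1, 3, 5, 7]):
    print(scheme, zone_means(HOUSEHOLDS, scheme), share_well_described(HOUSEHOLDS, scheme))
```

With two large zones, the street looks uniformly middle-income and no household is well described by its zone (the ecological fallacy). Four small zones that happen to fit the pattern describe everyone, while comparably small zones with shifted boundaries describe only two of eight households (the zoning effect of the MAUP). In this toy case, smaller units help only when their boundaries happen to match the population.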
Geodemographics practitioners from the 1970s through the early 2000s attempted to address these problems with ever smaller geographic units. Perceived accuracy was valuable for geodemographics firms, and smaller units looked more accurate. In the US:
They made their locational analysis more and more precise in the desperate belief that at some level – if not 40,000 people then 1,000 people, and if not there, well, then 40 people – they could discover and resuscitate the ideal refuge of a like-minded group of neighbors. (Phillips and Curry, 2003: 144)
Spatial Big Data purports to resolve both of geodemographics’ core issues. It offers continuous streams of diverse data generated at the individual level. While Big Data may draw on public data sources, like national censuses, it often also includes other, more granular consumer information, such as credit card transactions, frequent customer programs, web-browsing histories, and a variety of social media information, such as Facebook profiles, Twitter accounts, and Instagram feeds. For example,
Geodemographics and spatial Big Data.
What can critical data studies learn from geodemographics?
The parallels and connections between geodemographics and spatial Big Data begin to situate the latter within its historical context. As these histories of Big Data and its connections to social physics (Barnes and Wilson, 2014) and geodemographics emerge, its rhetoric of exceptionalism and newness is diminished. Drawing concrete lines between the past and present opens the door to a more rigorous, critical analysis of not only what is, but what might be (Horkheimer, 1995; Wilson, 2015). Once spatial Big Data is historically situated, earlier critical assessments of geodemographics provide useful points of reference. Building from these earlier critiques, this final section outlines both practical and theoretical approaches to spatial Big Data moving forward.
Practical approaches
The proponents of geodemographic analysis continue to address its epistemological uncertainties by narrowing their concern to methodological issues. They acknowledge the problem of the data’s scale, discussed above, noting concerns with the ecological fallacy and the modifiable areal unit problem (Debenham et al., 2003; Duckham et al., 2001). Nevertheless, such foundational questions are set aside, because the relation of geodemographic systems to the real world, and hence their utility, remain judged by their competitiveness on the market (Burrows and Gane, 2006; Harris et al., 2005). This can have unforeseen consequences when, due to funding cutbacks and the neoliberalization of geographic government services (Burns, forthcoming; Leszczynski, 2012), commercial geodemographic systems, with their inherent biases and gaps, are applied to public sector issues such as public housing and elderly care.
While scientists have raised concerns over Big Data’s methodological issues (Lazer et al., 2014), spatial Big Data practitioners lean on market justifications for their products akin to their geodemographic brethren. In the competitive market of multiple mobile applications,
As scholars incorporate spatial Big Data into their analyses, some have articulated other methodological criticisms. First, the black-boxing of methods continues to create roadblocks for researchers. Full spatial big datasets can be difficult and costly to access, if they are available at all. In addition, Big Data researchers outside key companies typically do not have access to the core, proprietary algorithms that process and interpret the data. Consequently, the limits of their analysis may be shaped or inhibited in ways that remain entirely opaque to the researcher. Second, unlike total population data such as a census, geodemographic “lifestyle” data tends to offer less than a complete population, presenting issues of representation and bias in the data around class, language, and use of technology. Recent research has demonstrated that spatial Big Data, like
Theoretical approaches
Other scholars offer more theoretical critiques of geodemographic analysis that retain relevance for critically understanding spatial Big Data. Goss (1995a) outlines how geodemographic systems are a strategy for producing a “control society” and the social subjects within that context. Surveillance technologies and practices first collect data about individuals, which are then classified within geodemographic systems. The resulting socially classified knowledge defines social and geographical subject positions through advertising, reproducing those social and geographical categories in people’s material consumptive practices. In this way, a geodemographic system “produc[es] the conditions of its own reproduction” (Goss, 1995a, 1995b). At stake in such systems is not merely the invasion of privacy, but also the very autonomy to choose individual paths and life outcomes. For example, if, based on census data, a particular neighborhood fits into the “Hard-Pressed Families” geodemographic class, that neighborhood may become the target of a direct mail campaign for sub-prime mortgage refinancing. As residents of that neighborhood refinance their properties, the neighborhood itself is materially reproduced by and in the terms of that geodemographic class, making it ever more ripe for subsequent sub-prime marketing.
Though powerful, Goss’ critique is contingent on geodemographic targeting and the resulting advertising actually functioning as well as proponents claim. However, geodemographics need not be wholly effective to raise social and ethical questions.
Lyon (2002) argues that geodemographics relies on a “phenetic fix” of classification with consequences for social opportunity. Geodemographic systems “capture personal data triggered by human bodies and … use these abstractions to place people in new social classes of income, attributes, preferences, or offences, in order to influence, manage or control them” (Lyon, 2002). In practice, some classes of people will be more promising for particular ends than others and will therefore garner more or better advertised opportunities, while others are ignored or offered inferior options. Parker et al. (2007: 917) highlight the recursive nature of this process: “Class places people into different types of places,” in turn producing the spaces of particular classes. “The application and impact of geodemographic classification recursively reinforces this spatialization of class” (2007: 917) by quantifying and codifying it through the presentation of advertising and services meant for that recursively constituted class. Within trends towards more splintered urban realities (Graham and Marvin, 2001), Phillips and Curry (2003) suggest a darker consequence: codifying classes through geodemographic divisions constitutes a form of redlining based on geodemographic classes and geographic units, and therein a loss of the public domain.
Spatial Big Data’s production of social subjects presents similar issues, but at a more personalized scale. For example, a Big Data analysis of data from a smartphone locative application, such as
The charge of class-based division and redlining is no less relevant to spatial Big Data, though again at a different scale. Even personalized spatial Big Data algorithms rely on some form of classification to match ads with consumers, such as calculated relevance or distance to a given consumer subject.
Conclusions for a critical data studies
Whatever the sales pitch, spatial Big Data is definitively tied to the problems and limits of its precursors. Drawing the connections to geodemographics highlights the shared foundational assumption of a quantifiable, predictable social identity. From that common foundation spring the shared traits of a market-based epistemology and black-boxing, as well as the problems of data diversity and scale that helped lead to current spatial Big Data applications.
As practices shaped by Big Data fade into the banality of everyday life, it is vital to remember the social contingencies that led to these services and technologies. For consumer users, this context is a means to prevent the complete naturalization of Big Data services. It is important to continually explore creative possibilities found within large data sets and to highlight the development of alternative relations to them.
For scholars and Big Data practitioners, the stakes are even higher. Basing research on individualized spatial data run through black-boxed processes with an epistemology of market competition raises several serious issues. What research is too private or connects too many bits of personal information to be ethical? Who defines those standards? With as few as four spatio-temporal points necessary for unique identification (de Montjoye et al., 2013), the power of cutting-edge data analytics has far outstripped the protections offered by traditional Institutional Review Boards, and no formal or legal ethical standards exist in the private sector. Beyond fundamental questions of what ethical data
The technological basis of data collection and analysis also points to a variety of social issues. As Goss (1995b) and Parker et al. (2007) argued about geodemographics, spatial Big Data can create a societal feedback loop, producing individual subjects and a society whose views and actions reflect the limited choices that their technological devices optimize to their constructed class profile. Technology enables and constrains actions and thoughts (Feenberg, 1999), and we must ask whether our phones have unintentionally locked us into sets of epistemologies and identities, and how these tools that enable so much in our daily lives simultaneously constrain what we know and what we do. Eating at an Indian restaurant? Purchasing the latest pair of running shoes? Furthermore, as Phillips and Curry (2003) pointed out concerning geodemographics, spatial Big Data need not even be successful in that pursuit to have serious social implications. Spatial Big Data presents not only a splintered urban environment (Graham and Marvin, 2001), but one of uneven development globally (Smith, 2008). Spatial Big Data and its analyses force consideration of data divides: between those who produce data and those who don't (Kelley, 2014), and between those who have the tools to analyze it and those who don't (Andrejevic, 2014). The targeting and personalization that spatial Big Data facilitates reflect this uneven geography and social reach. Even in the US and UK, personalized spatial Big Data services by their very definition create different, unequal choices and opportunities, depending on who and where you are. Redlining in the 21st century need not be by neighborhood; it is individualized.
Any spatial Big Data initiative must be prepared to face these issues, for Big Data's rhetoric of exceptional newness may distract from them, but it cannot resolve them. As scholars and practitioners using spatial Big Data, we are in part responsible for the knowledge and social relations that these issues create and re-create. In this context, it is crucial to develop critical (and self-critical) perspectives and approaches to spatial Big Data and subsequent technologies (Dalton and Thatcher, 2014). Just as GIS practitioners cannot stand aside from the processes and consequences of the technology (Crampton, 2010), we scholars and practitioners of spatial Big Data must evaluate our situatedness and positionality (Haraway, 1991; Harding, 2004) and the forms of knowledge and social relations we help produce. Such an approach entails more than refining practices. It requires reflexively analyzing spatial Big Data and its own analytics in context, not as indicators of some event, but as phenomena and epiphenomena in and of themselves (Wilson, 2014b). Establishing the historical, social context of a technology is a key step in demystifying and denaturalizing it. The historical preconditions for spatial Big Data set by geodemographics and earlier developments (Barnes and Wilson, 2014) allow us to learn from those earlier processes and so better evaluate current technologies and knowledge critically. Big Data has historical antecedents, and so too does critical thought on technology (Feenberg, 1999; Marcuse, 1982; O'Sullivan, 2006) and its spatial aspects (O'Sullivan, 2006; Pickles, 1995; Schuurman, 2000). These approaches can and should inform reflexive, critical engagements with spatial Big Data.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
