Abstract
In the previous two reports in this series, we discussed the history and current status of quantitative geography. In this final report, we focus on the future. We argue that quantitative geographers are most helpful when we can simplify difficult problems using our distinct domain expertise. To do this, we must clarify the theory underpinning core conceptual problems in quantitative geography. Then, we examine the social forces that are shaping the future of quantitative geography. We conclude with criteria for how quantitative geography might succeed in addressing these challenges.
I Introduction
The future of quantitative geography lies in the comparative advantage of its theory. If it’s the case that ‘by our theories you shall know us’ (Harvey, 1969: 486), we should ask ‘what unique output or consequences occur due to spatial and geographic thinking?’ (Golledge, 2002: 11). Searching questions like this were common in the early 2000s as quantitative geographers sought to mobilize the field for a new millennium. Yet, they remain difficult to answer about the past 20 years of quantitative geography, let alone the next.
Unfortunately, predictions about the past are less useful than predictions about the future. To clarify the future of quantitative geography, it will help to first discuss the core theory of quantitative geography. Then, we discuss current forces that act upon this core. We will conclude with a sketch of what success might look like.
II Theoretical forces driving quantitative geography
It is true, as in the case of other ossifications, that attacking this ossification is almost sure to reduce the apparent neatness of our subject. But neatness does not accompany rapid growth. (Tukey, 1962: 8)
To understand what quantitative geography may offer in the future, we need to take stock. Accumulation of knowledge in quantitative geography is a cyclical process: waves of scholars converge on similar empirical and computational challenges, then follow their own rivulets back to common conceptual understanding. Historical shifts in perspective are tidal; geography is transformed both by individual geographers’ mercurial interests and by shifts in socially or scientifically relevant questions (Johnston and Sidaway, 2015). Beyond continual change, it is erosion of complexity – slowly and progressively by accumulated knowledge – that is the point.
Thus, we present geographical theories as heuristics that quantitative geographers use to simplify reality. Fortunately, ‘accumulated’ knowledge in quantitative geography involves only a few heuristics (Goodchild, 2004). Previous reports in this series have traced them through time. Here, we define their main points.
1 Obeying the law is not the point of the law
Spatial dependence is often explained in terms of Tobler’s Law: Everything is related to everything else, but near things are more related than distant things. (Tobler, 1970: 236)
But Tobler also bends the meaning of ‘law’ (Smith, 2004). In this definition, Tobler’s Law is too vague for most of social science (Merton, 1949). Further, as Gibbons and Overman (2012) argue, our knowledge about geographical processes probably will not benefit from exploring the many notions of ‘near’ that may support Tobler’s Law. The impact of dependence implied by Tobler’s Law often stays the same across these notions (LeSage and Pace, 2014) and is rarely the point of an empirical model.
Instead, Tobler’s Law should be used as Waldo Tobler (1970) intended: to simplify problems. In this spirit, Church (2018) uses the Law to simplify warehouse location problems. Classical methods consider all warehouses when supplying each demand site, so odd decisions are considered but never made, like Detroit, Michigan, supplying Bakersfield, California. Church (2018) only explicitly models ‘near’ facilities, and finds optimal solutions by gradually increasing what is ‘near’ each facility. This is Tobler’s Law enforced, not obeyed. Future work should pursue simplification in this manner, too.
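The simplification described above can be sketched in a few lines, under stated assumptions. This is not Church’s (2018) formulation: it brute-forces a tiny one-dimensional p-median instance, lets each demand site ‘see’ only its k nearest candidate sites, charges a demand site with no open facility in view the distance to its (k+1)-th nearest candidate (a lower bound on its true cost), and enlarges k until the best solution relies on no such lower bounds. The function name, coordinates, and enumeration strategy are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def solve_with_growing_neighbourhoods(coords, p):
    """Pick p facilities (p >= 1) on a line, widening 'near' only as needed."""
    n = len(coords)
    dist = [[abs(a - b) for b in coords] for a in coords]
    # For every demand site, candidate sites ordered from nearest to farthest.
    order = [sorted(range(n), key=lambda j: dist[i][j]) for i in range(n)]

    for k in range(1, n + 1):
        best = None  # (cost, exact, sites)
        for sites in combinations(range(n), p):
            open_sites = set(sites)
            cost, exact = 0.0, True
            for i in range(n):
                near_open = [j for j in order[i][:k] if j in open_sites]
                if near_open:
                    # Nearest open facility lies inside the neighbourhood.
                    cost += dist[i][near_open[0]]
                else:
                    # No open facility in view: charge a lower bound instead.
                    cost += dist[i][order[i][k]]
                    exact = False
            if (best is None or cost < best[0]
                    or (cost == best[0] and exact and not best[1])):
                best = (cost, exact, sites)
        # If the cheapest solution uses no lower bounds, it is optimal overall.
        if best[1]:
            return k, best[2], best[0]


# Two clusters of three sites each; open two facilities.
print(solve_with_growing_neighbourhoods([0, 1, 2, 10, 11, 12], p=2))
# (2, (1, 4), 4.0)
```

In this toy instance the optimum is confirmed once each site considers only its two nearest candidates, so a facility in one cluster never has to be modelled as serving the other. Real instances would hand the restricted model to an integer-programming solver rather than enumerate.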
2 MAUP and the challenge of uncertainty laws
The Modifiable Areal Unit Problem (Openshaw, 1984), and its recent relatives (Kwan, 2018), reflect another kernel of geographical knowledge. The MAUP implies separate ‘scale’ and ‘zoning’ problems (Fotheringham et al., 2000). Neither helps us simplify analyses, but each clarifies assumptions we must make. The ‘zoning’ problem implies that place characteristics are contingent on where place boundaries are drawn. The ‘scale’ problem implies that characteristics of a geographical process are contingent on the scale at which the process is measured.
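A small numerical sketch may make the two problems concrete. The eight values below are invented, and the units are assumed to lie in a row so that a zone is simply a run of adjacent units; nothing here comes from the studies cited. The same individual-level data yield three different correlations depending on whether and how they are aggregated.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def zone_means(values, zones):
    """Average the individual values that fall inside each zone."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Hypothetical attributes of eight individuals ordered along a transect.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two ways of drawing four zones over the same eight individuals.
zoning_a = [[0, 1], [2, 3], [4, 5], [6, 7]]
zoning_b = [[0], [1, 2], [3, 4], [5, 6, 7]]

r_individual = pearson(x, y)  # about 0.905
r_zoning_a = pearson(zone_means(x, zoning_a), zone_means(y, zoning_a))  # 1.0 up to rounding
r_zoning_b = pearson(zone_means(x, zoning_b), zone_means(y, zoning_b))  # about 0.99
```

Moving from individuals to four zones changes the estimate (a ‘scale’ effect), and redrawing the four zones changes it again (a ‘zoning’ effect), although the underlying data never change.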
2.i Building a better MAUP trap
Optimal zoning methods have long shown promise for solving the MAUP. Optimal zoning methods estimate boundaries/places given assumptions about how they should look. This usually solves the ‘zoning’ problem, and the ‘scale’ component is solved separately. Promising work provides mathematical proof of minimum-error aggregations (Bradley et al., 2016), rigorously characterizes the sensitivity of statistics to the MAUP (Duque et al., 2018), builds zones with less internal uncertainty (Spielman and Folch, 2015), or develops general-purpose zones from demographic data (Johnston et al., 2004; Singleton and Spielman, 2014). Zoning algorithms have improved remarkably (Tam Cho, 2018), so this remains promising.
Beyond an ‘optimal’ zoning, we might analyse random samples from a distribution of zoning systems (Tapp, 2019). This would liberate us from any single zoning and provide tests on how unusual observed zones are. However, we must be cautious: Tam Cho and Liu (2018) show that failure to sample uniformly from the unfathomably many possible zoning systems will unpredictably bias even simple statistics.
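The sampling idea can be sketched for a deliberately easy case. Assume units lie in a row and zones must be contiguous; then every zoning corresponds to exactly one choice of cut points, so drawing cut points uniformly draws zonings uniformly. This shortcut does not carry over to two-dimensional zoning, where uniform sampling is the hard part. The data, the ‘observed’ zoning, and the summary statistic are hypothetical.

```python
import random
from math import comb


def sample_zoning(n_units, n_zones, rng):
    """Draw one contiguous zoning of a row of units, uniformly at random."""
    cuts = sorted(rng.sample(range(1, n_units), n_zones - 1))
    bounds = [0] + cuts + [n_units]
    return [list(range(a, b)) for a, b in zip(bounds, bounds[1:])]


def spread_of_zone_means(values, zones):
    """Illustrative statistic: gap between the highest and lowest zone mean."""
    means = [sum(values[i] for i in zone) / len(zone) for zone in zones]
    return max(means) - min(means)


values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]  # invented unit-level data
observed = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [sample_zoning(len(values), 4, rng) for _ in range(1000)]
reference = [spread_of_zone_means(values, zones) for zones in draws]

observed_stat = spread_of_zone_means(values, observed)
# Share of sampled zonings whose statistic is at least as large as observed.
share_at_least = sum(s >= observed_stat for s in reference) / len(reference)

# Number of distinct contiguous zonings in this toy case.
n_possible = comb(len(values) - 1, 4 - 1)  # 165
```

With only 165 possible zonings, this toy case could be enumerated outright; sampling matters when the space is too large to list, which is also when uniformity is hardest to guarantee.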
2.ii Empirical answers to theoretical questions
One source of optimal zoning, Isard’s (1956) ruminations about ‘true’ regions (p. 17), illustrates a tension from Jones (1998). One perspective on scale and zoning is that they define an epistemological frame – a hierarchy in which geographical processes ‘make sense’. Another is ontological, with scale and zoning describing a system of relationships from which independently meaningful contexts emerge. This ontological perspective grounds the half-century search for ‘conchorations’ (Nelson, 2020), where regions found in different processes converge to common boundaries (Isard, 1956: 20).
But, convergence itself does not map useful theoretical concepts like ‘context’ or ‘exposure’ onto empirical zones (King, 1996; Kwan, 2018). This means optimal zoning offers ‘empirical answers to theoretical questions’ (King, 1996: 160). Optimal zones themselves are atheoretical, with no direct link to a specific process or theory about place, context, or exposure. As such, they only ‘fix’ our frame of analysis. They cannot tell us if estimates about this frame make sense.
Therefore, we must accept that the MAUP ‘is not an empirical problem; it is a theoretical problem’ (King, 1996: 163). Re-aggregation is a theoretical act with empirical effects. A new theory about the relevant ‘place’ or ‘context’ is posited by each new aggregation/scale; we should not be surprised that estimates change, too (Petrović et al., 2018a). Thus, we cannot escape the MAUP by drawing more or better areal units: we must make stronger theories about what context, exposure, or place does.
2.iii The future of areal data
Finally, Fotheringham and Wong (1991) suggest we leave the areal unit altogether. ‘Accidental’ data (Arribas-Bel, 2014), our individual digital ‘by-products’ (Kitchin, 2013), may empower us to study people and the effect of their environs. This data is not perfect (Wyly, 2014; Ash et al., 2018), but can be useful with the right theoretical grounding (Shelton et al., 2015: 199). Indeed, bias from non-representative samples (Boeing and Waddell, 2017; Folch et al., 2018; Zhang and Zhu, 2018) or spatial disparity in coverage (Shelton et al., 2014) is increasingly well studied.
However, ‘intentional’ census data is not representative, either. Census units are drawn to facilitate data collection and comparison over time (US Census Bureau, 2007). While Krieger (2006) claims estimates from the 2000 US Census areas are ‘on par with those obtained with individual-level socioeconomic measures’ (Krieger, 2006: 358), recent work disagrees. Census aggregates’ uncertainty varies but is ignored (Spielman et al., 2014). Petrović et al. (2018b) find different probabilities of interaction in the Netherlands comparing individual responses to official aggregates. Fowler et al. (2020) also show the representativeness of 2010 US census data aggregates varies geographically. This does not invalidate ‘intentional’ census data, but shows areal census data requires critical empiricism, too. Thus, the most plausible ‘solution’ to the MAUP may be to leave the modifiable areal unit behind.
III Forces shaping quantitative geographers
Beyond theory, the following social and material forces are changing geographers and their production of geographic knowledge.
1 Reproducibility
Hardly anybody takes data analyses seriously. Or perhaps more accurately, hardly anyone takes anyone else’s data analyses seriously. (Leamer, 1983: 13)
Reproducibility is transforming science. Bollen et al.’s (2015) definition of reproducibility entails two separate but related concepts. The first, reproducibility, means we can ‘duplicate the results of a prior study using the same materials and procedures’ (p. 3). The second, replicability, means we can ‘duplicate the results of a prior study if the same procedures are followed but new data are collected’ (p. 4). Thus, while reproducibility might be attained by code and data sharing, replicability is an epistemological standard of generalization which the theories discussed in previous sections must pass.
Failure to replicate has challenged accumulated knowledge across social science (Ioannidis, 2015; Open Science Collaboration, 2015). Projects like Retraction Watch and PubPeer provide platforms for community critique where bloggers publicly shame offending scholars in hopes of discouraging future malfeasance (Didier and Guaspare-Cartron, 2018: 165–6). Often, these retractions involve replication failures, but Moylan and Kowalczuk (2016) find retractions are still driven by misconduct, which requires structural fixes to prevent (Munafò et al., 2018). Regardless, we should not forget: ethical research can also fail to replicate.
Whether this transfers to geography remains to be seen. Brunsdon (2016) outlines the challenges posed for reproducibility in geography, and Kedron et al. (2020) provide a further elaboration of the theoretical challenges to replication in geography. Further, Retraction Watch shows retractions in a few geography journals, but these are rare and will hopefully remain so. Some welcome ‘disciplining’ of poor practice (O’Loughlin, 2018a, 2018b), but this remains controversial.
If replication remains inhibited by the field’s theoretical issues discussed in Section II, it will remain elusive. For instance, is Church (2018) a ‘replication’ of Tobler’s Law because it proved useful in simplifying a problem? Or, is Tobler’s Law only replicated by distance decays in spatial auto-covariance functions? Regardless, replication entails a new epistemological standard that is changing other branches of science. Quantitative geography will be stronger if it embraces replication, too.
2 Inclusion
One opportunity for replication will come from replicating research across a large and diverse set of places by people with different perspectives. This requires a more inclusive geography in both scope and composition. Our future, in terms of the people that inhabit it and the places it studies, must be broader than our past.
2.i Inclusion as self-awareness
One area where geography can become more inclusive is in its undergraduate intake. Dorling’s (2019) recent reflections on his time as an admissions officer are illustrative: I have appeared to sanction what are clearly discriminatory practices…I have interviewed children at the University of Bristol who were in want of a place to study Geography and, because I was told I should do so, I turned the majority away. […] In reality, I thought that I had the ability to spot the potential in others – how wrong I was! (p. 4)
While Crow and Dabars (2015) and Dorling (2019) write about undergraduate admissions, they have a wider point. Elite institutions make safe bets on students whose life-courses hardly improve from having been admitted. Further, these same institutions make safe hires to consolidate their leads: the best want to be at powerful programs and powerful programs want the best. At no point is there any fundamental interrogation beyond superfluous interview questions: why here, why now, why us? The answer – because this has been the top program for years – remains unspoken.
Despite our many shared beliefs and objectives, a new radical self-knowledge must involve a recognition of the limits of our capacity to know the potential of others. Thus, in absence of perfect knowledge, we should be inclusive: embrace perspectives that are different from our own even if we do not yet know the precise benefit. Quantitative geographers have a lot to learn: we should include new and different voices in our conversations.
2.ii Moving beyond our old bounding box
We also need to extend the bounding box of our domain. There is immense opportunity for a truly global quantitative geography. Replication requires new and different settings in which we test our theories. And, we must not forget that ‘our theories’ includes non-Western geographers with ideas not yet published in English. Most people who benefit from our replicable research will not inhabit the places that currently dominate our journals.
An adjacent field, city science, is instructive here. City scientists study city systems by seeking empirical regularities across cities and, less frequently, structural differences between cities (Brelsford et al., 2018; Boeing, 2019). While structural differences are important, the search for commonality across cities is the target of replication; cities with different histories, cultures, ages, locales, or sizes provide the points at which replication succeeds or fails.
Quantitative geography does not have a coherent subject like city science, so it has been difficult to agree on replication targets. Fortunately, replication will benefit many other parts of quantitative geography’s future, so including more of the world in our science will help us across the board.
3 Focusing conversations on common tasks
We have […] many ‘approaches’ but few arrivals. (Merton, 1949: 458)
Beyond inclusion, quantitative geography requires more direct engagement. A former quantitative geography journal editor argued that replication has not caught on as much in geography because geographers are not paying attention to one another (pers. comm., 2017). We must fix this: paying diligent attention to other scholars and developing or incorporating constructive criticism is difficult but necessary for our future.
One way to focus attention is what Donoho (2017) calls the ‘common task framework’ (CTF). In a CTF, a practical common challenge is identified and teams of scholars compete to provide the best solution. This constructive competition has been a critical driver of progress in data science and machine learning. Further, as Watts (2017) argues, the process of specifying and solving practical problems can invigorate social science, since it forces scholars to find common ground about measures, theories, and methods.
In fact, CTFs exist in some areas of quantitative geography already. Beyond clear examples in spatial statistics (Heaton et al., 2019), Steinitz’s (2012) ‘geodesign’, an ambitious repurposing and extension of GIScience (Goodchild, 2010; Batty, 2013), provides guidelines for successful CTFs in planning (IGC, 2020). For example, the 2020 International Geodesign Collaboration (IGC, 2020) gathers teams of scholars to build solutions to urban planning challenges using common empirical measures (e.g. Millennium Sustainable Development Goals) and cartographic/written styles. While this standardization is not without critique (Wilson, 2015; Elwood, 2006), CTFs gain prominence because they are accessible to many (contra Cressie and Wikle, 2017), are challenging, and have societal value if solved. Thus, a better quantitative geography will involve more attentive conversations, including constructive critique about solutions to common problems.
4 Disabling technologies
The emergence of geographic information science has long been intertwined with the maturation of systems for processing geographical data (Goodchild, 1992). One foundational insecurity of GIScience is that our notation, either intrinsically or through its computational implementation, might affect the questions we think are valuable to answer. Whereas Iverson (1980) argued that computational notation can be more useful than mathematical notation, Gahegan (1999) drew attention to the constraints GIS placed on geographic thinking. He claimed ‘generic solutions’ to data processing problems forced us to ‘adopt impoverished representational and analysis capabilities…in exchange for ditching the Fortran, getting some sleep and producing much prettier output’. Fundamentally, Gahegan’s (1999) concern is about the limits our computational tools place on our thinking; our ‘technical debt’, the accumulated technological and social costs associated with past engineering decisions, may leave us penniless, with no new theory in the bank.
This concern is still with us. Programming in GIScience instruction is on the rise (Etherington, 2016; Bowlick et al., 2017; Arribas-Bel, 2019), underwritten by high-quality spatial analysis libraries (Rey and Anselin, 2007; Bivand et al., 2011) and pedagogy treating ‘code as text’ (Rey, 2009, 2018). This has resulted in community-led scientific infrastructure (Wolf et al., 2019) where international teams contribute to a shared body of implemented knowledge.
As data gets larger and techniques more complicated, specialized high-performance computing frameworks have become more common in cutting-edge geographical analysis. ‘Distributed’ computing libraries, such as Spark (Zaharia et al., 2016), can spread computational load across many computers. Alternatively, ‘tensor’ computing libraries, such as TensorFlow (Abadi et al., 2016) or PyTorch (Paszke et al., 2019), automatically optimize large chains of computations required by machine learning. This is not to say that all quantitative geography requires substantial computing, but many new methods will be inaccessible to geographers without these frameworks. Unfortunately, these are next-generation disabling technologies in two senses.
4.i Platform capitalism
First, these libraries are large corporate-led open source projects. Indeed, the two main tensor computing frameworks are dominated by Google and Facebook. This complicates the political economy of community-led scientific infrastructure, since large companies realized: Sharing, rather than building proprietary code, turned out to be cheaper, easier, and more efficient. This increased demand puts additional strain on those who maintain this infrastructure, yet because these communities are not highly visible, the rest of the world has been slow to notice. (Eghbal, 2016: 9)
4.ii Constraining the possible analyses
In addition, these packages are usually not designed for geographical applications. While these frameworks make it easy to implement complex neural networks, they also enforce representations that make sense to their designers. Since these designers are optimizing for specialized industrial data science and machine learning applications, it can be challenging to build scientifically-useful geographical models.
To illustrate, Singleton and Arribas-Bel (2019) make a case for a new ‘geographic data science’ integrating geography and data science. The extent of integration may take a few forms. First, commodity data science (Singleton, pers. comm., 2020; Maskell, 2019) uses standard algorithms on data ignoring geography. However, the results might be visualized in a geographical way. Second, data science enhanced with geography enriches data with geographical information and analyses it using standard algorithms. Third, explicitly geographic data science harnesses the geographical structure of data to give novel insights.
Explicitly geographic methods are usually not supported or are difficult to optimize in common high-performance computing frameworks. The second ‘enhanced’ form exists at the frontier of geography (de Sabbata and Liu, 2019; Zhu et al., 2020), but is not routine. The third form also requires new foundational work. Thus, a successful quantitative geography must build its own tools or, better still, commandeer frameworks to build a fairer community-owned computational infrastructure.
5 Causality
Singleton and Arribas-Bel’s (2019) geographic data science is visionary. However, the integration they see is not sufficient to define a new domain beyond geography itself. Replacing data science with something else only changes what gets integrated with geography: all quantitative geographic methods exist in a similar hierarchy of integration. This also applies to the burgeoning field of causal inference.
While ‘causal geographies’ are still elusive, geography increasingly enhances standard causal inference methods. One clear case is the regression discontinuity design (RDD), which estimates the causal impact of an intervention applied according to a fixed, exogenous threshold (Imbens and Lemieux, 2008). This threshold is the ‘discontinuity’ separating two groups that are otherwise fundamentally similar.
For geographical problems, boundaries provide this discontinuity: adjacent areas may be demographically similar, but laws change abruptly between jurisdictions. The causal effect of a new law can thus be estimated from boundary discontinuities. Despite controversy (Chen et al., 2013; Pope and Dockery, 2013), geographic RDDs (Keele and Titiunik, 2016) saw early use in the analysis of government policy (Holmes, 1998) and have recently been used to study financial risk (Goetz et al., 2016), crime and policing (MacDonald et al., 2016; Twinam, 2017), electoral turnout (Keele and Titiunik, 2018), and school quality (Gibbons et al., 2013). Because boundaries are intrinsic to administrative/areal data, geo-RDDs remain a causal inference method of choice.
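The logic of a boundary discontinuity can be sketched with synthetic, noise-free data. The sketch assumes a sharp design in which the running variable is signed distance to the boundary (positive inside the jurisdiction with the new law), fits a separate straight line on each side within a bandwidth, and reads the effect as the gap between the two fitted values at the boundary. It is not the estimator of any study cited above; the function names, data, and built-in effect of 2.0 are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def boundary_jump(signed_distance, outcome, bandwidth):
    """Gap between the two fitted lines where they meet the boundary."""
    pairs = list(zip(signed_distance, outcome))
    untreated = [(d, y) for d, y in pairs if -bandwidth <= d < 0]
    treated = [(d, y) for d, y in pairs if 0 <= d <= bandwidth]
    intercept_untreated, _ = fit_line(*zip(*untreated))
    intercept_treated, _ = fit_line(*zip(*treated))
    return intercept_treated - intercept_untreated


# Hypothetical observations at 1 km steps either side of a boundary.
signed_distance = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
# Outcome trends smoothly with distance and jumps by 2.0 where the law applies.
outcome = [10 + 0.3 * d + (2.0 if d > 0 else 0.0) for d in signed_distance]

estimate = boundary_jump(signed_distance, outcome, bandwidth=5)  # close to 2.0
```

The sketch reduces ‘place’ to a single signed distance, so everything else that differs between the two jurisdictions is assumed away; the choice of bandwidth and of distance measure are both inputs the analyst must justify.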
However, geo-RDDs in quantitative political geography exhibit what O’Loughlin (2018b) calls ‘political geometry’. Among other methods, geo-RDDs ‘reject the possibility of contextual effects that complicate the usual socio-demographic or ideological predictors of political behavior’ (O’Loughlin, 2018a, 2018b). Beyond political geography, this means the effect of ‘place’ is modelled using a ‘distance to boundary’ measure. This leaves no room for substantive place-based differences: one is either ‘near’ or ‘far’, ‘in’ or ‘out’. Thus, when results are sensitive to changes in distance metrics, something more than the discontinuity must be at play.
We must approach this carefully. The ‘Cartesian coordinate approach’ setting up the title fight between ‘The Good, the Bad, and the Ugly’ in O’Loughlin (2018b) is only a ‘general orientation towards data’ (Merton, 1949). But, changing a distance metric is a theoretical act with empirical effects. If the distances core to our ‘Cartesian’ approaches are only fungible in theory, then better theory is needed. Indeed, ‘testing theories means correctly estimating the coefficients on specific causal variables’ (Gibbons and Overman, 2012: 186). There is no reason to believe that the various theories implied by different notions of ‘near’ should yield identical causal coefficients. As such, we must redouble our efforts to engage with the theoretical implications of causality in geography. Indeed, for us to ever conduct replicable research, geographers must explicitly define specific concepts, theories, causes, and processes to continue to integrate successfully with other domains.
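A two-point illustration, with invented coordinates, shows how a change of metric changes what counts as ‘near’. The straight-line and city-block metrics stand in here for two different theories of how separation works; neither is endorsed by the sources cited above.

```python
from math import hypot


def euclidean(p, q):
    """Straight-line distance between two points."""
    return hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """City-block distance: travel along the axes only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest(origin, places, metric):
    """Name of the place closest to origin under the given metric."""
    return min(places, key=lambda name: metric(origin, places[name]))


home = (0, 0)
places = {"A": (3, 3), "B": (0, 5)}  # hypothetical locations

nearest_straight_line = nearest(home, places, euclidean)  # "A"
nearest_city_block = nearest(home, places, manhattan)     # "B"
```

Any model that assigns exposure or treatment by nearness would classify this household differently under the two metrics, so the estimates that follow are answers to different theoretical questions.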
IV Conclusion
In this report, we outlined theoretical and social challenges that quantitative geography faces. Broadly, if we are to have a future, we must strive to be
This may manifest in a few ways.
For our theory, we may broaden ideas considered here or develop entirely new heuristics. With a better Tobler’s Law, we might help other domains simplify difficult analytical problems – after all, geographical data is pervasive. With a better theoretical understanding of the MAUP, we become more explicit about how changes in aggregation change the theory of context, place, or exposure within ethically-sourced secure but accessible individual data.
Practically, challenges to open and reproducible quantitative geography will be overcome by strengthening our community. Resisting pressures from large-scale single-party control of computational frameworks will not be easy. Access to open source scientific tools is improving, and the maintenance of this critical digital infrastructure may still return to the commons. Further, successful campaigns to build a more inclusive scientific community need quantitative geographers to play their part in making our own community more inclusive in focus and composition.
Beyond community, replication will still be challenging. We will continue to find success by taking causality seriously, building specific testable theories amenable to replication. With better geographic data, we will need better geographic theory. What does this new geography ‘do’, beyond acting as a proxy for something else? Our aspirations about ‘contextual effects’ should motivate us to seek replicable designs that explicitly theorize what place, context, or exposure do in geographical processes. Alternatively, what assumptions about place, context, or exposure are already embedded in our current notions of ‘distance’ or ‘region’?
In some cases, the effect of geography (be it as Tobler’s ‘space’ or Openshaw’s ‘place’) may boil down to a pure geometric relationship – in or out, near or far. If so, we should seek to replicate the result, and examine where replication fails and why. If not, this simplicity should not be disappointing. It is the erosion of complexity – slowly and progressively by accumulated knowledge – that is the point.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
