Abstract
In this second report on quantitative methods, I consider the foundational triumvirate that underpins the sub-discipline: data, methods, and theory. I argue that, although ‘big data’ and, more recently, ‘big code’ may have captured the limelight, theory is invaluable and should not be disregarded. Showcasing the integral role of theory in quantitative human geography, I identify three kinds of useful theory (old school, enduring and conceptual) and, along the way, highlight beneficial ways in which they contribute to our thinking about data and methods. Throughout, I emphasize that theory is a crucial component of contemporary and future quantitative methods development.
I Introduction
What does theory have to do with quantitative methods? If the two were to publicly declare their relationship, surely the answer would be, ‘it’s complicated’. On the one hand, quantitative methods can be described as mere analytical tools, divorced from theoretical constraints. On the other hand, quantitative methods and quantitative human geography have their antecedents in the quantitative revolution and are linked to a surfeit of theory, although not of the sort that most quantitative geographers would espouse in the 21st century. The relationship is rendered even more complicated, however, when questions around disciplinary identity and perception are considered. Although many might disagree, the stereotype of atheoretical, ‘turning the crank’, methods-driven quantitative analysis certainly exists.
This report seeks to dispel any rumours that may have developed about theory’s rocky relationship with Quantitative Methods. The topic is timely for a couple of reasons. The first is the arrival on the scene of ‘big data’, which has impacted what sorts of research questions are asked and how they are answered, but has also raised important concerns around validity, representation and generalizability. Big data is accompanied by what may be termed ‘big code’ (Rey, 2022): the apparatuses or infrastructures supporting contemporary quantitative research. Big code includes not only the code itself, but also computing platforms and capacities, as well as skills and community. This paradigm shift in how quantitative human geography research is conducted leaves open an obvious window for examining the role for ‘big theory’ in this new landscape.
The second reason for a report on theory relates to how quantitative human geography fits into, and is viewed by, the wider discipline. Although it has been true for a while now that quantitative methods are no longer as tightly associated with the law-pursuing, quantitative revolution geography of the 1950s and 1960s (Johnston et al., 2014; Poon, 2004; Poorthuis and Zook, 2020), there remain dissenters who persist in connecting the big data era in geography to the established limitations of the quantitative revolution. Barnes (2013), for example, writes that, in both instances, ‘techniques and numbers become fetishized, put on a pedestal, prized for what they are rather than for what they do’ (p. 299). And there is no denying that the sub-discipline’s contemporary relationship with theory remains murky, with many suggesting a need for more theoretical consideration, implying a paucity of theory in actually existing quantitative human geography. By evaluating quantitative human geography’s standing where theoretical geography – Big Theory – is concerned, I contend that the sub-discipline is an equal intellectual partner in our discipline.
In this second of three reports, I maintain that theory is as important to contemporary quantitative geography as it is to the rest of human geography and that, within the sub-discipline, it is just as important as methods and information (i.e. Big Code and Big Data). I highlight the longstanding role of theory in quantitative human geography, whilst also underscoring ways in which it is inextricably connected to methods and data in the more recent big data era. To that end, I first provide an overview of big data and big code, before turning to big theory and illustrative examples of current research.
II Big data
We are well into our second big data decade and there is not much new to add that has not already been written. The big picture perspective is captured optimistically by Anderson (2008) and more critically by Boyd and Crawford (2012), with the former prophesying an era of such abundant numbers that they ‘speak for themselves’ and the latter issuing a suite of provocations that foreground the many challenges associated with the production and use of big data. Within geography, Kitchin (2013) and Miller and Goodchild (2015) highlight opportunities and challenges where data-driven geography is concerned. Arribas-Bel (2014) makes a similar contribution on the subject of cities research. There are two important elements of the general conversation that merit elaboration in the context of geographical theory.
The first is the prevailing sentiment that the ‘big data’ era, or revolution, represents much more continuity in the data landscape than is often acknowledged. This is evident in Barnes (2013), when he describes the wider context and development of geographic methods and data from the 1950s up to the present day, and notes that big data suffers the same shortcomings as the quantitative revolution: fetishization of data over real-world importance, loss of valuable context and lack of interpretation (i.e., the risk of letting ‘the data speak for themselves’). By quoting Harvey (1972) on complex theories and methods that fail to elucidate real-world phenomena, Barnes draws a direct line between the 1960s and the big data era of the 21st century. Miller and Goodchild (2015) emphasize data ‘evolution’ and not ‘revolution’, but take a more positive stance, arguing that big data represents less of a step change for geographers, who are already accustomed to dealing with large, messy and complex datasets. Other disciplines have made similar points. For example, Kashyap (2021), writing about the data revolution in demography, notes that new and emerging forms of data bear many similarities to the established (and big and complicated) types of data that demographers have employed for decades.
Second is the inextricable link between data and theory. Contrary to Anderson’s (2008) argument (widely questioned, if not derided) that big data would render both models and theory irrelevant, the utility and value of data, whether big or small, are only as great as their theoretical underpinnings. While it may be true that, with enough numbers, heretofore invisible patterns or groupings emerge, in and of itself, big data is less able to explain causal relationships or underlying processes (Coveney et al., 2016). That is the job of theory, however it is defined. Moreover, the (big) data themselves do not exist in a theoretical vacuum; they are generated by social and political processes and structures, and their interpretation is rarely neutral and cannot claim to be objective (Boyd and Crawford, 2012; D’Ignazio and Klein, 2020; Gitelman, 2013). Making the most of big data requires an engagement with theory.
III Big code
How we work with big data has also recently been the subject of considerable and wide-ranging discussion, including the development of open-source software and methods, inclusive community building, and, especially, the need for tools and practices that facilitate open, reproducible and replicable research (Brunsdon and Comber, 2021; Rey, 2022). In this sense ‘big code’ comprises the various infrastructures – human and machine – that enable contemporary quantitative human geography research. The emerging computational era is also occasionally framed as an analytical paradigm, in which code itself is just one element of a larger approach that includes computation and exploration (Brunsdon, 2016).
The connection between big code and quantitative methods is straightforward. Methods – the sets of operations employed to work with data to answer research questions – are fundamentally code, or programs or applications, that apply a set of analytical procedures to datasets. Whether the methods are from the 1960s, 1980s, or 2020s, their basis is in lines of programming instructions that implement them. Where methods are concerned, big code can also be interpreted broadly to encompass the ways in which big data are managed, wrangled and explored – that is, the myriad analytical steps that necessarily come before the actual analysis, or application, of quantitative methods. The push towards openness of methods is an additional characteristic of ‘big code’. This extends beyond cost (free), to include a general belief that methods are intended to be iterated upon and extended by a broad community of contributors.
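The point that methods are, at bottom, lines of code can be made concrete with a minimal sketch. The example below is a hypothetical illustration (not drawn from any of the works cited here): an unconstrained gravity model of spatial interaction – a staple of quantitative revolution era theory – expressed in a few lines of Python, where the function name, parameters and toy numbers are all invented for illustration.

```python
import numpy as np

def gravity_model(origins, destinations, distances, beta=2.0):
    """Unconstrained gravity model: predicted flow between i and j is
    proportional to (origin mass * destination mass) / distance^beta."""
    O = np.asarray(origins, dtype=float).reshape(-1, 1)       # origin masses (column)
    D = np.asarray(destinations, dtype=float).reshape(1, -1)  # destination masses (row)
    dist = np.asarray(distances, dtype=float)                 # distance matrix
    return O * D * dist ** (-beta)                            # flow matrix T_ij

# Two origins, two destinations, and a simple distance matrix:
# flows attenuate with distance, echoing distance decay and Tobler's Law
T = gravity_model([100, 200], [50, 150], [[1.0, 2.0], [2.0, 1.0]])
```

A 1960s method and a 2020s method differ enormously in sophistication, but both reduce, in the end, to instructions of this kind – which is precisely why the openness and shareability of code matters for methods.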
Big code is also intrinsic to the development and testing of theory. This applies to the ‘soft’ elements of big code, where openness and community are concerned: robust theory that works to explain a complex world requires input from a broad and diverse range of experts working collaboratively to build models. It also applies to the core elements of big code, which ideally enable a range of researchers to replicate studies across a range of settings and also reproduce results using the publicly available underlying code (i.e. the methods applied as well as decisions and assumptions made about the data). In this sense, the ‘replication crisis’ can be viewed as a theory crisis – how do we know how the world really works if our results are different each time? As Sui and Kedron (2021) put it, in the context of geography, being able to generalize results (which is what replication and reproducibility accomplish) is a key feature of quantitative geography. Sui and Kedron point to the need for reproducibility and replication in geography if it aspires to be a law-seeking discipline. That, however, is the extreme case: even the testing of a single theory or hypothesis requires replication and reproducibility.
IV Big theory
Before embarking on a discussion of how theory has mattered, and continues to matter in quantitative methods, it is worth clarifying what I mean by ‘theory’. Although researchers from across quantitative human geography assert the importance of theory (e.g., Johnston et al., 2014; Kedron and Holler, 2022; Oshan et al., 2022; Poorthuis and Zook, 2020), the terminology is slippery. In some cases, theory can be read as short for social theory (Poorthuis and Zook, 2020), while at other times, it appears to refer to concepts more than processes (Oshan et al., 2022; Poon, 2004). At still other times, in the context of quantitative methods, it appears to include mainly law-seeking, quantitative revolution era theory (Sui and Kedron, 2021). A flexible (and modern) definition of theory is this: an explanation of how some aspect of the world works, or a framework for approaching questions that ask ‘why’. This explanation may be testable (and certainly quantitative human geographers are often – but not always! – seeking to test their ideas of how the world works). It may also endeavour to explain what happens to Y, if X changes, and what the mediating factors may be.
Applying this definition, I identify three main types of theory that are relevant to contemporary quantitative methods in the big data and big code era. The categories are not hard and fast – or even discrete – but are more akin to fuzzy sets. They are old school, enduring, and conceptual theory. Below I discuss each in turn, before summarizing with a few examples of how big theory is implicated in recent quantitative human geography research.
1 Old school theory
Harvey (p. 3, 1972) sums up the quantitative revolution as ‘distance decay function, the threshold and the range of a good, and the measurement of spatial pattern’. Although we are now half a century beyond this assessment, it remains quite common for discussions of geographic theory in a quantitative methods setting to be framed in terms of 20th century narratives around debates between geography as nomothetic or idiographic (Johnston et al., 2014; Kedron and Holler, 2022; Sui and Kedron, 2021), or as a process culminating in, and following, the quantitative revolution (Barnes, 2013).
What is certainly true is that there is very little contemporary quantitative human geography which imposes rigid constructs such as an isotropic or featureless plain, a uniform distribution of population, or a rational decision maker. Central Place Theory, industrial location theory, and fundamental retail geography theories, such as the Hotelling model or Reilly’s ‘law’ are seldom the full theoretical foundation for research. That is not to say that classic theory is no longer taught, or that no stylized facts have emerged from this body of research. On the contrary, there is a great deal from the 20th century that still provides an underpinning to much of the analysis in the sub-field. But quantitative geography has long moved past the search for universal laws or even the idea that data and methods are wholly objective. To reiterate an argument I have made elsewhere (Franklin, 2021), it is occasionally startling that the sub-discipline continues to be defined in historical terms, relative to the quantitative revolution, and rarely on its own contemporary merits.
2 Enduring theory
That said, there are definitely elements of 20th century quantitative geographical theory that continue to resonate. For instance, distance decay and spatial interaction are still with us. These are examples of ‘enduring’ theories. The fundamental idea that interaction intensity and relatedness attenuate with distance provides a theoretical basis for research but also informs the construction of methods and choice of data. Tobler’s Law is always with us (Tobler, 1970). On the enduring topics of spatial structure and spatial interaction, Oshan (p. 926, 2021) in this journal writes, ‘how spatial structure is conceptualized and operationalized in SI [spatial interaction] models has implications for how we build knowledge about SI phenomena’. This highlights that an important component of enduring theory is its tight and bi-directional integration with methods. Theory that describes how accessibility, interaction, clustering or location (as just a few examples) operates requires methods that are able to capture and model these complex phenomena. No one is more aware than the researchers themselves of the danger that a mismatch across method, data and theory will obfuscate actual relationships or, worse, indicate relationships or processes that do not in fact exist.
In addition, there has been a qualitative shift in the enduring theories that frame current quantitative research in geography (as well as geographical research in other disciplines). Johnston et al. (2014) make a convincing case that ‘place’ has replaced ‘space’ as the organizing theoretical concern in quantitative human geography. Context is king and hypothesized to be a key driver for a range of individual-level outcomes, from health to social and economic mobility (Kwan, 2012). Whether referred to as ‘neighbourhood effects’, ‘spatial opportunity’, or the ‘geography of opportunity’ (Galster, 2012; Galster and Killen, 1995; Galster and Sharkey, 2017; Knaap, 2017), the theoretical link between local context and socio-economic outcomes is possibly the dominant quantitative human geography social theory, even if much of it originates from outside the discipline of geography. Fowler and Jensen (p. 1396, 2020) make a similar point in the opening sentence of their article on defining labour market areas, stating that, ‘A fundamental tenet of geographic research is that outcomes are shaped by geographic context’.
Enduring theories may also be characterized by their attention to outliers and exceptions. Where, arguably, ‘old school’ theories were preoccupied with a need for universality or ‘law’ status – general rules that hold everywhere and for all – the nature of contemporary quantitative human geography is often to seek out the ways in which theory does not perform as expected because it is here that valuable insights into behaviour, preferences and interactions can be generated. This, to my mind, is one distinguishing characteristic of enduring geographical theory, as applied to quantitative methods.
3 Conceptual theory and theoretical concepts
In quantitative methods, a great deal of theorizing takes place at the building block, or conceptual, level. This is partly because, to develop theory about processes, interaction, or context, the spatial units or locations experiencing the process, doing the interacting, or performing the role of contextual container, must be clearly defined. It is also the case that higher-level theoretical relationships will vary, depending on how the foundational concepts have been theorized. Moreover, the implementation of models and methods, and how results are interpreted, are heavily reliant on this initial conceptual theorizing.
The modifiable areal unit problem (or, MAUP) provides a classic example (Openshaw and Taylor, 1979). The modifiable areal unit problem is not a theory, and yet a large and growing body of research has established the importance of boundary and zonation choices in geographical analysis and, in fact, in any research that employs geographical units as an input or basic construct. Furthermore, quantitative geographers wrestle with these effects in a way other parts of the discipline might wrestle with theory. The measurement of local, or neighbourhood, context provides an additional example, where alternative approaches to conceptualizing and operationalizing context have drawn extensive attention (Fowler et al., 2020; Kwan, 2012).
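The force of the MAUP is easy to demonstrate with a toy example (hypothetical, and far simpler than the settings studied by Openshaw and Taylor): the same underlying point values, aggregated under two different zonings, yield entirely different zonal statistics.

```python
import numpy as np

# Eight locations along a line, each with an observed value
values = np.array([1, 9, 1, 9, 1, 9, 1, 9], dtype=float)

# Zoning A: four zones, each grouping two neighbouring locations
zoning_a = [values[i:i + 2] for i in range(0, 8, 2)]
# Zoning B: two zones that partition the same locations differently
zoning_b = [values[[0, 2, 4, 6]], values[[1, 3, 5, 7]]]

means_a = [zone.mean() for zone in zoning_a]  # every zone averages 5.0
means_b = [zone.mean() for zone in zoning_b]  # zones average 1.0 and 9.0

# Identical data, different boundaries: between-zone variance is zero
# under zoning A but large under zoning B
print(np.var(means_a), np.var(means_b))
```

Nothing about the underlying world has changed between the two zonings; only the analyst’s choice of units has – which is why zonation decisions are themselves a site of conceptual theorizing rather than a neutral preliminary.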
Similar conceptual theorizing is evident in recent reflections on spatial scale (Oshan et al., 2022; Poon, 2004; Poorthuis and Zook, 2020) and in time geography (Miller, 2005). On the subject of geographic network analysis, there has also been discussion about the meaning of edges and nodes and what is captured or missed with various definitions (Gibadullina, Bergmann, and O’Sullivan, 2021; Uitermark and Van Meeteren, 2021). In the same special issue, Derudder (2021), still on the subject of networks, reflects on the implications of conceptualizing cities as discrete units. The Geographically Weighted Regression (GWR) community, which typically focusses on model implementation and extension, has recently addressed the importance of modelling choices matching theoretical expectations (Comber et al., 2022; Wolf, 2022). On land use and land cover change, Comber and Wulder (2019) highlight the importance of attending to underlying process, or theory, alongside ‘big data analysis’. Shelton and Poorthuis (2019), on neighbourhood definition, even include a section of their paper titled, ‘Theorizing the Neighborhood’. Across all the above examples, a common and distinctive feature is the explicit and implicit connection of theory and concept to method development – big code, big data, and big theory, all working in concert.
4 Big theory as Goldilocks theory
Sayer (1984) writes that data and methods do not exist independently of theory and conceptualization and that the choices we make about our methods and data should reflect underlying theories, or hypotheses about how the world works. Bigger data does not render this insight obsolete; if anything, the need for theory is even greater. Complex data requires robust theory to understand the mechanics of underlying processes, but, in addition, it is imperative that we acknowledge the myriad ways in which theory is embedded in data generation processes as well as methods development and application. Big theory is ‘Goldilocks’ theory and comes in all sizes – big, small and bespoke – and includes, especially, the just-right theoretical framework, often embedded within conceptual and methodological decisions, that provides the motivational glue that holds data, methods and research questions together. Big theory is also the natural helpmeet of what Harris (in Johnston and Harris, 2022) refers to as ‘pragmatic quantitative geography’ – a more self-aware sub-discipline which is ‘reflectively conscious of its own limitations but careful not to throw the baby out with the bathwater’ (p. 7).
A great deal of current research in quantitative human geography successfully marries big theory and pragmatic quantitative geography. Some is the conceptual theory research discussed above. Other examples, especially in segregation and neighbourhood context are covered in my first report (Franklin, 2022). Indeed, good examples abound from across the entire spectrum of quantitative human geography. For example, Robinson et al. (2019) employ local methods to conceptualize and measure household energy vulnerability, leveraging the strength of local statistics for understanding a phenomenon that is highly spatially variable. Employing a spatial optimization approach, Robinson et al. (2022) evaluate equity of coverage in smart city infrastructures. Looking at neighbourhood change, Delmelle (2022) emphasizes the importance of underlying processes and the need for methods that can adequately measure the nature and scope of these processes. Starting from a blank slate (but not an isotropic plain!), Poorthuis et al. (2022) and Shelton and Poorthuis (2019) ask theoretical questions about the nature of gentrification and neighbourhoods, respectively, using Twitter data to evaluate their conceptualizations: big data, big code and big theory.
V Conclusions: There’s something about theory
A hallmark of academic disciplines is gatekeeping: deciding not only what merits study, but by whom and how. Many of these gates are widening within quantitative human geography and this is to be welcomed. A defining feature of ‘big code’, after all, is precisely the expansion of access and the incorporation of more voices into the methodological conversation, both in development of methods and in their application. Big data offers similar potential, in that the increase in timeliness and variety of information should increase the range of questions that can be asked (and answered), and from a diversity of perspectives and temporal and spatial scales.
The theory gate, however, remains narrow for quantitative human geographers, for reasons external and internal to the sub-discipline, but not inherently due to an underlying paucity of theory. On the contrary, quantitative methods are big with theory, situated across a historical context of ‘old school theory’ and enduring, evergreen theory, and an ever-growing base of conceptual theory that engages with sticky concepts such as neighbourhood, context or scale.
Big theory – the myriad ways in which quantitative researchers wrestle with explaining the world – is ready for the opportunities that await but faces three ‘big’ challenges. The first is communication – the need for a vociferous articulation on the part of quantitative human geographers to the rest of the discipline that their theory is real theory and that it can engage beneficially with other varieties of theory, including social. The second is epistemological and ontological inclusivity, which as a discipline we should all be practicing. There is no one single way to do theory.
Third, and lastly, as many others in the field continually exhort, quantitative methods require active and thoughtful engagement with theory. An asymptotic approach to understanding the world, even a small piece of it, requires data and methods, but also theory. And these are exciting times: one of the great promises of big data is the potential to uncover not only heretofore shadowed phenomena, but also those that are newly emergent. Theory has always been important, but, in a big data and big code era, assumes additional significance, as it is theory that holds the entire assemblage together. Against a backdrop of methodological change (big code) and the appearance of new and emerging forms of information (big data), the linchpin role of theory – big, small and just right – has never been more important.
Acknowledgements
A big thanks, as always, to those who listened to my ideas, helped sharpen them, and provided essential feedback. There is no big anything in research without community. In particular, thanks this time around to Jessa Loomis and Danny MacKinnon for numerous hallway and pub conversations, and to Steven Manson, David O’Sullivan and Caitlin Robinson for comments and suggestions on earlier drafts.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
