Abstract
Predictive policing has become a new panacea for crime prevention. However, we still know too little about the performance of computational methods in the context of predictive policing. The paper provides a detailed analysis of existing approaches to algorithmic crime forecasting. First, it is explained how predictive policing makes use of predictive models to generate crime forecasts. Afterwards, three epistemologies of predictive policing are distinguished: mathematical social science, social physics and machine learning. Finally, it is shown that these epistemologies have significant implications for the constitution of predictive knowledge in terms of its genesis, scope, intelligibility and accessibility. It is the different ways future crimes are rendered knowledgeable in order to act upon them that reaffirm or reconfigure the status of criminological knowledge within the criminal justice system, direct the attention of law enforcement agencies to particular types of crimes and criminals while blanking out others, satisfy the claim for the meaningfulness of predictions or break with it, and allow professionals to understand the algorithmic systems they are supposed to rely on or turn them into a black box. By distinguishing epistemologies and analysing their implications, this analysis provides insight into the techno-scientific foundations of predictive policing and enables us to critically engage with the socio-technical practices of algorithmic crime forecasting.
Predictive policing has become a new panacea for crime prevention. More and more law enforcement agencies take their decisions based on algorithmic crime forecasts, often in response to budget cuts and a growing pressure to increase both the efficiency and the objectivity of criminal justice. Broadly speaking, predictive policing makes use of information technology, data and analytical techniques in order to identify likely places and times of future crimes or individuals at high risk of becoming a future (re)offender or victim (Perry et al., 2013; Uchida, 2014). 1 Meanwhile, there is a great variety of software solutions to perform these tasks, ranging from departments’ in-house developments to commercial off-the-shelf products.
The growing importance of predictive policing in crime prevention strategies is accompanied by considerable hype both in media and law enforcement discourse (Brayne et al., 2015).
These practical, material and discursive developments have sparked widespread scholarly attention and have motivated various lines of investigation. There are debates on the effectiveness (Gerstner, 2018; Hunt et al., 2014; Mohler et al., 2016; Saunders et al., 2016) and the discriminatory consequences (Angwin et al., 2016; Brantingham et al., 2018; Lum and Isaac, 2016) of predictive policing practices. It has been analysed how predictive policing is transforming police organizational practices (Brayne, 2017; Egbert and Leese, 2020), how it blurs regulatory boundaries and violates legal provisions (Ferguson, 2012; Završnik, 2019), and how it emerged from a historical background of scientification and technologisation of the criminal justice system (Wilson, 2019b). 2 Moreover, numerous studies stress that predictive policing forms part of a general paradigm shift in social control by enabling ‘near-real-time decision-making’ (Aradau and Blanke, 2016), ‘stochastic governance of populations’ (Sanders and Sheptycki, 2017) and pre-emptive security measures (Andrejevic, 2017; Mantello, 2016; van Brakel, 2016).
However, we still know too little about the performance of computational methods in crime forecasting. Most importantly, we lack a systematic account of the differences in the work of the algorithms that make sense of data, model reality and generate predictions. As Perry et al. (2013) and Kaufmann et al. (2018) have shown, there is not one dominant way but a plurality of ways to produce crime predictions. Hence, we need to look behind the over-generalized talk of 'algorithms' and 'computational methods' and attend to these different ways of producing predictions, for they matter for: the relevance that is ascribed to subject-matter theories; the limits of prediction that are set; the general explanations of crime that are given; and the conditions for implementing algorithmic accountability.
By distinguishing epistemologies and analysing their implications, this paper provides insight into the techno-scientific foundations of predictive policing. While certain implications of algorithmic crime forecasting have been addressed in other studies before (particularly with regard to ML, see: Chan and Bennett, 2016), they have not yet been analysed in a systematic and comparative manner.
Just as any other algorithm, algorithms designed for the purpose of crime forecasting are embedded in wider socio-technical practices and work in contextual and contingent ways. Many of the predictive policing software solutions that are currently in use combine at least two of the three epistemologies. Moreover, the use of this software is regularly embedded in processes of human-centred formation of judgement and sense-making that may follow their own logics. This starts with the collection and preparation of data and ends with the interpretation of visualisations of predictions. Finally, crime forecasting is just one element within a comprehensive process of crime prevention ranging from the specification of objectives and requirements to the specific actions of police officers on duty. Thus, predictive policing – understood as a complex socio-technical practice – amalgamates different forms of knowledge production, different rationalities and different valuations (Kaufmann, 2017). Keeping in mind this ‘messy’ reality, an essential part of the critical endeavour to unpack the complexity of predictive policing is to understand its underlying epistemologies of prediction. The following analysis allows us to account for the different ways future crimes are rendered knowledgeable in order to act upon them. Moreover, it can serve as a heuristic for future investigations into the production and application of predictive policing software.
Predictive policing and predictive models
When data about the past is used to predict events in the future, generally, a predictive model is involved. This applies to crime forecasts as much as to any other prediction based on statistics or data. Mathematical models strive for the formalisation of relationships between different entities or variables within a given set of data. The variables that form part of the mathematical equations of a model are its parameters. When the relationships between these variables are stochastic (that is, non-deterministic), the model is a statistical model. While statistical models, in general, are used to generate or test causal explanations and to describe a data sample (that is, to describe the probability distribution in a mathematical space), a statistical model is used as a predictive model when its purpose is to predict the value of a dependent variable from the value(s) of the independent variable(s). Forecasting is the process of applying this predictive model to estimate the future value of a random variable (a value that cannot be known in advance).
There are approaches to forecasting crime that are not based on a predictive model (Groff and La Vigne, 2002: 32). Forecasts relying on univariate methods that simply use previous values of one variable to predict its future value do not require a model per se (Gorr and Harries, 2003). As in a certain type of retrospective hotspot policing (Townsley et al., 2000), predictions about future crimes are based on the assumption that given spatial concentrations of crime in small geographical areas, so-called hotspots (Sherman et al., 1989), will persist and therefore indicate future crime occurrence (Kennedy et al., 2011: 340). However, predictive policing differs from merely summarising historic risk, which ‘equates to making the assumption that crime will happen only where it has in the past’ (Bowers and Johnson, 2014: 569). Instead, it ‘explicitly models change over time, often relying on evidence of statistically broader geographical impact of a single crime event’ (Bennett and Chan, 2017: 3).
Predictive policing makes use of crime forecasts based on a predictive model relying on multivariate methods that use current and past values of independent variables to predict the future value of the dependent variable (Bowers et al., 2004; Groff and La Vigne, 2002). As one of the most popular modelling techniques used for crime forecasting, regression analysis represents statistical relationships between a definite number of covariates or independent (explanatory) variables and a single dependent variable representing the feature of interest that is to be predicted. The selection of a regression model will directly affect the type of the dependent variable and vice versa. Logistic regression enables the estimation of the probability of a certain event occurring in a certain geographical area or location, whereas linear regression enables the prediction of future volume of crime within this spatial unit (Hunt et al., 2014).
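To make this distinction concrete, the following minimal sketch fits both model types to synthetic data for a set of hypothetical grid cells; the covariates, outcomes and library calls are illustrative assumptions, not a reproduction of any deployed system.

```python
# Minimal sketch (illustrative only): logistic vs. linear regression as
# predictive models for a spatial unit, using synthetic example data.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)

# Hypothetical covariates per grid cell: prior burglary count, number of
# bars, distance to the nearest transit stop (all invented for illustration).
X = rng.random((200, 3))
occurred = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, 200)) > 0.9  # binary outcome
volume = 3 * X[:, 0] + X[:, 1] + rng.normal(0, 0.3, 200)              # count-like outcome

# Logistic regression: probability that at least one incident occurs in a cell.
clf = LogisticRegression().fit(X, occurred)
p_incident = clf.predict_proba(X[:5])[:, 1]

# Linear regression: expected crime volume in the same cells.
reg = LinearRegression().fit(X, volume)
expected_volume = reg.predict(X[:5])

print(p_incident, expected_volume)
```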
However, the statistical model of choice does not by itself determine the variables of the mathematical equations. Model specification requires the selection of independent variables that represent relevant aspects of the subject-matter problem. Selecting these variables means choosing a convenient set of parameters of the model; this process is hence called parametrisation.
Variable selection and parametrisation of a predictive model can proceed in many different ways. The three epistemologies of predictive policing I am going to present in the following are three distinct ways to construct and specify a predictive model.
Mathematical social science
The first epistemology of predictive policing is based on the concept of mathematical social science (MSS). In this strand of social science, mathematical models are used to both create and test explanations of social behaviour and, at least in some cases, to predict social behaviour. Accordingly, the first approach to predictive policing translates explanatory knowledge about criminal behaviour into the mathematical formula of a predictive model. This approach can be empirical and may follow sociological modes of inquiry (Chan and Bennett, 2015: 27). For example, Bowers et al. (2004) conducted interviews with offenders in order to propose a model for prospective hot-spotting based on past burglary events. The empirical knowledge gathered through the interviews was then translated into a predictive model which in turn was to be tested regarding its predictive value through the analysis of crime data (Chan and Bennett, 2015: 27). While theory was instructive for the selection of interviewees, the construction of questionnaires, the analysis of the interview data and, finally, the translation of empirical insight into mathematical formulas, there are ways of predictive modelling that rely on theory much more explicitly.
One of the most common theory-driven approaches to crime forecasting is based on rational choice theory and routine activities theory. Both theories make certain assumptions about the meaning of criminal acts and the reasoning of the human agent performing these actions. In general, crime is seen as purposive, and the criminal offender is regarded as a rational and self-determining being: criminals would seek to benefit themselves when committing a crime, and therefore certain considerations and perhaps even calculations would be involved before committing a criminal act (Becker, 1968; Clarke, 1997; Cornish, 1986). Moreover, according to routine activities theory, the decision-making of potential offenders would take place with regard to certain promising or unpromising offending opportunities of a given situation – typically a setting where (a) motivated criminals, (b) potential targets (victims or their property) and (c) the absence of capable guardianship converge in time and space (Brantingham and Brantingham, 1978; Cohen and Felson, 1979; Felson and Cohen, 1980). Routine activities come into play since they: ‘bring together at various times of the day or night persons of different background, sometimes in the presence of facilities, tools or weapons which influence the commission or avoidance of illegal acts’ (Cohen and Felson, 1979: 591). For predictive model specification, these criminological theories offer guidance for the selection of independent variables and parameters – namely factors that influence criminal reasoning such as the presence of desirable targets and an environment that offers promising offending opportunities (Groff and La Vigne, 2002: 32). One of the most influential applications of this approach to predictive modelling is the forecasting of near repeat burglary. The term ‘near repeat’ refers to the well-documented observation that, once a burglary has occurred, the burgled home and its close neighbours face a temporarily elevated risk of further burglaries.
While criminology and crime prevention have long focused on the (ascribed) criminogenic nature of individuals, environmental criminology offered a way to focus on the criminogenic features of a social environment instead, both in research and practice (Cohen et al., 2007: 106). Among the various approaches of environmental criminology, risk terrain modelling (RTM) particularly enables the predictive modelling of spatial crime patterns (Caplan et al., 2011). RTM takes into account multiple physical and social characteristics of the environment that influence how crime emerges, concentrates and evolves (Kennedy et al., 2011: 340). According to its proponents, ‘the risk of crime in places that share criminogenic attributes is higher than other places as these locations attract offenders (or more likely concentrate them in close locations) and are conducive to allowing certain events to occur’ (Caplan et al., 2011: 377). The basic methodological approach of RTM is to conceptualise criminality as a function of the dynamic interaction between social, physical and behavioural factors that occurs at places (Kennedy et al., 2011: 342), then to measure and weight what Brantingham and Brantingham (1995) called ‘crime generators’ and ‘crime attractors’ and eventually to identify the most opportune places for offenders to commit crimes (Kennedy et al., 2011: 341). However, those risk factors need to be identified through a meta-analysis of empirical studies, literature review, professional experience and practitioner knowledge in the first place (Caplan et al., 2011; Kennedy et al., 2011: 341). Sufficiently informed, RTM assigns an ordinal value to every place throughout an area of interest according to the attributed presence (or absence) of those risk factors. As the result of a modelling process and by using geographic information system software, a risk terrain map of an area of interest will include composite risk values for every place (defined by equally sized cells) within this area, accounting for all risk factors previously identified as crime generators or crime attractors (Kennedy et al., 2011: 343). Finally, those risk values can be used to predict where certain types of crime will occur (Caplan and Kennedy, 2014: 1684). RTM analysis can be done manually or automatically by using the corresponding software solution RTM Diagnostics. 4 In the latter case, the system is fed with crime data for a specific locale along with other data about the physical environment. Based on given weights, the system then uses this data to compute the probability that a new crime incident will occur nearby in the near future, providing a tabular or visual output.
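A schematic illustration of this layering-and-weighting logic is sketched below; the grid size, risk factors and weights are invented for the purpose of illustration and do not reproduce the RTM Diagnostics software.

```python
# Illustrative sketch of the basic risk terrain modelling (RTM) logic:
# rasterise an area into equally sized cells, mark the presence of weighted
# risk factors per cell, and sum them into a composite risk value.
# The factors and weights below are invented for illustration.
import numpy as np

grid = (50, 50)  # area of interest divided into 50 x 50 cells
rng = np.random.default_rng(1)

# Binary presence layers for hypothetical crime generators/attractors.
layers = {
    "bars":        rng.random(grid) < 0.05,
    "bus_stops":   rng.random(grid) < 0.10,
    "vacant_lots": rng.random(grid) < 0.08,
}
weights = {"bars": 3.0, "bus_stops": 1.0, "vacant_lots": 2.0}  # assumed weights

# Composite risk value per cell, accounting for all risk factors.
risk_terrain = sum(weights[name] * layers[name].astype(float) for name in layers)

# Rank cells: the highest composite values indicate the most opportune places.
flat_idx = np.argsort(risk_terrain, axis=None)[::-1][:10]
top_cells = np.column_stack(np.unravel_index(flat_idx, grid))
print(top_cells)
```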
Social physics
The second epistemology of predictive policing is based on the concept of social physics (SP). The origins of this concept can be traced back to the early 19th century and the works of Henri de Saint-Simon and Auguste Comte and their faith in the ‘physics of society’ (Ball, 2002: 5). The social physicists were driven by the belief that physical processes and human behaviour are governed by the same principles. Therefore, the ‘laws, theories, and models of physics applied as much to social as to natural worlds’ (Barnes and Wilson, 2014: 2). Inspired by the physical theories of matter as comprised of atoms and molecules that are moving randomly but can be described by mathematical laws, a scientific endeavour emerged that sees societies as comprised of individuals who are characterized by randomness and idiosyncrasy but are predictable on the collective scale through statistical analysis. The inquiry into the laws of criminality was one of the first manifestations of this endeavour. In the 1820s, Adolphe Quetelet applied statistical analysis, mainly used in astronomy at that time, to large data sets of crime figures to gain insight into relationships between crime and social factors such as age, gender and education.
While SP remained a vital field of investigation throughout the 19th century and the decades after WW2, it has received a significant boost with the advances in statistical physics, network science, complex systems science, evolutionary game theory and computational social science. Since the end of the 20th century, statistical methods and models from these disciplines have been applied to a wide range of social phenomena such as economics, traffic flow, pedestrian motion and voting (Ball, 2002: 2; Perc, 2019). Insights from various scientific fields offered resources for modelling and predicting criminal behaviour as well (D’Orsogna and Perc, 2015; Groff et al., 2019).
One particular model found its way into the work routines of police forces all over the US and other countries. Jeff Brantingham and George Mohler, co-founders of PredPol, 5 the self-claimed market leader in predictive policing, proposed to adapt an epidemic-type aftershock sequence (ETAS) model to predict a variety of crimes (Mohler et al., 2011: 105). The origins of ETAS models go back to population genetics in epidemiology (Wyss et al., 1999: 486). In seismology, they are used to analyse seismic activity as an interaction of physical events which may trigger a cascade of earthquakes (Benbouzid, 2019b; Mohler et al., 2016: 1400). Diverting an ETAS model for crime forecasting is a classic example of SP. While the model was constructed to formalise cause and effect relationships within the natural world, it is adapted to formalise cause and effect relationships within the social world. Thus, by applying an ETAS model to crime forecasting, PredPol is ‘completing a circle whose trajectory commenced centuries previously’ (Ball, 2002: 2).
However, this translation from the physical to the social world only becomes feasible against the backdrop of a certain understanding of crime. While generally assuming that criminal events arise out of interactions between environmental conditions and situational decision-making (Mohler et al., 2016: 1399), Brantingham and Mohler specifically aim at forecasting near repeat victimisation (Mohler et al., 2016: 1400). Instead of drawing directly from the boost hypothesis, however, their approach stands in the tradition of a research strand that addresses near-repeat victimisation by using epidemiological methods for the study of infectious diseases (Reingle Gonzalez, 2015). 6 In an influential study on ‘infectious burglaries’ (Townsley, 2003), the authors apply a statistical model for spatial-temporal clustering which is widely used by epidemiologists. The authors justify this methodological approach by stating that ‘victimisation can be “passed” from victim to victim in a similar way to that which occurs in diseases’ (Townsley, 2003: 618). Contagion models have also informed the study of violent crime. Papachristos (2009) argues that murders spread through social networks following an epidemic-like process of social contagion. Green et al. (2017) analyse gunshot violence according to an epidemiological contagion model in order to predict individuals at high risk for involvement in future gun violence. This work inspired the controversially discussed strategic subject list (SSL) (often referred to as ‘heat list’), used by the Chicago Police Department to predict likely perpetrators or victims of homicide. 7 Accordingly, the SSL is based on the probability of a person becoming a homicide victim or perpetrator. This risk is calculated based on the number of co-arrests with previous homicide victims, both directly (co-arrested with someone who later became a homicide victim) and indirectly (co-arrested with someone who, in turn, was co-arrested with someone who later became a homicide victim) (Saunders et al., 2016: 354). Following the same line of argument, the developers of PredPol speak of criminal events as contagious: ‘As events occur, the rate of crime increases locally in space, leading to a contagious sequence of “aftershock” crimes’ (Mohler et al., 2016: 1402).
Mathematically, this contagious process of near-repeat victimisation can be modelled as a self-exciting point process (SEPP). A point process is a classical approach used in statistics and probability theory to model the distribution of a set of events (points) in a mathematical space. In a point process, the distribution of these events in time and space is the result of a stochastic process. A point process is self-exciting if the occurrence of an event in the past makes the occurrence of future events more likely. Therefore, a SEPP can be modelled without drawing on external explanatory variables. This approach has been used to analyse spatiotemporal data in various disciplines. In seismology, ETAS models have been developed and expanded to capture spatiotemporal aftershock triggering as a SEPP (Reinhart, 2018: 311). Starting with the assumption that the contagious nature of criminal events is a sufficient condition to model them without drawing on external variables, the PredPol developers introduced the mathematical concept of a SEPP and, more specifically, ETAS models to crime forecasting (Benbouzid, 2019a: 122). Thus, they were able to model the mechanisms that drive the emergence and diffusion of various crime types, both property and violent crime, without any further assumptions about the decision-making of criminals and without information on given environmental conditions. From this methodological standpoint, an ETAS model reflects the dynamics of criminal activities as much as it reflects the dynamics of seismic activities or epidemic outbreaks.
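The core of this logic can be written down compactly. The sketch below computes the conditional intensity of a purely temporal self-exciting point process, in which each past event adds an exponentially decaying boost to the background rate; the parameters are illustrative, and operational ETAS crime models are spatio-temporal and estimated from data rather than set by hand.

```python
# Minimal one-dimensional sketch of a self-exciting point process (Hawkes/ETAS
# style): past events temporarily raise the conditional intensity, i.e. the
# expected rate of new ("aftershock") events. All parameters are illustrative.
import numpy as np

mu    = 0.2   # background rate of events per day
alpha = 0.5   # expected number of "aftershock" events triggered per event
omega = 1.0   # decay rate: how quickly the boost fades (per day)

event_times = np.array([1.0, 1.5, 4.0])  # hypothetical past incidents (in days)

def intensity(t, events):
    """Conditional intensity: mu + sum over past events of
    alpha * omega * exp(-omega * (t - t_i))."""
    past = events[events < t]
    return mu + np.sum(alpha * omega * np.exp(-omega * (t - past)))

for t in (1.1, 2.0, 10.0):
    print(t, round(float(intensity(t, event_times)), 3))
# Shortly after an incident the rate is elevated; long after, it decays back
# towards the background rate mu.
```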
Machine learning
The third epistemology of predictive policing uses machine learning (ML), a sub-branch of artificial intelligence (AI). ML (Jordan and Mitchell, 2015; Mitchell, 2010) amounts to algorithms that learn from data and improve their performance with experience. This process is inspired by human cognition and can also be described as inductive learning in order to solve problems. To say that a ML algorithm learns by experience means that it is drawing probabilistic inferences from the training data it is fed with. This training data simply consists of the input–output examples from which the algorithm constructs a model. Since the most common application of ML is to make predictions, a predictive model is created that generates these predictions (Moses and Chan, 2014: 648). 8 By comparing the predicted output with a set of test data, the algorithm updates its model and eventually identifies the ‘best’ solution to a given problem or task. Depending on the specific purpose, different ML-based modelling techniques are available for predictive policing.
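The basic loop of learning from input–output examples and checking predictions against held-out test data can be illustrated with a minimal sketch; the data, features and choice of learner below are assumptions made for illustration only.

```python
# Minimal sketch of the supervised learning loop described above: a model is
# constructed from input-output training examples and its predictions are
# compared against held-out test data. Data and features are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(6)
X = rng.random((500, 3))             # invented input features
y = (X[:, 0] > 0.5).astype(int)      # invented outcome to be learned

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)    # model learned from training data
print(accuracy_score(y_test, model.predict(X_test)))  # predicted vs. actual test outcomes
```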
ML-based regression analysis is used when the task is to predict the value of a response based on the value of a known predictor. Chen et al. (2008) used an autoregressive integrated moving average (ARIMA) model to forecast property crimes in a Chinese city one week ahead based on a data sample of 50 weeks. The ARIMA model is not based on a subject-matter theory but simply on the assumption that the variable to be forecast regresses on its own historical values and that the prediction error depends linearly on past and current values. This type of regression is known as parametric regression since it requires that the functional form of the relationship between the dependent and independent variables is known and can hence be described by a finite set of parameters. An ML algorithm automatically performs this parametrisation process based on the training data. Non-parametric regression analysis might be an option if it is not predetermined which independent variable(s) are good predictor(s) for a dependent variable and the ML task is to determine the best predictor based on a given data sample or to adjust the form of a function in order to capture unusual or unexpected features of the data.
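As an illustration of this kind of history-based forecasting, the following sketch fits an ARIMA model to synthetic weekly counts and produces a one-week-ahead forecast; the data and the model order are invented and do not correspond to Chen et al.'s actual specification.

```python
# Sketch of one-week-ahead property crime forecasting with an ARIMA model,
# in the spirit of the approach described above. The weekly counts and the
# (p, d, q) order are invented for illustration.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
weekly_counts = 50 + np.cumsum(rng.normal(0, 3, 50))  # 50 weeks of synthetic data

model = ARIMA(weekly_counts, order=(1, 1, 1))  # regress on own history and past errors
fitted = model.fit()

next_week = fitted.forecast(steps=1)  # forecast for week 51
print(float(next_week[0]))
```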
Modelling based on Bayesian learning is useful if the probability of a certain outcome depends on the probability of another outcome, for instance, when predicting crime is related to predicting weather conditions. In this case, the parameters of the predictive model for crime are automatically updated as more information on the weather becomes available. Based on geographical information on crime sites and victim characteristics, Liao et al. (2010) created a geographic profile, that is, a probability distribution of crime occurrences. The prediction of future crime locations was then updated as more information entered the geographic profile.
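The updating logic itself can be shown with a deliberately simple example that is not taken from the cited studies: a Beta prior over the probability that a given cell sees an incident is revised as new observations arrive.

```python
# Generic illustration of Bayesian updating (not the cited studies' models):
# a Beta prior over the probability that a cell sees an incident on a given
# day is updated as new observations arrive. All numbers are invented.
from scipy import stats

alpha_prior, beta_prior = 2, 20          # prior belief: incidents are rare

observed_days = 30
days_with_incident = 6                   # new evidence for this cell

alpha_post = alpha_prior + days_with_incident
beta_post = beta_prior + (observed_days - days_with_incident)

posterior = stats.beta(alpha_post, beta_post)
print(posterior.mean())                  # updated incident probability estimate
```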
Another ML-based modelling technique, used for automatic classification, is the decision tree. A decision tree is a representation of decision rules that takes the visual form of a tree structure. Decision trees can either be manually derived from expert knowledge or automatically induced from data samples using ML algorithms. In the latter case, the predictive model that is learned from the training data takes the form of a decision tree. It represents the formal rules that lead to the outcome (the classification of an item as annotated in the training data) that is to be predicted. The software HunchLab, for instance, combines RTM with the decision tree technique to create a predictive model that estimates the likelihood of a particular crime type occurring at a location across a certain period. 9 Moreover, the Memphis Police Department claims to have used a decision tree algorithm as part of its Blue CRUSH (Criminal Reduction Utilizing Statistical History) crime-prevention strategy (Utsler, 2011). A so-called random forest is a large ensemble of randomised decision trees (usually around 500) that is used to create a predictive model based on the mean of the individual decision rules/predictions. Richard Berk, professor of statistics and criminology at the University of Pennsylvania, used this technique for the analysis of parole decisions and outcomes undertaken for the Philadelphia Department of Adult Probation and Parole (Berk, 2008: 232). The goal of the study was to forecast which individuals under supervision would commit a homicide or attempted homicide within two years after intake and to assign probationers to a high-, moderate- or low-risk category.
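The following sketch illustrates the contrast between a single, rule-readable decision tree and a random forest of several hundred trees whose votes are aggregated into a risk category; the features and labels are synthetic and do not reflect Berk's actual variables.

```python
# Sketch of the random forest idea described above: an ensemble of ~500
# randomised decision trees whose individual votes are aggregated into a
# risk category. Features and labels are synthetic, not Berk's actual data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.random((300, 4))                          # hypothetical case features
y = rng.choice(["low", "moderate", "high"], 300)  # annotated risk categories

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
print(forest.predict(X[:3]))                      # aggregated (majority-vote) predictions

# A single decision tree, by contrast, can be printed as explicit decision rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["f0", "f1", "f2", "f3"]))
```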
The currently most prominent – and most controversial – method of ML, however, is deep learning based on artificial neural networks (LeCun et al., 2015). An artificial neural network (ANN) consists of layers of neurons. Since the number of layers can be very high, those networks are also called deep neural networks, and their training is referred to as deep learning. The neurons are connected so that each neuron of a layer can receive inputs from neurons of the previous layer. Moreover, these connections are weighted, and neurons are programmed to only ‘fire’ once a certain threshold is reached. The training starts with a random distribution of weights generating a certain output from an input fed forward into the system. The difference between the predicted output and the target or true output is used to improve the predictive quality of the ANN. That is, the ANN adjusts the weights of its connections in response to error signals transmitted back through the network. Consequently, the output changes which, in turn, initiates another feedback loop. Step-by-step, layer-by-layer, the ANN ‘learns’ to determine which part of the input is the best predictor for a given output. This recursive process, called backpropagation, continues until a desired predictive quality is reached. Since the weighted connections between neurons are representations of the data sample, a trained ANN is nothing but a model of this data. The current hype around ML and AI is not least a result of successful applications of ANNs. However, ANNs are one of the oldest AI techniques and have existed, as a concept, since the 1960s. Already in 1997, computer scientist Andreas Olligschlaeger (1997) employed an ANN to predict areas where future drug markets would emerge. The ANN was capable of representing complex space–time patterns, but the results were never tested regarding their predictive performance (Groff and La Vigne, 2002: 46). More recently, Yu et al. (2011) used an ANN to predict burglary-type crime hot spots at the monthly level. The data sample consisted of police records and data on crime-related events. Bogomolov et al. (2014) used an ANN to predict crime surges in London areas using mobile phone, crime and census data. Wang et al. (2017) adapted a state-of-the-art spatiotemporal deep learning predictive model to collectively predict crime distribution over the Los Angeles area using data on all types of crime in LA over six months. Stec and Klabjan (2018) used different types of ANNs to predict next-day crime counts. The data sample included Chicago and Portland crime data and additional datasets covering weather, census data and public transportation.
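To illustrate the training loop described above, the toy sketch below implements a single hidden layer and the backpropagation of error signals in plain NumPy; the features, target and network size are invented, and the example stands for the mechanism rather than for any deployed crime-forecasting model.

```python
# Minimal NumPy illustration of the training loop described above: a tiny
# feed-forward network with one hidden layer learns, via backpropagation,
# to predict a binary outcome from two synthetic features.
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((200, 2))                                 # two invented input features
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(float)[:, None]   # toy target to be predicted

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)           # random initial weights
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
lr = 2.0

for epoch in range(5000):
    # Forward pass: the input is fed through weighted connections, layer by layer.
    a1 = sigmoid(X @ W1 + b1)
    out = sigmoid(a1 @ W2 + b2)

    # Error signal: difference between predicted and true output.
    d2 = (out - y) * out * (1 - out)
    d1 = (d2 @ W2.T) * a1 * (1 - a1)                     # error propagated back

    # Weight updates in response to the back-propagated error signals.
    W2 -= lr * a1.T @ d2 / len(X); b2 -= lr * d2.mean(axis=0)
    W1 -= lr * X.T @ d1 / len(X);  b1 -= lr * d1.mean(axis=0)

accuracy = ((out > 0.5) == (y > 0.5)).mean()
print(accuracy)  # fraction of training cases the learned model now predicts correctly
```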
Implications for the constitution of predictive knowledge
The different epistemologies of predictive policing constitute distinct ways to render future crimes knowledgeable and, thus, resources for the endeavour to prevent these crimes from happening. They are performative in the sense that they enable the transformation of data into actionable intelligence. Beyond that, as shown in the following, the epistemologies have implications for the constitution of predictive knowledge in terms of its genesis, scope, intelligibility and accessibility.
Genesis
The first implication is the relevance that is ascribed to subject-matter theories in generating predictive models. On the procedural level, the first two epistemologies do not deviate substantially from the standard approach to statistical modelling. This process starts with a hypothesis about the relationship between variables. A formalized model of this relationship is then constructed, which is subsequently verified or refuted by testing and validating it on a data sample. In the case of a predictive model, the testing is done by comparing the predicted values of a dependent variable with the actual values of this variable. The model is then applied to unknown data in order to create predictions accordingly. This approach can be referred to as theory-driven because the hypothesis and hence the model is informed by subject-matter theories (which may, in turn, be built on empirical observations or data analysis). In the case of the first epistemology, the model is informed by criminological theories such as rational choice, routine activities and environmental criminology. In the case of the second epistemology, the model is informed by existing empirical knowledge and theories about the spread of infectious diseases and seismic activities – just to name the examples described in more detail above.
The third epistemology, however, challenges the significance of subject-matter theory as such. By using ML algorithms to forecast crime, predictive models are not constructed according to a specific theory but directly as a result of analysing data with regard to patterns and correlations. Based on an iterative learning, testing and feedback process, the algorithm adjusts the predictive model until a desired predictive quality is reached (Zweig et al., 2018). Ideally, there is no need for any kind of human-made predictive modelling informed by subject-matter theories (Amoore and Raley, 2017: 4). 10 This affects the meaning of crime patterns as well. While any approach to predict future events is based on exploring patterns and regularities in the data, predictions based on ML are the product of a reasoning that is independent of theories about the origin or cause of the patterns and correlations identified in the data. Hence, for a ML approach, crime patterns are not something that has to be explained with reference to existing theoretical knowledge but simply an enabler of practical knowledge acquisition.
This difference between the two approaches (theoretical explanations of otherwise meaningless data vs. atheoretical meaningfulness of data) may also be categorized as top-down vs. bottom-up: top-down refers to a theory-driven approach where an expert in criminology uses their knowledge to create a predictive model which is then translated into code and applied to a given data set; bottom-up refers to the data-driven approach (Kitchin, 2014) where a data scientist provides an algorithm with the training data and guides the algorithmic process of learning from the data and creating the predictive model accordingly (McCue and Parker, 2003: 116).
Scope
The second implication is the limits of prediction that are set by the three epistemologies and thus the scope of predictive knowledge. When it comes to model specification, a theory-driven approach can draw from a certain repertoire of theoretical explanations but is at the same time bound to the limits of these theories. In the case of the first epistemology, only criminal behaviour that matches the concepts of rational choice and routine activities can be predicted (Kaufmann et al., 2018). These ‘criminologies of everyday life’ (Garland, 1999) may offer a fairly plausible, albeit simplified explanation of planned or opportunistic crimes. However, they fail to adequately account for offenders acting in the heat of the moment or under the influence of alcohol or drugs (and hence for the types of crimes typically associated with these circumstances). Moreover, these theories abstract from structural conditions (social disorganization, anomie, social strain, etc.) that may lead to criminal activities. For them, it is the criminogenic routines, situations and urban landscapes that breed criminality, not the social structure of a society. This may lead to a stigmatisation of certain neighbourhoods and populations and a self-fulfilling prophecy when predictive policing leads to arrests which are fed into a database which – in turn – is used to generate new crime forecasts (Harcourt, 2005).
Neither SP- nor ML-based approaches to crime forecasting are known for accounting for the larger societal conditions of crime or the self-reinforcing effects of situational crime prevention. However, this is not because of their assumptions about criminal behaviour and its roots, but because they, too, form part of those strategies that consider only the immediate environment, in its simplest form, as a crime generator. With regard to the variety of crimes that are predictable (in principle), however, the three epistemologies vary significantly.
Some ML approaches to crime forecasting are seen as being capable of transgressing the boundaries of existing criminological knowledge, enabling crime analysts to identify seemingly random patterns for which no explanatory theory and, thus, no model/algorithm exists that could be applied top-down. Based on so-called unsupervised learning (where no known outcomes are provided during the learning process), an algorithm can assign data to clusters that are not predefined, detect anomalies and discover patterns. The ML algorithm creates specific outputs from the unstructured inputs by looking for similarities and differences in the data set and discovers relationships that are not necessarily known and might not be obvious. In addition, learning systems can better adapt to a rapidly changing environment – for instance, an unexpected change in crime rates or spatiotemporal crime patterns. They keep learning from the new data they are provided with and improve their performance instead of having to cut out the unknown as an exception to the rule (that is, treating it as a statistical outlier).
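A minimal sketch of this unsupervised logic is given below: no outcome labels are provided, incidents are grouped into clusters that were not predefined, and unusual data points are flagged; the coordinates and the choice of algorithms are illustrative assumptions.

```python
# Sketch of the unsupervised idea described above: no known outcomes are
# provided; the algorithm groups incidents into clusters that were not
# predefined and flags unusual data points. Coordinates are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
# Synthetic incident coordinates: two dense areas plus scattered noise.
incidents = np.vstack([
    rng.normal([0, 0], 0.3, (100, 2)),
    rng.normal([5, 5], 0.3, (100, 2)),
    rng.uniform(-2, 7, (20, 2)),
])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(incidents)
anomaly = IsolationForest(random_state=0).fit_predict(incidents)  # -1 = anomalous

print(np.bincount(clusters), (anomaly == -1).sum())
```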
Another crucial factor for the limits of prediction is the adaptation of models from heterogeneous contexts. For SP, models are boundary objects (Star and Griesemer, 1989) 11 that enable a translation process between different worlds – the natural and the social world. They are used by actors from various groups but for different ends by each. As described above, ETAS models are used by seismologists to analyse seismic activities and by mathematicians and anthropologists to predict criminal behaviour. They are adapted to the different needs (by model specification/parametrisation 12 ) but retain a common identity across the disciplines and fields of application. This mobility of the models of SP opens up new possibilities for crime forecasting. For example, despite the statistical observation that a large amount of crime is committed in or by groups, the criminological theories relevant for the first epistemology focus primarily on individuals. In the field of SP, by contrast, scientific efforts have been made to predict street gang behaviour by adapting a predictive model that had been developed for predicting the behaviour of coyotes (Smith et al., 2012). Gang-related graffiti were used as a proxy for the scent marking which some species use to claim their territory (Smith et al., 2012: 3241).
Intelligibility
The third implication is the explanation of crime that is given by the epistemologies. SP-based and ML-based approaches to crime forecasting are not only questioning the need for subject-matter theories in model construction; they also question whether predictions need to rest on an intelligible explanation of criminal behaviour at all.
This epistemological divide is reflected in the difference between giving intelligible explanations and generating accurate predictions. From an MSS perspective, predictive modelling always has to account for the human agency behind the causal mechanisms by which the values of a dependent variable are generated as a particular function of independent variables (Berk, 2013: 1). For SP, in contrast, a predictive model is an abstraction from crime as an intelligible act. It requires no theory about the criminal subject, its reasoning and the (subjective) meaning of its actions. If there is a subject at all in SP, it is the average man (Quetelet’s l’homme moyen), a statistical aggregate rather than an intelligible actor.
Accessibility
The fourth implication is the accessibility of the algorithmic system and its accountability. If decisions of the criminal justice system are taken in response to predictions generated by an algorithmic system, police officers, judges, probation officers and parole boards can only hold that system accountable if they understand how these predictions are accomplished (at least to a certain degree). They have to be able to interpret and evaluate outputs according to their own professional standards or general criteria of fairness and non-discrimination. A system that allows for these forms of human reasoning is usually called a white box (or sometimes a glass box). Conversely, if a system is inaccessible, input–output relations may be observable, but the inner workings of the system are obscure. In these cases, the system is called a black box. Predictive models that are informed by criminological theories are rather easy to grasp for professionals from the criminal justice system and would hence meet the requirement of model interpretability (Kaufmann et al., 2018). However, the system can still become a black box if information on the data and/or the model is scarce and the source code of the software is a trade secret. With regard to crime forecasting based on ML or SP, on the contrary, even providing open-source code and training data would not be sufficient to make algorithmic predictions fully understandable, especially not for end-users from the criminal justice system who are often laypersons in computational science (Ananny and Crawford, 2018). ANNs, in particular, are capable of representing complex spatiotemporal relationships across data features by increasing the number of parameters and variable interactions included in their models, making it practically impossible for a human operator to retrace their internal operations and assess their outcomes. Hence, many systems that are built on ML algorithms would remain a black box even when transparency is given. With regard to the epistemology of SP, it is less the complexity of the predictive models that is an obstacle to accountability than the mobility of these models. The process of translation from one context to another goes hand in hand with a loss of intelligibility. Thus, SP may explore causal mechanisms that connect variables and, once translated into a predictive model, transform input into output, but it does not provide understandings of these causal mechanisms. What does aftershock triggering mean in the context of criminal events? How does one criminal ‘infect’ the other? Without any human agency, these concepts remain vague and at best descriptive by means of analogies.
Conclusion
Driven by the expectation that algorithmic forecasts will enable both more cost-effective and more objective forms of crime prevention, numerous law enforcement agencies around the globe have already integrated predictive policing into their daily practices. Moreover, the ongoing digitalisation of infrastructures, the availability of large, heterogeneous data sources (either as open-source intelligence or data warehousing) and the attractiveness of new, (seemingly) less expensive and less demanding options for end-users (software as a service) suggest that predictive policing is going to play an even bigger role in the future. While this trend towards algorithmic crime forecasting has raised considerable interest among critical scholars, journalists and NGOs alike, knowledge about the computational methods that generate predictions remains vague or concealed behind the claims of software companies and other actors directly involved with the development and implementation of the tools that are supposed to identify crime before it happens.
This paper has addressed this research gap by distinguishing three epistemologies of predictive policing and analysing the ways they make sense of data, model criminal activity and create crime forecasts. Moreover, it has been shown that these epistemologies have significant implications for the constitution of predictive knowledge in terms of its genesis, scope, intelligibility and accessibility (see also Table 1): ML-based approaches are questioning the role of subject-matter theories in predictive modelling and generate predictions bottom-up as a result of a data-driven approach; SP-based and ML-based approaches are pushing the boundaries of crime forecasting in terms of discovering unknown patterns in the data and predicting all types of crimes; SP-based and ML-based approaches favour accurate predictions over coherent causal explanations and therefore disregard the value of explainability of criminal behaviour; SP-based and ML-based approaches lead to a lack of algorithmic accountability through a lack of model interpretability.
Table 1. Implications of epistemologies.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The article processing charge was funded by the Baden-Württemberg Ministry of Science, Research and Art and the University of Freiburg in the funding programme Open Access Publishing.
