Abstract
Asylum-related migration is highly complex, uncertain, and volatile, which precludes using standard model-based predictions to inform policy and operational decisions. At the same time, asylum's potentially high societal impacts on receiving countries and the resource implications of asylum processes call for more proactive approaches for assessing current and future migration flows. In this article, we propose an alternative approach to asylum modeling, based on the detection of early warning signals by using models originating from statistical control theory. Our empirical analysis of several asylum flows into Europe in 2010–2016 demonstrates the approach's utility and potential in aiding the management of mixed migration flows, while also shedding more light on the work needed to make better use of the “big data” and scenario-based methods for comprehensive and systematic examination of risk, uncertainty, and emerging trends.
Introduction
It is a truism to state that contemporary migration processes are very complex and driven by a vast array of interrelated factors and drivers. One implication of this complexity is that the predictability of migration flows is very limited, much more so than is the case for other demographic processes (Bongaarts and Bulatao 2000). The limits of migration's predictability are strongly pronounced even in the case of relatively regular flows, such as labor or student migration, and much more so for various forms of forced migration, including asylum flows, which can have large impacts on both sending and receiving countries (Bijak et al. 2019).
Asylum migration is so unpredictable because it is highly dependent on specific underlying geopolitical events and processes, which themselves are difficult to foresee, and because these political events and policy developments interact with migration flows in a feedback loop (Castles 2004; Pijpers 2008; Bijak 2010; Bijak and Czaika 2020). The inherent uncertainty associated with asylum migration is further compounded by the agency of the various actors involved, from migrants to policy-makers (Castles 2004). Still, from a policy and planning viewpoint, there is a need for at least some form of quantitative foresight into the future, or even current, magnitude of migration flows. The armed conflicts that took place in recent decades, such as those in the former Yugoslavia, Iraq, Afghanistan, or Syria, the political crisis in Venezuela, and the associated flows of asylum-seekers migrating toward Europe, have generated an increased desire among recipient countries’ governments to have access to predictions of the nature, magnitude, and composition of future flows (Sohst et al. 2020).
Migration's high volatility means that traditional forecasting attempts, based on “orderly” assumptions implying stable trends, are problematic (Pijpers 2008). Simple time-series-based forecasts can yield reasonable predictions a few years ahead, but with uncertainty rapidly increasing with the forecast horizon (Bijak and Wiśniowski 2010), even more so for asylum migration (Bijak et al. 2019). Even though some migration indicators may exhibit long-term regularities, detectable over a span of a few decades (Azose and Raftery 2015), this time perspective is too coarse to enable responding to the immediate policy needs in such rapidly changing areas as asylum.
As many of the recent asylum flows were sizeable and long-lasting, there was an increasing need for national and European Union (EU) policy-makers to facilitate planning and to ensure that asylum claims are processed quickly and efficiently and that relief is provided promptly to those in need (Kegels 2016). The utility of asylum forecasts, even with short horizons, is that they can aid planning to ensure that the necessary support, as well as administrative and welfare capacity, is in place ahead of time, enabling more proactive responses. For asylum-related migration, preparedness is often related to securing basic operational needs: host countries and relief organizations need to provide food and shelter to people seeking asylum to the right extent and in the right places; administrations need adequate resources for registering asylum-seekers, verifying their identity, issuing documentation, processing applications, and resettling refugees into their receiving societies. As became apparent during the 2015–2016 peak of asylum migration into Europe, without an early warning system in place, additional capacity often must be built at short notice, especially as, by law, asylum claims must be processed within six months (ECRE 2016).
This article aims to respond to the policy and operational needs for forward-looking information on asylum-related migration by offering a tool for early detection of changes in migration flows based on higher-frequency data (e.g., monthly or weekly). To that end, we advance existing quantitative approaches for migration modeling by introducing an early warning model based on change-point detection, inspired by statistical control theory, and then assessing the feasibility of its use. We demonstrate this approach's utility on weekly and monthly data of migrant arrivals and asylum applications in Europe and on detections of irregular border crossings, and assess its practical feasibility for monitoring and managing asylum-related migration into and within the EU. We focus on Europe because of the presence of unique regulated regional cooperation in establishing responsibility for examining asylum applications and the exchange of information and creation of a joint system for early warnings, preparedness, and management of asylum crises. Our proposed methodology, however, has broader applicability to the global problem of forecasting and responding to forced displacements, subject to the availability of appropriate data.
From a theoretical standpoint, the tools we propose aim to help nation-states and supranational bodies, such as the EU, better discharge their legal and moral obligations toward refugees and asylum-seekers by providing them with a more efficient system of humanitarian protection. In particular, the approach we propose focuses on the supply of immediate humanitarian needs—offering refuge to those in need (Owen 2020)—and, thus, takes the perspective of receiving states or entities, while recognizing these states’ role in constructing the legal notion of refugeehood and the practice of granting asylum.
Our article’s structure is as follows. After this introduction, the section “Modeling Asylum-Related Migration: From Concepts to Operational Needs” provides a brief discussion of the conceptual difficulties with defining the analytical categories and variables of interest and offers pragmatic solutions related to their operationalization for modeling and early warnings. One such model, based on statistical control theory, is introduced in the section “Early Warning Models of Asylum Migration: State-of-the-art and Application”, and the results of its application to weekly and monthly data on asylum applications and detections of irregular border crossings in Europe are summarized and discussed in the section “Detecting Changes in Asylum Flows Into the EU”. Section “Sensitivity Checks and Perspectives for Early Warnings for Asylum Migration” presents the results, as well as a range of sensitivity checks and perspectives for using early warnings for asylum migration. The article finishes with a discussion of practical recommendations and possible extensions of the approach presented in the section “Conclusions and Recommendations”. In addition, an Online Appendix provides a review of formal, quantitative methods and models applied to asylum migration, as well as data availability for the purpose of building models providing early warnings related to asylum migration and the computer code for simulations presented in the section “Detecting Changes in Asylum Flows Into the EU”.
Modeling Asylum-Related Migration: From Concepts to Operational Needs
Formal quantitative work on asylum-related migration faces four key challenges, related to the ways in which the asylum processes are conceptualized, defined, explained, and measured (e.g., Singleton 2016; Bijak, Forster and Hilton 2017; Erdal and Oeppen 2018). In this section, these four challenges are discussed in turn. First, at the conceptual level, contemporary migration scholarship has moved away from simple dichotomies, such as “forced” versus “voluntary” migration, and toward a multidimensional spectrum in which individual migration motives, including different levels of human agency, interact, overlap, and are difficult to isolate (King 2002; Foresight 2011; Erdal and Oeppen 2018; Hatton 2020).
Second, the very concept of “asylum migration” similarly evades precise analytical definition and must remain relatively vague to accommodate different forms of international mobility, the common element of which is related to the need for international protection (Böcker and Havinga 1997). As recognized in the theoretical literature in the field of refugee studies, there is an important conceptual gap between the legal (political, constructivist) status of refugees and asylum-seekers and the practical (humanitarian, realist) situation of people in need of protection (FitzGerald and Arar 2018; Owen 2020). Still, the state’s role in constructing the legal and practical aspects of the status of refugees and asylum-seekers, who find themselves as outsiders “between sovereigns” (Haddad 2008), cannot be overlooked. Equally important are state responses to the challenges of asylum, or migration more generally, through changes in the spatial and organizational features of the different types of borders, be they intra-European or external (e.g., Geddes 2005; Geddes and Scholten 2016).
Third, the known theoretical gaps and limitations of migration studies (Arango 2000) are exacerbated in the case of asylum. In particular, the caveats regarding weak and fragmented theories of migration are further amplified by the theoretical disconnection between refugee scholarship and mainstream migration studies (FitzGerald 2015). As a result, theoretical work on refugee or asylum processes is still relatively rare, with the seminal overview of migration theories (Massey et al. 1993) mentioning refugees only in the context of the world systems theory and global impacts of military interventions.
Notable examples of existing theoretical studies on refugee and asylum processes date back at least to the comprehensive framework offered by Zolberg (1989). Building on the world systems theory, this framework added important elements from state theory and highlighted nation-states’ pivotal role in shaping asylum policies and, hence, flows. This approach has recently been revisited, reviewed, and revised by FitzGerald and Arar (2018), who advocate for a multiperspective view on refugee and asylum flows and draw attention to distinct developments in the Global South. Other theoretical perspectives of refugee and asylum migration highlight the role of international coordination (or competition) between the countries hosting refugees (for a review, see Suriyakumaran and Tamura 2016), as well as the interplay between asylum migration, policy, and public attitudes toward migrants (e.g., Hatton 2020), both studied typically from an economic viewpoint.
For the purpose of empirical modeling, a range of political, policy, and socioeconomic proxy variables can be used as predictors (see the Online Appendix for details), an approach dating back to the push-and-pull factors framework proposed in Lee’s (1966) classic paper. The push-and-pull approach, which has inspired several updates and refinements in more contemporary literature (e.g., Arango 2000; Carling and Collins 2018; Van Hear, Bakewell and Long 2018), can be used as a practical guide for discussing migration factors and drivers. One of its important extensions, crucial from the viewpoint of asylum flows’ inherent uncertainty, is chance (Böcker and Havinga 1997), which of itself reduces the importance of other factors and drivers of asylum. In this context, Öberg’s (1996) distinction between “soft” and “hard” factors is relevant, the latter including drivers more associated with the involuntary end of the migration spectrum and implying less choice and self-selection among prospective migrants. A recent survey of asylum migration's factors and drivers is available in the EASO (2016a) report.
At the same time, migration processes are known to self-perpetuate through a variety of mechanisms (for a review, see Massey et al. 1993). Asylum flows are no exception—in their case, the “cumulative causation” results from the existence of migrant networks and the presence of formal family reunion routes. Such self-perpetuation mechanisms result in high inertia of migration processes. This inertia can be formally described and modelled, for example, by using time series approaches, such as autoregressive models (e.g., Bijak 2010). Many existing models of asylum migration include autoregressive features, aimed to capture the self-perpetuating nature of migration (see the review in the Online Appendix for details on selected examples of how various migration theories could be operationalized in models).
Fourth, in addition to the theoretical considerations presented above, the measurement of asylum-related migration is problematic. Not only is population-level data collection on asylum migration heavily politicized and subject to external pressures (Bakewell 1999; Crisp 1999), but it also often focuses on operational humanitarian data or administrative procedures, such as registrations (NASEM 2019). Administrative records, especially those of different countries, are likely to contain duplications for those who have applied for international protection more than once in different countries, either after initial rejections or due to different lengths of the asylum processes (Singleton 2016). Even within the EU, the development of a harmonized system of asylum statistics remains an ongoing process involving Eurostat, the European Asylum Support Office (EASO), and other agencies, with much current work centered on the use of biometric data, such as fingerprints, held at the EURODAC database.
A pragmatic solution to the dilemmas posed by the difficulties associated with conceptualizing, defining, and measuring asylum-related migration is to adopt the perspective of potential users of the early warning models and to focus on the indicators that are of relevance for them. The main indicator used to analyze and manage asylum processes is the number of applications lodged in a given country (or reception center) in a unit of time: day, week, month, or year (e.g., EC 2016). Given this indicator's availability for most European countries, from national statistical and migration authorities, Eurostat, or EASO Early warning and Preparedness System data, and its importance for operational reasons, the empirical analysis presented in this article is based on the numbers of applications.
In this way, at the strictly operational level, the conceptual and theoretical challenges related to the quantification of asylum migration become less relevant, as do other definitional or measurement criteria, such as the duration of stay. What matters, instead, is the purely administrative fact of lodging an asylum application by a migrant at a specific time and place. This approach, relying on the analysis of crude numbers of asylum applications is, of course, a reductionist view of the asylum process but, nonetheless, enables a forward-looking analysis that can have a direct policy use through helping with preparedness planning.
Early Warning Models of Asylum Migration: State-of-the-art and Application
At a general level, the main purpose of an early warning model is to detect changes in the trend of a variable or variables early enough to allow for adaptation or remedial action before the consequences of the changes become overwhelming. Early warning modeling is frequently used in the financial context, chiefly in central banking and macroprudential regulation (Lang, Peltonen and Sarlin 2018). Such models gained popularity after the 2008–2009 economic downturn, as they are designed to help detect the first signs of an upcoming economic slowdown or recession (see Lang, Peltonen and Sarlin 2018 for a recent overview).
To that end, the purpose of early warning modeling—defined as “identifying vulnerable states prior to… crises, which can also be viewed as a standard two-class classification task, where the key objective is to separate the vulnerable from non-vulnerable states” (Lang, Peltonen and Sarlin 2018: 5)—can be easily generalized beyond finance. Depending on whether an indicator being monitored crosses a certain threshold, the trend is classified as either exhibiting regular behavior (“in control”) or generating a warning signal indicating vulnerability of the process (“out of control”). In addition, in this article, we understand an early warning system as a collection of interlinked early warning models and the underpinning sources of information and data, serving the same purpose, only for a broader range of indicators and sources.
Statistical models for monitoring data to identify change points in series have a long history in the area of industrial process control, where measures of process quality are monitored to minimize losses on production lines (Page 1954, 1957). The literature on early warnings and statistical control theory offers many choices of modeling approaches, levels of the warning thresholds, and model calibration and validation methods (e.g., Zeileis 2004; Li et al. 2013). Of particular importance are the tradeoffs between the frequency of “false alarms” (false positives), unwarranted complacency (false negatives), and the correct classification of changes in trends. The presence of these tradeoffs necessitates the direct involvement of migration policy-makers in the design of early warning models so that they can describe the possible losses which can occur under different circumstances, for a range of possible policy decisions (Bijak, Forster and Hilton 2017; Lang, Peltonen and Sarlin 2018). An all-encompassing methodological framework combining all desirable elements is offered by statistical control theory and change-point models.
In terms of existing applications, early warning macroeconomic models are usually based on large general equilibrium approaches, the equivalents of which do not yet exist in the context of migration (Barker and Bijak 2020). Still, in the migration literature, the use of early warnings has been proposed for the most volatile and weakly predictable flows, such as asylum migration (see Shellman and Stewart 2007; Regehr 2014), ideally in conjunction with a coherent description of uncertainty and impact of individual decisions for the purpose of managing the associated risk (Bijak et al. 2019).
For a single early warning model, as mentioned above, using exogenous variables as predictors does not seem useful, as such predictors’ values would need to be modelled and forecast separately, increasing the overall prediction uncertainty (Bijak 2010). On the other hand, if models are to be deployed as a part of a wider system for several countries or flows, some exogenous variables—most notably, migration or asylum policies for destinations (see Beine et al. 2016) or conflict intensity measures for origin countries or regions—can help model the interactions between the different elements of such a system.
In the context of asylum migration, in common with the general settings of statistical control theory, the aim of an early warning analysis is to identify as early as possible when the process governing a series has changed, while minimizing the number of false alarms (Page 1954). To achieve this goal, a set of rules must be devised governing when action is to be taken. In the case of asylum, such actions might involve planning and preparation for the arrival of more asylum-seekers. These decision rules should account for the variability of the process under “normal” conditions and should be sensitive enough to detect subtle shifts at an early stage.
The cumulative sum (or “Cusum”) model set out by E.S. Page (Page 1954) and developed by a range of subsequent authors (Barnard 1959; Harrison and Davies 1964) provides a setting for such rules. As the name suggests, this approach monitors the cumulative sum of some function of the observed data. In the original formulation of this method by Page (1954), which is summarized below, this function is chosen to produce a score that is negative when the data are behaving as expected and positive when unexpectedly large values are observed.
Through monitoring the cumulative sum of this score, an action can be recommended when the difference between the cumulative sum's current value and minimum value passes a certain limit (threshold), which must be set in accordance with knowledge about what change in the data is deemed important. This rule has the effect of detecting change both in the face of a very large single observation and in the case of many moderately large observations in a relatively short space of time, both of which would indicate an increase in the mean of the process.
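The rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the analyses in this article: the names x, y, S, V, h, and k follow the notation of Figure 1 and the surrounding text, whereas the specific score y = x - target - k, the `target` argument (the level expected under normal conditions), and the function name `cusum_upper` are assumptions introduced here.

```python
from typing import List, Optional, Tuple


def cusum_upper(x: List[float], target: float, k: float,
                h: float) -> Tuple[List[float], Optional[int]]:
    """One-sided (upper) Cusum chart in the spirit of Page (1954).

    Assumed score: y_t = x_t - target - k, which is negative when the data
    sit near the expected level and positive for unexpectedly large values.
    Returns the chart V (cumulative sum S minus its running minimum) and the
    index of the first alarm (V > h), or None if no alarm is raised.
    """
    s = 0.0        # cumulative sum S of the scores
    s_min = 0.0    # running minimum of S, with the starting value S_0 = 0
    v: List[float] = []
    first_alarm: Optional[int] = None
    for t, value in enumerate(x):
        s += value - target - k          # add the score y_t
        s_min = min(s_min, s)            # update the running minimum
        v.append(s - s_min)              # distance of S above its minimum
        if first_alarm is None and v[-1] > h:
            first_alarm = t
    return v, first_alarm


# Five observations at the expected level, followed by a sustained shift.
chart, alarm = cusum_upper([0, 0, 0, 0, 0, 3, 3, 3, 3, 3], target=0, k=1, h=4)
print(chart, alarm)
```

In this toy series, the chart stays at zero while the observations remain at the expected level and then accumulates once the shift begins, so that several moderately large values in a row trigger the alarm. A single very large value, as in `cusum_upper([0, 0, 10, 0], target=0, k=1, h=4)`, crosses the same limit immediately.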
More formally, assuming some series of data observed across time t,
A simple example is given in Figure 1 to aid understanding. Here, a fictitious series of simulated observation points

Figure 1. Cusum Control Chart Applied to Simulated Data with a Change Point at Time 30. (A) Simulated Observations x; (B) Score Function y; (C) Cumulative Sum S; (D) Cumulative Sum V, Restarted at Minimum Values.
We can observe in Figure 1C that the individual values of the vector of cumulative sums,
Figure 1D is an example of a control chart, used to provide a simple graphic illustration of when a series exceeds expected limits. We might also be interested in cases when a series shows a substantial decline. In this case, a two-sided control chart can be used (Page 1957; Barnard 1959). Two-sided charts would involve constructing an equivalent scoring function for declines in the series with an opposite sign and monitoring changes in the cumulative sum of this sequence relative to its maximum, as an equivalent to the series
Furthermore, if the series to be monitored is assumed to follow a particular distribution or model, then both the scoring function and the control limits can be chosen to reflect the probability of observing particular sequences of values of the series, conditional on some model parameters (Page 1957). Alternatively, the likelihood of a model describing the system's normal behavior can be compared with an alternative model with, for example, a shifted mean (Page 1957).
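As a worked illustration of the likelihood-based option, suppose (as an assumption made here for exposition only) that observations are independent and normally distributed with a known common variance, an in-control mean, and a shifted mean that is larger. The symbols \mu_0, \mu_1, \sigma, f_0, and f_1 are introduced for this sketch. The log-likelihood ratio of a single observation x_t is then:

```latex
\log \frac{f_1(x_t)}{f_0(x_t)}
  = \frac{(x_t - \mu_0)^2 - (x_t - \mu_1)^2}{2\sigma^2}
  = \frac{\mu_1 - \mu_0}{\sigma^2}\left(x_t - \frac{\mu_0 + \mu_1}{2}\right)
  = \frac{\mu_1 - \mu_0}{\sigma^2}\,\bigl(x_t - \mu_0 - k\bigr),
\qquad k = \frac{\mu_1 - \mu_0}{2}.
```

Under these assumptions, the score is proportional to the deviation of the observation from the in-control mean, less a reference value k that sits halfway between the two means. Summing such scores over time is equivalent, up to a positive scaling constant, to accumulating x_t - \mu_0 - k.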
In the example above, by using the score
Variables and Parameters of the Cusum Model
The sequential updating of parameters has a natural Bayesian statistical interpretation. West and Harrison (1986) introduced a Bayesian approach to the problem of change-point detection by comparing the relative evidence for a “standard” model against one which assigns a greater probability to extreme observations, using cumulative Bayes factors. If the evidence strongly indicates the latter model, a change-point can be considered to have occurred (see also Tartakovsky and Moustakides 2010 for a recent review of Bayesian approaches to change-point detection). The identification of change points can also be posed as a decision theory problem, providing a theoretical framework within which to set control limits h based on the losses and gains associated with particular outcomes (Harrison and Veerapen 1994).
A separate question concerns the possible methods of evaluating the performance of such early warning models. Two main perspectives for assessing model performance are (1) internal, looking at how well the model performs by using either analytical techniques or real or simulated datasets, and (2) external, comparing with other benchmark approaches. The internal approach involves measuring the frequency of false positives and false negatives and combining them in a quality indicator, possibly allowing for asymmetry in impact, cost, or other consequences between false alarms and false complacency. A special case, focusing just on false positives, as measured by using p-values, is well documented in the literature on statistical process control and Cusum models (e.g., Zeileis 2004; Li et al. 2013). Judgment on whether the alarm was true or false, to a large extent, rests with the user, which is a subjective element of the internal assessment.
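One possible form of such a quality indicator is a cost-weighted count of false alarms and missed changes. The sketch below is illustrative only: the function name, the per-period true/false labels (which, as noted above, rest on the user's judgment), and the two cost parameters are assumptions, not quantities defined in this article.

```python
from typing import Sequence


def weighted_alarm_loss(alarms: Sequence[bool], changes: Sequence[bool],
                        cost_false_alarm: float = 1.0,
                        cost_missed_change: float = 1.0) -> float:
    """Average cost per period of false alarms and missed changes.

    alarms[t]  -- True if the model raised an alarm in period t
    changes[t] -- True if the user judges that a real change occurred in t
    Unequal costs allow for asymmetry between false alarms and
    false complacency.
    """
    if len(alarms) != len(changes) or len(alarms) == 0:
        raise ValueError("alarms and changes must be non-empty and equal in length")
    false_pos = sum(1 for a, c in zip(alarms, changes) if a and not c)
    false_neg = sum(1 for a, c in zip(alarms, changes) if c and not a)
    return (cost_false_alarm * false_pos
            + cost_missed_change * false_neg) / len(alarms)


# One false alarm and one missed change over four periods,
# with a missed change assumed to be three times as costly.
loss = weighted_alarm_loss([True, False, True, False],
                           [True, True, False, False],
                           cost_false_alarm=1.0, cost_missed_change=3.0)
print(loss)
```

Lower values indicate better performance. The ratio of the two costs would need to be elicited from the users of the early warning model.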
A mathematical analysis of alarm schemes also helps assess model performance from an internal perspective. Work by Lorden (1971) and Moustakides (1986) shows that the Cusum procedure described by Page (1954) for choosing a stopping time has some desirable qualities. More specifically, for any chosen probability of false alarms, it is a so-called minimax decision rule, as it minimizes the expected delay in detecting a change in the level under the worst possible conditions, in terms of the run of observations preceding the change in level (Moustakides 1986). As Yashchin (1993) notes, minimax rules have desirable properties in monitoring schemes, as in general, an observer is not able to make assumptions about the typicality of observations preceding a change in level. The precautionary nature of minimax approaches is certainly helpful in the case of monitoring asylum applications, where worst-case scenarios can form a basis of planning.
In the external approach to evaluation, a selected quality indicator can be compared with the corresponding measure obtained for the benchmark model. At the same time, there is no “gold standard” for selecting the right benchmark for such models. In the next two sections, we expand this discussion by illustrating the framework with a few examples related to recent asylum migration flows into Europe.
Detecting Changes in Asylum Flows Into the EU
To illustrate the Cusum model's potential application for detecting changes in asylum trends, this section presents a selection of examples based on data about the number of asylum applications lodged in the EU. In these examples, Cusum is applied to raw application data, without any additional modeling or smoothing. As for the data used, although the administrative practice of evaluating an asylum application may vary between EU member-states, to access the procedure, the applicant usually must file an asylum application. This procedure consists of three steps: making, registering, and lodging the application. The process of making an application starts as soon as the person expresses to the authorities a wish to receive international protection, which in theory usually happens after crossing the border of the first safe country, and lasts until all necessary documents are completed. Under EU law, the application must be registered by representatives of the national authority responsible for refugees within a certain time limit, which depends on the host country (EC 2016). This process of registering a claim should normally take place within three working days from the time the migrant crossed the border and informed the authorities about their wish to apply for international protection (EC 2016). Lodging the application is the final step required to access the procedure to obtain international protection and is an official acknowledgment of the start of the determination process, when the asylum-seeker receives a certification of acceptance of their application (EC 2016).
Ideally, a Cusum model for asylum migration would be based on the monitoring of the number of individuals who are in the process of making an application, since this quantity is most relevant to decision-makers for planning and operational purposes. However, in most EU countries, the national asylum authorities only register the last step of the procedure—the lodging of an application formally accepted by the relevant authorities, while the start of the process of making an application is not recorded. The main disadvantage of setting up a Cusum model for data on the number of lodged asylum applications is that as asylum flow increases, these numbers at some point may start to reflect, not the size of the flow, but the relevant authorities’ maximum capacity to process applications. Reflecting the processing capacity, rather than the real size of asylum-seeker flows, is an acute problem when the number of applicants is higher than the number of available staff to process applications. As a result, in such periods, the number of asylum applications observed each week or month will approximately correspond to the number of available staff multiplied by the average number of applications they can process in a given time period, which in turn will vary (i.e., depending on the time it takes for each application to be processed).
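The capacity effect described above can be illustrated with a toy calculation, in which the number of applications lodged in a period is capped at the number of staff multiplied by the number of applications each can process. The function name, the fixed staffing level, and the arrival figures are hypothetical and serve only to show the mechanism.

```python
from typing import List


def lodged_applications(arrivals: List[int], staff: int,
                        per_staff_rate: int) -> List[int]:
    """Applications lodged per period when processing capacity is limited.

    Capacity per period is assumed to be staff * per_staff_rate; applicants
    who cannot be processed wait in a backlog carried to the next period.
    """
    capacity = staff * per_staff_rate
    backlog = 0
    lodged = []
    for new_arrivals in arrivals:
        waiting = backlog + new_arrivals
        processed = min(waiting, capacity)   # cannot exceed capacity
        lodged.append(processed)
        backlog = waiting - processed        # unprocessed cases roll over
    return lodged


# Arrivals rise and then fall, but lodged applications plateau at capacity.
print(lodged_applications([50, 150, 200, 50], staff=10, per_staff_rate=10))
```

With these illustrative figures, the lodged series flattens at the capacity of 100 per period once arrivals exceed it, and stays there even after arrivals fall, because the backlog is still being cleared. A chart applied to such a series would reflect processing capacity rather than the underlying flow.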
For these reasons, any variation observed in applications between time periods could be explained by the increase or decrease in available staff's working hours, not by real changes in inflows. For instance, analysis of variation in the weekly number of lodged applications (see Figure 2) shows that during the weeks with more than one day of public holidays, the numbers of lodged applications dropped significantly, increasing by almost exactly the same quantity in the following week, as authorities tried to catch up with the backlog. Sometimes, the lodging of applications could also be postponed for other reasons such as when, for example, interpreters are not available. In such cases, a Cusum model might trigger alarms both in the week of the holidays and in subsequent weeks. These alarms could become problematic, especially when inaccurately interpreted as a change in trends of inflows, but such an issue is easy to address by monitoring the situation together with the calendar of public holidays and employees’ leave schedules and adjusting the data accordingly.
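One simple way of adjusting the data for the holiday pattern described above is to average each holiday week with the week that follows it, so that a drop and the subsequent catch-up cancel out. This is only one possible adjustment, assumed here for illustration; the function name and the figures are hypothetical, and consecutive holiday weeks would need more careful handling.

```python
from typing import Iterable, List


def smooth_holiday_weeks(counts: List[float],
                         holiday_weeks: Iterable[int]) -> List[float]:
    """Replace each holiday week and the following week by their mean.

    counts        -- weekly numbers of lodged applications
    holiday_weeks -- indices of weeks affected by public holidays
    """
    adjusted = [float(c) for c in counts]
    for i in sorted(holiday_weeks):
        if 0 <= i < len(counts) - 1:
            pair_mean = (counts[i] + counts[i + 1]) / 2
            adjusted[i] = pair_mean
            adjusted[i + 1] = pair_mean
    return adjusted


# Week 1 is a holiday week; week 2 shows the catch-up.
print(smooth_holiday_weeks([100, 60, 140, 100], holiday_weeks=[1]))
```

In this example, the dip to 60 and the rebound to 140 are both replaced by 100, so neither week would register as a change in level on a control chart.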

Figure 2. Holiday Periods (Both Religious and Public) Influence the Numbers of Applications Lodged in Italy (End of 2016 and the First Half of 2017) (Upper Panel); the Impact of the Choice of Threshold Boundaries on the Sensitivity of the Early Warning System Based on Cusum (Middle Panel); Cusum Applied to the Numbers of Applications Lodged in Greece (End of 2015 to the Beginning of 2017) (Bottom Panel).
When setting up the early warning system based on a Cusum model, the choice of process parameters is critical to the system's level of sensitivity. For instance, establishing a Cusum model to provide warnings about abnormal values using an upper limit h set at one historical standard deviation from the observed average values will be far more sensitive than deciding on a threshold based on two standard deviations. As an example, when systems based on these specifications are applied to the same real data series from Italy, the former system triggered nine alerts while the latter only four (see Figure 2).
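The effect of the threshold on sensitivity can be reproduced on a small made-up series. In the sketch below, the limit h and the reference value k are expressed in units of the standard deviation of a calibration sample, and the chart is restarted after each alert. These choices, the function name, and all numbers are assumptions for illustration and do not correspond to the Italian data.

```python
import statistics
from typing import List


def count_alerts(series: List[float], calibration: List[float],
                 h_in_sd: float, k_in_sd: float = 0.5) -> int:
    """Count Cusum alerts, restarting the chart after each one.

    The expected level and the scale are taken from the calibration sample;
    the limit h and reference value k are multiples of its standard deviation.
    """
    mean = statistics.mean(calibration)
    sd = statistics.stdev(calibration)
    h = h_in_sd * sd
    k = k_in_sd * sd
    v = 0.0
    alerts = 0
    for value in series:
        v = max(0.0, v + value - mean - k)   # upper Cusum recursion
        if v > h:
            alerts += 1
            v = 0.0                          # restart after an alert
    return alerts


calibration = [8, 12, 8, 12, 10]             # mean 10, standard deviation 2
monitored = [10, 11, 14, 10, 15, 16, 9, 14, 14]
print(count_alerts(monitored, calibration, h_in_sd=1.0),
      count_alerts(monitored, calibration, h_in_sd=2.0))
```

On this toy series, the limit of one standard deviation produces more alerts than the limit of two, mirroring the qualitative pattern reported for the Italian series.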
To choose the parameters h and k and determine how often various combinations of these parameters triggered alarms in cases where there had been no “real” change in level, we conducted simulations, using the statistical programming language R (R Core Team 2020). More specifically, we conducted 1,000,000 simulations of repeated observations drawn from a standard normal distribution for a Cusum model with

Figure 3. Examples of the Application of Cusum to Greek Data on the Monthly Number of Applications Lodged (Eurostat Data) and the Monthly Number of People Apprehended Trying to Cross the Border (Eastern Mediterranean Route) (Frontex Data). The Upper Limit Is Set at One Standard Deviation.

Figure 5. Early Warning Alerts Generated by a Cusum Model Applied to Applications Lodged by Asylum-Seekers (Eurostat Data) and Detections of Irregular Border Crossings (Frontex Data) to Flows From Ukraine to EU28 Between January 2009 and February 2019. The Color of the Bar Corresponds to the Alarm Triggered by Cusum Based on Respective Data.
Another aspect to be considered when building an early warning system based on a Cusum model is the calibration period. Ideally, the information used for calibration should come from the periods prior to the asylum crisis. Longer data series will not only allow for a better understanding of past developments but also help detect specific features, such as seasonality in data. The weekly data on asylum applications lodged in Greece provide an example of how the choice of calibration period may impact the system's functionality (see Figure 2, bottom panel). The Cusum model using these data triggered an alarm only in the third week of February 2016, even though the crisis had started in 2015. For comparison, a Pew Research Center analysis based on big data found that Arabic-language Google searches in Turkey for the word “Greece” peaked in August 2015, two months before the increased inflow of asylum-seekers (Connor 2017). Yet other Cusum models, based on different data sources (such as numbers of apprehensions reported by Frontex), started triggering alerts as early as 2014 (see Figure 3).
In this case, the choice of the calibration period was limited, as the collection of these data started only in the fortieth week of 2015, making the calibration period relatively short. Moreover, the information used for calibration came from the period just before the crisis, during which the number of asylum applications was already elevated. Using too short a calibration timeframe from the precrisis period delayed the alert until the next significant increase in the number of applications lodged, which did not occur before February 2016. Besides, even this increase could to some extent reflect the higher processing capacity related to the new staff hired in response to the crisis observed already during 2015.
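The mechanism behind the delayed alert can be shown on a toy series. In the sketch below, the same standardized upper Cusum is calibrated once on a low baseline period and once on a period in which counts are already elevated. The series, the function name, and the parameter values are invented for illustration and do not reproduce the Greek data.

```python
from statistics import mean, stdev


def first_alarm(series, calib_start, calib_end, k=0.5, h=2.0):
    """Index of the first upper-Cusum alarm after the calibration window.

    The window series[calib_start:calib_end] supplies the baseline mean and
    sample standard deviation; monitoring starts at calib_end.
    """
    calibration = series[calib_start:calib_end]
    mu, sigma = mean(calibration), stdev(calibration)
    s = 0.0
    for t in range(calib_end, len(series)):
        s = max(0.0, s + (series[t] - mu) / sigma - k)
        if s > h:
            return t
    return None  # no alarm within the observed series


# Made-up series: six baseline periods, six already-elevated periods,
# then a further surge.
series = [100, 104, 96, 102, 98, 100,
          200, 208, 192, 204, 196, 200,
          400, 420, 410]

print("calibrated on baseline:", first_alarm(series, 0, 6))
print("calibrated on elevated period:", first_alarm(series, 6, 12))
```

With the baseline calibration, the alarm coincides with the first elevated observation. With the later calibration, the elevated level is treated as normal and the alarm waits for the next surge.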
For these reasons, another potential limitation of using a Cusum model based on registration data on lodged applications is that the alarms triggered by observed changes may happen too late. In particular, such warnings may not be useful whenever national authorities must urgently decide to increase processing capacity at an earlier stage, in response to the observed increase in inflows of migrants or existing long-lasting backlogs. Since recruitment and training of new staff usually last at least a few weeks, 3 any alarm triggered by Cusum based on the monitoring of registry data of lodged applications would alert about the changes in asylum flows post factum.
The comparison of Greek data from Summer 2016 shows this problem clearly. When the asylum crisis started, many asylum-seekers were waiting in reception centers in Greece to lodge their applications. 4 In response to the crisis, the Hellenic Asylum Services progressively increased the number of staff responsible for processing asylum applications, from 218 people at the end of 2014 to 290 staff a year later and 650 workers by January 1, 2017. 5 Looking at the developments in asylum migration trends in this period, we observe that on average 4,000 applications per month were lodged in the first half of 2016, with a peak of 8,000 registered in November 2016 (see Figure 3). That notwithstanding, a large-scale exercise launched in June 2016 to preregister asylum-seekers in mainland Greece, in only the first month of activity, revealed that an additional 15,500 asylum-seekers were present but had not yet had a chance to register their applications. 6
Looking retrospectively at data on monthly applications lodged in Greece, an early warning system based on the Cusum model with a calibration period of 2008–2013 would have triggered an alarm in January 2015. In contrast, for a Cusum model based on the number of people apprehended when trying to cross the Eastern Mediterranean EU borders by land or sea, an alert would have been issued a few months earlier (see Figure 3). The latter data are collected by Frontex, the EU external border management agency, and may be better suited for reflecting the temporal pattern of real inflows, although they do not allow users to distinguish between asylum-seekers and other migrant groups (for a data discussion, see Vespe, Natale and Pappalardo 2017). Furthermore, the relationship between the number of apprehensions and the number of successful crossings is not known and may vary over time.
Sensitivity Checks and Perspectives for Early Warnings for Asylum Migration
To better prepare for future asylum inflows, many countries are in the process of creating, or have already built, monitoring systems for asylum flows (Carammia and Dumont 2018). Some countries have created predictive systems utilizing machine learning algorithms (e.g., Sweden; see Berggren and Al-Talibi 2017), while others use historical data and expert knowledge (e.g., Switzerland; see Bijak, Forster and Hilton 2017 for a review). In this section, we show results of tests of the Cusum model's utility in detecting shifts in the level of both applications for asylum and border crossings. Results of these tests are shown in Figure 4 for four groups of asylum-seekers: Syrians, Afghans, Iraqis, and Nigerians, who jointly accounted for around 60 percent of the total asylum applications lodged in Europe between 2015 and 2018.

Figure 4. Early Warning Alerts Generated by a Cusum Model Applied to Applications Lodged by Asylum-Seekers (Eurostat Data) and Detections of Irregular Border Crossings (Frontex Data) to Flows of Afghans, Iraqis, Syrians, and Nigerians to EU28 Between January 2009 and February 2019. The Color of the Bar Corresponds to the Alarm Triggered by Cusum Based on Respective Data.
In all four cases, the alarms triggered by early warning systems based on Cusum models set up on Eurostat data happened a few months earlier than for the systems based on the number of detections of irregular border crossings (see Figure 4). In part, the early alarms could be explained by some applications not being filed by applicants for the first time. For example, people who were already present in the EU used the worsening situation in their country to reapply for asylum, even though the new waves of refugees did not arrive until a few months after the situation deteriorated. This explanation is supported by the high shares of non-first-time applicants among Iraqis or Afghans. Another explanation of the alarms triggered too early is that not all asylum-seekers enter the EU irregularly. For instance, the Cusum model based on Frontex data failed to issue any alerts for groups of migrants who may have entered the EU with a visa or under a visa-free scheme (see the example of Ukraine in Figure 5).
Venezuela is another recent example of a situation in which the presence of a visa-free travel regime may hinder the detection of changes in migration trends in some data sources. Because Venezuelan citizens do not need a visa to enter most EU countries, 7 between 2009 and 2019, only eight were detained for trying to cross the border in an irregular way. Consequently, setting up a Cusum model on the Frontex dataset to detect changes in asylum inflows would be of no use for groups of citizens with similar entry rights to the EU, such as Venezuelans or Ukrainians. By contrast, basing the Cusum model on Eurostat asylum data worked quite well. Prior to 2016, very few applications were lodged, and due to this low average during the calibration period, the first alert was issued in May 2015. This alert reflects the start of Venezuelans’ increasing mobility in reaction to their home country's intensifying economic crisis in early 2015, when thousands of people started fleeing their country (UNHCR 2018). Later, new alerts were triggered as the numbers of lodged applications doubled every quarter. In such a case, to avoid multiple alerts generated one by one, it is worth considering (1) the use of a version of the Cusum model in which the upper limit is calculated based on a 12-month moving average and (2) whether to reset the Cusum value to zero every time a new alert is triggered (see Figure 6).
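The two options listed above, a moving-average baseline and a reset after each alert, can be sketched as follows. This is a hedged illustration: the function name, the use of the sample standard deviation of the trailing window, the values of k and h, and the data are assumptions made for the example rather than the specification behind Figure 6.

```python
from statistics import mean, stdev


def cusum_moving_baseline(series, window=12, k=0.5, h=2.0, reset=True):
    """Upper Cusum whose baseline is the trailing `window` observations.

    Each observation is standardized with the mean and sample standard
    deviation of the preceding `window` values, so the limit adapts as the
    level of the series changes.
    """
    s = 0.0
    alarms = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # flat window: standardization undefined, skip this point
        s = max(0.0, s + (series[t] - mu) / sigma - k)
        if s > h:
            alarms.append(t)
            if reset:
                s = 0.0  # forget the accumulated excess after an alert
    return alarms


# Made-up monthly counts: a stable year, then one sharp jump at index 14.
series = [10, 12, 9, 11] * 3 + [11, 10, 40, 12]

print("with reset:   ", cusum_moving_baseline(series, reset=True))
print("without reset:", cusum_moving_baseline(series, reset=False))
```

In this toy case, the version without a reset keeps alerting in the month after the jump because the accumulated sum is still far above the limit, whereas the version with a reset reports the jump once.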

Figure 6. Early Warning Alerts Generated by a Cusum Model Based on a 12-Month Moving Average and Upper Limit Set at Two Standard Deviations Applied to Applications Lodged by Asylum-Seekers (Eurostat Data) to Flows from Venezuela to EU28.
The decision on whether to reset the Cusum values to zero after an alarm is triggered raises substantive questions worth exploring in further investigations. If the variable being monitored is, for example, the logarithm of the daily (or weekly) rate of change, resetting is not necessary. On the other hand, resetting is particularly warranted if the analyst believes that the process based on migration counts moved to a new equilibrium and that its past ceases to matter. This is the case for nonstationary processes, whereby the characteristics governing the underlying random process change over time—a feature shared by many migration flows (Bijak et al. 2019). Nonstationarity does not have to signify a complete change of the whole underlying migration system, which is rarer (e.g., de Haas 2010), but just a shift of the volume of flows, for example, in response to the changing political, policy, or economic environment. One promising line of future enquiry could look into running two Cusum models in parallel—one for migration counts and one for rates of change—to get a better understanding of the characteristics of the underlying processes.
To externally validate the Cusum model's performance for detecting changes in levels of asylum flows, we compare it to a competing method based on the use of exponentially weighted moving averages (EWMAs) (e.g., Sonesson 2003; Frisén and Sonesson 2006). This method calculates the EWMA at time t,
To provide a fair comparison to the Cusum model, the parameters of EWMA, α and L, must be set to reasonable values. Here, these parameters are chosen using tables provided by Lucas and Saccucci (1990), which present ranges of optimal parameters for various combinations of the desired false alarm frequency and the size of the mean shift to be detected. To be roughly consistent with the Cusum model, we allow for optimal detection of a mean shift of two standard deviations, while maintaining an average run length of 500 observations for series with no change of level. This procedure results in parameter values of L = 3.0455 and α = 0.365.
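The in-control behavior implied by these parameter values can be checked by simulation. The sketch below assumes the textbook EWMA recursion z_t = α x_t + (1 − α) z_{t−1} with z_0 = 0 and two-sided asymptotic control limits ±L·sqrt(α / (2 − α)) for standardized observations. This form, the number of runs, and the random seed are assumptions of the example, since the exact formula is not reproduced in this section.

```python
import math
import random

ALPHA = 0.365   # smoothing parameter reported in the text
WIDTH = 3.0455  # control-limit multiplier L reported in the text


def ewma_run_length(rng, alpha=ALPHA, width=WIDTH, max_steps=100_000):
    """Number of in-control N(0, 1) observations until the first EWMA alarm."""
    limit = width * math.sqrt(alpha / (2.0 - alpha))  # asymptotic limit
    z = 0.0
    for t in range(1, max_steps + 1):
        z = alpha * rng.gauss(0.0, 1.0) + (1.0 - alpha) * z
        if abs(z) > limit:
            return t
    return max_steps  # censored run; rare for these settings


def average_run_length(n_runs=2000, seed=1):
    """Monte Carlo estimate of the in-control average run length."""
    rng = random.Random(seed)
    return sum(ewma_run_length(rng) for _ in range(n_runs)) / n_runs


if __name__ == "__main__":
    print("estimated in-control average run length:", average_run_length())
```

The estimate carries Monte Carlo error, so it should be compared with the target of 500 only approximately; more runs narrow the margin at the cost of run time.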
Table 2 gives the month in which an alarm was triggered by the EWMA and Cusum models based on the data from five countries, looking at both application and border apprehension series for each country. Highlighted cells indicate the earliest date for each combination of country and data type. In eight of 10 cases, both monitoring schemes trigger alarms at the same time. In the case of Ukraine, the alarm is one month earlier in the EWMA scheme, although this alarm is triggered in the very first month of the monitoring period. For the case of Afghanistan, using the Frontex data, we see that the Cusum triggers an alarm four months before the EWMA-based scheme. Overall, however, the general agreement between the two approaches gives reasons for confidence in the use of either scheme.
Table 2. Dates at Which Alarms Were Triggered by Exponentially Weighted Moving Average (EWMA) and Cusum Monitoring Schemes Using Both Eurostat and Frontex Data for Five Countries.
Overall, our experience with the use of the Cusum, or alternative approaches such as EWMA, shows the potential of simple methods as elements of an early warning system. More advanced versions should ideally be based on multiple data sources and require cross-checking of alerts between them. Such analyses would require harmonizing the data in a systematic way to ensure consistency by mapping them onto common concepts and definitions.
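One simple way to cross-check alerts between two sources is to retain an alert from one series only if the other series also raises an alert within a tolerance window. The sketch below is purely illustrative: the function name, the three-period window, and the alert indices are hypothetical, and the rule is one of many possible ways to combine sources.

```python
def cross_checked_alerts(primary, secondary, window=3):
    """Keep alerts from `primary` that have a `secondary` alert nearby.

    primary, secondary: alert times as integer period indices (e.g. months)
    window:             maximum distance, in periods, for two alerts to be
                        treated as referring to the same change
    """
    return [t for t in primary
            if any(abs(t - u) <= window for u in secondary)]


# Made-up alert months from two hypothetical sources.
applications_alerts = [10, 25]
border_alerts = [12, 40]

print(cross_checked_alerts(applications_alerts, border_alerts))
```

A strict rule of this kind would suppress alerts for groups that rarely appear in one of the sources, such as visa-free travelers in border apprehension data, so in practice it would need to be applied selectively.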
Conclusions and Recommendations
The results presented in the previous section reveal very encouraging prospects for applying early warning systems as a policy and operational support tool in the context of asylum migration. Depending on the choice of variables and the definition of thresholds, the models can be useful for a range of applications, from simple monitoring of flows to increasing preparedness and enhancing contingency planning. The proposed approach's key feature is that it enables a shift of perspective, from purely reactive toward proactive, in an environment in which any advance warning is very important from both humanitarian and resource perspectives. Although the model design and specification can be left to analysts, the built-in interactive features necessitate proper exchange and coproduction of knowledge between decision-makers and modelers, as setting a threshold at a certain level is ultimately a policy, or even political, decision.
Of course, there are some clear limitations to the proposed approach. Most importantly, in the context of uncertain and volatile asylum migration, the early warning models do not generate proper migration predictions; their main aim is to provide advice to analysts and decision-makers, rather than to predict future migration. Likewise, migration's uncertainty is not reduced by the early warning models, but they do allow for better management of flows. One implication of these limitations of early warnings is that the outcomes of the modeling process, and their inherent uncertainty, must be communicated clearly and honestly, also with respect to what the models cannot achieve. Another consideration is that formal monitoring of trends to identify early warnings is never a finished job, but rather a continuously evolving process of successive model adjustments. Finally, as mentioned before, statistical modeling cannot be the only component of a wider early warning system, even though it can be very important and offer unique insights.
Another set of challenges related to early warning models concerns data availability, comparability, quality, and completeness. As mentioned above, data limitations, such as the use of the lodging of applications, rather than arrivals, possible double-counting of applicants, the presence of applications from people already in the country, and so on, require additional care when setting up early warning models and interpreting their results. To that end, a statistical description of the data collection processes and their errors can be beneficial, providing additional insights into the uncertainty of the underlying processes.
Despite these limitations, the approach presented in this article is flexible and can be extended in several ways. In terms of exploring the potential of alternative data sources, there may be additional mileage in using “big data,” for example, from mobile phone or social media “digital footprints,” in addition to more traditional data sources, such as administrative records. Migration-related applications of “big data” and “digital traces” are currently gaining traction, with several available examples, from a feasibility study related to labor migration by Hughes et al. (2016) to an analysis of flows driven by a natural disaster (Puerto Rican migration following Hurricane Maria; Alexander, Polimis and Zagheni 2019) or to migration with a substantial asylum component (Venezuelan “exodus” in Palotti et al. 2020). The “new data” situation relevant for Europe has been recently reviewed by Spyratos et al. (2018), and a general overview of the big data perspective on migration is provided in Sîrbu et al. (2021).
Besides, some ongoing projects aim to gain more understanding of refugee mobility by monitoring social media (e.g., Twitter or Instagram posts by refugees; Iacus 2017), conflict events (e.g., based on the GDELT data, https://www.gdeltproject.org, as done at EASO 2016b), mobile phone call detail records (to better understand refugees’ secondary movements; see Sterly et al. 2019), or satellite imagery (e.g., Satellite Sentinel Project, http://www.satsentinel.org, or Migration Radar 2.0). Other projects look at collecting information on web searches for asylum/migration-related topics. For example, Böhme, Gröger and Stöhr (2020) recently found that georeferenced internet search data can effectively predict bilateral migration flows. Reconciling the differences between various sources remains a key challenge for future model development.
Even though alternative sources on their own are unlikely to enable estimation of the levels of migration flows and eliminate the need for traditional data, their potential should not be underestimated. First, asylum-seekers’ propensity to rely on mobile technologies is well documented (Kingsley 2016), so their journeys are likely to leave many “digital traces,” enabling an analysis of flows along different routes, subject to privacy constraints. Second, these sources offer unique and timely data, which may be able to detect changes in trends at much finer-grained time scales than is the case with traditional data. Indeed, a uniquely appealing feature of “digital traces” is their potentially finer time-granularity, in comparison with even the highest-frequency registrations (e.g., Spyratos et al. 2018). As mentioned before, current developments at EASO aim to include some of those data to boost the EU's asylum preparedness system (for a prototype work, see Carammia, Iacus and Wilkin 2020).
Subject to ensuring that the ethical, privacy, and data protection standards are appropriately maintained and that the information is safeguarded against “dual use” by malevolent actors, there is much potential in developing formal methods for linking traditional and nontraditional data sources. In the case proposed here, a natural starting point would be to feed the digital trace data into a change-point-detection model, such as (1) or (2), to identify and predict possible changes in patterns. As many such data sources can have arbitrarily fine time-granularity, an important caveat is that calibrating the warning thresholds depending on the data frequency would require attention so that the models do not “overreact.” After all, a threshold of “one-in-a-hundred-observations” would randomly trigger an unwarranted alert on average once every two years for weekly data, once every three to four months for daily data, and every four days for hourly data.
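The arithmetic behind the “one-in-a-hundred-observations” caveat can be written out explicitly. The sketch below simply converts a per-observation false-alert rate into the expected time between unwarranted alerts at different data frequencies; the function name and the calendar conventions (52 weeks, 365 days, 8,760 hours per year) are assumptions made for the illustration.

```python
def years_between_false_alerts(rate_per_observation, observations_per_year):
    """Expected time, in years, between unwarranted alerts."""
    return 1.0 / (rate_per_observation * observations_per_year)


RATE = 1 / 100  # one unwarranted alert per hundred observations

weekly = years_between_false_alerts(RATE, 52)
daily = years_between_false_alerts(RATE, 365)
hourly = years_between_false_alerts(RATE, 365 * 24)

print(f"weekly data: about {weekly:.1f} years")
print(f"daily data:  about {daily * 12:.1f} months")
print(f"hourly data: about {hourly * 365:.1f} days")
```

The same calculation can be inverted to choose a per-observation rate that keeps the expected gap between unwarranted alerts constant as the data frequency increases.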
A second natural extension of the proposed approach would be to move from country-specific early warning models to an interlinked multicountry and multisource early warning system so that the advantages of a more general migration modeling framework could be utilized more fully (see the Online Appendix for examples). In particular, the elements of such a system could additionally include information on policy changes and some external variables, such as intensity of conflict in the origin regions, some of which may be available instantaneously or only with a short delay. In this context, the input of policy or country-specific experts into the modeling process becomes even more important: detailed domain knowledge can not only help fine-tune the system but also provide more holistic insights that are not available from quantitative data alone.
Yet another extension could consist of combining early warning models with computer simulations to provide an interactive framework for stress-testing various policy and operational responses to changes in migration flows. Even though the design and implementation of such models can be resource-intensive, simulation approaches can offer unique insights into systems with many interacting elements and a prominent role of human agency. Prototype simulation models for other forms of migration exist, including environmental (Smith et al. 2008) and labor migration (Klabunde and Willekens 2016), which could serve as starting points for developing a similar model specifically tailored for asylum.
In terms of recommendations for users of the models and migration management practitioners, three main lessons emerge from the modeling exercise presented here. First, uncertainty in migration can be managed to some extent by using appropriate tools. Second, model building must be a continuously updated process, rather than a one-off event. Third, honest, open, and transparent two-way communication between analysts and model users is a prerequisite for the models' successful practical application. Still, given the high volatility of asylum flows, the very high stakes involved in asylum migration, and the need for rapid humanitarian, operational, and policy responses, a methodological movement toward a proactive early warning system enabling more informed contingency planning is already long overdue.
Footnotes
Acknowledgments
The paper follows from a review carried out for the European Asylum Support Office (EASO), reported in Bijak, Forster and Hilton (2017). We thank Tim Cooper and Teddy Wilkin for their comments on the initial report, and two anonymous reviewers as well as the Editors of IMR for their very helpful suggestions on the initial draft. All the views and interpretations in this paper are those of the authors and do not necessarily represent the views of the EASO, European Commission, or any institutions with which the authors are or were affiliated. The authors are listed in reverse alphabetical order.
Data Availability Statement
The publicly available Frontex data used in our analysis can be obtained from https://frontex.europa.eu/along-eu-borders/migratory-map/ and the Eurostat statistics on asylum are at
(as of August 1, 2020). Data from Greece and Italy can be obtained from the respective authorities upon request.
Declaration of Conflicting Interests
JN is a former staff member of EASO, at the time of writing employed at the EC Joint Research Centre, Brussels, and currently at Cedefop, Thessaloniki. MC is also a former staff member of EASO, currently working at the University of Catania. JB, JJF, and JH declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The background work presented in this paper has been prepared for and funded by the European Asylum Support Office (EASO), under the contract EASO/2015/290, and continued within the ESRC Centre for Population Change (ES/K007394/1) and the Horizon 2020 project QuantMig: Quantifying Migration Scenarios for Better Policy (H2020-Migration-870299).
Notes
References
