Abstract
Scholars who have sought to identify the triggers of rare political events have met with limited success. With respect to civil war, studies teach us to expect conflict where it is feasible. However, although we understand where civil conflict occurs, we do not quite understand when it occurs. Focusing on civil conflict, I argue that time-variant and time-invariant explanations relate to the outcome by means of two distinct causal processes, which has implications for the identification of triggers of rare events. I provide an easily implementable approach to improve rare event estimation that uses matching to leverage constant attributes to estimate the effects of rare predictors. I demonstrate the utility of this procedure by providing an aggregate and disaggregate example of civil conflict onset estimation.
The spring of 2011 saw protest and violence sweep across Syria, plunging the country into civil war. Despite 20 years of seeming stability, Syria has long had a real propensity for civil conflict. As key studies of civil war have taught us, we may expect civil war where it is feasible: in countries with low GDP, mountainous terrain, and a large population (e.g. Collier et al., 2009; Fearon and Laitin, 2003). Based on these characteristics, Syria had been prone to conflict and the outbreak of civil war should hardly come as a surprise. However, based on these same characteristics, Syria had been equally prone to conflict 5, 10 or 20 years ago. This illustrates our limited success at generalizing across rare political events—whether it is war, rebellion, regime change, ethnic violence, or genocide: we understand where these major events occur, yet despite considerable effort, we have limited understanding of the triggers that determine when they occur.
The seemingly insurmountable difficulties in identifying the triggers of rare events have led scholars to focus on identifying where conflict can occur (e.g. Buhaug et al., 2014; Hegre et al., 2013) and to advance error, noise, or idiosyncratic exogenous shocks as causal triggers for when conflicts occur (e.g. Fearon and Laitin, 2003; Gartzke, 1999; Harff, 2003). With respect to civil conflict in particular, this has led to a deeply unsatisfactory conclusion: where civil war can occur, it will occur eventually (Collier et al., 2009, pp. 23–24).
Why has the quantitative advancement over the last 20 years resulted in such limited understanding of the onset of rare, but highly salient issues, such as civil conflict, war, and mass killing? Scholars have sought the answer to this issue in the improvement of data quality (e.g. Cederman et al., 2011), the adoption of mixed-method research (e.g. Goertz, 2016), disaggregation and micro-level analysis (e.g. Kalyvas, 2006; Weinstein, 2003), a greater emphasis on effect sizes over p-values (Ward et al., 2010), and recently through the inclusion of event data (e.g. Chiba and Gleditsch, 2017).
A key issue that has received little attention is that current practices of conflict research are geared toward finding explanations that differ across, but not within units. The widespread use of observation-year data has unintentionally focused our attention on indirect, consistent, but time-invariant (constant) ‘attribute’ variables, e.g. mountains, over direct, time-variant ‘predictor’ variables, e.g. elite rivalry in authoritarian regimes or economic and environmental crisis. The importance of within-unit change is generally recognized and scholars of civil conflict commonly include some time-variant variables in their analyses (e.g. Chiba and Gleditsch, 2017). While this is a step in the right direction, I argue that this is not sufficient, as not all time-variant change is equal. As a consequence, instances of change that may actually trigger civil conflict remain both under-studied and under-theorized.
This article addresses the difficulties associated with the analysis of predictors, a time-variant type of causal variable. It focuses on civil conflict because it is a salient issue that has seen a large number of quantitative studies over the last two decades, but has nonetheless seen little progress with respect to these time-variant predictors. The article does not aim to predict conflict; instead, it engages core issues that must precede any meaningful study into the prediction of rare events. A better understanding of these issues should improve our theories, data collection, and analysis.
The main contribution of this paper is threefold. First, this study argues that further improvements in our understanding of conflict onset can be found in a re-evaluation of what it means for a predictor of rare events to vary. Building on the Neyman–Rubin framework of causation (e.g. Holland, 1986), this study posits that: (1) time-variant ‘predictors’ and time-invariant ‘attributes’ relate to the outcome by means of two distinct causal processes; and (2) that this distinction is especially relevant to the analysis of rare events such as civil conflict. We should therefore distinguish between these two processes both in the construction of our theories and in the design of our empirical tests.
Second, this paper argues that dominant estimation procedures are geared toward finding time-invariant attributes over time-variant predictors. Estimation difficulties that are particular to predictors may lead scholars to underestimate or prematurely discard the role of predictors in conflict outcomes. This paper outlines some of the problems for estimation and provides suggestions on how to deal with these problems as part of a mixed-method research agenda (e.g. Goertz, 2016). While there exist excellent statistical techniques, e.g. boolean logit and probit (Braumoeller, 2003), that can be applied for modeling attributes and predictors, these techniques have not entered the statistical mainstream (Mahoney et al., 2013: 79). Moreover, the adaptation of these techniques for modeling attributes and predictors nonetheless requires a full understanding of the problems outlined in this paper. Through a mixed-method approach that uses simple and familiar statistical tools, such as bootstrapping and matching, this paper provides accessible descriptive tools that are particularly suited for exploratory analysis of predictors of rare events. Moreover, by separating the estimation of attributes from the estimation of predictors, the suggested approach allows for a more efficient collection of higher quality data on predictors.
Third, I argue for a broad mixed-method research agenda with a renewed focus on the subset of time-variant predictors that may explain when civil conflict occurs. The quantitative advances of the previous decade have robustly established in which countries we may expect civil war to occur. Furthermore, new disaggregate studies are currently pinpointing where within these countries conflicts are most likely to originate. Despite considerable effort, however, only limited progress has been made toward establishing when we may expect civil conflict. A renewed focus on the predictors of conflict will not only provide us with a better understanding of conflict dynamics, but may also open up research into the conditions that make conflict less likely to occur.
This paper proceeds as follows. The first section establishes the distinction between the causal processes of direct time-variant predictors and indirect time-invariant attributes. Then it turns toward the implications for past and current civil conflict research. Third, the paper provides some suggestions for analysis as well as a simple estimation procedure based on established practices to deal with the problems related to the evaluation of predictors of rare events. Two examples, one aggregate and one disaggregate, further demonstrate the utility of this procedure for the estimation of rare predictors.
Interaction of predictors and attributes
Mountainous terrain has been argued to be one of the strongest predictors of civil conflict onset (e.g. Fearon and Laitin, 2003; Ward et al., 2010). Mountains are constant, however; surely, a constant cannot predict such violent change as the onset of civil conflict? Yet mountains are nonetheless found to be associated with an increased propensity for conflict. Terrain features, such as mountains or jungle, may allow insurgents to avoid government forces and increase insurgent survivability (Fearon and Laitin, 2003), and may therefore enable civil conflict, without actually causing civil conflict itself. Therefore, constant variables such as terrain have a key role in the prediction of conflict: although these constant attributes cannot trigger civil war directly, they may interact with and amplify the effects of immediate predictors.
Causal explanations of rare events are therefore composed of two distinct components: (1) one or more ‘predictors’—the immediate conditions that may directly cause the event; and (2) a variety of ‘attributes’—the environmental conditions that interact with the direct predictor conditions to cause the event. The Neyman–Rubin framework of causation provides us with the conceptual framework to distinguish between these time-variant predictors and their time-invariant environment. Within this framework, units—e.g. countries, provinces, or individuals—are assumed to be subjected to a treatment or a control condition of the trigger or predictor. Moreover, regardless of whether a unit is subjected to the treatment or not, it is essential for causal inference that any unit could potentially be exposed to either condition. Predictors should, in other words, have the potential to vary within units. In contrast, when a variable only differs across units, it cannot be a cause of variation in outcome within units, but it is what we call an attribute of the unit (Berk, 2004; Holland, 1986). 1 For example, although mountains may vary across units, like gender, they do not vary within units and are therefore an attribute rather than a potential predictor. 2
To a geologist, however, the claim that mountains are constant is surely false: mountain ranges erode, plates collide. However, from a political science perspective this change is irrelevant. Clearly, not all variation over time makes an explanatory variable qualify as a potential predictor. An attribute, therefore, need not be constant in order to have an indirect relation to the outcome. Rather, attributes exist on much larger time frames than the phenomena to be explained: attributes have an indirect relation to the outcome when they are relatively time invariant. Therefore, the distinction between an attribute and a predictor should be made on the basis of the variation they display relative to the outcome.
When the outcome is a rare event such as civil conflict, variables that change only incrementally cannot explain when a country will experience civil war. For example, although economic development and population do vary over time within countries, they do not generally vary on a scale that is relevant to the onset of civil war: the US was highly developed with a high population in 1950 and it still is today. 3 Although attributes such as GDP per capita may show us where a rare event such as civil conflict may occur, without substantial temporal variation they cannot tell us when these events occur. 4 For any potential cause to directly bring about a rare event outcome such as civil war, it should display significant change.
Causal complexity and the analysis of attributes and predictors
Conflict events are causally complex events (e.g. Braumoeller, 2003) with particular implications for the analysis of attributes and predictors. The causal complexity of conflict is reflected in conditionality, equifinality, and rare events. First, conditionality 5 refers to the conditional relationship between attributes and predictors in determining the outcome; both need to be present to generate the outcome. With respect to conflict, for example, attributes determine which countries are at risk and predictors determine which at-risk countries experience conflict. The Netherlands, for example, is not at risk of conflict based on its attributes; it has a high GDP per capita, a small population, a long democratic tradition, is embedded in the EU, and bridges are its main source of elevation. Owing to these attributes, a predictor such as a severe economic crisis would cause problems, but would not lead to civil war. Yemen, conversely, is at risk; it has a low GDP per capita, mountainous terrain, an authoritarian government, and deep religious and ethnic divisions. Owing to these attributes, a severe economic crisis could potentially trigger civil war. To put it in an analogy, attributes are the powder keg and predictors the spark. 6 As such, the effect of the predictor is conditional on the values of the attributes, and vice versa.
Second, equifinality refers to the existence of multiple causal paths that lead to the same outcome. Equifinality is a common characteristic of predictors. Within an at-risk country, a variety of predictors could potentially trigger civil war, e.g. severe economic crisis, a purge of regime elites, or ethnic riots. It is theoretically possible for equifinality to exist within attributes if (combinations of) attributes independently determine whether a case is at risk of the outcome—when either mountains or low GDP per capita would independently leave a country at risk of conflict, for example. However, as extant theoretical explanations of rare conflict events do not actually suggest equifinality of attributes, this paper assumes that equifinality is primarily a characteristic of predictors: at-risk countries could experience conflict through a variety of predictors.
Finally, rare events refer to the existence of many cases in which a conflict outcome is absent and only a few in which a conflict outcome is present. The outcome is rare not only because a small subset of the population is at risk based on the attributes, but also because predictors that have sufficient impact to bring about the outcome are rare themselves. For example, with respect to civil conflict, only a small subset of all countries are at risk and even within these at-risk countries civil war onsets are rare. This implies that predictors are rare themselves, especially if we take equifinality into account.
What do conditionality, equifinality, and rare events mean for the estimation of attributes and predictors? For the estimation of attributes, conditionality, equifinality, and rare events are relatively unproblematic. Conditionality does weaken the correlation between attributes and the outcome because the outcome is conditional on the presence of one or more predictors. Still, the effect of conditionality is mitigated by equifinality of predictors; owing to the equifinality of predictors, an at-risk country will have multiple paths to civil conflict. Also, countries that are not at risk based on the attributes will consistently lack the outcome. Therefore, the effects of attributes will be reflected in statistical estimation irrespective of whether any predictors are included in the estimation. Attributes will be correlated with conflict because the trigger effect of predictors is reliably captured by the stochastic element or error term. 7 For the estimation of attributes, war is indeed the error term (after Gartzke, 1999). Lastly, while the outcome is rare, attributes are not: both high and low values of the attributes occur with regularity in the data. Taken together, attributes display a rare event data structure for which standard solutions exist (e.g. King and Zeng, 2001). 8
Conversely, for the estimation of predictors, conditionality, equifinality, and rare events are potentially problematic. Conditionality weakens the correlation between predictors and the outcome because the effect of the predictor is conditional on the value of the attributes. When the at-risk population is a small subset of the total population, as is the case in conflict and mass violence events, the inclusion of cases that have no risk of conflict weakens the correlation between predictors and conflict because it adds observations with positive values for predictors that cannot bring about civil conflict. In estimations based on the full population, as is the dominant procedure in quantitative studies (Mahoney and Goertz, 2006), the effect of the predictor will therefore mostly be zero and only occasionally correlated with the outcome. In estimations based on the at-risk population only, however, the predictor will be strongly correlated with the outcome. These estimation problems are further exacerbated by equifinality and rare occurrence of predictors: even in the at-risk population, most of the occurrences of civil conflict will be generated by rival predictors.
To help illustrate the effects of conditionality, equifinality, and rare events, Figure 1 shows a schematic diagram of a logit estimation for attributes and predictors. It has 10 observations with half of the population at-risk. There are two observations with the predictor (filled dot), one of which is also an at-risk observation. There are also two observations with a rival predictor (crossed dot), one of which is also an at-risk observation. As shown on the left curve for attributes, both conflict observations occur at high risk based on the attributes. The correlation between attributes and the outcome is weakened by conditionality because most of the at-risk observations lack the predictor, but it is still there. As shown on the right curve for predictors, however, conditionality and the equifinality of the rival predictor leave very little if any correlation between the predictor and the outcome.

Figure 1. Equifinality and conditionality on attributes and predictors.
In sum, of the small number of predictor observations, most will not display the outcome, because they occur outside the at-risk population (conditionality). Of the observations where the outcome is present, only a fraction will have the predictor. Together, conditionality, equifinality, and rare events make predictors of civil conflict especially difficult to examine. It is therefore unsurprising that empirical studies of civil conflict generally do not address the mechanisms of change that are expected to predict civil conflict.
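The attenuation summarized above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the estimation used in this paper: the population size, the at-risk share, the frequency of the predictor and its rival, and the trigger probability are all hypothetical values chosen for illustration.

```python
import random

def simulate(n=20000, at_risk_share=0.2, predictor_rate=0.05,
             rival_rate=0.05, trigger_prob=0.5, seed=1):
    """Simulate units where a trigger produces the outcome only in at-risk units."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        at_risk = rng.random() < at_risk_share
        predictor = rng.random() < predictor_rate
        rival = rng.random() < rival_rate
        # Conditionality: triggers only matter when the unit is at risk.
        # Equifinality: the predictor or its rival can each trigger the outcome.
        outcome = at_risk and (
            (predictor and rng.random() < trigger_prob)
            or (rival and rng.random() < trigger_prob)
        )
        rows.append((at_risk, predictor, outcome))
    return rows

def risk_difference(rows):
    """P(outcome | predictor) minus P(outcome | no predictor)."""
    with_p = [o for _, p, o in rows if p]
    without_p = [o for _, p, o in rows if not p]
    return sum(with_p) / len(with_p) - sum(without_p) / len(without_p)

rows = simulate()
full_population = risk_difference(rows)
at_risk_only = risk_difference([r for r in rows if r[0]])
print(f"full population: {full_population:.3f}")
print(f"at-risk only:    {at_risk_only:.3f}")
```

Under these assumed values, the association between the predictor and the outcome is several times weaker in the full population than in the at-risk subset, even though the predictor's effect within at-risk units is identical in both comparisons.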
The dearth of predictors in civil conflict theory
The previous decade witnessed the accumulation of a solid body of knowledge with respect to the onset of civil war. 9 However, these quantitative studies have predominantly tested the effects of attributes on civil war onset. 10 A key theoretical debate with respect to the onset of civil war has revolved around the question whether the onset of civil war can best be explained by opportunity structures on the one hand (e.g. Fearon and Laitin, 2003), or by incentives like “greed” or “grievance” on the other (e.g. Collier and Hoeffler, 2004). In this debate, opportunity relates to the structural environment that a rebellion requires to survive; greed stands for the private gains that motivate and aid a rebellion; and grievance focuses on relative deprivation and inequality of large groups of people that fuel the willingness to fight the central government.
When we examine these causal theories from the perspective of attributes and predictors, it becomes clear that these theories are not on equal footing. Theoretical arguments related to feasibility or opportunity structures have an attributal relation to the outcome. Fearon and Laitin (2003), for example, find opportunity structures or the conditions that favor insurgency to best predict civil war onset (see also Collier et al., 2009). As few individuals are sufficient to start and sustain a rebellion, motivation is unimportant (e.g. Collier et al., 2009; Fearon and Laitin, 2003; Mueller, 2004). In the words of Collier et al. (2009: 2): “where a rebellion is financially and militarily feasible it will occur”. Specifically, poverty, state weakness, rough terrain, and a large population make countries prone to insurgency (Collier et al., 2009; Fearon and Laitin, 2003). It should be clear that these time-invariant proxies for feasibility are attributes and cannot predict when civil war will occur.
Moreover, even though theoretical arguments of greed and grievance may have a time-variant relation to the outcome, they are seldom operationalized as predictors. Most proxies adopted to test the greed thesis are time-invariant from an onset perspective and cannot explain when civil war occurs. For example, in their 1998 and 2004 articles, Collier and Hoeffler find finance availability to be conducive to civil war onset, particularly in the form of natural resources. Exploitable resources may both provide an incentive for the capture of state power and pay for the rebellion itself by providing economic incentives to rebel soldiers (see also Regan and Norton, 2005). However, resource rents are never examined in a manner that would capture temporal variation, nor do the theoretical explanations explicitly account for change.
Like greed, grievance shows little variation over time. Grievance is operationalized as ethnic or religious fractionalization (e.g. Collier and Hoeffler, 2004; Collier et al., 2009; Fearon and Laitin, 2003) or economic inequality (e.g. Regan and Norton, 2005) and is (mostly) constant across all years for each country. Although there is some disagreement on whether grievance affects civil war onset, 11 scholars have consistently measured grievance as constant (e.g. Buhaug et al., 2014).
Similarly, the promising move toward disaggregation, while greatly improving measurement and analysis of (regional) attributes, has yet to result in a focus on time-varying predictors (e.g. Cederman et al., 2011; Raleigh and Hegre, 2009). Few of the hypotheses of greed, grievance, or feasibility seem to take any of the mechanisms of change into consideration. 12 Consequently, most theories of civil war onset are theories of attributes, rather than predictors. We are therefore in need of theoretical refinement that can explain the change to civil conflict.
Only recently, a few insightful studies have begun to take change and predictors seriously (e.g. Chiba and Gleditsch, 2017; Roessler, 2011). In a study of conflict prediction, Chiba and Gleditsch (2017) find modest, yet inconsistent, increases in civil war prediction for some predictors. They therefore conclude that structural conditions or attributes contribute most to predictive models. While these conclusions are in line with the estimation difficulties addressed above, they underrate the potential of predictors in our causal understanding of conflict. 13
Toward a mixed-method research agenda for predictors
Insofar as scholars have researched predictors, they have found weak and inconsistent effects. Researchers who seek to investigate predictors of conflict have several quantitative and qualitative options available, such as interaction models (e.g. Clark et al., 2006), boolean logit/probit (Braumoeller, 2003), hierarchical modeling (e.g. Gelman and Hill, 2006), QCA (Ragin, 2008), matching (e.g. Sekhon, 2009) and selection of at-risk cases (e.g. Mahoney and Goertz, 2004; Seawright and Gerring, 2008). Unfortunately, a detailed treatment of qualitative strategies, such as QCA, is beyond the scope of the paper, which will focus on the advantages and disadvantages of key quantitative options.
Currently, there exist various cross-case quantitative strategies to model the conditionality between attributes and predictors, e.g. specify an interaction, boolean probit (or logit), and hierarchical modeling (e.g. Braumoeller, 2003; Clark et al., 2006; Gelman and Hill, 2006). However, these are seldom used for conflict research because scholars commonly assume that the conditionality of predictors in dichotomous models, e.g. logit and probit, can simply be captured by compression (e.g. Berry et al., 2010). Compression is an artifact of dichotomous models that fit the predicted probability on an S-curve. From a marginal effects perspective, compression results in an interactive relationship between variables. Specifically, the marginal effects of a specific variable are dependent on the other variables, particularly at the center (e.g. Berry et al., 2010; Rainey, 2016). However, this is fully conditional on the position on the curve; the effects of the variables in the model do not change relative to the other variables in the model. The model is therefore one of additive conditionality, which fits well with our understanding of attributes that combine to form a baseline risk of conflict. However, the relationship between attributes and predictors is one of multiplicative conditionality; it is independent of the position on the curve, but depends on the values of the attributes. The conditional relationship should therefore be specified as an interaction with a product term, 14 a hierarchical model with varying intercept (and slope), or boolean logit/probit.
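The contrast between additive and multiplicative conditionality can be written out for a single attribute A and a single predictor P. The notation is an illustrative assumption rather than the specification estimated in this paper: Λ denotes the logistic function and the β coefficients are generic.

```latex
% Additive specification: conditionality arises only through compression
\Pr(Y = 1 \mid A, P) = \Lambda(\beta_0 + \beta_1 A + \beta_2 P)

% Multiplicative specification: product term between attribute and predictor
\Pr(Y = 1 \mid A, P) = \Lambda(\beta_0 + \beta_1 A + \beta_2 P + \beta_3 A P),
\qquad \Lambda(z) = \frac{1}{1 + e^{-z}}
```

In the second form, the coefficient on P on the latent scale is β2 + β3 A, so the effect of the predictor varies with the value of the attribute itself and not only with the position on the probability curve.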
These models have the advantage of precision and are well supported by a well-developed literature, but all suffer from the “curse of dimensionality” (Bellman, 1961: 94) and require considerable variation (Braumoeller, 2003). Unfortunately, the marginal effects of predictors occur at the high end of the attribute distribution (Berry et al., 2012), where observations are sparse and where we can expect high multicollinearity and large standard errors. While large standard errors correctly reflect the uncertainty in the data, they should not lead researchers to abandon research into predictors. Researchers that want to use interactive models to model predictors should note: (1) that the effect of the interaction in dichotomous models is dependent on the position on the probability curve—and can be both positive and negative (e.g. Ai and Norton, 2003); (2) that the focus should be on the total marginal effect of the predictor over the marginal effects or statistical significance of the individual terms (e.g. Brambor et al., 2006); and (3) that predictors will have larger standard errors than attributes. Fortunately, conflict events are rare, but this does force the field to accept greater uncertainty with respect to predictors with large effect sizes (e.g. Ward et al., 2010; Ziliak and McCloskey, 2008).
Moreover, boolean probit is the only one of these quantitative approaches that explicitly deals with equifinality (Braumoeller, 2003). It therefore provides the statistical machinery to model the complex interaction between attributes and predictors, provided that researchers are aware of the causal relation between attributes and predictors. With sufficient and high-quality data on a variety of attributes and predictors, boolean probit is clearly superior to alternatives. However, it requires data on all relevant predictors for all observations, does not allow for partial collection of data on predictors (i.e. for at-risk cases only), and does not provide guidance on which predictors to enter into the model. It is therefore less suited for exploratory analysis. It also has yet to enter the statistical mainstream. As the study into predictors matures, boolean probit should become increasingly relevant. For now, we should also consider alternatives.
Separating the analyses of attributes and predictors
To further research into predictors, I suggest a simple and easily implementable technique that may be used for exploratory analysis of potential predictors. The suggested approach is a simple ‘split sample’ approach that leverages the analysis of static attributes to identify at-risk observations. These at-risk observations can then provide the focus for additional data collection on potential predictors and also help evaluate the effects of these predictors. I explicitly do not argue that this is the best or only approach possible. The main comparative advantage of the suggested approach is that it is easy to adopt for applied researchers and that it allows for a more efficient collection of higher quality data on predictors. It is therefore particularly well suited for exploratory analysis. However, as research into predictors matures, boolean probit should probably become the dominant estimation technique to estimate the effects of rival predictors.
Attribute analysis
For the analysis of attributes, we can leverage the fact that attributes are mostly static and that their effect on the outcome should therefore not change over time. A country that does not have any significant changes in its attributes for over 20 years, for example, should have a similar risk of conflict over these 20 years, irrespective of the year in which conflict actually occurs. Therefore, the relevant outcome variable for the estimation of attributes is not ‘conflict onset’ but ‘conflict occurrence’, i.e. does conflict occur at any time during this 20-year period. This allows for a static analysis of attributes in which we can count these 20 years as a single observation instead of 20 observations. The static analysis of attributes is a modest, but key, innovation, not only because it eliminates data inflation, but more importantly because it mostly eliminates omitted variable bias from omitted predictors. While predictors are key to conflict onset, they have no meaningful relationship to conflict occurrence. 15 Therefore, attributes can be analyzed separately from predictors in a static model.
To analyze attributes, I suggest that researchers (1) recode the outcome variable to a static variable, e.g. conflict occurrence, based on substantive changes in attributes and (2) select a single observation for each period in which there is no meaningful change in attributes. To generate a static variable, the researcher should first identify the periods in which there is no meaningful change in attributes for any given country, hereafter ‘country spell’. Here, researchers should use substantive knowledge of the measurement scale of the attribute to determine whether a change in attributes is meaningful and be explicit about their choices. 16 The researcher should then code the static ‘occurrence’ outcome. Occurrence should take a positive value for all observations in any country spell if there are any onsets within the spell. For example, if a country has two civil conflicts in a 50-year period without any real changes in attributes, we would code conflict occurrence as 1 for all 50 years. However, if both conflicts occur during a 20-year period of authoritarianism and no conflicts occur during a 30-year democratic period, we would code conflict occurrence as 1 for the 20 years of authoritarianism, and 0 for the democratic period.
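The recoding step can be sketched as follows. The record layout and the toy values are hypothetical, and the spell identifier is assumed to have been assigned beforehand by the researcher on the basis of substantive knowledge of the attributes.

```python
from collections import defaultdict

# Hypothetical country-year records: (country, year, spell_id, onset).
# spell_id marks a period without meaningful change in attributes.
records = [
    ("A", 1990, "A-authoritarian", 0),
    ("A", 1991, "A-authoritarian", 1),
    ("A", 1992, "A-authoritarian", 0),
    ("A", 1993, "A-democratic", 0),
    ("A", 1994, "A-democratic", 0),
    ("B", 1990, "B-1", 0),
    ("B", 1991, "B-1", 0),
]

def code_occurrence(records):
    """Return the records with a static 'occurrence' value appended.

    Occurrence is 1 for every year of a spell that contains at least one onset,
    and 0 for every year of a spell without any onset.
    """
    spell_has_onset = defaultdict(int)
    for _, _, spell, onset in records:
        spell_has_onset[spell] = max(spell_has_onset[spell], onset)
    return [(c, y, s, o, spell_has_onset[s]) for c, y, s, o in records]

recoded = code_occurrence(records)
for row in recoded:
    print(row)
```

In this toy example, all three authoritarian years of country A receive an occurrence value of 1, while its democratic years and all years of country B receive 0.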
After recoding the outcome variable, the researcher should enter each spell as a single observation in the attribute analysis. The researcher can do this by selecting a single year from each spell to enter into the analysis. Here, the assumption is that each spell represents a single observation and that all country-years can be substituted within a given spell. Alternatively, a researcher could use bootstrapping to make repeated draws from the entire population. The size of each draw should be the same size as the total number of spells to avoid artificial inflation of the number of observations. 17 Because each observation in a spell has an equal probability of being selected, bootstrapping is effectively averaging over all attributes within a spell. The main advantage of bootstrapping is that it automates the selection process. It also takes all information in the data into account. For example, longer country spells contain more information than shorter spells. With bootstrapping, observations from longer spells have a greater probability to be drawn. After selecting observations by hand or by bootstrapping, the researcher can estimate the effects of the attributes.
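A minimal sketch of the bootstrap selection is given below. It makes several assumptions that are not specified in the text: draws are taken with replacement, the data layout is hypothetical, and the mean of the occurrence variable stands in as a placeholder for whatever attribute model the researcher would actually fit to each draw.

```python
import random

# Hypothetical recoded country-years: (spell_id, attribute_value, occurrence).
country_years = (
    [("A-1", 0.8, 1)] * 20    # long spell with conflict
    + [("B-1", 0.2, 0)] * 30  # long spell without conflict
    + [("C-1", 0.7, 0)] * 5   # short spell without conflict
)

def bootstrap_draws(rows, n_draws, seed=0):
    """Yield samples of country-years, each as large as the number of spells.

    Rows are drawn with replacement from the whole population, so observations
    from longer spells are more likely to be selected.
    """
    rng = random.Random(seed)
    n_spells = len({spell for spell, _, _ in rows})
    for _ in range(n_draws):
        yield rng.choices(rows, k=n_spells)

def mean_occurrence(sample):
    """Placeholder statistic; a real analysis would fit the attribute model here."""
    return sum(occ for _, _, occ in sample) / len(sample)

draws = list(bootstrap_draws(country_years, n_draws=1000))
estimates = [mean_occurrence(d) for d in draws]
print(sum(estimates) / len(estimates))
```

Keeping each draw at the number of spells, rather than the number of country-years, is what prevents the artificial inflation of observations described above.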
Case selection and additional data collection
One of the biggest advantages of separating the analysis of attributes from the analysis of predictors is that it allows for a much more efficient collection of higher-quality data on predictors (after King and Zeng, 2001). Following attribute analysis, the researcher can estimate the attributal susceptibility to conflict by generating the predicted probabilities for the occurrence of conflict for all country-years based on their attribute values. This allows the researcher to collect data on predictors for those cases where the outcome can actually occur (Mahoney and Goertz, 2004).
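As a rough sketch of this step, the predicted probability can be computed from the coefficients of a fitted binary-outcome model and then used to select at-risk cases for data collection. The logit link, the coefficient values, the variable names, and the 0.5 threshold below are all illustrative assumptions and not estimates from this paper.

```python
import math


def susceptibility(attributes, coefficients, intercept):
    """Predicted probability of conflict occurrence from a logit model."""
    linear = intercept + sum(coefficients[name] * attributes[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def select_at_risk(rows, threshold):
    """Keep the country-years whose susceptibility exceeds the threshold."""
    return [row for row in rows if row["susceptibility"] > threshold]


# Hypothetical coefficients, NOT estimates from the article.
coefficients = {"log_gdp_pc": -0.8, "log_population": 0.4}
intercept = 1.0

country_years = [
    {"id": "poor, populous", "log_gdp_pc": 6.0, "log_population": 17.0},
    {"id": "rich, small", "log_gdp_pc": 10.0, "log_population": 14.0},
]
for row in country_years:
    attrs = {name: row[name] for name in coefficients}
    row["susceptibility"] = susceptibility(attrs, coefficients, intercept)

at_risk = select_at_risk(country_years, threshold=0.5)
```

With these made-up values only the poor, populous case passes the threshold, so data collection on predictors would concentrate there.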
Research into predictors is still in its infancy and would therefore benefit from more and higher-quality data on potential predictors. Unfortunately, high-quality data on predictors is often hard to collect. Elite rivalry within authoritarian regimes, for example, could be a potential predictor of civil conflict as it may lead regime elites to provoke conflict to win internal rivalry (e.g. see Chiozza and Goemans, 2011; Roessler, 2011; Van der Maat, 2015). However, high-quality data on elite rivalry within secretive authoritarian regimes is difficult and time intensive to collect. Selection of at-risk cases on the basis of attributes for data collection could, therefore, propel research into key predictors.
Predictor analysis
For the analysis of predictors, we can leverage the attributal susceptibility generated by attribute analysis to reduce the problems of conditionality, equifinality, and rare predictors: first, by selecting cases where conflict can actually occur; and second, by matching observations in which the predictor occurs to observations with a similar risk of conflict based on their attribute values. For the analysis of predictors, simple selection of observations with a high attributal susceptibility can mitigate the problem of conditionality. Recall that conditionality implies that the effect of predictors is conditional on the attribute values and that substantively significant predictors should, therefore, be related to the outcome in the at-risk population. Researchers should, therefore, select on attributal susceptibility to conflict and analyze predictors for those cases where the outcome can actually occur (e.g. Mahoney and Goertz, 2004).
Matching on attributes can further mitigate the problems of equifinality and rare events. Matching is a quantitative case comparison (Seawright and Gerring, 2008) and pre-processing technique (Iacus et al., 2012) that increases the quality of the comparison, i.e. increases balance in the data. Of particular importance for the analysis of predictors is the risk of conflict captured in the attributal susceptibility dimension. Matching on attributal susceptibility confines the test of predictors to cases that are most similar with respect to the risk of conflict. Matching on attributal susceptibility has a couple of advantages. First, the selection of similar units for comparison avoids extrapolation beyond the data, which potentially mitigates equifinality, i.e. multiple predictors that lead to the outcome. Predictors are rare and therefore do not necessarily have observations along the entire range of attributal susceptibility. Rival predictors that occur at high levels of attributal susceptibility and outside the support for the predictor under examination may therefore weaken the correlation between the predictor and outcome. Matching on attributal susceptibility therefore ensures that the data is relevant for the predictor under examination.
There exist a variety of matching approaches, such as Propensity Score Matching, Mahalanobis Distance Matching, or Coarsened Exact Matching. Of these, Coarsened Exact Matching (CEM) is most suited for matching on attributal susceptibility because it allows the researcher to balance observations on key attributes, emulating a blocked randomized experimental design (King and Nielsen, 2016). For predictor analysis, researchers should at least balance on attributal susceptibility. This assumes that observations that lack the predictor but have the same risk of conflict as observations with the predictor are good counterfactuals. Moreover, CEM allows the researcher to easily match on key additional attributes, which ensures that balance does not worsen on these particular attributes. For example, if the researcher considers countries that completely lack mountains (e.g. small deltas and island states) substantively important for creating counterfactuals, the researcher can add mountains to the CEM matching to ensure that predictor observations with mountains are not matched to those without.
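The logic of matching on coarsened attributal susceptibility can be sketched as follows. This is a simplified stand-in for a full CEM implementation: the function name `cem_match`, the equal-width bins, and the field names are assumptions, and the stratum weights that CEM software normally supplies for estimation are omitted.

```python
def cem_match(rows, n_bins, exact_keys=()):
    """Keep only rows in strata that contain both predictor and comparison cases.

    Strata are equal-width bins of the susceptibility score on [0, 1],
    optionally crossed with attributes that must match exactly.
    """
    strata = {}
    for row in rows:
        # Coarsen the score; a score of exactly 1.0 falls in the top bin.
        bin_index = min(int(row["susceptibility"] * n_bins), n_bins - 1)
        key = (bin_index,) + tuple(row[k] for k in exact_keys)
        strata.setdefault(key, []).append(row)

    matched = []
    for key, members in strata.items():
        has_predictor = any(r["predictor"] for r in members)
        has_comparison = any(not r["predictor"] for r in members)
        if has_predictor and has_comparison:
            matched.extend({**r, "stratum": key} for r in members)
    return matched


# Hypothetical observations; `predictor` marks cases where the predictor occurs.
rows = [
    {"id": 1, "susceptibility": 0.11, "predictor": 1, "mountains": True},
    {"id": 2, "susceptibility": 0.12, "predictor": 0, "mountains": False},
    {"id": 3, "susceptibility": 0.13, "predictor": 0, "mountains": True},
    {"id": 4, "susceptibility": 0.95, "predictor": 1, "mountains": True},
    {"id": 5, "susceptibility": 0.55, "predictor": 0, "mountains": True},
]
on_score = cem_match(rows, n_bins=10)
on_score_and_mountains = cem_match(rows, n_bins=10, exact_keys=("mountains",))
```

Matching on the score alone keeps observations 1 to 3. Adding mountains as an exact-match attribute drops observation 2, so that a predictor case with mountains is no longer compared with a case without them.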
In order to show how the suggested approach can be adopted for exploratory analysis of predictors of civil conflict, I provide two empirical examples: the first is an aggregate analysis loosely based on the body of knowledge that was developed over the previous decade (e.g. Collier et al., 2009; Fearon and Laitin, 2003; Regan and Norton, 2005; Sambanis, 2004); the second reanalyzes the insightful disaggregate study of Cederman et al. (2011) for time-variant predictors. The empirical part of this study is designed to show how scholars can adopt these simple tools to examine how attributes and predictors interact.
Example I: aggregate analysis of civil conflict
Here, I adopt the suggested mixed-method approach to provide a simple aggregate analysis of civil war onset to account for the direct and indirect effect of predictors and attributes, respectively. I use a base dataset of country-year data from 1970 until 2004 18 that adopts the PRIO/Uppsala coding of civil conflict onset, 19 which captures the onset of conflict in its earliest stages and, therefore, mitigates endogeneity concerns. 20
The independent variables can be categorized into time-invariant attributes and time-variant predictors. Our first task is to estimate the attributal susceptibility on the basis of time-invariant attributes to establish the conditions under which civil war can actually occur. Primary candidates are attributes that have been found to robustly correspond to civil conflict in a wide range of studies (Hegre and Sambanis, 2006): the economic development of a country in per capita GDP; the size of the country measured as a log of the population; 21 the extent to which a state has mountainous terrain; 22 and social fractionalization, 23 i.e. the higher value of ethnic and religious fractionalization. Of the variables that did not demonstrate consistent effects (Hegre and Sambanis, 2006), economic inequality as measured by a Gini coefficient is included, because recent studies do find convincing effects for inequality (e.g. Buhaug et al., 2014). 24 As can be seen from the descriptives in Table 1, countries differ greatly with respect to these attributes.
Table 1. Descriptives of outcome, predictor and attributes
As an example of a time-variant predictor for civil conflict I adopt economic crisis, which I argue is different in kind from regular economic growth and decline as measured by change in GDP per capita. Particularly, an economic crisis—measured as especially large shocks in GDP per capita—may be a catalyzer for change within society, because it (1) may make grievances more salient and (2) may make individuals more risk-acceptant. Economic crisis may increase grievance by directly threatening key physiological needs (Maslow, 1943). Economic crisis generates widespread unemployment and makes essential goods such as bread and housing prohibitively expensive. This may increase the willingness to take risks in order to address these needs. Also, people are generally more risk-acceptant to prevent loss (e.g. Jervis, 1992; Kahneman and Tversky, 1979). The loss generated by economic crisis is therefore expected to make individuals more risk-acceptant. This increased risk acceptance induced by economic crisis within substantial parts of society in turn lowers the threshold for collective action (e.g. Kuran, 1991) and could therefore be a potential predictor for civil violence.
I operationalize a severe economic crisis as a 10% decline in GDP with respect to the previous year, lagged by one year. Note that I do not consider economic crisis a particularly novel or theoretically interesting variable as one could argue that it is captured in previous studies by change in GDP (e.g. Collier et al., 2009). However, it is a good example of an undisputed predictor that is widely believed to be able to directly predict civil conflict and therefore serves as a good example for the proposed technique. Moreover, the actual time within which a predictor is expected to lead to onset is unclear. I assume that an economic crisis predictor will make a country more likely to experience civil war within the following 2 years. 25
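One possible reading of this operationalization is sketched below. The function name `code_crisis` and the GDP figures are hypothetical, and the lag is interpreted as follows: the indicator for year t is 1 if GDP fell by at least 10% between years t-2 and t-1.

```python
def code_crisis(gdp_by_year, threshold=0.10):
    """Return {year: 0 or 1} for a one-year-lagged severe crisis indicator.

    The indicator for year t is 1 if GDP in t-1 was at least `threshold`
    (as a proportion) below GDP in t-2. Years without both earlier
    values are left out.
    """
    crisis = {}
    for year in sorted(gdp_by_year):
        previous, before = year - 1, year - 2
        if previous in gdp_by_year and before in gdp_by_year:
            change = (gdp_by_year[previous] - gdp_by_year[before]) / gdp_by_year[before]
            crisis[year] = int(change <= -threshold)
    return crisis


# Hypothetical GDP series: a 15% drop from 2000 to 2001.
gdp = {2000: 100.0, 2001: 85.0, 2002: 86.0, 2003: 80.0}
crisis = code_crisis(gdp)
```

Here the 2001 collapse is recorded in the 2002 observation. The milder decline from 2002 to 2003 stays below the 10% threshold and would in any case only appear in a 2004 observation.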
As can be seen from the first column of Table 2, economic crisis, which is the only time-variant predictor, does not seem related to the onset of civil conflict. The attributes on the other hand all seemingly correspond to civil war onset. However, when we examine these attributes separately in a static model, the picture changes. Contrary to earlier findings, mountains are no longer predictive of civil conflict. Therefore, it seems that mountainous terrain does not hold up to the adjustment of sample size; the effects of mountains, if any, on the occurrence of low-intensity civil conflict are small and inconsistent. It may still be that mountains affect the intensity and recurrence of conflict. However, mountains tell us little about where to expect low-intensity conflict to initiate. Similarly, social fractionalization does not hold up well to the smaller sample size.
Table 2. Attribute estimation
Standard errors are in parentheses: *p < 0.1; **p < 0.05; ***p < 0.01.
As expected, the attributes of per capita GDP and population robustly correspond to the occurrence of conflict: countries with a low GDP per capita and a high population are clearly in the set of countries where conflict can occur. It is more striking, however, that the Gini index seems robust to the smaller sample size. Few studies find effects for inequality as a measure for grievance, probably because of data limitations (e.g. Collier and Hoeffler, 2004; Fearon and Laitin, 2003; Hegre et al., 2003). Although any effects of inequality could potentially be an artifact of imputation choices, the aggregate analysis of attributes indicates that there may be more to inequality as a condition under which civil war is more likely to occur. This finding is supported by the disaggregate study of Cederman et al. (2011). Because per capita GDP and high population have a strong, and inequality has a weak, relation to conflict occurrence, we will want to use these attributes in our estimation of the attributal susceptibility for each country-year observation. Moreover, because a likelihood ratio test indicates that the full model makes the probability of observing the data significantly more likely than alternatives without Mountains or Social Fractionalization, the attributal susceptibility is estimated from the full model of column 2 in Table 2.
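For readers who want to reproduce this kind of model comparison, a likelihood ratio test for dropping a single attribute can be sketched as follows. The log-likelihood values are hypothetical and are not taken from Table 2; the p-value formula holds only for one degree of freedom.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_restricted):
    """LR statistic and p-value for dropping ONE parameter (1 degree of freedom)."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, not values from Table 2.
statistic, p_value = likelihood_ratio_test(loglik_full=-210.0, loglik_restricted=-213.0)
```

A small p-value suggests that the restricted model fits the data significantly worse, which is the basis on which an attribute would be retained in the susceptibility model.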
Now we can adopt Coarsened Exact Matching (Iacus et al., 2012) to match on the attributal susceptibility. Before matching, the attributal susceptibility has a univariate distance measure of 0.20, which leads us to conclude that there is moderate imbalance. Because the attributal susceptibility is very fine grained, we can match very closely by coarsening the variable only a little, creating 50 strata, of which 47 generate successful matches, thereby greatly reducing distance between treatment cases with economic crisis and the most similar counterfactual (distance 0.04).
The first model of Table 3 shows the effect of crisis on conflict onset within 2 years for all countries and all years after attributal susceptibility matching: economic crisis displays a consistent relationship with the onset of conflict. The effect of economic crisis is substantial as the risk of experiencing civil conflict increases from 9 to 15% on average. 26 Moreover, when we estimate the effect of crisis on the half of the countries with the greatest propensity for conflict (attributal susceptibility is greater than 0.54) in model 2, the effect of economic crisis becomes even greater. When countries at risk experience economic crisis, the average probability of conflict of 17% is almost doubled to 32%. Moreover, the effect retains significance despite the drop in the number of observations. As can be seen from models 3 and 4, these effects hold even when accounting for a country’s conflict history and attributes. 27 Specifically, economic crisis increases the risk of conflict by 8% in both dynamic specifications. Model 4 is presented to show that key attributes are already accounted for in the earlier models, because of the matching on attributal susceptibility.
Table 3. Country experiences a civil conflict within 2 years
Analysis after attribute matching. Standard errors are in parentheses: *p < 0.1; **p < 0.05; ***p < 0.01.
Civil conflict has seen a large number of aggregate studies, making it an unlikely candidate for novel findings. Nonetheless, the results from the suggested approach indicate that not all previously established “predictors” of conflict are equal. The attributes per capita GDP, population, and inequality indicate where civil conflict might occur, whereas the effects of mountains and social fractionalization seemingly do not. Moreover, economic crisis may trigger civil conflict in countries that are more likely to experience civil conflict based on these attributes. More importantly, however, this example shows that the inflation of observations with respect to attributes leads us to underestimate the standard errors of these attributes. This in turn may lead to an undue focus on attributes and may hide the effects of time-variant predictors of conflict.
Example II: disaggregate analysis of civil conflict
The promising turn toward disaggregate analysis in the field of civil conflict research is yet to coincide with a focus on changing predictors. Currently, most studies examine how disaggregate attributes correspond to civil violence. 28 Moreover, insofar as time-variant variables are included in the analysis of conflict, within-unit change is marginal and generally not on a scale that is informative for the prediction of civil war. For example, Cederman et al. (2011) adopt mostly static measures in their study of the effects of economic inequality between ethnic groups (horizontal inequality) on civil conflict. They find that ethnic minorities that have greater wealth or poverty than the national average are overrepresented in civil conflict. Given large discrepancies between the findings of qualitative and quantitative studies with respect to inequality, this is an important finding. However, while the study shows that grievances improve our understanding of where civil conflict can occur, the question of whether grievances can explain when conflict is likely to occur is left open. I therefore not only seek to argue that time-invariant factors are best estimated using a static model, but more importantly, that the omission of time-variant causal hypotheses is a missed opportunity.
I expect that the onset of grievances should be especially likely to induce civil conflict. This leads me to adopt the Cederman et al. (2011) data to answer the question: does the onset of political exclusion correspond to a higher risk of civil conflict? First, I re-analyze the Cederman et al. (2011) model to account for the effects of change in grievances. Their preferred model is presented in column 1 of Table 4. Following Cederman et al. (2011), observations run from 1990 until 2005 and exclude ethnic groups with a population smaller than 500,000. Inequality stands for the symmetric logged measure of horizontal inequality, Excluded is a dummy that represents political exclusion, Power Balance is ethnic group’s share of the population, and No. Excluded Groups is the number of politically excluded groups in a country. 29 Inequality, Excluded, Power Balance, GDP/capita, and No. Excluded Groups are all attributes that change very little over time. However, even though the independent variables in the Cederman et al. (2011) model are time-invariant attributes, their effects are estimated using country-year observations. 30
Table 4. Attribute estimation
Standard errors are in parentheses: *p < 0.1; **p < 0.05; ***p < 0.01.
Therefore, model 2 in Table 4 shows a bootstrapped static model that estimates the effects of these attributes without country-year inflation. 31 Results show that the effects of political exclusion, GDP per capita and the number of excluded groups hold, despite the much smaller sample size. Moreover, the measure of horizontal inequality remains significant at the 90% level. However, the consistency of the effects of power balance disappears completely and the coefficient even changes sign. The effects of power balance in the original model are therefore likely an artifact of the inflated sample size. Still, based on a likelihood ratio test of the model with and without power balance, power balance is retained in the estimation of the attributal susceptibility.
Now it is but a small step to estimate the attributal susceptibility to select and match on the estimated attributes. Results reveal that half of the ethnic groups have less than a 0.4% probability of experiencing conflict at any time. Moreover, only a quarter of groups have a probability of conflict that is greater than 3.1% and only 1 in 10 cases has a conflict probability greater than 15.7%. Apparently the baseline propensity for conflict based on the attributes is very low. 32 Still, we can match on attributal susceptibility to estimate the effects of the onset of political exclusion as the predictor of grievance. The dummy for the onset of political exclusion is lagged by one year to mitigate endogeneity concerns. The results after matching on attributal susceptibility are shown in the first column of Table 5. 33 The effects of onset of political exclusion are consistent and very large, even before selection: the 95% confidence interval of the effect is somewhere between 1.5 and 34.5 percentage points. This is a strong effect that on average increases the probability of conflict occurring within 2 years from 0.04 to 0.18. Still, when we select those observations that are most likely to experience civil conflict on the basis of their attributes (model 2), the effect of political exclusion retains significance despite the smaller sample size and becomes even greater. Specifically, the 50% of groups that have the greatest attributal susceptibility for conflict on average have a 0.05 probability of experiencing conflict. This increases to as much as a 0.21 probability within 2 years following the political exclusion of the ethnic group. The 1–40 percentage point increase in conflict risk attributed to political exclusion shows that time-variant predictors are too important to ignore.
Table 5. Ethnic group experiences a civil conflict within 2 years
Standard errors are in parentheses: *p < 0.1; **p < 0.05; ***p < 0.01.
Models 3 and 4 show two panel specifications: in model 3 the effects of political exclusion onset are estimated before matching and selection; and in model 4 these effects are estimated after matching on attributal susceptibility and selection of the 50% of ethnic groups most likely to experience conflict. 34 Only after matching on attributal susceptibility and selecting on those ethnic groups that are most likely to experience conflict are the effects of political exclusion onset uncovered. The differences between these models show that dominant estimation approaches favor attributes over predictors. 35 The results presented here not only show that political exclusion as a time-variant predictor of grievance has a substantial effect on civil war onset, but also re-affirm the argument that the interaction between time-invariant attributes and time-variant predictors may merit a closer look in a wide range of studies.
Conclusion
To date, scholars who have sought to examine the predictors of civil conflict have met with weak and inconsistent results. As I argue in this paper, the attributal context determines the relationship between potential predictors and the outcome. 36 We should therefore expect predictors to interact with relatively static attributes, which should inform both our construction of theory and measurement. However, despite their importance, predictors are hard to estimate, owing to problems of conditionality, equifinality, and rare events. As a consequence of these difficulties, time-variant predictors are not only rare in civil conflict research, but also meaningful variation of independent variables within cases is almost non-existent. For example, GDP and population size may change, but they do not vary on a scale that can predict the onset of civil conflict. Therefore, it is unsurprising that we are better able to determine where we may expect civil conflict than when it may occur.
The aim of this study was not to address the difficulties associated with panel data, 37 but rather to demonstrate how the two distinct causal processes of attributes and predictors can be integrated and estimated using simple and easily adoptable approaches. By separating the analysis of attributes from the analysis of predictors, researchers can both improve data collection and better identify predictors; matching and selection on key attributes can address some of the difficulties associated with the analysis of predictors. As can be seen from the examples, this exploratory approach has the potential to uncover provocative causal associations that would otherwise remain hidden in the data. A simple analysis of aggregate and disaggregate findings uncovers that severe economic crisis and political exclusion of ethnic groups may directly induce the onset of conflict in countries or regions that are at risk, thereby pointing toward grievance as a potential immediate cause of civil conflict. On the basis of these very limited studies it seems that feasibility may explain in which country or region civil conflict will occur, but grievance may explain when civil conflict will occur.
Severe economic crisis and political exclusion are merely examples of potential causal predictors that remain hidden under conventional panel data analysis. There should exist multiple predictors that can trigger civil conflict. The exploratory analysis proposed in this paper is easy to implement and should be able to uncover these predictors. As research into predictors matures, scholars should, ultimately, adopt approaches like Boolean Probit to test predictors against predictors and fully model causal complexity. Even then, the approach suggested in the paper should help researchers to explore their data. Last, a better understanding of the predictors of civil conflict should also greatly improve conflict prediction models. Currently, most models for conflict prediction are based on attributes alone. Without predictors, these predictive models are missing a key part of the puzzle. Ultimately, predictors may help pioneer research into prevention of civil conflict as this requires models that can predict when civil war occurs. For theoretical and normative reasons scholars should take predictors seriously.
Supplementary Material
Supplementary Material, Web_Appendix_Simple_Complexity for Simplified complexity: Analytical strategies for conflict event research by Eelco van der Maat in Conflict Management and Peace Science
Footnotes
Acknowledgements
I would like to thank Giacomo Chiozza, Joshua Clinton, Ryan Moore, Gary Goertz, Lee Seymour, James Lee Ray, Sheida Novin, Frederico Batista, Mollie Cohen, Matthew DiLorenzo, Bryan Rooney, Steve Utych, and the audiences at MPSA and APSA for their insightful comments and constructive criticisms.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
