A warning on separation in multinomial logistic models

Abstract

Oppenheim et al. (2015) provides the first empirical analysis of insurgent defection during armed rebellion, estimating a series of multinomial logit models of continued rebel participation using a survey of ex-combatants in Colombia. Unfortunately, many of the main results from this analysis are an artifact of separation in these data – that is, one or more of the covariates perfectly predicts the outcome. We demonstrate that this can be identified using simple cross tabulations. Furthermore, we show that Oppenheim et al.’s (2015) results are not supported when separation is explicitly accounted for. Using a generalization of Firth’s (1993) penalized-likelihood estimator – a well-known solution for separation – we are unable to reproduce any of their conditional results. While our (re-)analysis focuses on Oppenheim et al. (2015), this problem appears in other research using multinomial logit models as well. We believe that this is both because the discussion on separation in political science has primarily focused on binary-outcome models, and because software (Stata and R) does not warn researchers about separation in multinomial logit models. Therefore, we encourage researchers using multinomial logit models to be especially vigilant about separation, and discuss simple red flags to consider.

Keywords

Multinomial logit, separation, penalized maximum likelihood, rare events

The analysis of qualitative outcomes (i.e. binary, ordinal, or nominal data) is ubiquitous in political science, with research into conflict, coalition formation, vote choice, policy adoption, etc. When predictors are also discrete, researchers need to be mindful of possible separation: that is, perfect prediction of the outcome. Under separation, sample analysis produces implausible estimates of population parameters. While this is now well understood in binary-outcome models (Rainey, 2016; Zorn, 2005), applied research often fails to recognize that the same concern is present with ordinal- and nominal-valued outcomes.

Oppenheim, Steele, Vargas, and Weintraub (2015) provides a useful example of the consequences of failing to recognize such separation. In their multinomial logistic models of insurgent defection, Oppenheim et al. (2015) includes several interactions of rare, binary predictors. In so doing, they partition the data such that there are several combinations with no observations. As a result, the reported risk ratios on these predictors are in the hundreds of thousands – implying, for example, that less than 1 in 1 million similar rebels would be captured instead of demobilizing. However, when we explicitly account for separation in Oppenheim et al.’s (2015) models – adding a penalty to the score vector (Kosmidis and Firth, 2011) – none of their conditional expectations are supported.

The Oppenheim et al. (2015) piece presents a larger question: why does separation often go unrecognized in multinomial logistic models? We discuss two possible reasons: i) most discussions of separation in political science have focused on the binary-outcome case, and ii) statistical software (Stata and R) does not handle separation in a consistent manner. As such, we encourage researchers with nominal outcomes to be particularly vigilant about possible separation.

Separation in (multinomial) logistic regression

With discrete data, separation occurs when one or more covariates correctly classify – that is, predict the outcome for – each observation. More formally, complete separation occurs when a subvector $X_{s} \subseteq X$ deterministically locates each observation in $Y = \{1, 2, \dots, g\}$, where $g$ is the number of categories in the outcome. Under separation, maximum likelihood estimates of the coefficients on the offending covariates do not exist: the likelihood continues to increase as they tend to infinity (Albert and Anderson, 1984). Yet, in many cases software routines return finite-valued estimates, reporting the last value obtained before the likelihood search halted. This quantity provides no information on the underlying population parameter, and thus researchers should avoid drawing inferences about the relationship between the predictor(s) and the outcome.
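The non-existence of the maximum likelihood estimate can be seen in a minimal sketch. The example below uses the binary special case ($g = 2$) for brevity, and the six observations are hypothetical toy data, not drawn from any study discussed here. It evaluates the logit log-likelihood along a path of ever larger coefficients and shows that it keeps rising toward zero without reaching a maximum.

```python
import math

# Hypothetical, completely separated toy data: y = 1 exactly when x = 1.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 0, 1, 1, 1]

def log_likelihood(beta, intercept):
    """Binary-logit log-likelihood for the toy data."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = intercept + beta * xi
        # log P(y=1) = -log(1 + exp(-eta)); log P(y=0) = -log(1 + exp(eta))
        signed = eta if yi == 1 else -eta
        total += -math.log1p(math.exp(-signed))
    return total

# Follow the path intercept = -beta / 2, which keeps the two groups separated.
betas = (1, 5, 10, 20, 40)
values = [log_likelihood(b, -b / 2) for b in betas]
for b, ll in zip(betas, values):
    print(f"beta = {b:>2}: log-likelihood = {ll:.6f}")
```

Each larger coefficient yields a strictly higher log-likelihood, so an optimizer only stops because of its convergence tolerance or iteration limit, which is why the reported value is uninformative.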

While these issues seem well understood by political scientists when estimating binary-outcome models (Rainey, 2016; Zorn, 2005), less care is taken by researchers estimating multinomial models (e.g. Forsberg, 2013; Koga, 2011). For example, Fortna (2015) explicitly notes that the effect of a predictor, Africa, cannot be reported in a set of binary logistic models due to perfect prediction. Yet, when several variables – including the main predictor, Terrorist Rebel Group – perfectly classify the outcomes in subsequent multinomial logistic models, the issue is not discussed.

What explains this oversight given that the consequences of separation are exactly the same as those confronted in the binary logit case? First, many researchers may be unaware of the correspondence between separation in the binary and multinomial logistic models. While many of the seminal works on separation discuss the more general $g$-group logistic model (Albert and Anderson, 1984), treatments in political science tend to focus predominantly on the binary logit model for increased clarity in the presentation. As an unintended result, researchers may be less aware of the need to consider separation in multinomial models.

Second, statistical software packages often have different conventions for handling separation, which may confuse researchers. Worse still, some packages vary in their own respective treatment of separation across different commands. In Stata, for example, if there is perfect prediction in a binary logit model (logit), offending observations are dropped and users receive a warning message. However, with multinomial logit (mlogit), these observations are retained and no warning message is provided (Long and Freese, 2006). Similarly, in R, neither the multinom nor the mlogit function warns of possible separation. As such, researchers cannot rely on statistical software warnings to identify separation.

To demonstrate the consequences of failing to recognize separation in multinomial logistic models, we next reanalyze Oppenheim et al. (2015).

Oppenheim et al. (2015)

Conflict studies has increasingly turned to individual-level data to better understand the microfoundations of political violence. While others have focused on initial joining behavior, Oppenheim et al. (2015) offers the first empirical analysis of continued participation in an ongoing rebellion. Specifically, why do rebels choose to defect or remain loyal? Oppenheim et al. (2015) argues that a combatant’s decision is a function of their initial reason for joining (ideological or economic), the subsequent behavior of the group (undergoing ideological indoctrination and/or participating in peasant abuse), wartime experiences (pressure from armed forces), and interactions therein. For concision, we summarize Oppenheim et al.’s (2015) theoretical expectations in Table 1.

Table 1.

Oppenheim et al.’s (2015) expectations for demobilization.

	Ideological joiner (↓; H1)	Economic joiner (↑; H2)
Political indoctrination		↓ (H3)
Material gain	↑ (H4)	↓ (H5)
Besieged by armed forces	↓ (H6)	↑ (H7)

Note: Arrows indicate whether demobilization/side-switching is more (↑) or less (↓) likely under the given conditions. The column titles give the unconditional expectations (H1 and H2), while each of the elements give the conditional expectations (i.e. interactions of the row and column predictors).

To test these, Oppenheim et al. (2015) uses survey data from Fundación Ideas para la Paz on Colombian ex-combatants. Specifically, their analysis uses a sample of 582 respondents who joined left-wing guerilla groups (e.g. FARC, ELN) but were subsequently captured (49), individually demobilized (506), or switched sides to the paramilitary group (27). These data are used to construct a nominal dependent variable (i.e. captured, demobilized, switched), which is then analyzed in a series of multinomial logistic models. Using a set of binary predictors and interactions (e.g. Economic need × Political indoctrination) on the rebels’ histories, Oppenheim et al. (2015) concludes broad support for their expectations.

However, we find that several of their main results are due solely to separation. While the unconditional relationships hold (H1 and H2), none of the conditional arguments, which serve as the basis for much of their theory, find support. We demonstrate this in two ways. First, a simple cross tabulation presented in Table 2 shows that there are no observations for several of the conditions – meaning there is no variation within that category. For example, the sample contains no instances of a captured rebel who both joined for economic reasons and was not politically indoctrinated. The same is true for two other combinations of conditions (as indicated by the bold zeros in Table 2). This absence of observations is not surprising given they occur under conditions when rare outcomes (i.e. only 49 individuals were captured and 27 switched) are intersected with rare predictors (i.e. only 38 respondents reported peasant abuse). In the presence of these empty cells, maximum likelihood estimates do not exist.
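The cross-tabulation check described above can be sketched in a few lines. The counts are taken from the first panel of Table 2 (Indoctrination and Economic need); the variable and function names are illustrative assumptions, and the counts are expanded into one record per respondent only to mimic the shape of a raw survey file.

```python
from collections import Counter
from itertools import product

# Counts from the first panel of Table 2, keyed by (indoctrination, economic need).
table2_panel1 = {
    "Captured":    {(0, 0): 4,  (0, 1): 0,  (1, 0): 40,  (1, 1): 5},
    "Demobilized": {(0, 0): 42, (0, 1): 14, (1, 0): 401, (1, 1): 78},
    "Switched":    {(0, 0): 9,  (0, 1): 2,  (1, 0): 11,  (1, 1): 7},
}

# Expand the counts into one (outcome, indoct, need) record per respondent.
records = [
    (outcome, indoct, need)
    for outcome, cells in table2_panel1.items()
    for (indoct, need), n in cells.items()
    for _ in range(n)
]

def empty_cells(rows):
    """Return every (outcome, indoct, need) combination with zero observations."""
    counts = Counter(rows)
    outcomes = sorted({r[0] for r in rows})
    return [
        cell for cell in product(outcomes, (0, 1), (0, 1))
        if counts[cell] == 0
    ]

flagged = empty_cells(records)
print(flagged)  # [('Captured', 0, 1)]
```

The single flagged combination is a captured rebel with economic need and no indoctrination, which is the empty cell behind the Economic need × Political indoctrination interaction.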

Table 2.

Cross tabulation of outcome categories and interactions from Oppenheim et al. (2015).

	Indoctrination and Economic need				Abuse and Ideological reasons				Abuse and Economic need
	Indoct. (0)		Indoct. (1)		Abuse (0)		Abuse (1)		Abuse (0)		Abuse (1)
	Need (0)	Need (1)	Need (0)	Need (1)	Ideol. (0)	Ideol. (1)	Ideol. (0)	Ideol. (1)	Need (0)	Need (1)	Need (0)	Need (1)
Captured	4	0	40	5	41	6	2	0	43	4	1	1
Demobilized	42	14	401	78	430	70	31	4	412	88	31	4
Switched	9	2	11	7	24	2	2	1	17	9	3	0

What can researchers do? Heinze and Schemper (2002) show that the penalized-likelihood strategy of Firth (1993) can recover finite-valued estimates in the presence of separation – a solution that is now widely used in political science (Zorn, 2005). Kosmidis and Firth (2011) extend this penalized-likelihood strategy to multinomial logit, enabling us to reanalyze Oppenheim et al. (2015) using a bias-corrected MNL. In short, this penalizes the likelihood by the square root of the determinant of the information matrix (i.e. Jeffreys prior). More intuitively, this is roughly analogous to adding a small value to the frequencies in Table 2, ensuring no separation. We prefer this over other solutions as: i) it is the natural extension of a solution already familiar to political scientists, and ii) functions are available (brglm2 in R) to easily implement this procedure.
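For reference, the penalty can be written out explicitly. The notation here is an assumption introduced for illustration and does not appear in the original: $L(\beta)$ is the likelihood, $\ell(\beta) = \log L(\beta)$, and $I(\beta)$ is the Fisher information matrix for the coefficient vector $\beta$.

$$L^{*}(\beta) = L(\beta)\,\lvert I(\beta)\rvert^{1/2}, \qquad \ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\log\lvert I(\beta)\rvert$$

In logistic models $\lvert I(\beta)\rvert$ shrinks toward zero as fitted probabilities approach 0 or 1, so the penalty term pulls the maximizer of $\ell^{*}$ back to finite values even when the unpenalized likelihood has no maximum.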

The results are presented in Table 3, which first reproduces the findings from Oppenheim et al. (2015) using multinomial logit (MNL) and then attempts to replicate these results using Kosmidis and Firth’s (2011) penalized multinomial logit (Firth-MNL).¹ In the absence of separation, these estimators produce similar results, but here we see dramatic differences. To illustrate, consider the interaction Economic need × Indoctrination in Model 3. With MNL, we observe an implausibly low estimate ($-12.580$) on the interaction itself, and a correspondingly implausibly high estimate (12.923) on the constituent term Economic need – each with p-values less than $0.1 \times 10^{-80}$. Corresponding risk ratios indicate that an economic joiner without indoctrination is 409,755 times more likely to demobilize than be captured. Put differently, this implies that we would see 1 million demobilized rebels before a single captured one.
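The arithmetic behind these risk ratios can be checked directly by exponentiating the reported coefficients. The sketch below uses the rounded naive MNL estimates from Table 3 (Model 3, Demobilized vs. Captured), so the first ratio comes out at roughly $4.1 \times 10^{5}$, close to but not exactly the 409,755 reported from unrounded estimates. Variable names are illustrative.

```python
import math

# Naive MNL estimates from Table 3, Model 3, Demobilized vs. Captured.
b_need = 12.923             # Economic need (constituent term)
b_need_x_indoct = -12.580   # Economic need x Political indoctrination

# Factor by which the demobilized-vs-captured relative risk is multiplied when
# Economic need switches from 0 to 1, for rebels WITHOUT indoctrination.
rrr_no_indoct = math.exp(b_need)

# Same factor for rebels WITH indoctrination: constituent term plus interaction.
rrr_indoct = math.exp(b_need + b_need_x_indoct)

print(f"no indoctrination: {rrr_no_indoct:,.0f}")
print(f"indoctrination:    {rrr_indoct:.2f}")
```

The two coefficients nearly cancel, so the implied ratio for indoctrinated rebels is an unremarkable value close to 1.4. The enormous ratio is confined to the one covariate combination that has no captured rebels.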

Table 3.

Replication of Models 3–5 from Oppenheim et al. (2015) – naïve multinomial logit (MNL) vs. bias-corrected multinomial logit (Firth-MNL).

	Model 3				Model 4				Model 5
	Demobilized		Switched		Demobilized		Switched		Demobilized		Switched
	MNL	Firth-MNL	MNL	Firth-MNL	MNL	Firth-MNL	MNL	Firth-MNL	MNL	Firth-MNL	MNL	Firth-MNL
Ideological reasons					−0.182	−0.237	−2.280**	−1.903*
					(0.439)	(0.459)	(0.921)	(0.989)
Economic need	12.923***	0.903	12.600***	0.698					0.686	0.590	1.697**	1.597**
	(0.493)	(1.598)	(0.719)	(1.750)					(0.592)	(0.515)	(0.723)	(0.654)
Economic need × Political indoctrination	−12.580***	−0.632	−11.051***	0.773
	(0.661)	(1.664)	(0.986)	(1.870)
Political indoctrination	0.067	0.157	−2.081***	−1.966***
	(0.414)	(0.544)	(0.597)	(0.717)
Ideological reasons × Peasant abuse					10.625***	−1.094	12.049***	0.015
					(1.083)	(1.895)	(2.133)	(2.353)
Economic need × Peasant abuse									−2.551*	−2.310	−15.939***	−3.245
									(1.416)	(1.490)	(1.611)	(2.255)
Peasant abuse					0.515	0.319	0.966	0.934	1.207	0.823	2.100*	1.805*
					(0.741)	(0.696)	(0.981)	(0.962)	(1.030)	(0.869)	(1.185)	(1.060)
Age at recruitment	0.086	0.082*	0.026	0.025	0.093	0.089**	0.044	0.045	0.084	0.079*	0.041	0.041
	(0.057)	(0.042)	(0.067)	(0.053)	(0.058)	(0.042)	(0.071)	(0.053)	(0.058)	(0.043)	(0.069)	(0.054)
Year of birth	−0.004	0.000	−0.089**	−0.084**	−0.006	−0.002	−0.098	−0.091**	−0.007	−0.004	−0.077*	−0.072*
	(0.037)	(0.030)	(0.043)	(0.039)	(0.041)	(0.030)	(0.047)	(0.040)	(0.038)	(0.031)	(0.045)	(0.039)
Education	0.021	0.025	0.086	0.083	0.018	0.021	0.097	0.094*	0.019	0.022	0.085	0.072
	(0.033)	(0.034)	(0.054)	(0.051)	(0.033)	(0.034)	(0.063)	(0.052)	(0.033)	(0.034)	(0.062)	(0.052)
Male	0.763*	0.760**	1.237*	1.123*	0.741*	0.740**	1.117	1.018*	0.763*	0.759**	1.186*	1.074*
	(0.411)	(0.315)	(0.654)	(0.617)	(0.389)	(0.314)	(0.681)	(0.605)	(0.410)	(0.316)	(0.693)	(0.608)
Constant	7.029	−0.378	174.405**	165.513**	12.272	4.787	190.685**	177.578**	14.603	7.218	147.897*	139.215*
	(74.171)	(60.315)	(85.602)	(77.218)	(81.784)	(60.039)	(93.280)	(79.333)	(76.571)	(61.472)	(89.897)	(77.309)
Observations	582	582	582	582	582	582	582	582	582	582	582	582
AIC	527.177	496.766			538.872	508.609			536.827	506.446

Following Oppenheim et al. (2015), Captured is used as the reference category. Standard errors in parentheses. * = p < 0.1, ** = p < 0.05, *** = p < 0.01.

With Firth-MNL, the coefficient estimates on these same covariates are dramatically attenuated (–0.856 and 1.113, respectively) and do not approach significance even at the p < 0.1 level. The same is true for the interactions considered in Models 4 and 5, corresponding exactly to the combinations for which there were empty cells in Table 2. In short, rather than providing evidence in support of Oppenheim et al.’s (2015) conditional expectations, these large coefficient values actually indicate separation.

Discussion

Failing to recognize separation in models with qualitative outcomes can produce inaccurate inferences. As Oppenheim et al. (2015) demonstrates, researchers may even conclude the anomalous estimates provide very strong support for their theories. Moreover, it illustrates that even where researchers recognize the presence of perfect prediction, its consequences for estimation may not be understood.² Rather than offer support, parameter values recovered under separation convey little useful information about the underlying population parameters of interest.

This is not to say that the conditional expectations articulated in Oppenheim et al. (2015) are wrong, only that they are not supported by these data. There is insufficient information from which to draw reliable inferences over the claims they make. The descriptive evidence – observed frequencies – suggests that they may indeed be right, but to discriminate this empirically would require more data exhibiting greater variation.

Separation is not a problem unique to Oppenheim et al. (2015). As such, we highlight several points in concluding. First, researchers should recognize that separation concerns apply analogously to nominal-outcome models. Second, researchers should undertake simple diagnostics such as cross tabs, which can reveal sparse data coverage. Third, researchers should scale predictors (as suggested in Gelman, Jakulin, Pittau et al., 2008) and beware of large coefficients and standard errors, as these can be indicative of separation. Finally, researchers should consider principled solutions to separation – often through a penalty or a prior – for robustness.³ Here we have demonstrated one of these approaches and shown its efficacy in helping to avoid unsubstantiated inferences.
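The third recommendation lends itself to a quick screen. The sketch below applies it to the naive MNL coefficients on the binary predictors in Model 3 of Table 3 (Demobilized vs. Captured). The cut-off of 5 on the logit scale is a hypothetical screening value chosen for illustration, not one given in the text, and the function name is likewise an assumption.

```python
# Naive MNL coefficients on binary predictors, Table 3, Model 3,
# Demobilized vs. Captured.
coefficients = {
    "Economic need": 12.923,
    "Economic need x Political indoctrination": -12.580,
    "Political indoctrination": 0.067,
    "Male": 0.763,
}

# Hypothetical screening threshold on the logit scale (an assumption).
THRESHOLD = 5.0

def red_flags(coefs, threshold=THRESHOLD):
    """Return the predictors whose absolute coefficient exceeds the threshold."""
    return [name for name, b in coefs.items() if abs(b) > threshold]

suspicious = red_flags(coefficients)
print(suspicious)
# ['Economic need', 'Economic need x Political indoctrination']
```

A flag of this kind is only a prompt to inspect the corresponding cross tabulation. It is meaningful only for predictors on a comparable scale, which is one reason to scale them first.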

Supplemental Material

Supplemental material, rap_appendix for A warning on separation in multinomial logistic models by Scott J. Cook, John Niehaus and Samantha Zuhlke in Research and Politics

Footnotes

Acknowledgements

Thanks to Carlisle Rainey, Liam McGrath, Timm Betz, two anonymous reviewers, and the editor for their helpful comments. All remaining errors are ours alone. Replication materials are available as Cook, et al. (2018) at .

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplementary material

The supplementary files are available at http://journals.sagepub.com/doi/suppl/10.1177/2053168018769510. The replication files are available at: .

Notes

Carnegie Corporation of New York Grant

This publication was made possible (in part) by a grant from Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.

References

Albert and Anderson (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71(1):1–10.

Firth (1993) Bias reduction of maximum likelihood estimates. Biometrika 80(1):27–38.

Forsberg (2013) Do ethnic dominoes fall? Evaluating domino effects of granting territorial concessions to separatist groups. International Studies Quarterly 57(2):329–340.

Fortna (2015) Do terrorists win? Rebels’ use of terrorism and civil war outcomes. International Organization 69(3):519–556.

Gelman, Jakulin A, Pittau MG, et al. (2008) A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics 2(4):1360–1383.

Heinze and Schemper (2002) A solution to the problem of separation in logistic regression. Statistics in Medicine 21(16):2409–2419.

Koga (2011) Where do third parties intervene? Third parties domestic institutions and military interventions in civil conflicts. International Studies Quarterly 55(4):1143–1166.

Kosmidis and Firth (2011) Multinomial logit bias reduction via the Poisson log-linear model. Biometrika 98(3):755–759.

Long and Freese (2006) Regression Models for Categorical Dependent Variables Using Stata. College Station, TX: Stata Press.

Oppenheim, Steele, Vargas, et al. (2015) True believers, deserters, and traitors. Journal of Conflict Resolution 59(5):794–823.

Rainey (2016) Dealing with separation in logistic regression models. Political Analysis 24(3):339–355.

Zorn (2005) A solution to separation in binary response models. Political Analysis 13(2):157–170.
