Abstract
The aim of the current investigation was to examine the appropriateness of propensity score methods for the study of incarceration effects on children by directing attention to a range of conceptual and practical concerns, including the exclusion of theoretically meaningful covariates, the comparability of treatment and control groups, and potential ambiguities resulting from researcher-driven analytic decisions. Using data from the Fragile Families and Child Wellbeing Study, we examined the effects of maternal and paternal incarceration on a range of child well-being outcomes, including internalizing and externalizing problem behaviors, Peabody Picture Vocabulary Test scores, and early juvenile delinquency. Our findings suggested that propensity scores and treatment effect estimates are highly sensitive to a number of decisions made by the researcher, including aspects where little consensus exists. In light of the conceptual underpinnings of propensity score analysis and existing data limitations, we suggest the potential utility of different identification methods and specialized data collection efforts.
Introduction
The rapid increase in the U.S. incarceration rate during the past several decades has moved researchers and policy makers to consider its effects on families, communities, and in particular, children of incarcerated parents (see Travis, Western, and Redburn 2014 for a review). According to recent estimates, more than half of U.S. prisoners—including roughly 52 percent of state inmates and 63 percent of federal inmates—are parents of minor children (Glaze and Maruschak 2008; Pew Charitable Trusts 2010). This corresponds to 2.3 percent of all children in the United States younger than age 18; however, the number of children touched by parental incarceration is much larger as these figures capture only those with a parent currently serving time. Thus, whether the incarceration of a parent has collateral consequences for the next generation has become a particularly important empirical question and one that has received considerable research attention during the past several years.
Parental incarceration is associated with children’s problem outcomes across a broad range of domains, including mental and physical health, behavioral problems, academic achievement, material hardship, and involvement with the criminal justice system (Cho 2009a, 2009b; Foster and Hagan 2015; Haskins 2014, 2016; Murray, Farrington, and Sekol 2012; Murray, Loeber, and Pardini 2012; Turney 2017; Turney and Wildeman 2015; Wildeman 2010; Wildeman and Andersen 2017). Beyond effects for the individual child, scholars suggest that parental incarceration has contributed to racial inequality in child well-being given the disproportionate impact of incarceration on individuals and families of color (Lee et al. 2015; Wakefield and Wildeman 2011, 2013). Yet despite the apparent convergence of research evidence indicating that parental incarceration has consequences for children’s health and development, methodological and conceptual concerns related to selection bias continue to call into question the causal nature of incarceration effects on children (Giordano and Copp 2015; Hagan and Dinovitzer 1999; Johnson and Easterling 2012; Sampson 2011).
In light of the field’s growing sensitivity to concerns about selection bias, scholars have looked to an array of advanced statistical techniques. Particularly noteworthy is a set of recent investigations that exploits an exogenous policy shock to examine the effect of parental incarceration on children (e.g., Andersen and Wildeman 2014; Wildeman and Andersen 2017). For example, Wildeman and Andersen (2017) used a difference-in-differences framework to examine the effect of paternal incarceration on children’s risk of ever being charged with a crime as young adults using Danish registry data. Their findings provided compelling evidence of a causal effect of paternal incarceration on children’s later criminal justice involvement within the Danish context. Unfortunately, U.S.-based studies are lagging relative to studies from other countries, where scholars have examined incarceration effects across a range of outcomes using better data and designs.
More commonly, U.S.-based studies rely on other approaches to isolate the effect of incarceration while simultaneously accounting for other sources of adversity. A particularly common approach is propensity score analysis. Yet the proliferation of propensity score analysis has occurred without full consideration of its appropriateness for the estimation of incarceration effects on children and often with little attention to its underlying assumptions. This is potentially problematic, as the use of such methods has become the standard upon which the scientific rigor of incarceration effects research is assessed (i.e., findings based on more “basic” estimation strategies are afforded less weight). Moreover, findings from this work have made their way into important policy discussions.
The goal of this article is to suggest some general limitations of relying on propensity score models, drawing on a specific illustration that explores the use of propensity score methods for the study of incarceration effects on children. We begin by focusing attention on the conceptual underpinnings of propensity score methods. Next, we provide an empirical demonstration of propensity scores using data from the Fragile Families and Child Wellbeing Study (FFCWB) to underscore potential ambiguities present in propensity score analysis. Our findings call into question the appropriateness of propensity score models to address the effect of parental incarceration on child well-being and provide future directions for scholars committed to further developing our understanding of the mechanisms driving problem outcomes among the children of incarcerated parents. Consistent with existing treatments of propensity score analyses (e.g., King and Nielsen 2016; Loughran et al. 2015; Shadish 2013), our findings may have some implications for the use of propensity score methods across a range of other topic areas.
The Application of Propensity Score Methods to Incarceration Effects Research
In the incarceration effects literature, scholars have long recognized that the men and women who go to prison differ in ways that not only affect their likelihood of experiencing incarceration but also influence their familial relationships and family socioeconomic well-being (see Murray and Farrington 2008). It is therefore quite difficult to determine whether the problem outcomes observed among the children of incarcerated parents are due to the incarceration itself or these other sources of adversity (e.g., Giordano 2010; Johnson and Easterling 2012). Arguably the best way to address the selection bias endemic to research on incarceration effects would be to conduct an experiment; however, as this is generally not feasible, the use of statistical techniques that approximate experimental designs and produce estimates of incarceration effects has proliferated. Among these, propensity score methods are by far the most common.
So how do propensity score methods overcome the problem of selection bias (and satisfy the strong ignorability assumption)? In theory, researchers account for strong ignorability by identifying a set of covariates that characterize the selection process. With respect to incarceration effects research, these include factors associated with incarceration and/or child well-being outcomes. Individuals are then matched based on this vector of covariates with members of the treatment and control groups set to differ based only on their exposure to the treatment (i.e., parental incarceration). Although there exist no tests to confirm whether strong ignorability holds, it is reasonable to expect some discussion of the basis upon which one can assume strong ignorability given the observed covariates. With few exceptions, this assumption is rarely discussed in the literature, and where it is mentioned, scholars often sidestep much deliberation of the issue by indicating that propensity score models do not account/correct/adjust for unobserved heterogeneity.
The problem of unobserved heterogeneity is universal to propensity scores, and consequently, there is almost always some uncertainty as to whether the selection bias has been eliminated from the estimation of the treatment effect. Nevertheless, this uncertainty is typically due to the complex nature of treatment assignment and therefore the difficulty of identifying the covariates involved in the selection process (Steiner et al. 2010). In incarceration effects research, however, scholars are typically aware of the exclusion of at least some constructs that are germane to selection, as data limitations preclude the examination of certain key social selection forces (Sampson 2011). More specifically, and as elaborated on in more detail in other work (Giordano and Copp 2015), research on incarceration effects has seldom included controls for parental criminality despite the well-documented association between criminal offending and sentencing decisions. Instead, propensity score models rely predominantly on a roster of family background characteristics that, although correlated with the selection process and the outcomes of interest, are not inherent to decisions to incarcerate. Because offending is a necessary condition for incarceration and has also been linked to an array of child well-being outcomes, it seems that failure to account for the parent’s offending behaviors may pose a rather considerable threat to strong ignorability and, consequently, causal estimates.
Some scholars have attempted to address this issue indirectly by conducting sensitivity analyses post hoc and determining how substantial the unobserved effects would have to be to render findings nonsignificant. Yet findings from these analyses are somewhat difficult to interpret substantively. For example, Turney and Wildeman (2015) estimate the gamma statistic (Γ) for hidden biases (Rosenbaum 2002) and find that to reverse the conclusion that incarceration causes detriments to child well-being, unobserved characteristics would have to increase the odds of incarceration by 70 percent, 130 percent, and 150 percent for internalizing, externalizing, and delinquent behaviors, respectively. Given that offending variables (e.g., offending history, offense severity) have been identified as among the strongest predictors of sentencing outcomes, including decisions to incarcerate (e.g., Kramer and Steffensmeier 1993; Spohn 1994; Steffensmeier and Demuth 2001), it seems plausible that parental (offending) behaviors may increase the odds of being incarcerated by a factor as large as those presented above (see also Giordano and Copp 2015). Rosenbaum (2002) discusses the sensitivity of certain types of research questions to large versus small hidden biases. In our view, even large Γ values in incarceration effects research do not confer the same sense of security as they may in other areas of research given the inability of existing data to balance over selection forces that are intrinsically tied to the treatment.
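To make the Γ values discussed above more concrete, the following minimal sketch computes the standard Rosenbaum bounds for a matched pair: if two units with identical observed covariates may differ in their odds of treatment by at most a factor of Γ, the probability that a given member of the pair is the treated one lies between 1/(1 + Γ) and Γ/(1 + Γ). The function name is hypothetical, and reading an odds increase of 70, 130, and 150 percent as Γ = 1.7, 2.3, and 2.5 is our interpretation rather than something reported in the cited work.

```python
def rosenbaum_probability_bounds(gamma):
    """Bounds on the chance that a given unit in a matched pair is the treated one.

    gamma is the sensitivity parameter (>= 1): the largest factor by which two
    units with identical observed covariates may differ in their odds of treatment.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    lower = 1.0 / (1.0 + gamma)
    upper = gamma / (1.0 + gamma)
    return lower, upper


# Gamma = 1.0 corresponds to no hidden bias (a fair coin within each pair).
# The other values are our reading of the 70, 130, and 150 percent figures.
for gamma in (1.0, 1.7, 2.3, 2.5):
    lower, upper = rosenbaum_probability_bounds(gamma)
    print(f"gamma={gamma:.1f}: probability between {lower:.3f} and {upper:.3f}")
```

The wider the interval, the more an unobserved factor such as offending history would be allowed to tilt assignment within otherwise matched pairs.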
A second, and commonly overlooked, condition of strong ignorability is nonzero probability (Rosenbaum and Rubin 1983). The basic idea of nonzero probability is that all individuals (including members of both the treatment and control conditions) have at least some chance of receiving the treatment. This suggests that propensity score methods are incompatible with certain research questions, such as the measurement of effects where the potential outcome is zero in “virtually all practical situations” (Shadish 2013:134). Although propensity score applications have demonstrated that children belonging to the treatment and control groups are similar across a range of sociodemographic, family, neighborhood, and parental characteristics, the omission of constructs tapping parents’ actual offending likely overlooks the fact that they differ in ways that affect their likelihood of experiencing parental incarceration. In practical applications, the nonzero probability assumption is typically upheld as members of the treatment and control groups are identified at all levels of the propensity score. This is based, however, on applications that have been unable to eliminate the effects of confounding due to the omission of parental behaviors.
There are a number of important caveats. Importantly, studies that include confinement in local jails may tap time spent in pretrial detention, during which individuals are presumed innocent and have yet to be convicted of any crimes. There are also limitations to survey data that preclude researchers’ ability to perfectly capture parental incarceration exposure. While parental offending behaviors are seldom accounted for in the incarceration effects literature, research based on the FFCWB data has included measures of parental intimate partner violence and substance abuse; however, reported base rates are substantially lower than those obtained from surveys of prison populations. Thus, it does not appear that existing data have sufficiently captured the nature or extent (i.e., chronicity) of these behaviors—factors that weigh heavily in the decision to incarcerate and may also affect child well-being. Finally, a small portion of the currently and/or previously incarcerated has been wrongfully accused. However, it is worth noting that a child whose parent has never engaged in illegal activity has little chance of experiencing a parent’s imprisonment. Although this assumption is rarely assessed in criminological research, it bears further consideration—particularly in future analyses that include constructs that more precisely model the selection process.
Even if we were to assume that all members of the treatment and control groups do indeed have some chance of experiencing the treatment, the more basic question of the accuracy of comparisons made between treatment and control groups remains. Indeed, this has been a core issue among researchers using propensity score methods. And there are a number of helpful sensitivity analyses, including the use of different matching algorithms and other modifications to reduce the region of common support and assess the stability of the estimates. Yet fundamentally, if there is insufficient overlap in the behavioral histories of people who go to prison and those who do not, propensity score methods are not an ideal solution to the problem.
Recognizing the often-striking variability in the probability of experiencing parental incarceration, both within and across samples, a more recent approach employed by scholars of incarceration effects has been to consider the effect of parental incarceration on child well-being by the propensity for experiencing this event (Turney 2017; Turney and Wildeman 2015). This approach is referred to as the stratification-multilevel method of estimating heterogeneous treatment effects (see Xie, Brand, and Jann 2012 for an overview) and has been identified as a potential strategy for removing most of the selection bias between the treatment and control groups (Rosenbaum and Rubin 1984). Researchers employing this approach first estimate propensity scores for each unit and then construct balanced strata based on those scores (Xie et al. 2012). Treatment effects are estimated within stratum, and trends are identified across strata-specific effects. This approach has a number of advantages, including the ability to examine variability in incarceration effects as a function of the factors that shape individuals’ chances of experiencing parental incarceration. Moreover, with respect to selection bias, this approach is thought to limit bias by comparing units that are more similar across observed covariates and the likelihood of receiving treatment. It is unclear, however, whether unobserved heterogeneity is really less problematic in heterogeneous treatment effect models, as current applications have necessarily had to rely on a rather limited roster of covariates to comply with the within-stratum balance requirement (Turney and Wildeman 2015; see also Turney 2015). Thus, if one of the advantages of heterogeneous treatment effect models is bias reduction, then introducing bias by excluding potential confounders seems at odds with the original goals of relying on this methodological strategy.
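The stratification steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Xie et al. (2012): it takes already-estimated propensity scores as given, uses hypothetical cut points, does not check within-stratum balance, and fits an unweighted least-squares trend across strata where the published method uses a variance-weighted second level. All function names and the toy data are ours.

```python
from statistics import mean


def stratum_effects(scores, treated, outcomes, cut_points):
    """Return the treated-minus-control mean outcome difference within each stratum.

    cut_points are ascending boundaries between strata; a unit's stratum index
    is the number of boundaries at or below its propensity score.
    """
    groups = [{"t": [], "c": []} for _ in range(len(cut_points) + 1)]
    for score, is_treated, outcome in zip(scores, treated, outcomes):
        stratum = sum(score >= cut for cut in cut_points)
        groups[stratum]["t" if is_treated else "c"].append(outcome)
    effects = []
    for group in groups:
        if not group["t"] or not group["c"]:
            effects.append(None)  # no comparison is possible in this stratum
        else:
            effects.append(mean(group["t"]) - mean(group["c"]))
    return effects


def linear_trend(effects):
    """Unweighted least-squares slope of stratum effects on stratum rank."""
    points = [(rank, e) for rank, e in enumerate(effects, start=1) if e is not None]
    if len(points) < 2:
        raise ValueError("need at least two estimable strata")
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in points)
    denominator = sum((x - x_bar) ** 2 for x, _ in points)
    return numerator / denominator


# Toy data (hypothetical): nine units sorted into three strata.
scores = [0.02, 0.03, 0.04, 0.07, 0.08, 0.09, 0.15, 0.20, 0.30]
treated = [1, 0, 0, 1, 0, 0, 1, 1, 0]
outcomes = [1.0, 0.5, 0.7, 1.2, 0.8, 0.6, 1.5, 1.3, 1.4]
effects = stratum_effects(scores, treated, outcomes, cut_points=[0.05, 0.10])
print(effects, linear_trend(effects))
```

A negative slope would indicate smaller estimated effects among units with a higher propensity for treatment; the sign and size of such a trend depend on which covariates entered the propensity score model in the first place.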
The Current Study
The increase in the use of propensity score models in incarceration effects research is unsurprising given the desire to estimate causal effects using nonexperimental data. However, the application of propensity score analyses to research on the consequences of parental incarceration for child well-being is potentially problematic for reasons that limit the ability of researchers to infer causality. This includes the exclusion of theoretically meaningful covariates, resulting in omitted variable bias. In addition, well-executed propensity score designs require a comparable control group; nevertheless, the comparability of treatment and control groups using existing broadly representative data is unclear. More fundamentally, propensity score analyses implement a counterfactual framework to establish causality, which in the case of incarceration effects research, considers what would have happened to the children of incarcerated parents had they not experienced the incarceration event (and vice versa). One of the underpinnings of propensity score analysis is the idea that individuals have some chance of being in either the treatment or control group (i.e., nonzero probability). This sets up a somewhat difficult question conceptually as children of nonoffending parents are unlikely to experience the treatment. For these reasons, we believe that it is important for researchers to continue to pursue other innovative and advanced methodologies to examine whether and how parental incarceration transmits risk to children.
Yet in addition to these more conceptual concerns, which confront the issue of whether propensity score methods are appropriate for the study of incarceration effects, there exist a number of practical concerns with respect to the implementation of propensity score analyses in incarceration effects research. In the current investigation, we used data from FFCWB to provide an empirical demonstration of propensity score methods to estimate incarceration effects. In particular, the current analyses highlight potential ambiguities that may arise during propensity score generation, treatment effect estimation, and postestimation sensitivity analyses. As such, we focus on the following: (a) identifying covariates and estimating propensity scores, (b) estimating the treatment effect, (c) performing sensitivity analyses, and (d) examining effect heterogeneity. Our hope is that this effort sparks additional discussion regarding the need for specialized data collection efforts to address existing data limitations and the potential utility of pursuing other methodological approaches that may be better equipped to address the question of whether/how incarceration confers risk to children.
Analytic Strategy
We began by selecting a group of covariates to construct the propensity scores using the FFCWB data. Covariate selection is a critical step in propensity score analyses, yet given differences in data quality and the availability of covariates, researchers must choose from available indicators. Accordingly, we estimated propensity scores for each observation using a set of covariates that, based on prior research, represent factors associated with parental incarceration and/or child well-being. We included constructs tapping the family climate (e.g., parental substance abuse problem and reports of domestic violence), but note that further controls for parent/family antisocial lifestyle—an important yet undertheorized source of social selection—are unavailable in the data. We examined the distribution of the propensity scores across the treatment and control groups for maternal and paternal incarceration. Next, we examined descriptive analyses for the full set of covariates prior to and after matching and assessed covariate balance to confirm that the only observable difference across treatment and control groups is the experience of parental incarceration. We then estimated the average treatment effect on the treated based on the propensity scores using kernel matching, a nonparametric matching estimator that potentially uses all members of the control group to create a counterfactual observation for a treatment group member (Caliendo and Kopeinig 2008; Guo and Fraser 2010). To examine the robustness of the results, we conducted a series of sensitivity analyses, comparing our findings across a range of matching algorithms and specifications. Finally, we examined heterogeneous treatment effects using multilevel propensity score models to further address the issue of bias under unconfoundedness.
Results
Identifying Covariates and Estimating Propensity Scores
In Figure 1 we present the distribution of the propensity scores across treatment and control groups in the FFCWB data. The average propensity score for maternal incarceration was 0.09 (0.20 for treatment group and 0.08 for control group), and scores ranged from 0.00 to 0.80. For paternal incarceration, the average propensity score was 0.31 (0.43 for treatment group and 0.26 for control group), and scores ranged from 0.01 to 0.88. Although researchers seldom present these distributions, they may be a helpful tool for determining where common support exists and whether we should be concerned with matches at the upper end of the propensity score range where scores for the untreated cases are especially sparse. Much of the distribution of the control group overlapped with that of the treatment group for both maternal and paternal incarceration; however, we see a sizeable concentration of scores at the lower end of the propensity score distribution. Focusing first on maternal incarceration, 75 percent of the propensity scores of the control group were less than 0.10, whereas only one third of the treatment cases fell within this range. Conversely, 50 percent of observations in the treatment group had propensity scores greater than 0.14, compared to less than 15 percent of control group observations. Similarly, 75 percent of the paternal incarceration propensity scores for the control group were less than 0.38, compared to only two fifths of the treatment group, and half of the treatment group had scores greater than 0.44, compared to less than one fifth of the control group. Examination of these distributions helps determine the sensitivity of propensity score estimates to covariate selection and demonstrates that fewer appropriate comparisons may become available in the data as we come closer to approximating the life circumstances of children who have experienced parental incarceration.
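The kind of overlap check described above can be sketched in a few lines. The example below is illustrative only: the function names and the toy scores are hypothetical, and the min-max rule is just one common way of defining a region of common support.

```python
def share_below(scores, threshold):
    """Fraction of propensity scores strictly below a threshold."""
    return sum(s < threshold for s in scores) / len(scores)


def common_support(treated_scores, control_scores):
    """Min-max region in which both groups have propensity scores."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return low, high


def controls_available(control_scores, lower_bound):
    """Number of control units with scores at or above a lower bound."""
    return sum(s >= lower_bound for s in control_scores)


# Toy scores (hypothetical), skewed the way the text describes:
# controls cluster near zero while treated units spread upward.
control = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.30]
treated = [0.04, 0.12, 0.20, 0.45]

print(share_below(control, 0.10), share_below(treated, 0.10))
print(common_support(treated, control))
print(controls_available(control, 0.14))
```

A region of common support can span most of the score range while still containing very few controls at the upper end, which is why counts within the region matter as much as its endpoints.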

Figure 1. Distribution of the Propensity Score for Maternal and Paternal Incarceration among Treatment and Control Groups.
Estimation of the Treatment Effect
We present descriptive statistics (see Table 1), including sample means for all study variables. Next, we employ kernel-based matching (Epanechnikov with bandwidth = 0.06), which unlike other standard matching algorithms uses weighted averages of all control-group observations to construct the counterfactual outcome. A major advantage of this approach is that it uses more information than matching algorithms that draw on only a subset of control group observations. An obvious limitation of this approach is that precisely because all control group members are used, bad matches are potentially included in the process. In these analyses, we used a generalized version of kernel matching (i.e., local linear matching) as this approach better handles data in which the control group observations are not distributed symmetrically around the treatment group observations (Smith and Todd 2005)—as is the case in the current investigation. Postmatching descriptive analyses revealed that covariate values of the treatment and control groups were nearly identical. That is, t-test comparisons of means across the treatment and control groups for maternal and paternal incarceration suggested that none of the differences were significant postmatching. The percentage reduction in bias similarly indicated that matching substantially reduced covariate imbalance. This suggests that independent of treatment status, observations with similar propensity scores should have the same distribution on observable characteristics (Becker and Ichino 2002). Any observed differences between the outcomes of treatment and control group members can be attributed to the treatment.
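As a rough illustration of how a kernel-matching estimate of the average treatment effect on the treated is assembled, consider the sketch below. It implements plain kernel matching with an Epanechnikov kernel and a bandwidth of 0.06, not the local linear variant used in our analyses, and the function names and data layout are hypothetical. Note that with a compact kernel such as the Epanechnikov, only controls within one bandwidth of a treated unit's score receive nonzero weight.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, otherwise 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_att(treated, controls, bandwidth=0.06):
    """Kernel-matching estimate of the average treatment effect on the treated.

    treated and controls are lists of (propensity_score, outcome) pairs.
    Each treated unit is compared with a kernel-weighted average of control
    outcomes; treated units with no control inside the bandwidth are skipped.
    """
    gaps = []
    for p_t, y_t in treated:
        weights = [epanechnikov((p_c - p_t) / bandwidth) for p_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # off support: no control lies within the bandwidth
        counterfactual = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        gaps.append(y_t - counterfactual)
    if not gaps:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(gaps) / len(gaps)


# Toy data (hypothetical): (propensity score, outcome) pairs.
treated_units = [(0.30, 2.0), (0.50, 1.5)]
control_units = [(0.28, 1.0), (0.31, 1.2), (0.52, 1.4), (0.90, 3.0)]
print(kernel_att(treated_units, control_units))
```

Because the weights depend directly on the bandwidth, rerunning the same function with a narrower or wider bandwidth changes which controls contribute to each counterfactual.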
Table 1. Descriptive Statistics of Study Variables (n = 3,196).
Note: y = year; b = baseline.
Source: Fragile Families and Child Wellbeing Study.
As other scholars have noted, however, it cannot be inferred that exposure to treatment is random based on balance alone, and additional attention must be directed to the strong ignorability assumption. In the current investigation we moved forward with our analyses in light of these findings and return to the issue of strong ignorability in the conclusion. It is also important to note that although matching did appear to account for the differences between treatment and control groups along the propensity score, a number of standardized differences exceeded 10 percent in absolute value postmatching. These differences were greater in the analyses of maternal incarceration.
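The standardized difference referred to above is commonly computed as the difference in group means divided by the square root of the average of the two group variances, expressed as a percentage. The sketch below shows that calculation for a single covariate; it is unweighted, so it would need to incorporate the matching weights before being applied to a kernel-matched sample, and the function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(treated_values, control_values):
    """Standardized difference in means for one covariate, in percent.

    Uses the square root of the average of the two sample variances as the
    denominator. Each group needs at least two observations.
    """
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2.0)
    if pooled_sd == 0.0:
        raise ValueError("covariate has no variation in either group")
    return 100.0 * (mean(treated_values) - mean(control_values)) / pooled_sd


# Toy covariate (hypothetical): flag any covariate exceeding 10 percent
# in absolute value, the threshold mentioned in the text.
treated_age = [24, 26, 29, 31, 35]
control_age = [25, 27, 28, 32, 36]
difference = standardized_difference(treated_age, control_age)
print(round(difference, 1), abs(difference) > 10)
```

Unlike a t test, this measure does not shrink toward nonsignificance simply because the matched sample is smaller, which is one reason the two diagnostics can disagree.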
Table 2 presents estimates for the average effect of maternal and paternal incarceration on internalizing problem behaviors, externalizing problem behaviors, Peabody Picture Vocabulary Test (PPVT), and delinquency. We provide differences based on the unmatched sample in the first column, which indicate that as compared to their peers with no history of parental incarceration, children of currently/previously incarcerated mothers are more likely to exhibit externalizing problem behaviors (b = 0.039, p < .01), have lower PPVT scores (b = −1.880, p < .05), and engage in early juvenile delinquency (b = 0.490, p < .001). We present the matched differences in the next column. These matched differences indicate that in the case of maternal incarceration, there were no statistically significant differences in our outcomes for the treatment and control groups. The results for paternal incarceration, however, indicate significant differences across two of the four outcomes; based on results from the matched sample, children of incarcerated fathers report greater internalizing and externalizing problem behaviors (b = 0.023, p < .01 and b = 0.035, p < .01, respectively). Similar to the results of models examining maternal incarceration, no significant differences were identified for PPVT scores or juvenile delinquency between the treatment and control groups for paternal incarceration.
Table 2. Propensity Score Matching Estimates of the Average Effect of Parental Incarceration on Child Well-being.
Source: Fragile Families and Child Wellbeing Study.
*p < .05. **p < .01. ***p < .001.
Sensitivity Analyses
Given different substantive conclusions based on the unmatched and matched samples, we employed a number of sensitivity analyses to determine the robustness of the findings (see Table 3). We began by “trimming” observations, which effectively imposes common support by dropping a specified percentage of treatment observations (Guo and Fraser 2010). We reestimated our models, trimming 2 percent, 5 percent, and 10 percent of observations (Heckman, Ichimura, and Todd 1997). This approach helps identify whether the treatment effects are sensitive to the distributional properties of the estimated propensity scores. The substantive findings of the matched sample remained unchanged for analyses of maternal incarceration effects. More specifically, the matched differences indicated that there were no significant differences between the treatment and control groups in internalizing problem behaviors, externalizing problem behaviors, PPVT scores, and delinquency; this conclusion remained unchanged after trimming 2 percent, 5 percent, and 10 percent of observations. Findings based on analyses of paternal incarceration effects, however, did vary across these different specifications. After trimming 2 percent and 5 percent of observations, the findings were substantively similar to results based on the full sample. But after trimming 10 percent of observations, the effect of paternal incarceration on internalizing problem behaviors was no longer significant.
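One way to picture the trimming step is shown below. This is a simplified sketch under our own assumptions: it drops the requested share of treated observations whose propensity scores fall where controls are sparsest, using a raw count of controls within a fixed window as a crude stand-in for the kernel density estimate that matching software would typically use. The function name, the window width, and the toy scores are hypothetical.

```python
def trim_treated(treated_scores, control_scores, share, window=0.05):
    """Return the indices of treated units kept after trimming.

    Drops the given share of treated units located where control units are
    sparsest. Sparsity is approximated by counting controls whose scores lie
    within `window` of each treated unit's score.
    """
    if not 0.0 <= share < 1.0:
        raise ValueError("share must be in [0, 1)")
    density = [
        sum(abs(c - t) <= window for c in control_scores)
        for t in treated_scores
    ]
    n_drop = int(share * len(treated_scores))
    # Sort treated units from sparsest to densest control neighborhood.
    order = sorted(range(len(treated_scores)), key=lambda i: density[i])
    dropped = set(order[:n_drop])
    return [i for i in range(len(treated_scores)) if i not in dropped]


# Toy scores (hypothetical): the treated unit at 0.90 has no nearby controls.
treated = [0.10, 0.50, 0.90]
control = [0.08, 0.10, 0.12, 0.50, 0.52]
print(trim_treated(treated, control, share=0.34))
```

Because the dropped cases are those with the weakest comparisons, estimates that change as the trimming share rises point to results that lean on sparsely supported regions of the propensity score.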
Table 3. Propensity Score Matching Estimates of the Average Effect of Parental Incarceration on Child Well-being, Sensitivity Analyses.
Note: Values in boldface are statistically significant.
Source: Fragile Families and Child Wellbeing Study.
Nearest five neighbors with a caliper of .005.
Nearest neighbor without replacement with a caliper of .25 × standard deviation of logit of propensity score (1.106893).
*p < .05. **p < .01. ***p < .001.
Additional sensitivity analyses used different bandwidth sizes. That is, whereas the average effect models in Table 2 relied on the default bandwidth (.060), we examined the average effects based on four additional models using bandwidths ranging from .005 to .800. Similar to the findings described above, findings were substantively similar across the different bandwidth specifications for maternal incarceration, providing additional support for the null findings based on the average effect estimates presented in Table 2. Yet the results of models examining paternal incarceration were less stable. The effect of paternal incarceration on delinquency was significant using a narrower bandwidth (bandwidth = .005) (b = 0.239, p < .05), and this effect dissipated in models using wider bandwidths. In contrast, the effect of paternal incarceration on internalizing problem behaviors was nonsignificant in models using a narrower bandwidth; however, this effect became statistically significant in models using a wider bandwidth (b = 0.022–0.024, p < .01).
Supplemental models also examined results using nearest neighbor matching, including nearest neighbor without replacement (caliper = .250 × standard deviation of logit of propensity score) and nearest five neighbors (caliper = .005), and the null findings for maternal incarceration were robust to these variations with one exception. That is, the effect of maternal incarceration on early juvenile delinquency was significant in models using nearest neighbor without replacement (b = 0.375, p < .05). Findings based on the estimation of paternal incarceration effects varied considerably. Whereas the average effect estimates using kernel matching (see Table 2) indicated a significant effect of paternal incarceration on internalizing problem behaviors, this effect was nonsignificant in models using nearest neighbor matching. Furthermore, the effect of paternal incarceration on juvenile delinquency was not significant at conventional levels in the average effect estimates presented in Table 2; however, this effect was significant in the nearest neighbor models. Finally, in a model using nearest neighbor without replacement (caliper = .250 × standard deviation of logit of propensity score), the effect of paternal incarceration on PPVT was significant (b = −2.484, p < .001). Thus, whereas analyses of maternal incarceration effects appeared robust to these variations, findings for paternal incarceration were sensitive to the matching algorithm, region of common support, and bandwidth. These findings demonstrate the potential variability of any substantive conclusions drawn from these analyses.
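The nearest neighbor specification without replacement can be sketched as below. This is a minimal greedy implementation under stated assumptions: matching is carried out on the logit of the propensity score (the text specifies only that the caliper is defined in those units), treated units are processed in the order given, and scores must lie strictly between 0 and 1. The function names and toy scores are hypothetical.

```python
from math import log
from statistics import stdev


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return log(p / (1.0 - p))


def match_without_replacement(treated_scores, control_scores, caliper_sd=0.25):
    """Greedy one-to-one nearest neighbor matching within a caliper.

    The caliper equals caliper_sd times the standard deviation of the logit
    of the propensity score in the pooled sample. Returns a list of
    (treated_index, control_index) pairs; each control is used at most once.
    """
    t_logits = [logit(p) for p in treated_scores]
    c_logits = [logit(p) for p in control_scores]
    caliper = caliper_sd * stdev(t_logits + c_logits)
    available = set(range(len(c_logits)))
    pairs = []
    for i, t_logit in enumerate(t_logits):
        if not available:
            break
        # sorted() makes tie-breaking deterministic (lowest index wins).
        j = min(sorted(available), key=lambda k: abs(c_logits[k] - t_logit))
        if abs(c_logits[j] - t_logit) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # matched controls cannot be reused
    return pairs


# Toy scores (hypothetical).
treated = [0.20, 0.80]
control = [0.21, 0.50, 0.79]
print(match_without_replacement(treated, control))
```

Because each control can be used only once and the result depends on processing order, this estimator discards information that kernel matching retains, which is one reason the two approaches can yield different conclusions on the same data.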
Heterogeneous Treatment Effects
A final set of analyses examined heterogeneous treatment effect models for maternal incarceration. Using Turney and Wildeman (2015) as a guide, we began by identifying a set of matching covariates, including factors associated with incarceration and/or child well-being, to generate a propensity score for each observation. We restricted the number of covariates included in the matching equation to ensure that covariate values for the treatment and control groups were similar within these subgroupings. Based on the propensity score estimates obtained, we grouped observations into three strata such that those with the lowest propensities of experiencing the treatment were in stratum 1 and those with the highest propensities were in stratum 3. The range of the propensity scores was narrower as a result of constraining the model estimating the propensity scores to a smaller set of covariates. That is, the upper limit was reduced from 0.80 to 0.33.
We examined the sensitivity of these findings by first considering how the exclusion of covariates influenced the assignment of individuals to strata. We found that this assignment would have changed entirely with the inclusion of the full roster of covariates. That is, the narrow propensity range identified above does not merely indicate that scores were constrained; it reflects changes in individuals’ relative propensities of experiencing the treatment. For example, under the restricted model, members of strata 1, 2, and 3 had propensity scores ranging from 0.01 to 0.05, from 0.05 to 0.10, and from 0.10 to 0.33, respectively. According to propensity score estimates using all study variables, however, scores for members of stratum 1 ranged from 0.00 to 0.53, stratum 2 from 0.01 to 0.50, and stratum 3 from 0.02 to 0.80. While this method is lauded for reducing bias by comparing more homogeneous groups within strata, our analyses suggested that the decision to exclude covariates to achieve balance may introduce more bias and result in comparisons between groups that differ substantively in ways that are integral to the selection process.
Furthermore, in addition to achieving balance within strata, another requirement of multilevel propensity score methods is that the average propensity scores of treatment and control members be statistically similar within each stratum. In the FFCWB data, it was not possible to group individuals such that the covariates were balanced and the propensity scores were similar. The greatest differences in scores were observed in stratum 1 (p < .05) and stratum 3 (p < .001), which comprise the lowest- and highest-risk groups, respectively. These findings suggest that the omission of known confounds can significantly influence our assessment of the impact of parental incarceration on child well-being.
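The comparison described in this paragraph can be sketched as a simple check: hold stratum membership fixed under the restricted model, then report the range of full-model propensity scores among the members of each stratum. The code below is a minimal illustration with hypothetical function names and the same assumed cutpoints (0.05 and 0.10). The input values in the usage note are invented for demonstration and are not study data.

```python
from bisect import bisect_right


def assign_stratum(ps, cutpoints=(0.05, 0.10)):
    """Stratum (1 = lowest propensity) implied by the restricted-model score."""
    return bisect_right(cutpoints, ps) + 1


def full_score_range_by_stratum(restricted_ps, full_ps, cutpoints=(0.05, 0.10)):
    """Range of full-model scores within each restricted-model stratum.

    restricted_ps and full_ps are parallel lists: position i in each list
    refers to the same individual under the two propensity models.
    Returns {stratum: (min_full_score, max_full_score)}.
    """
    by_stratum = {}
    for r, f in zip(restricted_ps, full_ps):
        by_stratum.setdefault(assign_stratum(r, cutpoints), []).append(f)
    return {s: (min(v), max(v)) for s, v in sorted(by_stratum.items())}
```

For example, with made-up restricted scores `[0.02, 0.04, 0.07, 0.20]` and full-model scores `[0.01, 0.45, 0.03, 0.60]`, stratum 1 spans full-model scores from 0.01 to 0.45. Wide, overlapping ranges of this kind indicate that the restricted model reorders individuals rather than merely compressing their scores.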
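One way to check whether treatment and control members have similar average propensity scores within a stratum is a two-sample test on the scores. The sketch below computes Welch's t statistic and its approximate degrees of freedom using only the standard library. The choice of Welch's test is an assumption made for illustration, since the specific test behind the reported p values is not stated here, and the function name is hypothetical.

```python
import math
import statistics


def welch_t(treated_ps, control_ps):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Compares mean propensity scores of treated and control members within
    one stratum. Each group needs at least two observations and nonzero
    variance in at least one group.
    """
    n_t, n_c = len(treated_ps), len(control_ps)
    v_t = statistics.variance(treated_ps) / n_t  # squared standard error, treated
    v_c = statistics.variance(control_ps) / n_c  # squared standard error, control
    diff = statistics.mean(treated_ps) - statistics.mean(control_ps)
    t_stat = diff / math.sqrt(v_t + v_c)
    df = (v_t + v_c) ** 2 / (v_t ** 2 / (n_t - 1) + v_c ** 2 / (n_c - 1))
    return t_stat, df
```

The statistic would be computed separately for each stratum and compared against a t distribution with the returned degrees of freedom. A large absolute value signals that treated and control cases differ in propensity even within the same stratum.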
Conclusions
In the current investigation, we addressed the potential limitations of propensity scores for the study of incarceration effects by directing attention to a series of methodological and practical concerns. We focused in particular on the assumption of strong ignorability and argued that existing applications have often paid limited attention to this key tenet of propensity score techniques. As noted by Loughran and Mulvey (2010), “when attempting to make causal inference, we must make sure there is nothing unobservable which is potentially biasing our estimates despite our best efforts to control for observables” (p. 178). Although it is impossible to completely rule out unobserved heterogeneity, a number of scholars have expressed concern regarding the exclusion of parental behaviors (Giordano 2010; Giordano and Copp 2015; Johnson and Easterling 2012; Sampson 2011), a particularly important source of selection bias. Thus, it is potentially problematic to assume basic comparability across groups when, due to data limitations, perhaps the strongest predictor of the treatment is routinely excluded from the matching algorithm. Lacking data to appropriately model the selection process, the use of propensity score methods (and resulting causal estimates) may be more problematic for investigating incarceration effects than is typically assumed.
Yet even if we were able to provide good measurement of the selection process, whether propensity scores are an appropriate method for providing estimates of incarceration effects using broadly representative data is unclear. Using large, representative samples, we must confront the sizeable differences between those with and without exposure to parental incarceration (Kirk and Wakefield 2018). This issue is further complicated by the lack of adequate controls for parental behaviors and brings into question the quality of the matches obtained. While scholars often focus on the issue of unobserved heterogeneity, an equally important consideration is the design of a good comparison group. It is difficult to imagine a scenario where children with nonoffending parents would have a similar propensity to experience a parent’s confinement as their peers who have in fact experienced parental incarceration. However, this is effectively the type of comparison that is being made.
In addition, propensity scores are highly sensitive to a number of decisions made by the researcher, including aspects where little consensus exists. In the current investigation, we provided an empirical demonstration of propensity score techniques, focusing in particular on some of the more critical decision-making points and highlighting the variable nature of the findings as a result of the researcher’s analytic choices. Given that there is no clear guide on what to include in the initial model estimating the propensity score, we began by identifying a set of selection constructs. We found that despite achieving balance, the treatment and control groups were quite distinct such that few treatment cases were found in the lower end of the distribution and few control cases in the upper end of the distribution, raising concerns about the appropriateness of available comparisons. In the next step, we estimated the treatment effect and found that depending on the modeling strategy, we obtained substantively different findings with respect to the effect of paternal incarceration (null maternal incarceration effects were largely consistent across our analyses).
The finding of a null average effect of maternal incarceration, however, does not rule out the possibility that individuals may respond differently to treatment (i.e., posttreatment heterogeneity). The potential for heterogeneity in incarceration effects has been explored in some of the more recent incarceration effects research, indicating that the detrimental effects of maternal (Turney and Wildeman 2015) and paternal (Turney 2017) incarceration on child well-being are most strongly felt among those least likely to experience these events. Understanding variation has become an important focus of incarceration effects research (Travis et al. 2014), and thus it is quite likely that scholars will continue to use this method in future investigations. Accordingly, in the current investigation we wanted to further explore this analytic technique and determine its utility for future research in the incarceration effects tradition.
Our findings revealed potential concerns regarding existing heterogeneous treatment effect estimates. In particular, the exclusion of covariates, which is often necessary to achieve balance within a stratum, resulted in individuals’ being shifted across strata. That is, individuals who were placed in the high-risk stratum based on the full roster of covariates were potentially recategorized as medium or low risk when using the more limited set of covariates, suggesting that the loss of covariates resulted in a significant loss of information, which altered the substantive meaning of low, medium, and high risk in this context. Based on our exploration of this particular methodological approach, we suggest that although heterogeneous treatment effect models are a useful strategy for examining differential responses to treatment, they may not be equally adept at estimating posttreatment heterogeneity across treatment types. Indeed, the desire to estimate how effects differ among individuals is ubiquitous across areas of social science research, and similar concerns have been voiced elsewhere. Of particular note is a recent article by Breen, Choi, and Holm (2015) in which the authors demonstrate that baseline bias varies across values of the propensity score, and thus “conventional selection bias … can easily be confused with heterogeneity of causal effects” (p. 351).
It is important to develop alternative methodological strategies that bypass the either-or assumptions of much of this line of research (i.e., is it incarceration or the other disadvantages that drive the detriments to child well-being?). Although the above considerations suggest the need for caution in making causal claims about incarceration effects net of these coexisting adversities, strategies that capture synergistic and reciprocally related effects may hold the most promise. For example, data sets that contain repeated measures of a broader portfolio of factors, including criminal justice experience, parental characteristics (e.g., parents’ antisocial behavior), and family dynamics (e.g., parents’ use of coercive parenting, financial circumstances) can be leveraged to actively model ways in which parental incarceration and these other dimensions of the child’s experience operate together and upon one another to affect key well-being outcomes.
Supplemental Material
SRD779306_Online_Supplement – Supplemental material for Parental Incarceration and Child Well-being: Conceptual and Practical Concerns Regarding the Use of Propensity Scores
Supplemental material, SRD779306_Online_Supplement for Parental Incarceration and Child Well-being: Conceptual and Practical Concerns Regarding the Use of Propensity Scores by Jennifer E. Copp, Peggy C. Giordano, Wendy D. Manning and Monica A. Longmore in Socius
